From d2ff6615542b6b78425ea363b0fd06d5e9ced075 Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Sat, 16 May 2026 08:57:06 +0000 Subject: [PATCH] iter2 phase5: second-model review, 1 finding rebut, 3 findings adopt MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Plan subagent (sonnet) reviewed Phase 0-4 with raw artifacts. Four findings: 1. GstH265SPSEXT data-source gap — REBUT with empirical evidence. Reviewer confused GstH265SPSEXT (SPS extension params, separate struct) with GstH265ShortTermRefPicSetExt (RPS-extended struct, contains the 4 fields they flagged as missing). Both ShortTermRefPicSet AND ShortTermRefPicSetExt ARE in the vendored gsth265parser.h. Direct read of fetched header confirmed. 2. Per-fd storage for has_hevc_ext_sps_rps — ADOPT. Mirror iter38 pattern of storing rkvdec + hantro fds separately. Add explicit driver_kind=='r' gate for human-readable intent. 3. SPS NAL caching strategy — ADOPT, critical. SPS NALs only arrive at IDR frames; per-frame walk would submit zero-filled RPS for non-IDR frames and re-OOPS. Parse-and-cache at first IDR, reuse on subsequent frames. 4. C3 prediction caveat — ADOPT. Anchor SHA per-clip (HEVC HW vs HEVC SW) not cross-codec; iter1's shared SHA across codecs was lucky empirical convergence, not guaranteed. Three Phase 4 amendments applied as appendix to phase4_plan_iter2.md: - §Step 3 — per-driver-kind probe storage (pair instead of scalar) - §Step 4 — explicit two-struct mapping table; SPS parse-and-cache - §Phase 7 predictions C3 — anchor per-clip Risk register gains risk #6 (SPS absent on non-IDR frames). Per feedback_review_empirical_over_theoretical: the Finding #1 rebut was done by reading the actual vendored header file content, not by source-reading the reviewer's argument. Empirical evidence won, as the memory rule requires. Plan sound with amendments. Phase 6 can proceed. Co-Authored-By: Claude Opus 4.7 --- phase4_plan_iter2.md | 57 +++++++++++++++++++++++ phase5_review_iter2.md | 102 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 159 insertions(+) create mode 100644 phase5_review_iter2.md diff --git a/phase4_plan_iter2.md b/phase4_plan_iter2.md index 6b964c8..68d6b9c 100644 --- a/phase4_plan_iter2.md +++ b/phase4_plan_iter2.md @@ -172,3 +172,60 @@ Commit message: "README: document vendored GStreamer H.265 parser." ## Phase 4 close 5 commits, 8.7k LOC, 5 file modifications. Phase 7 predictions concrete (C1-C8 mapped). 5 risks named, 4 mitigated. Ready for Phase 5 second-model review. + +--- + +## Phase 5 amendments (applied 2026-05-16 post-review) + +### Amendment §Step 3 — per-driver-kind probe storage + +Replace single `bool has_hevc_ext_sps_rps;` field on `struct request_data` with a pair: + +```c +bool has_hevc_ext_sps_rps_rkvdec; +bool has_hevc_ext_sps_rps_hantro; +``` + +Probe runs at the time each fd is opened (init for primary, alt-probe for secondary per iter38). Storage parallels `video_fd_rkvdec` / `video_fd_hantro`. h265_set_controls consults only the rkvdec entry (HEVC routes through rkvdec only). + +§Step 4 control-submission guard becomes explicit: + +```c +if (driver_data->driver_kind == 'r' && driver_data->has_hevc_ext_sps_rps_rkvdec) { + /* … populate + submit EXT_SPS_*_RPS controls … */ +} +``` + +The `driver_kind == 'r'` clause is logically redundant given the probe-result gate but provides reviewer-flagged human-readable intent. + +### Amendment §Step 4 — explicit two-struct usage + SPS caching + +The GStreamer reference `fill_ext_sps_rps` reads from BOTH `GstH265ShortTermRefPicSet` AND `GstH265ShortTermRefPicSetExt` (both structs are in the vendored gsth265parser.h — see Phase 5 Finding #1 rebut). Step 4 mapping: + +| V4L2 field | Source in vendored parser | +|------------|---------------------------| +| `delta_idx_minus1`, `delta_rps_sign`, `abs_delta_rps_minus1`, `num_negative_pics`, `num_positive_pics`, `flags` | `GstH265ShortTermRefPicSet` (members `delta_idx_minus1`, `delta_rps_sign`, `abs_delta_rps_minus1`, `NumNegativePics`, `NumPositivePics`, `inter_ref_pic_set_prediction_flag`) | +| `use_delta_flag`, `used_by_curr_pic`, `delta_poc_s0_minus1[]`, `delta_poc_s1_minus1[]` | `GstH265ShortTermRefPicSetExt` (members `use_delta_flag[]`, `used_by_curr_pic_flag[]`, `delta_poc_s0_minus1[]`, `delta_poc_s1_minus1[]`) | + +If GStreamer's `gst_h265_decoder_get_sps_ext()` helper is in the base-decoder class (not in gsth265parser.c), inline its body — it likely just reaches into `sps->short_term_ref_pic_set_ext[i]` on the parsed `GstH265SPS`. + +**SPS parse-and-cache strategy** (Phase 5 Finding #3): + +- Add a cache slot on `struct request_data`: `struct hevc_sps_cache_entry { uint8_t sps_id; bool valid; GstH265SPS parsed_sps; } hevc_sps_cache[16]` (sized to match the H.265 spec's max of 16 active SPS). +- At first `h265_set_controls` invocation OR whenever the SPS NAL is present in `source_data` (walk for `nal_unit_type == 33`): parse, store at `hevc_sps_cache[sps_id]`, mark valid. +- On subsequent frames: look up `hevc_sps_cache[pps->seq_parameter_set_id]`, reuse if valid; if invalid AND SPS NAL not in current source_data, log warning + skip the EXT_SPS_*_RPS controls (graceful fallback — better to fail soft than re-OOPS). +- Cache cleared in `RequestDestroyContext`. + +This adds approximately +20 LOC to Step 4 (cache plumbing) on top of the original control-submission code. + +### Amendment §"Phase 7 predictions" C3 row + +Replace: +> C3 — PASS — HEVC I-frame at t=0, no inter-prediction, same sha as ampere-fourier iter1 H.264/VP8/MPEG-2 (`3214803d8be74416`) + +With: +> C3 — PASS — HEVC frame 0 HW-decode SHA matches HEVC SW-decode SHA from the same encoded clip (anchored per-clip, not cross-codec). Cross-codec sha convergence observed in iter1 was empirical-not-guaranteed: encoder's quantization choices differ across codecs and the SHA match was a happy coincidence, not a contract. + +### Updated risk register (added risk #6) + +| 6 | SPS NAL absent from `source_data` on non-IDR frames; without caching, h265_set_controls submits zero-filled RPS for every non-IDR → re-OOPS | high (would have fired in practice) | parse-and-cache per Amendment §Step 4 above; first IDR populates cache, subsequent non-IDR frames reuse it; graceful skip + warning if cache miss AND no SPS NAL available | diff --git a/phase5_review_iter2.md b/phase5_review_iter2.md new file mode 100644 index 0000000..5f7023c --- /dev/null +++ b/phase5_review_iter2.md @@ -0,0 +1,102 @@ +# Phase 5 — iter2 second-model review (Plan/sonnet) + +Reviewed 2026-05-16 evening by `Plan` subagent with `model: sonnet`. Full Phase 0-4 markdown pasted verbatim (no curation per `feedback_dev_process` §5). Same guardrails as ampere-fourier iter1 Phase 5 + the new `feedback_upstream_alignment_over_speed` explicitly flagged. + +Reviewer surfaced 4 findings. Per `feedback_review_empirical_over_theoretical`, I test-verified the most consequential before adopting/rebutting. + +## Finding #1 — `GstH265SPSEXT` data-source gap (REFUTED empirically) + +> Looking at the actual GStreamer reference implementation (`gstv4l2codech265dec.c:940`), `gst_v4l2_codec_h265_dec_fill_ext_sps_rps` uses two distinct data sources: (1) `sps->short_term_ref_pic_set[i]` — in `GstH265ShortTermRefPicSet`, populated by `gsth265parser.c`. (2) `sps_ext->short_term_ref_pic_set_ext[i]` — comes from `GstH265SPSEXT` via `gst_h265_decoder_get_sps_ext(decoder, sps)`. This function is part of `gst-plugins-bad`'s H.265 decoder base class, not from `gsth265parser.c`. […] An additional struct/file beyond the two planned vendor files is needed. + +**Verdict**: rebut with empirical evidence. + +Direct read of the fetched `gsth265parser.h` shows the vendored header defines TWO structs (not one): + +`struct _GstH265ShortTermRefPicSet` (~line 939): +```c +guint8 inter_ref_pic_set_prediction_flag; +guint8 delta_idx_minus1; +guint8 delta_rps_sign; +guint16 abs_delta_rps_minus1; +guint8 NumDeltaPocs; +guint8 NumNegativePics; +guint8 NumPositivePics; +guint8 UsedByCurrPicS0[16]; +guint8 UsedByCurrPicS1[16]; +gint32 DeltaPocS0[16]; +gint32 DeltaPocS1[16]; +guint8 NumDeltaPocsOfRefRpsIdx; +``` + +`struct _GstH265ShortTermRefPicSetExt` (~line 966; documented as `Since: 1.28` — landed alongside the V4L2 EXT_SPS_*_RPS work): +```c +guint8 use_delta_flag[16]; +guint8 used_by_curr_pic_flag[16]; +guint32 delta_poc_s0_minus1[16]; +guint32 delta_poc_s1_minus1[16]; +``` + +The reviewer correctly identified that GStreamer's `fill_ext_sps_rps` uses TWO data sources, but conflated `GstH265SPSEXT` (a separate struct that holds SPS-extension parameters like SCC) with `GstH265ShortTermRefPicSetExt` (the RPS-extended struct that has the four fields the reviewer flagged as missing). Both `GstH265ShortTermRefPicSet` and `GstH265ShortTermRefPicSetExt` are in the SAME vendored gsth265parser.h. So the planned vendoring scope (just gsth265parser.c + .h + gstbitreader.c + .h) IS sufficient. + +`gst_h265_decoder_get_sps_ext()` is likely a thin accessor that returns the parsed `GstH265ShortTermRefPicSetExt` array which `gsth265parser.c` writes during SPS parsing. If that accessor lives in GstH265Decoder base class instead of in gsth265parser.c, we just inline-equivalent it in our integration — direct field access on the `GstH265SPS` struct's `short_term_ref_pic_set` and `short_term_ref_pic_set_ext` arrays. + +Phase 4 §Step 4 doesn't need an additional vendor file. It DOES need to call out the two-struct usage explicitly so Phase 6's implementation maps both `GstH265ShortTermRefPicSet` fields AND `GstH265ShortTermRefPicSetExt` fields onto the single `struct v4l2_ctrl_hevc_ext_sps_st_rps` entry. Amending Step 4 accordingly. + +## Finding #2 — per-fd storage for `has_hevc_ext_sps_rps` (ADOPT) + +> The plan should make explicit that `has_hevc_ext_sps_rps` is probed per-device-fd and stored per-device (parallel to how `video_fd_rkvdec` / `video_fd_hantro` are stored separately), not as a single scalar on `driver_data`. Otherwise a future codec routing change could silently misbehave. Adding an explicit `if (kind == 'r' && has_hevc_ext_sps_rps)` guard in `h265_set_controls` also provides a cheap, human-readable explanation of intent — worth the two lines regardless. + +**Verdict**: adopt. + +The iter38 multi-device-probe pattern (memory `feedback_multi_device_probe_design`) stores rkvdec and hantro fds separately on `struct request_data`. Mirroring that, the probe result should be `bool has_hevc_ext_sps_rps_rkvdec; bool has_hevc_ext_sps_rps_hantro;` — one per fd. h265 path only consults the rkvdec one (HEVC routes through rkvdec); hantro one stays false naturally (hantro doesn't have rkvdec-specific controls). + +Plus the explicit `if (kind == 'r' && has_hevc_ext_sps_rps_rkvdec)` guard for human-readable intent regardless of probe correctness. + +Phase 4 amendment: §Step 3 storage changes from single bool to per-driver-kind pair on `struct request_data`; §Step 4 control-submission guard becomes `kind == 'r' && has_hevc_ext_sps_rps_rkvdec`. + +## Finding #3 — SPS NAL caching strategy (ADOPT, critical) + +> This is the most significant open issue the plan glosses over. […] For HEVC decode, ffmpeg-vaapi submits the full "extradata" (SPS + PPS NALs) as additional `VASliceData` buffers before the IDR frame slice. So the SPS NAL is present in `source_data` at frame 0, concatenated before the IDR slice. The NAL walk (look for `nal_unit_type == 33`) will find it — but only when ffmpeg sends it, which is not every frame. For non-IDR frames, `source_data` contains only the slice NALs and the walk will not find an SPS. The plan needs to address this explicitly: parse and cache the SPS at the first frame where it appears, and reuse the cached parser output for subsequent frames. + +**Verdict**: adopt. + +Phase 4 implicitly treated SPS parsing as per-frame, which would (a) waste work on every frame and (b) silently submit zero-filled RPS arrays for non-IDR frames, which would re-trigger the OOPS. + +Storage shape: cache the parsed `GstH265SPS` (which contains both `GstH265ShortTermRefPicSet[]` AND `GstH265ShortTermRefPicSetExt[]`) on `struct request_data` keyed by SPS id, OR on `struct object_surface` since each surface has access to driver_data anyway. Phase 4 §Step 4 amends to: + +1. At first `h265_set_controls` call OR at any frame where the SPS NAL is found in `source_data`: parse the SPS, store the parsed `GstH265SPS` on `driver_data->cached_hevc_sps[sps_id]`. +2. On subsequent frames: skip SPS parsing if a cached SPS exists for the seq_parameter_set_id referenced by the current PPS. Reuse the cached parser output to fill the EXT_SPS_*_RPS controls. +3. Cache invalidation: clear cached SPS on context destroy. + +This adds ~20 LOC to Phase 4's step-4 surface estimate. + +## Finding #4 — C3 prediction caveat (ADOPT, minor) + +> The shared-SHA claim requires a caveat. HEVC frame 0 decoded from a different encoder's output than the H.264/VP8/MPEG-2 frame 0 will not necessarily produce identical pixel values even from the same source content, because the HEVC encoder may have used a different quantization step or reference frame structure than the H.264 encoder did. […] Adjust the prediction to "frame 0 HW-decoded SHA matches SW-decoded SHA for the HEVC clip" rather than matching H.264's SHA. + +**Verdict**: adopt. + +This is a real precision issue in ampere-fourier iter1 Phase 3 too — there I observed sha `3214803d8be74416` was the same across H.264, VP8, MPEG-2 frame 0, but that was lucky empirical convergence (encoders happened to produce bit-identical I-frames at this quality level for THIS source), not guaranteed. For HEVC, the encoder is x265 fast CRF 28, different bitrate target than the other three (5.1 MB vs 13.6/10.5/59.3 MB) — quantization will differ. + +Phase 4 §"Phase 7 predictions" C3 row amends from: +> PASS — HEVC I-frame at t=0, no inter-prediction, same sha as ampere-fourier iter1 H.264/VP8/MPEG-2 (`3214803d8be74416`) + +to: +> PASS — HEVC frame 0 HW-decode SHA matches HEVC SW-decode SHA from the same encoded clip (anchored per-clip, not cross-codec). The cross-codec convergence observed in iter1 was empirical-not-guaranteed. + +## Phase 4 amendments summary + +Three concrete changes to phase4_plan_iter2.md (deferred to a brief diff commit after this review file lands): +1. §Step 3 — `has_hevc_ext_sps_rps` storage changes from single bool to per-driver-kind pair (`_rkvdec` / `_hantro`). +2. §Step 4 — explicit two-struct usage (`GstH265ShortTermRefPicSet` + `GstH265ShortTermRefPicSetExt`); SPS parse-and-cache strategy; cache keyed by sps_id on `driver_data`. +3. §"Phase 7 predictions" C3 — anchor SHA per-clip, not cross-codec. + +Risk register gets one addition (per reviewer): SPS NAL absent from non-IDR frames if SPS parsing isn't cached. Mitigation IS the caching strategy above; risk #6 in updated register. + +## Summary + +Reviewer surfaced 1 critical finding that empirical verification refuted (GstH265SPSEXT gap — was actually two structs both in the vendored header), and 3 finding that genuinely tighten the plan (per-fd storage, SPS caching, C3 anchor). All adopted findings result in concrete Phase 4 amendments before Phase 6 starts. + +Per `feedback_review_empirical_over_theoretical`: the rebut on Finding #1 was done by reading the actual vendored header content, not by source-reading the reviewer's argument. Empirical evidence won. + +The plan is sound with the three amendments. Phase 6 can proceed.