Plan subagent (sonnet) reviewed Phase 0-4 with raw artifacts. Four findings: 1. GstH265SPSEXT data-source gap — REBUT with empirical evidence. Reviewer confused GstH265SPSEXT (SPS extension params, separate struct) with GstH265ShortTermRefPicSetExt (RPS-extended struct, contains the 4 fields they flagged as missing). Both ShortTermRefPicSet AND ShortTermRefPicSetExt ARE in the vendored gsth265parser.h. Direct read of fetched header confirmed. 2. Per-fd storage for has_hevc_ext_sps_rps — ADOPT. Mirror iter38 pattern of storing rkvdec + hantro fds separately. Add explicit driver_kind=='r' gate for human-readable intent. 3. SPS NAL caching strategy — ADOPT, critical. SPS NALs only arrive at IDR frames; per-frame walk would submit zero-filled RPS for non-IDR frames and re-OOPS. Parse-and-cache at first IDR, reuse on subsequent frames. 4. C3 prediction caveat — ADOPT. Anchor SHA per-clip (HEVC HW vs HEVC SW) not cross-codec; iter1's shared SHA across codecs was lucky empirical convergence, not guaranteed. Three Phase 4 amendments applied as appendix to phase4_plan_iter2.md: - §Step 3 — per-driver-kind probe storage (pair instead of scalar) - §Step 4 — explicit two-struct mapping table; SPS parse-and-cache - §Phase 7 predictions C3 — anchor per-clip Risk register gains risk #6 (SPS absent on non-IDR frames). Per feedback_review_empirical_over_theoretical: the Finding #1 rebut was done by reading the actual vendored header file content, not by source-reading the reviewer's argument. Empirical evidence won, as the memory rule requires. Plan sound with amendments. Phase 6 can proceed. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
8.8 KiB
Phase 5 — iter2 second-model review (Plan/sonnet)
Reviewed 2026-05-16 evening by Plan subagent with model: sonnet. Full Phase 0-4 markdown pasted verbatim (no curation per feedback_dev_process §5). Same guardrails as ampere-fourier iter1 Phase 5 + the new feedback_upstream_alignment_over_speed explicitly flagged.
Reviewer surfaced 4 findings. Per feedback_review_empirical_over_theoretical, I test-verified the most consequential before adopting/rebutting.
Finding #1 — GstH265SPSEXT data-source gap (REFUTED empirically)
Looking at the actual GStreamer reference implementation (
gstv4l2codech265dec.c:940),gst_v4l2_codec_h265_dec_fill_ext_sps_rpsuses two distinct data sources: (1)sps->short_term_ref_pic_set[i]— inGstH265ShortTermRefPicSet, populated bygsth265parser.c. (2)sps_ext->short_term_ref_pic_set_ext[i]— comes fromGstH265SPSEXTviagst_h265_decoder_get_sps_ext(decoder, sps). This function is part ofgst-plugins-bad's H.265 decoder base class, not fromgsth265parser.c. […] An additional struct/file beyond the two planned vendor files is needed.
Verdict: rebut with empirical evidence.
Direct read of the fetched gsth265parser.h shows the vendored header defines TWO structs (not one):
struct _GstH265ShortTermRefPicSet (~line 939):
guint8 inter_ref_pic_set_prediction_flag;
guint8 delta_idx_minus1;
guint8 delta_rps_sign;
guint16 abs_delta_rps_minus1;
guint8 NumDeltaPocs;
guint8 NumNegativePics;
guint8 NumPositivePics;
guint8 UsedByCurrPicS0[16];
guint8 UsedByCurrPicS1[16];
gint32 DeltaPocS0[16];
gint32 DeltaPocS1[16];
guint8 NumDeltaPocsOfRefRpsIdx;
struct _GstH265ShortTermRefPicSetExt (~line 966; documented as Since: 1.28 — landed alongside the V4L2 EXT_SPS_*_RPS work):
guint8 use_delta_flag[16];
guint8 used_by_curr_pic_flag[16];
guint32 delta_poc_s0_minus1[16];
guint32 delta_poc_s1_minus1[16];
The reviewer correctly identified that GStreamer's fill_ext_sps_rps uses TWO data sources, but conflated GstH265SPSEXT (a separate struct that holds SPS-extension parameters like SCC) with GstH265ShortTermRefPicSetExt (the RPS-extended struct that has the four fields the reviewer flagged as missing). Both GstH265ShortTermRefPicSet and GstH265ShortTermRefPicSetExt are in the SAME vendored gsth265parser.h. So the planned vendoring scope (just gsth265parser.c + .h + gstbitreader.c + .h) IS sufficient.
gst_h265_decoder_get_sps_ext() is likely a thin accessor that returns the parsed GstH265ShortTermRefPicSetExt array which gsth265parser.c writes during SPS parsing. If that accessor lives in GstH265Decoder base class instead of in gsth265parser.c, we just inline-equivalent it in our integration — direct field access on the GstH265SPS struct's short_term_ref_pic_set and short_term_ref_pic_set_ext arrays.
Phase 4 §Step 4 doesn't need an additional vendor file. It DOES need to call out the two-struct usage explicitly so Phase 6's implementation maps both GstH265ShortTermRefPicSet fields AND GstH265ShortTermRefPicSetExt fields onto the single struct v4l2_ctrl_hevc_ext_sps_st_rps entry. Amending Step 4 accordingly.
Finding #2 — per-fd storage for has_hevc_ext_sps_rps (ADOPT)
The plan should make explicit that
has_hevc_ext_sps_rpsis probed per-device-fd and stored per-device (parallel to howvideo_fd_rkvdec/video_fd_hantroare stored separately), not as a single scalar ondriver_data. Otherwise a future codec routing change could silently misbehave. Adding an explicitif (kind == 'r' && has_hevc_ext_sps_rps)guard inh265_set_controlsalso provides a cheap, human-readable explanation of intent — worth the two lines regardless.
Verdict: adopt.
The iter38 multi-device-probe pattern (memory feedback_multi_device_probe_design) stores rkvdec and hantro fds separately on struct request_data. Mirroring that, the probe result should be bool has_hevc_ext_sps_rps_rkvdec; bool has_hevc_ext_sps_rps_hantro; — one per fd. h265 path only consults the rkvdec one (HEVC routes through rkvdec); hantro one stays false naturally (hantro doesn't have rkvdec-specific controls).
Plus the explicit if (kind == 'r' && has_hevc_ext_sps_rps_rkvdec) guard for human-readable intent regardless of probe correctness.
Phase 4 amendment: §Step 3 storage changes from single bool to per-driver-kind pair on struct request_data; §Step 4 control-submission guard becomes kind == 'r' && has_hevc_ext_sps_rps_rkvdec.
Finding #3 — SPS NAL caching strategy (ADOPT, critical)
This is the most significant open issue the plan glosses over. […] For HEVC decode, ffmpeg-vaapi submits the full "extradata" (SPS + PPS NALs) as additional
VASliceDatabuffers before the IDR frame slice. So the SPS NAL is present insource_dataat frame 0, concatenated before the IDR slice. The NAL walk (look fornal_unit_type == 33) will find it — but only when ffmpeg sends it, which is not every frame. For non-IDR frames,source_datacontains only the slice NALs and the walk will not find an SPS. The plan needs to address this explicitly: parse and cache the SPS at the first frame where it appears, and reuse the cached parser output for subsequent frames.
Verdict: adopt.
Phase 4 implicitly treated SPS parsing as per-frame, which would (a) waste work on every frame and (b) silently submit zero-filled RPS arrays for non-IDR frames, which would re-trigger the OOPS.
Storage shape: cache the parsed GstH265SPS (which contains both GstH265ShortTermRefPicSet[] AND GstH265ShortTermRefPicSetExt[]) on struct request_data keyed by SPS id, OR on struct object_surface since each surface has access to driver_data anyway. Phase 4 §Step 4 amends to:
- At first
h265_set_controlscall OR at any frame where the SPS NAL is found insource_data: parse the SPS, store the parsedGstH265SPSondriver_data->cached_hevc_sps[sps_id]. - On subsequent frames: skip SPS parsing if a cached SPS exists for the seq_parameter_set_id referenced by the current PPS. Reuse the cached parser output to fill the EXT_SPS_*_RPS controls.
- Cache invalidation: clear cached SPS on context destroy.
This adds ~20 LOC to Phase 4's step-4 surface estimate.
Finding #4 — C3 prediction caveat (ADOPT, minor)
The shared-SHA claim requires a caveat. HEVC frame 0 decoded from a different encoder's output than the H.264/VP8/MPEG-2 frame 0 will not necessarily produce identical pixel values even from the same source content, because the HEVC encoder may have used a different quantization step or reference frame structure than the H.264 encoder did. […] Adjust the prediction to "frame 0 HW-decoded SHA matches SW-decoded SHA for the HEVC clip" rather than matching H.264's SHA.
Verdict: adopt.
This is a real precision issue in ampere-fourier iter1 Phase 3 too — there I observed sha 3214803d8be74416 was the same across H.264, VP8, MPEG-2 frame 0, but that was lucky empirical convergence (encoders happened to produce bit-identical I-frames at this quality level for THIS source), not guaranteed. For HEVC, the encoder is x265 fast CRF 28, different bitrate target than the other three (5.1 MB vs 13.6/10.5/59.3 MB) — quantization will differ.
Phase 4 §"Phase 7 predictions" C3 row amends from:
PASS — HEVC I-frame at t=0, no inter-prediction, same sha as ampere-fourier iter1 H.264/VP8/MPEG-2 (
3214803d8be74416)
to:
PASS — HEVC frame 0 HW-decode SHA matches HEVC SW-decode SHA from the same encoded clip (anchored per-clip, not cross-codec). The cross-codec convergence observed in iter1 was empirical-not-guaranteed.
Phase 4 amendments summary
Three concrete changes to phase4_plan_iter2.md (deferred to a brief diff commit after this review file lands):
- §Step 3 —
has_hevc_ext_sps_rpsstorage changes from single bool to per-driver-kind pair (_rkvdec/_hantro). - §Step 4 — explicit two-struct usage (
GstH265ShortTermRefPicSet+GstH265ShortTermRefPicSetExt); SPS parse-and-cache strategy; cache keyed by sps_id ondriver_data. - §"Phase 7 predictions" C3 — anchor SHA per-clip, not cross-codec.
Risk register gets one addition (per reviewer): SPS NAL absent from non-IDR frames if SPS parsing isn't cached. Mitigation IS the caching strategy above; risk #6 in updated register.
Summary
Reviewer surfaced 1 critical finding that empirical verification refuted (GstH265SPSEXT gap — was actually two structs both in the vendored header), and 3 finding that genuinely tighten the plan (per-fd storage, SPS caching, C3 anchor). All adopted findings result in concrete Phase 4 amendments before Phase 6 starts.
Per feedback_review_empirical_over_theoretical: the rebut on Finding #1 was done by reading the actual vendored header content, not by source-reading the reviewer's argument. Empirical evidence won.
The plan is sound with the three amendments. Phase 6 can proceed.