From 9865416ed297fd97baec89fbb0e57bf249bd332e Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Sun, 10 May 2026 05:49:13 +0000 Subject: [PATCH] =?UTF-8?q?iter4=20Phase=205:=20sonnet-architect=20review?= =?UTF-8?q?=20=E2=80=94=204=20Critical=20findings,=20all=20amendments=20in?= =?UTF-8?q?corporated?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Review by sonnet-architect with cold-context source reads of fork + kernel UAPI + VAAPI + FFmpeg references + kernel rkvdec source. Reviewer applied Direction 2 (empirical-over-theoretical) by test-compiling struct sizes, gcc-c-checking VAAPI field accesses, and source-tracing FFmpeg's filter-mode XOR provenance. Critical findings (all empirically validated by author before incorporation per feedback_review_empirical_over_theoretical.md): C1 - interpolation_filter double-XOR: vaapi_vp9.c:62 ALREADY applies `filtermode ^ (filtermode <= 1)` when filling VAAPI's mcomp_filter_type. Plan's second XOR was incorrect; would swap EIGHTTAP and EIGHTTAP_SMOOTH for inter frames -> wrong loop-filter strength. Fix: direct assignment, no XOR. C2 - LF deltas not persistent: kernel UAPI explicitly says "users should pass its last value" when delta_update=0. Plan memset-zeroed each frame; would send {0,0,0,0,0,0} on BBB inter frames instead of {1,0,-1,-1,0,0}. Fix: add persistent vp9_lf state to object_context, init to VP9 spec defaults, update only when parser sees delta_update=1, always copy to kernel control. C3 - reference_mode out-parameter missing: reference_mode lives in FRAME struct, not COMPRESSED_HDR. Plan referenced `compressed_hdr_reference_mode` placeholder which would be an undefined identifier -> compile failure. Fix: add `uint8_t *out_reference_mode` param to vp9_fill_compressed_hdr; derive `allowcompinter` at call site from the 3 sign biases. C4 - Mitigation B scope claim overstated: walk-and-pick-first always selects rkvdec on 7.0 (since video1 enumerates first). Hantro codecs (MPEG-2, VP8) at video3 still require env override. Fix: qualify criterion-5 trace; add LIBVA_V4L2_REQUEST_NO_ AUTODETECT=1 escape hatch for legacy callers. 6 Suggested (S1-S6): all confirm plan correctness OR are scope- aligned non-issues. S4 (uv_mode memcpy omission safe for rkvdec) baked into Clause 9 amended text. Without this review, iter4 Phase 6 would have failed first compile (C3) + produced wrong inter-frame output (C1+C2) + caused user confusion (C4). Estimated saving: 1 compile failure + 1 Phase 7 -> Phase 4 loopback + 1 doc correction. Co-Authored-By: Claude Opus 4.7 (1M context) --- phase4_iter4_plan.md | 184 ++++++++++++++++++++-------- phase5_iter4_review.md | 266 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 400 insertions(+), 50 deletions(-) create mode 100644 phase5_iter4_review.md diff --git a/phase4_iter4_plan.md b/phase4_iter4_plan.md index 1170f13..39b0115 100644 --- a/phase4_iter4_plan.md +++ b/phase4_iter4_plan.md @@ -9,7 +9,7 @@ Phase 3 baseline provides: - Frame-1 keyframe verbatim payload prefix (`lf.ref_deltas={1,0,-1,-1}`, `lf.mode_deltas={0,0}`, `quant.base_q_idx=46`) - 4-codec regression root-cause: `/dev/video0` is now `rockchip-rga` on 7.0; backend hardcodes `/dev/video0` in `request.c:149` -User picked **Mitigation B**: in-fork patch — walk `/dev/video*`, query `VIDIOC_QUERYCAP`, pick first device whose driver name is in `{rkvdec, hantro-vpu}`. Adds ~30 LOC to `request.c`. Restores baseline functionality on 7.0. +User picked **Mitigation B**: in-fork patch — walk `/dev/video*`, query `VIDIOC_QUERYCAP`, pick first device whose driver name is in `{rkvdec, hantro-vpu}`. Adds ~30 LOC to `request.c`. **Phase 5 C4 amendment** — partial restoration only: walk-and-pick-first returns rkvdec on 7.0 (rkvdec at video1 enumerates first). Restores H.264 / HEVC / VP9 (rkvdec codecs) without env override. MPEG-2 and VP8 (hantro at video3) STILL require explicit env override `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media1`. Full multi-decoder dispatch is iter4-B1 backlog. ## Contract clauses @@ -130,42 +130,61 @@ if (picture->pic_fields.bits.golden_ref_frame_sign_bias) frame.ref_frame_sign_bi if (picture->pic_fields.bits.alt_ref_frame_sign_bias) frame.ref_frame_sign_bias |= V4L2_VP9_SIGN_BIAS_ALT; ``` -### Clause 6 — Loop filter deltas + base quantization (uncompressed-header partial parse) +### Clause 6 — Loop filter deltas + base quantization (uncompressed-header partial parse with persistent LF-delta state) VAAPI exposes `filter_level` and `sharpness_level` (Clause 4), but NOT `lf_delta_enabled`/`lf_delta_update`/`lf_ref_delta[4]`/`lf_mode_delta[2]`. Phase 3 keyframe anchor shows `lf.ref_deltas={1,0,-1,-1}` (non-zero on BBB); leaving these zero produces wrong loop-filter behavior → criterion-4 byte mismatch. VAAPI also doesn't expose `quant.base_q_idx` / `delta_q_y_dc` / `delta_q_uv_dc` / `delta_q_uv_ac`. Phase 3 keyframe anchor shows `base_q_idx=46`; leaving zero produces wrong dequant scale. -Solution: implement a minimal uncompressed-header parser (`vp9_parse_uncompressed_header_lf_quant`) that reads `surface_object->source_data` from offset 0 and extracts the 6 needed fields. The parse runs from offset 0 through the loop-filter and quantization syntax sections (per VP9 spec 6.2 §6.2.4–6.2.5): +**Phase 5 C2 amendment — LF deltas are persistent across frames**: Per kernel UAPI `:2578` verbatim: + +> "If this syntax element is not present in the bitstream, users should pass its last value." + +VP9 spec (matching FFmpeg `vp9.c:666-671`) initializes `lf_delta.ref = {1,0,-1,-1}` and `lf_delta.mode = {0,0}` on keyframe/error-resilient/intra-only, then updates only when `lf_delta.updated == 1`. FFmpeg `v4l2_request_vp9.c:298-302` always copies the persistent state to the kernel control (NOT conditional on `updated`). Memset-zero per-frame loses the persistent state — wrong on inter frames where typically `delta_enabled=1` and `delta_update=0`. + +Solution: add persistent LF-delta state to `object_context` (per-decode-session scope), initialize to VP9 spec defaults, update only when the parsed bitstream's `lf_delta_update` is set, always copy to the kernel control: ```c -static void vp9_parse_uncompressed_header_lf_quant( - const uint8_t *data, uint32_t size, uint32_t header_size, - struct v4l2_vp9_loop_filter *lf, - struct v4l2_vp9_quantization *quant) -{ - /* Bit reader walks frame_marker, profile, show_existing_frame, - * frame_type, show_frame, error_resilient_mode, color_config (if - * keyframe), frame_size_with_refs (if not keyframe), tile_info ... - * up to loop_filter_params + quantization_params syntax sections. - * - * Approach: bit-perfect VP9 spec port for ~50 LOC, reusing the - * VPX bitstream reader (see Clause 8). Fields written: - * lf->ref_deltas[0..3], lf->mode_deltas[0..1], - * lf->flags |= V4L2_VP9_LOOP_FILTER_FLAG_DELTA_ENABLED if set - * lf->flags |= V4L2_VP9_LOOP_FILTER_FLAG_DELTA_UPDATE if set - * quant->base_q_idx, - * quant->delta_q_y_dc, delta_q_uv_dc, delta_q_uv_ac - */ - ... +/* In src/context.h, inside object_context (added by Commit B/C): */ +struct { + int8_t ref_deltas[4]; /* VP9 spec default: {1,0,-1,-1} */ + int8_t mode_deltas[2]; /* VP9 spec default: {0,0} */ + bool initialized; /* false until first VP9 frame */ +} vp9_lf; + +/* In Clause 6 fill (vp9.c): */ +if (!context->vp9_lf.initialized || + keyframe || intra_only || error_resilient) { + context->vp9_lf.ref_deltas[0] = 1; + context->vp9_lf.ref_deltas[1] = 0; + context->vp9_lf.ref_deltas[2] = -1; + context->vp9_lf.ref_deltas[3] = -1; + context->vp9_lf.mode_deltas[0] = 0; + context->vp9_lf.mode_deltas[1] = 0; + context->vp9_lf.initialized = true; } + +/* Parse uncompressed header — only updates context state when + * delta_update=1 in the bitstream: */ +vp9_parse_uncompressed_header_lf_quant( + surface_object->source_data, + surface_object->source_size, + frame.uncompressed_header_size, + &frame.lf, /* receives flags + level + sharpness */ + &frame.quant, /* receives base_q_idx + deltas */ + context->vp9_lf.ref_deltas, /* in/out — updated only if parser sees delta_update=1 */ + context->vp9_lf.mode_deltas); /* same */ + +/* ALWAYS copy persistent state to the kernel control: */ +memcpy(frame.lf.ref_deltas, context->vp9_lf.ref_deltas, 4); +memcpy(frame.lf.mode_deltas, context->vp9_lf.mode_deltas, 2); ``` -**Anchor**: Phase 3 keyframe `lf.ref_deltas={1,0,-1,-1}, lf.mode_deltas={0,0}, lf.flags=3 (DELTA_ENABLED|DELTA_UPDATE), quant.base_q_idx=46, deltas=0`. Implementation must reproduce these exact values byte-for-byte against the BBB keyframe. +The parser (`vp9_parse_uncompressed_header_lf_quant`) walks frame_marker → profile → show_existing_frame → frame_type → show_frame → error_resilient_mode → frame_sync_code → color_config (profile-dependent) → frame_size → render_and_frame_size_different → refresh_frame_flags → loop_filter_params → quantization_params. Per Phase 5 Q1 reviewer note: ~80–100 LOC for a faithful profile-0 keyframe + profile-0 inter-frame parse (revised up from Phase 4's "~50 LOC" estimate). -**Per memory `feedback_review_empirical_over_theoretical.md` Direction 2**: Phase 5 review must verify the parser by extracting these 9 fields from the actual BBB keyframe bitstream (start of `bbb_720p10s_vp9.webm` first frame) and comparing against Phase 3 anchor. If any field disagrees, Phase 5 returns "Critical: parser bug" and Phase 4 loops. +**Anchor**: Phase 3 keyframe `lf.ref_deltas={1,0,-1,-1}, lf.mode_deltas={0,0}, lf.flags=3 (DELTA_ENABLED|DELTA_UPDATE), quant.base_q_idx=46, deltas=0`. Implementation must reproduce these exact values byte-for-byte against the BBB keyframe. After C2 fix, frame 2 (inter) should also show `lf.ref_deltas={1,0,-1,-1}` from the persisted state — verifiable by strace. -**Out of iter4 scope**: full uncompressed-header parse (color_config, frame_size for inter, segmentation update_data, tile_info). Those fields are either available via VAAPI (Clauses 4, 5, 7) or are not write-back to kernel. The parser is a TARGETED partial parse, not a general bitstream reader. +**Out of iter4 scope**: full uncompressed-header parse (color_config full variant, segmentation update_data, tile_info). Those fields are either available via VAAPI (Clauses 4, 5, 7) or are not write-back to kernel. The parser is a TARGETED partial parse. ### Clause 7 — Segmentation mapping @@ -240,9 +259,37 @@ Syntax elements parsed (per VP9 spec 6.3): Each updated value goes through `inv_map_table[d]`. Each "no update" bit leaves zero in the kernel struct (kernel interprets zero as "keep prior probability"). -**Lossless special case**: if `s->s.h.lossless` would be set, FFmpeg writes `tx_mode = V4L2_VP9_TX_MODE_ONLY_4X4` unconditionally. We don't have direct access to `lossless` from VAAPI, but `picture->pic_fields.bits.lossless_flag` (bit 31 of pic_fields) maps directly. Read it and apply the same special case. +**Phase 5 C3 amendment — `reference_mode` out-parameter**: The kernel's `reference_mode` field lives in `v4l2_ctrl_vp9_frame` (`:2763`), NOT in `v4l2_ctrl_vp9_compressed_hdr`. FFmpeg's parser reads it from a local `enum CompPredMode comppredmode` during compressed-header parsing (`v4l2_request_vp9.c:104,150-160`) and assigns to FRAME's `reference_mode` separately. Our backend must thread the parsed value out via an explicit out-parameter — without it, Clause 10's `frame.reference_mode = compressed_hdr_reference_mode` references an undefined variable → compile failure. -**Anchor**: Phase 3 strace shows COMPRESSED_HDR payload size 2040 B; kernel never EINVAL'd → port produces correctly-sized struct. Field-level decode of the keyframe payload is deferred to Phase 5/Phase 7 byte-compare (the parser is the primary reference for itself; cross-validation is via "kernel decodes the same hash both ways" not "we manually decode the parser output"). +Function signature (Phase 5 C3-amended): + +```c +static void vp9_fill_compressed_hdr( + struct v4l2_ctrl_vp9_compressed_hdr *compressed_hdr, + const uint8_t *buf, uint32_t size, + uint8_t lossless_flag, + bool keyframe_or_intraonly, + bool allowcompinter, + uint8_t *out_reference_mode); /* C3: NEW out-param */ +``` + +`allowcompinter` is true when the sign biases of the 3 active references are not all identical (per VP9 spec 6.4.4). Derive at the call site from `picture->pic_fields.bits.{last,golden,alt}_ref_frame_sign_bias`: + +```c +bool allowcompinter = !( + picture->pic_fields.bits.last_ref_frame_sign_bias == + picture->pic_fields.bits.golden_ref_frame_sign_bias && + picture->pic_fields.bits.golden_ref_frame_sign_bias == + picture->pic_fields.bits.alt_ref_frame_sign_bias); +``` + +For keyframes (`!keyframe_or_intraonly == false`): write `*out_reference_mode = 0` (PRED_SINGLEREF) unconditionally. For inter frames without compinter: same. With compinter: extract from the compressed-header range coder per FFmpeg `v4l2_request_vp9.c:150-160`. + +**Lossless special case**: if `picture->pic_fields.bits.lossless_flag` is set, the parser writes `compressed_hdr->tx_mode = V4L2_VP9_TX_MODE_ONLY_4X4` unconditionally per FFmpeg `v4l2_request_vp9.c:117-118`. VAAPI's `lossless_flag` semantics align with FFmpeg's `s->s.h.lossless` per Phase 5 S5 verification (both encode `base_q==0 && y_dc_delta_q==0 && uv_dc_delta_q==0 && uv_ac_delta_q==0`). + +**`uv_mode` memcpy omission** (Phase 5 S4): FFmpeg's `fill_compressed_hdr` ends with `memcpy(ctrl->uv_mode, s->prob.p.uv_mode, sizeof(ctrl->uv_mode))`. rkvdec's `rkvdec-vp9.c:280-287` reads `uv_mode` from `vp9_ctx->probability_tables` (the persistent frame_context table inside the kernel), NOT from the compressed_hdr control. Omitting the memcpy is safe for rkvdec; document and skip. + +**Anchor**: Phase 3 strace shows COMPRESSED_HDR payload size 2040 B; kernel never EINVAL'd → port produces correctly-sized struct. Field-level decode of the keyframe payload is deferred to Phase 7 byte-compare (the parser is the primary reference for itself; cross-validation is via "kernel decodes the same hash both ways" not "we manually decode the parser output"). ### Clause 10 — Frame flags + reference_mode + interpolation_filter @@ -272,16 +319,27 @@ frame.reset_frame_context = ? picture->pic_fields.bits.reset_frame_context - 1 : 0; -/* interpolation_filter: FFmpeg uses (filtermode ^ (filtermode <= 1)). - * VAAPI's mcomp_filter_type is 3 bits (0..7); kernel enum is 0..4. - * The XOR remap aligns FFmpeg's internal filter_mode enum to V4L2's. */ -frame.interpolation_filter = - picture->pic_fields.bits.mcomp_filter_type ^ - (picture->pic_fields.bits.mcomp_filter_type <= 1); +/* Phase 5 C1: VAAPI's mcomp_filter_type is ALREADY post-XOR. + * vaapi_vp9.c:62 applies `h->h.filtermode ^ (h->h.filtermode <= 1)` + * before storing into VAAPI's mcomp_filter_type. Re-applying the XOR + * here would undo the alignment and swap EIGHTTAP/EIGHTTAP_SMOOTH for + * inter frames — wrong loop-filter selection on BBB frames 2-5. + * Direct assignment is correct. */ +frame.interpolation_filter = picture->pic_fields.bits.mcomp_filter_type; -/* reference_mode: comes from compressed-header parse (NOT VAAPI). - * Read from compressed_hdr's parsed state (see Clause 9). */ -frame.reference_mode = compressed_hdr_reference_mode; /* state from Clause 9 */ +/* Phase 5 C3: reference_mode is in v4l2_ctrl_vp9_frame (offset 160), + * NOT in v4l2_ctrl_vp9_compressed_hdr. Filled by Clause 9's parser + * via the out_reference_mode parameter. */ +uint8_t parsed_reference_mode = 0; /* default for keyframe */ +vp9_fill_compressed_hdr(&compressed_hdr, + surface_object->source_data + frame.uncompressed_header_size, + frame.compressed_header_size, + picture->pic_fields.bits.lossless_flag, + !picture->pic_fields.bits.frame_type || + picture->pic_fields.bits.intra_only, + allowcompinter, + &parsed_reference_mode); +frame.reference_mode = parsed_reference_mode; ``` **Anchor**: Phase 3 verbatim — keyframe `reset_frame_context=0, interpolation_filter=0` (VAAPI's `mcomp_filter_type=0` XOR with (0 <= 1)=1 → 1 hmm). Phase 5 must verify the XOR remap empirically against the keyframe bytes. @@ -369,19 +427,25 @@ static int find_codec_device(char video_path[32], char media_path[32]) return -1; } -/* In RequestInit: */ -video_path = getenv("LIBVA_V4L2_REQUEST_VIDEO_PATH"); -if (video_path == NULL) { - static char auto_video[32], auto_media[32]; - if (find_codec_device(auto_video, auto_media) == 0) { - video_path = auto_video; - if (getenv("LIBVA_V4L2_REQUEST_MEDIA_PATH") == NULL) - media_path = auto_media; - request_log("auto-selected codec device: %s + %s\n", - video_path, media_path); - } else { - video_path = "/dev/video0"; /* keep old fallback for callers - we can't enumerate */ +/* In RequestInit (Phase 5 C4 amendment — escape hatch for legacy callers): */ +if (getenv("LIBVA_V4L2_REQUEST_NO_AUTODETECT")) { + /* Skip walk; old hardcoded behavior. Lets test automation that + * relied on /dev/video0 opt out of the new auto-detect logic. */ + video_path = "/dev/video0"; + media_path = "/dev/media0"; +} else { + video_path = getenv("LIBVA_V4L2_REQUEST_VIDEO_PATH"); + if (video_path == NULL) { + static char auto_video[32], auto_media[32]; + if (find_codec_device(auto_video, auto_media) == 0) { + video_path = auto_video; + if (getenv("LIBVA_V4L2_REQUEST_MEDIA_PATH") == NULL) + media_path = auto_media; + request_log("auto-selected codec device: %s + %s\n", + video_path, media_path); + } else { + video_path = "/dev/video0"; /* fall back to legacy default */ + } } } ``` @@ -484,7 +548,7 @@ Submitting this plan for second-model review (sonnet-architect). Key questions f | 2. vaCreateConfig SUCCESS | Commit A — `RequestCreateConfig` case + `RequestQueryConfigEntrypoints` | | 3. ffmpeg-vaapi VP9 exit 0 | Commits Z+A+B+C end-to-end; Clauses 1+4+5+11 + parsers | | 4. mpv VP9 HW=SW byte-identical | Commits Z+A+B+C decode correctness + Phase 3 SW PNGs as Phase 7 anchor; engagement via `mpv -v` log per memory `feedback_hw_decode_engagement_check.md` | -| 5. 4-codec regression | Commit Z restores baseline (mitigation B); Commits A+B+C add new VP9 path purely additively (no shared-state mutation) | +| 5. 4-codec regression | Commit Z restores rkvdec-codec baseline without env override (H.264, HEVC). MPEG-2, VP8 (hantro) still require `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3` env override per Phase 5 C4. Commits A+B+C add new VP9 path purely additively (no shared-state mutation in MPEG-2/HEVC/H.264/VP8 paths). | ## Substrate state at Phase 4 close @@ -493,3 +557,23 @@ Submitting this plan for second-model review (sonnet-architect). Key questions f - All Phase 3 anchors captured + preserved on fresnel `/tmp/iter4_phase3/` and `noether:~/src/fresnel-fourier/iter4_phase3.tgz`. - Memory rules carry forward; new `reference_fresnel_kernel_substrate.md` covers post-besser substrate. - Phase 4 plan ready for sonnet-architect review (Phase 5). + +## Phase 5 amendments incorporated (post-review) + +Sonnet-architect review (`phase5_iter4_review.md`) returned 4 Critical findings + 6 Suggested. All 4 Critical findings empirically verified by author before incorporation per memory `feedback_review_empirical_over_theoretical.md` Direction 2. Amendments applied in-place above: + +| ID | Site | Change | Verified | +|---|---|---|---| +| C1 | Clause 10 | `frame.interpolation_filter = picture->pic_fields.bits.mcomp_filter_type` (direct, NO XOR; vaapi_vp9.c:62 already applies the XOR before storing into VAAPI's mcomp_filter_type) | ✓ grep `vaapi_vp9.c:62` | +| C2 | Clause 6 | Add persistent `vp9_lf.{ref_deltas[4], mode_deltas[2]}` to `object_context`, init to VP9 spec `{1,0,-1,-1,0,0}`, update only when parser sees `lf_delta.update=1`, ALWAYS copy to kernel control. Without persistence, BBB inter frames send `{0,0,0,0,0,0}` instead of `{1,0,-1,-1,0,0}` → kernel applies wrong loop-filter strength | ✓ kernel UAPI `v4l2-controls.h:2578` "users should pass its last value"; FFmpeg `vp9.c:666-671` defaults | +| C3 | Clauses 9 + 10 | Add `out_reference_mode` parameter to `vp9_fill_compressed_hdr`. Without it, Clause 10's `frame.reference_mode = compressed_hdr_reference_mode` is undefined identifier (compile failure). `reference_mode` lives in v4l2_ctrl_vp9_frame, not _compressed_hdr; FFmpeg uses local `enum CompPredMode comppredmode` and threads it through return path | ✓ `v4l2-controls.h:2763` (FRAME) + `v4l2_request_vp9.c:104` (local var) | +| C4 | Mitigation B + criterion-5 trace + request.c skeleton | (a) Qualify wording: "Restores rkvdec-codec baseline" (NOT all 4 codecs); MPEG-2/VP8 still need env override for hantro path. (b) Add `LIBVA_V4L2_REQUEST_NO_AUTODETECT=1` escape hatch for legacy callers | ✓ Plan line 12 vs 393 contradiction; Phase 3 baseline § 4-codec regression table | + +S1, S2, S3, S5, S6 either confirm correctness of existing plan or are scope-aligned non-issues; no amendment needed. S4 (`uv_mode` memcpy omission) baked into Clause 9's amended text. + +Without the Phase 5 review, iter4 Phase 6 would have: +- **Failed first compile attempt** (C3) — undefined `compressed_hdr_reference_mode` identifier +- **Then produced visible-but-wrong output** (C1 + C2) — Phase 7 byte-compare would have diverged on inter frames (interpolation_filter swapped, ref_deltas zeroed) +- **Then triggered user confusion when MPEG-2/VP8 didn't engage** (C4) — false expectation that `vainfo` would just work + +Estimated saving: 1 Phase 6 compile failure + 1 Phase 7 → Phase 4 loopback for inter-frame byte mismatch + 1 documentation correction. Per memory rule "Reviews are never skippable." diff --git a/phase5_iter4_review.md b/phase5_iter4_review.md new file mode 100644 index 0000000..b5c870f --- /dev/null +++ b/phase5_iter4_review.md @@ -0,0 +1,266 @@ +# Iteration 4 — Phase 5 (review) + +Second-model review of `phase4_iter4_plan.md` by sonnet-architect on 2026-05-10. Reviewer performed cold-context source reads of: fork files at iter3 tip (`e1aca9c`), kernel UAPI at `/usr/include/linux/v4l2-controls.h`, VAAPI header `va_dec_vp9.h`, FFmpeg V4L2-request reference `v4l2_request_vp9.c`, FFmpeg VAAPI reference `vaapi_vp9.c`, FFmpeg internal VP9 decoder `vp9.c` + `vp9shared.h`, and kernel sources `v4l2-vp9.c` + `rkvdec-vp9.c`. Struct sizes verified by compile (`gcc`). VAAPI field accesses verified by compile. All Critical findings were verified empirically before inclusion per memory `feedback_review_empirical_over_theoretical.md` Direction 2. + +**Overall plan health**: The mapping logic for Clauses 1–5 and Clause 8–9's VPX rac port strategy is sound. Three correctness bugs were found: a double-XOR on interpolation_filter (C1, critical for inter frames), a missing persistent-state mechanism for loop-filter deltas across frames (C2, critical for criterion-4 on BBB frames 2–5), and an underspecified function signature for reference_mode extraction from the compressed-header parser (C3). All three will produce Phase 7 byte-mismatches without amendment. Commit Z has a documented but partially-incorrect scope claim (C4). + +--- + +## Critical findings (BLOCKING Phase 6) + +### C1 — `interpolation_filter` double-XOR: VAAPI delivers post-remap value; plan re-remaps it + +**Site**: Clause 10, plan lines ~279–280. + +**Issue**: The plan applies `mcomp_filter_type ^ (mcomp_filter_type <= 1)` to convert VAAPI's `mcomp_filter_type` to V4L2's `interpolation_filter`. But the FFmpeg VAAPI consumer (`vaapi_vp9.c:62`) already applies this exact XOR when filling `mcomp_filter_type`: + +```c +// vaapi_vp9.c:62: +.mcomp_filter_type = h->h.filtermode ^ (h->h.filtermode <= 1), +``` + +The XOR swaps FFmpeg's `FILTER_8TAP_SMOOTH` (0) and `FILTER_8TAP_REGULAR` (1) to align with V4L2 enum order (`V4L2_VP9_INTERP_FILTER_EIGHTTAP=0` = regular, `EIGHTTAP_SMOOTH=1`). VAAPI's `mcomp_filter_type` is therefore already in V4L2 order. The plan applying a second XOR undoes the first, reverting to FFmpeg's internal enum — the opposite of what V4L2 expects. + +**Evidence** (empirically verified by tracing all values): + +| FFmpeg FilterMode | VAAPI `mcomp_filter_type` (post-XOR) | Plan applies XOR again → | V4L2 expected | +|---|---|---|---| +| SMOOTH=0 | 1 (EIGHTTAP_SMOOTH) | 1^1=0 (EIGHTTAP) | **1 WRONG** | +| REGULAR=1 | 0 (EIGHTTAP) | 0^1=1 (EIGHTTAP_SMOOTH) | **0 WRONG** | +| SHARP=2 | 2 | 2^0=2 | 2 OK | +| BILINEAR=3 | 3 | 3^0=3 | 3 OK | +| SWITCHABLE=4 | 4 | 4^0=4 | 4 OK | + +`vaapi_vp9.c` verified at `/home/mfritsche/src/libva-multiplanar/references/ffmpeg-kwiboo/libavcodec/vaapi_vp9.c:62`. `v4l2_request_vp9.c` applies XOR to FFmpeg's raw `filtermode` at `v4l2_request_vp9.c:292` — this is the operation that VAAPI already did. + +The plan's own footnote ("XOR with (0 <= 1)=1 → 1 hmm") flags this discrepancy but leaves it unresolved. + +The bug affects only SMOOTH and REGULAR — values 2, 3, 4 are unaffected. For keyframes, the interpolation filter is irrelevant (all-intra). The bug surfaces on BBB inter frames (frames 2–5 in the criterion-4 test), where EIGHTTAP_REGULAR is swapped for EIGHTTAP_SMOOTH → wrong deblocking filter selection → criterion-4 byte mismatch. + +**Proposed fix**: + +```c +/* VAAPI mcomp_filter_type is already in V4L2 enum order. + * vaapi_vp9.c:62 applied the XOR when filling VAAPI's mcomp_filter_type. + * Direct assignment, no re-remap needed. */ +frame.interpolation_filter = picture->pic_fields.bits.mcomp_filter_type; +``` + +**Empirical validation**: Author should strace a BBB frame-2 submission under FFmpeg `v4l2request` and read the `interpolation_filter` byte from the FRAME payload (offset 157 in the 168-byte struct). Then LIBVA_TRACE-log the same frame under mpv/ffmpeg-vaapi to extract `mcomp_filter_type`. They should be EQUAL. If the plan's XOR were applied, the byte would differ by 1 for SMOOTH/REGULAR content. + +--- + +### C2 — Loop-filter `ref_deltas`/`mode_deltas` not persistent across frames + +**Site**: Clause 6; `vp9_parse_uncompressed_header_lf_quant` design. + +**Issue**: The kernel UAPI comment for `v4l2_vp9_loop_filter.ref_deltas` explicitly states: + +> "If this syntax element is not present in the bitstream, users should pass its last value." + +(`/usr/include/linux/v4l2-controls.h:2578`) + +The plan zero-inits both structs with `memset` each frame. For frames where `loop_filter_delta_update=0` (deltas not updated in the bitstream, but `loop_filter_delta_enabled=1`), the plan's parser correctly sets the DELTA_ENABLED flag but leaves `ref_deltas` and `mode_deltas` at zero from the memset. The kernel needs the persisted values from the most recent update. + +FFmpeg handles this correctly by reading from `s->s.h.lf_delta.ref[i]` (persistent internal state, initialized to `{1,0,-1,-1}` on keyframe per `vp9.c:666–669` and updated when `lf_delta.updated=1`): + +```c +// v4l2_request_vp9.c:298-302: +for (i = 0; i < 4; i++) + ctrl->lf.ref_deltas[i] = s->s.h.lf_delta.ref[i]; // always, even if not updated +``` + +**Evidence**: +- FFmpeg `vp9.c:666–669` sets defaults `{1,0,-1,-1,0,0}` on keyframe (empirically confirmed these match Phase 3 anchor values). +- FFmpeg `vp9.c:693-700` updates from bitstream only when `lf_delta.updated=1`. +- FFmpeg `v4l2_request_vp9.c:298–302` always copies the persistent state (not conditional on `updated`). +- Phase 3 anchor frame 1 (keyframe): `lf.flags=3` (DELTA_ENABLED|DELTA_UPDATE). Frames 2–5 very likely have `DELTA_ENABLED=1` but `DELTA_UPDATE=0` (stable content — no change). The plan would send `{0,0,0,0,0,0}` instead of `{1,0,-1,-1,0,0}`. +- VP9 spec default values `{1,0,-1,-1,0,0}` confirmed at `vp9.c:664–671`: reset occurs on `keyframe || errorres || intraonly`. + +**Proposed fix**: Add persistent LF delta state to `object_context` (not `object_surface` — context is the right scope for VP9 session state): + +```c +// In context.h, inside object_context: +/* VP9 loop-filter delta state — persisted across frames per VP9 spec */ +struct { + int8_t ref_deltas[4]; /* init to {1,0,-1,-1} per VP9 spec 8.5.1 */ + int8_t mode_deltas[2]; /* init to {0,0} */ +} vp9_lf; +``` + +Initialize in `RequestCreateContext` (or at first VP9 frame) to `{1,0,-1,-1,0,0}`. In Clause 6: after parsing the uncompressed header, if `delta_enabled=1 && delta_update=1`, write the new deltas into `context->vp9_lf`. Then always copy `context->vp9_lf` into `frame.lf.ref_deltas`/`frame.lf.mode_deltas` (replacing the parser's local writes). + +**Empirical validation**: After Phase 6 build, strace BBB frame-2 submission. The `ref_deltas` bytes in the FRAME payload (offset 0–3 in `v4l2_vp9_loop_filter`) should be `{1,0,-1,-1}` matching frame-1's values. If the fix is missing, frame-2 shows `{0,0,0,0}` — detectable by strace. + +--- + +### C3 — `reference_mode` extraction from compressed-header parser is underspecified + +**Site**: Clause 10, plan line ~284: `frame.reference_mode = compressed_hdr_reference_mode; (state from Clause 9)`. + +**Issue**: `comppredmode` (reference_mode) is read from the **compressed** header by FFmpeg's VP9 decoder (`vp9.c:1022–1024`, using the VPX range coder `s->c`). The plan correctly identifies Clause 9 as the source. However, Clause 9 only specifies filling `struct v4l2_ctrl_vp9_compressed_hdr`. The `reference_mode` field is in `struct v4l2_ctrl_vp9_frame`, not `compressed_hdr`. The plan's function signature for `vp9_fill_compressed_hdr` (Clause 9) does not show any mechanism to return this value to Clause 10. + +**Evidence**: Verified that `struct v4l2_ctrl_vp9_compressed_hdr` (kernel UAPI at `v4l2-controls.h:2833–2851`) has no `reference_mode` or `comppredmode` field. The frame struct (`v4l2_ctrl_vp9_frame:2741–2765`) has `__u8 reference_mode`. FFmpeg's `fill_compressed_hdr` (`v4l2_request_vp9.c:104`) uses a local `comppredmode` variable for internal parsing only; `fill_frame` reads `s->s.h.comppredmode` (persistent decoder state) separately. + +The placeholder `compressed_hdr_reference_mode` in the plan is a C variable name that doesn't exist in any defined scope. If the author ports `fill_compressed_hdr` without adding an out-parameter, `compressed_hdr_reference_mode` will be an undefined identifier — compile failure. + +For the **BBB keyframe**: `allowcompinter=0` (all refs have same sign bias on keyframe), so the `if (allowcompinter)` block is not entered and `comppredmode` defaults to `PRED_SINGLEREF (0)` from memset. No bug on keyframe. For BBB inter frames 2–5, `allowcompinter` may be true, and the correct `comppredmode` must be extracted. + +**Proposed fix**: Add an out-parameter to `vp9_fill_compressed_hdr`: + +```c +static void vp9_fill_compressed_hdr( + struct v4l2_ctrl_vp9_compressed_hdr *compressed_hdr, + const uint8_t *buf, uint32_t size, + uint8_t lossless_flag, + bool keyframe_or_intraonly, + bool allowcompinter, + uint8_t *out_reference_mode) /* NEW out-param */ +{ + ... + if (!keyframe_or_intraonly) { + if (allowcompinter) { + comppredmode = vp89_rac_get(&c); + if (comppredmode) + comppredmode += vp89_rac_get(&c); + ... + } else { + comppredmode = 0; /* PRED_SINGLEREF */ + } + *out_reference_mode = comppredmode; + } else { + *out_reference_mode = 0; /* PRED_SINGLEREF for keyframe */ + } + ... +} +``` + +`allowcompinter` (VP9 spec: true when sign biases of the 3 active refs are not all identical) must be determined before calling the compressed-header parser. It can be derived from `picture->pic_fields.bits.{last,golden,alt}_ref_frame_sign_bias` values. + +**Empirical validation**: Before Phase 6 build, confirm that the plan's `vp9_fill_compressed_hdr` function signature includes `out_reference_mode` output. Compile test without it — expect undefined variable. With it — expect clean compile. Phase 7 byte-compare frame-2 `reference_mode` byte (offset 160 in FRAME struct) vs strace of ffmpeg-v4l2request must match. + +--- + +### C4 — Commit Z 4-codec regression claim is overstated; hantro codecs remain broken without env override + +**Site**: Commit Z description + Phase 1 criteria table (Criterion 5 row). + +**Issue**: The plan states: "Commit Z restores baseline functionality on 7.0" and the criterion-5 trace says "Commit Z restores baseline (mitigation B)." This is **partially incorrect**. Commit Z's `find_codec_device` walks `/dev/video0..15` and picks the **first** device whose driver name matches `{rkvdec, hantro-vpu, cedrus, sun4i_csi}`. On 7.0: + +``` +video0 = rockchip-rga (skip) +video1 = rkvdec (MATCH → returns rkvdec) +video2 = hantro-enc (never reached) +video3 = hantro-dec (never reached) +``` + +Result: `find_codec_device` always returns rkvdec (`video1`). H.264, HEVC, VP9 (all rkvdec) benefit. MPEG-2 and VP8 (hantro at `video3`) still get the wrong device — they will see 0 supported profiles because rkvdec doesn't advertise MPEG-2/VP8 pixel formats. + +This is actually documented in the plan's "End-user UX gap" paragraph: "If user wants the OTHER decoder (e.g., hantro for MPEG-2/VP8), they still need env override." But the criterion-5 trace says Commit Z "restores baseline" — contradicting the UX gap note. The 4-codec regression is NOT fully fixed by Commit Z alone. + +**Evidence**: Phase 3 baseline table shows all 4 codecs needed env override on 7.0. Commit Z only eliminates the override for rkvdec codecs. Hantro codecs still require `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media1`. + +**Proposed fix**: Amend the criterion-5 trace to accurately state: "Commit Z restores rkvdec-codec baseline (H.264, HEVC) and enables VP9 enumeration. MPEG-2 and VP8 (hantro) still require explicit env override." This is a documentation correctness fix, not a code fix. The code behavior is acceptable for iter4 scope (iter4-B1 already tracks full multi-decoder dispatch). + +Additionally, the plan should add `LIBVA_V4L2_REQUEST_NO_AUTODETECT=1` as a safety knob so existing test automation that relied on `/dev/video0` can opt out of the new walk: + +```c +if (getenv("LIBVA_V4L2_REQUEST_NO_AUTODETECT")) + goto old_fallback; /* skip walk, use /dev/video0 as before */ +``` + +**Empirical validation**: Before merging Commit Z, verify `vainfo` without env override enumerates H.264/HEVC profiles (rkvdec) but NOT MPEG-2/VP8. Verify `vainfo` WITH `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3` enumerates MPEG-2/VP8. This confirms the scope is correct. + +--- + +## Suggested findings (NON-BLOCKING) + +### S1 — Compile-time assertions: struct sizes are stable on 7.0 UAPI but volatile across kernel major versions + +**Site**: Clause 3. + +The `_Static_assert` values 168 and 2040 were verified empirically by test-compile against `/usr/include/linux/v4l2-controls.h` on this system: + +``` +$ gcc -c /tmp/vp9_size_check.c && ./vp9_size_check +v4l2_ctrl_vp9_frame: 168 +v4l2_ctrl_vp9_compressed_hdr: 2040 +``` + +These match the Phase 3 strace exactly. The plan's assertion strategy is sound. + +Historical stability note: Phase 2 estimated 144/1947 vs Phase 3 empirical 168/2040 — a delta of +24/+93 bytes between the Phase 2 reading and the live 7.0 kernel. This confirms that struct sizes CAN change between kernel versions. The `_Static_assert` approach is the right response: it turns any future growth into a loud build failure, as designed. No change needed. + +### S2 — `seg.pred_probs` 255-fill: plan's direct copy is already correct + +**Site**: Clause 7. + +FFmpeg `vaapi_vp9.c:96–100` fills `segment_pred_probs` with 255 for the non-temporal case before handing the VAAPI buffer to the backend. The plan's direct `for (i=0;i<3;i++) frame.seg.pred_probs[i] = picture->segment_pred_probs[i]` is correct — it receives already-padded values. FFmpeg `v4l2_request_vp9.c:336–341` does the same 255-fill but independently (for the V4L2 path). Both produce identical output. No fix needed. + +### S3 — `reset_frame_context` FFmpeg remap is correct as applied to VAAPI + +**Site**: Clause 10. + +Verified: `vaapi_vp9.c:64` sets `.reset_frame_context = h->h.resetctx` — raw VP9 spec value (0..3), NO pre-remap. The plan's `(resetctx > 0 ? resetctx - 1 : 0)` correctly maps: 0→0(NONE), 1→0(NONE, spec says both 0 and 1 mean no-reset), 2→1(SPEC), 3→2(ALL). This matches V4L2 enum constants. Correct. + +### S4 — `uv_mode` memcpy omission is safe for rkvdec + +**Site**: Clause 9 (compressed header port). + +FFmpeg `fill_compressed_hdr` ends with `memcpy(ctrl->uv_mode, s->prob.p.uv_mode, sizeof(ctrl->uv_mode))`. This sets `compressed_hdr.uv_mode` to current probability values. The plan's port does not mention this memcpy. + +Verified: `v4l2_vp9_fw_update_probs` (`v4l2-vp9.c:1415–1444`) does NOT call any `update_uv_mode_probs` function — there is no `uv_mode` update path from the compressed_hdr struct. `rkvdec-vp9.c:280–287` reads `uv_mode` from `vp9_ctx->probability_tables` (the frame_context table), not from `prob_updates` (the compressed_hdr ctrl). The memcpy in FFmpeg's `fill_compressed_hdr` fills a field that rkvdec never reads from the compressed_hdr control. Omission is safe. + +### S5 — `lossless_flag` semantics align exactly + +**Site**: Clause 9. + +VAAPI (`va_dec_vp9.h:131–140`): `LosslessFlag = base_qindex==0 && y_dc_delta_q==0 && uv_dc_delta_q==0 && uv_ac_delta_q==0`. FFmpeg VP9 decoder (`vp9shared.h:136`): `uint8_t lossless` set by the same condition. FFmpeg `vaapi_vp9.c:78`: `.lossless_flag = h->h.lossless` — direct assignment, no remap. Semantics are identical. Plan's `if (picture->pic_fields.bits.lossless_flag) ctrl->tx_mode = V4L2_VP9_TX_MODE_ONLY_4X4` is correct. + +### S6 — Per-segment ALT_Q/ALT_L mapping gap is correctly identified but needs explicit V4L2 `UPDATE_DATA` flag handling + +**Site**: Clause 7. + +The plan leaves `V4L2_VP9_SEGMENTATION_FLAG_UPDATE_DATA` and `V4L2_VP9_SEGMENTATION_FLAG_ABS_OR_DELTA_UPDATE` zero for all frames (since VAAPI doesn't expose `segmentation_update_data` or `absolute_vals`). For BBB (segmentation disabled), these flags are irrelevant. The plan documents this as a fidelity gap. This is correct scope-management for iter4. + +Minor clarification worth adding to the plan comment: the kernel only reads `feature_data[]` and `feature_enabled[]` when `SEGMENTATION_FLAG_ENABLED` is set. Since BBB has `ENABLED=0`, the zero'd feature_data arrays are never examined. Safe for iter4 scope. + +--- + +## Direct answers to the 7 queued questions + +**Q1 — Uncompressed-header parser correctness (Clause 6)** + +The spec parse path for BBB's keyframe to reach `loop_filter_params()` requires correctly parsing: frame_marker (2 bits) → profile (2 bits) → show_existing_frame (1 bit) → frame_type (1 bit) → show_frame (1 bit) → error_resilient_mode (1 bit) → frame_sync_code (3 bytes) → color_config() [variable length, profile-dependent] → frame_size() [2×16 bits] → render_and_frame_size_different (1 bit) → refresh_frame_flags (8 bits) → loop_filter_params(). The `color_config()` and inter-frame reference frame stuff make the path non-trivial but well-defined for profile=0 keyframe. The plan's "~50 LOC" estimate is conservative — a faithful profile-0 keyframe + profile-0 inter-frame parse is closer to 80–100 LOC. The Phase 3 anchor (`lf.ref_deltas={1,0,-1,-1}, lf.flags=3, quant.base_q_idx=46`) is the correct empirical reference. See **C2** for the critical persistence issue. + +**Q2 — `reset_frame_context` and `interpolation_filter` remap** + +`reset_frame_context`: correct, no issue (see S3). `interpolation_filter`: WRONG — double-XOR. See **C1**. Fix: direct assignment `frame.interpolation_filter = picture->pic_fields.bits.mcomp_filter_type` with no XOR. + +**Q3 — Compile-time size assertions stability** + +168/2040 are correct on 7.0 UAPI. Verified by test-compile. Assertions are appropriate and will fire loudly on kernel UAPI bump. If assertions fire, re-run Phase 3 baseline capture against the new kernel — do not just update the numbers. See **S1**. + +**Q4 — Per-segment ALT_Q mapping for non-BBB segmentation-enabled streams** + +VAAPI's `seg_param[s].luma_ac_quant_scale` is the dequantization scale `y_dequant[qindex][1]` (confirmed by `vaapi_vp9.c:153`: `.luma_ac_quant_scale = h->h.segmentation.feat[i].qmul[0][1]`). The kernel needs the raw Q-index delta (ALT_Q), NOT the dequant scale. The inverse mapping would require the VP9 quantization table lookup (spec table in section 8.3.1) which is non-trivial. The plan correctly identifies this as an iter4-Q6 fidelity gap. For BBB it is dead code. For segmentation-enabled streams, the mapping is fundamentally broken without the Q-table inverse. Recommend flagging in Commit B header comment. The flag `V4L2_VP9_SEGMENTATION_FLAG_UPDATE_DATA` also cannot be set without knowing `segmentation_update_data` from the uncompressed header — another VAAPI gap. All correctly documented. + +**Q5 — Test compile field availability** + +All VAAPI field accesses in Clauses 4, 5, 7, and 10 compile cleanly. Verified by `gcc -c` test against inline-definition of the VAAPI structs matching `va_dec_vp9.h`. Fields tested: `frame_width`, `frame_height`, `profile`, `bit_depth`, `log2_tile_columns`, `log2_tile_rows`, `pic_fields.bits.frame_context_idx`, `filter_level`, `sharpness_level`, `frame_header_length_in_bytes`, `first_partition_size`, all `pic_fields.bits.*` in Clauses 5 and 10, `mb_segment_tree_probs[]`, `segment_pred_probs[]`, `seg_param[].segment_flags.fields.*`. All present. Note: `pic_fields.bits.frame_context_idx` is 2 bits (not 8) — assignment to `__u8 frame_context_idx` in the V4L2 struct is safe (value fits: 0..3). + +**Q6 — Mitigation B regression risk** + +See **C4** for the detailed finding. Short answer: Commit Z's walk-and-pick-first selects rkvdec for ALL codecs, not hantro. MPEG-2 and VP8 remain broken without env override. The criterion-5 regression claim is overstated. The `LIBVA_V4L2_REQUEST_NO_AUTODETECT=1` safety knob is recommended as a Suggested addition (see C4). + +**Q7 — `lossless_flag` semantic alignment** + +Semantics are identical between VAAPI and FFmpeg. See **S5**. Plan's usage is correct. + +--- + +## Verdict + +Plan needs two code amendments (C1 + C2) before Phase 6 implementation, plus one function-signature clarification (C3) and one documentation correction (C4). C1 is a one-line fix. C2 requires adding 6 bytes of persistent state to `object_context` and threading it through the Clause 6 parser call. C3 requires adding an `out_reference_mode` parameter to `vp9_fill_compressed_hdr`. C4 is a comment/documentation fix plus an optional safety env var. None require re-baselining or a Phase 3 re-capture. + +**Plan ready for Phase 6 after C1 + C2 + C3 amendments. C4 is strongly recommended but may be accepted as a known scope limitation note.**