iter4 Phase 5: sonnet-architect review — 4 Critical findings, all amendments incorporated

Review by sonnet-architect with cold-context source reads of fork +
kernel UAPI + VAAPI + FFmpeg references + kernel rkvdec source.
Reviewer applied Direction 2 (empirical-over-theoretical) by
test-compiling struct sizes, gcc-c-checking VAAPI field accesses,
and source-tracing FFmpeg's filter-mode XOR provenance.

Critical findings (all empirically validated by author before
incorporation per feedback_review_empirical_over_theoretical.md):

C1 - interpolation_filter double-XOR: vaapi_vp9.c:62 ALREADY applies
     `filtermode ^ (filtermode <= 1)` when filling VAAPI's
     mcomp_filter_type. Plan's second XOR was incorrect; would swap
     EIGHTTAP and EIGHTTAP_SMOOTH for inter frames -> wrong
     loop-filter strength. Fix: direct assignment, no XOR.

C2 - LF deltas not persistent: kernel UAPI explicitly says
     "users should pass its last value" when delta_update=0. Plan
     memset-zeroed each frame; would send {0,0,0,0,0,0} on BBB inter
     frames instead of {1,0,-1,-1,0,0}. Fix: add persistent vp9_lf
     state to object_context, init to VP9 spec defaults, update only
     when parser sees delta_update=1, always copy to kernel control.

C3 - reference_mode out-parameter missing: reference_mode lives in
     FRAME struct, not COMPRESSED_HDR. Plan referenced
     `compressed_hdr_reference_mode` placeholder which would be an
     undefined identifier -> compile failure. Fix: add
     `uint8_t *out_reference_mode` param to vp9_fill_compressed_hdr;
     derive `allowcompinter` at call site from the 3 sign biases.

C4 - Mitigation B scope claim overstated: walk-and-pick-first always
     selects rkvdec on 7.0 (since video1 enumerates first). Hantro
     codecs (MPEG-2, VP8) at video3 still require env override.
     Fix: qualify criterion-5 trace; add LIBVA_V4L2_REQUEST_NO_
     AUTODETECT=1 escape hatch for legacy callers.

6 Suggested (S1-S6): all confirm plan correctness OR are scope-
aligned non-issues. S4 (uv_mode memcpy omission safe for rkvdec)
baked into Clause 9 amended text.

Without this review, iter4 Phase 6 would have failed first compile
(C3) + produced wrong inter-frame output (C1+C2) + caused user
confusion (C4). Estimated saving: 1 compile failure + 1 Phase 7 ->
Phase 4 loopback + 1 doc correction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-10 05:49:13 +00:00
parent 4b36077b17
commit 9865416ed2
2 changed files with 400 additions and 50 deletions
+134 -50
View File
@@ -9,7 +9,7 @@ Phase 3 baseline provides:
- Frame-1 keyframe verbatim payload prefix (`lf.ref_deltas={1,0,-1,-1}`, `lf.mode_deltas={0,0}`, `quant.base_q_idx=46`)
- 4-codec regression root-cause: `/dev/video0` is now `rockchip-rga` on 7.0; backend hardcodes `/dev/video0` in `request.c:149`
User picked **Mitigation B**: in-fork patch — walk `/dev/video*`, query `VIDIOC_QUERYCAP`, pick first device whose driver name is in `{rkvdec, hantro-vpu}`. Adds ~30 LOC to `request.c`. Restores baseline functionality on 7.0.
User picked **Mitigation B**: in-fork patch — walk `/dev/video*`, query `VIDIOC_QUERYCAP`, pick first device whose driver name is in `{rkvdec, hantro-vpu}`. Adds ~30 LOC to `request.c`. **Phase 5 C4 amendment** — partial restoration only: walk-and-pick-first returns rkvdec on 7.0 (rkvdec at video1 enumerates first). Restores H.264 / HEVC / VP9 (rkvdec codecs) without env override. MPEG-2 and VP8 (hantro at video3) STILL require explicit env override `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media1`. Full multi-decoder dispatch is iter4-B1 backlog.
## Contract clauses
@@ -130,42 +130,61 @@ if (picture->pic_fields.bits.golden_ref_frame_sign_bias) frame.ref_frame_sign_bi
if (picture->pic_fields.bits.alt_ref_frame_sign_bias) frame.ref_frame_sign_bias |= V4L2_VP9_SIGN_BIAS_ALT;
```
### Clause 6 — Loop filter deltas + base quantization (uncompressed-header partial parse)
### Clause 6 — Loop filter deltas + base quantization (uncompressed-header partial parse with persistent LF-delta state)
VAAPI exposes `filter_level` and `sharpness_level` (Clause 4), but NOT `lf_delta_enabled`/`lf_delta_update`/`lf_ref_delta[4]`/`lf_mode_delta[2]`. Phase 3 keyframe anchor shows `lf.ref_deltas={1,0,-1,-1}` (non-zero on BBB); leaving these zero produces wrong loop-filter behavior → criterion-4 byte mismatch.
VAAPI also doesn't expose `quant.base_q_idx` / `delta_q_y_dc` / `delta_q_uv_dc` / `delta_q_uv_ac`. Phase 3 keyframe anchor shows `base_q_idx=46`; leaving zero produces wrong dequant scale.
Solution: implement a minimal uncompressed-header parser (`vp9_parse_uncompressed_header_lf_quant`) that reads `surface_object->source_data` from offset 0 and extracts the 6 needed fields. The parse runs from offset 0 through the loop-filter and quantization syntax sections (per VP9 spec 6.2 §6.2.46.2.5):
**Phase 5 C2 amendment — LF deltas are persistent across frames**: Per kernel UAPI `<linux/v4l2-controls.h>:2578` verbatim:
> "If this syntax element is not present in the bitstream, users should pass its last value."
VP9 spec (matching FFmpeg `vp9.c:666-671`) initializes `lf_delta.ref = {1,0,-1,-1}` and `lf_delta.mode = {0,0}` on keyframe/error-resilient/intra-only, then updates only when `lf_delta.updated == 1`. FFmpeg `v4l2_request_vp9.c:298-302` always copies the persistent state to the kernel control (NOT conditional on `updated`). Memset-zero per-frame loses the persistent state — wrong on inter frames where typically `delta_enabled=1` and `delta_update=0`.
Solution: add persistent LF-delta state to `object_context` (per-decode-session scope), initialize to VP9 spec defaults, update only when the parsed bitstream's `lf_delta_update` is set, always copy to the kernel control:
```c
static void vp9_parse_uncompressed_header_lf_quant(
const uint8_t *data, uint32_t size, uint32_t header_size,
struct v4l2_vp9_loop_filter *lf,
struct v4l2_vp9_quantization *quant)
{
/* Bit reader walks frame_marker, profile, show_existing_frame,
* frame_type, show_frame, error_resilient_mode, color_config (if
* keyframe), frame_size_with_refs (if not keyframe), tile_info ...
* up to loop_filter_params + quantization_params syntax sections.
*
* Approach: bit-perfect VP9 spec port for ~50 LOC, reusing the
* VPX bitstream reader (see Clause 8). Fields written:
* lf->ref_deltas[0..3], lf->mode_deltas[0..1],
* lf->flags |= V4L2_VP9_LOOP_FILTER_FLAG_DELTA_ENABLED if set
* lf->flags |= V4L2_VP9_LOOP_FILTER_FLAG_DELTA_UPDATE if set
* quant->base_q_idx,
* quant->delta_q_y_dc, delta_q_uv_dc, delta_q_uv_ac
*/
...
/* In src/context.h, inside object_context (added by Commit B/C): */
struct {
int8_t ref_deltas[4]; /* VP9 spec default: {1,0,-1,-1} */
int8_t mode_deltas[2]; /* VP9 spec default: {0,0} */
bool initialized; /* false until first VP9 frame */
} vp9_lf;
/* In Clause 6 fill (vp9.c): */
if (!context->vp9_lf.initialized ||
keyframe || intra_only || error_resilient) {
context->vp9_lf.ref_deltas[0] = 1;
context->vp9_lf.ref_deltas[1] = 0;
context->vp9_lf.ref_deltas[2] = -1;
context->vp9_lf.ref_deltas[3] = -1;
context->vp9_lf.mode_deltas[0] = 0;
context->vp9_lf.mode_deltas[1] = 0;
context->vp9_lf.initialized = true;
}
/* Parse uncompressed header — only updates context state when
* delta_update=1 in the bitstream: */
vp9_parse_uncompressed_header_lf_quant(
surface_object->source_data,
surface_object->source_size,
frame.uncompressed_header_size,
&frame.lf, /* receives flags + level + sharpness */
&frame.quant, /* receives base_q_idx + deltas */
context->vp9_lf.ref_deltas, /* in/out — updated only if parser sees delta_update=1 */
context->vp9_lf.mode_deltas); /* same */
/* ALWAYS copy persistent state to the kernel control: */
memcpy(frame.lf.ref_deltas, context->vp9_lf.ref_deltas, 4);
memcpy(frame.lf.mode_deltas, context->vp9_lf.mode_deltas, 2);
```
**Anchor**: Phase 3 keyframe `lf.ref_deltas={1,0,-1,-1}, lf.mode_deltas={0,0}, lf.flags=3 (DELTA_ENABLED|DELTA_UPDATE), quant.base_q_idx=46, deltas=0`. Implementation must reproduce these exact values byte-for-byte against the BBB keyframe.
The parser (`vp9_parse_uncompressed_header_lf_quant`) walks frame_marker → profile → show_existing_frame → frame_type → show_frame → error_resilient_mode → frame_sync_code → color_config (profile-dependent) → frame_size → render_and_frame_size_different → refresh_frame_flags → loop_filter_params → quantization_params. Per Phase 5 Q1 reviewer note: ~80100 LOC for a faithful profile-0 keyframe + profile-0 inter-frame parse (revised up from Phase 4's "~50 LOC" estimate).
**Per memory `feedback_review_empirical_over_theoretical.md` Direction 2**: Phase 5 review must verify the parser by extracting these 9 fields from the actual BBB keyframe bitstream (start of `bbb_720p10s_vp9.webm` first frame) and comparing against Phase 3 anchor. If any field disagrees, Phase 5 returns "Critical: parser bug" and Phase 4 loops.
**Anchor**: Phase 3 keyframe `lf.ref_deltas={1,0,-1,-1}, lf.mode_deltas={0,0}, lf.flags=3 (DELTA_ENABLED|DELTA_UPDATE), quant.base_q_idx=46, deltas=0`. Implementation must reproduce these exact values byte-for-byte against the BBB keyframe. After C2 fix, frame 2 (inter) should also show `lf.ref_deltas={1,0,-1,-1}` from the persisted state — verifiable by strace.
**Out of iter4 scope**: full uncompressed-header parse (color_config, frame_size for inter, segmentation update_data, tile_info). Those fields are either available via VAAPI (Clauses 4, 5, 7) or are not write-back to kernel. The parser is a TARGETED partial parse, not a general bitstream reader.
**Out of iter4 scope**: full uncompressed-header parse (color_config full variant, segmentation update_data, tile_info). Those fields are either available via VAAPI (Clauses 4, 5, 7) or are not write-back to kernel. The parser is a TARGETED partial parse.
### Clause 7 — Segmentation mapping
@@ -240,9 +259,37 @@ Syntax elements parsed (per VP9 spec 6.3):
Each updated value goes through `inv_map_table[d]`. Each "no update" bit leaves zero in the kernel struct (kernel interprets zero as "keep prior probability").
**Lossless special case**: if `s->s.h.lossless` would be set, FFmpeg writes `tx_mode = V4L2_VP9_TX_MODE_ONLY_4X4` unconditionally. We don't have direct access to `lossless` from VAAPI, but `picture->pic_fields.bits.lossless_flag` (bit 31 of pic_fields) maps directly. Read it and apply the same special case.
**Phase 5 C3 amendment — `reference_mode` out-parameter**: The kernel's `reference_mode` field lives in `v4l2_ctrl_vp9_frame` (`<linux/v4l2-controls.h>:2763`), NOT in `v4l2_ctrl_vp9_compressed_hdr`. FFmpeg's parser reads it from a local `enum CompPredMode comppredmode` during compressed-header parsing (`v4l2_request_vp9.c:104,150-160`) and assigns to FRAME's `reference_mode` separately. Our backend must thread the parsed value out via an explicit out-parameter — without it, Clause 10's `frame.reference_mode = compressed_hdr_reference_mode` references an undefined variable → compile failure.
**Anchor**: Phase 3 strace shows COMPRESSED_HDR payload size 2040 B; kernel never EINVAL'd → port produces correctly-sized struct. Field-level decode of the keyframe payload is deferred to Phase 5/Phase 7 byte-compare (the parser is the primary reference for itself; cross-validation is via "kernel decodes the same hash both ways" not "we manually decode the parser output").
Function signature (Phase 5 C3-amended):
```c
static void vp9_fill_compressed_hdr(
struct v4l2_ctrl_vp9_compressed_hdr *compressed_hdr,
const uint8_t *buf, uint32_t size,
uint8_t lossless_flag,
bool keyframe_or_intraonly,
bool allowcompinter,
uint8_t *out_reference_mode); /* C3: NEW out-param */
```
`allowcompinter` is true when the sign biases of the 3 active references are not all identical (per VP9 spec 6.4.4). Derive at the call site from `picture->pic_fields.bits.{last,golden,alt}_ref_frame_sign_bias`:
```c
bool allowcompinter = !(
picture->pic_fields.bits.last_ref_frame_sign_bias ==
picture->pic_fields.bits.golden_ref_frame_sign_bias &&
picture->pic_fields.bits.golden_ref_frame_sign_bias ==
picture->pic_fields.bits.alt_ref_frame_sign_bias);
```
For keyframes (`!keyframe_or_intraonly == false`): write `*out_reference_mode = 0` (PRED_SINGLEREF) unconditionally. For inter frames without compinter: same. With compinter: extract from the compressed-header range coder per FFmpeg `v4l2_request_vp9.c:150-160`.
**Lossless special case**: if `picture->pic_fields.bits.lossless_flag` is set, the parser writes `compressed_hdr->tx_mode = V4L2_VP9_TX_MODE_ONLY_4X4` unconditionally per FFmpeg `v4l2_request_vp9.c:117-118`. VAAPI's `lossless_flag` semantics align with FFmpeg's `s->s.h.lossless` per Phase 5 S5 verification (both encode `base_q==0 && y_dc_delta_q==0 && uv_dc_delta_q==0 && uv_ac_delta_q==0`).
**`uv_mode` memcpy omission** (Phase 5 S4): FFmpeg's `fill_compressed_hdr` ends with `memcpy(ctrl->uv_mode, s->prob.p.uv_mode, sizeof(ctrl->uv_mode))`. rkvdec's `rkvdec-vp9.c:280-287` reads `uv_mode` from `vp9_ctx->probability_tables` (the persistent frame_context table inside the kernel), NOT from the compressed_hdr control. Omitting the memcpy is safe for rkvdec; document and skip.
**Anchor**: Phase 3 strace shows COMPRESSED_HDR payload size 2040 B; kernel never EINVAL'd → port produces correctly-sized struct. Field-level decode of the keyframe payload is deferred to Phase 7 byte-compare (the parser is the primary reference for itself; cross-validation is via "kernel decodes the same hash both ways" not "we manually decode the parser output").
### Clause 10 — Frame flags + reference_mode + interpolation_filter
@@ -272,16 +319,27 @@ frame.reset_frame_context =
? picture->pic_fields.bits.reset_frame_context - 1
: 0;
/* interpolation_filter: FFmpeg uses (filtermode ^ (filtermode <= 1)).
* VAAPI's mcomp_filter_type is 3 bits (0..7); kernel enum is 0..4.
* The XOR remap aligns FFmpeg's internal filter_mode enum to V4L2's. */
frame.interpolation_filter =
picture->pic_fields.bits.mcomp_filter_type ^
(picture->pic_fields.bits.mcomp_filter_type <= 1);
/* Phase 5 C1: VAAPI's mcomp_filter_type is ALREADY post-XOR.
* vaapi_vp9.c:62 applies `h->h.filtermode ^ (h->h.filtermode <= 1)`
* before storing into VAAPI's mcomp_filter_type. Re-applying the XOR
* here would undo the alignment and swap EIGHTTAP/EIGHTTAP_SMOOTH for
* inter frames — wrong loop-filter selection on BBB frames 2-5.
* Direct assignment is correct. */
frame.interpolation_filter = picture->pic_fields.bits.mcomp_filter_type;
/* reference_mode: comes from compressed-header parse (NOT VAAPI).
* Read from compressed_hdr's parsed state (see Clause 9). */
frame.reference_mode = compressed_hdr_reference_mode; /* state from Clause 9 */
/* Phase 5 C3: reference_mode is in v4l2_ctrl_vp9_frame (offset 160),
* NOT in v4l2_ctrl_vp9_compressed_hdr. Filled by Clause 9's parser
* via the out_reference_mode parameter. */
uint8_t parsed_reference_mode = 0; /* default for keyframe */
vp9_fill_compressed_hdr(&compressed_hdr,
surface_object->source_data + frame.uncompressed_header_size,
frame.compressed_header_size,
picture->pic_fields.bits.lossless_flag,
!picture->pic_fields.bits.frame_type ||
picture->pic_fields.bits.intra_only,
allowcompinter,
&parsed_reference_mode);
frame.reference_mode = parsed_reference_mode;
```
**Anchor**: Phase 3 verbatim — keyframe `reset_frame_context=0, interpolation_filter=0` (VAAPI's `mcomp_filter_type=0` XOR with (0 <= 1)=1 → 1 hmm). Phase 5 must verify the XOR remap empirically against the keyframe bytes.
@@ -369,19 +427,25 @@ static int find_codec_device(char video_path[32], char media_path[32])
return -1;
}
/* In RequestInit: */
video_path = getenv("LIBVA_V4L2_REQUEST_VIDEO_PATH");
if (video_path == NULL) {
static char auto_video[32], auto_media[32];
if (find_codec_device(auto_video, auto_media) == 0) {
video_path = auto_video;
if (getenv("LIBVA_V4L2_REQUEST_MEDIA_PATH") == NULL)
media_path = auto_media;
request_log("auto-selected codec device: %s + %s\n",
video_path, media_path);
} else {
video_path = "/dev/video0"; /* keep old fallback for callers
we can't enumerate */
/* In RequestInit (Phase 5 C4 amendment — escape hatch for legacy callers): */
if (getenv("LIBVA_V4L2_REQUEST_NO_AUTODETECT")) {
/* Skip walk; old hardcoded behavior. Lets test automation that
* relied on /dev/video0 opt out of the new auto-detect logic. */
video_path = "/dev/video0";
media_path = "/dev/media0";
} else {
video_path = getenv("LIBVA_V4L2_REQUEST_VIDEO_PATH");
if (video_path == NULL) {
static char auto_video[32], auto_media[32];
if (find_codec_device(auto_video, auto_media) == 0) {
video_path = auto_video;
if (getenv("LIBVA_V4L2_REQUEST_MEDIA_PATH") == NULL)
media_path = auto_media;
request_log("auto-selected codec device: %s + %s\n",
video_path, media_path);
} else {
video_path = "/dev/video0"; /* fall back to legacy default */
}
}
}
```
@@ -484,7 +548,7 @@ Submitting this plan for second-model review (sonnet-architect). Key questions f
| 2. vaCreateConfig SUCCESS | Commit A — `RequestCreateConfig` case + `RequestQueryConfigEntrypoints` |
| 3. ffmpeg-vaapi VP9 exit 0 | Commits Z+A+B+C end-to-end; Clauses 1+4+5+11 + parsers |
| 4. mpv VP9 HW=SW byte-identical | Commits Z+A+B+C decode correctness + Phase 3 SW PNGs as Phase 7 anchor; engagement via `mpv -v` log per memory `feedback_hw_decode_engagement_check.md` |
| 5. 4-codec regression | Commit Z restores baseline (mitigation B); Commits A+B+C add new VP9 path purely additively (no shared-state mutation) |
| 5. 4-codec regression | Commit Z restores rkvdec-codec baseline without env override (H.264, HEVC). MPEG-2, VP8 (hantro) still require `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3` env override per Phase 5 C4. Commits A+B+C add new VP9 path purely additively (no shared-state mutation in MPEG-2/HEVC/H.264/VP8 paths). |
## Substrate state at Phase 4 close
@@ -493,3 +557,23 @@ Submitting this plan for second-model review (sonnet-architect). Key questions f
- All Phase 3 anchors captured + preserved on fresnel `/tmp/iter4_phase3/` and `noether:~/src/fresnel-fourier/iter4_phase3.tgz`.
- Memory rules carry forward; new `reference_fresnel_kernel_substrate.md` covers post-besser substrate.
- Phase 4 plan ready for sonnet-architect review (Phase 5).
## Phase 5 amendments incorporated (post-review)
Sonnet-architect review (`phase5_iter4_review.md`) returned 4 Critical findings + 6 Suggested. All 4 Critical findings empirically verified by author before incorporation per memory `feedback_review_empirical_over_theoretical.md` Direction 2. Amendments applied in-place above:
| ID | Site | Change | Verified |
|---|---|---|---|
| C1 | Clause 10 | `frame.interpolation_filter = picture->pic_fields.bits.mcomp_filter_type` (direct, NO XOR; vaapi_vp9.c:62 already applies the XOR before storing into VAAPI's mcomp_filter_type) | ✓ grep `vaapi_vp9.c:62` |
| C2 | Clause 6 | Add persistent `vp9_lf.{ref_deltas[4], mode_deltas[2]}` to `object_context`, init to VP9 spec `{1,0,-1,-1,0,0}`, update only when parser sees `lf_delta.update=1`, ALWAYS copy to kernel control. Without persistence, BBB inter frames send `{0,0,0,0,0,0}` instead of `{1,0,-1,-1,0,0}` → kernel applies wrong loop-filter strength | ✓ kernel UAPI `v4l2-controls.h:2578` "users should pass its last value"; FFmpeg `vp9.c:666-671` defaults |
| C3 | Clauses 9 + 10 | Add `out_reference_mode` parameter to `vp9_fill_compressed_hdr`. Without it, Clause 10's `frame.reference_mode = compressed_hdr_reference_mode` is undefined identifier (compile failure). `reference_mode` lives in v4l2_ctrl_vp9_frame, not _compressed_hdr; FFmpeg uses local `enum CompPredMode comppredmode` and threads it through return path | ✓ `v4l2-controls.h:2763` (FRAME) + `v4l2_request_vp9.c:104` (local var) |
| C4 | Mitigation B + criterion-5 trace + request.c skeleton | (a) Qualify wording: "Restores rkvdec-codec baseline" (NOT all 4 codecs); MPEG-2/VP8 still need env override for hantro path. (b) Add `LIBVA_V4L2_REQUEST_NO_AUTODETECT=1` escape hatch for legacy callers | ✓ Plan line 12 vs 393 contradiction; Phase 3 baseline § 4-codec regression table |
S1, S2, S3, S5, S6 either confirm correctness of existing plan or are scope-aligned non-issues; no amendment needed. S4 (`uv_mode` memcpy omission) baked into Clause 9's amended text.
Without the Phase 5 review, iter4 Phase 6 would have:
- **Failed first compile attempt** (C3) — undefined `compressed_hdr_reference_mode` identifier
- **Then produced visible-but-wrong output** (C1 + C2) — Phase 7 byte-compare would have diverged on inter frames (interpolation_filter swapped, ref_deltas zeroed)
- **Then triggered user confusion when MPEG-2/VP8 didn't engage** (C4) — false expectation that `vainfo` would just work
Estimated saving: 1 Phase 6 compile failure + 1 Phase 7 → Phase 4 loopback for inter-frame byte mismatch + 1 documentation correction. Per memory rule "Reviews are never skippable."