Files
fresnel-fourier/phase4_iter4_plan.md
T
marfrit 9865416ed2 iter4 Phase 5: sonnet-architect review — 4 Critical findings, all amendments incorporated
Review by sonnet-architect with cold-context source reads of fork +
kernel UAPI + VAAPI + FFmpeg references + kernel rkvdec source.
Reviewer applied Direction 2 (empirical-over-theoretical) by
test-compiling struct sizes, gcc-c-checking VAAPI field accesses,
and source-tracing FFmpeg's filter-mode XOR provenance.

Critical findings (all empirically validated by author before
incorporation per feedback_review_empirical_over_theoretical.md):

C1 - interpolation_filter double-XOR: vaapi_vp9.c:62 ALREADY applies
     `filtermode ^ (filtermode <= 1)` when filling VAAPI's
     mcomp_filter_type. Plan's second XOR was incorrect; would swap
     EIGHTTAP and EIGHTTAP_SMOOTH for inter frames -> wrong
     loop-filter strength. Fix: direct assignment, no XOR.

C2 - LF deltas not persistent: kernel UAPI explicitly says
     "users should pass its last value" when delta_update=0. Plan
     memset-zeroed each frame; would send {0,0,0,0,0,0} on BBB inter
     frames instead of {1,0,-1,-1,0,0}. Fix: add persistent vp9_lf
     state to object_context, init to VP9 spec defaults, update only
     when parser sees delta_update=1, always copy to kernel control.

C3 - reference_mode out-parameter missing: reference_mode lives in
     FRAME struct, not COMPRESSED_HDR. Plan referenced
     `compressed_hdr_reference_mode` placeholder which would be an
     undefined identifier -> compile failure. Fix: add
     `uint8_t *out_reference_mode` param to vp9_fill_compressed_hdr;
     derive `allowcompinter` at call site from the 3 sign biases.

C4 - Mitigation B scope claim overstated: walk-and-pick-first always
     selects rkvdec on 7.0 (since video1 enumerates first). Hantro
     codecs (MPEG-2, VP8) at video3 still require env override.
     Fix: qualify criterion-5 trace; add LIBVA_V4L2_REQUEST_NO_
     AUTODETECT=1 escape hatch for legacy callers.

6 Suggested (S1-S6): all confirm plan correctness OR are scope-
aligned non-issues. S4 (uv_mode memcpy omission safe for rkvdec)
baked into Clause 9 amended text.

Without this review, iter4 Phase 6 would have failed first compile
(C3) + produced wrong inter-frame output (C1+C2) + caused user
confusion (C4). Estimated saving: 1 compile failure + 1 Phase 7 ->
Phase 4 loopback + 1 doc correction.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 05:49:13 +00:00

37 KiB
Raw Blame History

Iteration 4 — Phase 4 (plan)

Locks the iter4 patch shape against verbatim Phase 3 baseline (phase3_iter4_baseline.md, commit 56abe3d) and the kernel UAPI + VAAPI + FFmpeg references read in Phase 2 (phase2_iter4_situation.md, commit 2651e4c + ID-correction in 56abe3d). Plan structure mirrors iter2/iter3 clause template, expanded for VP9-specific scope (compressed-header parser + uncompressed-header partial parse + device-path mitigation).

Phase 3 baseline provides:

  • Empirical struct sizes 168 B / 2040 B (NOT Phase 2's 144 / 1947 estimates)
  • Correct control IDs 0xa40a2c / 0xa40a2d
  • Frame-1 keyframe verbatim payload prefix (lf.ref_deltas={1,0,-1,-1}, lf.mode_deltas={0,0}, quant.base_q_idx=46)
  • 4-codec regression root-cause: /dev/video0 is now rockchip-rga on 7.0; backend hardcodes /dev/video0 in request.c:149

User picked Mitigation B: in-fork patch — walk /dev/video*, query VIDIOC_QUERYCAP, pick first device whose driver name is in {rkvdec, hantro-vpu}. Adds ~30 LOC to request.c. Phase 5 C4 amendment — partial restoration only: walk-and-pick-first returns rkvdec on 7.0 (rkvdec at video1 enumerates first). Restores H.264 / HEVC / VP9 (rkvdec codecs) without env override. MPEG-2 and VP8 (hantro at video3) STILL require explicit env override LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media1. Full multi-decoder dispatch is iter4-B1 backlog.

Contract clauses

Clause 1 — Submission shape (per-frame)

ONE batched VIDIOC_S_EXT_CTRLS per frame, bound to the surface's permanent request_fd. TWO controls (vs iter3's one): V4L2_CID_STATELESS_VP9_FRAME + V4L2_CID_STATELESS_VP9_COMPRESSED_HDR. rkvdec-vp9.c::rkvdec_vp9_run_preamble:752 WARN_ON(!ctrl); return -EINVAL if COMPRESSED_HDR absent — hard requirement on RK3399.

struct v4l2_ext_control ctrls[2] = {
    { .id = V4L2_CID_STATELESS_VP9_FRAME,          /* 0xa40a2c */
      .ptr = &frame,
      .size = sizeof frame                         /* MUST be 168 */ },
    { .id = V4L2_CID_STATELESS_VP9_COMPRESSED_HDR, /* 0xa40a2d */
      .ptr = &compressed_hdr,
      .size = sizeof compressed_hdr                /* MUST be 2040 */ },
};

rc = v4l2_set_controls(driver_data->video_fd,
                       surface_object->request_fd,
                       ctrls, 2);

v4l2_set_controls wraps with which=V4L2_CTRL_WHICH_REQUEST_VAL. Phase 3 strace verifies ctrl_class=0xf010000 is what the kernel sees, matching iter1+iter2+iter3.

No init-time probe: ffmpeg-v4l2request first runs a count=1 probe (FRAME-only) to check kernel CID support, then count=2. iter4 backend skips the probe — VP9 on rkvdec REQUIRES COMPRESSED_HDR per kernel source; if the kernel doesn't have it, decode would fail anyway. Issuing count=2 unconditionally is correct.

Anchor: Phase 3 baseline § Anchor 2 verbatim ioctl trace.

Clause 2 — Local struct allocation + zero-init

int vp9_set_controls(struct request_data *driver_data,
                     struct object_context *context,
                     struct object_surface *surface_object)
{
    VADecPictureParameterBufferVP9 *picture =
        &surface_object->params.vp9.picture;
    VASliceParameterBufferVP9 *slice =
        &surface_object->params.vp9.slice;

    struct v4l2_ctrl_vp9_frame frame;
    struct v4l2_ctrl_vp9_compressed_hdr compressed_hdr;

    memset(&frame, 0, sizeof frame);
    memset(&compressed_hdr, 0, sizeof compressed_hdr);
    /* Zero is the kernel's "no probability update" default for every
     * field in compressed_hdr, and the safe default for every numeric
     * field of frame except reference_mode (set explicitly later). */
    ...
}

VAAPI doesn't have iter3-style set flags for VP9; both Picture and Slice are unconditionally populated by the consumer per frame (per Phase 2 analysis B9, no per-frame reset needed in RequestBeginPicture).

Clause 3 — Compile-time struct-size assertions

Per Phase 3 finding: kernel UAPI struct sizes are 168 B (FRAME) and 2040 B (COMPRESSED_HDR) on 7.0; iter4's Build Date: post-2026-05-09 will use whatever size the build host's UAPI headers report. Adding compile-time asserts at the top of vp9.c makes any future struct-size drift fail loudly instead of silently corrupting kernel control writes:

_Static_assert(sizeof(struct v4l2_ctrl_vp9_frame) == 168,
               "v4l2_ctrl_vp9_frame size mismatch — UAPI changed");
_Static_assert(sizeof(struct v4l2_ctrl_vp9_compressed_hdr) == 2040,
               "v4l2_ctrl_vp9_compressed_hdr size mismatch — UAPI changed");

If these fire, treat as a kernel-substrate bump (re-baseline Phase 3) — DO NOT just bump the asserts.

Clause 4 — Frame geometry + per-frame scalars

frame.frame_width_minus_1   = picture->frame_width - 1;
frame.frame_height_minus_1  = picture->frame_height - 1;
frame.render_width_minus_1  = picture->frame_width - 1;   /* VAAPI gap;
frame.render_height_minus_1 = picture->frame_height - 1;     no scaling for BBB */

frame.profile               = picture->profile;
frame.bit_depth             = picture->bit_depth;
frame.tile_cols_log2        = picture->log2_tile_columns;
frame.tile_rows_log2        = picture->log2_tile_rows;
frame.frame_context_idx     = picture->pic_fields.bits.frame_context_idx;

frame.lf.level              = picture->filter_level;
frame.lf.sharpness          = picture->sharpness_level;

frame.uncompressed_header_size = picture->frame_header_length_in_bytes;
frame.compressed_header_size   = picture->first_partition_size;

VAAPI fields verified via test-compile against va_dec_vp9.h:58-192 (per memory feedback_review_empirical_over_theoretical.md Direction 2 — Phase 5 will re-verify the field-name access list via gcc -c test compile).

Phase 3 keyframe anchor: width=1280, height=720, profile=0, bit_depth=8, tile_log2={0,0}, level=3, sharpness=0 — direct match.

Clause 5 — DPB timestamp resolution (3 active references from 8-slot DPB)

VAAPI's picture->reference_frames[0..7] is the full 8-entry DPB. The 3 active references for the current frame are indexed by last_ref_frame/golden_ref_frame/alt_ref_frame (each 3-bit, points into the 8-slot array).

VASurfaceID last_id   = picture->reference_frames[picture->pic_fields.bits.last_ref_frame];
VASurfaceID golden_id = picture->reference_frames[picture->pic_fields.bits.golden_ref_frame];
VASurfaceID alt_id    = picture->reference_frames[picture->pic_fields.bits.alt_ref_frame];

struct object_surface *last_ref   = (last_id   != VA_INVALID_SURFACE) ? SURFACE(driver_data, last_id)   : NULL;
struct object_surface *golden_ref = (golden_id != VA_INVALID_SURFACE) ? SURFACE(driver_data, golden_id) : NULL;
struct object_surface *alt_ref    = (alt_id    != VA_INVALID_SURFACE) ? SURFACE(driver_data, alt_id)    : NULL;

if (last_ref)   frame.last_frame_ts   = v4l2_timeval_to_ns(&last_ref->timestamp);
if (golden_ref) frame.golden_frame_ts = v4l2_timeval_to_ns(&golden_ref->timestamp);
if (alt_ref)    frame.alt_frame_ts    = v4l2_timeval_to_ns(&alt_ref->timestamp);

Mirrors iter1/iter3 ref-resolution pattern. For keyframes (all refs invalid), timestamps stay 0 from memset.

Sign bias:

if (picture->pic_fields.bits.last_ref_frame_sign_bias)   frame.ref_frame_sign_bias |= V4L2_VP9_SIGN_BIAS_LAST;
if (picture->pic_fields.bits.golden_ref_frame_sign_bias) frame.ref_frame_sign_bias |= V4L2_VP9_SIGN_BIAS_GOLDEN;
if (picture->pic_fields.bits.alt_ref_frame_sign_bias)    frame.ref_frame_sign_bias |= V4L2_VP9_SIGN_BIAS_ALT;

Clause 6 — Loop filter deltas + base quantization (uncompressed-header partial parse with persistent LF-delta state)

VAAPI exposes filter_level and sharpness_level (Clause 4), but NOT lf_delta_enabled/lf_delta_update/lf_ref_delta[4]/lf_mode_delta[2]. Phase 3 keyframe anchor shows lf.ref_deltas={1,0,-1,-1} (non-zero on BBB); leaving these zero produces wrong loop-filter behavior → criterion-4 byte mismatch.

VAAPI also doesn't expose quant.base_q_idx / delta_q_y_dc / delta_q_uv_dc / delta_q_uv_ac. Phase 3 keyframe anchor shows base_q_idx=46; leaving zero produces wrong dequant scale.

Phase 5 C2 amendment — LF deltas are persistent across frames: Per kernel UAPI <linux/v4l2-controls.h>:2578 verbatim:

"If this syntax element is not present in the bitstream, users should pass its last value."

VP9 spec (matching FFmpeg vp9.c:666-671) initializes lf_delta.ref = {1,0,-1,-1} and lf_delta.mode = {0,0} on keyframe/error-resilient/intra-only, then updates only when lf_delta.updated == 1. FFmpeg v4l2_request_vp9.c:298-302 always copies the persistent state to the kernel control (NOT conditional on updated). Memset-zero per-frame loses the persistent state — wrong on inter frames where typically delta_enabled=1 and delta_update=0.

Solution: add persistent LF-delta state to object_context (per-decode-session scope), initialize to VP9 spec defaults, update only when the parsed bitstream's lf_delta_update is set, always copy to the kernel control:

/* In src/context.h, inside object_context (added by Commit B/C): */
struct {
    int8_t ref_deltas[4];      /* VP9 spec default: {1,0,-1,-1} */
    int8_t mode_deltas[2];     /* VP9 spec default: {0,0} */
    bool initialized;          /* false until first VP9 frame */
} vp9_lf;

/* In Clause 6 fill (vp9.c): */
if (!context->vp9_lf.initialized ||
    keyframe || intra_only || error_resilient) {
    context->vp9_lf.ref_deltas[0] = 1;
    context->vp9_lf.ref_deltas[1] = 0;
    context->vp9_lf.ref_deltas[2] = -1;
    context->vp9_lf.ref_deltas[3] = -1;
    context->vp9_lf.mode_deltas[0] = 0;
    context->vp9_lf.mode_deltas[1] = 0;
    context->vp9_lf.initialized = true;
}

/* Parse uncompressed header — only updates context state when
 * delta_update=1 in the bitstream: */
vp9_parse_uncompressed_header_lf_quant(
    surface_object->source_data,
    surface_object->source_size,
    frame.uncompressed_header_size,
    &frame.lf,                /* receives flags + level + sharpness */
    &frame.quant,             /* receives base_q_idx + deltas */
    context->vp9_lf.ref_deltas,    /* in/out — updated only if parser sees delta_update=1 */
    context->vp9_lf.mode_deltas);  /* same */

/* ALWAYS copy persistent state to the kernel control: */
memcpy(frame.lf.ref_deltas, context->vp9_lf.ref_deltas, 4);
memcpy(frame.lf.mode_deltas, context->vp9_lf.mode_deltas, 2);

The parser (vp9_parse_uncompressed_header_lf_quant) walks frame_marker → profile → show_existing_frame → frame_type → show_frame → error_resilient_mode → frame_sync_code → color_config (profile-dependent) → frame_size → render_and_frame_size_different → refresh_frame_flags → loop_filter_params → quantization_params. Per Phase 5 Q1 reviewer note: ~80100 LOC for a faithful profile-0 keyframe + profile-0 inter-frame parse (revised up from Phase 4's "~50 LOC" estimate).

Anchor: Phase 3 keyframe lf.ref_deltas={1,0,-1,-1}, lf.mode_deltas={0,0}, lf.flags=3 (DELTA_ENABLED|DELTA_UPDATE), quant.base_q_idx=46, deltas=0. Implementation must reproduce these exact values byte-for-byte against the BBB keyframe. After C2 fix, frame 2 (inter) should also show lf.ref_deltas={1,0,-1,-1} from the persisted state — verifiable by strace.

Out of iter4 scope: full uncompressed-header parse (color_config full variant, segmentation update_data, tile_info). Those fields are either available via VAAPI (Clauses 4, 5, 7) or are not write-back to kernel. The parser is a TARGETED partial parse.

Clause 7 — Segmentation mapping

VAAPI conveys segmentation via:

  • picture->pic_fields.bits.{segmentation_enabled, segmentation_temporal_update, segmentation_update_map} flags
  • picture->mb_segment_tree_probs[7] (segment tree probs)
  • picture->segment_pred_probs[3] (temporal-update probs; 255-padded if temporal_update == 0)
  • slice->seg_param[8].{segment_flags.fields, filter_level[4][2], luma_*_quant_scale, chroma_*_quant_scale}

Kernel takes per-segment feature_data + feature_enabled bitmaps. The mapping is non-trivial because VAAPI's slice->seg_param[s] carries EFFECTIVE quant scales (already-computed by VAAPI consumer), while kernel wants the per-segment ALT_Q delta or absolute (depends on ABS_OR_DELTA_UPDATE flag).

for (i = 0; i < 7; i++)
    frame.seg.tree_probs[i] = picture->mb_segment_tree_probs[i];
for (i = 0; i < 3; i++)
    frame.seg.pred_probs[i] = picture->segment_pred_probs[i];

if (picture->pic_fields.bits.segmentation_enabled)
    frame.seg.flags |= V4L2_VP9_SEGMENTATION_FLAG_ENABLED;
if (picture->pic_fields.bits.segmentation_update_map)
    frame.seg.flags |= V4L2_VP9_SEGMENTATION_FLAG_UPDATE_MAP;
if (picture->pic_fields.bits.segmentation_temporal_update)
    frame.seg.flags |= V4L2_VP9_SEGMENTATION_FLAG_TEMPORAL_UPDATE;
/* UPDATE_DATA + ABS_OR_DELTA_UPDATE: not in VAAPI; left zero.
 * For BBB (segmentation disabled), this is correct — flags ignored
 * by kernel when ENABLED is clear. */

/* Per-segment feature_data (only meaningful when ENABLED):
 * VAAPI's seg_param[s].luma_ac_quant_scale[s] is the EFFECTIVE per-
 * segment scale. Kernel wants ALT_Q absolute Q-index OR delta.
 * Recover via VP9 spec inverse-Q-table OR leave zero (BBB safe). */
for (i = 0; i < 8; i++) {
    if (slice->seg_param[i].segment_flags.fields.segment_reference_enabled) {
        frame.seg.feature_enabled[i] |= 1 << V4L2_VP9_SEG_LVL_REF_FRAME;
        frame.seg.feature_data[i][V4L2_VP9_SEG_LVL_REF_FRAME] =
            slice->seg_param[i].segment_flags.fields.segment_reference;
    }
    if (slice->seg_param[i].segment_flags.fields.segment_reference_skipped)
        frame.seg.feature_enabled[i] |= 1 << V4L2_VP9_SEG_LVL_SKIP;
    /* SEG_LVL_ALT_Q + ALT_L: VAAPI doesn't directly expose per-segment
     * abs/delta intent. Phase 5 review point: BBB has segmentation
     * disabled so this code path is dead; non-BBB fixtures are out of
     * iter4 scope (see backlog). */
}

Anchor: Phase 3 keyframe seg = all zeros (BBB segmentation disabled). The Clause 7 logic is exercised only for inter frames with segmentation_enabled — out of iter4 BBB scope. Document as fidelity gap.

Clause 8 — VPX range coder + inv_map_table (for compressed header parse)

Direct port from FFmpeg v4l2_request_vp9.c:42-97:

  • inv_map_table[255] — copy verbatim
  • vpx_rac_init(c, buf, size) — initialize range coder over the compressed-header bytes
  • vp89_rac_get(c) — read a single bit
  • vp89_rac_get_uint(c, n) — read n bits MSB-first
  • vpx_rac_get_prob_branchy(c, prob) — read with given probability
  • read_prob_delta(c) — the 4-tier VLC + inv_map_table lookup used to update one prob

~80 LOC, all stateless static functions. Implementation can be either inlined in vp9.c (Phase 2 B6 Option A — chosen) or split to vp9_rac.h. Phase 2 default = Option A; Phase 5 may flip to Option B if reuse pressure surfaces.

Clause 9 — Compressed-header parser (vp9_fill_compressed_hdr)

Direct port of FFmpeg v4l2_request_vp9.c:99-261::fill_compressed_hdr. Reads from surface_object->source_data + uncompressed_header_size for compressed_header_size bytes. ~180 LOC.

Syntax elements parsed (per VP9 spec 6.3):

  • tx_mode (2 bits, +1 conditional bit when SELECT)
  • TX 8x8/16x16/32x32 probability deltas (only if tx_mode == SELECT)
  • Coef probability deltas (4-level nested loop with branch probs)
  • Skip / inter_mode / interp_filter / is_inter / comp_mode / single_ref / comp_ref / y_mode / partition probability deltas (only on inter frames)
  • MV probability deltas (joint/sign/classes/class0_bit/bits/class0_fr/fr/class0_hp/hp)

Each updated value goes through inv_map_table[d]. Each "no update" bit leaves zero in the kernel struct (kernel interprets zero as "keep prior probability").

Phase 5 C3 amendment — reference_mode out-parameter: The kernel's reference_mode field lives in v4l2_ctrl_vp9_frame (<linux/v4l2-controls.h>:2763), NOT in v4l2_ctrl_vp9_compressed_hdr. FFmpeg's parser reads it from a local enum CompPredMode comppredmode during compressed-header parsing (v4l2_request_vp9.c:104,150-160) and assigns to FRAME's reference_mode separately. Our backend must thread the parsed value out via an explicit out-parameter — without it, Clause 10's frame.reference_mode = compressed_hdr_reference_mode references an undefined variable → compile failure.

Function signature (Phase 5 C3-amended):

static void vp9_fill_compressed_hdr(
    struct v4l2_ctrl_vp9_compressed_hdr *compressed_hdr,
    const uint8_t *buf, uint32_t size,
    uint8_t lossless_flag,
    bool keyframe_or_intraonly,
    bool allowcompinter,
    uint8_t *out_reference_mode);   /* C3: NEW out-param */

allowcompinter is true when the sign biases of the 3 active references are not all identical (per VP9 spec 6.4.4). Derive at the call site from picture->pic_fields.bits.{last,golden,alt}_ref_frame_sign_bias:

bool allowcompinter = !(
    picture->pic_fields.bits.last_ref_frame_sign_bias ==
    picture->pic_fields.bits.golden_ref_frame_sign_bias &&
    picture->pic_fields.bits.golden_ref_frame_sign_bias ==
    picture->pic_fields.bits.alt_ref_frame_sign_bias);

For keyframes (!keyframe_or_intraonly == false): write *out_reference_mode = 0 (PRED_SINGLEREF) unconditionally. For inter frames without compinter: same. With compinter: extract from the compressed-header range coder per FFmpeg v4l2_request_vp9.c:150-160.

Lossless special case: if picture->pic_fields.bits.lossless_flag is set, the parser writes compressed_hdr->tx_mode = V4L2_VP9_TX_MODE_ONLY_4X4 unconditionally per FFmpeg v4l2_request_vp9.c:117-118. VAAPI's lossless_flag semantics align with FFmpeg's s->s.h.lossless per Phase 5 S5 verification (both encode base_q==0 && y_dc_delta_q==0 && uv_dc_delta_q==0 && uv_ac_delta_q==0).

uv_mode memcpy omission (Phase 5 S4): FFmpeg's fill_compressed_hdr ends with memcpy(ctrl->uv_mode, s->prob.p.uv_mode, sizeof(ctrl->uv_mode)). rkvdec's rkvdec-vp9.c:280-287 reads uv_mode from vp9_ctx->probability_tables (the persistent frame_context table inside the kernel), NOT from the compressed_hdr control. Omitting the memcpy is safe for rkvdec; document and skip.

Anchor: Phase 3 strace shows COMPRESSED_HDR payload size 2040 B; kernel never EINVAL'd → port produces correctly-sized struct. Field-level decode of the keyframe payload is deferred to Phase 7 byte-compare (the parser is the primary reference for itself; cross-validation is via "kernel decodes the same hash both ways" not "we manually decode the parser output").

Clause 10 — Frame flags + reference_mode + interpolation_filter

if (!picture->pic_fields.bits.frame_type)        /* VAAPI inverts: 0 means keyframe */
    frame.flags |= V4L2_VP9_FRAME_FLAG_KEY_FRAME;
if (picture->pic_fields.bits.show_frame)         frame.flags |= V4L2_VP9_FRAME_FLAG_SHOW_FRAME;
if (picture->pic_fields.bits.error_resilient_mode) frame.flags |= V4L2_VP9_FRAME_FLAG_ERROR_RESILIENT;
if (picture->pic_fields.bits.intra_only)         frame.flags |= V4L2_VP9_FRAME_FLAG_INTRA_ONLY;
if (picture->pic_fields.bits.allow_high_precision_mv)
                                                 frame.flags |= V4L2_VP9_FRAME_FLAG_ALLOW_HIGH_PREC_MV;
if (picture->pic_fields.bits.refresh_frame_context)
                                                 frame.flags |= V4L2_VP9_FRAME_FLAG_REFRESH_FRAME_CTX;
if (picture->pic_fields.bits.frame_parallel_decoding_mode)
                                                 frame.flags |= V4L2_VP9_FRAME_FLAG_PARALLEL_DEC_MODE;
if (picture->pic_fields.bits.subsampling_x)      frame.flags |= V4L2_VP9_FRAME_FLAG_X_SUBSAMPLING;
if (picture->pic_fields.bits.subsampling_y)      frame.flags |= V4L2_VP9_FRAME_FLAG_Y_SUBSAMPLING;
/* COLOR_RANGE_FULL_SWING: VAAPI doesn't expose; leave clear (BT.709 limited for BBB). */

/* reset_frame_context: FFmpeg uses (resetctx > 0 ? resetctx - 1 : 0).
 * VAAPI's pic_fields.bits.reset_frame_context is 2 bits (0..3).
 * V4L2 enum is 0..2. The off-by-one is because VP9 spec encodes
 * "no reset" + 3 reset variants into 2 bits, but kernel enum drops
 * the encoder helper offset. Follow FFmpeg's mapping verbatim: */
frame.reset_frame_context =
    picture->pic_fields.bits.reset_frame_context > 0
    ? picture->pic_fields.bits.reset_frame_context - 1
    : 0;

/* Phase 5 C1: VAAPI's mcomp_filter_type is ALREADY post-XOR.
 * vaapi_vp9.c:62 applies `h->h.filtermode ^ (h->h.filtermode <= 1)`
 * before storing into VAAPI's mcomp_filter_type. Re-applying the XOR
 * here would undo the alignment and swap EIGHTTAP/EIGHTTAP_SMOOTH for
 * inter frames — wrong loop-filter selection on BBB frames 2-5.
 * Direct assignment is correct. */
frame.interpolation_filter = picture->pic_fields.bits.mcomp_filter_type;

/* Phase 5 C3: reference_mode is in v4l2_ctrl_vp9_frame (offset 160),
 * NOT in v4l2_ctrl_vp9_compressed_hdr. Filled by Clause 9's parser
 * via the out_reference_mode parameter. */
uint8_t parsed_reference_mode = 0;  /* default for keyframe */
vp9_fill_compressed_hdr(&compressed_hdr,
                        surface_object->source_data + frame.uncompressed_header_size,
                        frame.compressed_header_size,
                        picture->pic_fields.bits.lossless_flag,
                        !picture->pic_fields.bits.frame_type ||
                            picture->pic_fields.bits.intra_only,
                        allowcompinter,
                        &parsed_reference_mode);
frame.reference_mode = parsed_reference_mode;

Anchor: Phase 3 verbatim — keyframe reset_frame_context=0, interpolation_filter=0 (VAAPI's mcomp_filter_type=0 XOR with (0 <= 1)=1 → 1 hmm). Phase 5 must verify the XOR remap empirically against the keyframe bytes.

Phase 5 review point: the FFmpeg-inferred mappings for reset_frame_context and interpolation_filter are tied to FFmpeg's internal enum order. VAAPI's enum order may differ. Phase 5 should empirically validate by decoding Phase 3's keyframe payload byte 144 (offset of reset_frame_context) and byte 149 (offset of interpolation_filter) and cross-checking with VAAPI's pic_fields.bits for the same frame. If they disagree, the FFmpeg-inferred remap is wrong.

Clause 11 — Final 2-control batched submission

struct v4l2_ext_control ctrls[2] = {
    { .id = V4L2_CID_STATELESS_VP9_FRAME,
      .ptr = &frame, .size = sizeof frame },
    { .id = V4L2_CID_STATELESS_VP9_COMPRESSED_HDR,
      .ptr = &compressed_hdr, .size = sizeof compressed_hdr },
};

rc = v4l2_set_controls(driver_data->video_fd,
                       surface_object->request_fd,
                       ctrls, 2);
if (rc < 0)
    return VA_STATUS_ERROR_OPERATION_FAILED;
return 0;

Mirrors iter3's Clause 10 with count=2 instead of count=1.

Clause 12 — Bitstream offsetting

Backend hands the kernel the FULL frame bitstream via surface_object->source_data + surface_object->source_size. The kernel uses picture->frame_header_length_in_bytes as the start-of-compressed-header offset. The compressed header parser (Clause 9) reads [uncompressed_header_size, uncompressed_header_size + compressed_header_size) from the bitstream buffer.

const uint8_t *compressed_hdr_start =
    surface_object->source_data + frame.uncompressed_header_size;
uint32_t compressed_hdr_len = frame.compressed_header_size;

vp9_fill_compressed_hdr(&compressed_hdr,
                        compressed_hdr_start,
                        compressed_hdr_len);

/* Same buffer pointer used by Clause 6 for uncompressed-header parse,
 * but with offset 0 + length = uncompressed_header_size. */
vp9_parse_uncompressed_header_lf_quant(
    surface_object->source_data, surface_object->source_size,
    frame.uncompressed_header_size,
    &frame.lf, &frame.quant);

Patch shape (commits)

iter4 implements as 5 commits (mitigation B + iter3-style ABCD):

Commit Z — src/request.c: device-path enumeration (mitigation B)

Replace hardcoded /dev/video0 + /dev/media0 defaults with walk-and-pick-first-known-decoder:

static int find_codec_device(char video_path[32], char media_path[32])
{
    static const char * const known_drivers[] = {
        "rkvdec", "hantro-vpu", "cedrus", "sun4i_csi", NULL
    };
    char path[32];
    struct v4l2_capability caps;
    int fd, i;
    const char * const *kd;

    /* Walk /dev/video0..15 */
    for (i = 0; i < 16; i++) {
        snprintf(path, sizeof path, "/dev/video%d", i);
        fd = open(path, O_RDWR | O_NONBLOCK);
        if (fd < 0) continue;
        if (ioctl(fd, VIDIOC_QUERYCAP, &caps) == 0) {
            for (kd = known_drivers; *kd; kd++) {
                if (strcmp((char *)caps.driver, *kd) == 0) {
                    strncpy(video_path, path, 32);
                    /* Match media device by driver name */
                    find_media_for_driver((char *)caps.driver, media_path);
                    close(fd);
                    return 0;
                }
            }
        }
        close(fd);
    }
    return -1;
}

/* In RequestInit (Phase 5 C4 amendment — escape hatch for legacy callers): */
if (getenv("LIBVA_V4L2_REQUEST_NO_AUTODETECT")) {
    /* Skip walk; old hardcoded behavior. Lets test automation that
     * relied on /dev/video0 opt out of the new auto-detect logic. */
    video_path = "/dev/video0";
    media_path = "/dev/media0";
} else {
    video_path = getenv("LIBVA_V4L2_REQUEST_VIDEO_PATH");
    if (video_path == NULL) {
        static char auto_video[32], auto_media[32];
        if (find_codec_device(auto_video, auto_media) == 0) {
            video_path = auto_video;
            if (getenv("LIBVA_V4L2_REQUEST_MEDIA_PATH") == NULL)
                media_path = auto_media;
            request_log("auto-selected codec device: %s + %s\n",
                        video_path, media_path);
        } else {
            video_path = "/dev/video0";  /* fall back to legacy default */
        }
    }
}

find_media_for_driver walks /dev/media0..15, opens each, calls MEDIA_IOC_DEVICE_INFO, returns the path whose driver field matches. Phase 3 baseline confirmed media0 ↔ rkvdec and media1 ↔ hantro-vpu on 7.0.

Predicted +35 LOC, 1 file modified. Build target after Commit Z: vainfo (no env override) lists the auto-selected decoder's profiles. Independent of VP9 work — can be tested + merged before Commit A.

End-user UX gap (documented, NOT fixed in iter4): backend opens ONE codec device at init. If user wants the OTHER decoder (e.g., default selects rkvdec but user wants hantro for MPEG-2/VP8), they still need env override. Aggregating BOTH decoders simultaneously requires a deeper refactor (multi-fd dispatch); out of iter4 scope, cross-cutting backlog item iter4-B1.

Commit A — src/config.c: VP9 enumeration + dispatch + entrypoints

3 sites mirroring iter3 commit A:

  1. RequestQueryConfigProfiles (after VP8 enumeration block from iter3): add VP9 enumeration block probing V4L2_PIX_FMT_VP9_FRAME against single + MPLANE OUTPUT formats. Adds VAProfileVP9Profile0. ~10 LOC.
  2. RequestCreateConfig (after VP8 case from iter3): add case VAProfileVP9Profile0: break; with comment block. ~5 LOC.
  3. RequestQueryConfigEntrypoints (line ~180): add case VAProfileVP9Profile0: to existing fall-through. ~1 LOC.

Predicted +16 LOC, 1 file modified. Build target after Commit A: vainfo (with env override or post-commit-Z auto-detect) lists VAProfileVP9Profile0 on rkvdec env.

Commit B — NEW src/vp9.c + src/vp9.h + src/meson.build integration

Net-new vp9.c implements vp9_set_controls() per Clauses 1-12 above.

Predicted ~580 LOC for vp9.c (50 LOC infrastructure + 80 LOC VPX rac + 50 LOC uncompressed-header partial parse + 180 LOC compressed-header parser + 50 LOC frame-fill (Clauses 4-5,7,10) + 30 LOC of submission/wrap). ~40 LOC for vp9.h. +2 lines meson.build.

3 files (2 new + 1 modified). Build target after Commit B: vp9.o compiles standalone, picture.c can't dispatch yet.

Commit C — src/picture.c + src/surface.h: dispatcher + buffer routing + union extension

5 sites:

  1. picture.c:34-37 include block: add #include "vp9.h".
  2. picture.c::codec_set_controls: add VP9 dispatch case calling vp9_set_controls. ~6 LOC.
  3. picture.c::codec_store_buffer: add VP9 inner cases for VAPictureParameterBufferType and VASliceParameterBufferType. ~14 LOC. (NO VAProbabilityBufferType for VP9; NO VAIQMatrixBufferType. Confirmed in Phase 2 B8.)
  4. picture.c::RequestBeginPicture: NO change predicted (VP9 doesn't have iter3-style iqmatrix_set flag — Picture/Slice always populated per frame by VAAPI consumer). Phase 2 B9 confirms.
  5. surface.h::object_surface::params union: add vp9 struct after vp8:
struct {
    VADecPictureParameterBufferVP9 picture;
    VASliceParameterBufferVP9 slice;
} vp9;

Predicted +26 LOC, 2 files modified. Build target after Commit C: backend builds clean; mpv-vaapi VP9 decode should engage end-to-end on rkvdec.

Commit D — fix-forward placeholder

Phase 2 B12 predicted no buffer.c changes (VP9's 3 buffer types — Picture, Slice, Data — already in iter3's allow-list). Per memory feedback_runtime_enumerates_allowlists.md, plan for fix-forward if Commit C runtime hits an allow-list miss; otherwise this commit slot stays empty.

Files touched summary

File New Modified LOC delta Commit
src/request.c +35 Z
src/config.c +16 A
src/vp9.c +580 B
src/vp9.h +40 B
src/meson.build +2 B
src/picture.c +20 C
src/surface.h +6 C

Total: ~699 LOC, 7 files (2 new + 5 modified). 4 commits (Z, A, B, C) + optional D. Notably bigger than iter3 (308 LOC) because of: device-path mitigation (35) + uncompressed-header partial parse (50) + compressed-header parser (180) + VPX rac (80).

Cross-cutting backlog (out of iter4 scope)

Items inherited + NEW from iter4:

  • iter4-B1 (NEW) Backend opens ONE codec device at init (rkvdec OR hantro). Aggregating both for unified profile enumeration requires multi-fd dispatch refactor. Defer.
  • iter4-B2 (NEW) ffmpeg-vaapi / mpv-vaapi Could not create device failure mode persists even with env override. Likely a vaapi-DRM render-node path issue separate from device-path. Investigate in Phase 6 if HW=SW byte-compare fails.
  • iter4-Q6 (NEW) VAAPI per-segment seg_param[s] fields are EFFECTIVE quant scales; kernel wants ALT_Q absolute or delta. Mapping back is non-trivial; left zeros for BBB (segmentation disabled). Document as fidelity gap for non-BBB fixtures.
  • iter4-COLOR_RANGE (NEW) VAAPI doesn't expose color_range; backend leaves V4L2_VP9_FRAME_FLAG_COLOR_RANGE_FULL_SWING clear (BT.709 limited). Wrong for full-range JPEG-encoded VP9.
  • B5/B6 mpeg2 vbv polish + h265 SPS bitstream parse (carried from iter1+iter2).
  • L3 vaDeriveImage cache-stale on RK3399 — workaround: DMA-BUF GL only.

Phase 5 review prep

Submitting this plan for second-model review (sonnet-architect). Key questions for the reviewer (per memory feedback_review_empirical_over_theoretical.md Direction 2 — empirical-over-theoretical in BOTH directions):

  1. Uncompressed-header parser correctness (Clause 6): empirically decode the first ~200 bytes of bbb_720p10s_vp9.webm keyframe and confirm lf.ref_deltas={1,0,-1,-1}, lf.mode_deltas={0,0}, lf.flags=3, quant.base_q_idx=46 are the correct parse results — not just the kernel-direct's pre-formatted output. If the spec says the bits encode something different, the parser is wrong even if kernel-direct happens to match.

  2. reset_frame_context and interpolation_filter remap (Clause 10): empirically extract these bytes from Phase 3 strace payload and cross-check FFmpeg's XOR/-1 remap against the bytes' literal interpretation as VP9 spec enums.

  3. Compile-time size assertions (Clause 3): are 168/2040 stable across kernel UAPI versions, or will a 7.1+ kernel grow them again? If unstable, replace with a runtime size assertion via VIDIOC_QUERY_EXT_CTRL + flags & V4L2_CTRL_FLAG_DYNAMIC_ARRAY. Phase 5 reviewer call.

  4. Per-segment mapping (Clause 7): BBB doesn't exercise segmentation. For non-BBB segmentation-enabled fixtures (out of iter4 scope), is the planned seg_param[s].luma_ac_quant_scalefeature_data[s][ALT_Q] mapping fundamentally wrong (effective scale vs delta), or just lossy? Document the gap clearly.

  5. Test compile field availability: per Direction 2, every VAAPI field-name reference in this plan should be gcc -c test-compiled before Phase 6. Reviewer should verify the access list in Clauses 4, 5, 7, 10.

  6. Mitigation B regression risk: request.c is shared with all 5 already-shipping codecs. Could the walk-and-pick-first logic regress any existing test fixture if env vars happen to be unset by accident? Phase 5 should suggest a safety knob (e.g., LIBVA_V4L2_REQUEST_NO_AUTODETECT=1 to force old /dev/video0 behavior).

  7. Lossless flag mapping (Clause 9): VAAPI's pic_fields.bits.lossless_flag — is it set the same way as FFmpeg's s->s.h.lossless? VAAPI comment says "LosslessFlag = base_qindex == 0 && y_dc_delta_q == 0 && uv_dc_delta_q == 0 && uv_ac_delta_q == 0" — check that semantics align.

Phase 1 criteria → Phase 4 plan trace

Criterion Plan addresses
1. vainfo enumerates VP9Profile0 Commit Z (device-path) + Commit A (RequestQueryConfigProfiles enumeration block)
2. vaCreateConfig SUCCESS Commit A — RequestCreateConfig case + RequestQueryConfigEntrypoints
3. ffmpeg-vaapi VP9 exit 0 Commits Z+A+B+C end-to-end; Clauses 1+4+5+11 + parsers
4. mpv VP9 HW=SW byte-identical Commits Z+A+B+C decode correctness + Phase 3 SW PNGs as Phase 7 anchor; engagement via mpv -v log per memory feedback_hw_decode_engagement_check.md
5. 4-codec regression Commit Z restores rkvdec-codec baseline without env override (H.264, HEVC). MPEG-2, VP8 (hantro) still require LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 env override per Phase 5 C4. Commits A+B+C add new VP9 path purely additively (no shared-state mutation in MPEG-2/HEVC/H.264/VP8 paths).

Substrate state at Phase 4 close

  • Phase 0+1+2+3 commits at gitea (9a71dbf, 2651e4c+56abe3d ID-correction, 56abe3d).
  • Fork at iter3 tip e1aca9c on noether; Phase 6 patches will land here.
  • All Phase 3 anchors captured + preserved on fresnel /tmp/iter4_phase3/ and noether:~/src/fresnel-fourier/iter4_phase3.tgz.
  • Memory rules carry forward; new reference_fresnel_kernel_substrate.md covers post-besser substrate.
  • Phase 4 plan ready for sonnet-architect review (Phase 5).

Phase 5 amendments incorporated (post-review)

Sonnet-architect review (phase5_iter4_review.md) returned 4 Critical findings + 6 Suggested. All 4 Critical findings empirically verified by author before incorporation per memory feedback_review_empirical_over_theoretical.md Direction 2. Amendments applied in-place above:

ID Site Change Verified
C1 Clause 10 frame.interpolation_filter = picture->pic_fields.bits.mcomp_filter_type (direct, NO XOR; vaapi_vp9.c:62 already applies the XOR before storing into VAAPI's mcomp_filter_type) ✓ grep vaapi_vp9.c:62
C2 Clause 6 Add persistent vp9_lf.{ref_deltas[4], mode_deltas[2]} to object_context, init to VP9 spec {1,0,-1,-1,0,0}, update only when parser sees lf_delta.update=1, ALWAYS copy to kernel control. Without persistence, BBB inter frames send {0,0,0,0,0,0} instead of {1,0,-1,-1,0,0} → kernel applies wrong loop-filter strength ✓ kernel UAPI v4l2-controls.h:2578 "users should pass its last value"; FFmpeg vp9.c:666-671 defaults
C3 Clauses 9 + 10 Add out_reference_mode parameter to vp9_fill_compressed_hdr. Without it, Clause 10's frame.reference_mode = compressed_hdr_reference_mode is undefined identifier (compile failure). reference_mode lives in v4l2_ctrl_vp9_frame, not _compressed_hdr; FFmpeg uses local enum CompPredMode comppredmode and threads it through return path v4l2-controls.h:2763 (FRAME) + v4l2_request_vp9.c:104 (local var)
C4 Mitigation B + criterion-5 trace + request.c skeleton (a) Qualify wording: "Restores rkvdec-codec baseline" (NOT all 4 codecs); MPEG-2/VP8 still need env override for hantro path. (b) Add LIBVA_V4L2_REQUEST_NO_AUTODETECT=1 escape hatch for legacy callers ✓ Plan line 12 vs 393 contradiction; Phase 3 baseline § 4-codec regression table

S1, S2, S3, S5, S6 either confirm correctness of existing plan or are scope-aligned non-issues; no amendment needed. S4 (uv_mode memcpy omission) baked into Clause 9's amended text.

Without the Phase 5 review, iter4 Phase 6 would have:

  • Failed first compile attempt (C3) — undefined compressed_hdr_reference_mode identifier
  • Then produced visible-but-wrong output (C1 + C2) — Phase 7 byte-compare would have diverged on inter frames (interpolation_filter swapped, ref_deltas zeroed)
  • Then triggered user confusion when MPEG-2/VP8 didn't engage (C4) — false expectation that vainfo would just work

Estimated saving: 1 Phase 6 compile failure + 1 Phase 7 → Phase 4 loopback for inter-frame byte mismatch + 1 documentation correction. Per memory rule "Reviews are never skippable."