fresnel-fourier/phase2_iter4_situation.md
marfrit 56abe3d6a2 iter4 Phase 3: VP9 baseline + 4-codec regression on 7.0 substrate
Captured on linux-fresnel-fourier 7.0-1 (post 6.19 decommission).

VP9 baseline (kernel-direct via ffmpeg-v4l2request on rkvdec):
- 5-frame SW reference PNG SHA256 anchors (criterion-4)
- VIDIOC_S_EXT_CTRLS strace with full payload at -s 16384
- Empirical struct sizes 168 B (FRAME) / 2040 B (COMPRESSED_HDR)
  supersede Phase 2 estimates of 144 / 1947
- Probe pattern: count=1 (FRAME-only) then count=2 (FRAME + COMPRESSED_HDR)

Phase 2 doc fix: control IDs corrected 0xa40b2c/d -> 0xa40a2c/d.

4-codec regression (H.264, MPEG-2, HEVC, VP8): all fall back to SW on
default config because /dev/video0 is now rockchip-rga (RGB color
converter), not a codec device. Fork hardcodes /dev/video0 in
request.c:149. Env override LIBVA_V4L2_REQUEST_VIDEO_PATH /
_MEDIA_PATH restores per-driver profile enumeration; mitigation A/B/C
queued for user decision.

New contract clauses surfaced:
- Clause 11: uncompressed-header partial parse for lf_delta /
  base_q_idx (VAAPI doesn't expose these; keyframe ref_deltas non-zero
  for BBB so leave-at-zero is wrong)
- Clause 12: compile-time sizeof asserts on the two control structs
  so future UAPI shifts fail loudly

iter4_phase3.tgz: full Phase 3 artifact bundle (strace + PNG refs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 20:31:53 +00:00


Iteration 4 — Phase 2 (situation analysis)

Source-read of every file the iter4 patch series will touch, plus the kernel UAPI + VAAPI + downstream FFmpeg + kernel rkvdec reference sources. Conducted on noether against fork tip e1aca9c (iter3 close).

This is a contract-before-code analysis per feedback_dev_process.md Phase 2: enumerate the bugs, cite the contract verbatim, predict the patch shape, queue the Phase 3 baseline questions.

Critical finding: rkvdec requires VP9_COMPRESSED_HDR

The biggest scope-shaping discovery: on rkvdec (RK3399), V4L2_CID_STATELESS_VP9_COMPRESSED_HDR is mandatory, not optional. From drivers/staging/media/rkvdec/rkvdec-vp9.c::rkvdec_vp9_run_preamble, lines 740-754:

ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_VP9_FRAME);
if (WARN_ON(!ctrl))
    return -EINVAL;
dec_params = ctrl->p_cur.p;
...
ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_VP9_COMPRESSED_HDR);
if (WARN_ON(!ctrl))
    return -EINVAL;       /* ← rkvdec WILL fail without compressed-header probs */
prob_updates = ctrl->p_cur.p;
vp9_ctx->cur.tx_mode = prob_updates->tx_mode;
...
v4l2_vp9_fw_update_probs(&vp9_ctx->probability_tables, prob_updates, dec_params);

VAAPI does NOT expose compressed-header probability updates (per va_dec_vp9.h:50-192 — only frame parameters + segmentation, no probability deltas; vendor VAAPI drivers parse compressed header in firmware/GPU). So the libva backend must parse the compressed header itself via a VPX boolean decoder.

This makes iter4's scope significantly larger than iter3's (VP8).

Bug enumeration (sites the iter4 patch series must touch)

B1 — src/config.c::RequestQueryConfigProfiles — VP9 enumeration block missing

Site: config.c:121-160.

Bug: no analogous block for V4L2_PIX_FMT_VP9_FRAME → VAProfileVP9Profile0. Same starting condition as iter3 VP8.

Patch shape: ADD enumeration block after iter3's VP8 block. ~10 LOC.

B2 — src/config.c::RequestCreateConfig — VP9 case label missing

Site: config.c:54-78.

Bug: no case VAProfileVP9Profile0:. Mirror iter3 VP8 pattern. ~5 LOC.

B3 — src/config.c::RequestQueryConfigEntrypoints — VP9 case missing

Site: config.c:167-191.

Bug: missing in fall-through case list. ~1 LOC.

B4 — src/vp9.c — file does not exist; needs net-new implementation

Site: NEW FILE src/vp9.c.

Patch shape: NEW file, ~500-600 LOC (substantially larger than iter3 vp8.c due to compressed-header parser):

  • Includes block
  • Static inv_map_table[255] — direct copy from FFmpeg v4l2_request_vp9.c:43-64
  • VPX range coder helpers (port from FFmpeg vp89_rac.h + boolean decoder primitives) — ~80 LOC
  • vp9_fill_frame() — fill v4l2_ctrl_vp9_frame from VAAPI VADecPictureParameterBufferVP9 + VASliceParameterBufferVP9 — ~150 LOC
  • vp9_fill_compressed_hdr() — parse compressed header bits from surface_object->source_data + uncompressed_header_size, populate v4l2_ctrl_vp9_compressed_hdr — ~180 LOC (port from FFmpeg fill_compressed_hdr lines 99-261)
  • vp9_set_controls() — entry point, allocates both structs, calls vp9_fill_frame + vp9_fill_compressed_hdr, batched 2-element v4l2_ext_control array, single v4l2_set_controls call

B5 — src/vp9.h — header does not exist

Site: NEW FILE src/vp9.h.

Patch shape: declare vp9_set_controls(). Mirror iter3 vp8.h.

B6 — Possibly src/vp9_rac.h — VPX range decoder helpers (decision point)

Site: NEW FILE candidate src/vp9_rac.h.

VP9 boolean decoder primitives (vpx_rac_get_prob_branchy, vp89_rac_get, vp89_rac_get_uint, init function) are needed by vp9_fill_compressed_hdr. Two design options:

  • Option A: inline the ~80 LOC of decoder helpers directly in vp9.c. Simpler; one file. Recommended for first cut.
  • Option B: separate vp9_rac.h/vp9_rac.c. Mirrors FFmpeg's vp89_rac.h upstream pattern. More files, easier reuse if AV1/VP10 work follows.

Phase 4 plan locks Option A unless Phase 5 review surfaces a reason for Option B.
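
For a sense of scale, the decoder primitives under Option A could look roughly like the sketch below. It is written from the VP9 specification's boolean-decoding process and is not a port of FFmpeg's vp89_rac.h. The names vp9_bool_dec, vp9_bool_init, vp9_bool_read and vp9_bool_literal are hypothetical, and the marker-bit check that the specification performs right after initialisation is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; a bit-at-a-time decoder, not optimised. */
struct vp9_bool_dec {
    const uint8_t *buf;
    size_t size;     /* bytes available in buf */
    size_t bitpos;   /* index of the next unread bit, MSB first */
    uint32_t value;
    uint32_t range;
};

/* Fetch the next raw bit; reads past the end of the buffer yield zero. */
static unsigned vp9_next_bit(struct vp9_bool_dec *d)
{
    unsigned bit = 0;

    if (d->bitpos < d->size * 8) {
        bit = (d->buf[d->bitpos >> 3] >> (7 - (d->bitpos & 7))) & 1u;
        d->bitpos++;
    }
    return bit;
}

/* Prime the decoder with the first byte. Returns 0 on success, -1 if empty. */
static int vp9_bool_init(struct vp9_bool_dec *d, const uint8_t *buf, size_t size)
{
    assert(d != NULL);
    if (buf == NULL || size < 1)
        return -1;
    d->buf = buf;
    d->size = size;
    d->bitpos = 8;
    d->value = buf[0];
    d->range = 255;
    return 0;
}

/* Decode one boolean whose probability of being 0 is prob/256. */
static unsigned vp9_bool_read(struct vp9_bool_dec *d, unsigned prob)
{
    uint32_t split = 1 + (((d->range - 1) * prob) >> 8);
    unsigned bit;

    if (d->value < split) {
        d->range = split;
        bit = 0;
    } else {
        d->range -= split;
        d->value -= split;
        bit = 1;
    }
    /* Renormalise so that range stays in [128, 255]. */
    while (d->range < 128) {
        d->range <<= 1;
        d->value = (d->value << 1) | vp9_next_bit(d);
    }
    return bit;
}

/* Read an n-bit unsigned literal, MSB first, at probability 128. */
static unsigned vp9_bool_literal(struct vp9_bool_dec *d, unsigned n)
{
    unsigned v = 0;

    while (n--)
        v = (v << 1) | vp9_bool_read(d, 128);
    return v;
}
```

A caller would run vp9_bool_init over the compressed-header bytes, read one bit at probability 128 as the marker, and reject the frame if that bit is non-zero.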

B7 — src/picture.c::codec_set_controls — VP9 dispatch case missing

Site: picture.c:188-225.

Patch shape: ADD case VAProfileVP9Profile0: calling vp9_set_controls. ~6 LOC.

B8 — src/picture.c::codec_store_buffer — 2 VAAPI buffer types unmapped

VAAPI VP9 sends only two parameter buffer types per frame, plus the raw slice data (per va_dec_vp9.h:58-303):

| VAAPI buffer type | VAAPI struct | Per-frame |
|---|---|---|
| VAPictureParameterBufferType | VADecPictureParameterBufferVP9 | once |
| VASliceParameterBufferType | VASliceParameterBufferVP9 (with seg_param[8]) | once |
| VASliceDataBufferType | raw bitstream | once |

Different from iter3 VP8: no VAProbabilityBufferType (VP9 keeps probability state in the picture/slice params + parsed compressed header), no VAIQMatrixBufferType (VP9 keeps quantization in the slice's per-segment seg_param array). Only the 2 parameter buffer types need new cases, vs VP8's 4.

Patch shape: 2 nested case adds in codec_store_buffer outer switch + inner profile dispatch. ~14 LOC total.

B9 — src/picture.c::RequestBeginPicture — per-frame VP9 reset

Site: picture.c:299-302.

Bug: VP9 doesn't have an iqmatrix_set / probability_set flag pattern; the picture/slice params are unconditionally fully-populated by VAAPI consumer per frame. Possibly NO reset needed (analogous to MPEG-2's iqmatrix-only pattern but even simpler).

Patch shape: likely no edit. If Phase 5 review reveals a hidden state-leak risk (e.g., VAAPI reusing the surface for a new context with stale params), add reset for params.vp9.<some-flag>. Default plan: no reset added; revisit if Phase 7 byte-compare shows stale state.

B10 — src/surface.h::object_surface::params union — no vp9 member

Site: surface.h:92-119.

Patch shape: ADD vp9 struct after vp8:

struct {
    VADecPictureParameterBufferVP9 picture;
    VASliceParameterBufferVP9 slice;
} vp9;

VASliceParameterBufferVP9 is large (~340 bytes — seg_param[8] × ~40 bytes each); VADecPictureParameterBufferVP9 ~80 bytes. Union grows by ~420 bytes from this; still dominated by params.h265 with its 64-slot slices[64] array (~17 KB).

B11 — src/meson.build: vp9.c + vp9.h not in lists

Site: meson.build:30-74.

Patch shape: insert 'vp9.c' after 'vp8.c' in sources, insert 'vp9.h' after 'vp8.h' in headers. +2 lines.

B12 — src/buffer.c — buffer-type allow-list (predicted no change needed)

Site: buffer.c:59-70.

VP9 uses VAPictureParameterBufferType + VASliceParameterBufferType + VASliceDataBufferType — all three already in the allow-list (used by H.264 + iter3 VP8). Predicted no change needed.

Per memory feedback_runtime_enumerates_allowlists.md: plan for fix-forward Commit D if a runtime miss surfaces (would be unexpected for VP9 given the buffer types are H.264-shape; but the iter3 lesson is "don't audit exhaustively — let runtime enumerate").

Non-bugs (intentionally NOT touched)

  • src/context.c — no DECODE_MODE/START_CODE menus for VP9 (per FFmpeg V4L2 ref v4l2_request_vp9.c:487-503: v4l2_request_vp9_init doesn't issue any device-wide menu sets; per-frame batch only). No context.c changes.
  • src/video.c::formats[] — CAPTURE-side format list (NV12); VP9 is OUTPUT-side fourcc, probed via v4l2_find_format() in config.c. No video.c changes.
  • src/v4l2.c — fourcc-agnostic helpers. No v4l2.c changes.
  • include/hevc-ctrls.h — already includes <linux/v4l2-controls.h> which holds VP9 control IDs.

Contract surface (verbatim)

Kernel UAPI: V4L2_CID_STATELESS_VP9_FRAME (<linux/v4l2-controls.h>:2696)

#define V4L2_CID_STATELESS_VP9_FRAME        (V4L2_CID_CODEC_STATELESS_BASE + 300)
                                            /* = 0xa40a2c */

struct v4l2_ctrl_vp9_frame {
    struct v4l2_vp9_loop_filter lf;        /* 16 bytes; ref_deltas[4] + mode_deltas[2]
                                              + level + sharpness + flags + reserved[7] */
    struct v4l2_vp9_quantization quant;    /* 8 bytes; base_q_idx + 3 deltas + reserved[4] */
    struct v4l2_vp9_segmentation seg;      /* 88 bytes; feature_data[8][4] (__s16) + feature_enabled[8]
                                              + tree_probs[7] + pred_probs[3] + flags + reserved[5] */
    __u32 flags;                            /* 10 V4L2_VP9_FRAME_FLAG_* bits per
                                              <linux/v4l2-controls.h>:2665-2674 */
    __u16 compressed_header_size;
    __u16 uncompressed_header_size;
    __u16 frame_width_minus_1;
    __u16 frame_height_minus_1;
    __u16 render_width_minus_1;
    __u16 render_height_minus_1;
    __u64 last_frame_ts;                    /* per-VASurfaceID timestamp lookup */
    __u64 golden_frame_ts;
    __u64 alt_frame_ts;
    __u8 ref_frame_sign_bias;               /* OR of V4L2_VP9_SIGN_BIAS_{LAST,GOLDEN,ALT} */
    __u8 reset_frame_context;               /* V4L2_VP9_RESET_FRAME_CTX_* (0..2) */
    __u8 frame_context_idx;
    __u8 profile;
    __u8 bit_depth;
    __u8 interpolation_filter;
    __u8 tile_cols_log2;
    __u8 tile_rows_log2;
    __u8 reference_mode;
    __u8 reserved[7];
};

Total size: 168 bytes as measured in Phase 3, superseding the Phase 2 estimate of ~144 (vs iter3 VP8's 1232 bytes; much smaller because VP9_FRAME carries no entropy table, which lives in COMPRESSED_HDR).
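
The layout arithmetic and the control ID can be checked mechanically, in the spirit of the compile-time sizeof asserts listed as Clause 12 in the commit note. The sketch below uses local mirror structs with hypothetical m_ names so that it compiles without kernel headers. The sub-struct field types (notably __s16 for feature_data) and the base value 0x00a40000 | 0x900 are assumptions taken from the UAPI header as I understand it; real code would assert on the kernel's own types. Under these assumptions the mirror comes to 168 bytes, matching the empirical size recorded in the Phase 3 commit note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors of the UAPI structs (hypothetical "m_" names). */
struct m_vp9_loop_filter {
    int8_t  ref_deltas[4];
    int8_t  mode_deltas[2];
    uint8_t level, sharpness, flags;
    uint8_t reserved[7];
};

struct m_vp9_quantization {
    uint8_t base_q_idx;
    int8_t  delta_q_y_dc, delta_q_uv_dc, delta_q_uv_ac;
    uint8_t reserved[4];
};

struct m_vp9_segmentation {
    int16_t feature_data[8][4];   /* assumed 16-bit, i.e. 64 bytes */
    uint8_t feature_enabled[8];
    uint8_t tree_probs[7];
    uint8_t pred_probs[3];
    uint8_t flags;
    uint8_t reserved[5];
};

struct m_ctrl_vp9_frame {
    struct m_vp9_loop_filter lf;
    struct m_vp9_quantization quant;
    struct m_vp9_segmentation seg;
    uint32_t flags;
    uint16_t compressed_header_size, uncompressed_header_size;
    uint16_t frame_width_minus_1, frame_height_minus_1;
    uint16_t render_width_minus_1, render_height_minus_1;
    uint64_t last_frame_ts, golden_frame_ts, alt_frame_ts;
    uint8_t  ref_frame_sign_bias, reset_frame_context, frame_context_idx;
    uint8_t  profile, bit_depth, interpolation_filter;
    uint8_t  tile_cols_log2, tile_rows_log2, reference_mode;
    uint8_t  reserved[7];
};

/* Fail the build loudly if any layout drifts. */
static_assert(sizeof(struct m_vp9_loop_filter) == 16, "lf size");
static_assert(sizeof(struct m_vp9_quantization) == 8, "quant size");
static_assert(sizeof(struct m_vp9_segmentation) == 88, "seg size");
static_assert(sizeof(struct m_ctrl_vp9_frame) == 168, "VP9_FRAME size");

/* Control ID arithmetic: assumed stateless base, then +300 / +301. */
enum {
    M_STATELESS_BASE         = 0x00a40000 | 0x900,
    M_CID_VP9_FRAME          = M_STATELESS_BASE + 300,
    M_CID_VP9_COMPRESSED_HDR = M_STATELESS_BASE + 301
};
static_assert(M_CID_VP9_FRAME == 0xa40a2c, "VP9_FRAME id");
static_assert(M_CID_VP9_COMPRESSED_HDR == 0xa40a2d, "VP9_COMPRESSED_HDR id");
```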

Kernel UAPI: V4L2_CID_STATELESS_VP9_COMPRESSED_HDR (<linux/v4l2-controls.h>:2797)

#define V4L2_CID_STATELESS_VP9_COMPRESSED_HDR  (V4L2_CID_CODEC_STATELESS_BASE + 301)
                                              /* = 0xa40a2d */

struct v4l2_ctrl_vp9_compressed_hdr {
    __u8 tx_mode;                          /* V4L2_VP9_TX_MODE_* (0..4) */
    __u8 tx8[2][1];
    __u8 tx16[2][2];
    __u8 tx32[2][3];
    __u8 coef[4][2][2][6][6][3];           /* HUGE: 1728 bytes */
    __u8 skip[3];
    __u8 inter_mode[7][3];
    __u8 interp_filter[4][2];
    __u8 is_inter[4];
    __u8 comp_mode[5];
    __u8 single_ref[5][2];
    __u8 comp_ref[5];
    __u8 y_mode[4][9];
    __u8 uv_mode[10][9];
    __u8 partition[16][3];
    struct v4l2_vp9_mv_probs mv;           /* 69 bytes; joint/sign/classes/class0_bit/bits/etc */
};

Total size: 2040 bytes as measured in Phase 3, superseding the Phase 2 estimate of ~1947. Filled by parsing the compressed header bits via VPX boolean decoder + inv_map_table[] (per FFmpeg v4l2_request_vp9.c:99-261).
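
The same kind of mechanical check applies to this struct. The sketch below mirrors the fields listed above under hypothetical m_ names. The dimensions of the motion-vector sub-struct are an assumption based on my reading of the UAPI header and are not stated in this document. With them, the mirror sums to 2040 bytes, which agrees with the empirical size recorded in the Phase 3 commit note.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of the MV probability block (hypothetical mirror). */
struct m_vp9_mv_probs {
    uint8_t joint[3];
    uint8_t sign[2];
    uint8_t classes[2][10];
    uint8_t class0_bit[2];
    uint8_t bits[2][10];
    uint8_t class0_fr[2][2][3];
    uint8_t fr[2][3];
    uint8_t class0_hp[2];
    uint8_t hp[2];
};

/* Mirror of the compressed-header control; every member is a byte array. */
struct m_ctrl_vp9_compressed_hdr {
    uint8_t tx_mode;
    uint8_t tx8[2][1];
    uint8_t tx16[2][2];
    uint8_t tx32[2][3];
    uint8_t coef[4][2][2][6][6][3];
    uint8_t skip[3];
    uint8_t inter_mode[7][3];
    uint8_t interp_filter[4][2];
    uint8_t is_inter[4];
    uint8_t comp_mode[5];
    uint8_t single_ref[5][2];
    uint8_t comp_ref[5];
    uint8_t y_mode[4][9];
    uint8_t uv_mode[10][9];
    uint8_t partition[16][3];
    struct m_vp9_mv_probs mv;
};

static_assert(sizeof(((struct m_ctrl_vp9_compressed_hdr *)0)->coef) == 1728,
              "coef table size");
static_assert(sizeof(struct m_vp9_mv_probs) == 69, "mv probs size");
static_assert(sizeof(struct m_ctrl_vp9_compressed_hdr) == 2040,
              "VP9_COMPRESSED_HDR size");
```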

The kernel uses these as PROBABILITY UPDATES (not absolutes): a value of zero in any array element means "no update — keep prior probability." The kernel runs v4l2_vp9_fw_update_probs(&probability_tables, prob_updates, dec_params) to apply updates per rkvdec-vp9.c:796.
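
To make the "zero means no update" convention concrete, here is a minimal sketch of applying one update to one prior probability. It assumes the update rule is the VP9 specification's inv_remap_prob with the inv_map_table[] lookup already done by the parser. Whether the kernel helper treats every array this way, or copies some values verbatim, is not established by this document. The function name vp9_apply_prob_update is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Recentering helper from the VP9 specification's probability update. */
static int inv_recenter_nonneg(int v, int m)
{
    if (v > 2 * m)
        return v;
    if (v & 1)
        return m - ((v + 1) >> 1);
    return m + (v >> 1);
}

/*
 * delta: value already mapped through inv_map_table[]; 0 means "no update".
 * prior: current probability, in 1..255.
 * Returns the probability to use for this frame.
 */
static uint8_t vp9_apply_prob_update(uint8_t delta, uint8_t prior)
{
    assert(prior >= 1);
    if (delta == 0)
        return prior;   /* keep the prior probability untouched */
    if (prior <= 128)
        return (uint8_t)(1 + inv_recenter_nonneg(delta, prior - 1));
    return (uint8_t)(255 - inv_recenter_nonneg(delta, 255 - prior));
}
```

The practical consequence for the libva backend is that both control structs must be zero-initialised before parsing, so that untouched entries read as "no update".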

VAAPI buffer types

VADecPictureParameterBufferVP9 (va_dec_vp9.h:58-192):

  • frame_width, frame_height (u16)
  • reference_frames[8] — 8-entry DPB (vs VP8's 3)
  • pic_fields.bits.{...} — 27 single-bit/multi-bit fields (subsampling_x/y, frame_type, show_frame, error_resilient_mode, intra_only, allow_high_precision_mv, mcomp_filter_type[3 bits], frame_parallel_decoding_mode, reset_frame_context[2 bits], refresh_frame_context, frame_context_idx[2 bits], segmentation_*, last/golden/alt_ref_frame[3 bits each, indexes into reference_frames[8]], *_sign_bias, lossless_flag)
  • filter_level, sharpness_level (u8)
  • log2_tile_rows, log2_tile_columns (u8)
  • frame_header_length_in_bytes — uncompressed_header_size (u8 — note 8-bit width may overflow for super-frames; typical < 256 for BBB)
  • first_partition_size — compressed_header_size (u16)
  • mb_segment_tree_probs[7], segment_pred_probs[3] (u8)
  • profile, bit_depth (u8)

VASliceParameterBufferVP9 (va_dec_vp9.h:279-303):

  • slice_data_size, slice_data_offset, slice_data_flag (u32)
  • seg_param[8] — array of VASegmentParameterVP9 (~40 bytes each):
    • segment_flags.fields.{segment_reference_enabled, segment_reference[2 bits], segment_reference_skipped} (u16 packed)
    • filter_level[4][2] (u8) — per-ref-frame × per-mode loop filter levels
    • luma_ac_quant_scale, luma_dc_quant_scale, chroma_ac_quant_scale, chroma_dc_quant_scale (s16) — already-computed effective scale per segment

FFmpeg V4L2 reference (v4l2_request_vp9.c)

Submission shape: 2 batched controls per frame in single S_EXT_CTRLS:

control[0] = { .id = V4L2_CID_STATELESS_VP9_FRAME, ... };
control[1] = { .id = V4L2_CID_STATELESS_VP9_COMPRESSED_HDR, ... };
v4l2_set_controls(..., control, 2);

The COMPRESSED_HDR control is included conditionally, based on a runtime probe (v4l2_request_vp9_post_frames_ctx queries the kernel; if the control isn't advertised, it falls back to FRAME-only). For rkvdec on RK3399, the kernel advertises COMPRESSED_HDR, as verified at rkvdec-vp9.c:752 (kernel WILL EINVAL if not provided).

Kernel rkvdec driver (rkvdec-vp9.c)

Key reads in rkvdec_vp9_run_preamble:

  • VP9_FRAME control → dec_params = ctrl->p_cur.p → drives register programming via config_registers().
  • VP9_COMPRESSED_HDR control → prob_updates = ctrl->p_cur.p → applied via v4l2_vp9_fw_update_probs().
  • 8-entry reference frame DPB resolved from FRAME's last_frame_ts/golden_frame_ts/alt_frame_ts (only 3 active references at a time, despite VAAPI exposing 8 — kernel uses last/golden/alt indexes into the picture's 8-frame DPB).

Mapping table (VAAPI → V4L2 / kernel)

The libva backend's job: read VAAPI's per-frame buffers (Picture + Slice) AND parse the compressed header from the bitstream, write the kernel's two structs.

v4l2_ctrl_vp9_frame mapping

| Kernel field | VAAPI source | Notes |
|---|---|---|
| lf.ref_deltas[4] | NOT in VAAPI | VAAPI doesn't expose loop-filter ref deltas separately; FFmpeg's V4L2 ref reads from VP9Context internal state. Open question Phase 3: are these zero in the BBB fixture? |
| lf.mode_deltas[2] | NOT in VAAPI | same |
| lf.level | picture->filter_level | direct |
| lf.sharpness | picture->sharpness_level | direct |
| lf.flags | NOT in VAAPI | DELTA_ENABLED + DELTA_UPDATE bits; same gap |
| quant.base_q_idx | DERIVED, no direct VAAPI exposure | Open question Phase 3: VAAPI exposes per-segment seg_param[s].luma_ac_quant_scale, but those are EFFECTIVE Q-scales, not the base index. Inverse-derive from seg_param[0].luma_ac_quant_scale via VP9 spec quantization table? Or leave zero and let kernel use default? |
| quant.delta_q_y_dc/uv_dc/uv_ac | NOT in VAAPI | same; VAAPI only exposes effective per-segment scales |
| seg.feature_data[8][4] | DERIVED from slice->seg_param[s].filter_level[][] + quant scales | mapping non-trivial |
| seg.feature_enabled[8] | derived from slice->seg_param[s].segment_flags + segmentation enabled bits | non-trivial |
| seg.tree_probs[7] | picture->mb_segment_tree_probs[7] | direct |
| seg.pred_probs[3] | picture->segment_pred_probs[3] | direct |
| seg.flags | from pic_fields.bits.{segmentation_enabled, segmentation_update_map, segmentation_temporal_update} + derived segmentation_update_data + absolute_or_delta | mostly direct |
| flags & KEY_FRAME | !pic_fields.bits.frame_type | VAAPI inverts: frame_type=0 means keyframe |
| flags & SHOW_FRAME | pic_fields.bits.show_frame | direct |
| flags & ERROR_RESILIENT | pic_fields.bits.error_resilient_mode | direct |
| flags & INTRA_ONLY | pic_fields.bits.intra_only | direct |
| flags & ALLOW_HIGH_PREC_MV | pic_fields.bits.allow_high_precision_mv | direct |
| flags & REFRESH_FRAME_CTX | pic_fields.bits.refresh_frame_context | direct |
| flags & PARALLEL_DEC_MODE | pic_fields.bits.frame_parallel_decoding_mode | direct |
| flags & X/Y_SUBSAMPLING | pic_fields.bits.subsampling_x/y | direct |
| flags & COLOR_RANGE_FULL_SWING | NOT in VAAPI | leave 0 for BT.709 limited (BBB) |
| compressed_header_size | picture->first_partition_size | direct (VAAPI mis-named per its own comment) |
| uncompressed_header_size | picture->frame_header_length_in_bytes | direct |
| frame_width_minus_1 | picture->frame_width - 1 | direct |
| frame_height_minus_1 | picture->frame_height - 1 | direct |
| render_width_minus_1, render_height_minus_1 | NOT in VAAPI | set equal to frame_width-1 / frame_height-1 (no scaling for BBB) |
| last_frame_ts | DPB lookup: picture->reference_frames[picture->pic_fields.bits.last_ref_frame] → surface_object->timestamp → v4l2_timeval_to_ns() | uses last_ref_frame index into 8-entry DPB |
| golden_frame_ts | DPB lookup: picture->reference_frames[picture->pic_fields.bits.golden_ref_frame] | same |
| alt_frame_ts | DPB lookup: picture->reference_frames[picture->pic_fields.bits.alt_ref_frame] | same |
| ref_frame_sign_bias | OR of pic_fields.bits.{last,golden,alt}_ref_frame_sign_bias mapped to V4L2_VP9_SIGN_BIAS_{LAST,GOLDEN,ALT} | direct |
| reset_frame_context | pic_fields.bits.reset_frame_context (with FFmpeg's x > 0 ? x - 1 : 0 adjustment per ref) | mapping needs inspection |
| frame_context_idx | pic_fields.bits.frame_context_idx | direct |
| profile | picture->profile | direct |
| bit_depth | picture->bit_depth | direct |
| interpolation_filter | pic_fields.bits.mcomp_filter_type (with FFmpeg's ^ (filtermode <= 1) adjustment, see ref) | mapping needs inspection |
| tile_cols_log2, tile_rows_log2 | picture->log2_tile_columns, log2_tile_rows | direct |
| reference_mode | NOT in VAAPI | derive from heuristic OR leave default V4L2_VP9_REFERENCE_MODE_SELECT; Phase 3 baseline answers |
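
The "direct" flag rows, including the inverted frame_type, can be sketched as a pair of pure functions. The struct vp9_pic_bits below is a hypothetical flattened view of the VAAPI pic_fields bits, and the F_ and SB_ constants are local stand-ins whose values are assumed to match the V4L2_VP9_FRAME_FLAG_* and V4L2_VP9_SIGN_BIAS_* definitions in the UAPI header. COLOR_RANGE_FULL_SWING is left unset, as the table prescribes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins; values assumed to match the kernel UAPI header. */
#define F_KEY_FRAME          0x001u
#define F_SHOW_FRAME         0x002u
#define F_ERROR_RESILIENT    0x004u
#define F_INTRA_ONLY         0x008u
#define F_ALLOW_HIGH_PREC_MV 0x010u
#define F_REFRESH_FRAME_CTX  0x020u
#define F_PARALLEL_DEC_MODE  0x040u
#define F_X_SUBSAMPLING      0x080u
#define F_Y_SUBSAMPLING      0x100u

#define SB_LAST   0x1u
#define SB_GOLDEN 0x2u
#define SB_ALT    0x4u

/* Hypothetical flattened copy of the pic_fields bits used by the mapping. */
struct vp9_pic_bits {
    unsigned frame_type, show_frame, error_resilient_mode, intra_only;
    unsigned allow_high_precision_mv, refresh_frame_context;
    unsigned frame_parallel_decoding_mode, subsampling_x, subsampling_y;
    unsigned last_ref_frame_sign_bias, golden_ref_frame_sign_bias;
    unsigned alt_ref_frame_sign_bias;
};

static uint32_t vp9_frame_flags(const struct vp9_pic_bits *b)
{
    uint32_t f = 0;

    assert(b != NULL);
    if (!b->frame_type)                  f |= F_KEY_FRAME;   /* 0 = keyframe */
    if (b->show_frame)                   f |= F_SHOW_FRAME;
    if (b->error_resilient_mode)         f |= F_ERROR_RESILIENT;
    if (b->intra_only)                   f |= F_INTRA_ONLY;
    if (b->allow_high_precision_mv)      f |= F_ALLOW_HIGH_PREC_MV;
    if (b->refresh_frame_context)        f |= F_REFRESH_FRAME_CTX;
    if (b->frame_parallel_decoding_mode) f |= F_PARALLEL_DEC_MODE;
    if (b->subsampling_x)                f |= F_X_SUBSAMPLING;
    if (b->subsampling_y)                f |= F_Y_SUBSAMPLING;
    return f;
}

static uint8_t vp9_sign_bias(const struct vp9_pic_bits *b)
{
    uint8_t s = 0;

    assert(b != NULL);
    if (b->last_ref_frame_sign_bias)   s |= SB_LAST;
    if (b->golden_ref_frame_sign_bias) s |= SB_GOLDEN;
    if (b->alt_ref_frame_sign_bias)    s |= SB_ALT;
    return s;
}
```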

v4l2_ctrl_vp9_compressed_hdr mapping

This struct is filled by PARSING the compressed header bitstream — NOT from VAAPI. The libva backend runs a VPX boolean decoder over surface_object->source_data + uncompressed_header_size for compressed_header_size bytes, follows the VP9 spec section 6.3, and applies inv_map_table[d] for each updated probability.

The parsing logic is direct port of FFmpeg fill_compressed_hdr (lines 99-261). Key syntax elements parsed:

  • tx_mode (2 bits, then conditional 1 bit)
  • TX 8x8/16x16/32x32 probability updates (only if tx_mode == SELECT)
  • Coef probability updates (4-level nested loop with branch probs)
  • Skip / inter_mode / interp_filter / is_inter / comp_mode / single_ref / comp_ref / y_mode / partition probability updates (only on inter frames)
  • MV probability updates (joint / sign / classes / class0_bit / bits / class0_fr / fr / class0_hp / hp)

Each updated value goes through inv_map_table[] (the 255-entry lookup from B4). Each "no update" bit leaves zero in the kernel struct.

Patch shape prediction

| Site | Action | LOC delta |
|---|---|---|
| src/config.c:121-160 | INSERT VP9 enumeration block | +10 |
| src/config.c:54-78 | INSERT VP9 case + break + comment | +5 |
| src/config.c:167-191 | INSERT VP9 case in fall-through | +1 |
| src/vp9.c | NEW FILE | +500-600 |
| src/vp9.h | NEW FILE | +35-45 |
| src/picture.c:34-37 | INSERT #include "vp9.h" | +1 |
| src/picture.c:188-225 | INSERT VP9 dispatch case | +6 |
| src/picture.c:54-186 | INSERT 2 buffer-type cases | +14 |
| src/surface.h:92-119 | INSERT vp9 struct | +6 |
| src/meson.build:50,73 | INSERT 2 entries | +2 |

Total: ~580-690 LOC, 4 modified + 2 new files. Larger than iter3 VP8 (370 LOC) and comparable to iter2 HEVC (470 LOC). Compressed-header parser is the dominant cost.

Predicted commits:

  • Commit A: src/config.c enumeration + dispatch + entrypoints (Criterion 1).
  • Commit B: NEW src/vp9.c + src/vp9.h + src/meson.build (10 contract clauses + VPX rac decoder + compressed-header parser).
  • Commit C: src/picture.c dispatcher + 2 buffer-type cases + src/surface.h union extension (Criteria 2-3).
  • Commit D: optional fix-forward placeholder.

Open questions for Phase 3 baseline

  1. Loop filter ref/mode deltas: VAAPI doesn't expose lf_delta.ref/mode/enabled/updated. Are these always zero for BBB? Phase 3 strace of FFmpeg-v4l2request VP9 will reveal verbatim values.
  2. Quantization base_q_idx + deltas: VAAPI exposes effective per-segment scales but not the base. Phase 3 baseline: capture verbatim FRAME control payload to see what FFmpeg-v4l2request writes; correlate against VAAPI's per-segment scale via VP9 spec quantization table.
  3. Reference mode: VAAPI doesn't expose comppredmode. Phase 3 baseline: verify default V4L2_VP9_REFERENCE_MODE_SELECT works for BBB.
  4. Interpolation filter mapping: FFmpeg uses filtermode ^ (filtermode <= 1) to remap; VAAPI's mcomp_filter_type may already be in V4L2 enum order (no remap needed) OR in a different order. Empirically check.
  5. Reset frame context mapping: FFmpeg uses x > 0 ? x - 1 : 0. Either FFmpeg's source enum is offset by 1 from V4L2's, or there's an off-by-one. Empirically verify.
  6. VAAPI per-segment field interpretation: slice->seg_param[s].filter_level[4][2] and quant scales are EFFECTIVE values (computed by mpv-VAAPI consumer). Mapping back to kernel's "ALT_Q delta" + "ALT_L delta" + "REF_FRAME" feature bits is non-trivial. Phase 3 verbatim payload + mapping-back-to-VAAPI cross-check.
  7. Does mpv 0.41.0 engage HW for VP9? Phase 3: capture mpv -v --hwdec=vaapi --vo=null --frames=2 ~/fourier-test/bbb_720p10s_vp9.webm and grep for Selected decoder: vp9 vs Using software decoding. iter3 VP8 fell back; iter4 VP9 may or may not.
  8. Does rkvdec exhibit the same dma_resv kernel issue as hantro?: iter3 found hantro CAPTURE returns all-zero pages from libva readback. rkvdec is a different driver subsystem; iter1+iter2 successfully verified via mpv-DMA-BUF-GL on rkvdec. Predicted: rkvdec works fine for direct readback. Phase 3 baseline: re-test ffmpeg-vaapi-hwdownload on rkvdec for VP9 and check if output is non-zero.

Phase 3 baseline targets (work plan)

  1. Cross-validator capture: strace -ff -tt -y -v -e trace=ioctl ffmpeg -hwaccel v4l2request bbb_720p10s_vp9.webm -frames:v 5 -f null - 2>strace.log. Decode VP9_FRAME + COMPRESSED_HDR payloads via Phase 3 decoder (extend decode_vp8.py for VP9 layout).
  2. VAAPI consumer trace: LIBVA_TRACE mpv-SW + mpv-vaapi runs to see what buffer types mpv produces.
  3. Cache-safe verify reference: mpv --hwdec=no --vo=image --frames=2 --start=00:00:02 ~/fourier-test/bbb_720p10s_vp9.webm and capture frame-0001/0002 SHA256 (criterion-4 anchor).
  4. rkvdec readback path test: re-run ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -vf hwdownload bbb_720p10s_vp9.webm -frames:v 5 after install (would be Phase 6 actually; Phase 3 just baseline-captures the SW reference). Confirm whether rkvdec hits dma_resv issue or not (predicted: NO based on iter1+iter2 working there).
  5. mpv-VP9-vaapi engagement check: per memory feedback_hw_decode_engagement_check.md, verify HW path engaged via mpv -v log BEFORE claiming criterion 4.

Phase 4 plan structure (anticipated)

Following iter2/iter3's clause template:

  • Clause 1: Submission shape — 2 controls batched per frame
  • Clause 2: Local struct alloc + zero-init (memset both)
  • Clause 3: Frame geometry + scalars + flags
  • Clause 4: DPB timestamp resolution (3 active refs from 8-slot DPB)
  • Clause 5: Loop filter mapping (with VAAPI gap notes per Q1)
  • Clause 6: Quantization mapping (with VAAPI gap notes per Q2)
  • Clause 7: Segmentation mapping (with VAAPI per-segment effective-vs-delta unpacking per Q6)
  • Clause 8: Compressed header parser — port FFmpeg fill_compressed_hdr + VPX rac decoder + inv_map_table
  • Clause 9: Final 2-control batched submission
  • Clause 10: Bitstream offsetting — surface_object->source_data + uncompressed_header_size is the start of compressed-header bytes; compressed_header_size is the byte length
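
Clause 10's offsetting is simple but worth bounds-checking, since both sizes come from the VAAPI consumer. A minimal sketch, with the hypothetical helper name vp9_compressed_hdr_span and the assumption that the total slice-data size is known to the caller:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return a pointer to the first compressed-header byte, or NULL if the
 * two header sizes do not fit inside the submitted bitstream.
 *
 * data / data_size:  the frame's bitstream as stored on the surface
 * uncompressed_size: uncompressed_header_size from the picture params
 * compressed_size:   compressed_header_size from the picture params
 */
static const uint8_t *vp9_compressed_hdr_span(const uint8_t *data,
                                              size_t data_size,
                                              size_t uncompressed_size,
                                              size_t compressed_size)
{
    if (data == NULL)
        return NULL;
    if (uncompressed_size > data_size)
        return NULL;
    /* Subtract instead of adding so the check cannot overflow. */
    if (compressed_size > data_size - uncompressed_size)
        return NULL;
    return data + uncompressed_size;
}
```

The returned pointer and compressed_size are what the boolean decoder would be initialised with; a NULL return should abort the frame before any control is submitted.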

The plan will cite verbatim Phase 3 baseline payload bytes for all fields where mapping is non-obvious (loop-filter deltas, quant base, segmentation feature mapping) per feedback_dev_process.md Phase 6 contract-before-code.

Substrate state at Phase 2 close

  • iter4 Phase 1 commit 9a71dbf pushed to gitea.
  • Fork on noether at iter3 tip e1aca9c (synced via git fetch && merge --ff-only).
  • All Phase 3 prerequisites identified.
  • Memory rules apply unchanged.
  • Phase 3 questions queued (8 items, mostly empirical). Phase 5 review will catch the field-availability + mapping questions analogous to iter3 (uniform_spacing_flag Direction 2 lesson).