Files
fresnel-fourier/phase4_iter4_plan.md
marfrit 42b9ec333a iter4 Phase 6: 4 commits landed (Z+A+B+C), ffmpeg-vaapi VP9 decode PASS
Fork at marfrit/libva-multiplanar tip beaa914:
- Z (7f8fa93) device-path auto-detect via media controller topology;
  walk /dev/media*, MEDIA_IOC_DEVICE_INFO match, MEDIA_IOC_G_TOPOLOGY
  -> MEDIA_INTF_T_V4L_VIDEO -> resolve via /sys/dev/char.
  LIBVA_V4L2_REQUEST_NO_AUTODETECT=1 escape hatch.
- A (16b3973) src/config.c VP9 enumeration + dispatch + entrypoints.
- B (406d08e) NEW src/vp9.c (~750 LOC: VPX rac + inv_map_table +
  uncompressed-header partial parser + compressed-header parser +
  vp9_set_controls) + src/vp9.h + meson.build + context.h
  (persistent vp9_lf state for Phase 5 C2) + surface.h
  (params.vp9 union extension).
- C (beaa914) src/picture.c VP9 dispatcher + 2 buffer-type cases.

NO Commit D — buffer.c allow-list already permissive for VP9's 3
buffer types (Picture, Slice, SliceData; all in iter3 baseline).

Phase 5 amendments all in code: C1 no-XOR direct, C2 persistent
vp9_lf with VP9 spec defaults, C3 out_reference_mode parameter,
C4 NO_AUTODETECT escape, S4 uv_mode memcpy omitted.

Plan amendment to Commit Z section in phase4_iter4_plan.md
documents the canonical media-topology approach (replacing the
original /dev/video* walk).

Verification empirically on fresnel:
- Criterion 1: vainfo enumerates VAProfileVP9Profile0 alongside
  H.264 + HEVC under auto-detect rkvdec.
- Criterion 2 (implicit via successful ffmpeg run).
- Criterion 3: ffmpeg-vaapi VP9 5-frame decode exit 0 at
  0.307x speed, no ioctl errors.
- Criterion 4: deferred to Phase 7 verification.
- Criterion 5: rkvdec codecs work without env override; hantro
  (MPEG-2/VP8) still need env override per iter4-B1 backlog.

Open iter4 backlog: B1 (multi-decoder dispatch refactor),
B2 (mpv-vaapi Could-not-create-device — ffmpeg-vaapi works fine
through same backend, mpv does not), Q6 (per-segment ALT_Q
mapping for non-BBB), COLOR_RANGE (VAAPI gap).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 06:55:45 +00:00

569 lines
37 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Iteration 4 — Phase 4 (plan)
Locks the iter4 patch shape against verbatim Phase 3 baseline (`phase3_iter4_baseline.md`, commit `56abe3d`) and the kernel UAPI + VAAPI + FFmpeg references read in Phase 2 (`phase2_iter4_situation.md`, commit `2651e4c` + ID-correction in `56abe3d`). Plan structure mirrors iter2/iter3 clause template, expanded for VP9-specific scope (compressed-header parser + uncompressed-header partial parse + device-path mitigation).
Phase 3 baseline provides:
- Empirical struct sizes 168 B / 2040 B (NOT Phase 2's 144 / 1947 estimates)
- Correct control IDs `0xa40a2c` / `0xa40a2d`
- Frame-1 keyframe verbatim payload prefix (`lf.ref_deltas={1,0,-1,-1}`, `lf.mode_deltas={0,0}`, `quant.base_q_idx=46`)
- 4-codec regression root-cause: `/dev/video0` is now `rockchip-rga` on 7.0; backend hardcodes `/dev/video0` in `request.c:149`
User picked **Mitigation B**: in-fork patch — walk `/dev/video*`, query `VIDIOC_QUERYCAP`, pick first device whose driver name is in `{rkvdec, hantro-vpu}`. Adds ~30 LOC to `request.c`. **Phase 5 C4 amendment** — partial restoration only: walk-and-pick-first returns rkvdec on 7.0 (rkvdec at video1 enumerates first). Restores H.264 / HEVC / VP9 (rkvdec codecs) without env override. MPEG-2 and VP8 (hantro at video3) STILL require explicit env override `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media1`. Full multi-decoder dispatch is iter4-B1 backlog.
## Contract clauses
### Clause 1 — Submission shape (per-frame)
ONE batched `VIDIOC_S_EXT_CTRLS` per frame, bound to the surface's permanent `request_fd`. **TWO controls** (vs iter3's one): `V4L2_CID_STATELESS_VP9_FRAME` + `V4L2_CID_STATELESS_VP9_COMPRESSED_HDR`. `rkvdec-vp9.c::rkvdec_vp9_run_preamble:752` `WARN_ON(!ctrl); return -EINVAL` if COMPRESSED_HDR absent — hard requirement on RK3399.
```c
struct v4l2_ext_control ctrls[2] = {
{ .id = V4L2_CID_STATELESS_VP9_FRAME, /* 0xa40a2c */
.ptr = &frame,
.size = sizeof frame /* MUST be 168 */ },
{ .id = V4L2_CID_STATELESS_VP9_COMPRESSED_HDR, /* 0xa40a2d */
.ptr = &compressed_hdr,
.size = sizeof compressed_hdr /* MUST be 2040 */ },
};
rc = v4l2_set_controls(driver_data->video_fd,
surface_object->request_fd,
ctrls, 2);
```
`v4l2_set_controls` wraps with `which=V4L2_CTRL_WHICH_REQUEST_VAL`. Phase 3 strace verifies `ctrl_class=0xf010000` is what the kernel sees, matching iter1+iter2+iter3.
**No init-time probe**: ffmpeg-v4l2request first runs a count=1 probe (FRAME-only) to check kernel CID support, then count=2. iter4 backend skips the probe — VP9 on rkvdec REQUIRES COMPRESSED_HDR per kernel source; if the kernel doesn't have it, decode would fail anyway. Issuing count=2 unconditionally is correct.
**Anchor**: Phase 3 baseline § Anchor 2 verbatim ioctl trace.
### Clause 2 — Local struct allocation + zero-init
```c
int vp9_set_controls(struct request_data *driver_data,
struct object_context *context,
struct object_surface *surface_object)
{
VADecPictureParameterBufferVP9 *picture =
&surface_object->params.vp9.picture;
VASliceParameterBufferVP9 *slice =
&surface_object->params.vp9.slice;
struct v4l2_ctrl_vp9_frame frame;
struct v4l2_ctrl_vp9_compressed_hdr compressed_hdr;
memset(&frame, 0, sizeof frame);
memset(&compressed_hdr, 0, sizeof compressed_hdr);
/* Zero is the kernel's "no probability update" default for every
* field in compressed_hdr, and the safe default for every numeric
* field of frame except reference_mode (set explicitly later). */
...
}
```
VAAPI doesn't have iter3-style `set` flags for VP9; both Picture and Slice are unconditionally populated by the consumer per frame (per Phase 2 analysis B9, no per-frame reset needed in `RequestBeginPicture`).
### Clause 3 — Compile-time struct-size assertions
Per Phase 3 finding: kernel UAPI struct sizes are **168 B (FRAME)** and **2040 B (COMPRESSED_HDR)** on 7.0; iter4's `Build Date: post-2026-05-09` will use whatever size the build host's UAPI headers report. Adding compile-time asserts at the top of `vp9.c` makes any future struct-size drift fail loudly instead of silently corrupting kernel control writes:
```c
_Static_assert(sizeof(struct v4l2_ctrl_vp9_frame) == 168,
"v4l2_ctrl_vp9_frame size mismatch — UAPI changed");
_Static_assert(sizeof(struct v4l2_ctrl_vp9_compressed_hdr) == 2040,
"v4l2_ctrl_vp9_compressed_hdr size mismatch — UAPI changed");
```
If these fire, treat as a kernel-substrate bump (re-baseline Phase 3) — DO NOT just bump the asserts.
### Clause 4 — Frame geometry + per-frame scalars
```c
frame.frame_width_minus_1 = picture->frame_width - 1;
frame.frame_height_minus_1 = picture->frame_height - 1;
frame.render_width_minus_1 = picture->frame_width - 1; /* VAAPI gap;
frame.render_height_minus_1 = picture->frame_height - 1; no scaling for BBB */
frame.profile = picture->profile;
frame.bit_depth = picture->bit_depth;
frame.tile_cols_log2 = picture->log2_tile_columns;
frame.tile_rows_log2 = picture->log2_tile_rows;
frame.frame_context_idx = picture->pic_fields.bits.frame_context_idx;
frame.lf.level = picture->filter_level;
frame.lf.sharpness = picture->sharpness_level;
frame.uncompressed_header_size = picture->frame_header_length_in_bytes;
frame.compressed_header_size = picture->first_partition_size;
```
VAAPI fields verified via test-compile against `va_dec_vp9.h:58-192` (per memory `feedback_review_empirical_over_theoretical.md` Direction 2 — Phase 5 will re-verify the field-name access list via `gcc -c` test compile).
Phase 3 keyframe anchor: `width=1280, height=720, profile=0, bit_depth=8, tile_log2={0,0}, level=3, sharpness=0` — direct match.
### Clause 5 — DPB timestamp resolution (3 active references from 8-slot DPB)
VAAPI's `picture->reference_frames[0..7]` is the full 8-entry DPB. The 3 active references for the current frame are indexed by `last_ref_frame`/`golden_ref_frame`/`alt_ref_frame` (each 3-bit, points into the 8-slot array).
```c
VASurfaceID last_id = picture->reference_frames[picture->pic_fields.bits.last_ref_frame];
VASurfaceID golden_id = picture->reference_frames[picture->pic_fields.bits.golden_ref_frame];
VASurfaceID alt_id = picture->reference_frames[picture->pic_fields.bits.alt_ref_frame];
struct object_surface *last_ref = (last_id != VA_INVALID_SURFACE) ? SURFACE(driver_data, last_id) : NULL;
struct object_surface *golden_ref = (golden_id != VA_INVALID_SURFACE) ? SURFACE(driver_data, golden_id) : NULL;
struct object_surface *alt_ref = (alt_id != VA_INVALID_SURFACE) ? SURFACE(driver_data, alt_id) : NULL;
if (last_ref) frame.last_frame_ts = v4l2_timeval_to_ns(&last_ref->timestamp);
if (golden_ref) frame.golden_frame_ts = v4l2_timeval_to_ns(&golden_ref->timestamp);
if (alt_ref) frame.alt_frame_ts = v4l2_timeval_to_ns(&alt_ref->timestamp);
```
Mirrors iter1/iter3 ref-resolution pattern. For keyframes (all refs invalid), timestamps stay 0 from memset.
Sign bias:
```c
if (picture->pic_fields.bits.last_ref_frame_sign_bias) frame.ref_frame_sign_bias |= V4L2_VP9_SIGN_BIAS_LAST;
if (picture->pic_fields.bits.golden_ref_frame_sign_bias) frame.ref_frame_sign_bias |= V4L2_VP9_SIGN_BIAS_GOLDEN;
if (picture->pic_fields.bits.alt_ref_frame_sign_bias) frame.ref_frame_sign_bias |= V4L2_VP9_SIGN_BIAS_ALT;
```
### Clause 6 — Loop filter deltas + base quantization (uncompressed-header partial parse with persistent LF-delta state)
VAAPI exposes `filter_level` and `sharpness_level` (Clause 4), but NOT `lf_delta_enabled`/`lf_delta_update`/`lf_ref_delta[4]`/`lf_mode_delta[2]`. Phase 3 keyframe anchor shows `lf.ref_deltas={1,0,-1,-1}` (non-zero on BBB); leaving these zero produces wrong loop-filter behavior → criterion-4 byte mismatch.
VAAPI also doesn't expose `quant.base_q_idx` / `delta_q_y_dc` / `delta_q_uv_dc` / `delta_q_uv_ac`. Phase 3 keyframe anchor shows `base_q_idx=46`; leaving zero produces wrong dequant scale.
**Phase 5 C2 amendment — LF deltas are persistent across frames**: Per kernel UAPI `<linux/v4l2-controls.h>:2578` verbatim:
> "If this syntax element is not present in the bitstream, users should pass its last value."
VP9 spec (matching FFmpeg `vp9.c:666-671`) initializes `lf_delta.ref = {1,0,-1,-1}` and `lf_delta.mode = {0,0}` on keyframe/error-resilient/intra-only, then updates only when `lf_delta.updated == 1`. FFmpeg `v4l2_request_vp9.c:298-302` always copies the persistent state to the kernel control (NOT conditional on `updated`). Memset-zero per-frame loses the persistent state — wrong on inter frames where typically `delta_enabled=1` and `delta_update=0`.
Solution: add persistent LF-delta state to `object_context` (per-decode-session scope), initialize to VP9 spec defaults, update only when the parsed bitstream's `lf_delta_update` is set, always copy to the kernel control:
```c
/* In src/context.h, inside object_context (added by Commit B/C): */
struct {
int8_t ref_deltas[4]; /* VP9 spec default: {1,0,-1,-1} */
int8_t mode_deltas[2]; /* VP9 spec default: {0,0} */
bool initialized; /* false until first VP9 frame */
} vp9_lf;
/* In Clause 6 fill (vp9.c): */
if (!context->vp9_lf.initialized ||
keyframe || intra_only || error_resilient) {
context->vp9_lf.ref_deltas[0] = 1;
context->vp9_lf.ref_deltas[1] = 0;
context->vp9_lf.ref_deltas[2] = -1;
context->vp9_lf.ref_deltas[3] = -1;
context->vp9_lf.mode_deltas[0] = 0;
context->vp9_lf.mode_deltas[1] = 0;
context->vp9_lf.initialized = true;
}
/* Parse uncompressed header — only updates context state when
* delta_update=1 in the bitstream: */
vp9_parse_uncompressed_header_lf_quant(
surface_object->source_data,
surface_object->source_size,
frame.uncompressed_header_size,
&frame.lf, /* receives flags + level + sharpness */
&frame.quant, /* receives base_q_idx + deltas */
context->vp9_lf.ref_deltas, /* in/out — updated only if parser sees delta_update=1 */
context->vp9_lf.mode_deltas); /* same */
/* ALWAYS copy persistent state to the kernel control: */
memcpy(frame.lf.ref_deltas, context->vp9_lf.ref_deltas, 4);
memcpy(frame.lf.mode_deltas, context->vp9_lf.mode_deltas, 2);
```
The parser (`vp9_parse_uncompressed_header_lf_quant`) walks frame_marker → profile → show_existing_frame → frame_type → show_frame → error_resilient_mode → frame_sync_code → color_config (profile-dependent) → frame_size → render_and_frame_size_different → refresh_frame_flags → loop_filter_params → quantization_params. Per Phase 5 Q1 reviewer note: ~80100 LOC for a faithful profile-0 keyframe + profile-0 inter-frame parse (revised up from Phase 4's "~50 LOC" estimate).
**Anchor**: Phase 3 keyframe `lf.ref_deltas={1,0,-1,-1}, lf.mode_deltas={0,0}, lf.flags=3 (DELTA_ENABLED|DELTA_UPDATE), quant.base_q_idx=46, deltas=0`. Implementation must reproduce these exact values byte-for-byte against the BBB keyframe. After C2 fix, frame 2 (inter) should also show `lf.ref_deltas={1,0,-1,-1}` from the persisted state — verifiable by strace.
**Out of iter4 scope**: full uncompressed-header parse (color_config full variant, segmentation update_data, tile_info). Those fields are either available via VAAPI (Clauses 4, 5, 7) or are not write-back to kernel. The parser is a TARGETED partial parse.
### Clause 7 — Segmentation mapping
VAAPI conveys segmentation via:
- `picture->pic_fields.bits.{segmentation_enabled, segmentation_temporal_update, segmentation_update_map}` flags
- `picture->mb_segment_tree_probs[7]` (segment tree probs)
- `picture->segment_pred_probs[3]` (temporal-update probs; 255-padded if `temporal_update == 0`)
- `slice->seg_param[8].{segment_flags.fields, filter_level[4][2], luma_*_quant_scale, chroma_*_quant_scale}`
Kernel takes per-segment feature_data + feature_enabled bitmaps. The mapping is non-trivial because VAAPI's slice->seg_param[s] carries EFFECTIVE quant scales (already-computed by VAAPI consumer), while kernel wants the per-segment ALT_Q delta or absolute (depends on `ABS_OR_DELTA_UPDATE` flag).
```c
for (i = 0; i < 7; i++)
frame.seg.tree_probs[i] = picture->mb_segment_tree_probs[i];
for (i = 0; i < 3; i++)
frame.seg.pred_probs[i] = picture->segment_pred_probs[i];
if (picture->pic_fields.bits.segmentation_enabled)
frame.seg.flags |= V4L2_VP9_SEGMENTATION_FLAG_ENABLED;
if (picture->pic_fields.bits.segmentation_update_map)
frame.seg.flags |= V4L2_VP9_SEGMENTATION_FLAG_UPDATE_MAP;
if (picture->pic_fields.bits.segmentation_temporal_update)
frame.seg.flags |= V4L2_VP9_SEGMENTATION_FLAG_TEMPORAL_UPDATE;
/* UPDATE_DATA + ABS_OR_DELTA_UPDATE: not in VAAPI; left zero.
* For BBB (segmentation disabled), this is correct — flags ignored
* by kernel when ENABLED is clear. */
/* Per-segment feature_data (only meaningful when ENABLED):
* VAAPI's seg_param[s].luma_ac_quant_scale[s] is the EFFECTIVE per-
* segment scale. Kernel wants ALT_Q absolute Q-index OR delta.
* Recover via VP9 spec inverse-Q-table OR leave zero (BBB safe). */
for (i = 0; i < 8; i++) {
if (slice->seg_param[i].segment_flags.fields.segment_reference_enabled) {
frame.seg.feature_enabled[i] |= 1 << V4L2_VP9_SEG_LVL_REF_FRAME;
frame.seg.feature_data[i][V4L2_VP9_SEG_LVL_REF_FRAME] =
slice->seg_param[i].segment_flags.fields.segment_reference;
}
if (slice->seg_param[i].segment_flags.fields.segment_reference_skipped)
frame.seg.feature_enabled[i] |= 1 << V4L2_VP9_SEG_LVL_SKIP;
/* SEG_LVL_ALT_Q + ALT_L: VAAPI doesn't directly expose per-segment
* abs/delta intent. Phase 5 review point: BBB has segmentation
* disabled so this code path is dead; non-BBB fixtures are out of
* iter4 scope (see backlog). */
}
```
**Anchor**: Phase 3 keyframe `seg = all zeros` (BBB segmentation disabled). The Clause 7 logic is exercised only for inter frames with segmentation_enabled — out of iter4 BBB scope. Document as fidelity gap.
### Clause 8 — VPX range coder + inv_map_table (for compressed header parse)
Direct port from FFmpeg `v4l2_request_vp9.c:42-97`:
- `inv_map_table[255]` — copy verbatim
- `vpx_rac_init(c, buf, size)` — initialize range coder over the compressed-header bytes
- `vp89_rac_get(c)` — read a single bit
- `vp89_rac_get_uint(c, n)` — read n bits MSB-first
- `vpx_rac_get_prob_branchy(c, prob)` — read with given probability
- `read_prob_delta(c)` — the 4-tier VLC + inv_map_table lookup used to update one prob
~80 LOC, all stateless static functions. Implementation can be either inlined in `vp9.c` (Phase 2 B6 Option A — chosen) or split to `vp9_rac.h`. Phase 2 default = Option A; Phase 5 may flip to Option B if reuse pressure surfaces.
### Clause 9 — Compressed-header parser (`vp9_fill_compressed_hdr`)
Direct port of FFmpeg `v4l2_request_vp9.c:99-261::fill_compressed_hdr`. Reads from `surface_object->source_data + uncompressed_header_size` for `compressed_header_size` bytes. ~180 LOC.
Syntax elements parsed (per VP9 spec 6.3):
- `tx_mode` (2 bits, +1 conditional bit when SELECT)
- TX 8x8/16x16/32x32 probability deltas (only if `tx_mode == SELECT`)
- Coef probability deltas (4-level nested loop with branch probs)
- Skip / inter_mode / interp_filter / is_inter / comp_mode / single_ref / comp_ref / y_mode / partition probability deltas (only on inter frames)
- MV probability deltas (joint/sign/classes/class0_bit/bits/class0_fr/fr/class0_hp/hp)
Each updated value goes through `inv_map_table[d]`. Each "no update" bit leaves zero in the kernel struct (kernel interprets zero as "keep prior probability").
**Phase 5 C3 amendment — `reference_mode` out-parameter**: The kernel's `reference_mode` field lives in `v4l2_ctrl_vp9_frame` (`<linux/v4l2-controls.h>:2763`), NOT in `v4l2_ctrl_vp9_compressed_hdr`. FFmpeg's parser reads it from a local `enum CompPredMode comppredmode` during compressed-header parsing (`v4l2_request_vp9.c:104,150-160`) and assigns to FRAME's `reference_mode` separately. Our backend must thread the parsed value out via an explicit out-parameter — without it, Clause 10's `frame.reference_mode = compressed_hdr_reference_mode` references an undefined variable → compile failure.
Function signature (Phase 5 C3-amended):
```c
static void vp9_fill_compressed_hdr(
struct v4l2_ctrl_vp9_compressed_hdr *compressed_hdr,
const uint8_t *buf, uint32_t size,
uint8_t lossless_flag,
bool keyframe_or_intraonly,
bool allowcompinter,
uint8_t *out_reference_mode); /* C3: NEW out-param */
```
`allowcompinter` is true when the sign biases of the 3 active references are not all identical (per VP9 spec 6.4.4). Derive at the call site from `picture->pic_fields.bits.{last,golden,alt}_ref_frame_sign_bias`:
```c
bool allowcompinter = !(
picture->pic_fields.bits.last_ref_frame_sign_bias ==
picture->pic_fields.bits.golden_ref_frame_sign_bias &&
picture->pic_fields.bits.golden_ref_frame_sign_bias ==
picture->pic_fields.bits.alt_ref_frame_sign_bias);
```
For keyframes (`!keyframe_or_intraonly == false`): write `*out_reference_mode = 0` (PRED_SINGLEREF) unconditionally. For inter frames without compinter: same. With compinter: extract from the compressed-header range coder per FFmpeg `v4l2_request_vp9.c:150-160`.
**Lossless special case**: if `picture->pic_fields.bits.lossless_flag` is set, the parser writes `compressed_hdr->tx_mode = V4L2_VP9_TX_MODE_ONLY_4X4` unconditionally per FFmpeg `v4l2_request_vp9.c:117-118`. VAAPI's `lossless_flag` semantics align with FFmpeg's `s->s.h.lossless` per Phase 5 S5 verification (both encode `base_q==0 && y_dc_delta_q==0 && uv_dc_delta_q==0 && uv_ac_delta_q==0`).
**`uv_mode` memcpy omission** (Phase 5 S4): FFmpeg's `fill_compressed_hdr` ends with `memcpy(ctrl->uv_mode, s->prob.p.uv_mode, sizeof(ctrl->uv_mode))`. rkvdec's `rkvdec-vp9.c:280-287` reads `uv_mode` from `vp9_ctx->probability_tables` (the persistent frame_context table inside the kernel), NOT from the compressed_hdr control. Omitting the memcpy is safe for rkvdec; document and skip.
**Anchor**: Phase 3 strace shows COMPRESSED_HDR payload size 2040 B; kernel never EINVAL'd → port produces correctly-sized struct. Field-level decode of the keyframe payload is deferred to Phase 7 byte-compare (the parser is the primary reference for itself; cross-validation is via "kernel decodes the same hash both ways" not "we manually decode the parser output").
### Clause 10 — Frame flags + reference_mode + interpolation_filter
```c
if (!picture->pic_fields.bits.frame_type) /* VAAPI inverts: 0 means keyframe */
frame.flags |= V4L2_VP9_FRAME_FLAG_KEY_FRAME;
if (picture->pic_fields.bits.show_frame) frame.flags |= V4L2_VP9_FRAME_FLAG_SHOW_FRAME;
if (picture->pic_fields.bits.error_resilient_mode) frame.flags |= V4L2_VP9_FRAME_FLAG_ERROR_RESILIENT;
if (picture->pic_fields.bits.intra_only) frame.flags |= V4L2_VP9_FRAME_FLAG_INTRA_ONLY;
if (picture->pic_fields.bits.allow_high_precision_mv)
frame.flags |= V4L2_VP9_FRAME_FLAG_ALLOW_HIGH_PREC_MV;
if (picture->pic_fields.bits.refresh_frame_context)
frame.flags |= V4L2_VP9_FRAME_FLAG_REFRESH_FRAME_CTX;
if (picture->pic_fields.bits.frame_parallel_decoding_mode)
frame.flags |= V4L2_VP9_FRAME_FLAG_PARALLEL_DEC_MODE;
if (picture->pic_fields.bits.subsampling_x) frame.flags |= V4L2_VP9_FRAME_FLAG_X_SUBSAMPLING;
if (picture->pic_fields.bits.subsampling_y) frame.flags |= V4L2_VP9_FRAME_FLAG_Y_SUBSAMPLING;
/* COLOR_RANGE_FULL_SWING: VAAPI doesn't expose; leave clear (BT.709 limited for BBB). */
/* reset_frame_context: FFmpeg uses (resetctx > 0 ? resetctx - 1 : 0).
* VAAPI's pic_fields.bits.reset_frame_context is 2 bits (0..3).
* V4L2 enum is 0..2. The off-by-one is because VP9 spec encodes
* "no reset" + 3 reset variants into 2 bits, but kernel enum drops
* the encoder helper offset. Follow FFmpeg's mapping verbatim: */
frame.reset_frame_context =
picture->pic_fields.bits.reset_frame_context > 0
? picture->pic_fields.bits.reset_frame_context - 1
: 0;
/* Phase 5 C1: VAAPI's mcomp_filter_type is ALREADY post-XOR.
* vaapi_vp9.c:62 applies `h->h.filtermode ^ (h->h.filtermode <= 1)`
* before storing into VAAPI's mcomp_filter_type. Re-applying the XOR
* here would undo the alignment and swap EIGHTTAP/EIGHTTAP_SMOOTH for
* inter frames — wrong loop-filter selection on BBB frames 2-5.
* Direct assignment is correct. */
frame.interpolation_filter = picture->pic_fields.bits.mcomp_filter_type;
/* Phase 5 C3: reference_mode is in v4l2_ctrl_vp9_frame (offset 160),
* NOT in v4l2_ctrl_vp9_compressed_hdr. Filled by Clause 9's parser
* via the out_reference_mode parameter. */
uint8_t parsed_reference_mode = 0; /* default for keyframe */
vp9_fill_compressed_hdr(&compressed_hdr,
surface_object->source_data + frame.uncompressed_header_size,
frame.compressed_header_size,
picture->pic_fields.bits.lossless_flag,
!picture->pic_fields.bits.frame_type ||
picture->pic_fields.bits.intra_only,
allowcompinter,
&parsed_reference_mode);
frame.reference_mode = parsed_reference_mode;
```
**Anchor**: Phase 3 verbatim — keyframe `reset_frame_context=0, interpolation_filter=0` (VAAPI's `mcomp_filter_type=0` XOR with (0 <= 1)=1 → 1 hmm). Phase 5 must verify the XOR remap empirically against the keyframe bytes.
**Phase 5 review point**: the FFmpeg-inferred mappings for `reset_frame_context` and `interpolation_filter` are tied to *FFmpeg's* internal enum order. VAAPI's enum order may differ. Phase 5 should empirically validate by decoding Phase 3's keyframe payload byte 144 (offset of `reset_frame_context`) and byte 149 (offset of `interpolation_filter`) and cross-checking with VAAPI's `pic_fields.bits` for the same frame. If they disagree, the FFmpeg-inferred remap is wrong.
### Clause 11 — Final 2-control batched submission
```c
struct v4l2_ext_control ctrls[2] = {
{ .id = V4L2_CID_STATELESS_VP9_FRAME,
.ptr = &frame, .size = sizeof frame },
{ .id = V4L2_CID_STATELESS_VP9_COMPRESSED_HDR,
.ptr = &compressed_hdr, .size = sizeof compressed_hdr },
};
rc = v4l2_set_controls(driver_data->video_fd,
surface_object->request_fd,
ctrls, 2);
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
return 0;
```
Mirrors iter3's Clause 10 with count=2 instead of count=1.
### Clause 12 — Bitstream offsetting
Backend hands the kernel the FULL frame bitstream via `surface_object->source_data` + `surface_object->source_size`. The kernel uses `picture->frame_header_length_in_bytes` as the start-of-compressed-header offset. The compressed header parser (Clause 9) reads `[uncompressed_header_size, uncompressed_header_size + compressed_header_size)` from the bitstream buffer.
```c
const uint8_t *compressed_hdr_start =
surface_object->source_data + frame.uncompressed_header_size;
uint32_t compressed_hdr_len = frame.compressed_header_size;
vp9_fill_compressed_hdr(&compressed_hdr,
compressed_hdr_start,
compressed_hdr_len);
/* Same buffer pointer used by Clause 6 for uncompressed-header parse,
* but with offset 0 + length = uncompressed_header_size. */
vp9_parse_uncompressed_header_lf_quant(
surface_object->source_data, surface_object->source_size,
frame.uncompressed_header_size,
&frame.lf, &frame.quant);
```
## Patch shape (commits)
iter4 implements as 5 commits (mitigation B + iter3-style ABCD):
### Commit Z — `src/request.c`: device-path auto-detect via media-controller topology (mitigation B, canonical form)
**Approach correction (post-implementation user redirect)**: the original plan walked `/dev/video0..15` + matched by `VIDIOC_QUERYCAP` driver name, then paired media node via a separate scan. This is brittle: dependence on enumeration order + assumption that video↔media pairing follows /dev/* numbering. The canonical v4l2-request discovery path drives discovery from the media controller graph instead (which is what FFmpeg's v4l2-request does):
```c
/* Algorithm:
* 1. Walk /dev/media0..15. MEDIA_IOC_DEVICE_INFO names the driver.
* Match against {rkvdec, hantro-vpu, cedrus, sun4i_csi}.
* 2. MEDIA_IOC_G_TOPOLOGY enumerates the entity/interface graph.
* The MEDIA_INTF_T_V4L_VIDEO interface carries major:minor of
* the V4L2 video node owned by THIS media controller — paired
* by the kernel, NOT by /dev/* enumeration order.
* 3. Resolve major:minor to /dev/videoN via /sys/dev/char/<M>:<N>
* (the kernel's char-device sysfs symlink whose basename is
* the device node name).
*/
static int find_codec_device(char *video_out, size_t video_out_sz,
char *media_out, size_t media_out_sz);
static int find_video_node_via_topology(int media_fd, char *video_out, size_t out_sz);
static int resolve_dev_node(uint32_t major, uint32_t minor, char *out, size_t out_sz);
```
In `RequestInit`:
```c
if (getenv("LIBVA_V4L2_REQUEST_NO_AUTODETECT")) {
video_path = "/dev/video0"; /* legacy escape hatch */
media_path = "/dev/media0";
} else {
video_path = getenv("LIBVA_V4L2_REQUEST_VIDEO_PATH");
if (video_path == NULL) {
static char auto_video[32], auto_media[32];
if (find_codec_device(auto_video, sizeof auto_video,
auto_media, sizeof auto_media) == 0) {
video_path = auto_video;
auto_media_set = true;
request_log("auto-selected codec device: %s + %s\n",
auto_video, auto_media);
} else {
video_path = "/dev/video0";
}
}
}
```
Predicted +152 LOC, 1 file modified. Build target after Commit Z: `vainfo` (no env) lists the auto-selected decoder's profiles, paired correctly via topology graph. Independent of VP9 — can be tested + merged before Commit A.
**End-user UX gap (documented, NOT fixed in iter4)**: backend opens ONE codec device at init. If user wants the OTHER decoder (default selects rkvdec; user wants hantro for MPEG-2/VP8), they still need env override. Aggregating BOTH decoders simultaneously requires multi-fd dispatch refactor; out of iter4 scope, cross-cutting backlog item iter4-B1.
**Verified on fresnel `7.0.0-fresnel-fourier` post-Phase-6**:
- `vainfo` (no env) → `auto-selected codec device: /dev/video1 + /dev/media0`, enumerates H.264×5 + HEVCMain (rkvdec) — pair came from topology graph, not from /dev/* numbering.
- `vainfo NO_AUTODETECT=1` → empty list (legacy /dev/video0 = rga).
- `vainfo` with explicit `/dev/video3` + `/dev/media1` → MPEG-2×2 + VP8.
### Commit A — `src/config.c`: VP9 enumeration + dispatch + entrypoints
3 sites mirroring iter3 commit A:
1. `RequestQueryConfigProfiles` (after VP8 enumeration block from iter3): add VP9 enumeration block probing `V4L2_PIX_FMT_VP9_FRAME` against single + MPLANE OUTPUT formats. Adds `VAProfileVP9Profile0`. ~10 LOC.
2. `RequestCreateConfig` (after VP8 case from iter3): add `case VAProfileVP9Profile0: break;` with comment block. ~5 LOC.
3. `RequestQueryConfigEntrypoints` (line ~180): add `case VAProfileVP9Profile0:` to existing fall-through. ~1 LOC.
Predicted +16 LOC, 1 file modified. Build target after Commit A: `vainfo` (with env override or post-commit-Z auto-detect) lists `VAProfileVP9Profile0` on rkvdec env.
### Commit B — NEW `src/vp9.c` + `src/vp9.h` + `src/meson.build` integration
Net-new `vp9.c` implements `vp9_set_controls()` per Clauses 1-12 above.
Predicted ~580 LOC for `vp9.c` (50 LOC infrastructure + 80 LOC VPX rac + 50 LOC uncompressed-header partial parse + 180 LOC compressed-header parser + 50 LOC frame-fill (Clauses 4-5,7,10) + 30 LOC of submission/wrap). ~40 LOC for `vp9.h`. +2 lines `meson.build`.
3 files (2 new + 1 modified). Build target after Commit B: vp9.o compiles standalone, picture.c can't dispatch yet.
### Commit C — `src/picture.c` + `src/surface.h`: dispatcher + buffer routing + union extension
5 sites:
1. `picture.c:34-37` include block: add `#include "vp9.h"`.
2. `picture.c::codec_set_controls`: add VP9 dispatch case calling `vp9_set_controls`. ~6 LOC.
3. `picture.c::codec_store_buffer`: add VP9 inner cases for `VAPictureParameterBufferType` and `VASliceParameterBufferType`. ~14 LOC. (NO `VAProbabilityBufferType` for VP9; NO `VAIQMatrixBufferType`. Confirmed in Phase 2 B8.)
4. `picture.c::RequestBeginPicture`: NO change predicted (VP9 doesn't have iter3-style `iqmatrix_set` flag — Picture/Slice always populated per frame by VAAPI consumer). Phase 2 B9 confirms.
5. `surface.h::object_surface::params` union: add `vp9` struct after `vp8`:
```c
struct {
VADecPictureParameterBufferVP9 picture;
VASliceParameterBufferVP9 slice;
} vp9;
```
Predicted +26 LOC, 2 files modified. Build target after Commit C: backend builds clean; mpv-vaapi VP9 decode should engage end-to-end on rkvdec.
### Commit D — fix-forward placeholder
Phase 2 B12 predicted no `buffer.c` changes (VP9's 3 buffer types — Picture, Slice, Data — already in iter3's allow-list). Per memory `feedback_runtime_enumerates_allowlists.md`, plan for fix-forward if Commit C runtime hits an allow-list miss; otherwise this commit slot stays empty.
## Files touched summary
| File | New | Modified | LOC delta | Commit |
|---|:-:|:-:|:-:|:-:|
| `src/request.c` | | ✓ | +35 | Z |
| `src/config.c` | | ✓ | +16 | A |
| `src/vp9.c` | ✓ | | +580 | B |
| `src/vp9.h` | ✓ | | +40 | B |
| `src/meson.build` | | ✓ | +2 | B |
| `src/picture.c` | | ✓ | +20 | C |
| `src/surface.h` | | ✓ | +6 | C |
**Total**: ~699 LOC, 7 files (2 new + 5 modified). 4 commits (Z, A, B, C) + optional D. Notably bigger than iter3 (308 LOC) because of: device-path mitigation (35) + uncompressed-header partial parse (50) + compressed-header parser (180) + VPX rac (80).
## Cross-cutting backlog (out of iter4 scope)
Items inherited + NEW from iter4:
- **iter4-B1** (NEW) Backend opens ONE codec device at init (rkvdec OR hantro). Aggregating both for unified profile enumeration requires multi-fd dispatch refactor. Defer.
- **iter4-B2** (NEW) ffmpeg-vaapi / mpv-vaapi `Could not create device` failure mode persists even with env override. Likely a vaapi-DRM render-node path issue separate from device-path. Investigate in Phase 6 if HW=SW byte-compare fails.
- **iter4-Q6** (NEW) VAAPI per-segment `seg_param[s]` fields are EFFECTIVE quant scales; kernel wants ALT_Q absolute or delta. Mapping back is non-trivial; left zeros for BBB (segmentation disabled). Document as fidelity gap for non-BBB fixtures.
- **iter4-COLOR_RANGE** (NEW) VAAPI doesn't expose color_range; backend leaves `V4L2_VP9_FRAME_FLAG_COLOR_RANGE_FULL_SWING` clear (BT.709 limited). Wrong for full-range JPEG-encoded VP9.
- **B5/B6** mpeg2 vbv polish + h265 SPS bitstream parse (carried from iter1+iter2).
- **L3** vaDeriveImage cache-stale on RK3399 — workaround: DMA-BUF GL only.
## Phase 5 review prep
Submitting this plan for second-model review (sonnet-architect). Key questions for the reviewer (per memory `feedback_review_empirical_over_theoretical.md` Direction 2 — empirical-over-theoretical in BOTH directions):
1. **Uncompressed-header parser correctness (Clause 6)**: empirically decode the first ~200 bytes of `bbb_720p10s_vp9.webm` keyframe and confirm `lf.ref_deltas={1,0,-1,-1}, lf.mode_deltas={0,0}, lf.flags=3, quant.base_q_idx=46` are the *correct* parse results — not just the kernel-direct's pre-formatted output. If the spec says the bits encode something different, the parser is wrong even if kernel-direct happens to match.
2. **`reset_frame_context` and `interpolation_filter` remap (Clause 10)**: empirically extract these bytes from Phase 3 strace payload and cross-check FFmpeg's XOR/-1 remap against the bytes' literal interpretation as VP9 spec enums.
3. **Compile-time size assertions (Clause 3)**: are 168/2040 stable across kernel UAPI versions, or will a 7.1+ kernel grow them again? If unstable, replace with a runtime size assertion via `VIDIOC_QUERY_EXT_CTRL` + `flags & V4L2_CTRL_FLAG_DYNAMIC_ARRAY`. Phase 5 reviewer call.
4. **Per-segment mapping (Clause 7)**: BBB doesn't exercise segmentation. For non-BBB segmentation-enabled fixtures (out of iter4 scope), is the planned `seg_param[s].luma_ac_quant_scale``feature_data[s][ALT_Q]` mapping fundamentally wrong (effective scale vs delta), or just lossy? Document the gap clearly.
5. **Test compile field availability**: per Direction 2, every VAAPI field-name reference in this plan should be `gcc -c` test-compiled before Phase 6. Reviewer should verify the access list in Clauses 4, 5, 7, 10.
6. **Mitigation B regression risk**: `request.c` is shared with all 5 already-shipping codecs. Could the walk-and-pick-first logic regress any existing test fixture if env vars happen to be unset by accident? Phase 5 should suggest a safety knob (e.g., `LIBVA_V4L2_REQUEST_NO_AUTODETECT=1` to force old `/dev/video0` behavior).
7. **Lossless flag mapping (Clause 9)**: VAAPI's `pic_fields.bits.lossless_flag` — is it set the same way as FFmpeg's `s->s.h.lossless`? VAAPI comment says "LosslessFlag = base_qindex == 0 && y_dc_delta_q == 0 && uv_dc_delta_q == 0 && uv_ac_delta_q == 0" — check that semantics align.
## Phase 1 criteria → Phase 4 plan trace
| Criterion | Plan addresses |
|---|---|
| 1. vainfo enumerates VP9Profile0 | Commit Z (device-path) + Commit A (`RequestQueryConfigProfiles` enumeration block) |
| 2. vaCreateConfig SUCCESS | Commit A — `RequestCreateConfig` case + `RequestQueryConfigEntrypoints` |
| 3. ffmpeg-vaapi VP9 exit 0 | Commits Z+A+B+C end-to-end; Clauses 1+4+5+11 + parsers |
| 4. mpv VP9 HW=SW byte-identical | Commits Z+A+B+C decode correctness + Phase 3 SW PNGs as Phase 7 anchor; engagement via `mpv -v` log per memory `feedback_hw_decode_engagement_check.md` |
| 5. 4-codec regression | Commit Z restores rkvdec-codec baseline without env override (H.264, HEVC). MPEG-2, VP8 (hantro) still require `LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3` env override per Phase 5 C4. Commits A+B+C add new VP9 path purely additively (no shared-state mutation in MPEG-2/HEVC/H.264/VP8 paths). |
## Substrate state at Phase 4 close
- Phase 0+1+2+3 commits at gitea (`9a71dbf`, `2651e4c`+`56abe3d` ID-correction, `56abe3d`).
- Fork at iter3 tip `e1aca9c` on noether; Phase 6 patches will land here.
- All Phase 3 anchors captured + preserved on fresnel `/tmp/iter4_phase3/` and `noether:~/src/fresnel-fourier/iter4_phase3.tgz`.
- Memory rules carry forward; new `reference_fresnel_kernel_substrate.md` covers post-besser substrate.
- Phase 4 plan ready for sonnet-architect review (Phase 5).
## Phase 5 amendments incorporated (post-review)
Sonnet-architect review (`phase5_iter4_review.md`) returned 4 Critical findings + 6 Suggested. All 4 Critical findings empirically verified by author before incorporation per memory `feedback_review_empirical_over_theoretical.md` Direction 2. Amendments applied in-place above:
| ID | Site | Change | Verified |
|---|---|---|---|
| C1 | Clause 10 | `frame.interpolation_filter = picture->pic_fields.bits.mcomp_filter_type` (direct, NO XOR; vaapi_vp9.c:62 already applies the XOR before storing into VAAPI's mcomp_filter_type) | ✓ grep `vaapi_vp9.c:62` |
| C2 | Clause 6 | Add persistent `vp9_lf.{ref_deltas[4], mode_deltas[2]}` to `object_context`, init to VP9 spec `{1,0,-1,-1,0,0}`, update only when parser sees `lf_delta.update=1`, ALWAYS copy to kernel control. Without persistence, BBB inter frames send `{0,0,0,0,0,0}` instead of `{1,0,-1,-1,0,0}` → kernel applies wrong loop-filter strength | ✓ kernel UAPI `v4l2-controls.h:2578` "users should pass its last value"; FFmpeg `vp9.c:666-671` defaults |
| C3 | Clauses 9 + 10 | Add `out_reference_mode` parameter to `vp9_fill_compressed_hdr`. Without it, Clause 10's `frame.reference_mode = compressed_hdr_reference_mode` is undefined identifier (compile failure). `reference_mode` lives in v4l2_ctrl_vp9_frame, not _compressed_hdr; FFmpeg uses local `enum CompPredMode comppredmode` and threads it through return path | ✓ `v4l2-controls.h:2763` (FRAME) + `v4l2_request_vp9.c:104` (local var) |
| C4 | Mitigation B + criterion-5 trace + request.c skeleton | (a) Qualify wording: "Restores rkvdec-codec baseline" (NOT all 4 codecs); MPEG-2/VP8 still need env override for hantro path. (b) Add `LIBVA_V4L2_REQUEST_NO_AUTODETECT=1` escape hatch for legacy callers | ✓ Plan line 12 vs 393 contradiction; Phase 3 baseline § 4-codec regression table |
S1, S2, S3, S5, S6 either confirm correctness of existing plan or are scope-aligned non-issues; no amendment needed. S4 (`uv_mode` memcpy omission) baked into Clause 9's amended text.
Without the Phase 5 review, iter4 Phase 6 would have:
- **Failed first compile attempt** (C3) — undefined `compressed_hdr_reference_mode` identifier
- **Then produced visible-but-wrong output** (C1 + C2) — Phase 7 byte-compare would have diverged on inter frames (interpolation_filter swapped, ref_deltas zeroed)
- **Then triggered user confusion when MPEG-2/VP8 didn't engage** (C4) — false expectation that `vainfo` would just work
Estimated saving: 1 Phase 6 compile failure + 1 Phase 7 → Phase 4 loopback for inter-frame byte mismatch + 1 documentation correction. Per memory rule "Reviews are never skippable."