fresnel-fourier/phase2_iter4_situation.md
marfrit 56abe3d6a2 iter4 Phase 3: VP9 baseline + 4-codec regression on 7.0 substrate
Captured on linux-fresnel-fourier 7.0-1 (post 6.19 decommission).

VP9 baseline (kernel-direct via ffmpeg-v4l2request on rkvdec):
- 5-frame SW reference PNG SHA256 anchors (criterion-4)
- VIDIOC_S_EXT_CTRLS strace with full payload at -s 16384
- Empirical struct sizes 168 B (FRAME) / 2040 B (COMPRESSED_HDR)
  supersede Phase 2 estimates of 144 / 1947
- Probe pattern: count=1 (FRAME-only) then count=2 (FRAME + COMPRESSED_HDR)

Phase 2 doc fix: control IDs corrected 0xa40b2c/d -> 0xa40a2c/d.

4-codec regression (H.264, MPEG-2, HEVC, VP8): all fall back to SW on
default config because /dev/video0 is now rockchip-rga (RGB color
converter), not a codec device. Fork hardcodes /dev/video0 in
request.c:149. Env override LIBVA_V4L2_REQUEST_VIDEO_PATH /
_MEDIA_PATH restores per-driver profile enumeration; mitigation A/B/C
queued for user decision.

New contract clauses surfaced:
- Clause 11: uncompressed-header partial parse for lf_delta /
  base_q_idx (VAAPI doesn't expose these; keyframe ref_deltas non-zero
  for BBB so leave-at-zero is wrong)
- Clause 12: compile-time sizeof asserts on the two control structs
  so future UAPI shifts fail loudly

iter4_phase3.tgz: full Phase 3 artifact bundle (strace + PNG refs).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 20:31:53 +00:00

# Iteration 4 — Phase 2 (situation analysis)
Source-read of every file the iter4 patch series will touch, plus the kernel UAPI + VAAPI + downstream FFmpeg + kernel rkvdec reference sources. Conducted on noether against fork tip `e1aca9c` (iter3 close).
This is a contract-before-code analysis per `feedback_dev_process.md` Phase 2: enumerate the bugs, cite the contract verbatim, predict the patch shape, queue the Phase 3 baseline questions.
## Critical finding: rkvdec requires VP9_COMPRESSED_HDR
The biggest scope-shaping discovery: **rkvdec on RK3399 requires `V4L2_CID_STATELESS_VP9_COMPRESSED_HDR`**; the control is mandatory, not optional. From `drivers/staging/media/rkvdec/rkvdec-vp9.c::rkvdec_vp9_run_preamble` lines 740-754:
```c
ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_VP9_FRAME);
if (WARN_ON(!ctrl))
	return -EINVAL;
dec_params = ctrl->p_cur.p;
...
ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_VP9_COMPRESSED_HDR);
if (WARN_ON(!ctrl))
	return -EINVAL; /* ← rkvdec WILL fail without compressed-header probs */
prob_updates = ctrl->p_cur.p;
vp9_ctx->cur.tx_mode = prob_updates->tx_mode;
...
v4l2_vp9_fw_update_probs(&vp9_ctx->probability_tables, prob_updates, dec_params);
```
VAAPI does NOT expose compressed-header probability updates (per `va_dec_vp9.h:50-192` — only frame parameters + segmentation, no probability deltas; vendor VAAPI drivers parse compressed header in firmware/GPU). So **the libva backend must parse the compressed header itself** via a VPX boolean decoder.
This makes iter4's scope significantly larger than iter3's VP8 work.
## Bug enumeration (sites the iter4 patch series must touch)
### B1 — `src/config.c::RequestQueryConfigProfiles` — VP9 enumeration block missing
**Site**: `config.c:121-160`.
**Bug**: no analogous block for `V4L2_PIX_FMT_VP9_FRAME` → `VAProfileVP9Profile0`. Same starting condition as iter3 VP8.
**Patch shape**: ADD enumeration block after iter3's VP8 block. ~10 LOC.
### B2 — `src/config.c::RequestCreateConfig` — VP9 case label missing
**Site**: `config.c:54-78`.
**Bug**: no `case VAProfileVP9Profile0:`. Mirror iter3 VP8 pattern. ~5 LOC.
### B3 — `src/config.c::RequestQueryConfigEntrypoints` — VP9 case missing
**Site**: `config.c:167-191`.
**Bug**: missing in fall-through case list. ~1 LOC.
### B4 — `src/vp9.c` — file does not exist; needs net-new implementation
**Site**: NEW FILE `src/vp9.c`.
**Patch shape**: NEW file, ~500-600 LOC (substantially larger than iter3 vp8.c due to compressed-header parser):
- Includes block
- Static `inv_map_table[255]` — direct copy from FFmpeg `v4l2_request_vp9.c:43-64`
- VPX range coder helpers (port from FFmpeg `vp89_rac.h` + boolean decoder primitives) — ~80 LOC
- `vp9_fill_frame()` — fill `v4l2_ctrl_vp9_frame` from VAAPI `VADecPictureParameterBufferVP9` + `VASliceParameterBufferVP9` — ~150 LOC
- `vp9_fill_compressed_hdr()` — parse compressed header bits from `surface_object->source_data + uncompressed_header_size`, populate `v4l2_ctrl_vp9_compressed_hdr` — ~180 LOC (port from FFmpeg `fill_compressed_hdr` lines 99-261)
- `vp9_set_controls()` — entry point, allocates both structs, calls `vp9_fill_frame` + `vp9_fill_compressed_hdr`, batched 2-element `v4l2_ext_control` array, single `v4l2_set_controls` call
### B5 — `src/vp9.h` — header does not exist
**Site**: NEW FILE `src/vp9.h`.
**Patch shape**: declare `vp9_set_controls()`. Mirror iter3 vp8.h.
### B6 — Possibly `src/vp9_rac.h` — VPX range decoder helpers (decision point)
**Site**: NEW FILE candidate `src/vp9_rac.h`.
VP9 boolean decoder primitives (`vpx_rac_get_prob_branchy`, `vp89_rac_get`, `vp89_rac_get_uint`, init function) are needed by `vp9_fill_compressed_hdr`. Two design options:
- **Option A**: inline the ~80 LOC of decoder helpers directly in `vp9.c`. Simpler; one file. Recommended for first cut.
- **Option B**: separate `vp9_rac.h`/`vp9_rac.c`. Mirrors FFmpeg's `vp89_rac.h` upstream pattern. More files, easier reuse if AV1/VP10 work follows.
**Phase 4 plan locks Option A** unless Phase 5 review surfaces a reason for Option B.
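For orientation, the decoder primitives that Option A would inline can be sketched as below. This is a minimal sketch written from the VP9 specification's bit-by-bit description of the boolean decoder, not a port of FFmpeg's optimized `vp89_rac.h`; the `vp9_bool_*` names and the struct layout are hypothetical and do not exist in the fork.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state; value/range follow the VP9 specification's
 * BoolValue / BoolRange variables, not FFmpeg's range-coder struct. */
struct vp9_bool_dec {
	const uint8_t *buf; /* compressed-header bytes */
	size_t size;        /* number of bytes in buf */
	size_t bit_pos;     /* next unread bit, MSB first */
	uint32_t value;
	uint32_t range;
};

/* Fetch one raw bit; past the end of the buffer, feed zero bits. */
static unsigned vp9_bool_next_bit(struct vp9_bool_dec *d)
{
	size_t byte = d->bit_pos >> 3;
	unsigned bit = 0;

	if (byte < d->size)
		bit = (d->buf[byte] >> (7 - (d->bit_pos & 7))) & 1;
	d->bit_pos++;
	return bit;
}

/* Decode one boolean whose probability of being 0 is prob/256. */
static unsigned vp9_bool_read(struct vp9_bool_dec *d, unsigned prob)
{
	uint32_t split;
	unsigned bit;

	assert(prob >= 1 && prob <= 255);
	split = 1 + (((d->range - 1) * prob) >> 8);
	if (d->value < split) {
		d->range = split;
		bit = 0;
	} else {
		d->range -= split;
		d->value -= split;
		bit = 1;
	}
	/* Renormalize one bit at a time until range is back in [128, 255]. */
	while (d->range < 128) {
		d->range <<= 1;
		d->value = (d->value << 1) | vp9_bool_next_bit(d);
	}
	return bit;
}

/* Returns 0 on success, -1 on an empty buffer or a set marker bit. */
static int vp9_bool_init(struct vp9_bool_dec *d, const uint8_t *buf, size_t size)
{
	if (!buf || size == 0)
		return -1;
	d->buf = buf;
	d->size = size;
	d->bit_pos = 8; /* the first byte seeds value */
	d->value = buf[0];
	d->range = 255;
	return vp9_bool_read(d, 128) ? -1 : 0;
}

/* Read an n-bit unsigned literal, MSB first, each bit at probability 128. */
static unsigned vp9_bool_read_literal(struct vp9_bool_dec *d, int n)
{
	unsigned v = 0;

	while (n-- > 0)
		v = (v << 1) | vp9_bool_read(d, 128);
	return v;
}
```

A production port would likely read whole bytes at a time as FFmpeg does; the bitwise form is chosen here only because it is easy to check against the specification.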
### B7 — `src/picture.c::codec_set_controls` — VP9 dispatch case missing
**Site**: `picture.c:188-225`.
**Patch shape**: ADD `case VAProfileVP9Profile0:` calling `vp9_set_controls`. ~6 LOC.
### B8 — `src/picture.c::codec_store_buffer` — 2 VAAPI buffer types unmapped
VAAPI VP9 sends three buffer types per frame (per `va_dec_vp9.h:58-303`). Only the two parameter buffers need new cases; the slice data buffer is codec-agnostic:
| VAAPI buffer type | VAAPI struct | Per-frame |
|---|---|---|
| `VAPictureParameterBufferType` | `VADecPictureParameterBufferVP9` | once |
| `VASliceParameterBufferType` | `VASliceParameterBufferVP9` (with `seg_param[8]`) | once |
| `VASliceDataBufferType` | raw bitstream | once |
**Different from iter3 VP8**: no `VAProbabilityBufferType` (VP9 keeps probability state in the picture/slice params + parsed compressed header), no `VAIQMatrixBufferType` (VP9 keeps quantization in the slice's per-segment seg_param array). Just 2 cases vs VP8's 4.
**Patch shape**: 2 nested case adds in `codec_store_buffer` outer switch + inner profile dispatch. ~14 LOC total.
### B9 — `src/picture.c::RequestBeginPicture` — per-frame VP9 reset
**Site**: `picture.c:299-302`.
**Bug**: VP9 doesn't have an iqmatrix_set / probability_set flag pattern; the picture/slice params are unconditionally fully-populated by VAAPI consumer per frame. Possibly NO reset needed (analogous to MPEG-2's iqmatrix-only pattern but even simpler).
**Patch shape**: likely no edit. If Phase 5 review reveals a hidden state-leak risk (e.g., VAAPI reusing the surface for a new context with stale params), add reset for `params.vp9.<some-flag>`. Default plan: no reset added; revisit if Phase 7 byte-compare shows stale state.
### B10 — `src/surface.h::object_surface::params` union — no `vp9` member
**Site**: `surface.h:92-119`.
**Patch shape**: ADD `vp9` struct after `vp8`:
```c
struct {
VADecPictureParameterBufferVP9 picture;
VASliceParameterBufferVP9 slice;
} vp9;
```
`VASliceParameterBufferVP9` is large (~340 bytes — `seg_param[8]` × ~40 bytes each); `VADecPictureParameterBufferVP9` ~80 bytes. Union grows by ~420 bytes from this; still dominated by `params.h265` with its 64-slot slices[64] array (~17 KB).
### B11 — `src/meson.build` — `vp9.c` + `vp9.h` not in lists
**Site**: `meson.build:30-74`.
**Patch shape**: insert `'vp9.c'` after `'vp8.c'` in sources, insert `'vp9.h'` after `'vp8.h'` in headers. +2 lines.
### B12 — `src/buffer.c` — buffer-type allow-list (predicted no change needed)
**Site**: `buffer.c:59-70`.
VP9 uses `VAPictureParameterBufferType` + `VASliceParameterBufferType` + `VASliceDataBufferType` — all three already in the allow-list (used by H.264 + iter3 VP8). **Predicted no change needed.**
Per memory `feedback_runtime_enumerates_allowlists.md`: plan for fix-forward Commit D if a runtime miss surfaces (would be unexpected for VP9 given the buffer types are H.264-shape; but the iter3 lesson is "don't audit exhaustively — let runtime enumerate").
### Non-bugs (intentionally NOT touched)
- `src/context.c` — no DECODE_MODE/START_CODE menus for VP9 (per FFmpeg V4L2 ref `v4l2_request_vp9.c:487-503`: `v4l2_request_vp9_init` doesn't issue any device-wide menu sets; per-frame batch only). **No context.c changes.**
- `src/video.c::formats[]` — CAPTURE-side format list (NV12); VP9 is OUTPUT-side fourcc, probed via `v4l2_find_format()` in config.c. **No video.c changes.**
- `src/v4l2.c` — fourcc-agnostic helpers. **No v4l2.c changes.**
- `include/hevc-ctrls.h` — already includes `<linux/v4l2-controls.h>` which holds VP9 control IDs.
## Contract surface (verbatim)
### Kernel UAPI: `V4L2_CID_STATELESS_VP9_FRAME` (`<linux/v4l2-controls.h>:2696`)
```c
#define V4L2_CID_STATELESS_VP9_FRAME (V4L2_CID_CODEC_STATELESS_BASE + 300)
/* = 0xa40a2c */
struct v4l2_ctrl_vp9_frame {
struct v4l2_vp9_loop_filter lf; /* 16 bytes; ref_deltas[4] + mode_deltas[2]
+ level + sharpness + flags + reserved[7] */
struct v4l2_vp9_quantization quant; /* 8 bytes; base_q_idx + 3 deltas + reserved[4] */
struct v4l2_vp9_segmentation seg; /* 88 bytes; __s16 feature_data[8][4] + feature_enabled[8]
+ tree_probs[7] + pred_probs[3] + flags + reserved[5] */
__u32 flags; /* 10 V4L2_VP9_FRAME_FLAG_* bits per
<linux/v4l2-controls.h>:2665-2674 */
__u16 compressed_header_size;
__u16 uncompressed_header_size;
__u16 frame_width_minus_1;
__u16 frame_height_minus_1;
__u16 render_width_minus_1;
__u16 render_height_minus_1;
__u64 last_frame_ts; /* per-VASurfaceID timestamp lookup */
__u64 golden_frame_ts;
__u64 alt_frame_ts;
__u8 ref_frame_sign_bias; /* OR of V4L2_VP9_SIGN_BIAS_{LAST,GOLDEN,ALT} */
__u8 reset_frame_context; /* V4L2_VP9_RESET_FRAME_CTX_* (0..2) */
__u8 frame_context_idx;
__u8 profile;
__u8 bit_depth;
__u8 interpolation_filter;
__u8 tile_cols_log2;
__u8 tile_rows_log2;
__u8 reference_mode;
__u8 reserved[7];
};
```
Total size: 168 bytes, the empirical size from the Phase 3 strace capture, which supersedes the earlier ~144 estimate (vs iter3 VP8's 1232 bytes; much smaller because VP9_FRAME carries no entropy table, which lives in COMPRESSED_HDR).
### Kernel UAPI: `V4L2_CID_STATELESS_VP9_COMPRESSED_HDR` (`<linux/v4l2-controls.h>:2797`)
```c
#define V4L2_CID_STATELESS_VP9_COMPRESSED_HDR (V4L2_CID_CODEC_STATELESS_BASE + 301)
/* = 0xa40a2d */
struct v4l2_ctrl_vp9_compressed_hdr {
__u8 tx_mode; /* V4L2_VP9_TX_MODE_* (0..4) */
__u8 tx8[2][1];
__u8 tx16[2][2];
__u8 tx32[2][3];
__u8 coef[4][2][2][6][6][3]; /* HUGE: 1728 bytes */
__u8 skip[3];
__u8 inter_mode[7][3];
__u8 interp_filter[4][2];
__u8 is_inter[4];
__u8 comp_mode[5];
__u8 single_ref[5][2];
__u8 comp_ref[5];
__u8 y_mode[4][9];
__u8 uv_mode[10][9];
__u8 partition[16][3];
struct v4l2_vp9_mv_probs mv; /* 69 bytes; joint/sign/classes/class0_bit/bits/etc */
};
```
Total size: 2040 bytes (empirical size from the Phase 3 strace capture, superseding the earlier ~1947 estimate). Filled by parsing the compressed header bits via VPX boolean decoder + `inv_map_table[]` (per FFmpeg `v4l2_request_vp9.c:99-261`).
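The size arithmetic for both control structs can be checked at compile time. The sketch below uses local `mirror_*` types (hypothetical names) so that it builds without recent kernel headers; the sub-struct layouts, in particular the 16-bit `feature_data` entries and the `mv` array dimensions, are assumptions taken from my reading of the UAPI header and should be verified against `<linux/v4l2-controls.h>`. The expected totals are the 168 B / 2040 B empirical sizes reported in the Phase 3 commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Local mirrors for illustration only. Real code would include
 * <linux/v4l2-controls.h> and assert on the kernel's own struct names. */
struct mirror_vp9_loop_filter {            /* 16 bytes */
	int8_t ref_deltas[4], mode_deltas[2];
	uint8_t level, sharpness, flags, reserved[7];
};
struct mirror_vp9_quantization {           /* 8 bytes */
	uint8_t base_q_idx;
	int8_t delta_q_y_dc, delta_q_uv_dc, delta_q_uv_ac;
	uint8_t reserved[4];
};
struct mirror_vp9_segmentation {           /* 88 bytes */
	int16_t feature_data[8][4];            /* assumed 16-bit: 64 bytes */
	uint8_t feature_enabled[8], tree_probs[7], pred_probs[3];
	uint8_t flags, reserved[5];
};
struct mirror_vp9_frame {
	struct mirror_vp9_loop_filter lf;
	struct mirror_vp9_quantization quant;
	struct mirror_vp9_segmentation seg;
	uint32_t flags;
	uint16_t compressed_header_size, uncompressed_header_size;
	uint16_t frame_width_minus_1, frame_height_minus_1;
	uint16_t render_width_minus_1, render_height_minus_1;
	uint64_t last_frame_ts, golden_frame_ts, alt_frame_ts;
	uint8_t ref_frame_sign_bias, reset_frame_context, frame_context_idx;
	uint8_t profile, bit_depth, interpolation_filter;
	uint8_t tile_cols_log2, tile_rows_log2, reference_mode;
	uint8_t reserved[7];
};
struct mirror_vp9_mv_probs {               /* 69 bytes (assumed dimensions) */
	uint8_t joint[3], sign[2], classes[2][10], class0_bit[2];
	uint8_t bits[2][10], class0_fr[2][2][3], fr[2][3];
	uint8_t class0_hp[2], hp[2];
};
struct mirror_vp9_compressed_hdr {
	uint8_t tx_mode;
	uint8_t tx8[2][1], tx16[2][2], tx32[2][3];
	uint8_t coef[4][2][2][6][6][3];        /* 1728 bytes */
	uint8_t skip[3], inter_mode[7][3], interp_filter[4][2];
	uint8_t is_inter[4], comp_mode[5], single_ref[5][2], comp_ref[5];
	uint8_t y_mode[4][9], uv_mode[10][9], partition[16][3];
	struct mirror_vp9_mv_probs mv;
};

/* Fail the build loudly if a layout drifts from the captured sizes. */
static_assert(sizeof(struct mirror_vp9_frame) == 168,
	      "VP9_FRAME layout differs from the Phase 3 capture");
static_assert(sizeof(struct mirror_vp9_compressed_hdr) == 2040,
	      "VP9_COMPRESSED_HDR layout differs from the Phase 3 capture");
```

In the backend itself the same two assertions would target `struct v4l2_ctrl_vp9_frame` and `struct v4l2_ctrl_vp9_compressed_hdr` directly, so a future UAPI change stops the build instead of producing a short or long `S_EXT_CTRLS` payload.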
The kernel uses these as PROBABILITY UPDATES (not absolutes): a value of zero in any array element means "no update — keep prior probability." The kernel runs `v4l2_vp9_fw_update_probs(&probability_tables, prob_updates, dec_params)` to apply updates per `rkvdec-vp9.c:796`.
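The "zero means keep" rule can be illustrated with the per-element update as the VP9 specification describes it (`inv_remap_prob` with `inv_recenter_nonneg`). This is a sketch written from the specification, under the assumption that the kernel helper behaves the same way; `apply_prob_update` is a hypothetical name, and `delta` stands for a value that userspace has already passed through `inv_map_table[]`.

```c
#include <assert.h>

/* inv_recenter_nonneg(v, m) as given in the VP9 specification. */
static int inv_recenter_nonneg(int v, int m)
{
	if (v > 2 * m)
		return v;
	if (v & 1)
		return m - ((v + 1) >> 1);
	return m + (v >> 1);
}

/* Apply one probability update. delta == 0 means "no update", so the
 * prior probability is returned unchanged; any other delta moves the
 * probability relative to its current value. */
static int apply_prob_update(int delta, int prob)
{
	assert(prob >= 1 && prob <= 255);
	if (delta == 0)
		return prob;
	return prob <= 128 ? 1 + inv_recenter_nonneg(delta, prob - 1)
			   : 255 - inv_recenter_nonneg(delta, 255 - prob);
}
```

For example, with a prior of 128 a delta of 1 yields 127 and a delta of 2 yields 129, while a delta of 0 leaves any prior untouched. This is why the backend must zero-initialize the struct and write only the entries whose update bit was set.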
### VAAPI buffer types
`VADecPictureParameterBufferVP9` (`va_dec_vp9.h:58-192`):
- `frame_width`, `frame_height` (u16)
- `reference_frames[8]` — 8-entry DPB (vs VP8's 3)
- `pic_fields.bits.{...}` — 27 single-bit/multi-bit fields (subsampling_x/y, frame_type, show_frame, error_resilient_mode, intra_only, allow_high_precision_mv, mcomp_filter_type[3 bits], frame_parallel_decoding_mode, reset_frame_context[2 bits], refresh_frame_context, frame_context_idx[2 bits], segmentation_*, last/golden/alt_ref_frame[3 bits each, indexes into reference_frames[8]], *_sign_bias, lossless_flag)
- `filter_level`, `sharpness_level` (u8)
- `log2_tile_rows`, `log2_tile_columns` (u8)
- `frame_header_length_in_bytes` — uncompressed_header_size (u8 — note 8-bit width may overflow for super-frames; typical < 256 for BBB)
- `first_partition_size` — compressed_header_size (u16)
- `mb_segment_tree_probs[7]`, `segment_pred_probs[3]` (u8)
- `profile`, `bit_depth` (u8)
`VASliceParameterBufferVP9` (`va_dec_vp9.h:279-303`):
- `slice_data_size`, `slice_data_offset`, `slice_data_flag` (u32)
- `seg_param[8]` — array of `VASegmentParameterVP9` (~40 bytes each):
- `segment_flags.fields.{segment_reference_enabled, segment_reference[2 bits], segment_reference_skipped}` (u16 packed)
- `filter_level[4][2]` (u8) — per-ref-frame × per-mode loop filter levels
- `luma_ac_quant_scale`, `luma_dc_quant_scale`, `chroma_ac_quant_scale`, `chroma_dc_quant_scale` (s16) — already-computed effective scale per segment
### FFmpeg V4L2 reference (`v4l2_request_vp9.c`)
Submission shape: 2 batched controls per frame in single `S_EXT_CTRLS`:
```c
control[0] = { .id = V4L2_CID_STATELESS_VP9_FRAME, ... };
control[1] = { .id = V4L2_CID_STATELESS_VP9_COMPRESSED_HDR, ... };
v4l2_set_controls(..., control, 2);
```
The COMPRESSED_HDR control is conditionally-included based on a runtime probe (`v4l2_request_vp9_post_frames_ctx` queries the kernel; if the control isn't advertised, falls back to FRAME-only). For rkvdec on RK3399, the kernel advertises COMPRESSED_HDR — verified at `rkvdec-vp9.c:752` (kernel WILL EINVAL if not provided).
### Kernel rkvdec driver (`rkvdec-vp9.c`)
Key reads in `rkvdec_vp9_run_preamble`:
- VP9_FRAME control → `dec_params = ctrl->p_cur.p` → drives register programming via `config_registers()`.
- VP9_COMPRESSED_HDR control → `prob_updates = ctrl->p_cur.p` → applied via `v4l2_vp9_fw_update_probs()`.
- 8-entry reference frame DPB resolved from FRAME's `last_frame_ts`/`golden_frame_ts`/`alt_frame_ts` (only 3 active references at a time, despite VAAPI exposing 8 — kernel uses last/golden/alt indexes into the picture's 8-frame DPB).
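Resolving the three active references from the 8-slot DPB amounts to one indexed lookup plus a timestamp conversion per reference. The sketch below assumes a hypothetical `dpb_surface` stand-in for the fork's per-surface state and returns 0 for an empty slot, which is an assumption about the desired behaviour rather than something the sources confirm.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

#define VP9_DPB_SLOTS 8

/* Hypothetical stand-in for the per-surface state: only the timestamp
 * that was attached to the surface's OUTPUT buffer. */
struct dpb_surface {
	int valid;
	struct timeval timestamp;
};

/* Same arithmetic as the UAPI helper v4l2_timeval_to_ns(). */
static uint64_t timeval_to_ns(const struct timeval *tv)
{
	return (uint64_t)tv->tv_sec * 1000000000ULL +
	       (uint64_t)tv->tv_usec * 1000ULL;
}

/* Resolve one of last/golden/alt. ref_idx is the 3-bit index taken from
 * pic_fields; an empty slot yields 0 (assumed, e.g. for a keyframe). */
static uint64_t vp9_ref_ts(const struct dpb_surface dpb[VP9_DPB_SLOTS],
			   unsigned ref_idx)
{
	assert(ref_idx < VP9_DPB_SLOTS);
	if (!dpb[ref_idx].valid)
		return 0;
	return timeval_to_ns(&dpb[ref_idx].timestamp);
}
```

The caller would invoke `vp9_ref_ts` three times, once each with the `last_ref_frame`, `golden_ref_frame` and `alt_ref_frame` indices.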
## Mapping table (VAAPI → V4L2 / kernel)
The libva backend's job: read VAAPI's per-frame buffers (Picture + Slice) AND parse the compressed header from the bitstream, write the kernel's two structs.
### `v4l2_ctrl_vp9_frame` mapping
| Kernel field | VAAPI source | Notes |
|---|---|---|
| `lf.ref_deltas[4]` | NOT in VAAPI | VAAPI doesn't expose loop-filter ref deltas separately; FFmpeg's V4L2 ref reads from VP9Context internal state. **Open question Phase 3**: are these zero in the BBB fixture? |
| `lf.mode_deltas[2]` | NOT in VAAPI | same |
| `lf.level` | `picture->filter_level` | direct |
| `lf.sharpness` | `picture->sharpness_level` | direct |
| `lf.flags` | NOT in VAAPI | DELTA_ENABLED + DELTA_UPDATE bits — ditto |
| `quant.base_q_idx` | DERIVED (no direct VAAPI exposure) | **Open question Phase 3**: VAAPI exposes per-segment `seg_param[s].luma_ac_quant_scale` but those are EFFECTIVE Q-scales, not the base index. Inverse-derive from `seg_param[0].luma_ac_quant_scale` via VP9 spec quantization table? Or leave zero and let kernel use default? |
| `quant.delta_q_y_dc/uv_dc/uv_ac` | NOT in VAAPI | same — VAAPI only exposes effective per-segment scales |
| `seg.feature_data[8][4]` | DERIVED from `slice->seg_param[s].filter_level[][]` + quant scales | mapping non-trivial |
| `seg.feature_enabled[8]` | derived from `slice->seg_param[s].segment_flags` + segmentation enabled bits | non-trivial |
| `seg.tree_probs[7]` | `picture->mb_segment_tree_probs[7]` | direct |
| `seg.pred_probs[3]` | `picture->segment_pred_probs[3]` | direct |
| `seg.flags` | from `pic_fields.bits.{segmentation_enabled, segmentation_update_map, segmentation_temporal_update}` + derived segmentation_update_data + absolute_or_delta | mostly direct |
| `flags & KEY_FRAME` | `!pic_fields.bits.frame_type` | VAAPI inverts: frame_type=0 means keyframe |
| `flags & SHOW_FRAME` | `pic_fields.bits.show_frame` | direct |
| `flags & ERROR_RESILIENT` | `pic_fields.bits.error_resilient_mode` | direct |
| `flags & INTRA_ONLY` | `pic_fields.bits.intra_only` | direct |
| `flags & ALLOW_HIGH_PREC_MV` | `pic_fields.bits.allow_high_precision_mv` | direct |
| `flags & REFRESH_FRAME_CTX` | `pic_fields.bits.refresh_frame_context` | direct |
| `flags & PARALLEL_DEC_MODE` | `pic_fields.bits.frame_parallel_decoding_mode` | direct |
| `flags & X/Y_SUBSAMPLING` | `pic_fields.bits.subsampling_x/y` | direct |
| `flags & COLOR_RANGE_FULL_SWING` | NOT in VAAPI | leave 0 for BT.709 limited (BBB) |
| `compressed_header_size` | `picture->first_partition_size` | direct (VAAPI mis-named per its own comment) |
| `uncompressed_header_size` | `picture->frame_header_length_in_bytes` | direct |
| `frame_width_minus_1` | `picture->frame_width - 1` | direct |
| `frame_height_minus_1` | `picture->frame_height - 1` | direct |
| `render_width_minus_1`, `render_height_minus_1` | NOT in VAAPI | leave equal to frame_width-1 / frame_height-1 (no scaling for BBB) |
| `last_frame_ts` | DPB lookup `picture->reference_frames[picture->pic_fields.bits.last_ref_frame]` → `surface_object->timestamp` → `v4l2_timeval_to_ns()` | uses `last_ref_frame` index into 8-entry DPB |
| `golden_frame_ts` | DPB lookup `picture->reference_frames[picture->pic_fields.bits.golden_ref_frame]` | same |
| `alt_frame_ts` | DPB lookup `picture->reference_frames[picture->pic_fields.bits.alt_ref_frame]` | same |
| `ref_frame_sign_bias` | OR of `pic_fields.bits.{last,golden,alt}_ref_frame_sign_bias` mapped to `V4L2_VP9_SIGN_BIAS_{LAST,GOLDEN,ALT}` | direct |
| `reset_frame_context` | `pic_fields.bits.reset_frame_context` (with FFmpeg's `> 0 ? -1 : 0` adjustment per ref) | mapping needs inspection |
| `frame_context_idx` | `pic_fields.bits.frame_context_idx` | direct |
| `profile` | `picture->profile` | direct |
| `bit_depth` | `picture->bit_depth` | direct |
| `interpolation_filter` | `pic_fields.bits.mcomp_filter_type` (with FFmpeg's `^ (filtermode <= 1)` adjustment — see ref) | mapping needs inspection |
| `tile_cols_log2`, `tile_rows_log2` | `picture->log2_tile_columns`, `log2_tile_rows` | direct |
| `reference_mode` | NOT in VAAPI | derive from heuristic OR leave default `V4L2_VP9_REFERENCE_MODE_SELECT` — Phase 3 baseline answers |
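The "direct" flag rows of the table reduce to a bit-by-bit translation, with `frame_type` as the one inverted case. The sketch below uses a hypothetical `vp9_pic_bits` subset of the VAAPI bitfield and local `MIRROR_VP9_FLAG_*` constants whose values are my reading of `V4L2_VP9_FRAME_FLAG_*`; real code would take both from the installed headers.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed mirror of the V4L2_VP9_FRAME_FLAG_* bit values. */
enum {
	MIRROR_VP9_FLAG_KEY_FRAME          = 0x001,
	MIRROR_VP9_FLAG_SHOW_FRAME         = 0x002,
	MIRROR_VP9_FLAG_ERROR_RESILIENT    = 0x004,
	MIRROR_VP9_FLAG_INTRA_ONLY         = 0x008,
	MIRROR_VP9_FLAG_ALLOW_HIGH_PREC_MV = 0x010,
	MIRROR_VP9_FLAG_REFRESH_FRAME_CTX  = 0x020,
	MIRROR_VP9_FLAG_PARALLEL_DEC_MODE  = 0x040,
	MIRROR_VP9_FLAG_X_SUBSAMPLING      = 0x080,
	MIRROR_VP9_FLAG_Y_SUBSAMPLING      = 0x100,
};

/* Hypothetical subset of VADecPictureParameterBufferVP9.pic_fields.bits. */
struct vp9_pic_bits {
	unsigned subsampling_x : 1;
	unsigned subsampling_y : 1;
	unsigned frame_type : 1; /* 0 = keyframe */
	unsigned show_frame : 1;
	unsigned error_resilient_mode : 1;
	unsigned intra_only : 1;
	unsigned allow_high_precision_mv : 1;
	unsigned frame_parallel_decoding_mode : 1;
	unsigned refresh_frame_context : 1;
};

/* Translate the VAAPI bits into the V4L2 flags word. The colour-range
 * flag has no VAAPI source and is left clear, as in the table. */
static uint32_t vp9_frame_flags(const struct vp9_pic_bits *b)
{
	uint32_t flags = 0;

	assert(b);
	if (!b->frame_type)
		flags |= MIRROR_VP9_FLAG_KEY_FRAME;
	if (b->show_frame)
		flags |= MIRROR_VP9_FLAG_SHOW_FRAME;
	if (b->error_resilient_mode)
		flags |= MIRROR_VP9_FLAG_ERROR_RESILIENT;
	if (b->intra_only)
		flags |= MIRROR_VP9_FLAG_INTRA_ONLY;
	if (b->allow_high_precision_mv)
		flags |= MIRROR_VP9_FLAG_ALLOW_HIGH_PREC_MV;
	if (b->refresh_frame_context)
		flags |= MIRROR_VP9_FLAG_REFRESH_FRAME_CTX;
	if (b->frame_parallel_decoding_mode)
		flags |= MIRROR_VP9_FLAG_PARALLEL_DEC_MODE;
	if (b->subsampling_x)
		flags |= MIRROR_VP9_FLAG_X_SUBSAMPLING;
	if (b->subsampling_y)
		flags |= MIRROR_VP9_FLAG_Y_SUBSAMPLING;
	return flags;
}
```

Note that an all-zero bitfield maps to a non-zero flags word, because `frame_type == 0` denotes a keyframe.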
### `v4l2_ctrl_vp9_compressed_hdr` mapping
This struct is filled by PARSING the compressed header bitstream — NOT from VAAPI. The libva backend runs a VPX boolean decoder over `surface_object->source_data + uncompressed_header_size` for `compressed_header_size` bytes, follows the VP9 spec section 6.3, and applies `inv_map_table[d]` for each updated probability.
The parsing logic is a direct port of FFmpeg `fill_compressed_hdr` (lines 99-261). Key syntax elements parsed:
- `tx_mode` (2 bits, then conditional 1 bit)
- TX 8x8/16x16/32x32 probability updates (only if tx_mode == SELECT)
- Coef probability updates (4-level nested loop with branch probs)
- Skip / inter_mode / interp_filter / is_inter / comp_mode / single_ref / comp_ref / y_mode / partition probability updates (only on inter frames)
- MV probability updates (joint / sign / classes / class0_bit / bits / class0_fr / fr / class0_hp / hp)
Each updated value goes through `inv_map_table[]` (255-entry lookup). Each "no update" bit leaves zero in the kernel struct.
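The first syntax element in the list, `tx_mode`, shows the general shape of the parser: read a literal, then conditionally read more. The sketch below abstracts the bit source behind a hypothetical `literal_reader` callback so it does not depend on a particular boolean-decoder implementation; the lossless shortcut follows the VP9 specification, and `script_read` is only a test double.

```c
#include <assert.h>

enum {
	TX_ONLY_4X4 = 0,
	TX_ALLOW_8X8,
	TX_ALLOW_16X16,
	TX_ALLOW_32X32,
	TX_MODE_SELECT,
};

/* Hypothetical bit-source abstraction: read() returns an nbits-wide
 * unsigned literal from the compressed header. */
struct literal_reader {
	unsigned (*read)(void *ctx, int nbits);
	void *ctx;
};

/* Parse tx_mode: 2 bits, plus 1 extra bit only when the 2-bit value is
 * ALLOW_32X32. Lossless frames carry no tx_mode bits at all. */
static unsigned vp9_parse_tx_mode(const struct literal_reader *r, int lossless)
{
	unsigned tx_mode;

	assert(r && r->read);
	if (lossless)
		return TX_ONLY_4X4;
	tx_mode = r->read(r->ctx, 2);
	if (tx_mode == TX_ALLOW_32X32)
		tx_mode += r->read(r->ctx, 1);
	return tx_mode;
}

/* Test double: hands out pre-scripted values and ignores nbits. */
struct script {
	const unsigned *vals;
	int pos;
};

static unsigned script_read(void *ctx, int nbits)
{
	struct script *s = ctx;

	(void)nbits;
	return s->vals[s->pos++];
}
```

In the backend the callback would wrap the VPX boolean decoder; the TX probability updates are then parsed only when the result equals `TX_MODE_SELECT`.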
## Patch shape prediction
| Site | Action | LOC delta |
|---|---|---|
| `src/config.c:121-160` | INSERT VP9 enumeration block | +10 |
| `src/config.c:54-78` | INSERT VP9 case + break + comment | +5 |
| `src/config.c:167-191` | INSERT VP9 case in fall-through | +1 |
| `src/vp9.c` | NEW FILE | +500-600 |
| `src/vp9.h` | NEW FILE | +35-45 |
| `src/picture.c:34-37` | INSERT `#include "vp9.h"` | +1 |
| `src/picture.c:188-225` | INSERT VP9 dispatch case | +6 |
| `src/picture.c:54-186` | INSERT 2 buffer-type cases | +14 |
| `src/surface.h:92-119` | INSERT vp9 struct | +6 |
| `src/meson.build:50,73` | INSERT 2 entries | +2 |
**Total**: ~580-690 LOC, 4 modified (`config.c`, `picture.c`, `surface.h`, `meson.build`) + 2 new files. Larger than iter3 VP8 (370 LOC) and comparable to iter2 HEVC (470 LOC). Compressed-header parser is the dominant cost.
Predicted commits:
- **Commit A**: `src/config.c` enumeration + dispatch + entrypoints (Criterion 1).
- **Commit B**: NEW `src/vp9.c` + `src/vp9.h` + `src/meson.build` (10 contract clauses + VPX rac decoder + compressed-header parser).
- **Commit C**: `src/picture.c` dispatcher + 2 buffer-type cases + `src/surface.h` union extension (Criteria 2-3).
- **Commit D**: optional fix-forward placeholder.
## Open questions for Phase 3 baseline
1. **Loop filter ref/mode deltas**: VAAPI doesn't expose `lf_delta.ref/mode/enabled/updated`. Are these always zero for BBB? Phase 3 strace of FFmpeg-v4l2request VP9 will reveal verbatim values.
2. **Quantization base_q_idx + deltas**: VAAPI exposes effective per-segment scales but not the base. Phase 3 baseline: capture verbatim FRAME control payload to see what FFmpeg-v4l2request writes; correlate against VAAPI's per-segment scale via VP9 spec quantization table.
3. **Reference mode**: VAAPI doesn't expose `comppredmode`. Phase 3 baseline: verify default `V4L2_VP9_REFERENCE_MODE_SELECT` works for BBB.
4. **Interpolation filter mapping**: FFmpeg uses `filtermode ^ (filtermode <= 1)` to remap; VAAPI's `mcomp_filter_type` may already be in V4L2 enum order (no remap needed) OR in a different order. Empirically check.
5. **Reset frame context mapping**: FFmpeg uses `> 0 ? - 1 : 0`. Either FFmpeg's source enum is offset by 1 from V4L2's, or there's an off-by-one. Empirically verify.
6. **VAAPI per-segment field interpretation**: `slice->seg_param[s].filter_level[4][2]` and quant scales are EFFECTIVE values (computed by mpv-VAAPI consumer). Mapping back to kernel's "ALT_Q delta" + "ALT_L delta" + "REF_FRAME" feature bits is non-trivial. Phase 3 verbatim payload + mapping-back-to-VAAPI cross-check.
7. **Does mpv 0.41.0 engage HW for VP9?**: Phase 3 capture `mpv -v --hwdec=vaapi --vo=null --frames=2 ~/fourier-test/bbb_720p10s_vp9.webm` and grep for `Selected decoder: vp9` vs `Using software decoding`. iter3 VP8 fell back; iter4 VP9 may or may not.
8. **Does rkvdec exhibit the same dma_resv kernel issue as hantro?**: iter3 found hantro CAPTURE returns all-zero pages from libva readback. rkvdec is a different driver subsystem; iter1+iter2 successfully verified via mpv-DMA-BUF-GL on rkvdec. **Predicted: rkvdec works fine for direct readback.** Phase 3 baseline: re-test ffmpeg-vaapi-hwdownload on rkvdec for VP9 and check if output is non-zero.
## Phase 3 baseline targets (work plan)
1. **Cross-validator capture**: `strace -ff -tt -y -v -e trace=ioctl ffmpeg -hwaccel v4l2request -i bbb_720p10s_vp9.webm -frames:v 5 -f null - 2>strace.log`. Decode VP9_FRAME + COMPRESSED_HDR payloads via Phase 3 decoder (extend `decode_vp8.py` for VP9 layout).
2. **VAAPI consumer trace**: `LIBVA_TRACE` mpv-SW + mpv-vaapi runs to see what buffer types mpv produces.
3. **Cache-safe verify reference**: `mpv --hwdec=no --vo=image --frames=2 --start=00:00:02 ~/fourier-test/bbb_720p10s_vp9.webm` and capture frame-0001/0002 SHA256 (criterion-4 anchor).
4. **rkvdec readback path test**: re-run `ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i bbb_720p10s_vp9.webm -vf hwdownload -frames:v 5` after install (this belongs to Phase 6; Phase 3 only baseline-captures the SW reference). Confirm whether rkvdec hits the dma_resv issue (predicted: NO, since iter1+iter2 worked there).
5. **mpv-VP9-vaapi engagement check**: per memory `feedback_hw_decode_engagement_check.md`, verify HW path engaged via `mpv -v` log BEFORE claiming criterion 4.
## Phase 4 plan structure (anticipated)
Following iter2/iter3's clause template:
- Clause 1: Submission shape — 2 controls batched per frame
- Clause 2: Local struct alloc + zero-init (memset both)
- Clause 3: Frame geometry + scalars + flags
- Clause 4: DPB timestamp resolution (3 active refs from 8-slot DPB)
- Clause 5: Loop filter mapping (with VAAPI gap notes per Q1)
- Clause 6: Quantization mapping (with VAAPI gap notes per Q2)
- Clause 7: Segmentation mapping (with VAAPI per-segment effective-vs-delta unpacking per Q6)
- Clause 8: Compressed header parser — port FFmpeg `fill_compressed_hdr` + VPX rac decoder + inv_map_table
- Clause 9: Final 2-control batched submission
- Clause 10: Bitstream offsetting — `surface_object->source_data + uncompressed_header_size` is the start of compressed-header bytes; `compressed_header_size` is the byte length
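Clause 10 implies a bounds check before the parser touches the bitstream, since both sizes come from the VAAPI consumer. A minimal sketch, assuming a hypothetical helper name and treating a zero-length compressed header as "nothing to parse":

```c
#include <stddef.h>
#include <stdint.h>

/* Return a pointer to the compressed header inside data, or NULL when
 * the header is empty or the two header sizes do not fit in data_size.
 * Hypothetical helper; the fork has no function of this name. */
static const uint8_t *
vp9_compressed_hdr_ptr(const uint8_t *data, size_t data_size,
		       uint16_t uncompressed_header_size,
		       uint16_t compressed_header_size)
{
	/* Widen before adding so the sum cannot wrap in 16 bits. */
	size_t end = (size_t)uncompressed_header_size +
		     (size_t)compressed_header_size;

	if (!data || compressed_header_size == 0 || end > data_size)
		return NULL;
	return data + uncompressed_header_size;
}
```

The caller would pass `surface_object->source_data` with its byte count, then hand the returned pointer and `compressed_header_size` to the boolean decoder's init function.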
The plan will cite verbatim Phase 3 baseline payload bytes for all fields where mapping is non-obvious (loop-filter deltas, quant base, segmentation feature mapping) per `feedback_dev_process.md` Phase 6 contract-before-code.
## Substrate state at Phase 2 close
- iter4 Phase 1 commit `9a71dbf` pushed to gitea.
- Fork on noether at iter3 tip `e1aca9c` (synced via `git fetch && git merge --ff-only`).
- All Phase 3 prerequisites identified.
- Memory rules apply unchanged.
- Phase 3 questions queued (8 items, mostly empirical). Phase 5 review will catch the field-availability + mapping questions analogous to iter3 (`uniform_spacing_flag` Direction 2 lesson).