iter4 Phase 2: situation analysis - VP9 backend gaps + compressed-header parser requirement

Source-read of every file the iter4 patch series will touch, plus kernel UAPI + VAAPI + downstream FFmpeg + kernel rkvdec reference sources. Conducted on noether against fork tip e1aca9c (iter3 close).

Critical scope-shaping finding: rkvdec on RK3399 REQUIRES V4L2_CID_STATELESS_VP9_COMPRESSED_HDR (not optional). Per drivers/staging/media/rkvdec/rkvdec-vp9.c::rkvdec_vp9_run_preamble lines 752-754:

    ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_VP9_COMPRESSED_HDR);
    if (WARN_ON(!ctrl))
        return -EINVAL;

VAAPI does NOT expose compressed-header probability updates (va_dec_vp9.h:50-192: only frame parameters + segmentation; vendor VAAPI drivers parse compressed header in firmware/GPU). Therefore the libva backend MUST parse the compressed header itself via a VPX boolean decoder + inv_map_table[]. ~150-200 LOC of bitstream parsing logic (port from FFmpeg v4l2_request_vp9.c::fill_compressed_hdr).

Bug enumeration (12 sites):
  B1   config.c::RequestQueryConfigProfiles      enum block missing
  B2   config.c::RequestCreateConfig             VP9 case missing
  B3   config.c::RequestQueryConfigEntrypoints   VP9 case missing
  B4   src/vp9.c                                 new file ~500-600 LOC
  B5   src/vp9.h                                 new file ~35-45 LOC
  B6   src/vp9_rac.h                             NEW or inline (Phase 4 plan locks Option A: inline in vp9.c)
  B7   picture.c::codec_set_controls             VP9 dispatch missing
  B8   picture.c::codec_store_buffer             2 buffer-type cases (Picture + Slice; NOT 4 like VP8)
  B9   picture.c::RequestBeginPicture            predicted no reset needed (no flag-state like VP8 iqmatrix_set)
  B10  surface.h::object_surface::params union   vp9 member missing
  B11  meson.build                               vp9.c/vp9.h not in lists
  B12  buffer.c                                  predicted no change needed (VP9 uses Picture/Slice/SliceData, all whitelisted)

Non-bugs (intentionally untouched): context.c (no DECODE_MODE/START_CODE menus per FFmpeg ref), video.c (CAPTURE-side format list), v4l2.c (fourcc-agnostic), include/hevc-ctrls.h (already includes <linux/v4l2-controls.h>).

Contract surface cited verbatim:
  V4L2_CID_STATELESS_VP9_FRAME = 0xa40a2c (168 bytes; much smaller than VP8's 1232 bytes because VP9_FRAME carries no entropy table; that's in COMPRESSED_HDR)
  V4L2_CID_STATELESS_VP9_COMPRESSED_HDR = 0xa40a2d (2040 bytes; coef[4][2][2][6][6][3] alone is 1728 bytes)
  Per-frame submission: 2 controls batched in single S_EXT_CTRLS
  v4l2_request_vp9.c references confirmed: 2-control shape, runtime-probed COMPRESSED_HDR availability (rkvdec advertises it; we MUST provide)
  VAAPI buffer types: 2 per frame (Picture + Slice) vs iter3 VP8's 4. NO Probability buffer (VP9 keeps probs in compressed header). NO IQMatrix (VP9 keeps quant in slice's per-segment seg_param[8]).

VAAPI → V4L2 mapping table: 30+ fields enumerated. Several gap candidates identified for Phase 3 empirical resolution:
  Q1  lf.ref_deltas/mode_deltas/flags: not in VAAPI; FFmpeg reads from VP9Context internal. BBB likely zero.
  Q2  quant.base_q_idx + deltas: VAAPI exposes only effective per-segment scales. Inverse-derive needed.
  Q3  reference_mode: not in VAAPI. Default to SELECT?
  Q4  interpolation_filter mapping (FFmpeg ^ remap)
  Q5  reset_frame_context off-by-one (FFmpeg: v > 0 ? v - 1 : 0)
  Q6  Per-segment feature_data[8][4] derivation from VAAPI's effective scales is non-trivial
  Q7  mpv 0.41.0 VP9 hwdec engagement (per memory feedback_hw_decode_engagement_check.md; known gap from iter3 VP8)
  Q8  rkvdec dma_resv issue? (predicted NO based on iter1+iter2 successful mpv-DMA-BUF-GL on rkvdec)

Patch-shape prediction: ~580-690 LOC across 5 modified + 2 new files (closer to iter2 HEVC's 470 than iter3 VP8's 370). Compressed-header parser is the dominant cost.

Phase 3 baseline targets queued: cross-validator strace verbatim S_EXT_CTRLS payloads (both controls), VAAPI consumer trace, mpv-VP9-vaapi engagement check, rkvdec readback non-zero check.

Phase 4 plan structure anticipated: 10-clause template per iter2/iter3, with new Clause 8 dedicated to compressed-header parser.

Refs:
  phase0_findings_iter4.md (Phase 1 lock)
  phase8_iteration3_close.md (predecessor)
  references/ffmpeg-kwiboo/libavcodec/v4l2_request_vp9.c (V4L2 ref)
  references/ffmpeg-kwiboo/libavcodec/vaapi_vp9.c (VAAPI ref)
  /home/mfritsche/src/linux-rfc/drivers/staging/media/rkvdec/rkvdec-vp9.c (kernel driver; confirms COMPRESSED_HDR requirement at lines 752-754)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 05:20:07 +00:00
parent 9a71dbf4c3
commit 2651e4cfdf
1 changed files with 380 additions and 0 deletions
@@ -0,0 +1,380 @@
+# Iteration 4 — Phase 2 (situation analysis)
+
+Source-read of every file the iter4 patch series will touch, plus the kernel UAPI + VAAPI + downstream FFmpeg + kernel rkvdec reference sources. Conducted on noether against fork tip `e1aca9c` (iter3 close).
+
+This is a contract-before-code analysis per `feedback_dev_process.md` Phase 2: enumerate the bugs, cite the contract verbatim, predict the patch shape, queue the Phase 3 baseline questions.
+
+## Critical finding: rkvdec requires VP9_COMPRESSED_HDR
+
+The biggest scope-shaping discovery: **rkvdec on RK3399 requires `V4L2_CID_STATELESS_VP9_COMPRESSED_HDR`**; the control is mandatory, not optional. From `drivers/staging/media/rkvdec/rkvdec-vp9.c::rkvdec_vp9_run_preamble` lines 740-754:
+
+```c
+ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_VP9_FRAME);
+if (WARN_ON(!ctrl))
+    return -EINVAL;
+dec_params = ctrl->p_cur.p;
+...
+ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_VP9_COMPRESSED_HDR);
+if (WARN_ON(!ctrl))
+    return -EINVAL;       /* ← rkvdec WILL fail without compressed-header probs */
+prob_updates = ctrl->p_cur.p;
+vp9_ctx->cur.tx_mode = prob_updates->tx_mode;
+...
+v4l2_vp9_fw_update_probs(&vp9_ctx->probability_tables, prob_updates, dec_params);
+```
+
+VAAPI does NOT expose compressed-header probability updates (per `va_dec_vp9.h:50-192` — only frame parameters + segmentation, no probability deltas; vendor VAAPI drivers parse compressed header in firmware/GPU). So **the libva backend must parse the compressed header itself** via a VPX boolean decoder.
+
+This makes iter4's scope significantly larger than iter3's VP8 work.
+
+## Bug enumeration (sites the iter4 patch series must touch)
+
+### B1 — `src/config.c::RequestQueryConfigProfiles` — VP9 enumeration block missing
+
+**Site**: `config.c:121-160`.
+
+**Bug**: no analogous block for `V4L2_PIX_FMT_VP9_FRAME` → `VAProfileVP9Profile0`. Same starting condition as iter3 VP8.
+
+**Patch shape**: ADD enumeration block after iter3's VP8 block. ~10 LOC.
+
+### B2 — `src/config.c::RequestCreateConfig` — VP9 case label missing
+
+**Site**: `config.c:54-78`.
+
+**Bug**: no `case VAProfileVP9Profile0:`. Mirror iter3 VP8 pattern. ~5 LOC.
+
+### B3 — `src/config.c::RequestQueryConfigEntrypoints` — VP9 case missing
+
+**Site**: `config.c:167-191`.
+
+**Bug**: missing in fall-through case list. ~1 LOC.
+
+### B4 — `src/vp9.c` — file does not exist; needs net-new implementation
+
+**Site**: NEW FILE `src/vp9.c`.
+
+**Patch shape**: NEW file, ~500-600 LOC (substantially larger than iter3 vp8.c due to compressed-header parser):
+
+- Includes block
+- Static `inv_map_table[255]` — direct copy from FFmpeg `v4l2_request_vp9.c:43-64`
+- VPX range coder helpers (port from FFmpeg `vp89_rac.h` + boolean decoder primitives) — ~80 LOC
+- `vp9_fill_frame()` — fill `v4l2_ctrl_vp9_frame` from VAAPI `VADecPictureParameterBufferVP9` + `VASliceParameterBufferVP9` — ~150 LOC
+- `vp9_fill_compressed_hdr()` — parse compressed header bits from `surface_object->source_data + uncompressed_header_size`, populate `v4l2_ctrl_vp9_compressed_hdr` — ~180 LOC (port from FFmpeg `fill_compressed_hdr` lines 99-261)
+- `vp9_set_controls()` — entry point, allocates both structs, calls `vp9_fill_frame` + `vp9_fill_compressed_hdr`, batched 2-element `v4l2_ext_control` array, single `v4l2_set_controls` call
+
+### B5 — `src/vp9.h` — header does not exist
+
+**Site**: NEW FILE `src/vp9.h`.
+
+**Patch shape**: declare `vp9_set_controls()`. Mirror iter3 vp8.h.
+
+### B6 — Possibly `src/vp9_rac.h` — VPX range decoder helpers (decision point)
+
+**Site**: NEW FILE candidate `src/vp9_rac.h`.
+
+VP9 boolean decoder primitives (`vpx_rac_get_prob_branchy`, `vp89_rac_get`, `vp89_rac_get_uint`, init function) are needed by `vp9_fill_compressed_hdr`. Two design options:
+
+- **Option A**: inline the ~80 LOC of decoder helpers directly in `vp9.c`. Simpler; one file. Recommended for first cut.
+- **Option B**: separate `vp9_rac.h`/`vp9_rac.c`. Mirrors FFmpeg's `vp89_rac.h` upstream pattern. More files, easier reuse if AV1/VP10 work follows.
+
+**Phase 4 plan locks Option A** unless Phase 5 review surfaces a reason for Option B.
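
To give a feel for the size of the Option A helpers, here is a minimal sketch of a boolean decoder written from the VP9 specification's bit-at-a-time description, not ported from FFmpeg's `vp89_rac.h`. Every identifier (`bool_dec`, `bool_init`, `bool_read`, `bool_read_literal`) is hypothetical, and a real port would likely use FFmpeg's faster byte-wise refill instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names here are hypothetical; none come from the fork or from FFmpeg. */
struct bool_dec {
    const uint8_t *buf;  /* compressed-header bytes */
    size_t size;         /* length of buf in bytes */
    size_t bit_pos;      /* next unread bit, MSB first */
    uint32_t value;      /* BoolValue in the spec's terms */
    uint32_t range;      /* BoolRange, in [128, 255] between reads */
};

/* Return the next bitstream bit, or 0 once the buffer is exhausted. */
static unsigned next_bit(struct bool_dec *d)
{
    unsigned bit = 0;

    if (d->bit_pos < d->size * 8) {
        bit = (d->buf[d->bit_pos >> 3] >> (7 - (d->bit_pos & 7))) & 1u;
        d->bit_pos++;
    }
    return bit;
}

/* Decode one boolean whose probability of being 0 is roughly prob/256. */
static unsigned bool_read(struct bool_dec *d, unsigned prob)
{
    uint32_t split;
    unsigned bit;

    assert(prob >= 1 && prob <= 255);
    split = 1 + (((d->range - 1) * prob) >> 8);
    if (d->value < split) {
        d->range = split;
        bit = 0;
    } else {
        d->range -= split;
        d->value -= split;
        bit = 1;
    }
    /* Renormalise one bit at a time until range is back in [128, 255]. */
    while (d->range < 128) {
        d->range <<= 1;
        d->value = (d->value << 1) | next_bit(d);
    }
    return bit;
}

/* Returns 0 on success, -1 if the buffer is empty or the marker bit is set. */
static int bool_init(struct bool_dec *d, const uint8_t *buf, size_t size)
{
    if (buf == NULL || size < 1)
        return -1;
    d->buf = buf;
    d->size = size;
    d->bit_pos = 8;          /* the first byte seeds value */
    d->value = buf[0];
    d->range = 255;
    return bool_read(d, 128) ? -1 : 0;
}

/* Read an n-bit unsigned literal, MSB first, at probability 128. */
static unsigned bool_read_literal(struct bool_dec *d, int n)
{
    unsigned v = 0;

    while (n-- > 0)
        v = (v << 1) | bool_read(d, 128);
    return v;
}
```

With this shape, the first step of the compressed-header parse would be `bool_init()` on the compressed-header bytes followed by `bool_read_literal(&d, 2)` for the leading `tx_mode` bits. The sketch comes to about 60 lines, in line with the ~80 LOC estimate above.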
+
+### B7 — `src/picture.c::codec_set_controls` — VP9 dispatch case missing
+
+**Site**: `picture.c:188-225`.
+
+**Patch shape**: ADD `case VAProfileVP9Profile0:` calling `vp9_set_controls`. ~6 LOC.
+
+### B8 — `src/picture.c::codec_store_buffer` — 2 VAAPI buffer types unmapped
+
+VAAPI VP9 sends only TWO parameter buffer types per frame, plus the raw slice data (per `va_dec_vp9.h:58-303`):
+
+| VAAPI buffer type | VAAPI struct | Per-frame |
+|---|---|---|
+| `VAPictureParameterBufferType` | `VADecPictureParameterBufferVP9` | once |
+| `VASliceParameterBufferType` | `VASliceParameterBufferVP9` (with `seg_param[8]`) | once |
+| `VASliceDataBufferType` | raw bitstream | once |
+
+**Different from iter3 VP8**: no `VAProbabilityBufferType` (VP9 keeps probability state in the picture/slice params + parsed compressed header), no `VAIQMatrixBufferType` (VP9 keeps quantization in the slice's per-segment seg_param array). Just 2 cases vs VP8's 4.
+
+**Patch shape**: 2 nested case adds in `codec_store_buffer` outer switch + inner profile dispatch. ~14 LOC total.
+
+### B9 — `src/picture.c::RequestBeginPicture` — per-frame VP9 reset
+
+**Site**: `picture.c:299-302`.
+
+**Bug**: VP9 doesn't have an iqmatrix_set / probability_set flag pattern; the picture/slice params are unconditionally fully-populated by VAAPI consumer per frame. Possibly NO reset needed (analogous to MPEG-2's iqmatrix-only pattern but even simpler).
+
+**Patch shape**: likely no edit. If Phase 5 review reveals a hidden state-leak risk (e.g., VAAPI reusing the surface for a new context with stale params), add reset for `params.vp9.<some-flag>`. Default plan: no reset added; revisit if Phase 7 byte-compare shows stale state.
+
+### B10 — `src/surface.h::object_surface::params` union — no `vp9` member
+
+**Site**: `surface.h:92-119`.
+
+**Patch shape**: ADD `vp9` struct after `vp8`:
+
+```c
+struct {
+    VADecPictureParameterBufferVP9 picture;
+    VASliceParameterBufferVP9 slice;
+} vp9;
+```
+
+`VASliceParameterBufferVP9` is large (~340 bytes: `seg_param[8]` at ~40 bytes each); `VADecPictureParameterBufferVP9` is ~80 bytes. The new member is therefore ~420 bytes, but the union's overall size is still dominated by `params.h265` with its 64-entry `slices[64]` array (~17 KB).
+
+### B11 — `src/meson.build` — `vp9.c` + `vp9.h` not in lists
+
+**Site**: `meson.build:30-74`.
+
+**Patch shape**: insert `'vp9.c'` after `'vp8.c'` in sources, insert `'vp9.h'` after `'vp8.h'` in headers. +2 lines.
+
+### B12 — `src/buffer.c` — buffer-type allow-list (predicted no change needed)
+
+**Site**: `buffer.c:59-70`.
+
+VP9 uses `VAPictureParameterBufferType` + `VASliceParameterBufferType` + `VASliceDataBufferType` — all three already in the allow-list (used by H.264 + iter3 VP8). **Predicted no change needed.**
+
+Per memory `feedback_runtime_enumerates_allowlists.md`: plan for fix-forward Commit D if a runtime miss surfaces (would be unexpected for VP9 given the buffer types are H.264-shape; but the iter3 lesson is "don't audit exhaustively — let runtime enumerate").
+
+### Non-bugs (intentionally NOT touched)
+
+- `src/context.c` — no DECODE_MODE/START_CODE menus for VP9 (per FFmpeg V4L2 ref `v4l2_request_vp9.c:487-503`: `v4l2_request_vp9_init` doesn't issue any device-wide menu sets; per-frame batch only). **No context.c changes.**
+- `src/video.c::formats[]` — CAPTURE-side format list (NV12); VP9 is OUTPUT-side fourcc, probed via `v4l2_find_format()` in config.c. **No video.c changes.**
+- `src/v4l2.c` — fourcc-agnostic helpers. **No v4l2.c changes.**
+- `include/hevc-ctrls.h` — already includes `<linux/v4l2-controls.h>` which holds VP9 control IDs.
+
+## Contract surface (verbatim)
+
+### Kernel UAPI: `V4L2_CID_STATELESS_VP9_FRAME` (`<linux/v4l2-controls.h>:2696`)
+
+```c
+#define V4L2_CID_STATELESS_VP9_FRAME        (V4L2_CID_CODEC_STATELESS_BASE + 300)
+                                            /* = 0xa40a2c (base 0xa40900 + 0x12c) */
+
+struct v4l2_ctrl_vp9_frame {
+    struct v4l2_vp9_loop_filter lf;        /* 16 bytes; ref_deltas[4] + mode_deltas[2]
+                                              + level + sharpness + flags + reserved[7] */
+    struct v4l2_vp9_quantization quant;    /* 8 bytes; base_q_idx + 3 deltas + reserved[4] */
+    struct v4l2_vp9_segmentation seg;      /* 88 bytes; feature_data[8][4] (__s16) + feature_enabled[8]
+                                              + tree_probs[7] + pred_probs[3] + flags + reserved[5] */
+    __u32 flags;                            /* 10 V4L2_VP9_FRAME_FLAG_* bits per
+                                              <linux/v4l2-controls.h>:2665-2674 */
+    __u16 compressed_header_size;
+    __u16 uncompressed_header_size;
+    __u16 frame_width_minus_1;
+    __u16 frame_height_minus_1;
+    __u16 render_width_minus_1;
+    __u16 render_height_minus_1;
+    __u64 last_frame_ts;                    /* per-VASurfaceID timestamp lookup */
+    __u64 golden_frame_ts;
+    __u64 alt_frame_ts;
+    __u8 ref_frame_sign_bias;               /* OR of V4L2_VP9_SIGN_BIAS_{LAST,GOLDEN,ALT} */
+    __u8 reset_frame_context;               /* V4L2_VP9_RESET_FRAME_CTX_* (0..2) */
+    __u8 frame_context_idx;
+    __u8 profile;
+    __u8 bit_depth;
+    __u8 interpolation_filter;
+    __u8 tile_cols_log2;
+    __u8 tile_rows_log2;
+    __u8 reference_mode;
+    __u8 reserved[7];
+};
+```
+
+Total size: 168 bytes (16 + 8 + 88 for the three sub-structs, plus 56 bytes of scalars and reserved bytes), versus iter3 VP8's 1232 bytes. It is much smaller because VP9_FRAME carries no entropy table; that lives in COMPRESSED_HDR.
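
The size of this struct can be cross-checked at compile time. The following is a minimal sketch using local mirror structs rather than the real UAPI header; the sub-struct layouts are assumptions reconstructed from the field lists in the comments above (with `feature_data` assumed to be 16-bit signed), and all `*_mirror` names are hypothetical. Under those assumptions the struct comes to 168 bytes with no implicit padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors only; assumed (not verified here) to match the kernel UAPI. */
struct lf_mirror {
    int8_t ref_deltas[4];
    int8_t mode_deltas[2];
    uint8_t level, sharpness, flags;
    uint8_t reserved[7];
};

struct quant_mirror {
    uint8_t base_q_idx;
    int8_t delta_q_y_dc, delta_q_uv_dc, delta_q_uv_ac;
    uint8_t reserved[4];
};

struct seg_mirror {
    int16_t feature_data[8][4];   /* assumed 16-bit: 64 bytes */
    uint8_t feature_enabled[8];
    uint8_t tree_probs[7];
    uint8_t pred_probs[3];
    uint8_t flags;
    uint8_t reserved[5];
};

struct frame_mirror {
    struct lf_mirror lf;
    struct quant_mirror quant;
    struct seg_mirror seg;
    uint32_t flags;
    uint16_t compressed_header_size, uncompressed_header_size;
    uint16_t frame_width_minus_1, frame_height_minus_1;
    uint16_t render_width_minus_1, render_height_minus_1;
    uint64_t last_frame_ts, golden_frame_ts, alt_frame_ts;
    uint8_t ref_frame_sign_bias, reset_frame_context, frame_context_idx;
    uint8_t profile, bit_depth, interpolation_filter;
    uint8_t tile_cols_log2, tile_rows_log2, reference_mode;
    uint8_t reserved[7];
};

static_assert(sizeof(struct lf_mirror) == 16, "loop filter mirror");
static_assert(sizeof(struct quant_mirror) == 8, "quantization mirror");
static_assert(sizeof(struct seg_mirror) == 88, "segmentation mirror");
static_assert(offsetof(struct frame_mirror, last_frame_ts) == 128,
              "timestamps start on an 8-byte boundary");

/* Exposed as a function so a runtime check can report the value too. */
static size_t vp9_frame_mirror_size(void)
{
    return sizeof(struct frame_mirror);
}
```

In the real `vp9.c` the same `static_assert` could be pointed at `struct v4l2_ctrl_vp9_frame` from `<linux/v4l2-controls.h>`, which would catch a header mismatch before any control is submitted.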
+
+### Kernel UAPI: `V4L2_CID_STATELESS_VP9_COMPRESSED_HDR` (`<linux/v4l2-controls.h>:2797`)
+
+```c
+#define V4L2_CID_STATELESS_VP9_COMPRESSED_HDR  (V4L2_CID_CODEC_STATELESS_BASE + 301)
+                                              /* = 0xa40a2d (base 0xa40900 + 0x12d) */
+
+struct v4l2_ctrl_vp9_compressed_hdr {
+    __u8 tx_mode;                          /* V4L2_VP9_TX_MODE_* (0..4) */
+    __u8 tx8[2][1];
+    __u8 tx16[2][2];
+    __u8 tx32[2][3];
+    __u8 coef[4][2][2][6][6][3];           /* HUGE: 1728 bytes */
+    __u8 skip[3];
+    __u8 inter_mode[7][3];
+    __u8 interp_filter[4][2];
+    __u8 is_inter[4];
+    __u8 comp_mode[5];
+    __u8 single_ref[5][2];
+    __u8 comp_ref[5];
+    __u8 y_mode[4][9];
+    __u8 uv_mode[10][9];
+    __u8 partition[16][3];
+    struct v4l2_vp9_mv_probs mv;           /* 69 bytes; joint/sign/classes/class0_bit/bits/etc */
+};
+```
+
+Total size: 2040 bytes (1971 bytes of `__u8` arrays plus the 69-byte `mv` struct). Filled by parsing the compressed header bits via VPX boolean decoder + `inv_map_table[]` (per FFmpeg `v4l2_request_vp9.c:99-261`).
+
+The kernel uses these as PROBABILITY UPDATES (not absolutes): a value of zero in any array element means "no update — keep prior probability." The kernel runs `v4l2_vp9_fw_update_probs(&probability_tables, prob_updates, dec_params)` to apply updates per `rkvdec-vp9.c:796`.
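
The array dimensions listed in the struct above can be summed mechanically. This is a minimal sketch with local mirror structs; the `mv_probs_mirror` layout is an assumption about `struct v4l2_vp9_mv_probs` (nine `__u8` arrays), and all `*_mirror` names are hypothetical. Because every member is a byte array, there is no padding and `sizeof` is just the sum of the element counts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the MV probability block: nine byte arrays, 69 bytes. */
struct mv_probs_mirror {
    uint8_t joint[3];
    uint8_t sign[2];
    uint8_t classes[2][10];
    uint8_t class0_bit[2];
    uint8_t bits[2][10];
    uint8_t class0_fr[2][2][3];
    uint8_t fr[2][3];
    uint8_t class0_hp[2];
    uint8_t hp[2];
};

/* Mirror of the compressed-header control, dimensions as listed above. */
struct compressed_hdr_mirror {
    uint8_t tx_mode;
    uint8_t tx8[2][1];
    uint8_t tx16[2][2];
    uint8_t tx32[2][3];
    uint8_t coef[4][2][2][6][6][3];
    uint8_t skip[3];
    uint8_t inter_mode[7][3];
    uint8_t interp_filter[4][2];
    uint8_t is_inter[4];
    uint8_t comp_mode[5];
    uint8_t single_ref[5][2];
    uint8_t comp_ref[5];
    uint8_t y_mode[4][9];
    uint8_t uv_mode[10][9];
    uint8_t partition[16][3];
    struct mv_probs_mirror mv;
};

static_assert(sizeof(((struct compressed_hdr_mirror *)0)->coef) == 1728,
              "coef table dominates the struct");
static_assert(sizeof(struct mv_probs_mirror) == 69, "mv mirror");

/* Exposed as a function so a runtime check can report the value too. */
static size_t vp9_compressed_hdr_mirror_size(void)
{
    return sizeof(struct compressed_hdr_mirror);
}
```

Zero-initialising the whole struct before parsing (Clause 2 of the anticipated plan) matters here: any element the parser never writes must stay zero so the kernel reads it as "no update".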
+
+### VAAPI buffer types
+
+`VADecPictureParameterBufferVP9` (`va_dec_vp9.h:58-192`):
+- `frame_width`, `frame_height` (u16)
+- `reference_frames[8]` — 8-entry DPB (vs VP8's 3)
+- `pic_fields.bits.{...}` — 27 single-bit/multi-bit fields (subsampling_x/y, frame_type, show_frame, error_resilient_mode, intra_only, allow_high_precision_mv, mcomp_filter_type[3 bits], frame_parallel_decoding_mode, reset_frame_context[2 bits], refresh_frame_context, frame_context_idx[2 bits], segmentation_*, last/golden/alt_ref_frame[3 bits each, indexes into reference_frames[8]], *_sign_bias, lossless_flag)
+- `filter_level`, `sharpness_level` (u8)
+- `log2_tile_rows`, `log2_tile_columns` (u8)
+- `frame_header_length_in_bytes` — uncompressed_header_size (u8 — note 8-bit width may overflow for super-frames; typical < 256 for BBB)
+- `first_partition_size` — compressed_header_size (u16)
+- `mb_segment_tree_probs[7]`, `segment_pred_probs[3]` (u8)
+- `profile`, `bit_depth` (u8)
+
+`VASliceParameterBufferVP9` (`va_dec_vp9.h:279-303`):
+- `slice_data_size`, `slice_data_offset`, `slice_data_flag` (u32)
+- `seg_param[8]` — array of `VASegmentParameterVP9` (~40 bytes each):
+  - `segment_flags.fields.{segment_reference_enabled, segment_reference[2 bits], segment_reference_skipped}` (u16 packed)
+  - `filter_level[4][2]` (u8) — per-ref-frame × per-mode loop filter levels
+  - `luma_ac_quant_scale`, `luma_dc_quant_scale`, `chroma_ac_quant_scale`, `chroma_dc_quant_scale` (s16) — already-computed effective scale per segment
+
+### FFmpeg V4L2 reference (`v4l2_request_vp9.c`)
+
+Submission shape: 2 batched controls per frame in single `S_EXT_CTRLS`:
+
+```c
+control[0] = { .id = V4L2_CID_STATELESS_VP9_FRAME, ... };
+control[1] = { .id = V4L2_CID_STATELESS_VP9_COMPRESSED_HDR, ... };
+v4l2_set_controls(..., control, 2);
+```
+
+The COMPRESSED_HDR control is included conditionally, based on a runtime probe (`v4l2_request_vp9_post_frames_ctx` queries the kernel; if the control isn't advertised, FFmpeg falls back to FRAME-only). For rkvdec on RK3399 the kernel advertises COMPRESSED_HDR, and `rkvdec-vp9.c:752` shows the driver returning `-EINVAL` when the control is not provided.
+
+### Kernel rkvdec driver (`rkvdec-vp9.c`)
+
+Key reads in `rkvdec_vp9_run_preamble`:
+- VP9_FRAME control → `dec_params = ctrl->p_cur.p` → drives register programming via `config_registers()`.
+- VP9_COMPRESSED_HDR control → `prob_updates = ctrl->p_cur.p` → applied via `v4l2_vp9_fw_update_probs()`.
+- Reference frames are resolved from FRAME's `last_frame_ts`/`golden_frame_ts`/`alt_frame_ts`. Only 3 references are active at a time, even though VAAPI exposes an 8-entry DPB; the last/golden/alt indexes select entries from the picture's `reference_frames[8]`.
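
The timestamp resolution for the three active references can be sketched as follows. This is a hedged illustration, not the fork's code: `surface_entry`, `surface_ts` and `ref_ts` are hypothetical stand-ins for the real surface lookup, and returning 0 for an unknown or invalid surface is an assumption. Only the nanosecond arithmetic is taken from the UAPI helper `v4l2_timeval_to_ns()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define INVALID_SURFACE 0xffffffffu  /* same value as libva's VA_INVALID_SURFACE */

/* Same arithmetic as the UAPI inline helper v4l2_timeval_to_ns(). */
static uint64_t timeval_to_ns(int64_t sec, int64_t usec)
{
    return (uint64_t)sec * 1000000000ULL + (uint64_t)usec * 1000ULL;
}

/* Hypothetical stand-in for the backend's surface object table. */
struct surface_entry {
    uint32_t id;       /* VASurfaceID */
    int64_t tv_sec;    /* timestamp assigned when the surface was queued */
    int64_t tv_usec;
};

/* Look up a surface's timestamp; 0 means "no such reference" (assumption). */
static uint64_t surface_ts(const struct surface_entry *tab, size_t n, uint32_t id)
{
    size_t i;

    if (id == INVALID_SURFACE)
        return 0;
    for (i = 0; i < n; i++)
        if (tab[i].id == id)
            return timeval_to_ns(tab[i].tv_sec, tab[i].tv_usec);
    return 0;
}

/* Resolve one of last/golden/alt: a 3-bit index into the 8-entry DPB. */
static uint64_t ref_ts(const uint32_t reference_frames[8], unsigned ref_idx,
                       const struct surface_entry *tab, size_t n)
{
    assert(ref_idx < 8);
    return surface_ts(tab, n, reference_frames[ref_idx]);
}
```

The caller would invoke `ref_ts()` three times, once each with `last_ref_frame`, `golden_ref_frame` and `alt_ref_frame`, and store the results in the corresponding `*_frame_ts` fields.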
+
+## Mapping table (VAAPI → V4L2 / kernel)
+
+The libva backend's job: read VAAPI's per-frame buffers (Picture + Slice) AND parse the compressed header from the bitstream, write the kernel's two structs.
+
+### `v4l2_ctrl_vp9_frame` mapping
+
+| Kernel field | VAAPI source | Notes |
+|---|---|---|
+| `lf.ref_deltas[4]` | NOT in VAAPI | VAAPI doesn't expose loop-filter ref deltas separately; FFmpeg's V4L2 ref reads from VP9Context internal state. **Open question Phase 3**: are these zero in the BBB fixture? |
+| `lf.mode_deltas[2]` | NOT in VAAPI | same |
+| `lf.level` | `picture->filter_level` | direct |
+| `lf.sharpness` | `picture->sharpness_level` | direct |
+| `lf.flags` | NOT in VAAPI | DELTA_ENABLED + DELTA_UPDATE bits — ditto |
+| `quant.base_q_idx` | DERIVED (no direct VAAPI exposure) | **Open question Phase 3**: VAAPI exposes per-segment `seg_param[s].luma_ac_quant_scale`, but those are EFFECTIVE Q-scales, not the base index. Inverse-derive from `seg_param[0].luma_ac_quant_scale` via the VP9 spec quantization table? Or leave zero and let the kernel use a default? |
+| `quant.delta_q_y_dc/uv_dc/uv_ac` | NOT in VAAPI | same — VAAPI only exposes effective per-segment scales |
+| `seg.feature_data[8][4]` | DERIVED from `slice->seg_param[s].filter_level[][]` + quant scales | mapping non-trivial |
+| `seg.feature_enabled[8]` | derived from `slice->seg_param[s].segment_flags` + segmentation enabled bits | non-trivial |
+| `seg.tree_probs[7]` | `picture->mb_segment_tree_probs[7]` | direct |
+| `seg.pred_probs[3]` | `picture->segment_pred_probs[3]` | direct |
+| `seg.flags` | from `pic_fields.bits.{segmentation_enabled, segmentation_update_map, segmentation_temporal_update}` + derived segmentation_update_data + absolute_or_delta | mostly direct |
+| `flags & KEY_FRAME` | `!pic_fields.bits.frame_type` | VAAPI inverts: frame_type=0 means keyframe |
+| `flags & SHOW_FRAME` | `pic_fields.bits.show_frame` | direct |
+| `flags & ERROR_RESILIENT` | `pic_fields.bits.error_resilient_mode` | direct |
+| `flags & INTRA_ONLY` | `pic_fields.bits.intra_only` | direct |
+| `flags & ALLOW_HIGH_PREC_MV` | `pic_fields.bits.allow_high_precision_mv` | direct |
+| `flags & REFRESH_FRAME_CTX` | `pic_fields.bits.refresh_frame_context` | direct |
+| `flags & PARALLEL_DEC_MODE` | `pic_fields.bits.frame_parallel_decoding_mode` | direct |
+| `flags & X/Y_SUBSAMPLING` | `pic_fields.bits.subsampling_x/y` | direct |
+| `flags & COLOR_RANGE_FULL_SWING` | NOT in VAAPI | leave 0 for BT.709 limited (BBB) |
+| `compressed_header_size` | `picture->first_partition_size` | direct (VAAPI mis-named per its own comment) |
+| `uncompressed_header_size` | `picture->frame_header_length_in_bytes` | direct |
+| `frame_width_minus_1` | `picture->frame_width - 1` | direct |
+| `frame_height_minus_1` | `picture->frame_height - 1` | direct |
+| `render_width_minus_1`, `render_height_minus_1` | NOT in VAAPI | leave equal to frame_width-1 / frame_height-1 (no scaling for BBB) |
+| `last_frame_ts` | DPB lookup `picture->reference_frames[picture->pic_fields.bits.last_ref_frame]` → `surface_object->timestamp` → `v4l2_timeval_to_ns()` | uses `last_ref_frame` index into 8-entry DPB |
+| `golden_frame_ts` | DPB lookup `picture->reference_frames[picture->pic_fields.bits.golden_ref_frame]` | same |
+| `alt_frame_ts` | DPB lookup `picture->reference_frames[picture->pic_fields.bits.alt_ref_frame]` | same |
+| `ref_frame_sign_bias` | OR of `pic_fields.bits.{last,golden,alt}_ref_frame_sign_bias` mapped to `V4L2_VP9_SIGN_BIAS_{LAST,GOLDEN,ALT}` | direct |
+| `reset_frame_context` | `pic_fields.bits.reset_frame_context` (with FFmpeg's `v > 0 ? v - 1 : 0` adjustment per ref) | mapping needs inspection |
+| `frame_context_idx` | `pic_fields.bits.frame_context_idx` | direct |
+| `profile` | `picture->profile` | direct |
+| `bit_depth` | `picture->bit_depth` | direct |
+| `interpolation_filter` | `pic_fields.bits.mcomp_filter_type` (with FFmpeg's `^ (filtermode <= 1)` adjustment — see ref) | mapping needs inspection |
+| `tile_cols_log2`, `tile_rows_log2` | `picture->log2_tile_columns`, `log2_tile_rows` | direct |
+| `reference_mode` | NOT in VAAPI | derive from heuristic OR leave default `V4L2_VP9_REFERENCE_MODE_SELECT` — Phase 3 baseline answers |
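
The "direct" flag rows of the table reduce to a handful of OR operations. A minimal sketch, assuming the flag bit values below match the kernel's `V4L2_VP9_FRAME_FLAG_*` definitions (they are redefined locally with an `F_` prefix so the block does not depend on the UAPI header); `vp9_pic_bits` is a hypothetical subset of `pic_fields.bits`, not the libva struct itself.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies; values assumed to match V4L2_VP9_FRAME_FLAG_* in the UAPI. */
#define F_KEY_FRAME           0x001u
#define F_SHOW_FRAME          0x002u
#define F_ERROR_RESILIENT     0x004u
#define F_INTRA_ONLY          0x008u
#define F_ALLOW_HIGH_PREC_MV  0x010u
#define F_REFRESH_FRAME_CTX   0x020u
#define F_PARALLEL_DEC_MODE   0x040u
#define F_X_SUBSAMPLING       0x080u
#define F_Y_SUBSAMPLING       0x100u

/* Hypothetical subset of VADecPictureParameterBufferVP9 pic_fields.bits. */
struct vp9_pic_bits {
    unsigned subsampling_x : 1;
    unsigned subsampling_y : 1;
    unsigned frame_type : 1;               /* 0 = key frame */
    unsigned show_frame : 1;
    unsigned error_resilient_mode : 1;
    unsigned intra_only : 1;
    unsigned allow_high_precision_mv : 1;
    unsigned frame_parallel_decoding_mode : 1;
    unsigned refresh_frame_context : 1;
};

static_assert(F_Y_SUBSAMPLING == (F_X_SUBSAMPLING << 1), "adjacent bits");

/* Pack the per-frame bits into the 32-bit flags word of the FRAME control. */
static uint32_t vp9_frame_flags(const struct vp9_pic_bits *b)
{
    uint32_t f = 0;

    if (!b->frame_type)                  f |= F_KEY_FRAME;  /* inverted */
    if (b->show_frame)                   f |= F_SHOW_FRAME;
    if (b->error_resilient_mode)         f |= F_ERROR_RESILIENT;
    if (b->intra_only)                   f |= F_INTRA_ONLY;
    if (b->allow_high_precision_mv)      f |= F_ALLOW_HIGH_PREC_MV;
    if (b->refresh_frame_context)        f |= F_REFRESH_FRAME_CTX;
    if (b->frame_parallel_decoding_mode) f |= F_PARALLEL_DEC_MODE;
    if (b->subsampling_x)                f |= F_X_SUBSAMPLING;
    if (b->subsampling_y)                f |= F_Y_SUBSAMPLING;
    /* COLOR_RANGE_FULL_SWING is left clear: VAAPI has no source for it. */
    return f;
}
```

The only non-obvious row is `KEY_FRAME`, which is set when `frame_type` is 0; everything else is a one-to-one copy.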
+
+### `v4l2_ctrl_vp9_compressed_hdr` mapping
+
+This struct is filled by PARSING the compressed header bitstream — NOT from VAAPI. The libva backend runs a VPX boolean decoder over `surface_object->source_data + uncompressed_header_size` for `compressed_header_size` bytes, follows the VP9 spec section 6.3, and applies `inv_map_table[d]` for each updated probability.
+
+The parsing logic is direct port of FFmpeg `fill_compressed_hdr` (lines 99-261). Key syntax elements parsed:
+
+- `tx_mode` (2 bits, then conditional 1 bit)
+- TX 8x8/16x16/32x32 probability updates (only if tx_mode == SELECT)
+- Coef probability updates (4-level nested loop with branch probs)
+- Skip / inter_mode / interp_filter / is_inter / comp_mode / single_ref / comp_ref / y_mode / partition probability updates (only on inter frames)
+- MV probability updates (joint / sign / classes / class0_bit / bits / class0_fr / fr / class0_hp / hp)
+
+Each updated value goes through `inv_map_table[]` (the 255-entry lookup from B4). Each "no update" bit leaves zero in the kernel struct.
+
+## Patch shape prediction
+
+| Site | Action | LOC delta |
+|---|---|---|
+| `src/config.c:121-160` | INSERT VP9 enumeration block | +10 |
+| `src/config.c:54-78` | INSERT VP9 case + break + comment | +5 |
+| `src/config.c:167-191` | INSERT VP9 case in fall-through | +1 |
+| `src/vp9.c` | NEW FILE | +500-600 |
+| `src/vp9.h` | NEW FILE | +35-45 |
+| `src/picture.c:34-37` | INSERT `#include "vp9.h"` | +1 |
+| `src/picture.c:188-225` | INSERT VP9 dispatch case | +6 |
+| `src/picture.c:54-186` | INSERT 2 buffer-type cases | +14 |
+| `src/surface.h:92-119` | INSERT vp9 struct | +6 |
+| `src/meson.build:50,73` | INSERT 2 entries | +2 |
+
+**Total**: ~580-690 LOC, 5 modified + 2 new files. Larger than iter3 VP8 (370 LOC) and comparable to iter2 HEVC (470 LOC). Compressed-header parser is the dominant cost.
+
+Predicted commits:
+- **Commit A**: `src/config.c` enumeration + dispatch + entrypoints (Criterion 1).
+- **Commit B**: NEW `src/vp9.c` + `src/vp9.h` + `src/meson.build` (10 contract clauses + VPX rac decoder + compressed-header parser).
+- **Commit C**: `src/picture.c` dispatcher + 2 buffer-type cases + `src/surface.h` union extension (Criteria 2-3).
+- **Commit D**: optional fix-forward placeholder.
+
+## Open questions for Phase 3 baseline
+
+1. **Loop filter ref/mode deltas**: VAAPI doesn't expose `lf_delta.ref/mode/enabled/updated`. Are these always zero for BBB? Phase 3 strace of FFmpeg-v4l2request VP9 will reveal verbatim values.
+2. **Quantization base_q_idx + deltas**: VAAPI exposes effective per-segment scales but not the base. Phase 3 baseline: capture verbatim FRAME control payload to see what FFmpeg-v4l2request writes; correlate against VAAPI's per-segment scale via VP9 spec quantization table.
+3. **Reference mode**: VAAPI doesn't expose `comppredmode`. Phase 3 baseline: verify default `V4L2_VP9_REFERENCE_MODE_SELECT` works for BBB.
+4. **Interpolation filter mapping**: FFmpeg uses `filtermode ^ (filtermode <= 1)` to remap; VAAPI's `mcomp_filter_type` may already be in V4L2 enum order (no remap needed) OR in a different order. Empirically check.
+5. **Reset frame context mapping**: FFmpeg uses `v > 0 ? v - 1 : 0`. Either FFmpeg's source enum is offset by 1 from V4L2's, or there's an off-by-one. Empirically verify.
+6. **VAAPI per-segment field interpretation**: `slice->seg_param[s].filter_level[4][2]` and quant scales are EFFECTIVE values (computed by mpv-VAAPI consumer). Mapping back to kernel's "ALT_Q delta" + "ALT_L delta" + "REF_FRAME" feature bits is non-trivial. Phase 3 verbatim payload + mapping-back-to-VAAPI cross-check.
+7. **Does mpv 0.41.0 engage HW for VP9?**: Phase 3 capture `mpv -v --hwdec=vaapi --vo=null --frames=2 ~/fourier-test/bbb_720p10s_vp9.webm` and grep for `Selected decoder: vp9` vs `Using software decoding`. iter3 VP8 fell back; iter4 VP9 may or may not.
+8. **Does rkvdec exhibit the same dma_resv kernel issue as hantro?**: iter3 found hantro CAPTURE returns all-zero pages from libva readback. rkvdec is a different driver subsystem; iter1+iter2 successfully verified via mpv-DMA-BUF-GL on rkvdec. **Predicted: rkvdec works fine for direct readback.** Phase 3 baseline: re-test ffmpeg-vaapi-hwdownload on rkvdec for VP9 and check if output is non-zero.
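
For questions 4 and 5, it helps to see what the two FFmpeg expressions do as pure arithmetic, independent of whether they should be applied to VAAPI's values (that remains the open question). The sketch below reads the reset shorthand as `v > 0 ? v - 1 : 0`, which is an assumption about the elided operand; both function names are hypothetical.

```c
#include <assert.h>

/*
 * Q4: "f ^ (f <= 1)" flips the low bit only when f is 0 or 1, so it swaps
 * the first two filter codes and leaves 2, 3 and 4 untouched.
 */
static unsigned swap_first_two_filters(unsigned f)
{
    assert(f <= 4);
    return f ^ (unsigned)(f <= 1);
}

/*
 * Q5: collapse a 2-bit raw value onto a 3-value range. Raw 0 and 1 both
 * map to 0; raw 2 and 3 map to 1 and 2.
 */
static unsigned reset_ctx_collapse(unsigned raw)
{
    assert(raw <= 3);
    return raw > 0 ? raw - 1 : 0;
}
```

Because the swap is its own inverse, applying it twice is a no-op. If the VAAPI consumer has already applied it once when filling `mcomp_filter_type`, applying it again in the backend would undo it, which is exactly what the Phase 3 payload comparison should reveal.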
+
+## Phase 3 baseline targets (work plan)
+
+1. **Cross-validator capture**: `strace -ff -tt -y -v -e trace=ioctl ffmpeg -hwaccel v4l2request bbb_720p10s_vp9.webm -frames:v 5 -f null - 2>strace.log`. Decode VP9_FRAME + COMPRESSED_HDR payloads via Phase 3 decoder (extend `decode_vp8.py` for VP9 layout).
+2. **VAAPI consumer trace**: `LIBVA_TRACE` mpv-SW + mpv-vaapi runs to see what buffer types mpv produces.
+3. **Cache-safe verify reference**: `mpv --hwdec=no --vo=image --frames=2 --start=00:00:02 ~/fourier-test/bbb_720p10s_vp9.webm` and capture frame-0001/0002 SHA256 (criterion-4 anchor).
+4. **rkvdec readback path test**: re-run `ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -vf hwdownload bbb_720p10s_vp9.webm -frames:v 5` after install (would be Phase 6 actually; Phase 3 just baseline-captures the SW reference). Confirm whether rkvdec hits dma_resv issue or not (predicted: NO based on iter1+iter2 working there).
+5. **mpv-VP9-vaapi engagement check**: per memory `feedback_hw_decode_engagement_check.md`, verify HW path engaged via `mpv -v` log BEFORE claiming criterion 4.
+
+## Phase 4 plan structure (anticipated)
+
+Following iter2/iter3's clause template:
+
+- Clause 1: Submission shape — 2 controls batched per frame
+- Clause 2: Local struct alloc + zero-init (memset both)
+- Clause 3: Frame geometry + scalars + flags
+- Clause 4: DPB timestamp resolution (3 active refs from 8-slot DPB)
+- Clause 5: Loop filter mapping (with VAAPI gap notes per Q1)
+- Clause 6: Quantization mapping (with VAAPI gap notes per Q2)
+- Clause 7: Segmentation mapping (with VAAPI per-segment effective-vs-delta unpacking per Q6)
+- Clause 8: Compressed header parser — port FFmpeg `fill_compressed_hdr` + VPX rac decoder + inv_map_table
+- Clause 9: Final 2-control batched submission
+- Clause 10: Bitstream offsetting — `surface_object->source_data + uncompressed_header_size` is the start of compressed-header bytes; `compressed_header_size` is the byte length
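
Clause 10's offsetting is simple pointer arithmetic, but both sizes come from the client and should be bounds-checked before the boolean decoder touches the buffer. A minimal sketch, assuming the slice data is available as a plain byte buffer with a known length; `compressed_hdr_start` is a hypothetical helper, and treating a zero-length compressed header as "nothing to parse" is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static_assert(SIZE_MAX >= UINT16_MAX, "16-bit header sizes fit in size_t");

/*
 * Return a pointer to the first compressed-header byte, or NULL if the two
 * header sizes do not fit inside the buffer.
 *
 *   data, size           source bitstream for this frame
 *   uncompressed_size    bytes of uncompressed header at the start
 *   compressed_size      bytes of compressed header that follow it
 */
static const uint8_t *compressed_hdr_start(const uint8_t *data, size_t size,
                                           uint16_t uncompressed_size,
                                           uint16_t compressed_size)
{
    if (data == NULL || compressed_size == 0)
        return NULL;
    if ((size_t)uncompressed_size > size)
        return NULL;
    /* Subtract first so the comparison cannot overflow. */
    if ((size_t)compressed_size > size - (size_t)uncompressed_size)
        return NULL;
    return data + uncompressed_size;
}
```

The returned pointer and `compressed_size` are then the buffer and length handed to the boolean decoder's init routine.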
+
+The plan will cite verbatim Phase 3 baseline payload bytes for all fields where mapping is non-obvious (loop-filter deltas, quant base, segmentation feature mapping) per `feedback_dev_process.md` Phase 6 contract-before-code.
+
+## Substrate state at Phase 2 close
+
+- iter4 Phase 1 commit `9a71dbf` pushed to gitea.
+- Fork on noether at iter3 tip `e1aca9c` (synced via `git fetch && git merge --ff-only`).
+- All Phase 3 prerequisites identified.
+- Memory rules apply unchanged.
+- Phase 3 questions queued (8 items, mostly empirical). Phase 5 review will catch the field-availability + mapping questions analogous to iter3 (`uniform_spacing_flag` Direction 2 lesson).