fresnel-fourier/phase2_iter4_situation.md

# Iteration 4 — Phase 2 (situation analysis)

Source-read of every file the iter4 patch series will touch, plus the kernel UAPI + VAAPI + downstream FFmpeg + kernel rkvdec reference sources. Conducted on noether against fork tip `e1aca9c` (iter3 close).

This is a contract-before-code analysis per `feedback_dev_process.md` Phase 2: enumerate the bugs, cite the contract verbatim, predict the patch shape, queue the Phase 3 baseline questions.

## Critical finding: rkvdec requires VP9_COMPRESSED_HDR

The biggest scope-shaping discovery: **rkvdec on RK3399 requires `V4L2_CID_STATELESS_VP9_COMPRESSED_HDR`**; it is mandatory, not optional. From `drivers/staging/media/rkvdec/rkvdec-vp9.c::rkvdec_vp9_run_preamble` lines 740-754:

```c
ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_VP9_FRAME);
if (WARN_ON(!ctrl))
    return -EINVAL;
dec_params = ctrl->p_cur.p;
...
ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_VP9_COMPRESSED_HDR);
if (WARN_ON(!ctrl))
    return -EINVAL;       /* registration check only: if userspace never sets the control,
                             rkvdec decodes with its stale/default probability deltas */
prob_updates = ctrl->p_cur.p;
vp9_ctx->cur.tx_mode = prob_updates->tx_mode;
...
v4l2_vp9_fw_update_probs(&vp9_ctx->probability_tables, prob_updates, dec_params);
```

VAAPI does NOT expose compressed-header probability updates (per `va_dec_vp9.h:50-192` — only frame parameters + segmentation, no probability deltas; vendor VAAPI drivers parse compressed header in firmware/GPU). So **the libva backend must parse the compressed header itself** via a VPX boolean decoder.

This makes iter4's scope significantly larger than iter3's VP8 work.

## Bug enumeration (sites the iter4 patch series must touch)

### B1 — `src/config.c::RequestQueryConfigProfiles` — VP9 enumeration block missing

**Site**: `config.c:121-160`.

**Bug**: no analogous block for `V4L2_PIX_FMT_VP9_FRAME` → `VAProfileVP9Profile0`. Same starting condition as iter3 VP8.

**Patch shape**: ADD enumeration block after iter3's VP8 block. ~10 LOC.

### B2 — `src/config.c::RequestCreateConfig` — VP9 case label missing

**Site**: `config.c:54-78`.

**Bug**: no `case VAProfileVP9Profile0:`.

**Patch shape**: mirror the iter3 VP8 pattern. ~5 LOC.

### B3 — `src/config.c::RequestQueryConfigEntrypoints` — VP9 case missing

**Site**: `config.c:167-191`.

**Bug**: `VAProfileVP9Profile0` is missing from the fall-through case list.

**Patch shape**: add the case label. ~1 LOC.

### B4 — `src/vp9.c` — file does not exist; needs net-new implementation

**Site**: NEW FILE `src/vp9.c`.

**Patch shape**: NEW file, ~500-600 LOC (substantially larger than iter3 vp8.c due to compressed-header parser):

- Includes block
- Static `inv_map_table[255]` — direct copy from FFmpeg `v4l2_request_vp9.c:43-64`
- VPX range coder helpers (port from FFmpeg `vp89_rac.h` + boolean decoder primitives) — ~80 LOC
- `vp9_fill_frame()` — fill `v4l2_ctrl_vp9_frame` from VAAPI `VADecPictureParameterBufferVP9` + `VASliceParameterBufferVP9` — ~150 LOC
- `vp9_fill_compressed_hdr()` — parse compressed header bits from `surface_object->source_data + uncompressed_header_size`, populate `v4l2_ctrl_vp9_compressed_hdr` — ~180 LOC (port from FFmpeg `fill_compressed_hdr` lines 99-261)
- `vp9_set_controls()` — entry point, allocates both structs, calls `vp9_fill_frame` + `vp9_fill_compressed_hdr`, batched 2-element `v4l2_ext_control` array, single `v4l2_set_controls` call

### B5 — `src/vp9.h` — header does not exist

**Site**: NEW FILE `src/vp9.h`.

**Patch shape**: declare `vp9_set_controls()`. Mirror iter3 vp8.h.

### B6 — Possibly `src/vp9_rac.h` — VPX range decoder helpers (decision point)

**Site**: NEW FILE candidate `src/vp9_rac.h`.

VP9 boolean decoder primitives (`vpx_rac_get_prob_branchy`, `vp89_rac_get`, `vp89_rac_get_uint`, init function) are needed by `vp9_fill_compressed_hdr`. Two design options:

- **Option A**: inline the ~80 LOC of decoder helpers directly in `vp9.c`. Simpler; one file. Recommended for first cut.
- **Option B**: separate `vp9_rac.h`/`vp9_rac.c`. Mirrors FFmpeg's upstream `vp89_rac.h` split. More files, and the reuse payoff is limited: AV1 (which grew out of the VP10 effort) uses a multi-symbol entropy coder, not the VPX boolean coder.

**Phase 4 plan locks Option A** unless Phase 5 review surfaces a reason for Option B.
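
For Option A, a minimal sketch of the inline decoder, written from the VP9 specification's boolean-decoder description (`init_bool()`, `read_bool()`, `read_literal()`) rather than ported from FFmpeg's optimized `vp89_rac.h`. The struct and function names are hypothetical. FFmpeg's version refills whole bytes for speed; this one shifts in one bit at a time for clarity, which should decode the same values.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical inline VP9 boolean decoder (spec-style, bit-at-a-time). */
struct vp9_bool_decoder {
    const uint8_t *buf; /* compressed-header bytes */
    size_t bitpos;      /* next raw bit to consume, MSB first */
    uint32_t value;     /* BoolValue */
    uint32_t range;     /* BoolRange */
    size_t max_bits;    /* BoolMaxBits: raw bits still available */
};

static unsigned vp9_raw_bit(struct vp9_bool_decoder *d)
{
    unsigned bit = (d->buf[d->bitpos >> 3] >> (7 - (d->bitpos & 7))) & 1;

    d->bitpos++;
    return bit;
}

/* Decode one bool whose probability of being 0 is roughly p/256. */
static unsigned vp9_read_bool(struct vp9_bool_decoder *d, unsigned p)
{
    uint32_t split = 1 + (((d->range - 1) * p) >> 8);
    unsigned bit;

    if (d->value < split) {
        d->range = split;
        bit = 0;
    } else {
        d->range -= split;
        d->value -= split;
        bit = 1;
    }
    /* Renormalize; once the buffer is exhausted, zeros are shifted in. */
    while (d->range < 128) {
        unsigned next = 0;

        if (d->max_bits > 0) {
            next = vp9_raw_bit(d);
            d->max_bits--;
        }
        d->range <<= 1;
        d->value = (d->value << 1) | next;
    }
    return bit;
}

/* L(n): n equiprobable bools, most significant bit first. */
static unsigned vp9_read_literal(struct vp9_bool_decoder *d, unsigned n)
{
    unsigned x = 0;

    while (n--)
        x = (x << 1) | vp9_read_bool(d, 128);
    return x;
}

/* init_bool(sz): 0 on success, -1 on empty input or a non-zero marker bit. */
static int vp9_bool_init(struct vp9_bool_decoder *d, const uint8_t *buf, size_t size)
{
    if (size == 0)
        return -1;
    d->buf = buf;
    d->bitpos = 0;
    d->value = 0;
    for (int i = 0; i < 8; i++)
        d->value = (d->value << 1) | vp9_raw_bit(d);
    d->range = 255;
    d->max_bits = 8 * size - 8;
    /* The spec requires the first decoded bool (the marker) to be 0. */
    return vp9_read_bool(d, 128) == 0 ? 0 : -1;
}
```

In `vp9_fill_compressed_hdr`, the decoder would be initialized over the `compressed_header_size` bytes that follow the uncompressed header.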

### B7 — `src/picture.c::codec_set_controls` — VP9 dispatch case missing

**Site**: `picture.c:188-225`.

**Patch shape**: ADD `case VAProfileVP9Profile0:` calling `vp9_set_controls`. ~6 LOC.

### B8 — `src/picture.c::codec_store_buffer` — 2 VAAPI buffer types unmapped

**Site**: `picture.c:54-186`.

**Bug**: VAAPI VP9 sends only two parameter buffer types per frame, plus the raw slice data (per `va_dec_vp9.h:58-303`):

| VAAPI buffer type | VAAPI struct | Per-frame |
|---|---|---|
| `VAPictureParameterBufferType` | `VADecPictureParameterBufferVP9` | once |
| `VASliceParameterBufferType` | `VASliceParameterBufferVP9` (with `seg_param[8]`) | once |
| `VASliceDataBufferType` | raw bitstream | once |

**Different from iter3 VP8**: no `VAProbabilityBufferType` (VP9's probability updates live in the compressed header inside the slice data, which the backend must parse itself; see the critical finding above) and no `VAIQMatrixBufferType` (VP9 carries quantization in the slice's per-segment `seg_param` array). Just 2 cases vs VP8's 4.

**Patch shape**: 2 nested case adds in `codec_store_buffer` outer switch + inner profile dispatch. ~14 LOC total.

### B9 — `src/picture.c::RequestBeginPicture` — per-frame VP9 reset

**Site**: `picture.c:299-302`.

**Bug**: VP9 has no iqmatrix_set / probability_set flag pattern; the VAAPI consumer fully repopulates the picture and slice params on every frame. Possibly NO reset needed (analogous to MPEG-2's iqmatrix-only pattern, but even simpler).

**Patch shape**: likely no edit. If Phase 5 review reveals a hidden state-leak risk (e.g., VAAPI reusing the surface for a new context with stale params), add reset for `params.vp9.<some-flag>`. Default plan: no reset added; revisit if Phase 7 byte-compare shows stale state.

### B10 — `src/surface.h::object_surface::params` union — no `vp9` member

**Site**: `surface.h:92-119`.

**Patch shape**: ADD `vp9` struct after `vp8`:

```c
struct {
    VADecPictureParameterBufferVP9 picture;
    VASliceParameterBufferVP9 slice;
} vp9;
```

`VASliceParameterBufferVP9` is large (~340 bytes: `seg_param[8]` × ~40 bytes each); `VADecPictureParameterBufferVP9` is ~80 bytes. A union is as large as its largest member, so adding this ~420-byte member does not grow `params` at all: `params.h265`, with its 64-slot `slices[64]` array (~17 KB), still dominates.

### B11 — `src/meson.build` — `vp9.c` + `vp9.h` not in lists

**Site**: `meson.build:30-74`.

**Patch shape**: insert `'vp9.c'` after `'vp8.c'` in sources, insert `'vp9.h'` after `'vp8.h'` in headers. +2 lines.

### B12 — `src/buffer.c` — buffer-type allow-list (predicted no change needed)

**Site**: `buffer.c:59-70`.

VP9 uses `VAPictureParameterBufferType` + `VASliceParameterBufferType` + `VASliceDataBufferType` — all three already in the allow-list (used by H.264 + iter3 VP8). **Predicted no change needed.**

Per memory `feedback_runtime_enumerates_allowlists.md`: plan for fix-forward Commit D if a runtime miss surfaces (would be unexpected for VP9 given the buffer types are H.264-shape; but the iter3 lesson is "don't audit exhaustively — let runtime enumerate").

### Non-bugs (intentionally NOT touched)

- `src/context.c` — no DECODE_MODE/START_CODE menus for VP9 (per FFmpeg V4L2 ref `v4l2_request_vp9.c:487-503`: `v4l2_request_vp9_init` doesn't issue any device-wide menu sets; per-frame batch only). **No context.c changes.**
- `src/video.c::formats[]` — CAPTURE-side format list (NV12); VP9 is OUTPUT-side fourcc, probed via `v4l2_find_format()` in config.c. **No video.c changes.**
- `src/v4l2.c` — fourcc-agnostic helpers. **No v4l2.c changes.**
- `include/hevc-ctrls.h` — already includes `<linux/v4l2-controls.h>` which holds VP9 control IDs.

## Contract surface (verbatim)

### Kernel UAPI: `V4L2_CID_STATELESS_VP9_FRAME` (`<linux/v4l2-controls.h>:2696`)

```c
#define V4L2_CID_STATELESS_VP9_FRAME        (V4L2_CID_CODEC_STATELESS_BASE + 300)
                                            /* = 0xa40a2c (base 0xa40900 + 0x12c) */

struct v4l2_ctrl_vp9_frame {
    struct v4l2_vp9_loop_filter lf;        /* 16 bytes; ref_deltas[4] + mode_deltas[2]
                                              + level + sharpness + flags + reserved[7] */
    struct v4l2_vp9_quantization quant;    /* 8 bytes; base_q_idx + 3 deltas + reserved[4] */
    struct v4l2_vp9_segmentation seg;      /* 88 bytes; feature_data[8][4] (__s16) + feature_enabled[8]
                                              + tree_probs[7] + pred_probs[3] + flags + reserved[5] */
    __u32 flags;                            /* 10 V4L2_VP9_FRAME_FLAG_* bits per
                                              <linux/v4l2-controls.h>:2665-2674 */
    __u16 compressed_header_size;
    __u16 uncompressed_header_size;
    __u16 frame_width_minus_1;
    __u16 frame_height_minus_1;
    __u16 render_width_minus_1;
    __u16 render_height_minus_1;
    __u64 last_frame_ts;                    /* per-VASurfaceID timestamp lookup */
    __u64 golden_frame_ts;
    __u64 alt_frame_ts;
    __u8 ref_frame_sign_bias;               /* OR of V4L2_VP9_SIGN_BIAS_{LAST,GOLDEN,ALT} */
    __u8 reset_frame_context;               /* V4L2_VP9_RESET_FRAME_CTX_* (0..2) */
    __u8 frame_context_idx;
    __u8 profile;
    __u8 bit_depth;
    __u8 interpolation_filter;
    __u8 tile_cols_log2;
    __u8 tile_rows_log2;
    __u8 reference_mode;
    __u8 reserved[7];
};
```

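Total size: 168 bytes (16 `lf` + 8 `quant` + 88 `seg` + 4 `flags` + 12 for the six `__u16` fields + 24 for the three `__u64` timestamps + 16 trailing `__u8` bytes, with no padding), vs iter3 VP8's 1232 bytes. It is much smaller because VP9_FRAME carries no entropy tables; those are in COMPRESSED_HDR.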

### Kernel UAPI: `V4L2_CID_STATELESS_VP9_COMPRESSED_HDR` (`<linux/v4l2-controls.h>:2797`)

```c
#define V4L2_CID_STATELESS_VP9_COMPRESSED_HDR  (V4L2_CID_CODEC_STATELESS_BASE + 301)
                                              /* = 0xa40a2d */

struct v4l2_ctrl_vp9_compressed_hdr {
    __u8 tx_mode;                          /* V4L2_VP9_TX_MODE_* (0..4) */
    __u8 tx8[2][1];
    __u8 tx16[2][2];
    __u8 tx32[2][3];
    __u8 coef[4][2][2][6][6][3];           /* HUGE: 1728 bytes */
    __u8 skip[3];
    __u8 inter_mode[7][3];
    __u8 interp_filter[4][2];
    __u8 is_inter[4];
    __u8 comp_mode[5];
    __u8 single_ref[5][2];
    __u8 comp_ref[5];
    __u8 y_mode[4][9];
    __u8 uv_mode[10][9];
    __u8 partition[16][3];
    struct v4l2_vp9_mv_probs mv;           /* 69 bytes; joint/sign/classes/class0_bit/bits/etc */
};
```

Total size: 2040 bytes, all `__u8` so there is no padding (the `coef` array alone is 1728). Filled by parsing the compressed header bits via the VPX boolean decoder + `inv_map_table[]` (per FFmpeg `v4l2_request_vp9.c:99-261`).

The kernel uses these as PROBABILITY UPDATES (not absolutes): a value of zero in any array element means "no update — keep prior probability." The kernel runs `v4l2_vp9_fw_update_probs(&probability_tables, prob_updates, dec_params)` to apply updates per `rkvdec-vp9.c:796`.
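
To make the "zero means no update" rule concrete, here is a sketch of how one non-zero entry of the non-MV arrays is applied to a prior probability. It mirrors the `inv_recenter_nonneg()`/`update_prob()` arithmetic used by FFmpeg's software VP9 decoder and libvpx; we assume `v4l2_vp9_fw_update_probs()` applies the same arithmetic per entry, but the function names below are illustrative, not kernel API. Probabilities are assumed to lie in [1, 255].

```c
#include <stdint.h>

/* VP9 inv_recenter_nonneg(): undo the "recenter around m" mapping. */
static unsigned inv_recenter_nonneg(unsigned v, unsigned m)
{
    if (v > 2 * m)
        return v;
    if (v & 1)
        return m - ((v + 1) >> 1);
    return m + (v >> 1);
}

/* Apply one non-MV COMPRESSED_HDR entry to the prior probability.
 * delta == 0 means "no update"; prob is assumed to be in [1, 255].
 * The upper half of the range is handled by mirroring around 255. */
static uint8_t apply_prob_delta(uint8_t delta, uint8_t prob)
{
    if (delta == 0)
        return prob;
    if (prob <= 128)
        return (uint8_t)(1 + inv_recenter_nonneg(delta, prob - 1u));
    return (uint8_t)(255 - inv_recenter_nonneg(delta, 255u - prob));
}
```

Because of that mirroring, the same delta of 2 moves a prior of 100 up to 101 but a prior of 200 down to 199.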

### VAAPI buffer types

`VADecPictureParameterBufferVP9` (`va_dec_vp9.h:58-192`):
- `frame_width`, `frame_height` (u16)
- `reference_frames[8]` — 8-entry DPB (vs VP8's 3)
- `pic_fields.bits.{...}`: 22 packed bit-fields totalling 32 bits (subsampling_x/y, frame_type, show_frame, error_resilient_mode, intra_only, allow_high_precision_mv, mcomp_filter_type[3 bits], frame_parallel_decoding_mode, reset_frame_context[2 bits], refresh_frame_context, frame_context_idx[2 bits], segmentation_*, last/golden/alt_ref_frame[3 bits each, indexes into reference_frames[8]], *_sign_bias, lossless_flag)
- `filter_level`, `sharpness_level` (u8)
- `log2_tile_rows`, `log2_tile_columns` (u8)
- `frame_header_length_in_bytes` — uncompressed_header_size (u8 — note 8-bit width may overflow for super-frames; typical < 256 for BBB)
- `first_partition_size` — compressed_header_size (u16)
- `mb_segment_tree_probs[7]`, `segment_pred_probs[3]` (u8)
- `profile`, `bit_depth` (u8)

`VASliceParameterBufferVP9` (`va_dec_vp9.h:279-303`):
- `slice_data_size`, `slice_data_offset`, `slice_data_flag` (u32)
- `seg_param[8]` — array of `VASegmentParameterVP9` (~40 bytes each):
  - `segment_flags.fields.{segment_reference_enabled, segment_reference[2 bits], segment_reference_skipped}` (u16 packed)
  - `filter_level[4][2]` (u8) — per-ref-frame × per-mode loop filter levels
  - `luma_ac_quant_scale`, `luma_dc_quant_scale`, `chroma_ac_quant_scale`, `chroma_dc_quant_scale` (s16) — already-computed effective scale per segment

### FFmpeg V4L2 reference (`v4l2_request_vp9.c`)

Submission shape: 2 batched controls per frame in a single `VIDIOC_S_EXT_CTRLS` call:

```c
control[0] = { .id = V4L2_CID_STATELESS_VP9_FRAME, ... };
control[1] = { .id = V4L2_CID_STATELESS_VP9_COMPRESSED_HDR, ... };
v4l2_set_controls(..., control, 2);
```

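The COMPRESSED_HDR control is conditionally included based on a runtime probe (`v4l2_request_vp9_post_frames_ctx` queries the kernel; if the control isn't advertised, FFmpeg falls back to FRAME-only submission). For rkvdec on RK3399 the kernel advertises COMPRESSED_HDR, and `rkvdec-vp9.c:752` reads it unconditionally for every frame. The `WARN_ON` there only catches an unregistered control, so omitting the control from a request would not produce `-EINVAL`; the driver would silently decode with stale or default probability deltas. Either way the backend must always supply it.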

### Kernel rkvdec driver (`rkvdec-vp9.c`)

Key reads in `rkvdec_vp9_run_preamble`:
- VP9_FRAME control → `dec_params = ctrl->p_cur.p` → drives register programming via `config_registers()`.
- VP9_COMPRESSED_HDR control → `prob_updates = ctrl->p_cur.p` → applied via `v4l2_vp9_fw_update_probs()`.
- References: the kernel sees only three, FRAME's `last_frame_ts`/`golden_frame_ts`/`alt_frame_ts`. VAAPI exposes an 8-slot `reference_frames[8]` DPB, so the backend must resolve the picture's `last_ref_frame`/`golden_ref_frame`/`alt_ref_frame` indexes into that array to capture-buffer timestamps.
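
A sketch of that resolution step, assuming a hypothetical `lookup_surface_timestamp`-style callback (`surface_ts_fn` below) standing in for the backend's surface-object lookup (the `surface_object->timestamp` path in the `last_frame_ts` row of the mapping table below). The nanosecond conversion repeats the arithmetic of `v4l2_timeval_to_ns()` from `<linux/videodev2.h>` so the block does not depend on kernel headers; resolving unknown surfaces to 0 is an assumed policy, not a documented kernel requirement.

```c
#include <stdint.h>
#include <sys/time.h>

#define VP9_NUM_REF_SLOTS 8

/* Same arithmetic as v4l2_timeval_to_ns() in <linux/videodev2.h>. */
static uint64_t timeval_to_ns(const struct timeval *tv)
{
    return (uint64_t)tv->tv_sec * 1000000000ULL + (uint64_t)tv->tv_usec * 1000ULL;
}

/* Hypothetical callback: returns 0 and fills *tv if the surface is known. */
typedef int (*surface_ts_fn)(uint32_t surface_id, struct timeval *tv);

struct vp9_ref_ts {
    uint64_t last, golden, alt;
};

/* Resolve one 3-bit VAAPI index into the 8-slot DPB to a timestamp.
 * Out-of-range indexes and unknown surfaces resolve to 0 (assumed policy;
 * keyframes reference no buffers at all). */
static uint64_t resolve_ref(const uint32_t refs[VP9_NUM_REF_SLOTS], unsigned idx,
                            surface_ts_fn lookup)
{
    struct timeval tv;

    if (idx >= VP9_NUM_REF_SLOTS || lookup(refs[idx], &tv) != 0)
        return 0;
    return timeval_to_ns(&tv);
}

static struct vp9_ref_ts resolve_refs(const uint32_t refs[VP9_NUM_REF_SLOTS],
                                      unsigned last_idx, unsigned golden_idx,
                                      unsigned alt_idx, surface_ts_fn lookup)
{
    struct vp9_ref_ts ts = {
        .last = resolve_ref(refs, last_idx, lookup),
        .golden = resolve_ref(refs, golden_idx, lookup),
        .alt = resolve_ref(refs, alt_idx, lookup),
    };
    return ts;
}
```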

## Mapping table (VAAPI → V4L2 / kernel)

The libva backend's job: read VAAPI's per-frame buffers (Picture + Slice) AND parse the compressed header from the bitstream, write the kernel's two structs.

### `v4l2_ctrl_vp9_frame` mapping

| Kernel field | VAAPI source | Notes |
|---|---|---|
| `lf.ref_deltas[4]` | NOT in VAAPI | VAAPI doesn't expose loop-filter ref deltas separately; FFmpeg's V4L2 ref reads from VP9Context internal state. **Open question Phase 3**: are these zero in the BBB fixture? |
| `lf.mode_deltas[2]` | NOT in VAAPI | same |
| `lf.level` | `picture->filter_level` | direct |
| `lf.sharpness` | `picture->sharpness_level` | direct |
| `lf.flags` | NOT in VAAPI | DELTA_ENABLED + DELTA_UPDATE bits — ditto |
| `quant.base_q_idx` | DERIVED (no direct VAAPI exposure) | **Open question Phase 3**: VAAPI exposes per-segment `slice->seg_param[s].luma_ac_quant_scale`, but those are EFFECTIVE Q-scales, not the base index. Inverse-derive from `seg_param[0].luma_ac_quant_scale` via the VP9 spec quantization table (`ac_qlookup`)? Or leave zero and let the kernel use a default? |
| `quant.delta_q_y_dc/uv_dc/uv_ac` | NOT in VAAPI | same — VAAPI only exposes effective per-segment scales |
| `seg.feature_data[8][4]` | DERIVED from `slice->seg_param[s].filter_level[][]` + quant scales | mapping non-trivial |
| `seg.feature_enabled[8]` | derived from `slice->seg_param[s].segment_flags` + segmentation enabled bits | non-trivial |
| `seg.tree_probs[7]` | `picture->mb_segment_tree_probs[7]` | direct |
| `seg.pred_probs[3]` | `picture->segment_pred_probs[3]` | direct |
| `seg.flags` | from `pic_fields.bits.{segmentation_enabled, segmentation_update_map, segmentation_temporal_update}` + derived segmentation_update_data + absolute_or_delta | mostly direct |
| `flags & KEY_FRAME` | `!pic_fields.bits.frame_type` | not an inversion by VAAPI: it passes the VP9 syntax value, where frame_type 0 is KEY_FRAME |
| `flags & SHOW_FRAME` | `pic_fields.bits.show_frame` | direct |
| `flags & ERROR_RESILIENT` | `pic_fields.bits.error_resilient_mode` | direct |
| `flags & INTRA_ONLY` | `pic_fields.bits.intra_only` | direct |
| `flags & ALLOW_HIGH_PREC_MV` | `pic_fields.bits.allow_high_precision_mv` | direct |
| `flags & REFRESH_FRAME_CTX` | `pic_fields.bits.refresh_frame_context` | direct |
| `flags & PARALLEL_DEC_MODE` | `pic_fields.bits.frame_parallel_decoding_mode` | direct |
| `flags & X/Y_SUBSAMPLING` | `pic_fields.bits.subsampling_x/y` | direct |
| `flags & COLOR_RANGE_FULL_SWING` | NOT in VAAPI | leave 0 for BT.709 limited (BBB) |
| `compressed_header_size` | `picture->first_partition_size` | direct (VAAPI mis-named per its own comment) |
| `uncompressed_header_size` | `picture->frame_header_length_in_bytes` | direct |
| `frame_width_minus_1` | `picture->frame_width - 1` | direct |
| `frame_height_minus_1` | `picture->frame_height - 1` | direct |
| `render_width_minus_1`, `render_height_minus_1` | NOT in VAAPI | leave equal to frame_width-1 / frame_height-1 (no scaling for BBB) |
| `last_frame_ts` | DPB lookup `picture->reference_frames[picture->pic_fields.bits.last_ref_frame]` → `surface_object->timestamp` → `v4l2_timeval_to_ns()` | uses `last_ref_frame` index into 8-entry DPB |
| `golden_frame_ts` | DPB lookup `picture->reference_frames[picture->pic_fields.bits.golden_ref_frame]` | same |
| `alt_frame_ts` | DPB lookup `picture->reference_frames[picture->pic_fields.bits.alt_ref_frame]` | same |
| `ref_frame_sign_bias` | OR of `pic_fields.bits.{last,golden,alt}_ref_frame_sign_bias` mapped to `V4L2_VP9_SIGN_BIAS_{LAST,GOLDEN,ALT}` | direct |
| `reset_frame_context` | `pic_fields.bits.reset_frame_context` (with FFmpeg's `> 0 ? -1 : 0` adjustment per ref) | mapping needs inspection |
| `frame_context_idx` | `pic_fields.bits.frame_context_idx` | direct |
| `profile` | `picture->profile` | direct |
| `bit_depth` | `picture->bit_depth` | direct |
| `interpolation_filter` | `pic_fields.bits.mcomp_filter_type` (with FFmpeg's `^ (filtermode <= 1)` adjustment — see ref) | mapping needs inspection |
| `tile_cols_log2`, `tile_rows_log2` | `picture->log2_tile_columns`, `log2_tile_rows` | direct |
| `reference_mode` | NOT in VAAPI | coded in the compressed header (`frame_reference_mode()`), so the B4 parser can recover it; fallback is default `V4L2_VP9_REFERENCE_MODE_SELECT` (Phase 3 baseline answers) |
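
The "direct" flag rows, the sign-bias OR, and the `reset_frame_context` row can be sketched as below. `struct vaapi_vp9_bits` is a hypothetical stand-in for `pic_fields.bits` so the block compiles without libva headers, and the `VP9_FLAG_*`/`VP9_SIGN_BIAS_*` values are local mirrors assumed to be single bits in the UAPI order listed in the table; verify them against `<linux/v4l2-controls.h>` before reuse. The reset mapping assumes VAAPI passes the raw 2-bit VP9 syntax value (0/1 = no reset, 2 = reset the selected context, 3 = reset all) into the V4L2 NONE/SPEC/ALL = 0/1/2 enum.

```c
#include <stdint.h>

/* Local mirrors of the UAPI bits (assumed values; check the real header). */
#define VP9_FLAG_KEY_FRAME          (1u << 0)
#define VP9_FLAG_SHOW_FRAME         (1u << 1)
#define VP9_FLAG_ERROR_RESILIENT    (1u << 2)
#define VP9_FLAG_INTRA_ONLY         (1u << 3)
#define VP9_FLAG_ALLOW_HIGH_PREC_MV (1u << 4)
#define VP9_FLAG_REFRESH_FRAME_CTX  (1u << 5)
#define VP9_FLAG_PARALLEL_DEC_MODE  (1u << 6)
#define VP9_FLAG_X_SUBSAMPLING      (1u << 7)
#define VP9_FLAG_Y_SUBSAMPLING      (1u << 8)

#define VP9_SIGN_BIAS_LAST   (1u << 0)
#define VP9_SIGN_BIAS_GOLDEN (1u << 1)
#define VP9_SIGN_BIAS_ALT    (1u << 2)

/* Hypothetical stand-in for VADecPictureParameterBufferVP9.pic_fields.bits. */
struct vaapi_vp9_bits {
    unsigned subsampling_x, subsampling_y, frame_type, show_frame;
    unsigned error_resilient_mode, intra_only, allow_high_precision_mv;
    unsigned refresh_frame_context, frame_parallel_decoding_mode;
    unsigned reset_frame_context;
    unsigned last_ref_frame_sign_bias, golden_ref_frame_sign_bias;
    unsigned alt_ref_frame_sign_bias;
};

static uint32_t vp9_frame_flags(const struct vaapi_vp9_bits *b)
{
    uint32_t flags = 0;

    if (b->frame_type == 0)          /* VP9 syntax: frame_type 0 is KEY_FRAME */
        flags |= VP9_FLAG_KEY_FRAME;
    if (b->show_frame)
        flags |= VP9_FLAG_SHOW_FRAME;
    if (b->error_resilient_mode)
        flags |= VP9_FLAG_ERROR_RESILIENT;
    if (b->intra_only)
        flags |= VP9_FLAG_INTRA_ONLY;
    if (b->allow_high_precision_mv)
        flags |= VP9_FLAG_ALLOW_HIGH_PREC_MV;
    if (b->refresh_frame_context)
        flags |= VP9_FLAG_REFRESH_FRAME_CTX;
    if (b->frame_parallel_decoding_mode)
        flags |= VP9_FLAG_PARALLEL_DEC_MODE;
    if (b->subsampling_x)
        flags |= VP9_FLAG_X_SUBSAMPLING;
    if (b->subsampling_y)
        flags |= VP9_FLAG_Y_SUBSAMPLING;
    /* COLOR_RANGE_FULL_SWING is not in VAAPI: left clear (limited range). */
    return flags;
}

static uint8_t vp9_sign_bias(const struct vaapi_vp9_bits *b)
{
    return (uint8_t)((b->last_ref_frame_sign_bias ? VP9_SIGN_BIAS_LAST : 0) |
                     (b->golden_ref_frame_sign_bias ? VP9_SIGN_BIAS_GOLDEN : 0) |
                     (b->alt_ref_frame_sign_bias ? VP9_SIGN_BIAS_ALT : 0));
}

/* Raw syntax 0/1 -> NONE (0), 2 -> SPEC (1), 3 -> ALL (2). */
static uint8_t vp9_reset_ctx(unsigned raw)
{
    return raw > 0 ? (uint8_t)(raw - 1) : 0;
}
```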

### `v4l2_ctrl_vp9_compressed_hdr` mapping

This struct is filled by PARSING the compressed header bitstream — NOT from VAAPI. The libva backend runs a VPX boolean decoder over `surface_object->source_data + uncompressed_header_size` for `compressed_header_size` bytes, follows the VP9 spec section 6.3, and applies `inv_map_table[d]` for each updated probability.

The parsing logic is a direct port of FFmpeg `fill_compressed_hdr` (lines 99-261). Key syntax elements parsed, in bitstream order:

- `tx_mode` (2 bits, then conditional 1 bit)
- TX 8x8/16x16/32x32 probability updates (only if tx_mode == SELECT)
- Coef probability updates (4-level nested loop with branch probs)
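- Skip probability updates (every frame), then, on inter frames only: inter_mode / interp_filter (only when the frame's filter is SWITCHABLE) / is_inter / reference mode, then comp_mode / single_ref / comp_ref (each only for the reference modes that use it) / y_mode / partition probability updates
- MV probability updates, also inter frames only (joint / sign / classes / class0_bit / bits / class0_fr / fr, plus class0_hp / hp when `allow_high_precision_mv` is set)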
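
The first step and the reference-mode step of that order, sketched against the VP9 spec's `read_tx_mode()`, `tx_mode_probs()` and `frame_reference_mode()` syntax. A hypothetical `vp9_reader` interface (the spec's `L(n)` plus a per-entry delta reader) keeps the block independent of the range decoder; the enums are local, with values assumed to match the `V4L2_VP9_TX_MODE_*` and `V4L2_VP9_REFERENCE_MODE_*` orderings. Because `frame_reference_mode()` is coded in this same compressed header, the parser also yields the frame's reference mode.

```c
#include <stdint.h>

/* Hypothetical reader hooks: literal() is the spec's L(n); delta() yields
 * one already-mapped COMPRESSED_HDR entry (0 = no update). */
struct vp9_reader {
    void *ctx;
    unsigned (*literal)(void *ctx, unsigned n);
    uint8_t (*delta)(void *ctx);
};

/* Local enums; values assumed to match the V4L2 UAPI orderings. */
enum { TX_ONLY_4X4, TX_ALLOW_8X8, TX_ALLOW_16X16, TX_ALLOW_32X32, TX_MODE_SELECT };
enum { REF_SINGLE, REF_COMPOUND, REF_SELECT };

struct vp9_tx_part {
    uint8_t tx_mode;
    uint8_t tx8[2][1], tx16[2][2], tx32[2][3];
};

/* read_tx_mode() followed by tx_mode_probs(). */
static void read_tx(struct vp9_reader *r, int lossless, struct vp9_tx_part *out)
{
    unsigned i, j;

    if (lossless) {                      /* nothing is coded: ONLY_4X4 */
        out->tx_mode = TX_ONLY_4X4;
        return;
    }
    out->tx_mode = (uint8_t)r->literal(r->ctx, 2);
    if (out->tx_mode == TX_ALLOW_32X32)  /* extra tx_mode_select bit */
        out->tx_mode += (uint8_t)r->literal(r->ctx, 1);
    if (out->tx_mode != TX_MODE_SELECT)
        return;
    for (i = 0; i < 2; i++)
        for (j = 0; j < 1; j++)
            out->tx8[i][j] = r->delta(r->ctx);
    for (i = 0; i < 2; i++)
        for (j = 0; j < 2; j++)
            out->tx16[i][j] = r->delta(r->ctx);
    for (i = 0; i < 2; i++)
        for (j = 0; j < 3; j++)
            out->tx32[i][j] = r->delta(r->ctx);
}

/* frame_reference_mode(); sign_bias[] = {last, golden, alt}. Compound
 * prediction is only possible if a sign bias differs from LAST's. */
static uint8_t read_reference_mode(struct vp9_reader *r, const uint8_t sign_bias[3])
{
    if (sign_bias[1] == sign_bias[0] && sign_bias[2] == sign_bias[0])
        return REF_SINGLE;
    if (!r->literal(r->ctx, 1))          /* non_single_reference */
        return REF_SINGLE;
    return r->literal(r->ctx, 1) ? REF_SELECT : REF_COMPOUND; /* reference_select */
}
```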

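Each updated non-MV value is stored as `inv_map_table[d]`, where `d` is decoded from the bitstream and indexes the 255-entry table. MV updates are the exception: the spec codes them as a 7-bit literal giving the new probability `(L(7) << 1) | 1` directly. Each "no update" bit leaves zero in the kernel struct.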
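
A sketch of one "maybe-update" step for the non-MV entries, following the VP9 spec's `diff_update_prob()` and `decode_term_subexp()` (FFmpeg's `update_prob()` decodes the same bins). It stops at the table index `d`; the caller would then store `inv_map_table[d]`, whose entries are all non-zero, so zero stays free to mean "no update". The `read_bool`/`read_literal` hooks are hypothetical adapters onto the range decoder, and the 255-entry table itself is not reproduced here.

```c
#include <stdint.h>

/* Hypothetical hooks onto the boolean decoder. */
struct vp9_bool_src {
    void *ctx;
    unsigned (*read_bool)(void *ctx, unsigned prob);  /* B(p) */
    unsigned (*read_literal)(void *ctx, unsigned n);  /* L(n) */
};

/* decode_term_subexp(): returns an index into inv_map_table[], 0..254. */
static unsigned decode_term_subexp(struct vp9_bool_src *s)
{
    unsigned v;

    if (!s->read_literal(s->ctx, 1))
        return s->read_literal(s->ctx, 4);          /* 0..15 */
    if (!s->read_literal(s->ctx, 1))
        return s->read_literal(s->ctx, 4) + 16;     /* 16..31 */
    if (!s->read_literal(s->ctx, 1))
        return s->read_literal(s->ctx, 5) + 32;     /* 32..63 */
    v = s->read_literal(s->ctx, 7);
    if (v < 65)
        return v + 64;                              /* 64..128 */
    return (v << 1) - 1 + s->read_literal(s->ctx, 1); /* 129..254 */
}

/* diff_update_prob(): -1 if this entry is not updated, else the index d. */
static int read_delta_index(struct vp9_bool_src *s)
{
    if (!s->read_bool(s->ctx, 252))
        return -1;
    return (int)decode_term_subexp(s);
}
```

The prefix codes cover index ranges 0-15, 16-31, 32-63 and 64-254, which together span the table's 255 entries.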

## Patch shape prediction

| Site | Action | LOC delta |
|---|---|---|
| `src/config.c:121-160` | INSERT VP9 enumeration block | +10 |
| `src/config.c:54-78` | INSERT VP9 case + break + comment | +5 |
| `src/config.c:167-191` | INSERT VP9 case in fall-through | +1 |
| `src/vp9.c` | NEW FILE | +500-600 |
| `src/vp9.h` | NEW FILE | +35-45 |
| `src/picture.c:34-37` | INSERT `#include "vp9.h"` | +1 |
| `src/picture.c:188-225` | INSERT VP9 dispatch case | +6 |
| `src/picture.c:54-186` | INSERT 2 buffer-type cases | +14 |
| `src/surface.h:92-119` | INSERT vp9 struct | +6 |
| `src/meson.build:50,73` | INSERT 2 entries | +2 |

**Total**: ~580-690 LOC, 4 modified (`config.c`, `picture.c`, `surface.h`, `meson.build`) + 2 new files. Larger than iter3 VP8 (370 LOC) and somewhat above iter2 HEVC (470 LOC). Compressed-header parser is the dominant cost.

Predicted commits:
- **Commit A**: `src/config.c` enumeration + dispatch + entrypoints (Criterion 1).
- **Commit B**: NEW `src/vp9.c` + `src/vp9.h` + `src/meson.build` (10 contract clauses + VPX rac decoder + compressed-header parser).
- **Commit C**: `src/picture.c` dispatcher + 2 buffer-type cases + `src/surface.h` union extension (Criteria 2-3).
- **Commit D**: optional fix-forward placeholder.

## Open questions for Phase 3 baseline

1. **Loop filter ref/mode deltas**: VAAPI doesn't expose `lf_delta.ref/mode/enabled/updated`. Are these always zero for BBB? Phase 3 strace of FFmpeg-v4l2request VP9 will reveal verbatim values.
2. **Quantization base_q_idx + deltas**: VAAPI exposes effective per-segment scales but not the base. Phase 3 baseline: capture verbatim FRAME control payload to see what FFmpeg-v4l2request writes; correlate against VAAPI's per-segment scale via VP9 spec quantization table.
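3. **Reference mode**: VAAPI doesn't expose `comppredmode`, but it is coded in the compressed header (`frame_reference_mode()`), which B4 parses anyway. Phase 3 baseline: confirm the parsed value matches FFmpeg-v4l2request's FRAME payload, and whether the fallback default `V4L2_VP9_REFERENCE_MODE_SELECT` would also work for BBB.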
4. **Interpolation filter mapping**: FFmpeg uses `filtermode ^ (filtermode <= 1)` to remap; VAAPI's `mcomp_filter_type` may already be in V4L2 enum order (no remap needed) OR in a different order. Empirically check.
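5. **Reset frame context mapping**: FFmpeg uses `v > 0 ? v - 1 : 0`. This matches the VP9 syntax semantics: values 0 and 1 both mean no reset, 2 resets the context selected by `frame_context_idx`, and 3 resets all four, while the V4L2 enum is NONE/SPEC/ALL = 0/1/2. Empirically verify that VAAPI's 2-bit `reset_frame_context` carries the raw syntax value.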
6. **VAAPI per-segment field interpretation**: `slice->seg_param[s].filter_level[4][2]` and quant scales are EFFECTIVE values (computed by mpv-VAAPI consumer). Mapping back to kernel's "ALT_Q delta" + "ALT_L delta" + "REF_FRAME" feature bits is non-trivial. Phase 3 verbatim payload + mapping-back-to-VAAPI cross-check.
7. **Does mpv 0.41.0 engage HW for VP9?**: Phase 3 capture `mpv -v --hwdec=vaapi --vo=null --frames=2 ~/fourier-test/bbb_720p10s_vp9.webm` and grep for `Selected decoder: vp9` vs `Using software decoding`. iter3 VP8 fell back; iter4 VP9 may or may not.
8. **Does rkvdec exhibit the same dma_resv kernel issue as hantro?**: iter3 found hantro CAPTURE returns all-zero pages from libva readback. rkvdec is a different driver subsystem; iter1+iter2 successfully verified via mpv-DMA-BUF-GL on rkvdec. **Predicted: rkvdec works fine for direct readback.** Phase 3 baseline: re-test ffmpeg-vaapi-hwdownload on rkvdec for VP9 and check if output is non-zero.
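
Question 2 can be probed offline with an inverse table lookup, sketched here under stated assumptions: for 8-bit content the VP9 spec maps a quantizer index to the luma AC scale through `ac_qlookup[]`, and VP9 has no luma-AC delta, so with no segment-level quantizer feature, segment 0's `luma_ac_quant_scale` should equal the table entry at `base_q_idx`. The binary search assumes a strictly increasing table (check this against the spec's table); the spec's `dc_qlookup[]` contains repeated values near the low end, so DC deltas may not invert uniquely. The helper name is hypothetical, and the 5-entry table in the check is a toy, not the real 256-entry table.

```c
#include <stddef.h>
#include <stdint.h>

/* Find idx with table[idx] == scale in a strictly increasing table.
 * Returns -1 only when the scale is not a table entry at all; note that a
 * segment ALT_Q feature would silently yield that segment's qindex instead
 * of base_q_idx. */
static int inverse_qlookup(const int16_t *table, size_t n, int16_t scale)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (table[mid] < scale)
            lo = mid + 1;
        else
            hi = mid;
    }
    return (lo < n && table[lo] == scale) ? (int)lo : -1;
}
```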

## Phase 3 baseline targets (work plan)

1. **Cross-validator capture**: `strace -ff -tt -y -v -e trace=ioctl ffmpeg -hwaccel v4l2request -i bbb_720p10s_vp9.webm -frames:v 5 -f null - 2>strace.log`. Decode VP9_FRAME + COMPRESSED_HDR payloads via the Phase 3 decoder (extend `decode_vp8.py` for the VP9 layout).
2. **VAAPI consumer trace**: `LIBVA_TRACE` mpv-SW + mpv-vaapi runs to see what buffer types mpv produces.
3. **Cache-safe verify reference**: `mpv --hwdec=no --vo=image --frames=2 --start=00:00:02 ~/fourier-test/bbb_720p10s_vp9.webm` and capture frame-0001/0002 SHA256 (criterion-4 anchor).
4. **rkvdec readback path test**: re-run `ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i bbb_720p10s_vp9.webm -vf hwdownload,format=nv12 -frames:v 5 -f rawvideo vp9_hwdownload.nv12` after install (strictly a Phase 6 step; Phase 3 only baseline-captures the SW reference). Confirm whether rkvdec hits the dma_resv issue (predicted: NO, based on iter1+iter2 working there).
5. **mpv-VP9-vaapi engagement check**: per memory `feedback_hw_decode_engagement_check.md`, verify HW path engaged via `mpv -v` log BEFORE claiming criterion 4.

## Phase 4 plan structure (anticipated)

Following iter2/iter3's clause template:

- Clause 1: Submission shape — 2 controls batched per frame
- Clause 2: Local struct alloc + zero-init (memset both)
- Clause 3: Frame geometry + scalars + flags
- Clause 4: DPB timestamp resolution (3 active refs from 8-slot DPB)
- Clause 5: Loop filter mapping (with VAAPI gap notes per Q1)
- Clause 6: Quantization mapping (with VAAPI gap notes per Q2)
- Clause 7: Segmentation mapping (with VAAPI per-segment effective-vs-delta unpacking per Q6)
- Clause 8: Compressed header parser — port FFmpeg `fill_compressed_hdr` + VPX rac decoder + inv_map_table
- Clause 9: Final 2-control batched submission
- Clause 10: Bitstream offsetting — `surface_object->source_data + uncompressed_header_size` is the start of compressed-header bytes; `compressed_header_size` is the byte length
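
Clause 10's offset arithmetic, with the bounds check it implies, as a minimal sketch. `data`/`data_size` stand for the frame's accumulated bitstream in the surface object (which field holds the valid length is left to Phase 4), and the helper name is hypothetical. It rejects a zero-length compressed header because the boolean decoder needs at least one byte and libvpx treats a zero header size as a corrupt frame.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the first compressed-header byte, or NULL if the two header sizes
 * reported by VAAPI do not fit inside the frame's bitstream. The checks are
 * written so that no addition can overflow. */
static const uint8_t *vp9_compressed_hdr_ptr(const uint8_t *data, size_t data_size,
                                             size_t uncompressed_header_size,
                                             size_t compressed_header_size)
{
    if (compressed_header_size == 0 ||
        uncompressed_header_size > data_size ||
        compressed_header_size > data_size - uncompressed_header_size)
        return NULL;
    return data + uncompressed_header_size;
}
```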

The plan will cite verbatim Phase 3 baseline payload bytes for all fields where mapping is non-obvious (loop-filter deltas, quant base, segmentation feature mapping) per `feedback_dev_process.md` Phase 6 contract-before-code.

## Substrate state at Phase 2 close

- iter4 Phase 1 commit `9a71dbf` pushed to gitea.
- Fork on noether at iter3 tip `e1aca9c` (synced via `git fetch && merge --ff-only`).
- All Phase 3 prerequisites identified.
- Memory rules apply unchanged.
- Phase 3 questions queued (8 items, mostly empirical). Phase 5 review will catch the field-availability + mapping questions analogous to iter3 (`uniform_spacing_flag` Direction 2 lesson).