fresnel-fourier/phase2_iter4_situation.md
claude-noether 2651e4cfdf iter4 Phase 2: situation analysis — VP9 backend gaps + compressed-
header parser requirement

Source-read of every file the iter4 patch series will touch, plus
kernel UAPI + VAAPI + downstream FFmpeg + kernel rkvdec reference
sources. Conducted on noether against fork tip e1aca9c (iter3 close).

Critical scope-shaping finding: rkvdec on RK3399 REQUIRES
V4L2_CID_STATELESS_VP9_COMPRESSED_HDR (not optional). Per
drivers/staging/media/rkvdec/rkvdec-vp9.c::rkvdec_vp9_run_preamble
lines 752-754:

  ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl,
                        V4L2_CID_STATELESS_VP9_COMPRESSED_HDR);
  if (WARN_ON(!ctrl))
      return -EINVAL;

VAAPI does NOT expose compressed-header probability updates
(va_dec_vp9.h:50-192 — only frame parameters + segmentation;
vendor VAAPI drivers parse compressed header in firmware/GPU).
Therefore the libva backend MUST parse the compressed header
itself via a VPX boolean decoder + inv_map_table[]. ~150-200 LOC
of bitstream parsing logic (port from FFmpeg
v4l2_request_vp9.c::fill_compressed_hdr).

Bug enumeration (12 sites):

  B1   config.c::RequestQueryConfigProfiles    enum block missing
  B2   config.c::RequestCreateConfig           VP9 case missing
  B3   config.c::RequestQueryConfigEntrypoints VP9 case missing
  B4   src/vp9.c                               new file ~500-600 LOC
  B5   src/vp9.h                               new file ~35-45 LOC
  B6   src/vp9_rac.h                           NEW or inline (Phase 4
                                                 plan locks Option A:
                                                 inline in vp9.c)
  B7   picture.c::codec_set_controls           VP9 dispatch missing
  B8   picture.c::codec_store_buffer           2 buffer-type cases
                                                 (Picture + Slice;
                                                 NOT 4 like VP8)
  B9   picture.c::RequestBeginPicture          predicted no reset
                                                 needed (no flag-state
                                                 like VP8 iqmatrix_set)
  B10  surface.h::object_surface::params union vp9 member missing
  B11  meson.build                             vp9.c/vp9.h not in lists
  B12  buffer.c                                predicted no change
                                                 needed (VP9 uses
                                                 Picture/Slice/SliceData
                                                 — all whitelisted)

Non-bugs (intentionally untouched): context.c (no DECODE_MODE/
START_CODE menus per FFmpeg ref), video.c (CAPTURE-side format
list), v4l2.c (fourcc-agnostic), include/hevc-ctrls.h (already
includes <linux/v4l2-controls.h>).

Contract surface cited verbatim:

  V4L2_CID_STATELESS_VP9_FRAME = 0xa40a2c (168 bytes; much
    smaller than VP8's 1232 bytes because VP9_FRAME carries no
    entropy table; that's in COMPRESSED_HDR)
  V4L2_CID_STATELESS_VP9_COMPRESSED_HDR = 0xa40a2d (2040 bytes;
    coef[4][2][2][6][6][3] alone is 1728 bytes)
  Per-frame submission: 2 controls batched in single S_EXT_CTRLS
  v4l2_request_vp9.c references confirmed: 2-control shape,
    runtime-probed COMPRESSED_HDR availability (rkvdec advertises
    it; we MUST provide)

VAAPI buffer types: 2 per frame (Picture + Slice) vs iter3 VP8's
4. NO Probability buffer (VP9 keeps probs in compressed header).
NO IQMatrix (VP9 keeps quant in slice's per-segment seg_param[8]).

VAAPI → V4L2 mapping table: 30+ fields enumerated. Several gap
candidates identified for Phase 3 empirical resolution:

  Q1 lf.ref_deltas/mode_deltas/flags — not in VAAPI; FFmpeg reads
     from VP9Context internal. BBB likely zero.
  Q2 quant.base_q_idx + deltas — VAAPI exposes only effective
     per-segment scales. Inverse-derive needed.
  Q3 reference_mode — not in VAAPI. Default to SELECT?
  Q4 interpolation_filter mapping (FFmpeg ^ remap)
  Q5 reset_frame_context off-by-one (FFmpeg > 0 ? - 1 : 0)
  Q6 Per-segment feature_data[8][4] derivation from VAAPI's
     effective scales is non-trivial
  Q7 mpv 0.41.0 VP9 hwdec engagement (per memory
     feedback_hw_decode_engagement_check.md; known gap from iter3 VP8)
  Q8 rkvdec dma_resv issue? (predicted NO based on iter1+iter2
     successful mpv-DMA-BUF-GL on rkvdec)

Patch-shape prediction: ~580-690 LOC across 4 modified + 2 new
files (closer to iter2 HEVC's 470 than iter3 VP8's 370). Compressed-
header parser is the dominant cost.

Phase 3 baseline targets queued: cross-validator strace verbatim
S_EXT_CTRLS payloads (both controls), VAAPI consumer trace, mpv-
VP9-vaapi engagement check, rkvdec readback non-zero check.

Phase 4 plan structure anticipated: 10-clause template per
iter2/iter3, with new Clause 8 dedicated to compressed-header
parser.

Refs:
  phase0_findings_iter4.md (Phase 1 lock)
  phase8_iteration3_close.md (predecessor)
  references/ffmpeg-kwiboo/libavcodec/v4l2_request_vp9.c (V4L2 ref)
  references/ffmpeg-kwiboo/libavcodec/vaapi_vp9.c (VAAPI ref)
  /home/mfritsche/src/linux-rfc/drivers/staging/media/rkvdec/
    rkvdec-vp9.c (kernel driver — confirms COMPRESSED_HDR
    requirement at lines 752-754)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 05:20:07 +00:00

# Iteration 4 — Phase 2 (situation analysis)
Source-read of every file the iter4 patch series will touch, plus the kernel UAPI + VAAPI + downstream FFmpeg + kernel rkvdec reference sources. Conducted on noether against fork tip `e1aca9c` (iter3 close).
This is a contract-before-code analysis per `feedback_dev_process.md` Phase 2: enumerate the bugs, cite the contract verbatim, predict the patch shape, queue the Phase 3 baseline questions.
## Critical finding: rkvdec requires VP9_COMPRESSED_HDR
The biggest scope-shaping discovery: **rkvdec on RK3399 requires `V4L2_CID_STATELESS_VP9_COMPRESSED_HDR`**, not optional. From `drivers/staging/media/rkvdec/rkvdec-vp9.c::rkvdec_vp9_run_preamble` lines 740-754:
```c
    ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_VP9_FRAME);
    if (WARN_ON(!ctrl))
        return -EINVAL;
    dec_params = ctrl->p_cur.p;
    ...
    ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_VP9_COMPRESSED_HDR);
    if (WARN_ON(!ctrl))
        return -EINVAL; /* ← rkvdec WILL fail without compressed-header probs */
    prob_updates = ctrl->p_cur.p;
    vp9_ctx->cur.tx_mode = prob_updates->tx_mode;
    ...
    v4l2_vp9_fw_update_probs(&vp9_ctx->probability_tables, prob_updates, dec_params);
```
VAAPI does NOT expose compressed-header probability updates (per `va_dec_vp9.h:50-192` — only frame parameters + segmentation, no probability deltas; vendor VAAPI drivers parse compressed header in firmware/GPU). So **the libva backend must parse the compressed header itself** via a VPX boolean decoder.
This makes iter4's scope significantly larger than iter3's VP8 work.
## Bug enumeration (sites the iter4 patch series must touch)
### B1 — `src/config.c::RequestQueryConfigProfiles` — VP9 enumeration block missing
**Site**: `config.c:121-160`.
**Bug**: no analogous block for `V4L2_PIX_FMT_VP9_FRAME` → `VAProfileVP9Profile0`. Same starting condition as iter3 VP8.
**Patch shape**: ADD enumeration block after iter3's VP8 block. ~10 LOC.
### B2 — `src/config.c::RequestCreateConfig` — VP9 case label missing
**Site**: `config.c:54-78`.
**Bug**: no `case VAProfileVP9Profile0:`. Mirror iter3 VP8 pattern. ~5 LOC.
### B3 — `src/config.c::RequestQueryConfigEntrypoints` — VP9 case missing
**Site**: `config.c:167-191`.
**Bug**: missing in fall-through case list. ~1 LOC.
### B4 — `src/vp9.c` — file does not exist; needs net-new implementation
**Site**: NEW FILE `src/vp9.c`.
**Patch shape**: NEW file, ~500-600 LOC (substantially larger than iter3 vp8.c due to compressed-header parser):
- Includes block
- Static `inv_map_table[255]` — direct copy from FFmpeg `v4l2_request_vp9.c:43-64`
- VPX range coder helpers (port from FFmpeg `vp89_rac.h` + boolean decoder primitives) — ~80 LOC
- `vp9_fill_frame()` — fill `v4l2_ctrl_vp9_frame` from VAAPI `VADecPictureParameterBufferVP9` + `VASliceParameterBufferVP9` — ~150 LOC
- `vp9_fill_compressed_hdr()` — parse compressed header bits from `surface_object->source_data + uncompressed_header_size`, populate `v4l2_ctrl_vp9_compressed_hdr` — ~180 LOC (port from FFmpeg `fill_compressed_hdr` lines 99-261)
- `vp9_set_controls()` — entry point, allocates both structs, calls `vp9_fill_frame` + `vp9_fill_compressed_hdr`, batched 2-element `v4l2_ext_control` array, single `v4l2_set_controls` call
### B5 — `src/vp9.h` — header does not exist
**Site**: NEW FILE `src/vp9.h`.
**Patch shape**: declare `vp9_set_controls()`. Mirror iter3 vp8.h.
### B6 — Possibly `src/vp9_rac.h` — VPX range decoder helpers (decision point)
**Site**: NEW FILE candidate `src/vp9_rac.h`.
VP9 boolean decoder primitives (`vpx_rac_get_prob_branchy`, `vp89_rac_get`, `vp89_rac_get_uint`, init function) are needed by `vp9_fill_compressed_hdr`. Two design options:
- **Option A**: inline the ~80 LOC of decoder helpers directly in `vp9.c`. Simpler; one file. Recommended for first cut.
- **Option B**: separate `vp9_rac.h`/`vp9_rac.c`. Mirrors FFmpeg's `vp89_rac.h` upstream pattern. More files, easier reuse if AV1/VP10 work follows.
**Phase 4 plan locks Option A** unless Phase 5 review surfaces a reason for Option B.
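For a sense of what Option A inlines, here is a sketch of the bool decoder as the VP9 spec describes it (`init_bool()`, `read_bool()`, `L(n)`). It is written for clarity rather than as a port of FFmpeg's `vpx_rac`/`vp89_rac` helpers; those refill in larger chunks but must yield the same bits. All identifiers (`vp9_bool_dec`, `vp9_bool_init`, ...) are hypothetical and are not the FFmpeg or libva-v4l2-request API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bool decoder after the VP9 spec's init_bool()/read_bool():
 * an 8-bit decoding window refilled one bit at a time, MSB first. */
struct vp9_bool_dec {
    const uint8_t *buf;
    size_t size;     /* bytes in the compressed header */
    size_t bitpos;   /* next bit to shift into the window */
    unsigned value;  /* BoolValue */
    unsigned range;  /* BoolRange, kept in [128, 255] between reads */
};

static unsigned vp9_bool_next_bit(struct vp9_bool_dec *d)
{
    if (d->bitpos >= d->size * 8)
        return 0; /* past the end: shift in zeros */
    unsigned bit = (d->buf[d->bitpos >> 3] >> (7 - (unsigned)(d->bitpos & 7))) & 1u;
    d->bitpos++;
    return bit;
}

/* read_bool(p): p (1..255) is the probability, out of 256, that the bit is 0. */
static int vp9_bool_read(struct vp9_bool_dec *d, unsigned p)
{
    unsigned split = 1 + (((d->range - 1) * p) >> 8);
    int bit;

    if (d->value < split) {
        d->range = split;
        bit = 0;
    } else {
        d->range -= split;
        d->value -= split;
        bit = 1;
    }
    while (d->range < 128) { /* renormalize */
        d->value = (d->value << 1) | vp9_bool_next_bit(d);
        d->range <<= 1;
    }
    return bit;
}

/* L(n): unsigned n-bit literal, MSB first, each bit read at p = 128. */
static unsigned vp9_bool_read_literal(struct vp9_bool_dec *d, int n)
{
    unsigned v = 0;

    while (n-- > 0)
        v = (v << 1) | (unsigned)vp9_bool_read(d, 128);
    return v;
}

/* init_bool(): returns 0 on success, -1 on an empty buffer or a set marker bit. */
static int vp9_bool_init(struct vp9_bool_dec *d, const uint8_t *buf, size_t size)
{
    if (size < 1)
        return -1;
    d->buf = buf;
    d->size = size;
    d->bitpos = 8;      /* the first byte is loaded directly */
    d->value = buf[0];
    d->range = 255;
    return vp9_bool_read(d, 128) == 0 ? 0 : -1;
}
```

When the first byte is below `0x80`, the zero marker bit leaves the range at 128. From then on, each p = 128 read simply shifts out the next raw bit. So the bytes `7f 00` decode to the marker 0 followed by the literal `0xfe`, which makes a handy hand-checkable test vector.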
### B7 — `src/picture.c::codec_set_controls` — VP9 dispatch case missing
**Site**: `picture.c:188-225`.
**Patch shape**: ADD `case VAProfileVP9Profile0:` calling `vp9_set_controls`. ~6 LOC.
### B8 — `src/picture.c::codec_store_buffer` — 2 VAAPI buffer types unmapped
VAAPI VP9 sends two parameter-buffer types per frame, plus the raw slice data (per `va_dec_vp9.h:58-303`):
| VAAPI buffer type | VAAPI struct | Per-frame |
|---|---|---|
| `VAPictureParameterBufferType` | `VADecPictureParameterBufferVP9` | once |
| `VASliceParameterBufferType` | `VASliceParameterBufferVP9` (with `seg_param[8]`) | once |
| `VASliceDataBufferType` | raw bitstream | once |
**Different from iter3 VP8**: no `VAProbabilityBufferType` (VP9 keeps probability state in the picture/slice params + parsed compressed header), no `VAIQMatrixBufferType` (VP9 keeps quantization in the slice's per-segment seg_param array). Just 2 cases vs VP8's 4.
**Patch shape**: 2 nested case adds in `codec_store_buffer` outer switch + inner profile dispatch. ~14 LOC total.
### B9 — `src/picture.c::RequestBeginPicture` — per-frame VP9 reset
**Site**: `picture.c:299-302`.
**Bug**: VP9 doesn't have an iqmatrix_set / probability_set flag pattern; the picture/slice params are unconditionally fully-populated by VAAPI consumer per frame. Possibly NO reset needed (analogous to MPEG-2's iqmatrix-only pattern but even simpler).
**Patch shape**: likely no edit. If Phase 5 review reveals a hidden state-leak risk (e.g., VAAPI reusing the surface for a new context with stale params), add reset for `params.vp9.<some-flag>`. Default plan: no reset added; revisit if Phase 7 byte-compare shows stale state.
### B10 — `src/surface.h::object_surface::params` union — no `vp9` member
**Site**: `surface.h:92-119`.
**Patch shape**: ADD `vp9` struct after `vp8`:
```c
struct {
VADecPictureParameterBufferVP9 picture;
VASliceParameterBufferVP9 slice;
} vp9;
```
`VASliceParameterBufferVP9` is large (~340 bytes — `seg_param[8]` × ~40 bytes each); `VADecPictureParameterBufferVP9` ~80 bytes. Union grows by ~420 bytes from this; still dominated by `params.h265` with its 64-slot slices[64] array (~17 KB).
### B11 — `src/meson.build` — `vp9.c` + `vp9.h` not in lists
**Site**: `meson.build:30-74`.
**Patch shape**: insert `'vp9.c'` after `'vp8.c'` in sources, insert `'vp9.h'` after `'vp8.h'` in headers. +2 lines.
### B12 — `src/buffer.c` — buffer-type allow-list (predicted no change needed)
**Site**: `buffer.c:59-70`.
VP9 uses `VAPictureParameterBufferType` + `VASliceParameterBufferType` + `VASliceDataBufferType` — all three already in the allow-list (used by H.264 + iter3 VP8). **Predicted no change needed.**
Per memory `feedback_runtime_enumerates_allowlists.md`: plan for fix-forward Commit D if a runtime miss surfaces (would be unexpected for VP9 given the buffer types are H.264-shape; but the iter3 lesson is "don't audit exhaustively — let runtime enumerate").
### Non-bugs (intentionally NOT touched)
- `src/context.c` — no DECODE_MODE/START_CODE menus for VP9 (per FFmpeg V4L2 ref `v4l2_request_vp9.c:487-503`: `v4l2_request_vp9_init` doesn't issue any device-wide menu sets; per-frame batch only). **No context.c changes.**
- `src/video.c::formats[]` — CAPTURE-side format list (NV12); VP9 is OUTPUT-side fourcc, probed via `v4l2_find_format()` in config.c. **No video.c changes.**
- `src/v4l2.c` — fourcc-agnostic helpers. **No v4l2.c changes.**
- `include/hevc-ctrls.h` — already includes `<linux/v4l2-controls.h>` which holds VP9 control IDs.
## Contract surface (verbatim)
### Kernel UAPI: `V4L2_CID_STATELESS_VP9_FRAME` (`<linux/v4l2-controls.h>:2696`)
```c
#define V4L2_CID_STATELESS_VP9_FRAME (V4L2_CID_CODEC_STATELESS_BASE + 300)
/* = 0xa40a2c */
struct v4l2_ctrl_vp9_frame {
    struct v4l2_vp9_loop_filter lf;     /* 16 bytes; ref_deltas[4] + mode_deltas[2]
                                           + level + sharpness + flags + reserved[7] */
    struct v4l2_vp9_quantization quant; /* 8 bytes; base_q_idx + 3 deltas + reserved[4] */
    struct v4l2_vp9_segmentation seg;   /* 88 bytes; __s16 feature_data[8][4] + feature_enabled[8]
                                           + tree_probs[7] + pred_probs[3] + flags + reserved[5] */
    __u32 flags;                        /* 10 V4L2_VP9_FRAME_FLAG_* bits per
                                           <linux/v4l2-controls.h>:2665-2674 */
    __u16 compressed_header_size;
    __u16 uncompressed_header_size;
    __u16 frame_width_minus_1;
    __u16 frame_height_minus_1;
    __u16 render_width_minus_1;
    __u16 render_height_minus_1;
    __u64 last_frame_ts;                /* per-VASurfaceID timestamp lookup */
    __u64 golden_frame_ts;
    __u64 alt_frame_ts;
    __u8 ref_frame_sign_bias;           /* OR of V4L2_VP9_SIGN_BIAS_{LAST,GOLDEN,ALT} */
    __u8 reset_frame_context;           /* V4L2_VP9_RESET_FRAME_CTX_* (0..2) */
    __u8 frame_context_idx;
    __u8 profile;
    __u8 bit_depth;
    __u8 interpolation_filter;
    __u8 tile_cols_log2;
    __u8 tile_rows_log2;
    __u8 reference_mode;
    __u8 reserved[7];
};
```
Total size: 168 bytes (vs iter3 VP8's 1232 bytes; much smaller because VP9_FRAME carries no entropy table, which lives in COMPRESSED_HDR).
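The ID and size figures can be sanity-checked without depending on the build host's kernel headers. The sketch below mirrors the cited field list with fixed-width types (the `mirror_*`/`MIRROR_*` names are hypothetical) and lets the compiler do the arithmetic. `V4L2_CID_CODEC_STATELESS_BASE` is `0x00a40000 | 0x900 = 0xa40900`, so `+ 300` lands on `0xa40a2c`. `seg` comes to 88 bytes because `feature_data` is `__s16`, which puts the three `__u64` timestamps 8-byte aligned at offset 128 and the whole control at 168 bytes. On noether, `sizeof(struct v4l2_ctrl_vp9_frame)` against the real `<linux/v4l2-controls.h>` remains the authoritative check.

```c
#include <stddef.h>
#include <stdint.h>

/* Control-ID arithmetic mirrored from <linux/v4l2-controls.h>; local names
 * so this builds without VP9-aware kernel headers. */
#define MIRROR_CTRL_CLASS_CODEC_STATELESS 0x00a40000u
#define MIRROR_CID_CODEC_STATELESS_BASE   (MIRROR_CTRL_CLASS_CODEC_STATELESS | 0x900u)
#define MIRROR_CID_VP9_FRAME              (MIRROR_CID_CODEC_STATELESS_BASE + 300u)
#define MIRROR_CID_VP9_COMPRESSED_HDR     (MIRROR_CID_CODEC_STATELESS_BASE + 301u)

/* Hand-copied layout of struct v4l2_ctrl_vp9_frame and its sub-structs. */
struct mirror_vp9_loop_filter {
    int8_t ref_deltas[4];
    int8_t mode_deltas[2];
    uint8_t level, sharpness, flags;
    uint8_t reserved[7];
};                                          /* 16 bytes */

struct mirror_vp9_quantization {
    uint8_t base_q_idx;
    int8_t delta_q_y_dc, delta_q_uv_dc, delta_q_uv_ac;
    uint8_t reserved[4];
};                                          /* 8 bytes */

struct mirror_vp9_segmentation {
    int16_t feature_data[8][4];             /* 64 bytes: s16, not u8 */
    uint8_t feature_enabled[8];
    uint8_t tree_probs[7];
    uint8_t pred_probs[3];
    uint8_t flags;
    uint8_t reserved[5];
};                                          /* 88 bytes */

struct mirror_vp9_frame {
    struct mirror_vp9_loop_filter lf;       /* offset 0 */
    struct mirror_vp9_quantization quant;   /* offset 16 */
    struct mirror_vp9_segmentation seg;     /* offset 24 */
    uint32_t flags;                         /* offset 112 */
    uint16_t compressed_header_size, uncompressed_header_size;
    uint16_t frame_width_minus_1, frame_height_minus_1;
    uint16_t render_width_minus_1, render_height_minus_1;
    uint64_t last_frame_ts;                 /* offset 128, 8-byte aligned */
    uint64_t golden_frame_ts, alt_frame_ts;
    uint8_t ref_frame_sign_bias, reset_frame_context, frame_context_idx;
    uint8_t profile, bit_depth, interpolation_filter;
    uint8_t tile_cols_log2, tile_rows_log2, reference_mode;
    uint8_t reserved[7];
};                                          /* 168 bytes, no implicit padding */

_Static_assert(sizeof(struct mirror_vp9_segmentation) == 88, "seg size");
_Static_assert(offsetof(struct mirror_vp9_frame, last_frame_ts) == 128, "ts offset");
_Static_assert(sizeof(struct mirror_vp9_frame) == 168, "frame size");
```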
### Kernel UAPI: `V4L2_CID_STATELESS_VP9_COMPRESSED_HDR` (`<linux/v4l2-controls.h>:2797`)
```c
#define V4L2_CID_STATELESS_VP9_COMPRESSED_HDR (V4L2_CID_CODEC_STATELESS_BASE + 301)
/* = 0xa40a2d */
struct v4l2_ctrl_vp9_compressed_hdr {
    __u8 tx_mode;                 /* V4L2_VP9_TX_MODE_* (0..4) */
    __u8 tx8[2][1];
    __u8 tx16[2][2];
    __u8 tx32[2][3];
    __u8 coef[4][2][2][6][6][3];  /* HUGE: 1728 bytes */
    __u8 skip[3];
    __u8 inter_mode[7][3];
    __u8 interp_filter[4][2];
    __u8 is_inter[4];
    __u8 comp_mode[5];
    __u8 single_ref[5][2];
    __u8 comp_ref[5];
    __u8 y_mode[4][9];
    __u8 uv_mode[10][9];
    __u8 partition[16][3];
    struct v4l2_vp9_mv_probs mv;  /* 69 bytes; joint/sign/classes/class0_bit/bits/etc */
```
Total size: 2040 bytes (all `__u8` members, so no padding). Filled by parsing the compressed header bits via VPX boolean decoder + `inv_map_table[]` (per FFmpeg `v4l2_request_vp9.c:99-261`).
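A `sizeof` check for the compressed-header control, using a hand-copied mirror with hypothetical `mirror_*` names. The `v4l2_vp9_mv_probs` dimensions below follow the VP9 spec's MV probability arrays (3 joint, 2 sign, 2×10 class, 2 class0-bit, 2×10 bits, 2×2×3 class0-fraction, 2×3 fraction, 2 + 2 high-precision). Treat them as an assumption to confirm against the header. With them, `mv` sums to 69 bytes and the whole control to 2040.

```c
#include <stdint.h>

/* Hand-copied layout; every member is a byte array, so there is no padding. */
struct mirror_vp9_mv_probs {
    uint8_t joint[3];
    uint8_t sign[2];
    uint8_t classes[2][10];
    uint8_t class0_bit[2];
    uint8_t bits[2][10];
    uint8_t class0_fr[2][2][3];
    uint8_t fr[2][3];
    uint8_t class0_hp[2];
    uint8_t hp[2];
};                                   /* 3+2+20+2+20+12+6+2+2 = 69 bytes */

struct mirror_vp9_compressed_hdr {
    uint8_t tx_mode;
    uint8_t tx8[2][1];
    uint8_t tx16[2][2];
    uint8_t tx32[2][3];               /* 13 bytes so far */
    uint8_t coef[4][2][2][6][6][3];   /* 1728 bytes */
    uint8_t skip[3];
    uint8_t inter_mode[7][3];
    uint8_t interp_filter[4][2];
    uint8_t is_inter[4];
    uint8_t comp_mode[5];
    uint8_t single_ref[5][2];
    uint8_t comp_ref[5];
    uint8_t y_mode[4][9];
    uint8_t uv_mode[10][9];
    uint8_t partition[16][3];         /* skip..partition: 230 bytes */
    struct mirror_vp9_mv_probs mv;
};                                   /* 13 + 1728 + 230 + 69 = 2040 bytes */
```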
The kernel uses the compressed-header values as PROBABILITY UPDATES (not absolutes): a value of zero in any array element means "no update, keep the prior probability." The kernel runs `v4l2_vp9_fw_update_probs(&probability_tables, prob_updates, dec_params)` to apply updates per `rkvdec-vp9.c:796`.
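The per-element update rule for the delta-coded arrays is small enough to sketch. This follows the VP9 spec's `inv_recenter_nonneg()` and the same `p <= 128` split that FFmpeg's `update_prob()` uses. The kernel helper behind `v4l2_vp9_fw_update_probs()` is expected to be equivalent, but `vp9_apply_prob_delta` is a hypothetical name, not a kernel API. The sketch also shows why zero is safe as "no update": the prior probability comes back unchanged.

```c
#include <stdint.h>

/* inv_recenter_nonneg() as defined in the VP9 spec. */
static int inv_recenter_nonneg(int v, int m)
{
    if (v > 2 * m)
        return v;
    if (v & 1)
        return m - ((v + 1) >> 1);
    return m + (v >> 1);
}

/* Apply one delta (already mapped through inv_map_table[]) to a prior
 * probability; delta == 0 means "no update". Hypothetical helper name. */
static uint8_t vp9_apply_prob_delta(uint8_t prob, uint8_t delta)
{
    if (!delta)
        return prob;
    return (uint8_t)(prob <= 128 ? 1 + inv_recenter_nonneg(delta, prob - 1)
                                 : 255 - inv_recenter_nonneg(delta, 255 - prob));
}
```

Small odd deltas step the probability down and small even deltas step it up. Deltas larger than twice the distance to the nearer end are used directly.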
### VAAPI buffer types
`VADecPictureParameterBufferVP9` (`va_dec_vp9.h:58-192`):
- `frame_width`, `frame_height` (u16)
- `reference_frames[8]` — 8-entry DPB (vs VP8's 3)
- `pic_fields.bits.{...}` — 27 single-bit/multi-bit fields (subsampling_x/y, frame_type, show_frame, error_resilient_mode, intra_only, allow_high_precision_mv, mcomp_filter_type[3 bits], frame_parallel_decoding_mode, reset_frame_context[2 bits], refresh_frame_context, frame_context_idx[2 bits], segmentation_*, last/golden/alt_ref_frame[3 bits each, indexes into reference_frames[8]], *_sign_bias, lossless_flag)
- `filter_level`, `sharpness_level` (u8)
- `log2_tile_rows`, `log2_tile_columns` (u8)
- `frame_header_length_in_bytes` — uncompressed_header_size (u8 — note 8-bit width may overflow for super-frames; typical < 256 for BBB)
- `first_partition_size` — compressed_header_size (u16)
- `mb_segment_tree_probs[7]`, `segment_pred_probs[3]` (u8)
- `profile`, `bit_depth` (u8)
`VASliceParameterBufferVP9` (`va_dec_vp9.h:279-303`):
- `slice_data_size`, `slice_data_offset`, `slice_data_flag` (u32)
- `seg_param[8]` — array of `VASegmentParameterVP9` (~40 bytes each):
  - `segment_flags.fields.{segment_reference_enabled, segment_reference[2 bits], segment_reference_skipped}` (u16 packed)
  - `filter_level[4][2]` (u8): per-ref-frame × per-mode loop filter levels
  - `luma_ac_quant_scale`, `luma_dc_quant_scale`, `chroma_ac_quant_scale`, `chroma_dc_quant_scale` (s16): already-computed effective scale per segment
### FFmpeg V4L2 reference (`v4l2_request_vp9.c`)
Submission shape: 2 batched controls per frame in single `S_EXT_CTRLS`:
```c
control[0] = { .id = V4L2_CID_STATELESS_VP9_FRAME, ... };
control[1] = { .id = V4L2_CID_STATELESS_VP9_COMPRESSED_HDR, ... };
v4l2_set_controls(..., control, 2);
```
The COMPRESSED_HDR control is conditionally included based on a runtime probe (`v4l2_request_vp9_post_frames_ctx` queries the kernel; if the control isn't advertised, it falls back to FRAME-only). For rkvdec on RK3399, the kernel advertises COMPRESSED_HDR, and `rkvdec-vp9.c:752` confirms the kernel WILL EINVAL if it is not provided.
### Kernel rkvdec driver (`rkvdec-vp9.c`)
Key reads in `rkvdec_vp9_run_preamble`:
- VP9_FRAME control → `dec_params = ctrl->p_cur.p` → drives register programming via `config_registers()`.
- VP9_COMPRESSED_HDR control → `prob_updates = ctrl->p_cur.p` → applied via `v4l2_vp9_fw_update_probs()`.
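- Reference frames are resolved from FRAME's `last_frame_ts`/`golden_frame_ts`/`alt_frame_ts`, so only 3 references are active per frame. VAAPI exposes an 8-slot `reference_frames[8]`. The backend uses the `last/golden/alt_ref_frame` indexes into that array to pick the surfaces whose timestamps it passes, and the kernel only ever sees those 3 timestamps.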
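The backend-side lookup (VAAPI slot index → surface → timestamp in ns) can be sketched as follows. `surface_id`, `surface_ts` and `vp9_ref_ts` are hypothetical stand-ins for the backend's `VASurfaceID`, surface objects and helper. The nanosecond conversion matches `v4l2_timeval_to_ns()` from `<linux/videodev2.h>`. Returning 0 for an unknown slot is a placeholder policy.

```c
#include <stdint.h>
#include <sys/time.h>

typedef uint32_t surface_id; /* stand-in for VASurfaceID */

/* Hypothetical view of a surface: its ID and the timestamp the backend
 * stamped on the OUTPUT buffer when that frame was queued. */
struct surface_ts {
    surface_id id;
    struct timeval timestamp;
};

/* Same arithmetic as v4l2_timeval_to_ns() in <linux/videodev2.h>. */
static uint64_t timeval_to_ns(const struct timeval *tv)
{
    return (uint64_t)tv->tv_sec * 1000000000ULL + (uint64_t)tv->tv_usec * 1000ULL;
}

/* Resolve one of last/golden/alt: idx is the 3-bit pic_fields index into the
 * 8-slot reference_frames[] array. Returns 0 for an out-of-range index or an
 * unknown surface (placeholder policy). */
static uint64_t vp9_ref_ts(const surface_id reference_frames[8], unsigned idx,
                           const struct surface_ts *surfaces, unsigned n_surfaces)
{
    if (idx >= 8)
        return 0;
    for (unsigned i = 0; i < n_surfaces; i++)
        if (surfaces[i].id == reference_frames[idx])
            return timeval_to_ns(&surfaces[i].timestamp);
    return 0;
}
```

It would be called three times per frame, with `last_ref_frame`, `golden_ref_frame` and `alt_ref_frame`, to fill the three `*_frame_ts` fields.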
## Mapping table (VAAPI → V4L2 / kernel)
The libva backend's job: read VAAPI's per-frame buffers (Picture + Slice), parse the compressed header from the bitstream, and write the kernel's two structs.
### `v4l2_ctrl_vp9_frame` mapping
| Kernel field | VAAPI source | Notes |
|---|---|---|
| `lf.ref_deltas[4]` | NOT in VAAPI | VAAPI doesn't expose loop-filter ref deltas separately; FFmpeg's V4L2 ref reads from VP9Context internal state. **Open question Phase 3**: are these zero in the BBB fixture? |
| `lf.mode_deltas[2]` | NOT in VAAPI | same |
| `lf.level` | `picture->filter_level` | direct |
| `lf.sharpness` | `picture->sharpness_level` | direct |
| `lf.flags` | NOT in VAAPI | DELTA_ENABLED + DELTA_UPDATE bits — ditto |
| `quant.base_q_idx` | DERIVED (no direct VAAPI exposure) | **Open question Phase 3**: VAAPI exposes per-segment `slice->seg_param[s].luma_ac_quant_scale`, but those are EFFECTIVE Q-scales, not the base index. Inverse-derive from `slice->seg_param[0].luma_ac_quant_scale` via the VP9 spec quantization table? Or leave zero and let kernel use default? (A zero default is not neutral: VP9 treats `base_q_idx == 0` with all three deltas zero as lossless.) |
| `quant.delta_q_y_dc/uv_dc/uv_ac` | NOT in VAAPI | same — VAAPI only exposes effective per-segment scales |
| `seg.feature_data[8][4]` | DERIVED from `slice->seg_param[s].filter_level[][]` + quant scales | mapping non-trivial |
| `seg.feature_enabled[8]` | derived from `slice->seg_param[s].segment_flags` + segmentation enabled bits | non-trivial |
| `seg.tree_probs[7]` | `picture->mb_segment_tree_probs[7]` | direct |
| `seg.pred_probs[3]` | `picture->segment_pred_probs[3]` | direct |
| `seg.flags` | from `pic_fields.bits.{segmentation_enabled, segmentation_update_map, segmentation_temporal_update}` + derived segmentation_update_data + absolute_or_delta | mostly direct |
| `flags & KEY_FRAME` | `!pic_fields.bits.frame_type` | VP9 codes `frame_type` 0 as KEY_FRAME (carried through by VAAPI), so the flag is the negation |
| `flags & SHOW_FRAME` | `pic_fields.bits.show_frame` | direct |
| `flags & ERROR_RESILIENT` | `pic_fields.bits.error_resilient_mode` | direct |
| `flags & INTRA_ONLY` | `pic_fields.bits.intra_only` | direct |
| `flags & ALLOW_HIGH_PREC_MV` | `pic_fields.bits.allow_high_precision_mv` | direct |
| `flags & REFRESH_FRAME_CTX` | `pic_fields.bits.refresh_frame_context` | direct |
| `flags & PARALLEL_DEC_MODE` | `pic_fields.bits.frame_parallel_decoding_mode` | direct |
| `flags & X/Y_SUBSAMPLING` | `pic_fields.bits.subsampling_x/y` | direct |
| `flags & COLOR_RANGE_FULL_SWING` | NOT in VAAPI | leave 0 for BT.709 limited (BBB) |
| `compressed_header_size` | `picture->first_partition_size` | direct (VAAPI mis-named per its own comment) |
| `uncompressed_header_size` | `picture->frame_header_length_in_bytes` | direct |
| `frame_width_minus_1` | `picture->frame_width - 1` | direct |
| `frame_height_minus_1` | `picture->frame_height - 1` | direct |
| `render_width_minus_1`, `render_height_minus_1` | NOT in VAAPI | leave equal to frame_width-1 / frame_height-1 (no scaling for BBB) |
| `last_frame_ts` | DPB lookup `picture->reference_frames[picture->pic_fields.bits.last_ref_frame]` → `surface_object->timestamp` → `v4l2_timeval_to_ns()` | uses `last_ref_frame` index into 8-entry DPB |
| `golden_frame_ts` | DPB lookup `picture->reference_frames[picture->pic_fields.bits.golden_ref_frame]` | same |
| `alt_frame_ts` | DPB lookup `picture->reference_frames[picture->pic_fields.bits.alt_ref_frame]` | same |
| `ref_frame_sign_bias` | OR of `pic_fields.bits.{last,golden,alt}_ref_frame_sign_bias` mapped to `V4L2_VP9_SIGN_BIAS_{LAST,GOLDEN,ALT}` | direct |
| `reset_frame_context` | `pic_fields.bits.reset_frame_context` (with FFmpeg's `> 0 ? -1 : 0` adjustment per ref) | mapping needs inspection |
| `frame_context_idx` | `pic_fields.bits.frame_context_idx` | direct |
| `profile` | `picture->profile` | direct |
| `bit_depth` | `picture->bit_depth` | direct |
| `interpolation_filter` | `pic_fields.bits.mcomp_filter_type` (with FFmpeg's `^ (filtermode <= 1)` adjustment — see ref) | mapping needs inspection |
| `tile_cols_log2`, `tile_rows_log2` | `picture->log2_tile_columns`, `log2_tile_rows` | direct |
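| `reference_mode` | NOT in VAAPI | coded in the compressed header on inter frames (VP9 spec `frame_reference_mode()`), so the compressed-header parser can recover it; otherwise default `V4L2_VP9_REFERENCE_MODE_SELECT`. Phase 3 baseline answers |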
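The "direct" flag and sign-bias rows reduce to bit tests. This sketch assumes the flag and sign-bias values of `<linux/v4l2-controls.h>`, mirrored under local `VP9_*` names; check them against the installed header. It also uses a hypothetical unpacked `va_vp9_bits` struct in place of VAAPI's `pic_fields.bits`. The one non-obvious line is the keyframe test.

```c
#include <stdint.h>

/* Flag values as defined in <linux/v4l2-controls.h> (to be confirmed against
 * the installed header); local names so the sketch builds without it. */
#define VP9_FLAG_KEY_FRAME          0x001u
#define VP9_FLAG_SHOW_FRAME         0x002u
#define VP9_FLAG_ERROR_RESILIENT    0x004u
#define VP9_FLAG_INTRA_ONLY         0x008u
#define VP9_FLAG_ALLOW_HIGH_PREC_MV 0x010u
#define VP9_FLAG_REFRESH_FRAME_CTX  0x020u
#define VP9_FLAG_PARALLEL_DEC_MODE  0x040u
#define VP9_FLAG_X_SUBSAMPLING      0x080u
#define VP9_FLAG_Y_SUBSAMPLING      0x100u
/* 0x200 COLOR_RANGE_FULL_SWING: not exposed by VAAPI, left clear. */

#define VP9_SIGN_BIAS_LAST   0x1u
#define VP9_SIGN_BIAS_GOLDEN 0x2u
#define VP9_SIGN_BIAS_ALT    0x4u

/* Hypothetical unpacked copy of the pic_fields.bits members read here. */
struct va_vp9_bits {
    unsigned subsampling_x, subsampling_y, frame_type, show_frame;
    unsigned error_resilient_mode, intra_only, allow_high_precision_mv;
    unsigned refresh_frame_context, frame_parallel_decoding_mode;
    unsigned last_ref_frame_sign_bias, golden_ref_frame_sign_bias;
    unsigned alt_ref_frame_sign_bias;
};

static uint32_t vp9_frame_flags(const struct va_vp9_bits *b)
{
    uint32_t f = 0;

    if (!b->frame_type)                 /* VP9 frame_type 0 == KEY_FRAME */
        f |= VP9_FLAG_KEY_FRAME;
    if (b->show_frame)
        f |= VP9_FLAG_SHOW_FRAME;
    if (b->error_resilient_mode)
        f |= VP9_FLAG_ERROR_RESILIENT;
    if (b->intra_only)
        f |= VP9_FLAG_INTRA_ONLY;
    if (b->allow_high_precision_mv)
        f |= VP9_FLAG_ALLOW_HIGH_PREC_MV;
    if (b->refresh_frame_context)
        f |= VP9_FLAG_REFRESH_FRAME_CTX;
    if (b->frame_parallel_decoding_mode)
        f |= VP9_FLAG_PARALLEL_DEC_MODE;
    if (b->subsampling_x)
        f |= VP9_FLAG_X_SUBSAMPLING;
    if (b->subsampling_y)
        f |= VP9_FLAG_Y_SUBSAMPLING;
    return f;
}

static uint8_t vp9_sign_bias(const struct va_vp9_bits *b)
{
    uint8_t s = 0;

    if (b->last_ref_frame_sign_bias)
        s |= VP9_SIGN_BIAS_LAST;
    if (b->golden_ref_frame_sign_bias)
        s |= VP9_SIGN_BIAS_GOLDEN;
    if (b->alt_ref_frame_sign_bias)
        s |= VP9_SIGN_BIAS_ALT;
    return s;
}
```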
### `v4l2_ctrl_vp9_compressed_hdr` mapping
This struct is filled by PARSING the compressed header bitstream — NOT from VAAPI. The libva backend runs a VPX boolean decoder over `surface_object->source_data + uncompressed_header_size` for `compressed_header_size` bytes, follows the VP9 spec section 6.3, and applies `inv_map_table[d]` for each updated probability.
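A bounds-checked version of that pointer arithmetic, as a sketch. It assumes `source_data` holds the frame starting at its uncompressed header (i.e. a zero `slice_data_offset`). `vp9_compressed_hdr_span` is a hypothetical helper name. Rejecting a zero `compressed_header_size` follows the VP9 spec's requirement that the compressed header be non-empty.

```c
#include <stddef.h>
#include <stdint.h>

/* Return a pointer to the compressed header inside the frame bitstream, or
 * NULL if the header sizes don't fit in frame_size bytes. Assumes `frame`
 * starts at the uncompressed header (zero slice_data_offset). */
static const uint8_t *vp9_compressed_hdr_span(const uint8_t *frame, size_t frame_size,
                                              size_t uncompressed_header_size,
                                              size_t compressed_header_size)
{
    if (compressed_header_size == 0)        /* must be non-empty */
        return NULL;
    if (uncompressed_header_size > frame_size ||
        compressed_header_size > frame_size - uncompressed_header_size)
        return NULL;                        /* overflow-safe bounds check */
    return frame + uncompressed_header_size;
}
```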
The parsing logic is a direct port of FFmpeg `fill_compressed_hdr` (lines 99-261). Key syntax elements parsed:
- `tx_mode` (2 bits, then conditional 1 bit)
- TX 8x8/16x16/32x32 probability updates (only if tx_mode == SELECT)
- Coef probability updates (4-level nested loop with branch probs)
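- Skip probability updates (all frames)
- inter_mode / interp_filter (only when the frame's filter is SWITCHABLE) / is_inter / comp_mode / single_ref / comp_ref / y_mode / partition probability updates (inter frames only; the reference mode itself is also read here, via `frame_reference_mode()`)
- MV probability updates (inter frames only: joint / sign / classes / class0_bit / bits / class0_fr / fr / class0_hp / hp)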
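Each updated value except the MV probabilities goes through `inv_map_table[]` (a 255-entry lookup, indexed 0..254). MV updates are instead coded in the VP9 spec as a 7-bit literal `v` and stored as `(v << 1) | 1`. Each "no update" flag leaves zero in the kernel struct.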
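Every non-MV update in the list above follows the same spec pattern: a `B(252)` update flag, then `decode_term_subexp()`, whose result (0..254) indexes `inv_map_table[]`. Below is a sketch of `decode_term_subexp()` written from the VP9 spec pseudocode; the FFmpeg code being ported should compute the same index. The `read_literal_fn` callback is a hypothetical seam. In the real parser, `L(n)` is read through the bool decoder at probability 128; the check below feeds raw bits instead.

```c
#include <stdint.h>

/* Hypothetical seam: returns the next n-bit literal, MSB first. In the real
 * parser this is L(n) read through the bool decoder at probability 128. */
typedef unsigned (*read_literal_fn)(void *ctx, int nbits);

/* decode_term_subexp() from the VP9 spec: returns an index in 0..254,
 * which the caller maps through inv_map_table[]. */
static unsigned vp9_decode_term_subexp(read_literal_fn L, void *ctx)
{
    unsigned v;

    if (!L(ctx, 1))
        return L(ctx, 4);              /* 0..15 */
    if (!L(ctx, 1))
        return L(ctx, 4) + 16;         /* 16..31 */
    if (!L(ctx, 1))
        return L(ctx, 5) + 32;         /* 32..63 */
    v = L(ctx, 7);
    if (v < 65)
        return v + 64;                 /* 64..128 */
    return (v << 1) - 1 + L(ctx, 1);   /* 129..254 */
}
```

The largest reachable index is 254, which is why `inv_map_table[]` has 255 entries.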
## Patch shape prediction
| Site | Action | LOC delta |
|---|---|---|
| `src/config.c:121-160` | INSERT VP9 enumeration block | +10 |
| `src/config.c:54-78` | INSERT VP9 case + break + comment | +5 |
| `src/config.c:167-191` | INSERT VP9 case in fall-through | +1 |
| `src/vp9.c` | NEW FILE | +500-600 |
| `src/vp9.h` | NEW FILE | +35-45 |
| `src/picture.c:34-37` | INSERT `#include "vp9.h"` | +1 |
| `src/picture.c:188-225` | INSERT VP9 dispatch case | +6 |
| `src/picture.c:54-186` | INSERT 2 buffer-type cases | +14 |
| `src/surface.h:92-119` | INSERT vp9 struct | +6 |
| `src/meson.build:50,73` | INSERT 2 entries | +2 |
**Total**: ~580-690 LOC, 4 modified (`config.c`, `picture.c`, `surface.h`, `meson.build`) + 2 new files. Larger than both iter3 VP8 (370 LOC) and iter2 HEVC (470 LOC), though closer to the latter. Compressed-header parser is the dominant cost.
Predicted commits:
- **Commit A**: `src/config.c` enumeration + dispatch + entrypoints (Criterion 1).
- **Commit B**: NEW `src/vp9.c` + `src/vp9.h` + `src/meson.build` (10 contract clauses + VPX rac decoder + compressed-header parser).
- **Commit C**: `src/picture.c` dispatcher + 2 buffer-type cases + `src/surface.h` union extension (Criteria 2-3).
- **Commit D**: optional fix-forward placeholder.
## Open questions for Phase 3 baseline
1. **Loop filter ref/mode deltas**: VAAPI doesn't expose `lf_delta.ref/mode/enabled/updated`. Are these always zero for BBB? Phase 3 strace of FFmpeg-v4l2request VP9 will reveal verbatim values.
2. **Quantization base_q_idx + deltas**: VAAPI exposes effective per-segment scales but not the base. Phase 3 baseline: capture verbatim FRAME control payload to see what FFmpeg-v4l2request writes; correlate against VAAPI's per-segment scale via VP9 spec quantization table.
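3. **Reference mode**: VAAPI doesn't expose `comppredmode`, but the VP9 compressed header codes it on inter frames (`frame_reference_mode()`), so the compressed-header parser can likely recover it. Phase 3 baseline: compare against the FFmpeg-v4l2request FRAME payload, and verify whether the default `V4L2_VP9_REFERENCE_MODE_SELECT` fallback works for BBB.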
4. **Interpolation filter mapping**: FFmpeg uses `filtermode ^ (filtermode <= 1)` to remap; VAAPI's `mcomp_filter_type` may already be in V4L2 enum order (no remap needed) OR in a different order. Empirically check.
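5. **Reset frame context mapping**: FFmpeg uses `> 0 ? - 1 : 0`. In the VP9 spec, the 2-bit `reset_frame_context` syntax element uses both 0 and 1 for "no reset" (2 = reset the indexed context, 3 = reset all). The adjustment is therefore consistent with folding that 4-value field onto V4L2's 3-value `V4L2_VP9_RESET_FRAME_CTX_*` enum rather than being an off-by-one. Whether VAAPI's `pic_fields.bits.reset_frame_context` carries the raw 0..3 value still needs empirical verification.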
6. **VAAPI per-segment field interpretation**: `slice->seg_param[s].filter_level[4][2]` and quant scales are EFFECTIVE values (computed by mpv-VAAPI consumer). Mapping back to kernel's "ALT_Q delta" + "ALT_L delta" + "REF_FRAME" feature bits is non-trivial. Phase 3 verbatim payload + mapping-back-to-VAAPI cross-check.
7. **Does mpv 0.41.0 engage HW for VP9?**: Phase 3 capture `mpv -v --hwdec=vaapi --vo=null --frames=2 ~/fourier-test/bbb_720p10s_vp9.webm` and grep for `Selected decoder: vp9` vs `Using software decoding`. iter3 VP8 fell back; iter4 VP9 may or may not.
8. **Does rkvdec exhibit the same dma_resv kernel issue as hantro?**: iter3 found hantro CAPTURE returns all-zero pages from libva readback. rkvdec is a different driver subsystem; iter1+iter2 successfully verified via mpv-DMA-BUF-GL on rkvdec. **Predicted: rkvdec works fine for direct readback.** Phase 3 baseline: re-test ffmpeg-vaapi-hwdownload on rkvdec for VP9 and check if output is non-zero.
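To make Q4 and Q5 concrete, the two FFmpeg expressions quoted above can be tabulated directly. The sketch reproduces only those expressions; the function names are hypothetical. It does not settle which encoding VAAPI's `mcomp_filter_type` and `reset_frame_context` carry, which is what Phase 3 has to check.

```c
#include <stdint.h>

/* FFmpeg's interpolation-filter remap as quoted in Q4: swaps 0 and 1,
 * leaves 2..4 unchanged. */
static uint8_t ffmpeg_interp_remap(uint8_t filtermode)
{
    return (uint8_t)(filtermode ^ (filtermode <= 1));
}

/* FFmpeg's reset_frame_context adjustment as quoted in Q5:
 * raw 0 and 1 -> 0 (no reset), 2 -> 1, 3 -> 2. */
static uint8_t ffmpeg_reset_ctx_remap(uint8_t resetctx)
{
    return (uint8_t)(resetctx > 0 ? resetctx - 1 : 0);
}
```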
## Phase 3 baseline targets (work plan)
1. **Cross-validator capture**: `strace -ff -tt -y -v -e trace=ioctl ffmpeg -hwaccel v4l2request -i bbb_720p10s_vp9.webm -frames:v 5 -f null - 2>strace.log`. Decode VP9_FRAME + COMPRESSED_HDR payloads via the Phase 3 decoder (extend `decode_vp8.py` for the VP9 layout).
2. **VAAPI consumer trace**: `LIBVA_TRACE` mpv-SW + mpv-vaapi runs to see what buffer types mpv produces.
3. **Cache-safe verify reference**: `mpv --hwdec=no --vo=image --frames=2 --start=00:00:02 ~/fourier-test/bbb_720p10s_vp9.webm` and capture frame-0001/0002 SHA256 (criterion-4 anchor).
4. **rkvdec readback path test**: re-run `ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i bbb_720p10s_vp9.webm -vf hwdownload,format=nv12 -frames:v 5 -f rawvideo out.nv12` after install (would be Phase 6 actually; Phase 3 just baseline-captures the SW reference). Confirm whether rkvdec hits the dma_resv issue or not (predicted: NO based on iter1+iter2 working there).
5. **mpv-VP9-vaapi engagement check**: per memory `feedback_hw_decode_engagement_check.md`, verify HW path engaged via `mpv -v` log BEFORE claiming criterion 4.
## Phase 4 plan structure (anticipated)
Following iter2/iter3's clause template:
- Clause 1: Submission shape — 2 controls batched per frame
- Clause 2: Local struct alloc + zero-init (memset both)
- Clause 3: Frame geometry + scalars + flags
- Clause 4: DPB timestamp resolution (3 active refs from 8-slot DPB)
- Clause 5: Loop filter mapping (with VAAPI gap notes per Q1)
- Clause 6: Quantization mapping (with VAAPI gap notes per Q2)
- Clause 7: Segmentation mapping (with VAAPI per-segment effective-vs-delta unpacking per Q6)
- Clause 8: Compressed header parser — port FFmpeg `fill_compressed_hdr` + VPX rac decoder + inv_map_table
- Clause 9: Final 2-control batched submission
- Clause 10: Bitstream offsetting — `surface_object->source_data + uncompressed_header_size` is the start of compressed-header bytes; `compressed_header_size` is the byte length
The plan will cite verbatim Phase 3 baseline payload bytes for all fields where mapping is non-obvious (loop-filter deltas, quant base, segmentation feature mapping) per `feedback_dev_process.md` Phase 6 contract-before-code.
## Substrate state at Phase 2 close
- iter4 Phase 1 commit `9a71dbf` pushed to gitea.
- Fork on noether at iter3 tip `e1aca9c` (synced via `git fetch && merge --ff-only`).
- All Phase 3 prerequisites identified.
- Memory rules apply unchanged.
- Phase 3 questions queued (8 items, mostly empirical). Phase 5 review will catch the field-availability + mapping questions analogous to iter3 (`uniform_spacing_flag` Direction 2 lesson).