diff --git a/phase2_iter4_situation.md b/phase2_iter4_situation.md
new file mode 100644
index 0000000..e5165a3
--- /dev/null
+++ b/phase2_iter4_situation.md
@@ -0,0 +1,380 @@
# Iteration 4 — Phase 2 (situation analysis)

Source-read of every file the iter4 patch series will touch, plus the kernel UAPI, VAAPI, downstream FFmpeg and kernel rkvdec reference sources. Conducted on noether against fork tip `e1aca9c` (iter3 close).

This is a contract-before-code analysis per `feedback_dev_process.md` Phase 2: enumerate the bugs, cite the contract verbatim, predict the patch shape, and queue the Phase 3 baseline questions.

## Critical finding: rkvdec requires VP9_COMPRESSED_HDR

The biggest scope-shaping discovery: **rkvdec on RK3399 consumes `V4L2_CID_STATELESS_VP9_COMPRESSED_HDR` on every frame**; it is not optional. From `drivers/staging/media/rkvdec/rkvdec-vp9.c::rkvdec_vp9_run_preamble` lines 740-754:

```c
ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_VP9_FRAME);
if (WARN_ON(!ctrl))
    return -EINVAL;
dec_params = ctrl->p_cur.p;
...
ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_VP9_COMPRESSED_HDR);
if (WARN_ON(!ctrl))
    return -EINVAL; /* ← only if the driver never registered the control */
prob_updates = ctrl->p_cur.p; /* ← read unconditionally, every frame */
vp9_ctx->cur.tx_mode = prob_updates->tx_mode;
...
v4l2_vp9_fw_update_probs(&vp9_ctx->probability_tables, prob_updates, dec_params);
```

`v4l2_ctrl_find()` returns NULL only for a control the driver never registered, so the `WARN_ON` guards driver setup, not userspace input. If userspace never sets COMPRESSED_HDR, rkvdec still decodes, using whatever `p_cur` holds (the control default or a stale earlier value); the likely symptom is corrupt pictures rather than `-EINVAL`. Either way, correct output needs correct probability updates.

VAAPI does NOT expose compressed-header probability updates (per `va_dec_vp9.h:50-192`: only frame parameters + segmentation, no probability deltas; vendor VAAPI drivers parse the compressed header in firmware/GPU). So **the libva backend must parse the compressed header itself** via a VPX boolean decoder.

This makes iter4's scope significantly larger than iter3 VP8's.

## Bug enumeration (sites the iter4 patch series must touch)

### B1 — `src/config.c::RequestQueryConfigProfiles` — VP9 enumeration block missing

**Site**: `config.c:121-160`.

**Bug**: no analogous block for `V4L2_PIX_FMT_VP9_FRAME` → `VAProfileVP9Profile0`. Same starting condition as iter3 VP8.

**Patch shape**: ADD an enumeration block after iter3's VP8 block. ~10 LOC.

### B2 — `src/config.c::RequestCreateConfig` — VP9 case label missing

**Site**: `config.c:54-78`.

**Bug**: no `case VAProfileVP9Profile0:`. Mirror the iter3 VP8 pattern. ~5 LOC.

### B3 — `src/config.c::RequestQueryConfigEntrypoints` — VP9 case missing

**Site**: `config.c:167-191`.

**Bug**: missing from the fall-through case list. ~1 LOC.

### B4 — `src/vp9.c` — file does not exist; needs a net-new implementation

**Site**: NEW FILE `src/vp9.c`.

**Patch shape**: NEW file, ~500-600 LOC (substantially larger than iter3 vp8.c because of the compressed-header parser):

- Includes block
- Static `inv_map_table[255]` — direct copy from FFmpeg `v4l2_request_vp9.c:43-64`
- VPX range coder helpers (port of the FFmpeg `vpx_rac.h` + `vp89_rac.h` boolean decoder primitives) — ~80 LOC
- `vp9_fill_frame()` — fill `v4l2_ctrl_vp9_frame` from VAAPI `VADecPictureParameterBufferVP9` + `VASliceParameterBufferVP9` — ~150 LOC
- `vp9_fill_compressed_hdr()` — parse the compressed header bits from `surface_object->source_data + uncompressed_header_size` and populate `v4l2_ctrl_vp9_compressed_hdr` — ~180 LOC (port of FFmpeg `fill_compressed_hdr` lines 99-261)
- `vp9_set_controls()` — entry point: allocates both structs, calls `vp9_fill_frame` + `vp9_fill_compressed_hdr`, builds a 2-element `v4l2_ext_control` array, and issues a single `v4l2_set_controls` call

### B5 — `src/vp9.h` — header does not exist

**Site**: NEW FILE `src/vp9.h`.

**Patch shape**: declare `vp9_set_controls()`. Mirror iter3 vp8.h.

### B6 — Possibly `src/vp9_rac.h` — VPX range decoder helpers (decision point)

**Site**: NEW FILE candidate `src/vp9_rac.h`.

VP9 boolean decoder primitives (`vpx_rac_get_prob_branchy`, `vp89_rac_get`, `vp89_rac_get_uint`, init function) are needed by `vp9_fill_compressed_hdr`.
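
For orientation, the decoder those primitives implement is small. The sketch below follows the VP9 bitstream specification's `init_bool()` / `read_bool()` / `read_literal()` process rather than FFmpeg's multi-byte `vpx_rac.h` implementation; the struct and function names are hypothetical, and it assumes the caller passes exactly the `compressed_header_size` bytes that start at `source_data + uncompressed_header_size`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bool-decoder state, modelled on the VP9 spec's process. */
struct vp9_bool_dec {
    const uint8_t *buf;   /* compressed-header bytes */
    size_t size;          /* compressed_header_size */
    size_t bitpos;        /* next bit to fetch, MSB-first */
    uint32_t value;       /* BoolValue */
    uint32_t range;       /* BoolRange, back in [128, 255] after each read */
};

/* Fetch one raw bit; past the end of the data the spec feeds zeros. */
static int vp9_raw_bit(struct vp9_bool_dec *d)
{
    if (d->bitpos >= 8 * d->size)
        return 0;
    int bit = (d->buf[d->bitpos >> 3] >> (7 - (d->bitpos & 7))) & 1;
    d->bitpos++;
    return bit;
}

/* Decode one bool; p/256 is (approximately) the probability that it is 0. */
static int vp9_read_bool(struct vp9_bool_dec *d, uint8_t p)
{
    uint32_t split = 1 + (((d->range - 1) * p) >> 8);
    int bit;

    if (d->value < split) {
        d->range = split;
        bit = 0;
    } else {
        d->range -= split;
        d->value -= split;
        bit = 1;
    }
    while (d->range < 128) {          /* renormalise one bit at a time */
        d->value = (d->value << 1) | (uint32_t)vp9_raw_bit(d);
        d->range <<= 1;
    }
    return bit;
}

/* L(n): n equiprobable bools, most significant first. */
static uint32_t vp9_read_literal(struct vp9_bool_dec *d, int n)
{
    uint32_t x = 0;
    while (n-- > 0)
        x = (x << 1) | (uint32_t)vp9_read_bool(d, 128);
    return x;
}

/* init_bool(): load 8 bits, then the marker bool must decode as 0.
 * Returns 0 on success, -1 on empty input or a bad marker. */
static int vp9_bool_init(struct vp9_bool_dec *d, const uint8_t *buf, size_t size)
{
    if (size < 1)
        return -1;
    d->buf = buf;
    d->size = size;
    d->bitpos = 0;
    d->value = 0;
    for (int i = 0; i < 8; i++)
        d->value = (d->value << 1) | (uint32_t)vp9_raw_bit(d);
    d->range = 255;
    return vp9_read_bool(d, 128) == 0 ? 0 : -1;
}
```

With `p = 128` and a valid (zero) marker bit, `vp9_read_literal()` reduces to reading raw bits MSB-first, which makes a convenient unit test. FFmpeg's version refills several bytes at a time for speed and is expected to decode the same values.
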

Two design options:

- **Option A**: inline the ~80 LOC of decoder helpers directly in `vp9.c`. Simpler; one file. Recommended for the first cut.
- **Option B**: separate `vp9_rac.h`/`vp9_rac.c`. Mirrors FFmpeg's upstream `vp89_rac.h` pattern. More files, but easier reuse if further VPX-family work follows (AV1 uses a multi-symbol entropy coder, so it would not share this decoder).

**The Phase 4 plan locks Option A** unless Phase 5 review surfaces a reason for Option B.

### B7 — `src/picture.c::codec_set_controls` — VP9 dispatch case missing

**Site**: `picture.c:188-225`.

**Patch shape**: ADD `case VAProfileVP9Profile0:` calling `vp9_set_controls`. ~6 LOC.

### B8 — `src/picture.c::codec_store_buffer` — 2 VAAPI buffer types unmapped

VAAPI VP9 sends only TWO parameter-buffer types per frame, plus the slice data (per `va_dec_vp9.h:58-303`):

| VAAPI buffer type | VAAPI struct | Per-frame |
|---|---|---|
| `VAPictureParameterBufferType` | `VADecPictureParameterBufferVP9` | once |
| `VASliceParameterBufferType` | `VASliceParameterBufferVP9` (with `seg_param[8]`) | once |
| `VASliceDataBufferType` | raw bitstream | once |

**Different from iter3 VP8**: no `VAProbabilityBufferType` (VP9's probability updates come from the parsed compressed header; only the segmentation tree/pred probabilities travel in the picture params), and no `VAIQMatrixBufferType` (VP9 carries quantization in the slice's per-segment `seg_param` array). Just 2 cases vs VP8's 4.

**Patch shape**: 2 nested case additions in the `codec_store_buffer` outer switch + inner profile dispatch. ~14 LOC total.

### B9 — `src/picture.c::RequestBeginPicture` — per-frame VP9 reset

**Site**: `picture.c:299-302`.

**Bug**: VP9 has no iqmatrix_set / probability_set flag pattern; the VAAPI consumer fully populates the picture/slice params unconditionally on every frame. Possibly NO reset is needed (analogous to MPEG-2's iqmatrix-only pattern, but even simpler).

**Patch shape**: likely no edit. If Phase 5 review reveals a hidden state-leak risk (e.g., VAAPI reusing the surface for a new context with stale params), add a reset for `params.vp9`.
Default plan: no reset added; revisit if the Phase 7 byte-compare shows stale state.

### B10 — `src/surface.h::object_surface::params` union — no `vp9` member

**Site**: `surface.h:92-119`.

**Patch shape**: ADD a `vp9` struct after `vp8`:

```c
struct {
    VADecPictureParameterBufferVP9 picture;
    VASliceParameterBufferVP9 slice;
} vp9;
```

`VASliceParameterBufferVP9` is large (~340 bytes: `seg_param[8]` × ~40 bytes each); `VADecPictureParameterBufferVP9` is ~80 bytes. The new member is therefore ~420 bytes, far below `params.h265` with its 64-slot `slices[64]` array (~17 KB), so the union's overall size does not change.

### B11 — `src/meson.build` — `vp9.c` + `vp9.h` not in the lists

**Site**: `meson.build:30-74`.

**Patch shape**: insert `'vp9.c'` after `'vp8.c'` in sources and `'vp9.h'` after `'vp8.h'` in headers. +2 lines.

### B12 — `src/buffer.c` — buffer-type allow-list (predicted no change needed)

**Site**: `buffer.c:59-70`.

VP9 uses `VAPictureParameterBufferType` + `VASliceParameterBufferType` + `VASliceDataBufferType`, all three already in the allow-list (used by H.264 and iter3 VP8). **Predicted no change needed.**

Per memory `feedback_runtime_enumerates_allowlists.md`: plan for a fix-forward Commit D if a runtime miss surfaces (unexpected for VP9, given the buffer types are H.264-shaped; but the iter3 lesson is "don't audit exhaustively — let runtime enumerate").

### Non-bugs (intentionally NOT touched)

- `src/context.c` — no DECODE_MODE/START_CODE menus for VP9 (per FFmpeg V4L2 ref `v4l2_request_vp9.c:487-503`: `v4l2_request_vp9_init` doesn't issue any device-wide menu sets; per-frame batch only). **No context.c changes.**
- `src/video.c::formats[]` — CAPTURE-side format list (NV12); VP9 is an OUTPUT-side fourcc, probed via `v4l2_find_format()` in config.c. **No video.c changes.**
- `src/v4l2.c` — fourcc-agnostic helpers. **No v4l2.c changes.**
- `include/hevc-ctrls.h` — already includes `` which holds VP9 control IDs.

## Contract surface (verbatim)

### Kernel UAPI: `V4L2_CID_STATELESS_VP9_FRAME` (`:2696`)

```c
#define V4L2_CID_STATELESS_VP9_FRAME (V4L2_CID_CODEC_STATELESS_BASE + 300)
        /* = 0xa40a2c */

struct v4l2_ctrl_vp9_frame {
    struct v4l2_vp9_loop_filter lf;     /* 16 bytes: ref_deltas[4] + mode_deltas[2] +
                                           level + sharpness + flags + reserved[7] */
    struct v4l2_vp9_quantization quant; /* 8 bytes: base_q_idx + 3 deltas + reserved[4] */
    struct v4l2_vp9_segmentation seg;   /* 88 bytes: __s16 feature_data[8][4] + feature_enabled[8] +
                                           tree_probs[7] + pred_probs[3] + flags + reserved[5] */
    __u32 flags;                        /* 10 V4L2_VP9_FRAME_FLAG_* bits per :2665-2674 */
    __u16 compressed_header_size;
    __u16 uncompressed_header_size;
    __u16 frame_width_minus_1;
    __u16 frame_height_minus_1;
    __u16 render_width_minus_1;
    __u16 render_height_minus_1;
    __u64 last_frame_ts;                /* per-VASurfaceID timestamp lookup */
    __u64 golden_frame_ts;
    __u64 alt_frame_ts;
    __u8 ref_frame_sign_bias;           /* OR of V4L2_VP9_SIGN_BIAS_{LAST,GOLDEN,ALT} */
    __u8 reset_frame_context;           /* V4L2_VP9_RESET_FRAME_CTX_* (0..2) */
    __u8 frame_context_idx;
    __u8 profile;
    __u8 bit_depth;
    __u8 interpolation_filter;
    __u8 tile_cols_log2;
    __u8 tile_rows_log2;
    __u8 reference_mode;
    __u8 reserved[7];
};
```

Total size: 168 bytes (vs iter3 VP8's 1232 bytes; much smaller because VP9_FRAME carries no entropy tables, which live in COMPRESSED_HDR).
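
The hex IDs and the 168-byte total follow from the control-class base and the field layout. As a cross-check, a local transcription of the UAPI structs lets the compiler do the arithmetic. The `mirror_*` names are hypothetical, and the layout is transcribed from `linux/v4l2-controls.h` from memory, so real code should include the kernel header instead.

```c
#include <stdint.h>

/* V4L2_CTRL_CLASS_CODEC_STATELESS and the stateless base, as in the UAPI. */
#define MIRROR_CLASS_CODEC_STATELESS  0x00a40000u
#define MIRROR_CID_STATELESS_BASE     (MIRROR_CLASS_CODEC_STATELESS | 0x900u)
#define MIRROR_CID_VP9_FRAME          (MIRROR_CID_STATELESS_BASE + 300u)
#define MIRROR_CID_VP9_COMPRESSED_HDR (MIRROR_CID_STATELESS_BASE + 301u)

struct mirror_vp9_loop_filter {
    int8_t ref_deltas[4];
    int8_t mode_deltas[2];
    uint8_t level, sharpness, flags;
    uint8_t reserved[7];
};                                    /* 16 bytes */

struct mirror_vp9_quantization {
    uint8_t base_q_idx;
    int8_t delta_q_y_dc, delta_q_uv_dc, delta_q_uv_ac;
    uint8_t reserved[4];
};                                    /* 8 bytes */

struct mirror_vp9_segmentation {
    int16_t feature_data[8][4];       /* 64 bytes: the UAPI field is __s16 */
    uint8_t feature_enabled[8];
    uint8_t tree_probs[7];
    uint8_t pred_probs[3];
    uint8_t flags;
    uint8_t reserved[5];
};                                    /* 88 bytes */

struct mirror_vp9_frame {
    struct mirror_vp9_loop_filter lf;
    struct mirror_vp9_quantization quant;
    struct mirror_vp9_segmentation seg;
    uint32_t flags;
    uint16_t compressed_header_size, uncompressed_header_size;
    uint16_t frame_width_minus_1, frame_height_minus_1;
    uint16_t render_width_minus_1, render_height_minus_1;
    uint64_t last_frame_ts, golden_frame_ts, alt_frame_ts;
    uint8_t ref_frame_sign_bias, reset_frame_context, frame_context_idx;
    uint8_t profile, bit_depth, interpolation_filter;
    uint8_t tile_cols_log2, tile_rows_log2, reference_mode;
    uint8_t reserved[7];
};                                    /* 168 bytes, no implicit padding */
```

On the target, the same `sizeof` check against the real `struct v4l2_ctrl_vp9_frame` would flag any drift between the installed kernel header and this note.
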

### Kernel UAPI: `V4L2_CID_STATELESS_VP9_COMPRESSED_HDR` (`:2797`)

```c
#define V4L2_CID_STATELESS_VP9_COMPRESSED_HDR (V4L2_CID_CODEC_STATELESS_BASE + 301)
        /* = 0xa40a2d */

struct v4l2_ctrl_vp9_compressed_hdr {
    __u8 tx_mode;                   /* V4L2_VP9_TX_MODE_* (0..4) */
    __u8 tx8[2][1];
    __u8 tx16[2][2];
    __u8 tx32[2][3];
    __u8 coef[4][2][2][6][6][3];    /* HUGE: 1728 bytes */
    __u8 skip[3];
    __u8 inter_mode[7][3];
    __u8 interp_filter[4][2];
    __u8 is_inter[4];
    __u8 comp_mode[5];
    __u8 single_ref[5][2];
    __u8 comp_ref[5];
    __u8 y_mode[4][9];
    __u8 uv_mode[10][9];
    __u8 partition[16][3];
    struct v4l2_vp9_mv_probs mv;    /* 69 bytes: joint/sign/classes/class0_bit/bits/etc. */
};
```

Total size: 2040 bytes. Filled by parsing the compressed header bits with a VPX boolean decoder + `inv_map_table[]` (per FFmpeg `v4l2_request_vp9.c:99-261`).

The kernel treats these as PROBABILITY UPDATES, not absolute values: a zero in any array element means "no update; keep the prior probability". The kernel runs `v4l2_vp9_fw_update_probs(&probability_tables, prob_updates, dec_params)` to apply the updates (`rkvdec-vp9.c:796`).
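
To make the "updates, not absolutes" contract concrete, the sketch below shows how a non-zero update byte moves a prior probability. It follows the VP9 spec's `inv_remap_prob()` from the point where `inv_map_table[]` has already been applied, which is the form the control carries. That the kernel's `v4l2-vp9.c` helper behaves identically is an assumption to confirm against the source; the function names here are hypothetical.

```c
#include <stdint.h>

/* Spec helper: map a recentred value v back around the midpoint m. */
static int inv_recenter_nonneg(int v, int m)
{
    if (v > 2 * m)
        return v;
    if (v & 1)
        return m - ((v + 1) >> 1);
    return m + (v >> 1);
}

/* delta == 0 means "no update": the prior probability is kept. */
static uint8_t vp9_apply_prob_update(uint8_t delta, uint8_t prob)
{
    if (!delta)
        return prob;
    if (prob <= 128)
        return (uint8_t)(1 + inv_recenter_nonneg(delta, prob - 1));
    return (uint8_t)(255 - inv_recenter_nonneg(delta, 255 - prob));
}
```

So the backend stores `inv_map_table[index]` (which is non-zero) for an updated element and leaves 0 otherwise; storing the final probability instead would be re-centred a second time by the kernel.
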

### VAAPI buffer types

`VADecPictureParameterBufferVP9` (`va_dec_vp9.h:58-192`):
- `frame_width`, `frame_height` (u16)
- `reference_frames[8]` — 8-entry DPB (vs VP8's 3)
- `pic_fields.bits.{...}` — 22 single-bit/multi-bit fields (subsampling_x/y, frame_type, show_frame, error_resilient_mode, intra_only, allow_high_precision_mv, mcomp_filter_type[3 bits], frame_parallel_decoding_mode, reset_frame_context[2 bits], refresh_frame_context, frame_context_idx[2 bits], segmentation_*, last/golden/alt_ref_frame[3 bits each, indexes into reference_frames[8]], *_sign_bias, lossless_flag)
- `filter_level`, `sharpness_level` (u8)
- `log2_tile_rows`, `log2_tile_columns` (u8)
- `frame_header_length_in_bytes` — uncompressed_header_size (u8, so it cannot describe a header longer than 255 bytes; VP9 uncompressed headers, including BBB's, are typically far smaller)
- `first_partition_size` — compressed_header_size (u16)
- `mb_segment_tree_probs[7]`, `segment_pred_probs[3]` (u8)
- `profile`, `bit_depth` (u8)

`VASliceParameterBufferVP9` (`va_dec_vp9.h:279-303`):
- `slice_data_size`, `slice_data_offset`, `slice_data_flag` (u32)
- `seg_param[8]` — array of `VASegmentParameterVP9` (~40 bytes each):
  - `segment_flags.fields.{segment_reference_enabled, segment_reference[2 bits], segment_reference_skipped}` (u16 packed)
  - `filter_level[4][2]` (u8) — per-ref-frame × per-mode loop filter levels
  - `luma_ac_quant_scale`, `luma_dc_quant_scale`, `chroma_ac_quant_scale`, `chroma_dc_quant_scale` (s16) — already-computed effective scale per segment

### FFmpeg V4L2 reference (`v4l2_request_vp9.c`)

Submission shape: 2 batched controls per frame in a single `S_EXT_CTRLS`:

```c
control[0] = { .id = V4L2_CID_STATELESS_VP9_FRAME, ... };
control[1] = { .id = V4L2_CID_STATELESS_VP9_COMPRESSED_HDR, ... };
v4l2_set_controls(..., control, 2);
```

The COMPRESSED_HDR control is conditionally included based on a runtime probe (`v4l2_request_vp9_post_frames_ctx` queries the kernel; if the control isn't advertised, it falls back to FRAME-only). For rkvdec on RK3399, the kernel advertises COMPRESSED_HDR, and `rkvdec-vp9.c:752` reads it unconditionally on every frame.

### Kernel rkvdec driver (`rkvdec-vp9.c`)

Key reads in `rkvdec_vp9_run_preamble`:
- VP9_FRAME control → `dec_params = ctrl->p_cur.p` → drives register programming via `config_registers()`.
- VP9_COMPRESSED_HDR control → `prob_updates = ctrl->p_cur.p` → applied via `v4l2_vp9_fw_update_probs()`.
- Reference frames are resolved from FRAME's `last_frame_ts`/`golden_frame_ts`/`alt_frame_ts`: only 3 references are active per frame, even though VAAPI exposes an 8-slot DPB (VAAPI's `last/golden/alt_ref_frame` fields index into the picture's 8 `reference_frames[]`).

## Mapping table (VAAPI → V4L2 / kernel)

The libva backend's job: read VAAPI's per-frame buffers (Picture + Slice) AND parse the compressed header from the bitstream, then write the kernel's two structs.

### `v4l2_ctrl_vp9_frame` mapping

| Kernel field | VAAPI source | Notes |
|---|---|---|
| `lf.ref_deltas[4]` | NOT in VAAPI | VAAPI doesn't expose loop-filter ref deltas separately; FFmpeg's V4L2 ref reads them from VP9Context internal state. **Open question Phase 3**: are these zero in the BBB fixture? |
| `lf.mode_deltas[2]` | NOT in VAAPI | same |
| `lf.level` | `picture->filter_level` | direct |
| `lf.sharpness` | `picture->sharpness_level` | direct |
| `lf.flags` | NOT in VAAPI | DELTA_ENABLED + DELTA_UPDATE bits; same gap |
| `quant.base_q_idx` | DERIVED (no direct VAAPI exposure) | **Open question Phase 3**: VAAPI exposes per-segment `seg_param[s].luma_ac_quant_scale`, but those are EFFECTIVE Q-scales, not the base index. Inverse-derive from `seg_param[0].luma_ac_quant_scale` via the VP9 spec quantization table? Leaving it zero is risky: `base_q_idx == 0` with all three deltas zero is VP9's lossless mode. |
| `quant.delta_q_y_dc/uv_dc/uv_ac` | NOT in VAAPI | same; VAAPI only exposes effective per-segment scales |
| `seg.feature_data[8][4]` | DERIVED from `slice->seg_param[s].filter_level[][]` + quant scales | mapping non-trivial |
| `seg.feature_enabled[8]` | derived from `slice->seg_param[s].segment_flags` + segmentation enabled bits | non-trivial |
| `seg.tree_probs[7]` | `picture->mb_segment_tree_probs[7]` | direct |
| `seg.pred_probs[3]` | `picture->segment_pred_probs[3]` | direct |
| `seg.flags` | `pic_fields.bits.{segmentation_enabled, segmentation_update_map, segmentation_temporal_update}` + derived segmentation_update_data + absolute_or_delta | mostly direct |
| `flags & KEY_FRAME` | `!pic_fields.bits.frame_type` | as in the VP9 bitstream, frame_type=0 means key frame, hence the negation |
| `flags & SHOW_FRAME` | `pic_fields.bits.show_frame` | direct |
| `flags & ERROR_RESILIENT` | `pic_fields.bits.error_resilient_mode` | direct |
| `flags & INTRA_ONLY` | `pic_fields.bits.intra_only` | direct |
| `flags & ALLOW_HIGH_PREC_MV` | `pic_fields.bits.allow_high_precision_mv` | direct |
| `flags & REFRESH_FRAME_CTX` | `pic_fields.bits.refresh_frame_context` | direct |
| `flags & PARALLEL_DEC_MODE` | `pic_fields.bits.frame_parallel_decoding_mode` | direct |
| `flags & X/Y_SUBSAMPLING` | `pic_fields.bits.subsampling_x/y` | direct |
| `flags & COLOR_RANGE_FULL_SWING` | NOT in VAAPI | leave 0 for BT.709 limited range (BBB) |
| `compressed_header_size` | `picture->first_partition_size` | direct (the VAAPI name is a misnomer, per its own comment) |
| `uncompressed_header_size` | `picture->frame_header_length_in_bytes` | direct |
| `frame_width_minus_1` | `picture->frame_width - 1` | direct |
| `frame_height_minus_1` | `picture->frame_height - 1` | direct |
| `render_width_minus_1`, `render_height_minus_1` | NOT in VAAPI | set equal to frame_width-1 / frame_height-1 (no scaling for BBB) |
| `last_frame_ts` | DPB lookup `picture->reference_frames[picture->pic_fields.bits.last_ref_frame]` → `surface_object->timestamp` → `v4l2_timeval_to_ns()` | `last_ref_frame` indexes the 8-entry DPB |
| `golden_frame_ts` | DPB lookup `picture->reference_frames[picture->pic_fields.bits.golden_ref_frame]` | same |
| `alt_frame_ts` | DPB lookup `picture->reference_frames[picture->pic_fields.bits.alt_ref_frame]` | same |
| `ref_frame_sign_bias` | OR of `pic_fields.bits.{last,golden,alt}_ref_frame_sign_bias` mapped to `V4L2_VP9_SIGN_BIAS_{LAST,GOLDEN,ALT}` | direct |
| `reset_frame_context` | `pic_fields.bits.reset_frame_context` (with FFmpeg's `x > 0 ? x - 1 : 0` adjustment, per the ref) | mapping needs inspection |
| `frame_context_idx` | `pic_fields.bits.frame_context_idx` | direct |
| `profile` | `picture->profile` | direct |
| `bit_depth` | `picture->bit_depth` | direct |
| `interpolation_filter` | `pic_fields.bits.mcomp_filter_type` (FFmpeg applies `filtermode ^ (filtermode <= 1)`; see ref) | mapping needs inspection |
| `tile_cols_log2`, `tile_rows_log2` | `picture->log2_tile_columns`, `picture->log2_tile_rows` | direct |
| `reference_mode` | NOT in VAAPI | coded in the compressed header (`frame_reference_mode()`), which the backend parses anyway; fallback is a heuristic or the default `V4L2_VP9_REFERENCE_MODE_SELECT`. Phase 3 baseline answers |

### `v4l2_ctrl_vp9_compressed_hdr` mapping

This struct is filled by PARSING the compressed header bitstream, NOT from VAAPI. The libva backend runs a VPX boolean decoder over the `compressed_header_size` bytes starting at `surface_object->source_data + uncompressed_header_size`, follows VP9 spec section 6.3, and applies `inv_map_table[d]` to each updated probability.

The parsing logic is a direct port of FFmpeg `fill_compressed_hdr` (lines 99-261).
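
Every "updated probability" in that parser is read the same way: a flag decoded with probability 252, then, if set, a sub-exponentially coded index into `inv_map_table[]` (the spec's `diff_update_prob()` / `decode_term_subexp()`). The index decoder is sketched below against a hypothetical reader interface that yields equiprobable (`p = 128`) bits, so it can be tested without a real bitstream; in `vp9.c` the reader would be the boolean decoder itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader: returns the next equiprobable (p = 128) bool. */
struct lit_reader {
    int (*read_bit)(void *ctx);
    void *ctx;
};

/* L(n): n bits, most significant first. */
static uint32_t read_lit(const struct lit_reader *r, int n)
{
    uint32_t x = 0;
    while (n-- > 0)
        x = (x << 1) | (uint32_t)r->read_bit(r->ctx);
    return x;
}

/* decode_term_subexp(): returns an index in 0..254 into inv_map_table[]. */
static uint32_t vp9_decode_term_subexp(const struct lit_reader *r)
{
    if (!read_lit(r, 1))
        return read_lit(r, 4);
    if (!read_lit(r, 1))
        return read_lit(r, 4) + 16;
    if (!read_lit(r, 1))
        return read_lit(r, 5) + 32;
    uint32_t v = read_lit(r, 7);
    if (v < 65)
        return v + 64;
    return (v << 1) - 1 + read_lit(r, 1);
}

/* Array-backed reader for tests: bits[] holds 0/1 values. */
struct bit_array { const uint8_t *bits; size_t n, pos; };

static int bit_array_read(void *ctx)
{
    struct bit_array *a = ctx;
    return a->pos < a->n ? a->bits[a->pos++] : 0;
}

static uint32_t decode_term_subexp_from_bits(const uint8_t *bits, size_t n)
{
    struct bit_array a = { bits, n, 0 };
    struct lit_reader r = { bit_array_read, &a };
    return vp9_decode_term_subexp(&r);
}
```

The control element then receives `inv_map_table[index]`; because the index spans 0..254, the table has 255 entries.
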

Key syntax elements parsed:

- `tx_mode` (2 bits, then a conditional 1 bit)
- TX 8x8/16x16/32x32 probability updates (only if tx_mode == SELECT)
- Coef probability updates (4-level nested loop with branch probs)
- Skip probability updates (all frames)
- inter_mode / interp_filter (only if SWITCHABLE) / is_inter / reference mode + comp_mode / single_ref / comp_ref / y_mode / partition probability updates (inter frames only)
- MV probability updates (joint / sign / classes / class0_bit / bits / class0_fr / fr / class0_hp / hp; inter frames only)

Each updated value goes through `inv_map_table[]` (255-entry lookup). Each "no update" bit leaves zero in the kernel struct.

## Patch shape prediction

| Site | Action | LOC delta |
|---|---|---|
| `src/config.c:121-160` | INSERT VP9 enumeration block | +10 |
| `src/config.c:54-78` | INSERT VP9 case + break + comment | +5 |
| `src/config.c:167-191` | INSERT VP9 case in fall-through | +1 |
| `src/vp9.c` | NEW FILE | +500-600 |
| `src/vp9.h` | NEW FILE | +35-45 |
| `src/picture.c:34-37` | INSERT `#include "vp9.h"` | +1 |
| `src/picture.c:188-225` | INSERT VP9 dispatch case | +6 |
| `src/picture.c:54-186` | INSERT 2 buffer-type cases | +14 |
| `src/surface.h:92-119` | INSERT vp9 struct | +6 |
| `src/meson.build:50,73` | INSERT 2 entries | +2 |

**Total**: ~580-690 LOC, 4 modified + 2 new files. Larger than iter3 VP8 (370 LOC) and also above iter2 HEVC (470 LOC). The compressed-header parser is the dominant cost.

Predicted commits:
- **Commit A**: `src/config.c` enumeration + dispatch + entrypoints (Criterion 1).
- **Commit B**: NEW `src/vp9.c` + `src/vp9.h` + `src/meson.build` (10 contract clauses + VPX rac decoder + compressed-header parser).
- **Commit C**: `src/picture.c` dispatcher + 2 buffer-type cases + `src/surface.h` union extension (Criteria 2-3).
- **Commit D**: optional fix-forward placeholder.

If `vp9.c` reads `surface_object->params.vp9` (as the VP8 pattern suggests), Commit B will not build until Commit C adds the union member; moving the `surface.h` change into Commit B keeps every commit buildable.

## Open questions for Phase 3 baseline

1. **Loop filter ref/mode deltas**: VAAPI doesn't expose `lf_delta.ref/mode/enabled/updated`. Are these always zero for BBB? A Phase 3 strace of FFmpeg-v4l2request VP9 will reveal the verbatim values.
2. **Quantization base_q_idx + deltas**: VAAPI exposes effective per-segment scales but not the base. Phase 3 baseline: capture the verbatim FRAME control payload to see what FFmpeg-v4l2request writes; correlate it against VAAPI's per-segment scale via the VP9 spec quantization table. Note that these fields and Q1's deltas are coded in the uncompressed header, whose bytes sit at the start of `surface_object->source_data`, so parsing them there is a fallback.
3. **Reference mode**: VAAPI doesn't expose `comppredmode`. Phase 3 baseline: verify the default `V4L2_VP9_REFERENCE_MODE_SELECT` works for BBB. The compressed-header parser must decode `frame_reference_mode()` anyway, so the real value is available to it.
4. **Interpolation filter mapping**: FFmpeg uses `filtermode ^ (filtermode <= 1)` to remap; VAAPI's `mcomp_filter_type` may already be in V4L2 enum order (no remap needed) OR in a different order. Check empirically.
5. **Reset frame context mapping**: FFmpeg uses `x > 0 ? x - 1 : 0`. The VP9 spec gives the 2-bit `reset_frame_context` values 0 and 1 the same meaning (no reset), with 2 = reset the current context and 3 = reset all, while V4L2 uses NONE=0, SPEC=1, ALL=2; the `x - 1` collapse matches that. Remaining check: whether VAAPI passes the raw 2-bit spec value.
6. **VAAPI per-segment field interpretation**: `slice->seg_param[s].filter_level[4][2]` and the quant scales are EFFECTIVE values (computed by the mpv/VAAPI consumer). Mapping back to the kernel's "ALT_Q delta" + "ALT_L delta" + "REF_FRAME" feature bits is non-trivial. Phase 3: verbatim payload + mapping-back-to-VAAPI cross-check.
7. **Does mpv 0.41.0 engage HW for VP9?**: Phase 3 capture `mpv -v --hwdec=vaapi --vo=null --frames=2 ~/fourier-test/bbb_720p10s_vp9.webm` and grep for `Selected decoder: vp9` vs `Using software decoding`. iter3 VP8 fell back; iter4 VP9 may or may not.
8. **Does rkvdec exhibit the same dma_resv kernel issue as hantro?**: iter3 found that hantro CAPTURE returns all-zero pages on libva readback. rkvdec is a different driver; iter1 + iter2 were successfully verified via mpv-DMA-BUF-GL on rkvdec. **Predicted: rkvdec works fine for direct readback.** Phase 3 baseline: re-test ffmpeg-vaapi-hwdownload on rkvdec for VP9 and check whether the output is non-zero.

## Phase 3 baseline targets (work plan)

1. **Cross-validator capture**: `strace -ff -tt -y -v -e trace=ioctl -o strace.log ffmpeg -hwaccel v4l2request -i bbb_720p10s_vp9.webm -frames:v 5 -f null -` (`-ff` only splits the trace into one `strace.log.<pid>` per process when `-o` is given). Decode the VP9_FRAME + COMPRESSED_HDR payloads with the Phase 3 decoder (extend `decode_vp8.py` for the VP9 layout).
2. **VAAPI consumer trace**: `LIBVA_TRACE` mpv-SW + mpv-vaapi runs, to see which buffer types mpv produces.
3. **Cache-safe verify reference**: `mpv --hwdec=no --vo=image --frames=2 --start=00:00:02 ~/fourier-test/bbb_720p10s_vp9.webm`, capturing the frame-0001/0002 SHA256 (criterion-4 anchor).
4. **rkvdec readback path test**: re-run `ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i bbb_720p10s_vp9.webm -vf hwdownload,format=nv12 -frames:v 5` after install (strictly Phase 6 work; Phase 3 only baseline-captures the SW reference). Confirm whether rkvdec hits the dma_resv issue (predicted: NO, based on iter1 + iter2 working there).
5. **mpv-VP9-vaapi engagement check**: per memory `feedback_hw_decode_engagement_check.md`, verify the HW path is engaged via the `mpv -v` log BEFORE claiming criterion 4.
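
Open questions 4 and 5 come down to two small remaps that can be unit-tested offline before Phase 3 data arrives. The sketch assumes, pending confirmation against the sources, that FFmpeg's internal filter enum orders SMOOTH before REGULAR while V4L2 uses `EIGHTTAP = 0, EIGHTTAP_SMOOTH = 1, EIGHTTAP_SHARP = 2, BILINEAR = 3, SWITCHABLE = 4`, and that the reset-context input is the raw 2-bit spec value. Function and enum names are hypothetical.

```c
#include <stdint.h>

/* Local copies of the V4L2_VP9_RESET_FRAME_CTX_* values. */
enum { RESET_CTX_NONE = 0, RESET_CTX_SPEC = 1, RESET_CTX_ALL = 2 };

/* Raw VP9 reset_frame_context: 0 and 1 = no reset, 2 = reset the context
 * chosen by frame_context_idx, 3 = reset all four. Collapsing 0 and 1
 * yields the V4L2 ordering. */
static uint8_t vp9_reset_ctx_to_v4l2(uint8_t raw)
{
    return raw > 0 ? (uint8_t)(raw - 1) : (uint8_t)RESET_CTX_NONE;
}

/* FFmpeg-style remap: swaps codes 0 and 1 (SMOOTH <-> REGULAR);
 * SHARP (2), BILINEAR (3) and SWITCHABLE (4) pass through unchanged. */
static uint8_t ffmpeg_filtermode_to_v4l2(uint8_t f)
{
    return (uint8_t)(f ^ (f <= 1));
}
```

If VAAPI's `mcomp_filter_type` turns out to be in spec order already, the backend copies it unchanged and only the reset remap applies.
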

## Phase 4 plan structure (anticipated)

Following iter2/iter3's clause template:

- Clause 1: Submission shape — 2 controls batched per frame
- Clause 2: Local struct alloc + zero-init (memset both)
- Clause 3: Frame geometry + scalars + flags
- Clause 4: DPB timestamp resolution (3 active refs from 8-slot DPB)
- Clause 5: Loop filter mapping (with VAAPI gap notes per Q1)
- Clause 6: Quantization mapping (with VAAPI gap notes per Q2)
- Clause 7: Segmentation mapping (with VAAPI per-segment effective-vs-delta unpacking per Q6)
- Clause 8: Compressed header parser — port FFmpeg `fill_compressed_hdr` + VPX rac decoder + inv_map_table
- Clause 9: Final 2-control batched submission
- Clause 10: Bitstream offsetting — `surface_object->source_data + uncompressed_header_size` is the start of the compressed-header bytes; `compressed_header_size` is the byte length

The plan will cite verbatim Phase 3 baseline payload bytes for every field whose mapping is non-obvious (loop-filter deltas, quant base, segmentation feature mapping), per `feedback_dev_process.md` Phase 6 contract-before-code.

## Substrate state at Phase 2 close

- iter4 Phase 1 commit `9a71dbf` pushed to gitea.
- Fork on noether at iter3 tip `e1aca9c` (synced via `git fetch && git merge --ff-only`).
- All Phase 3 prerequisites identified.
- Memory rules apply unchanged.
- Phase 3 questions queued (8 items, mostly empirical). Phase 5 review will catch the field-availability + mapping questions, analogous to iter3 (`uniform_spacing_flag` Direction 2 lesson).
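
Clauses 3 and 4 are mostly bit plumbing. A minimal sketch, assuming the needed VAAPI `pic_fields` bits arrive as plain integers and using a hypothetical surface table in place of the backend's real surface lookup. The flag and sign-bias values are the `V4L2_VP9_FRAME_FLAG_*` / `V4L2_VP9_SIGN_BIAS_*` constants, re-declared locally so the example is self-contained.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Local copies of V4L2_VP9_FRAME_FLAG_* and V4L2_VP9_SIGN_BIAS_*. */
#define FLAG_KEY_FRAME          0x001u
#define FLAG_SHOW_FRAME         0x002u
#define FLAG_ERROR_RESILIENT    0x004u
#define FLAG_INTRA_ONLY         0x008u
#define FLAG_ALLOW_HIGH_PREC_MV 0x010u
#define FLAG_REFRESH_FRAME_CTX  0x020u
#define FLAG_PARALLEL_DEC_MODE  0x040u
#define FLAG_X_SUBSAMPLING      0x080u
#define FLAG_Y_SUBSAMPLING      0x100u
#define SIGN_BIAS_LAST          0x1u
#define SIGN_BIAS_GOLDEN        0x2u
#define SIGN_BIAS_ALT           0x4u

/* Hypothetical plain-integer view of the VAAPI pic_fields bits used here. */
struct va_vp9_bits {
    unsigned frame_type, show_frame, error_resilient_mode, intra_only;
    unsigned allow_high_precision_mv, refresh_frame_context;
    unsigned frame_parallel_decoding_mode, subsampling_x, subsampling_y;
    unsigned last_sign_bias, golden_sign_bias, alt_sign_bias;
};

static uint32_t vp9_frame_flags(const struct va_vp9_bits *b)
{
    uint32_t f = 0;
    f |= b->frame_type == 0 ? FLAG_KEY_FRAME : 0;   /* frame_type 0 = key frame */
    f |= b->show_frame ? FLAG_SHOW_FRAME : 0;
    f |= b->error_resilient_mode ? FLAG_ERROR_RESILIENT : 0;
    f |= b->intra_only ? FLAG_INTRA_ONLY : 0;
    f |= b->allow_high_precision_mv ? FLAG_ALLOW_HIGH_PREC_MV : 0;
    f |= b->refresh_frame_context ? FLAG_REFRESH_FRAME_CTX : 0;
    f |= b->frame_parallel_decoding_mode ? FLAG_PARALLEL_DEC_MODE : 0;
    f |= b->subsampling_x ? FLAG_X_SUBSAMPLING : 0;
    f |= b->subsampling_y ? FLAG_Y_SUBSAMPLING : 0;
    return f;   /* COLOR_RANGE_FULL_SWING is not in VAAPI: left clear */
}

static uint8_t vp9_sign_bias(const struct va_vp9_bits *b)
{
    return (uint8_t)((b->last_sign_bias ? SIGN_BIAS_LAST : 0) |
                     (b->golden_sign_bias ? SIGN_BIAS_GOLDEN : 0) |
                     (b->alt_sign_bias ? SIGN_BIAS_ALT : 0));
}

/* Same arithmetic as v4l2_timeval_to_ns() in linux/videodev2.h. */
static uint64_t tv_to_ns(const struct timeval *tv)
{
    return (uint64_t)tv->tv_sec * 1000000000ULL + (uint64_t)tv->tv_usec * 1000ULL;
}

/* Hypothetical stand-in for the backend's surface heap. */
struct ref_surface { uint32_t id; struct timeval timestamp; };

/* Resolve one of last/golden/alt: slot indexes the 8-entry VAAPI DPB.
 * Returns 0 when the slot or the surface is unknown. */
static uint64_t vp9_ref_ts(const uint32_t dpb[8], unsigned slot,
                           const struct ref_surface *tab, size_t n)
{
    if (slot >= 8)
        return 0;
    for (size_t i = 0; i < n; i++)
        if (tab[i].id == dpb[slot])
            return tv_to_ns(&tab[i].timestamp);
    return 0;
}
```

The three `*_frame_ts` fields are then `vp9_ref_ts(dpb, last_ref_frame, ...)` and its golden/alt counterparts; what to do when a reference is missing (0 here) is a policy choice for the Phase 4 plan.
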