iter4 Phase 2: situation analysis — VP9 backend gaps + compressed-header parser requirement

Source-read of every file the iter4 patch series will touch, plus
kernel UAPI + VAAPI + downstream FFmpeg + kernel rkvdec reference
sources. Conducted on noether against fork tip e1aca9c (iter3 close).
Critical scope-shaping finding: rkvdec on RK3399 REQUIRES
V4L2_CID_STATELESS_VP9_COMPRESSED_HDR (not optional). Per
drivers/staging/media/rkvdec/rkvdec-vp9.c::rkvdec_vp9_run_preamble
lines 752-754:

    ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl,
                          V4L2_CID_STATELESS_VP9_COMPRESSED_HDR);
    if (WARN_ON(!ctrl))
        return -EINVAL;

VAAPI does NOT expose compressed-header probability updates
(va_dec_vp9.h:50-192 — only frame parameters + segmentation;
vendor VAAPI drivers parse compressed header in firmware/GPU).
Therefore the libva backend MUST parse the compressed header
itself via a VPX boolean decoder + inv_map_table[]. ~150-200 LOC
of bitstream parsing logic (port from FFmpeg
v4l2_request_vp9.c::fill_compressed_hdr).
Bug enumeration (12 sites):
  B1   config.c::RequestQueryConfigProfiles      enum block missing
  B2   config.c::RequestCreateConfig             VP9 case missing
  B3   config.c::RequestQueryConfigEntrypoints   VP9 case missing
  B4   src/vp9.c                                 new file ~500-600 LOC
  B5   src/vp9.h                                 new file ~35-45 LOC
  B6   src/vp9_rac.h                             NEW or inline (Phase 4 plan locks Option A: inline in vp9.c)
  B7   picture.c::codec_set_controls             VP9 dispatch missing
  B8   picture.c::codec_store_buffer             2 buffer-type cases (Picture + Slice; NOT 4 like VP8)
  B9   picture.c::RequestBeginPicture            predicted no reset needed (no flag-state like VP8 iqmatrix_set)
  B10  surface.h::object_surface::params union   vp9 member missing
  B11  meson.build                               vp9.c/vp9.h not in lists
  B12  buffer.c                                  predicted no change needed (VP9 uses Picture/Slice/SliceData, all whitelisted)
Non-bugs (intentionally untouched): context.c (no DECODE_MODE/
START_CODE menus per FFmpeg ref), video.c (CAPTURE-side format
list), v4l2.c (fourcc-agnostic), include/hevc-ctrls.h (already
includes <linux/v4l2-controls.h>).
Contract surface cited verbatim:
V4L2_CID_STATELESS_VP9_FRAME = 0xa40a2c (~168 bytes; much
smaller than VP8's 1232 bytes because VP9_FRAME carries no
entropy table; that's in COMPRESSED_HDR)
V4L2_CID_STATELESS_VP9_COMPRESSED_HDR = 0xa40a2d (~2040 bytes;
coef[4][2][2][6][6][3] alone is 1728 bytes)
Per-frame submission: 2 controls batched in single S_EXT_CTRLS
v4l2_request_vp9.c references confirmed: 2-control shape,
runtime-probed COMPRESSED_HDR availability (rkvdec advertises
it; we MUST provide)
VAAPI buffer types: 2 per frame (Picture + Slice) vs iter3 VP8's
4. NO Probability buffer (VP9 keeps probs in compressed header).
NO IQMatrix (VP9 keeps quant in slice's per-segment seg_param[8]).
VAAPI → V4L2 mapping table: 30+ fields enumerated. Several gap
candidates identified for Phase 3 empirical resolution:
Q1 lf.ref_deltas/mode_deltas/flags — not in VAAPI; FFmpeg reads
from VP9Context internal. BBB likely zero.
Q2 quant.base_q_idx + deltas — VAAPI exposes only effective
per-segment scales. Inverse-derive needed.
Q3 reference_mode — not in VAAPI. Default to SELECT?
Q4 interpolation_filter mapping (FFmpeg ^ remap)
Q5 reset_frame_context off-by-one (FFmpeg > 0 ? - 1 : 0)
Q6 Per-segment feature_data[8][4] derivation from VAAPI's
effective scales is non-trivial
Q7 mpv 0.41.0 VP9 hwdec engagement (per memory feedback_hw_
decode_engagement_check.md — known gap from iter3 VP8)
Q8 rkvdec dma_resv issue? (predicted NO based on iter1+iter2
successful mpv-DMA-BUF-GL on rkvdec)
Patch-shape prediction: ~580-690 LOC across 4 modified + 2 new
files (5 modified if buffer.c needs a fix-forward); closer to
iter2 HEVC's 470 than iter3 VP8's 370. The compressed-header
parser is the dominant cost.
Phase 3 baseline targets queued: cross-validator strace verbatim
S_EXT_CTRLS payloads (both controls), VAAPI consumer trace, mpv-
VP9-vaapi engagement check, rkvdec readback non-zero check.
Phase 4 plan structure anticipated: 10-clause template per
iter2/iter3, with new Clause 8 dedicated to compressed-header
parser.
Refs:
phase0_findings_iter4.md (Phase 1 lock)
phase8_iteration3_close.md (predecessor)
references/ffmpeg-kwiboo/libavcodec/v4l2_request_vp9.c (V4L2 ref)
references/ffmpeg-kwiboo/libavcodec/vaapi_vp9.c (VAAPI ref)
/home/mfritsche/src/linux-rfc/drivers/staging/media/rkvdec/
rkvdec-vp9.c (kernel driver — confirms COMPRESSED_HDR
requirement at lines 752-754)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,380 @@
# Iteration 4 — Phase 2 (situation analysis)

Source-read of every file the iter4 patch series will touch, plus the kernel UAPI + VAAPI + downstream FFmpeg + kernel rkvdec reference sources. Conducted on noether against fork tip `e1aca9c` (iter3 close).

This is a contract-before-code analysis per `feedback_dev_process.md` Phase 2: enumerate the bugs, cite the contract verbatim, predict the patch shape, queue the Phase 3 baseline questions.

## Critical finding: rkvdec requires VP9_COMPRESSED_HDR

The biggest scope-shaping discovery: **rkvdec on RK3399 requires `V4L2_CID_STATELESS_VP9_COMPRESSED_HDR`**; the control is not optional. From `drivers/staging/media/rkvdec/rkvdec-vp9.c::rkvdec_vp9_run_preamble` lines 740-754:

```c
ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_VP9_FRAME);
if (WARN_ON(!ctrl))
    return -EINVAL;
dec_params = ctrl->p_cur.p;
...
ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_VP9_COMPRESSED_HDR);
if (WARN_ON(!ctrl))
    return -EINVAL; /* ← rkvdec WILL fail without compressed-header probs */
prob_updates = ctrl->p_cur.p;
vp9_ctx->cur.tx_mode = prob_updates->tx_mode;
...
v4l2_vp9_fw_update_probs(&vp9_ctx->probability_tables, prob_updates, dec_params);
```

VAAPI does NOT expose compressed-header probability updates (per `va_dec_vp9.h:50-192`: only frame parameters + segmentation, no probability deltas; vendor VAAPI drivers parse the compressed header in firmware/GPU). So **the libva backend must parse the compressed header itself** via a VPX boolean decoder.

This makes iter4's scope significantly larger than iter3 VP8's.

## Bug enumeration (sites the iter4 patch series must touch)

### B1 — `src/config.c::RequestQueryConfigProfiles` — VP9 enumeration block missing

**Site**: `config.c:121-160`.

**Bug**: no analogous block for `V4L2_PIX_FMT_VP9_FRAME` → `VAProfileVP9Profile0`. Same starting condition as iter3 VP8.

**Patch shape**: ADD enumeration block after iter3's VP8 block. ~10 LOC.

### B2 — `src/config.c::RequestCreateConfig` — VP9 case label missing

**Site**: `config.c:54-78`.

**Bug**: no `case VAProfileVP9Profile0:`. Mirror iter3 VP8 pattern. ~5 LOC.

### B3 — `src/config.c::RequestQueryConfigEntrypoints` — VP9 case missing

**Site**: `config.c:167-191`.

**Bug**: missing in fall-through case list. ~1 LOC.

### B4 — `src/vp9.c` — file does not exist; needs net-new implementation

**Site**: NEW FILE `src/vp9.c`.

**Patch shape**: NEW file, ~500-600 LOC (substantially larger than iter3 vp8.c due to compressed-header parser):

- Includes block
- Static `inv_map_table[255]` — direct copy from FFmpeg `v4l2_request_vp9.c:43-64`
- VPX range coder helpers (port from FFmpeg `vp89_rac.h` + boolean decoder primitives) — ~80 LOC
- `vp9_fill_frame()` — fill `v4l2_ctrl_vp9_frame` from VAAPI `VADecPictureParameterBufferVP9` + `VASliceParameterBufferVP9` — ~150 LOC
- `vp9_fill_compressed_hdr()` — parse compressed header bits from `surface_object->source_data + uncompressed_header_size`, populate `v4l2_ctrl_vp9_compressed_hdr` — ~180 LOC (port from FFmpeg `fill_compressed_hdr` lines 99-261)
- `vp9_set_controls()` — entry point; allocates both structs, calls `vp9_fill_frame` + `vp9_fill_compressed_hdr`, batches them in a 2-element `v4l2_ext_control` array, and submits with a single `v4l2_set_controls` call

### B5 — `src/vp9.h` — header does not exist

**Site**: NEW FILE `src/vp9.h`.

**Patch shape**: declare `vp9_set_controls()`. Mirror iter3 vp8.h.

### B6 — Possibly `src/vp9_rac.h` — VPX range decoder helpers (decision point)

**Site**: NEW FILE candidate `src/vp9_rac.h`.

VP9 boolean decoder primitives (`vpx_rac_get_prob_branchy`, `vp89_rac_get`, `vp89_rac_get_uint`, init function) are needed by `vp9_fill_compressed_hdr`. Two design options:

- **Option A**: inline the ~80 LOC of decoder helpers directly in `vp9.c`. Simpler; one file. Recommended for first cut.
- **Option B**: separate `vp9_rac.h`/`vp9_rac.c`. Mirrors FFmpeg's `vp89_rac.h` upstream pattern. More files, easier reuse if AV1/VP10 work follows.

**Phase 4 plan locks Option A** unless Phase 5 review surfaces a reason for Option B.
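
For orientation, the decoder primitives under discussion can be sketched in a bit-at-a-time form. This is not FFmpeg's `vp89_rac.h` code (which is optimised and differently structured); it follows the reference decoder shape from RFC 6386 section 7.3, on the assumption that the VP9 compressed header uses the same arithmetic. All names (`vp9_bool_dec`, `vp9_bool_get`, ...) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder state; not taken from FFmpeg or the fork. */
struct vp9_bool_dec {
    const uint8_t *buf; /* compressed-header bytes */
    size_t size;        /* number of valid bytes in buf */
    size_t pos;         /* next byte to consume */
    uint32_t value;     /* sliding window over the input */
    uint32_t range;     /* 128..255 after renormalisation */
    int bit_count;      /* shifts since the last byte load */
};

/* Reads past the end of the buffer yield zero bytes. */
static uint8_t vp9_bool_next_byte(struct vp9_bool_dec *d)
{
    return d->pos < d->size ? d->buf[d->pos++] : 0;
}

static void vp9_bool_init(struct vp9_bool_dec *d, const uint8_t *buf, size_t size)
{
    d->buf = buf;
    d->size = size;
    d->pos = 0;
    d->value = (uint32_t)vp9_bool_next_byte(d) << 8;
    d->value |= vp9_bool_next_byte(d);
    d->range = 255;
    d->bit_count = 0;
}

/* Decode one bool; prob/256 is the probability that the bool is 0. */
static int vp9_bool_get(struct vp9_bool_dec *d, int prob)
{
    uint32_t split = 1 + (((d->range - 1) * (uint32_t)prob) >> 8);
    uint32_t bigsplit = split << 8;
    int bit;

    if (d->value >= bigsplit) {
        bit = 1;
        d->range -= split;
        d->value -= bigsplit;
    } else {
        bit = 0;
        d->range = split;
    }

    /* Renormalise: shift until range is back in 128..255. */
    while (d->range < 128) {
        d->value <<= 1;
        d->range <<= 1;
        if (++d->bit_count == 8) {
            d->bit_count = 0;
            d->value |= vp9_bool_next_byte(d);
        }
    }
    return bit;
}

/* Read an n-bit unsigned literal, MSB first, each bit at probability 128. */
static unsigned vp9_bool_get_uint(struct vp9_bool_dec *d, int n)
{
    unsigned v = 0;

    while (n-- > 0)
        v = (v << 1) | (unsigned)vp9_bool_get(d, 128);
    return v;
}
```

With helpers of this shape, a `tx_mode` read would look like `tx_mode = vp9_bool_get_uint(&d, 2); if (tx_mode == 3) tx_mode += vp9_bool_get(&d, 128);`. Whether the real port keeps this simple form or adopts FFmpeg's branchy variant is a Phase 4 decision.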

### B7 — `src/picture.c::codec_set_controls` — VP9 dispatch case missing

**Site**: `picture.c:188-225`.

**Patch shape**: ADD `case VAProfileVP9Profile0:` calling `vp9_set_controls`. ~6 LOC.

### B8 — `src/picture.c::codec_store_buffer` — 2 VAAPI buffer types unmapped

VAAPI VP9 sends only TWO parameter buffer types per frame, plus the raw slice data (per `va_dec_vp9.h:58-303`):

| VAAPI buffer type | VAAPI struct | Per-frame |
|---|---|---|
| `VAPictureParameterBufferType` | `VADecPictureParameterBufferVP9` | once |
| `VASliceParameterBufferType` | `VASliceParameterBufferVP9` (with `seg_param[8]`) | once |
| `VASliceDataBufferType` | raw bitstream | once |

**Different from iter3 VP8**: no `VAProbabilityBufferType` (VP9 keeps probability state in the picture/slice params + parsed compressed header), no `VAIQMatrixBufferType` (VP9 keeps quantization in the slice's per-segment `seg_param` array). Just 2 parameter-buffer cases vs VP8's 4.

**Patch shape**: 2 nested case adds in `codec_store_buffer` outer switch + inner profile dispatch. ~14 LOC total.

### B9 — `src/picture.c::RequestBeginPicture` — per-frame VP9 reset

**Site**: `picture.c:299-302`.

**Bug**: VP9 doesn't have an iqmatrix_set / probability_set flag pattern; the picture/slice params are unconditionally fully populated by the VAAPI consumer on every frame. Possibly NO reset needed (analogous to MPEG-2's iqmatrix-only pattern but even simpler).

**Patch shape**: likely no edit. If Phase 5 review reveals a hidden state-leak risk (e.g., VAAPI reusing the surface for a new context with stale params), add reset for `params.vp9.<some-flag>`. Default plan: no reset added; revisit if Phase 7 byte-compare shows stale state.

### B10 — `src/surface.h::object_surface::params` union — no `vp9` member

**Site**: `surface.h:92-119`.

**Patch shape**: ADD `vp9` struct after `vp8`:

```c
struct {
    VADecPictureParameterBufferVP9 picture;
    VASliceParameterBufferVP9 slice;
} vp9;
```

`VASliceParameterBufferVP9` is large (~340 bytes: `seg_param[8]` × ~40 bytes each); `VADecPictureParameterBufferVP9` is ~80 bytes. The union member is therefore ~420 bytes, still dominated by `params.h265` with its 64-slot `slices[]` array (~17 KB).

### B11 — `src/meson.build` — `vp9.c` + `vp9.h` not in lists

**Site**: `meson.build:30-74`.

**Patch shape**: insert `'vp9.c'` after `'vp8.c'` in sources, insert `'vp9.h'` after `'vp8.h'` in headers. +2 lines.

### B12 — `src/buffer.c` — buffer-type allow-list (predicted no change needed)

**Site**: `buffer.c:59-70`.

VP9 uses `VAPictureParameterBufferType` + `VASliceParameterBufferType` + `VASliceDataBufferType` — all three already in the allow-list (used by H.264 + iter3 VP8). **Predicted no change needed.**

Per memory `feedback_runtime_enumerates_allowlists.md`: plan for fix-forward Commit D if a runtime miss surfaces (would be unexpected for VP9 given the buffer types are H.264-shape; but the iter3 lesson is "don't audit exhaustively — let runtime enumerate").

### Non-bugs (intentionally NOT touched)

- `src/context.c` — no DECODE_MODE/START_CODE menus for VP9 (per FFmpeg V4L2 ref `v4l2_request_vp9.c:487-503`: `v4l2_request_vp9_init` doesn't issue any device-wide menu sets; per-frame batch only). **No context.c changes.**
- `src/video.c::formats[]` — CAPTURE-side format list (NV12); VP9 is OUTPUT-side fourcc, probed via `v4l2_find_format()` in config.c. **No video.c changes.**
- `src/v4l2.c` — fourcc-agnostic helpers. **No v4l2.c changes.**
- `include/hevc-ctrls.h` — already includes `<linux/v4l2-controls.h>` which holds VP9 control IDs.

## Contract surface (verbatim)

### Kernel UAPI: `V4L2_CID_STATELESS_VP9_FRAME` (`<linux/v4l2-controls.h>:2696`)

```c
#define V4L2_CID_STATELESS_VP9_FRAME (V4L2_CID_CODEC_STATELESS_BASE + 300)
/* = 0xa40a2c (base is 0xa40900; 300 = 0x12c) */

struct v4l2_ctrl_vp9_frame {
    struct v4l2_vp9_loop_filter lf;     /* 16 bytes; ref_deltas[4] + mode_deltas[2]
                                           + level + sharpness + flags + reserved[7] */
    struct v4l2_vp9_quantization quant; /* 8 bytes; base_q_idx + 3 deltas + reserved[4] */
    struct v4l2_vp9_segmentation seg;   /* 88 bytes; __s16 feature_data[8][4] + feature_enabled[8]
                                           + tree_probs[7] + pred_probs[3] + flags + reserved[5] */
    __u32 flags;                        /* 10 V4L2_VP9_FRAME_FLAG_* bits per
                                           <linux/v4l2-controls.h>:2665-2674 */
    __u16 compressed_header_size;
    __u16 uncompressed_header_size;
    __u16 frame_width_minus_1;
    __u16 frame_height_minus_1;
    __u16 render_width_minus_1;
    __u16 render_height_minus_1;
    __u64 last_frame_ts;                /* per-VASurfaceID timestamp lookup */
    __u64 golden_frame_ts;
    __u64 alt_frame_ts;
    __u8 ref_frame_sign_bias;           /* OR of V4L2_VP9_SIGN_BIAS_{LAST,GOLDEN,ALT} */
    __u8 reset_frame_context;           /* V4L2_VP9_RESET_FRAME_CTX_* (0..2) */
    __u8 frame_context_idx;
    __u8 profile;
    __u8 bit_depth;
    __u8 interpolation_filter;
    __u8 tile_cols_log2;
    __u8 tile_rows_log2;
    __u8 reference_mode;
    __u8 reserved[7];
};
```

Total size: ~168 bytes by summing the fields above (vs iter3 VP8's 1232 bytes; much smaller because VP9_FRAME carries no entropy table; that's in COMPRESSED_HDR).
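
The control IDs and the frame-struct size can be cross-checked at compile time. The sketch below uses local mirrors (prefixed `X_` / `x_`, all hypothetical) rather than the kernel header, so it builds on hosts whose headers predate the VP9 stateless UAPI. It assumes the stateless codec base is `0x00a40000 | 0x900`, that `feature_data` is 16-bit, and that the loop-filter deltas are signed bytes; Phase 3 should confirm these against the real `<linux/v4l2-controls.h>`.

```c
#include <stdint.h>

/* Local mirrors of the kernel macros (assumed values, see lead-in). */
#define X_CTRL_CLASS_CODEC_STATELESS 0x00a40000u
#define X_CID_CODEC_STATELESS_BASE   (X_CTRL_CLASS_CODEC_STATELESS | 0x900u)
#define X_CID_VP9_FRAME              (X_CID_CODEC_STATELESS_BASE + 300)
#define X_CID_VP9_COMPRESSED_HDR     (X_CID_CODEC_STATELESS_BASE + 301)

_Static_assert(X_CID_VP9_FRAME == 0x00a40a2cu, "VP9_FRAME control id");
_Static_assert(X_CID_VP9_COMPRESSED_HDR == 0x00a40a2du, "VP9_COMPRESSED_HDR control id");

/* Mirror of the layout listed above, using fixed-width types. */
struct x_vp9_loop_filter {
    int8_t ref_deltas[4];
    int8_t mode_deltas[2];
    uint8_t level, sharpness, flags;
    uint8_t reserved[7];
};

struct x_vp9_quantization {
    uint8_t base_q_idx;
    int8_t delta_q_y_dc, delta_q_uv_dc, delta_q_uv_ac;
    uint8_t reserved[4];
};

struct x_vp9_segmentation {
    int16_t feature_data[8][4];
    uint8_t feature_enabled[8];
    uint8_t tree_probs[7];
    uint8_t pred_probs[3];
    uint8_t flags;
    uint8_t reserved[5];
};

struct x_ctrl_vp9_frame {
    struct x_vp9_loop_filter lf;
    struct x_vp9_quantization quant;
    struct x_vp9_segmentation seg;
    uint32_t flags;
    uint16_t compressed_header_size, uncompressed_header_size;
    uint16_t frame_width_minus_1, frame_height_minus_1;
    uint16_t render_width_minus_1, render_height_minus_1;
    uint64_t last_frame_ts, golden_frame_ts, alt_frame_ts;
    uint8_t ref_frame_sign_bias, reset_frame_context, frame_context_idx;
    uint8_t profile, bit_depth, interpolation_filter;
    uint8_t tile_cols_log2, tile_rows_log2, reference_mode;
    uint8_t reserved[7];
};

_Static_assert(sizeof(struct x_vp9_loop_filter) == 16, "lf size");
_Static_assert(sizeof(struct x_vp9_quantization) == 8, "quant size");
_Static_assert(sizeof(struct x_vp9_segmentation) == 88, "seg size");
_Static_assert(sizeof(struct x_ctrl_vp9_frame) == 168, "frame control size");
```

In the real backend the same assertions would be written against the kernel types directly, which turns a header mismatch into a build failure instead of a runtime `EINVAL`.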

### Kernel UAPI: `V4L2_CID_STATELESS_VP9_COMPRESSED_HDR` (`<linux/v4l2-controls.h>:2797`)

```c
#define V4L2_CID_STATELESS_VP9_COMPRESSED_HDR (V4L2_CID_CODEC_STATELESS_BASE + 301)
/* = 0xa40a2d */

struct v4l2_ctrl_vp9_compressed_hdr {
    __u8 tx_mode;                  /* V4L2_VP9_TX_MODE_* (0..4) */
    __u8 tx8[2][1];
    __u8 tx16[2][2];
    __u8 tx32[2][3];
    __u8 coef[4][2][2][6][6][3];   /* HUGE: 1728 bytes */
    __u8 skip[3];
    __u8 inter_mode[7][3];
    __u8 interp_filter[4][2];
    __u8 is_inter[4];
    __u8 comp_mode[5];
    __u8 single_ref[5][2];
    __u8 comp_ref[5];
    __u8 y_mode[4][9];
    __u8 uv_mode[10][9];
    __u8 partition[16][3];
    struct v4l2_vp9_mv_probs mv;   /* 69 bytes; joint/sign/classes/class0_bit/bits/etc */
};
```

Total size: ~2040 bytes (the arrays above sum to 1971 bytes before `mv`). Filled by parsing the compressed header bits via VPX boolean decoder + `inv_map_table[]` (per FFmpeg `v4l2_request_vp9.c:99-261`).

The kernel uses these as PROBABILITY UPDATES (not absolutes): a value of zero in any array element means "no update — keep prior probability." The kernel runs `v4l2_vp9_fw_update_probs(&probability_tables, prob_updates, dec_params)` to apply updates per `rkvdec-vp9.c:796`.
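
The "zero means keep" rule can be illustrated with a flat array. This is not the kernel's `v4l2_vp9_fw_update_probs` implementation, only a minimal sketch of the semantics described above; `apply_prob_updates` is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: overwrite probs[i] only where the parsed
 * compressed header produced a nonzero update; a zero entry leaves
 * the prior probability untouched.
 */
static void apply_prob_updates(uint8_t *probs, const uint8_t *updates, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (updates[i])
            probs[i] = updates[i];
    }
}
```

The practical consequence for the backend is that both control structs must be zero-initialised before parsing, so that untouched elements read as "no update".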

### VAAPI buffer types

`VADecPictureParameterBufferVP9` (`va_dec_vp9.h:58-192`):
- `frame_width`, `frame_height` (u16)
- `reference_frames[8]` — 8-entry DPB (vs VP8's 3)
- `pic_fields.bits.{...}` — 27 single-bit/multi-bit fields (subsampling_x/y, frame_type, show_frame, error_resilient_mode, intra_only, allow_high_precision_mv, mcomp_filter_type[3 bits], frame_parallel_decoding_mode, reset_frame_context[2 bits], refresh_frame_context, frame_context_idx[2 bits], segmentation_*, last/golden/alt_ref_frame[3 bits each, indexes into reference_frames[8]], *_sign_bias, lossless_flag)
- `filter_level`, `sharpness_level` (u8)
- `log2_tile_rows`, `log2_tile_columns` (u8)
- `frame_header_length_in_bytes` — uncompressed_header_size (u8; note the 8-bit width may overflow for super-frames; typical < 256 for BBB)
- `first_partition_size` — compressed_header_size (u16)
- `mb_segment_tree_probs[7]`, `segment_pred_probs[3]` (u8)
- `profile`, `bit_depth` (u8)

`VASliceParameterBufferVP9` (`va_dec_vp9.h:279-303`):
- `slice_data_size`, `slice_data_offset`, `slice_data_flag` (u32)
- `seg_param[8]` — array of `VASegmentParameterVP9` (~40 bytes each):
  - `segment_flags.fields.{segment_reference_enabled, segment_reference[2 bits], segment_reference_skipped}` (u16 packed)
  - `filter_level[4][2]` (u8) — per-ref-frame × per-mode loop filter levels
  - `luma_ac_quant_scale`, `luma_dc_quant_scale`, `chroma_ac_quant_scale`, `chroma_dc_quant_scale` (s16) — already-computed effective scale per segment

### FFmpeg V4L2 reference (`v4l2_request_vp9.c`)

Submission shape: 2 batched controls per frame in single `S_EXT_CTRLS`:

```c
control[0] = { .id = V4L2_CID_STATELESS_VP9_FRAME, ... };
control[1] = { .id = V4L2_CID_STATELESS_VP9_COMPRESSED_HDR, ... };
v4l2_set_controls(..., control, 2);
```

The COMPRESSED_HDR control is conditionally included based on a runtime probe (`v4l2_request_vp9_post_frames_ctx` queries the kernel; if the control isn't advertised, it falls back to FRAME-only). For rkvdec on RK3399, the kernel advertises COMPRESSED_HDR, verified at `rkvdec-vp9.c:752` (the kernel WILL return EINVAL if it is not provided).

### Kernel rkvdec driver (`rkvdec-vp9.c`)

Key reads in `rkvdec_vp9_run_preamble`:
- VP9_FRAME control → `dec_params = ctrl->p_cur.p` → drives register programming via `config_registers()`.
- VP9_COMPRESSED_HDR control → `prob_updates = ctrl->p_cur.p` → applied via `v4l2_vp9_fw_update_probs()`.
- 8-entry reference frame DPB resolved from FRAME's `last_frame_ts`/`golden_frame_ts`/`alt_frame_ts` (only 3 active references at a time, despite VAAPI exposing 8 — kernel uses last/golden/alt indexes into the picture's 8-frame DPB).

## Mapping table (VAAPI → V4L2 / kernel)

The libva backend's job: read VAAPI's per-frame buffers (Picture + Slice) AND parse the compressed header from the bitstream, write the kernel's two structs.

### `v4l2_ctrl_vp9_frame` mapping

| Kernel field | VAAPI source | Notes |
|---|---|---|
| `lf.ref_deltas[4]` | NOT in VAAPI | VAAPI doesn't expose loop-filter ref deltas separately; FFmpeg's V4L2 ref reads from VP9Context internal state. **Open question Phase 3**: are these zero in the BBB fixture? |
| `lf.mode_deltas[2]` | NOT in VAAPI | same |
| `lf.level` | `picture->filter_level` | direct |
| `lf.sharpness` | `picture->sharpness_level` | direct |
| `lf.flags` | NOT in VAAPI | DELTA_ENABLED + DELTA_UPDATE bits — ditto |
| `quant.base_q_idx` | DERIVED — no direct VAAPI exposure | **Open question Phase 3**: VAAPI exposes per-segment `seg_param[s].luma_ac_quant_scale`, but those are EFFECTIVE Q-scales, not the base index. Inverse-derive from `seg_param[0].luma_ac_quant_scale` via the VP9 spec quantization table? Or leave zero and let kernel use default? |
| `quant.delta_q_y_dc/uv_dc/uv_ac` | NOT in VAAPI | same — VAAPI only exposes effective per-segment scales |
| `seg.feature_data[8][4]` | DERIVED from `slice->seg_param[s].filter_level[][]` + quant scales | mapping non-trivial |
| `seg.feature_enabled[8]` | derived from `slice->seg_param[s].segment_flags` + segmentation enabled bits | non-trivial |
| `seg.tree_probs[7]` | `picture->mb_segment_tree_probs[7]` | direct |
| `seg.pred_probs[3]` | `picture->segment_pred_probs[3]` | direct |
| `seg.flags` | from `pic_fields.bits.{segmentation_enabled, segmentation_update_map, segmentation_temporal_update}` + derived segmentation_update_data + absolute_or_delta | mostly direct |
| `flags & KEY_FRAME` | `!pic_fields.bits.frame_type` | VAAPI inverts: frame_type=0 means keyframe |
| `flags & SHOW_FRAME` | `pic_fields.bits.show_frame` | direct |
| `flags & ERROR_RESILIENT` | `pic_fields.bits.error_resilient_mode` | direct |
| `flags & INTRA_ONLY` | `pic_fields.bits.intra_only` | direct |
| `flags & ALLOW_HIGH_PREC_MV` | `pic_fields.bits.allow_high_precision_mv` | direct |
| `flags & REFRESH_FRAME_CTX` | `pic_fields.bits.refresh_frame_context` | direct |
| `flags & PARALLEL_DEC_MODE` | `pic_fields.bits.frame_parallel_decoding_mode` | direct |
| `flags & X/Y_SUBSAMPLING` | `pic_fields.bits.subsampling_x/y` | direct |
| `flags & COLOR_RANGE_FULL_SWING` | NOT in VAAPI | leave 0 for BT.709 limited (BBB) |
| `compressed_header_size` | `picture->first_partition_size` | direct (VAAPI mis-named per its own comment) |
| `uncompressed_header_size` | `picture->frame_header_length_in_bytes` | direct |
| `frame_width_minus_1` | `picture->frame_width - 1` | direct |
| `frame_height_minus_1` | `picture->frame_height - 1` | direct |
| `render_width_minus_1`, `render_height_minus_1` | NOT in VAAPI | leave equal to frame_width-1 / frame_height-1 (no scaling for BBB) |
| `last_frame_ts` | DPB lookup `picture->reference_frames[picture->pic_fields.bits.last_ref_frame]` → `surface_object->timestamp` → `v4l2_timeval_to_ns()` | uses `last_ref_frame` index into 8-entry DPB |
| `golden_frame_ts` | DPB lookup `picture->reference_frames[picture->pic_fields.bits.golden_ref_frame]` | same |
| `alt_frame_ts` | DPB lookup `picture->reference_frames[picture->pic_fields.bits.alt_ref_frame]` | same |
| `ref_frame_sign_bias` | OR of `pic_fields.bits.{last,golden,alt}_ref_frame_sign_bias` mapped to `V4L2_VP9_SIGN_BIAS_{LAST,GOLDEN,ALT}` | direct |
| `reset_frame_context` | `pic_fields.bits.reset_frame_context` (with FFmpeg's `> 0 ? -1 : 0` adjustment per ref) | mapping needs inspection |
| `frame_context_idx` | `pic_fields.bits.frame_context_idx` | direct |
| `profile` | `picture->profile` | direct |
| `bit_depth` | `picture->bit_depth` | direct |
| `interpolation_filter` | `pic_fields.bits.mcomp_filter_type` (with FFmpeg's `^ (filtermode <= 1)` adjustment — see ref) | mapping needs inspection |
| `tile_cols_log2`, `tile_rows_log2` | `picture->log2_tile_columns`, `log2_tile_rows` | direct |
| `reference_mode` | NOT in VAAPI | derive from heuristic OR leave default `V4L2_VP9_REFERENCE_MODE_SELECT` — Phase 3 baseline answers |
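
The "direct" flag rows in the table can be sketched as a single packing function. To stay self-contained, the sketch uses a stand-in bitfield (`x_vp9_pic_bits`) instead of libva's `VADecPictureParameterBufferVP9`, and local flag constants whose values are assumed to match the kernel's `V4L2_VP9_FRAME_FLAG_*` definitions. All names here are hypothetical.

```c
#include <stdint.h>

/* Local stand-ins; values assumed to match V4L2_VP9_FRAME_FLAG_*. */
#define X_VP9_FLAG_KEY_FRAME          0x001u
#define X_VP9_FLAG_SHOW_FRAME         0x002u
#define X_VP9_FLAG_ERROR_RESILIENT    0x004u
#define X_VP9_FLAG_INTRA_ONLY         0x008u
#define X_VP9_FLAG_ALLOW_HIGH_PREC_MV 0x010u
#define X_VP9_FLAG_REFRESH_FRAME_CTX  0x020u
#define X_VP9_FLAG_PARALLEL_DEC_MODE  0x040u
#define X_VP9_FLAG_X_SUBSAMPLING      0x080u
#define X_VP9_FLAG_Y_SUBSAMPLING      0x100u

/* Stand-in for the relevant pic_fields.bits members. */
struct x_vp9_pic_bits {
    unsigned subsampling_x : 1;
    unsigned subsampling_y : 1;
    unsigned frame_type : 1; /* 0 = key frame, 1 = non-key frame */
    unsigned show_frame : 1;
    unsigned error_resilient_mode : 1;
    unsigned intra_only : 1;
    unsigned allow_high_precision_mv : 1;
    unsigned frame_parallel_decoding_mode : 1;
    unsigned refresh_frame_context : 1;
};

/* Pack the VAAPI bits into the kernel's flags word. */
static uint32_t x_vp9_frame_flags(const struct x_vp9_pic_bits *b)
{
    uint32_t flags = 0;

    if (!b->frame_type) /* VAAPI inverts: 0 means key frame */
        flags |= X_VP9_FLAG_KEY_FRAME;
    if (b->show_frame)
        flags |= X_VP9_FLAG_SHOW_FRAME;
    if (b->error_resilient_mode)
        flags |= X_VP9_FLAG_ERROR_RESILIENT;
    if (b->intra_only)
        flags |= X_VP9_FLAG_INTRA_ONLY;
    if (b->allow_high_precision_mv)
        flags |= X_VP9_FLAG_ALLOW_HIGH_PREC_MV;
    if (b->refresh_frame_context)
        flags |= X_VP9_FLAG_REFRESH_FRAME_CTX;
    if (b->frame_parallel_decoding_mode)
        flags |= X_VP9_FLAG_PARALLEL_DEC_MODE;
    if (b->subsampling_x)
        flags |= X_VP9_FLAG_X_SUBSAMPLING;
    if (b->subsampling_y)
        flags |= X_VP9_FLAG_Y_SUBSAMPLING;
    /* COLOR_RANGE_FULL_SWING is not exposed by VAAPI and stays 0. */
    return flags;
}
```

Only the inverted `frame_type` test needs care; every other row is a one-to-one bit copy.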

### `v4l2_ctrl_vp9_compressed_hdr` mapping

This struct is filled by PARSING the compressed header bitstream — NOT from VAAPI. The libva backend runs a VPX boolean decoder over `surface_object->source_data + uncompressed_header_size` for `compressed_header_size` bytes, follows the VP9 spec section 6.3, and applies `inv_map_table[d]` for each updated probability.

The parsing logic is a direct port of FFmpeg `fill_compressed_hdr` (lines 99-261). Key syntax elements parsed:

- `tx_mode` (2 bits, then conditional 1 bit)
- TX 8x8/16x16/32x32 probability updates (only if tx_mode == SELECT)
- Coef probability updates (4-level nested loop with branch probs)
- Skip / inter_mode / interp_filter / is_inter / comp_mode / single_ref / comp_ref / y_mode / partition probability updates (only on inter frames)
- MV probability updates (joint / sign / classes / class0_bit / bits / class0_fr / fr / class0_hp / hp)

Each updated value goes through `inv_map_table[]` (the 255-entry lookup from B4). Each "no update" bit leaves zero in the kernel struct.
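
Before any parsing starts, the compressed-header span has to be located inside the slice data and bounds-checked, since both sizes come from the VAAPI consumer. A minimal sketch, with a hypothetical helper name and plain pointer/length parameters standing in for the surface object's fields:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return a pointer to the first compressed-header
 * byte, or NULL if the span [uncompressed_size, uncompressed_size +
 * compressed_size) does not fit inside the slice data.
 */
static const uint8_t *vp9_compressed_hdr_span(const uint8_t *data, size_t data_size,
                                              size_t uncompressed_size,
                                              size_t compressed_size)
{
    if (!data || compressed_size == 0)
        return NULL;
    if (uncompressed_size > data_size)
        return NULL;
    /* Subtraction form avoids overflow in the addition. */
    if (compressed_size > data_size - uncompressed_size)
        return NULL;
    return data + uncompressed_size;
}
```

A NULL return here would also cover frames with an empty compressed header; how the backend should treat that case (skip the parse and submit an all-zero control, or fail the frame) is left open.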

## Patch shape prediction

| Site | Action | LOC delta |
|---|---|---|
| `src/config.c:121-160` | INSERT VP9 enumeration block | +10 |
| `src/config.c:54-78` | INSERT VP9 case + break + comment | +5 |
| `src/config.c:167-191` | INSERT VP9 case in fall-through | +1 |
| `src/vp9.c` | NEW FILE | +500-600 |
| `src/vp9.h` | NEW FILE | +35-45 |
| `src/picture.c:34-37` | INSERT `#include "vp9.h"` | +1 |
| `src/picture.c:188-225` | INSERT VP9 dispatch case | +6 |
| `src/picture.c:54-186` | INSERT 2 buffer-type cases | +14 |
| `src/surface.h:92-119` | INSERT vp9 struct | +6 |
| `src/meson.build:50,73` | INSERT 2 entries | +2 |

**Total**: ~580-690 LOC across 4 modified + 2 new files (5 modified if the `buffer.c` allow-list turns out to need a fix-forward). Larger than both iter3 VP8 (370 LOC) and iter2 HEVC (470 LOC), though closer to the latter. The compressed-header parser is the dominant cost.

Predicted commits:
- **Commit A**: `src/config.c` enumeration + dispatch + entrypoints (Criterion 1).
- **Commit B**: NEW `src/vp9.c` + `src/vp9.h` + `src/meson.build` (10 contract clauses + VPX rac decoder + compressed-header parser).
- **Commit C**: `src/picture.c` dispatcher + 2 buffer-type cases + `src/surface.h` union extension (Criteria 2-3).
- **Commit D**: optional fix-forward placeholder.

## Open questions for Phase 3 baseline

1. **Loop filter ref/mode deltas**: VAAPI doesn't expose `lf_delta.ref/mode/enabled/updated`. Are these always zero for BBB? Phase 3 strace of FFmpeg-v4l2request VP9 will reveal verbatim values.
2. **Quantization base_q_idx + deltas**: VAAPI exposes effective per-segment scales but not the base. Phase 3 baseline: capture verbatim FRAME control payload to see what FFmpeg-v4l2request writes; correlate against VAAPI's per-segment scale via VP9 spec quantization table.
3. **Reference mode**: VAAPI doesn't expose `comppredmode`. Phase 3 baseline: verify default `V4L2_VP9_REFERENCE_MODE_SELECT` works for BBB.
4. **Interpolation filter mapping**: FFmpeg uses `filtermode ^ (filtermode <= 1)` to remap; VAAPI's `mcomp_filter_type` may already be in V4L2 enum order (no remap needed) OR in a different order. Empirically check.
5. **Reset frame context mapping**: FFmpeg uses `> 0 ? - 1 : 0`. Either FFmpeg's source enum is offset by 1 from V4L2's, or there's an off-by-one. Empirically verify.
6. **VAAPI per-segment field interpretation**: `slice->seg_param[s].filter_level[4][2]` and quant scales are EFFECTIVE values (computed by mpv-VAAPI consumer). Mapping back to kernel's "ALT_Q delta" + "ALT_L delta" + "REF_FRAME" feature bits is non-trivial. Phase 3 verbatim payload + mapping-back-to-VAAPI cross-check.
7. **Does mpv 0.41.0 engage HW for VP9?**: Phase 3 capture `mpv -v --hwdec=vaapi --vo=null --frames=2 ~/fourier-test/bbb_720p10s_vp9.webm` and grep for `Selected decoder: vp9` vs `Using software decoding`. iter3 VP8 fell back; iter4 VP9 may or may not.
8. **Does rkvdec exhibit the same dma_resv kernel issue as hantro?**: iter3 found hantro CAPTURE returns all-zero pages from libva readback. rkvdec is a different driver subsystem; iter1+iter2 successfully verified via mpv-DMA-BUF-GL on rkvdec. **Predicted: rkvdec works fine for direct readback.** Phase 3 baseline: re-test ffmpeg-vaapi-hwdownload on rkvdec for VP9 and check if output is non-zero.

## Phase 3 baseline targets (work plan)

1. **Cross-validator capture**: `strace -ff -tt -y -v -e trace=ioctl ffmpeg -hwaccel v4l2request bbb_720p10s_vp9.webm -frames:v 5 -f null - 2>strace.log`. Decode VP9_FRAME + COMPRESSED_HDR payloads via Phase 3 decoder (extend `decode_vp8.py` for VP9 layout).
2. **VAAPI consumer trace**: `LIBVA_TRACE` mpv-SW + mpv-vaapi runs to see what buffer types mpv produces.
3. **Cache-safe verify reference**: `mpv --hwdec=no --vo=image --frames=2 --start=00:00:02 ~/fourier-test/bbb_720p10s_vp9.webm` and capture frame-0001/0002 SHA256 (criterion-4 anchor).
4. **rkvdec readback path test**: re-run `ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -vf hwdownload bbb_720p10s_vp9.webm -frames:v 5` after install (would be Phase 6 actually; Phase 3 just baseline-captures the SW reference). Confirm whether rkvdec hits dma_resv issue or not (predicted: NO based on iter1+iter2 working there).
5. **mpv-VP9-vaapi engagement check**: per memory `feedback_hw_decode_engagement_check.md`, verify HW path engaged via `mpv -v` log BEFORE claiming criterion 4.

## Phase 4 plan structure (anticipated)

Following iter2/iter3's clause template:

- Clause 1: Submission shape — 2 controls batched per frame
- Clause 2: Local struct alloc + zero-init (memset both)
- Clause 3: Frame geometry + scalars + flags
- Clause 4: DPB timestamp resolution (3 active refs from 8-slot DPB)
- Clause 5: Loop filter mapping (with VAAPI gap notes per Q1)
- Clause 6: Quantization mapping (with VAAPI gap notes per Q2)
- Clause 7: Segmentation mapping (with VAAPI per-segment effective-vs-delta unpacking per Q6)
- Clause 8: Compressed header parser — port FFmpeg `fill_compressed_hdr` + VPX rac decoder + inv_map_table
- Clause 9: Final 2-control batched submission
- Clause 10: Bitstream offsetting — `surface_object->source_data + uncompressed_header_size` is the start of compressed-header bytes; `compressed_header_size` is the byte length

The plan will cite verbatim Phase 3 baseline payload bytes for all fields where mapping is non-obvious (loop-filter deltas, quant base, segmentation feature mapping) per `feedback_dev_process.md` Phase 6 contract-before-code.

## Substrate state at Phase 2 close

- iter4 Phase 1 commit `9a71dbf` pushed to gitea.
- Fork on noether at iter3 tip `e1aca9c` (synced via `git fetch && merge --ff-only`).
- All Phase 3 prerequisites identified.
- Memory rules apply unchanged.
- Phase 3 questions queued (8 items, mostly empirical). Phase 5 review will catch the field-availability + mapping questions analogous to iter3 (`uniform_spacing_flag` Direction 2 lesson).