fresnel-fourier/phase4_iter4_plan.md
marfrit 4b36077b17 iter4 Phase 4: plan locks 12 contract clauses + Mitigation B
5-commit plan (Z, A, B, C, optional D):
- Commit Z: src/request.c — walk /dev/video* + /dev/media*, match by
  driver name in {rkvdec, hantro-vpu, cedrus, sun4i_csi}; restores
  baseline functionality on 7.0 (where /dev/video0 is rockchip-rga).
- Commit A: src/config.c — VAProfileVP9Profile0 enumeration + dispatch
  + entrypoints (~16 LOC, 1 file).
- Commit B: NEW src/vp9.c + .h + meson — 12 contract clauses; ~580 LOC
  vp9.c (50 infra + 80 VPX rac + 50 uncompressed-header partial parse +
  180 compressed-header parser + ~200 frame-fill).
- Commit C: src/picture.c + surface.h — VP9 dispatch + 2 buffer-type
  cases + union extension; NO BeginPicture reset (VP9 has no
  iqmatrix_set-style flags).
- Commit D: optional fix-forward placeholder (predicted no-op per
  feedback_runtime_enumerates_allowlists.md).

Total ~699 LOC, 7 files.

12 contract clauses include 2 NEW vs iter3:
- Clause 3: compile-time _Static_assert sizeof v4l2_ctrl_vp9_frame ==
  168 && ..._compressed_hdr == 2040 (any UAPI shift fails loudly).
- Clause 6: uncompressed-header partial parse for lf_delta_* +
  base_q_idx (VAAPI doesn't expose; BBB keyframe needs non-zero
  ref_deltas={1,0,-1,-1} per Phase 3 anchor).

7 Phase 5 review questions queued, all empirical-leaning per
feedback_review_empirical_over_theoretical.md Direction 2:
parser-vs-bitstream cross-check, FFmpeg-XOR-remap validation,
struct-size stability, mitigation B regression risk.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 23:10:47 +00:00

# Iteration 4 — Phase 4 (plan)
Locks the iter4 patch shape against the verbatim Phase 3 baseline (`phase3_iter4_baseline.md`, commit `56abe3d`) and the kernel UAPI, VAAPI, and FFmpeg references read in Phase 2 (`phase2_iter4_situation.md`, commit `2651e4c` + ID-correction in `56abe3d`). The plan structure mirrors the iter2/iter3 clause template, expanded for VP9-specific scope (compressed-header parser, uncompressed-header partial parse, device-path mitigation).
Phase 3 baseline provides:
- Empirical struct sizes 168 B / 2040 B (NOT Phase 2's 144 / 1947 estimates)
- Correct control IDs `0xa40a2c` / `0xa40a2d`
- Frame-1 keyframe verbatim payload prefix (`lf.ref_deltas={1,0,-1,-1}`, `lf.mode_deltas={0,0}`, `quant.base_q_idx=46`)
- 4-codec regression root-cause: `/dev/video0` is now `rockchip-rga` on 7.0; backend hardcodes `/dev/video0` in `request.c:149`
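User picked **Mitigation B**: an in-fork patch that walks `/dev/video*`, queries `VIDIOC_QUERYCAP`, and picks the first device whose driver name is in the known-decoder list (`rkvdec`, `hantro-vpu`; Commit Z's list also includes `cedrus` and `sun4i_csi`). Adds ~35 LOC to `request.c` (see Commit Z). Restores baseline functionality on 7.0.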
## Contract clauses
### Clause 1 — Submission shape (per-frame)
ONE batched `VIDIOC_S_EXT_CTRLS` per frame, bound to the surface's permanent `request_fd`. **TWO controls** (vs iter3's one): `V4L2_CID_STATELESS_VP9_FRAME` + `V4L2_CID_STATELESS_VP9_COMPRESSED_HDR`. `rkvdec-vp9.c::rkvdec_vp9_run_preamble:752` hits `WARN_ON(!ctrl); return -EINVAL` if COMPRESSED_HDR is absent, so the second control is a hard requirement on RK3399.
```c
struct v4l2_ext_control ctrls[2] = {
{ .id = V4L2_CID_STATELESS_VP9_FRAME, /* 0xa40a2c */
.ptr = &frame,
.size = sizeof frame /* MUST be 168 */ },
{ .id = V4L2_CID_STATELESS_VP9_COMPRESSED_HDR, /* 0xa40a2d */
.ptr = &compressed_hdr,
.size = sizeof compressed_hdr /* MUST be 2040 */ },
};
rc = v4l2_set_controls(driver_data->video_fd,
surface_object->request_fd,
ctrls, 2);
```
`v4l2_set_controls` wraps with `which=V4L2_CTRL_WHICH_REQUEST_VAL`. Phase 3 strace verifies `ctrl_class=0xf010000` is what the kernel sees, matching iter1+iter2+iter3.
**No init-time probe**: ffmpeg-v4l2request first runs a count=1 probe (FRAME-only) to check kernel CID support, then count=2. iter4 backend skips the probe — VP9 on rkvdec REQUIRES COMPRESSED_HDR per kernel source; if the kernel doesn't have it, decode would fail anyway. Issuing count=2 unconditionally is correct.
**Anchor**: Phase 3 baseline § Anchor 2 verbatim ioctl trace.
### Clause 2 — Local struct allocation + zero-init
```c
int vp9_set_controls(struct request_data *driver_data,
struct object_context *context,
struct object_surface *surface_object)
{
VADecPictureParameterBufferVP9 *picture =
&surface_object->params.vp9.picture;
VASliceParameterBufferVP9 *slice =
&surface_object->params.vp9.slice;
struct v4l2_ctrl_vp9_frame frame;
struct v4l2_ctrl_vp9_compressed_hdr compressed_hdr;
memset(&frame, 0, sizeof frame);
memset(&compressed_hdr, 0, sizeof compressed_hdr);
/* Zero is the kernel's "no probability update" default for every
* field in compressed_hdr, and the safe default for every numeric
* field of frame except reference_mode (set explicitly later). */
...
}
```
VAAPI doesn't have iter3-style `set` flags for VP9; both Picture and Slice are unconditionally populated by the consumer per frame (per Phase 2 analysis B9, no per-frame reset needed in `RequestBeginPicture`).
### Clause 3 — Compile-time struct-size assertions
Per Phase 3 finding: kernel UAPI struct sizes are **168 B (FRAME)** and **2040 B (COMPRESSED_HDR)** on 7.0. Any iter4 build made after 2026-05-09 picks up whatever sizes the build host's UAPI headers report, so compile-time asserts at the top of `vp9.c` make any future struct-size drift fail loudly instead of silently corrupting kernel control writes:
```c
_Static_assert(sizeof(struct v4l2_ctrl_vp9_frame) == 168,
"v4l2_ctrl_vp9_frame size mismatch — UAPI changed");
_Static_assert(sizeof(struct v4l2_ctrl_vp9_compressed_hdr) == 2040,
"v4l2_ctrl_vp9_compressed_hdr size mismatch — UAPI changed");
```
If these fire, treat as a kernel-substrate bump (re-baseline Phase 3) — DO NOT just bump the asserts.
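To make the 2040-byte figure auditable without kernel headers, the sketch below mirrors the compressed-header layout locally and asserts its size. The `mirror_` type names are hypothetical, and the field list is an assumption transcribed from the mainline `v4l2-controls.h` layout, not from the 7.0 headers used in Phase 3. The real `vp9.c` should keep asserting against the kernel struct itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror (assumption: mainline layout) of struct v4l2_vp9_mv_probs. */
struct mirror_vp9_mv_probs {
    uint8_t joint[3];
    uint8_t sign[2];
    uint8_t classes[2][10];
    uint8_t class0_bit[2];
    uint8_t bits[2][10];
    uint8_t class0_fr[2][2][3];
    uint8_t fr[2][3];
    uint8_t class0_hp[2];
    uint8_t hp[2];
};

/* Local mirror (assumption: mainline layout) of
 * struct v4l2_ctrl_vp9_compressed_hdr. Every member is a byte array,
 * so the struct has no padding and its size is the sum of its parts. */
struct mirror_vp9_compressed_hdr {
    uint8_t tx_mode;
    uint8_t tx8[2][1];
    uint8_t tx16[2][2];
    uint8_t tx32[2][3];
    uint8_t coef[4][2][2][6][6][3];
    uint8_t skip[3];
    uint8_t inter_mode[7][3];
    uint8_t interp_filter[4][2];
    uint8_t is_inter[4];
    uint8_t comp_mode[5];
    uint8_t single_ref[5][2];
    uint8_t comp_ref[5];
    uint8_t y_mode[4][9];
    uint8_t uv_mode[10][9];
    uint8_t partition[16][3];
    struct mirror_vp9_mv_probs mv;
};

_Static_assert(sizeof(struct mirror_vp9_mv_probs) == 69,
               "mirror mv_probs size mismatch");
_Static_assert(sizeof(struct mirror_vp9_compressed_hdr) == 2040,
               "mirror compressed_hdr size mismatch");

/* Runtime accessor so the size can also be logged or unit-tested. */
static size_t mirror_compressed_hdr_size(void)
{
    return sizeof(struct mirror_vp9_compressed_hdr);
}
```

The coefficient table alone accounts for 1728 of the 2040 bytes, so any future size drift is most likely to come from a new probability table rather than from padding.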
### Clause 4 — Frame geometry + per-frame scalars
```c
frame.frame_width_minus_1 = picture->frame_width - 1;
frame.frame_height_minus_1 = picture->frame_height - 1;
/* VAAPI gap: render size is not exposed; BBB uses no scaling. */
frame.render_width_minus_1 = picture->frame_width - 1;
frame.render_height_minus_1 = picture->frame_height - 1;
frame.profile = picture->profile;
frame.bit_depth = picture->bit_depth;
frame.tile_cols_log2 = picture->log2_tile_columns;
frame.tile_rows_log2 = picture->log2_tile_rows;
frame.frame_context_idx = picture->pic_fields.bits.frame_context_idx;
frame.lf.level = picture->filter_level;
frame.lf.sharpness = picture->sharpness_level;
frame.uncompressed_header_size = picture->frame_header_length_in_bytes;
frame.compressed_header_size = picture->first_partition_size;
```
VAAPI fields verified via test-compile against `va_dec_vp9.h:58-192` (per memory `feedback_review_empirical_over_theoretical.md` Direction 2 — Phase 5 will re-verify the field-name access list via `gcc -c` test compile).
Phase 3 keyframe anchor: `width=1280, height=720, profile=0, bit_depth=8, tile_log2={0,0}, level=3, sharpness=0` — direct match.
### Clause 5 — DPB timestamp resolution (3 active references from 8-slot DPB)
VAAPI's `picture->reference_frames[0..7]` is the full 8-entry DPB. The 3 active references for the current frame are indexed by `last_ref_frame`/`golden_ref_frame`/`alt_ref_frame` (each 3-bit, points into the 8-slot array).
```c
VASurfaceID last_id = picture->reference_frames[picture->pic_fields.bits.last_ref_frame];
VASurfaceID golden_id = picture->reference_frames[picture->pic_fields.bits.golden_ref_frame];
VASurfaceID alt_id = picture->reference_frames[picture->pic_fields.bits.alt_ref_frame];
struct object_surface *last_ref = (last_id != VA_INVALID_SURFACE) ? SURFACE(driver_data, last_id) : NULL;
struct object_surface *golden_ref = (golden_id != VA_INVALID_SURFACE) ? SURFACE(driver_data, golden_id) : NULL;
struct object_surface *alt_ref = (alt_id != VA_INVALID_SURFACE) ? SURFACE(driver_data, alt_id) : NULL;
if (last_ref) frame.last_frame_ts = v4l2_timeval_to_ns(&last_ref->timestamp);
if (golden_ref) frame.golden_frame_ts = v4l2_timeval_to_ns(&golden_ref->timestamp);
if (alt_ref) frame.alt_frame_ts = v4l2_timeval_to_ns(&alt_ref->timestamp);
```
Mirrors iter1/iter3 ref-resolution pattern. For keyframes (all refs invalid), timestamps stay 0 from memset.
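The timestamp conversion used above is small enough to show in full. This is a minimal sketch of what a `v4l2_timeval_to_ns`-style helper computes, written as a standalone function with the hypothetical name `sketch_timeval_to_ns` so it does not depend on kernel headers.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

/* Convert a struct timeval (seconds + microseconds) to nanoseconds.
 * The result is the value the kernel matches against the timestamp of
 * a previously decoded CAPTURE buffer to find a reference frame. */
static uint64_t sketch_timeval_to_ns(const struct timeval *tv)
{
    return (uint64_t)tv->tv_sec * 1000000000ULL +
           (uint64_t)tv->tv_usec * 1000ULL;
}
```

One consequence worth keeping in mind: a reference slot left at 0 by the memset is indistinguishable from a real surface whose timestamp happens to be 0, so the backend's surface timestamps should start above zero if that ambiguity matters.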
Sign bias:
```c
if (picture->pic_fields.bits.last_ref_frame_sign_bias) frame.ref_frame_sign_bias |= V4L2_VP9_SIGN_BIAS_LAST;
if (picture->pic_fields.bits.golden_ref_frame_sign_bias) frame.ref_frame_sign_bias |= V4L2_VP9_SIGN_BIAS_GOLDEN;
if (picture->pic_fields.bits.alt_ref_frame_sign_bias) frame.ref_frame_sign_bias |= V4L2_VP9_SIGN_BIAS_ALT;
```
### Clause 6 — Loop filter deltas + base quantization (uncompressed-header partial parse)
VAAPI exposes `filter_level` and `sharpness_level` (Clause 4), but NOT `lf_delta_enabled`/`lf_delta_update`/`lf_ref_delta[4]`/`lf_mode_delta[2]`. Phase 3 keyframe anchor shows `lf.ref_deltas={1,0,-1,-1}` (non-zero on BBB); leaving these zero produces wrong loop-filter behavior → criterion-4 byte mismatch.
VAAPI also doesn't expose `quant.base_q_idx` / `delta_q_y_dc` / `delta_q_uv_dc` / `delta_q_uv_ac`. Phase 3 keyframe anchor shows `base_q_idx=46`; leaving zero produces wrong dequant scale.
Solution: implement a minimal uncompressed-header parser (`vp9_parse_uncompressed_header_lf_quant`) that reads `surface_object->source_data` from offset 0 and extracts the needed loop-filter and quantization fields. The parse runs from offset 0 through the loop-filter and quantization syntax sections (per VP9 spec 6.2, §6.2.4 to §6.2.5):
```c
static void vp9_parse_uncompressed_header_lf_quant(
const uint8_t *data, uint32_t size, uint32_t header_size,
struct v4l2_vp9_loop_filter *lf,
struct v4l2_vp9_quantization *quant)
{
/* Bit reader walks frame_marker, profile, show_existing_frame,
* frame_type, show_frame, error_resilient_mode, color_config (if
* keyframe), frame_size_with_refs (if not keyframe), tile_info ...
* up to loop_filter_params + quantization_params syntax sections.
*
* Approach: bit-perfect VP9 spec port for ~50 LOC, reusing the
* VPX bitstream reader (see Clause 8). Fields written:
* lf->ref_deltas[0..3], lf->mode_deltas[0..1],
* lf->flags |= V4L2_VP9_LOOP_FILTER_FLAG_DELTA_ENABLED if set
* lf->flags |= V4L2_VP9_LOOP_FILTER_FLAG_DELTA_UPDATE if set
* quant->base_q_idx,
* quant->delta_q_y_dc, delta_q_uv_dc, delta_q_uv_ac
*/
...
}
```
**Anchor**: Phase 3 keyframe `lf.ref_deltas={1,0,-1,-1}, lf.mode_deltas={0,0}, lf.flags=3 (DELTA_ENABLED|DELTA_UPDATE), quant.base_q_idx=46, deltas=0`. Implementation must reproduce these exact values byte-for-byte against the BBB keyframe.
**Per memory `feedback_review_empirical_over_theoretical.md` Direction 2**: Phase 5 review must verify the parser by extracting every field listed in the anchor above from the actual BBB keyframe bitstream (start of `bbb_720p10s_vp9.webm` first frame) and comparing against the Phase 3 anchor. If any field disagrees, Phase 5 returns "Critical: parser bug" and Phase 4 loops.
**Out of iter4 scope**: full uncompressed-header parse (color_config, frame_size for inter, segmentation update_data, tile_info). Those fields are either available via VAAPI (Clauses 4, 5, 7) or are not written back to the kernel. The parser is a TARGETED partial parse, not a general bitstream reader.
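The tail end of the partial parse can be sketched as follows. This is a minimal illustration, assuming the reader has already been advanced to the first bit of `loop_filter_level`; walking the earlier header syntax is not shown. All `sketch_` names are hypothetical, and the bit widths follow the VP9 spec's `loop_filter_params()` and `quantization_params()` syntax as I understand it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MSB-first bit reader; reads past the end yield 0 bits. */
struct sketch_bits {
    const uint8_t *data;
    size_t size_bits;
    size_t pos;
};

static unsigned sketch_f(struct sketch_bits *b, unsigned n)
{
    unsigned v = 0;
    while (n--) {
        unsigned bit = 0;
        if (b->pos < b->size_bits)
            bit = (b->data[b->pos >> 3] >> (7 - (b->pos & 7))) & 1u;
        b->pos++;
        v = (v << 1) | bit;
    }
    return v;
}

/* su(n): n-bit magnitude followed by one sign bit. */
static int sketch_su(struct sketch_bits *b, unsigned n)
{
    int v = (int)sketch_f(b, n);
    return sketch_f(b, 1) ? -v : v;
}

struct sketch_lf_quant {
    uint8_t level, sharpness;
    int delta_enabled, delta_update;
    int8_t ref_deltas[4], mode_deltas[2];
    uint8_t base_q_idx;
    int8_t delta_q_y_dc, delta_q_uv_dc, delta_q_uv_ac;
};

/* read_delta_q(): one "coded" bit, then su(4) if it is set. */
static int8_t sketch_read_delta_q(struct sketch_bits *b)
{
    return sketch_f(b, 1) ? (int8_t)sketch_su(b, 4) : 0;
}

/* Deltas that are not updated keep whatever the caller seeded in out. */
static void sketch_parse_lf_quant(struct sketch_bits *b,
                                  struct sketch_lf_quant *out)
{
    int i;

    out->level = (uint8_t)sketch_f(b, 6);
    out->sharpness = (uint8_t)sketch_f(b, 3);
    out->delta_enabled = (int)sketch_f(b, 1);
    out->delta_update = 0;
    if (out->delta_enabled) {
        out->delta_update = (int)sketch_f(b, 1);
        if (out->delta_update) {
            for (i = 0; i < 4; i++)
                if (sketch_f(b, 1))
                    out->ref_deltas[i] = (int8_t)sketch_su(b, 6);
            for (i = 0; i < 2; i++)
                if (sketch_f(b, 1))
                    out->mode_deltas[i] = (int8_t)sketch_su(b, 6);
        }
    }
    out->base_q_idx = (uint8_t)sketch_f(b, 8);
    out->delta_q_y_dc = sketch_read_delta_q(b);
    out->delta_q_uv_dc = sketch_read_delta_q(b);
    out->delta_q_uv_ac = sketch_read_delta_q(b);
}
```

Because un-updated deltas keep their previous values, a real parser probably needs to seed `ref_deltas` and `mode_deltas` before parsing. My understanding is that the spec resets them to `{1,0,-1,-1}` and `{0,0}` on keyframes, which would explain the Phase 3 anchor even if the BBB keyframe codes no per-entry updates; Phase 5 should confirm this against the bitstream.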
### Clause 7 — Segmentation mapping
VAAPI conveys segmentation via:
- `picture->pic_fields.bits.{segmentation_enabled, segmentation_temporal_update, segmentation_update_map}` flags
- `picture->mb_segment_tree_probs[7]` (segment tree probs)
- `picture->segment_pred_probs[3]` (temporal-update probs; 255-padded if `temporal_update == 0`)
- `slice->seg_param[8].{segment_flags.fields, filter_level[4][2], luma_*_quant_scale, chroma_*_quant_scale}`
Kernel takes per-segment feature_data + feature_enabled bitmaps. The mapping is non-trivial because VAAPI's slice->seg_param[s] carries EFFECTIVE quant scales (already-computed by VAAPI consumer), while kernel wants the per-segment ALT_Q delta or absolute (depends on `ABS_OR_DELTA_UPDATE` flag).
```c
for (i = 0; i < 7; i++)
frame.seg.tree_probs[i] = picture->mb_segment_tree_probs[i];
for (i = 0; i < 3; i++)
frame.seg.pred_probs[i] = picture->segment_pred_probs[i];
if (picture->pic_fields.bits.segmentation_enabled)
frame.seg.flags |= V4L2_VP9_SEGMENTATION_FLAG_ENABLED;
if (picture->pic_fields.bits.segmentation_update_map)
frame.seg.flags |= V4L2_VP9_SEGMENTATION_FLAG_UPDATE_MAP;
if (picture->pic_fields.bits.segmentation_temporal_update)
frame.seg.flags |= V4L2_VP9_SEGMENTATION_FLAG_TEMPORAL_UPDATE;
/* UPDATE_DATA + ABS_OR_DELTA_UPDATE: not in VAAPI; left zero.
* For BBB (segmentation disabled), this is correct — flags ignored
* by kernel when ENABLED is clear. */
/* Per-segment feature_data (only meaningful when ENABLED):
* VAAPI's seg_param[s].luma_ac_quant_scale is the EFFECTIVE
* per-segment scale (one scalar per segment). Kernel wants ALT_Q
* absolute Q-index OR delta.
* Recover via VP9 spec inverse-Q-table OR leave zero (BBB safe). */
for (i = 0; i < 8; i++) {
if (slice->seg_param[i].segment_flags.fields.segment_reference_enabled) {
frame.seg.feature_enabled[i] |= 1 << V4L2_VP9_SEG_LVL_REF_FRAME;
frame.seg.feature_data[i][V4L2_VP9_SEG_LVL_REF_FRAME] =
slice->seg_param[i].segment_flags.fields.segment_reference;
}
if (slice->seg_param[i].segment_flags.fields.segment_reference_skipped)
frame.seg.feature_enabled[i] |= 1 << V4L2_VP9_SEG_LVL_SKIP;
/* SEG_LVL_ALT_Q + ALT_L: VAAPI doesn't directly expose per-segment
* abs/delta intent. Phase 5 review point: BBB has segmentation
* disabled so this code path is dead; non-BBB fixtures are out of
* iter4 scope (see backlog). */
}
```
**Anchor**: Phase 3 keyframe `seg = all zeros` (BBB segmentation disabled). The Clause 7 logic is exercised only for inter frames with segmentation_enabled — out of iter4 BBB scope. Document as fidelity gap.
### Clause 8 — VPX range coder + inv_map_table (for compressed header parse)
Direct port from FFmpeg `v4l2_request_vp9.c:42-97`:
- `inv_map_table[255]` — copy verbatim
- `vpx_rac_init(c, buf, size)` — initialize range coder over the compressed-header bytes
- `vp89_rac_get(c)` — read a single bit
- `vp89_rac_get_uint(c, n)` — read n bits MSB-first
- `vpx_rac_get_prob_branchy(c, prob)` — read with given probability
- `read_prob_delta(c)` — the 4-tier VLC + inv_map_table lookup used to update one prob
~80 LOC, all stateless static functions. Implementation can be either inlined in `vp9.c` (Phase 2 B6 Option A — chosen) or split to `vp9_rac.h`. Phase 2 default = Option A; Phase 5 may flip to Option B if reuse pressure surfaces.
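For orientation, the decoder behind those helpers can be written in the bit-serial form the VP9 spec uses to describe it. The sketch below is that textbook formulation with hypothetical `sketch_bool_*` names; it is not FFmpeg's optimized `vpx_rac` code and is not meant to replace the planned port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_bool {
    const uint8_t *data;
    size_t size;      /* buffer length in bytes */
    size_t bitpos;    /* next unread bit */
    unsigned value;   /* BoolValue in spec terms */
    unsigned range;   /* BoolRange in spec terms */
};

/* Fetch one raw bit MSB-first; zero-fill once the buffer is exhausted. */
static unsigned sketch_next_bit(struct sketch_bool *d)
{
    unsigned bit = 0;
    if ((d->bitpos >> 3) < d->size)
        bit = (d->data[d->bitpos >> 3] >> (7 - (d->bitpos & 7))) & 1u;
    d->bitpos++;
    return bit;
}

/* Decode one boolean whose probability of being 0 is prob/256. */
static unsigned sketch_bool_read(struct sketch_bool *d, unsigned prob)
{
    unsigned split = 1 + (((d->range - 1) * prob) >> 8);
    unsigned bit;

    if (d->value < split) {
        d->range = split;
        bit = 0;
    } else {
        d->range -= split;
        d->value -= split;
        bit = 1;
    }
    /* Renormalize so that range stays in [128, 255]. */
    while (d->range < 128) {
        d->range <<= 1;
        d->value = (d->value << 1) | sketch_next_bit(d);
    }
    return bit;
}

/* Returns 0 on success, -1 if the buffer is empty or the marker bit is set. */
static int sketch_bool_init(struct sketch_bool *d,
                            const uint8_t *data, size_t size)
{
    unsigned i;

    if (size < 1)
        return -1;
    d->data = data;
    d->size = size;
    d->bitpos = 0;
    d->value = 0;
    d->range = 255;
    for (i = 0; i < 8; i++)
        d->value = (d->value << 1) | sketch_next_bit(d);
    return sketch_bool_read(d, 128) ? -1 : 0;
}

/* Read an n-bit literal MSB-first, each bit at probability 128. */
static unsigned sketch_bool_get_uint(struct sketch_bool *d, unsigned n)
{
    unsigned v = 0;
    while (n--)
        v = (v << 1) | sketch_bool_read(d, 128);
    return v;
}
```

A slow reference like this can be useful in Phase 5 as a cross-check: feeding the same compressed-header bytes to both it and the ported reader should produce identical symbol sequences.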
### Clause 9 — Compressed-header parser (`vp9_fill_compressed_hdr`)
Direct port of FFmpeg `v4l2_request_vp9.c:99-261::fill_compressed_hdr`. Reads from `surface_object->source_data + uncompressed_header_size` for `compressed_header_size` bytes. ~180 LOC.
Syntax elements parsed (per VP9 spec 6.3):
- `tx_mode` (2 bits, plus 1 conditional bit when the 2-bit value is `ALLOW_32X32`, which can raise it to `SELECT`)
- TX 8x8/16x16/32x32 probability deltas (only if `tx_mode == SELECT`)
- Coef probability deltas (4-level nested loop with branch probs)
- Skip / inter_mode / interp_filter / is_inter / comp_mode / single_ref / comp_ref / y_mode / partition probability deltas (only on inter frames)
- MV probability deltas (joint/sign/classes/class0_bit/bits/class0_fr/fr/class0_hp/hp)
Each updated value goes through `inv_map_table[d]`. Each "no update" bit leaves zero in the kernel struct (kernel interprets zero as "keep prior probability").
**Lossless special case**: if `s->s.h.lossless` would be set, FFmpeg writes `tx_mode = V4L2_VP9_TX_MODE_ONLY_4X4` unconditionally. We don't have direct access to `lossless` from VAAPI, but `picture->pic_fields.bits.lossless_flag` (bit 31 of pic_fields) maps directly. Read it and apply the same special case.
**Anchor**: Phase 3 strace shows COMPRESSED_HDR payload size 2040 B; kernel never EINVAL'd → port produces correctly-sized struct. Field-level decode of the keyframe payload is deferred to Phase 5/Phase 7 byte-compare (the parser is the primary reference for itself; cross-validation is via "kernel decodes the same hash both ways" not "we manually decode the parser output").
### Clause 10 — Frame flags + reference_mode + interpolation_filter
```c
if (!picture->pic_fields.bits.frame_type) /* VAAPI inverts: 0 means keyframe */
frame.flags |= V4L2_VP9_FRAME_FLAG_KEY_FRAME;
if (picture->pic_fields.bits.show_frame) frame.flags |= V4L2_VP9_FRAME_FLAG_SHOW_FRAME;
if (picture->pic_fields.bits.error_resilient_mode) frame.flags |= V4L2_VP9_FRAME_FLAG_ERROR_RESILIENT;
if (picture->pic_fields.bits.intra_only) frame.flags |= V4L2_VP9_FRAME_FLAG_INTRA_ONLY;
if (picture->pic_fields.bits.allow_high_precision_mv)
frame.flags |= V4L2_VP9_FRAME_FLAG_ALLOW_HIGH_PREC_MV;
if (picture->pic_fields.bits.refresh_frame_context)
frame.flags |= V4L2_VP9_FRAME_FLAG_REFRESH_FRAME_CTX;
if (picture->pic_fields.bits.frame_parallel_decoding_mode)
frame.flags |= V4L2_VP9_FRAME_FLAG_PARALLEL_DEC_MODE;
if (picture->pic_fields.bits.subsampling_x) frame.flags |= V4L2_VP9_FRAME_FLAG_X_SUBSAMPLING;
if (picture->pic_fields.bits.subsampling_y) frame.flags |= V4L2_VP9_FRAME_FLAG_Y_SUBSAMPLING;
/* COLOR_RANGE_FULL_SWING: VAAPI doesn't expose; leave clear (BT.709 limited for BBB). */
/* reset_frame_context: FFmpeg uses (resetctx > 0 ? resetctx - 1 : 0).
* VAAPI's pic_fields.bits.reset_frame_context is 2 bits (0..3).
* V4L2 enum is 0..2. In the VP9 spec, values 0 and 1 both mean
* "no reset", 2 resets the selected context and 3 resets all; the
* kernel enum collapses the two no-reset values (NONE=0, SPEC=1,
* ALL=2). Follow FFmpeg's mapping verbatim: */
frame.reset_frame_context =
picture->pic_fields.bits.reset_frame_context > 0
? picture->pic_fields.bits.reset_frame_context - 1
: 0;
/* interpolation_filter: FFmpeg uses (filtermode ^ (filtermode <= 1)).
* VAAPI's mcomp_filter_type is 3 bits (0..7); kernel enum is 0..4.
* The XOR remap aligns FFmpeg's internal filter_mode enum to V4L2's. */
frame.interpolation_filter =
picture->pic_fields.bits.mcomp_filter_type ^
(picture->pic_fields.bits.mcomp_filter_type <= 1);
/* reference_mode: comes from compressed-header parse (NOT VAAPI).
* Read from compressed_hdr's parsed state (see Clause 9). */
frame.reference_mode = compressed_hdr_reference_mode; /* state from Clause 9 */
```
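**Anchor**: Phase 3 verbatim keyframe has `reset_frame_context=0, interpolation_filter=0`. The XOR remap does not reproduce this from `mcomp_filter_type=0`: 0 ^ (0 <= 1) = 1. The anchor is matched only if VAAPI reports `mcomp_filter_type=1` for this frame, or if the VAAPI consumer has already applied the same remap before filling the field, in which case the backend should pass the value through unchanged. Phase 5 must verify the XOR remap empirically against the keyframe bytes.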
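Both Clause 10 remaps are small enough to tabulate exhaustively. The sketch below isolates them as pure functions (hypothetical `sketch_` names) so the full input range can be checked in a unit test before the empirical comparison against the keyframe bytes.

```c
#include <assert.h>

/* Clause 10 mapping for reset_frame_context: 0 and 1 both become 0,
 * then 2 -> 1 and 3 -> 2. */
static unsigned sketch_map_reset_frame_context(unsigned va_value)
{
    return va_value > 0 ? va_value - 1 : 0;
}

/* Clause 10 XOR remap: swaps values 0 and 1, leaves 2..4 untouched. */
static unsigned sketch_xor_remap_filter(unsigned filter)
{
    return filter ^ (unsigned)(filter <= 1);
}
```

The XOR remap is its own inverse, so applying it twice is the identity. That property is what makes a double application (once in the consumer, once in the backend) easy to miss.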
**Phase 5 review point**: the FFmpeg-inferred mappings for `reset_frame_context` and `interpolation_filter` are tied to *FFmpeg's* internal enum order, and VAAPI's enum order may differ. Phase 5 should empirically validate by decoding the `reset_frame_context` and `interpolation_filter` bytes of Phase 3's keyframe payload and cross-checking them with VAAPI's `pic_fields.bits` for the same frame. Locate the bytes with `offsetof` on `struct v4l2_ctrl_vp9_frame` rather than counting by hand; in the mainline 168-byte layout they sit at bytes 153 and 157. If the payload and VAAPI disagree, the FFmpeg-inferred remap is wrong.
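To find those payload bytes without counting by hand, the frame struct can be mirrored locally and queried with `offsetof`. The sketch below assumes the mainline `v4l2_ctrl_vp9_frame` layout (which also totals 168 bytes); the `mirror_` names are hypothetical, and on the real build host `offsetof` should be applied to the kernel struct directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mirror_vp9_loop_filter {
    int8_t ref_deltas[4];
    int8_t mode_deltas[2];
    uint8_t level, sharpness, flags;
    uint8_t reserved[7];
};

struct mirror_vp9_quantization {
    uint8_t base_q_idx;
    int8_t delta_q_y_dc, delta_q_uv_dc, delta_q_uv_ac;
    uint8_t reserved[4];
};

struct mirror_vp9_segmentation {
    int16_t feature_data[8][4];
    uint8_t feature_enabled[8];
    uint8_t tree_probs[7];
    uint8_t pred_probs[3];
    uint8_t flags;
    uint8_t reserved[5];
};

/* Assumption: field order copied from the mainline UAPI header. */
struct mirror_vp9_frame {
    struct mirror_vp9_loop_filter lf;
    struct mirror_vp9_quantization quant;
    struct mirror_vp9_segmentation seg;
    uint32_t flags;
    uint16_t compressed_header_size;
    uint16_t uncompressed_header_size;
    uint16_t frame_width_minus_1;
    uint16_t frame_height_minus_1;
    uint16_t render_width_minus_1;
    uint16_t render_height_minus_1;
    uint64_t last_frame_ts;
    uint64_t golden_frame_ts;
    uint64_t alt_frame_ts;
    uint8_t ref_frame_sign_bias;
    uint8_t reset_frame_context;
    uint8_t frame_context_idx;
    uint8_t profile;
    uint8_t bit_depth;
    uint8_t interpolation_filter;
    uint8_t tile_cols_log2;
    uint8_t tile_rows_log2;
    uint8_t reference_mode;
    uint8_t reserved[7];
};

_Static_assert(sizeof(struct mirror_vp9_frame) == 168,
               "mirror frame size mismatch");

/* Byte offsets of the two Clause 10 fields inside the control payload. */
static size_t mirror_off_reset_frame_context(void)
{
    return offsetof(struct mirror_vp9_frame, reset_frame_context);
}

static size_t mirror_off_interpolation_filter(void)
{
    return offsetof(struct mirror_vp9_frame, interpolation_filter);
}
```

If the 7.0 headers report different offsets from this mirror while still totalling 168 bytes, that would itself be a finding worth recording in Phase 5.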
### Clause 11 — Final 2-control batched submission
```c
struct v4l2_ext_control ctrls[2] = {
{ .id = V4L2_CID_STATELESS_VP9_FRAME,
.ptr = &frame, .size = sizeof frame },
{ .id = V4L2_CID_STATELESS_VP9_COMPRESSED_HDR,
.ptr = &compressed_hdr, .size = sizeof compressed_hdr },
};
rc = v4l2_set_controls(driver_data->video_fd,
surface_object->request_fd,
ctrls, 2);
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
return 0;
```
Mirrors iter3's Clause 10 with count=2 instead of count=1.
### Clause 12 — Bitstream offsetting
Backend hands the kernel the FULL frame bitstream via `surface_object->source_data` + `surface_object->source_size`. The kernel uses `frame.uncompressed_header_size` (filled from `picture->frame_header_length_in_bytes` in Clause 4) as the start-of-compressed-header offset. The compressed header parser (Clause 9) reads `[uncompressed_header_size, uncompressed_header_size + compressed_header_size)` from the bitstream buffer.
```c
const uint8_t *compressed_hdr_start =
surface_object->source_data + frame.uncompressed_header_size;
uint32_t compressed_hdr_len = frame.compressed_header_size;
vp9_fill_compressed_hdr(&compressed_hdr,
compressed_hdr_start,
compressed_hdr_len);
/* Same buffer pointer used by Clause 6 for uncompressed-header parse,
* but with offset 0 + length = uncompressed_header_size. */
vp9_parse_uncompressed_header_lf_quant(
surface_object->source_data, surface_object->source_size,
frame.uncompressed_header_size,
&frame.lf, &frame.quant);
```
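Both parsers index into `source_data` using sizes supplied by the VAAPI consumer, so a range check before parsing is cheap insurance. The helper below is a hypothetical sketch, not part of the locked plan; the name `sketch_vp9_header_ranges_ok` and its placement are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 when both header ranges lie inside the source buffer:
 *   uncompressed: [0, unc_size)
 *   compressed:   [unc_size, unc_size + comp_size)
 * The subtraction form avoids overflow in unc_size + comp_size. */
static int sketch_vp9_header_ranges_ok(size_t source_size,
                                       uint32_t unc_size,
                                       uint32_t comp_size)
{
    if (unc_size == 0 || unc_size > source_size)
        return 0;
    if (comp_size > source_size - unc_size)
        return 0;
    return 1;
}
```

A failed check could map to `VA_STATUS_ERROR_INVALID_PARAMETER` in `vp9_set_controls`, though which status code fits best is a reviewer call.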
## Patch shape (commits)
iter4 is implemented as 5 commits (mitigation B + iter3-style ABCD):
### Commit Z — `src/request.c`: device-path enumeration (mitigation B)
Replace hardcoded `/dev/video0` + `/dev/media0` defaults with walk-and-pick-first-known-decoder:
```c
static int find_codec_device(char video_path[32], char media_path[32])
{
static const char * const known_drivers[] = {
"rkvdec", "hantro-vpu", "cedrus", "sun4i_csi", NULL
};
char path[32];
struct v4l2_capability caps;
int fd, i;
const char * const *kd;
/* Walk /dev/video0..15 */
for (i = 0; i < 16; i++) {
snprintf(path, sizeof path, "/dev/video%d", i);
fd = open(path, O_RDWR | O_NONBLOCK);
if (fd < 0) continue;
if (ioctl(fd, VIDIOC_QUERYCAP, &caps) == 0) {
for (kd = known_drivers; *kd; kd++) {
if (strcmp((char *)caps.driver, *kd) == 0) {
strncpy(video_path, path, 32);
/* Match media device by driver name */
find_media_for_driver((char *)caps.driver, media_path);
close(fd);
return 0;
}
}
}
close(fd);
}
return -1;
}
/* In RequestInit: */
video_path = getenv("LIBVA_V4L2_REQUEST_VIDEO_PATH");
if (video_path == NULL) {
static char auto_video[32], auto_media[32];
if (find_codec_device(auto_video, auto_media) == 0) {
video_path = auto_video;
if (getenv("LIBVA_V4L2_REQUEST_MEDIA_PATH") == NULL)
media_path = auto_media;
request_log("auto-selected codec device: %s + %s\n",
video_path, media_path);
} else {
video_path = "/dev/video0"; /* keep old fallback for callers
we can't enumerate */
}
}
```
`find_media_for_driver` walks `/dev/media0..15`, opens each, calls `MEDIA_IOC_DEVICE_INFO`, returns the path whose `driver` field matches. Phase 3 baseline confirmed `media0 ↔ rkvdec` and `media1 ↔ hantro-vpu` on 7.0.
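The media-device half of the lookup could look like the sketch below. The `sketch_` prefix, the `int` return value, and the 32-byte output buffer are assumptions made for illustration; the plan's actual helper may differ. It relies only on `MEDIA_IOC_DEVICE_INFO` and the `driver` field of `struct media_device_info` from `linux/media.h`.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/media.h>

/* Walk /dev/media0..15 and copy the first path whose driver name
 * matches into media_path. Returns 0 on a match and -1 otherwise;
 * media_path is left untouched when nothing matches. */
static int sketch_find_media_for_driver(const char *driver,
                                        char media_path[32])
{
    struct media_device_info info;
    char path[32];
    int fd, i;

    for (i = 0; i < 16; i++) {
        snprintf(path, sizeof path, "/dev/media%d", i);
        fd = open(path, O_RDWR | O_NONBLOCK);
        if (fd < 0)
            continue;
        memset(&info, 0, sizeof info);
        if (ioctl(fd, MEDIA_IOC_DEVICE_INFO, &info) == 0 &&
            strncmp(info.driver, driver, sizeof info.driver) == 0) {
            snprintf(media_path, 32, "%s", path);
            close(fd);
            return 0;
        }
        close(fd);
    }
    return -1;
}
```

Returning a status lets `find_codec_device` skip a video node whose media device cannot be found, instead of reporting success with an empty media path.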
Predicted +35 LOC, 1 file modified. Build target after Commit Z: `vainfo` (no env override) lists the auto-selected decoder's profiles. Independent of VP9 work — can be tested + merged before Commit A.
**End-user UX gap (documented, NOT fixed in iter4)**: backend opens ONE codec device at init. If user wants the OTHER decoder (e.g., default selects rkvdec but user wants hantro for MPEG-2/VP8), they still need env override. Aggregating BOTH decoders simultaneously requires a deeper refactor (multi-fd dispatch); out of iter4 scope, cross-cutting backlog item iter4-B1.
### Commit A — `src/config.c`: VP9 enumeration + dispatch + entrypoints
3 sites mirroring iter3 commit A:
1. `RequestQueryConfigProfiles` (after VP8 enumeration block from iter3): add VP9 enumeration block probing `V4L2_PIX_FMT_VP9_FRAME` against single + MPLANE OUTPUT formats. Adds `VAProfileVP9Profile0`. ~10 LOC.
2. `RequestCreateConfig` (after VP8 case from iter3): add `case VAProfileVP9Profile0: break;` with comment block. ~5 LOC.
3. `RequestQueryConfigEntrypoints` (line ~180): add `case VAProfileVP9Profile0:` to existing fall-through. ~1 LOC.
Predicted +16 LOC, 1 file modified. Build target after Commit A: `vainfo` (with env override or post-commit-Z auto-detect) lists `VAProfileVP9Profile0` on rkvdec env.
### Commit B — NEW `src/vp9.c` + `src/vp9.h` + `src/meson.build` integration
Net-new `vp9.c` implements `vp9_set_controls()` per Clauses 1-12 above.
Predicted ~580 LOC for `vp9.c` (50 LOC infrastructure + 80 LOC VPX rac + 50 LOC uncompressed-header partial parse + 180 LOC compressed-header parser + ~200 LOC frame-fill (Clauses 4-5, 7, 10), including submission/wrap). ~40 LOC for `vp9.h`. +2 lines `meson.build`.
3 files (2 new + 1 modified). Build target after Commit B: vp9.o compiles standalone, picture.c can't dispatch yet.
### Commit C — `src/picture.c` + `src/surface.h`: dispatcher + buffer routing + union extension
5 sites:
1. `picture.c:34-37` include block: add `#include "vp9.h"`.
2. `picture.c::codec_set_controls`: add VP9 dispatch case calling `vp9_set_controls`. ~6 LOC.
3. `picture.c::codec_store_buffer`: add VP9 inner cases for `VAPictureParameterBufferType` and `VASliceParameterBufferType`. ~14 LOC. (NO `VAProbabilityBufferType` for VP9; NO `VAIQMatrixBufferType`. Confirmed in Phase 2 B8.)
4. `picture.c::RequestBeginPicture`: NO change predicted (VP9 doesn't have iter3-style `iqmatrix_set` flag — Picture/Slice always populated per frame by VAAPI consumer). Phase 2 B9 confirms.
5. `surface.h::object_surface::params` union: add `vp9` struct after `vp8`:
```c
struct {
VADecPictureParameterBufferVP9 picture;
VASliceParameterBufferVP9 slice;
} vp9;
```
Predicted +26 LOC, 2 files modified. Build target after Commit C: backend builds clean; mpv-vaapi VP9 decode should engage end-to-end on rkvdec.
### Commit D — fix-forward placeholder
Phase 2 B12 predicted no `buffer.c` changes (VP9's 3 buffer types — Picture, Slice, Data — already in iter3's allow-list). Per memory `feedback_runtime_enumerates_allowlists.md`, plan for fix-forward if Commit C runtime hits an allow-list miss; otherwise this commit slot stays empty.
## Files touched summary
| File | New | Modified | LOC delta | Commit |
|---|:-:|:-:|:-:|:-:|
| `src/request.c` | | ✓ | +35 | Z |
| `src/config.c` | | ✓ | +16 | A |
| `src/vp9.c` | ✓ | | +580 | B |
| `src/vp9.h` | ✓ | | +40 | B |
| `src/meson.build` | | ✓ | +2 | B |
| `src/picture.c` | | ✓ | +20 | C |
| `src/surface.h` | | ✓ | +6 | C |
**Total**: ~699 LOC, 7 files (2 new + 5 modified). 4 commits (Z, A, B, C) + optional D. Notably bigger than iter3 (308 LOC) because of: device-path mitigation (35) + uncompressed-header partial parse (50) + compressed-header parser (180) + VPX rac (80).
## Cross-cutting backlog (out of iter4 scope)
Items inherited + NEW from iter4:
- **iter4-B1** (NEW) Backend opens ONE codec device at init (rkvdec OR hantro). Aggregating both for unified profile enumeration requires multi-fd dispatch refactor. Defer.
- **iter4-B2** (NEW) ffmpeg-vaapi / mpv-vaapi `Could not create device` failure mode persists even with env override. Likely a vaapi-DRM render-node path issue separate from device-path. Investigate in Phase 6 if HW=SW byte-compare fails.
- **iter4-Q6** (NEW) VAAPI per-segment `seg_param[s]` fields are EFFECTIVE quant scales; kernel wants ALT_Q absolute or delta. Mapping back is non-trivial; left zeros for BBB (segmentation disabled). Document as fidelity gap for non-BBB fixtures.
- **iter4-COLOR_RANGE** (NEW) VAAPI doesn't expose color_range; backend leaves `V4L2_VP9_FRAME_FLAG_COLOR_RANGE_FULL_SWING` clear (BT.709 limited). Wrong for full-range JPEG-encoded VP9.
- **B5/B6** mpeg2 vbv polish + h265 SPS bitstream parse (carried from iter1+iter2).
- **L3** vaDeriveImage cache-stale on RK3399 — workaround: DMA-BUF GL only.
## Phase 5 review prep
Submitting this plan for second-model review (sonnet-architect). Key questions for the reviewer (per memory `feedback_review_empirical_over_theoretical.md` Direction 2 — empirical-over-theoretical in BOTH directions):
1. **Uncompressed-header parser correctness (Clause 6)**: empirically decode the first ~200 bytes of `bbb_720p10s_vp9.webm` keyframe and confirm `lf.ref_deltas={1,0,-1,-1}, lf.mode_deltas={0,0}, lf.flags=3, quant.base_q_idx=46` are the *correct* parse results — not just the kernel-direct's pre-formatted output. If the spec says the bits encode something different, the parser is wrong even if kernel-direct happens to match.
2. **`reset_frame_context` and `interpolation_filter` remap (Clause 10)**: empirically extract these bytes from Phase 3 strace payload and cross-check FFmpeg's XOR/-1 remap against the bytes' literal interpretation as VP9 spec enums.
3. **Compile-time size assertions (Clause 3)**: are 168/2040 stable across kernel UAPI versions, or will a 7.1+ kernel grow them again? If unstable, replace the asserts with a runtime check that compares the `elem_size` returned by `VIDIOC_QUERY_EXT_CTRL` for each control against the `sizeof` the backend was built with. Phase 5 reviewer call.
4. **Per-segment mapping (Clause 7)**: BBB doesn't exercise segmentation. For non-BBB segmentation-enabled fixtures (out of iter4 scope), is the planned `seg_param[s].luma_ac_quant_scale` → `feature_data[s][ALT_Q]` mapping fundamentally wrong (effective scale vs delta), or just lossy? Document the gap clearly.
5. **Test compile field availability**: per Direction 2, every VAAPI field-name reference in this plan should be `gcc -c` test-compiled before Phase 6. Reviewer should verify the access list in Clauses 4, 5, 7, 10.
6. **Mitigation B regression risk**: `request.c` is shared with all 5 already-shipping codecs. Could the walk-and-pick-first logic regress any existing test fixture if env vars happen to be unset by accident? Phase 5 should suggest a safety knob (e.g., `LIBVA_V4L2_REQUEST_NO_AUTODETECT=1` to force old `/dev/video0` behavior).
7. **Lossless flag mapping (Clause 9)**: VAAPI's `pic_fields.bits.lossless_flag` — is it set the same way as FFmpeg's `s->s.h.lossless`? VAAPI comment says "LosslessFlag = base_qindex == 0 && y_dc_delta_q == 0 && uv_dc_delta_q == 0 && uv_ac_delta_q == 0" — check that semantics align.
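The safety knob floated in question 6 could be as small as the sketch below. The variable name `LIBVA_V4L2_REQUEST_NO_AUTODETECT` comes from that question; the helper name and the choice to treat only the exact value `1` as "disabled" are assumptions for Phase 5 to confirm or change.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 when the user has asked for the pre-iter4 behavior
 * (hardcoded /dev/video0), 0 when auto-detection may run. */
static int sketch_autodetect_disabled(void)
{
    const char *v = getenv("LIBVA_V4L2_REQUEST_NO_AUTODETECT");

    return v != NULL && strcmp(v, "1") == 0;
}
```

In `RequestInit` this would guard the `find_codec_device` call, so an unset or unrecognized value keeps the new auto-detect path and only an explicit `1` restores the old default.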
## Phase 1 criteria → Phase 4 plan trace
| Criterion | Plan addresses |
|---|---|
| 1. vainfo enumerates VP9Profile0 | Commit Z (device-path) + Commit A (`RequestQueryConfigProfiles` enumeration block) |
| 2. vaCreateConfig SUCCESS | Commit A — `RequestCreateConfig` case + `RequestQueryConfigEntrypoints` |
| 3. ffmpeg-vaapi VP9 exit 0 | Commits Z+A+B+C end-to-end; Clauses 1+4+5+11 + parsers |
| 4. mpv VP9 HW=SW byte-identical | Commits Z+A+B+C decode correctness + Phase 3 SW PNGs as Phase 7 anchor; engagement via `mpv -v` log per memory `feedback_hw_decode_engagement_check.md` |
| 5. 4-codec regression | Commit Z restores baseline (mitigation B); Commits A+B+C add new VP9 path purely additively (no shared-state mutation) |
## Substrate state at Phase 4 close
- Phase 0+1+2+3 commits at gitea (`9a71dbf`, `2651e4c`+`56abe3d` ID-correction, `56abe3d`).
- Fork at iter3 tip `e1aca9c` on noether; Phase 6 patches will land here.
- All Phase 3 anchors captured + preserved on fresnel `/tmp/iter4_phase3/` and `noether:~/src/fresnel-fourier/iter4_phase3.tgz`.
- Memory rules carry forward; new `reference_fresnel_kernel_substrate.md` covers post-besser substrate.
- Phase 4 plan ready for sonnet-architect review (Phase 5).