From 4b36077b171d7ab7f5015911caf79c7161e7794d Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Sat, 9 May 2026 23:10:47 +0000
Subject: [PATCH] iter4 Phase 4: plan locks 12 contract clauses + Mitigation B
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

5-commit plan (Z, A, B, C, optional D):
- Commit Z: src/request.c — walk /dev/video* + /dev/media*, match by
  driver name in {rkvdec, hantro-vpu, cedrus, sun4i_csi}; restores
  baseline functionality on 7.0 (where /dev/video0 is rockchip-rga).
- Commit A: src/config.c — VAProfileVP9Profile0 enumeration + dispatch
  + entrypoints (~16 LOC, 1 file).
- Commit B: NEW src/vp9.c + .h + meson — 12 contract clauses; ~580 LOC
  vp9.c (50 infra + 80 VPX rac + 50 uncompressed-header partial parse
  + 180 compressed-header parser + ~200 frame-fill).
- Commit C: src/picture.c + surface.h — VP9 dispatch + 2 buffer-type
  cases + union extension; NO BeginPicture reset (VP9 has no
  iqmatrix_set-style flags).
- Commit D: optional fix-forward placeholder (predicted no-op per
  feedback_runtime_enumerates_allowlists.md).

Total ~699 LOC, 7 files.

12 contract clauses include 2 NEW vs iter3:
- Clause 3: compile-time _Static_assert sizeof v4l2_ctrl_vp9_frame ==
  168 && ..._compressed_hdr == 2040 (any UAPI shift fails loudly).
- Clause 6: uncompressed-header partial parse for lf_delta_* +
  base_q_idx (VAAPI doesn't expose; BBB keyframe needs non-zero
  ref_deltas={1,0,-1,-1} per Phase 3 anchor).

7 Phase 5 review questions queued, all empirical-leaning per
feedback_review_empirical_over_theoretical.md Direction 2:
parser-vs-bitstream cross-check, FFmpeg-XOR-remap validation,
struct-size stability, mitigation B regression risk.
Co-Authored-By: Claude Opus 4.7 (1M context)
---
 phase4_iter4_plan.md | 495 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 495 insertions(+)
 create mode 100644 phase4_iter4_plan.md

diff --git a/phase4_iter4_plan.md b/phase4_iter4_plan.md
new file mode 100644
index 0000000..1170f13
--- /dev/null
+++ b/phase4_iter4_plan.md
+# Iteration 4 — Phase 4 (plan)
+
+Locks the iter4 patch shape against verbatim Phase 3 baseline (`phase3_iter4_baseline.md`, commit `56abe3d`) and the kernel UAPI + VAAPI + FFmpeg references read in Phase 2 (`phase2_iter4_situation.md`, commit `2651e4c` + ID-correction in `56abe3d`). Plan structure mirrors iter2/iter3 clause template, expanded for VP9-specific scope (compressed-header parser + uncompressed-header partial parse + device-path mitigation).
+
+Phase 3 baseline provides:
+
+- Empirical struct sizes 168 B / 2040 B (NOT Phase 2's 144 / 1947 estimates)
+- Correct control IDs `0xa40a2c` / `0xa40a2d`
+- Frame-1 keyframe verbatim payload prefix (`lf.ref_deltas={1,0,-1,-1}`, `lf.mode_deltas={0,0}`, `quant.base_q_idx=46`)
+- 4-codec regression root-cause: `/dev/video0` is now `rockchip-rga` on 7.0; backend hardcodes `/dev/video0` in `request.c:149`
+
+User picked **Mitigation B**: in-fork patch — walk `/dev/video*`, query `VIDIOC_QUERYCAP`, pick first device whose driver name is in `{rkvdec, hantro-vpu}`. Adds ~30 LOC to `request.c`. Restores baseline functionality on 7.0.
+
+## Contract clauses
+
+### Clause 1 — Submission shape (per-frame)
+
+ONE batched `VIDIOC_S_EXT_CTRLS` per frame, bound to the surface's permanent `request_fd`. **TWO controls** (vs iter3's one): `V4L2_CID_STATELESS_VP9_FRAME` + `V4L2_CID_STATELESS_VP9_COMPRESSED_HDR`. `rkvdec-vp9.c::rkvdec_vp9_run_preamble:752` `WARN_ON(!ctrl); return -EINVAL` if COMPRESSED_HDR absent — hard requirement on RK3399.
+
+```c
+struct v4l2_ext_control ctrls[2] = {
+    { .id   = V4L2_CID_STATELESS_VP9_FRAME,           /* 0xa40a2c */
+      .ptr  = &frame,
+      .size = sizeof frame                            /* MUST be 168 */ },
+    { .id   = V4L2_CID_STATELESS_VP9_COMPRESSED_HDR,  /* 0xa40a2d */
+      .ptr  = &compressed_hdr,
+      .size = sizeof compressed_hdr                   /* MUST be 2040 */ },
+};
+
+rc = v4l2_set_controls(driver_data->video_fd,
+                       surface_object->request_fd,
+                       ctrls, 2);
+```
+
+`v4l2_set_controls` wraps with `which=V4L2_CTRL_WHICH_REQUEST_VAL`. Phase 3 strace verifies `ctrl_class=0xf010000` is what the kernel sees, matching iter1+iter2+iter3.
+
+**No init-time probe**: ffmpeg-v4l2request first runs a count=1 probe (FRAME-only) to check kernel CID support, then count=2. iter4 backend skips the probe — VP9 on rkvdec REQUIRES COMPRESSED_HDR per kernel source; if the kernel doesn't have it, decode would fail anyway. Issuing count=2 unconditionally is correct.
+
+**Anchor**: Phase 3 baseline § Anchor 2 verbatim ioctl trace.
+
+### Clause 2 — Local struct allocation + zero-init
+
+```c
+int vp9_set_controls(struct request_data *driver_data,
+                     struct object_context *context,
+                     struct object_surface *surface_object)
+{
+    VADecPictureParameterBufferVP9 *picture =
+        &surface_object->params.vp9.picture;
+    VASliceParameterBufferVP9 *slice =
+        &surface_object->params.vp9.slice;
+
+    struct v4l2_ctrl_vp9_frame frame;
+    struct v4l2_ctrl_vp9_compressed_hdr compressed_hdr;
+
+    memset(&frame, 0, sizeof frame);
+    memset(&compressed_hdr, 0, sizeof compressed_hdr);
+    /* Zero is the kernel's "no probability update" default for every
+     * field in compressed_hdr, and the safe default for every numeric
+     * field of frame except reference_mode (set explicitly later). */
+    ...
+}
+```
+
+VAAPI doesn't have iter3-style `set` flags for VP9; both Picture and Slice are unconditionally populated by the consumer per frame (per Phase 2 analysis B9, no per-frame reset needed in `RequestBeginPicture`).
+
+### Clause 3 — Compile-time struct-size assertions
+
+Per Phase 3 finding: kernel UAPI struct sizes are **168 B (FRAME)** and **2040 B (COMPRESSED_HDR)** on 7.0; iter4's `Build Date: post-2026-05-09` will use whatever size the build host's UAPI headers report. Adding compile-time asserts at the top of `vp9.c` makes any future struct-size drift fail loudly instead of silently corrupting kernel control writes:
+
+```c
+_Static_assert(sizeof(struct v4l2_ctrl_vp9_frame) == 168,
+               "v4l2_ctrl_vp9_frame size mismatch — UAPI changed");
+_Static_assert(sizeof(struct v4l2_ctrl_vp9_compressed_hdr) == 2040,
+               "v4l2_ctrl_vp9_compressed_hdr size mismatch — UAPI changed");
+```
+
+If these fire, treat as a kernel-substrate bump (re-baseline Phase 3) — DO NOT just bump the asserts.
+
+### Clause 4 — Frame geometry + per-frame scalars
+
+```c
+frame.frame_width_minus_1   = picture->frame_width - 1;
+frame.frame_height_minus_1  = picture->frame_height - 1;
+/* VAAPI gap: render size is not exposed; reuse the frame size
+ * (no scaling for BBB). Each comment is closed on its own line so
+ * that neither assignment is swallowed by an open comment. */
+frame.render_width_minus_1  = picture->frame_width - 1;
+frame.render_height_minus_1 = picture->frame_height - 1;
+
+frame.profile           = picture->profile;
+frame.bit_depth         = picture->bit_depth;
+frame.tile_cols_log2    = picture->log2_tile_columns;
+frame.tile_rows_log2    = picture->log2_tile_rows;
+frame.frame_context_idx = picture->pic_fields.bits.frame_context_idx;
+
+frame.lf.level     = picture->filter_level;
+frame.lf.sharpness = picture->sharpness_level;
+
+frame.uncompressed_header_size = picture->frame_header_length_in_bytes;
+frame.compressed_header_size   = picture->first_partition_size;
+```
+
+VAAPI fields verified via test-compile against `va_dec_vp9.h:58-192` (per memory `feedback_review_empirical_over_theoretical.md` Direction 2 — Phase 5 will re-verify the field-name access list via `gcc -c` test compile).
+
+Phase 3 keyframe anchor: `width=1280, height=720, profile=0, bit_depth=8, tile_log2={0,0}, level=3, sharpness=0` — direct match.
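The size assertions in Clause 3 can be paired with `offsetof` checks, which pin individual field positions as well as the total size. The sketch below shows the pattern on a made-up `struct mock_ctrl`. It is not the real `v4l2_ctrl_vp9_frame` layout; every name and number in it is illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a UAPI control struct (NOT the real layout). */
struct mock_ctrl {
	uint8_t  flags;
	uint8_t  reserved[3];
	uint32_t payload_size;
	uint64_t timestamp_ns;
};

/* Fail the build if the total size or any field position drifts. */
_Static_assert(sizeof(struct mock_ctrl) == 16, "mock_ctrl size changed");
_Static_assert(offsetof(struct mock_ctrl, payload_size) == 4,
	       "payload_size moved");
_Static_assert(offsetof(struct mock_ctrl, timestamp_ns) == 8,
	       "timestamp_ns moved");

/* Runtime view of the same facts, e.g. for logging next to a payload dump.
 * Unknown field ids return the struct size. */
static size_t mock_ctrl_field_offset(int field)
{
	switch (field) {
	case 0:  return offsetof(struct mock_ctrl, flags);
	case 1:  return offsetof(struct mock_ctrl, payload_size);
	case 2:  return offsetof(struct mock_ctrl, timestamp_ns);
	default: return sizeof(struct mock_ctrl);
	}
}
```

If the same idea were applied to the real structs, the expected offsets would have to come from the Phase 3 payload capture rather than from estimates.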
+
+### Clause 5 — DPB timestamp resolution (3 active references from 8-slot DPB)
+
+VAAPI's `picture->reference_frames[0..7]` is the full 8-entry DPB. The 3 active references for the current frame are indexed by `last_ref_frame`/`golden_ref_frame`/`alt_ref_frame` (each 3-bit, points into the 8-slot array).
+
+```c
+VASurfaceID last_id   = picture->reference_frames[picture->pic_fields.bits.last_ref_frame];
+VASurfaceID golden_id = picture->reference_frames[picture->pic_fields.bits.golden_ref_frame];
+VASurfaceID alt_id    = picture->reference_frames[picture->pic_fields.bits.alt_ref_frame];
+
+struct object_surface *last_ref   = (last_id   != VA_INVALID_SURFACE) ? SURFACE(driver_data, last_id)   : NULL;
+struct object_surface *golden_ref = (golden_id != VA_INVALID_SURFACE) ? SURFACE(driver_data, golden_id) : NULL;
+struct object_surface *alt_ref    = (alt_id    != VA_INVALID_SURFACE) ? SURFACE(driver_data, alt_id)    : NULL;
+
+if (last_ref)   frame.last_frame_ts   = v4l2_timeval_to_ns(&last_ref->timestamp);
+if (golden_ref) frame.golden_frame_ts = v4l2_timeval_to_ns(&golden_ref->timestamp);
+if (alt_ref)    frame.alt_frame_ts    = v4l2_timeval_to_ns(&alt_ref->timestamp);
+```
+
+Mirrors iter1/iter3 ref-resolution pattern. For keyframes (all refs invalid), timestamps stay 0 from memset.
+
+Sign bias:
+
+```c
+if (picture->pic_fields.bits.last_ref_frame_sign_bias)   frame.ref_frame_sign_bias |= V4L2_VP9_SIGN_BIAS_LAST;
+if (picture->pic_fields.bits.golden_ref_frame_sign_bias) frame.ref_frame_sign_bias |= V4L2_VP9_SIGN_BIAS_GOLDEN;
+if (picture->pic_fields.bits.alt_ref_frame_sign_bias)    frame.ref_frame_sign_bias |= V4L2_VP9_SIGN_BIAS_ALT;
+```
+
+### Clause 6 — Loop filter deltas + base quantization (uncompressed-header partial parse)
+
+VAAPI exposes `filter_level` and `sharpness_level` (Clause 4), but NOT `lf_delta_enabled`/`lf_delta_update`/`lf_ref_delta[4]`/`lf_mode_delta[2]`. Phase 3 keyframe anchor shows `lf.ref_deltas={1,0,-1,-1}` (non-zero on BBB); leaving these zero produces wrong loop-filter behavior → criterion-4 byte mismatch.
+
+VAAPI also doesn't expose `quant.base_q_idx` / `delta_q_y_dc` / `delta_q_uv_dc` / `delta_q_uv_ac`. Phase 3 keyframe anchor shows `base_q_idx=46`; leaving zero produces wrong dequant scale.
+
+Solution: implement a minimal uncompressed-header parser (`vp9_parse_uncompressed_header_lf_quant`) that reads `surface_object->source_data` from offset 0 and extracts the loop-filter delta and quantization fields listed in the comment below. The parse runs from offset 0 through the loop-filter and quantization syntax sections (per VP9 spec 6.2 §6.2.4–6.2.5):
+
+```c
+static void vp9_parse_uncompressed_header_lf_quant(
+    const uint8_t *data, uint32_t size, uint32_t header_size,
+    struct v4l2_vp9_loop_filter *lf,
+    struct v4l2_vp9_quantization *quant)
+{
+    /* Bit reader walks frame_marker, profile, show_existing_frame,
+     * frame_type, show_frame, error_resilient_mode, color_config (if
+     * keyframe), frame_size_with_refs (if not keyframe), ... up to the
+     * loop_filter_params + quantization_params syntax sections. These
+     * come before segmentation_params and tile_info, so the walk stops
+     * without touching either.
+     *
+     * Approach: bit-perfect VP9 spec port for ~50 LOC, using a plain
+     * MSB-first bit reader. The uncompressed header is not range-coded,
+     * so the Clause 8 range coder does not apply here. Fields written:
+     *   lf->ref_deltas[0..3], lf->mode_deltas[0..1],
+     *   lf->flags |= V4L2_VP9_LOOP_FILTER_FLAG_DELTA_ENABLED if set
+     *   lf->flags |= V4L2_VP9_LOOP_FILTER_FLAG_DELTA_UPDATE  if set
+     *   quant->base_q_idx,
+     *   quant->delta_q_y_dc, delta_q_uv_dc, delta_q_uv_ac
+     */
+    ...
+}
+```
+
+**Anchor**: Phase 3 keyframe `lf.ref_deltas={1,0,-1,-1}, lf.mode_deltas={0,0}, lf.flags=3 (DELTA_ENABLED|DELTA_UPDATE), quant.base_q_idx=46, deltas=0`. Implementation must reproduce these exact values byte-for-byte against the BBB keyframe.
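As a rough illustration of the two syntax sections Clause 6 targets, here is a minimal sketch of `loop_filter_params()` followed by `quantization_params()` as laid out in the VP9 bitstream specification. It assumes the reader is already positioned on the first loop-filter bit (the real parser would first have to step over everything before it), and it seeds the deltas with the spec's post-reset defaults `{1,0,-1,-1}` / `{0,0}`, as for a keyframe. `struct bit_reader`, `struct lf_quant` and the function names are hypothetical, not backend API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* MSB-first bit reader over a byte buffer; reads past the end return 0. */
struct bit_reader {
	const uint8_t *data;
	size_t size_bits;
	size_t pos;
};

static unsigned read_bits(struct bit_reader *br, unsigned n)
{
	unsigned v = 0;

	while (n--) {
		unsigned bit = 0;

		if (br->pos < br->size_bits)
			bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
		br->pos++;
		v = (v << 1) | bit;
	}
	return v;
}

/* su(n): n-bit magnitude followed by one sign bit. */
static int read_signed(struct bit_reader *br, unsigned n)
{
	int mag = (int)read_bits(br, n);

	return read_bits(br, 1) ? -mag : mag;
}

/* read_delta_q(): one "coded" bit, then su(4) if it is set. */
static int read_delta_q(struct bit_reader *br)
{
	return read_bits(br, 1) ? read_signed(br, 4) : 0;
}

/* Hypothetical result struct; the real targets are frame.lf / frame.quant. */
struct lf_quant {
	uint8_t level, sharpness, delta_enabled, delta_update;
	int8_t  ref_deltas[4], mode_deltas[2];
	uint8_t base_q_idx;
	int8_t  delta_q_y_dc, delta_q_uv_dc, delta_q_uv_ac;
};

static void parse_lf_quant(struct bit_reader *br, struct lf_quant *out)
{
	static const int8_t default_ref[4] = { 1, 0, -1, -1 };
	int i;

	/* Keyframe assumption: start from the spec defaults. Inter frames
	 * would instead carry over the previous frame's deltas. */
	for (i = 0; i < 4; i++)
		out->ref_deltas[i] = default_ref[i];
	out->mode_deltas[0] = out->mode_deltas[1] = 0;

	out->level = (uint8_t)read_bits(br, 6);
	out->sharpness = (uint8_t)read_bits(br, 3);
	out->delta_enabled = (uint8_t)read_bits(br, 1);
	out->delta_update = 0;
	if (out->delta_enabled) {
		out->delta_update = (uint8_t)read_bits(br, 1);
		if (out->delta_update) {
			for (i = 0; i < 4; i++)
				if (read_bits(br, 1))
					out->ref_deltas[i] = (int8_t)read_signed(br, 6);
			for (i = 0; i < 2; i++)
				if (read_bits(br, 1))
					out->mode_deltas[i] = (int8_t)read_signed(br, 6);
		}
	}

	out->base_q_idx = (uint8_t)read_bits(br, 8);
	out->delta_q_y_dc = (int8_t)read_delta_q(br);
	out->delta_q_uv_dc = (int8_t)read_delta_q(br);
	out->delta_q_uv_ac = (int8_t)read_delta_q(br);
}
```

One consequence worth checking in Phase 5: because `{1,0,-1,-1}` is also the default state, a keyframe can report those values without coding any per-delta update, so the anchor alone does not prove the update path was exercised.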
+
+**Per memory `feedback_review_empirical_over_theoretical.md` Direction 2**: Phase 5 review must verify the parser by extracting these fields from the actual BBB keyframe bitstream (start of `bbb_720p10s_vp9.webm` first frame) and comparing against Phase 3 anchor. If any field disagrees, Phase 5 returns "Critical: parser bug" and Phase 4 loops.
+
+**Out of iter4 scope**: full uncompressed-header extraction (color_config, frame_size for inter, segmentation update_data, tile_info). The parser still has to step over whatever syntax precedes the loop-filter section, but it stores none of it. Those fields are either available via VAAPI (Clauses 4, 5, 7) or are not written back to the kernel. The parser is a TARGETED partial parse, not a general bitstream reader.
+
+### Clause 7 — Segmentation mapping
+
+VAAPI conveys segmentation via:
+- `picture->pic_fields.bits.{segmentation_enabled, segmentation_temporal_update, segmentation_update_map}` flags
+- `picture->mb_segment_tree_probs[7]` (segment tree probs)
+- `picture->segment_pred_probs[3]` (temporal-update probs; 255-padded if `temporal_update == 0`)
+- `slice->seg_param[8].{segment_flags.fields, filter_level[4][2], luma_*_quant_scale, chroma_*_quant_scale}`
+
+Kernel takes per-segment feature_data + feature_enabled bitmaps. The mapping is non-trivial because VAAPI's `slice->seg_param[s]` carries EFFECTIVE quant scales (already computed by the VAAPI consumer), while the kernel wants the per-segment ALT_Q value as a delta or an absolute (depending on the `ABS_OR_DELTA_UPDATE` flag).
+
+```c
+for (i = 0; i < 7; i++)
+    frame.seg.tree_probs[i] = picture->mb_segment_tree_probs[i];
+for (i = 0; i < 3; i++)
+    frame.seg.pred_probs[i] = picture->segment_pred_probs[i];
+
+if (picture->pic_fields.bits.segmentation_enabled)
+    frame.seg.flags |= V4L2_VP9_SEGMENTATION_FLAG_ENABLED;
+if (picture->pic_fields.bits.segmentation_update_map)
+    frame.seg.flags |= V4L2_VP9_SEGMENTATION_FLAG_UPDATE_MAP;
+if (picture->pic_fields.bits.segmentation_temporal_update)
+    frame.seg.flags |= V4L2_VP9_SEGMENTATION_FLAG_TEMPORAL_UPDATE;
+/* UPDATE_DATA + ABS_OR_DELTA_UPDATE: not in VAAPI; left zero.
+ * For BBB (segmentation disabled), this is correct — flags ignored
+ * by kernel when ENABLED is clear. */
+
+/* Per-segment feature_data (only meaningful when ENABLED):
+ * VAAPI's seg_param[s].luma_ac_quant_scale is the EFFECTIVE per-
+ * segment scale. Kernel wants ALT_Q absolute Q-index OR delta.
+ * Recover via VP9 spec inverse-Q-table OR leave zero (BBB safe). */
+for (i = 0; i < 8; i++) {
+    if (slice->seg_param[i].segment_flags.fields.segment_reference_enabled) {
+        frame.seg.feature_enabled[i] |= 1 << V4L2_VP9_SEG_LVL_REF_FRAME;
+        frame.seg.feature_data[i][V4L2_VP9_SEG_LVL_REF_FRAME] =
+            slice->seg_param[i].segment_flags.fields.segment_reference;
+    }
+    if (slice->seg_param[i].segment_flags.fields.segment_reference_skipped)
+        frame.seg.feature_enabled[i] |= 1 << V4L2_VP9_SEG_LVL_SKIP;
+    /* SEG_LVL_ALT_Q + ALT_L: VAAPI doesn't directly expose per-segment
+     * abs/delta intent. Phase 5 review point: BBB has segmentation
+     * disabled so this code path is dead; non-BBB fixtures are out of
+     * iter4 scope (see backlog). */
+}
+```
+
+**Anchor**: Phase 3 keyframe `seg = all zeros` (BBB segmentation disabled). The Clause 7 logic is exercised only for frames with segmentation_enabled, which is outside iter4's BBB scope. Document as fidelity gap.
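To make the `feature_enabled` bitmap in Clause 7 concrete, the sketch below computes it for one segment and shows the resulting value. The feature indices are assumed to mirror the kernel's `V4L2_VP9_SEG_LVL_*` numbering (ALT_Q=0, ALT_L=1, REF_FRAME=2, SKIP=3); `struct seg_in` and `seg_feature_bitmap` are hypothetical names, only loosely modelled on VAAPI's per-segment flags.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match the kernel's V4L2_VP9_SEG_LVL_* values. */
enum {
	SEG_LVL_ALT_Q = 0,
	SEG_LVL_ALT_L = 1,
	SEG_LVL_REF_FRAME = 2,
	SEG_LVL_SKIP = 3
};

/* Hypothetical per-segment input, loosely modelled on VAAPI's flags. */
struct seg_in {
	unsigned reference_enabled : 1;
	unsigned reference : 2;
	unsigned reference_skipped : 1;
};

/* Returns the feature_enabled bitmap for one segment and stores the
 * REF_FRAME payload when that feature is on. ALT_Q / ALT_L stay off,
 * matching the plan's "left zero" decision. */
static uint8_t seg_feature_bitmap(const struct seg_in *in,
				  int16_t feature_data[4])
{
	uint8_t enabled = 0;

	if (in->reference_enabled) {
		enabled |= (uint8_t)(1u << SEG_LVL_REF_FRAME);
		feature_data[SEG_LVL_REF_FRAME] = (int16_t)in->reference;
	}
	if (in->reference_skipped)
		enabled |= (uint8_t)(1u << SEG_LVL_SKIP);
	return enabled;
}
```

A segment with both features set yields `0x0C`; a segment with neither yields `0`, which is what the BBB anchor's all-zero `seg` implies.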
+
+### Clause 8 — VPX range coder + inv_map_table (for compressed header parse)
+
+Direct port from FFmpeg `v4l2_request_vp9.c:42-97`:
+
+- `inv_map_table[255]` — copy verbatim
+- `vpx_rac_init(c, buf, size)` — initialize range coder over the compressed-header bytes
+- `vp89_rac_get(c)` — read a single bit
+- `vp89_rac_get_uint(c, n)` — read n bits MSB-first
+- `vpx_rac_get_prob_branchy(c, prob)` — read with given probability
+- `read_prob_delta(c)` — the 4-tier VLC + inv_map_table lookup used to update one prob
+
+~80 LOC, all stateless static functions. Implementation can be either inlined in `vp9.c` (Phase 2 B6 Option A — chosen) or split to `vp9_rac.h` (Option B). Phase 2 default = Option A; Phase 5 may flip to Option B if reuse pressure surfaces.
+
+### Clause 9 — Compressed-header parser (`vp9_fill_compressed_hdr`)
+
+Direct port of FFmpeg `v4l2_request_vp9.c:99-261::fill_compressed_hdr`. Reads from `surface_object->source_data + uncompressed_header_size` for `compressed_header_size` bytes. ~180 LOC.
+
+Syntax elements parsed (per VP9 spec 6.3):
+- `tx_mode` (2 bits, +1 conditional bit when SELECT)
+- TX 8x8/16x16/32x32 probability deltas (only if `tx_mode == SELECT`)
+- Coef probability deltas (4-level nested loop with branch probs)
+- Skip / inter_mode / interp_filter / is_inter / comp_mode / single_ref / comp_ref / y_mode / partition probability deltas (only on inter frames)
+- MV probability deltas (joint/sign/classes/class0_bit/bits/class0_fr/fr/class0_hp/hp)
+
+Each updated value goes through `inv_map_table[d]`. Each "no update" bit leaves zero in the kernel struct (kernel interprets zero as "keep prior probability").
+
+**Lossless special case**: when `s->s.h.lossless` is set, FFmpeg writes `tx_mode = V4L2_VP9_TX_MODE_ONLY_4X4` unconditionally. FFmpeg's internal `lossless` state is not visible to the backend, but VAAPI's `picture->pic_fields.bits.lossless_flag` (bit 31 of pic_fields) is expected to carry the same information. Read it and apply the same special case.
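For orientation, here is a minimal sketch of the boolean decoder that the compressed-header parse relies on, plus the `tx_mode` read with its lossless shortcut. It follows the VP9 specification's formulation (`BoolValue` / `BoolRange`), not FFmpeg's `vpx_rac_*` internals; the assumption is that both yield the same decoded bits. All names here (`struct bool_dec`, `read_tx_mode`, the `TX_*` constants) are hypothetical and stand in for the ported functions listed in Clause 8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Boolean decoder state, in the VP9 spec's formulation. */
struct bool_dec {
	const uint8_t *buf;
	size_t size;    /* bytes available */
	size_t bitpos;  /* next unread bit, MSB-first */
	uint32_t value;
	uint32_t range;
};

/* Next raw bit from the buffer; zero once the buffer is exhausted. */
static unsigned next_bit(struct bool_dec *d)
{
	unsigned bit = 0;

	if (d->bitpos < d->size * 8)
		bit = (d->buf[d->bitpos >> 3] >> (7 - (d->bitpos & 7))) & 1;
	d->bitpos++;
	return bit;
}

/* Decode one boolean whose probability of being 0 is prob/256. */
static unsigned read_bool(struct bool_dec *d, unsigned prob)
{
	uint32_t split = 1 + (((d->range - 1) * prob) >> 8);
	unsigned bit;

	if (d->value < split) {
		d->range = split;
		bit = 0;
	} else {
		d->range -= split;
		d->value -= split;
		bit = 1;
	}
	while (d->range < 128) {	/* renormalise */
		d->range <<= 1;
		d->value = (d->value << 1) | next_bit(d);
	}
	return bit;
}

/* n equiprobable bits, MSB first. */
static unsigned read_literal(struct bool_dec *d, unsigned n)
{
	unsigned v = 0;

	while (n--)
		v = (v << 1) | read_bool(d, 128);
	return v;
}

/* Loads the first byte and consumes the marker bit.
 * Returns 0 on success, -1 if the marker bit is set. */
static int bool_dec_init(struct bool_dec *d, const uint8_t *buf, size_t size)
{
	int i;

	d->buf = buf;
	d->size = size;
	d->bitpos = 0;
	d->value = 0;
	d->range = 255;
	for (i = 0; i < 8; i++)
		d->value = (d->value << 1) | next_bit(d);
	return read_bool(d, 128) ? -1 : 0;
}

/* Local mirrors of the TX mode numbering (ONLY_4X4=0 ... SELECT=4). */
enum { TX_ONLY_4X4 = 0, TX_ALLOW_32X32 = 3, TX_MODE_SELECT = 4 };

/* First syntax element of the compressed header. */
static unsigned read_tx_mode(struct bool_dec *d, int lossless)
{
	unsigned tx_mode;

	if (lossless)
		return TX_ONLY_4X4;
	tx_mode = read_literal(d, 2);
	if (tx_mode == TX_ALLOW_32X32)
		tx_mode += read_literal(d, 1);
	return tx_mode;
}
```

Checking the marker bit at init gives the parser a cheap early failure when the compressed-header offset from Clause 12 is wrong.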
+
+**Anchor**: Phase 3 strace shows COMPRESSED_HDR payload size 2040 B; kernel never EINVAL'd → port produces correctly-sized struct. Field-level decode of the keyframe payload is deferred to Phase 5/Phase 7 byte-compare (the parser is the primary reference for itself; cross-validation is via "kernel decodes the same hash both ways" not "we manually decode the parser output").
+
+### Clause 10 — Frame flags + reference_mode + interpolation_filter
+
+```c
+if (!picture->pic_fields.bits.frame_type)          /* frame_type == 0 means keyframe */
+    frame.flags |= V4L2_VP9_FRAME_FLAG_KEY_FRAME;
+if (picture->pic_fields.bits.show_frame)            frame.flags |= V4L2_VP9_FRAME_FLAG_SHOW_FRAME;
+if (picture->pic_fields.bits.error_resilient_mode)  frame.flags |= V4L2_VP9_FRAME_FLAG_ERROR_RESILIENT;
+if (picture->pic_fields.bits.intra_only)            frame.flags |= V4L2_VP9_FRAME_FLAG_INTRA_ONLY;
+if (picture->pic_fields.bits.allow_high_precision_mv)
+    frame.flags |= V4L2_VP9_FRAME_FLAG_ALLOW_HIGH_PREC_MV;
+if (picture->pic_fields.bits.refresh_frame_context)
+    frame.flags |= V4L2_VP9_FRAME_FLAG_REFRESH_FRAME_CTX;
+if (picture->pic_fields.bits.frame_parallel_decoding_mode)
+    frame.flags |= V4L2_VP9_FRAME_FLAG_PARALLEL_DEC_MODE;
+if (picture->pic_fields.bits.subsampling_x)         frame.flags |= V4L2_VP9_FRAME_FLAG_X_SUBSAMPLING;
+if (picture->pic_fields.bits.subsampling_y)         frame.flags |= V4L2_VP9_FRAME_FLAG_Y_SUBSAMPLING;
+/* COLOR_RANGE_FULL_SWING: VAAPI doesn't expose; leave clear (BT.709 limited for BBB). */
+
+/* reset_frame_context: FFmpeg uses (resetctx > 0 ? resetctx - 1 : 0).
+ * VAAPI's pic_fields.bits.reset_frame_context is 2 bits (0..3).
+ * V4L2 enum is 0..2. The off-by-one is because VP9 spec encodes
+ * "no reset" + 3 reset variants into 2 bits, but kernel enum drops
+ * the encoder helper offset. Follow FFmpeg's mapping verbatim: */
+frame.reset_frame_context =
+    picture->pic_fields.bits.reset_frame_context > 0
+        ? picture->pic_fields.bits.reset_frame_context - 1
+        : 0;
+
+/* interpolation_filter: FFmpeg uses (filtermode ^ (filtermode <= 1)).
+ * VAAPI's mcomp_filter_type is 3 bits (0..7); kernel enum is 0..4.
+ * The XOR remap aligns FFmpeg's internal filter_mode enum to V4L2's. */
+frame.interpolation_filter =
+    picture->pic_fields.bits.mcomp_filter_type ^
+    (picture->pic_fields.bits.mcomp_filter_type <= 1);
+
+/* reference_mode: comes from compressed-header parse (NOT VAAPI).
+ * Read from compressed_hdr's parsed state (see Clause 9). */
+frame.reference_mode = compressed_hdr_reference_mode;  /* state from Clause 9 */
+```
+
+**Anchor**: Phase 3 verbatim: keyframe `reset_frame_context=0, interpolation_filter=0`. Open discrepancy: applying the XOR remap to `mcomp_filter_type=0` gives `0 ^ (0 <= 1) = 1`, not the anchor's 0. The anchor is reproduced only if the keyframe's `mcomp_filter_type` is 1 (`1 ^ 1 = 0`) or if no remap applies to VAAPI's enum. Phase 5 must verify the XOR remap empirically against the keyframe bytes.
+
+**Phase 5 review point**: the FFmpeg-inferred mappings for `reset_frame_context` and `interpolation_filter` are tied to *FFmpeg's* internal enum order. VAAPI's enum order may differ. Phase 5 should empirically validate by decoding Phase 3's keyframe payload byte 144 (offset of `reset_frame_context`) and byte 149 (offset of `interpolation_filter`) and cross-checking with VAAPI's `pic_fields.bits` for the same frame. If they disagree, the FFmpeg-inferred remap is wrong.
+
+### Clause 11 — Final 2-control batched submission
+
+```c
+struct v4l2_ext_control ctrls[2] = {
+    { .id = V4L2_CID_STATELESS_VP9_FRAME,
+      .ptr = &frame, .size = sizeof frame },
+    { .id = V4L2_CID_STATELESS_VP9_COMPRESSED_HDR,
+      .ptr = &compressed_hdr, .size = sizeof compressed_hdr },
+};
+
+rc = v4l2_set_controls(driver_data->video_fd,
+                       surface_object->request_fd,
+                       ctrls, 2);
+if (rc < 0)
+    return VA_STATUS_ERROR_OPERATION_FAILED;
+return 0;
+```
+
+Mirrors iter3's Clause 10 with count=2 instead of count=1.
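The two Clause 10 remaps are small enough to tabulate exhaustively, which makes the Phase 5 cross-check mechanical. The sketch below isolates them as pure functions; the function names are hypothetical, and whether these remaps are the right ones for VAAPI's enum order remains the open question raised in Clause 10.

```c
#include <assert.h>

/* filter ^ (filter <= 1): swaps 0 and 1, leaves 2 and above unchanged. */
static unsigned remap_interp_filter(unsigned filter)
{
	return filter ^ (unsigned)(filter <= 1);
}

/* reset > 0 ? reset - 1 : 0: maps 0 -> 0, 1 -> 0, 2 -> 1, 3 -> 2. */
static unsigned remap_reset_frame_context(unsigned reset)
{
	return reset > 0 ? reset - 1 : 0;
}
```

Because the filter remap sends 0 to 1 and 1 to 0, an observed kernel value of 0 corresponds to an input of 1 whenever this remap is in effect.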
+
+### Clause 12 — Bitstream offsetting
+
+Backend hands the kernel the FULL frame bitstream via `surface_object->source_data` + `surface_object->source_size`. The kernel takes the start-of-compressed-header offset from `frame.uncompressed_header_size`, which Clause 4 fills from `picture->frame_header_length_in_bytes`. The compressed header parser (Clause 9) reads `[uncompressed_header_size, uncompressed_header_size + compressed_header_size)` from the bitstream buffer.
+
+```c
+const uint8_t *compressed_hdr_start =
+    surface_object->source_data + frame.uncompressed_header_size;
+uint32_t compressed_hdr_len = frame.compressed_header_size;
+
+vp9_fill_compressed_hdr(&compressed_hdr,
+                        compressed_hdr_start,
+                        compressed_hdr_len);
+
+/* Same buffer pointer used by Clause 6 for uncompressed-header parse,
+ * but with offset 0 + length = uncompressed_header_size. */
+vp9_parse_uncompressed_header_lf_quant(
+    surface_object->source_data, surface_object->source_size,
+    frame.uncompressed_header_size,
+    &frame.lf, &frame.quant);
+```
+
+## Patch shape (commits)
+
+iter4 implements as 5 commits (mitigation B + iter3-style ABCD):
+
+### Commit Z — `src/request.c`: device-path enumeration (mitigation B)
+
+Replace hardcoded `/dev/video0` + `/dev/media0` defaults with walk-and-pick-first-known-decoder:
+
+```c
+static int find_codec_device(char video_path[32], char media_path[32])
+{
+    static const char * const known_drivers[] = {
+        "rkvdec", "hantro-vpu", "cedrus", "sun4i_csi", NULL
+    };
+    char path[32];
+    struct v4l2_capability caps;
+    int fd, i;
+    const char * const *kd;
+
+    /* Walk /dev/video0..15 */
+    for (i = 0; i < 16; i++) {
+        snprintf(path, sizeof path, "/dev/video%d", i);
+        fd = open(path, O_RDWR | O_NONBLOCK);
+        if (fd < 0) continue;
+        if (ioctl(fd, VIDIOC_QUERYCAP, &caps) == 0) {
+            for (kd = known_drivers; *kd; kd++) {
+                if (strcmp((char *)caps.driver, *kd) == 0) {
+                    strncpy(video_path, path, 32);
+                    /* Match media device by driver name */
+                    find_media_for_driver((char *)caps.driver, media_path);
+                    close(fd);
+                    return 0;
+                }
+            }
+        }
+        close(fd);
+    }
+    return -1;
+}
+
+/* In RequestInit: */
+video_path = getenv("LIBVA_V4L2_REQUEST_VIDEO_PATH");
+if (video_path == NULL) {
+    static char auto_video[32], auto_media[32];
+    if (find_codec_device(auto_video, auto_media) == 0) {
+        video_path = auto_video;
+        if (getenv("LIBVA_V4L2_REQUEST_MEDIA_PATH") == NULL)
+            media_path = auto_media;
+        request_log("auto-selected codec device: %s + %s\n",
+                    video_path, media_path);
+    } else {
+        video_path = "/dev/video0";  /* keep old fallback for callers
+                                        we can't enumerate */
+    }
+}
+```
+
+`find_media_for_driver` walks `/dev/media0..15`, opens each, calls `MEDIA_IOC_DEVICE_INFO`, returns the path whose `driver` field matches. Phase 3 baseline confirmed `media0 ↔ rkvdec` and `media1 ↔ hantro-vpu` on 7.0.
+
+Predicted +35 LOC, 1 file modified. Build target after Commit Z: `vainfo` (no env override) lists the auto-selected decoder's profiles. Independent of VP9 work — can be tested + merged before Commit A.
+
+**End-user UX gap (documented, NOT fixed in iter4)**: backend opens ONE codec device at init. If user wants the OTHER decoder (e.g., default selects rkvdec but user wants hantro for MPEG-2/VP8), they still need env override. Aggregating BOTH decoders simultaneously requires a deeper refactor (multi-fd dispatch); out of iter4 scope, cross-cutting backlog item iter4-B1.
+
+### Commit A — `src/config.c`: VP9 enumeration + dispatch + entrypoints
+
+3 sites mirroring iter3 commit A:
+
+1. `RequestQueryConfigProfiles` (after VP8 enumeration block from iter3): add VP9 enumeration block probing `V4L2_PIX_FMT_VP9_FRAME` against single + MPLANE OUTPUT formats. Adds `VAProfileVP9Profile0`. ~10 LOC.
+2. `RequestCreateConfig` (after VP8 case from iter3): add `case VAProfileVP9Profile0: break;` with comment block. ~5 LOC.
+3. `RequestQueryConfigEntrypoints` (line ~180): add `case VAProfileVP9Profile0:` to existing fall-through. ~1 LOC.
+
+Predicted +16 LOC, 1 file modified.
+Build target after Commit A: `vainfo` (with env override or post-commit-Z auto-detect) lists `VAProfileVP9Profile0` on rkvdec env.
+
+### Commit B — NEW `src/vp9.c` + `src/vp9.h` + `src/meson.build` integration
+
+Net-new `vp9.c` implements `vp9_set_controls()` per Clauses 1-12 above.
+
+Predicted ~580 LOC for `vp9.c` (50 LOC infrastructure + 80 LOC VPX rac + 50 LOC uncompressed-header partial parse + 180 LOC compressed-header parser + ~200 LOC frame-fill (Clauses 4-5,7,10) + 30 LOC of submission/wrap). ~40 LOC for `vp9.h`. +2 lines `meson.build`.
+
+3 files (2 new + 1 modified). Build target after Commit B: vp9.o compiles standalone, picture.c can't dispatch yet.
+
+### Commit C — `src/picture.c` + `src/surface.h`: dispatcher + buffer routing + union extension
+
+5 sites:
+
+1. `picture.c:34-37` include block: add `#include "vp9.h"`.
+2. `picture.c::codec_set_controls`: add VP9 dispatch case calling `vp9_set_controls`. ~6 LOC.
+3. `picture.c::codec_store_buffer`: add VP9 inner cases for `VAPictureParameterBufferType` and `VASliceParameterBufferType`. ~14 LOC. (NO `VAProbabilityBufferType` for VP9; NO `VAIQMatrixBufferType`. Confirmed in Phase 2 B8.)
+4. `picture.c::RequestBeginPicture`: NO change predicted (VP9 doesn't have iter3-style `iqmatrix_set` flag — Picture/Slice always populated per frame by VAAPI consumer). Phase 2 B9 confirms.
+5. `surface.h::object_surface::params` union: add `vp9` struct after `vp8`:
+
+```c
+struct {
+    VADecPictureParameterBufferVP9 picture;
+    VASliceParameterBufferVP9 slice;
+} vp9;
+```
+
+Predicted +26 LOC, 2 files modified. Build target after Commit C: backend builds clean; mpv-vaapi VP9 decode should engage end-to-end on rkvdec.
+
+### Commit D — fix-forward placeholder
+
+Phase 2 B12 predicted no `buffer.c` changes (VP9's 3 buffer types — Picture, Slice, Data — already in iter3's allow-list).
+Per memory `feedback_runtime_enumerates_allowlists.md`, plan for fix-forward if Commit C runtime hits an allow-list miss; otherwise this commit slot stays empty.
+
+## Files touched summary
+
+| File | New | Modified | LOC delta | Commit |
+|---|:-:|:-:|:-:|:-:|
+| `src/request.c` | | ✓ | +35 | Z |
+| `src/config.c` | | ✓ | +16 | A |
+| `src/vp9.c` | ✓ | | +580 | B |
+| `src/vp9.h` | ✓ | | +40 | B |
+| `src/meson.build` | | ✓ | +2 | B |
+| `src/picture.c` | | ✓ | +20 | C |
+| `src/surface.h` | | ✓ | +6 | C |
+
+**Total**: ~699 LOC, 7 files (2 new + 5 modified). 4 commits (Z, A, B, C) + optional D. Notably bigger than iter3 (308 LOC) because of: device-path mitigation (35) + uncompressed-header partial parse (50) + compressed-header parser (180) + VPX rac (80).
+
+## Cross-cutting backlog (out of iter4 scope)
+
+Items inherited + NEW from iter4:
+
+- **iter4-B1** (NEW) Backend opens ONE codec device at init (rkvdec OR hantro). Aggregating both for unified profile enumeration requires multi-fd dispatch refactor. Defer.
+- **iter4-B2** (NEW) ffmpeg-vaapi / mpv-vaapi `Could not create device` failure mode persists even with env override. Likely a vaapi-DRM render-node path issue separate from device-path. Investigate in Phase 6 if HW=SW byte-compare fails.
+- **iter4-Q6** (NEW) VAAPI per-segment `seg_param[s]` fields are EFFECTIVE quant scales; kernel wants ALT_Q absolute or delta. Mapping back is non-trivial; left zeros for BBB (segmentation disabled). Document as fidelity gap for non-BBB fixtures.
+- **iter4-COLOR_RANGE** (NEW) VAAPI doesn't expose color_range; backend leaves `V4L2_VP9_FRAME_FLAG_COLOR_RANGE_FULL_SWING` clear (BT.709 limited). Wrong for full-range JPEG-encoded VP9.
+- **B5/B6** mpeg2 vbv polish + h265 SPS bitstream parse (carried from iter1+iter2).
+- **L3** vaDeriveImage cache-stale on RK3399 — workaround: DMA-BUF GL only.
+
+## Phase 5 review prep
+
+Submitting this plan for second-model review (sonnet-architect).
+Key questions for the reviewer (per memory `feedback_review_empirical_over_theoretical.md` Direction 2 — empirical-over-theoretical in BOTH directions):
+
+1. **Uncompressed-header parser correctness (Clause 6)**: empirically decode the first ~200 bytes of `bbb_720p10s_vp9.webm` keyframe and confirm `lf.ref_deltas={1,0,-1,-1}, lf.mode_deltas={0,0}, lf.flags=3, quant.base_q_idx=46` are the *correct* parse results — not just the kernel-direct's pre-formatted output. If the spec says the bits encode something different, the parser is wrong even if kernel-direct happens to match.
+
+2. **`reset_frame_context` and `interpolation_filter` remap (Clause 10)**: empirically extract these bytes from Phase 3 strace payload and cross-check FFmpeg's XOR/-1 remap against the bytes' literal interpretation as VP9 spec enums.
+
+3. **Compile-time size assertions (Clause 3)**: are 168/2040 stable across kernel UAPI versions, or will a 7.1+ kernel grow them again? If unstable, replace with a runtime size check that compares the `elem_size` reported by `VIDIOC_QUERY_EXT_CTRL` against `sizeof` for each control. Phase 5 reviewer call.
+
+4. **Per-segment mapping (Clause 7)**: BBB doesn't exercise segmentation. For non-BBB segmentation-enabled fixtures (out of iter4 scope), is the planned `seg_param[s].luma_ac_quant_scale` → `feature_data[s][ALT_Q]` mapping fundamentally wrong (effective scale vs delta), or just lossy? Document the gap clearly.
+
+5. **Test compile field availability**: per Direction 2, every VAAPI field-name reference in this plan should be `gcc -c` test-compiled before Phase 6. Reviewer should verify the access list in Clauses 4, 5, 7, 10.
+
+6. **Mitigation B regression risk**: `request.c` is shared with all 5 already-shipping codecs. Could the walk-and-pick-first logic regress any existing test fixture if env vars happen to be unset by accident? Phase 5 should suggest a safety knob (e.g., `LIBVA_V4L2_REQUEST_NO_AUTODETECT=1` to force old `/dev/video0` behavior).
+
+7. **Lossless flag mapping (Clause 9)**: VAAPI's `pic_fields.bits.lossless_flag` — is it set the same way as FFmpeg's `s->s.h.lossless`? VAAPI comment says "LosslessFlag = base_qindex == 0 && y_dc_delta_q == 0 && uv_dc_delta_q == 0 && uv_ac_delta_q == 0" — check that semantics align.
+
+## Phase 1 criteria → Phase 4 plan trace
+
+| Criterion | Plan addresses |
+|---|---|
+| 1. vainfo enumerates VP9Profile0 | Commit Z (device-path) + Commit A (`RequestQueryConfigProfiles` enumeration block) |
+| 2. vaCreateConfig SUCCESS | Commit A — `RequestCreateConfig` case + `RequestQueryConfigEntrypoints` |
+| 3. ffmpeg-vaapi VP9 exit 0 | Commits Z+A+B+C end-to-end; Clauses 1+4+5+11 + parsers |
+| 4. mpv VP9 HW=SW byte-identical | Commits Z+A+B+C decode correctness + Phase 3 SW PNGs as Phase 7 anchor; engagement via `mpv -v` log per memory `feedback_hw_decode_engagement_check.md` |
+| 5. 4-codec regression | Commit Z restores baseline (mitigation B); Commits A+B+C add new VP9 path purely additively (no shared-state mutation) |
+
+## Substrate state at Phase 4 close
+
+- Phase 0+1+2+3 commits at gitea (`9a71dbf`, `2651e4c`+`56abe3d` ID-correction, `56abe3d`).
+- Fork at iter3 tip `e1aca9c` on noether; Phase 6 patches will land here.
+- All Phase 3 anchors captured + preserved on fresnel `/tmp/iter4_phase3/` and `noether:~/src/fresnel-fourier/iter4_phase3.tgz`.
+- Memory rules carry forward; new `reference_fresnel_kernel_substrate.md` covers post-besser substrate.
+- Phase 4 plan ready for sonnet-architect review (Phase 5).