5-commit plan (Z, A, B, C, optional D):
- Commit Z: src/request.c — walk /dev/video* + /dev/media*, match by
driver name in {rkvdec, hantro-vpu, cedrus, sun4i_csi}; restores
baseline functionality on 7.0 (where /dev/video0 is rockchip-rga).
- Commit A: src/config.c — VAProfileVP9Profile0 enumeration + dispatch
+ entrypoints (~16 LOC, 1 file).
- Commit B: NEW src/vp9.c + .h + meson — 12 contract clauses; ~580 LOC
vp9.c (50 infra + 80 VPX rac + 50 uncompressed-header partial parse +
180 compressed-header parser + ~200 frame-fill).
- Commit C: src/picture.c + surface.h — VP9 dispatch + 2 buffer-type
cases + union extension; NO BeginPicture reset (VP9 has no
iqmatrix_set-style flags).
- Commit D: optional fix-forward placeholder (predicted no-op per
feedback_runtime_enumerates_allowlists.md).
Total ~699 LOC, 7 files.
12 contract clauses include 2 NEW vs iter3:
- Clause 3: compile-time _Static_assert sizeof v4l2_ctrl_vp9_frame ==
168 && ..._compressed_hdr == 2040 (any UAPI shift fails loudly).
- Clause 6: uncompressed-header partial parse for lf_delta_* +
base_q_idx (VAAPI doesn't expose; BBB keyframe needs non-zero
ref_deltas={1,0,-1,-1} per Phase 3 anchor).
7 Phase 5 review questions queued, all empirical-leaning per
feedback_review_empirical_over_theoretical.md Direction 2:
parser-vs-bitstream cross-check, FFmpeg-XOR-remap validation,
struct-size stability, mitigation B regression risk.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Iteration 4 — Phase 4 (plan)
Locks the iter4 patch shape against verbatim Phase 3 baseline (phase3_iter4_baseline.md, commit 56abe3d) and the kernel UAPI + VAAPI + FFmpeg references read in Phase 2 (phase2_iter4_situation.md, commit 2651e4c + ID-correction in 56abe3d). Plan structure mirrors iter2/iter3 clause template, expanded for VP9-specific scope (compressed-header parser + uncompressed-header partial parse + device-path mitigation).
Phase 3 baseline provides:
- Empirical struct sizes 168 B / 2040 B (NOT Phase 2's 144 / 1947 estimates)
- Correct control IDs 0xa40a2c/0xa40a2d
- Frame-1 keyframe verbatim payload prefix (lf.ref_deltas={1,0,-1,-1}, lf.mode_deltas={0,0}, quant.base_q_idx=46)
- 4-codec regression root-cause: /dev/video0 is now rockchip-rga on 7.0; backend hardcodes /dev/video0 in request.c:149
User picked Mitigation B: in-fork patch — walk /dev/video*, query VIDIOC_QUERYCAP, pick first device whose driver name is in {rkvdec, hantro-vpu}. Adds ~30 LOC to request.c. Restores baseline functionality on 7.0.
Contract clauses
Clause 1 — Submission shape (per-frame)
ONE batched VIDIOC_S_EXT_CTRLS per frame, bound to the surface's permanent request_fd. TWO controls (vs iter3's one): V4L2_CID_STATELESS_VP9_FRAME + V4L2_CID_STATELESS_VP9_COMPRESSED_HDR. rkvdec-vp9.c::rkvdec_vp9_run_preamble:752 WARN_ON(!ctrl); return -EINVAL if COMPRESSED_HDR absent — hard requirement on RK3399.
struct v4l2_ext_control ctrls[2] = {
{ .id = V4L2_CID_STATELESS_VP9_FRAME, /* 0xa40a2c */
.ptr = &frame,
.size = sizeof frame /* MUST be 168 */ },
{ .id = V4L2_CID_STATELESS_VP9_COMPRESSED_HDR, /* 0xa40a2d */
.ptr = &compressed_hdr,
.size = sizeof compressed_hdr /* MUST be 2040 */ },
};
rc = v4l2_set_controls(driver_data->video_fd,
surface_object->request_fd,
ctrls, 2);
v4l2_set_controls wraps with which=V4L2_CTRL_WHICH_REQUEST_VAL. Phase 3 strace verifies ctrl_class=0xf010000 is what the kernel sees, matching iter1+iter2+iter3.
No init-time probe: ffmpeg-v4l2request first runs a count=1 probe (FRAME-only) to check kernel CID support, then count=2. iter4 backend skips the probe — VP9 on rkvdec REQUIRES COMPRESSED_HDR per kernel source; if the kernel doesn't have it, decode would fail anyway. Issuing count=2 unconditionally is correct.
Anchor: Phase 3 baseline § Anchor 2 verbatim ioctl trace.
Clause 2 — Local struct allocation + zero-init
int vp9_set_controls(struct request_data *driver_data,
struct object_context *context,
struct object_surface *surface_object)
{
VADecPictureParameterBufferVP9 *picture =
&surface_object->params.vp9.picture;
VASliceParameterBufferVP9 *slice =
&surface_object->params.vp9.slice;
struct v4l2_ctrl_vp9_frame frame;
struct v4l2_ctrl_vp9_compressed_hdr compressed_hdr;
memset(&frame, 0, sizeof frame);
memset(&compressed_hdr, 0, sizeof compressed_hdr);
/* Zero is the kernel's "no probability update" default for every
* field in compressed_hdr, and the safe default for every numeric
* field of frame except reference_mode (set explicitly later). */
...
}
VAAPI doesn't have iter3-style set flags for VP9; both Picture and Slice are unconditionally populated by the consumer per frame (per Phase 2 analysis B9, no per-frame reset needed in RequestBeginPicture).
Clause 3 — Compile-time struct-size assertions
Per Phase 3 finding: kernel UAPI struct sizes are 168 B (FRAME) and 2040 B (COMPRESSED_HDR) on 7.0; any iter4 build (build date after 2026-05-09) will use whatever sizes the build host's UAPI headers report. Adding compile-time asserts at the top of vp9.c makes any future struct-size drift fail loudly instead of silently corrupting kernel control writes:
_Static_assert(sizeof(struct v4l2_ctrl_vp9_frame) == 168,
"v4l2_ctrl_vp9_frame size mismatch — UAPI changed");
_Static_assert(sizeof(struct v4l2_ctrl_vp9_compressed_hdr) == 2040,
"v4l2_ctrl_vp9_compressed_hdr size mismatch — UAPI changed");
If these fire, treat as a kernel-substrate bump (re-baseline Phase 3) — DO NOT just bump the asserts.
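To make the 168-byte figure inspectable, the sketch below mirrors the frame control locally, asserts its size, and shows how `offsetof` locates a field inside a captured payload. The `mirror_*` types and `payload_byte` are hypothetical names, and the field layout is an assumption reconstructed for illustration; real code would include `<linux/v4l2-controls.h>` and assert on the UAPI types directly, as this clause does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical local mirrors of the UAPI structs (assumed layout). */
struct mirror_vp9_loop_filter {
    int8_t  ref_deltas[4];
    int8_t  mode_deltas[2];
    uint8_t level;
    uint8_t sharpness;
    uint8_t flags;
    uint8_t reserved[7];
};

struct mirror_vp9_quantization {
    uint8_t base_q_idx;
    int8_t  delta_q_y_dc;
    int8_t  delta_q_uv_dc;
    int8_t  delta_q_uv_ac;
    uint8_t reserved[4];
};

struct mirror_vp9_segmentation {
    int16_t feature_data[8][4];
    uint8_t feature_enabled[8];
    uint8_t tree_probs[7];
    uint8_t pred_probs[3];
    uint8_t flags;
    uint8_t reserved[5];
};

struct mirror_ctrl_vp9_frame {
    struct mirror_vp9_loop_filter  lf;
    struct mirror_vp9_quantization quant;
    struct mirror_vp9_segmentation seg;
    uint32_t flags;
    uint16_t compressed_header_size;
    uint16_t uncompressed_header_size;
    uint16_t frame_width_minus_1;
    uint16_t frame_height_minus_1;
    uint16_t render_width_minus_1;
    uint16_t render_height_minus_1;
    uint64_t last_frame_ts;
    uint64_t golden_frame_ts;
    uint64_t alt_frame_ts;
    uint8_t  ref_frame_sign_bias;
    uint8_t  reset_frame_context;
    uint8_t  frame_context_idx;
    uint8_t  profile;
    uint8_t  bit_depth;
    uint8_t  interpolation_filter;
    uint8_t  tile_cols_log2;
    uint8_t  tile_rows_log2;
    uint8_t  reference_mode;
    uint8_t  reserved[7];
};

/* Breaks the build as soon as the assumed layout stops adding up. */
_Static_assert(sizeof(struct mirror_ctrl_vp9_frame) == 168,
               "mirror_ctrl_vp9_frame size mismatch");

/* Read one byte of a captured control payload at a given field offset. */
static uint8_t payload_byte(const uint8_t *payload, size_t len, size_t offset)
{
    assert(offset < len);
    return payload[offset];
}
```

Calling `payload_byte(buf, 168, offsetof(struct mirror_ctrl_vp9_frame, quant.base_q_idx))` on a strace-captured payload is one way to cross-check a single anchor value without hardcoding byte positions.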
Clause 4 — Frame geometry + per-frame scalars
frame.frame_width_minus_1 = picture->frame_width - 1;
frame.frame_height_minus_1 = picture->frame_height - 1;
/* VAAPI gap: render size is not exposed; no scaling for BBB. */
frame.render_width_minus_1 = picture->frame_width - 1;
frame.render_height_minus_1 = picture->frame_height - 1;
frame.profile = picture->profile;
frame.bit_depth = picture->bit_depth;
frame.tile_cols_log2 = picture->log2_tile_columns;
frame.tile_rows_log2 = picture->log2_tile_rows;
frame.frame_context_idx = picture->pic_fields.bits.frame_context_idx;
frame.lf.level = picture->filter_level;
frame.lf.sharpness = picture->sharpness_level;
frame.uncompressed_header_size = picture->frame_header_length_in_bytes;
frame.compressed_header_size = picture->first_partition_size;
VAAPI fields verified via test-compile against va_dec_vp9.h:58-192 (per memory feedback_review_empirical_over_theoretical.md Direction 2 — Phase 5 will re-verify the field-name access list via gcc -c test compile).
Phase 3 keyframe anchor: width=1280, height=720, profile=0, bit_depth=8, tile_log2={0,0}, level=3, sharpness=0 — direct match.
Clause 5 — DPB timestamp resolution (3 active references from 8-slot DPB)
VAAPI's picture->reference_frames[0..7] is the full 8-entry DPB. The 3 active references for the current frame are indexed by last_ref_frame/golden_ref_frame/alt_ref_frame (each 3-bit, points into the 8-slot array).
VASurfaceID last_id = picture->reference_frames[picture->pic_fields.bits.last_ref_frame];
VASurfaceID golden_id = picture->reference_frames[picture->pic_fields.bits.golden_ref_frame];
VASurfaceID alt_id = picture->reference_frames[picture->pic_fields.bits.alt_ref_frame];
struct object_surface *last_ref = (last_id != VA_INVALID_SURFACE) ? SURFACE(driver_data, last_id) : NULL;
struct object_surface *golden_ref = (golden_id != VA_INVALID_SURFACE) ? SURFACE(driver_data, golden_id) : NULL;
struct object_surface *alt_ref = (alt_id != VA_INVALID_SURFACE) ? SURFACE(driver_data, alt_id) : NULL;
if (last_ref) frame.last_frame_ts = v4l2_timeval_to_ns(&last_ref->timestamp);
if (golden_ref) frame.golden_frame_ts = v4l2_timeval_to_ns(&golden_ref->timestamp);
if (alt_ref) frame.alt_frame_ts = v4l2_timeval_to_ns(&alt_ref->timestamp);
Mirrors iter1/iter3 ref-resolution pattern. For keyframes (all refs invalid), timestamps stay 0 from memset.
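The timestamp fields carry each reference surface's timestamp in nanoseconds, and an empty DPB slot must leave the field at zero. A minimal sketch of that resolution follows, assuming the conversion behaves like the UAPI helper `v4l2_timeval_to_ns`. The names `timeval_to_ns_sketch`, `resolve_ref_ts` and the flat arrays standing in for the surface heap are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/time.h>

#define SKETCH_DPB_SLOTS    8
#define SKETCH_INVALID_SLOT UINT32_MAX  /* stand-in for VA_INVALID_SURFACE */

/* Seconds + microseconds to nanoseconds (assumed to match the UAPI helper). */
static uint64_t timeval_to_ns_sketch(const struct timeval *tv)
{
    return (uint64_t)tv->tv_sec * 1000000000ULL +
           (uint64_t)tv->tv_usec * 1000ULL;
}

/*
 * Resolve one active reference (last/golden/alt) to a timestamp.
 * ref_idx is the 3-bit index into the 8-slot DPB; an invalid slot
 * yields 0, which matches the memset default used for keyframes.
 */
static uint64_t resolve_ref_ts(const uint32_t surface_ids[SKETCH_DPB_SLOTS],
                               const struct timeval timestamps[SKETCH_DPB_SLOTS],
                               unsigned int ref_idx)
{
    assert(ref_idx < SKETCH_DPB_SLOTS);
    if (surface_ids[ref_idx] == SKETCH_INVALID_SLOT)
        return 0;
    return timeval_to_ns_sketch(&timestamps[ref_idx]);
}
```

In the backend the lookup goes through SURFACE(driver_data, id) rather than a parallel array; the sketch only illustrates the index-then-convert order.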
Sign bias:
if (picture->pic_fields.bits.last_ref_frame_sign_bias) frame.ref_frame_sign_bias |= V4L2_VP9_SIGN_BIAS_LAST;
if (picture->pic_fields.bits.golden_ref_frame_sign_bias) frame.ref_frame_sign_bias |= V4L2_VP9_SIGN_BIAS_GOLDEN;
if (picture->pic_fields.bits.alt_ref_frame_sign_bias) frame.ref_frame_sign_bias |= V4L2_VP9_SIGN_BIAS_ALT;
Clause 6 — Loop filter deltas + base quantization (uncompressed-header partial parse)
VAAPI exposes filter_level and sharpness_level (Clause 4), but NOT lf_delta_enabled/lf_delta_update/lf_ref_delta[4]/lf_mode_delta[2]. Phase 3 keyframe anchor shows lf.ref_deltas={1,0,-1,-1} (non-zero on BBB); leaving these zero produces wrong loop-filter behavior → criterion-4 byte mismatch.
VAAPI also doesn't expose quant.base_q_idx / delta_q_y_dc / delta_q_uv_dc / delta_q_uv_ac. Phase 3 keyframe anchor shows base_q_idx=46; leaving zero produces wrong dequant scale.
Solution: implement a minimal uncompressed-header parser (vp9_parse_uncompressed_header_lf_quant) that reads surface_object->source_data from offset 0 and extracts the loop-filter delta and quantization fields listed in the function comment. The parse runs from offset 0 through the loop-filter and quantization syntax sections (per VP9 spec 6.2 §6.2.4–6.2.5):
static void vp9_parse_uncompressed_header_lf_quant(
const uint8_t *data, uint32_t size, uint32_t header_size,
struct v4l2_vp9_loop_filter *lf,
struct v4l2_vp9_quantization *quant)
{
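/* Bit reader walks frame_marker, profile, show_existing_frame,
* frame_type, show_frame, error_resilient_mode, then color_config +
* frame_size + render_size (keyframe) or the intra_only / reference /
* frame_size_with_refs path (non-keyframe), then refresh_frame_context,
* frame_parallel_decoding_mode and frame_context_idx, and stops after
* the loop_filter_params + quantization_params syntax sections
* (segmentation_params and tile_info follow and are not read).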
*
* Approach: bit-perfect VP9 spec port for ~50 LOC, reusing the
* VPX bitstream reader (see Clause 8). Fields written:
* lf->ref_deltas[0..3], lf->mode_deltas[0..1],
* lf->flags |= V4L2_VP9_LOOP_FILTER_FLAG_DELTA_ENABLED if set
* lf->flags |= V4L2_VP9_LOOP_FILTER_FLAG_DELTA_UPDATE if set
* quant->base_q_idx,
* quant->delta_q_y_dc, delta_q_uv_dc, delta_q_uv_ac
*/
...
}
Anchor: Phase 3 keyframe lf.ref_deltas={1,0,-1,-1}, lf.mode_deltas={0,0}, lf.flags=3 (DELTA_ENABLED|DELTA_UPDATE), quant.base_q_idx=46, deltas=0. Implementation must reproduce these exact values byte-for-byte against the BBB keyframe.
Per memory feedback_review_empirical_over_theoretical.md Direction 2: Phase 5 review must verify the parser by extracting these fields from the actual BBB keyframe bitstream (start of bbb_720p10s_vp9.webm first frame) and comparing against the Phase 3 anchor. If any field disagrees, Phase 5 returns "Critical: parser bug" and Phase 4 loops.
Out of iter4 scope: full uncompressed-header parse (color_config, frame_size for inter, segmentation update_data, tile_info). Those fields are either available via VAAPI (Clauses 4, 5, 7) or are not write-back to kernel. The parser is a TARGETED partial parse, not a general bitstream reader.
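The core of the targeted parse is an MSB-first bit reader plus the signed-delta read used by the loop-filter section. The sketch below shows only that section, starting at loop_filter_params rather than at byte 0, and follows the VP9 syntax as understood here (6-bit level, 3-bit sharpness, per-entry update bit, 6-bit magnitude followed by a sign bit). The type and function names are hypothetical and this is not the planned vp9.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bit_reader {
    const uint8_t *data;
    size_t size_bits;
    size_t pos;          /* index of the next bit to read */
};

/* Read n bits, MSB first; bits past the end of the buffer read as 0. */
static uint32_t br_read(struct bit_reader *br, unsigned int n)
{
    uint32_t v = 0;

    assert(n <= 32);
    while (n--) {
        uint32_t bit = 0;

        if (br->pos < br->size_bits)
            bit = (br->data[br->pos >> 3] >> (7 - (br->pos & 7))) & 1;
        br->pos++;
        v = (v << 1) | bit;
    }
    return v;
}

/* Signed value: n-bit magnitude followed by one sign bit. */
static int br_read_signed(struct bit_reader *br, unsigned int n)
{
    int value = (int)br_read(br, n);

    return br_read(br, 1) ? -value : value;
}

struct lf_sketch {
    uint8_t level, sharpness, delta_enabled, delta_update;
    int8_t  ref_deltas[4];
    int8_t  mode_deltas[2];
};

/* Entries whose update bit is clear keep whatever the caller stored. */
static void parse_loop_filter_params(struct bit_reader *br, struct lf_sketch *lf)
{
    int i;

    lf->level = (uint8_t)br_read(br, 6);
    lf->sharpness = (uint8_t)br_read(br, 3);
    lf->delta_enabled = (uint8_t)br_read(br, 1);
    lf->delta_update = 0;
    if (!lf->delta_enabled)
        return;
    lf->delta_update = (uint8_t)br_read(br, 1);
    if (!lf->delta_update)
        return;
    for (i = 0; i < 4; i++)
        if (br_read(br, 1))
            lf->ref_deltas[i] = (int8_t)br_read_signed(br, 6);
    for (i = 0; i < 2; i++)
        if (br_read(br, 1))
            lf->mode_deltas[i] = (int8_t)br_read_signed(br, 6);
}
```

As a hand-built check (not bytes taken from the BBB fixture), the six bytes 0x0C 0x70 0x50 0x10 0x70 0x60 encode level=3, sharpness=0, both delta flags set, ref_deltas={1,0,-1,-1} and no mode-delta updates, consuming 45 bits.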
Clause 7 — Segmentation mapping
VAAPI conveys segmentation via:
- picture->pic_fields.bits.{segmentation_enabled, segmentation_temporal_update, segmentation_update_map} flags
- picture->mb_segment_tree_probs[7] (segment tree probs)
- picture->segment_pred_probs[3] (temporal-update probs; 255-padded if temporal_update == 0)
- slice->seg_param[8].{segment_flags.fields, filter_level[4][2], luma_*_quant_scale, chroma_*_quant_scale}
Kernel takes per-segment feature_data + feature_enabled bitmaps. The mapping is non-trivial because VAAPI's slice->seg_param[s] carries EFFECTIVE quant scales (already-computed by VAAPI consumer), while kernel wants the per-segment ALT_Q delta or absolute (depends on ABS_OR_DELTA_UPDATE flag).
for (i = 0; i < 7; i++)
frame.seg.tree_probs[i] = picture->mb_segment_tree_probs[i];
for (i = 0; i < 3; i++)
frame.seg.pred_probs[i] = picture->segment_pred_probs[i];
if (picture->pic_fields.bits.segmentation_enabled)
frame.seg.flags |= V4L2_VP9_SEGMENTATION_FLAG_ENABLED;
if (picture->pic_fields.bits.segmentation_update_map)
frame.seg.flags |= V4L2_VP9_SEGMENTATION_FLAG_UPDATE_MAP;
if (picture->pic_fields.bits.segmentation_temporal_update)
frame.seg.flags |= V4L2_VP9_SEGMENTATION_FLAG_TEMPORAL_UPDATE;
/* UPDATE_DATA + ABS_OR_DELTA_UPDATE: not in VAAPI; left zero.
* For BBB (segmentation disabled), this is correct — flags ignored
* by kernel when ENABLED is clear. */
/* Per-segment feature_data (only meaningful when ENABLED):
* VAAPI's seg_param[s].luma_ac_quant_scale is the EFFECTIVE per-
* segment scale. Kernel wants ALT_Q absolute Q-index OR delta.
* Recover via VP9 spec inverse-Q-table OR leave zero (BBB safe). */
for (i = 0; i < 8; i++) {
if (slice->seg_param[i].segment_flags.fields.segment_reference_enabled) {
frame.seg.feature_enabled[i] |= 1 << V4L2_VP9_SEG_LVL_REF_FRAME;
frame.seg.feature_data[i][V4L2_VP9_SEG_LVL_REF_FRAME] =
slice->seg_param[i].segment_flags.fields.segment_reference;
}
if (slice->seg_param[i].segment_flags.fields.segment_reference_skipped)
frame.seg.feature_enabled[i] |= 1 << V4L2_VP9_SEG_LVL_SKIP;
/* SEG_LVL_ALT_Q + ALT_L: VAAPI doesn't directly expose per-segment
* abs/delta intent. Phase 5 review point: BBB has segmentation
* disabled so this code path is dead; non-BBB fixtures are out of
* iter4 scope (see backlog). */
}
Anchor: Phase 3 keyframe seg = all zeros (BBB segmentation disabled). The Clause 7 logic is exercised only for inter frames with segmentation_enabled — out of iter4 BBB scope. Document as fidelity gap.
Clause 8 — VPX range coder + inv_map_table (for compressed header parse)
Direct port from FFmpeg v4l2_request_vp9.c:42-97:
- inv_map_table[255]: copy verbatim
- vpx_rac_init(c, buf, size): initialize range coder over the compressed-header bytes
- vp89_rac_get(c): read a single bit
- vp89_rac_get_uint(c, n): read n bits MSB-first
- vpx_rac_get_prob_branchy(c, prob): read with given probability
- read_prob_delta(c): the 4-tier VLC + inv_map_table lookup used to update one prob
~80 LOC, all stateless static functions. Implementation can be either inlined in vp9.c (Phase 2 B6 Option A — chosen) or split to vp9_rac.h. Phase 2 default = Option A; Phase 5 may flip to Option B if reuse pressure surfaces.
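For orientation, the range coder can be written in the bit-at-a-time form used by the VP9 specification. This is a sketch under that formulation, not FFmpeg's optimized implementation and not the planned port; the `bd_*` names are hypothetical, and the marker bit that a real decoder reads immediately after initialization is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bool_dec {
    const uint8_t *data;
    size_t size;
    size_t bit_pos;      /* next raw bit to pull from data */
    uint32_t value;
    uint32_t range;
};

/* Pull one raw bit, MSB first; reads past the end yield 0. */
static uint32_t bd_next_bit(struct bool_dec *d)
{
    uint32_t bit = 0;

    if ((d->bit_pos >> 3) < d->size)
        bit = (d->data[d->bit_pos >> 3] >> (7 - (d->bit_pos & 7))) & 1;
    d->bit_pos++;
    return bit;
}

/* Prime the decoder with the first byte of the compressed header. */
static void bd_init(struct bool_dec *d, const uint8_t *data, size_t size)
{
    assert(data != NULL && size >= 1);
    d->data = data;
    d->size = size;
    d->bit_pos = 8;
    d->value = data[0];
    d->range = 255;
}

/* Decode one boolean whose probability of being 0 is prob/256. */
static int bd_read(struct bool_dec *d, unsigned int prob)
{
    uint32_t split = 1 + (((d->range - 1) * prob) >> 8);
    int bit;

    if (d->value < split) {
        d->range = split;
        bit = 0;
    } else {
        d->range -= split;
        d->value -= split;
        bit = 1;
    }
    /* Renormalize so that range stays in [128, 255]. */
    while (d->range < 128) {
        d->range <<= 1;
        d->value = (d->value << 1) | bd_next_bit(d);
    }
    return bit;
}

/* Read an n-bit literal, MSB first, as n equiprobable booleans. */
static uint32_t bd_read_uint(struct bool_dec *d, unsigned int n)
{
    uint32_t v = 0;

    while (n--)
        v = (v << 1) | (uint32_t)bd_read(d, 128);
    return v;
}
```

A bit-at-a-time decoder like this is slow but easy to audit, which makes it a plausible independent cross-check for the ported FFmpeg routines during Phase 5.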
Clause 9 — Compressed-header parser (vp9_fill_compressed_hdr)
Direct port of FFmpeg v4l2_request_vp9.c:99-261::fill_compressed_hdr. Reads from surface_object->source_data + uncompressed_header_size for compressed_header_size bytes. ~180 LOC.
Syntax elements parsed (per VP9 spec 6.3):
- tx_mode (2 bits, +1 conditional bit when the 2-bit value is ALLOW_32X32; the extra bit selects SELECT)
- TX 8x8/16x16/32x32 probability deltas (only if tx_mode == SELECT)
- Coef probability deltas (4-level nested loop with branch probs)
- Skip probability deltas (all frames)
- inter_mode / interp_filter / is_inter / comp_mode / single_ref / comp_ref / y_mode / partition probability deltas (only on inter frames)
- MV probability deltas (joint/sign/classes/class0_bit/bits/class0_fr/fr/class0_hp/hp; only on inter frames)
Each updated value goes through inv_map_table[d]. Each "no update" bit leaves zero in the kernel struct (kernel interprets zero as "keep prior probability").
Lossless special case: when s->s.h.lossless is set, FFmpeg writes tx_mode = V4L2_VP9_TX_MODE_ONLY_4X4 unconditionally. The backend has no access to FFmpeg's internal lossless state, but picture->pic_fields.bits.lossless_flag (bit 31 of pic_fields) is the VAAPI counterpart. Read it and apply the same special case.
Anchor: Phase 3 strace shows COMPRESSED_HDR payload size 2040 B; kernel never EINVAL'd → port produces correctly-sized struct. Field-level decode of the keyframe payload is deferred to Phase 5/Phase 7 byte-compare (the parser is the primary reference for itself; cross-validation is via "kernel decodes the same hash both ways" not "we manually decode the parser output").
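The lossless special case and the tx_mode read in this clause can be sketched as two small helpers. The lossless condition follows the formula quoted from the VAAPI header comment elsewhere in this plan; the enum values are assumed to match the kernel's V4L2_VP9_TX_MODE_* ordering, and all names here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror the kernel's V4L2_VP9_TX_MODE_* ordering. */
enum tx_mode_sketch {
    TX_ONLY_4X4 = 0,
    TX_ALLOW_8X8,
    TX_ALLOW_16X16,
    TX_ALLOW_32X32,
    TX_SELECT
};

/* Lossless when the base index and all three delta-Q values are zero. */
static int vp9_lossless_sketch(uint8_t base_q_idx, int8_t delta_q_y_dc,
                               int8_t delta_q_uv_dc, int8_t delta_q_uv_ac)
{
    return base_q_idx == 0 && delta_q_y_dc == 0 &&
           delta_q_uv_dc == 0 && delta_q_uv_ac == 0;
}

/*
 * tx_mode as it ends up in the control. For a lossless frame nothing
 * is read and the mode is forced to ONLY_4X4. Otherwise two bits are
 * read, and one more bit only when those two bits equal ALLOW_32X32.
 */
static uint8_t tx_mode_from_bits(int lossless, uint8_t two_bits, uint8_t extra_bit)
{
    if (lossless)
        return TX_ONLY_4X4;
    assert(two_bits <= 3 && extra_bit <= 1);
    return two_bits == TX_ALLOW_32X32 ? (uint8_t)(two_bits + extra_bit)
                                      : two_bits;
}
```

With the Phase 3 anchor (base_q_idx=46) the lossless helper returns 0, so the BBB keyframe takes the normal tx_mode path.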
Clause 10 — Frame flags + reference_mode + interpolation_filter
if (!picture->pic_fields.bits.frame_type) /* frame_type == 0 means keyframe, as in the VP9 bitstream */
frame.flags |= V4L2_VP9_FRAME_FLAG_KEY_FRAME;
if (picture->pic_fields.bits.show_frame) frame.flags |= V4L2_VP9_FRAME_FLAG_SHOW_FRAME;
if (picture->pic_fields.bits.error_resilient_mode) frame.flags |= V4L2_VP9_FRAME_FLAG_ERROR_RESILIENT;
if (picture->pic_fields.bits.intra_only) frame.flags |= V4L2_VP9_FRAME_FLAG_INTRA_ONLY;
if (picture->pic_fields.bits.allow_high_precision_mv)
frame.flags |= V4L2_VP9_FRAME_FLAG_ALLOW_HIGH_PREC_MV;
if (picture->pic_fields.bits.refresh_frame_context)
frame.flags |= V4L2_VP9_FRAME_FLAG_REFRESH_FRAME_CTX;
if (picture->pic_fields.bits.frame_parallel_decoding_mode)
frame.flags |= V4L2_VP9_FRAME_FLAG_PARALLEL_DEC_MODE;
if (picture->pic_fields.bits.subsampling_x) frame.flags |= V4L2_VP9_FRAME_FLAG_X_SUBSAMPLING;
if (picture->pic_fields.bits.subsampling_y) frame.flags |= V4L2_VP9_FRAME_FLAG_Y_SUBSAMPLING;
/* COLOR_RANGE_FULL_SWING: VAAPI doesn't expose; leave clear (BT.709 limited for BBB). */
/* reset_frame_context: FFmpeg uses (resetctx > 0 ? resetctx - 1 : 0).
* VAAPI's pic_fields.bits.reset_frame_context is 2 bits (0..3).
* V4L2 enum is 0..2. The off-by-one arises because the VP9 spec
* treats both 0 and 1 as "no reset" (2 and 3 are the two reset
* variants), while the kernel enum has a single "none" value.
* Follow FFmpeg's mapping verbatim: */
frame.reset_frame_context =
picture->pic_fields.bits.reset_frame_context > 0
? picture->pic_fields.bits.reset_frame_context - 1
: 0;
/* interpolation_filter: FFmpeg uses (filtermode ^ (filtermode <= 1)).
* VAAPI's mcomp_filter_type is 3 bits (0..7); kernel enum is 0..4.
* The XOR remap aligns FFmpeg's internal filter_mode enum to V4L2's. */
frame.interpolation_filter =
picture->pic_fields.bits.mcomp_filter_type ^
(picture->pic_fields.bits.mcomp_filter_type <= 1);
/* reference_mode: comes from compressed-header parse (NOT VAAPI).
* Read from compressed_hdr's parsed state (see Clause 9). */
frame.reference_mode = compressed_hdr_reference_mode; /* state from Clause 9 */
Anchor (Phase 3 verbatim): keyframe reset_frame_context=0, interpolation_filter=0. Open discrepancy: with VAAPI's mcomp_filter_type=0 the remap gives 0 ^ (0 <= 1) = 0 ^ 1 = 1, not the anchored 0. Phase 5 must verify the XOR remap empirically against the keyframe bytes.
Phase 5 review point: the FFmpeg-inferred mappings for reset_frame_context and interpolation_filter are tied to FFmpeg's internal enum order. VAAPI's enum order may differ. Phase 5 should empirically validate by decoding the reset_frame_context and interpolation_filter bytes of Phase 3's keyframe payload (recorded here as byte 144 and byte 149; re-derive both with offsetof() against the build host's UAPI header before relying on them) and cross-checking with VAAPI's pic_fields.bits for the same frame. If they disagree, the FFmpeg-inferred remap is wrong.
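Tabulating both remaps makes the review question concrete. The helpers below are a sketch with hypothetical names; they implement exactly the two expressions quoted from FFmpeg in this clause and make no claim about which enum order VAAPI actually delivers.

```c
#include <assert.h>
#include <stdint.h>

/* 2-bit bitstream value (0..3) to the 3-value kernel enum (0..2). */
static uint8_t remap_reset_frame_context(uint8_t spec_value)
{
    assert(spec_value <= 3);
    return spec_value > 0 ? (uint8_t)(spec_value - 1) : 0;
}

/*
 * XOR remap: swaps 0 and 1, leaves 2..4 untouched. The swap is its
 * own inverse, so applying it to a value that is already in the
 * target order flips it back to the source order.
 */
static uint8_t xor_remap_filter(uint8_t filter)
{
    assert(filter <= 4);
    return (uint8_t)(filter ^ (filter <= 1));
}
```

Because the XOR is an involution, the empirical check only has to establish which of the two orders mcomp_filter_type arrives in: if it is already in the kernel's order, the remap must be dropped rather than applied.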
Clause 11 — Final 2-control batched submission
struct v4l2_ext_control ctrls[2] = {
{ .id = V4L2_CID_STATELESS_VP9_FRAME,
.ptr = &frame, .size = sizeof frame },
{ .id = V4L2_CID_STATELESS_VP9_COMPRESSED_HDR,
.ptr = &compressed_hdr, .size = sizeof compressed_hdr },
};
rc = v4l2_set_controls(driver_data->video_fd,
surface_object->request_fd,
ctrls, 2);
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
return 0;
Mirrors iter3's Clause 10 with count=2 instead of count=1.
Clause 12 — Bitstream offsetting
Backend hands the kernel the FULL frame bitstream via surface_object->source_data + surface_object->source_size. The kernel uses frame.uncompressed_header_size (filled from picture->frame_header_length_in_bytes in Clause 4) as the start-of-compressed-header offset. The compressed header parser (Clause 9) reads [uncompressed_header_size, uncompressed_header_size + compressed_header_size) from the bitstream buffer.
const uint8_t *compressed_hdr_start =
surface_object->source_data + frame.uncompressed_header_size;
uint32_t compressed_hdr_len = frame.compressed_header_size;
vp9_fill_compressed_hdr(&compressed_hdr,
compressed_hdr_start,
compressed_hdr_len);
/* Same buffer pointer used by Clause 6 for uncompressed-header parse,
* but with offset 0 + length = uncompressed_header_size. */
vp9_parse_uncompressed_header_lf_quant(
surface_object->source_data, surface_object->source_size,
frame.uncompressed_header_size,
&frame.lf, &frame.quant);
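Both parsers index into the same buffer using sizes supplied by the VAAPI consumer, so a range check before either call is cheap insurance against reading past source_size. The helper below is a sketch of such a check; `struct header_spans` and `vp9_header_spans` are hypothetical names that do not appear in the plan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct header_spans {
    const uint8_t *uncompressed;   /* starts at offset 0 */
    size_t uncompressed_len;
    const uint8_t *compressed;     /* starts right after the uncompressed header */
    size_t compressed_len;
};

/*
 * Returns 0 and fills *out when both headers fit inside the buffer,
 * -1 otherwise. The sum is computed in size_t, so two 16-bit sizes
 * cannot overflow it.
 */
static int vp9_header_spans(const uint8_t *data, size_t size,
                            uint16_t uncompressed_size,
                            uint16_t compressed_size,
                            struct header_spans *out)
{
    assert(data != NULL && out != NULL);
    if ((size_t)uncompressed_size + (size_t)compressed_size > size)
        return -1;
    out->uncompressed = data;
    out->uncompressed_len = uncompressed_size;
    out->compressed = data + uncompressed_size;
    out->compressed_len = compressed_size;
    return 0;
}
```

A failed check would be a natural place to return an error from vp9_set_controls before any control is submitted.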
Patch shape (commits)
iter4 implements as 5 commits (mitigation B + iter3-style ABCD):
Commit Z — src/request.c: device-path enumeration (mitigation B)
Replace hardcoded /dev/video0 + /dev/media0 defaults with walk-and-pick-first-known-decoder:
static int find_codec_device(char video_path[32], char media_path[32])
{
static const char * const known_drivers[] = {
"rkvdec", "hantro-vpu", "cedrus", "sun4i_csi", NULL
};
char path[32];
struct v4l2_capability caps;
int fd, i;
const char * const *kd;
/* Walk /dev/video0..15 */
for (i = 0; i < 16; i++) {
snprintf(path, sizeof path, "/dev/video%d", i);
fd = open(path, O_RDWR | O_NONBLOCK);
if (fd < 0) continue;
if (ioctl(fd, VIDIOC_QUERYCAP, &caps) == 0) {
for (kd = known_drivers; *kd; kd++) {
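if (strcmp((char *)caps.driver, *kd) == 0) {
snprintf(video_path, 32, "%s", path);
close(fd);
/* Match media device by driver name. Assumes the helper
* returns 0 on success; a miss is reported to the caller
* instead of leaving media_path empty. */
return find_media_for_driver((char *)caps.driver, media_path);
}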
}
}
close(fd);
}
return -1;
}
/* In RequestInit: */
video_path = getenv("LIBVA_V4L2_REQUEST_VIDEO_PATH");
if (video_path == NULL) {
static char auto_video[32], auto_media[32];
if (find_codec_device(auto_video, auto_media) == 0) {
video_path = auto_video;
if (getenv("LIBVA_V4L2_REQUEST_MEDIA_PATH") == NULL)
media_path = auto_media;
request_log("auto-selected codec device: %s + %s\n",
video_path, media_path);
} else {
video_path = "/dev/video0"; /* keep old fallback for callers
we can't enumerate */
}
}
find_media_for_driver walks /dev/media0..15, opens each, calls MEDIA_IOC_DEVICE_INFO, returns the path whose driver field matches. Phase 3 baseline confirmed media0 ↔ rkvdec and media1 ↔ hantro-vpu on 7.0.
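A minimal sketch of the media-node lookup described above, using MEDIA_IOC_DEVICE_INFO from `<linux/media.h>`. The three-argument signature (with an explicit buffer length) and the 0 / -1 return convention are assumptions for illustration; the plan only calls the helper with two arguments.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/media.h>

/*
 * Walk /dev/media0..15 and return the first node whose driver name
 * matches. Returns 0 and writes the path on success; returns -1 and
 * leaves media_path as an empty string when nothing matches.
 */
static int find_media_for_driver(const char *driver, char *media_path,
                                 size_t path_len)
{
    struct media_device_info info;
    char path[32];
    int fd, i, match;

    assert(driver != NULL && media_path != NULL && path_len > 0);
    media_path[0] = '\0';

    for (i = 0; i < 16; i++) {
        snprintf(path, sizeof path, "/dev/media%d", i);
        fd = open(path, O_RDWR | O_NONBLOCK);
        if (fd < 0)
            continue;

        memset(&info, 0, sizeof info);
        match = ioctl(fd, MEDIA_IOC_DEVICE_INFO, &info) == 0 &&
                strncmp(info.driver, driver, sizeof info.driver) == 0;
        close(fd);

        if (match) {
            snprintf(media_path, path_len, "%s", path);
            return 0;
        }
    }
    return -1;
}
```

Closing each descriptor before deciding keeps the loop free of early-return leaks, and the empty-string default lets a caller log the result safely even on a miss.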
Predicted +35 LOC, 1 file modified. Build target after Commit Z: vainfo (no env override) lists the auto-selected decoder's profiles. Independent of VP9 work — can be tested + merged before Commit A.
End-user UX gap (documented, NOT fixed in iter4): backend opens ONE codec device at init. If user wants the OTHER decoder (e.g., default selects rkvdec but user wants hantro for MPEG-2/VP8), they still need env override. Aggregating BOTH decoders simultaneously requires a deeper refactor (multi-fd dispatch); out of iter4 scope, cross-cutting backlog item iter4-B1.
Commit A — src/config.c: VP9 enumeration + dispatch + entrypoints
3 sites mirroring iter3 commit A:
- RequestQueryConfigProfiles (after VP8 enumeration block from iter3): add VP9 enumeration block probing V4L2_PIX_FMT_VP9_FRAME against single + MPLANE OUTPUT formats. Adds VAProfileVP9Profile0. ~10 LOC.
- RequestCreateConfig (after VP8 case from iter3): add case VAProfileVP9Profile0: break; with comment block. ~5 LOC.
- RequestQueryConfigEntrypoints (line ~180): add case VAProfileVP9Profile0: to existing fall-through. ~1 LOC.
Predicted +16 LOC, 1 file modified. Build target after Commit A: vainfo (with env override or post-commit-Z auto-detect) lists VAProfileVP9Profile0 on rkvdec env.
Commit B — NEW src/vp9.c + src/vp9.h + src/meson.build integration
Net-new vp9.c implements vp9_set_controls() per Clauses 1-12 above.
Predicted ~580 LOC for vp9.c (50 LOC infrastructure + 80 LOC VPX rac + 50 LOC uncompressed-header partial parse + 180 LOC compressed-header parser + 50 LOC frame-fill (Clauses 4-5,7,10) + 30 LOC of submission/wrap). ~40 LOC for vp9.h. +2 lines meson.build.
3 files (2 new + 1 modified). Build target after Commit B: vp9.o compiles standalone, picture.c can't dispatch yet.
Commit C — src/picture.c + src/surface.h: dispatcher + buffer routing + union extension
5 sites:
- picture.c:34-37 include block: add #include "vp9.h".
- picture.c::codec_set_controls: add VP9 dispatch case calling vp9_set_controls. ~6 LOC.
- picture.c::codec_store_buffer: add VP9 inner cases for VAPictureParameterBufferType and VASliceParameterBufferType. ~14 LOC. (NO VAProbabilityBufferType for VP9; NO VAIQMatrixBufferType. Confirmed in Phase 2 B8.)
- picture.c::RequestBeginPicture: NO change predicted (VP9 doesn't have iter3-style iqmatrix_set flag; Picture/Slice always populated per frame by VAAPI consumer). Phase 2 B9 confirms.
- surface.h::object_surface::params union: add vp9 struct after vp8:
struct {
VADecPictureParameterBufferVP9 picture;
VASliceParameterBufferVP9 slice;
} vp9;
Predicted +26 LOC, 2 files modified. Build target after Commit C: backend builds clean; mpv-vaapi VP9 decode should engage end-to-end on rkvdec.
Commit D — fix-forward placeholder
Phase 2 B12 predicted no buffer.c changes (VP9's 3 buffer types — Picture, Slice, Data — already in iter3's allow-list). Per memory feedback_runtime_enumerates_allowlists.md, plan for fix-forward if Commit C runtime hits an allow-list miss; otherwise this commit slot stays empty.
Files touched summary
| File | New | Modified | LOC delta | Commit |
|---|---|---|---|---|
| src/request.c | | ✓ | +35 | Z |
| src/config.c | | ✓ | +16 | A |
| src/vp9.c | ✓ | | +580 | B |
| src/vp9.h | ✓ | | +40 | B |
| src/meson.build | | ✓ | +2 | B |
| src/picture.c | | ✓ | +20 | C |
| src/surface.h | | ✓ | +6 | C |
Total: ~699 LOC, 7 files (2 new + 5 modified). 4 commits (Z, A, B, C) + optional D. Notably bigger than iter3 (308 LOC) because of: device-path mitigation (35) + uncompressed-header partial parse (50) + compressed-header parser (180) + VPX rac (80).
Cross-cutting backlog (out of iter4 scope)
Items inherited + NEW from iter4:
- iter4-B1 (NEW) Backend opens ONE codec device at init (rkvdec OR hantro). Aggregating both for unified profile enumeration requires multi-fd dispatch refactor. Defer.
- iter4-B2 (NEW) ffmpeg-vaapi / mpv-vaapi "Could not create device" failure mode persists even with env override. Likely a vaapi-DRM render-node path issue separate from device-path. Investigate in Phase 6 if HW=SW byte-compare fails.
- iter4-Q6 (NEW) VAAPI per-segment seg_param[s] fields are EFFECTIVE quant scales; kernel wants ALT_Q absolute or delta. Mapping back is non-trivial; left zeros for BBB (segmentation disabled). Document as fidelity gap for non-BBB fixtures.
- iter4-COLOR_RANGE (NEW) VAAPI doesn't expose color_range; backend leaves V4L2_VP9_FRAME_FLAG_COLOR_RANGE_FULL_SWING clear (BT.709 limited). Wrong for full-range JPEG-encoded VP9.
- B5/B6 mpeg2 vbv polish + h265 SPS bitstream parse (carried from iter1+iter2).
- L3 vaDeriveImage cache-stale on RK3399 — workaround: DMA-BUF GL only.
Phase 5 review prep
Submitting this plan for second-model review (sonnet-architect). Key questions for the reviewer (per memory feedback_review_empirical_over_theoretical.md Direction 2 — empirical-over-theoretical in BOTH directions):
1. Uncompressed-header parser correctness (Clause 6): empirically decode the first ~200 bytes of the bbb_720p10s_vp9.webm keyframe and confirm lf.ref_deltas={1,0,-1,-1}, lf.mode_deltas={0,0}, lf.flags=3, quant.base_q_idx=46 are the correct parse results, not just the kernel-direct's pre-formatted output. If the spec says the bits encode something different, the parser is wrong even if kernel-direct happens to match.
2. reset_frame_context and interpolation_filter remap (Clause 10): empirically extract these bytes from Phase 3 strace payload and cross-check FFmpeg's XOR/-1 remap against the bytes' literal interpretation as VP9 spec enums.
3. Compile-time size assertions (Clause 3): are 168/2040 stable across kernel UAPI versions, or will a 7.1+ kernel grow them again? If unstable, replace with a runtime size assertion via VIDIOC_QUERY_EXT_CTRL + flags & V4L2_CTRL_FLAG_DYNAMIC_ARRAY. Phase 5 reviewer call.
4. Per-segment mapping (Clause 7): BBB doesn't exercise segmentation. For non-BBB segmentation-enabled fixtures (out of iter4 scope), is the planned seg_param[s].luma_ac_quant_scale → feature_data[s][ALT_Q] mapping fundamentally wrong (effective scale vs delta), or just lossy? Document the gap clearly.
5. Test compile field availability: per Direction 2, every VAAPI field-name reference in this plan should be gcc -c test-compiled before Phase 6. Reviewer should verify the access list in Clauses 4, 5, 7, 10.
6. Mitigation B regression risk: request.c is shared with all 5 already-shipping codecs. Could the walk-and-pick-first logic regress any existing test fixture if env vars happen to be unset by accident? Phase 5 should suggest a safety knob (e.g., LIBVA_V4L2_REQUEST_NO_AUTODETECT=1 to force old /dev/video0 behavior).
7. Lossless flag mapping (Clause 9): is VAAPI's pic_fields.bits.lossless_flag set the same way as FFmpeg's s->s.h.lossless? VAAPI comment says "LosslessFlag = base_qindex == 0 && y_dc_delta_q == 0 && uv_dc_delta_q == 0 && uv_ac_delta_q == 0"; check that the semantics align.
Phase 1 criteria → Phase 4 plan trace
| Criterion | Plan addresses |
|---|---|
| 1. vainfo enumerates VP9Profile0 | Commit Z (device-path) + Commit A (RequestQueryConfigProfiles enumeration block) |
| 2. vaCreateConfig SUCCESS | Commit A — RequestCreateConfig case + RequestQueryConfigEntrypoints |
| 3. ffmpeg-vaapi VP9 exit 0 | Commits Z+A+B+C end-to-end; Clauses 1+4+5+11 + parsers |
| 4. mpv VP9 HW=SW byte-identical | Commits Z+A+B+C decode correctness + Phase 3 SW PNGs as Phase 7 anchor; engagement via mpv -v log per memory feedback_hw_decode_engagement_check.md |
| 5. 4-codec regression | Commit Z restores baseline (mitigation B); Commits A+B+C add new VP9 path purely additively (no shared-state mutation) |
Substrate state at Phase 4 close
- Phase 0+1+2+3 commits at gitea (9a71dbf, 2651e4c + 56abe3d ID-correction, 56abe3d).
- Fork at iter3 tip e1aca9c on noether; Phase 6 patches will land here.
- All Phase 3 anchors captured + preserved on fresnel /tmp/iter4_phase3/ and noether:~/src/fresnel-fourier/iter4_phase3.tgz.
- Memory rules carry forward; new reference_fresnel_kernel_substrate.md covers post-besser substrate.
- Phase 4 plan ready for sonnet-architect review (Phase 5).