Second-model review by sonnet-architect found 4 Critical bugs in
Phase 4 plan, all verified empirically by author before incorporation
per memory feedback_review_empirical_over_theoretical Direction 2.
Amendments applied in-place to phase4_iter3_plan.md +
phase2_iter3_situation.md.
Critical findings:
C1 first_part_header_bits = 0 was claimed cosmetic; actually
UNSAFE. hantro_g1_vp8_dec.c:260 + rockchip_vpu2_hw_vp8_dec.c:372
both read this field unconditionally to compute the macroblock
DMA offset. Setting 0 would place hardware at wrong DMA offset
for ALL macroblock data → garbage decode.
Fix: frame.first_part_header_bits = slice->macroblock_offset
(verified by source identity — vaapi_vp8.c:204 and
v4l2_request_vp8.c:83 use byte-identical formulas).
C2 first_part_size = slice->partition_size[0] was wrong; VAAPI's
partition_size[0] is the REMAINING bytes after parsing
(vaapi_vp8.c:209 confirms; va_dec_vp8.h:193-196 spec confirms).
Kernel needs the TOTAL control partition size.
Fix: frame.first_part_size = slice->partition_size[0] +
((macroblock_offset + 7) / 8)
Phase 3 keyframe numerics confirm: 21923 + 819 = 22742 ✓.
C3 VAProbabilityDataBufferType does not exist as a buffer-type
enum; it's the struct name. The actual enum constant is
VAProbabilityBufferType (= 13 per va.h:2058). Switch case
using the wrong identifier would have failed Phase 6 compile.
Fix: replace globally in phase2 + phase4 docs.
C4 (s8) cast undefined in userspace. Kernel has 's8' typedef in
linux/types.h (kernel-internal). UAPI exposes '__s8' (double-
underscore). Userspace portable cast is int8_t from <stdint.h>.
Fix: replace (s8) with (int8_t) in Clauses 6+7.
Suggested:
S3 Clause 8 comment was factually wrong: hantro_vp8.c::
hantro_vp8_prob_update reads coeff_probs unconditionally;
there is NO default-table fallback. If probability_set==false,
decode produces garbage. Practical risk low (FFmpeg vaapi_vp8.c
always sends VAProbabilityBufferType per frame), but corrected
comment + added assert(probability_set) runtime guard for
immediate Phase 6 surfacing.
Plus 5 minor S/Q items documented; non-blocking for iter3.
Author's 7 review questions all answered directly in the review:
Q1 quantization derivation: correct for typical content
Q2 first_part_header_bits=0 safety: UNSAFE → C1
Q3 num_dct_parts off-by-one: confirmed correct
Q4 field availability: 2 compile failures found (C3 + C4)
Q5 quant_update[s] semantics: signed delta confirmed
Q6 SHOW_FRAME unconditional: safe for BBB scope
Q7 buffer order independence: confirmed
Estimated saving: 1 Phase 6 → Phase 4 loopback + 2 Phase 6 fix-
forward commits. Review pass is the right path forward per memory
rule "Reviews are never skippable" — empty-review value =
empirical-verification value, regardless of finding count.
Refs:
phase4_iter3_plan.md (amended in-place; Phase 5 amendments
section appended)
phase2_iter3_situation.md (amended C3 globally)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
24 KiB
Iteration 3 — Phase 4 (plan)
Locks the iter3 patch shape against the verbatim Phase 3 cross-validator payload and the kernel UAPI + VAAPI + FFmpeg references read in Phase 2. Plan structure mirrors iter2's 10-clause template (phase4_iter2_plan.md).
Phase 3 baseline at phase3_iter3_baseline.md (commit fd3fce8) supplies the byte-level anchors. Phase 2 situation analysis at phase2_iter3_situation.md (commit 898544a) supplies the bug list + contract surface read.
Contract clauses
Clause 1 — Submission shape (per-frame)
ONE batched VIDIOC_S_EXT_CTRLS per frame, bound to the surface's permanent request_fd. Single control. No init-time device-wide menus (VP8 has no DECODE_MODE/START_CODE — Phase 0 V4L2 inventory + FFmpeg ref + Phase 3 strace all confirm).
struct v4l2_ext_control ctrls[1] = {
{
.id = V4L2_CID_STATELESS_VP8_FRAME, /* 0xa409c8 */
.ptr = &frame, /* &v4l2_ctrl_vp8_frame */
.size = sizeof frame, /* 1232 bytes */
},
};
rc = v4l2_set_controls(driver_data->video_fd,
surface_object->request_fd,
ctrls, 1);
ctrl_class not explicitly set in the call; v4l2_set_controls (src/v4l2.c) wraps it with which=V4L2_CTRL_WHICH_REQUEST_VAL. Phase 3 strace verified ctrl_class=0xf010000 (V4L2_CTRL_CLASS_CODEC_STATELESS) is what the kernel sees, matching iter1+iter2 patterns.
Anchor: Phase 3 baseline § Step 3.3 submission shape table.
Clause 2 — Local struct allocation + zero-init
int vp8_set_controls(struct request_data *driver_data,
struct object_context *context,
struct object_surface *surface_object)
{
VAPictureParameterBufferVP8 *picture =
&surface_object->params.vp8.picture;
VASliceParameterBufferVP8 *slice =
&surface_object->params.vp8.slice;
VAIQMatrixBufferVP8 *iqmatrix =
&surface_object->params.vp8.iqmatrix;
VAProbabilityDataBufferVP8 *probability =
&surface_object->params.vp8.probability;
bool iqmatrix_set = surface_object->params.vp8.iqmatrix_set;
bool probability_set = surface_object->params.vp8.probability_set;
struct v4l2_ctrl_vp8_frame frame;
memset(&frame, 0, sizeof frame);
/* All padding fields must be zero per kernel contract; memset
* covers them. C99 designated initializers in FFmpeg ref achieve
* the same; explicit memset matches iter1 mpeg2.c style. */
...
}
Mirror iter1 mpeg2.c::mpeg2_set_controls (src/mpeg2.c:95-118) opening: extract VAAPI struct pointers + flag readouts, allocate kernel struct on stack, memset zero, populate fields below.
Clause 3 — Frame geometry + version + per-frame scalars
frame.width = picture->frame_width;
frame.height = picture->frame_height;
frame.horizontal_scale = 0; /* not exposed by VAAPI; FFmpeg also hardcodes 0 */
frame.vertical_scale = 0;
frame.version = picture->pic_fields.bits.version;
frame.prob_skip_false = picture->prob_skip_false;
frame.prob_intra = picture->prob_intra;
frame.prob_last = picture->prob_last;
frame.prob_gf = picture->prob_gf;
frame.num_dct_parts = slice->num_of_partitions - 1; /* off-by-one per Phase 3 Q2 */
Anchors:
- VAAPI
va_dec_vp8.h:71-160(PictureParameterBuffer) - Phase 3 frame-1 keyframe: width=1280, height=720, version=0, prob_skip=255 (matches mpv/FFmpeg parser output)
- Phase 3 Q2 resolution:
num_dct_parts = num_of_partitions - 1(BBB inter: VAAPI=2 → kernel=1)
Clause 4 — DPB timestamp resolution (3 references)
struct object_surface *last_ref =
SURFACE(driver_data, picture->last_ref_frame);
struct object_surface *golden_ref =
SURFACE(driver_data, picture->golden_ref_frame);
struct object_surface *alt_ref =
SURFACE(driver_data, picture->alt_ref_frame);
if (last_ref != NULL)
frame.last_frame_ts = v4l2_timeval_to_ns(&last_ref->timestamp);
if (golden_ref != NULL)
frame.golden_frame_ts = v4l2_timeval_to_ns(&golden_ref->timestamp);
if (alt_ref != NULL)
frame.alt_frame_ts = v4l2_timeval_to_ns(&alt_ref->timestamp);
Mirrors iter1 mpeg2.c::mpeg2_set_controls forward/backward ref pattern (src/mpeg2.c:146-156). For VA_INVALID_SURFACE (key frame: all three refs invalid), SURFACE() returns NULL and timestamps stay 0 (memset default).
Anchors:
- Phase 3 Q3+Q6 resolution: keyframe writes
last_ts=golden_ts=alt_ts=0; inter frames write all non-zero - Phase 3 frame-2 inter:
last_ts=5000, golden_ts=11000, alt_ts=11000— confirms timestamps are arbitrary (not nanosecond-real-time, just stable VASurfaceID-derived markers)
Clause 5 — Loop filter mapping
for (i = 0; i < 4; i++) {
frame.lf.ref_frm_delta[i] = picture->loop_filter_deltas_ref_frame[i];
frame.lf.mb_mode_delta[i] = picture->loop_filter_deltas_mode[i];
}
frame.lf.sharpness_level = picture->pic_fields.bits.sharpness_level;
frame.lf.level = picture->loop_filter_level[0]; /* base segment */
if (picture->pic_fields.bits.loop_filter_adj_enable)
frame.lf.flags |= V4L2_VP8_LF_ADJ_ENABLE;
if (picture->pic_fields.bits.mode_ref_lf_delta_update)
frame.lf.flags |= V4L2_VP8_LF_DELTA_UPDATE;
if (picture->pic_fields.bits.filter_type)
frame.lf.flags |= V4L2_VP8_LF_FILTER_TYPE_SIMPLE;
Anchors:
- Phase 3 keyframe:
ref_frm_delta=(2,0,-2,-2),mb_mode_delta=(4,-2,2,4),sharp=0,level=1,flags=0x03 (ADJ_ENABLE|DELTA_UPDATE) - Phase 3 inter: same deltas (BBB-stable),
level=15,flags=0x01 (ADJ_ENABLE only)— DELTA_UPDATE only set on keyframes - Phase 3 Q5: FILTER_TYPE_SIMPLE bit not set on any captured frame; VAAPI's
filter_type=0for BBB → flag clear ✓
Clause 6 — Quantization base + delta derivation
VAAPI conveys per-segment effective Q indices in iqmatrix->quantization_index[4][6] (segment × component-index). Components: yac(0), ydc(1), y2dc(2), y2ac(3), uvdc(4), uvac(5).
Kernel takes a base set + per-component deltas (in quant) + per-segment overrides (in segment.quant_update[]).
For segment 0 (always present):
if (iqmatrix_set) {
frame.quant.y_ac_qi = iqmatrix->quantization_index[0][0];
frame.quant.y_dc_delta = (int8_t)(iqmatrix->quantization_index[0][1] -
iqmatrix->quantization_index[0][0]);
frame.quant.y2_dc_delta = (int8_t)(iqmatrix->quantization_index[0][2] -
iqmatrix->quantization_index[0][0]);
frame.quant.y2_ac_delta = (int8_t)(iqmatrix->quantization_index[0][3] -
iqmatrix->quantization_index[0][0]);
frame.quant.uv_dc_delta = (int8_t)(iqmatrix->quantization_index[0][4] -
iqmatrix->quantization_index[0][0]);
frame.quant.uv_ac_delta = (int8_t)(iqmatrix->quantization_index[0][5] -
iqmatrix->quantization_index[0][0]);
}
/* if iqmatrix_set==false: frame.quant stays zero from memset; safe
* default for fixtures with implicit-zero deltas (most VP8 streams,
* including BBB).
*
* Phase 5 C4: cast is (int8_t), NOT (s8) — the latter is a kernel-
* internal typedef from <linux/types.h> not exposed to userspace.
* Userspace UAPI exposes __s8 (double-underscore). int8_t from
* <stdint.h> is the portable userspace equivalent. */
For segments 1..3 (only used when segmentation enabled):
if (picture->pic_fields.bits.segmentation_enabled && iqmatrix_set) {
for (i = 1; i < 4; i++)
frame.segment.quant_update[i] = (int8_t)
(iqmatrix->quantization_index[i][0] -
iqmatrix->quantization_index[0][0]);
}
/* For BBB (segmentation disabled), all quant_update[] stay zero
* from memset, matching Phase 3 verbatim payload. */
Anchors:
- Phase 3 keyframe:
y_ac_qi=8, all deltas=0→ BBB has uniform Q across components for segment 0 - Phase 3 inter:
y_ac_qi=122, all deltas=0(frame 2);y_ac_qi=109, all deltas=0(frame 4) — all-zero deltas confirmed throughout BBB - Predicted VAAPI behavior:
iqmatrix->quantization_index[0][0..5]should all equal the same value (e.g. all 8 for keyframe); subtraction yields zero → matches Phase 3 verbatim
Clause 7 — Segment fields
for (i = 0; i < 3; i++)
frame.segment.segment_probs[i] = picture->mb_segment_tree_probs[i];
if (picture->pic_fields.bits.segmentation_enabled)
frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_ENABLED;
if (picture->pic_fields.bits.update_mb_segmentation_map)
frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_UPDATE_MAP;
if (picture->pic_fields.bits.update_segment_feature_data)
frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_UPDATE_FEATURE_DATA;
/* DELTA_VALUE_MODE: VAAPI doesn't expose abs_delta; FFmpeg sets
* unconditionally per !s->segmentation.absolute_vals (default).
* Match FFmpeg pattern: set unconditionally. Kernel ignores when
* ENABLED bit is clear (BBB case). */
frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_DELTA_VALUE_MODE;
/* segment.lf_update[] populated only when segmentation enabled */
if (picture->pic_fields.bits.segmentation_enabled) {
for (i = 0; i < 4; i++)
frame.segment.lf_update[i] = (int8_t)
(picture->loop_filter_level[i] -
picture->loop_filter_level[0]);
}
Anchors:
- Phase 3 every captured frame:
segment.flags = 0x08 = DELTA_VALUE_MODE(BBB segmentation disabled, FFmpeg sets DVM unconditionally) - Phase 3 every captured frame:
segment_probs = (0,0,0),quant_update = (0,0,0,0),lf_update = (0,0,0,0)— all zero (BBB segmentation disabled)
Clause 8 — Entropy table mapping (3 sources)
VAAPI splits VP8 entropy across THREE buffers: PictureParameterBuffer (mode + mv probs), ProbabilityDataBuffer (coeff_probs), and IQMatrix (quantization). The libva backend assembles into kernel's single v4l2_vp8_entropy struct.
for (i = 0; i < 4; i++)
frame.entropy.y_mode_probs[i] = picture->y_mode_probs[i];
for (i = 0; i < 3; i++)
frame.entropy.uv_mode_probs[i] = picture->uv_mode_probs[i];
for (i = 0; i < 2; i++)
for (j = 0; j < 19; j++)
frame.entropy.mv_probs[i][j] = picture->mv_probs[i][j];
if (probability_set) {
/* VAAPI's [4][8][3][11] layout matches kernel's exactly; direct memcpy */
memcpy(frame.entropy.coeff_probs,
probability->dct_coeff_probs,
sizeof frame.entropy.coeff_probs);
}
/* Phase 5 S3 correction: if probability_set==false, decode produces
* GARBAGE. hantro_vp8.c::hantro_vp8_prob_update reads
* entropy.coeff_probs unconditionally; there is NO default-table
* fallback in the kernel driver. Practical risk is low — FFmpeg
* vaapi_vp8.c:146-148 always sends VAProbabilityBufferType per
* frame, and mpv-vaapi VP8 uses ff_vp8_vaapi_hwaccel under the hood.
* Add an assert() so any consumer violation surfaces immediately. */
assert(probability_set); /* runtime guard per Phase 5 S3 */
Anchors:
- Phase 3 keyframe:
y_mode_probs=(145,156,163,128)(FFmpeg keyframe const),uv_mode_probs=(142,114,183)(FFmpeg keyframe const). When VAAPI's mpv consumer also writes these constants for keyframes, byte-equality holds. - Phase 3 inter frame 2:
y_mode_probs=(3,1,128,1),uv_mode_probs=(162,101,204)— parser-derived, varies per frame. - Phase 3 entropy hash (sha1-16 prefix): keyframe
8b2fdae200eb193f, inter changes — confirms entropy state propagates per-frame; iter3 backend must wire the per-frame ProbabilityDataBuffer through to coeff_probs.
Clause 9 — Coder state + first-partition fields + flags
frame.coder_state.range = picture->bool_coder_ctx.range;
frame.coder_state.value = picture->bool_coder_ctx.value;
frame.coder_state.bit_count = picture->bool_coder_ctx.count;
/* coder_state.padding stays zero from memset */
/* Phase 5 C1+C2: macroblock_offset IS first_part_header_bits by
* source identity (vaapi_vp8.c:204 and v4l2_request_vp8.c:83 set
* the field with the identical 3-input formula
* 8 * (input - data) - bit_count - 8
* — VAAPI just renames it). Kernel hantro_g1_vp8_dec.c:260 and
* rockchip_vpu2_hw_vp8_dec.c:372 read it unconditionally to compute
* the macroblock-data DMA offset; leaving 0 places hardware at the
* wrong offset → garbage decode. NOT cosmetic.
*
* Phase 5 C2: VAAPI's slice->partition_size[0] is the REMAINING
* bytes of the control partition AFTER ceil(macroblock_offset/8)
* bytes already consumed by the application's bitstream parser
* (vaapi_vp8.c:209 confirms; va_dec_vp8.h:193-196 spec comment
* confirms). Kernel needs the TOTAL control partition size, so
* recover by adding back the header bytes: */
frame.first_part_header_bits = slice->macroblock_offset;
frame.first_part_size = slice->partition_size[0] +
((uint32_t)slice->macroblock_offset + 7) / 8;
/* Phase 3 keyframe verbatim cross-check:
* first_part_size = 22742 bytes
* first_part_header_bits = 6550 bits → ceil(6550/8) = 819 bytes
* VAAPI partition_size[0] = 22742 - 819 = 21923 (predicted)
* reconstructed = 21923 + 819 = 22742 ✓ */
for (i = 0; i < 8; i++)
frame.dct_part_sizes[i] = slice->partition_size[i + 1];
if (!picture->pic_fields.bits.key_frame)
frame.flags |= V4L2_VP8_FRAME_FLAG_KEY_FRAME;
/* VAAPI inverts: key_frame=0 means it IS a keyframe per VP8 spec */
frame.flags |= V4L2_VP8_FRAME_FLAG_SHOW_FRAME;
/* Force unconditional per Phase 3 Q4: BBB has no alt-ref invisible
* frames in iter3 scope; document as known fidelity gap for future
* iters covering invisible-frame streams. */
if (picture->pic_fields.bits.mb_no_coeff_skip)
frame.flags |= V4L2_VP8_FRAME_FLAG_MB_NO_SKIP_COEFF;
if (picture->pic_fields.bits.sign_bias_golden)
frame.flags |= V4L2_VP8_FRAME_FLAG_SIGN_BIAS_GOLDEN;
if (picture->pic_fields.bits.sign_bias_alternate)
frame.flags |= V4L2_VP8_FRAME_FLAG_SIGN_BIAS_ALT;
/* EXPERIMENTAL bit (0x02) and bit 0x40 NOT set. Phase 3 baseline
* notes ffmpeg-v4l2-request-git sets these for unclear reasons;
* mainline UAPI defines neither for our use case (BBB profile=0).
* Kernel hantro_vp8.c only inspects KEY_FRAME bit. */
Anchors:
- Phase 3 keyframe:
coder_state=(248,133,2),first_part_size=22742,dct_part_sizes=(277872,0,...,0)(277872is huge — notepartition_size[]may overflow u32 for very-large partitions; verified BBB partitions fit u32 by full Phase 3 capture; expected within u32 range for 1280×720 normal-quality content) - Phase 3 keyframe:
flags=0x0d = KEY|SHOW|NOSKIP— libva backend produces 0x0d ✓ - Phase 3 inter:
flags=0x66(FFmpeg) — libva backend produces e.g. 0x04 (SHOW only) or 0x24 (SHOW|SBA) depending on VAAPI sign_bias bits. Phase 7 byte-compare uses field-level, not whole-flags, equality.
Clause 10 — Final batched submission
struct v4l2_ext_control ctrls[1] = {
{
.id = V4L2_CID_STATELESS_VP8_FRAME,
.ptr = &frame,
.size = sizeof frame,
},
};
rc = v4l2_set_controls(driver_data->video_fd,
surface_object->request_fd,
ctrls, 1);
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
return 0;
Returns the dispatcher contract per picture.c::codec_set_controls (negative rc on failure → caller wraps with VA_STATUS_ERROR_OPERATION_FAILED).
Patch shape (commits)
iter3 implements as 4 commits (mirrors iter1 ABCD pattern):
Commit A — src/config.c: enumeration + dispatch + entrypoints
3 sites:
RequestQueryConfigProfiles(after HEVC enumeration block, line ~160): add VP8 enumeration block probingV4L2_PIX_FMT_VP8_FRAMEagainst single + MPLANE OUTPUT formats.RequestCreateConfig(after iter2's HEVCMain case, line ~75): addcase VAProfileVP8Version0_3: break;with comment block matching iter1+iter2 style.RequestQueryConfigEntrypoints(line ~180): addcase VAProfileVP8Version0_3:to existing fall-through case list.
Predicted +16 LOC, 1 file modified. Build target after Commit A: vainfo lists VAProfileVP8Version0_3 on hantro env binding (criterion 1) but vaCreateConfig would fail at later use because no codec dispatcher.
Commit B — NEW src/vp8.c + src/vp8.h + src/meson.build integration
Net-new files implementing vp8_set_controls() per Clauses 1-10 above. Plus meson.build sources/headers list updates to compile + link the new module.
Predicted ~200 LOC for vp8.c (the 10 clauses), ~40 LOC for vp8.h, +2 lines meson.build. 3 files (2 new + 1 modified). Build target after Commit B: vp8.o compiles standalone, but picture.c can't dispatch yet.
Commit C — src/picture.c + src/surface.h: dispatcher + per-frame buffer routing + union extension + per-frame reset
5 sites:
picture.c:34-36include block: add#include "vp8.h".picture.c::codec_set_controls(line ~218 after HEVCMain case): add VP8 dispatch case callingvp8_set_controls.picture.c::codec_store_buffer(lines 85-179): add VP8 cases for VAPictureParameterBufferType, VASliceParameterBufferType, VAIQMatrixBufferType + NEW outer caseVAProbabilityBufferTypewith VP8 inner case.picture.c::RequestBeginPicture(line ~300): add 2 reset lines forparams.vp8.iqmatrix_set = false;andparams.vp8.probability_set = false;.surface.h::object_surface::paramsunion (line ~112): insert vp8 struct after h265 (~10 LOC).
Predicted +50 LOC, 2 files modified.
Build target after Commit C: backend builds clean. mpv-vaapi VP8 decode should work end-to-end. Expected to satisfy criteria 1, 2, 3 of Phase 1.
Commit D — src/v4l2.c (CONDITIONAL — only if v4l2_set_controls doesn't already fall through cleanly)
Phase 2 source-read showed v4l2_set_controls is fourcc-agnostic. Likely no Commit D needed; placeholder commit slot for fix-forward if Phase 6 build/runtime surfaces an unexpected ioctl plumbing issue (mirrors iter1 Commit D fix-forward).
Files touched summary
| File | New | Modified | LOC delta | Commit |
|---|---|---|---|---|
src/config.c |
✓ | +16 | A | |
src/vp8.c |
✓ | +200 | B | |
src/vp8.h |
✓ | +40 | B | |
src/meson.build |
✓ | +2 | B | |
src/picture.c |
✓ | +40 | C | |
src/surface.h |
✓ | +10 | C |
Total: ~308 LOC, 6 files (2 new + 4 modified). 3 commits (A, B, C).
Cross-cutting backlog (out of iter3 scope)
Items inherited from iter1+iter2 close, NOT touched in iter3:
- B1 V4L2 device-discovery (per-boot device numbering shuffle) — workaround: re-verify per-boot via v4l2-ctl --info.
- B3 picture.c BeginPicture profile-aware reset — currently writes byte 240 of union (h264.matrix_set) regardless of profile; iter3 adds vp8 resets but doesn't refactor the cross-profile cleanup.
- B4 context.c log suppression for unsupported codec controls.
- B5 mpeg2 vbv_buffer_size polish (iter1 S2 finding).
- B6 h265 SPS bitstream-parse fidelity gap.
- L3 vaDeriveImage cache-stale on RK3399 — workaround: DMA-BUF GL only (memory rule).
Plus iter3 NEW item:
- iter3-Q1
first_part_header_bitsderivation gap. Plan leaves 0; Phase 7 byte-compare may reveal kernel sensitivity. If so, future iter or Phase 4 loopback derives from VAAPIslice->macroblock_offset+ entropy-header subtraction. - iter3-flags ffmpeg-v4l2-request-git sets undocumented bit 0x40 in flags field; libva backend won't replicate. Phase 7 byte-compare will use FIELD-LEVEL equality, not whole-
flagsequality.
Phase 5 review prep
Submitting this plan for second-model review (sonnet-architect). Key questions for the reviewer:
-
Quantization derivation: is the "compute deltas as iqmatrix[0][N+1] - iqmatrix[0][0]" formula correct for VAAPI's representation? Or does VAAPI store the deltas already (then we just copy-not-subtract)? Phase 3 verbatim shows all-zero deltas for BBB, but BBB doesn't exercise this code path meaningfully — non-zero deltas need a different fixture, not in iter3 scope.
-
Per-segment quant_update[1..3] derivation: when segmentation enabled, is the kernel's
quant_update[s]an absolute Q index (clipped to 0..127) or a signed delta from segment 0's base? VP8 spec says delta indelta_value_mode; libva backend currently writesiqmatrix[s][0] - iqmatrix[0][0]which is signed delta. Phase 5 should empirically verify against a segmentation-enabled test fixture if one is available, OR document as known fidelity gap if not. -
first_part_header_bits = 0fallback: is this actually safe for hantro? Reviewer should grep kernelhantro_vp8.c+rkvdec_vp8.cfor any read offirst_part_header_bitsin the decode hot path. If used, the gap blocks correctness; if unused (re-parsed from bitstream), the gap is cosmetic. -
Probability buffer ordering: VAAPI's mpv consumer might send VAProbabilityBufferType BEFORE VAPictureParameterBufferType in the same frame's RenderPicture call, OR after. The libva backend's
codec_store_bufferis order-independent (each call updates a different surface field), so order shouldn't matter — but flag this for review confirmation. -
Endianness assumption: VAAPI structs use host-byte-order; kernel UAPI structs use host-byte-order; both little-endian on aarch64. No byte-swap needed — confirm review.
-
Struct size oversight: Phase 2 implicitly assumed ~400 bytes; Phase 3 corrected to 1232. Phase 4 plan uses
sizeof framewhich is compile-time correct, but thecontrols[0].sizefield MUST equal that. Reviewer should verify the FFmpeg ref pattern:controls[0].size = sizeof(controls->frame)— yes, matches. -
Test compile field availability: per memory
feedback_review_empirical_over_theoretical.mdDirection 2, every VAAPI field-name reference in this plan should be verified viagcc -ctest-compile BEFORE Phase 6 starts. Reviewer should test-compile the field accesses listed in Clauses 3-9 (especiallypicture->pic_fields.bits.{...}which iter2 had a bug withuniform_spacing_flag).
Phase 1 criteria → Phase 4 plan trace
| Criterion | Plan addresses |
|---|---|
| 1. vainfo enumerates VP8Version0_3 | Commit A — RequestQueryConfigProfiles enumeration block |
| 2. vaCreateConfig SUCCESS | Commit A — RequestCreateConfig case + RequestQueryConfigEntrypoints |
| 3. ffmpeg-vaapi VP8 exit 0 | Commits A+B+C end-to-end; Clause 1 + Clauses 3-9 field mapping |
| 4. mpv DMA-BUF GL HW=SW byte-identical | Commits A+B+C decode correctness + Phase 3 SW JPEGs as Phase 7 anchor |
| 5. 3-codec regression | No regression risk — VP8 path is purely additive (NEW dispatcher case, no shared-state mutation in MPEG-2/HEVC/H.264 paths) |
Substrate state at Phase 4 close
- Phase 1+2+3 commits at gitea (
ea2413e,898544a,fd3fce8). - Fork at iter2 tip
8d71e20on noether; Phase 6 patches will land here. - All Phase 3 anchors captured + preserved on fresnel
/tmp/iter3_phase3/. - Memory rules carry forward;
feedback_fresnel_hostnameadded. - Phase 4 plan ready for sonnet-architect review (Phase 5).
Phase 5 amendments incorporated (post-review)
Sonnet-architect review (phase5_iter3_review.md) returned 4 Critical findings + 1 Suggested correction. All 4 Critical findings empirically verified by author before incorporation per memory feedback_review_empirical_over_theoretical.md Direction 2. Amendments applied in-place to this plan:
| ID | Site | Change | Verified |
|---|---|---|---|
| C1 | Clause 9 | frame.first_part_header_bits = slice->macroblock_offset; (kernel reads it unconditionally per hantro_g1_vp8_dec.c:260) |
✓ grep + source identity |
| C2 | Clause 9 | frame.first_part_size = slice->partition_size[0] + ((macroblock_offset+7)/8) (recover total partition size from VAAPI's post-parse remainder per vaapi_vp8.c:209) |
✓ Phase 3 numerics: 21923+819=22742 |
| C3 | Phase 2 B7 + Phase 4 Commit C | VAProbabilityDataBufferType → VAProbabilityBufferType (correct enum constant from va.h:2058; the "Data" suffix is the struct name not the enum) |
✓ grep |
| C4 | Clauses 6+7 | (s8) → (int8_t) (kernel-internal s8 not exposed to userspace; UAPI exposes __s8; portable userspace cast is int8_t from <stdint.h>) |
✓ grep /usr/include/asm-generic/int-ll64.h:20 |
| S3 | Clause 8 | Comment corrected (kernel has NO probability-table fallback); added assert(probability_set) runtime guard |
✓ source-read hantro_vp8.c:49-143 |
Without the Phase 5 review, iter3 Phase 6 would have:
- Failed first compile attempt (C3 + C4) — ~2 fix-forward commits required
- Then produced visible-but-wrong output (C1 + C2) — Phase 7 byte-compare would have diverged, triggering Phase 7 → Phase 4 loopback for hardware-DMA-offset bug
Estimated saving: 1 Phase 6 → Phase 4 loopback + 2 Phase 6 fix-forward commits. The review pass is the right path forward per memory rule "Reviews are never skippable."