fresnel-fourier/phase4_iter3_plan.md
claude-noether 2918dda2e0 iter3 Phase 4: plan — 10 contract clauses, ~308-LOC patch, 3 commits
Locks the iter3 patch shape against Phase 3 verbatim cross-validator
payload + Phase 2 contract surface. 10 contract clauses cite kernel
UAPI + VAAPI + FFmpeg ref + Phase 3 byte anchors throughout.

Patch shape (mirrors iter1 ABCD pattern):

  Commit A: src/config.c — enumeration block + CreateConfig case +
            QueryConfigEntrypoints case (3 sites, +16 LOC, 1 file).
            After: vainfo lists VP8Version0_3.
  Commit B: NEW src/vp8.c (~200 LOC) + NEW src/vp8.h (~40 LOC) +
            meson.build sources/headers entries (+2). 3 files
            (2 new + 1 modified).
            After: vp8.o compiles standalone.
  Commit C: src/picture.c — codec_set_controls dispatch +
            codec_store_buffer 4 buffer-type cases + outer
            VAProbabilityDataBufferType case + BeginPicture
            per-frame reset (4 sites, +40 LOC) + src/surface.h
            params.vp8 union member (+10 LOC). 2 files modified.
            After: end-to-end VP8 decode through libva backend.

Total: ~308 LOC, 6 files (2 new + 4 modified), 3 commits.

Contract clauses summary:

  1. Submission shape: single VIDIOC_S_EXT_CTRLS, count=1, which
     (strace label: ctrl_class)=V4L2_CTRL_WHICH_REQUEST_VAL
     (0xf010000), id=0xa409c8, size=1232 bytes
  2. Local struct alloc + zero-init (memset clears all padding)
  3. Frame geometry + version + per-frame scalars (off-by-one
     num_dct_parts = num_of_partitions - 1)
  4. DPB timestamp resolution (3 refs: last/golden/alt; 0-sentinel
     when SURFACE() returns NULL — mirrors iter1 mpeg2.c pattern)
  5. Loop filter mapping (6 fields + 3 flag bits)
  6. Quantization base + delta derivation (segment 0 = base via
     iqmatrix[0][0]; deltas = iqmatrix[0][N+1] - iqmatrix[0][0]
     signed; per-segment quant_update[1..3] only when segmentation
     enabled)
  7. Segment fields (segment_probs direct copy; flags assembled +
     DELTA_VALUE_MODE set unconditionally per FFmpeg pattern)
  8. Entropy table mapping — 3 VAAPI sources (Picture: y_mode +
     uv_mode + mv_probs; ProbabilityData: coeff_probs[4][8][3][11]
     direct memcpy; IQMatrix: quant)
  9. Coder state + first-partition fields + flags (6 mainline-
     documented bits only; bit 0x40 + EXPERIMENTAL NOT replicated
     vs ffmpeg-v4l2-request-git anomaly; first_part_header_bits=0
     fallback documented as known fidelity gap)
  10. Final batched submission via v4l2_set_controls

Phase 5 review questions queued (7 items): quantization derivation
correctness, per-segment quant_update semantics, first_part_header_
bits=0 safety, probability buffer ordering, endianness, struct size
sizeof correctness, field-availability test-compile per memory
feedback_review_empirical_over_theoretical Direction 2.

Cross-cutting backlog deferred (B1, B3, B4, B5, B6, L3 inherited;
iter3-Q1 first_part_header_bits + iter3-flags 0x40 anomaly NEW).

Refs:
  phase0_findings_iter3.md (Phase 1 lock)
  phase2_iter3_situation.md (Phase 2 contract surface)
  phase3_iter3_baseline.md (Phase 3 verbatim payload anchors)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:39:52 +00:00


Iteration 3 — Phase 4 (plan)

Locks the iter3 patch shape against the verbatim Phase 3 cross-validator payload and the kernel UAPI + VAAPI + FFmpeg references read in Phase 2. Plan structure mirrors iter2's 10-clause template (phase4_iter2_plan.md).

Phase 3 baseline at phase3_iter3_baseline.md (commit fd3fce8) supplies the byte-level anchors. Phase 2 situation analysis at phase2_iter3_situation.md (commit 898544a) supplies the bug list + contract surface read.

Contract clauses

Clause 1 — Submission shape (per-frame)

ONE batched VIDIOC_S_EXT_CTRLS per frame, bound to the surface's permanent request_fd. Single control. No init-time device-wide menus (VP8 has no DECODE_MODE/START_CODE — Phase 0 V4L2 inventory + FFmpeg ref + Phase 3 strace all confirm).

struct v4l2_ext_control ctrls[1] = {
    {
        .id = V4L2_CID_STATELESS_VP8_FRAME,    /* 0xa409c8 */
        .ptr = &frame,                          /* &v4l2_ctrl_vp8_frame */
        .size = sizeof frame,                   /* 1232 bytes */
    },
};

rc = v4l2_set_controls(driver_data->video_fd,
                       surface_object->request_fd,
                       ctrls, 1);

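ctrl_class is not set explicitly in the call; v4l2_set_controls (src/v4l2.c) fills the which field (a union with the legacy ctrl_class field) with V4L2_CTRL_WHICH_REQUEST_VAL. Phase 3 strace printed that field as ctrl_class=0xf010000, which is the value of V4L2_CTRL_WHICH_REQUEST_VAL, not of V4L2_CTRL_CLASS_CODEC_STATELESS (0xa40000, the class that control id 0xa409c8 belongs to). This is what the kernel sees, matching iter1+iter2 patterns.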

Anchor: Phase 3 baseline § Step 3.3 submission shape table.
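
The relationship between the control id, its control class, and the request selector can be checked with plain arithmetic. The following is a minimal sketch, assuming the constant values from the mainline UAPI headers; the SKETCH_/sketch_ names are local stand-ins rather than the real macros, so verify them against the installed linux/v4l2-controls.h and linux/videodev2.h.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of mainline UAPI values (assumed; check your headers). */
#define SKETCH_CTRL_CLASS_CODEC_STATELESS 0x00a40000u /* V4L2_CTRL_CLASS_CODEC_STATELESS */
#define SKETCH_CTRL_WHICH_REQUEST_VAL     0x0f010000u /* V4L2_CTRL_WHICH_REQUEST_VAL */

/* V4L2_CID_CODEC_STATELESS_BASE is the class value OR'ed with 0x900. */
static uint32_t sketch_codec_stateless_base(void)
{
    return SKETCH_CTRL_CLASS_CODEC_STATELESS | 0x900u;
}

/* V4L2_CID_STATELESS_VP8_FRAME sits at base + 200. */
static uint32_t sketch_vp8_frame_cid(void)
{
    return sketch_codec_stateless_base() + 200u;
}

/* Same masking as V4L2_CTRL_ID2CLASS: keep only the class bits of an id. */
static uint32_t sketch_id_to_class(uint32_t id)
{
    return id & 0x0fff0000u;
}
```

With these values the VP8 frame control id comes out as 0xa409c8, its class as 0xa40000, and the request selector stays a separate constant, 0xf010000.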

Clause 2 — Local struct allocation + zero-init

int vp8_set_controls(struct request_data *driver_data,
                     struct object_context *context,
                     struct object_surface *surface_object)
{
    VAPictureParameterBufferVP8 *picture =
        &surface_object->params.vp8.picture;
    VASliceParameterBufferVP8 *slice =
        &surface_object->params.vp8.slice;
    VAIQMatrixBufferVP8 *iqmatrix =
        &surface_object->params.vp8.iqmatrix;
    VAProbabilityDataBufferVP8 *probability =
        &surface_object->params.vp8.probability;
    bool iqmatrix_set = surface_object->params.vp8.iqmatrix_set;
    bool probability_set = surface_object->params.vp8.probability_set;

    struct v4l2_ctrl_vp8_frame frame;

    memset(&frame, 0, sizeof frame);
    /* All padding fields must be zero per kernel contract; memset
     * covers them. C99 designated initializers in FFmpeg ref achieve
     * the same; explicit memset matches iter1 mpeg2.c style. */
    ...
}

Mirror iter1 mpeg2.c::mpeg2_set_controls (src/mpeg2.c:95-118) opening: extract VAAPI struct pointers + flag readouts, allocate kernel struct on stack, memset zero, populate fields below.
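
The 1232-byte size can be cross-checked without kernel headers by mirroring the struct layout with fixed-width types. This is a sketch under the assumption that the mainline layout of struct v4l2_ctrl_vp8_frame is as reproduced below; all sketch_ names are illustrative, and the kernel header remains the authority.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the mainline VP8 stateless UAPI layout (assumed). */
struct sketch_vp8_segment {
    int8_t   quant_update[4];
    int8_t   lf_update[4];
    uint8_t  segment_probs[3];
    uint8_t  padding;
    uint32_t flags;
};                                    /* 16 bytes */

struct sketch_vp8_loop_filter {
    int8_t   ref_frm_delta[4];
    int8_t   mb_mode_delta[4];
    uint8_t  sharpness_level;
    uint8_t  level;
    uint16_t padding;
    uint32_t flags;
};                                    /* 16 bytes */

struct sketch_vp8_quantization {
    uint8_t  y_ac_qi;
    int8_t   y_dc_delta;
    int8_t   y2_dc_delta;
    int8_t   y2_ac_delta;
    int8_t   uv_dc_delta;
    int8_t   uv_ac_delta;
    uint16_t padding;
};                                    /* 8 bytes */

struct sketch_vp8_entropy {
    uint8_t coeff_probs[4][8][3][11]; /* 1056 bytes */
    uint8_t y_mode_probs[4];
    uint8_t uv_mode_probs[3];
    uint8_t mv_probs[2][19];
    uint8_t padding[3];
};                                    /* 1104 bytes */

struct sketch_vp8_coder_state {
    uint8_t range;
    uint8_t value;
    uint8_t bit_count;
    uint8_t padding;
};                                    /* 4 bytes */

struct sketch_vp8_frame {
    struct sketch_vp8_segment      segment;
    struct sketch_vp8_loop_filter  lf;
    struct sketch_vp8_quantization quant;
    struct sketch_vp8_entropy      entropy;
    struct sketch_vp8_coder_state  coder_state;
    uint16_t width;
    uint16_t height;
    uint8_t  horizontal_scale;
    uint8_t  vertical_scale;
    uint8_t  version;
    uint8_t  prob_skip_false;
    uint8_t  prob_intra;
    uint8_t  prob_last;
    uint8_t  prob_gf;
    uint8_t  num_dct_parts;
    uint32_t first_part_size;
    uint32_t first_part_header_bits;
    uint32_t dct_part_sizes[8];
    uint64_t last_frame_ts;
    uint64_t golden_frame_ts;
    uint64_t alt_frame_ts;
    uint64_t flags;
};

/* Fails the build if the mirrored layout drifts from the Phase 3 size. */
_Static_assert(sizeof(struct sketch_vp8_frame) == 1232,
               "VP8 frame control is expected to be 1232 bytes");

/* Zero every field, including all padding members, before populating. */
static void sketch_vp8_frame_init(struct sketch_vp8_frame *frame)
{
    memset(frame, 0, sizeof *frame);
}
```

The same _Static_assert, applied to the real struct in vp8.c, would turn a header mismatch into a build failure instead of a rejected ioctl.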

Clause 3 — Frame geometry + version + per-frame scalars

frame.width = picture->frame_width;
frame.height = picture->frame_height;
frame.horizontal_scale = 0;             /* not exposed by VAAPI; FFmpeg also hardcodes 0 */
frame.vertical_scale = 0;
frame.version = picture->pic_fields.bits.version;
frame.prob_skip_false = picture->prob_skip_false;
frame.prob_intra = picture->prob_intra;
frame.prob_last = picture->prob_last;
frame.prob_gf = picture->prob_gf;
frame.num_dct_parts = slice->num_of_partitions - 1;  /* off-by-one per Phase 3 Q2 */

Anchors:

  • VAAPI va_dec_vp8.h:71-160 (PictureParameterBuffer)
  • Phase 3 frame-1 keyframe: width=1280, height=720, version=0, prob_skip=255 (matches mpv/FFmpeg parser output)
  • Phase 3 Q2 resolution: num_dct_parts = num_of_partitions - 1 (BBB inter: VAAPI=2 → kernel=1)
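
Because the subtraction is unsigned, a zero or malformed num_of_partitions would wrap around. A defensive variant could validate the count first; the sketch below assumes the VP8 rule that a frame carries 1, 2, 4, or 8 DCT partitions, and sketch_num_dct_parts is a hypothetical helper, not part of the plan's patch.

```c
#include <assert.h>
#include <stdint.h>

/* Convert VAAPI's partition count (first partition included) to the
 * kernel's DCT partition count. Returns 0 on success, -1 when the input
 * cannot describe a valid VP8 frame. */
static int sketch_num_dct_parts(unsigned int num_of_partitions,
                                uint8_t *num_dct_parts)
{
    unsigned int dct;

    if (num_of_partitions < 2)
        return -1;               /* would underflow or leave zero DCT parts */

    dct = num_of_partitions - 1;
    if (dct != 1 && dct != 2 && dct != 4 && dct != 8)
        return -1;               /* VP8 allows only power-of-two counts up to 8 */

    *num_dct_parts = (uint8_t)dct;
    return 0;
}
```

For the BBB inter frame cited above, an input of 2 yields 1.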

Clause 4 — DPB timestamp resolution (3 references)

struct object_surface *last_ref =
    SURFACE(driver_data, picture->last_ref_frame);
struct object_surface *golden_ref =
    SURFACE(driver_data, picture->golden_ref_frame);
struct object_surface *alt_ref =
    SURFACE(driver_data, picture->alt_ref_frame);

if (last_ref != NULL)
    frame.last_frame_ts = v4l2_timeval_to_ns(&last_ref->timestamp);
if (golden_ref != NULL)
    frame.golden_frame_ts = v4l2_timeval_to_ns(&golden_ref->timestamp);
if (alt_ref != NULL)
    frame.alt_frame_ts = v4l2_timeval_to_ns(&alt_ref->timestamp);

Mirrors iter1 mpeg2.c::mpeg2_set_controls forward/backward ref pattern (src/mpeg2.c:146-156). For VA_INVALID_SURFACE (key frame: all three refs invalid), SURFACE() returns NULL and timestamps stay 0 (memset default).

Anchors:

  • Phase 3 Q3+Q6 resolution: keyframe writes last_ts=golden_ts=alt_ts=0; inter frames write all non-zero
  • Phase 3 frame-2 inter: last_ts=5000, golden_ts=11000, alt_ts=11000 — confirms timestamps are arbitrary (not nanosecond-real-time, just stable VASurfaceID-derived markers)
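
The NULL-means-zero rule can be folded into one helper. This sketch re-implements the conversion that the kernel header's v4l2_timeval_to_ns performs (seconds times 1e9 plus microseconds times 1000); struct sketch_surface is a hypothetical stand-in for object_surface, and reading the Phase 3 marker 5000 as a 5 µs timeval is an assumption, not something the capture states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Same arithmetic as v4l2_timeval_to_ns() from linux/videodev2.h. */
static uint64_t sketch_timeval_to_ns(const struct timeval *tv)
{
    return (uint64_t)tv->tv_sec * 1000000000ULL +
           (uint64_t)tv->tv_usec * 1000ULL;
}

/* Hypothetical stand-in for the backend's surface object. */
struct sketch_surface {
    struct timeval timestamp;
};

/* A missing reference (keyframe case) resolves to the 0 sentinel. */
static uint64_t sketch_ref_ts(const struct sketch_surface *ref)
{
    return ref != NULL ? sketch_timeval_to_ns(&ref->timestamp) : 0;
}
```

Using a helper like this for all three references keeps the keyframe path and the inter-frame path identical apart from the lookup result.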

Clause 5 — Loop filter mapping

for (i = 0; i < 4; i++) {
    frame.lf.ref_frm_delta[i] = picture->loop_filter_deltas_ref_frame[i];
    frame.lf.mb_mode_delta[i] = picture->loop_filter_deltas_mode[i];
}
frame.lf.sharpness_level = picture->pic_fields.bits.sharpness_level;
frame.lf.level = picture->loop_filter_level[0];   /* base segment */
if (picture->pic_fields.bits.loop_filter_adj_enable)
    frame.lf.flags |= V4L2_VP8_LF_ADJ_ENABLE;
if (picture->pic_fields.bits.mode_ref_lf_delta_update)
    frame.lf.flags |= V4L2_VP8_LF_DELTA_UPDATE;
if (picture->pic_fields.bits.filter_type)
    frame.lf.flags |= V4L2_VP8_LF_FILTER_TYPE_SIMPLE;

Anchors:

  • Phase 3 keyframe: ref_frm_delta=(2,0,-2,-2), mb_mode_delta=(4,-2,2,4), sharp=0, level=1, flags=0x03 (ADJ_ENABLE|DELTA_UPDATE)
  • Phase 3 inter: same deltas (BBB-stable), level=15, flags=0x01 (ADJ_ENABLE only) — DELTA_UPDATE only set on keyframes
  • Phase 3 Q5: FILTER_TYPE_SIMPLE bit not set on any captured frame; VAAPI's filter_type=0 for BBB → flag clear ✓
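
The flag values in the anchors can be reproduced with a small pure function. This is a sketch assuming the mainline bit assignments (ADJ_ENABLE 0x01, DELTA_UPDATE 0x02, FILTER_TYPE_SIMPLE 0x04); the SKETCH_ macros are local copies and sketch_lf_flags is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the V4L2_VP8_LF_* bits (assumed mainline values). */
#define SKETCH_VP8_LF_ADJ_ENABLE         0x01u
#define SKETCH_VP8_LF_DELTA_UPDATE       0x02u
#define SKETCH_VP8_LF_FILTER_TYPE_SIMPLE 0x04u

/* Assemble lf.flags from the three VAAPI pic_fields bits. */
static uint32_t sketch_lf_flags(unsigned int loop_filter_adj_enable,
                                unsigned int mode_ref_lf_delta_update,
                                unsigned int filter_type)
{
    uint32_t flags = 0;

    if (loop_filter_adj_enable)
        flags |= SKETCH_VP8_LF_ADJ_ENABLE;
    if (mode_ref_lf_delta_update)
        flags |= SKETCH_VP8_LF_DELTA_UPDATE;
    if (filter_type)
        flags |= SKETCH_VP8_LF_FILTER_TYPE_SIMPLE;
    return flags;
}
```

Feeding it the keyframe bits (1, 1, 0) gives 0x03 and the inter-frame bits (1, 0, 0) give 0x01, the two values captured in Phase 3.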

Clause 6 — Quantization base + delta derivation

VAAPI conveys per-segment effective Q indices in iqmatrix->quantization_index[4][6] (segment × component-index). Components: yac(0), ydc(1), y2dc(2), y2ac(3), uvdc(4), uvac(5).

Kernel takes a base set + per-component deltas (in quant) + per-segment overrides (in segment.quant_update[]).

For segment 0 (always present):

if (iqmatrix_set) {
    /* __s8 is the UAPI type of the delta fields; the bare kernel-internal
     * s8 typedef is not available in userspace headers. */
    frame.quant.y_ac_qi     = iqmatrix->quantization_index[0][0];
    frame.quant.y_dc_delta  = (__s8)(iqmatrix->quantization_index[0][1] -
                                     iqmatrix->quantization_index[0][0]);
    frame.quant.y2_dc_delta = (__s8)(iqmatrix->quantization_index[0][2] -
                                     iqmatrix->quantization_index[0][0]);
    frame.quant.y2_ac_delta = (__s8)(iqmatrix->quantization_index[0][3] -
                                     iqmatrix->quantization_index[0][0]);
    frame.quant.uv_dc_delta = (__s8)(iqmatrix->quantization_index[0][4] -
                                     iqmatrix->quantization_index[0][0]);
    frame.quant.uv_ac_delta = (__s8)(iqmatrix->quantization_index[0][5] -
                                     iqmatrix->quantization_index[0][0]);
}
/* if iqmatrix_set==false: frame.quant stays zero from memset; safe
 * default for fixtures with implicit-zero deltas (most VP8 streams,
 * including BBB). */

For segments 1..3 (only used when segmentation enabled):

if (picture->pic_fields.bits.segmentation_enabled && iqmatrix_set) {
    /* quant_update[0] stays 0: segment 0 is the base */
    for (i = 1; i < 4; i++)
        frame.segment.quant_update[i] = (__s8)
            (iqmatrix->quantization_index[i][0] -
             iqmatrix->quantization_index[0][0]);
}
/* For BBB (segmentation disabled), all quant_update[] stay zero
 * from memset, matching Phase 3 verbatim payload. */

Anchors:

  • Phase 3 keyframe: y_ac_qi=8, all deltas=0 → BBB has uniform Q across components for segment 0
  • Phase 3 inter: y_ac_qi=122, all deltas=0 (frame 2); y_ac_qi=109, all deltas=0 (frame 4) — all-zero deltas confirmed throughout BBB
  • Predicted VAAPI behavior: iqmatrix->quantization_index[0][0..5] should all equal the same value (e.g. all 8 for keyframe); subtraction yields zero → matches Phase 3 verbatim
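
Since BBB never produces a non-zero delta, the subtraction can only be exercised with synthetic input. The sketch below assumes VAAPI's component order yac(0), ydc(1), y2dc(2), y2ac(3), uvdc(4), uvac(5) as described in this clause; struct sketch_vp8_quant and the second set of index values in the usage note are hypothetical, chosen only to show the sign handling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel quantization fields (no padding). */
struct sketch_vp8_quant {
    uint8_t y_ac_qi;
    int8_t  y_dc_delta;
    int8_t  y2_dc_delta;
    int8_t  y2_ac_delta;
    int8_t  uv_dc_delta;
    int8_t  uv_ac_delta;
};

/* Derive base + signed deltas from one segment's six effective indices. */
static void sketch_quant_from_indices(const uint16_t qi[6],
                                      struct sketch_vp8_quant *q)
{
    int base = qi[0];                 /* yac is the base index */

    q->y_ac_qi     = (uint8_t)base;
    q->y_dc_delta  = (int8_t)(qi[1] - base);
    q->y2_dc_delta = (int8_t)(qi[2] - base);
    q->y2_ac_delta = (int8_t)(qi[3] - base);
    q->uv_dc_delta = (int8_t)(qi[4] - base);
    q->uv_ac_delta = (int8_t)(qi[5] - base);
}
```

With six identical indices (for example all 8, as predicted for the BBB keyframe) every delta is 0. With the made-up input {40, 36, 40, 44, 40, 40} the result would be y_dc_delta = -4 and y2_ac_delta = 4, which is the case Phase 5 question 1 asks the reviewer to confirm against VAAPI's actual representation.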

Clause 7 — Segment fields

for (i = 0; i < 3; i++)
    frame.segment.segment_probs[i] = picture->mb_segment_tree_probs[i];

if (picture->pic_fields.bits.segmentation_enabled)
    frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_ENABLED;
if (picture->pic_fields.bits.update_mb_segmentation_map)
    frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_UPDATE_MAP;
if (picture->pic_fields.bits.update_segment_feature_data)
    frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_UPDATE_FEATURE_DATA;
/* DELTA_VALUE_MODE: VAAPI doesn't expose abs_delta; FFmpeg sets
 * unconditionally per !s->segmentation.absolute_vals (default).
 * Match FFmpeg pattern: set unconditionally. Kernel ignores when
 * ENABLED bit is clear (BBB case). */
frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_DELTA_VALUE_MODE;

/* segment.lf_update[] populated only when segmentation enabled */
if (picture->pic_fields.bits.segmentation_enabled) {
    /* i = 0 always yields 0 (segment 0 is the base level) */
    for (i = 0; i < 4; i++)
        frame.segment.lf_update[i] = (__s8)
            (picture->loop_filter_level[i] -
             picture->loop_filter_level[0]);
}

Anchors:

  • Phase 3 every captured frame: segment.flags = 0x08 = DELTA_VALUE_MODE (BBB segmentation disabled, FFmpeg sets DVM unconditionally)
  • Phase 3 every captured frame: segment_probs = (0,0,0), quant_update = (0,0,0,0), lf_update = (0,0,0,0) — all zero (BBB segmentation disabled)
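
The segment flag assembly and the per-segment loop filter deltas can be sketched as two small helpers. The bit values are assumed to be the mainline ones (ENABLED 0x01, UPDATE_MAP 0x02, UPDATE_FEATURE_DATA 0x04, DELTA_VALUE_MODE 0x08); the helper names and the loop filter levels in the usage note are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the V4L2_VP8_SEGMENT_FLAG_* bits (assumed values). */
#define SKETCH_SEG_ENABLED             0x01u
#define SKETCH_SEG_UPDATE_MAP          0x02u
#define SKETCH_SEG_UPDATE_FEATURE_DATA 0x04u
#define SKETCH_SEG_DELTA_VALUE_MODE    0x08u

/* DELTA_VALUE_MODE is always set, following the FFmpeg pattern. */
static uint32_t sketch_segment_flags(unsigned int enabled,
                                     unsigned int update_map,
                                     unsigned int update_feature_data)
{
    uint32_t flags = SKETCH_SEG_DELTA_VALUE_MODE;

    if (enabled)
        flags |= SKETCH_SEG_ENABLED;
    if (update_map)
        flags |= SKETCH_SEG_UPDATE_MAP;
    if (update_feature_data)
        flags |= SKETCH_SEG_UPDATE_FEATURE_DATA;
    return flags;
}

/* Per-segment loop filter deltas relative to segment 0; the output is
 * left untouched (zero after memset) when segmentation is disabled. */
static void sketch_lf_update(const uint8_t level[4], unsigned int enabled,
                             int8_t lf_update[4])
{
    int i;

    if (!enabled)
        return;
    for (i = 0; i < 4; i++)
        lf_update[i] = (int8_t)(level[i] - level[0]);
}
```

With segmentation disabled the flags come out as 0x08, matching every captured BBB frame. With made-up levels {15, 20, 10, 15} and segmentation enabled, the deltas would be {0, 5, -5, 0}.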

Clause 8 — Entropy table mapping (3 sources)

VAAPI splits the per-frame VP8 probability and quantization state across THREE buffers: PictureParameterBuffer (mode + mv probs), ProbabilityDataBuffer (coeff_probs), and IQMatrix (quantization). The libva backend assembles the first two into the kernel's single v4l2_vp8_entropy struct; IQMatrix feeds frame.quant as described in Clause 6.

for (i = 0; i < 4; i++)
    frame.entropy.y_mode_probs[i] = picture->y_mode_probs[i];
for (i = 0; i < 3; i++)
    frame.entropy.uv_mode_probs[i] = picture->uv_mode_probs[i];
for (i = 0; i < 2; i++)
    for (j = 0; j < 19; j++)
        frame.entropy.mv_probs[i][j] = picture->mv_probs[i][j];

if (probability_set) {
    /* VAAPI's [4][8][3][11] layout matches kernel's exactly; direct memcpy */
    memcpy(frame.entropy.coeff_probs,
           probability->dct_coeff_probs,
           sizeof frame.entropy.coeff_probs);
}
/* If probability_set==false: leave coeff_probs zero from memset.
 * Most consumers will send VAProbabilityDataBuffer per frame; if not,
 * kernel hantro driver re-derives from default tables. Phase 5
 * review will flag this as a fidelity gap if Phase 7 mpv consumer
 * doesn't send probability buffers. */

Anchors:

  • Phase 3 keyframe: y_mode_probs=(145,156,163,128) (FFmpeg keyframe const), uv_mode_probs=(142,114,183) (FFmpeg keyframe const). When VAAPI's mpv consumer also writes these constants for keyframes, byte-equality holds.
  • Phase 3 inter frame 2: y_mode_probs=(3,1,128,1), uv_mode_probs=(162,101,204) — parser-derived, varies per frame.
  • Phase 3 entropy hash (sha1-16 prefix): keyframe 8b2fdae200eb193f, inter changes — confirms entropy state propagates per-frame; iter3 backend must wire the per-frame ProbabilityDataBuffer through to coeff_probs.
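
The direct memcpy is only safe if both arrays have the same shape, which can be pinned down at compile time. This sketch uses two hypothetical structs that stand in for VAProbabilityDataBufferVP8 and the kernel entropy struct, assuming both declare the table as [4][8][3][11] bytes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the VAAPI source and kernel destination. */
struct sketch_va_probability {
    uint8_t dct_coeff_probs[4][8][3][11];
};

struct sketch_v4l2_entropy {
    uint8_t coeff_probs[4][8][3][11];
};

/* Refuse to build if the two tables ever differ in size. */
_Static_assert(sizeof(((struct sketch_va_probability *)0)->dct_coeff_probs) ==
               sizeof(((struct sketch_v4l2_entropy *)0)->coeff_probs),
               "coefficient probability tables must have identical size");

/* Copy the table only when the client supplied it for this frame;
 * otherwise dst keeps whatever it held (zero after memset). */
static void sketch_copy_coeff_probs(struct sketch_v4l2_entropy *dst,
                                    const struct sketch_va_probability *src,
                                    int probability_set)
{
    if (!probability_set)
        return;
    memcpy(dst->coeff_probs, src->dct_coeff_probs, sizeof dst->coeff_probs);
}
```

The table is 4 × 8 × 3 × 11 = 1056 bytes, so it accounts for most of the 1232-byte control payload.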

Clause 9 — Coder state + first-partition fields + flags

frame.coder_state.range = picture->bool_coder_ctx.range;
frame.coder_state.value = picture->bool_coder_ctx.value;
frame.coder_state.bit_count = picture->bool_coder_ctx.count;
/* coder_state.padding stays zero from memset */

frame.first_part_size = slice->partition_size[0];
frame.first_part_header_bits = 0;
/* first_part_header_bits: VAAPI doesn't expose this directly. FFmpeg
 * derives from internal parser bit-cursor state. Phase 3 keyframe
 * empirical = 6550, inter ranges 86..254. Leaving 0 here is a known
 * fidelity gap; kernel hantro_vp8.c re-parses the uncompressed header
 * and may not depend on this field for correctness. Phase 7 byte-
 * compare will reveal whether Phase 5 review needs to re-open this
 * with bitstream-side derivation (closest VAAPI analog: slice->
 * macroblock_offset, but it's the MB-data offset not header-bits). */

for (i = 0; i < 8; i++)
    frame.dct_part_sizes[i] = slice->partition_size[i + 1];

if (!picture->pic_fields.bits.key_frame)
    frame.flags |= V4L2_VP8_FRAME_FLAG_KEY_FRAME;
/* VAAPI inverts: key_frame=0 means it IS a keyframe per VP8 spec */
frame.flags |= V4L2_VP8_FRAME_FLAG_SHOW_FRAME;
/* Force unconditional per Phase 3 Q4: BBB has no alt-ref invisible
 * frames in iter3 scope; document as known fidelity gap for future
 * iters covering invisible-frame streams. */
if (picture->pic_fields.bits.mb_no_coeff_skip)
    frame.flags |= V4L2_VP8_FRAME_FLAG_MB_NO_SKIP_COEFF;
if (picture->pic_fields.bits.sign_bias_golden)
    frame.flags |= V4L2_VP8_FRAME_FLAG_SIGN_BIAS_GOLDEN;
if (picture->pic_fields.bits.sign_bias_alternate)
    frame.flags |= V4L2_VP8_FRAME_FLAG_SIGN_BIAS_ALT;
/* EXPERIMENTAL bit (0x02) and bit 0x40 NOT set. Phase 3 baseline
 * notes ffmpeg-v4l2-request-git sets these for unclear reasons;
 * mainline UAPI defines neither for our use case (BBB profile=0).
 * Kernel hantro_vp8.c only inspects KEY_FRAME bit. */

Anchors:

  • Phase 3 keyframe: coder_state=(248,133,2), first_part_size=22742, dct_part_sizes=(277872,0,...,0). 277872 is large but far below the u32 limit; the full Phase 3 capture confirmed that every BBB partition size fits in u32, which is expected for 1280×720 normal-quality content.
  • Phase 3 keyframe: flags=0x0d = KEY|SHOW|NOSKIP — libva backend produces 0x0d ✓
  • Phase 3 inter: flags=0x66 (FFmpeg) — libva backend produces e.g. 0x04 (SHOW only) or 0x24 (SHOW|SBA) depending on VAAPI sign_bias bits. Phase 7 byte-compare uses field-level, not whole-flags, equality.
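
The flag anchors, and one possible reading of "field-level equality", can be sketched as follows. The bit values are assumed to be the mainline ones (KEY_FRAME 0x01, EXPERIMENTAL 0x02, SHOW_FRAME 0x04, MB_NO_SKIP_COEFF 0x08, SIGN_BIAS_GOLDEN 0x10, SIGN_BIAS_ALT 0x20); the comparison mask is an assumption about how Phase 7 might compare, not something the plan specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the V4L2_VP8_FRAME_FLAG_* bits (assumed values). */
#define SKETCH_FLAG_KEY_FRAME        0x01u
#define SKETCH_FLAG_SHOW_FRAME       0x04u
#define SKETCH_FLAG_MB_NO_SKIP_COEFF 0x08u
#define SKETCH_FLAG_SIGN_BIAS_GOLDEN 0x10u
#define SKETCH_FLAG_SIGN_BIAS_ALT    0x20u

/* Only the bits the backend populates take part in the comparison;
 * EXPERIMENTAL (0x02) and the undocumented 0x40 are masked out. */
#define SKETCH_FLAG_COMPARE_MASK \
    (SKETCH_FLAG_KEY_FRAME | SKETCH_FLAG_SHOW_FRAME | \
     SKETCH_FLAG_MB_NO_SKIP_COEFF | SKETCH_FLAG_SIGN_BIAS_GOLDEN | \
     SKETCH_FLAG_SIGN_BIAS_ALT)

/* va_key_frame uses VAAPI's inverted sense: 0 means "is a keyframe". */
static uint64_t sketch_frame_flags(unsigned int va_key_frame,
                                   unsigned int mb_no_coeff_skip,
                                   unsigned int sign_bias_golden,
                                   unsigned int sign_bias_alternate)
{
    uint64_t flags = SKETCH_FLAG_SHOW_FRAME;   /* forced, per Phase 3 Q4 */

    if (!va_key_frame)
        flags |= SKETCH_FLAG_KEY_FRAME;
    if (mb_no_coeff_skip)
        flags |= SKETCH_FLAG_MB_NO_SKIP_COEFF;
    if (sign_bias_golden)
        flags |= SKETCH_FLAG_SIGN_BIAS_GOLDEN;
    if (sign_bias_alternate)
        flags |= SKETCH_FLAG_SIGN_BIAS_ALT;
    return flags;
}

/* Nonzero when both values agree on every populated bit. */
static int sketch_flags_match(uint64_t ours, uint64_t reference)
{
    return ((ours ^ reference) & SKETCH_FLAG_COMPARE_MASK) == 0;
}
```

Under this mask a backend value of 0x24 matches the FFmpeg inter-frame value 0x66, while 0x04 does not, because the two disagree on SIGN_BIAS_ALT.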

Clause 10 — Final batched submission

struct v4l2_ext_control ctrls[1] = {
    {
        .id = V4L2_CID_STATELESS_VP8_FRAME,
        .ptr = &frame,
        .size = sizeof frame,
    },
};

rc = v4l2_set_controls(driver_data->video_fd,
                       surface_object->request_fd,
                       ctrls, 1);
if (rc < 0)
    return VA_STATUS_ERROR_OPERATION_FAILED;
return 0;

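Follows the dispatcher contract of picture.c::codec_set_controls: a negative rc from v4l2_set_controls is reported as VA_STATUS_ERROR_OPERATION_FAILED, success as 0. VA_STATUS_ERROR_OPERATION_FAILED is a positive VAStatus code (0x00000001), so a caller that tests this function's return value with rc < 0 would not see the failure; Phase 5 should confirm how codec_set_controls checks the result.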

Patch shape (commits)

iter3 implements as 3 commits (A, B, C) plus a conditional Commit D slot (mirrors iter1 ABCD pattern):

Commit A — src/config.c: enumeration + dispatch + entrypoints

3 sites:

  1. RequestQueryConfigProfiles (after HEVC enumeration block, line ~160): add VP8 enumeration block probing V4L2_PIX_FMT_VP8_FRAME against single + MPLANE OUTPUT formats.
  2. RequestCreateConfig (after iter2's HEVCMain case, line ~75): add case VAProfileVP8Version0_3: break; with comment block matching iter1+iter2 style.
  3. RequestQueryConfigEntrypoints (line ~180): add case VAProfileVP8Version0_3: to existing fall-through case list.

Predicted +16 LOC, 1 file modified. Build target after Commit A: vainfo lists VAProfileVP8Version0_3 on hantro env binding (criterion 1) and vaCreateConfig succeeds (criterion 2), but decoding fails at later use because picture.c has no VP8 codec dispatcher yet.

Commit B — NEW src/vp8.c + src/vp8.h + src/meson.build integration

Net-new files implementing vp8_set_controls() per Clauses 1-10 above. Plus meson.build sources/headers list updates to compile + link the new module.

Predicted ~200 LOC for vp8.c (the 10 clauses), ~40 LOC for vp8.h, +2 lines meson.build. 3 files (2 new + 1 modified). Build target after Commit B: vp8.o compiles standalone, but picture.c can't dispatch yet.

Commit C — src/picture.c + src/surface.h: dispatcher + per-frame buffer routing + union extension + per-frame reset

5 sites:

  1. picture.c:34-36 include block: add #include "vp8.h".
  2. picture.c::codec_set_controls (line ~218 after HEVCMain case): add VP8 dispatch case calling vp8_set_controls.
  3. picture.c::codec_store_buffer (lines 85-179): add VP8 cases for VAPictureParameterBufferType, VASliceParameterBufferType, VAIQMatrixBufferType + NEW outer case VAProbabilityDataBufferType with VP8 inner case.
  4. picture.c::RequestBeginPicture (line ~300): add 2 reset lines for params.vp8.iqmatrix_set = false; and params.vp8.probability_set = false;.
  5. surface.h::object_surface::params union (line ~112): insert vp8 struct after h265 (~10 LOC).

Predicted +50 LOC, 2 files modified.

Build target after Commit C: backend builds clean. mpv-vaapi VP8 decode should work end-to-end. Expected to satisfy criteria 1, 2, 3 of Phase 1.

Commit D — src/v4l2.c (CONDITIONAL — only if v4l2_set_controls doesn't already fall through cleanly)

Phase 2 source-read showed v4l2_set_controls is fourcc-agnostic. Likely no Commit D needed; placeholder commit slot for fix-forward if Phase 6 build/runtime surfaces an unexpected ioctl plumbing issue (mirrors iter1 Commit D fix-forward).

Files touched summary

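| File | New | Modified | LOC delta | Commit |
|---|---|---|---|---|
| src/config.c | | yes | +16 | A |
| src/vp8.c | yes | | +200 | B |
| src/vp8.h | yes | | +40 | B |
| src/meson.build | | yes | +2 | B |
| src/picture.c | | yes | +40 | C |
| src/surface.h | | yes | +10 | C |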

Total: ~308 LOC, 6 files (2 new + 4 modified). 3 commits (A, B, C).

Cross-cutting backlog (out of iter3 scope)

Items inherited from iter1+iter2 close, NOT touched in iter3:

  • B1 V4L2 device-discovery (per-boot device numbering shuffle) — workaround: re-verify per-boot via v4l2-ctl --info.
  • B3 picture.c BeginPicture profile-aware reset — currently writes byte 240 of union (h264.matrix_set) regardless of profile; iter3 adds vp8 resets but doesn't refactor the cross-profile cleanup.
  • B4 context.c log suppression for unsupported codec controls.
  • B5 mpeg2 vbv_buffer_size polish (iter1 S2 finding).
  • B6 h265 SPS bitstream-parse fidelity gap.
  • L3 vaDeriveImage cache-stale on RK3399 — workaround: DMA-BUF GL only (memory rule).

Plus iter3 NEW items:

  • iter3-Q1 first_part_header_bits derivation gap. Plan leaves 0; Phase 7 byte-compare may reveal kernel sensitivity. If so, future iter or Phase 4 loopback derives from VAAPI slice->macroblock_offset + entropy-header subtraction.
  • iter3-flags ffmpeg-v4l2-request-git sets undocumented bit 0x40 in flags field; libva backend won't replicate. Phase 7 byte-compare will use FIELD-LEVEL equality, not whole-flags equality.

Phase 5 review prep

Submitting this plan for second-model review (sonnet-architect). Key questions for the reviewer:

  1. Quantization derivation: is the "compute deltas as iqmatrix[0][N+1] - iqmatrix[0][0]" formula correct for VAAPI's representation? Or does VAAPI store the deltas already (then we just copy-not-subtract)? Phase 3 verbatim shows all-zero deltas for BBB, but BBB doesn't exercise this code path meaningfully — non-zero deltas need a different fixture, not in iter3 scope.

  2. Per-segment quant_update[1..3] derivation: when segmentation enabled, is the kernel's quant_update[s] an absolute Q index (clipped to 0..127) or a signed delta from segment 0's base? VP8 spec says delta in delta_value_mode; libva backend currently writes iqmatrix[s][0] - iqmatrix[0][0] which is signed delta. Phase 5 should empirically verify against a segmentation-enabled test fixture if one is available, OR document as known fidelity gap if not.

  3. first_part_header_bits = 0 fallback: is this actually safe for hantro? Reviewer should grep kernel hantro_vp8.c + rkvdec_vp8.c for any read of first_part_header_bits in the decode hot path. If used, the gap blocks correctness; if unused (re-parsed from bitstream), the gap is cosmetic.

  4. Probability buffer ordering: VAAPI's mpv consumer might send VAProbabilityDataBufferType BEFORE VAPictureParameterBufferType in the same frame's RenderPicture call, OR after. The libva backend's codec_store_buffer is order-independent (each call updates a different surface field), so order shouldn't matter — but flag this for review confirmation.

  5. Endianness assumption: VAAPI structs use host byte order; kernel UAPI structs use host byte order; both are little-endian on aarch64. No byte-swap should be needed; reviewer to confirm.

  6. Struct size oversight: Phase 2 implicitly assumed ~400 bytes; Phase 3 corrected to 1232. The Phase 4 plan uses sizeof frame, which is compile-time correct, but the ctrls[0].size field MUST equal that value. Reviewer should verify this against the FFmpeg ref pattern, controls[0].size = sizeof(controls->frame), which the plan's .size = sizeof frame matches.

  7. Test compile field availability: per memory feedback_review_empirical_over_theoretical.md Direction 2, every VAAPI field-name reference in this plan should be verified via gcc -c test-compile BEFORE Phase 6 starts. Reviewer should test-compile the field accesses listed in Clauses 3-9 (especially picture->pic_fields.bits.{...} which iter2 had a bug with uniform_spacing_flag).

Phase 1 criteria → Phase 4 plan trace

| Criterion | Plan addresses |
|---|---|
| 1. vainfo enumerates VP8Version0_3 | Commit A: RequestQueryConfigProfiles enumeration block |
| 2. vaCreateConfig SUCCESS | Commit A: RequestCreateConfig case + RequestQueryConfigEntrypoints |
| 3. ffmpeg-vaapi VP8 exit 0 | Commits A+B+C end-to-end; Clause 1 + Clauses 3-9 field mapping |
| 4. mpv DMA-BUF GL HW=SW byte-identical | Commits A+B+C decode correctness + Phase 3 SW JPEGs as Phase 7 anchor |
| 5. 3-codec regression | No regression risk: VP8 path is purely additive (NEW dispatcher case, no shared-state mutation in MPEG-2/HEVC/H.264 paths) |

Substrate state at Phase 4 close

  • Phase 1+2+3 commits at gitea (ea2413e, 898544a, fd3fce8).
  • Fork at iter2 tip 8d71e20 on noether; Phase 6 patches will land here.
  • All Phase 3 anchors captured + preserved on fresnel /tmp/iter3_phase3/.
  • Memory rules carry forward; feedback_fresnel_hostname added.
  • Phase 4 plan ready for sonnet-architect review (Phase 5).