From 2918dda2e0581fbd8878d7c0a3e59f0793f54c94 Mon Sep 17 00:00:00 2001
From: "Claude (noether)"
Date: Fri, 8 May 2026 20:39:52 +0000
Subject: [PATCH] =?UTF-8?q?iter3=20Phase=204:=20plan=20=E2=80=94=2010=20co?=
 =?UTF-8?q?ntract=20clauses,=20~308-LOC=20patch,=203=20commits?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Locks the iter3 patch shape against Phase 3 verbatim cross-validator
payload + Phase 2 contract surface. 10 contract clauses cite kernel
UAPI + VAAPI + FFmpeg ref + Phase 3 byte anchors throughout.

Patch shape (mirrors iter1 ABCD pattern):
  Commit A: src/config.c: enumeration block + CreateConfig case +
            QueryConfigEntrypoints case (3 sites, +16 LOC, 1 file).
            After: vainfo lists VP8Version0_3.
  Commit B: NEW src/vp8.c (~200 LOC) + NEW src/vp8.h (~40 LOC) +
            meson.build sources/headers entries (+2). 3 files (2 new
            + 1 modified). After: vp8.o compiles standalone.
  Commit C: src/picture.c: codec_set_controls dispatch +
            codec_store_buffer 4 buffer-type cases + outer
            VAProbabilityDataBufferType case + BeginPicture per-frame
            reset (4 sites, +40 LOC) + src/surface.h params.vp8 union
            member (+10 LOC). 2 files modified. After: end-to-end VP8
            decode through libva backend.

Total: ~308 LOC, 6 files (2 new + 4 modified), 3 commits.

Contract clauses summary:
   1. Submission shape: single VIDIOC_S_EXT_CTRLS, count=1,
      which=V4L2_CTRL_WHICH_REQUEST_VAL (0xf010000, shown by strace
      as ctrl_class), id=0xa409c8, size=1232 bytes
   2. Local struct alloc + zero-init (memset clears all padding)
   3. Frame geometry + version + per-frame scalars (off-by-one
      num_dct_parts = num_of_partitions - 1)
   4. DPB timestamp resolution (3 refs: last/golden/alt; 0-sentinel
      when SURFACE() returns NULL, mirroring iter1 mpeg2.c pattern)
   5. Loop filter mapping (6 fields + 3 flag bits)
   6. Quantization base + delta derivation (segment 0 = base via
      iqmatrix[0][0]; deltas = iqmatrix[0][N+1] - iqmatrix[0][0]
      signed; per-segment quant_update[1..3] only when segmentation
      enabled)
   7. Segment fields (segment_probs direct copy; flags assembled +
      DELTA_VALUE_MODE set unconditionally per FFmpeg pattern)
   8. Entropy table mapping: 3 VAAPI sources (Picture: y_mode +
      uv_mode + mv_probs; ProbabilityData: coeff_probs[4][8][3][11]
      direct memcpy; IQMatrix: quant)
   9. Coder state + first-partition fields + flags (5 of the 6
      mainline-documented bits; EXPERIMENTAL + undocumented bit 0x40
      NOT replicated vs ffmpeg-v4l2-request-git anomaly;
      first_part_header_bits=0 fallback documented as known fidelity
      gap)
  10. Final batched submission via v4l2_set_controls

Phase 5 review questions queued (7 items): quantization derivation
correctness, per-segment quant_update semantics,
first_part_header_bits=0 safety, probability buffer ordering,
endianness, struct size sizeof correctness, field-availability
test-compile per memory feedback_review_empirical_over_theoretical
Direction 2.

Cross-cutting backlog deferred (B1, B3, B4, B5, B6, L3 inherited;
iter3-Q1 first_part_header_bits + iter3-flags 0x40 anomaly NEW).

Refs:
  phase0_findings_iter3.md (Phase 1 lock)
  phase2_iter3_situation.md (Phase 2 contract surface)
  phase3_iter3_baseline.md (Phase 3 verbatim payload anchors)

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 phase4_iter3_plan.md | 399 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 399 insertions(+)
 create mode 100644 phase4_iter3_plan.md

diff --git a/phase4_iter3_plan.md b/phase4_iter3_plan.md
new file mode 100644
index 0000000..7b955ff
--- /dev/null
+++ b/phase4_iter3_plan.md
@@ -0,0 +1,399 @@
+# Iteration 3: Phase 4 (plan)
+
+Locks the iter3 patch shape against the verbatim Phase 3 cross-validator payload and the kernel UAPI + VAAPI + FFmpeg references read in Phase 2. Plan structure mirrors iter2's 10-clause template (`phase4_iter2_plan.md`).
+
+Phase 3 baseline at `phase3_iter3_baseline.md` (commit `fd3fce8`) supplies the byte-level anchors. Phase 2 situation analysis at `phase2_iter3_situation.md` (commit `898544a`) supplies the bug list + contract surface read.
+
+## Contract clauses
+
+### Clause 1: Submission shape (per-frame)
+
+ONE batched `VIDIOC_S_EXT_CTRLS` per frame, bound to the surface's permanent `request_fd`. Single control. No init-time device-wide menus (VP8 has no DECODE_MODE/START_CODE; Phase 0 V4L2 inventory + FFmpeg ref + Phase 3 strace all confirm).
+
+```c
+struct v4l2_ext_control ctrls[1] = {
+    {
+        .id = V4L2_CID_STATELESS_VP8_FRAME,  /* 0xa409c8 */
+        .ptr = &frame,                        /* &v4l2_ctrl_vp8_frame */
+        .size = sizeof frame,                 /* 1232 bytes */
+    },
+};
+
+rc = v4l2_set_controls(driver_data->video_fd,
+                       surface_object->request_fd,
+                       ctrls, 1);
+```
+
+`ctrl_class` is not explicitly set in the call; `v4l2_set_controls` (`src/v4l2.c`) wraps it with `which=V4L2_CTRL_WHICH_REQUEST_VAL`. Because `which` and `ctrl_class` share a union in `struct v4l2_ext_controls`, the `ctrl_class=0xf010000` that Phase 3 strace shows the kernel receiving is `V4L2_CTRL_WHICH_REQUEST_VAL` itself, not `V4L2_CTRL_CLASS_CODEC_STATELESS` (0xa40000, the class already encoded in control id 0xa409c8). This matches iter1+iter2 patterns.
+
+**Anchor**: Phase 3 baseline § Step 3.3 submission shape table.
+
+### Clause 2: Local struct allocation + zero-init
+
+```c
+int vp8_set_controls(struct request_data *driver_data,
+                     struct object_context *context,
+                     struct object_surface *surface_object)
+{
+    VAPictureParameterBufferVP8 *picture =
+        &surface_object->params.vp8.picture;
+    VASliceParameterBufferVP8 *slice =
+        &surface_object->params.vp8.slice;
+    VAIQMatrixBufferVP8 *iqmatrix =
+        &surface_object->params.vp8.iqmatrix;
+    VAProbabilityDataBufferVP8 *probability =
+        &surface_object->params.vp8.probability;
+    bool iqmatrix_set = surface_object->params.vp8.iqmatrix_set;
+    bool probability_set = surface_object->params.vp8.probability_set;
+
+    struct v4l2_ctrl_vp8_frame frame;
+
+    memset(&frame, 0, sizeof frame);
+    /* All padding fields must be zero per kernel contract; memset
+     * covers them. C99 designated initializers in FFmpeg ref achieve
+     * the same; explicit memset matches iter1 mpeg2.c style. */
+    /* ... Clauses 3-10 populate and submit frame ... */
+}
+```
+
+Mirrors the opening of iter1 mpeg2.c::mpeg2_set_controls (`src/mpeg2.c:95-118`): extract VAAPI struct pointers + flag readouts, allocate the kernel struct on the stack, memset it to zero, then populate the fields as described below.
+
+### Clause 3: Frame geometry + version + per-frame scalars
+
+```c
+frame.width = picture->frame_width;
+frame.height = picture->frame_height;
+frame.horizontal_scale = 0;  /* not exposed by VAAPI; FFmpeg also hardcodes 0 */
+frame.vertical_scale = 0;
+frame.version = picture->pic_fields.bits.version;
+frame.prob_skip_false = picture->prob_skip_false;
+frame.prob_intra = picture->prob_intra;
+frame.prob_last = picture->prob_last;
+frame.prob_gf = picture->prob_gf;
+frame.num_dct_parts = slice->num_of_partitions - 1;  /* off-by-one per Phase 3 Q2 */
+```
+
+**Anchors**:
+- VAAPI `va_dec_vp8.h:71-160` (PictureParameterBuffer)
+- Phase 3 frame-1 keyframe: width=1280, height=720, version=0, prob_skip=255 (matches mpv/FFmpeg parser output)
+- Phase 3 Q2 resolution: `num_dct_parts = num_of_partitions - 1` (BBB inter: VAAPI=2 → kernel=1)
+
+### Clause 4: DPB timestamp resolution (3 references)
+
+```c
+struct object_surface *last_ref =
+    SURFACE(driver_data, picture->last_ref_frame);
+struct object_surface *golden_ref =
+    SURFACE(driver_data, picture->golden_ref_frame);
+struct object_surface *alt_ref =
+    SURFACE(driver_data, picture->alt_ref_frame);
+
+if (last_ref != NULL)
+    frame.last_frame_ts = v4l2_timeval_to_ns(&last_ref->timestamp);
+if (golden_ref != NULL)
+    frame.golden_frame_ts = v4l2_timeval_to_ns(&golden_ref->timestamp);
+if (alt_ref != NULL)
+    frame.alt_frame_ts = v4l2_timeval_to_ns(&alt_ref->timestamp);
+```
+
+Mirrors iter1 mpeg2.c::mpeg2_set_controls forward/backward ref pattern (`src/mpeg2.c:146-156`). For VA_INVALID_SURFACE (key frame: all three refs invalid), `SURFACE()` returns NULL and timestamps stay 0 (memset default).
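
The NULL-sentinel resolution above can be tried in isolation. This is a minimal sketch, not the backend code: `sketch_surface`, `sketch_timeval_to_ns` and `sketch_ref_ts` are hypothetical names, and the conversion is assumed to follow the same seconds-plus-microseconds arithmetic as the UAPI helper `v4l2_timeval_to_ns`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical stand-in for the backend's surface object; only the
 * timestamp member matters for this sketch. */
struct sketch_surface {
    struct timeval timestamp;
};

/* Assumed to mirror the arithmetic of the UAPI helper
 * v4l2_timeval_to_ns(): seconds and microseconds folded into ns. */
static uint64_t sketch_timeval_to_ns(const struct timeval *tv)
{
    return (uint64_t)tv->tv_sec * 1000000000ULL +
           (uint64_t)tv->tv_usec * 1000ULL;
}

/* Resolve one reference: a NULL surface (invalid reference, as on a
 * key frame) keeps the 0 sentinel that memset already provides. */
static uint64_t sketch_ref_ts(const struct sketch_surface *ref)
{
    if (ref == NULL)
        return 0;
    return sketch_timeval_to_ns(&ref->timestamp);
}
```

Under this arithmetic a surface timestamp of 0 s and 5 µs converts to 5000, which shows how small marker-like values can appear in the control payload without being wall-clock times.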
+
+**Anchors**:
+- Phase 3 Q3+Q6 resolution: keyframe writes `last_ts=golden_ts=alt_ts=0`; inter frames write all non-zero
+- Phase 3 frame-2 inter: `last_ts=5000, golden_ts=11000, alt_ts=11000`; this confirms timestamps are arbitrary (not nanosecond-real-time, just stable VASurfaceID-derived markers)
+
+### Clause 5: Loop filter mapping
+
+```c
+for (i = 0; i < 4; i++) {
+    frame.lf.ref_frm_delta[i] = picture->loop_filter_deltas_ref_frame[i];
+    frame.lf.mb_mode_delta[i] = picture->loop_filter_deltas_mode[i];
+}
+frame.lf.sharpness_level = picture->pic_fields.bits.sharpness_level;
+frame.lf.level = picture->loop_filter_level[0];  /* base segment */
+if (picture->pic_fields.bits.loop_filter_adj_enable)
+    frame.lf.flags |= V4L2_VP8_LF_ADJ_ENABLE;
+if (picture->pic_fields.bits.mode_ref_lf_delta_update)
+    frame.lf.flags |= V4L2_VP8_LF_DELTA_UPDATE;
+if (picture->pic_fields.bits.filter_type)
+    frame.lf.flags |= V4L2_VP8_LF_FILTER_TYPE_SIMPLE;
+```
+
+**Anchors**:
+- Phase 3 keyframe: `ref_frm_delta=(2,0,-2,-2)`, `mb_mode_delta=(4,-2,2,4)`, `sharp=0`, `level=1`, `flags=0x03 (ADJ_ENABLE|DELTA_UPDATE)`
+- Phase 3 inter: same deltas (BBB-stable), `level=15`, `flags=0x01 (ADJ_ENABLE only)`; DELTA_UPDATE is set only on keyframes
+- Phase 3 Q5: FILTER_TYPE_SIMPLE bit not set on any captured frame; VAAPI's `filter_type=0` for BBB → flag clear ✓
+
+### Clause 6: Quantization base + delta derivation
+
+VAAPI conveys per-segment effective Q indices in `iqmatrix->quantization_index[4][6]` (segment × component-index). Components: yac(0), ydc(1), y2dc(2), y2ac(3), uvdc(4), uvac(5).
+
+Kernel takes a base set + per-component deltas (in `quant`) + per-segment overrides (in `segment.quant_update[]`).
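
The base-plus-delta conversion described above can be checked with concrete numbers before wiring it into the backend. The following is a sketch under stated assumptions: `sketch_quant` and `sketch_derive_quant` are hypothetical names, the struct is a local mirror rather than the kernel UAPI type, and the input indices are assumed to lie in 0..127 so every difference fits a signed 8-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical local mirror of the six quantization fields; the real
 * layout comes from the kernel UAPI header, not from this sketch. */
struct sketch_quant {
    uint8_t y_ac_qi;
    int8_t y_dc_delta;
    int8_t y2_dc_delta;
    int8_t y2_ac_delta;
    int8_t uv_dc_delta;
    int8_t uv_ac_delta;
};

/* qi[] holds one segment's effective indices in the order
 * yac, ydc, y2dc, y2ac, uvdc, uvac. The base is qi[0]; every other
 * component becomes a signed delta relative to that base. */
static struct sketch_quant sketch_derive_quant(const uint8_t qi[6])
{
    struct sketch_quant q;

    q.y_ac_qi = qi[0];
    q.y_dc_delta = (int8_t)(qi[1] - qi[0]);
    q.y2_dc_delta = (int8_t)(qi[2] - qi[0]);
    q.y2_ac_delta = (int8_t)(qi[3] - qi[0]);
    q.uv_dc_delta = (int8_t)(qi[4] - qi[0]);
    q.uv_ac_delta = (int8_t)(qi[5] - qi[0]);
    return q;
}
```

With uniform input such as six 8s, every delta comes out as zero. With `{40, 44, 40, 36, 40, 42}` the sketch yields a base of 40 and deltas of +4, 0, -4, 0 and +2. If the VAAPI producer clamps effective indices before filling the buffer, a delta recovered this way could differ from the bitstream delta near the range limits; that is an assumption about the producer which this sketch does not test.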
+
+For segment 0 (always present):
+
+```c
+if (iqmatrix_set) {
+    frame.quant.y_ac_qi = iqmatrix->quantization_index[0][0];
+    frame.quant.y_dc_delta = (__s8)(iqmatrix->quantization_index[0][1] -
+                                    iqmatrix->quantization_index[0][0]);
+    frame.quant.y2_dc_delta = (__s8)(iqmatrix->quantization_index[0][2] -
+                                     iqmatrix->quantization_index[0][0]);
+    frame.quant.y2_ac_delta = (__s8)(iqmatrix->quantization_index[0][3] -
+                                     iqmatrix->quantization_index[0][0]);
+    frame.quant.uv_dc_delta = (__s8)(iqmatrix->quantization_index[0][4] -
+                                     iqmatrix->quantization_index[0][0]);
+    frame.quant.uv_ac_delta = (__s8)(iqmatrix->quantization_index[0][5] -
+                                     iqmatrix->quantization_index[0][0]);
+}
+/* if iqmatrix_set==false: frame.quant stays zero from memset; safe
+ * default for fixtures with implicit-zero deltas (most VP8 streams,
+ * including BBB). */
+```
+
+For segments 1..3 (only used when segmentation enabled):
+
+```c
+if (picture->pic_fields.bits.segmentation_enabled && iqmatrix_set) {
+    for (i = 1; i < 4; i++)
+        frame.segment.quant_update[i] = (__s8)
+            (iqmatrix->quantization_index[i][0] -
+             iqmatrix->quantization_index[0][0]);
+}
+/* For BBB (segmentation disabled), all quant_update[] stay zero
+ * from memset, matching Phase 3 verbatim payload. */
+```
+
+**Anchors**:
+- Phase 3 keyframe: `y_ac_qi=8, all deltas=0` → BBB has uniform Q across components for segment 0
+- Phase 3 inter: `y_ac_qi=122, all deltas=0` (frame 2); `y_ac_qi=109, all deltas=0` (frame 4); all-zero deltas confirmed throughout BBB
+- Predicted VAAPI behavior: `iqmatrix->quantization_index[0][0..5]` should all hold the same value (e.g. all 8 for the keyframe); subtraction yields zero → matches Phase 3 verbatim
+
+### Clause 7: Segment fields
+
+```c
+for (i = 0; i < 3; i++)
+    frame.segment.segment_probs[i] = picture->mb_segment_tree_probs[i];
+
+if (picture->pic_fields.bits.segmentation_enabled)
+    frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_ENABLED;
+if (picture->pic_fields.bits.update_mb_segmentation_map)
+    frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_UPDATE_MAP;
+if (picture->pic_fields.bits.update_segment_feature_data)
+    frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_UPDATE_FEATURE_DATA;
+/* DELTA_VALUE_MODE: VAAPI doesn't expose abs_delta; FFmpeg sets
+ * unconditionally per !s->segmentation.absolute_vals (default).
+ * Match FFmpeg pattern: set unconditionally. Kernel ignores when
+ * ENABLED bit is clear (BBB case). */
+frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_DELTA_VALUE_MODE;
+
+/* segment.lf_update[] populated only when segmentation enabled */
+if (picture->pic_fields.bits.segmentation_enabled) {
+    for (i = 0; i < 4; i++)
+        frame.segment.lf_update[i] = (__s8)
+            (picture->loop_filter_level[i] -
+             picture->loop_filter_level[0]);
+}
+```
+
+**Anchors**:
+- Phase 3 every captured frame: `segment.flags = 0x08 = DELTA_VALUE_MODE` (BBB segmentation disabled, FFmpeg sets DVM unconditionally)
+- Phase 3 every captured frame: `segment_probs = (0,0,0)`, `quant_update = (0,0,0,0)`, `lf_update = (0,0,0,0)`; all zero (BBB segmentation disabled)
+
+### Clause 8: Entropy table mapping (3 sources)
+
+VAAPI splits the VP8 data that this clause consumes across THREE buffers: PictureParameterBuffer (mode + mv probs), ProbabilityDataBuffer (coeff_probs), and IQMatrix (quantization). The libva backend assembles the probability fields into the kernel's single `v4l2_vp8_entropy` struct.
+
+```c
+for (i = 0; i < 4; i++)
+    frame.entropy.y_mode_probs[i] = picture->y_mode_probs[i];
+for (i = 0; i < 3; i++)
+    frame.entropy.uv_mode_probs[i] = picture->uv_mode_probs[i];
+for (i = 0; i < 2; i++)
+    for (j = 0; j < 19; j++)
+        frame.entropy.mv_probs[i][j] = picture->mv_probs[i][j];
+
+if (probability_set) {
+    /* VAAPI's [4][8][3][11] layout matches kernel's exactly; direct memcpy */
+    memcpy(frame.entropy.coeff_probs,
+           probability->dct_coeff_probs,
+           sizeof frame.entropy.coeff_probs);
+}
+/* If probability_set==false: leave coeff_probs zero from memset.
+ * Most consumers will send VAProbabilityDataBuffer per frame; if not,
+ * kernel hantro driver re-derives from default tables. Phase 5
+ * review will flag this as a fidelity gap if Phase 7 mpv consumer
+ * doesn't send probability buffers. */
+```
+
+**Anchors**:
+- Phase 3 keyframe: `y_mode_probs=(145,156,163,128)` (FFmpeg keyframe const), `uv_mode_probs=(142,114,183)` (FFmpeg keyframe const). When VAAPI's mpv consumer also writes these constants for keyframes, byte-equality holds.
+- Phase 3 inter frame 2: `y_mode_probs=(3,1,128,1)`, `uv_mode_probs=(162,101,204)`; parser-derived, varies per frame.
+- Phase 3 entropy hash (sha1-16 prefix): keyframe `8b2fdae200eb193f`, inter changes. This confirms entropy state propagates per-frame; iter3 backend must wire the per-frame ProbabilityDataBuffer through to coeff_probs.
+
+### Clause 9: Coder state + first-partition fields + flags
+
+```c
+frame.coder_state.range = picture->bool_coder_ctx.range;
+frame.coder_state.value = picture->bool_coder_ctx.value;
+frame.coder_state.bit_count = picture->bool_coder_ctx.count;
+/* coder_state.padding stays zero from memset */
+
+frame.first_part_size = slice->partition_size[0];
+frame.first_part_header_bits = 0;
+/* first_part_header_bits: VAAPI doesn't expose this directly. FFmpeg
+ * derives from internal parser bit-cursor state. Phase 3 keyframe
+ * empirical = 6550, inter ranges 86..254. Leaving 0 here is a known
+ * fidelity gap; kernel hantro_vp8.c re-parses the uncompressed header
+ * and may not depend on this field for correctness. Phase 7 byte-
+ * compare will reveal whether Phase 5 review needs to re-open this
+ * with bitstream-side derivation (closest VAAPI analog: slice->
+ * macroblock_offset, but it's the MB-data offset not header-bits). */
+
+for (i = 0; i < 8; i++)
+    frame.dct_part_sizes[i] = slice->partition_size[i + 1];
+
+if (!picture->pic_fields.bits.key_frame)
+    frame.flags |= V4L2_VP8_FRAME_FLAG_KEY_FRAME;
+/* VAAPI inverts: key_frame=0 means it IS a keyframe per VP8 spec */
+frame.flags |= V4L2_VP8_FRAME_FLAG_SHOW_FRAME;
+/* Force unconditional per Phase 3 Q4: BBB has no alt-ref invisible
+ * frames in iter3 scope; document as known fidelity gap for future
+ * iters covering invisible-frame streams. */
+if (picture->pic_fields.bits.mb_no_coeff_skip)
+    frame.flags |= V4L2_VP8_FRAME_FLAG_MB_NO_SKIP_COEFF;
+if (picture->pic_fields.bits.sign_bias_golden)
+    frame.flags |= V4L2_VP8_FRAME_FLAG_SIGN_BIAS_GOLDEN;
+if (picture->pic_fields.bits.sign_bias_alternate)
+    frame.flags |= V4L2_VP8_FRAME_FLAG_SIGN_BIAS_ALT;
+/* EXPERIMENTAL bit (0x02) and bit 0x40 NOT set. Phase 3 baseline
+ * notes ffmpeg-v4l2-request-git sets these for unclear reasons;
+ * mainline UAPI defines neither for our use case (BBB profile=0).
+ * Kernel hantro_vp8.c only inspects KEY_FRAME bit. */
+```
+
+**Anchors**:
+- Phase 3 keyframe: `coder_state=(248,133,2)`, `first_part_size=22742`, `dct_part_sizes=(277872,0,...,0)`. `277872` is large, but `partition_size[]` values only need to fit u32; the full Phase 3 capture verified that BBB partitions do, as expected for 1280×720 normal-quality content.
+- Phase 3 keyframe: `flags=0x0d = KEY|SHOW|NOSKIP`; libva backend produces 0x0d ✓
+- Phase 3 inter: `flags=0x66` (FFmpeg); libva backend produces e.g. 0x04 (SHOW only) or 0x24 (SHOW|SBA) depending on VAAPI sign_bias bits. Phase 7 byte-compare uses field-level, not whole-flags, equality.
+
+### Clause 10: Final batched submission
+
+```c
+struct v4l2_ext_control ctrls[1] = {
+    {
+        .id = V4L2_CID_STATELESS_VP8_FRAME,
+        .ptr = &frame,
+        .size = sizeof frame,
+    },
+};
+
+rc = v4l2_set_controls(driver_data->video_fd,
+                       surface_object->request_fd,
+                       ctrls, 1);
+if (rc < 0)
+    return VA_STATUS_ERROR_OPERATION_FAILED;
+return 0;
+```
+
+Follows the dispatcher contract of `picture.c::codec_set_controls` (negative rc on failure → caller wraps with `VA_STATUS_ERROR_OPERATION_FAILED`).
+
+## Patch shape (commits)
+
+iter3 is implemented as 3 commits (A, B, C), with a conditional Commit D slot held in reserve (mirrors iter1 ABCD pattern):
+
+### Commit A: `src/config.c`: enumeration + dispatch + entrypoints
+
+3 sites:
+1. `RequestQueryConfigProfiles` (after HEVC enumeration block, line ~160): add VP8 enumeration block probing `V4L2_PIX_FMT_VP8_FRAME` against single + MPLANE OUTPUT formats.
+2. `RequestCreateConfig` (after iter2's HEVCMain case, line ~75): add `case VAProfileVP8Version0_3: break;` with comment block matching iter1+iter2 style.
+3. `RequestQueryConfigEntrypoints` (line ~180): add `case VAProfileVP8Version0_3:` to existing fall-through case list.
+
+Predicted +16 LOC, 1 file modified. Build target after Commit A: `vainfo` lists `VAProfileVP8Version0_3` on hantro env binding (criterion 1), but a config obtained from `vaCreateConfig` would still fail at later use because there is no codec dispatcher yet.
+
+### Commit B: NEW `src/vp8.c` + `src/vp8.h` + `src/meson.build` integration
+
+Net-new files implementing `vp8_set_controls()` per Clauses 1-10 above. Plus meson.build sources/headers list updates to compile + link the new module.
+
+Predicted ~200 LOC for vp8.c (the 10 clauses), ~40 LOC for vp8.h, +2 lines meson.build. 3 files (2 new + 1 modified). Build target after Commit B: vp8.o compiles standalone, but picture.c can't dispatch yet.
+
+### Commit C: `src/picture.c` + `src/surface.h`: dispatcher + per-frame buffer routing + union extension + per-frame reset
+
+5 sites:
+1. `picture.c:34-36` include block: add `#include "vp8.h"`.
+2. `picture.c::codec_set_controls` (line ~218 after HEVCMain case): add VP8 dispatch case calling `vp8_set_controls`.
+3. `picture.c::codec_store_buffer` (lines 85-179): add VP8 cases for VAPictureParameterBufferType, VASliceParameterBufferType, VAIQMatrixBufferType + NEW outer case `VAProbabilityDataBufferType` with VP8 inner case.
+4. `picture.c::RequestBeginPicture` (line ~300): add 2 reset lines for `params.vp8.iqmatrix_set = false;` and `params.vp8.probability_set = false;`.
+5. `surface.h::object_surface::params` union (line ~112): insert vp8 struct after h265 (~10 LOC).
+
+Predicted +50 LOC (+40 `picture.c`, +10 `surface.h`), 2 files modified.
+
+Build target after Commit C: backend builds clean. mpv-vaapi VP8 decode should work end-to-end. Expected to satisfy criteria 1, 2, 3 of Phase 1.
+
+### Commit D: `src/v4l2.c` (CONDITIONAL, only if v4l2_set_controls doesn't already fall through cleanly)
+
+Phase 2 source-read showed v4l2_set_controls is fourcc-agnostic. **Likely no Commit D needed**; this is a placeholder commit slot for fix-forward if Phase 6 build/runtime surfaces an unexpected ioctl plumbing issue (mirrors iter1 Commit D fix-forward).
+
+## Files touched summary
+
+| File | New | Modified | LOC delta | Commit |
+|---|:-:|:-:|:-:|:-:|
+| `src/config.c` | | ✓ | +16 | A |
+| `src/vp8.c` | ✓ | | +200 | B |
+| `src/vp8.h` | ✓ | | +40 | B |
+| `src/meson.build` | | ✓ | +2 | B |
+| `src/picture.c` | | ✓ | +40 | C |
+| `src/surface.h` | | ✓ | +10 | C |
+
+**Total**: ~308 LOC, 6 files (2 new + 4 modified). 3 commits (A, B, C).
+
+## Cross-cutting backlog (out of iter3 scope)
+
+Items inherited from iter1+iter2 close, NOT touched in iter3:
+
+- **B1** V4L2 device-discovery (per-boot device numbering shuffle); workaround: re-verify per-boot via v4l2-ctl --info.
+- **B3** picture.c BeginPicture profile-aware reset; currently writes byte 240 of union (h264.matrix_set) regardless of profile. iter3 adds vp8 resets but doesn't refactor the cross-profile cleanup.
+- **B4** context.c log suppression for unsupported codec controls.
+- **B5** mpeg2 vbv_buffer_size polish (iter1 S2 finding).
+- **B6** h265 SPS bitstream-parse fidelity gap.
+- **L3** vaDeriveImage cache-stale on RK3399; workaround: DMA-BUF GL only (memory rule).
+
+Plus iter3 NEW items:
+- **iter3-Q1** `first_part_header_bits` derivation gap. Plan leaves 0; Phase 7 byte-compare may reveal kernel sensitivity. If so, future iter or Phase 4 loopback derives from VAAPI `slice->macroblock_offset` + entropy-header subtraction.
+- **iter3-flags** ffmpeg-v4l2-request-git sets undocumented bit 0x40 in flags field; libva backend won't replicate. Phase 7 byte-compare will use FIELD-LEVEL equality, not whole-`flags` equality.
+
+## Phase 5 review prep
+
+Submitting this plan for second-model review (sonnet-architect). Key questions for the reviewer:
+
+1. **Quantization derivation**: is the "compute deltas as iqmatrix[0][N+1] - iqmatrix[0][0]" formula correct for VAAPI's representation? Or does VAAPI store the deltas already (in which case we copy rather than subtract)? Phase 3 verbatim shows all-zero deltas for BBB, but BBB doesn't exercise this code path meaningfully; non-zero deltas need a different fixture, not in iter3 scope.
+
+2. **Per-segment quant_update[1..3] derivation**: when segmentation is enabled, is the kernel's `quant_update[s]` an absolute Q index (clipped to 0..127) or a signed delta from segment 0's base? VP8 spec says delta in `delta_value_mode`; libva backend currently writes `iqmatrix[s][0] - iqmatrix[0][0]`, which is a signed delta. **Phase 5 should empirically verify against a segmentation-enabled test fixture if one is available**, OR document as known fidelity gap if not.
+
+3. **`first_part_header_bits = 0` fallback**: is this actually safe for hantro? Reviewer should grep kernel `hantro_vp8.c` + `rkvdec_vp8.c` for any read of `first_part_header_bits` in the decode hot path. If used, the gap blocks correctness; if unused (re-parsed from bitstream), the gap is cosmetic.
+
+4. **Probability buffer ordering**: VAAPI's mpv consumer might send VAProbabilityDataBufferType BEFORE VAPictureParameterBufferType in the same frame's RenderPicture call, OR after. The libva backend's `codec_store_buffer` is order-independent (each call updates a different surface field), so order shouldn't matter, but flag this for review confirmation.
+
+5. **Endianness assumption**: VAAPI structs use host-byte-order; kernel UAPI structs use host-byte-order; both little-endian on aarch64. No byte-swap needed; reviewer to confirm.
+
+6. **Struct size oversight**: Phase 2 implicitly assumed ~400 bytes; Phase 3 corrected to 1232. Phase 4 plan uses `sizeof frame`, which is compile-time correct, but the `controls[0].size` field MUST equal that. Reviewer should verify the FFmpeg ref pattern: `controls[0].size = sizeof(controls->frame)`; yes, matches.
+
+7. **Test compile field availability**: per memory `feedback_review_empirical_over_theoretical.md` Direction 2, every VAAPI field-name reference in this plan should be verified via `gcc -c` test-compile BEFORE Phase 6 starts. Reviewer should test-compile the field accesses listed in Clauses 3-9 (especially `picture->pic_fields.bits.{...}`, where iter2 had a bug with `uniform_spacing_flag`).
+
+## Phase 1 criteria → Phase 4 plan trace
+
+| Criterion | Plan addresses |
+|---|---|
+| 1. vainfo enumerates VP8Version0_3 | Commit A: `RequestQueryConfigProfiles` enumeration block |
+| 2. vaCreateConfig SUCCESS | Commit A: `RequestCreateConfig` case + `RequestQueryConfigEntrypoints` |
+| 3. ffmpeg-vaapi VP8 exit 0 | Commits A+B+C end-to-end; Clause 1 + Clauses 3-9 field mapping |
+| 4. mpv DMA-BUF GL HW=SW byte-identical | Commits A+B+C decode correctness + Phase 3 SW JPEGs as Phase 7 anchor |
+| 5. 3-codec regression | No regression risk: VP8 path is purely additive (NEW dispatcher case, no shared-state mutation in MPEG-2/HEVC/H.264 paths) |
+
+## Substrate state at Phase 4 close
+
+- Phase 1+2+3 commits at gitea (`ea2413e`, `898544a`, `fd3fce8`).
+- Fork at iter2 tip `8d71e20` on noether; Phase 6 patches will land here.
+- All Phase 3 anchors captured + preserved on fresnel `/tmp/iter3_phase3/`.
+- Memory rules carry forward; `feedback_fresnel_hostname` added.
+- Phase 4 plan ready for sonnet-architect review (Phase 5).
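
The 1232-byte figure behind review question 6 can be reproduced by summing the sub-struct sizes. The sketch below uses local mirror structs with hypothetical `m_` names; their field order and widths are assumed to follow the mainline VP8 stateless UAPI, and the installed kernel header remains the authoritative source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirrors only; assumed to match the mainline UAPI layout. */
struct m_segment {
    int8_t quant_update[4];
    int8_t lf_update[4];
    uint8_t segment_probs[3];
    uint8_t padding;
    uint32_t flags;
};                                  /* 16 bytes */

struct m_loop_filter {
    int8_t ref_frm_delta[4];
    int8_t mb_mode_delta[4];
    uint8_t sharpness_level;
    uint8_t level;
    uint16_t padding;
    uint32_t flags;
};                                  /* 16 bytes */

struct m_quant {
    uint8_t y_ac_qi;
    int8_t y_dc_delta, y2_dc_delta, y2_ac_delta;
    int8_t uv_dc_delta, uv_ac_delta;
    uint16_t padding;
};                                  /* 8 bytes */

struct m_entropy {
    uint8_t coeff_probs[4][8][3][11];   /* 1056 bytes */
    uint8_t y_mode_probs[4];
    uint8_t uv_mode_probs[3];
    uint8_t mv_probs[2][19];
    uint8_t padding[3];
};                                  /* 1104 bytes */

struct m_coder_state {
    uint8_t range, value, bit_count, padding;
};                                  /* 4 bytes */

struct m_frame {
    struct m_segment segment;
    struct m_loop_filter lf;
    struct m_quant quant;
    struct m_entropy entropy;
    struct m_coder_state coder_state;
    uint16_t width, height;
    uint8_t horizontal_scale, vertical_scale;
    uint8_t version, prob_skip_false;
    uint8_t prob_intra, prob_last, prob_gf, num_dct_parts;
    uint32_t first_part_size;
    uint32_t first_part_header_bits;
    uint32_t dct_part_sizes[8];
    uint64_t last_frame_ts, golden_frame_ts, alt_frame_ts;
    uint64_t flags;
};                                  /* 1232 bytes, no implicit padding */
```

In the backend itself, a compile-time check such as `_Static_assert(sizeof(struct v4l2_ctrl_vp8_frame) == 1232, "...")` against the real header would be the stronger guard, since it would catch a header that differs from the one Phase 3 captured against.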