Files
fresnel-fourier/phase4_iter3_plan.md
T
claude-noether 2918dda2e0 iter3 Phase 4: plan — 10 contract clauses, ~308-LOC patch, 3 commits
Locks the iter3 patch shape against Phase 3 verbatim cross-validator
payload + Phase 2 contract surface. 10 contract clauses cite kernel
UAPI + VAAPI + FFmpeg ref + Phase 3 byte anchors throughout.

Patch shape (mirrors iter1 ABCD pattern):

  Commit A: src/config.c — enumeration block + CreateConfig case +
            QueryConfigEntrypoints case (3 sites, +16 LOC, 1 file).
            After: vainfo lists VP8Version0_3.
  Commit B: NEW src/vp8.c (~200 LOC) + NEW src/vp8.h (~40 LOC) +
            meson.build sources/headers entries (+2). 3 files
            (2 new + 1 modified).
            After: vp8.o compiles standalone.
  Commit C: src/picture.c — codec_set_controls dispatch +
            codec_store_buffer 4 buffer-type cases + outer
            VAProbabilityDataBufferType case + BeginPicture
            per-frame reset (4 sites, +40 LOC) + src/surface.h
            params.vp8 union member (+10 LOC). 2 files modified.
            After: end-to-end VP8 decode through libva backend.

Total: ~308 LOC, 6 files (2 new + 4 modified), 3 commits.

Contract clauses summary:

  1. Submission shape — single VIDIOC_S_EXT_CTRLS, count=1, ctrl_class=
     V4L2_CTRL_CLASS_CODEC_STATELESS (0xf010000), id=0xa409c8,
     size=1232 bytes
  2. Local struct alloc + zero-init (memset clears all padding)
  3. Frame geometry + version + per-frame scalars (off-by-one
     num_dct_parts = num_of_partitions - 1)
  4. DPB timestamp resolution (3 refs: last/golden/alt; 0-sentinel
     when SURFACE() returns NULL — mirrors iter1 mpeg2.c pattern)
  5. Loop filter mapping (6 fields + 3 flag bits)
  6. Quantization base + delta derivation (segment 0 = base via
     iqmatrix[0][0]; deltas = iqmatrix[0][N+1] - iqmatrix[0][0]
     signed; per-segment quant_update[1..3] only when segmentation
     enabled)
  7. Segment fields (segment_probs direct copy; flags assembled +
     DELTA_VALUE_MODE set unconditionally per FFmpeg pattern)
  8. Entropy table mapping — 3 VAAPI sources (Picture: y_mode +
     uv_mode + mv_probs; ProbabilityData: coeff_probs[4][8][3][11]
     direct memcpy; IQMatrix: quant)
  9. Coder state + first-partition fields + flags (6 mainline-
     documented bits only; bit 0x40 + EXPERIMENTAL NOT replicated
     vs ffmpeg-v4l2-request-git anomaly; first_part_header_bits=0
     fallback documented as known fidelity gap)
  10. Final batched submission via v4l2_set_controls

Phase 5 review questions queued (7 items): quantization derivation
correctness, per-segment quant_update semantics, first_part_header_
bits=0 safety, probability buffer ordering, endianness, struct size
sizeof correctness, field-availability test-compile per memory
feedback_review_empirical_over_theoretical Direction 2.

Cross-cutting backlog deferred (B1, B3, B4, B5, B6, L3 inherited;
iter3-Q1 first_part_header_bits + iter3-flags 0x40 anomaly NEW).

Refs:
  phase0_findings_iter3.md (Phase 1 lock)
  phase2_iter3_situation.md (Phase 2 contract surface)
  phase3_iter3_baseline.md (Phase 3 verbatim payload anchors)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 20:39:52 +00:00

400 lines
21 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Iteration 3 — Phase 4 (plan)
Locks the iter3 patch shape against the verbatim Phase 3 cross-validator payload and the kernel UAPI + VAAPI + FFmpeg references read in Phase 2. Plan structure mirrors iter2's 10-clause template (`phase4_iter2_plan.md`).
Phase 3 baseline at `phase3_iter3_baseline.md` (commit `fd3fce8`) supplies the byte-level anchors. Phase 2 situation analysis at `phase2_iter3_situation.md` (commit `898544a`) supplies the bug list + contract surface read.
## Contract clauses
### Clause 1 — Submission shape (per-frame)
ONE batched `VIDIOC_S_EXT_CTRLS` per frame, bound to the surface's permanent `request_fd`. Single control. No init-time device-wide menus (VP8 has no DECODE_MODE/START_CODE — Phase 0 V4L2 inventory + FFmpeg ref + Phase 3 strace all confirm).
```c
struct v4l2_ext_control ctrls[1] = {
{
.id = V4L2_CID_STATELESS_VP8_FRAME, /* 0xa409c8 */
.ptr = &frame, /* &v4l2_ctrl_vp8_frame */
.size = sizeof frame, /* 1232 bytes */
},
};
rc = v4l2_set_controls(driver_data->video_fd,
surface_object->request_fd,
ctrls, 1);
```
`ctrl_class` not explicitly set in the call; `v4l2_set_controls` (`src/v4l2.c`) wraps it with `which=V4L2_CTRL_WHICH_REQUEST_VAL`. Phase 3 strace verified `ctrl_class=0xf010000` (`V4L2_CTRL_CLASS_CODEC_STATELESS`) is what the kernel sees, matching iter1+iter2 patterns.
**Anchor**: Phase 3 baseline § Step 3.3 submission shape table.
### Clause 2 — Local struct allocation + zero-init
```c
int vp8_set_controls(struct request_data *driver_data,
struct object_context *context,
struct object_surface *surface_object)
{
VAPictureParameterBufferVP8 *picture =
&surface_object->params.vp8.picture;
VASliceParameterBufferVP8 *slice =
&surface_object->params.vp8.slice;
VAIQMatrixBufferVP8 *iqmatrix =
&surface_object->params.vp8.iqmatrix;
VAProbabilityDataBufferVP8 *probability =
&surface_object->params.vp8.probability;
bool iqmatrix_set = surface_object->params.vp8.iqmatrix_set;
bool probability_set = surface_object->params.vp8.probability_set;
struct v4l2_ctrl_vp8_frame frame;
memset(&frame, 0, sizeof frame);
/* All padding fields must be zero per kernel contract; memset
* covers them. C99 designated initializers in FFmpeg ref achieve
* the same; explicit memset matches iter1 mpeg2.c style. */
...
}
```
Mirror iter1 mpeg2.c::mpeg2_set_controls (`src/mpeg2.c:95-118`) opening: extract VAAPI struct pointers + flag readouts, allocate kernel struct on stack, memset zero, populate fields below.
### Clause 3 — Frame geometry + version + per-frame scalars
```c
frame.width = picture->frame_width;
frame.height = picture->frame_height;
frame.horizontal_scale = 0; /* not exposed by VAAPI; FFmpeg also hardcodes 0 */
frame.vertical_scale = 0;
frame.version = picture->pic_fields.bits.version;
frame.prob_skip_false = picture->prob_skip_false;
frame.prob_intra = picture->prob_intra;
frame.prob_last = picture->prob_last;
frame.prob_gf = picture->prob_gf;
frame.num_dct_parts = slice->num_of_partitions - 1; /* off-by-one per Phase 3 Q2 */
```
**Anchors**:
- VAAPI `va_dec_vp8.h:71-160` (PictureParameterBuffer)
- Phase 3 frame-1 keyframe: width=1280, height=720, version=0, prob_skip=255 (matches mpv/FFmpeg parser output)
- Phase 3 Q2 resolution: `num_dct_parts = num_of_partitions - 1` (BBB inter: VAAPI=2 → kernel=1)
### Clause 4 — DPB timestamp resolution (3 references)
```c
struct object_surface *last_ref =
SURFACE(driver_data, picture->last_ref_frame);
struct object_surface *golden_ref =
SURFACE(driver_data, picture->golden_ref_frame);
struct object_surface *alt_ref =
SURFACE(driver_data, picture->alt_ref_frame);
if (last_ref != NULL)
frame.last_frame_ts = v4l2_timeval_to_ns(&last_ref->timestamp);
if (golden_ref != NULL)
frame.golden_frame_ts = v4l2_timeval_to_ns(&golden_ref->timestamp);
if (alt_ref != NULL)
frame.alt_frame_ts = v4l2_timeval_to_ns(&alt_ref->timestamp);
```
Mirrors iter1 mpeg2.c::mpeg2_set_controls forward/backward ref pattern (`src/mpeg2.c:146-156`). For VA_INVALID_SURFACE (key frame: all three refs invalid), `SURFACE()` returns NULL and timestamps stay 0 (memset default).
**Anchors**:
- Phase 3 Q3+Q6 resolution: keyframe writes `last_ts=golden_ts=alt_ts=0`; inter frames write all non-zero
- Phase 3 frame-2 inter: `last_ts=5000, golden_ts=11000, alt_ts=11000` — confirms timestamps are arbitrary (not nanosecond-real-time, just stable VASurfaceID-derived markers)
### Clause 5 — Loop filter mapping
```c
for (i = 0; i < 4; i++) {
frame.lf.ref_frm_delta[i] = picture->loop_filter_deltas_ref_frame[i];
frame.lf.mb_mode_delta[i] = picture->loop_filter_deltas_mode[i];
}
frame.lf.sharpness_level = picture->pic_fields.bits.sharpness_level;
frame.lf.level = picture->loop_filter_level[0]; /* base segment */
if (picture->pic_fields.bits.loop_filter_adj_enable)
frame.lf.flags |= V4L2_VP8_LF_ADJ_ENABLE;
if (picture->pic_fields.bits.mode_ref_lf_delta_update)
frame.lf.flags |= V4L2_VP8_LF_DELTA_UPDATE;
if (picture->pic_fields.bits.filter_type)
frame.lf.flags |= V4L2_VP8_LF_FILTER_TYPE_SIMPLE;
```
**Anchors**:
- Phase 3 keyframe: `ref_frm_delta=(2,0,-2,-2)`, `mb_mode_delta=(4,-2,2,4)`, `sharp=0`, `level=1`, `flags=0x03 (ADJ_ENABLE|DELTA_UPDATE)`
- Phase 3 inter: same deltas (BBB-stable), `level=15`, `flags=0x01 (ADJ_ENABLE only)` — DELTA_UPDATE only set on keyframes
- Phase 3 Q5: FILTER_TYPE_SIMPLE bit not set on any captured frame; VAAPI's `filter_type=0` for BBB → flag clear ✓
### Clause 6 — Quantization base + delta derivation
VAAPI conveys per-segment effective Q indices in `iqmatrix->quantization_index[4][6]` (segment × component-index). Components: yac(0), ydc(1), y2dc(2), y2ac(3), uvdc(4), uvac(5).
Kernel takes a base set + per-component deltas (in `quant`) + per-segment overrides (in `segment.quant_update[]`).
For segment 0 (always present):
```c
if (iqmatrix_set) {
frame.quant.y_ac_qi = iqmatrix->quantization_index[0][0];
frame.quant.y_dc_delta = (s8)(iqmatrix->quantization_index[0][1] -
iqmatrix->quantization_index[0][0]);
frame.quant.y2_dc_delta = (s8)(iqmatrix->quantization_index[0][2] -
iqmatrix->quantization_index[0][0]);
frame.quant.y2_ac_delta = (s8)(iqmatrix->quantization_index[0][3] -
iqmatrix->quantization_index[0][0]);
frame.quant.uv_dc_delta = (s8)(iqmatrix->quantization_index[0][4] -
iqmatrix->quantization_index[0][0]);
frame.quant.uv_ac_delta = (s8)(iqmatrix->quantization_index[0][5] -
iqmatrix->quantization_index[0][0]);
}
/* if iqmatrix_set==false: frame.quant stays zero from memset; safe
* default for fixtures with implicit-zero deltas (most VP8 streams,
* including BBB). */
```
For segments 1..3 (only used when segmentation enabled):
```c
if (picture->pic_fields.bits.segmentation_enabled && iqmatrix_set) {
for (i = 1; i < 4; i++)
frame.segment.quant_update[i] = (s8)
(iqmatrix->quantization_index[i][0] -
iqmatrix->quantization_index[0][0]);
}
/* For BBB (segmentation disabled), all quant_update[] stay zero
* from memset, matching Phase 3 verbatim payload. */
```
**Anchors**:
- Phase 3 keyframe: `y_ac_qi=8, all deltas=0` → BBB has uniform Q across components for segment 0
- Phase 3 inter: `y_ac_qi=122, all deltas=0` (frame 2); `y_ac_qi=109, all deltas=0` (frame 4) — all-zero deltas confirmed throughout BBB
- Predicted VAAPI behavior: `iqmatrix->quantization_index[0][0..5]` should all equal the same value (e.g. all 8 for keyframe); subtraction yields zero → matches Phase 3 verbatim
### Clause 7 — Segment fields
```c
for (i = 0; i < 3; i++)
frame.segment.segment_probs[i] = picture->mb_segment_tree_probs[i];
if (picture->pic_fields.bits.segmentation_enabled)
frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_ENABLED;
if (picture->pic_fields.bits.update_mb_segmentation_map)
frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_UPDATE_MAP;
if (picture->pic_fields.bits.update_segment_feature_data)
frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_UPDATE_FEATURE_DATA;
/* DELTA_VALUE_MODE: VAAPI doesn't expose abs_delta; FFmpeg sets
* unconditionally per !s->segmentation.absolute_vals (default).
* Match FFmpeg pattern: set unconditionally. Kernel ignores when
* ENABLED bit is clear (BBB case). */
frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_DELTA_VALUE_MODE;
/* segment.lf_update[] populated only when segmentation enabled */
if (picture->pic_fields.bits.segmentation_enabled) {
for (i = 0; i < 4; i++)
frame.segment.lf_update[i] = (s8)
(picture->loop_filter_level[i] -
picture->loop_filter_level[0]);
}
```
**Anchors**:
- Phase 3 every captured frame: `segment.flags = 0x08 = DELTA_VALUE_MODE` (BBB segmentation disabled, FFmpeg sets DVM unconditionally)
- Phase 3 every captured frame: `segment_probs = (0,0,0)`, `quant_update = (0,0,0,0)`, `lf_update = (0,0,0,0)` — all zero (BBB segmentation disabled)
### Clause 8 — Entropy table mapping (3 sources)
VAAPI splits VP8 entropy across THREE buffers: PictureParameterBuffer (mode + mv probs), ProbabilityDataBuffer (coeff_probs), and IQMatrix (quantization). The libva backend assembles into kernel's single `v4l2_vp8_entropy` struct.
```c
for (i = 0; i < 4; i++)
frame.entropy.y_mode_probs[i] = picture->y_mode_probs[i];
for (i = 0; i < 3; i++)
frame.entropy.uv_mode_probs[i] = picture->uv_mode_probs[i];
for (i = 0; i < 2; i++)
for (j = 0; j < 19; j++)
frame.entropy.mv_probs[i][j] = picture->mv_probs[i][j];
if (probability_set) {
/* VAAPI's [4][8][3][11] layout matches kernel's exactly; direct memcpy */
memcpy(frame.entropy.coeff_probs,
probability->dct_coeff_probs,
sizeof frame.entropy.coeff_probs);
}
/* If probability_set==false: leave coeff_probs zero from memset.
* Most consumers will send VAProbabilityDataBuffer per frame; if not,
* kernel hantro driver re-derives from default tables. Phase 5
* review will flag this as a fidelity gap if Phase 7 mpv consumer
* doesn't send probability buffers. */
```
**Anchors**:
- Phase 3 keyframe: `y_mode_probs=(145,156,163,128)` (FFmpeg keyframe const), `uv_mode_probs=(142,114,183)` (FFmpeg keyframe const). When VAAPI's mpv consumer also writes these constants for keyframes, byte-equality holds.
- Phase 3 inter frame 2: `y_mode_probs=(3,1,128,1)`, `uv_mode_probs=(162,101,204)` — parser-derived, varies per frame.
- Phase 3 entropy hash (sha1-16 prefix): keyframe `8b2fdae200eb193f`, inter changes — confirms entropy state propagates per-frame; iter3 backend must wire the per-frame ProbabilityDataBuffer through to coeff_probs.
### Clause 9 — Coder state + first-partition fields + flags
```c
frame.coder_state.range = picture->bool_coder_ctx.range;
frame.coder_state.value = picture->bool_coder_ctx.value;
frame.coder_state.bit_count = picture->bool_coder_ctx.count;
/* coder_state.padding stays zero from memset */
frame.first_part_size = slice->partition_size[0];
frame.first_part_header_bits = 0;
/* first_part_header_bits: VAAPI doesn't expose this directly. FFmpeg
* derives from internal parser bit-cursor state. Phase 3 keyframe
* empirical = 6550, inter ranges 86..254. Leaving 0 here is a known
* fidelity gap; kernel hantro_vp8.c re-parses the uncompressed header
* and may not depend on this field for correctness. Phase 7 byte-
* compare will reveal whether Phase 5 review needs to re-open this
* with bitstream-side derivation (closest VAAPI analog: slice->
* macroblock_offset, but it's the MB-data offset not header-bits). */
for (i = 0; i < 8; i++)
frame.dct_part_sizes[i] = slice->partition_size[i + 1];
if (!picture->pic_fields.bits.key_frame)
frame.flags |= V4L2_VP8_FRAME_FLAG_KEY_FRAME;
/* VAAPI inverts: key_frame=0 means it IS a keyframe per VP8 spec */
frame.flags |= V4L2_VP8_FRAME_FLAG_SHOW_FRAME;
/* Force unconditional per Phase 3 Q4: BBB has no alt-ref invisible
* frames in iter3 scope; document as known fidelity gap for future
* iters covering invisible-frame streams. */
if (picture->pic_fields.bits.mb_no_coeff_skip)
frame.flags |= V4L2_VP8_FRAME_FLAG_MB_NO_SKIP_COEFF;
if (picture->pic_fields.bits.sign_bias_golden)
frame.flags |= V4L2_VP8_FRAME_FLAG_SIGN_BIAS_GOLDEN;
if (picture->pic_fields.bits.sign_bias_alternate)
frame.flags |= V4L2_VP8_FRAME_FLAG_SIGN_BIAS_ALT;
/* EXPERIMENTAL bit (0x02) and bit 0x40 NOT set. Phase 3 baseline
* notes ffmpeg-v4l2-request-git sets these for unclear reasons;
* mainline UAPI defines neither for our use case (BBB profile=0).
* Kernel hantro_vp8.c only inspects KEY_FRAME bit. */
```
**Anchors**:
- Phase 3 keyframe: `coder_state=(248,133,2)`, `first_part_size=22742`, `dct_part_sizes=(277872,0,...,0)` (`277872` is huge — note `partition_size[]` may overflow u32 for very-large partitions; verified BBB partitions fit u32 by full Phase 3 capture; expected within u32 range for 1280×720 normal-quality content)
- Phase 3 keyframe: `flags=0x0d = KEY|SHOW|NOSKIP` — libva backend produces 0x0d ✓
- Phase 3 inter: `flags=0x66` (FFmpeg) — libva backend produces e.g. 0x04 (SHOW only) or 0x24 (SHOW|SBA) depending on VAAPI sign_bias bits. Phase 7 byte-compare uses field-level, not whole-flags, equality.
### Clause 10 — Final batched submission
```c
struct v4l2_ext_control ctrls[1] = {
{
.id = V4L2_CID_STATELESS_VP8_FRAME,
.ptr = &frame,
.size = sizeof frame,
},
};
rc = v4l2_set_controls(driver_data->video_fd,
surface_object->request_fd,
ctrls, 1);
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
return 0;
```
Returns the dispatcher contract per `picture.c::codec_set_controls` (negative rc on failure → caller wraps with `VA_STATUS_ERROR_OPERATION_FAILED`).
## Patch shape (commits)
iter3 implements as 4 commits (mirrors iter1 ABCD pattern):
### Commit A — `src/config.c`: enumeration + dispatch + entrypoints
3 sites:
1. `RequestQueryConfigProfiles` (after HEVC enumeration block, line ~160): add VP8 enumeration block probing `V4L2_PIX_FMT_VP8_FRAME` against single + MPLANE OUTPUT formats.
2. `RequestCreateConfig` (after iter2's HEVCMain case, line ~75): add `case VAProfileVP8Version0_3: break;` with comment block matching iter1+iter2 style.
3. `RequestQueryConfigEntrypoints` (line ~180): add `case VAProfileVP8Version0_3:` to existing fall-through case list.
Predicted +16 LOC, 1 file modified. Build target after Commit A: `vainfo` lists `VAProfileVP8Version0_3` on hantro env binding (criterion 1) but `vaCreateConfig` would fail at later use because no codec dispatcher.
### Commit B — NEW `src/vp8.c` + `src/vp8.h` + `src/meson.build` integration
Net-new files implementing `vp8_set_controls()` per Clauses 1-10 above. Plus meson.build sources/headers list updates to compile + link the new module.
Predicted ~200 LOC for vp8.c (the 10 clauses), ~40 LOC for vp8.h, +2 lines meson.build. 3 files (2 new + 1 modified). Build target after Commit B: vp8.o compiles standalone, but picture.c can't dispatch yet.
### Commit C — `src/picture.c` + `src/surface.h`: dispatcher + per-frame buffer routing + union extension + per-frame reset
5 sites:
1. `picture.c:34-36` include block: add `#include "vp8.h"`.
2. `picture.c::codec_set_controls` (line ~218 after HEVCMain case): add VP8 dispatch case calling `vp8_set_controls`.
3. `picture.c::codec_store_buffer` (lines 85-179): add VP8 cases for VAPictureParameterBufferType, VASliceParameterBufferType, VAIQMatrixBufferType + NEW outer case `VAProbabilityDataBufferType` with VP8 inner case.
4. `picture.c::RequestBeginPicture` (line ~300): add 2 reset lines for `params.vp8.iqmatrix_set = false;` and `params.vp8.probability_set = false;`.
5. `surface.h::object_surface::params` union (line ~112): insert vp8 struct after h265 (~10 LOC).
Predicted +50 LOC, 2 files modified.
Build target after Commit C: backend builds clean. mpv-vaapi VP8 decode should work end-to-end. Expected to satisfy criteria 1, 2, 3 of Phase 1.
### Commit D — `src/v4l2.c` (CONDITIONAL — only if v4l2_set_controls doesn't already fall through cleanly)
Phase 2 source-read showed v4l2_set_controls is fourcc-agnostic. **Likely no Commit D needed**; placeholder commit slot for fix-forward if Phase 6 build/runtime surfaces an unexpected ioctl plumbing issue (mirrors iter1 Commit D fix-forward).
## Files touched summary
| File | New | Modified | LOC delta | Commit |
|---|:-:|:-:|:-:|:-:|
| `src/config.c` | | ✓ | +16 | A |
| `src/vp8.c` | ✓ | | +200 | B |
| `src/vp8.h` | ✓ | | +40 | B |
| `src/meson.build` | | ✓ | +2 | B |
| `src/picture.c` | | ✓ | +40 | C |
| `src/surface.h` | | ✓ | +10 | C |
**Total**: ~308 LOC, 6 files (2 new + 4 modified). 3 commits (A, B, C).
## Cross-cutting backlog (out of iter3 scope)
Items inherited from iter1+iter2 close, NOT touched in iter3:
- **B1** V4L2 device-discovery (per-boot device numbering shuffle) — workaround: re-verify per-boot via v4l2-ctl --info.
- **B3** picture.c BeginPicture profile-aware reset — currently writes byte 240 of union (h264.matrix_set) regardless of profile; iter3 adds vp8 resets but doesn't refactor the cross-profile cleanup.
- **B4** context.c log suppression for unsupported codec controls.
- **B5** mpeg2 vbv_buffer_size polish (iter1 S2 finding).
- **B6** h265 SPS bitstream-parse fidelity gap.
- **L3** vaDeriveImage cache-stale on RK3399 — workaround: DMA-BUF GL only (memory rule).
Plus iter3 NEW item:
- **iter3-Q1** `first_part_header_bits` derivation gap. Plan leaves 0; Phase 7 byte-compare may reveal kernel sensitivity. If so, future iter or Phase 4 loopback derives from VAAPI `slice->macroblock_offset` + entropy-header subtraction.
- **iter3-flags** ffmpeg-v4l2-request-git sets undocumented bit 0x40 in flags field; libva backend won't replicate. Phase 7 byte-compare will use FIELD-LEVEL equality, not whole-`flags` equality.
## Phase 5 review prep
Submitting this plan for second-model review (sonnet-architect). Key questions for the reviewer:
1. **Quantization derivation**: is the "compute deltas as iqmatrix[0][N+1] - iqmatrix[0][0]" formula correct for VAAPI's representation? Or does VAAPI store the deltas already (then we just copy-not-subtract)? Phase 3 verbatim shows all-zero deltas for BBB, but BBB doesn't exercise this code path meaningfully — non-zero deltas need a different fixture, not in iter3 scope.
2. **Per-segment quant_update[1..3] derivation**: when segmentation enabled, is the kernel's `quant_update[s]` an absolute Q index (clipped to 0..127) or a signed delta from segment 0's base? VP8 spec says delta in `delta_value_mode`; libva backend currently writes `iqmatrix[s][0] - iqmatrix[0][0]` which is signed delta. **Phase 5 should empirically verify against a segmentation-enabled test fixture if one is available**, OR document as known fidelity gap if not.
3. **`first_part_header_bits = 0` fallback**: is this actually safe for hantro? Reviewer should grep kernel `hantro_vp8.c` + `rkvdec_vp8.c` for any read of `first_part_header_bits` in the decode hot path. If used, the gap blocks correctness; if unused (re-parsed from bitstream), the gap is cosmetic.
4. **Probability buffer ordering**: VAAPI's mpv consumer might send VAProbabilityDataBufferType BEFORE VAPictureParameterBufferType in the same frame's RenderPicture call, OR after. The libva backend's `codec_store_buffer` is order-independent (each call updates a different surface field), so order shouldn't matter — but flag this for review confirmation.
5. **Endianness assumption**: VAAPI structs use host-byte-order; kernel UAPI structs use host-byte-order; both little-endian on aarch64. No byte-swap needed — confirm review.
6. **Struct size oversight**: Phase 2 implicitly assumed ~400 bytes; Phase 3 corrected to 1232. Phase 4 plan uses `sizeof frame` which is compile-time correct, but the `controls[0].size` field MUST equal that. Reviewer should verify the FFmpeg ref pattern: `controls[0].size = sizeof(controls->frame)` — yes, matches.
7. **Test compile field availability**: per memory `feedback_review_empirical_over_theoretical.md` Direction 2, every VAAPI field-name reference in this plan should be verified via `gcc -c` test-compile BEFORE Phase 6 starts. Reviewer should test-compile the field accesses listed in Clauses 3-9 (especially `picture->pic_fields.bits.{...}` which iter2 had a bug with `uniform_spacing_flag`).
## Phase 1 criteria → Phase 4 plan trace
| Criterion | Plan addresses |
|---|---|
| 1. vainfo enumerates VP8Version0_3 | Commit A — `RequestQueryConfigProfiles` enumeration block |
| 2. vaCreateConfig SUCCESS | Commit A — `RequestCreateConfig` case + `RequestQueryConfigEntrypoints` |
| 3. ffmpeg-vaapi VP8 exit 0 | Commits A+B+C end-to-end; Clause 1 + Clauses 3-9 field mapping |
| 4. mpv DMA-BUF GL HW=SW byte-identical | Commits A+B+C decode correctness + Phase 3 SW JPEGs as Phase 7 anchor |
| 5. 3-codec regression | No regression risk — VP8 path is purely additive (NEW dispatcher case, no shared-state mutation in MPEG-2/HEVC/H.264 paths) |
## Substrate state at Phase 4 close
- Phase 1+2+3 commits at gitea (`ea2413e`, `898544a`, `fd3fce8`).
- Fork at iter2 tip `8d71e20` on noether; Phase 6 patches will land here.
- All Phase 3 anchors captured + preserved on fresnel `/tmp/iter3_phase3/`.
- Memory rules carry forward; `feedback_fresnel_hostname` added.
- Phase 4 plan ready for sonnet-architect review (Phase 5).