Files
fresnel-fourier/phase4_iter3_plan.md
T
claude-noether 656596aa6b iter3 Phase 5: sonnet review — 4 Critical findings, 4 amendments
Second-model review by sonnet-architect found 4 Critical bugs in
Phase 4 plan, all verified empirically by author before incorporation
per memory feedback_review_empirical_over_theoretical Direction 2.
Amendments applied in-place to phase4_iter3_plan.md +
phase2_iter3_situation.md.

Critical findings:

  C1 first_part_header_bits = 0 was claimed cosmetic; actually
     UNSAFE. hantro_g1_vp8_dec.c:260 + rockchip_vpu2_hw_vp8_dec.c:372
     both read this field unconditionally to compute the macroblock
     DMA offset. Setting 0 would place hardware at wrong DMA offset
     for ALL macroblock data → garbage decode.
     Fix: frame.first_part_header_bits = slice->macroblock_offset
     (verified by source identity — vaapi_vp8.c:204 and
     v4l2_request_vp8.c:83 use byte-identical formulas).

  C2 first_part_size = slice->partition_size[0] was wrong; VAAPI's
     partition_size[0] is the REMAINING bytes after parsing
     (vaapi_vp8.c:209 confirms; va_dec_vp8.h:193-196 spec confirms).
     Kernel needs the TOTAL control partition size.
     Fix: frame.first_part_size = slice->partition_size[0] +
                                  ((macroblock_offset + 7) / 8)
     Phase 3 keyframe numerics confirm: 21923 + 819 = 22742 ✓.

  C3 VAProbabilityDataBufferType does not exist as a buffer-type
     enum; it's the struct name. The actual enum constant is
     VAProbabilityBufferType (= 13 per va.h:2058). Switch case
     using the wrong identifier would have failed Phase 6 compile.
     Fix: replace globally in phase2 + phase4 docs.

  C4 (s8) cast undefined in userspace. Kernel has 's8' typedef in
     linux/types.h (kernel-internal). UAPI exposes '__s8' (double-
     underscore). Userspace portable cast is int8_t from <stdint.h>.
     Fix: replace (s8) with (int8_t) in Clauses 6+7.

Suggested:

  S3 Clause 8 comment was factually wrong: hantro_vp8.c::
     hantro_vp8_prob_update reads coeff_probs unconditionally;
     there is NO default-table fallback. If probability_set==false,
     decode produces garbage. Practical risk low (FFmpeg vaapi_vp8.c
     always sends VAProbabilityBufferType per frame), but corrected
     comment + added assert(probability_set) runtime guard for
     immediate Phase 6 surfacing.

Plus 5 minor S/Q items documented; non-blocking for iter3.

Author's 7 review questions all answered directly in the review:
  Q1 quantization derivation: correct for typical content
  Q2 first_part_header_bits=0 safety: UNSAFE → C1
  Q3 num_dct_parts off-by-one: confirmed correct
  Q4 field availability: 2 compile failures found (C3 + C4)
  Q5 quant_update[s] semantics: signed delta confirmed
  Q6 SHOW_FRAME unconditional: safe for BBB scope
  Q7 buffer order independence: confirmed

Estimated saving: 1 Phase 6 → Phase 4 loopback + 2 Phase 6 fix-
forward commits. Review pass is the right path forward per memory
rule "Reviews are never skippable" — empty-review value =
empirical-verification value, regardless of finding count.

Refs:
  phase4_iter3_plan.md (amended in-place; Phase 5 amendments
                         section appended)
  phase2_iter3_situation.md (amended C3 globally)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 21:27:53 +00:00

439 lines
24 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Iteration 3 — Phase 4 (plan)
Locks the iter3 patch shape against the verbatim Phase 3 cross-validator payload and the kernel UAPI + VAAPI + FFmpeg references read in Phase 2. Plan structure mirrors iter2's 10-clause template (`phase4_iter2_plan.md`).
Phase 3 baseline at `phase3_iter3_baseline.md` (commit `fd3fce8`) supplies the byte-level anchors. Phase 2 situation analysis at `phase2_iter3_situation.md` (commit `898544a`) supplies the bug list + contract surface read.
## Contract clauses
### Clause 1 — Submission shape (per-frame)
ONE batched `VIDIOC_S_EXT_CTRLS` per frame, bound to the surface's permanent `request_fd`. Single control. No init-time device-wide menus (VP8 has no DECODE_MODE/START_CODE — Phase 0 V4L2 inventory + FFmpeg ref + Phase 3 strace all confirm).
```c
struct v4l2_ext_control ctrls[1] = {
{
.id = V4L2_CID_STATELESS_VP8_FRAME, /* 0xa409c8 */
.ptr = &frame, /* &v4l2_ctrl_vp8_frame */
.size = sizeof frame, /* 1232 bytes */
},
};
rc = v4l2_set_controls(driver_data->video_fd,
surface_object->request_fd,
ctrls, 1);
```
`ctrl_class` not explicitly set in the call; `v4l2_set_controls` (`src/v4l2.c`) wraps it with `which=V4L2_CTRL_WHICH_REQUEST_VAL`. Phase 3 strace verified `ctrl_class=0xf010000` (`V4L2_CTRL_CLASS_CODEC_STATELESS`) is what the kernel sees, matching iter1+iter2 patterns.
**Anchor**: Phase 3 baseline § Step 3.3 submission shape table.
### Clause 2 — Local struct allocation + zero-init
```c
int vp8_set_controls(struct request_data *driver_data,
struct object_context *context,
struct object_surface *surface_object)
{
VAPictureParameterBufferVP8 *picture =
&surface_object->params.vp8.picture;
VASliceParameterBufferVP8 *slice =
&surface_object->params.vp8.slice;
VAIQMatrixBufferVP8 *iqmatrix =
&surface_object->params.vp8.iqmatrix;
VAProbabilityDataBufferVP8 *probability =
&surface_object->params.vp8.probability;
bool iqmatrix_set = surface_object->params.vp8.iqmatrix_set;
bool probability_set = surface_object->params.vp8.probability_set;
struct v4l2_ctrl_vp8_frame frame;
memset(&frame, 0, sizeof frame);
/* All padding fields must be zero per kernel contract; memset
* covers them. C99 designated initializers in FFmpeg ref achieve
* the same; explicit memset matches iter1 mpeg2.c style. */
...
}
```
Mirror iter1 mpeg2.c::mpeg2_set_controls (`src/mpeg2.c:95-118`) opening: extract VAAPI struct pointers + flag readouts, allocate kernel struct on stack, memset zero, populate fields below.
### Clause 3 — Frame geometry + version + per-frame scalars
```c
frame.width = picture->frame_width;
frame.height = picture->frame_height;
frame.horizontal_scale = 0; /* not exposed by VAAPI; FFmpeg also hardcodes 0 */
frame.vertical_scale = 0;
frame.version = picture->pic_fields.bits.version;
frame.prob_skip_false = picture->prob_skip_false;
frame.prob_intra = picture->prob_intra;
frame.prob_last = picture->prob_last;
frame.prob_gf = picture->prob_gf;
frame.num_dct_parts = slice->num_of_partitions - 1; /* off-by-one per Phase 3 Q2 */
```
**Anchors**:
- VAAPI `va_dec_vp8.h:71-160` (PictureParameterBuffer)
- Phase 3 frame-1 keyframe: width=1280, height=720, version=0, prob_skip=255 (matches mpv/FFmpeg parser output)
- Phase 3 Q2 resolution: `num_dct_parts = num_of_partitions - 1` (BBB inter: VAAPI=2 → kernel=1)
### Clause 4 — DPB timestamp resolution (3 references)
```c
struct object_surface *last_ref =
SURFACE(driver_data, picture->last_ref_frame);
struct object_surface *golden_ref =
SURFACE(driver_data, picture->golden_ref_frame);
struct object_surface *alt_ref =
SURFACE(driver_data, picture->alt_ref_frame);
if (last_ref != NULL)
frame.last_frame_ts = v4l2_timeval_to_ns(&last_ref->timestamp);
if (golden_ref != NULL)
frame.golden_frame_ts = v4l2_timeval_to_ns(&golden_ref->timestamp);
if (alt_ref != NULL)
frame.alt_frame_ts = v4l2_timeval_to_ns(&alt_ref->timestamp);
```
Mirrors iter1 mpeg2.c::mpeg2_set_controls forward/backward ref pattern (`src/mpeg2.c:146-156`). For VA_INVALID_SURFACE (key frame: all three refs invalid), `SURFACE()` returns NULL and timestamps stay 0 (memset default).
**Anchors**:
- Phase 3 Q3+Q6 resolution: keyframe writes `last_ts=golden_ts=alt_ts=0`; inter frames write all non-zero
- Phase 3 frame-2 inter: `last_ts=5000, golden_ts=11000, alt_ts=11000` — confirms timestamps are arbitrary (not nanosecond-real-time, just stable VASurfaceID-derived markers)
### Clause 5 — Loop filter mapping
```c
for (i = 0; i < 4; i++) {
frame.lf.ref_frm_delta[i] = picture->loop_filter_deltas_ref_frame[i];
frame.lf.mb_mode_delta[i] = picture->loop_filter_deltas_mode[i];
}
frame.lf.sharpness_level = picture->pic_fields.bits.sharpness_level;
frame.lf.level = picture->loop_filter_level[0]; /* base segment */
if (picture->pic_fields.bits.loop_filter_adj_enable)
frame.lf.flags |= V4L2_VP8_LF_ADJ_ENABLE;
if (picture->pic_fields.bits.mode_ref_lf_delta_update)
frame.lf.flags |= V4L2_VP8_LF_DELTA_UPDATE;
if (picture->pic_fields.bits.filter_type)
frame.lf.flags |= V4L2_VP8_LF_FILTER_TYPE_SIMPLE;
```
**Anchors**:
- Phase 3 keyframe: `ref_frm_delta=(2,0,-2,-2)`, `mb_mode_delta=(4,-2,2,4)`, `sharp=0`, `level=1`, `flags=0x03 (ADJ_ENABLE|DELTA_UPDATE)`
- Phase 3 inter: same deltas (BBB-stable), `level=15`, `flags=0x01 (ADJ_ENABLE only)` — DELTA_UPDATE only set on keyframes
- Phase 3 Q5: FILTER_TYPE_SIMPLE bit not set on any captured frame; VAAPI's `filter_type=0` for BBB → flag clear ✓
### Clause 6 — Quantization base + delta derivation
VAAPI conveys per-segment effective Q indices in `iqmatrix->quantization_index[4][6]` (segment × component-index). Components: yac(0), ydc(1), y2dc(2), y2ac(3), uvdc(4), uvac(5).
Kernel takes a base set + per-component deltas (in `quant`) + per-segment overrides (in `segment.quant_update[]`).
For segment 0 (always present):
```c
if (iqmatrix_set) {
frame.quant.y_ac_qi = iqmatrix->quantization_index[0][0];
frame.quant.y_dc_delta = (int8_t)(iqmatrix->quantization_index[0][1] -
iqmatrix->quantization_index[0][0]);
frame.quant.y2_dc_delta = (int8_t)(iqmatrix->quantization_index[0][2] -
iqmatrix->quantization_index[0][0]);
frame.quant.y2_ac_delta = (int8_t)(iqmatrix->quantization_index[0][3] -
iqmatrix->quantization_index[0][0]);
frame.quant.uv_dc_delta = (int8_t)(iqmatrix->quantization_index[0][4] -
iqmatrix->quantization_index[0][0]);
frame.quant.uv_ac_delta = (int8_t)(iqmatrix->quantization_index[0][5] -
iqmatrix->quantization_index[0][0]);
}
/* if iqmatrix_set==false: frame.quant stays zero from memset; safe
* default for fixtures with implicit-zero deltas (most VP8 streams,
* including BBB).
*
* Phase 5 C4: cast is (int8_t), NOT (s8) — the latter is a kernel-
* internal typedef from <linux/types.h> not exposed to userspace.
* Userspace UAPI exposes __s8 (double-underscore). int8_t from
* <stdint.h> is the portable userspace equivalent. */
```
For segments 1..3 (only used when segmentation enabled):
```c
if (picture->pic_fields.bits.segmentation_enabled && iqmatrix_set) {
for (i = 1; i < 4; i++)
frame.segment.quant_update[i] = (int8_t)
(iqmatrix->quantization_index[i][0] -
iqmatrix->quantization_index[0][0]);
}
/* For BBB (segmentation disabled), all quant_update[] stay zero
* from memset, matching Phase 3 verbatim payload. */
```
**Anchors**:
- Phase 3 keyframe: `y_ac_qi=8, all deltas=0` → BBB has uniform Q across components for segment 0
- Phase 3 inter: `y_ac_qi=122, all deltas=0` (frame 2); `y_ac_qi=109, all deltas=0` (frame 4) — all-zero deltas confirmed throughout BBB
- Predicted VAAPI behavior: `iqmatrix->quantization_index[0][0..5]` should all equal the same value (e.g. all 8 for keyframe); subtraction yields zero → matches Phase 3 verbatim
### Clause 7 — Segment fields
```c
for (i = 0; i < 3; i++)
frame.segment.segment_probs[i] = picture->mb_segment_tree_probs[i];
if (picture->pic_fields.bits.segmentation_enabled)
frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_ENABLED;
if (picture->pic_fields.bits.update_mb_segmentation_map)
frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_UPDATE_MAP;
if (picture->pic_fields.bits.update_segment_feature_data)
frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_UPDATE_FEATURE_DATA;
/* DELTA_VALUE_MODE: VAAPI doesn't expose abs_delta; FFmpeg sets
* unconditionally per !s->segmentation.absolute_vals (default).
* Match FFmpeg pattern: set unconditionally. Kernel ignores when
* ENABLED bit is clear (BBB case). */
frame.segment.flags |= V4L2_VP8_SEGMENT_FLAG_DELTA_VALUE_MODE;
/* segment.lf_update[] populated only when segmentation enabled */
if (picture->pic_fields.bits.segmentation_enabled) {
for (i = 0; i < 4; i++)
frame.segment.lf_update[i] = (int8_t)
(picture->loop_filter_level[i] -
picture->loop_filter_level[0]);
}
```
**Anchors**:
- Phase 3 every captured frame: `segment.flags = 0x08 = DELTA_VALUE_MODE` (BBB segmentation disabled, FFmpeg sets DVM unconditionally)
- Phase 3 every captured frame: `segment_probs = (0,0,0)`, `quant_update = (0,0,0,0)`, `lf_update = (0,0,0,0)` — all zero (BBB segmentation disabled)
### Clause 8 — Entropy table mapping (3 sources)
VAAPI splits VP8 entropy across THREE buffers: PictureParameterBuffer (mode + mv probs), ProbabilityDataBuffer (coeff_probs), and IQMatrix (quantization). The libva backend assembles into kernel's single `v4l2_vp8_entropy` struct.
```c
for (i = 0; i < 4; i++)
frame.entropy.y_mode_probs[i] = picture->y_mode_probs[i];
for (i = 0; i < 3; i++)
frame.entropy.uv_mode_probs[i] = picture->uv_mode_probs[i];
for (i = 0; i < 2; i++)
for (j = 0; j < 19; j++)
frame.entropy.mv_probs[i][j] = picture->mv_probs[i][j];
if (probability_set) {
/* VAAPI's [4][8][3][11] layout matches kernel's exactly; direct memcpy */
memcpy(frame.entropy.coeff_probs,
probability->dct_coeff_probs,
sizeof frame.entropy.coeff_probs);
}
/* Phase 5 S3 correction: if probability_set==false, decode produces
* GARBAGE. hantro_vp8.c::hantro_vp8_prob_update reads
* entropy.coeff_probs unconditionally; there is NO default-table
* fallback in the kernel driver. Practical risk is low — FFmpeg
* vaapi_vp8.c:146-148 always sends VAProbabilityBufferType per
* frame, and mpv-vaapi VP8 uses ff_vp8_vaapi_hwaccel under the hood.
* Add an assert() so any consumer violation surfaces immediately. */
assert(probability_set); /* runtime guard per Phase 5 S3 */
```
**Anchors**:
- Phase 3 keyframe: `y_mode_probs=(145,156,163,128)` (FFmpeg keyframe const), `uv_mode_probs=(142,114,183)` (FFmpeg keyframe const). When VAAPI's mpv consumer also writes these constants for keyframes, byte-equality holds.
- Phase 3 inter frame 2: `y_mode_probs=(3,1,128,1)`, `uv_mode_probs=(162,101,204)` — parser-derived, varies per frame.
- Phase 3 entropy hash (sha1-16 prefix): keyframe `8b2fdae200eb193f`, inter changes — confirms entropy state propagates per-frame; iter3 backend must wire the per-frame ProbabilityDataBuffer through to coeff_probs.
### Clause 9 — Coder state + first-partition fields + flags
```c
frame.coder_state.range = picture->bool_coder_ctx.range;
frame.coder_state.value = picture->bool_coder_ctx.value;
frame.coder_state.bit_count = picture->bool_coder_ctx.count;
/* coder_state.padding stays zero from memset */
/* Phase 5 C1+C2: macroblock_offset IS first_part_header_bits by
* source identity (vaapi_vp8.c:204 and v4l2_request_vp8.c:83 set
* the field with the identical 3-input formula
* 8 * (input - data) - bit_count - 8
* — VAAPI just renames it). Kernel hantro_g1_vp8_dec.c:260 and
* rockchip_vpu2_hw_vp8_dec.c:372 read it unconditionally to compute
* the macroblock-data DMA offset; leaving 0 places hardware at the
* wrong offset → garbage decode. NOT cosmetic.
*
* Phase 5 C2: VAAPI's slice->partition_size[0] is the REMAINING
* bytes of the control partition AFTER ceil(macroblock_offset/8)
* bytes already consumed by the application's bitstream parser
* (vaapi_vp8.c:209 confirms; va_dec_vp8.h:193-196 spec comment
* confirms). Kernel needs the TOTAL control partition size, so
* recover by adding back the header bytes: */
frame.first_part_header_bits = slice->macroblock_offset;
frame.first_part_size = slice->partition_size[0] +
((uint32_t)slice->macroblock_offset + 7) / 8;
/* Phase 3 keyframe verbatim cross-check:
* first_part_size = 22742 bytes
* first_part_header_bits = 6550 bits → ceil(6550/8) = 819 bytes
* VAAPI partition_size[0] = 22742 - 819 = 21923 (predicted)
* reconstructed = 21923 + 819 = 22742 ✓ */
for (i = 0; i < 8; i++)
frame.dct_part_sizes[i] = slice->partition_size[i + 1];
if (!picture->pic_fields.bits.key_frame)
frame.flags |= V4L2_VP8_FRAME_FLAG_KEY_FRAME;
/* VAAPI inverts: key_frame=0 means it IS a keyframe per VP8 spec */
frame.flags |= V4L2_VP8_FRAME_FLAG_SHOW_FRAME;
/* Force unconditional per Phase 3 Q4: BBB has no alt-ref invisible
* frames in iter3 scope; document as known fidelity gap for future
* iters covering invisible-frame streams. */
if (picture->pic_fields.bits.mb_no_coeff_skip)
frame.flags |= V4L2_VP8_FRAME_FLAG_MB_NO_SKIP_COEFF;
if (picture->pic_fields.bits.sign_bias_golden)
frame.flags |= V4L2_VP8_FRAME_FLAG_SIGN_BIAS_GOLDEN;
if (picture->pic_fields.bits.sign_bias_alternate)
frame.flags |= V4L2_VP8_FRAME_FLAG_SIGN_BIAS_ALT;
/* EXPERIMENTAL bit (0x02) and bit 0x40 NOT set. Phase 3 baseline
* notes ffmpeg-v4l2-request-git sets these for unclear reasons;
* mainline UAPI defines neither for our use case (BBB profile=0).
* Kernel hantro_vp8.c only inspects KEY_FRAME bit. */
```
**Anchors**:
- Phase 3 keyframe: `coder_state=(248,133,2)`, `first_part_size=22742`, `dct_part_sizes=(277872,0,...,0)` (`277872` is huge — note `partition_size[]` may overflow u32 for very-large partitions; verified BBB partitions fit u32 by full Phase 3 capture; expected within u32 range for 1280×720 normal-quality content)
- Phase 3 keyframe: `flags=0x0d = KEY|SHOW|NOSKIP` — libva backend produces 0x0d ✓
- Phase 3 inter: `flags=0x66` (FFmpeg) — libva backend produces e.g. 0x04 (SHOW only) or 0x24 (SHOW|SBA) depending on VAAPI sign_bias bits. Phase 7 byte-compare uses field-level, not whole-flags, equality.
### Clause 10 — Final batched submission
```c
struct v4l2_ext_control ctrls[1] = {
{
.id = V4L2_CID_STATELESS_VP8_FRAME,
.ptr = &frame,
.size = sizeof frame,
},
};
rc = v4l2_set_controls(driver_data->video_fd,
surface_object->request_fd,
ctrls, 1);
if (rc < 0)
return VA_STATUS_ERROR_OPERATION_FAILED;
return 0;
```
Returns the dispatcher contract per `picture.c::codec_set_controls` (negative rc on failure → caller wraps with `VA_STATUS_ERROR_OPERATION_FAILED`).
## Patch shape (commits)
iter3 implements as 4 commits (mirrors iter1 ABCD pattern):
### Commit A — `src/config.c`: enumeration + dispatch + entrypoints
3 sites:
1. `RequestQueryConfigProfiles` (after HEVC enumeration block, line ~160): add VP8 enumeration block probing `V4L2_PIX_FMT_VP8_FRAME` against single + MPLANE OUTPUT formats.
2. `RequestCreateConfig` (after iter2's HEVCMain case, line ~75): add `case VAProfileVP8Version0_3: break;` with comment block matching iter1+iter2 style.
3. `RequestQueryConfigEntrypoints` (line ~180): add `case VAProfileVP8Version0_3:` to existing fall-through case list.
Predicted +16 LOC, 1 file modified. Build target after Commit A: `vainfo` lists `VAProfileVP8Version0_3` on hantro env binding (criterion 1) but `vaCreateConfig` would fail at later use because no codec dispatcher.
### Commit B — NEW `src/vp8.c` + `src/vp8.h` + `src/meson.build` integration
Net-new files implementing `vp8_set_controls()` per Clauses 1-10 above. Plus meson.build sources/headers list updates to compile + link the new module.
Predicted ~200 LOC for vp8.c (the 10 clauses), ~40 LOC for vp8.h, +2 lines meson.build. 3 files (2 new + 1 modified). Build target after Commit B: vp8.o compiles standalone, but picture.c can't dispatch yet.
### Commit C — `src/picture.c` + `src/surface.h`: dispatcher + per-frame buffer routing + union extension + per-frame reset
5 sites:
1. `picture.c:34-36` include block: add `#include "vp8.h"`.
2. `picture.c::codec_set_controls` (line ~218 after HEVCMain case): add VP8 dispatch case calling `vp8_set_controls`.
3. `picture.c::codec_store_buffer` (lines 85-179): add VP8 cases for VAPictureParameterBufferType, VASliceParameterBufferType, VAIQMatrixBufferType + NEW outer case `VAProbabilityBufferType` with VP8 inner case.
4. `picture.c::RequestBeginPicture` (line ~300): add 2 reset lines for `params.vp8.iqmatrix_set = false;` and `params.vp8.probability_set = false;`.
5. `surface.h::object_surface::params` union (line ~112): insert vp8 struct after h265 (~10 LOC).
Predicted +50 LOC, 2 files modified.
Build target after Commit C: backend builds clean. mpv-vaapi VP8 decode should work end-to-end. Expected to satisfy criteria 1, 2, 3 of Phase 1.
### Commit D — `src/v4l2.c` (CONDITIONAL — only if v4l2_set_controls doesn't already fall through cleanly)
Phase 2 source-read showed v4l2_set_controls is fourcc-agnostic. **Likely no Commit D needed**; placeholder commit slot for fix-forward if Phase 6 build/runtime surfaces an unexpected ioctl plumbing issue (mirrors iter1 Commit D fix-forward).
## Files touched summary
| File | New | Modified | LOC delta | Commit |
|---|:-:|:-:|:-:|:-:|
| `src/config.c` | | ✓ | +16 | A |
| `src/vp8.c` | ✓ | | +200 | B |
| `src/vp8.h` | ✓ | | +40 | B |
| `src/meson.build` | | ✓ | +2 | B |
| `src/picture.c` | | ✓ | +40 | C |
| `src/surface.h` | | ✓ | +10 | C |
**Total**: ~308 LOC, 6 files (2 new + 4 modified). 3 commits (A, B, C).
## Cross-cutting backlog (out of iter3 scope)
Items inherited from iter1+iter2 close, NOT touched in iter3:
- **B1** V4L2 device-discovery (per-boot device numbering shuffle) — workaround: re-verify per-boot via v4l2-ctl --info.
- **B3** picture.c BeginPicture profile-aware reset — currently writes byte 240 of union (h264.matrix_set) regardless of profile; iter3 adds vp8 resets but doesn't refactor the cross-profile cleanup.
- **B4** context.c log suppression for unsupported codec controls.
- **B5** mpeg2 vbv_buffer_size polish (iter1 S2 finding).
- **B6** h265 SPS bitstream-parse fidelity gap.
- **L3** vaDeriveImage cache-stale on RK3399 — workaround: DMA-BUF GL only (memory rule).
Plus iter3 NEW item:
- **iter3-Q1** `first_part_header_bits` derivation gap. Plan leaves 0; Phase 7 byte-compare may reveal kernel sensitivity. If so, future iter or Phase 4 loopback derives from VAAPI `slice->macroblock_offset` + entropy-header subtraction.
- **iter3-flags** ffmpeg-v4l2-request-git sets undocumented bit 0x40 in flags field; libva backend won't replicate. Phase 7 byte-compare will use FIELD-LEVEL equality, not whole-`flags` equality.
## Phase 5 review prep
Submitting this plan for second-model review (sonnet-architect). Key questions for the reviewer:
1. **Quantization derivation**: is the "compute deltas as iqmatrix[0][N+1] - iqmatrix[0][0]" formula correct for VAAPI's representation? Or does VAAPI store the deltas already (then we just copy-not-subtract)? Phase 3 verbatim shows all-zero deltas for BBB, but BBB doesn't exercise this code path meaningfully — non-zero deltas need a different fixture, not in iter3 scope.
2. **Per-segment quant_update[1..3] derivation**: when segmentation enabled, is the kernel's `quant_update[s]` an absolute Q index (clipped to 0..127) or a signed delta from segment 0's base? VP8 spec says delta in `delta_value_mode`; libva backend currently writes `iqmatrix[s][0] - iqmatrix[0][0]` which is signed delta. **Phase 5 should empirically verify against a segmentation-enabled test fixture if one is available**, OR document as known fidelity gap if not.
3. **`first_part_header_bits = 0` fallback**: is this actually safe for hantro? Reviewer should grep kernel `hantro_vp8.c` + `rkvdec_vp8.c` for any read of `first_part_header_bits` in the decode hot path. If used, the gap blocks correctness; if unused (re-parsed from bitstream), the gap is cosmetic.
4. **Probability buffer ordering**: VAAPI's mpv consumer might send VAProbabilityBufferType BEFORE VAPictureParameterBufferType in the same frame's RenderPicture call, OR after. The libva backend's `codec_store_buffer` is order-independent (each call updates a different surface field), so order shouldn't matter — but flag this for review confirmation.
5. **Endianness assumption**: VAAPI structs use host-byte-order; kernel UAPI structs use host-byte-order; both little-endian on aarch64. No byte-swap needed — confirm review.
6. **Struct size oversight**: Phase 2 implicitly assumed ~400 bytes; Phase 3 corrected to 1232. Phase 4 plan uses `sizeof frame` which is compile-time correct, but the `controls[0].size` field MUST equal that. Reviewer should verify the FFmpeg ref pattern: `controls[0].size = sizeof(controls->frame)` — yes, matches.
7. **Test compile field availability**: per memory `feedback_review_empirical_over_theoretical.md` Direction 2, every VAAPI field-name reference in this plan should be verified via `gcc -c` test-compile BEFORE Phase 6 starts. Reviewer should test-compile the field accesses listed in Clauses 3-9 (especially `picture->pic_fields.bits.{...}` which iter2 had a bug with `uniform_spacing_flag`).
## Phase 1 criteria → Phase 4 plan trace
| Criterion | Plan addresses |
|---|---|
| 1. vainfo enumerates VP8Version0_3 | Commit A — `RequestQueryConfigProfiles` enumeration block |
| 2. vaCreateConfig SUCCESS | Commit A — `RequestCreateConfig` case + `RequestQueryConfigEntrypoints` |
| 3. ffmpeg-vaapi VP8 exit 0 | Commits A+B+C end-to-end; Clause 1 + Clauses 3-9 field mapping |
| 4. mpv DMA-BUF GL HW=SW byte-identical | Commits A+B+C decode correctness + Phase 3 SW JPEGs as Phase 7 anchor |
| 5. 3-codec regression | No regression risk — VP8 path is purely additive (NEW dispatcher case, no shared-state mutation in MPEG-2/HEVC/H.264 paths) |
## Substrate state at Phase 4 close
- Phase 1+2+3 commits at gitea (`ea2413e`, `898544a`, `fd3fce8`).
- Fork at iter2 tip `8d71e20` on noether; Phase 6 patches will land here.
- All Phase 3 anchors captured + preserved on fresnel `/tmp/iter3_phase3/`.
- Memory rules carry forward; `feedback_fresnel_hostname` added.
- Phase 4 plan ready for sonnet-architect review (Phase 5).
## Phase 5 amendments incorporated (post-review)
Sonnet-architect review (`phase5_iter3_review.md`) returned 4 Critical findings + 1 Suggested correction. All 4 Critical findings empirically verified by author before incorporation per memory `feedback_review_empirical_over_theoretical.md` Direction 2. Amendments applied in-place to this plan:
| ID | Site | Change | Verified |
|---|---|---|---|
| C1 | Clause 9 | `frame.first_part_header_bits = slice->macroblock_offset;` (kernel reads it unconditionally per `hantro_g1_vp8_dec.c:260`) | ✓ grep + source identity |
| C2 | Clause 9 | `frame.first_part_size = slice->partition_size[0] + ((macroblock_offset+7)/8)` (recover total partition size from VAAPI's post-parse remainder per `vaapi_vp8.c:209`) | ✓ Phase 3 numerics: 21923+819=22742 |
| C3 | Phase 2 B7 + Phase 4 Commit C | `VAProbabilityDataBufferType``VAProbabilityBufferType` (correct enum constant from `va.h:2058`; the "Data" suffix is the struct name not the enum) | ✓ grep |
| C4 | Clauses 6+7 | `(s8)``(int8_t)` (kernel-internal `s8` not exposed to userspace; UAPI exposes `__s8`; portable userspace cast is `int8_t` from `<stdint.h>`) | ✓ grep `/usr/include/asm-generic/int-ll64.h:20` |
| S3 | Clause 8 | Comment corrected (kernel has NO probability-table fallback); added `assert(probability_set)` runtime guard | ✓ source-read `hantro_vp8.c:49-143` |
Without the Phase 5 review, iter3 Phase 6 would have:
- **Failed first compile attempt** (C3 + C4) — ~2 fix-forward commits required
- **Then produced visible-but-wrong output** (C1 + C2) — Phase 7 byte-compare would have diverged, triggering Phase 7 → Phase 4 loopback for hardware-DMA-offset bug
Estimated saving: 1 Phase 6 → Phase 4 loopback + 2 Phase 6 fix-forward commits. The review pass is the right path forward per memory rule "Reviews are never skippable."