fresnel-fourier/phase8_iteration33_close.md

## Iteration 33 — Phase 8 (close): VP8 FIXED, MPEG-2 surfaced

Closes 2026-05-14, fourth campaign-day milestone after iter31 α-29 (HEVC) + iter32 (kernel cleanup).

### Goal

Unblock VP8 through the libva backend → hantro path. The pre-existing libva single-device probe limitation can be overridden via environment variables (`LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3` + `MEDIA=/dev/media1`), so the hantro device is targetable.

### Result

| Codec | Status | sha-16 |
|---|---|---|
| H.264 10F | PASS | dd4f5f2d552c07bc |
| HEVC 10F | PASS | 108f925bb6cbb6c9 |
| VP9 10F | PASS | cf35908ae0f9ab60 |
| **VP8 10F** | **PASS** (iter33 α-30) | d3231e5b6c0ee10b |
| MPEG-2 10F | PASS (libva==kdirect bit-exact) | libva=95c5905890c937d4 kdirect=95c5905890c937d4 |

**5 of 5 codecs PASS** for the libva-correctness contract (libva backend output bit-equal to kdirect reference path).

4 of 5 are also bit-equal to the SW reference. The MPEG-2 HW decoder (hantro) differs from libavcodec's SW MPEG-2 output by at most 1 LSB, on roughly 1 in 67 pixels (mean=0.01, max=1). This is an IDCT precision artifact, not a libva bug: both the libva and kdirect HW paths produce identical bytes through hantro.

### Root cause for VP8

Hantro's `rockchip_vpu2_vp8_dec_run` (`rockchip_vpu2_hw_vp8_dec.c:349`) hard-codes the byte offset to the first compressed partition:

```c
u32 first_part_offset = V4L2_VP8_FRAME_IS_KEY_FRAME(hdr) ? 10 : 3;
```

It uses this offset for:
- `mb_offset_bits = first_part_offset * 8 + first_part_header_bits + 8`
- `dct_part_offset = first_part_offset + first_part_size`

So hantro expects OUTPUT[0..N] to start with the VP8 uncompressed frame header (10 bytes for keyframe = 3-byte tag + 3-byte sync + 4-byte width/height; 3 bytes for interframe = tag only).
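The offset arithmetic can be reproduced outside the kernel to see what a stripped header does to it. The following is a minimal sketch, not the driver code: `struct vp8_offsets` and `vp8_offsets_from_header` are hypothetical names introduced here, and only the two formulas and the 10/3 constants come from the driver lines quoted above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two offsets derived from first_part_offset. */
struct vp8_offsets {
    uint32_t mb_offset_bits;   /* bit position derived for macroblock data */
    uint32_t dct_part_offset;  /* byte position of the first DCT partition */
};

/* Mirrors the two formulas quoted above; function and struct names are
 * illustrative only and do not exist in the hantro driver. */
static struct vp8_offsets vp8_offsets_from_header(bool is_key_frame,
                                                  uint32_t first_part_size,
                                                  uint32_t first_part_header_bits)
{
    /* Hard-coded size of the uncompressed frame header. */
    uint32_t first_part_offset = is_key_frame ? 10 : 3;
    struct vp8_offsets o;

    o.mb_offset_bits = first_part_offset * 8 + first_part_header_bits + 8;
    o.dct_part_offset = first_part_offset + first_part_size;
    return o;
}
```

Both results are measured from byte 0 of OUTPUT. If the buffer instead begins at the first partition, every derived position lands 10 bytes (keyframe) or 3 bytes (interframe) too far into the data.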

ffmpeg-vaapi's `vaapi_vp8_decode_slice` (`vaapi_vp8.c:191-192`) STRIPS this header before submitting to VAAPI:

```c
unsigned int header_size = 3 + 7 * s->keyframe;
const uint8_t *data = buffer + header_size;
int data_size = size - header_size;
```

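ffmpeg-v4l2request (kdirect) DOES NOT strip: it appends the full frame bytes directly. So kdirect's OUTPUT is byte-correct, while libva's OUTPUT is missing the header bytes, and hantro's hard-coded `first_part_offset` then points 10 bytes (keyframe) or 3 bytes (interframe) past where the first partition actually starts.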

### Investigation path

- **iter33 kernel printk** (`vpu2_iter33_vp8` in `rockchip_vpu2_hw_vp8_dec.c`) dumped the full `v4l2_ctrl_vp8_frame` struct. Verified that libva's struct is BYTE-IDENTICAL to kdirect's modulo self-consistent timestamp values (α-7 counter vs PTS-derived; both schemes work).
- **libva-side OUTPUT dump** via the existing α-16 `LIBVA_V4L2_DUMP_OUTPUT` showed that libva's OUTPUT for the keyframe starts with `00 47 08 85 …`, NOT with the expected `d0 1a 0b 9d 01 2a …` (the VP8 keyframe tag + sync).
- An IVF stream-copy of the source webm confirmed that the real frame starts with `d0 1a 0b 9d 01 2a 00 05 d0 02 00 47 08 85 …`. libva's OUTPUT lines up with byte offset 10 of the real frame → header stripped.
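The byte comparison above can be checked mechanically by parsing the uncompressed frame header. This is a minimal sketch assuming the RFC 6386 layout (3-byte little-endian frame tag, then on keyframes the `9d 01 2a` start code and two 14-bit dimensions); `struct vp8_frame_tag` and `vp8_parse_frame_tag` are hypothetical names, not part of ffmpeg, libva, or the backend.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical holder for fields of the VP8 uncompressed frame header. */
struct vp8_frame_tag {
    int      key_frame;        /* 1 for keyframes (the bitstream bit is 0) */
    uint32_t first_part_size;  /* size in bytes of the first partition */
    int      has_start_code;   /* keyframes only: 9d 01 2a follows the tag */
    unsigned width, height;    /* keyframes only, 14 bits each */
    size_t   header_size;      /* 10 for keyframes, 3 otherwise */
};

/* Returns 0 on success, -1 if the buffer is too short to hold the header. */
static int vp8_parse_frame_tag(const uint8_t *buf, size_t len,
                               struct vp8_frame_tag *out)
{
    if (len < 3)
        return -1;

    /* The frame tag is a 24-bit little-endian value. */
    uint32_t tag = buf[0] | ((uint32_t)buf[1] << 8) | ((uint32_t)buf[2] << 16);

    out->key_frame = !(tag & 1);      /* bit 0: 0 means keyframe */
    out->first_part_size = tag >> 5;  /* top 19 bits */
    out->has_start_code = 0;
    out->width = out->height = 0;
    out->header_size = 3;

    if (out->key_frame) {
        if (len < 10)
            return -1;
        out->has_start_code = buf[3] == 0x9d && buf[4] == 0x01 && buf[5] == 0x2a;
        out->width  = (buf[6] | (buf[7] << 8)) & 0x3fff;  /* drop 2 scale bits */
        out->height = (buf[8] | (buf[9] << 8)) & 0x3fff;
        out->header_size = 10;
    }
    return 0;
}
```

Fed the 14 bytes from the IVF dump, this parser reports a keyframe with a valid start code and `header_size == 10`, and the byte at index `header_size` is the `00` that opens libva's `00 47 08 85` sequence.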

### Fix: α-30

In `src/picture.c::codec_store_buffer`, when `profile == VAProfileVP8Version0_3` and the picture parameters have already been parsed, prepend `header_size` zero bytes to the OUTPUT buffer before the slice-data memcpy. `iqmatrix_set` serves as the proxy for "parsed", because the ffmpeg-vaapi VP8 hwaccel flow submits the IQMatrix in start_frame, BEFORE any slice data:

```c
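/* VP8 only: ffmpeg-vaapi stripped the uncompressed frame header, so reserve
 * zeroed placeholder bytes ahead of the slice data. */
if (profile == VAProfileVP8Version0_3 && surface_object->params.vp8.iqmatrix_set) {
    /* VAAPI convention: key_frame == 0 means keyframe (10-byte header). */
    unsigned int header_size =
        surface_object->params.vp8.picture.pic_fields.bits.key_frame == 0 ?
            10 : 3;
    memset(surface_object->source_data + surface_object->slices_size, 0, header_size);
    surface_object->slices_size += header_size;
}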
```

VAAPI's `pic_fields.bits.key_frame` follows the VP8 bitstream convention (0 = keyframe), which is INVERTED relative to the `s->keyframe` flag in ffmpeg's stripping code above. Hantro uses the prepended bytes only for offset arithmetic, not for actual parsing, so zero-fill is sufficient.
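The effect of the fix on the buffer layout can be shown in isolation. The sketch below is a standalone model, not the backend code: `struct output_buf`, `vp8_prepend_placeholder`, and `output_append` are hypothetical names, and the capacity checks are an addition that the snippet above does not contain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the backend's per-surface OUTPUT staging state. */
struct output_buf {
    uint8_t *data;      /* staging memory */
    size_t   capacity;  /* bytes available at data */
    size_t   used;      /* bytes already staged */
};

/* Reserve zeroed placeholder bytes where the frame header was stripped.
 * va_key_frame_bit follows the VAAPI convention: 0 means keyframe. */
static size_t vp8_prepend_placeholder(struct output_buf *out,
                                      unsigned va_key_frame_bit)
{
    size_t header_size = (va_key_frame_bit == 0) ? 10 : 3;

    assert(out->used + header_size <= out->capacity);  /* added safety check */
    memset(out->data + out->used, 0, header_size);
    out->used += header_size;
    return header_size;
}

/* Append slice data exactly as it was handed over, after the placeholder. */
static void output_append(struct output_buf *out, const uint8_t *src, size_t n)
{
    assert(out->used + n <= out->capacity);            /* added safety check */
    memcpy(out->data + out->used, src, n);
    out->used += n;
}
```

For a keyframe, calling `vp8_prepend_placeholder` followed by `output_append` leaves ten zero bytes and then the slice data, so the first partition sits at byte offset 10, where the hard-coded `first_part_offset` expects it.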

Source: backend commit `7e0848d`.

### MPEG-2 (closed in same iteration)

MPEG-2 surfaced as a "fail" against the SW reference, but the libva-vs-kdirect comparison showed that both HW paths produce BIT-EXACT EQUAL output (sha=95c5905890c937d4), so MPEG-2 decode through libva works correctly. The HW-vs-SW byte divergence is a hantro IDCT precision difference relative to libavcodec's exact IDCT (mean diff = 0.01 byte, max = 1 byte, ~1.5% of bytes differ). It is not a libva bug and is not fixable at this level.
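The three figures quoted above (mean, max, fraction of differing bytes) come from a per-byte comparison of two decoded frames. A minimal sketch of such a comparison follows; `struct diff_stats` and `compare_frames` are hypothetical names, and this is not the tool that produced the numbers in this note.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a byte-wise comparison of two decoded frames. */
struct diff_stats {
    size_t   nonzero;   /* number of bytes that differ */
    unsigned max_abs;   /* largest absolute per-byte difference */
    double   mean_abs;  /* mean absolute difference over all bytes */
};

/* Compare n bytes of a (e.g. HW output) against b (e.g. SW reference). */
static struct diff_stats compare_frames(const uint8_t *a, const uint8_t *b,
                                        size_t n)
{
    struct diff_stats s = {0, 0, 0.0};
    unsigned long long sum = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned d = a[i] > b[i] ? (unsigned)(a[i] - b[i])
                                 : (unsigned)(b[i] - a[i]);
        if (d)
            s.nonzero++;
        if (d > s.max_abs)
            s.max_abs = d;
        sum += d;
    }
    if (n)
        s.mean_abs = (double)sum / (double)n;
    return s;
}
```

A result with `max_abs == 1` and a small `nonzero / n` ratio is the pattern expected from rounding differences; a parsing or offset bug would normally show large differences concentrated in whole regions.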

No iter34 needed for MPEG-2.

### Substrate state at iter33 close

- Backend fork tip `7e0848d` (α-25 through α-30 + iter29/iter30/iter33 env-gated DIAG probes; α-25, α-29, α-30 are load-bearing fixes).
- Kernel: `linux-fresnel-fourier 7.0-13` with the iter33 kernel printk in `rockchip_vpu2_hw_vp8_dec.c`. NOT shipping; a clean revert is needed now that the MPEG-2 investigation is closed.

### Memory entries

- New: `feedback_vaapi_strips_vp8_uncompressed_header.md`. ffmpeg-vaapi strips the VP8 uncompressed frame header (3 bytes for interframes, 10 bytes for keyframes) before submitting via VASliceData. The libva backend must prepend matching placeholder bytes for any hardware that hard-codes `first_part_offset` (hantro does).