fresnel-fourier/phase8_iteration33_close.md
marfrit cd2d077cb6 iter33: MPEG-2 closed (libva==kdirect bit-exact) — 5/5 codecs PASS
After VP8 fix landed, ran 5-codec libva-vs-kdirect anchor sweep.
All 5 codecs produce byte-identical libva and kdirect output:
  h264   sha=dd4f5f2d552c07bc
  hevc   sha=108f925bb6cbb6c9
  vp9    sha=cf35908ae0f9ab60
  vp8    sha=d3231e5b6c0ee10b
  mpeg2  sha=95c5905890c937d4

MPEG-2 HW output (libva and kdirect agree) differs from libavcodec
SW MPEG-2 by ~1 LSB per ~67 pixels — hantro IDCT precision artifact,
not a libva bug. Effectively 5/5 PASS for libva correctness contract.
2026-05-14 16:40:05 +00:00

## Iteration 33 — Phase 8 (close): VP8 FIXED, MPEG-2 surfaced
Closes 2026-05-14, fourth campaign-day milestone after iter31 α-29 (HEVC) + iter32 (kernel cleanup).
### Goal
Unblock VP8 through the libva backend → hantro. The pre-existing libva single-device probe limitation can be overridden via environment variables (LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 + MEDIA=/dev/media1) so that the hantro device is targetable.
### Result
| Codec | Status | sha-16 |
|---|---|---|
| H.264 10F | PASS | dd4f5f2d552c07bc |
| HEVC 10F | PASS | 108f925bb6cbb6c9 |
| VP9 10F | PASS | cf35908ae0f9ab60 |
| **VP8 10F** | **PASS** (iter33 α-30) | d3231e5b6c0ee10b |
| MPEG-2 10F | PASS (libva==kdirect bit-exact) | libva=95c5905890c937d4 kdirect=95c5905890c937d4 |

**5 of 5 codecs PASS** for the libva-correctness contract (libva backend output bit-equal to kdirect reference path).

4 of 5 are also bit-equal to the SW reference. The exception is MPEG-2: the HW decoder (hantro) differs from libavcodec SW MPEG-2 by at most 1 LSB, in roughly 1 of every 67 pixels (mean=0.01, max=1). This is an IDCT precision artifact, not a libva bug. Both the libva and kdirect HW paths produce identical bytes through hantro.
### Root cause for VP8
Hantro's `rockchip_vpu2_vp8_dec_run` (`rockchip_vpu2_hw_vp8_dec.c:349`) hard-codes the byte offset to the first compressed partition:
```c
u32 first_part_offset = V4L2_VP8_FRAME_IS_KEY_FRAME(hdr) ? 10 : 3;
```
It uses this offset for:

- `mb_offset_bits = first_part_offset * 8 + first_part_header_bits + 8`
- `dct_part_offset = first_part_offset + first_part_size`

So hantro expects OUTPUT[0..N] to start with the VP8 uncompressed frame header (10 bytes for keyframe = 3-byte tag + 3-byte sync + 4-byte width/height; 3 bytes for interframe = tag only).
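The offset arithmetic above can be restated as a small standalone sketch. This is not the driver code: the struct and function names (`vp8_offsets`, `vp8_compute_offsets`) are hypothetical, and only the three formulas are taken from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the values derived from first_part_offset. */
struct vp8_offsets {
    uint32_t first_part_offset; /* bytes of uncompressed header assumed present */
    uint32_t mb_offset_bits;    /* bit offset derived from the header size */
    uint32_t dct_part_offset;   /* byte offset of the data after partition 1 */
};

/* Restates the formulas quoted above; is_key_frame must be 0 or 1. */
struct vp8_offsets vp8_compute_offsets(int is_key_frame,
                                       uint32_t first_part_header_bits,
                                       uint32_t first_part_size)
{
    struct vp8_offsets o;

    assert(is_key_frame == 0 || is_key_frame == 1);
    o.first_part_offset = is_key_frame ? 10 : 3;
    o.mb_offset_bits = o.first_part_offset * 8 + first_part_header_bits + 8;
    o.dct_part_offset = o.first_part_offset + first_part_size;
    return o;
}
```

With made-up inputs `first_part_header_bits = 100` and `first_part_size = 500`, a keyframe yields `mb_offset_bits = 188` and `dct_part_offset = 510`. If the buffer does not actually contain the 10 header bytes, both offsets land 10 bytes past the intended data.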
ffmpeg-vaapi's `vaapi_vp8_decode_slice` (vaapi_vp8.c:191-192) STRIPS this header before submitting to VAAPI:
```c
unsigned int header_size = 3 + 7 * s->keyframe;
const uint8_t *data = buffer + header_size;
int data_size = size - header_size;
```
ffmpeg-v4l2request (kdirect) DOES NOT strip — appends the full frame bytes directly. So kdirect's OUTPUT is byte-correct; libva's OUTPUT is missing the header bytes.
### Investigation path
- **iter33 kernel printk** (`vpu2_iter33_vp8` in `rockchip_vpu2_hw_vp8_dec.c`) dumped the full `v4l2_ctrl_vp8_frame` struct. Verified that libva's struct is BYTE-IDENTICAL to kdirect's modulo self-consistent timestamp values (α-7 counter vs PTS-derived; both schemes work).
- **libva-side OUTPUT dump** via existing α-16 `LIBVA_V4L2_DUMP_OUTPUT` showed libva OUTPUT for keyframe starts at `00 47 08 85 …` — NOT at the expected `d0 1a 0b 9d 01 2a …` (the VP8 keyframe tag + sync).
- IVF stream-copy of the source webm confirmed the real frame starts with `d0 1a 0b 9d 01 2a 00 05 d0 02 00 47 08 85 …`. libva's OUTPUT lines up with byte 10 of the real frame → header stripped.
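The byte comparison in the last two bullets can be reproduced mechanically. The sketch below hard-codes only the bytes quoted above; the helper names (`vp8_uncompressed_header_size`, `vp8_header_was_stripped`) are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* First 14 bytes of the real keyframe, as quoted from the IVF stream-copy. */
const uint8_t vp8_real_frame_head[14] = {
    0xd0, 0x1a, 0x0b, 0x9d, 0x01, 0x2a, 0x00, 0x05,
    0xd0, 0x02, 0x00, 0x47, 0x08, 0x85
};

/* First 4 bytes of the libva OUTPUT dump for the same keyframe. */
const uint8_t vp8_libva_output_head[4] = { 0x00, 0x47, 0x08, 0x85 };

/* Same formula as the ffmpeg-vaapi snippet: 3-byte tag, plus 7 on keyframes. */
size_t vp8_uncompressed_header_size(int keyframe)
{
    return 3 + 7 * (size_t)(keyframe != 0);
}

/*
 * Returns 1 if `dump` matches `frame` starting just past the uncompressed
 * header, i.e. the dump looks like the frame with its header stripped.
 */
int vp8_header_was_stripped(const uint8_t *frame, size_t frame_len,
                            const uint8_t *dump, size_t dump_len,
                            int keyframe)
{
    size_t skip = vp8_uncompressed_header_size(keyframe);

    assert(frame != NULL && dump != NULL);
    if (frame_len < skip + dump_len)
        return 0; /* not enough frame bytes to compare */
    return memcmp(frame + skip, dump, dump_len) == 0;
}
```

Calling `vp8_header_was_stripped(vp8_real_frame_head, 14, vp8_libva_output_head, 4, 1)` returns 1: the dump matches the real frame from byte 10 onward, which is the observation that pointed at the stripped header.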
### Fix: α-30
In `src/picture.c::codec_store_buffer`, when `profile == VAProfileVP8Version0_3` and the picture parameters have been parsed, prepend `header_size` zero bytes to the OUTPUT buffer before the slice-data memcpy. `iqmatrix_set` serves as the proxy for "parsed", since the ffmpeg-vaapi VP8 hwaccel flow submits the IQMatrix in start_frame, BEFORE the slice data:
```c
if (profile == VAProfileVP8Version0_3 && surface_object->params.vp8.iqmatrix_set) {
	unsigned int header_size =
		surface_object->params.vp8.picture.pic_fields.bits.key_frame == 0 ?
		10 : 3;
	memset(surface_object->source_data + surface_object->slices_size, 0, header_size);
	surface_object->slices_size += header_size;
}
```
VAAPI's `pic_fields.bits.key_frame` is INVERTED (0 = keyframe per VP8 spec convention). Hantro only uses these prepended bytes for offset arithmetic, not actual parsing, so zero-fill is sufficient.
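To make the resulting buffer layout concrete, here is a self-contained sketch of the same prepend-then-copy step outside the backend. The `output_staging` struct and both function names are hypothetical stand-ins, and the capacity check is an addition for this sketch, not something the quoted snippet shows.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the backend's per-surface OUTPUT staging state. */
struct output_staging {
    uint8_t *source_data; /* destination buffer */
    size_t capacity;      /* total bytes available in source_data */
    size_t slices_size;   /* bytes written so far */
};

/* VAAPI convention noted above: key_frame == 0 means keyframe. */
size_t vp8_placeholder_size(unsigned int va_key_frame)
{
    return va_key_frame == 0 ? 10 : 3;
}

/*
 * Writes the zero-filled placeholder header, then the slice data.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int vp8_store_slice(struct output_staging *out, unsigned int va_key_frame,
                    const uint8_t *slice, size_t slice_size)
{
    size_t header_size = vp8_placeholder_size(va_key_frame);

    assert(out != NULL && slice != NULL);
    assert(out->slices_size <= out->capacity);
    if (out->capacity - out->slices_size < header_size + slice_size)
        return -1;

    memset(out->source_data + out->slices_size, 0, header_size);
    out->slices_size += header_size;
    memcpy(out->source_data + out->slices_size, slice, slice_size);
    out->slices_size += slice_size;
    return 0;
}
```

For a keyframe whose stripped slice data begins `00 47 08 85`, the staged buffer then holds 10 zero bytes followed by those slice bytes, so the driver's hard-coded offset of 10 lands on the first byte of real data.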
Source: backend commit `7e0848d`.
### MPEG-2 (closed in same iteration)
MPEG-2 surfaced as a "fail" against SW, but the libva-vs-kdirect comparison showed that both HW paths produce BIT-EXACT EQUAL output (sha=95c5905890c937d4), so MPEG-2 decode through libva works correctly. The HW-vs-SW byte divergence is a hantro IDCT precision difference vs libavcodec's exact IDCT (mean diff = 0.01 LSB, max = 1 LSB, ~1.5% of bytes differ). It is not a libva bug and not fixable at this level.
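The three divergence figures (mean, max, fraction of differing bytes) are the kind of statistics a byte-wise comparison of two decoded frames produces. The sketch below is an assumption about how such numbers could be computed; `diff_stats` and `compare_frames` are hypothetical names, not the tooling actually used for this iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct diff_stats {
    double mean_abs;      /* mean absolute difference per byte */
    unsigned int max_abs; /* largest absolute difference seen */
    double nonzero_frac;  /* fraction of bytes that differ at all */
};

/* Compares two equally sized raw frame buffers byte by byte; n must be > 0. */
struct diff_stats compare_frames(const uint8_t *a, const uint8_t *b, size_t n)
{
    struct diff_stats s = { 0.0, 0, 0.0 };
    uint64_t sum = 0;
    size_t nonzero = 0;

    assert(a != NULL && b != NULL && n > 0);
    for (size_t i = 0; i < n; i++) {
        unsigned int d = a[i] > b[i] ? (unsigned int)(a[i] - b[i])
                                     : (unsigned int)(b[i] - a[i]);
        sum += d;
        if (d > s.max_abs)
            s.max_abs = d;
        if (d != 0)
            nonzero++;
    }
    s.mean_abs = (double)sum / (double)n;
    s.nonzero_frac = (double)nonzero / (double)n;
    return s;
}
```

When the maximum difference is 1, the mean equals the fraction of differing bytes, so "1 LSB per ~67 pixels" and "~1.5% of bytes" describe the same measurement.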
No iter34 needed for MPEG-2.
### Substrate state at iter33 close
- Backend fork tip `7e0848d` (α-25 through α-30 + iter29/iter30/iter33 env-gated DIAG probes; α-25, α-29, α-30 are load-bearing fixes).
- Kernel: `linux-fresnel-fourier 7.0-13` with iter33 kernel printk in rockchip_vpu2_hw_vp8_dec.c. NOT shipping; clean revert needed after MPEG-2 investigation.
### Memory entries
- New: `feedback_vaapi_strips_vp8_uncompressed_header.md`: ffmpeg-vaapi strips the VP8 uncompressed frame header (3 bytes for interframes, 10 bytes for keyframes) before submitting via VASliceData. The libva backend must prepend matching placeholder bytes for any hardware that hard-codes `first_part_offset` (hantro does).