iter33 α-30 close: VP8 FIXED — 4/5 codecs PASS

ffmpeg-vaapi strips the VP8 uncompressed frame header before
submitting VASliceData. Hantro hard-codes first_part_offset = 10
or 3 based on keyframe flag. libva must prepend matching placeholder
bytes. Backend commit 7e0848d.

3-codec rkvdec anchors unchanged (H264 + HEVC + VP9 all PASS).
VP8 newly PASS through hantro (env-override LIBVA_V4L2_REQUEST_VIDEO_PATH).
MPEG-2 surfaces as next codec — same hantro device, different bug.

Memory: feedback_vaapi_strips_vp8_uncompressed_header.md added.
2026-05-14 16:38:11 +00:00
parent acacf3d7eb
commit 51eee192b8
@@ -0,0 +1,80 @@
## Iteration 33 — Phase 8 (close): VP8 FIXED, MPEG-2 surfaced
Closed 2026-05-14; fourth campaign-day milestone, after iter31 α-29 (HEVC) and iter32 (kernel cleanup).
### Goal
Unblock VP8 through the libva backend → hantro path. The libva backend probes only a single device; this pre-existing limitation can be overridden via env (LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 + MEDIA=/dev/media1) so that the hantro device is targetable.
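A minimal sketch of how an env override of this kind is typically resolved, not the backend's actual probe code. `pick_device_path`, `video_path_from_env` and the fallback argument are hypothetical names; only the variable name `LIBVA_V4L2_REQUEST_VIDEO_PATH` comes from this log.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Hypothetical helper: prefer a non-empty override value, otherwise
 * fall back to whatever the single-device probe would have chosen.
 */
static const char *pick_device_path(const char *env_value, const char *fallback)
{
	assert(fallback != NULL);
	return (env_value != NULL && env_value[0] != '\0') ? env_value : fallback;
}

/* Hypothetical wrapper: reads the override named in this log from the environment. */
static const char *video_path_from_env(const char *fallback)
{
	return pick_device_path(getenv("LIBVA_V4L2_REQUEST_VIDEO_PATH"), fallback);
}
```

Keeping the selection logic separate from `getenv` makes it checkable without touching the process environment.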
### Result
| Codec | Status | sha-16 |
|---|---|---|
| H.264 10F | PASS | dd4f5f2d552c07bc |
| HEVC 10F | PASS | 108f925bb6cbb6c9 |
| VP9 10F | PASS | cf35908ae0f9ab60 |
| **VP8 10F** | **PASS** (iter33 α-30) | d3231e5b6c0ee10b |
| MPEG-2 10F | FAIL (separate bug) | libva=95c5905890c937d4 sw=933b744134e47ba4 |
4 of 5 codecs PASS, byte-identical to the SW decode.
### Root cause for VP8
Hantro's `rockchip_vpu2_vp8_dec_run` (`rockchip_vpu2_hw_vp8_dec.c:349`) hard-codes the byte offset to the first compressed partition:
```c
u32 first_part_offset = V4L2_VP8_FRAME_IS_KEY_FRAME(hdr) ? 10 : 3;
```
It uses this offset for:
- `mb_offset_bits = first_part_offset * 8 + first_part_header_bits + 8`
- `dct_part_offset = first_part_offset + first_part_size`
So hantro expects OUTPUT[0..N] to start with the VP8 uncompressed frame header (10 bytes for keyframe = 3-byte tag + 3-byte sync + 4-byte width/height; 3 bytes for interframe = tag only).
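The offset arithmetic above can be sketched in isolation. This is an illustration under assumptions, not the kernel code: `struct vp8_offsets` and `vp8_derive_offsets` are hypothetical names, the inputs stand in for the corresponding fields of `v4l2_ctrl_vp8_frame`, and the numbers in the example are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two derived values; not a kernel type. */
struct vp8_offsets {
	uint32_t mb_offset_bits;  /* bit offset from the start of OUTPUT */
	uint32_t dct_part_offset; /* byte offset of the first DCT partition */
};

/* Mirrors the two formulas listed above. */
static struct vp8_offsets vp8_derive_offsets(bool key_frame,
					     uint32_t first_part_header_bits,
					     uint32_t first_part_size)
{
	uint32_t first_part_offset = key_frame ? 10 : 3;
	struct vp8_offsets o;

	o.mb_offset_bits = first_part_offset * 8 + first_part_header_bits + 8;
	o.dct_part_offset = first_part_offset + first_part_size;
	return o;
}

/* Worked example with made-up header_bits = 100 and part size = 500. */
static void vp8_offsets_example(void)
{
	struct vp8_offsets key = vp8_derive_offsets(true, 100, 500);
	struct vp8_offsets inter = vp8_derive_offsets(false, 100, 500);

	assert(key.mb_offset_bits == 188);   /* 10*8 + 100 + 8 */
	assert(key.dct_part_offset == 510);  /* 10 + 500 */
	assert(inter.mb_offset_bits == 132); /* 3*8 + 100 + 8 */
	assert(inter.dct_part_offset == 503); /* 3 + 500 */
}
```

Both results are measured from byte 0 of OUTPUT, so if the buffer does not begin with the uncompressed header, every derived position lands 10 (keyframe) or 3 (interframe) bytes past the intended data.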
ffmpeg-vaapi's `vaapi_vp8_decode_slice` (vaapi_vp8.c:191-192) STRIPS this header before submitting to VAAPI:
```c
unsigned int header_size = 3 + 7 * s->keyframe;
const uint8_t *data = buffer + header_size;
int data_size = size - header_size;
```
ffmpeg-v4l2request (kdirect) DOES NOT strip — appends the full frame bytes directly. So kdirect's OUTPUT is byte-correct; libva's OUTPUT is missing the header bytes.
### Investigation path
- **iter33 kernel printk** (`vpu2_iter33_vp8` in `rockchip_vpu2_hw_vp8_dec.c`) dumped the full `v4l2_ctrl_vp8_frame` struct. Verified that libva's struct is BYTE-IDENTICAL to kdirect's modulo self-consistent timestamp values (α-7 counter vs PTS-derived; both schemes work).
- **libva-side OUTPUT dump** via existing α-16 `LIBVA_V4L2_DUMP_OUTPUT` showed libva OUTPUT for keyframe starts at `00 47 08 85 …` — NOT at the expected `d0 1a 0b 9d 01 2a …` (the VP8 keyframe tag + sync).
- IVF stream-copy of the source webm confirmed the real frame starts with `d0 1a 0b 9d 01 2a 00 05 d0 02 00 47 08 85 …`. libva's OUTPUT lines up with byte 10 of the real frame → header stripped.
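The alignment check in the bullets above can be expressed as a small comparison. This is a sketch with hypothetical helper names; the byte arrays are the prefixes quoted in the dumps, and the key-frame test uses bit 0 of the first frame-tag byte (0 = keyframe, per the VP8 bitstream format).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bit 0 of the first tag byte is 0 for keyframes. */
static bool vp8_tag_is_key_frame(uint8_t tag_byte0)
{
	return (tag_byte0 & 0x01) == 0;
}

/* 3-byte tag, plus 3-byte sync and 4-byte dimensions on keyframes. */
static size_t vp8_uncompressed_header_size(bool key_frame)
{
	return key_frame ? 10 : 3;
}

/*
 * Hypothetical check: true if the first n bytes of `output` equal
 * `frame` with its uncompressed header removed.
 */
static bool vp8_output_is_header_stripped(const uint8_t *frame, size_t frame_len,
					  const uint8_t *output, size_t n,
					  bool key_frame)
{
	size_t hdr = vp8_uncompressed_header_size(key_frame);

	assert(frame != NULL && output != NULL);
	if (frame_len < hdr || frame_len - hdr < n)
		return false;
	return memcmp(frame + hdr, output, n) == 0;
}

/* Prefixes quoted in the investigation notes. */
static const uint8_t ivf_frame_prefix[] = {
	0xd0, 0x1a, 0x0b, 0x9d, 0x01, 0x2a, 0x00, 0x05, 0xd0, 0x02,
	0x00, 0x47, 0x08, 0x85
};
static const uint8_t libva_output_prefix[] = { 0x00, 0x47, 0x08, 0x85 };
```

With these inputs the check succeeds only for the 10-byte keyframe header, which matches the conclusion that libva's OUTPUT begins at byte 10 of the real frame.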
### Fix: α-30
In `src/picture.c::codec_store_buffer`, when `profile == VAProfileVP8Version0_3` and the picture parameter has been parsed, prepend `header_size` zero bytes to the OUTPUT buffer before the slice-data memcpy. `iqmatrix_set` serves as the proxy for "parsed": in the ffmpeg-vaapi VP8 hwaccel flow the IQMatrix is submitted in start_frame, BEFORE slice data.
```c
if (profile == VAProfileVP8Version0_3 && surface_object->params.vp8.iqmatrix_set) {
	/* VAAPI inverts the flag: key_frame == 0 means keyframe. */
	unsigned int header_size =
		surface_object->params.vp8.picture.pic_fields.bits.key_frame == 0 ?
		10 : 3;

	/* Zero placeholder for the stripped uncompressed frame header. */
	memset(surface_object->source_data + surface_object->slices_size, 0, header_size);
	surface_object->slices_size += header_size;
}
```
VAAPI's `pic_fields.bits.key_frame` is INVERTED (0 = keyframe per VP8 spec convention). Hantro only uses these prepended bytes for offset arithmetic, not actual parsing, so zero-fill is sufficient.
Source: backend commit `7e0848d`.
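For reference, the same prepend step as a standalone sketch. This is not the backend's code: `vp8_placeholder_size` and `vp8_append_with_placeholder` are hypothetical names, and the capacity check is an addition of this sketch that the excerpt above does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* VAAPI convention: key_frame == 0 marks a keyframe. */
static size_t vp8_placeholder_size(unsigned int va_key_frame)
{
	return va_key_frame == 0 ? 10 : 3;
}

/*
 * Hypothetical helper: writes zero placeholder bytes followed by the
 * slice data into dst at offset *used, then advances *used.
 * Returns 0 on success, -1 if the data would not fit in cap bytes.
 */
static int vp8_append_with_placeholder(uint8_t *dst, size_t cap, size_t *used,
				       const uint8_t *slice, size_t slice_len,
				       unsigned int va_key_frame)
{
	size_t hdr = vp8_placeholder_size(va_key_frame);

	assert(dst != NULL && used != NULL && slice != NULL);
	if (*used > cap || hdr > cap - *used || slice_len > cap - *used - hdr)
		return -1;

	memset(dst + *used, 0, hdr);                 /* stand-in for the stripped header */
	memcpy(dst + *used + hdr, slice, slice_len); /* compressed data as VAAPI delivered it */
	*used += hdr + slice_len;
	return 0;
}
```

After a successful call on an empty buffer, the slice data sits at byte 10 (keyframe) or byte 3 (interframe), which is where the hard-coded `first_part_offset` expects it.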
### MPEG-2 (surfaced, deferred to iter34)
Same libva → hantro path. Decodes to non-zero output, but the output differs byte-wise from SW for all 10 frames. Likely an ffmpeg-vaapi-strip / OUTPUT mismatch similar to VP8's, possibly with different offset semantics for MPEG-2. To be investigated in the next iteration.
### Substrate state at iter33 close
- Backend fork tip `7e0848d` (α-25 through α-30 + iter29/iter30/iter33 env-gated DIAG probes; α-25, α-29, α-30 are load-bearing fixes).
- Kernel: `linux-fresnel-fourier 7.0-13` with iter33 kernel printk in rockchip_vpu2_hw_vp8_dec.c. NOT shipping; clean revert needed after MPEG-2 investigation.
### Memory entries
- New: `feedback_vaapi_strips_vp8_uncompressed_header.md`. ffmpeg-vaapi strips the VP8 uncompressed frame header (3 bytes for interframes, 10 bytes for keyframes) before submitting via VASliceData. The libva backend must prepend matching placeholder bytes for any hardware that hard-codes first_part_offset (hantro does).