## Iteration 33 — Phase 8 (close): VP8 FIXED, MPEG-2 surfaced
Closes 2026-05-14, fourth campaign-day milestone after iter31 α-29 (HEVC) + iter32 (kernel cleanup).
### Goal
Unblock VP8 through libva backend → hantro. The pre-existing libva single-device probe limitation can be overridden via env (`LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3` + MEDIA=/dev/media1), so the hantro device is targetable.
### Result
| Codec | Status | sha-16 |
|---|---|---|
| H.264 10F | PASS | dd4f5f2d552c07bc |
| HEVC 10F | PASS | 108f925bb6cbb6c9 |
| VP9 10F | PASS | cf35908ae0f9ab60 |
| **VP8 10F** | **PASS** (iter33 α-30) | d3231e5b6c0ee10b |
| MPEG-2 10F | PASS (libva==kdirect bit-exact) | libva=95c5905890c937d4 kdirect=95c5905890c937d4 |

**5 of 5 codecs PASS** for the libva-correctness contract (libva backend output bit-equal to kdirect reference path).

4 of 5 are also bit-equal to the SW reference. MPEG-2 output from the HW decoder (hantro) differs from libavcodec SW MPEG-2 by at most 1 LSB, on roughly 1 in ~67 pixels (mean=0.01, max=1). This is an IDCT precision artifact, not a libva bug. Both libva and kdirect HW paths produce identical bytes through hantro.
### Root cause for VP8
Hantro's `rockchip_vpu2_vp8_dec_run` (`rockchip_vpu2_hw_vp8_dec.c:349`) hard-codes the byte offset to the first compressed partition:
```c
u32 first_part_offset = V4L2_VP8_FRAME_IS_KEY_FRAME(hdr) ? 10 : 3;
```
It uses this offset for:
- `mb_offset_bits = first_part_offset * 8 + first_part_header_bits + 8`
- `dct_part_offset = first_part_offset + first_part_size`

So hantro expects OUTPUT[0..N] to start with the VP8 uncompressed frame header (10 bytes for keyframe = 3-byte tag + 3-byte sync + 4-byte width/height; 3 bytes for interframe = tag only).

ffmpeg-vaapi's `vaapi_vp8_decode_slice` (`vaapi_vp8.c:191-192`) STRIPS this header before submitting to VAAPI:
```c
unsigned int header_size = 3 + 7 * s->keyframe;
const uint8_t *data = buffer + header_size;
int data_size = size - header_size;
```
ffmpeg-v4l2request (kdirect) DOES NOT strip — appends the full frame bytes directly. So kdirect's OUTPUT is byte-correct; libva's OUTPUT is missing the header bytes.
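The header layout that hantro's offset arithmetic assumes can be sketched in standalone C. This is a minimal illustration following the VP8 frame-header layout (3-byte tag, then sync code and dimensions on keyframes), not code from the kernel driver or from ffmpeg; `vp8_frame_hdr`, `vp8_parse_frame_hdr` and `vp8_dct_part_offset` are hypothetical names introduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of the VP8 uncompressed frame header (hypothetical struct). */
struct vp8_frame_hdr {
    int      key_frame;        /* 1 if keyframe (tag bit 0 is 0) */
    uint32_t first_part_size;  /* 19-bit size of the first partition */
    size_t   header_size;      /* 10 for keyframes, 3 for interframes */
    unsigned width, height;    /* only meaningful for keyframes */
};

/* Parse the header at the start of a full VP8 frame.
 * Returns 0 on success, -1 if the buffer is too short or the
 * keyframe sync code (9d 01 2a) is missing. */
static int vp8_parse_frame_hdr(const uint8_t *buf, size_t len,
                               struct vp8_frame_hdr *h)
{
    assert(buf != NULL && h != NULL);
    if (len < 3)
        return -1;

    /* The 3-byte frame tag is little-endian. */
    uint32_t tag = buf[0] | ((uint32_t)buf[1] << 8) | ((uint32_t)buf[2] << 16);
    h->key_frame = !(tag & 1);
    h->first_part_size = tag >> 5;
    h->header_size = 3;
    h->width = h->height = 0;

    if (h->key_frame) {
        if (len < 10 || buf[3] != 0x9d || buf[4] != 0x01 || buf[5] != 0x2a)
            return -1;
        /* 14-bit dimensions; the top 2 bits of each field are scaling. */
        h->width  = (buf[6] | ((unsigned)buf[7] << 8)) & 0x3fff;
        h->height = (buf[8] | ((unsigned)buf[9] << 8)) & 0x3fff;
        h->header_size = 10;
    }
    return 0;
}

/* Byte offset of the first DCT partition, mirroring
 * dct_part_offset = first_part_offset + first_part_size. */
static size_t vp8_dct_part_offset(const struct vp8_frame_hdr *h)
{
    return h->header_size + h->first_part_size;
}
```

Fed the keyframe bytes `d0 1a 0b 9d 01 2a 00 05 d0 02`, this yields `header_size` 10 and 1280x720. If the first 10 bytes are missing from OUTPUT, every offset derived from `header_size` points 10 bytes too far into the data.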
### Investigation path
- **iter33 kernel printk** (`vpu2_iter33_vp8` in `rockchip_vpu2_hw_vp8_dec.c`) dumped the full `v4l2_ctrl_vp8_frame` struct. Verified that libva's struct is BYTE-IDENTICAL to kdirect's modulo self-consistent timestamp values (α-7 counter vs PTS-derived; both schemes work).
- **libva-side OUTPUT dump** via existing α-16 `LIBVA_V4L2_DUMP_OUTPUT` showed libva OUTPUT for keyframe starts at `00 47 08 85 …` — NOT at the expected `d0 1a 0b 9d 01 2a …` (the VP8 keyframe tag + sync).
- IVF stream-copy of the source webm confirmed the real frame starts with `d0 1a 0b 9d 01 2a 00 05 d0 02 00 47 08 85 …`. libva's OUTPUT lines up with byte 10 of the real frame → header stripped.
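The last comparison above (dumped OUTPUT versus the real frame) can be automated with a small check. The sketch below is not part of the backend; `vp8_stripped_prefix_len` is a hypothetical helper that tests whether a dump matches the real frame at offset 0, 3 or 10.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Returns how many leading bytes of `frame` are missing from `dump`
 * (0 = nothing stripped, 3 = interframe header, 10 = keyframe header),
 * or -1 if `dump` matches `frame` at none of those offsets. */
static int vp8_stripped_prefix_len(const uint8_t *frame, size_t frame_len,
                                   const uint8_t *dump, size_t dump_len)
{
    static const size_t candidates[] = { 0, 3, 10 };
    assert(frame != NULL && dump != NULL);

    for (size_t i = 0; i < sizeof candidates / sizeof candidates[0]; i++) {
        size_t off = candidates[i];
        if (off > frame_len)
            continue;
        /* Compare only the overlapping part of the two buffers. */
        size_t n = frame_len - off;
        if (n > dump_len)
            n = dump_len;
        if (n > 0 && memcmp(frame + off, dump, n) == 0)
            return (int)off;
    }
    return -1;
}
```

With the bytes quoted above, a dump starting `00 47 08 85` matches the real frame at offset 10, which is the keyframe header length.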
### Fix: α-30
In `src/picture.c::codec_store_buffer`, when `profile == VAProfileVP8Version0_3` and the picture parameter has been parsed (iqmatrix_set as proxy, since IQMatrix is submitted in start_frame BEFORE slice data per ffmpeg-vaapi VP8 hwaccel flow), prepend `header_size` zero bytes to the OUTPUT buffer before the slice-data memcpy:
```c
if (profile == VAProfileVP8Version0_3 && surface_object->params.vp8.iqmatrix_set) {
    /* VAAPI key_frame is inverted: 0 means keyframe (10-byte header). */
    unsigned int header_size =
        surface_object->params.vp8.picture.pic_fields.bits.key_frame == 0 ?
        10 : 3;
    /* Zero placeholder for the header that ffmpeg-vaapi stripped. */
    memset(surface_object->source_data + surface_object->slices_size, 0, header_size);
    surface_object->slices_size += header_size;
}
```
VAAPI's `pic_fields.bits.key_frame` is INVERTED relative to a plain boolean reading (0 = keyframe, per VP8 spec convention), hence the `== 0` test for the 10-byte case. Hantro only uses these prepended bytes for offset arithmetic, not actual parsing, so zero-fill is sufficient.

Source: backend commit `7e0848d`.
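To make the buffer layout after α-30 concrete, here is a standalone sketch of the prepend-then-copy step. It is an illustration only: `struct output_buf` and `store_vp8_slice` are hypothetical stand-ins, not the backend's real `surface_object` fields or `codec_store_buffer` signature, and the 64-byte capacity is an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-surface OUTPUT staging buffer. */
struct output_buf {
    uint8_t data[64];
    size_t  used;      /* invariant: used <= sizeof data */
};

/* Append one VP8 slice, first inserting zero placeholder bytes for the
 * frame header that ffmpeg-vaapi stripped. `va_key_frame` follows the
 * VAAPI convention: 0 means keyframe.
 * Returns 0 on success, -1 if the slice plus placeholder does not fit. */
static int store_vp8_slice(struct output_buf *out, unsigned va_key_frame,
                           const uint8_t *slice, size_t slice_len)
{
    size_t header_size = (va_key_frame == 0) ? 10 : 3;
    assert(out != NULL && slice != NULL);

    size_t room = sizeof out->data - out->used;
    if (slice_len > room || header_size > room - slice_len)
        return -1;

    /* Placeholder bytes: only their count matters for offset arithmetic. */
    memset(out->data + out->used, 0, header_size);
    out->used += header_size;

    memcpy(out->data + out->used, slice, slice_len);
    out->used += slice_len;
    return 0;
}
```

For a keyframe slice starting `00 47 08 85`, the stored buffer has ten zero bytes followed by the slice, so the slice data lands at offset 10 as it does in the full frame.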
### MPEG-2 (closed in same iteration)
Surfaced as "fail" against SW, but the libva-vs-kdirect comparison showed both HW paths produce BIT-EXACT EQUAL output (sha=95c5905890c937d4), so MPEG-2 decode through libva works correctly. The HW-vs-SW byte divergence is a hantro IDCT precision difference vs libavcodec's exact IDCT (mean diff = 0.01 LSB, max = 1 LSB, ~1.5% of bytes differ). Not a libva bug, and not fixable at this level.

No iter34 needed for MPEG-2.
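The three HW-vs-SW figures quoted above (mean, max, fraction of differing bytes) can be computed from two decoded frame buffers of equal length. The sketch below shows one way to do it; `diff_stats` and `byte_diff_stats` are hypothetical names, and this is not the tool that produced the numbers in this log.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct diff_stats {
    double   mean_abs;      /* mean absolute difference per byte */
    unsigned max_abs;       /* largest absolute difference */
    double   frac_nonzero;  /* fraction of bytes that differ at all */
};

/* Compare two buffers of n bytes each and summarize how they differ. */
static struct diff_stats byte_diff_stats(const uint8_t *a, const uint8_t *b,
                                         size_t n)
{
    struct diff_stats s = { 0.0, 0, 0.0 };
    uint64_t sum = 0;
    size_t nonzero = 0;
    assert(a != NULL && b != NULL);

    for (size_t i = 0; i < n; i++) {
        unsigned d = a[i] > b[i] ? (unsigned)(a[i] - b[i])
                                 : (unsigned)(b[i] - a[i]);
        sum += d;
        if (d)
            nonzero++;
        if (d > s.max_abs)
            s.max_abs = d;
    }
    if (n) {
        s.mean_abs = (double)sum / (double)n;
        s.frac_nonzero = (double)nonzero / (double)n;
    }
    return s;
}
```

When the maximum difference is 1, every differing byte contributes exactly 1 to the sum, so the mean equals the differing fraction. That is why a mean near 0.01 and a differing fraction near 1.5% describe the same small effect.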
### Substrate state at iter33 close
- Backend fork tip `7e0848d` (α-25 through α-30 + iter29/iter30/iter33 env-gated DIAG probes; α-25, α-29, α-30 are load-bearing fixes).
- Kernel: `linux-fresnel-fourier 7.0-13` with iter33 kernel printk in `rockchip_vpu2_hw_vp8_dec.c`. NOT shipping; clean revert needed now that the MPEG-2 investigation is closed.
### Memory entries
- New: `feedback_vaapi_strips_vp8_uncompressed_header.md`: ffmpeg-vaapi strips the VP8 uncompressed frame header (3 bytes for interframes, 10 bytes for keyframes) before submitting via VASliceData. The libva backend must prepend matching placeholder bytes for any hardware that hard-codes `first_part_offset` (hantro does).