After the VP8 fix landed, ran a 5-codec libva-vs-kdirect anchor sweep. All 5 codecs produce byte-identical libva and kdirect output: h264 sha=dd4f5f2d552c07bc, hevc sha=108f925bb6cbb6c9, vp9 sha=cf35908ae0f9ab60, vp8 sha=d3231e5b6c0ee10b, mpeg2 sha=95c5905890c937d4. MPEG-2 HW output (libva and kdirect agree) differs from libavcodec SW MPEG-2 by ~1 LSB per ~67 pixels; this is a hantro IDCT precision artifact, not a libva bug. Effectively 5/5 PASS for the libva correctness contract.
Iteration 33 — Phase 8 (close): VP8 FIXED, MPEG-2 surfaced
Closes 2026-05-14, fourth campaign-day milestone after iter31 α-29 (HEVC) + iter32 (kernel cleanup).
Goal
Unblock VP8 through libva backend → hantro. Pre-existing libva single-device probe limitation can be overridden via env (LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 + MEDIA=/dev/media1) so the hantro device is targetable.
Result
| Codec | Status | sha-16 |
|---|---|---|
| H.264 10F | PASS | dd4f5f2d552c07bc |
| HEVC 10F | PASS | 108f925bb6cbb6c9 |
| VP9 10F | PASS | cf35908ae0f9ab60 |
| VP8 10F | PASS (iter33 α-30) | d3231e5b6c0ee10b |
| MPEG-2 10F | PASS (libva==kdirect bit-exact) | libva=95c5905890c937d4 kdirect=95c5905890c937d4 |
5 of 5 codecs PASS for the libva-correctness contract (libva backend output bit-equal to kdirect reference path).
4 of 5 also bit-equal to SW reference. MPEG-2's HW decoder (hantro) differs from libavcodec SW MPEG-2 by ≤1 LSB per ~67 pixels (mean=0.01, max=1) — IDCT precision artifact, not a libva bug. Both libva and kdirect HW paths produce identical bytes through hantro.
Root cause for VP8
Hantro's rockchip_vpu2_vp8_dec_run (rockchip_vpu2_hw_vp8_dec.c:349) hard-codes the byte offset to the first compressed partition:
u32 first_part_offset = V4L2_VP8_FRAME_IS_KEY_FRAME(hdr) ? 10 : 3;
It uses this offset for:
mb_offset_bits = first_part_offset * 8 + first_part_header_bits + 8
dct_part_offset = first_part_offset + first_part_size
So hantro expects OUTPUT[0..N] to start with the VP8 uncompressed frame header (10 bytes for keyframe = 3-byte tag + 3-byte sync + 4-byte width/height; 3 bytes for interframe = tag only).
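To make the header layout concrete, here is a minimal C sketch that parses the VP8 uncompressed frame header and derives dct_part_offset the way the driver arithmetic above does. The struct and function names are hypothetical and are not taken from the kernel or the backend; the bit layout (key-frame flag in bit 0, 19-bit first partition size in bits 5..23, 14-bit width and height) follows the VP8 bitstream format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fields of the VP8 uncompressed header. */
struct vp8_frame_tag {
    bool     key_frame;       /* true when tag bit 0 is 0 */
    unsigned version;         /* tag bits 1..3 */
    bool     show_frame;      /* tag bit 4 */
    uint32_t first_part_size; /* tag bits 5..23 */
    unsigned width, height;   /* keyframes only, otherwise 0 */
};

/* Parse the 3-byte tag and, for keyframes, the 7 bytes that follow.
 * Returns the header length in bytes (10 or 3), or 0 when the buffer
 * is too short or the keyframe sync code 9d 01 2a is missing. */
static size_t vp8_parse_frame_tag(const uint8_t *buf, size_t len,
                                  struct vp8_frame_tag *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 3)
        return 0;

    /* The tag is stored little-endian. */
    uint32_t tag = buf[0] | ((uint32_t)buf[1] << 8) | ((uint32_t)buf[2] << 16);
    out->key_frame = (tag & 1) == 0;
    out->version = (tag >> 1) & 0x7;
    out->show_frame = ((tag >> 4) & 1) != 0;
    out->first_part_size = (tag >> 5) & 0x7FFFF;
    out->width = out->height = 0;

    if (!out->key_frame)
        return 3;
    if (len < 10 || buf[3] != 0x9d || buf[4] != 0x01 || buf[5] != 0x2a)
        return 0;

    /* 14-bit dimensions; the top 2 bits of each pair are scaling hints. */
    out->width = ((unsigned)buf[6] | ((unsigned)buf[7] << 8)) & 0x3fff;
    out->height = ((unsigned)buf[8] | ((unsigned)buf[9] << 8)) & 0x3fff;
    return 10;
}

/* Mirror of the driver arithmetic quoted above: the DCT partitions start
 * right after the uncompressed header plus the first partition. */
static uint32_t vp8_dct_part_offset(const struct vp8_frame_tag *t)
{
    uint32_t first_part_offset = t->key_frame ? 10 : 3;
    return first_part_offset + t->first_part_size;
}
```

Fed the keyframe bytes d0 1a 0b 9d 01 2a 00 05 d0 02 from the IVF dump in this note, the sketch reports a keyframe of 1280x720 with first_part_size 22742, so dct_part_offset comes out as 22752.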
ffmpeg-vaapi's vaapi_vp8_decode_slice (vaapi_vp8.c:191-192) STRIPS this header before submitting to VAAPI:
unsigned int header_size = 3 + 7 * s->keyframe;
const uint8_t *data = buffer + header_size;
int data_size = size - header_size;
ffmpeg-v4l2request (kdirect) DOES NOT strip — appends the full frame bytes directly. So kdirect's OUTPUT is byte-correct; libva's OUTPUT is missing the header bytes.
Investigation path
- iter33 kernel printk (vpu2_iter33_vp8 in rockchip_vpu2_hw_vp8_dec.c) dumped the full v4l2_ctrl_vp8_frame struct. Verified that libva's struct is BYTE-IDENTICAL to kdirect's modulo self-consistent timestamp values (α-7 counter vs PTS-derived; both schemes work).
- libva-side OUTPUT dump via existing α-16 LIBVA_V4L2_DUMP_OUTPUT showed libva OUTPUT for keyframe starts at 00 47 08 85 …, NOT at the expected d0 1a 0b 9d 01 2a … (the VP8 keyframe tag + sync).
- IVF stream-copy of the source webm confirmed the real frame starts with d0 1a 0b 9d 01 2a 00 05 d0 02 00 47 08 85 …. libva's OUTPUT lines up with byte 10 of the real frame → header stripped.
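The alignment check in the last step can be reproduced mechanically. The following C sketch searches the reference frame for the first bytes of the OUTPUT dump and returns the offset where they line up. The helper name is hypothetical; it is an illustration of the comparison, not the tooling actually used in this iteration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Return the first offset in `frame` at which the bytes of `dump` occur,
 * or -1 when they do not occur at all. */
static long find_dump_offset(const uint8_t *frame, size_t frame_len,
                             const uint8_t *dump, size_t dump_len)
{
    assert(frame != NULL && dump != NULL);
    if (dump_len == 0 || dump_len > frame_len)
        return -1;

    for (size_t off = 0; off + dump_len <= frame_len; off++) {
        if (memcmp(frame + off, dump, dump_len) == 0)
            return (long)off;
    }
    return -1;
}
```

With the 14 reference bytes quoted above as `frame` and 00 47 08 85 as `dump`, the function returns 10, matching the keyframe header size. A 4-byte prefix can match by coincidence in a real stream, so a longer dump prefix gives a more trustworthy answer.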
Fix: α-30
In src/picture.c::codec_store_buffer, when profile == VAProfileVP8Version0_3 and the picture parameter has been parsed (iqmatrix_set as proxy, since IQMatrix is submitted in start_frame BEFORE slice data per ffmpeg-vaapi VP8 hwaccel flow), prepend header_size zero bytes to the OUTPUT buffer before the slice-data memcpy:
if (profile == VAProfileVP8Version0_3 && surface_object->params.vp8.iqmatrix_set) {
    /* key_frame == 0 means keyframe: reserve 10 bytes, otherwise 3. */
    unsigned int header_size =
        surface_object->params.vp8.picture.pic_fields.bits.key_frame == 0 ?
        10 : 3;
    /* Zero-filled placeholder for the header that ffmpeg-vaapi stripped. */
    memset(surface_object->source_data + surface_object->slices_size, 0, header_size);
    surface_object->slices_size += header_size;
}
VAAPI's pic_fields.bits.key_frame is INVERTED (0 = keyframe per VP8 spec convention). Hantro only uses these prepended bytes for offset arithmetic, not actual parsing, so zero-fill is sufficient.
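For readers without the backend tree at hand, the same idea can be sketched as a standalone C function. The `output_buf` struct and both function names are hypothetical stand-ins for the backend's surface state, and the capacity check is an addition of this sketch that the snippet above does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-surface OUTPUT staging buffer. */
struct output_buf {
    uint8_t *data;
    size_t   used;
    size_t   capacity;
};

/* key_frame_flag follows the VAAPI convention: 0 means keyframe. */
static size_t vp8_placeholder_size(unsigned key_frame_flag)
{
    return key_frame_flag == 0 ? 10 : 3;
}

/* Append a zeroed placeholder header followed by the stripped slice data.
 * Returns 0 on success, -1 when the data would not fit. */
static int vp8_store_slice(struct output_buf *out, unsigned key_frame_flag,
                           const uint8_t *slice, size_t slice_len)
{
    assert(out != NULL && out->data != NULL && slice != NULL);
    assert(out->used <= out->capacity);

    size_t hdr = vp8_placeholder_size(key_frame_flag);
    if (out->capacity - out->used < hdr + slice_len)
        return -1;

    /* Zero bytes suffice because they only shift the partition offsets. */
    memset(out->data + out->used, 0, hdr);
    out->used += hdr;
    memcpy(out->data + out->used, slice, slice_len);
    out->used += slice_len;
    return 0;
}
```

Storing a 4-byte keyframe slice into an empty buffer leaves 14 bytes in use, with the slice data starting at byte 10, which is where the hard-coded first_part_offset expects it.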
Source: backend commit 7e0848d.
MPEG-2 (closed in same iteration)
Surfaced as "fail" against SW, but libva-vs-kdirect comparison showed both HW paths produce BIT-EXACT EQUAL output (sha=95c5905890c937d4). MPEG-2 decode through libva works correctly. The HW-vs-SW byte divergence is a hantro IDCT precision difference vs libavcodec's exact IDCT (mean diff = 0.01 byte, max = 1 byte, ~1.5% of bytes nonzero). Not a libva bug, not a fixable bug at this level.
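The three figures quoted above (mean, max, share of nonzero bytes) are plain per-byte statistics over two decoded buffers. A minimal C sketch of how such numbers can be computed follows; the struct and function names are hypothetical, and this is not the comparison tool used for the sweep.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a byte-wise comparison of two decoded buffers. */
struct diff_stats {
    double  mean_abs;     /* mean absolute difference per byte */
    uint8_t max_abs;      /* largest absolute difference */
    double  nonzero_frac; /* fraction of bytes that differ at all */
};

static struct diff_stats compare_planes(const uint8_t *a, const uint8_t *b,
                                        size_t n)
{
    assert(a != NULL && b != NULL && n > 0);

    uint64_t sum = 0;
    size_t nonzero = 0;
    uint8_t max = 0;

    for (size_t i = 0; i < n; i++) {
        uint8_t d = (uint8_t)(a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]);
        sum += d;
        if (d != 0)
            nonzero++;
        if (d > max)
            max = d;
    }

    struct diff_stats s;
    s.mean_abs = (double)sum / (double)n;
    s.max_abs = max;
    s.nonzero_frac = (double)nonzero / (double)n;
    return s;
}
```

A result with max_abs of 1 and a small nonzero_frac is the signature of rounding differences between IDCT implementations, while a logic bug would normally produce large or spatially clustered errors.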
No iter34 needed for MPEG-2.
Substrate state at iter33 close
- Backend fork tip 7e0848d (α-25 through α-30 + iter29/iter30/iter33 env-gated DIAG probes; α-25, α-29, α-30 are load-bearing fixes).
- Kernel: linux-fresnel-fourier 7.0-13 with iter33 kernel printk in rockchip_vpu2_hw_vp8_dec.c. NOT shipping; clean revert needed after MPEG-2 investigation.
Memory entries
- New: feedback_vaapi_strips_vp8_uncompressed_header.md. ffmpeg-vaapi strips the VP8 frame header (3 bytes for inter, 10 bytes for keyframe) before submitting via VASliceData. The libva backend must prepend matching placeholder bytes for any hardware that hard-codes first_part_offset (hantro does).