fresnel-fourier/phase8_iteration33_close.md
marfrit cd2d077cb6 iter33: MPEG-2 closed (libva==kdirect bit-exact) — 5/5 codecs PASS
After VP8 fix landed, ran 5-codec libva-vs-kdirect anchor sweep.
All 5 codecs produce byte-identical libva and kdirect output:
  h264   sha=dd4f5f2d552c07bc
  hevc   sha=108f925bb6cbb6c9
  vp9    sha=cf35908ae0f9ab60
  vp8    sha=d3231e5b6c0ee10b
  mpeg2  sha=95c5905890c937d4

MPEG-2 HW output (libva and kdirect agree) differs from libavcodec
SW MPEG-2 by ~1 LSB per ~67 pixels — hantro IDCT precision artifact,
not a libva bug. Effectively 5/5 PASS for libva correctness contract.
2026-05-14 16:40:05 +00:00


Iteration 33 — Phase 8 (close): VP8 FIXED, MPEG-2 surfaced

Closes 2026-05-14, fourth campaign-day milestone after iter31 α-29 (HEVC) + iter32 (kernel cleanup).

Goal

Unblock VP8 through the libva backend → hantro path. The pre-existing limitation that the libva backend probes only a single device can be overridden through the environment (LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 + MEDIA=/dev/media1), so the hantro device can be targeted.

Result

| Codec | Status | sha-16 |
|---|---|---|
| H.264 | 10F PASS | dd4f5f2d552c07bc |
| HEVC | 10F PASS | 108f925bb6cbb6c9 |
| VP9 | 10F PASS | cf35908ae0f9ab60 |
| VP8 | 10F PASS (iter33 α-30) | d3231e5b6c0ee10b |
| MPEG-2 | 10F PASS (libva==kdirect bit-exact) | libva=95c5905890c937d4 kdirect=95c5905890c937d4 |

5 of 5 codecs PASS for the libva-correctness contract (libva backend output bit-equal to kdirect reference path).

4 of 5 are also bit-equal to the SW reference. MPEG-2's HW decoder (hantro) differs from libavcodec SW MPEG-2 by at most 1 LSB, on about 1 in 67 pixels (mean=0.01, max=1). This is an IDCT precision artifact, not a libva bug. Both the libva and kdirect HW paths produce identical bytes through hantro.

Root cause for VP8

Hantro's rockchip_vpu2_vp8_dec_run (rockchip_vpu2_hw_vp8_dec.c:349) hard-codes the byte offset to the first compressed partition:

u32 first_part_offset = V4L2_VP8_FRAME_IS_KEY_FRAME(hdr) ? 10 : 3;

It uses this offset for:

  • mb_offset_bits = first_part_offset * 8 + first_part_header_bits + 8
  • dct_part_offset = first_part_offset + first_part_size

So hantro expects OUTPUT[0..N] to start with the VP8 uncompressed frame header (10 bytes for keyframe = 3-byte tag + 3-byte sync + 4-byte width/height; 3 bytes for interframe = tag only).
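The arithmetic above can be sketched as a small standalone function. This is only an illustration of the two quoted formulas: `struct vp8_offsets` and `vp8_offsets_from_header` are hypothetical names, not kernel code, and the inputs are passed as plain integers rather than read from the V4L2 control struct.

```c
#include <assert.h>
#include <stdint.h>

/* Results of the two formulas quoted above (hypothetical container). */
struct vp8_offsets {
    uint32_t mb_offset_bits;   /* as in the mb_offset_bits formula */
    uint32_t dct_part_offset;  /* as in the dct_part_offset formula */
};

/*
 * Hypothetical helper mirroring the offset arithmetic. Both results are
 * measured from byte 0 of the OUTPUT buffer.
 */
static struct vp8_offsets vp8_offsets_from_header(int is_key_frame,
                                                  uint32_t first_part_size,
                                                  uint32_t first_part_header_bits)
{
    /* Same hard-coded choice as the driver line quoted above. */
    uint32_t first_part_offset = is_key_frame ? 10 : 3;
    struct vp8_offsets o;

    o.mb_offset_bits = first_part_offset * 8 + first_part_header_bits + 8;
    o.dct_part_offset = first_part_offset + first_part_size;
    return o;
}
```

Because both values are counted from the start of the buffer, they are only correct when the 10-byte or 3-byte uncompressed header really is present at OUTPUT[0].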

ffmpeg-vaapi's vaapi_vp8_decode_slice (vaapi_vp8.c:191-192) STRIPS this header before submitting to VAAPI:

unsigned int header_size = 3 + 7 * s->keyframe;
const uint8_t *data = buffer + header_size;
int data_size = size - header_size;

ffmpeg-v4l2request (kdirect) DOES NOT strip — appends the full frame bytes directly. So kdirect's OUTPUT is byte-correct; libva's OUTPUT is missing the header bytes.

Investigation path

  • iter33 kernel printk (vpu2_iter33_vp8 in rockchip_vpu2_hw_vp8_dec.c) dumped the full v4l2_ctrl_vp8_frame struct. Verified that libva's struct is BYTE-IDENTICAL to kdirect's modulo self-consistent timestamp values (α-7 counter vs PTS-derived; both schemes work).
  • libva-side OUTPUT dump via existing α-16 LIBVA_V4L2_DUMP_OUTPUT showed libva OUTPUT for keyframe starts at 00 47 08 85 … — NOT at the expected d0 1a 0b 9d 01 2a … (the VP8 keyframe tag + sync).
  • IVF stream-copy of the source webm confirmed the real frame starts with d0 1a 0b 9d 01 2a 00 05 d0 02 00 47 08 85 …. libva's OUTPUT lines up with byte 10 of the real frame → header stripped.
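The byte comparison in the last two steps can be automated with a small check on the start of a dumped buffer. The sketch below is not part of the backend or the kernel; `struct vp8_frame_tag` and `vp8_parse_frame_tag` are hypothetical names, and the layout assumed is the standard VP8 uncompressed header (3-byte frame tag, then for keyframes the 9d 01 2a sync code and two 14-bit dimensions).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct vp8_frame_tag {
    int key_frame;             /* 1 if the frame tag marks a keyframe */
    uint32_t first_part_size;  /* 19-bit first partition size from the tag */
    unsigned width, height;    /* filled for keyframes only, else 0 */
};

/*
 * Returns 1 and fills *out if buf starts with a plausible VP8 uncompressed
 * header, 0 otherwise. Interframes carry no sync code, so for them only the
 * 3-byte tag is decoded and nothing can be verified.
 */
static int vp8_parse_frame_tag(const uint8_t *buf, size_t len,
                               struct vp8_frame_tag *out)
{
    uint32_t tag;

    if (len < 3)
        return 0;
    tag = buf[0] | (buf[1] << 8) | ((uint32_t)buf[2] << 16);
    out->key_frame = !(tag & 1);      /* bit 0 == 0 means keyframe */
    out->first_part_size = tag >> 5;  /* upper 19 bits of the tag */
    out->width = out->height = 0;
    if (!out->key_frame)
        return 1;

    /* Keyframes: 3-byte sync code, then 16 bits each for width and height. */
    if (len < 10)
        return 0;
    if (buf[3] != 0x9d || buf[4] != 0x01 || buf[5] != 0x2a)
        return 0;
    out->width = (buf[6] | (buf[7] << 8)) & 0x3fff;   /* low 14 bits */
    out->height = (buf[8] | (buf[9] << 8)) & 0x3fff;  /* low 14 bits */
    return 1;
}
```

Fed the IVF bytes quoted above (d0 1a 0b 9d 01 2a 00 05 d0 02 …), this decodes a 1280x720 keyframe with first_part_size 22742. A buffer of at least 10 bytes that begins 00 47 08 85 is read as a keyframe tag but fails the sync-code check, which is the signature of the stripped header.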

Fix: α-30

In src/picture.c::codec_store_buffer, when profile == VAProfileVP8Version0_3 and the picture parameter has been parsed (iqmatrix_set as proxy, since IQMatrix is submitted in start_frame BEFORE slice data per ffmpeg-vaapi VP8 hwaccel flow), prepend header_size zero bytes to the OUTPUT buffer before the slice-data memcpy:

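/* ffmpeg-vaapi has already stripped the uncompressed header; put placeholder
 * bytes back so hantro's hard-coded first_part_offset lines up. */
if (profile == VAProfileVP8Version0_3 && surface_object->params.vp8.iqmatrix_set) {
    /* VAAPI key_frame is inverted: 0 means keyframe (10-byte header). */
    unsigned int header_size =
        surface_object->params.vp8.picture.pic_fields.bits.key_frame == 0 ?
            10 : 3;
    memset(surface_object->source_data + surface_object->slices_size, 0, header_size);
    surface_object->slices_size += header_size;
}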

VAAPI's pic_fields.bits.key_frame is INVERTED (0 = keyframe per VP8 spec convention). Hantro only uses these prepended bytes for offset arithmetic, not actual parsing, so zero-fill is sufficient.
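For experimenting with the same idea outside the backend, here is a minimal standalone sketch of the prepend-then-copy step. It is not the code from commit 7e0848d: `vp8_store_slice` and its parameters are hypothetical, and the capacity check is an addition that the snippet above does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: appends a zero-filled placeholder header and then the
 * slice data at dst + *used. va_key_frame_bit follows the VAAPI convention
 * (0 = keyframe). Returns 0 on success, -1 if the data would not fit in cap.
 */
static int vp8_store_slice(uint8_t *dst, size_t cap, size_t *used,
                           int va_key_frame_bit,
                           const uint8_t *slice, size_t slice_size)
{
    size_t header_size = (va_key_frame_bit == 0) ? 10 : 3;
    size_t room;

    if (*used > cap)
        return -1;
    room = cap - *used;
    if (room < header_size || slice_size > room - header_size)
        return -1;

    /* Placeholder bytes: only their count matters for the offset math. */
    memset(dst + *used, 0, header_size);
    *used += header_size;

    /* The slice data as delivered by the VAAPI client, header already gone. */
    memcpy(dst + *used, slice, slice_size);
    *used += slice_size;
    return 0;
}
```

With a keyframe slice that begins 00 47 08 85, the resulting buffer has ten zero bytes followed by 00 47 08 85 at offset 10, which matches the layout of the real frame apart from the header contents.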

Source: backend commit 7e0848d.

MPEG-2 (closed in same iteration)

MPEG-2 surfaced as a "fail" against the SW reference, but the libva-vs-kdirect comparison showed that both HW paths produce bit-exact equal output (sha=95c5905890c937d4), so MPEG-2 decode through libva works correctly. The HW-vs-SW byte divergence is a hantro IDCT precision difference relative to libavcodec's exact IDCT (mean diff = 0.01, max = 1, ~1.5% of bytes differ). It is not a libva bug and is not fixable at this level.
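The three figures quoted for the HW-vs-SW comparison (mean, max, share of differing bytes) can be reproduced with a byte-wise comparison of two equally sized frame dumps. The sketch below shows one way to compute them; `struct diff_stats` and `diff_stats_compute` are hypothetical names, and this is not the tool used for the numbers above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct diff_stats {
    double mean_abs;       /* mean absolute difference per byte */
    unsigned max_abs;      /* largest absolute difference seen */
    double frac_nonzero;   /* fraction of bytes that differ at all */
};

/* Compares two buffers of n bytes each; all fields are 0 when n == 0. */
static struct diff_stats diff_stats_compute(const uint8_t *a,
                                            const uint8_t *b, size_t n)
{
    struct diff_stats s = {0.0, 0, 0.0};
    uint64_t sum = 0;
    size_t differing = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned d = a[i] > b[i] ? (unsigned)(a[i] - b[i])
                                 : (unsigned)(b[i] - a[i]);
        sum += d;
        if (d > s.max_abs)
            s.max_abs = d;
        if (d != 0)
            differing++;
    }
    if (n != 0) {
        s.mean_abs = (double)sum / (double)n;
        s.frac_nonzero = (double)differing / (double)n;
    }
    return s;
}
```

A result with max_abs == 1 and a small frac_nonzero is the pattern described here; a larger max_abs or differences clustered in one region would point at a real decode error instead of rounding.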

No iter34 needed for MPEG-2.

Substrate state at iter33 close

  • Backend fork tip 7e0848d (α-25 through α-30 + iter29/iter30/iter33 env-gated DIAG probes; α-25, α-29, α-30 are load-bearing fixes).
  • Kernel: linux-fresnel-fourier 7.0-13 with iter33 kernel printk in rockchip_vpu2_hw_vp8_dec.c. NOT shipping; clean revert needed after MPEG-2 investigation.

Memory entries

  • New: feedback_vaapi_strips_vp8_uncompressed_header.md. ffmpeg-vaapi strips the VP8 frame header (3 bytes for inter, 10 bytes for keyframe) before submitting via VASliceData. The libva backend must prepend matching placeholder bytes for any hardware that hard-codes first_part_offset (hantro does).