diff --git a/phase8_iteration33_close.md b/phase8_iteration33_close.md new file mode 100644 index 0000000..74306d2 --- /dev/null +++ b/phase8_iteration33_close.md @@ -0,0 +1,80 @@ +## Iteration 33 — Phase 8 (close): VP8 FIXED, MPEG-2 surfaced + +Closes 2026-05-14, fourth campaign-day milestone after iter31 α-29 (HEVC) + iter32 (kernel cleanup). + +### Goal + +Unblock VP8 through libva backend → hantro. Pre-existing libva single-device probe limitation can be overridden via env (LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 + MEDIA=/dev/media1) so the hantro device is targetable. + +### Result + +| Codec | Status | sha-16 | +|---|---|---| +| H.264 10F | PASS | dd4f5f2d552c07bc | +| HEVC 10F | PASS | 108f925bb6cbb6c9 | +| VP9 10F | PASS | cf35908ae0f9ab60 | +| **VP8 10F** | **PASS** (iter33 α-30) | d3231e5b6c0ee10b | +| MPEG-2 10F | FAIL (separate bug) | libva=95c5905890c937d4 sw=933b744134e47ba4 | + +4 of 5 codecs PASS byte-equal SW. + +### Root cause for VP8 + +Hantro's `rockchip_vpu2_vp8_dec_run` (`rockchip_vpu2_hw_vp8_dec.c:349`) hard-codes the byte offset to the first compressed partition: + +```c +u32 first_part_offset = V4L2_VP8_FRAME_IS_KEY_FRAME(hdr) ? 10 : 3; +``` + +It uses this offset for: +- `mb_offset_bits = first_part_offset * 8 + first_part_header_bits + 8` +- `dct_part_offset = first_part_offset + first_part_size` + +So hantro expects OUTPUT[0..N] to start with the VP8 uncompressed frame header (10 bytes for keyframe = 3-byte tag + 3-byte sync + 4-byte width/height; 3 bytes for interframe = tag only). + +ffmpeg-vaapi's `vaapi_vp8_decode_slice` (vaapi_vp8.c:191-192) STRIPS this header before submitting to VAAPI: + +```c +unsigned int header_size = 3 + 7 * s->keyframe; +const uint8_t *data = buffer + header_size; +int data_size = size - header_size; +``` + +ffmpeg-v4l2request (kdirect) DOES NOT strip — appends the full frame bytes directly. So kdirect's OUTPUT is byte-correct; libva's OUTPUT is missing the header bytes. + +### Investigation path + +- **iter33 kernel printk** (`vpu2_iter33_vp8` in `rockchip_vpu2_hw_vp8_dec.c`) dumped the full `v4l2_ctrl_vp8_frame` struct. Verified that libva's struct is BYTE-IDENTICAL to kdirect's modulo self-consistent timestamp values (α-7 counter vs PTS-derived; both schemes work). +- **libva-side OUTPUT dump** via existing α-16 `LIBVA_V4L2_DUMP_OUTPUT` showed libva OUTPUT for keyframe starts at `00 47 08 85 …` — NOT at the expected `d0 1a 0b 9d 01 2a …` (the VP8 keyframe tag + sync). +- IVF stream-copy of the source webm confirmed the real frame starts with `d0 1a 0b 9d 01 2a 00 05 d0 02 00 47 08 85 …`. libva's OUTPUT lines up with byte 10 of the real frame → header stripped. + +### Fix: α-30 + +In `src/picture.c::codec_store_buffer`, when `profile == VAProfileVP8Version0_3` and the picture parameter has been parsed (iqmatrix_set as proxy, since IQMatrix is submitted in start_frame BEFORE slice data per ffmpeg-vaapi VP8 hwaccel flow), prepend `header_size` zero bytes to the OUTPUT buffer before the slice-data memcpy: + +```c +if (profile == VAProfileVP8Version0_3 && surface_object->params.vp8.iqmatrix_set) { + unsigned int header_size = + surface_object->params.vp8.picture.pic_fields.bits.key_frame == 0 ? + 10 : 3; + memset(surface_object->source_data + surface_object->slices_size, 0, header_size); + surface_object->slices_size += header_size; +} +``` + +VAAPI's `pic_fields.bits.key_frame` is INVERTED (0 = keyframe per VP8 spec convention). Hantro only uses these prepended bytes for offset arithmetic, not actual parsing, so zero-fill is sufficient. + +Source: backend commit `7e0848d`. + +### MPEG-2 (surfaced, deferred to iter34) + +Same libva → hantro path. Decodes to non-zero output but byte-different from SW for all 10 frames. Likely a similar ffmpeg-vaapi-strip / OUTPUT-mismatch as VP8, possibly with different offset semantics for MPEG-2. To investigate in next iteration. + +### Substrate state at iter33 close + +- Backend fork tip `7e0848d` (α-25 through α-30 + iter29/iter30/iter33 env-gated DIAG probes; α-25, α-29, α-30 are load-bearing fixes). +- Kernel: `linux-fresnel-fourier 7.0-13` with iter33 kernel printk in rockchip_vpu2_hw_vp8_dec.c. NOT shipping; clean revert needed after MPEG-2 investigation. + +### Memory entries + +- New: `feedback_vaapi_strips_vp8_uncompressed_header.md` — ffmpeg-vaapi strips VP8 frame header (3 byte for inter, 10 bytes for keyframe) before submitting via VASliceData. libva backend must prepend matching placeholder bytes for any hardware that hard-codes the first_part_offset (hantro does).