fresnel-fourier/phase8_iteration33_close.md
marfrit 51eee192b8 iter33 α-30 close: VP8 FIXED — 4/5 codecs PASS
ffmpeg-vaapi strips the VP8 uncompressed frame header before
submitting VASliceData. Hantro hard-codes first_part_offset = 10
or 3 based on keyframe flag. libva must prepend matching placeholder
bytes. Backend commit 7e0848d.

3-codec rkvdec anchors unchanged (H264 + HEVC + VP9 all PASS).
VP8 newly PASS through hantro (env-override LIBVA_V4L2_REQUEST_VIDEO_PATH).
MPEG-2 surfaces as next codec — same hantro device, different bug.

Memory: feedback_vaapi_strips_vp8_uncompressed_header.md added.
2026-05-14 16:38:11 +00:00

## Iteration 33 — Phase 8 (close): VP8 FIXED, MPEG-2 surfaced
Closes 2026-05-14, fourth campaign-day milestone after iter31 α-29 (HEVC) + iter32 (kernel cleanup).
### Goal
Unblock VP8 through libva backend → hantro. The pre-existing single-device probe limitation in libva can be overridden via environment variables (`LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3` + MEDIA=/dev/media1) so that the hantro device becomes targetable.
### Result
| Codec | Status | sha-16 |
|---|---|---|
| H.264 10F | PASS | dd4f5f2d552c07bc |
| HEVC 10F | PASS | 108f925bb6cbb6c9 |
| VP9 10F | PASS | cf35908ae0f9ab60 |
| **VP8 10F** | **PASS** (iter33 α-30) | d3231e5b6c0ee10b |
| MPEG-2 10F | FAIL (separate bug) | libva=95c5905890c937d4 sw=933b744134e47ba4 |
4 of 5 codecs PASS, byte-equal to SW output.
### Root cause for VP8
Hantro's `rockchip_vpu2_vp8_dec_run` (`rockchip_vpu2_hw_vp8_dec.c:349`) hard-codes the byte offset to the first compressed partition:
```c
u32 first_part_offset = V4L2_VP8_FRAME_IS_KEY_FRAME(hdr) ? 10 : 3;
```
It uses this offset for:
- `mb_offset_bits = first_part_offset * 8 + first_part_header_bits + 8`
- `dct_part_offset = first_part_offset + first_part_size`
So hantro expects OUTPUT[0..N] to start with the VP8 uncompressed frame header (10 bytes for keyframe = 3-byte tag + 3-byte sync + 4-byte width/height; 3 bytes for interframe = tag only).
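The offset arithmetic above can be sketched in isolation. This is a minimal illustration, not the kernel code: the struct and function names are hypothetical, and the numeric inputs in the self-check are made up rather than taken from the iter33 stream. Only the two formulas and the 10/3 split come from the driver lines quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Size in bytes of the VP8 uncompressed frame header:
 * keyframe = 3-byte tag + 3-byte sync + 4-byte width/height, interframe = tag only. */
static size_t vp8_uncompressed_header_size(int is_key_frame)
{
    return is_key_frame ? 10 : 3;
}

/* Hypothetical container for the two derived values. */
struct vp8_offsets {
    uint32_t mb_offset_bits;   /* bit offset of the first macroblock data */
    uint32_t dct_part_offset;  /* byte offset of the first DCT partition */
};

/* Mirrors the two formulas the driver derives from first_part_offset. */
static struct vp8_offsets vp8_derive_offsets(int is_key_frame,
                                             uint32_t first_part_size,
                                             uint32_t first_part_header_bits)
{
    uint32_t first_part_offset =
        (uint32_t)vp8_uncompressed_header_size(is_key_frame);
    struct vp8_offsets o;

    o.mb_offset_bits = first_part_offset * 8 + first_part_header_bits + 8;
    o.dct_part_offset = first_part_offset + first_part_size;
    return o;
}

/* Illustrative values only. */
static void vp8_offsets_selfcheck(void)
{
    struct vp8_offsets key = vp8_derive_offsets(1, 1000, 200);
    struct vp8_offsets inter = vp8_derive_offsets(0, 1000, 200);

    assert(key.mb_offset_bits == 288);    /* 10 * 8 + 200 + 8 */
    assert(key.dct_part_offset == 1010);  /* 10 + 1000 */
    assert(inter.mb_offset_bits == 232);  /* 3 * 8 + 200 + 8 */
    assert(inter.dct_part_offset == 1003);
}
```

Because both values are measured from the start of OUTPUT, a buffer that lacks the header makes every derived offset point 10 (or 3) bytes too far into the data.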
ffmpeg-vaapi's `vaapi_vp8_decode_slice` (vaapi_vp8.c:191-192) STRIPS this header before submitting to VAAPI:
```c
unsigned int header_size = 3 + 7 * s->keyframe;
const uint8_t *data = buffer + header_size;
int data_size = size - header_size;
```
ffmpeg-v4l2request (kdirect) DOES NOT strip — appends the full frame bytes directly. So kdirect's OUTPUT is byte-correct; libva's OUTPUT is missing the header bytes.
### Investigation path
- **iter33 kernel printk** (`vpu2_iter33_vp8` in `rockchip_vpu2_hw_vp8_dec.c`) dumped the full `v4l2_ctrl_vp8_frame` struct. Verified that libva's struct is BYTE-IDENTICAL to kdirect's modulo self-consistent timestamp values (α-7 counter vs PTS-derived; both schemes work).
- **libva-side OUTPUT dump** via existing α-16 `LIBVA_V4L2_DUMP_OUTPUT` showed libva OUTPUT for keyframe starts at `00 47 08 85 …` — NOT at the expected `d0 1a 0b 9d 01 2a …` (the VP8 keyframe tag + sync).
- IVF stream-copy of the source webm confirmed the real frame starts with `d0 1a 0b 9d 01 2a 00 05 d0 02 00 47 08 85 …`. libva's OUTPUT lines up with byte 10 of the real frame → header stripped.
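The alignment check in the last step can be reproduced mechanically. The sketch below hard-codes the byte prefixes quoted above, searches for libva's OUTPUT prefix inside the real frame, and decodes the uncompressed header fields following the VP8 bitstream layout (RFC 6386, section 9.1). The helper and struct names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* First 14 bytes of the real keyframe, as seen in the IVF stream copy. */
static const uint8_t real_frame[] = {
    0xd0, 0x1a, 0x0b, 0x9d, 0x01, 0x2a, 0x00, 0x05,
    0xd0, 0x02, 0x00, 0x47, 0x08, 0x85,
};

/* First 4 bytes of libva's OUTPUT buffer for the same keyframe. */
static const uint8_t libva_output[] = { 0x00, 0x47, 0x08, 0x85 };

/* Return the first offset at which needle occurs in hay, or -1 if absent. */
static long find_bytes(const uint8_t *hay, size_t hay_len,
                       const uint8_t *needle, size_t needle_len)
{
    if (needle_len == 0 || needle_len > hay_len)
        return -1;
    for (size_t i = 0; i + needle_len <= hay_len; i++)
        if (memcmp(hay + i, needle, needle_len) == 0)
            return (long)i;
    return -1;
}

/* Hypothetical holder for the decoded uncompressed header fields. */
struct vp8_header_info {
    int is_key_frame;   /* bitstream bit is inverted: 0 means keyframe */
    int show_frame;
    uint32_t first_part_size;
    int has_sync;       /* 1 if the 9d 01 2a start code is present */
    unsigned width, height;
};

static struct vp8_header_info parse_vp8_header(const uint8_t *p, size_t len)
{
    struct vp8_header_info h = { 0 };
    uint32_t tag;

    assert(p != NULL && len >= 3);
    tag = (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16);
    h.is_key_frame = (tag & 1) == 0;
    h.show_frame = (tag >> 4) & 1;
    h.first_part_size = tag >> 5;
    if (h.is_key_frame && len >= 10) {
        h.has_sync = p[3] == 0x9d && p[4] == 0x01 && p[5] == 0x2a;
        /* 14-bit dimensions; the top 2 bits of each field are scaling. */
        h.width = ((unsigned)p[6] | ((unsigned)p[7] << 8)) & 0x3fff;
        h.height = ((unsigned)p[8] | ((unsigned)p[9] << 8)) & 0x3fff;
    }
    return h;
}
```

Run against the quoted bytes, the search lands on offset 10 and the header decodes as a 1280x720 keyframe, which matches the conclusion that exactly the 10-byte keyframe header is missing from libva's OUTPUT.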
### Fix: α-30
In `src/picture.c::codec_store_buffer`, when `profile == VAProfileVP8Version0_3` and the picture parameters have been parsed, prepend `header_size` zero bytes to the OUTPUT buffer before the slice-data memcpy. `iqmatrix_set` serves as the proxy for "parsed", since the ffmpeg-vaapi VP8 hwaccel flow submits the IQMatrix in start_frame, BEFORE any slice data:
```c
if (profile == VAProfileVP8Version0_3 && surface_object->params.vp8.iqmatrix_set) {
    unsigned int header_size =
        surface_object->params.vp8.picture.pic_fields.bits.key_frame == 0 ?
            10 : 3;
    memset(surface_object->source_data + surface_object->slices_size, 0, header_size);
    surface_object->slices_size += header_size;
}
```
VAAPI's `pic_fields.bits.key_frame` is INVERTED (0 = keyframe per VP8 spec convention). Hantro only uses these prepended bytes for offset arithmetic, not actual parsing, so zero-fill is sufficient.
Source: backend commit `7e0848d`.
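The effect of the fix on the buffer layout can be checked with a standalone sketch. This is not the backend code: `struct output_buf` and both function names are hypothetical stand-ins for the surface object's `source_data` / `slices_size` fields, and the bounds check is an addition for the illustration that the snippet above does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-surface OUTPUT staging state. */
struct output_buf {
    uint8_t *data;
    size_t used;
    size_t capacity;
};

/* VAAPI's key_frame bit is inverted: 0 means keyframe. */
static size_t vp8_placeholder_size(unsigned vaapi_key_frame_bit)
{
    return vaapi_key_frame_bit == 0 ? 10 : 3;
}

/* Prepend zero-filled placeholder bytes, then append the slice data.
 * Returns 0 on success, -1 if the result would not fit. */
static int vp8_store_slice(struct output_buf *out, unsigned vaapi_key_frame_bit,
                           const uint8_t *slice, size_t slice_size)
{
    size_t header_size = vp8_placeholder_size(vaapi_key_frame_bit);
    size_t room;

    assert(out != NULL && out->data != NULL && slice != NULL);
    if (out->used > out->capacity)
        return -1;
    room = out->capacity - out->used;
    if (header_size > room || slice_size > room - header_size)
        return -1;

    /* The hardware only uses these bytes for offset arithmetic. */
    memset(out->data + out->used, 0, header_size);
    out->used += header_size;
    memcpy(out->data + out->used, slice, slice_size);
    out->used += slice_size;
    return 0;
}
```

With a keyframe (`key_frame == 0`), the stripped payload `00 47 08 85 …` ends up at byte 10 of the buffer, the same position it occupies in the real frame.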
### MPEG-2 (surfaced, deferred to iter34)
Same libva → hantro path. Decodes to non-zero output, but byte-different from SW for all 10 frames. Likely a similar ffmpeg-vaapi strip / OUTPUT mismatch to the VP8 one, possibly with different offset semantics for MPEG-2. To be investigated in the next iteration.
### Substrate state at iter33 close
- Backend fork tip `7e0848d` (α-25 through α-30 + iter29/iter30/iter33 env-gated DIAG probes; α-25, α-29, α-30 are load-bearing fixes).
- Kernel: `linux-fresnel-fourier 7.0-13` with iter33 kernel printk in rockchip_vpu2_hw_vp8_dec.c. NOT shipping; clean revert needed after MPEG-2 investigation.
### Memory entries
- New: `feedback_vaapi_strips_vp8_uncompressed_header.md`. ffmpeg-vaapi strips the VP8 frame header (3 bytes for inter, 10 bytes for keyframe) before submitting via VASliceData. The libva backend must prepend matching placeholder bytes for any hardware that hard-codes first_part_offset (hantro does).