iter33 α-30 close: VP8 FIXED — 4/5 codecs PASS
ffmpeg-vaapi strips the VP8 uncompressed frame header before submitting VASliceData. Hantro hard-codes first_part_offset = 10 or 3 based on keyframe flag. libva must prepend matching placeholder bytes. Backend commit 7e0848d. 3-codec rkvdec anchors unchanged (H264 + HEVC + VP9 all PASS). VP8 newly PASS through hantro (env-override LIBVA_V4L2_REQUEST_VIDEO_PATH). MPEG-2 surfaces as next codec — same hantro device, different bug. Memory: feedback_vaapi_strips_vp8_uncompressed_header.md added.
@@ -0,0 +1,80 @@
## Iteration 33 — Phase 8 (close): VP8 FIXED, MPEG-2 surfaced

Closes 2026-05-14, fourth campaign-day milestone after iter31 α-29 (HEVC) + iter32 (kernel cleanup).

### Goal

Unblock VP8 through the libva backend → hantro path. The backend probes only a single device, a pre-existing limitation, but the probe can be overridden via env (LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video3 + MEDIA=/dev/media1) so that the hantro device becomes targetable.

### Result

| Codec | Status | sha-16 |
|---|---|---|
| H.264 10F | PASS | dd4f5f2d552c07bc |
| HEVC 10F | PASS | 108f925bb6cbb6c9 |
| VP9 10F | PASS | cf35908ae0f9ab60 |
| **VP8 10F** | **PASS** (iter33 α-30) | d3231e5b6c0ee10b |
| MPEG-2 10F | FAIL (separate bug) | libva=95c5905890c937d4 sw=933b744134e47ba4 |

4 of 5 codecs PASS, byte-equal to the SW reference.

### Root cause for VP8

Hantro's `rockchip_vpu2_vp8_dec_run` (`rockchip_vpu2_hw_vp8_dec.c:349`) hard-codes the byte offset to the first compressed partition:

```c
u32 first_part_offset = V4L2_VP8_FRAME_IS_KEY_FRAME(hdr) ? 10 : 3;
```

It uses this offset for:
- `mb_offset_bits = first_part_offset * 8 + first_part_header_bits + 8`
- `dct_part_offset = first_part_offset + first_part_size`

So hantro expects OUTPUT[0..N] to start with the VP8 uncompressed frame header: 10 bytes for a keyframe (3-byte frame tag + 3-byte sync code + 4 bytes of width/height), 3 bytes for an interframe (frame tag only).

ffmpeg-vaapi's `vaapi_vp8_decode_slice` (vaapi_vp8.c:191-192) STRIPS this header before submitting to VAAPI:

```c
unsigned int header_size = 3 + 7 * s->keyframe;
const uint8_t *data = buffer + header_size;
int data_size = size - header_size;
```

ffmpeg-v4l2request (kdirect) DOES NOT strip — appends the full frame bytes directly. So kdirect's OUTPUT is byte-correct; libva's OUTPUT is missing the header bytes.
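
To make the mismatch concrete, here is a minimal C sketch of the header-size rule and the two offset formulas quoted above. The names `vp8_uncompressed_header_size`, `vp8_hantro_offsets` and `struct vp8_offsets` are hypothetical and exist in neither the kernel nor ffmpeg; only the constants and formulas are taken from the excerpts in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Sizes of the VP8 uncompressed frame header, as described in this section. */
enum {
    VP8_FRAME_TAG_SIZE = 3, /* present on every frame */
    VP8_KEYFRAME_EXTRA = 7  /* 3-byte sync code + 4 bytes of width/height */
};

/* Hypothetical helper: header bytes that precede the first partition. */
static unsigned int vp8_uncompressed_header_size(int is_keyframe)
{
    assert(is_keyframe == 0 || is_keyframe == 1);
    return VP8_FRAME_TAG_SIZE + (is_keyframe ? VP8_KEYFRAME_EXTRA : 0);
}

/* Hypothetical container for the two offsets hantro derives. */
struct vp8_offsets {
    uint32_t mb_offset_bits;  /* bit offset of the macroblock data */
    uint32_t dct_part_offset; /* byte offset of the DCT partition(s) */
};

/*
 * Mirrors the arithmetic quoted from the hantro driver: both results are
 * measured from the start of the OUTPUT buffer and both assume that the
 * uncompressed header is still present there.
 */
static struct vp8_offsets vp8_hantro_offsets(int is_keyframe,
                                             uint32_t first_part_header_bits,
                                             uint32_t first_part_size)
{
    uint32_t first_part_offset = is_keyframe ? 10 : 3;
    struct vp8_offsets o;

    o.mb_offset_bits = first_part_offset * 8 + first_part_header_bits + 8;
    o.dct_part_offset = first_part_offset + first_part_size;
    return o;
}
```

Because `first_part_offset` matches the header size in both cases, a buffer that has had the header stripped leaves every derived offset pointing 10 bytes (keyframe) or 3 bytes (interframe) past the data it is meant to address.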

### Investigation path

- **iter33 kernel printk** (`vpu2_iter33_vp8` in `rockchip_vpu2_hw_vp8_dec.c`) dumped the full `v4l2_ctrl_vp8_frame` struct. This verified that libva's struct is BYTE-IDENTICAL to kdirect's, modulo the timestamp values, which are self-consistent in each case (α-7 counter vs PTS-derived; both schemes work).
- **libva-side OUTPUT dump** via the existing α-16 `LIBVA_V4L2_DUMP_OUTPUT` showed that libva's OUTPUT for a keyframe starts at `00 47 08 85 …`, NOT at the expected `d0 1a 0b 9d 01 2a …` (the VP8 keyframe tag + sync).
- IVF stream-copy of the source webm confirmed the real frame starts with `d0 1a 0b 9d 01 2a 00 05 d0 02 00 47 08 85 …`. libva's OUTPUT lines up with byte 10 of the real frame → header stripped.
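
The byte comparison in the bullets above can be automated when inspecting OUTPUT dumps. The following is a rough sketch, not part of the backend: all four function names are hypothetical, and the field layout (frame-tag bit 0 clear on keyframes, a 19-bit first-partition size in the tag's upper bits, the `9d 01 2a` sync code, 14-bit little-endian dimensions) follows the VP8 bitstream format. The check is only a heuristic, since payload bytes could match the sync code by coincidence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if buf appears to start with a VP8 keyframe uncompressed header. */
static int vp8_has_keyframe_header(const uint8_t *buf, size_t len)
{
    if (buf == NULL || len < 10)
        return 0;
    /* Bit 0 of the frame tag is 0 for keyframes. */
    if (buf[0] & 0x01)
        return 0;
    /* Keyframes carry the sync code 9d 01 2a right after the 3-byte tag. */
    return buf[3] == 0x9d && buf[4] == 0x01 && buf[5] == 0x2a;
}

/* First-partition size: the upper 19 bits of the 24-bit little-endian tag. */
static uint32_t vp8_first_part_size(const uint8_t *buf)
{
    uint32_t tag;

    assert(buf != NULL);
    tag = (uint32_t)buf[0] | ((uint32_t)buf[1] << 8) | ((uint32_t)buf[2] << 16);
    return tag >> 5;
}

/* Width and height: low 14 bits of two little-endian 16-bit fields. */
static unsigned int vp8_keyframe_width(const uint8_t *buf)
{
    assert(buf != NULL);
    return ((unsigned int)buf[6] | ((unsigned int)buf[7] << 8)) & 0x3fff;
}

static unsigned int vp8_keyframe_height(const uint8_t *buf)
{
    assert(buf != NULL);
    return ((unsigned int)buf[8] | ((unsigned int)buf[9] << 8)) & 0x3fff;
}
```

Applied to the 14 bytes quoted from the IVF stream-copy, these helpers report a keyframe header with dimensions 1280x720, while a buffer that begins with `00 47 08 85` fails the sync-code check.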

### Fix: α-30

In `codec_store_buffer` (`src/picture.c`), when `profile == VAProfileVP8Version0_3` and the picture parameters have already been parsed, prepend `header_size` zero bytes to the OUTPUT buffer before the slice-data memcpy. `iqmatrix_set` serves as the proxy for "parameters parsed", because the ffmpeg-vaapi VP8 hwaccel flow submits the IQMatrix in start_frame, BEFORE any slice data:

```c
if (profile == VAProfileVP8Version0_3 && surface_object->params.vp8.iqmatrix_set) {
    /* VAAPI key_frame is inverted: 0 means keyframe. */
    unsigned int header_size =
        surface_object->params.vp8.picture.pic_fields.bits.key_frame == 0 ?
        10 : 3;
    memset(surface_object->source_data + surface_object->slices_size, 0, header_size);
    surface_object->slices_size += header_size;
}
```

VAAPI's `pic_fields.bits.key_frame` is INVERTED relative to a boolean flag: 0 means keyframe, following the VP8 bitstream convention. Hantro never parses the prepended bytes; it only uses them for offset arithmetic, so zero-fill is sufficient.

Source: backend commit `7e0848d`.
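
For readers without the backend tree at hand, the prepend-then-copy sequence can be sketched in isolation. This is an illustration under stated assumptions, not the code from commit `7e0848d`: `struct output_buf` and the three functions are hypothetical stand-ins for the surface object's `source_data` / `slices_size` fields, and the capacity checks are an addition that the excerpt above does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the per-surface OUTPUT staging area. */
struct output_buf {
    uint8_t *data;
    size_t used;     /* bytes filled so far (slices_size in the backend) */
    size_t capacity; /* total bytes available behind data */
};

/* VAAPI convention: key_frame == 0 means keyframe. */
static size_t vp8_placeholder_size(unsigned int va_key_frame)
{
    return va_key_frame == 0 ? 10 : 3;
}

/* Reserve zeroed placeholder bytes where the stripped header used to be. */
static int output_reserve_vp8_header(struct output_buf *b,
                                     unsigned int va_key_frame)
{
    size_t n = vp8_placeholder_size(va_key_frame);

    assert(b->used <= b->capacity);
    if (b->capacity - b->used < n)
        return -1; /* would overflow the buffer (assumed error path) */
    memset(b->data + b->used, 0, n);
    b->used += n;
    return 0;
}

/* Append slice data after whatever is already in the buffer. */
static int output_append(struct output_buf *b, const void *src, size_t n)
{
    assert(b->used <= b->capacity);
    if (b->capacity - b->used < n)
        return -1;
    memcpy(b->data + b->used, src, n);
    b->used += n;
    return 0;
}
```

Calling `output_reserve_vp8_header` once per frame, before the first `output_append`, places the slice data at byte 10 (keyframe) or byte 3 (interframe), which is where the hard-coded `first_part_offset` expects it.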

### MPEG-2 (surfaced, deferred to iter34)

Same libva → hantro path. The decode produces non-zero output, but it is byte-different from SW for all 10 frames. The likely cause is a similar ffmpeg-vaapi strip / OUTPUT mismatch as with VP8, possibly with different offset semantics for MPEG-2. To be investigated in the next iteration.

### Substrate state at iter33 close

- Backend fork tip `7e0848d` (α-25 through α-30 + iter29/iter30/iter33 env-gated DIAG probes; α-25, α-29, α-30 are load-bearing fixes).
- Kernel: `linux-fresnel-fourier 7.0-13` with the iter33 kernel printk in `rockchip_vpu2_hw_vp8_dec.c`. NOT shipping; a clean revert is needed after the MPEG-2 investigation.

### Memory entries

- New: `feedback_vaapi_strips_vp8_uncompressed_header.md`: ffmpeg-vaapi strips the VP8 frame header (3 bytes for interframes, 10 bytes for keyframes) before submitting via VASliceData. The libva backend must prepend matching placeholder bytes for any hardware that hard-codes first_part_offset (hantro does).