iter33 α-30: prepend VP8 uncompressed frame header to OUTPUT buffer

ROOT CAUSE FIX for VP8 libva decode garbage output.

ffmpeg-vaapi's vaapi_vp8.c:191-192 STRIPS the VP8 uncompressed
header (3 bytes for interframe, 10 bytes for keyframe) before
submitting the slice data via VAAPI. ffmpeg-v4l2request (kdirect)
KEEPS the header in its OUTPUT buffer.

Hantro's rockchip_vpu2_vp8_dec_run (rockchip_vpu2_hw_vp8_dec.c:349)
hard-codes 'first_part_offset = V4L2_VP8_FRAME_IS_KEY_FRAME(hdr) ? 10 : 3'
as the byte offset into OUTPUT where the first compressed partition
starts. It uses this offset for:
  - mb_offset_bits = first_part_offset * 8 + first_part_header_bits + 8
  - dct_part_offset = first_part_offset + first_part_size

Without the header, the first partition starts at byte 0 of OUTPUT,
so every offset lands 3 or 10 bytes too far into the data, the
entropy decoder reads the wrong bytes, and every frame decodes to
garbage.
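
A worked sketch of the offset arithmetic above, assuming only the
formulas quoted from rockchip_vpu2_hw_vp8_dec.c. The helper names are
hypothetical and do not exist in the kernel; first_part_header_bits
and first_part_size stand in for the v4l2_ctrl_vp8_frame fields of
the same name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers mirroring the driver's arithmetic; not kernel code. */

/* Byte offset of the first partition inside OUTPUT, as the driver assumes. */
static inline uint32_t vp8_first_part_offset(bool is_key_frame)
{
	return is_key_frame ? 10 : 3;
}

/* Bit position where macroblock data starts, per the quoted formula. */
static inline uint32_t vp8_mb_offset_bits(bool is_key_frame,
					  uint32_t first_part_header_bits)
{
	return vp8_first_part_offset(is_key_frame) * 8 +
	       first_part_header_bits + 8;
}

/* Byte offset of the DCT partitions, per the quoted formula. */
static inline uint32_t vp8_dct_part_offset(bool is_key_frame,
					   uint32_t first_part_size)
{
	return vp8_first_part_offset(is_key_frame) + first_part_size;
}

/* Small self-check using made-up sizes. */
static inline void vp8_offsets_selfcheck(void)
{
	assert(vp8_mb_offset_bits(true, 0) == 88);      /* 10 * 8 + 0 + 8 */
	assert(vp8_dct_part_offset(false, 100) == 103); /* 3 + 100 */
}
```

If the header is missing from OUTPUT, the real first partition sits
at byte 0, so for a keyframe both results point 10 bytes (80 bits)
past where the data actually is.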

Fix: in codec_store_buffer for VAProfileVP8Version0_3, write
header_size zero bytes (10 keyframe / 3 interframe) to OUTPUT
before the slice data memcpy. Hantro never parses these bytes (it
takes the header fields from the control struct instead), so
zero-fill is fine.
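
A standalone sketch of the padding step, assuming a flat destination
buffer. pad_vp8_header and its parameters are hypothetical names, and
the capacity check is an addition for this sketch that the patch
itself does not perform. va_key_frame follows the VAAPI convention
(0 = keyframe, 1 = interframe).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper, not part of the driver: append a zero-filled
 * stand-in for the stripped VP8 frame header, then the slice data.
 * Returns the new fill level of dst, or (size_t)-1 if it would not fit.
 */
static inline size_t pad_vp8_header(uint8_t *dst, size_t cap, size_t used,
				    unsigned int va_key_frame,
				    const uint8_t *slice, size_t slice_len)
{
	/* VAAPI inverts the flag: key_frame == 0 means keyframe. */
	size_t header_size = (va_key_frame == 0) ? 10 : 3;

	assert(dst != NULL && slice != NULL);

	/* Bounds check (assumption for the sketch, absent from the patch). */
	if (used > cap || header_size > cap - used ||
	    slice_len > cap - used - header_size)
		return (size_t)-1;

	memset(dst + used, 0, header_size);
	memcpy(dst + used + header_size, slice, slice_len);
	return used + header_size + slice_len;
}
```

With an empty buffer and a 4-byte keyframe slice, the helper returns
14: ten zero bytes followed by the slice data.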

Empirical: iter33 kernel printk in vpu2_vp8_dec_run dumped the
v4l2_ctrl_vp8_frame struct for libva vs kdirect and confirmed
byte-identical control fields. Only the OUTPUT buffer bytes
differed, traced to ffmpeg-vaapi's header stripping.
2026-05-14 16:35:41 +00:00
parent bf3e3d8587
commit 7e0848d7d2
+33
@@ -76,6 +76,39 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
 			       start_code, sizeof(start_code));
 			surface_object->slices_size += sizeof(start_code);
 		}
+		/*
+		 * iter33 α-30: VP8 OUTPUT buffer needs the uncompressed
+		 * frame header that ffmpeg-vaapi stripped before submitting
+		 * VASliceData. Hantro's vp8_dec_run reads OUTPUT[0..N] with
+		 * an assumed offset of 10 bytes (keyframe) or 3 bytes
+		 * (interframe) before the first_partition data — see
+		 * rockchip_vpu2_hw_vp8_dec.c:349.
+		 *
+		 * ffmpeg-vaapi (vaapi_vp8.c:191-192) strips
+		 * header_size = 3 + 7 * s->keyframe
+		 * before submitting the slice data, so libva needs to
+		 * pre-pad the OUTPUT with that many bytes. Hantro only
+		 * uses these bytes for offset arithmetic, not parsing,
+		 * so zero-filled placeholder is sufficient.
+		 *
+		 * ffmpeg-v4l2request (kdirect path) does NOT strip the
+		 * header, hence its OUTPUT is byte-equal to SW reference
+		 * and decode works correctly. This is the only material
+		 * difference between the two front-ends for VP8.
+		 *
+		 * key_frame in VAAPI's pic_fields.bits is INVERTED:
+		 * 0 → keyframe, 1 → interframe.
+		 */
+		if (profile == VAProfileVP8Version0_3 &&
+		    surface_object->params.vp8.iqmatrix_set /* picture parsed by now */) {
+			unsigned int header_size =
+				surface_object->params.vp8.picture.pic_fields.bits.key_frame == 0 ?
+				10 : 3;
+			memset(surface_object->source_data +
+			       surface_object->slices_size,
+			       0, header_size);
+			surface_object->slices_size += header_size;
+		}
 		memcpy(surface_object->source_data +
 		       surface_object->slices_size,
 		       buffer_object->data,