From 7e0848d7d210a3db612e0cf67bed00cdbea487bb Mon Sep 17 00:00:00 2001
From: claude-noether
Date: Thu, 14 May 2026 16:35:41 +0000
Subject: [PATCH] =?UTF-8?q?iter33=20=CE=B1-30:=20prepend=20VP8=20uncompres?=
 =?UTF-8?q?sed=20frame=20header=20to=20OUTPUT=20buffer?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

ROOT CAUSE FIX for VP8 libva decode garbage output.

ffmpeg-vaapi's vaapi_vp8.c:191-192 STRIPS the VP8 uncompressed header
(3 bytes for interframe, 10 bytes for keyframe) before submitting the
slice data via VAAPI. ffmpeg-v4l2request (kdirect) KEEPS the header in
its OUTPUT buffer.

Hantro's rockchip_vpu2_vp8_dec_run (rockchip_vpu2_hw_vp8_dec.c:349)
hard-codes 'first_part_offset = V4L2_VP8_FRAME_IS_KEY_FRAME(hdr) ? 10 : 3'
as the byte offset into OUTPUT where the first compressed partition
starts. It uses this offset for:
  - mb_offset_bits = first_part_offset * 8 + first_part_header_bits + 8
  - dct_part_offset = first_part_offset + first_part_size

Without the header, every offset is wrong, the entropy decoder spins on
the wrong bytes, and every frame decodes to garbage.

Fix: in codec_store_buffer for VAProfileVP8Version0_3, prepend
header_size bytes (10 keyframe / 3 interframe) of zeros to OUTPUT
before the slice data memcpy. Hantro skips these bytes for actual
parsing (uses ctrl-struct values instead), so zero-fill is fine.

Empirical: iter33 kernel printk in vpu2_vp8_dec_run dumped the
v4l2_ctrl_vp8_frame struct for libva vs kdirect and confirmed
byte-identical control fields. Only the OUTPUT buffer bytes differed,
traced to ffmpeg-vaapi's header stripping.
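
For reference, the offset arithmetic and the zero-fill described above
can be sketched in isolation. This is a minimal illustration under
stated assumptions, not the driver or kernel code: struct vp8_offsets,
vp8_compute_offsets and vp8_fill_output are hypothetical names, and
the is_key_frame argument is already in "1 means keyframe" form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the offsets named in the message above. */
struct vp8_offsets {
	uint32_t first_part_offset; /* bytes: 10 for keyframe, 3 for interframe */
	uint32_t mb_offset_bits;    /* bit offset derived from first_part_offset */
	uint32_t dct_part_offset;   /* byte offset of the DCT partitions */
};

/* Reproduces the two formulas quoted from the Hantro decode path. */
static struct vp8_offsets vp8_compute_offsets(int is_key_frame,
					      uint32_t first_part_header_bits,
					      uint32_t first_part_size)
{
	struct vp8_offsets o;

	o.first_part_offset = is_key_frame ? 10 : 3;
	o.mb_offset_bits = o.first_part_offset * 8 + first_part_header_bits + 8;
	o.dct_part_offset = o.first_part_offset + first_part_size;
	return o;
}

/*
 * Writes a zero-filled placeholder header followed by the stripped
 * slice data. Returns the number of bytes written, or 0 when the
 * destination is too small.
 */
static size_t vp8_fill_output(uint8_t *dst, size_t capacity, int is_key_frame,
			      const uint8_t *slice, size_t slice_size)
{
	size_t header_size = is_key_frame ? 10 : 3;

	assert(dst != NULL && slice != NULL);
	if (slice_size > capacity || header_size > capacity - slice_size)
		return 0;

	memset(dst, 0, header_size);                 /* placeholder header */
	memcpy(dst + header_size, slice, slice_size); /* first partition onward */
	return header_size + slice_size;
}
```

With the placeholder in place, the first partition starts exactly at
first_part_offset, which is the position the formulas above assume.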
---
 src/picture.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/src/picture.c b/src/picture.c
index 29d9d4e..fe44d35 100644
--- a/src/picture.c
+++ b/src/picture.c
@@ -76,6 +76,39 @@ static VAStatus codec_store_buffer(struct request_data *driver_data,
 			       start_code, sizeof(start_code));
 			surface_object->slices_size += sizeof(start_code);
 		}
+		/*
+		 * iter33 α-30: VP8 OUTPUT buffer needs the uncompressed
+		 * frame header that ffmpeg-vaapi stripped before submitting
+		 * VASliceData. Hantro's vp8_dec_run reads OUTPUT[0..N] with
+		 * an assumed offset of 10 bytes (keyframe) or 3 bytes
+		 * (interframe) before the first_partition data — see
+		 * rockchip_vpu2_hw_vp8_dec.c:349.
+		 *
+		 * ffmpeg-vaapi (vaapi_vp8.c:191-192) strips
+		 *   header_size = 3 + 7 * s->keyframe
+		 * before submitting the slice data, so libva needs to
+		 * pre-pad the OUTPUT with that many bytes. Hantro only
+		 * uses these bytes for offset arithmetic, not parsing,
+		 * so zero-filled placeholder is sufficient.
+		 *
+		 * ffmpeg-v4l2request (kdirect path) does NOT strip the
+		 * header, hence its OUTPUT is byte-equal to SW reference
+		 * and decode works correctly. This is the only material
+		 * difference between the two front-ends for VP8.
+		 *
+		 * key_frame in VAAPI's pic_fields.bits is INVERTED:
+		 *   0 → keyframe, 1 → interframe.
+		 */
+		if (profile == VAProfileVP8Version0_3 &&
+		    surface_object->params.vp8.iqmatrix_set /* picture parsed by now */) {
+			unsigned int header_size =
+				surface_object->params.vp8.picture.pic_fields.bits.key_frame == 0 ?
+				10 : 3;
+			memset(surface_object->source_data +
+				       surface_object->slices_size,
+			       0, header_size);
+			surface_object->slices_size += header_size;
+		}
 		memcpy(surface_object->source_data +
 			       surface_object->slices_size,
 		       buffer_object->data,