From b0a93e468332596f599c3d59572205dcbca74dea Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Sat, 2 May 2026 12:00:00 +0000 Subject: [PATCH] h264: fill dpb[].pic_num as PicNum/LongTermPicNum, not VAAPI surface id MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit fourier's h264_fill_dpb assigned `dpb->pic_num = entry->pic.picture_id` — the VAAPI surface id. Per ext-ctrls-codec-stateless.rst:651-655, v4l2_h264_dpb_entry.pic_num must equal the H.264 spec PicNum (equation 8-28) for short-term references or LongTermPicNum (equation 8-29) for long-term references. The surface id has no relationship to either. Kernel-side consumers of pic_num: - mediatek/decoder/vdec/vdec_h264_req_common.c (line 210): dst_entry->pic_num = src_entry->pic_num. Used for field-coded short-term reference disambiguation. - hantro / rkvdec / cedrus / qcom-iris-stateless: do NOT read pic_num. They resolve refs via reference_ts (timestamp) and POC. This is why fourier's wrong value never surfaced on RK3568 hantro. This patch makes pic_num spec-correct so the libva-v4l2-request fork is upstreamable across drivers without depending on each target's tolerance for non-spec fills. Computation, derived from H.264 spec section 8.2.4.1: For frames (not field-coded), PicNum = FrameNumWrap. FrameNumWrap = (frame_num > cur_frame_num) ? frame_num - max_frame_num : frame_num max_frame_num = 1 << (sps.log2_max_frame_num_minus4 + 4) cur_frame_num = current picture's frame_num For long-term references: LongTermPicNum = long_term_frame_idx (when not field-coded). VAAPI convention (libavcodec/vaapi_h264.c::fill_vaapi_pic line 64): VAPictureH264.frame_idx = long_ref ? pic_id : frame_num So long-term refs already carry long_term_frame_idx in frame_idx; we copy it through. Field-coded streams require an extra factor-of-2 plus a parity adjustment per spec equations 8-28/8-29; this patch does not handle field-coded content. ohm corpus is all frame-coded so this is a follow-up for later. Implementation: add VAPicture parameter to h264_fill_dpb so the function has access to seq_fields.log2_max_frame_num_minus4 and the current picture's frame_num. Update the single caller in h264_va_picture_to_v4l2. Cross-reference: kernel doc ext-ctrls-codec-stateless.rst dpb_entry table (line 651-655) and mediatek/vdec/vdec_h264_req_common.c line 210. Signed-off-by: Markus Fritsche --- src/h264.c | 42 ++++++++++++++++++++++++++++++++++++++++-- 1 file changed, 40 insertions(+), 2 deletions(-) diff --git a/src/h264.c b/src/h264.c index 6efacb2..ff1585a 100644 --- a/src/h264.c +++ b/src/h264.c @@ -226,8 +226,12 @@ static inline int32_t h264_strip_ffmpeg_poc_sentinel(int32_t poc, uint32_t flags static void h264_fill_dpb(struct request_data *data, struct object_context *context, + VAPictureParameterBufferH264 *VAPicture, struct v4l2_ctrl_h264_decode_params *decode) { + const int max_frame_num = + 1 << (VAPicture->seq_fields.bits.log2_max_frame_num_minus4 + 4); + const int cur_frame_num = (int)VAPicture->frame_num; int i; for (i = 0; i < H264_DPB_SIZE; i++) { @@ -246,7 +250,41 @@ static void h264_fill_dpb(struct request_data *data, } dpb->frame_num = entry->pic.frame_idx; - dpb->pic_num = entry->pic.picture_id; + + /* + * Per ext-ctrls-codec-stateless.rst, dpb[].pic_num must + * equal the H.264 spec's PicNum (8-28) for short-term refs + * or LongTermPicNum (8-29) for long-term refs. + * + * For frames (not field-coded), PicNum = FrameNumWrap. + * FrameNumWrap = (frame_num > cur_frame_num) + * ? frame_num - max_frame_num + * : frame_num + * (per spec section 8.2.4.1, frame_num wraparound). + * + * VAAPI convention (libavcodec/vaapi_h264.c::fill_vaapi_pic + * line 64): VAPictureH264.frame_idx holds long_term_frame_idx + * for long-term refs and frame_num for short-term refs. So + * for long-term entries we copy frame_idx straight through + * as LongTermPicNum. + * + * fourier's previous code set pic_num to picture_id (the + * VAAPI surface id) which is unrelated to H.264 PicNum; + * mediatek's vdec_h264_req_common.c::dst_entry->pic_num is + * one consumer that fails on that. Hantro doesn't read + * pic_num at all (uses reference_ts for ref resolution), + * which is why fourier's wrong value never surfaced on + * RK3568. + */ + if (entry->pic.flags & VA_PICTURE_H264_LONG_TERM_REFERENCE) { + dpb->pic_num = entry->pic.frame_idx; + } else { + int frame_num = (int)entry->pic.frame_idx; + dpb->pic_num = (frame_num > cur_frame_num) + ? frame_num - max_frame_num + : frame_num; + } + dpb->top_field_order_cnt = h264_strip_ffmpeg_poc_sentinel(entry->pic.TopFieldOrderCnt, entry->pic.flags); @@ -283,7 +321,7 @@ static void h264_va_picture_to_v4l2(struct request_data *driver_data, nal_ref_idc = (b[0] >> 5) & 0x3; nal_unit_type = b[0] & 0x1f; - h264_fill_dpb(driver_data, context, decode); + h264_fill_dpb(driver_data, context, VAPicture, decode); /* * Populate every V4L2_CID_STATELESS_H264_DECODE_PARAMS field