h264: fill dpb[].pic_num as PicNum/LongTermPicNum, not VAAPI surface id

fourier's h264_fill_dpb assigned `dpb->pic_num = entry->pic.picture_id`
— the VAAPI surface id. Per ext-ctrls-codec-stateless.rst:651-655,
v4l2_h264_dpb_entry.pic_num must equal the H.264 spec PicNum
(equation 8-28) for short-term references or LongTermPicNum
(equation 8-29) for long-term references. The surface id has no
relationship to either.

Kernel-side consumers of pic_num:
  - mediatek/decoder/vdec/vdec_h264_req_common.c (line 210):
    dst_entry->pic_num = src_entry->pic_num. Used for
    field-coded short-term reference disambiguation.
  - hantro / rkvdec / cedrus / qcom-iris-stateless: do NOT read
    pic_num. They resolve refs via reference_ts (timestamp)
    and POC. This is why fourier's wrong value never surfaced
    on RK3568 hantro.

This patch makes pic_num spec-correct so the libva-v4l2-request
fork is upstreamable across drivers without depending on each
target's tolerance for non-spec fills.

Computation, derived from H.264 spec section 8.2.4.1:

  For frames (not field-coded), PicNum = FrameNumWrap.
  FrameNumWrap = (frame_num > cur_frame_num)
                 ? frame_num - max_frame_num
                 : frame_num

  max_frame_num = 1 << (sps.log2_max_frame_num_minus4 + 4)
  cur_frame_num = current picture's frame_num

For long-term references:
  LongTermPicNum = long_term_frame_idx (when not field-coded).
  VAAPI convention (libavcodec/vaapi_h264.c::fill_vaapi_pic line 64):
    VAPictureH264.frame_idx = long_ref ? pic_id : frame_num
  So long-term refs already carry long_term_frame_idx in frame_idx;
  we copy it through.

Field-coded streams require an extra factor-of-2 plus a parity
adjustment per spec equations 8-28/8-29; this patch does not handle
field-coded content. ohm corpus is all frame-coded so this is a
follow-up for later.

Implementation: add VAPicture parameter to h264_fill_dpb so the
function has access to seq_fields.log2_max_frame_num_minus4 and
the current picture's frame_num. Update the single caller in
h264_va_picture_to_v4l2.

Cross-reference: kernel doc ext-ctrls-codec-stateless.rst dpb_entry
table (line 651-655) and mediatek/vdec/vdec_h264_req_common.c
line 210.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
This commit is contained in:
2026-05-02 12:00:00 +00:00
parent 05ffd02ff2
commit b0a93e4683
+40 -2
View File
@@ -226,8 +226,12 @@ static inline int32_t h264_strip_ffmpeg_poc_sentinel(int32_t poc, uint32_t flags
static void h264_fill_dpb(struct request_data *data, static void h264_fill_dpb(struct request_data *data,
struct object_context *context, struct object_context *context,
VAPictureParameterBufferH264 *VAPicture,
struct v4l2_ctrl_h264_decode_params *decode) struct v4l2_ctrl_h264_decode_params *decode)
{ {
const int max_frame_num =
1 << (VAPicture->seq_fields.bits.log2_max_frame_num_minus4 + 4);
const int cur_frame_num = (int)VAPicture->frame_num;
int i; int i;
for (i = 0; i < H264_DPB_SIZE; i++) { for (i = 0; i < H264_DPB_SIZE; i++) {
@@ -246,7 +250,41 @@ static void h264_fill_dpb(struct request_data *data,
} }
dpb->frame_num = entry->pic.frame_idx; dpb->frame_num = entry->pic.frame_idx;
dpb->pic_num = entry->pic.picture_id;
/*
* Per ext-ctrls-codec-stateless.rst, dpb[].pic_num must
* equal the H.264 spec's PicNum (8-28) for short-term refs
* or LongTermPicNum (8-29) for long-term refs.
*
* For frames (not field-coded), PicNum = FrameNumWrap.
* FrameNumWrap = (frame_num > cur_frame_num)
* ? frame_num - max_frame_num
* : frame_num
* (per spec section 8.2.4.1, frame_num wraparound).
*
* VAAPI convention (libavcodec/vaapi_h264.c::fill_vaapi_pic
* line 64): VAPictureH264.frame_idx holds long_term_frame_idx
* for long-term refs and frame_num for short-term refs. So
* for long-term entries we copy frame_idx straight through
* as LongTermPicNum.
*
* fourier's previous code set pic_num to picture_id (the
* VAAPI surface id) which is unrelated to H.264 PicNum;
* mediatek's vdec_h264_req_common.c::dst_entry->pic_num is
* one consumer that fails on that. Hantro doesn't read
* pic_num at all (uses reference_ts for ref resolution),
* which is why fourier's wrong value never surfaced on
* RK3568.
*/
if (entry->pic.flags & VA_PICTURE_H264_LONG_TERM_REFERENCE) {
dpb->pic_num = entry->pic.frame_idx;
} else {
int frame_num = (int)entry->pic.frame_idx;
dpb->pic_num = (frame_num > cur_frame_num)
? frame_num - max_frame_num
: frame_num;
}
dpb->top_field_order_cnt = dpb->top_field_order_cnt =
h264_strip_ffmpeg_poc_sentinel(entry->pic.TopFieldOrderCnt, h264_strip_ffmpeg_poc_sentinel(entry->pic.TopFieldOrderCnt,
entry->pic.flags); entry->pic.flags);
@@ -283,7 +321,7 @@ static void h264_va_picture_to_v4l2(struct request_data *driver_data,
nal_ref_idc = (b[0] >> 5) & 0x3; nal_ref_idc = (b[0] >> 5) & 0x3;
nal_unit_type = b[0] & 0x1f; nal_unit_type = b[0] & 0x1f;
h264_fill_dpb(driver_data, context, decode); h264_fill_dpb(driver_data, context, VAPicture, decode);
/* /*
* Populate every V4L2_CID_STATELESS_H264_DECODE_PARAMS field * Populate every V4L2_CID_STATELESS_H264_DECODE_PARAMS field