h264: always submit SCALING_MATRIX + populate pps num_ref_idx

Three Tier-2C/1B fixes from diff_against_ffmpeg.md (campaign repo):

1. Submit V4L2_CID_STATELESS_H264_SCALING_MATRIX every frame, with
   the H.264 spec flat default (every entry = 16) when the consumer
   didn't send a VAIQMatrixBufferH264. New helper:
   h264_default_flat_scaling_matrix(). Mirrors FFmpeg's
   v4l2_request_h264.c which always provides a scaling matrix.
   Replaces patch 0012's VAIQMatrixBuffer-conditional submission —
   that was corpus-correct (bbb has no explicit scaling lists) but
   inconsistent with what hantro G1 expects.

2. Set pps->flags |= V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT
   unconditionally. Hantro G1's set_params reads this flag to gate
   G1_REG_DEC_CTRL2_TYPE1_QUANT_E.

3. Populate pps->num_ref_idx_l0/l1_default_active_minus1 from
   VASliceParameterBufferH264.num_ref_idx_l*_active_minus1. Hantro
   G1 writes both into G1_REG_DEC_CTRL6_REFIDX0_ACTIVE / REFIDX1_ACTIVE.
   VAAPI doesn't expose the parsed-PPS default fields; the per-slice
   override is the closest available source (matches PPS default
   except on streams with explicit per-slice override).

Why now: 2026-05-04 Phase 0 kernel-side audit (kernel source
drivers/media/platform/verisilicon/hantro_g1_h264_dec.c) showed
hantro G1 writes these fields directly into hardware MMIO
registers. Prior assumption that they're "informational" or
that "VAAPI handles defaults" was wrong — the hardware uses them
to bit-walk the slice header and to size reference lists. See
~/src/libva-multiplanar/diff_against_ffmpeg.md.

This is the easy half of the fix. The load-bearing half — adding
a slice-header bit-parser to populate dec_param->dec_ref_pic_
marking_bit_size, idr_pic_id, pic_order_cnt_bit_size — comes in
the next commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-04 12:21:23 +00:00
parent 74b3793e3c
commit d41a4b96b3
+85 -12
View File
@@ -469,6 +469,32 @@ static void h264_va_matrix_to_v4l2(struct request_data *driver_data,
sizeof(v4l2_matrix->scaling_list_8x8[3]));
}
/*
* H.264 spec default scaling matrices: Flat_4x4_16 and Flat_8x8_16
* (every entry = 16). When sps_scaling_matrix_present_flag and
* pps_scaling_matrix_present_flag are both false, the bitstream
* carries no explicit scaling lists and the decoder uses these
* flat defaults — matching ITU-T H.264 (08/2024) §7.4.2.1.1.1
* (sequence scaling) and §7.4.2.2 (picture scaling).
*
* Why we always provide the matrix: hantro G1's set_params reads
* pps->flags & V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT to drive
* the G1_REG_DEC_CTRL2_TYPE1_QUANT_E hardware bit. FFmpeg's
* v4l2_request_h264.c always submits the SCALING_MATRIX control
* with the spec default when the bitstream omits explicit lists,
* and always sets the SCALING_MATRIX_PRESENT flag (commit
* comment: "FFmpeg always provide a scaling matrix"). We mirror
* that so the kernel sees a consistent control set across drivers.
*/
static void h264_default_flat_scaling_matrix(
struct v4l2_ctrl_h264_scaling_matrix *v4l2_matrix)
{
memset(v4l2_matrix->scaling_list_4x4, 16,
sizeof(v4l2_matrix->scaling_list_4x4));
memset(v4l2_matrix->scaling_list_8x8, 16,
sizeof(v4l2_matrix->scaling_list_8x8));
}
static void h264_copy_pred_table(struct v4l2_h264_weight_factors *factors,
unsigned int num_refs,
int16_t luma_weight[32],
@@ -713,12 +739,60 @@ int h264_set_controls(struct request_data *driver_data,
h264_va_picture_to_v4l2(driver_data, context, surface,
&surface->params.h264.picture,
&decode, &pps, &sps);
/*
* Populate the scaling matrix unconditionally: from VAAPI's
* VAIQMatrixBufferH264 when the consumer sent one this frame
* (matrix_set), otherwise from the H.264 spec flat defaults.
* Submitted to the kernel as V4L2_CID_STATELESS_H264_SCALING_MATRIX
* for every request — required for FFmpeg/hantro contract parity
* (see h264_default_flat_scaling_matrix() docblock).
*/
if (surface->params.h264.matrix_set)
h264_va_matrix_to_v4l2(driver_data, context,
&surface->params.h264.matrix, &matrix);
else
h264_default_flat_scaling_matrix(&matrix);
h264_va_slice_to_v4l2(driver_data, context,
&surface->params.h264.slice,
&surface->params.h264.picture, &slice, &weights);
/*
* Mirror SCALING_MATRIX_PRESENT in PPS flags. Hantro G1 set_params
* gates its G1_REG_DEC_CTRL2_TYPE1_QUANT_E register bit on this;
* FFmpeg sets it unconditionally with the comment "FFmpeg always
* provide a scaling matrix." We submit the matrix always (above),
* so the flag must be set always to match.
*/
pps.flags |= V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT;
/*
* Populate pps->num_ref_idx_l0/l1_default_active_minus1. Hantro G1
* writes both into G1_REG_DEC_CTRL6_REFIDX0_ACTIVE / REFIDX1_ACTIVE
* MMIO registers (via "(field) + 1", so an uninitialized 0 here
* would advertise "1 active reference per list" to hardware, wrong
* for I/IDR frames with 0 refs and wrong for B frames with >1).
*
* VAAPI's VAPictureParameterBufferH264 does not carry the parsed
* PPS num_ref_idx_l*_default_active_minus1 fields — those are in
* the bitstream's PPS NAL which VAAPI consumers parse client-side
* but don't forward. The closest available source is VASlice's
* num_ref_idx_l*_active_minus1, which is the per-slice override
* defaulting to the PPS value (H.264 §7.4.3 num_ref_idx_active_
* override_flag). For most streams these values match; mismatch
* only on streams with explicit per-slice overrides.
*
* For IDR frames (no references), the values are not used by
* hantro's reference list builder, so a wrong value here is
* harmless. For inter frames it matters and slice-derived is
* the best we can do without a full PPS-NAL parser.
*/
pps.num_ref_idx_l0_default_active_minus1 =
surface->params.h264.slice.num_ref_idx_l0_active_minus1;
pps.num_ref_idx_l1_default_active_minus1 =
surface->params.h264.slice.num_ref_idx_l1_active_minus1;
/*
* Derive PFRAME / BFRAME flags in v4l2_ctrl_h264_decode_params.flags
* from VASliceParameterBufferH264.slice_type. VAAPI's slice_type
@@ -766,16 +840,17 @@ int h264_set_controls(struct request_data *driver_data,
/*
* Build the per-request control list incrementally:
* - SPS, PPS, DECODE_PARAMS: always required (in either decode
* mode).
* - SCALING_MATRIX: gated on surface->params.h264.matrix_set,
* i.e. the consumer sent a VAIQMatrixBufferH264 this frame.
* This matches the H.264 spec: explicit scaling lists are
* present iff sps_scaling_matrix_present_flag ||
* pps_scaling_matrix_present_flag, in which case VAAPI
* consumers send the matrix; otherwise the kernel uses
* spec-defined defaults. Independent of FRAME_BASED /
* SLICE_BASED.
* - SPS, PPS, DECODE_PARAMS, SCALING_MATRIX: always required.
* Hantro G1 reads the SCALING_MATRIX_PRESENT flag from PPS to
* gate hardware register G1_REG_DEC_CTRL2_TYPE1_QUANT_E and
* reads the matrix entries directly into hardware tables when
* decoding. FFmpeg always submits the matrix (with spec-default
* flat values when no explicit lists are in the bitstream); we
* match that — see h264_default_flat_scaling_matrix() docblock.
* Earlier patch 0012 made SCALING_MATRIX submission conditional
* on VAAPI's VAIQMatrixBuffer arrival; that was corpus-correct
* (bbb has no explicit scaling lists) but inconsistent with the
* hantro contract — replaced 2026-05-04.
* - SLICE_PARAMS: SLICE_BASED only. Kernel doc
* ext-ctrls-codec-stateless.rst (FRAME_BASED entry):
* "When this mode is selected, the
@@ -808,12 +883,10 @@ int h264_set_controls(struct request_data *driver_data,
controls[num_controls].size = sizeof(decode);
num_controls++;
if (surface->params.h264.matrix_set) {
controls[num_controls].id = V4L2_CID_STATELESS_H264_SCALING_MATRIX;
controls[num_controls].p_h264_scaling_matrix = &matrix;
controls[num_controls].size = sizeof(matrix);
num_controls++;
}
if (slice_based) {
controls[num_controls].id = V4L2_CID_STATELESS_H264_SLICE_PARAMS;