Commit Graph

23 Commits

Author SHA1 Message Date
test0r 841f616e74 h264: gate SCALING_MATRIX submission on VAIQMatrixBuffer presence
VAAPI signals "explicit scaling lists are present in the bitstream"
implicitly: the consumer (ffmpeg-vaapi, mpv, etc.) sends a
VAIQMatrixBufferH264 alongside RenderPicture iff
sps_scaling_matrix_present_flag || pps_scaling_matrix_present_flag.
When the bitstream uses default (flat) scaling, no IQMatrixBuffer
arrives and the in-tree h264.matrix struct stays zero-initialised.

fourier's existing codec_store_buffer for MPEG2 and HEVC tracks this
via a per-surface iqmatrix_set boolean (surface.h::mpeg2.iqmatrix_set,
h265.iqmatrix_set) — the H.264 path was missing the equivalent flag,
so set_controls always submitted the scaling matrix, including the
zero-initialised case.

Symptom on hantro-vpu RK3568: when TRANSFORM_8X8_MODE is enabled in
PPS, the kernel multiplies all 8x8 DCT coefficients by the zeroed
scaling_list_8x8, producing a zeroed CAPTURE buffer despite a
successful decode round-trip (no V4L2_BUF_FLAG_ERROR,
bytesused=3655712 reported).

Earlier draft of this patch unconditionally omitted SCALING_MATRIX in
FRAME_BASED. That's corpus-correct (bbb has no explicit scaling
lists) but the wrong predicate: the kernel-side gating is by
"matrix-supplied vs. not," not by decode mode. Streams that signal
explicit scaling lists must submit SCALING_MATRIX in either mode.

Contract verification (audit_0008_decode_params_2026-05-01.md +
hantro_h264.c::assemble_scaling_list): the kernel uses the supplied
matrix when SCALING_MATRIX is in the control batch and falls back
to spec-defined defaults when absent. Mode-independent.

This patch:
  - surface.h: adds bool matrix_set to params.h264, mirroring
    mpeg2.iqmatrix_set / h265.iqmatrix_set.
  - picture.c codec_store_buffer (H.264 VAIQMatrixBufferType case):
    sets matrix_set = true when the buffer arrives.
  - picture.c RequestBeginPicture: resets matrix_set = false at the
    start of each Begin/Render/End cycle.
  - h264.c h264_set_controls: builds the controls[] array
    incrementally; SPS/PPS/DECODE_PARAMS always; SCALING_MATRIX iff
    matrix_set; SLICE_PARAMS only in SLICE_BASED; PRED_WEIGHTS only
    when both SLICE_BASED and V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.

The pre-existing FRAME_BASED-omits-SLICE_PARAMS rule is preserved —
kernel doc ext-ctrls-codec-stateless.rst:752: "When this mode is
selected, the V4L2_CID_STATELESS_H264_SLICE_PARAMS control shall
not be set."

Cross-reference: kernel UAPI section
ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SCALING_MATRIX
(matrix supplied iff explicit scaling lists in bitstream) and
hantro_h264.c::assemble_scaling_list (consumes supplied matrix or
falls back to defaults).

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r 86a8545146 h264: fill DECODE_PARAMS frame_num + field flags from VAAPI
Fourier's h264_va_picture_to_v4l2 only populated four fields of the
struct v4l2_ctrl_h264_decode_params: dpb (via h264_fill_dpb),
nal_ref_idc, top_field_order_cnt, bottom_field_order_cnt, and the
IDR_PIC flag. Many other required-by-spec fields were left at zero-
init (frame_num, idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_*,
dec_ref_pic_marking_bit_size, pic_order_cnt_bit_size,
slice_group_change_cycle, FIELD_PIC and BOTTOM_FIELD flags).

For an IDR (first frame) on hantro-vpu RK3568, the kernel parses
the bitstream from the OUTPUT buffer and uses these fields to drive
its bitstream-element offset tracking. Empirically the kernel
returned a successfully-decoded but ZEROED CAPTURE buffer — flat
dark-green frames in mpv output, no errors logged.

This patch fills every field VAAPI exposes:

  - frame_num: from VAPicture->frame_num.
  - FIELD_PIC flag: from VAPicture->pic_fields.bits.field_pic_flag.
  - BOTTOM_FIELD flag: from
    VAPicture->CurrPic.flags & VA_PICTURE_H264_BOTTOM_FIELD.

Also corrects the IDR_PIC flag to use |= instead of = so the new
field flags don't clobber it.

Fields NOT derivable from VAAPI's pre-parsed structures —
idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_*,
dec_ref_pic_marking_bit_size, pic_order_cnt_bit_size,
slice_group_change_cycle — require a slice_header() bit-level
parse. libva-v4l2-request does not currently do this. They remain
at zero-init.

Empirical question this patch answers: does hantro tolerate the
bit_size fields being zero for IDR frames, or does it strictly
require them? If post-patch CAPTURE is still zeroed, a slice-header
parser is required. If CAPTURE shows real picture data, hantro
fills in the bit-positions itself when no hint is supplied.

Cross-reference: gstv4l2codech264dec.c::
gst_v4l2_codec_h264_dec_fill_decoder_params (commit 9e3e775,
lines 632-678).

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r 4246d5d537 h264: omit per-slice controls in FRAME_BASED mode
Identified by cross-reference against GStreamer's
gst-plugins-bad/sys/v4l2codecs/gstv4l2codech264dec.c (upstream commit
9e3e775). At lines 1263-1304, GStreamer gates SLICE_PARAMS and
PRED_WEIGHTS submission on is_slice_based(self):

    if (is_slice_based (self)) {
        control[num_controls].id = V4L2_CID_STATELESS_H264_SLICE_PARAMS;
        ...
        control[num_controls].id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS;
        ...
    }

In V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED, the kernel parses the
bitstream itself from the OUTPUT-queue payload; per-slice controls in
the request trigger cluster-validation EINVAL at error_idx=count
(observed on RK3568 hantro-vpu, kernel 6.19.10).

This patch:
  - Reorders controls[] so FRAME_BASED-required entries come first
    (SPS, PPS, SCALING_MATRIX, DECODE_PARAMS at indices 0..3) and the
    SLICE_BASED-only entries come last (SLICE_PARAMS, PRED_WEIGHTS at
    indices 4..5).
  - Defaults num_controls=4 (FRAME_BASED), expanding to 5 for
    SLICE_BASED and 6 when V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.
  - Hardcodes slice_based=false for now since patch 0002 sets the
    device to FRAME_BASED unconditionally. A TODO marks the spot for
    the planned probe-then-set commit, which will populate
    context->decode_mode at CreateContext via VIDIOC_QUERYCTRL/
    G_EXT_CTRLS and replace the hardcoded false with a runtime check.

Diagnosis chain:
  - patch 0005 reduced one EINVAL per frame on PRED_WEIGHTS
    submission, but cluster-level rejection persisted at error_idx=5
    (count) — meaning kernel walked all 5 controls cleanly but
    rejected the request as a whole.
  - dmesg silent → rejection in V4L2 core (v4l2-ctrls-request.c /
    v4l2-h264.c), not in hantro driver where it could log.
  - GStreamer reference confirmed FRAME_BASED contract: only 4
    sequence-and-frame-level controls go in the per-request batch.

After this patch the kernel should accept the per-request controls
and actually decode the bitstream into the CAPTURE buffer.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r e382c63e20 h264: submit PRED_WEIGHTS only when WEIGHTED_PRED applies
Per kernel UAPI (include/uapi/linux/v4l2-controls.h),
V4L2_CID_STATELESS_H264_PRED_WEIGHTS is a conditional control:

    V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(pps, slice) :=
        ((pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED) &&
         (slice_type == P || slice_type == SP)) ||
        (pps->weighted_bipred_idc == 1 && slice_type == B)

Submitting PRED_WEIGHTS on a frame where the macro evaluates false
triggers VIDIOC_S_EXT_CTRLS to return EINVAL at error_idx=5 (the
6th, last control in the per-request batch) on hantro-vpu and any
other driver that strictly enforces the spec.

Smoke trace from RK3568 hantro on bbb_1080p30 (Main profile, no
weighted prediction): every per-frame batch fails identically, 13
EINVALs over a 10-frame run. Without this fix, ffmpeg's vaapi-copy
falls back to software decode for every frame.

Fix: narrow num_controls to 5 (excluding PRED_WEIGHTS at index 5)
when the macro returns false; keep at 6 when it returns true.

Defect found and fixed via Phase 6 Step 1 ohm smoke testing. Not
part of Sonnet's six-commit upstreamable plan; slotted in as patch
0005 ahead of the planned probe-then-set / FRAME_BASED commits
because it unblocks per-frame submission on every backing driver,
not just hantro.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r c45fea96e3 fourier-local: stateless control modernization + HEVC strip
Compound patch carrying the fork's pre-Step-1 substrate, originally
authored by Jernej Škrabec / fourier on top of bootlin's a3c2476:

- src/h264.c + src/picture.c: V4L2_CID_MPEG_VIDEO_H264_* renamed to
  V4L2_CID_STATELESS_H264_*, struct shapes tracked to mainline
  (V4L2_CID_STATELESS_H264_DECODE_MODE/_START_CODE added to the
  passthrough shim).
- include/hevc-ctrls.h: redirect shim to <linux/v4l2-controls.h>
  (kernel-side HEVC controls now live in the canonical UAPI header).
- src/meson.build: src/h265.c / src/h265.h commented out — HEVC
  build path is excluded from this fork (RK3568 hantro G1/G2 has
  no HEVC, and the kernel-side HEVC controls have a separate
  rework in flight upstream).
- src/tiled_yuv.S: aarch64 stub for tiled_to_planar (assembly
  source was sunxi-cedrus armv7-only; aarch64 needs a stub to keep
  the build linking).
- include/h264-ctrls.h: removed (dead post-fourier — no source
  includes it; the passthrough shim's CID aliases live in the
  kernel header now).

Functionally equivalent to the prior fork master commits:
  c1f5108 V4L2_PIX_FMT_H264_SLICE rename
  4ccbfe9 Strip HEVC build path
  da9f2a5 include/h264-ctrls.h passthrough + CID aliases
  fc4bb10 src/h264.c track upstream UAPI shape
  13e9b64 src/h264.c drop num_slices field
  4d14ffb src/tiled_yuv.S aarch64 stub
  1b02c9b src/h264.c include utils.h

Folded into one commit during 2026-05-04 Step 1 reconciliation
(see ../phase0_evidence/2026-05-04/findings.md). Per-patch history
of the early fork commits preserved on the pre-step1 branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 09:40:14 +00:00
Paul Kocialkowski b5cee9f480 include: Update headers to latest series
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-05-16 16:14:55 +02:00
Paul Kocialkowski 0c611c6b7a Implement proper timestamping for references
Reference frames are now identified using their timestamp:
set the timestamp when queuing the output buffer and use it to identify
the frame later on.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-03-07 11:41:56 +01:00
Paul Kocialkowski 3176adf69c Include local copies of DRM and V4L2 codec definitions
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-03-07 11:37:12 +01:00
Paul Kocialkowski 518d7a0c59 Update and harmonize heading author lists
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-03-07 11:37:12 +01:00
Maxime Ripard 111f5b209a tree: Rename cedrus_data to request_data
The cedrus_data structure carries the old name. In order to migrate to the
new name, let's rename it to request_data.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 17:02:23 +02:00
Maxime Ripard 4ad990e087 tree: Rename the header and defines
The sunxi_cedrus.h header contains a bunch of defines prefixed with
SUNXI_CEDRUS.

As part as the ongoing migration to a more generic name, change that prefix
for V4L2_REQUEST, and the header file to request.h

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 17:02:23 +02:00
Maxime Ripard 2d1bce38c2 h264: Don't set num_slices anymore
The num_slices parameter was improperly set to the number of reference
frames, which is incorrect.

Add a counter for the number of slices per surface, and set num_slices to
that value.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 15:28:55 +02:00
Maxime Ripard 38d38134c7 h264: Set PPS pic_init_qp_minus26 field
The pic_init_qp_minus26 must be set but was not until now. Fix this.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 15:28:55 +02:00
Maxime Ripard 1fca951c05 h264: Fix prediction weight table
The current code sets the prediction weight table by doing a memcpy of the
libva structure to the v4l2's structure.

However, for the offset and weight parameters, libva's structure uses
16-bits integer, while v4l2 uses 8-bits, which obviously doesn't work well
with memcpy.

Create a function to copy those arrays and matrices instead that follows
the algorithm defined in the H264 spec, and use it so that it works
properly.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 15:28:55 +02:00
Maxime Ripard e7c09a336f h264: Implement local cache of the latest decoded pictures
The libva only provides the reference images needed to decode the current
picture, but not the full DPB. However, some codecs need that whole DPB in
order to decode a picture.

For example, the Allwinner hardware codec has an internal SRAM, with each
picture getting a slot in that SRAM, and during each decoding process, some
metadata will then be generated from that SRAM content to a separate
buffer. Therefore, each frames must be located at the same SRAM position
each time so that the metadata are then re-used properly.

However, since libva will only pass a few reference images, we can end up
in a situation where multiple, subsequent, frames will have the same
reference images set, but might all be used as reference later on and
cannot therefore be located at the same position.

And from a more theorical point of view, Linux expects a full blown DPB in
its H264 control.

In order to work around this, we can create a shadow of the DPB by simply
maintaining a list of 16 decoded images, each associated with their
VAPictureH264 and an age. This age is the last time we used that frame as
reference. When a new picture is decoded, either we assign it to a free
slot, or we reuse the slot from the frame that hasn't been used as a
reference for the longest time.

This is a much simpler approach than the one documented in the H264 spec,
but this shouldn't really be a problem since we don't handle the reference
frames ourselves, but just re-use the one from the libva, and taken from
the bitstream before. As such, frames that are not supposed to be used for
reference will not be anymore, their age will not increase, and therefore
after a while we will garbage-collect their slot to store a much newer
frame.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 15:30:33 +02:00
Maxime Ripard dadb3d344f h264: Pass the context to the sub-control functions
Some functions setting the controls in the H264 code will need the context
in order to access the DPB. Make sure that we pass it as an argument.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 15:29:28 +02:00
Maxime Ripard acc0cf3475 codecs: pass the context to the controls function as well
Some functions setting the controls will need the context in the future.
Make sure that we provide it as an argument.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 15:28:55 +02:00
Maxime Ripard 5aeb07f8bf tree: Run clang-format to conform to the kernel coding style
The coding style has been a bit erratic. Enforce the linux kernel coding
style by reusing their .clang-format file, running clang-format on the
source, and ignoring the few shortcomings that clang-format has at the
moment (especially on aligning the define values).

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 10:12:15 +02:00
Maxime Ripard b938824c48 tree: Shorten struct sunxi_cedrus_driver_data name
This long structure name makes it quite difficult to fit within the 80
characters limit. Shorten it.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 09:34:15 +02:00
Maxime Ripard 2208d57b8f h264: shorten the surface_object parameter name
Using the same words but not in the same order for both the type and the
variable name isn't particularly helpful, and prevents to stay within 80
characters. Shorten the name a bit.

Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-17 09:31:17 +02:00
Maxime Ripard 6194f1e7da h264: Adjust for the latest h264 API changes
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-13 16:10:21 +02:00
Maxime Ripard 22b51f5ced h264: Fix build failure introduced by previous commit
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
2018-07-13 16:10:02 +02:00
Maxime Ripard 1efa9d877e Add support for H264 decoding
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-07-11 17:07:15 +02:00