Commit Graph

343 Commits

Author SHA1 Message Date
test0r b0a93e4683 h264: fill dpb[].pic_num as PicNum/LongTermPicNum, not VAAPI surface id
fourier's h264_fill_dpb assigned `dpb->pic_num = entry->pic.picture_id`
— the VAAPI surface id. Per ext-ctrls-codec-stateless.rst:651-655,
v4l2_h264_dpb_entry.pic_num must equal the H.264 spec PicNum
(equation 8-28) for short-term references or LongTermPicNum
(equation 8-29) for long-term references. The surface id has no
relationship to either.

Kernel-side consumers of pic_num:
  - mediatek/decoder/vdec/vdec_h264_req_common.c (line 210):
    dst_entry->pic_num = src_entry->pic_num. Used for
    field-coded short-term reference disambiguation.
  - hantro / rkvdec / cedrus / qcom-iris-stateless: do NOT read
    pic_num. They resolve refs via reference_ts (timestamp)
    and POC. This is why fourier's wrong value never surfaced
    on RK3568 hantro.

This patch makes pic_num spec-correct so the libva-v4l2-request
fork is upstreamable across drivers without depending on each
target's tolerance for non-spec fills.

Computation, derived from H.264 spec section 8.2.4.1:

  For frames (not field-coded), PicNum = FrameNumWrap.
  FrameNumWrap = (frame_num > cur_frame_num)
                 ? frame_num - max_frame_num
                 : frame_num

  max_frame_num = 1 << (sps.log2_max_frame_num_minus4 + 4)
  cur_frame_num = current picture's frame_num

For long-term references:
  LongTermPicNum = long_term_frame_idx (when not field-coded).
  VAAPI convention (libavcodec/vaapi_h264.c::fill_vaapi_pic line 64):
    VAPictureH264.frame_idx = long_ref ? pic_id : frame_num
  So long-term refs already carry long_term_frame_idx in frame_idx;
  we copy it through.

Field-coded streams require an extra factor of 2 plus a parity
adjustment (the field-decoding equations of spec section 8.2.4.1);
this patch does not handle field-coded content. The ohm corpus is
entirely frame-coded, so field support is left as a follow-up.
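
A minimal sketch of the frame-coded computation described above, for
illustration only. The function name and parameter list are hypothetical
and are not the signature used in the patch; it assumes frame_idx carries
frame_num for short-term refs and long_term_frame_idx for long-term refs,
as stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: PicNum / LongTermPicNum for a frame-coded
 * reference, following the formulas quoted in this message. */
int32_t ex_h264_frame_pic_num(uint32_t frame_idx, bool long_term,
                              uint32_t cur_frame_num,
                              uint32_t log2_max_frame_num_minus4)
{
    /* The SPS syntax limits log2_max_frame_num_minus4 to 0..12. */
    assert(log2_max_frame_num_minus4 <= 12);

    int32_t max_frame_num = (int32_t)1 << (log2_max_frame_num_minus4 + 4);

    if (long_term)
        return (int32_t)frame_idx;                 /* LongTermPicNum */
    if (frame_idx > cur_frame_num)
        return (int32_t)frame_idx - max_frame_num; /* wrapped FrameNumWrap */
    return (int32_t)frame_idx;                     /* PicNum = frame_num */
}
```

With max_frame_num = 16, a reference with frame_num 14 seen from a current
picture with frame_num 2 gets FrameNumWrap = 14 - 16 = -2.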

Implementation: add VAPicture parameter to h264_fill_dpb so the
function has access to seq_fields.log2_max_frame_num_minus4 and
the current picture's frame_num. Update the single caller in
h264_va_picture_to_v4l2.

Cross-reference: kernel doc ext-ctrls-codec-stateless.rst dpb_entry
table (lines 651-655) and mediatek/decoder/vdec/vdec_h264_req_common.c
line 210.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r 05ffd02ff2 h264: derive PFRAME / BFRAME flags from VASlice slice_type
v4l2_ctrl_h264_decode_params.flags has PFRAME and BFRAME bits per
ext-ctrls-codec-stateless.rst. fourier never set them; libva-v4l2-
request relied on each backing driver tolerating frame-class
ambiguity.

Kernel survey (linux 6.19.x):
  - tegra-vde/h264.c (lines 783-799) consumes both flags to select
    the inter-frame decode kernel. Without them the I-frame kernel
    runs on P/B content.
  - visl-trace-h264.h uses them for decode tracing.
  - hantro / rkvdec / cedrus / mediatek / qcom-iris-stateless do
    not consume the flags.

Hantro on ohm decoded bbb cleanly without these flags set (see
phase6/step1/ohm_smoke_2026-05-02T060255Z_post_0015/), so this is
an upstreamability fix for cross-driver portability rather than a
correctness fix for hantro.

VAAPI's VASliceParameterBufferH264.slice_type maps directly to the
H.264 slice_header() slice_type field. Per spec 7.4.3:
  0=P 1=B 2=I 3=SP 4=SI; 5..9 = "all slices in the picture have
  this slice_type." `slice_type % 5` recovers the underlying type
  in either encoding form.

In FRAME_BASED mode we only see surface->params.h264.slice from the
most recent VASliceParameterBuffer. That is acceptable for the
PFRAME / BFRAME flags: multi-slice frames may mix slice types in
some streams, but the flag's semantic is "this is an inter-coded
frame," which holds if any slice is P or B, so using the last-seen
slice's type is a reasonable approximation.
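
A sketch of the slice_type mapping described above. The function name is
hypothetical, and the EX_* flag values are local placeholders; real code
would use the PFRAME / BFRAME bits from <linux/v4l2-controls.h>. Leaving
SP and SI unflagged is an assumption of this sketch, not something this
message specifies.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bits standing in for the kernel's PFRAME / BFRAME flags. */
#define EX_DECODE_PARAM_FLAG_PFRAME 0x08u
#define EX_DECODE_PARAM_FLAG_BFRAME 0x10u

/* Map a VAAPI slice_type (0..9) to the frame-class flag bits. */
uint32_t ex_h264_frame_class_flags(unsigned int va_slice_type)
{
    assert(va_slice_type <= 9);

    /* 5..9 repeat 0..4 ("all slices have this type"), so fold them. */
    switch (va_slice_type % 5) {
    case 0:
        return EX_DECODE_PARAM_FLAG_PFRAME; /* P */
    case 1:
        return EX_DECODE_PARAM_FLAG_BFRAME; /* B */
    default:
        return 0; /* I, SP, SI: unflagged here (assumption) */
    }
}
```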

Cross-reference: ext-ctrls-codec-stateless.rst Decode Parameters
Flags table.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r fdb0b728d7 h264: strip ffmpeg-vaapi POC sentinel before passing to V4L2
ROOT CAUSE for "kernel decodes successfully but produces zeroed
CAPTURE buffers despite no V4L2_BUF_FLAG_ERROR":

ffmpeg's H264POCContext initialises prev_poc_msb to (1 << 16) =
0x10000 as a sentinel for "uninitialised":
  libavcodec/h264dec.c:301 — global init in ff_h264_decode_init
  libavcodec/h264dec.c:444 — IDR reset in idr() helper
ff_h264_init_poc (libavcodec/h264_parse.c:296-305) then computes
pc->poc_msb = pc->prev_poc_msb whenever the slice header's
pic_order_cnt_lsb hasn't wrapped relative to prev_poc_lsb (which
is the typical case for any normal H.264 content with sane POC
ordering). The sentinel leaks into field_poc[] (line 305) and from
there into VAPictureH264.TopFieldOrderCnt / BottomFieldOrderCnt at
libavcodec/vaapi_h264.c::fill_vaapi_pic (lines 73-78).

Empirical confirmation via meitner 2026-05-02 ground-truth test:
ran an LD_PRELOAD shim around vaCreateBuffer against an i965
VAAPI backend decoding a 60-frame H.264 Main clip. Every frame
showed TopFieldOrderCnt = (POC | 0x10000):

  Frame 1 IDR:  raw bytes "00 00 01 00" at offset 12 → TopFOC=65536
  Frame 2:      raw bytes "06 00 01 00"             → TopFOC=65542
  Frame 3:      "02 00 01 00"                       → TopFOC=65538

i965 decodes successfully regardless. V4L2 stateless drivers cannot
tolerate the high word: hantro_h264.c::prepare_table feeds the value
directly to tbl->poc[i*2]/[32], and the kernel reflist builder uses
it directly for cur_pic_order_count comparison. The kernel's
resource sizing math sees POC=65536 for an IDR and breaks.

This patch adds h264_strip_ffmpeg_poc_sentinel() as a small static
inline in src/h264.c. It detects bit 16 set rather than blindly
subtracting, so a future ffmpeg version that fixes the leak
degrades gracefully. The helper is applied at all four POC sites:

  1. h264_fill_dpb:           dpb->top_field_order_cnt
  2. h264_fill_dpb:           dpb->bottom_field_order_cnt
  3. h264_va_picture_to_v4l2: decode->top_field_order_cnt
  4. h264_va_picture_to_v4l2: decode->bottom_field_order_cnt

VA_PICTURE_H264_INVALID DPB slots are short-circuited to POC=0
because libavcodec/vaapi_h264.c::init_vaapi_pic (line 43) already
sets POC=0 there; the sentinel never applies. Zeroing them
explicitly removes a class of "stale POC value in invalidated
slot" foot-guns.
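
A sketch of the detect-then-strip idea, not the helper's actual body. The
ex_* names are hypothetical and EX_PICTURE_INVALID is a local placeholder
for VA_PICTURE_H264_INVALID. It assumes only non-negative values with
bit 16 set carry the sentinel; negative POCs are passed through untouched.

```c
#include <assert.h>
#include <stdint.h>

#define EX_POC_SENTINEL    (1 << 16)
#define EX_PICTURE_INVALID 0x1u /* placeholder for VA_PICTURE_H264_INVALID */

static_assert(EX_POC_SENTINEL == 65536, "sentinel is bit 16");

/* Remove the sentinel only when it is actually present. */
int32_t ex_strip_poc_sentinel(int32_t poc)
{
    if (poc >= 0 && (poc & EX_POC_SENTINEL))
        return poc - EX_POC_SENTINEL;
    return poc; /* already clean, e.g. a fixed ffmpeg */
}

/* POC for one DPB slot: invalid slots are forced to 0. */
int32_t ex_dpb_poc(uint32_t va_flags, int32_t va_poc)
{
    if (va_flags & EX_PICTURE_INVALID)
        return 0;
    return ex_strip_poc_sentinel(va_poc);
}
```

Applied to the captures above, 65536 becomes 0 and 65542 becomes 6, while
a value that never had the sentinel is returned unchanged.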

Non-trivial follow-ups identified during the meitner experiment
that are NOT addressed by this patch:
  - PFRAME / BFRAME flags in v4l2_ctrl_h264_decode_params.flags are
    not yet derived from VASliceParameterBufferH264.slice_type. The
    bbb corpus is I-only at the start so this hasn't been a
    blocker, but a clip with B-frames will need the slice-type
    routing patch.
  - h264_fill_dpb's pic_num assignment (entry->pic.picture_id) is
    almost certainly wrong per the kernel doc — pic_num must equal
    the H.264 spec's PicNum / FrameNumWrap, not the VAAPI surface
    id. Out of scope here; will surface as a defect on streams
    that have multi-frame DPB lookups.

Cross-references:
  audit_0008_decode_params_2026-05-01.md — kernel-side consumer
    audit confirming POC fields are userspace-required.
  api_contract_findings_2026-05-01.md — VAAPI doc gap on POC
    semantics; H.264 spec section 8.2.1 is the binding contract.
  meitner_2026-05-02_vaapi_idr_groundtruth/ — full empirical
    capture of the sentinel pattern across 60 frames.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r affb4bd12a DEBUG: dump VAPictureH264 raw bytes + decoded fields
Diagnostic-only. Investigating the observed anomaly:

  - V4L2 strace shows decode_params.top_field_order_cnt = 65536
    on the first IDR frame submitted by mpv+ffmpeg+libva-v4l2-request
  - GStreamer's reference path writes 0 (spec-correct: PicOrderCnt=0
    for IDR with pic_order_cnt_type=0 / pic_order_cnt_lsb=0)
  - Reading FFmpeg source (libavcodec/vaapi_h264.c::fill_vaapi_pic):
      va_pic->TopFieldOrderCnt = 0;
      if (pic->field_poc[0] != INT_MAX)
          va_pic->TopFieldOrderCnt = pic->field_poc[0];
    For IDR: ff_h264_init_poc sets field_poc[0] = poc_msb + poc_lsb
    = 0 + 0 = 0. So FFmpeg should write 0.

If FFmpeg writes 0 but fourier reads 65536, the mismatch is in the
libva ABI between ffmpeg's writer and our reader. Most likely
suspect: VA_PADDING_LOW size in VAPictureH264 differs between the
libva headers ffmpeg+libva were built against and the headers
fourier was built against, shifting struct field offsets.

This patch dumps:
  1. sizeof(VAPictureH264) at our reader's view
  2. First 32 raw bytes of VAPicture->CurrPic
  3. Field-decoded values via the .picture_id, .frame_idx, .flags,
     .TopFieldOrderCnt, .BottomFieldOrderCnt accessors

If the raw bytes show 00 00 01 00 at offset 12 (= 65536 LE), the
field offset is correct and FFmpeg actually wrote 65536 — meaning
either FFmpeg has a bug, or our test scenario triggers a non-spec
code path. If the raw bytes show 00 00 00 00 at offset 12 but
TopFieldOrderCnt accessor returns 65536, the struct ABI is
mismatched and we need to reconcile libva versions.

If sizeof(VAPictureH264) prints as something other than 36 (= 4*5
+ 4*VA_PADDING_LOW assuming VA_PADDING_LOW=4), the struct layout
on this build differs from the documented libva-2.x layout.
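
The layout expectations above can be written down as compile-time checks.
This sketch uses a local mirror struct (ex_va_picture_h264, hypothetical)
rather than the real VAPictureH264 from va/va.h, and assumes
VA_PADDING_LOW=4 as stated above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define EX_VA_PADDING_LOW 4 /* assumed, per the message above */

/* Local mirror of the documented libva-2.x VAPictureH264 layout. */
struct ex_va_picture_h264 {
    uint32_t picture_id;
    uint32_t frame_idx;
    uint32_t flags;
    int32_t  top_field_order_cnt;
    int32_t  bottom_field_order_cnt;
    uint32_t va_reserved[EX_VA_PADDING_LOW];
};

static_assert(sizeof(struct ex_va_picture_h264) == 36,
              "4*5 + 4*VA_PADDING_LOW");
static_assert(offsetof(struct ex_va_picture_h264, top_field_order_cnt) == 12,
              "TopFieldOrderCnt sits at offset 12");

/* Decode four raw bytes at `offset` as a little-endian 32-bit value. */
int32_t ex_read_le32(const uint8_t *raw, size_t offset)
{
    assert(raw != NULL);
    uint32_t v = (uint32_t)raw[offset]
               | (uint32_t)raw[offset + 1] << 8
               | (uint32_t)raw[offset + 2] << 16
               | (uint32_t)raw[offset + 3] << 24;
    return (int32_t)v;
}
```

Comparing ex_read_le32(raw, 12) against the accessor value distinguishes
the two cases: equal means the writer really stored that number, unequal
means the struct views disagree.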

Removed once the source of the 65536 is identified.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r c672f19f44 h264: hardcode SPS level_idc = 51 (intentional over-allocation)
fourier's h264_va_picture_to_v4l2 never assigns sps->level_idc; the
field stays at zero-init. level_idc=0 is invalid per the H.264 spec
(lowest legal value is 10, Level 1.0). Hantro and other stateless
H.264 decoders use level_idc to pre-allocate decoder resources (DPB
size, motion-vector buffers); when fed an invalid level the hantro
kernel driver silently skips the decode-hardware dispatch — the V4L2
request completes with no error, DQBUF returns the CAPTURE buffer
reporting bytesused=3655712 and no V4L2_BUF_FLAG_ERROR, but the
buffer is never written.

VAAPI's decode-side VAPictureParameterBufferH264 structurally does
NOT include level_idc — `grep level_idc va/va.h` returns only hits
inside VAEncSequenceParameterBufferH264 (the encode path). The
H.264 SPS NAL is also not included in VASliceDataBuffer because
ffmpeg-vaapi parses it client-side and forwards only slice data
(verified empirically via patch 0010's hex-dump of the OUTPUT
buffer: it contains "00 00 01 65 ..." — i.e. ANNEX_B start code +
IDR slice NAL byte, no SPS NAL). A SPS-NAL byte extractor is
therefore not viable from the bitstream libva-v4l2-request
receives.

Workaround: hardcode level_idc = 51 (= Level 5.1, which covers 1080p
and 4K@30 in mainstream consumer profiles). This INTENTIONALLY
OVER-ALLOCATES decoder resources for smaller streams but is
sufficient for any stream up to 4K@30. It is corpus-correct, not
contract-correct: a 4K@60 stream (Level 5.2 or higher) would
under-allocate.

This patch is a known-incomplete intermediate, not a final fix.
The proper upstreamable answer is a level-from-resolution
derivation per H.264 Annex A.3 (max MB rate / max frame size
thresholds). That requires mapping consumer-side framerate which
VAAPI does not expose, so the lookup table is non-trivial. The
TODO is captured inline.
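
For orientation, a frame-size-only lower bound could look like the sketch
below. It is not the planned derivation: it ignores the MB-rate and DPB
limits entirely, the function and table names are hypothetical, and the
MaxFS values are quoted from memory of Annex A Table A-1 and should be
checked against the spec before use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ex_level_limit {
    uint8_t  level_idc;
    uint32_t max_fs; /* max frame size in macroblocks */
};

/* Lowest level_idc for each distinct MaxFS step (values to be verified). */
static const struct ex_level_limit ex_levels[] = {
    {10, 99},    {11, 396},   {21, 792},   {22, 1620},
    {31, 3600},  {32, 5120},  {40, 8192},  {42, 8704},
    {50, 22080}, {51, 36864}, {60, 139264},
};

/* Smallest level whose MaxFS fits the picture; 0 if none does. */
uint8_t ex_min_level_idc_for_size(uint32_t width, uint32_t height)
{
    assert(width > 0 && height > 0);

    /* Round each dimension up to whole 16x16 macroblocks. */
    uint32_t mbs = ((width + 15) / 16) * ((height + 15) / 16);

    for (size_t i = 0; i < sizeof(ex_levels) / sizeof(ex_levels[0]); i++)
        if (mbs <= ex_levels[i].max_fs)
            return ex_levels[i].level_idc;
    return 0;
}
```

Because frame rate is not considered, the result is only a floor: a
1080p stream at a high frame rate needs a higher level than this returns.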

This patch's goal is unblocking decode-hardware engagement on the
ohm_gl_fix corpus while the full level-derivation work proceeds.

Cross-reference: kernel doc
ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SPS lists
level_idc as a required field with no "kernel-derives" annotation —
i.e., userspace-required.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r 841f616e74 h264: gate SCALING_MATRIX submission on VAIQMatrixBuffer presence
VAAPI signals "explicit scaling lists are present in the bitstream"
implicitly: the consumer (ffmpeg-vaapi, mpv, etc.) sends a
VAIQMatrixBufferH264 alongside RenderPicture iff
sps_scaling_matrix_present_flag || pps_scaling_matrix_present_flag.
When the bitstream uses default (flat) scaling, no IQMatrixBuffer
arrives and the in-tree h264.matrix struct stays zero-initialised.

fourier's existing codec_store_buffer for MPEG2 and HEVC tracks this
via a per-surface iqmatrix_set boolean (surface.h::mpeg2.iqmatrix_set,
h265.iqmatrix_set) — the H.264 path was missing the equivalent flag,
so set_controls always submitted the scaling matrix, including the
zero-initialised case.

Symptom on hantro-vpu RK3568: when TRANSFORM_8X8_MODE is enabled in
PPS, the kernel multiplies all 8x8 DCT coefficients by the zeroed
scaling_list_8x8, producing a zeroed CAPTURE buffer despite a
successful decode round-trip (no V4L2_BUF_FLAG_ERROR,
bytesused=3655712 reported).

An earlier draft of this patch unconditionally omitted SCALING_MATRIX
in FRAME_BASED mode. That is corpus-correct (bbb has no explicit
scaling lists) but the wrong predicate: the kernel-side gating is by
"matrix-supplied vs. not," not by decode mode. Streams that signal
explicit scaling lists must submit SCALING_MATRIX in either mode.

Contract verification (audit_0008_decode_params_2026-05-01.md +
hantro_h264.c::assemble_scaling_list): the kernel uses the supplied
matrix when SCALING_MATRIX is in the control batch and falls back
to spec-defined defaults when absent. Mode-independent.

This patch:
  - surface.h: adds bool matrix_set to params.h264, mirroring
    mpeg2.iqmatrix_set / h265.iqmatrix_set.
  - picture.c codec_store_buffer (H.264 VAIQMatrixBufferType case):
    sets matrix_set = true when the buffer arrives.
  - picture.c RequestBeginPicture: resets matrix_set = false at the
    start of each Begin/Render/End cycle.
  - h264.c h264_set_controls: builds the controls[] array
    incrementally; SPS/PPS/DECODE_PARAMS always; SCALING_MATRIX iff
    matrix_set; SLICE_PARAMS only in SLICE_BASED; PRED_WEIGHTS only
    when both SLICE_BASED and V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.
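
The incremental build in the last item can be sketched as follows. The
enum stands in for the real V4L2_CID_STATELESS_H264_* IDs and the function
name is hypothetical; only the inclusion rules are taken from this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for the real control IDs. */
enum ex_ctrl {
    EX_SPS, EX_PPS, EX_DECODE_PARAMS,
    EX_SCALING_MATRIX, EX_SLICE_PARAMS, EX_PRED_WEIGHTS,
};
#define EX_MAX_CTRLS 6

/* Fill `out` with the controls to submit; returns how many were added. */
size_t ex_build_controls(enum ex_ctrl out[EX_MAX_CTRLS], bool matrix_set,
                         bool slice_based, bool pred_weights_required)
{
    size_t n = 0;

    assert(out != NULL);

    out[n++] = EX_SPS;            /* always */
    out[n++] = EX_PPS;            /* always */
    out[n++] = EX_DECODE_PARAMS;  /* always */
    if (matrix_set)
        out[n++] = EX_SCALING_MATRIX;
    if (slice_based) {
        out[n++] = EX_SLICE_PARAMS;
        if (pred_weights_required)
            out[n++] = EX_PRED_WEIGHTS;
    }
    return n;
}
```

Appending with a running count avoids the fixed-index layout, so an
omitted control never leaves a hole in the batch.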

The pre-existing FRAME_BASED-omits-SLICE_PARAMS rule is preserved —
kernel doc ext-ctrls-codec-stateless.rst:752: "When this mode is
selected, the V4L2_CID_STATELESS_H264_SLICE_PARAMS control shall
not be set."

Cross-reference: kernel UAPI section
ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SCALING_MATRIX
(matrix supplied iff explicit scaling lists in bitstream) and
hantro_h264.c::assemble_scaling_list (consumes supplied matrix or
falls back to defaults).

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r 1690dfaa79 DEBUG: sentinel-pattern test for CAPTURE buffer write
Diagnostic-only. Writes 0xab×32 into the CAPTURE buffer's first 32
bytes immediately before VIDIOC_QBUF. The 0010 hex-dump after
DQBUF reveals which case we're in:

  - All 0xab → kernel never wrote to this region (wrong buffer
    chosen, alias, or no decode actually happened despite
    bytesused=3655712 reported). Other parts of the buffer may
    still be filled (incomplete decode), so check more bytes.
  - All zeros → kernel did write 0x00s (overwriting our
    sentinel), and the apparent "no picture" output is the
    kernel-side decode actually producing zeros (e.g. parser
    rejected the bitstream).
  - Mix of zeros and real luma values → kernel wrote real
    decoded pixels; either the CPU read sees stale-cached zeros
    somewhere, or the sentinel area was a header that the decoder
    zeroed while the rest is real. Need to check more bytes.
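
The cases above reduce to a small classifier over the probed bytes. This
is a sketch with hypothetical names, not code from the patch; anything
that is neither all-sentinel nor all-zero is reported as mixed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum ex_probe {
    EX_UNTOUCHED, /* every byte is still the 0xab sentinel */
    EX_ALL_ZERO,  /* every byte was overwritten with 0x00 */
    EX_MIXED,     /* anything else: look at more of the buffer */
};

enum ex_probe ex_classify_sentinel(const uint8_t *buf, size_t len)
{
    size_t sentinel = 0, zero = 0;

    assert(buf != NULL && len > 0);

    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0xab)
            sentinel++;
        else if (buf[i] == 0x00)
            zero++;
    }
    if (sentinel == len)
        return EX_UNTOUCHED;
    if (zero == len)
        return EX_ALL_ZERO;
    return EX_MIXED;
}
```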

Removed once Step 1 decode is verified.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r 3609fbb425 DEBUG: hex-dump OUTPUT and CAPTURE buffer contents per frame
Diagnostic-only patch (NOT for upstream). Hex-dumps:
  - First 32 bytes of OUTPUT buffer at QBUF time in
    picture.c::RequestEndPicture (i.e. what we feed the kernel)
  - First 32 bytes of CAPTURE Y-plane after DQBUF in
    surface.c::RequestSyncSurface (i.e. what kernel returned)

Lets us see whether:
  - OUTPUT bitstream begins with valid ANNEX_B start code + NAL
    header byte (e.g. `00 00 01 65` for IDR slice)
  - CAPTURE Y-plane after decode contains varied luma data
    (working) vs. all-zeros / repeating pattern (kernel didn't
    write anything).

Removed once Step 1 decode is verified working. Output goes via
existing request_log() to stderr.
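
A sketch of the formatting step only. ex_hex_line is a hypothetical name,
and the patch itself routes output through request_log(), which is not
reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format bytes as "00 00 01 65" into `out`.
 * Returns the number of characters written, excluding the NUL.
 * Stops early if `out` cannot hold another byte. */
size_t ex_hex_line(const uint8_t *buf, size_t len, char *out, size_t out_size)
{
    size_t pos = 0;

    assert(buf != NULL && out != NULL && out_size > 0);

    out[0] = '\0';
    /* Each byte needs at most 3 characters plus the terminating NUL. */
    for (size_t i = 0; i < len && pos + 3 < out_size; i++)
        pos += (size_t)snprintf(out + pos, out_size - pos,
                                i ? " %02x" : "%02x", buf[i]);
    return pos;
}
```

For the IDR example above, the four bytes 00 00 01 65 format to an
11-character line.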

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r 597e896594 surface: don't VIDIOC_S_FMT the CAPTURE queue
The hantro stateless decoder derives the CAPTURE format from the
SPS attached to the per-request OUTPUT controls. Calling
VIDIOC_S_FMT on the CAPTURE queue at vaCreateSurfaces2 time can
leave the driver's vb2 state in an inconsistent configuration
where the queue accepts buffers and DQBUF returns successfully but
the kernel never actually writes decoded pixels into them.

Cross-reference: GStreamer's gst-plugins-bad/sys/v4l2codecs/
gstv4l2decoder.c only calls VIDIOC_G_FMT on the CAPTURE side
(via gst_v4l2_decoder_negotiate_src_format and friends). The
same code path produces correctly-decoded NV12 frames on the
same RK3568 hantro-vpu where libva-v4l2-request-with-S_FMT
emits flat-green zeroed CAPTURE buffers.

The v4l2_get_format() call immediately after this block already
gives us the bytesperline / sizes the driver chose; nothing else
in this file consumed the explicit S_FMT side-effects.

Empirical hypothesis test for the lingering "kernel decodes
without errors but emits zeroed CAPTURE" bug. If post-patch
output shows actual picture content, this confirms the
diagnosis: explicit CAPTURE format mutation breaks hantro's
internal state. If output remains flat-green, the bug is
elsewhere and we resume hex-dump-grade instrumentation.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r 86a8545146 h264: fill DECODE_PARAMS frame_num + field flags from VAAPI
Fourier's h264_va_picture_to_v4l2 only populated four fields of
struct v4l2_ctrl_h264_decode_params (dpb via h264_fill_dpb,
nal_ref_idc, top_field_order_cnt, bottom_field_order_cnt) plus the
IDR_PIC flag. Many other required-by-spec fields were left at
zero-init (frame_num, idr_pic_id, pic_order_cnt_lsb,
delta_pic_order_cnt_*, dec_ref_pic_marking_bit_size,
pic_order_cnt_bit_size, slice_group_change_cycle, FIELD_PIC and
BOTTOM_FIELD flags).

For an IDR (first frame) on hantro-vpu RK3568, the kernel parses
the bitstream from the OUTPUT buffer and uses these fields to drive
its bitstream-element offset tracking. Empirically the kernel
returned a successfully-decoded but ZEROED CAPTURE buffer — flat
dark-green frames in mpv output, no errors logged.

This patch fills every field VAAPI exposes:

  - frame_num: from VAPicture->frame_num.
  - FIELD_PIC flag: from VAPicture->pic_fields.bits.field_pic_flag.
  - BOTTOM_FIELD flag: from
    VAPicture->CurrPic.flags & VA_PICTURE_H264_BOTTOM_FIELD.

Also corrects the IDR_PIC flag to use |= instead of = so the new
field flags don't clobber it.
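
Why |= matters, as a sketch: each flag is OR-ed into the accumulated
value, so the order of assignments cannot drop a bit. The EX_* values are
local placeholders for the kernel and libva constants, and the function
name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholders for the V4L2 decode-params flag bits. */
#define EX_FLAG_IDR_PIC      0x01u
#define EX_FLAG_FIELD_PIC    0x02u
#define EX_FLAG_BOTTOM_FIELD 0x04u
/* Placeholder for VA_PICTURE_H264_BOTTOM_FIELD. */
#define EX_VA_PICTURE_BOTTOM_FIELD 0x4u

static_assert((EX_FLAG_IDR_PIC & EX_FLAG_FIELD_PIC) == 0 &&
              (EX_FLAG_FIELD_PIC & EX_FLAG_BOTTOM_FIELD) == 0,
              "flag bits must not overlap");

uint32_t ex_decode_param_flags(bool idr, bool field_pic_flag,
                               uint32_t curr_pic_flags)
{
    uint32_t flags = 0;

    if (field_pic_flag)
        flags |= EX_FLAG_FIELD_PIC;
    if (curr_pic_flags & EX_VA_PICTURE_BOTTOM_FIELD)
        flags |= EX_FLAG_BOTTOM_FIELD;
    if (idr)
        flags |= EX_FLAG_IDR_PIC; /* "=" here would erase the bits above */
    return flags;
}
```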

Fields NOT derivable from VAAPI's pre-parsed structures —
idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_*,
dec_ref_pic_marking_bit_size, pic_order_cnt_bit_size,
slice_group_change_cycle — require a slice_header() bit-level
parse. libva-v4l2-request does not currently do this. They remain
at zero-init.

Empirical question this patch answers: does hantro tolerate the
bit_size fields being zero for IDR frames, or does it strictly
require them? If post-patch CAPTURE is still zeroed, a slice-header
parser is required. If CAPTURE shows real picture data, hantro
fills in the bit-positions itself when no hint is supplied.

Cross-reference: gstv4l2codech264dec.c::
gst_v4l2_codec_h264_dec_fill_decoder_params (commit 9e3e775,
lines 632-678).

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r 4078368104 context: enable ANNEX_B start-code emission to match device
Patch 0002 sets V4L2_CID_STATELESS_H264_START_CODE to ANNEX_B on the
device, telling the kernel that OUTPUT-buffer payloads will contain
0x00 0x00 0x01 NAL start codes. picture.c::codec_store_buffer has
the prepend logic guarded by `if (context->h264_start_code)`, but
that boolean is set ONLY inside h264_get_controls() — a function
that exists but is never called.

Result: device expects ANNEX_B, libva-v4l2-request feeds raw NAL
payloads with no start codes, kernel cannot find slice boundaries,
hantro emits a zeroed CAPTURE buffer. mpv reports successful decode
because the V4L2 round-trip succeeds (no EINVAL); the visual output
is a flat dark-green frame (NV12 zero through BT.709).

Identified via:
  - Patch 0006 cleared the EINVAL cluster-rejection (128 → 0 on
    bbb_1080p30) but visual output remained flat green.
  - GStreamer reference (gstv4l2codech264dec.c:1363-1377) confirms
    start codes are required when ANNEX_B is selected.
  - Source-archaeology of fourier's picture.c:67-74 showed the gate
    on context->h264_start_code.

Fix: in context.c::RequestCreateContext, immediately after patch
0002's device-control block, set context_object->h264_start_code =
true to match the ANNEX_B mode we just programmed. Hardcoded for
now (matches 0002's hardcoded set); replaced with a runtime probe
in the planned probe-then-set commit.
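
The guarded prepend can be sketched like this. ex_store_slice and its
parameters are hypothetical and only illustrate the idea of writing
0x00 0x00 0x01 ahead of each NAL when the start-code flag is set; the
real logic lives in picture.c::codec_store_buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Append one NAL to the OUTPUT buffer at offset `used`.
 * Returns the number of bytes appended, or 0 if it does not fit. */
size_t ex_store_slice(uint8_t *dst, size_t dst_size, size_t used,
                      const uint8_t *nal, size_t nal_size, bool start_code)
{
    static const uint8_t annex_b[3] = { 0x00, 0x00, 0x01 };
    size_t need = nal_size + (start_code ? sizeof(annex_b) : 0);

    assert(dst != NULL && nal != NULL && used <= dst_size);

    if (need > dst_size - used)
        return 0;

    if (start_code) {
        memcpy(dst + used, annex_b, sizeof(annex_b));
        used += sizeof(annex_b);
    }
    memcpy(dst + used, nal, nal_size);
    return need;
}
```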

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r 4246d5d537 h264: omit per-slice controls in FRAME_BASED mode
Identified by cross-reference against GStreamer's
gst-plugins-bad/sys/v4l2codecs/gstv4l2codech264dec.c (upstream commit
9e3e775). At lines 1263-1304, GStreamer gates SLICE_PARAMS and
PRED_WEIGHTS submission on is_slice_based(self):

    if (is_slice_based (self)) {
        control[num_controls].id = V4L2_CID_STATELESS_H264_SLICE_PARAMS;
        ...
        control[num_controls].id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS;
        ...
    }

In V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED, the kernel parses the
bitstream itself from the OUTPUT-queue payload; per-slice controls in
the request trigger cluster-validation EINVAL at error_idx=count
(observed on RK3568 hantro-vpu, kernel 6.19.10).

This patch:
  - Reorders controls[] so FRAME_BASED-required entries come first
    (SPS, PPS, SCALING_MATRIX, DECODE_PARAMS at indices 0..3) and the
    SLICE_BASED-only entries come last (SLICE_PARAMS, PRED_WEIGHTS at
    indices 4..5).
  - Defaults num_controls=4 (FRAME_BASED), expanding to 5 for
    SLICE_BASED and 6 when V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.
  - Hardcodes slice_based=false for now since patch 0002 sets the
    device to FRAME_BASED unconditionally. A TODO marks the spot for
    the planned probe-then-set commit, which will populate
    context->decode_mode at CreateContext via VIDIOC_QUERYCTRL/
    G_EXT_CTRLS and replace the hardcoded false with a runtime check.

Diagnosis chain:
  - patch 0005 removed one EINVAL per frame on PRED_WEIGHTS
    submission, but cluster-level rejection persisted at error_idx=5
    (count), meaning the kernel walked all 5 controls cleanly but
    rejected the request as a whole.
  - dmesg silent → rejection in V4L2 core (v4l2-ctrls-request.c /
    v4l2-h264.c), not in hantro driver where it could log.
  - GStreamer reference confirmed FRAME_BASED contract: only 4
    sequence-and-frame-level controls go in the per-request batch.

After this patch the kernel should accept the per-request controls
and actually decode the bitstream into the CAPTURE buffer.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r e382c63e20 h264: submit PRED_WEIGHTS only when WEIGHTED_PRED applies
Per kernel UAPI (include/uapi/linux/v4l2-controls.h),
V4L2_CID_STATELESS_H264_PRED_WEIGHTS is a conditional control:

    V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(pps, slice) :=
        ((pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED) &&
         (slice_type == P || slice_type == SP)) ||
        (pps->weighted_bipred_idc == 1 && slice_type == B)

Submitting PRED_WEIGHTS on a frame where the macro evaluates false
triggers VIDIOC_S_EXT_CTRLS to return EINVAL at error_idx=5 (the
6th, last control in the per-request batch) on hantro-vpu and any
other driver that strictly enforces the spec.
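
The predicate above, written as a plain function for illustration. The
EX_* constants are local placeholders; real code uses the macro and the
V4L2_H264_* definitions from the kernel UAPI header.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_PPS_FLAG_WEIGHTED_PRED 0x4u /* placeholder bit */

enum ex_slice_type {
    EX_SLICE_P = 0, EX_SLICE_B = 1, EX_SLICE_I = 2,
    EX_SLICE_SP = 3, EX_SLICE_SI = 4,
};

/* True when PRED_WEIGHTS belongs in the per-request batch. */
bool ex_pred_weights_required(uint32_t pps_flags,
                              uint8_t weighted_bipred_idc,
                              enum ex_slice_type slice_type)
{
    /* weighted_bipred_idc is limited to 0..2 by the PPS syntax. */
    assert(weighted_bipred_idc <= 2);

    bool explicit_p = (pps_flags & EX_PPS_FLAG_WEIGHTED_PRED) &&
                      (slice_type == EX_SLICE_P || slice_type == EX_SLICE_SP);
    bool explicit_b = weighted_bipred_idc == 1 && slice_type == EX_SLICE_B;

    return explicit_p || explicit_b;
}
```

For a Main-profile clip without weighted prediction both terms are false
on every frame, which matches the per-frame rejection described above.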

Smoke trace from RK3568 hantro on bbb_1080p30 (Main profile, no
weighted prediction): every per-frame batch fails identically, 13
EINVALs over a 10-frame run. Without this fix, ffmpeg's vaapi-copy
falls back to software decode for every frame.

Fix: narrow num_controls to 5 (excluding PRED_WEIGHTS at index 5)
when the macro returns false; keep at 6 when it returns true.

Defect found and fixed via Phase 6 Step 1 ohm smoke testing. Not
part of Sonnet's six-commit upstreamable plan; slotted in as patch
0005 ahead of the planned probe-then-set / FRAME_BASED commits
because it unblocks per-frame submission on every backing driver,
not just hantro.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r 565f5c0de4 context: introduce request_pool, decouple OUTPUT buffers from surfaces
Commit 3 of the upstreamable plan (upstreamable_design.md §1, §5).
Replaces the prior per-surface OUTPUT-buffer ownership model with a
small driver-wide pool sized by codec pipeline depth (4 H.264 frames
in flight), allocated unconditionally regardless of caller's
num_render_targets.

Prior art (kernel UAPI dev-stateless-decoder.rst, ffmpeg
v4l2_request.c, Chromium V4L2StatelessVideoDecoder, GStreamer
v4l2slh264dec) all decouple OUTPUT and CAPTURE pool sizing. fourier's
"output_count == surfaces_count" model was a category error: OUTPUT
buffers are request-time bitstream slots, CAPTURE buffers are
picture-time DPB slots; their lifecycles and sizing are independent.

Changes:
  * NEW src/request_pool.{c,h} (~200 LoC):
      - request_pool_init(): CREATE_BUFS + per-slot QUERYBUF + mmap.
      - request_pool_destroy(): munmap all, idempotent.
      - request_pool_acquire(): round-robin claim; returns V4L2 buffer
        index of an unused slot or -1.
      - request_pool_release(): mark slot free for reuse.
      - request_pool_slot(): accessor for ptr/size given a buffer index.

  * src/request.h: add struct request_pool output_pool to request_data.

  * src/context.c::RequestCreateContext: replace the per-surface
    OUTPUT loop with a single request_pool_init() call (count=4,
    independent of surfaces_count). Drop the now-unused locals
    (length, offset, source_data, output_buffers_count, index,
    index_base, i, surface_object). DELETES patch 0002's
    "output_buffers_count = ... ? ... : 4" hack inline — the pool's
    own count parameter supersedes it.

  * src/picture.c::RequestBeginPicture: borrow a pool slot at frame
    start and write its mmap pointer/size/index into the surface's
    transient source_* fields. The fields stay, since the existing
    codec_store_buffer memcpy calls still write through them as a
    borrow handle, but they no longer represent surface-permanent
    ownership. Also reset slices_size/slices_count here (previously
    implicit on first Render).

  * src/surface.c::RequestSyncSurface: after VIDIOC_DQBUF returns
    the OUTPUT buffer, release the pool slot and clear the surface's
    borrow handle. Fixes the segv on second-frame submission.

  * src/surface.c::RequestDestroySurfaces: remove the munmap of
    source_data — pool owns the mmap.

  * src/request.c::RequestTerminate: call request_pool_destroy()
    before close(video_fd) so munmaps still target a valid fd.

  * src/meson.build: add request_pool.c and request_pool.h to the
    sources/headers lists.

This commit removes 0002's OUTPUT-pool hack inline (the
"floor to 4" line is gone). The DECODE_MODE/START_CODE block in 0002
remains until commit 4 lands.

Build-verified clean on aarch64.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r 58a0e8baf9 v4l2: add QUERYCTRL/QUERYMENU capability-probe helpers
Pure utility additions, no behaviour change. Three helpers in
src/v4l2.{c,h}:

  - v4l2_query_ext_ctrl(): wraps VIDIOC_QUERY_EXT_CTRL by CID.
    Returns 0 if the control exists, -1 if not. Caller passes NULL
    qec to test existence only.

  - v4l2_query_menu(): wraps VIDIOC_QUERYMENU at a given index.
    Returns 0 if a menu item exists at that index, -1 otherwise.

  - v4l2_ctrl_menu_has_value(): convenience layered on the above.
    For a menu/intmenu-type control, walks all menu items between
    minimum and maximum and returns true iff `value` is a valid
    entry. Used by callers that ask "does this driver accept menu
    value X for this CID?" without caring about iteration details.

These unblock commit 3 (request_pool — needs ext-ctrl probing for
codec-ops dispatch) and commit 4 (probe-then-set DECODE_MODE/
START_CODE — replaces 0002's unconditional set with a real probe)
of the upstreamable design's six-commit series.

Forward-declarations in v4l2.h keep the header lean: existing
prototypes already use opaque struct v4l2_ext_control * pointers
without including <linux/videodev2.h>; we follow the same
convention for struct v4l2_query_ext_ctrl and struct v4l2_querymenu.

No call sites added in this commit. Compile-only verification:
the .so links cleanly with three new exports.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r 50e0c2b996 context: pre-STREAMON device controls and minimum OUTPUT pool
Two related fixes that surfaced during the first hantro-vpu (RK3568)
smoke test of the multiplanar build:

1. **OUTPUT queue must be non-empty at STREAMON.** Hantro's
   vb2_start_streaming rejects an empty queue with EINVAL. Some
   VA-API callers (notably ffmpeg's vaapi-copy path) call
   vaCreateContext with num_render_targets=0 and allocate render
   targets lazily. The OUTPUT (bitstream-input) pool must NOT be
   sized off surfaces_count alone — it is a request-time resource,
   not per-surface. Quick fix: floor the pool to 4 buffers when
   the caller passes 0. (A proper decoupling of OUTPUT pool from
   surface lifecycle is documented in upstreamable_design.md.)

2. **Device-wide stateless H.264 controls before STREAMON.** The
   V4L2 stateless framework requires V4L2_CID_STATELESS_H264_
   DECODE_MODE and START_CODE be set on the device fd
   (request_fd=-1) before stream start. Per-request controls
   (SPS/PPS/SLICE_PARAMS/etc.) attached to a request_fd come
   later via h264_set_controls(). hantro-vpu accepts only
   DECODE_MODE_FRAME_BASED; START_CODE_ANNEX_B matches what the
   existing slice-assembly path emits.

   This is set unconditionally for now (errors silently ignored)
   to keep cedrus and other backends compatible — they may
   default to SLICE_BASED and not expose DECODE_MODE at all.
   Probe-then-set via VIDIOC_QUERYCTRL is the upstream-correct
   approach (see upstreamable_design.md §3).

After this patch, vainfo still enumerates as before, but the first
mpv vaapi-copy attempt advances past STREAMON and into actual
decode submission.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r 10114f6781 mplane: enable V4L2 multiplanar capture for NV12 on hantro-vpu
Fourier's local patch already wired multiplanar plumbing through
src/v4l2.c (helpers v4l2_type_video_{output,capture}() at lines 59-69,
struct v4l2_plane planes[] threading in QUERYBUF/QBUF/DQBUF, per-plane
EXPBUF loop at line 411) and through src/context.c, src/buffer.c,
src/picture.c via the v4l2_type_video_{output,capture}(video_format
->v4l2_mplane) helper calls.

The remaining gap: the NV12 entry in src/video.c was hardcoded to
v4l2_mplane=false, and the bootstrap path in src/surface.c was
hardcoded to singleplanar literals before video_format is populated.

This patch flips the NV12 entry to v4l2_mplane=true and updates the
two singleplanar literals in src/surface.c to their MPLANE variants:

  - src/video.c:42  v4l2_mplane=false -> true (NV12 only;
    Sunxi-tiled NV12 left at false for cedrus compatibility)
  - src/surface.c:84  output_type = v4l2_type_video_output(true)
  - src/surface.c:109 v4l2_find_format(..., CAPTURE_MPLANE, NV12)

Empirically, hantro-vpu (RK3568 mainline) advertises NV12 only under
V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE; querying the singleplanar type
returns no match (verified via VIDIOC_ENUM_FMT in Phase 3 GStreamer
strace baseline).

Trade-off accepted: legacy sunxi-cedrus singleplanar NV12 paths are
left unchanged via the SUNXI_TILED_NV12 entry (still mplane=false,
__arm__ only). Pure-NV12 cedrus on aarch64 would regress, but the
known userbase here is RK3566/RK3568 hantro.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-04 09:45:05 +00:00
test0r c45fea96e3 fourier-local: stateless control modernization + HEVC strip
Compound patch carrying the fork's pre-Step-1 substrate, originally
authored by Jernej Škrabec / fourier on top of bootlin's a3c2476:

- src/h264.c + src/picture.c: V4L2_CID_MPEG_VIDEO_H264_* renamed to
  V4L2_CID_STATELESS_H264_*, struct shapes tracked to mainline
  (V4L2_CID_STATELESS_H264_DECODE_MODE/_START_CODE added to the
  passthrough shim).
- include/hevc-ctrls.h: redirect shim to <linux/v4l2-controls.h>
  (kernel-side HEVC controls now live in the canonical UAPI header).
- src/meson.build: src/h265.c / src/h265.h commented out — HEVC
  build path is excluded from this fork (RK3568 hantro G1/G2 has
  no HEVC, and the kernel-side HEVC controls have a separate
  rework in flight upstream).
- src/tiled_yuv.S: aarch64 stub for tiled_to_planar (assembly
  source was sunxi-cedrus armv7-only; aarch64 needs a stub to keep
  the build linking).
- include/h264-ctrls.h: removed (dead post-fourier — no source
  includes it; the passthrough shim's CID aliases live in the
  kernel header now).

Functionally equivalent to the prior fork master commits:
  c1f5108 V4L2_PIX_FMT_H264_SLICE rename
  4ccbfe9 Strip HEVC build path
  da9f2a5 include/h264-ctrls.h passthrough + CID aliases
  fc4bb10 src/h264.c track upstream UAPI shape
  13e9b64 src/h264.c drop num_slices field
  4d14ffb src/tiled_yuv.S aarch64 stub
  1b02c9b src/h264.c include utils.h

Folded into one commit during 2026-05-04 Step 1 reconciliation
(see ../phase0_evidence/2026-05-04/findings.md). Per-patch history
of the early fork commits preserved on the pre-step1 branch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 09:40:14 +00:00
Chen-Yu Tsai a3c2476de1 Respect libdir for install path
Distributions may install libraries under architecture-specific
sub-directories, to support multiple architectures on the same
system. In addition, the user may not wish to install the library
with the default prefix.

Use the libdir variable when setting the install path. This
allows both specifying different sub-directories, and a different
prefix.
2019-05-17 13:59:26 +08:00
Chen-Yu Tsai 3264c0495c Add option to specify path to up-to-date kernel headers
The system normally has kernel headers shipped with the distribution.
These typically lag behind actual kernel releases. Thus they would not
have the latest API additions, such as the V4L2 request API this driver
uses.

However, it is also bad practice to just install new kernel headers into
the system wide default location, as there may be some differences
between it and what the C library was built against.

Add an option to specify a path to a set of up-to-date kernel headers.
This would allow the user to build this project in a safe but working
environment.

Signed-off-by: Chen-Yu Tsai <wens@csie.org>
2019-05-17 13:59:23 +08:00
Paul Kocialkowski 7f359be748 Include missing needed codec headers for build
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-05-16 16:32:03 +02:00
Paul Kocialkowski d48ace9757 Update H.264 V4L2 pixel format, which was renamed
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-05-16 16:24:23 +02:00
Nicolas Dufresne fc9252a4d0 image: Fix pitches and offsets in the save image
We were first copying the image structure and then setting the pitches
and offsets, so this information was lost. This fixes the vaDeriveImage
and vaGetImage implementations.

Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
2019-05-16 16:14:55 +02:00
Nicolas Dufresne 7233c5a2ae image: Partially implement vaGetImage
This enables raw playback within GStreamer. This is useful for testing
even if slower than DMABuf. This is a partial implementation since we
don't implement partial copy of the surface.

Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
2019-05-16 16:14:55 +02:00
Nicolas Dufresne b8ac9bb9ea surface: Only set format if unset
vaCreateSurfaces2 may be called multiple times. Setting the format
again would lead to EBUSY being returned, as you cannot change the
format while buffers are allocated.

Signed-off-by: Nicolas Dufresne <nicolas.dufresne@collabora.com>
2019-05-16 16:14:55 +02:00
Paul Kocialkowski b5cee9f480 include: Update headers to latest series
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-05-16 16:14:55 +02:00
Paul Kocialkowski 0f4a76e9a6 Lower libva requirement to API version 1.1.0 (lib version 2.1.0)
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-03-07 14:11:18 +01:00
Paul Kocialkowski 0c611c6b7a Implement proper timestamping for references
Reference frames are now identified using their timestamp:
set the timestamp when queuing the output buffer and use it to identify
the frame later on.
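
As an illustration only: the V4L2 stateless API refers to a reference
frame by the timestamp of its output buffer, converted to nanoseconds.
The sketch below shows that conversion; sketch_timeval_to_ns() is a
hypothetical name mirroring the v4l2_timeval_to_ns() helper from
<linux/videodev2.h>, and how this driver picks the timestamp value
itself is not shown here.

```c
#include <stdint.h>
#include <sys/time.h>

/*
 * Convert the timeval queued with an output buffer into the
 * nanosecond value later used to point at that frame as a reference.
 */
static uint64_t sketch_timeval_to_ns(const struct timeval *tv)
{
	return (uint64_t)tv->tv_sec * 1000000000ULL +
	       (uint64_t)tv->tv_usec * 1000ULL;
}
```

The same timeval has to be set on the output buffer at QBUF time and
remembered per surface, so that the value can be recomputed when the
surface is later listed as a reference.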

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-03-07 11:41:56 +01:00
Paul Kocialkowski 3176adf69c Include local copies of DRM and V4L2 codec definitions
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-03-07 11:37:12 +01:00
Paul Kocialkowski ca5198b429 Add support for the meson build system
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-03-07 11:37:12 +01:00
Paul Kocialkowski e29b04ccc7 autotools: Rewrite configuration in a minimalistic fashion
Drop the per-codec options while at it, since we'll soon include a copy
of the associated headers.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-03-07 11:37:12 +01:00
Paul Kocialkowski 518d7a0c59 Update and harmonize heading author lists
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2019-03-07 11:37:12 +01:00
Ralf Zerres 4855281abe video.c: update struct video_format field
- upstream changed formats.drm_modifier from
  DRM_FORMAT_MOD_ALLWINNER_MB32_TILED -> DRM_FORMAT_MOD_ALLWINNER_TILED

Signed-off-by: Ralf Zerres <ralf.zerres@networkx.de>
2019-03-06 14:35:34 +01:00
Maxime Ripard 85a0f72f72 Merge pull request #9 from jernejsk/build_fix
Fix building with h264 enabled
2018-11-16 16:48:59 +00:00
Jernej Skrabec a816436baf config: fix building with h264 enabled
Signed-off-by: Jernej Skrabec <jernej.skrabec@siol.net>
2018-10-31 18:10:47 +01:00
Maxime Ripard e62c2d1c8e Merge pull request #8 from ezequielgarcia/fixes
Three different, completely unrelated fixes/improvements
2018-10-16 14:53:12 +00:00
Ezequiel Garcia 59cd32bc42 Fix single planar QBUF ioctl
Commit 7ff2543e64 ("Add support for the single-planar V4L2 API")
missed the VIDIOC_QBUF bytesused parameter. The kernel will
warn loudly if bytesused is not properly defined for an OUTPUT buffer.

Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
2018-10-12 16:42:02 -03:00
Ezequiel Garcia 2c27ec3794 Add settable attributes to pixelformats
Apparently, pixelformats are expected to be settable although
the reason is not exactly clear to me.

However, intel vaapi driver sets all its pixelformats as
settable, and gstreamer-vaapi expects that as well.

Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
2018-10-12 16:13:05 -03:00
Ezequiel Garcia b2944629fa Add support for dynamic detection of supported codecs
H.264 and H.265 are still not supported upstream,
so it makes sense to autodetect each codec and only
enable those that are supported.

Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com>
2018-10-12 16:11:06 -03:00
Thomas Petazzoni 3e442a19b6 CREDITS: add Albin Söderqvist
Albin did not have an account on Kickstarter initially, so he was
registered as GuestXYZ. Upon his request, add his real name.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
2018-09-08 08:51:51 +02:00
Paul Kocialkowski 7ff2543e64 Add support for the single-planar V4L2 API
Signed-off-by: Paul Kocialkowski <contact@paulk.fr>
2018-09-07 16:43:13 +02:00
Paul Kocialkowski 25a8ac4d7e Register video format directly instead of tiled indicator
Signed-off-by: Paul Kocialkowski <contact@paulk.fr>
2018-09-07 12:58:44 +02:00
Maxime Ripard 8857fc7019 Merge pull request #6 from tpetazzoni/credits
Add CREDITS file
2018-09-05 07:43:03 +00:00
Maxime Ripard 26454b70a6 Merge pull request #4 from tpetazzoni/minor-doc-updates
Minor doc updates
2018-09-05 07:42:35 +00:00
Thomas Petazzoni d53335bbc8 Add CREDITS file
As promised by Bootlin's Kickstarter campaign, all contributors above
16 EUR would get their name in the CREDITS file. This commit
implements the promised CREDITS file.

Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
2018-09-04 12:05:10 +02:00
Thomas Petazzoni 128936588f Update AUTHORS file with Maxime and Paul
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
2018-09-02 21:54:52 +02:00
Thomas Petazzoni c71e16a141 Update README.md to mention H265 support
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
2018-09-02 21:54:18 +02:00
Paul Kocialkowski 13eaae060e Add support for H265 decoding, including predictive frames
Some features are missing, such as scaling lists (quantization) and
10-bit output.

Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-08-31 10:13:52 +02:00
Paul Kocialkowski 1c009e64d5 media: Adapt for the latest Request API
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-08-28 10:39:41 +02:00
Paul Kocialkowski 5fd5c9823b mpeg2: Update to match latest definitions
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
2018-08-09 14:13:19 +02:00