Surfaced during iter7 Track F research: the campaign target hardware
is Rockchip RK3566 silicon (PineTab2). The hantro driver attaches
via the rockchip,rk3568-vpu DT compatible because the RK3566 silicon
is close enough to RK3568 to share that variant. The proper RK3566
mainline driver target (rkvdec2 / vdpu346) has no kernel support yet
— Christian Hewitt's patch series LKML 2025/12/26/206 is unmerged.
Updates the two src/ comments that called the hardware "RK3568":
- context.c: hantro-vpu device-init S_EXT_CTRLS comment now reads
"via rockchip,rk3568-vpu DT compatible (covers RK3568 and RK3566
— PineTab2 silicon — since they're close enough)"
- h264.c: DPB pic_num discussion ends "...never surfaced on PineTab2
(RK3566 via hantro/rk3568-vpu)"
Not a correctness change. Compiles + decodes identically. The
update matters for upstream submission accuracy (bootlin/Rockchip
maintainers will care which silicon the campaign tested on).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 5 sonnet review caught four DEBUG sites the first sweep pass
missed (the vaapi-copy + --vo=null stress test didn't exercise the
ExportSurfaceHandle path, so per-frame ExportSurfaceHandle dumps went
undetected).
Removed:
- surface.c::CreateSurfaces2 format-dump (per-CreateSurfaces2 noise,
labeled DEBUG INSTRUMENTATION (surface-export diagnosis 2026-05-04))
- surface.c::ExportSurfaceHandle full-descriptor dump (per-frame for
consumers using DMA-BUF, also labeled DEBUG)
- surface.c::QuerySurfaceStatus -> status= line (per-call noise)
- h264.c V4L2 readback block (~67 lines): static bool readback_warned
+ the per-frame VIDIOC_G_EXT_CTRLS attempt + the readback success
log + the "V4L2 readback unavailable" fallback announcement. With
the iter4 fixes landed, the readback EACCES is no longer load-bearing
to investigate — drop the block + the per-process global state.
Removing the readback block also resolves Phase 5 finding C2: the
static bool readback_warned was new mutable process-global state
introduced post-Track-E, inconsistent with that track's intent.
Net: -107 lines from src/{h264,surface}.c.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
h264.c:
- Remove the slice_header parse success log (the parse data is now
forwarded into decode_params directly without per-frame echo). Keep
the FAILED-rc log since it indicates a real decode-blocking error.
- Remove the iter1 patch-0014 VAPictureH264 byte-dump + field-read
log block. The TopFieldOrderCnt=65536 anomaly it diagnosed was
resolved by the POC sentinel strip (h264_strip_ffmpeg_poc_sentinel)
that stays in the codebase.
surface.c:
- Remove the per-call "RequestSyncSurface RETURN status=" trace.
- Remove the per-call "RequestSyncSurface early-exit" trace.
v4l2.c:
- Suppress the per-frame "Unable to get control(s): Permission denied"
log when errno == EACCES (the expected case on this hantro rig
per iter1 patch-0014's findings). The one-time announcement in
h264.c stays. Real EACCES-on-non-request-fd or other errno values
still log normally.
Per-frame v4l2-request log noise drops from ~30+ lines/frame to
init-time + once-per-resolution-change.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes the pre-S_EXT_CTRLS DPB census + per-entry dump that helped
diagnose iter4's frame-11 EINVAL bug. With the fix landed (385dee1),
the diagnostic is no longer needed in the release driver.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
In h264_va_slice_to_v4l2, the B-slice L1 reflist loop wrote .fields
into ref_pic_list0[i] instead of ref_pic_list1[i]. This corrupted L0
reflist fields when L1 was being built and left ref_pic_list1[i].fields
zero (which the kernel may interpret as "no valid field reference").
Pre-existing pre-iter4 bug (caught by iter4 Phase 5 sonnet review,
finding C2). Latent on hantro bbb_1080p30 in FRAME_BASED mode because
hantro walks reference_ts directly and ignores SLICE_PARAMS.fields,
but the bug is wrong-by-construction and would surface on any driver
that reads SLICE_PARAMS reflist fields, on interlaced content, or in
SLICE_BASED decode mode.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Inline log of DECODE_PARAMS.flags, sps.max_num_ref_frames, dpb counts
(valid/active/long-term/internally-used), and per-entry frame_num /
pic_num / fields / reference_ts immediately before each S_EXT_CTRLS
submission.
Used in iter4 Phase 4 to identify (a) the dpb->fields=0 bug and
(b) the stale-entry growth bug. Stays in for iter4 Phase 4 continuation
(at least one more bug still produces EINVAL after frame ~20). Remove
at iter5 DEBUG sweep alongside iter1 ENTER/CAPTURE-dump and iter3 Y2.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two empirically-validated correctness fixes from comparing our h264.c
fill_dpb against FFmpeg's libavcodec/v4l2_request_h264.c::fill_dpb on
the iter3 test rig (Firefox-fourier on ohm RK3568, bbb_1080p30 H.264):
1. Set dpb[].fields = V4L2_H264_FRAME_REF for every valid entry. The
kernel's v4l2_h264_init_reflist_builder iterates dpb[] and skips
entries with fields == 0 — they count as "no field reference"
regardless of VALID/ACTIVE flags. Without this, P-slices that
need to walk the reference list (first one in BBB is at frame 11)
hit "no valid refs" and S_EXT_CTRLS rejects the request with
EINVAL (error_idx == count = kernel's "application bug" sentinel).
2. Skip entries with valid=true but used=false. dpb_update() clears
`used` for all entries then re-marks only those in the current
ReferenceFrames[] list. Stale entries (frames the consumer has
retired from its DPB) were being included, growing the V4L2 dpb[]
monotonically until H264_DPB_SIZE while SPS.max_num_ref_frames
may be 4. FFmpeg iterates h->short_ref[] / h->long_ref[] only —
the currently-referenced set.
Empirical: from "10 frames decode, frame-11 P-slice EINVAL" to
"~20 frames decode, then a different EINVAL on later frames."
Confirms both fixes are correctness improvements but Track A is not
yet fully resolved — at least one more bug remains. iter4 stays open.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Hantro+v4l2 on Linux 6.19.x returns EACCES from VIDIOC_G_EXT_CTRLS
on a request_fd in QUEUEING state for compound H.264 controls. Not
actionable from userspace — kernel-side permission check whose
semantics aren't yet investigated. Decode is unaffected (SET-side
write succeeds; we just can't verify via readback from this rig).
Logging the failure once per process instead of per-frame keeps
the diagnostic message visible without flooding stderr.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tier 3E + 3F observability hardening from the libva-multiplanar
campaign Phase 6 follow-up. Improves diagnostic reliability for
future probes; no functional decode path change.
Tier 3E (cache-fix): patch-0010's CAPTURE Y-plane dump now calls
msync(p, 32, MS_SYNC|MS_INVALIDATE) before the read so userspace
sees what the kernel actually DMA-wrote, not a stale CPU cache
line. Without this, the previous version of the dump consistently
showed the patch-0011 sentinel (0xab) even when the kernel had
overwritten it — caused half a day of mistaken "kernel never wrote
the buffer" diagnosis. Also computes a luma min/max/variance
signal so a uniform fill (variance=0) is visually obvious vs
real pixel data (variance > 0).
Tier 3F (VIDIOC_G_EXT_CTRLS readback): after v4l2_set_controls
in h264_set_controls, reads back DECODE_PARAMS + PPS via
v4l2_get_controls (added by patch 0003) on the request fd.
Logs key fields:
dec.idr_pic_id, poc_lsb, refmark_bits, poc_bits — confirms
slice-header parser outputs landed in the V4L2 control batch.
dec.top_foc / bot_foc — confirms
patch-0015 POC sentinel strip actually applied (should NOT
show 65536 unless the strip mis-fired).
dec.frame_num — cross-checks
against VAAPI's pre-parsed frame_num (also already logged by
patch 0014).
pps.flags + (SMP=...) — confirms
SCALING_MATRIX_PRESENT bit set this build.
pps.refidx_l0/l1 — confirms
Tier 1B num_ref_idx writes landed.
Discriminates "we wrote X but kernel saw Y" from "we wrote zero
all along" — the failure mode the original patch series didn't
catch when slice-header bit_size fields were left zero.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The load-bearing fix from diff_against_ffmpeg.md (campaign repo).
Adds src/h264_slice_header.{c,h} — a minimal H.264 slice_header()
bit-parser per ITU-T H.264 (08/2024) §7.3.3. Parses just enough of
the slice header to populate the V4L2 DECODE_PARAMS fields VAAPI
doesn't carry and that hantro G1 hardware reads directly out of
DECODE_PARAMS into MMIO registers:
dec_param->dec_ref_pic_marking_bit_size -> G1_REG_DEC_CTRL5_REFPIC_MK_LEN
dec_param->idr_pic_id -> G1_REG_DEC_CTRL5_IDR_PIC_ID
dec_param->pic_order_cnt_bit_size -> G1_REG_DEC_CTRL6_POC_LENGTH
dec_param->pic_order_cnt_lsb -> hantro reflist builder (poc_type=0)
dec_param->delta_pic_order_cnt_bottom -> same
dec_param->delta_pic_order_cnt0/1 -> hantro reflist builder (poc_type=1)
Without these set correctly, hantro's hardware bitstream parser
walks past zero bits in the slice header, lands on garbage, decodes
zero pixels — the all-zero CAPTURE output observed across both mpv
and Firefox during 2026-05-04 Phase 0 (see libva-multiplanar campaign
phase0_evidence/2026-05-04-kernel-trace/findings.md).
Implementation:
- Minimal RBSP bit reader (br_read_u/_ue/_se), MSB-first, fault-flag
on overrun.
- Emulation-prevention unescape (strips 0x03 after 0x00 0x00) on
the first 64 bytes of the slice — slice headers fit comfortably.
- Walks slice_header() up to and including dec_ref_pic_marking(),
measuring bit positions for the *_bit_size fields.
- Skips ref_pic_list_modification() and pred_weight_table() —
needed only to advance the bit position to dec_ref_pic_marking().
- Returns a struct with the V4L2 fields plus diagnostics
(first_mb_in_slice, slice_type, pps_id, frame_num).
Wired into h264_va_picture_to_v4l2 (src/h264.c) right after the
nal_ref_idc/nal_unit_type extraction. SPS/PPS context is built from
VAPicture's seq_fields and pic_fields; num_ref_idx_l0/l1_active
defaults come from VASlice (best available substitute for the
parsed PPS values). On parse success, populates decode_params with
the recovered values + emits a request_log with the decoded fields
for cross-validation against VAAPI's pre-parsed values.
src/meson.build: adds h264_slice_header.{c,h} to sources.
Cross-references:
- FFmpeg libavcodec/h264_slice.c (Kwiboo v4l2-request-n8.1) — populates
H264SliceContext::ref_pic_marking_bit_size / pic_order_cnt_bit_size
by the same bit-precise parse, then v4l2_request_h264.c forwards
to V4L2.
- Linux drivers/media/platform/verisilicon/hantro_g1_h264_dec.c
set_params() — the register-write code that reads these fields.
MVC nal_unit_type 20/21 unhandled (this fork strips MVC alongside
HEVC). Multi-slice non-IDR streams parse the first slice's header
only; for FRAME_BASED mode that's fine — kernel sees the whole
bitstream and parses subsequent slices itself.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three Tier-2C/1B fixes from diff_against_ffmpeg.md (campaign repo):
1. Submit V4L2_CID_STATELESS_H264_SCALING_MATRIX every frame, with
the H.264 spec flat default (every entry = 16) when the consumer
didn't send a VAIQMatrixBufferH264. New helper:
h264_default_flat_scaling_matrix(). Mirrors FFmpeg's
v4l2_request_h264.c which always provides a scaling matrix.
Replaces patch 0012's VAIQMatrixBuffer-conditional submission —
that was corpus-correct (bbb has no explicit scaling lists) but
inconsistent with what hantro G1 expects.
2. Set pps->flags |= V4L2_H264_PPS_FLAG_SCALING_MATRIX_PRESENT
unconditionally. Hantro G1's set_params reads this flag to gate
G1_REG_DEC_CTRL2_TYPE1_QUANT_E.
3. Populate pps->num_ref_idx_l0/l1_default_active_minus1 from
VASliceParameterBufferH264.num_ref_idx_l*_active_minus1. Hantro
G1 writes both into G1_REG_DEC_CTRL6_REFIDX0_ACTIVE / REFIDX1_ACTIVE.
VAAPI doesn't expose the parsed-PPS default fields; the per-slice
override is the closest available source (matches PPS default
except on streams with explicit per-slice override).
Why now: 2026-05-04 Phase 0 kernel-side audit (kernel source
drivers/media/platform/verisilicon/hantro_g1_h264_dec.c) showed
hantro G1 writes these fields directly into hardware MMIO
registers. Prior assumption that they're "informational" or
that "VAAPI handles defaults" was wrong — the hardware uses them
to bit-walk the slice header and to size reference lists. See
~/src/libva-multiplanar/diff_against_ffmpeg.md.
This is the easy half of the fix. The load-bearing half — adding
a slice-header bit-parser to populate dec_param->dec_ref_pic_
marking_bit_size, idr_pic_id, pic_order_cnt_bit_size — comes in
the next commit.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces patch 0013's hardcoded level_idc = 51 with a small lookup
that picks the smallest level whose MaxFS contains the encoded
frame size. Patch 0013's TODO is resolved by this change.
VAAPI does not expose level_idc on the decode side
(VAPictureParameterBufferH264 has no such field; only
VAEncSequenceParameterBufferH264 carries it). The H.264 SPS NAL is
parsed client-side by ffmpeg-vaapi and only slice data forwards in
VASliceDataBuffer, so a SPS-NAL byte parser is not viable from the
bitstream the libva-v4l2-request layer receives. We therefore
derive level_idc from picture dimensions, which VAAPI does provide
in VAPictureParameterBufferH264.picture_{width,height}_in_mbs_minus1.
Annex A.3 (Table A-1) MaxFS thresholds:
Level 1.0: 99 MBs ( 176×144 = 11×9 = 99 )
Level 1.1: 396 ( 352×288 = 22×18 = 396 )
Level 2.0: 396
Level 2.1: 792 ( 352×576 / 720×288 )
Level 2.2: 1620 ( 720×480 ≈ 1350; 720×576 = 1620 )
Level 3.0: 1620
Level 3.1: 3600 (1280×720 ≈ 3600 )
Level 3.2: 5120
Level 4.0: 8192 (1920×1088 = 8160 fits )
Level 4.1: 8192
Level 4.2: 8704
Level 5.0: 22080
Level 5.1: 36864 (3840×2176 = 32640 fits; 4K@8K-edge )
Level 5.2: 36864
Level 6.0: 139264 (8K )
V4L2 control encoding: level_idc = (level major × 10) + (level minor).
Level 4.1 → 41, Level 5.1 → 51, Level 6.0 → 60.
Picks for typical content:
1080p (1920×1088 = 8160 MBs) → Level 4.1 (level_idc = 41)
4K (3840×2176 = 32640 MBs) → Level 5.1 (level_idc = 51)
8K (7680×4352 = 130560 MBs) → Level 6.0 (level_idc = 60)
The previous hardcode of 51 was over-allocating for 1080p; with
this patch hantro can pre-allocate based on the actual frame size.
For our ohm corpus (1080p) this drops the requested DPB / MV
buffer sizing from level-5.1 generosity to level-4.1 right-sized.
Without VAAPI exposing framerate we cannot also check MaxMBPS /
MaxBR / MaxCPB. The frame-size-based pick is acceptable in
practice: temporally-dense streams almost always also push
spatially-large frames, so MaxFS captures the dominant
resource-sizing signal.
Cross-reference: H.264 spec Annex A, Table A-1 ("Level limits").
ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SPS lists
level_idc as required-userspace-input, no kernel-derives annotation.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
fourier's h264_fill_dpb assigned `dpb->pic_num = entry->pic.picture_id`
— the VAAPI surface id. Per ext-ctrls-codec-stateless.rst:651-655,
v4l2_h264_dpb_entry.pic_num must equal the H.264 spec PicNum
(equation 8-28) for short-term references or LongTermPicNum
(equation 8-29) for long-term references. The surface id has no
relationship to either.
Kernel-side consumers of pic_num:
- mediatek/decoder/vdec/vdec_h264_req_common.c (line 210):
dst_entry->pic_num = src_entry->pic_num. Used for
field-coded short-term reference disambiguation.
- hantro / rkvdec / cedrus / qcom-iris-stateless: do NOT read
pic_num. They resolve refs via reference_ts (timestamp)
and POC. This is why fourier's wrong value never surfaced
on RK3568 hantro.
This patch makes pic_num spec-correct so the libva-v4l2-request
fork is upstreamable across drivers without depending on each
target's tolerance for non-spec fills.
Computation, derived from H.264 spec section 8.2.4.1:
For frames (not field-coded), PicNum = FrameNumWrap.
FrameNumWrap = (frame_num > cur_frame_num)
? frame_num - max_frame_num
: frame_num
max_frame_num = 1 << (sps.log2_max_frame_num_minus4 + 4)
cur_frame_num = current picture's frame_num
For long-term references:
LongTermPicNum = long_term_frame_idx (when not field-coded).
VAAPI convention (libavcodec/vaapi_h264.c::fill_vaapi_pic line 64):
VAPictureH264.frame_idx = long_ref ? pic_id : frame_num
So long-term refs already carry long_term_frame_idx in frame_idx;
we copy it through.
Field-coded streams require an extra factor-of-2 plus a parity
adjustment per spec equations 8-28/8-29; this patch does not handle
field-coded content. ohm corpus is all frame-coded so this is a
follow-up for later.
Implementation: add VAPicture parameter to h264_fill_dpb so the
function has access to seq_fields.log2_max_frame_num_minus4 and
the current picture's frame_num. Update the single caller in
h264_va_picture_to_v4l2.
Cross-reference: kernel doc ext-ctrls-codec-stateless.rst dpb_entry
table (line 651-655) and mediatek/vdec/vdec_h264_req_common.c
line 210.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
v4l2_ctrl_h264_decode_params.flags has PFRAME and BFRAME bits per
ext-ctrls-codec-stateless.rst. fourier never set them; libva-v4l2-
request relied on each backing driver tolerating frame-class
ambiguity.
Kernel survey (linux 6.19.x):
- tegra-vde/h264.c (lines 783-799) consumes both flags to select
the inter-frame decode kernel. Without them the I-frame kernel
runs on P/B content.
- visl-trace-h264.h uses them for decode tracing.
- hantro / rkvdec / cedrus / mediatek / qcom-iris-stateless do
not consume the flags.
Hantro on ohm decoded bbb cleanly without these flags set (see
phase6/step1/ohm_smoke_2026-05-02T060255Z_post_0015/), so this is
an upstreamability fix for cross-driver portability rather than a
correctness fix for hantro.
VAAPI's VASliceParameterBufferH264.slice_type maps directly to the
H.264 slice_header() slice_type field. Per spec 7.4.3:
0=P 1=B 2=I 3=SP 4=SI; 5..9 = "all slices in the picture have
this slice_type." `slice_type % 5` recovers the underlying type
in either encoding form.
In FRAME_BASED mode we only see surface->params.h264.slice from the
most-recent VASliceParameterBuffer — that's fine: a single coded
picture has a uniform slice_type for the purposes of the PFRAME /
BFRAME flag (multi-slice frames may mix slice types in some streams,
but the flag's semantic is "this is an inter-coded frame," which
holds if any slice is P or B; using the last-seen slice's type is
a reasonable approximation).
Cross-reference: ext-ctrls-codec-stateless.rst Decode Parameters
Flags table.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
ROOT CAUSE for "kernel decodes successfully but produces zeroed
CAPTURE buffers despite no V4L2_BUF_FLAG_ERROR":
ffmpeg's H264POCContext initialises prev_poc_msb to (1 << 16) =
0x10000 as a sentinel for "uninitialised":
libavcodec/h264dec.c:301 — global init in ff_h264_decode_init
libavcodec/h264dec.c:444 — IDR reset in idr() helper
ff_h264_init_poc (libavcodec/h264_parse.c:296-305) then computes
pc->poc_msb = pc->prev_poc_msb whenever the slice header's
pic_order_cnt_lsb hasn't wrapped relative to prev_poc_lsb (which
is the typical case for any normal H.264 content with sane POC
ordering). The sentinel leaks into field_poc[] (line 305) and from
there into VAPictureH264.TopFieldOrderCnt / BottomFieldOrderCnt at
libavcodec/vaapi_h264.c::fill_vaapi_pic (lines 73-78).
Empirical confirmation via meitner 2026-05-02 ground-truth test:
ran an LD_PRELOAD shim around vaCreateBuffer against an i965
VAAPI backend decoding a 60-frame H.264 Main clip. Every frame
showed TopFieldOrderCnt = (POC | 0x10000):
Frame 1 IDR: raw bytes "00 00 01 00" at offset 12 → TopFOC=65536
Frame 2: raw bytes "06 00 01 00" → TopFOC=65542
Frame 3: "02 00 01 00" → TopFOC=65538
i965 successfully decodes regardless. V4L2 stateless drivers
(hantro_h264.c::prepare_table feeds the value direct to
tbl->poc[i*2]/[32], the kernel reflist builder uses it directly
for cur_pic_order_count comparison) cannot tolerate the high word —
the kernel's resource sizing math sees POC=65536 for an IDR and
breaks.
This patch adds h264_strip_ffmpeg_poc_sentinel() as a small static
inline in src/h264.c. It detects bit 16 set rather than blindly
subtracting, so a future ffmpeg version that fixes the leak
degrades gracefully. The helper is applied at all four POC sites:
1. h264_fill_dpb: dpb->top_field_order_cnt
2. h264_fill_dpb: dpb->bottom_field_order_cnt
3. h264_va_picture_to_v4l2: decode->top_field_order_cnt
4. h264_va_picture_to_v4l2: decode->bottom_field_order_cnt
VA_PICTURE_H264_INVALID DPB slots are short-circuited to POC=0
because libavcodec/vaapi_h264.c::init_vaapi_pic (line 43) already
sets POC=0 there; the sentinel never applies. Zeroing them
explicitly removes a class of "stale POC value in invalidated
slot" foot-guns.
Non-trivial follow-ups identified during the meitner experiment
that are NOT addressed by this patch:
- PFRAME / BFRAME flags in v4l2_ctrl_h264_decode_params.flags are
not yet derived from VASliceParameterBufferH264.slice_type. The
bbb corpus is I-only at the start so this hasn't been a
blocker, but a clip with B-frames will need the slice-type
routing patch.
- h264_fill_dpb's pic_num assignment (entry->pic.picture_id) is
almost certainly wrong per the kernel doc — pic_num must equal
the H.264 spec's PicNum / FrameNumWrap, not the VAAPI surface
id. Out of scope here; will surface as a defect on streams
that have multi-frame DPB lookups.
Cross-references:
audit_0008_decode_params_2026-05-01.md — kernel-side consumer
audit confirming POC fields are userspace-required.
api_contract_findings_2026-05-01.md — VAAPI doc gap on POC
semantics; H.264 spec section 8.2.1 is the binding contract.
meitner_2026-05-02_vaapi_idr_groundtruth/ — full empirical
capture of the sentinel pattern across 60 frames.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
Diagnostic-only. Investigating the observed anomaly:
- V4L2 strace shows decode_params.top_field_order_cnt = 65536
on the first IDR frame submitted by mpv+ffmpeg+libva-v4l2-request
- GStreamer's reference path writes 0 (spec-correct: PicOrderCnt=0
for IDR with pic_order_cnt_type=0 / pic_order_cnt_lsb=0)
- Reading FFmpeg source (libavcodec/vaapi_h264.c::fill_vaapi_pic):
va_pic->TopFieldOrderCnt = 0;
if (pic->field_poc[0] != INT_MAX)
va_pic->TopFieldOrderCnt = pic->field_poc[0];
For IDR: ff_h264_init_poc sets field_poc[0] = poc_msb + poc_lsb
= 0 + 0 = 0. So FFmpeg should write 0.
If FFmpeg writes 0 but fourier reads 65536, the mismatch is in the
libva ABI between ffmpeg's writer and our reader. Most likely
suspect: VA_PADDING_LOW size in VAPictureH264 differs between the
libva headers ffmpeg+libva were built against and the headers
fourier was built against, shifting struct field offsets.
This patch dumps:
1. sizeof(VAPictureH264) at our reader's view
2. First 32 raw bytes of VAPicture->CurrPic
3. Field-decoded values via the .picture_id, .frame_idx, .flags,
.TopFieldOrderCnt, .BottomFieldOrderCnt accessors
If the raw bytes show 00 00 01 00 at offset 12 (= 65536 LE), the
field offset is correct and FFmpeg actually wrote 65536 — meaning
either FFmpeg has a bug, or our test scenario triggers a non-spec
code path. If the raw bytes show 00 00 00 00 at offset 12 but
TopFieldOrderCnt accessor returns 65536, the struct ABI is
mismatched and we need to reconcile libva versions.
If sizeof(VAPictureH264) prints as something other than 36 (= 4*5
+ 4*VA_PADDING_LOW assuming VA_PADDING_LOW=4), the struct layout
on this build differs from the documented libva-2.x layout.
Removed once the source of the 65536 is identified.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
fourier's h264_va_picture_to_v4l2 never assigns sps->level_idc; the
field stays at zero-init. level_idc=0 is invalid per the H.264 spec
(lowest legal value is 10, Level 1.0). Hantro and other stateless
H.264 decoders use level_idc to pre-allocate decoder resources (DPB
size, motion-vector buffers); when fed an invalid level the hantro
kernel driver silently skips the decode-hardware dispatch — the V4L2
request completes with no error, DQBUF returns the CAPTURE buffer
reporting bytesused=3655712 and no V4L2_BUF_FLAG_ERROR, but the
buffer is never written.
VAAPI's decode-side VAPictureParameterBufferH264 structurally does
NOT include level_idc — `grep level_idc va/va.h` returns only hits
inside VAEncSequenceParameterBufferH264 (the encode path). The
H.264 SPS NAL is also not included in VASliceDataBuffer because
ffmpeg-vaapi parses it client-side and forwards only slice data
(verified empirically via patch 0010's hex-dump of the OUTPUT
buffer: it contains "00 00 01 65 ..." — i.e. ANNEX_B start code +
IDR slice NAL byte, no SPS NAL). A SPS-NAL byte extractor is
therefore not viable from the bitstream libva-v4l2-request
receives.
Workaround: hardcode level_idc = 51 (= Level 5.1, max for 1080p
and 4K@30 mainstream consumer profiles). This INTENTIONALLY
OVER-ALLOCATES decoder resources but is sufficient for any stream
up to 4K@30. It is corpus-correct, not contract-correct: a 4K@60
stream (Level 6.x) would under-allocate.
This patch is a known-incomplete intermediate, not a final fix.
The proper upstreamable answer is a level-from-resolution
derivation per H.264 Annex A.3 (max MB rate / max frame size
thresholds). That requires mapping consumer-side framerate which
VAAPI does not expose, so the lookup table is non-trivial. The
TODO is captured inline.
This patch's goal is unblocking decode-hardware engagement on the
ohm_gl_fix corpus while the full level-derivation work proceeds.
Cross-reference: kernel doc
ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SPS lists
level_idc as a required field with no "kernel-derives" annotation —
i.e., userspace-required.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
VAAPI signals "explicit scaling lists are present in the bitstream"
implicitly: the consumer (ffmpeg-vaapi, mpv, etc.) sends a
VAIQMatrixBufferH264 alongside RenderPicture iff
sps_scaling_matrix_present_flag || pps_scaling_matrix_present_flag.
When the bitstream uses default (flat) scaling, no IQMatrixBuffer
arrives and the in-tree h264.matrix struct stays zero-initialised.
fourier's existing codec_store_buffer for MPEG2 and HEVC tracks this
via a per-surface iqmatrix_set boolean (surface.h::mpeg2.iqmatrix_set,
h265.iqmatrix_set) — the H.264 path was missing the equivalent flag,
so set_controls always submitted the scaling matrix, including the
zero-initialised case.
Symptom on hantro-vpu RK3568: when TRANSFORM_8X8_MODE is enabled in
PPS, the kernel multiplies all 8x8 DCT coefficients by the zeroed
scaling_list_8x8, producing a zeroed CAPTURE buffer despite a
successful decode round-trip (no V4L2_BUF_FLAG_ERROR,
bytesused=3655712 reported).
Earlier draft of this patch unconditionally omitted SCALING_MATRIX in
FRAME_BASED. That's corpus-correct (bbb has no explicit scaling
lists) but the wrong predicate: the kernel-side gating is by
"matrix-supplied vs. not," not by decode mode. Streams that signal
explicit scaling lists must submit SCALING_MATRIX in either mode.
Contract verification (audit_0008_decode_params_2026-05-01.md +
hantro_h264.c::assemble_scaling_list): the kernel uses the supplied
matrix when SCALING_MATRIX is in the control batch and falls back
to spec-defined defaults when absent. Mode-independent.
This patch:
- surface.h: adds bool matrix_set to params.h264, mirroring
mpeg2.iqmatrix_set / h265.iqmatrix_set.
- picture.c codec_store_buffer (H.264 VAIQMatrixBufferType case):
sets matrix_set = true when the buffer arrives.
- picture.c RequestBeginPicture: resets matrix_set = false at the
start of each Begin/Render/End cycle.
- h264.c h264_set_controls: builds the controls[] array
incrementally; SPS/PPS/DECODE_PARAMS always; SCALING_MATRIX iff
matrix_set; SLICE_PARAMS only in SLICE_BASED; PRED_WEIGHTS only
when both SLICE_BASED and V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.
The pre-existing FRAME_BASED-omits-SLICE_PARAMS rule is preserved —
kernel doc ext-ctrls-codec-stateless.rst:752: "When this mode is
selected, the V4L2_CID_STATELESS_H264_SLICE_PARAMS control shall
not be set."
Cross-reference: kernel UAPI section
ext-ctrls-codec-stateless.rst V4L2_CID_STATELESS_H264_SCALING_MATRIX
(matrix supplied iff explicit scaling lists in bitstream) and
hantro_h264.c::assemble_scaling_list (consumes supplied matrix or
falls back to defaults).
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
Fourier's h264_va_picture_to_v4l2 only populated four fields of the
struct v4l2_ctrl_h264_decode_params: dpb (via h264_fill_dpb),
nal_ref_idc, top_field_order_cnt, bottom_field_order_cnt, and the
IDR_PIC flag. Many other required-by-spec fields were left at zero-
init (frame_num, idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_*,
dec_ref_pic_marking_bit_size, pic_order_cnt_bit_size,
slice_group_change_cycle, FIELD_PIC and BOTTOM_FIELD flags).
For an IDR (first frame) on hantro-vpu RK3568, the kernel parses
the bitstream from the OUTPUT buffer and uses these fields to drive
its bitstream-element offset tracking. Empirically the kernel
returned a successfully-decoded but ZEROED CAPTURE buffer — flat
dark-green frames in mpv output, no errors logged.
This patch fills every field VAAPI exposes:
- frame_num: from VAPicture->frame_num.
- FIELD_PIC flag: from VAPicture->pic_fields.bits.field_pic_flag.
- BOTTOM_FIELD flag: from
VAPicture->CurrPic.flags & VA_PICTURE_H264_BOTTOM_FIELD.
Also corrects the IDR_PIC flag to use |= instead of = so the new
field flags don't clobber it.
Fields NOT derivable from VAAPI's pre-parsed structures —
idr_pic_id, pic_order_cnt_lsb, delta_pic_order_cnt_*,
dec_ref_pic_marking_bit_size, pic_order_cnt_bit_size,
slice_group_change_cycle — require a slice_header() bit-level
parse. libva-v4l2-request does not currently do this. They remain
at zero-init.
Empirical question this patch answers: does hantro tolerate the
bit_size fields being zero for IDR frames, or does it strictly
require them? If post-patch CAPTURE is still zeroed, a slice-header
parser is required. If CAPTURE shows real picture data, hantro
fills in the bit-positions itself when no hint is supplied.
Cross-reference: gstv4l2codech264dec.c::
gst_v4l2_codec_h264_dec_fill_decoder_params (commit 9e3e775,
lines 632-678).
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
Identified by cross-reference against GStreamer's
gst-plugins-bad/sys/v4l2codecs/gstv4l2codech264dec.c (upstream commit
9e3e775). At lines 1263-1304, GStreamer gates SLICE_PARAMS and
PRED_WEIGHTS submission on is_slice_based(self):
if (is_slice_based (self)) {
control[num_controls].id = V4L2_CID_STATELESS_H264_SLICE_PARAMS;
...
control[num_controls].id = V4L2_CID_STATELESS_H264_PRED_WEIGHTS;
...
}
In V4L2_STATELESS_H264_DECODE_MODE_FRAME_BASED, the kernel parses the
bitstream itself from the OUTPUT-queue payload; per-slice controls in
the request trigger cluster-validation EINVAL at error_idx=count
(observed on RK3568 hantro-vpu, kernel 6.19.10).
This patch:
- Reorders controls[] so FRAME_BASED-required entries come first
(SPS, PPS, SCALING_MATRIX, DECODE_PARAMS at indices 0..3) and the
SLICE_BASED-only entries come last (SLICE_PARAMS, PRED_WEIGHTS at
indices 4..5).
- Defaults num_controls=4 (FRAME_BASED), expanding to 5 for
SLICE_BASED and 6 when V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED.
- Hardcodes slice_based=false for now since patch 0002 sets the
device to FRAME_BASED unconditionally. A TODO marks the spot for
the planned probe-then-set commit, which will populate
context->decode_mode at CreateContext via VIDIOC_QUERYCTRL/
G_EXT_CTRLS and replace the hardcoded false with a runtime check.
Diagnosis chain:
- patch 0005 reduced one EINVAL per frame on PRED_WEIGHTS
submission, but cluster-level rejection persisted at error_idx=5
(count) — meaning kernel walked all 5 controls cleanly but
rejected the request as a whole.
- dmesg silent → rejection in V4L2 core (v4l2-ctrls-request.c /
v4l2-h264.c), not in hantro driver where it could log.
- GStreamer reference confirmed FRAME_BASED contract: only 4
sequence-and-frame-level controls go in the per-request batch.
After this patch the kernel should accept the per-request controls
and actually decode the bitstream into the CAPTURE buffer.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
Per kernel UAPI (include/uapi/linux/v4l2-controls.h),
V4L2_CID_STATELESS_H264_PRED_WEIGHTS is a conditional control:
V4L2_H264_CTRL_PRED_WEIGHTS_REQUIRED(pps, slice) :=
((pps->flags & V4L2_H264_PPS_FLAG_WEIGHTED_PRED) &&
(slice_type == P || slice_type == SP)) ||
(pps->weighted_bipred_idc == 1 && slice_type == B)
Submitting PRED_WEIGHTS on a frame where the macro evaluates false
triggers VIDIOC_S_EXT_CTRLS to return EINVAL at error_idx=5 (the
6th, last control in the per-request batch) on hantro-vpu and any
other driver that strictly enforces the spec.
Smoke trace from RK3568 hantro on bbb_1080p30 (Main profile, no
weighted prediction): every per-frame batch fails identically, 13
EINVALs over a 10-frame run. Without this fix, ffmpeg's vaapi-copy
falls back to software decode for every frame.
Fix: narrow num_controls to 5 (excluding PRED_WEIGHTS at index 5)
when the macro returns false; keep at 6 when it returns true.
Defect found and fixed via Phase 6 Step 1 ohm smoke testing. Not
part of Sonnet's six-commit upstreamable plan; slotted in as patch
0005 ahead of the planned probe-then-set / FRAME_BASED commits
because it unblocks per-frame submission on every backing driver,
not just hantro.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
Compound patch carrying the fork's pre-Step-1 substrate, originally
authored by Jernej Škrabec / fourier on top of bootlin's a3c2476:
- src/h264.c + src/picture.c: V4L2_CID_MPEG_VIDEO_H264_* renamed to
V4L2_CID_STATELESS_H264_*, struct shapes tracked to mainline
(V4L2_CID_STATELESS_H264_DECODE_MODE/_START_CODE added to the
passthrough shim).
- include/hevc-ctrls.h: redirect shim to <linux/v4l2-controls.h>
(kernel-side HEVC controls now live in the canonical UAPI header).
- src/meson.build: src/h265.c / src/h265.h commented out — HEVC
build path is excluded from this fork (RK3568 hantro G1/G2 has
no HEVC, and the kernel-side HEVC controls have a separate
rework in flight upstream).
- src/tiled_yuv.S: aarch64 stub for tiled_to_planar (assembly
source was sunxi-cedrus armv7-only; aarch64 needs a stub to keep
the build linking).
- include/h264-ctrls.h: removed (dead post-fourier — no source
includes it; the passthrough shim's CID aliases live in the
kernel header now).
Functionally equivalent to the prior fork master commits:
c1f5108 V4L2_PIX_FMT_H264_SLICE rename
4ccbfe9 Strip HEVC build path
da9f2a5 include/h264-ctrls.h passthrough + CID aliases
fc4bb10 src/h264.c track upstream UAPI shape
13e9b64 src/h264.c drop num_slices field
4d14ffb src/tiled_yuv.S aarch64 stub
1b02c9b src/h264.c include utils.h
Folded into one commit during 2026-05-04 Step 1 reconciliation
(see ../phase0_evidence/2026-05-04/findings.md). Per-patch history
of the early fork commits preserved on the pre-step1 branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reference frames are now identified using their timestamp:
set the timestamp when queuing the output buffer and use it to identify
the frame later on.
Signed-off-by: Paul Kocialkowski <paul.kocialkowski@bootlin.com>
The cedrus_data structure carries the old name. In order to migrate to the
new name, let's rename it to request_data.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
The sunxi_cedrus.h header contains a bunch of defines prefixed with
SUNXI_CEDRUS.
As part as the ongoing migration to a more generic name, change that prefix
for V4L2_REQUEST, and the header file to request.h
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
The num_slices parameter was improperly set to the number of reference
frames, which is incorrect.
Add a counter for the number of slices per surface, and set num_slices to
that value.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
The current code sets the prediction weight table by doing a memcpy of the
libva structure to the v4l2's structure.
However, for the offset and weight parameters, libva's structure uses
16-bits integer, while v4l2 uses 8-bits, which obviously doesn't work well
with memcpy.
Create a function to copy those arrays and matrices instead that follows
the algorithm defined in the H264 spec, and use it so that it works
properly.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
The libva only provides the reference images needed to decode the current
picture, but not the full DPB. However, some codecs need that whole DPB in
order to decode a picture.
For example, the Allwinner hardware codec has an internal SRAM, with each
picture getting a slot in that SRAM, and during each decoding process, some
metadata will then be generated from that SRAM content to a separate
buffer. Therefore, each frames must be located at the same SRAM position
each time so that the metadata are then re-used properly.
However, since libva will only pass a few reference images, we can end up
in a situation where multiple, subsequent, frames will have the same
reference images set, but might all be used as reference later on and
cannot therefore be located at the same position.
And from a more theorical point of view, Linux expects a full blown DPB in
its H264 control.
In order to work around this, we can create a shadow of the DPB by simply
maintaining a list of 16 decoded images, each associated with their
VAPictureH264 and an age. This age is the last time we used that frame as
reference. When a new picture is decoded, either we assign it to a free
slot, or we reuse the slot from the frame that hasn't been used as a
reference for the longest time.
This is a much simpler approach than the one documented in the H264 spec,
but this shouldn't really be a problem since we don't handle the reference
frames ourselves, but just re-use the one from the libva, and taken from
the bitstream before. As such, frames that are not supposed to be used for
reference will not be anymore, their age will not increase, and therefore
after a while we will garbage-collect their slot to store a much newer
frame.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Some functions setting the controls in the H264 code will need the context
in order to access the DPB. Make sure that we pass it as an argument.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Some functions setting the controls will need the context in the future.
Make sure that we provide it as an argument.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
The coding style has been a bit erratic. Enforce the linux kernel coding
style by reusing their .clang-format file, running clang-format on the
source, and ignoring the few shortcomings that clang-format has at the
moment (especially on aligning the define values).
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
This long structure name makes it quite difficult to fit within the 80
characters limit. Shorten it.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Using the same words but not in the same order for both the type and the
variable name isn't particularly helpful, and prevents to stay within 80
characters. Shorten the name a bit.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>