Fixes the rkvdec_hevc_prepare_hw_st_rps out-of-bounds kernel OOPS that
blocked HEVC decode on ampere (RK3588) per
marfrit/libva-v4l2-request-fourier#3 and ampere-fourier iter1 close.
Mechanism (Phase 5 amendment to issue body):
The new EXT_SPS controls are registered as V4L2_CTRL_FLAG_DYNAMIC_ARRAY
in vdpu38x_hevc_ctrl_descs (rkvdec.c:279/284) with cfg.dims = { 65 }.
The v4l2-ctrl framework init-allocates 1 zeroed element (ctrls-core.c:2116).
When num_short_term_ref_pic_sets > 1, rkvdec_hevc_prepare_hw_st_rps
(rkvdec-hevc-common.c:393-405) iterates idx 0..N-1 and overruns the
1-element kernel allocation. Submitting an N-element dynamic-array
control via S_EXT_CTRLS extends the framework allocation.
Userspace fix:
- VIDIOC_QUERY_EXT_CTRL probe at first HEVC CreateContext sets
driver_data->has_ext_sps_rps (true on VDPU381/383, false on legacy
RK3399 — control unregistered there, so fresnel iter38 5/5 + iter39
sub-profile paths are byte-identical to pre-iter2).
- When set, h265_set_controls appends EXT_SPS_ST_RPS + _LT_RPS as
calloc'd zero arrays, sized by VAAPI's count fields and capped at
H.265 §7.4.3.2 spec maxima (ST 64, LT 32). Min 1 (kernel rejects 0).
- Free post-S_EXT_CTRLS.
Decode correctness scope:
VAAPI does NOT expose per-set st_ref_pic_set syntax elements
(delta_idx_minus1, delta_rps_sign, etc.) — confirmed in va_dec_hevc.h.
All-zero entries give empty inter-pred RPS per set, which is correct
for IDR-only streams and incorrect for streams with inter-pred RPS
dependence. iter2 acceptance: stop the OOPS. Decode-correctness for
inter-RPS content is a known follow-up requiring either bitstream-snoop
or SPS-passthrough via a new VAAPI extension.
Files:
- include/hevc-ctrls.h: #ifndef-guarded fallback definitions for
V4L2_CID_STATELESS_HEVC_EXT_SPS_{ST,LT}_RPS + structs (ampere host
is on linux-api-headers 6.19-1; the new CIDs land in 7.0).
- src/request.h: driver_data->has_ext_sps_rps (persists for driver
lifetime; gated solely by HEVC code path so cross-codec leakage
impossible).
- src/context.c: probe at HEVC CreateContext via v4l2_query_ext_ctrl.
- src/h265.c: controls[5] → controls[7]; #include <hevc-ctrls.h>
(replaces <linux/v4l2-controls.h>) for forward UAPI compatibility.
Compile-tested on boltzmann (aarch64 native, gcc 15.2.1): clean .so,
0 new warnings. Fresnel cross-device safety: legacy RK3399 rkvdec_ctrl
table omits the CIDs; probe returns false; new code path never executes.
iter39 sub-profile work (commits 662f887 + 8746690) is preserved
in-tree; iter2 is a forward-compatible additive change.
Refs:
marfrit/libva-v4l2-request-fourier#3
ampere-fourier/iter1_close.md HEVC blocker
ampere-fourier/iter2_phase0_findings.md
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Pure-C unit test for nv15_unpack_plane_to_p010, independent of any V4L2
hardware. Verifies bit layout against the spec at
Documentation/userspace-api/media/v4l/pixfmt-nv15.rst by packing known
10-bit pixel values, running the unpack, and asserting P010 output
matches pixel<<6.
Coverage:
- zero, all-max
- 8 known position/spread vectors
- widths {1, 2, 3, 7, 8} including remainder paths
- multi-row with stride padding
- chroma-shape (half-height)
Build + run:
cc -Wall -Werror -O2 -o test_nv15_unpack \
tests/test_nv15_unpack.c src/nv15.c
./test_nv15_unpack
Confirmed PASS on noether (x86_64 native). Catches the highest-risk
class of regression in iter39 — silent bit-shift errors in the unpack —
without requiring fresnel hardware.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds VAProfileH264High10 and VAProfileHEVCMain10 to the libva-v4l2-request
backend. RK3399 rkvdec emits decoded frames as V4L2_PIX_FMT_NV15 (4 × 10-bit
values packed in 5 bytes per element); VAAPI consumers receive standard
VA_FOURCC_P010 via a new userspace unpack in copy_surface_to_image.
VP9 Profile 2 explicitly NOT added — RK3399 rkvdec kernel ctrl table
caps at V4L2_MPEG_VIDEO_VP9_PROFILE_0 (rkvdec.c::rkvdec_vp9_ctrl_descs).
Touchpoints (per Phase 5 sonnet-architect review amendments):
- include/drm_fourcc.h: define DRM_FORMAT_NV15 (vendored libdrm lacks it)
- src/nv15.{c,h}: NV15 → P010 plane unpack (LSB-first, per
Documentation/userspace-api/media/v4l/pixfmt-nv15.rst)
- src/video.c: NV15 entry in formats[] (else NULL-deref on video_format_find)
- src/codec.c: pixelformat_for_profile cases for Hi10P + Main10
- src/config.c: enumeration, validation, entrypoints, RT_FORMAT_YUV420_10
advertisement for 10-bit profiles
- src/context.c: per-profile CAPTURE pix_fmt (NV12/NV15), 10-bit synthetic
SPS (bit_depth_luma_minus8=2), video_format invalidation on bit-depth
transition (sibling to iter38 device-switch invalidation), is_10bit flag
- src/surface.c: RT_FORMAT_YUV420_10 admission, NV15 fourcc on PRIME export
- src/image.c: P010 reporting in DeriveImage + QueryImageFormats,
P010-aware sizing in CreateImage, NV15 → P010 unpack call in
copy_surface_to_image (gated on is_10bit + image.format.fourcc == P010)
- src/picture.c: 4 switch blocks route Hi10P/Main10 to existing H264/HEVC
per-codec paths
- src/request.h: MAX_PROFILES bump 11 → 13, driver_data->is_10bit flag
Scope: COPY path (vaGetImage / vaDeriveImage) only. Standard ffmpeg-vaapi
hwdownload, mpv vaapi-copy, and any consumer using vaGetImage works
end-to-end. PRIME-path consumers that only know NV12/P010 must use the
COPY path; PRIME consumers aware of NV15 (panfrost-Mesa et al.) get the
correct fourcc on RequestExportSurfaceHandle. PRIME-side P010 emission is
follow-up scope (would need DRM_FORMAT_P010 + per-plane unpack into a
GPU-accessible buffer).
Compile-tested on boltzmann (aarch64 native, gcc 15.2.1, libva 1.23.0,
libdrm 2.4.133): clean build, .so produced, 0 new warnings.
Phase 0/2 evidence: linux-mmind-v7.0 drivers/media/platform/rockchip/rkvdec.
rkvdec_h264_decoded_fmts[] and rkvdec_hevc_decoded_fmts[] both list NV15;
ctrl tables cap at HEVC MAIN_10 and H264 HIGH_422_INTRA (Hi10P < cap, not
in menu_skip_mask). image_fmt resolution (rkvdec-h264-common.c:196,
rkvdec-hevc-common.c:467) dispatches on bit_depth_luma_minus8 only.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Latent bug surfaced by iter38 multi-device probe. profiles[] array
in RequestQueryConfigProfiles is sized by V4L2_REQUEST_MAX_PROFILES
(set as context->max_profiles=11 in VA_DRIVER_INIT), but the bounds
checks used V4L2_REQUEST_MAX_CONFIG_ATTRIBUTES (10). Pre-iter38 only
a single device's profiles were enumerated, total ≤9, so the off-by-
one never bit. With iter38's rkvdec+hantro union (10 profiles total
across MPEG2/H264/HEVC/VP8/VP9), the last enumerator (VP9) hit
index=9 with the check 'index < 10-1 = 9' → skipped.
Probe BOTH rkvdec and hantro-vpu at VA_DRIVER_INIT and keep their
{video,media}_fd pairs in driver_data. RequestQueryConfigProfiles
enumerates the union of supported profiles from all open fds.
RequestCreateConfig retargets driver_data->{video,media}_fd to the
device that serves the requested profile; if a switch is needed
(active fd is wrong), tears down output_pool, capture_pool, video_format
cache, and fmt_valid so the next RequestCreateContext rebuilds them
on the new device.
Profile→device map (RK3399-shaped):
H264 / HEVC / VP9 → rkvdec
MPEG-2 / VP8 → hantro-vpu
Honours LIBVA_V4L2_REQUEST_VIDEO_PATH / MEDIA_PATH explicit overrides
(skips alt-probe when those are set).
Closes the 'libva multi-device probe' open item from iter36/iter37
campaign-close.
α-26 (iter26) wrote VAAPI's picture->st_rps_bits to the V4L2 decode_params
field of the same name based on field-name match. Per V4L2 spec, this field
is the bit-count of st_ref_pic_set() *in the SPS* — VAAPI doesn't expose
that. The slice-header bit-count (which IS what VAAPI's st_rps_bits provides)
belongs in slice_params->short_term_ref_pic_set_size (handled correctly in
α-29).
rkvdec doesn't read decode_params->short_term_ref_pic_set_size, so the
misroute was harmless but stale. This revert restores spec-correct semantics
(0 when SPS bit-count is unknown).
Cosmetic cleanup; no functional change.
ROOT CAUSE FIX for VP8 libva decode garbage output.
ffmpeg-vaapi's vaapi_vp8.c:191-192 STRIPS the VP8 uncompressed
header (3 bytes for interframe, 10 bytes for keyframe) before
submitting the slice data via VAAPI. ffmpeg-v4l2request (kdirect)
KEEPS the header in its OUTPUT buffer.
Hantro's rockchip_vpu2_vp8_dec_run (rockchip_vpu2_hw_vp8_dec.c:349)
hard-codes 'first_part_offset = V4L2_VP8_FRAME_IS_KEY_FRAME(hdr) ? 10 : 3'
as the byte offset into OUTPUT where the first compressed partition
starts. It uses this offset for:
- mb_offset_bits = first_part_offset * 8 + first_part_header_bits + 8
- dct_part_offset = first_part_offset + first_part_size
Without the header, every offset is wrong, the entropy decoder
spins on the wrong bytes, and every frame decodes to garbage.
Fix: in codec_store_buffer for VAProfileVP8Version0_3, prepend
header_size bytes (10 keyframe / 3 interframe) of zeros to OUTPUT
before the slice data memcpy. Hantro skips these bytes for actual
parsing (uses ctrl-struct values instead), so zero-fill is fine.
Empirical: iter33 kernel printk in vpu2_vp8_dec_run dumped the
v4l2_ctrl_vp8_frame struct for libva vs kdirect and confirmed
byte-identical control fields. Only the OUTPUT buffer bytes
differed, traced to ffmpeg-vaapi's header stripping.
LIBVA_VP8_DUMP_FRAME=1 prints the v4l2_ctrl_vp8_frame struct fields
to stderr before VIDIOC_S_EXT_CTRLS. Goal: diff libva-side struct
against expected kdirect-side values for VP8 frame-2+ divergence
(libva produces non-trivial but wrong output; kdirect VP8 byte-equal
to SW). Env-gated, no behavior change otherwise.
ROOT CAUSE FIX for HEVC frame 2+ divergence (Bug 5 remainder).
rkvdec's assemble_sw_rps (rkvdec-hevc.c:386-389) uses
sl_params->short_term_ref_pic_set_size to compute the bit offset where
long-term RPS data starts in the slice header. When zero, it falls back
to fls(num_short_term_ref_pic_sets - 1) — wrong when num=1 (BBB's case).
α-26 misdirected: set decode_params->short_term_ref_pic_set_size = st_rps_bits
but rkvdec doesn't use that field. The correct consumer is slice_params per
V4L2 spec and rkvdec source.
VAAPI's picture->st_rps_bits is documented as: 'number of bits that structure
short_term_ref_pic_set(num_short_term_ref_pic_sets) takes in slice segment
header when short_term_ref_pic_set_sps_flag equals 0' — exactly what
sl_params->short_term_ref_pic_set_size means.
Frames 1 (IDR) unaffected (V4L2 rkvdec gates on !IDR_PIC flag).
Frames 2+: bit offset for long-term RPS now correct, slice header parsing
no longer falls off the edge of the entropy bitstream.
Default behavior unchanged: counter*1000ns same as before.
With LIBVA_TS_SCALE=N, multiplies the ns timestamp by N. Lets us
sweep timestamp magnitude to test whether small-ts collides with
stale CAPTURE entries in vb2_find_buffer for HEVC frame 2+ bug.
Also keeps iter29 slice-tail probe from previous commit.
Env-gated via LIBVA_HEVC_DUMP_SLICE_TAIL=1. Goal: characterise the
40-byte inflation in libva's slice_data buffer vs ffmpeg-v4l2request
(see iter27/28 close — HEVC frame 2+ divergence at byte 1382401).
Dumps per slice: nal_unit_type, slice_data_size, slice_data_byte_offset,
and the last 80 bytes of source_data for that slice. Lets us see if the
trailing 40 bytes are (a) real entropy, (b) trailing zeros, (c) a
next-NAL start code prefix, or (d) random memory.
iter28b tested LIBVA_HEVC_TRIM_TRAILING=40 on HEVC. Result: hash
differed at byte 899745 (inside frame 1, NOT just frame 2 boundary at
byte 1382401). Trimming 40 bytes off the IDR slice (96890→96850)
corrupted frame 1. The 40-byte inflation is not uniform per slice;
requires dynamic detection (e.g., scan for rbsp_stop_one_bit) or
per-slice-type logic.
VAAPI's slice_data_size includes NAL+slice header bytes that precede the
slice payload. rkvdec_hevc expects bit_size to cover the slice payload
(starting at data_byte_offset). Setting bit_size = slice_data_size * 8
made rkvdec read past slice payload → wrong entropy state → frame 2+
garbage despite correct ctx->image_fmt (iter25) and decode_params
(iter26).
Empirical match: with formula (slice_data_size - slice_data_byte_offset)
* 8, libva produces bit_size=44096 for BBB frame 2 matching kdirect's
44096 exactly per iter27 dmesg printk.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VAPictureParameterBufferHEVC exposes st_rps_bits — the number of bits
the inline short_term_ref_pic_set syntax element takes in the slice
header. rkvdec's DPB resolution for P/B frames uses this to skip the
RPS data correctly; with size=0 it skips wrong bytes and reads wrong
references → frame 2+ visual divergence.
iter25 evidence: libva HEVC frame 1 byte-identical to kdirect, but
frame 2 diverges at the decode_params bytes 4-5 (libva 0x00 0x00,
kdirect 0x0a 0x00 = 10).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rkvdec_h264_validate_sps doubles height when FRAME_MBS_ONLY is unset
(field-to-frame). Dummy with 1080-height was failing validation as
2176 > 1080, returning -EINVAL silently (void-cast). Even though libva
ignores the result of v4l2_set_controls, the side effect was leaving
ctx->image_fmt at ANY → first per-frame H264_SPS still hit -EBUSY in
try_or_set_cluster → setup loop broke (Bug 4 unchanged).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause for Bug 5 (HEVC libva = all-zero CAPTURE) and Bug 4 (H.264
libva = keyframe partial), localized via iter17→iter24 kernel-printk
chain:
rkvdec_s_ctrl() for HEVC_SPS / H264_SPS calls get_image_fmt() and,
if the resolved image_fmt differs from cached ctx->image_fmt (default
RKVDEC_IMG_FMT_ANY at open), tries to reset the CAPTURE format.
Format reset returns -EBUSY when vb2_is_busy(CAPTURE_queue) — any
CAPTURE buffer allocated blocks the change.
libva (iter5b-β) pre-allocates 24 CAPTURE buffers at CreateContext
via cap_pool_init, BEFORE any per-frame S_EXT_CTRLS. First per-frame
HEVC_SPS therefore fails with -EBUSY in try_or_set_cluster, breaks
v4l2_ctrl_request_setup's outer loop, leaves all 5 staged HEVC
compound controls at zero in ctx->ctrl_hdl. rkvdec_hevc_run reads
zero (iter20 dmesg: sps[0..16]=00..00), hardware sees w=0 h=0,
CAPTURE comes out all-zero (Bug 5).
Fix: BEFORE cap_pool_init, inject one S_EXT_CTRLS (no request, no
which) with a synthetic SPS containing the profile's known chroma +
bit_depth. CAPTURE queue is still empty at this point → vb2_is_busy
returns false → rkvdec_s_ctrl succeeds, ctx->image_fmt is updated to
the profile's image_fmt. From then on, per-frame SPS submissions with
matching chroma + bit_depth see image_fmt_changed=false → skip reset
→ commit succeeds.
VP9 / MPEG-2 / VP8 paths are not affected: VP9's rkvdec coded_fmt_desc
has no get_image_fmt op; MPEG-2 + VP8 route to hantro.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Env-gated by LIBVA_V4L2_REQ_GETBACK. After v4l2_set_controls() against
the request_fd in h265_set_controls(), issue G_EXT_CTRLS with the same
request_fd targeting SPS and log first 16 bytes returned.
iter20 (kernel printk) found rkvdec sees all-zero ctx->ctrl_hdl SPS for
libva HEVC vs correct bytes for kdirect. The remaining branch is whether
req->p_new was ever staged with libva's payload, or whether
v4l2_ctrl_request_setup failed to apply it.
α-24 distinguishes the two:
zero readback -> staging failed in v4l2_s_ext_ctrls
non-zero -> apply failed in v4l2_ctrl_request_setup
EACCES -> kernel disallows req readback; need deeper printk
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
If removing DECODE_PARAMS from libva's S_EXT_CTRLS batch lets the other
4 controls stage, rkvdec_hevc_run printk will show w=1280 h=720 etc.
That confirms DECODE_PARAMS specifically is failing kernel validation
and rolling back the whole batch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tests mechanism 5 (silent partial failure). If error_idx != count after
S_EXT_CTRLS, one of the per-request controls was rejected by the kernel
even though the ioctl returned 0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Static storage for sps/pps/decode_params/scaling_matrix + no-free for
slice_params_array. Tests the kernel-defers-compound-copy hypothesis
from iter17 P7 finding.
If hashes change -> mechanism 3 confirmed; will refactor to per-surface
heap allocation.
If hashes unchanged -> mechanism 3 disproved; iter19 explores
mechanisms 1/2/5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Test discriminator: lowering MIN_CAP_POOL from 24 to 11 (matching
kdirect) did not change any of the 5-codec hashes. Pool depth is
not the cause of Bug 4/5/6. Revert.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Quick discriminator: if pool depth affects rkvdec's per-codec state
machine, reducing libva's pool to kdirect's ~11 might change Bug 4/5/6
hashes. Reverts to 24 if test shows no change or regression.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 3 ioctl-sequence diff: kdirect (ffmpeg-v4l2request) S_FMTs CAPTURE
with NV12 + dimensions after S_FMT OUTPUT, BEFORE CREATE_BUFS. libva's
old code only G_FMTs CAPTURE (per iter5b-β's hantro-targeted comment
that explicit S_FMT puts hantro into an inconsistent state).
For rkvdec on RK3399 the absence of explicit S_FMT CAPTURE doesn't
commit the chosen NV12 format properly. rkvdec HEVC + H.264 silently
produce zero / garbage CAPTURE output — Bug 4 + Bug 5 root cause.
Now: S_FMT OUTPUT → S_FMT CAPTURE → G_FMT CAPTURE. Failure of S_FMT
CAPTURE is non-fatal: fall back to G_FMT (preserves the iter5b-β
hantro path).
Future iter to gate this on driver_kind explicitly per
feedback_per_driver_kludge_gating.md. For now, always-on is safe
because kdirect proves S_FMT CAPTURE works on both rkvdec AND hantro.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
LIBVA_V4L2_DUMP_OUTPUT=<dir> writes source_data[0..slices_size] to
<dir>/output_p<profile>_s<surface>_t<ts>.bin immediately before
v4l2_queue_buffer OUTPUT. Discriminates whether libva writes the
correct H.264/HEVC bitstream bytes (same as kdirect/input file).
Off by default. Wrapped in static-cache env check.
iter11+12+13 confirmed Bug 4/5 are not in S_EXT_CTRLS payload, not
in kernel substrate (RFC v2), not in CPU cache visibility (α-17 sync
ioctl works but inert). The remaining libva-side surface is the
actual bitstream bytes the kernel reads.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
V4L2 CAPTURE buffers are V4L2_MEMORY_MMAP and mapped cached. Kernel
DMA writes don't propagate to CPU cache observer; reading
destination_data[] without DMA_BUF_IOCTL_SYNC(START|READ) returns
stale data on RK3399 — observed as Bug 4 (H.264 partial-fill) and
Bug 5 (HEVC all-zero) when libva goes through cached-mmap readback
while kdirect ffmpeg-v4l2request + DRM_PRIME-mmap reads cleanly via
implicit sync.
Per Tomasz Figa's 2024 linaro-mm-sig discussion + feedback_rfc_v2_
vb2_dma_resv_scope.md: userspace responsibility for cache sync on
cached-mmap'd V4L2 buffers. RFC v2 fence work doesn't engage this
path; this ioctl pair does.
Just-in-time EXPBUF + SYNC + close per copy. Per-call cost is one
ioctl pair + one fd lifecycle per plane. Could cache the EXPBUF fd
on cap_pool slot but doing it transient keeps lifecycle simple.
Closing the EXPBUF fd is a no-op on V4L2 buffer memory.
If EXPBUF or SYNC fails, fall through to existing memcpy path —
preserves pre-iter13 behavior on the error branch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two fixes in one commit:
α-13 (h265_fill_sps): sps_max_num_reorder_pics now derived from
sps_max_dec_pic_buffering_minus1 (safe upper bound per H.265 §A.4.2)
instead of hardcoded 0. Phase 5b empirically showed rkvdec ignores
this field on RK3399, so this is wire-correctness hygiene only — matches
kdirect's payload pattern without behavior change.
α-14 (h265_set_controls): derive IRAP_PIC / IDR_PIC flags from the
first slice's nal_unit_type (parsed by h265_fill_slice_params into
slice_params_array[0].nal_unit_type). Without these flags rkvdec
doesn't recognise the keyframe boundary, treats IDR as inter without
references, and produces all-zero CAPTURE output — observed as Bug 5
on libva HEVC (06b2c5a0...). kdirect sets these from the bitstream
parse and decodes correctly (9340b832...).
Mapping:
nal_unit_type 16..23 -> IRAP_PIC
nal_unit_type 19 (IDR_W_RADL) or 20 (IDR_N_LP) -> IDR_PIC
HEVC-only (no risk to other codecs). h265_set_controls already
profile-gated via picture.c::codec_set_controls VAProfileHEVCMain
dispatch. Per feedback_unconditional_codec_state.md.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace gettimeofday in RequestEndPicture with object_context-scoped
counter producing small us values (1, 2, 3, ...) so OUTPUT QBUF
timestamp and DPB.reference_ts match ffmpeg-v4l2request's pattern.
Phase 5 IMP-1: counter scoped to object_context (not driver_data) to
avoid multi-context collisions.
Empirical confirmation only — reviewer's CRIT-1 predicts this is
inert (VP9/MPEG-2 use same path and PASS). If α-7 produces the same
broken hash, the libva wire-byte search space is exhausted and iter10
must pivot to slice-data inspection or kernel investigation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Bug 4 root cause per Phase 7 γ + Phase 4c strace re-decode:
libva strips FFmpeg's bit-16 POC sentinel; kdirect (ffmpeg-v4l2request)
does NOT strip. rkvdec writes top/bottom_field_order_cnt directly to
MMIO via writel_relaxed; with libva sending 0 instead of kdirect's
65536, hardware POC comparisons mismatch and motion compensation
silently corrupts (16x32 patch + nothing else).
The original h264_strip_ffmpeg_poc_sentinel was hantro-specific
(hantro_h264.c prepare_table fed unmasked tbl->poc[]). Hantro+H.264
is not exercised on RK3399; deferring per-driver gating to iter9 if
it surfaces.
Preserve VA_PICTURE_H264_INVALID → return 0 (correct zero-init for
empty DPB slots per Phase 5c amendment).
4 call sites unchanged (h264.c:309, 312, 462, 465 — for ref and current
frame TopFieldOrderCnt / BottomFieldOrderCnt). Both reference and
current-frame POCs now pass through unchanged so hardware compares
agree.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Env-gated CAPTURE pre-zero in BeginPicture after cap_pool_acquire. With
LIBVA_V4L2_ZERO_CAPTURE=1, the slot mmap region is memset 0 before the
kernel decode runs. Discriminates "kernel writes partial then aborts"
from "kernel writes nothing, buffer carries stale residue from prior
allocation."
Per Phase 5 IMP-1: the 16x32 patch in libva H.264 frame 1 may be either
real partial kernel write OR stale residue. This gate makes the next
sweep run deterministically zero the buffer; if the patch still appears
after, the kernel really writes it; if not, it was stale.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After RequestSyncSurface DQBUFs CAPTURE and marks slot DECODED, optionally
dump first/last 32 bytes of each destination_data plane plus a non-zero
count over a per-plane scan window (one MB row for plane 0, 1024 bytes
for chroma). Gated behind LIBVA_V4L2_DUMP_CAPTURE=1; default off, no
regression on existing flows.
Diagnostic for Bug 4 (H.264 partial-fill): distinguishes "kernel didn't
write" from "libva mis-reads" from "stale-residue" by inspecting the
post-DQBUF buffer state directly.
Phase 5 amendments applied:
- Amendment 1 (CRIT-1): snprintf-buffered hex line, one request_log call.
- Amendment 2 (CRIT-2): dump nested inside current_slot != NULL guard.
- Amendment 4 (IMP-3): placed between cap_pool_mark_decoded and
status=VASurfaceDisplaying on happy path only.
- Amendment 5 (MIN-2): scan window = max(1024 chroma, bpl*16 luma).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Empirical Phase 7 verification revealed the algorithm bug: data links
in MEDIA_IOC_G_TOPOLOGY connect PAD IDs, not entity IDs directly.
My iter7 Phase 6 commit compared link source_id/sink_id against
the proc entity_id, never matched → io_entity_ids stayed empty →
interface lookup never fired → returns -1 → falls back to legacy
hardcoded path.
Topology dump on fresnel /dev/media0 (rkvdec) confirmed:
- Entity 3 (rkvdec-proc) has function=0x4008 (DECODER) ✓
- Data link src=16777218 sink=16777220 — these are PAD ids
(0x01000002, 0x01000004), NOT entity 3.
- Interface link src=50331660 (interface) sink=1 (entity) — for
interface links source/sink ARE entity IDs.
Fix: resolve pads → entities via the topo.pads[] array.
1. Collect pads belonging to proc entity (via pads[].entity_id).
2. For each data link touching those pads, the OTHER pad's
entity_id is an IO neighbor.
3. Find interface link to those IO entities (unchanged from prev).
Also allocate topo.pads[] in the 2-call ioctl pattern.
Signed-off-by: claude-noether <claude-noether@reauktion.de>
Refactor request.c::find_video_node_via_topology to
find_decoder_video_node_via_topology — walks media-topology entities
looking for MEDIA_ENT_F_PROC_VIDEO_DECODER function, then follows the
kernel's link graph (data link from proc to IO entity, interface link
from IO entity to V4L_VIDEO interface) to the correct /dev/videoN.
Two-pass find_codec_device: pass 1 accepts only "rkvdec" (multi-codec
decoder, 3 of 5 codecs); pass 2 accepts any known_decoder_drivers
entry. Pre-iter7 the walk picked whichever media device matched the
hantro-vpu driver name first — which on RK3399 could be the encoder
half of the same media device, surfacing as an empty profile list.
Phase 5 amendments incorporated:
- CRIT-1: use MEDIA_LNK_FL_INTERFACE_LINK (1U<<28) to discriminate
interface vs data links.
- CRIT-2: check both source_id and sink_id of each link.
- IMP-3: 2-call MEDIA_IOC_G_TOPOLOGY pattern (allocate all 3 arrays
before second call); pre-iter7 had a spurious memset + third call.
iter4-B1b (multi-decoder routing — open BOTH rkvdec AND hantro from
one backend instance) still deferred. Post-iter7 MPEG-2/VP8 (hantro)
still need LIBVA_V4L2_REQUEST_VIDEO_PATH override.
Signed-off-by: claude-noether <claude-noether@reauktion.de>
Phase 7 empirical: all 5 libva codecs returned all-zero because
CreateContext's surfaces_ids[] walk was a no-op for ffmpeg-vaapi-copy
which passes surfaces_count=0 to vaCreateContext (per the iter6
comment at context.c:262). Surfaces existed in driver_data's
surface_heap but weren't in the param array → destination_* stayed
at the zero initialization from CreateSurfaces2 β → BeginPicture's
surface_bind_slot saw destination_planes_count=0 → no data
assignment → copy_surface_to_image read all-zero.
Fix: cache the format-uniform CAPTURE geometry in driver_data
(fmt_valid, fmt_planes_count, fmt_buffers_count, fmt_format_height,
fmt_sizes[], fmt_bytesperlines[]). Populate at CreateContext after
v4l2_get_format(CAPTURE). Walk surface_heap (not just surfaces_ids[])
to fill every existing surface. Add lazy-fill in CreateSurfaces2 for
surfaces created AFTER CreateContext. Invalidate cache in
DestroyContext.
New helper: surface_fill_format_uniform(driver_data, surface_object).
Idempotent on destination_planes_count != 0.
Signed-off-by: claude-noether <claude-noether@reauktion.de>
Strip OUTPUT-side V4L2 device-format lifecycle out of
RequestCreateSurfaces2 entirely. Move S_FMT(OUTPUT), CAPTURE-format
probe, cap_pool_init, per-surface destination_* fill into
RequestCreateContext where config_id (and therefore the bound
VAProfile) is known via config_object->pixelformat (wired by
commit B). The α' multi-CreateSurfaces2-mid-stream failure mode
disappears because β has no in-CreateSurfaces2 teardown branch;
each context cycle does its own setup, DestroyContext handles
teardown.
Phase 5 v2 review amendments:
- CRIT-1: removed video_format==NULL early-return at context.c:64-66
(would have rejected every first β CreateContext).
- CRIT-2: added request_pool_destroy() to DestroyContext before
REQBUFS(0). Pre-β only surface.c's resolution-change branch
called request_pool_destroy; β strips that, so DestroyContext
becomes the sole per-session teardown site.
- IMP-1: probe CAPTURE format first to derive output_type from
video_format->v4l2_mplane (eliminates the hardcoded mplane=true
hack from the Phase 4 v2 plan).
- IMP-2: surface_reset_format_cache() deleted (function + declaration
in surface.h + call in DestroyContext + last_output_{width,height}
fields in request.h). All dead under β.
CreateSurfaces2 now ~50 LOC (was ~250). Pure surface ID allocation
+ per-surface lifecycle bookkeeping; no V4L2 device state touched.
Signed-off-by: claude-noether <claude-noether@reauktion.de>
Populate the previously-dead pixelformat field at config.h:46 from
pixelformat_for_profile(profile). The switch at lines 54-88 already
rejects unsupported profiles, so by the time we reach the assignment
at line 98, pixelformat_for_profile returns non-zero.
Commit C reads this field at CreateContext to set the V4L2 OUTPUT
format correctly per profile (the β architectural fix for Bug 2 —
HEVC/VP9/VP8 currently dispatch through the pre-iter5b H264_SLICE
hardcode at surface.c:173 because surface.c has no config_id to look
up the profile).
Signed-off-by: claude-noether <claude-noether@reauktion.de>
Re-introduce after the iter5b-α' revert. Helper maps VAProfile to V4L2
OUTPUT-side FOURCC, used at CreateConfig in commit B to populate the
previously-dead object_config->pixelformat field. β reads from there
at CreateContext (commit C).
Single source of truth for the profile→pixelformat mapping; mirrors
the per-profile probes in config.c::RequestQueryConfigProfiles
(lines 138-188).
Register codec.c in meson.build sources, codec.h in headers.
Signed-off-by: claude-noether <claude-noether@reauktion.de>