LIBVA-2 follow-up. RequestQueryConfigProfiles walks each known
decoder fd via any_fd_supports_output_format() and adds a VAProfile*
for each codec OUTPUT format the V4L2 device advertises. The fd
list missed video_fd_daedalus — so on a Pi 5 with rpi-hevc-dec
primary + daedalus_v4l2 alt, only S265 (HEVC) was probed and the
H.264 / VP9 / AV1 profiles never got enumerated.
Effect on higgs: ffmpeg -hwaccel vaapi -i h264_test.mp4 reported
"No support for codec h264 profile 578" before the per-codec
dispatch in request_switch_device_for_profile could fire — the
profile-578 (H264 Constrained Baseline) check happened during
hwaccel init, found nothing in the libva profile list, and bailed
without ever calling into the daedalus path.
Fix: extend the fds[] array in any_fd_supports_output_format from
5 to 6 entries, with the sixth being video_fd_daedalus when
HAVE_DAEDALUS_V4L2 is on (and -1 otherwise so it's skipped by the
`if (fds[i] < 0) continue;` guard). After the fix, daedalus_v4l2's
OUTPUT format menu (VP9F + AV1F + S264) gets seen, and Request-
QueryConfigProfiles returns VP9Profile0 + AV1Profile0 + the H264*
profiles, all of which then route through the LIBVA-1 'd' kind
override in request_switch_device_for_profile.
Verified on higgs:
Before:
vainfo: Supported profile and entrypoints
VAProfileHEVCMain : VAEntrypointVLD
(only HEVC; H264/VP9/AV1 not enumerated)
ffmpeg vaapi -i h264 → "No support for codec h264 profile 578"
Build clean on boltzmann (only config.c.o + request.c.o recompile).
Backward-compatible on RK3399/3588 — the new slot is gated by
HAVE_DAEDALUS_V4L2 *and* video_fd_daedalus >= 0; both stay false in
those deployments. Existing 5-fd probe order unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
LIBVA-1 — when both rpi-hevc-dec and daedalus_v4l2 are loaded, finish
the per-codec dispatch so HEVC goes to rpi-hevc-dec (existing 'p'
override) and VP9 / AV1 / H.264 go to the daedalus daemon ('d').
Before this change the multi-device-probe accepted only ONE driver
plus a fixed alt slot (rkvdec↔hantro-vpu); on a Pi 5 with both decoders
the find_codec_device() walk preferred rpi-hevc-dec by known_decoder_
drivers[] order and never opened daedalus_v4l2, so VP9/AV1/H.264 frames
hit rpi-hevc-dec's S_FMT and failed.
Changes:
- request.c multi-device-probe: when primary = rpi-hevc-dec, alt =
daedalus_v4l2 (when HAVE_DAEDALUS_V4L2 is on); symmetric handling
in the daedalus_v4l2 primary branch so alt = rpi-hevc-dec. This
preserves the iter40 fallback (no daedalus → alt = NULL) when the
build option is off.
- request.c alt-driver opening block: generalized from the iter38
rkvdec/hantro pair to also dispatch into video_fd_rpi_hevc_dec and
video_fd_daedalus slots. Defensive close on unknown alt-driver
name (shouldn't happen — primary_driver branches gate the choices —
but keeps the slot tally clean if a future driver name is added
above without wiring up the dispatch here).
- request_switch_device_for_profile: added 'd' kind handler +
profile override block. When daedalus is open, VP9 / AV1 / H.264*
route to it. HEVC stays on rpi-hevc-dec via the existing 'p'
override. AV1 'a' kind (RK3588 vpu981) wins ONLY if vpu981 was
probed, so the override only fires on hosts where vpu981 stayed
-1 (i.e. Pi 5).
- RequestTerminate: close the daedalus_v4l2 fd pair on teardown
(was leaking — caught while reviewing the alt-driver expansion).
Build: meson + ninja clean on boltzmann (only pre-existing GStreamer
H265 parser noise). Behaviour on RK3399/3588 boxes unchanged — the
new branches are gated by HAVE_DAEDALUS_V4L2 *and* video_fd_daedalus
≥ 0, both of which stay false in those deployments.
Companion to daedalus-v4l2 481279c (Phase 8.13 systemd unit) and
marfrit-packages noether/daedalus-v4l2-kernel-6.18-compat branch.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Better diagnostic when VIDIOC_S_EXT_CTRLS returns < 0: read
back error_idx and print which control id rejected (or
"ioctl-level" when error_idx == count, meaning the rejection
was generic, not per-control).
Made it possible to triage the daedalus_v4l2 phase 8.13 issue
by separating "the actual stateless control failed" (would
show failing_ctrl_id=0xa40a2c VP9_FRAME) from "libva probing
H264/HEVC profile/level we don't expose" (failing_ctrl_id=
0xa40900 H264_PROFILE etc.) — the latter is harmless on a
VP9-only context.
Before:
v4l2-request: Unable to set control(s): Invalid argument
After (per-control):
v4l2-request: Unable to set control(s): Invalid argument
(error_idx=0/2 failing_ctrl_id=0xa40900 size=0)
After (ioctl-level):
v4l2-request: Unable to set control(s): Invalid argument
(error_idx=2/2 ioctl-level)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a build-time switch so platforms that will never see a
daedalus_v4l2 kernel module (Allwinner cedrus, RK without the
shim, etc.) can opt out of the probe entry + dispatch branch.
meson setup build # daedalus support on
meson setup build-off -Ddaedalus_v4l2=false # off
Implementation:
- meson_options.txt: new boolean `daedalus_v4l2`, default true.
- src/meson.build: when option is true, autoconfig.h gets
`#define HAVE_DAEDALUS_V4L2 1`.
- src/request.c: known_decoder_drivers[] entry, primary-driver
detection branch, and post-probe log line all gated by
#ifdef HAVE_DAEDALUS_V4L2.
- src/request.h: struct daedalus fields kept UNCONDITIONAL.
Two extra int per session and the struct layout stays stable
across translation units regardless of option — avoids the
ODR risk of every consumer of request.h needing to include
autoconfig.h before request.h.
Verified on hertz: both builds compile clean.
build/src/autoconfig.h has HAVE_DAEDALUS_V4L2; .so contains
"daedalus_v4l2" string + log message.
build-off/src/autoconfig.h doesn't; .so contains no daedalus
strings at all.
Default-on build still passes vainfo end-to-end:
vainfo: Driver version: v4l2-request
vainfo: Supported profile and entrypoints
VAProfileH264Main / High / ConstrainedBaseline / MultiviewHigh
/ StereoHigh : VAEntrypointVLD
VAProfileVP9Profile0 : VAEntrypointVLD
VAProfileAV1Profile0 : VAEntrypointVLD
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Phase 8.10 of the daedalus-v4l2 sibling campaign — out-of-tree
V4L2 stateless decoder shim that forwards bitstream to a
userspace daemon (FFmpeg-software decode for VP9 / AV1 / H.264;
pixels back via dmabuf into the CAPTURE buffer).
Adds the same iter40-shaped wiring as rpi-hevc-dec:
- known_decoder_drivers[] entry "daedalus_v4l2"
- video_fd_daedalus + media_fd_daedalus slots in driver_data
- -1 init alongside the other multi-device slots
- primary-driver detection branch in the auto-probe block
- post-probe log line for symmetry with iter40
No per-profile dispatch changes needed — daedalus_v4l2 advertises
the standard V4L2_PIX_FMT_{VP9_FRAME,AV1_FRAME,H264_SLICE}
OUTPUT fourccs the fork's existing per-driver paths already
handle.
Verified on hertz (Pi 5 / CM5, 6.12.75+rpt-rpi-2712) with the
daedalus_v4l2 module loaded:
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video0 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media3 \
vainfo --display drm --device /dev/dri/renderD128
v4l2-request: opened daedalus_v4l2 at video_fd=... media_fd=... (Pi 5 daemon-backed VP9/AV1/H264)
vainfo: Driver version: v4l2-request
vainfo: Supported profile and entrypoints
VAProfileH264Main : VAEntrypointVLD
VAProfileH264High : VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointVLD
VAProfileH264MultiviewHigh : VAEntrypointVLD
VAProfileH264StereoHigh : VAEntrypointVLD
VAProfileVP9Profile0 : VAEntrypointVLD
VAProfileAV1Profile0 : VAEntrypointVLD
Without the env override the auto-probe still picks rpi-hevc-dec
first (it's earlier in known_decoder_drivers[]); on the standalone
daedalus_v4l2 path the daemon-backed decode is what answers
S_FMT/QBUF/DQBUF. On a mixed-driver Pi 5 box where both modules
are loaded, HEVC continues to route through rpi-hevc-dec via the
existing 'p' override; VP9/AV1/H264 would prefer daedalus_v4l2
since rpi-hevc-dec is HEVC-only.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Imports the minimal "vainfo lists VAProfileAV1Profile0" layer from the
operator's in-progress av1-iter1 branch (Phase 2 steps 1, 2 — commits
bed75c0 + 61db76e on av1-iter1). The Phase 3-5 bit-exact decode-side
work stays in av1-iter1; this commit gives master the enumeration +
fd-routing layer so consumers (ffmpeg-vaapi, firefox-fourier, chromium-
fourier) at least see VAProfileAV1Profile0 today on RK3588.
What this commit adds:
- video_fd_vpu981 + media_fd_vpu981 slots to struct request_data
(named to match av1-iter1's convention so the operator's Phase 3-5
merge resolves cleanly)
- 4th-decoder probe loop in VA_DRIVER_INIT that walks hantro-vpu
media nodes for an instance advertising V4L2_PIX_FMT_AV1_FRAME
(AV1F) as OUTPUT pixfmt. RK3588 has 3 hantro-vpu instances all
reporting driver="hantro-vpu" + model="hantro-vpu", so OUTPUT-
format probe is the only DTS-independent discriminator.
- 'a' kind in request_device_kind_for_profile (VAProfileAV1Profile0)
+ 'a' branch in request_switch_device_for_profile.
- video_fd_vpu981 added to any_fd_supports_output_format helper
(existing 3-slot loop missed the new fd; same off-by-one trap
that bit ampere's av1-iter1 enumeration for a week).
- VAProfileAV1Profile0 → V4L2_PIX_FMT_AV1_FRAME in pixelformat_for
_profile.
- VAProfileAV1Profile0 push in RequestQueryConfigProfiles +
RequestQueryConfigEntrypoints + RequestCreateConfig switch.
- vpu981 fd cleanup in RequestTerminate.
- rpi_hevc_dec fd cleanup added at the same time (was already missing
in master — fixed defensively).
- V4L2_REQUEST_MAX_PROFILES bumped 13 → 14. Defensively sized for
the post-Option-B-revert future: with iter39 Option B reverted
(Hi10P + Main10 back in enumeration) plus AV1, max possible
enumeration is 13. The per-group guards use `index < MAX - N`
pattern; for a singleton push to succeed at index=13 we need
MAX >= 14. Bumping now avoids the same off-by-one bug from
silently dropping AV1 when Option B eventually reverts.
What this commit does NOT add:
- av1.{c,h} decode-side scaffolding (Phase 2 step 4 on av1-iter1 —
~177 LoC including a stub av1_set_controls that returns -1). When
the operator's av1-iter1 Phase 3-5 work lands on master, those
500+ LoC + the stub will follow. Without them, consumers calling
vaCreateContext(VAProfileAV1Profile0) succeed at the libva layer
but ffmpeg-vaapi will fail at the first vaRenderPicture with an
AV1-buffer-type rejection — clean error, no crash.
Verified 2026-05-18 on ampere:
$ env LIBVA_DRIVER_NAME=v4l2_request vainfo | grep VAProfile
... (10 prior profiles, unchanged) ...
VAProfileAV1Profile0 : VAEntrypointVLD ✓
Probe log: "ampere-av1: vpu981 AV1 decoder at /dev/video4 + /dev/media3"
Build clean on ampere with GCC 16.1.1; no warnings introduced.
ampere's running module restored to the av1-iter1 build after the
verification — this commit's .so was NOT permanently installed.
Closes the headline acceptance criterion in
marfrit/libva-v4l2-request-fourier#2 ("vainfo on ampere lists
VAProfileAV1"). End-to-end AV1 decode bit-exactness is iter4 work
that the av1-iter1 branch continues to drive.
Co-Authored-By: claude-noether <claude-noether@reauktion.de>
Per-driver gate added: when rpi-hevc-dec active, parse SPS NAL from
surface_object->source_data via the iter2 vendored GStreamer parser
and override the VAAPI-omitted v4l2_ctrl_hevc_sps fields
(sps_max_num_reorder_pics, sps_max_latency_increase_plus1,
sps_max_sub_layers_minus1, max_dec_pic_buffering_minus1[HighestTid]).
Cached at driver_data->hevc_sps_field_cache.
Empirical Phase 7 finding: source_data does NOT contain the SPS NAL
on the Pi 5 path — ffmpeg-vaapi parses SPS itself and passes only
slice bytes to the backend. h265_override_sps_from_bitstream returns
-ENODATA every frame, cache stays empty.
Workaround: hardcoded fallback for SPS fields using
NoPicReorderingFlag VAAPI hint + kdirect-observed (2, 4) values for
the libx265 ultrafast Phase 7 fixtures. Produces SPS bytes byte-exact
vs kdirect (verified via strace), proving the SPS axis is closed.
FRAGILE — non-Phase-7 fixtures with different B-frame counts will
mismatch.
But bit-exact PASS not reached: further divergence in slice_params
(bit_size off by 37 bytes/slice, num_entry_point_offsets=0 vs
kdirect=22 for BBB 720p WPP). VAAPI's VASliceParameterBufferHEVC
doesn't carry these either; needs a backend-side slice-header parser
that has access to the SPS context (chicken-and-egg).
Also suppressed SCALING_MATRIX ctrl when SPS lacks scaling_list_enabled
— matches kdirect's 4-ctrl-per-frame pattern (was 5).
Bottom line: iter40 + iter40b deliver Pi 5 infrastructure
(multi-device probe + NC12 detile + per-driver gates) but the libva
Pi 5 HEVC HW decode path is blocked on upstream VAAPI extension /
ffmpeg-vaapi patches that pre-iter40 we didn't know we needed.
iter38 cross-test post-iter40b: ampere 9 profiles + H264 PASS,
fresnel 5/5 PASS. No sibling regression.
Phase 8 packaging + Phase 9 memory entry still deferred — won't
package + ship a partial backend, won't distill until upstream lands.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
V4L2_HEVC_DPB_ENTRIES_NUM_MAX is 16, but
VASliceParameterBufferHEVC::RefPicList is [2][15] and the eight
delta_*_weight_lX / luma_offset_lX / delta_chroma_weight_lX /
ChromaOffsetLX arrays are all [15]. Iterating the per-slot copy
loops to 16 over-reads the VAAPI source by one element.
The bug was always there but hidden under -O3 (meson's default
buildtype=release): GCC unrolled the inner loop and dead-folded
the out-of-bounds load. Under -O2 (Arch makepkg CFLAGS) the
canonical vectorised loop ran and produced a real SEGV at
v4l2_request_drv_video.so + 0xb3a4 inside h265_fill_slice_params,
breaking HEVC immediately after the package install on fresnel
(iter38 5/5 baseline dropped to 4/5).
Define a local VA_HEVC_REF_LIST_LEN (15) and use it as the cap
for the four offending loops. RefPicList and pred_weight_table
copies now respect the source bound; V4L2 destination still has
16 slots, the upper one stays at memset-zero which is correct.
Verified locally: -O2 build + package re-install restores HEVC
to bit-exact PASS vs kdirect (sha 108f925bb6cbb6c9). iter38 5/5
baseline restored.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Phase 7 fix 63fed87 (unconditional P010 in QueryImageFormats) broke
HEVC 8-bit on fresnel: ffmpeg-vaapi picked P010 for the HEVC hwframe
pool, vaEndPicture SEGV'd when consumer-side P010 expectations met
the 8-bit NV12 CAPTURE buffer. Exit 139 (SIGSEGV) on first frame.
Original reasoning for 63fed87 (advertise early so ffmpeg's pre-
CreateContext query sees P010) doesn't apply with Option B in place —
Hi10P + Main10 are dropped from RequestQueryConfigProfiles, so no
10-bit decode pipeline reaches QueryImageFormats. The gate on
is_10bit (false for all enumerated profiles post-Option-B) correctly
returns NV12-only.
Verified on fresnel post-revert: HEVC bit-exact PASS sha
108f925bb6cbb6c9 restored; iter38 5/5 baseline intact.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per Phase 7 close + user-directed Option B trigger (web research /
rockchip-mpp showed Hi10P is effectively impossible on the current
stack). Cross-test on ampere RK3588 confirmed the SAME failure mode
as fresnel RK3399 — both produce all-zero output via libva; kdirect
fails with EINVAL on both. The blocker is in ffmpeg-v4l2-request
userspace plumbing for the new uAPI controls Karlman's kernel patches
introduced, NOT in our backend or the kernel.
Sources confirming kernel + HW capable but userspace pending:
- lwn.net/Articles/950434: "to fully runtime test... you may need
upstream DRM commits, FFmpeg patches"
- patchwork.kernel.org Karlman v6 → v10 series on linux-media
- Rockchip RK3399 + RK3588 datasheets list 10-bit H.264 support
Stop enumerating Hi10P + Main10 so VAAPI consumers don't try the
broken path. The backend infrastructure (codec.c profile cases,
context.c NV15 CAPTURE + synthetic SPS bit_depth=2 + video_format
invalidation, image.c P010 reporting + NV15→P010 unpack, surface.c
RT_FORMAT_YUV420_10 guard + NV15 PRIME fourcc, nv15.c + nv15.h
unpack primitive, request.h is_10bit flag) is RETAINED — just
re-add the two profiles[index++] lines and bump the H264 guard
back to (-6) when upstream ffmpeg-vaapi V4L2 hwaccel learns 10-bit.
Memory: feedback_rk3399_h264_hi10p_advertised_not_functional.md
captures the empirical evidence for future iterations.
vainfo after this commit: 10 profiles (was 12), matches the iter38
baseline. iter38 5/5 PASS preserved (no other codec touched).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ffmpeg-vaapi's hwcontext_vaapi calls vaQueryImageFormats during
hwframes context setup, BEFORE vaCreateContext fires. Our previous
gate on driver_data->is_10bit meant P010 wasn't in the catalog at
that early query — ffmpeg's hwdownload then rejected pix_fmt=p010le
with "Invalid output format p010le for hwframe download" and decode
failed before our backend's CreateContext saw the 10-bit profile.
Fix: advertise P010 unconditionally in QueryImageFormats. Safe because
consumers ask for P010 only when their decode pipeline needs 10-bit,
and our P010 unpack path in copy_surface_to_image is gated on
image->format.fourcc == VA_FOURCC_P010 (independent of is_10bit).
Verified on fresnel: with this fix, Hi10P decode advances past the
hwdownload filter setup. (Run pending bundle to fresnel.)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
RK3399 rkvdec advertises NV15 in VIDIOC_ENUM_FMT(CAPTURE) only AFTER
S_FMT(OUTPUT) + S_EXT_CTRLS(SPS) resolve image_fmt to 420_10BIT.
Pre-flight v4l2_find_format(NV15) always returns 0 → video_format
stays NULL → CreateContext returns OPERATION_FAILED → ffmpeg-vaapi
hwaccel init fails with "Failed to create decode context: 1".
Verified on fresnel (kernel 7.0-14 / linux-fresnel-fourier):
v4l2-ctl -d /dev/video1 --list-formats → only NV12 enumerated
Fix: for 10-bit profiles, skip the find_format probe and directly
map to our NV15 video_format entry. The later S_FMT(CAPTURE) in
the same RequestCreateContext path commits the actual NV15 mode
once the synthetic-SPS injection sets bit_depth_luma_minus8=2.
Discovered during Phase 7 sub-profile verification — Criterion 1
(vainfo enumeration) PASSed but Criteria 2/3 (Hi10P/Main10 decode)
failed with the hwaccel init error. iter38 5/5 baseline still PASSES
(no regression — non-10-bit path unchanged).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Per Phase 4 plan + Phase 5 review amendments (SPS parse-and-cache,
per-fd gating).
src/h265.c additions:
- #include <errno.h>, the v4l2-hevc-ext-controls.h, and the
vendored gst/codecparsers/gsth265parser.h
- new static helper h265_populate_ext_sps_rps_cache(): walks
surface_object->source_data for an SPS NAL (nal_unit_type == 33)
using gst_h265_parser_identify_nalu; if found, calls
gst_h265_parser_parse_sps_ext (NOT gst_h265_parser_parse_sps —
the latter discards the per-RPS-entry EXT data we need); maps
GstH265ShortTermRefPicSet (base) + GstH265ShortTermRefPicSetExt
(carrying use_delta_flag[16], used_by_curr_pic_flag[16],
delta_poc_s0_minus1[16], delta_poc_s1_minus1[16]) into the V4L2
struct arrays; stores on driver_data->hevc_rps_cache_*
- non-IDR-frame handling: cache holds across frames, so frames
whose source_data lacks an SPS NAL reuse the previously-parsed
cached arrays (Phase 5 review item #3)
- controls[] grows from [5] to [7]; the 2 new entries are appended
after the standard 5 (SPS/PPS/SLICE_PARAMS/SCALING_MATRIX/
DECODE_PARAMS), gated by driver_data->has_hevc_ext_sps_rps_rkvdec
(per-fd probe result from Step 3) + the cache being valid
- field-by-field mapping mirrors GStreamer's
gst_v4l2_codec_h265_dec_fill_ext_sps_rps verbatim (the upstream
reference identified in Phase 0 prior-art survey)
src/request.h additions:
- struct request_data carries hevc_rps_cache_st (array pointer),
_st_count, hevc_rps_cache_lt, _lt_count, hevc_rps_cache_valid.
Single-slot cache (sps_id 0 only; multi-SPS streams would need
expanding). Stores POST-MAPPED V4L2 structs so request.h doesn't
need to know GstH265SPS / GstH265SPSEXT types.
Critical interpretation correction (Phase 5 review followup):
GstH265SPS has short_term_ref_pic_set[65] (base) but NOT
short_term_ref_pic_set_ext[]. The EXT array lives on a SEPARATE
GstH265SPSEXT struct accessed via gst_h265_parser_parse_sps_ext.
The 'plain' gst_h265_parser_parse_sps internally calls _ext with a
LOCAL discarded SPSEXT (see gsth265parser.c:2050). Our call must
use the _ext variant directly to keep the EXT data. Caught during
Step 4 first-build error.
Build verified: ninja -C build clean. .so is 759 KB (up from 485 KB
original, 682 KB after Step 2 vendor — the +80 KB is the new helper
+ extension).
iter2 Phase 6 Step 5 (install + reboot + smoke-test) is the F1
falsifier moment: if HEVC stops OOPSing, mechanism confirmed; if it
still OOPSes, loopback Phase 0 with re-opened kernel-agent#11.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
src/hevc-ctrls/v4l2-hevc-ext-controls.h (NEW, MIT, ~95 LOC):
Verbatim mirror of Linux 7.0 V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS
and _LT_RPS control IDs + struct definitions + flag macros. Each
symbol is ifndef-guarded so when ampere's linux-api-headers
eventually bumps to 7.0+, the kernel header takes precedence and
this shim silently no-ops. Citation block links the upstream
Casanova v8 series.
Per LGPL section 3.b, kernel UAPI struct definitions are excepted
from GPL inheritance, so copying them into MIT userspace is fine.
src/request.h: added has_hevc_ext_sps_rps_rkvdec + _hantro bool
fields on struct request_data — pair-of-flags layout mirrors
video_fd_rkvdec / video_fd_hantro (iter38 multi-device-probe
pattern, per feedback_multi_device_probe_design). Phase 5 review
identified single-scalar storage as a silent-misbehavior risk
across device-switch boundaries.
src/request.c:
- new probe_hevc_ext_sps_rps_controls(fd) helper: queries the two
new CIDs via VIDIOC_QUERYCTRL; returns true iff both register.
RK3399 rkvdec (linux 6.x or 7.x without VDPU381/383 bindings)
returns false; RK3588 rkvdec (VDPU381/383) returns true.
- probe each driver_data->video_fd_rkvdec / _hantro after the
iter38 multi-device-probe block at VA_DRIVER_INIT time
- log-line if rkvdec supports it - diagnostic for Phase 7
src/meson.build: added the new UAPI header to the headers list.
Build verified: ninja -C build clean, .so produced. The new probe
runs at driver init and stores the result, but nothing CONSUMES the
result yet — that's Step 4 (h265_set_controls wiring).
Per ampere-kernel-decoders campaign iter2 Phase 4 step 3 (amended
by Phase 5 review item 'per-fd storage').
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Vendored gsth265parser + nalutils + gstbitreader + gstbytereader (the
Step 1 commit) compile cleanly against libc + libv4l2 only after
adding 1 compat translation unit + 5 stub headers, no edits to the
vendored .c/.h files themselves.
src/h265_parser/gst_compat.{h,c} — new files (MIT, original work):
- GLib type aliases (gboolean, gchar, gint*, guint*, gsize, gpointer)
- Memory helpers (g_malloc/g_free as #define free, g_memdup2 inline)
- Asserts as no-op + parser-return-code-propagation
- All GST_DEBUG/INFO/WARNING/ERROR/LOG/FIXME as no-ops (the parser
is heavy on debug logging; we compile it all out)
- GArray implementation (~100 LOC, just enough for gsth265parser.c's
24 call sites)
- GList full struct with .data/.next/.prev so callers compile;
list-manipulation functions abort() — dead code paths only
- Byte-order read/write macros (GST_READ_UINT8/16/24/32/64_LE/BE,
GST_WRITE_UINT8/16/24/32_BE) — aarch64 LE inlines
- g_once_init_enter/leave as simple gate
- G_MAXUINT*, G_MAXINT*, G_MINxxx, G_GNUC_* attribute macros, etc.
- Opaque GstBuffer/GstMemory/GstMapInfo + abort-stub functions for
the encoder-side SEI-insertion paths the libva backend never invokes
- gst_util_ceil_log2 real impl (used by slice-header parser; dead
for our SPS-only call path but cheaper to implement than stub)
src/h265_parser/gst/{gst.h,base/base-prelude.h,base/gstbitwriter.h,
codecparsers/codecparsers-prelude.h,glib-compat-private.h} — 5 new
stub headers (MIT). All include gst_compat.h. gstbitwriter.h adds
abort-stub functions for the bit-writer API (used by nalutils.c's NAL
emulation-prevention encoder path — dead code for the parse-only
libva backend).
src/meson.build — added the 5 new .c source files and 10 new .h
headers; added include_directories('h265_parser') to the include path
so the vendored files' '#include <gst/base/...>' style references
resolve to the stub headers + actual vendored files in the local
tree.
Build verified: ninja -C build produces v4l2_request_drv_video.so
(682 KB, up from 485 KB pre-vendor — the +200 KB is the vendored
parser code). nm shows gst_h265_parse_sps, gst_h265_parse_sps_ext,
gst_h265_parser_identify_nalu, and the other functions we need for
Step 4 are present in the binary.
Two #warning messages from gsth265parser.h about API stability are
upstream-intentional and harmless ('The H.265 parsing library is
unstable API and may change in future').
This commit completes Step 2 of ampere-kernel-decoders iter2 Phase 6.
Backend remains functionally identical to pre-iter2 — the new code
compiles + links but is not yet called from h265_set_controls (that's
Step 4). Existing 5 codecs continue to work as before.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Source: gitlab.freedesktop.org/gstreamer/gstreamer @ commit 43421c2a5b8a
(refs/tags/1.28.2). All 8 vendored files copied verbatim into
src/h265_parser/:
gst-plugins-bad/gst-libs/gst/codecparsers/gsth265parser.c (168 KB)
gst-plugins-bad/gst-libs/gst/codecparsers/gsth265parser.h ( 92 KB)
gst-plugins-bad/gst-libs/gst/codecparsers/nalutils.c (13 KB)
gst-plugins-bad/gst-libs/gst/codecparsers/nalutils.h ( 8 KB)
gstreamer/libs/gst/base/gstbitreader.c ( 8 KB)
gstreamer/libs/gst/base/gstbitreader.h ( 10 KB)
gstreamer/libs/gst/base/gstbytereader.c ( 39 KB)
gstreamer/libs/gst/base/gstbytereader.h ( 25 KB)
Total ~11 KLOC, LGPL v2.1+ per original headers (Intel + Sreerenj
Balachandran + others). LGPL headers preserved verbatim. Backend's
existing COPYING.LGPL covers redistribution.
** Build is INTENTIONALLY BROKEN at this commit. ** GLib dependencies
(GArray, g_malloc, gboolean, GST_DEBUG, etc.) are not yet satisfied;
src/Makefile.am is not yet updated to include these files. Step 2
performs the GLib-to-libc mechanical adaptation; Step 3 wires the
header + Makefile.
This vendor-unchanged commit is the upstream-tracking baseline. When
GStreamer ships a parser bug fix, the future-sync workflow is:
git diff src/h265_parser/ HEAD..(this commit)
to surface our adaptations, then rebase those over the upstream fix.
Per ampere-kernel-decoders campaign iter2 Phase 4 §Step 1
(/home/mfritsche/src/ampere-kernel-decoders/phase4_plan_iter2.md).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fixes the rkvdec_hevc_prepare_hw_st_rps out-of-bounds kernel OOPS that
blocked HEVC decode on ampere (RK3588) per
marfrit/libva-v4l2-request-fourier#3 and ampere-fourier iter1 close.
Mechanism (Phase 5 amendment to issue body):
The new EXT_SPS controls are registered as V4L2_CTRL_FLAG_DYNAMIC_ARRAY
in vdpu38x_hevc_ctrl_descs (rkvdec.c:279/284) with cfg.dims = { 65 }.
The v4l2-ctrl framework init-allocates 1 zeroed element (ctrls-core.c:2116).
When num_short_term_ref_pic_sets > 1, rkvdec_hevc_prepare_hw_st_rps
(rkvdec-hevc-common.c:393-405) iterates idx 0..N-1 and overruns the
1-element kernel allocation. Submitting an N-element dynamic-array
control via S_EXT_CTRLS extends the framework allocation.
Userspace fix:
- VIDIOC_QUERY_EXT_CTRL probe at first HEVC CreateContext sets
driver_data->has_ext_sps_rps (true on VDPU381/383, false on legacy
RK3399 — control unregistered there, so fresnel iter38 5/5 + iter39
sub-profile paths are byte-identical to pre-iter2).
- When set, h265_set_controls appends EXT_SPS_ST_RPS + _LT_RPS as
calloc'd zero arrays, sized by VAAPI's count fields and capped at
H.265 §7.4.3.2 spec maxima (ST 64, LT 32). Min 1 (kernel rejects 0).
- Free post-S_EXT_CTRLS.
Decode correctness scope:
VAAPI does NOT expose per-set st_ref_pic_set syntax elements
(delta_idx_minus1, delta_rps_sign, etc.) — confirmed in va_dec_hevc.h.
All-zero entries give empty inter-pred RPS per set, which is correct
for IDR-only streams and incorrect for streams with inter-pred RPS
dependence. iter2 acceptance: stop the OOPS. Decode-correctness for
inter-RPS content is a known follow-up requiring either bitstream-snoop
or SPS-passthrough via a new VAAPI extension.
Files:
- include/hevc-ctrls.h: #ifndef-guarded fallback definitions for
V4L2_CID_STATELESS_HEVC_EXT_SPS_{ST,LT}_RPS + structs (ampere host
is on linux-api-headers 6.19-1; the new CIDs land in 7.0).
- src/request.h: driver_data->has_ext_sps_rps (persists for driver
lifetime; gated solely by HEVC code path so cross-codec leakage
impossible).
- src/context.c: probe at HEVC CreateContext via v4l2_query_ext_ctrl.
- src/h265.c: controls[5] → controls[7]; #include <hevc-ctrls.h>
(replaces <linux/v4l2-controls.h>) for forward UAPI compatibility.
Compile-tested on boltzmann (aarch64 native, gcc 15.2.1): clean .so,
0 new warnings. Fresnel cross-device safety: legacy RK3399 rkvdec_ctrl
table omits the CIDs; probe returns false; new code path never executes.
iter39 sub-profile work (commits 662f887 + 8746690) is preserved
in-tree; iter2 is a forward-compatible additive change.
Refs:
marfrit/libva-v4l2-request-fourier#3
ampere-fourier/iter1_close.md HEVC blocker
ampere-fourier/iter2_phase0_findings.md
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds VAProfileH264High10 and VAProfileHEVCMain10 to the libva-v4l2-request
backend. RK3399 rkvdec emits decoded frames as V4L2_PIX_FMT_NV15 (4 × 10-bit
values packed in 5 bytes per element); VAAPI consumers receive standard
VA_FOURCC_P010 via a new userspace unpack in copy_surface_to_image.
VP9 Profile 2 explicitly NOT added — RK3399 rkvdec kernel ctrl table
caps at V4L2_MPEG_VIDEO_VP9_PROFILE_0 (rkvdec.c::rkvdec_vp9_ctrl_descs).
Touchpoints (per Phase 5 sonnet-architect review amendments):
- include/drm_fourcc.h: define DRM_FORMAT_NV15 (vendored libdrm lacks it)
- src/nv15.{c,h}: NV15 → P010 plane unpack (LSB-first, per
Documentation/userspace-api/media/v4l/pixfmt-nv15.rst)
- src/video.c: NV15 entry in formats[] (else NULL-deref on video_format_find)
- src/codec.c: pixelformat_for_profile cases for Hi10P + Main10
- src/config.c: enumeration, validation, entrypoints, RT_FORMAT_YUV420_10
advertisement for 10-bit profiles
- src/context.c: per-profile CAPTURE pix_fmt (NV12/NV15), 10-bit synthetic
SPS (bit_depth_luma_minus8=2), video_format invalidation on bit-depth
transition (sibling to iter38 device-switch invalidation), is_10bit flag
- src/surface.c: RT_FORMAT_YUV420_10 admission, NV15 fourcc on PRIME export
- src/image.c: P010 reporting in DeriveImage + QueryImageFormats,
P010-aware sizing in CreateImage, NV15 → P010 unpack call in
copy_surface_to_image (gated on is_10bit + image.format.fourcc == P010)
- src/picture.c: 4 switch blocks route Hi10P/Main10 to existing H264/HEVC
per-codec paths
- src/request.h: MAX_PROFILES bump 11 → 13, driver_data->is_10bit flag
Scope: COPY path (vaGetImage / vaDeriveImage) only. Standard ffmpeg-vaapi
hwdownload, mpv vaapi-copy, and any consumer using vaGetImage works
end-to-end. PRIME-path consumers that only know NV12/P010 must use the
COPY path; PRIME consumers aware of NV15 (panfrost-Mesa et al.) get the
correct fourcc on RequestExportSurfaceHandle. PRIME-side P010 emission is
follow-up scope (would need DRM_FORMAT_P010 + per-plane unpack into a
GPU-accessible buffer).
Compile-tested on boltzmann (aarch64 native, gcc 15.2.1, libva 1.23.0,
libdrm 2.4.133): clean build, .so produced, 0 new warnings.
Phase 0/2 evidence: linux-mmind-v7.0 drivers/media/platform/rockchip/rkvdec.
rkvdec_h264_decoded_fmts[] and rkvdec_hevc_decoded_fmts[] both list NV15;
ctrl tables cap at HEVC MAIN_10 and H264 HIGH_422_INTRA (Hi10P < cap, not
in menu_skip_mask). image_fmt resolution (rkvdec-h264-common.c:196,
rkvdec-hevc-common.c:467) dispatches on bit_depth_luma_minus8 only.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Latent bug surfaced by iter38 multi-device probe. profiles[] array
in RequestQueryConfigProfiles is sized by V4L2_REQUEST_MAX_PROFILES
(set as context->max_profiles=11 in VA_DRIVER_INIT), but the bounds
checks used V4L2_REQUEST_MAX_CONFIG_ATTRIBUTES (10). Pre-iter38 only
a single device's profiles were enumerated, total ≤9, so the off-by-
one never bit. With iter38's rkvdec+hantro union (10 profiles total
across MPEG2/H264/HEVC/VP8/VP9), the last enumerator (VP9) hit
index=9 with the check 'index < 10-1 = 9' → skipped.
Probe BOTH rkvdec and hantro-vpu at VA_DRIVER_INIT and keep their
{video,media}_fd pairs in driver_data. RequestQueryConfigProfiles
enumerates the union of supported profiles from all open fds.
RequestCreateConfig retargets driver_data->{video,media}_fd to the
device that serves the requested profile; if a switch is needed
(active fd is wrong), tears down output_pool, capture_pool, video_format
cache, and fmt_valid so the next RequestCreateContext rebuilds them
on the new device.
Profile→device map (RK3399-shaped):
H264 / HEVC / VP9 → rkvdec
MPEG-2 / VP8 → hantro-vpu
Honours LIBVA_V4L2_REQUEST_VIDEO_PATH / MEDIA_PATH explicit overrides
(skips alt-probe when those are set).
Closes the 'libva multi-device probe' open item from iter36/iter37
campaign-close.
α-26 (iter26) wrote VAAPI's picture->st_rps_bits to the V4L2 decode_params
field of the same name based on field-name match. Per V4L2 spec, this field
is the bit-count of st_ref_pic_set() *in the SPS* — VAAPI doesn't expose
that. The slice-header bit-count (which IS what VAAPI's st_rps_bits provides)
belongs in slice_params->short_term_ref_pic_set_size (handled correctly in
α-29).
rkvdec doesn't read decode_params->short_term_ref_pic_set_size, so the
misroute was harmless but stale. This revert restores spec-correct semantics
(0 when SPS bit-count is unknown).
Cosmetic cleanup; no functional change.
ROOT CAUSE FIX for VP8 libva decode garbage output.
ffmpeg-vaapi's vaapi_vp8.c:191-192 STRIPS the VP8 uncompressed
header (3 bytes for interframe, 10 bytes for keyframe) before
submitting the slice data via VAAPI. ffmpeg-v4l2request (kdirect)
KEEPS the header in its OUTPUT buffer.
Hantro's rockchip_vpu2_vp8_dec_run (rockchip_vpu2_hw_vp8_dec.c:349)
hard-codes 'first_part_offset = V4L2_VP8_FRAME_IS_KEY_FRAME(hdr) ? 10 : 3'
as the byte offset into OUTPUT where the first compressed partition
starts. It uses this offset for:
- mb_offset_bits = first_part_offset * 8 + first_part_header_bits + 8
- dct_part_offset = first_part_offset + first_part_size
Without the header, every offset is wrong, the entropy decoder
spins on the wrong bytes, and every frame decodes to garbage.
Fix: in codec_store_buffer for VAProfileVP8Version0_3, prepend
header_size bytes (10 keyframe / 3 interframe) of zeros to OUTPUT
before the slice data memcpy. Hantro skips these bytes for actual
parsing (uses ctrl-struct values instead), so zero-fill is fine.
Empirical: iter33 kernel printk in vpu2_vp8_dec_run dumped the
v4l2_ctrl_vp8_frame struct for libva vs kdirect and confirmed
byte-identical control fields. Only the OUTPUT buffer bytes
differed, traced to ffmpeg-vaapi's header stripping.
LIBVA_VP8_DUMP_FRAME=1 prints the v4l2_ctrl_vp8_frame struct fields
to stderr before VIDIOC_S_EXT_CTRLS. Goal: diff libva-side struct
against expected kdirect-side values for VP8 frame-2+ divergence
(libva produces non-trivial but wrong output; kdirect VP8 byte-equal
to SW). Env-gated, no behavior change otherwise.
ROOT CAUSE FIX for HEVC frame 2+ divergence (Bug 5 remainder).
rkvdec's assemble_sw_rps (rkvdec-hevc.c:386-389) uses
sl_params->short_term_ref_pic_set_size to compute the bit offset where
long-term RPS data starts in the slice header. When zero, it falls back
to fls(num_short_term_ref_pic_sets - 1) — wrong when num=1 (BBB's case).
α-26 misdirected: set decode_params->short_term_ref_pic_set_size = st_rps_bits
but rkvdec doesn't use that field. The correct consumer is slice_params per
V4L2 spec and rkvdec source.
VAAPI's picture->st_rps_bits is documented as: 'number of bits that structure
short_term_ref_pic_set(num_short_term_ref_pic_sets) takes in slice segment
header when short_term_ref_pic_set_sps_flag equals 0' — exactly what
sl_params->short_term_ref_pic_set_size means.
Frames 1 (IDR) unaffected (V4L2 rkvdec gates on !IDR_PIC flag).
Frames 2+: bit offset for long-term RPS now correct, slice header parsing
no longer falls off the edge of the entropy bitstream.
Default behavior unchanged: counter*1000ns same as before.
With LIBVA_TS_SCALE=N, multiplies the ns timestamp by N. Lets us
sweep timestamp magnitude to test whether small-ts collides with
stale CAPTURE entries in vb2_find_buffer for HEVC frame 2+ bug.
Also keeps iter29 slice-tail probe from previous commit.
Env-gated via LIBVA_HEVC_DUMP_SLICE_TAIL=1. Goal: characterise the
40-byte inflation in libva's slice_data buffer vs ffmpeg-v4l2request
(see iter27/28 close — HEVC frame 2+ divergence at byte 1382401).
Dumps per slice: nal_unit_type, slice_data_size, slice_data_byte_offset,
and the last 80 bytes of source_data for that slice. Lets us see if the
trailing 40 bytes are (a) real entropy, (b) trailing zeros, (c) a
next-NAL start code prefix, or (d) random memory.
iter28b tested LIBVA_HEVC_TRIM_TRAILING=40 on HEVC. Result: hash
differed at byte 899745 (inside frame 1, NOT just frame 2 boundary at
byte 1382401). Trimming 40 bytes off the IDR slice (96890→96850)
corrupted frame 1. The 40-byte inflation is not uniform per slice;
requires dynamic detection (e.g., scan for rbsp_stop_one_bit) or
per-slice-type logic.
VAAPI's slice_data_size includes NAL+slice header bytes that precede the
slice payload. rkvdec_hevc expects bit_size to cover the slice payload
(starting at data_byte_offset). Setting bit_size = slice_data_size * 8
made rkvdec read past slice payload → wrong entropy state → frame 2+
garbage despite correct ctx->image_fmt (iter25) and decode_params
(iter26).
Empirical match: with formula (slice_data_size - slice_data_byte_offset)
* 8, libva produces bit_size=44096 for BBB frame 2 matching kdirect's
44096 exactly per iter27 dmesg printk.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VAPictureParameterBufferHEVC exposes st_rps_bits — the number of bits
the inline short_term_ref_pic_set syntax element takes in the slice
header. rkvdec's DPB resolution for P/B frames uses this to skip the
RPS data correctly; with size=0 it skips wrong bytes and reads wrong
references → frame 2+ visual divergence.
iter25 evidence: libva HEVC frame 1 byte-identical to kdirect, but
frame 2 diverges at the decode_params bytes 4-5 (libva 0x00 0x00,
kdirect 0x0a 0x00 = 10).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
rkvdec_h264_validate_sps doubles height when FRAME_MBS_ONLY is unset
(field-to-frame). Dummy with 1080-height was failing validation as
2176 > 1080, returning -EINVAL silently (void-cast). Even though libva
ignores the result of v4l2_set_controls, the side effect was leaving
ctx->image_fmt at ANY → first per-frame H264_SPS still hit -EBUSY in
try_or_set_cluster → setup loop broke (Bug 4 unchanged).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Root cause for Bug 5 (HEVC libva = all-zero CAPTURE) and Bug 4 (H.264
libva = keyframe partial), localized via iter17→iter24 kernel-printk
chain:
rkvdec_s_ctrl() for HEVC_SPS / H264_SPS calls get_image_fmt() and,
if the resolved image_fmt differs from cached ctx->image_fmt (default
RKVDEC_IMG_FMT_ANY at open), tries to reset the CAPTURE format.
Format reset returns -EBUSY when vb2_is_busy(CAPTURE_queue) — any
CAPTURE buffer allocated blocks the change.
libva (iter5b-β) pre-allocates 24 CAPTURE buffers at CreateContext
via cap_pool_init, BEFORE any per-frame S_EXT_CTRLS. First per-frame
HEVC_SPS therefore fails with -EBUSY in try_or_set_cluster, breaks
v4l2_ctrl_request_setup's outer loop, leaves all 5 staged HEVC
compound controls at zero in ctx->ctrl_hdl. rkvdec_hevc_run reads
zero (iter20 dmesg: sps[0..16]=00..00), hardware sees w=0 h=0,
CAPTURE comes out all-zero (Bug 5).
Fix: BEFORE cap_pool_init, inject one S_EXT_CTRLS (no request, no
which) with a synthetic SPS containing the profile's known chroma +
bit_depth. CAPTURE queue is still empty at this point → vb2_is_busy
returns false → rkvdec_s_ctrl succeeds, ctx->image_fmt is updated to
the profile's image_fmt. From then on, per-frame SPS submissions with
matching chroma + bit_depth see image_fmt_changed=false → skip reset
→ commit succeeds.
VP9 / MPEG-2 / VP8 paths are not affected: VP9's rkvdec coded_fmt_desc
has no get_image_fmt op; MPEG-2 + VP8 route to hantro.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Env-gated by LIBVA_V4L2_REQ_GETBACK. After v4l2_set_controls() against
the request_fd in h265_set_controls(), issue G_EXT_CTRLS with the same
request_fd targeting SPS and log first 16 bytes returned.
iter20 (kernel printk) found rkvdec sees all-zero ctx->ctrl_hdl SPS for
libva HEVC vs correct bytes for kdirect. The remaining branch is whether
req->p_new was ever staged with libva's payload, or whether
v4l2_ctrl_request_setup failed to apply it.
α-24 distinguishes the two:
zero readback -> staging failed in v4l2_s_ext_ctrls
non-zero -> apply failed in v4l2_ctrl_request_setup
EACCES -> kernel disallows req readback; need deeper printk
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
If removing DECODE_PARAMS from libva's S_EXT_CTRLS batch lets the other
4 controls stage, rkvdec_hevc_run printk will show w=1280 h=720 etc.
That confirms DECODE_PARAMS specifically is failing kernel validation
and rolling back the whole batch.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Tests mechanism 5 (silent partial failure). If error_idx != count after
S_EXT_CTRLS, one of the per-request controls was rejected by the kernel
even though the ioctl returned 0.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Static storage for sps/pps/decode_params/scaling_matrix + no-free for
slice_params_array. Tests the kernel-defers-compound-copy hypothesis
from iter17 P7 finding.
If hashes change -> mechanism 3 confirmed; will refactor to per-surface
heap allocation.
If hashes unchanged -> mechanism 3 disproved; iter19 explores
mechanisms 1/2/5.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>