iter39: extend auto-probe to a 3rd fd for RK3588 rockchip,rk3588-av1-vpu-dec #2
Motivation
On ampere (RK3588), the V4L2 decoder topology is three independent decoder cores, not two:
(/dev/media2 is rockchip,rk3588-vepu121-enc: encoder, out of scope.)
The libva-v4l2-request-fourier auto-probe introduced in iter38 (request.c::request_data carrying video_fd_rkvdec + media_fd_rkvdec + video_fd_hantro + media_fd_hantro) is hard-capped at 2 fds: one rkvdec + one hantro instance. The RK3588 av1-vpu-dec is also a hantro-driven device, but it is a different hantro instance from the generic rk3568-vpu-dec one (different DT compatible, different /dev/video* node, different supported pixfmts). The iter38 probe picks the first hantro it finds and skips the rest, so /dev/video4 (AV1) is never opened.
Result: vainfo does not enumerate VAProfileAV1 on ampere even though the kernel exposes AV1F cleanly via v4l2-ctl -d /dev/video4 --list-formats-out.
Proposed iter39
Generalize struct request_data from per-kind fixed slots to a table or array of (fd, media_fd, driver_kind, codec_set) tuples. Three sub-changes:
1. Replace video_fd_rkvdec + …_hantro (and helpers such as request_device_kind_for_profile() returning 'r' / 'h') with struct decoder_slot decoders[MAX_DECODERS]; where each slot tracks fd, media-fd, driver-kind (rkvdec / hantro-rk3568 / hantro-rk3588-av1 / …), and the V4L2 OUTPUT pixfmts it advertises.
2. Walk /dev/media* (or /dev/video*, whichever enumeration is simpler); for each media node read driver name + card type via MEDIA_IOC_DEVICE_INFO, classify into a known driver-kind via a small table, and add a slot. Cap at a sensible MAX_DECODERS (3 or 4 for the current fleet).
3. RequestCreateConfig per-profile routing becomes a table lookup: profile → required driver-kind → find a matching slot → switch active driver_data->video_fd / media_fd. Or keep the slot-array indexed access alive past CreateConfig if the decode loop ever needs to retarget mid-frame (it doesn't today, but the API allows it).
RequestQueryConfigProfiles already iterates fds + unions per iter38b's any_fd_supports_output_format() helper; just extend that helper to iterate the slot array.
Boundary
This is pure userspace, backend-only. No kernel changes. The kernel already exposes the AV1 decoder; the backend just needs to look for it.
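The slot table proposed above can be sketched in plain C without touching the kernel or any device node. This is an illustration under assumptions, not the backend's implementation: the enum names, classify_decoder(), find_slot(), MAX_OUTPUT_FMTS, SKETCH_FOURCC and the substring match against an identification string are all hypothetical. Only struct decoder_slot, MAX_DECODERS, the driver-kind names and the AV1F fourcc come from the issue text.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_DECODERS    4 /* issue text: "3 or 4 for the current fleet" */
#define MAX_OUTPUT_FMTS 8 /* hypothetical per-slot cap */

/* Same byte packing as the kernel's v4l2_fourcc() macro. */
#define SKETCH_FOURCC(a, b, c, d)                              \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) |                    \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* Hypothetical enum for the driver-kinds named in the issue. */
enum decoder_kind {
    DECODER_KIND_UNKNOWN = 0,
    DECODER_KIND_RKVDEC,
    DECODER_KIND_HANTRO_RK3568,
    DECODER_KIND_HANTRO_RK3588_AV1,
};

/* One probed decoder: its fds, its kind, and its OUTPUT pixfmts. */
struct decoder_slot {
    int video_fd;
    int media_fd;
    enum decoder_kind kind;
    uint32_t output_fmts[MAX_OUTPUT_FMTS];
    size_t num_output_fmts;
};

/* Small classification table: substring -> driver-kind (assumed matching). */
struct kind_rule {
    const char *needle;
    enum decoder_kind kind;
};

static const struct kind_rule kind_rules[] = {
    { "rk3588-av1-vpu-dec", DECODER_KIND_HANTRO_RK3588_AV1 },
    { "rk3568-vpu-dec",     DECODER_KIND_HANTRO_RK3568 },
    { "rkvdec",             DECODER_KIND_RKVDEC },
};

/* Map an identification string (e.g. a DT compatible) to a driver-kind. */
enum decoder_kind classify_decoder(const char *ident)
{
    for (size_t i = 0; i < sizeof(kind_rules) / sizeof(kind_rules[0]); i++) {
        if (strstr(ident, kind_rules[i].needle) != NULL)
            return kind_rules[i].kind;
    }
    return DECODER_KIND_UNKNOWN;
}

/* Return the first slot of the given kind that advertises the fourcc, or NULL. */
const struct decoder_slot *find_slot(const struct decoder_slot *slots,
                                     size_t count,
                                     enum decoder_kind kind,
                                     uint32_t fourcc)
{
    assert(count <= MAX_DECODERS);
    for (size_t i = 0; i < count; i++) {
        if (slots[i].kind != kind)
            continue;
        for (size_t f = 0; f < slots[i].num_output_fmts; f++) {
            if (slots[i].output_fmts[f] == fourcc)
                return &slots[i];
        }
    }
    return NULL;
}
```

A caller would fill one slot per probed node, then resolve VAProfileAV1 with find_slot(slots, n, DECODER_KIND_HANTRO_RK3588_AV1, SKETCH_FOURCC('A', 'V', '1', 'F')) and switch the active fds to the returned slot. A NULL result would mean the profile is not enumerated. Routing rows for the other profiles are left out because the thread does not state which decoder serves which codec.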
The same generalization helps any future RK35xx variant that grows additional decoder cores (e.g. a hypothetical rk3588-vp9-dec if VP9 lands as its own dedicated block instead of inside rkvdec).
Acceptance
vainfo on ampere lists VAProfileAV1. ampere-fourier iter4 (planned) validates AV1 decode end-to-end via the iter39 backend through /dev/video4 against an AV1-encoded test clip (clip provenance TBD; BBB isn't AV1 by default).
Out of scope for iter39
missing multi-core support, ignoring this instance cores. Those nodes don't even register, so they're not user-visible. Multi-core glue is upstream-kernel work.
Refs
- ~/src/ampere-fourier/phase0_findings.md (decoder topology evidence)
- feedback_multi_device_probe_design (iter38 architecture, which iter39 generalizes)
- marfrit/libva-v4l2-request-fourier @ 7ac934e
Triage refresh 2026-05-18. Still valid. Acceptance criterion (vainfo on ampere lists VAProfileAV1) is unchanged: no progress on the AV1 enumeration goal.
What did happen instead (partial pattern precedent)
iter40 added a 3rd hardcoded fd pair to request.h for the Pi 5 HEVC decoder, NOT for ampere's AV1:
The iter40 commit (3ffa9d0 iter40: Pi 5 HEVC chapter — backend integration lands) shows the same shape the iter38 multi-device-probe established. So the architectural pattern works for 3 decoder kinds.
But the generalization proposed in this issue (the struct decoder_slot decoders[MAX_DECODERS] array) was NOT taken. Each new decoder kind currently means hardcoding another pair.
Operator decision paths
a) Minimal-delta path (matches established iter40 precedent): add a 4th hardcoded pair video_fd_hantro_av1 + media_fd_hantro_av1, a classifier in request_probe recognizing rk3588-av1-vpu-dec, and dispatch in RequestCreateConfig for VAProfileAV1. ~50 LOC, mirrors iter40 exactly. Drawback: the codebase carries N hardcoded pairs and will need the same surgery for every future decoder kind.
b) Generalize now (this issue's original proposal): refactor to a struct decoder_slot decoders[MAX_DECODERS] array, rewrite probe + dispatch as table-driven, migrate the existing 3 pairs into the array. Larger change (~200 LOC + test rewrite for iter38-baseline preservation), but caps future growth.
c) Wait for AV1 priority: AV1 source clip + Phase 4 byte-exactness criteria are also unaddressed (issue's own out-of-scope list). If nothing is actively decoding AV1 in the fleet today, this can wait; once a real consumer materializes, do (a) for the AV1 enablement plus (b) preemptively.
Recommend (a) at the time the fleet has a concrete AV1 use case (firefox-fourier playing YouTube AV1 streams is a candidate driver; see the daedalus-fourier README for the YouTube ∩ Pi5-HW = ∅ context, which makes ampere the right hardware for YouTube AV1). If no concrete consumer arrives in, say, a month, defer, i.e. take path (c).
The bigger refactor (b) is best landed concurrently with another decoder-kind addition where the operator wants to amortize the cost, not as a standalone cleanup pass.
Keeping open as waiting-on-operator-priority. No empirical reproduction needed for this one: the symptom (vainfo lacks AV1) is invariant until the backend gains the 4th-fd code path.
Headline acceptance criterion met 2026-05-18: one-line fix.
What turned out to be the bug
ampere's av1-iter1 branch has been doing the heavy lifting for months: Phase 2 step 2 (commit 61db76e) added the VAProfileAV1Profile0 enumeration in config.c; Phase 2 step 1 (commit bed75c0) added the video_fd_vpu981 slot + AV1F-discriminating probe; Phase 2 step 4 (78a9978) added ~500 LoC AV1 dispatch scaffolding; Phase 3 (d7ef0f6) reached 3/10 frames bit-exact vs kdirect. But vainfo was still not listing the profile.
Diagnosed via strace:
- v4l2-request: ampere-av1: vpu981 AV1 decoder at /dev/video4 + /dev/media3
- any_fd_supports_output_format() helper returns true
- if (found && index < (V4L2_REQUEST_MAX_PROFILES - 1)) fails because by the time the AV1 push is reached, 10 profiles are already in profiles[] (MPEG2×2 + H264×5 + HEVC + VP8 + VP9), index = 10, MAX_PROFILES = 11, so 10 < 11 - 1 = 10 → false → AV1 silently dropped.
Off-by-one in the bounds-check pattern. The comment at the AV1 push said "MAX_PROFILES=11 is now EXACTLY full with this addition", but the < MAX - 1 guard requires room for ANOTHER slot AFTER the push, so EXACTLY full doesn't fit through the guard.
Fix
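To make the off-by-one concrete, the guard can be modelled in isolation. This is a minimal sketch, not the backend's code: guard_admits_push() and guard_capacity() are hypothetical names, and 12 is used only because it is the smallest constant that lets an 11th profile through. The value actually chosen in the commit is not quoted in this comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the guard quoted above: index < (MAX_PROFILES - 1). */
bool guard_admits_push(size_t index, size_t max_profiles)
{
    assert(max_profiles > 0);
    return index < (max_profiles - 1);
}

/*
 * Count how many profiles can be pushed before the guard refuses.
 * With a "< MAX - 1" guard this is MAX - 1, not MAX.
 */
size_t guard_capacity(size_t max_profiles)
{
    size_t index = 0;

    while (guard_admits_push(index, max_profiles))
        index++;
    return index;
}
```

With max_profiles = 11 the guard admits only 10 pushes, so the AV1 push at index = 10 is refused; with 12 the same push passes. In other words, the constant has to be at least one larger than the number of profiles that may be enumerated.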
One-line bump in src/request.h. The other guards each take their own N-off-the-top for their push groups; their semantics are unchanged. context->max_profiles = V4L2_REQUEST_MAX_PROFILES sizes the consumer-side array automatically.
Verification 2026-05-18
Acceptance criterion met (vainfo on ampere lists VAProfileAV1).
State of the fix
Committed on ampere's local av1-iter1 branch as d21feba. Ampere's git remote only has origin = marfrit/libva-v4l2-request-fourier (no claude-noether SSH alias configured there), so I haven't pushed. Options:
a) Operator picks up d21feba and merges into the in-progress av1-iter1 work directly.
b) I add a claude-noether remote on ampere + push to claude-noether's fork + open a small PR for the 1-line fix.
c) I create a fresh branch on noether's checkout, port the 1-line change there, push via claude-noether normal flow.
The rest of the av1-iter1 work (3/10 frames bit-exact, film_grain handling, dispatch scaffolding) is in flight in the operator's own iteration — issue #2's headline ask is met today but the full AV1 decode bit-exact pass is Phase 4 work that this issue's "out of scope" section calls out.
Recommend closing this issue once the operator picks an integration path for the 1-line fix.
Cross-references: VP9 enablement issue #12 closed today with the same shape (1-line any_fd_supports_output_format extension covered AV1 too as a side effect). feedback_no_bbb_intro_frames is the cross-cutting discipline that this work also benefited from.
Closing: headline ask delivered.
PR #5 merged: master now has the vpu981 4th-fd probe + VAProfileAV1Profile0 enumeration + the defensive MAX_PROFILES bump 13 → 14 (following the off-by-one logic: total possible enumeration if iter39 Option B reverts = 13, so the guards need MAX ≥ 14).
vainfo on ampere lists VAProfileAV1Profile0 ✓ (verified pre-merge against the patched build; ampere's running .so was then restored to the in-progress av1-iter1 build so the operator's Phase 3-5 bit-exact work isn't disrupted).
End-to-end AV1 decode bit-exact is iter4 work that the av1-iter1 operator branch continues to drive (~500 LoC av1.c dispatch + film_grain wiring + reference_frame_ts plumbing). When that lands, the merge against today's master will be small (mostly the av1.{c,h} additions + the picture.c dispatch hook).
Reopen criteria
Today's coverage: