Locks the research question, captures hardware + kernel + userland state, walks the V4L2 decoder topology (3 decoder cores), enumerates libva backend behavior, anchors the in-session N=1 baseline (H.264 via rkvdec, VP8 + MPEG-2 via hantro — all decode end-to-end), and documents the three blockers (kernel HEVC OOPS in rkvdec_hevc_prepare_hw_st_rps; kernel VP9 not exposed on RK3588 rkvdec; libva backend iter38 hard-capped at 2 fds, AV1 unprobed). Predecessor carryover rules followed: backend source pin + memory entries carry; per-codec FPS numbers and bit-exact criteria do not. Open questions tabled for Phase 1 goal lock: 1. Success-metric scope: 3 / 5 / 6 codecs. 2. RK3588 bit-exact anchor (kdirect adapt vs SW byte-compare). 3. HEVC OOPS pre-decode vs post-decode (instrumentation Q). 4. firefox-fourier vendor defaults adequacy on RK3588. 5. AV1 source clip provenance for the eventual iter39 test. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
13 KiB
Phase 0 — Substrate / Motivation / Inventory
Closed 2026-05-16. Iteration 1 of the ampere-fourier campaign. Locks the research question, captures the substrate, anchors the in-session baseline, surfaces open questions.
Research question
Can the libva-v4l2-request-fourier backend (iter38b, fresnel-certified) deliver HW decode on RK3588 for every codec the SoC actually exposes via mainline-rc3 V4L2 stateless — and where it can't, what's the minimal kernel-agent experiment / backend iteration that closes each remaining codec?
Operator-supplied mechanism (verbatim 2026-05-16):
ampere should not have fourier-specific kernel patches. If any are necessary, those should be added as an experiment with kernel-agent.
So the campaign is userspace-first: validate the codecs that the clean kernel supports, and for everything else, file kernel-agent experiments (separate fleet target, not the baseline).
Hardware substrate
| Property | Value |
|---|---|
| Board | CoolPi CM5 GenBook |
| Compatible chain | coolpi,pi-cm5-genbook coolpi,pi-cm5 rockchip,rk3588 |
| Booted kernel | 7.0.0-rc3-devices+ from linux-ampere-fourier 7.0rc3.kafr1-1 (vanilla torvalds v7.0-rc3 + ampere DTS/board patches; no codec patches) |
| Boot stack | extlinux entry arch_mainline |
| kernel cmdline | … cma=256M cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory coherent_pool=2M memtest=2 |
| libva backend | libva-v4l2-request-fourier 1.0.0.r348.7ac934e-1 packaged BUT broken on HEVC (campaign sibling issue marfrit-packages#17). Working hand-build (md5 0c9a7efaab6a31b74536ac1abff18dfa, 484 800 B) reinstalled via sudo install -m644 over the package's file. |
| ffmpeg | ffmpeg-v4l2-request-fourier 2:8.1.r123329.b57fbbe-4 |
| mpv | mpv-fourier 1:0.41.0-10 (iter2, ships --hwdec=v4l2request-copy) |
| libva runtime | libva 2.23.0-1, libva-utils 2.22.0-1, libdrm 2.4.133-1, linux-api-headers 6.19-1 |
/usr/include/linux/v4l2-controls.h sha256 f70d5224… — same as fresnel and fermi (verified during marfrit-packages#17 forensics). V4L2 control struct layout is byte-identical across the fleet.
V4L2 decoder topology (v4l2-ctl -d /dev/videoN --info)
| Node | Media | Driver | Card type (DT compatible) | OUTPUT pixfmts |
|---|---|---|---|---|
/dev/video0 |
— | rockchip-rga |
rockchip-rga |
BA24 / AR24 / XR24 / RGB3 / BGR3 / AR12 / AR15 / RGBP / XR15 / XB24 / AB24 / BX24 / RA24 / BA15 / RA15 / YU12 / YV12 / NV12 / NV21 / YUYV / UYVY / NV16 / NV61 / YM12 / YM21 / NM12 / NM21 / YM16 / YM61 / TL44 / RM12 — 2D blit, out of scope |
/dev/video1 |
/dev/media0 |
rkvdec |
(no DT card name in this kernel) | S265 (HEVC), S264 (H.264) |
/dev/video2 |
/dev/media1 |
hantro-vpu |
rockchip,rk3568-vpu-dec |
S264 (H.264), MG2S (MPEG-2), VP8F (VP8) |
/dev/video3 |
/dev/media2 |
hantro-vpu |
rockchip,rk3588-vepu121-enc |
YM12 / NM12 / YUYV / UYVY — encoder, out of scope |
/dev/video4 |
/dev/media3 |
hantro-vpu |
rockchip,rk3588-av1-vpu-dec |
AV1F (AV1) |
/dev/video5, /dev/video6 |
/dev/media4 |
uvcvideo |
USB Camera | not decoders |
Three independent decoder cores. No VP9 pixfmt is exposed by any node on this kernel.
dmesg at boot also reports four … missing multi-core support, ignoring this instance lines (fdc40100, fdba4000, fdba8000, fdbac000) — RK3588 has additional rkvdec+hantro cores that mainline doesn't bind because the multi-core glue isn't implemented. So the advertised throughput cap is single-core per block, not full-SoC.
libva backend behavior
$ LIBVA_DRIVER_NAME=v4l2_request vainfo
v4l2-request: auto-selected codec device: /dev/video1 + /dev/media0
v4l2-request: iter38: also opened hantro-vpu decoder at /dev/video2 + /dev/media1
vainfo: VA-API version: 1.23 (libva 2.22.0)
vainfo: Driver version: v4l2-request
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple : VAEntrypointVLD
VAProfileMPEG2Main : VAEntrypointVLD
VAProfileH264Main : VAEntrypointVLD
VAProfileH264High : VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointVLD
VAProfileH264MultiviewHigh : VAEntrypointVLD
VAProfileH264StereoHigh : VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointVLD
VAProfileVP8Version0_3 : VAEntrypointVLD
9 profiles — same count as iter38 pre-bounds-fix on fresnel (no VP9 because the kernel doesn't expose it for rkvdec to pick up; no AV1 because the iter38 auto-probe is hard-capped at 2 fds: one rkvdec + one hantro-vpu instance). The third decoder (/dev/video4 AV1) is therefore invisible to the libva session even though the kernel exposes it.
In-session baseline anchor (Phase 0 N=1)
Test material: BBB 60 s 720p, re-encoded on ampere via ffmpeg-v4l2-request-fourier with same encoder presets as fresnel-fourier iter1 measurements (x264 medium CRF 23, x265 fast CRF 28, mpeg2video qscale 4, libvpx CRF 30, libvpx-vp9 CRF 30). 5 clips in ~/measurements/encoded/.
Probe (libva ffmpeg, ~/measurements/safe_probe.sh), single-shot post-reboot, full clip decode:
=== ampere baseline: libva HW decode probe (H.264 + VP8 + MPEG-2) ===
h264 rkvdec /dev/video1 OK elapsed=3.15s
vp8 hantro /dev/video2 OK elapsed=6.62s v4l2-request: Unable to set control(s): Invalid argument
mpeg2 hantro /dev/video2 OK elapsed=7.23s v4l2-request: Unable to set control(s): Invalid argument
3-of-6 codecs decode end-to-end. The "Unable to set control(s)" warnings on VP8 / MPEG-2 are from the cross-codec capability probe phase (rkvdec rejects VP8 / MPEG-2 controls and hantro rejects HEVC controls; iter38 auto-probe walks both fds at startup, captures these as warnings, ignores). Not a decode-time failure.
dmesg clean post-probe — only the usual boot init lines, no OOPSes.
This is N=1, not enough to lock as a load-bearing floor. Phase 3 will repeat at N=3+ with real instrumentation (lsof on the v4l2 fds, perf counters on rkvdec/hantro IRQs, libva trace, byte-comparison of decoded YUV vs SW reference — not stdout-grepped FPS).
Codecs blocked, and why
HEVC — kernel OOPS in rkvdec_hevc_prepare_hw_st_rps
First HEVC decode attempt produced this kernel oops (full dmesg captured):
Modules linked in: ...rockchip_vdec...
lr : rkvdec_hevc_prepare_hw_st_rps+0x38/0x300 [rockchip_vdec]
Call trace:
__pi_memcmp+0x10/0x110 (P)
rkvdec_hevc_assemble_hw_rps+0x1c/0xac [rockchip_vdec]
rkvdec_hevc_run+0x12c/0x148 [rockchip_vdec]
rkvdec_device_run+0x48/0x104 [rockchip_vdec]
v4l2_m2m_try_run+0x80/0x13c [v4l2_mem2mem]
v4l2_m2m_request_queue+0xe8/0x140 [v4l2_mem2mem]
media_request_ioctl_queue+0x1bc/0x2f4 [mc]
media_request_ioctl+0xc8/0x160 [mc]
__arm64_sys_ioctl+0xa4/0x100
Cascading effect: after this oops the whole v4l2_mem2mem path wedges. Subsequent H.264 (rkvdec) AND VP8 / MPEG-2 (hantro) ffmpeg -hwaccel vaapi invocations block in futex wait inside libva, with the underlying ioctl thread blocked in the kernel m2m queue. The wedge persists until reboot.
This is a kernel bug in the RK3588 mainline rkvdec HEVC path — specifically the RPS (Reference Picture Set) preparation function. Not a libva backend bug (the working hand-built 0c9a7efa… backend is the one in use; the libva controls are what fresnel sends successfully to its own RK3399 rkvdec). The faulting instruction is __pi_memcmp — likely a NULL or unmapped pointer dereference during RPS comparison.
Per memory feedback_va_st_rps_bits_is_slice_field: fresnel iter31 had a related RPS issue, but that one was on the libva side (writing st_rps_bits into slice_params rather than decode_params). The libva-side fix is already in place at backend 7ac934e. The ampere oops is downstream of that — kernel-side.
Path to unblock: kernel-agent experiment with a candidate fix. Either:
- Find an existing upstream fix on linux-media or rockchip-linux trees (RK3588 rkvdec HEVC may have a fix in a later -rc).
- Bisect within mainline to localize the regression.
- Write a defensive guard around the memcmp call in
rkvdec_hevc_prepare_hw_st_rps.
Filed as separate kernel-agent issue to follow.
VP9 — kernel doesn't expose V4L2_PIX_FMT_VP9_FRAME on rkvdec
v4l2-ctl -d /dev/video1 --list-formats-out returns only S264 + S265. The kernel module v4l2_vp9 is loaded (per lsmod: used by rockchip_vdec and hantro_vpu), but the RK3588 rkvdec binding does not register V4L2_PIX_FMT_VP9_FRAME as an output format. Likely needs DTS work + a driver patch to add the VP9 control set to the RK3588 rkvdec variant_ops, matching what RK3399 has.
Path to unblock: kernel-agent experiment with VP9 enablement patch for RK3588 rkvdec. Out of scope for the baseline per operator policy.
AV1 — libva backend iter38 only probes 2 fds, not 3
/dev/video4 (AV1) is exposed by the kernel — v4l2-ctl --list-formats-out shows AV1F. But the libva-v4l2-request-fourier auto-probe (iter38 multi-device-probe in src/request.c::request_data) tracks only video_fd_rkvdec + video_fd_hantro — there's no slot for a third decoder kind. The AV1 fd is never opened, so VAProfileAV1 never appears in vainfo.
Path to unblock: backend iter39. Extend struct request_data to a third fd (or to a table-driven dispatch over an arbitrary number of fds), update request_device_kind_for_profile() to return 'a' (av1) and route AV1 profiles to the new fd, add bounds and termination handling. Pure userspace work; no kernel involvement.
Predecessor data — what carries vs what doesn't
Per feedback_dev_process.md Phase 0 rules:
Carries (state):
- libva-v4l2-request-fourier source pin
7ac934e(iter38b, fresnel-certified) - The marfrit-packages CI build bug for libva (
marfrit-packages#17) — sameae611d80…packaged binary deployed to ampere, same workaround (hand-build over) applies - The encode preset set (x264 medium CRF 23, x265 fast CRF 28, mpeg2video qscale 4, libvpx CRF 30, libvpx-vp9 CRF 30)
- Memory entries that govern decoder-output verification (
feedback_pixel_compare_in_yuv_not_png,feedback_hw_decode_engagement_check,feedback_rockchip_pixel_verify_path)
Does NOT carry (data):
- Per-codec FPS numbers from fresnel-fourier iter1 measurements (RK3588 ≠ RK3399, different CPU + decoder block specs, no read-across)
- Per-codec bit-exact PASS criterion (fresnel's libva == kdirect held on RK3399; needs separate proof on RK3588)
- mpv
--hwdec=v4l2request-copy236 FPS for H.264 from fresnel — recompute on ampere
Ampere binding cells in this campaign anchor to in-session-acquired numbers, even when fresnel measured an identical condition.
Open questions tabled into Phase 1
- What is the success metric? "All 6 codecs HW-decode" requires three external dependencies (kernel HEVC fix, kernel VP9 enablement, backend iter39). Phase 1 must pick whether the iteration locks against (a) the 3 currently-decodable codecs, (b) all 5 web-relevant codecs (excludes MPEG-2 from a Firefox-consumer view since
video/mp2tis not supported by Gecko), (c) all 6. - What is bit-exactness anchored to on RK3588? fresnel used
libva == kdirect(the kernel-direct V4L2 stateless test rig that bypasses libva). Does kdirect still build / run on RK3588? Probably needs adaptation. Alternative: anchor onlibva == ffmpeg-SW-decodebyte-for-byte, accepting that some codecs (MPEG-2 IDCT, H.264 inter-pred drift) won't be bit-exact and need an SSIM-floor instead. - Does the kernel HEVC OOPS cascade trip BEFORE actual decode, or only AFTER the first frame is queued? Phase 3 instrumentation (ftrace on
rkvdec_hevc_runentry, perf on__pi_memcmpsite) will tell us — that determines whether we can sandbox HEVC viatry_decode_one_framewithout wedging the m2m subsystem. - Are firefox-fourier vendor defaults sufficient on ampere? Not yet installed. Fresnel: yes for 3/4 codecs after
firefox-fourier 150.0.1-5shippedrockchip-fourier-defaults.js. Same package on RK3588 — probably yes, but unverified. - What's the AV1 source clip? None of BBB / Sintel are AV1-encoded in their standard distributions. Need to either (a) re-encode BBB to AV1 via libaom-av1 (slow but reproducible) or (b) source a short AV1 sample from elsewhere (e.g. AV1 sample bag) for the iter39 backend test once that lands.
Phase 0 close
Substrate, motivation, inventory captured. Baseline anchor at N=1 confirms the working 3-codec subset; HEVC kernel oops + VP9 kernel gap + AV1 backend gap documented as the three open work items. Ready for Phase 1 goal lock.
Iteration log:
- iter1: 2026-05-16 — this document.