Files
ampere-fourier/phase0_findings.md
marfrit de0f267d05 iter1 phase0: substrate / motivation / inventory
Locks the research question, captures hardware + kernel + userland
state, walks the V4L2 decoder topology (3 decoder cores), enumerates
libva backend behavior, anchors the in-session N=1 baseline (H.264
via rkvdec, VP8 + MPEG-2 via hantro — all decode end-to-end), and
documents the three blockers (kernel HEVC OOPS in
rkvdec_hevc_prepare_hw_st_rps; kernel VP9 not exposed on RK3588
rkvdec; libva backend iter38 hard-capped at 2 fds, AV1 unprobed).

Predecessor carryover rules followed: backend source pin + memory
entries carry; per-codec FPS numbers and bit-exact criteria do not.

Open questions tabled for Phase 1 goal lock:
  1. Success-metric scope: 3 / 5 / 6 codecs.
  2. RK3588 bit-exact anchor (kdirect adapt vs SW byte-compare).
  3. HEVC OOPS pre-decode vs post-decode (instrumentation Q).
  4. firefox-fourier vendor defaults adequacy on RK3588.
  5. AV1 source clip provenance for the eventual iter39 test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 07:19:57 +00:00

13 KiB

Phase 0 — Substrate / Motivation / Inventory

Closed 2026-05-16. Iteration 1 of the ampere-fourier campaign. Locks the research question, captures the substrate, anchors the in-session baseline, surfaces open questions.

Research question

Can the libva-v4l2-request-fourier backend (iter38b, fresnel-certified) deliver HW decode on RK3588 for every codec the SoC actually exposes via mainline-rc3 V4L2 stateless — and where it can't, what's the minimal kernel-agent experiment / backend iteration that closes each remaining codec?

Operator-supplied mechanism (verbatim 2026-05-16):

ampere should not have fourier-specific kernel patches. If any are necessary, those should be added as an experiment with kernel-agent.

So the campaign is userspace-first: validate the codecs that the clean kernel supports, and for everything else, file kernel-agent experiments (separate fleet target, not the baseline).

Hardware substrate

Property Value
Board CoolPi CM5 GenBook
Compatible chain coolpi,pi-cm5-genbook coolpi,pi-cm5 rockchip,rk3588
Booted kernel 7.0.0-rc3-devices+ from linux-ampere-fourier 7.0rc3.kafr1-1 (vanilla torvalds v7.0-rc3 + ampere DTS/board patches; no codec patches)
Boot stack extlinux entry arch_mainline
kernel cmdline … cma=256M cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory coherent_pool=2M memtest=2
libva backend libva-v4l2-request-fourier 1.0.0.r348.7ac934e-1 packaged BUT broken on HEVC (campaign sibling issue marfrit-packages#17). Working hand-build (md5 0c9a7efaab6a31b74536ac1abff18dfa, 484 800 B) reinstalled via sudo install -m644 over the package's file.
ffmpeg ffmpeg-v4l2-request-fourier 2:8.1.r123329.b57fbbe-4
mpv mpv-fourier 1:0.41.0-10 (iter2, ships --hwdec=v4l2request-copy)
libva runtime libva 2.23.0-1, libva-utils 2.22.0-1, libdrm 2.4.133-1, linux-api-headers 6.19-1

/usr/include/linux/v4l2-controls.h sha256 f70d5224… — same as fresnel and fermi (verified during marfrit-packages#17 forensics). V4L2 control struct layout is byte-identical across the fleet.

V4L2 decoder topology (v4l2-ctl -d /dev/videoN --info)

Node Media Driver Card type (DT compatible) OUTPUT pixfmts
/dev/video0 rockchip-rga rockchip-rga BA24 / AR24 / XR24 / RGB3 / BGR3 / AR12 / AR15 / RGBP / XR15 / XB24 / AB24 / BX24 / RA24 / BA15 / RA15 / YU12 / YV12 / NV12 / NV21 / YUYV / UYVY / NV16 / NV61 / YM12 / YM21 / NM12 / NM21 / YM16 / YM61 / TL44 / RM12 — 2D blit, out of scope
/dev/video1 /dev/media0 rkvdec (no DT card name in this kernel) S265 (HEVC), S264 (H.264)
/dev/video2 /dev/media1 hantro-vpu rockchip,rk3568-vpu-dec S264 (H.264), MG2S (MPEG-2), VP8F (VP8)
/dev/video3 /dev/media2 hantro-vpu rockchip,rk3588-vepu121-enc YM12 / NM12 / YUYV / UYVY — encoder, out of scope
/dev/video4 /dev/media3 hantro-vpu rockchip,rk3588-av1-vpu-dec AV1F (AV1)
/dev/video5, /dev/video6 /dev/media4 uvcvideo USB Camera not decoders

Three independent decoder cores. No VP9 pixfmt is exposed by any node on this kernel.

dmesg at boot also reports four … missing multi-core support, ignoring this instance lines (fdc40100, fdba4000, fdba8000, fdbac000) — RK3588 has additional rkvdec+hantro cores that mainline doesn't bind because the multi-core glue isn't implemented. So the advertised throughput cap is single-core per block, not full-SoC.

libva backend behavior

$ LIBVA_DRIVER_NAME=v4l2_request vainfo
v4l2-request: auto-selected codec device: /dev/video1 + /dev/media0
v4l2-request: iter38: also opened hantro-vpu decoder at /dev/video2 + /dev/media1
vainfo: VA-API version: 1.23 (libva 2.22.0)
vainfo: Driver version: v4l2-request
vainfo: Supported profile and entrypoints
      VAProfileMPEG2Simple            : VAEntrypointVLD
      VAProfileMPEG2Main              : VAEntrypointVLD
      VAProfileH264Main               : VAEntrypointVLD
      VAProfileH264High               : VAEntrypointVLD
      VAProfileH264ConstrainedBaseline: VAEntrypointVLD
      VAProfileH264MultiviewHigh      : VAEntrypointVLD
      VAProfileH264StereoHigh         : VAEntrypointVLD
      VAProfileHEVCMain               : VAEntrypointVLD
      VAProfileVP8Version0_3          : VAEntrypointVLD

9 profiles — same count as iter38 pre-bounds-fix on fresnel (no VP9 because the kernel doesn't expose it for rkvdec to pick up; no AV1 because the iter38 auto-probe is hard-capped at 2 fds: one rkvdec + one hantro-vpu instance). The third decoder (/dev/video4 AV1) is therefore invisible to the libva session even though the kernel exposes it.

In-session baseline anchor (Phase 0 N=1)

Test material: BBB 60 s 720p, re-encoded on ampere via ffmpeg-v4l2-request-fourier with same encoder presets as fresnel-fourier iter1 measurements (x264 medium CRF 23, x265 fast CRF 28, mpeg2video qscale 4, libvpx CRF 30, libvpx-vp9 CRF 30). 5 clips in ~/measurements/encoded/.

Probe (libva ffmpeg, ~/measurements/safe_probe.sh), single-shot post-reboot, full clip decode:

=== ampere baseline: libva HW decode probe (H.264 + VP8 + MPEG-2) ===
  h264    rkvdec /dev/video1         OK  elapsed=3.15s
  vp8     hantro /dev/video2         OK  elapsed=6.62s     v4l2-request: Unable to set control(s): Invalid argument
  mpeg2   hantro /dev/video2         OK  elapsed=7.23s     v4l2-request: Unable to set control(s): Invalid argument

3-of-6 codecs decode end-to-end. The "Unable to set control(s)" warnings on VP8 / MPEG-2 are from the cross-codec capability probe phase (rkvdec rejects VP8 / MPEG-2 controls and hantro rejects HEVC controls; iter38 auto-probe walks both fds at startup, captures these as warnings, ignores). Not a decode-time failure.

dmesg clean post-probe — only the usual boot init lines, no OOPSes.

This is N=1, not enough to lock as a load-bearing floor. Phase 3 will repeat at N=3+ with real instrumentation (lsof on the v4l2 fds, perf counters on rkvdec/hantro IRQs, libva trace, byte-comparison of decoded YUV vs SW reference — not stdout-grepped FPS).

Codecs blocked, and why

HEVC — kernel OOPS in rkvdec_hevc_prepare_hw_st_rps

First HEVC decode attempt produced this kernel oops (full dmesg captured):

Modules linked in: ...rockchip_vdec...
lr : rkvdec_hevc_prepare_hw_st_rps+0x38/0x300 [rockchip_vdec]
Call trace:
 __pi_memcmp+0x10/0x110 (P)
 rkvdec_hevc_assemble_hw_rps+0x1c/0xac [rockchip_vdec]
 rkvdec_hevc_run+0x12c/0x148 [rockchip_vdec]
 rkvdec_device_run+0x48/0x104 [rockchip_vdec]
 v4l2_m2m_try_run+0x80/0x13c [v4l2_mem2mem]
 v4l2_m2m_request_queue+0xe8/0x140 [v4l2_mem2mem]
 media_request_ioctl_queue+0x1bc/0x2f4 [mc]
 media_request_ioctl+0xc8/0x160 [mc]
 __arm64_sys_ioctl+0xa4/0x100

Cascading effect: after this oops the whole v4l2_mem2mem path wedges. Subsequent H.264 (rkvdec) AND VP8 / MPEG-2 (hantro) ffmpeg -hwaccel vaapi invocations block in futex wait inside libva, with the underlying ioctl thread blocked in the kernel m2m queue. The wedge persists until reboot.

This is a kernel bug in the RK3588 mainline rkvdec HEVC path — specifically the RPS (Reference Picture Set) preparation function. Not a libva backend bug (the working hand-built 0c9a7efa… backend is the one in use; the libva controls are what fresnel sends successfully to its own RK3399 rkvdec). The faulting instruction is __pi_memcmp — likely a NULL or unmapped pointer dereference during RPS comparison.

Per memory feedback_va_st_rps_bits_is_slice_field: fresnel iter31 had a related RPS issue, but that one was on the libva side (writing st_rps_bits into slice_params rather than decode_params). The libva-side fix is already in place at backend 7ac934e. The ampere oops is downstream of that — kernel-side.

Path to unblock: kernel-agent experiment with a candidate fix. Either:

  • Find an existing upstream fix on linux-media or rockchip-linux trees (RK3588 rkvdec HEVC may have a fix in a later -rc).
  • Bisect within mainline to localize the regression.
  • Write a defensive guard around the memcmp call in rkvdec_hevc_prepare_hw_st_rps.

Filed as separate kernel-agent issue to follow.

VP9 — kernel doesn't expose V4L2_PIX_FMT_VP9_FRAME on rkvdec

v4l2-ctl -d /dev/video1 --list-formats-out returns only S264 + S265. The kernel module v4l2_vp9 is loaded (per lsmod: used by rockchip_vdec and hantro_vpu), but the RK3588 rkvdec binding does not register V4L2_PIX_FMT_VP9_FRAME as an output format. Likely needs DTS work + a driver patch to add the VP9 control set to the RK3588 rkvdec variant_ops, matching what RK3399 has.

Path to unblock: kernel-agent experiment with VP9 enablement patch for RK3588 rkvdec. Out of scope for the baseline per operator policy.

AV1 — libva backend iter38 only probes 2 fds, not 3

/dev/video4 (AV1) is exposed by the kernel — v4l2-ctl --list-formats-out shows AV1F. But the libva-v4l2-request-fourier auto-probe (iter38 multi-device-probe in src/request.c::request_data) tracks only video_fd_rkvdec + video_fd_hantro — there's no slot for a third decoder kind. The AV1 fd is never opened, so VAProfileAV1 never appears in vainfo.

Path to unblock: backend iter39. Extend struct request_data to a third fd (or to a table-driven dispatch over an arbitrary number of fds), update request_device_kind_for_profile() to return 'a' (av1) and route AV1 profiles to the new fd, add bounds and termination handling. Pure userspace work; no kernel involvement.

Predecessor data — what carries vs what doesn't

Per feedback_dev_process.md Phase 0 rules:

Carries (state):

  • libva-v4l2-request-fourier source pin 7ac934e (iter38b, fresnel-certified)
  • The marfrit-packages CI build bug for libva (marfrit-packages#17) — same ae611d80… packaged binary deployed to ampere, same workaround (hand-build over) applies
  • The encode preset set (x264 medium CRF 23, x265 fast CRF 28, mpeg2video qscale 4, libvpx CRF 30, libvpx-vp9 CRF 30)
  • Memory entries that govern decoder-output verification (feedback_pixel_compare_in_yuv_not_png, feedback_hw_decode_engagement_check, feedback_rockchip_pixel_verify_path)

Does NOT carry (data):

  • Per-codec FPS numbers from fresnel-fourier iter1 measurements (RK3588 ≠ RK3399, different CPU + decoder block specs, no read-across)
  • Per-codec bit-exact PASS criterion (fresnel's libva == kdirect held on RK3399; needs separate proof on RK3588)
  • mpv --hwdec=v4l2request-copy 236 FPS for H.264 from fresnel — recompute on ampere

Ampere binding cells in this campaign anchor to in-session-acquired numbers, even when fresnel measured an identical condition.

Open questions tabled into Phase 1

  1. What is the success metric? "All 6 codecs HW-decode" requires three external dependencies (kernel HEVC fix, kernel VP9 enablement, backend iter39). Phase 1 must pick whether the iteration locks against (a) the 3 currently-decodable codecs, (b) all 5 web-relevant codecs (excludes MPEG-2 from a Firefox-consumer view since video/mp2t is not supported by Gecko), (c) all 6.
  2. What is bit-exactness anchored to on RK3588? fresnel used libva == kdirect (the kernel-direct V4L2 stateless test rig that bypasses libva). Does kdirect still build / run on RK3588? Probably needs adaptation. Alternative: anchor on libva == ffmpeg-SW-decode byte-for-byte, accepting that some codecs (MPEG-2 IDCT, H.264 inter-pred drift) won't be bit-exact and need an SSIM-floor instead.
  3. Does the kernel HEVC OOPS cascade trip BEFORE actual decode, or only AFTER the first frame is queued? Phase 3 instrumentation (ftrace on rkvdec_hevc_run entry, perf on __pi_memcmp site) will tell us — that determines whether we can sandbox HEVC via try_decode_one_frame without wedging the m2m subsystem.
  4. Are firefox-fourier vendor defaults sufficient on ampere? Not yet installed. Fresnel: yes for 3/4 codecs after firefox-fourier 150.0.1-5 shipped rockchip-fourier-defaults.js. Same package on RK3588 — probably yes, but unverified.
  5. What's the AV1 source clip? None of BBB / Sintel are AV1-encoded in their standard distributions. Need to either (a) re-encode BBB to AV1 via libaom-av1 (slow but reproducible) or (b) source a short AV1 sample from elsewhere (e.g. AV1 sample bag) for the iter39 backend test once that lands.

Phase 0 close

Substrate, motivation, inventory captured. Baseline anchor at N=1 confirms the working 3-codec subset; HEVC kernel oops + VP9 kernel gap + AV1 backend gap documented as the three open work items. Ready for Phase 1 goal lock.

Iteration log:

  • iter1: 2026-05-16 — this document.