iter1 phase0: substrate / motivation / inventory

Locks the research question, captures hardware + kernel + userland
state, walks the V4L2 decoder topology (3 decoder cores), enumerates
libva backend behavior, anchors the in-session N=1 baseline (H.264
via rkvdec, VP8 + MPEG-2 via hantro — all decode end-to-end), and
documents the three blockers (kernel HEVC OOPS in
rkvdec_hevc_prepare_hw_st_rps; kernel VP9 not exposed on RK3588
rkvdec; libva backend iter38 hard-capped at 2 fds, AV1 unprobed).

Predecessor carryover rules followed: backend source pin + memory
entries carry; per-codec FPS numbers and bit-exact criteria do not.

Open questions tabled for Phase 1 goal lock:
  1. Success-metric scope: 3 / 5 / 6 codecs.
  2. RK3588 bit-exact anchor (kdirect adapt vs SW byte-compare).
  3. HEVC OOPS pre-decode vs post-decode (instrumentation Q).
  4. firefox-fourier vendor defaults adequacy on RK3588.
  5. AV1 source clip provenance for the eventual iter39 test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
2026-05-16 07:19:57 +00:00
commit de0f267d05
2 changed files with 229 additions and 0 deletions
+67
View File
@@ -0,0 +1,67 @@
# ampere-fourier
## TL;DR
Peer campaign to [`fresnel-fourier`](../fresnel-fourier/) and [`libva-multiplanar`](../libva-multiplanar/), targeting **ampere** (CoolPi CM5 GenBook / Rockchip **RK3588**) — the third Rockchip generation the libva-v4l2-request-fourier backend gets exercised against (after ohm's RK3568 and fresnel's RK3399). Goal: make `libva-v4l2-request-fourier` deliver HW decode on RK3588 for every codec the SoC actually exposes via mainline V4L2 stateless.
## Hardware target
| Property | Value |
|---|---|
| Board | CoolPi CM5 GenBook (`coolpi,pi-cm5-genbook` / `coolpi,pi-cm5` / `rockchip,rk3588`) |
| SoC | Rockchip **RK3588** (4× Cortex-A76 + 4× Cortex-A55, Mali-G610) |
| Decoder block 1 | **rkvdec** (`/dev/media0`, `/dev/video1`) — H.264 + HEVC (mainline `rockchip_vdec` driver) |
| Decoder block 2 | **hantro-vpu** as `rockchip,rk3568-vpu-dec` (`/dev/media1`, `/dev/video2`) — H.264 + MPEG-2 + VP8 (generic hantro mainline) |
| Decoder block 3 | **hantro-vpu** as `rockchip,rk3588-av1-vpu-dec` (`/dev/media3`, `/dev/video4`) — AV1 dedicated decoder |
| Encoder block | **hantro-vpu** as `rockchip,rk3588-vepu121-enc` (`/dev/media2`, `/dev/video3`) — out of scope |
| RGA | `rockchip-rga` (`/dev/video0`) — 2D blit, out of scope |
| Kernel | `linux-ampere-fourier 7.0rc3.kafr1-1` (vanilla `torvalds v7.0-rc3` + ampere DTS / board patches only; **no fourier codec patches**) |
| Boot stack | extlinux entry `arch_mainline``Image-7.0.0-rc3-ARCH+` + `rk3588-coolpi-cm5-genbook.dtb-7.0.0-rc3-ARCH+` |
The decode surface is broader than fresnel's: three independent decoder cores (rkvdec + hantro-vpu + av1-vpu-dec) vs fresnel's two (rkvdec + hantro-vpu). The hantro-vpu binding here is a generic `rockchip,rk3568-vpu-dec` (not an RK3588-specific compatible) — kernel handles it via the rk3568 binding chain.
Mali-G610 (panfrost / panthor) is a different generation than fresnel's T860 and ohm's G52 — kwin / mesa / panfrost regressions or wins do **not** transfer.
## Scope (LOCKED 2026-05-16 in phase0_findings.md)
**In scope:**
- libva-v4l2-request-fourier backend exercised on ampere V4L2 decode nodes (rkvdec + hantro-vpu + av1-vpu-dec).
- Codecs: **everything the kernel V4L2 surface exposes** — confirmed via `v4l2-ctl --list-formats-out`:
- H.264 (`S264`) — rkvdec OR hantro
- HEVC (`S265`) — rkvdec only
- MPEG-2 (`MG2S`) — hantro only
- VP8 (`VP8F`) — hantro only
- AV1 (`AV1F`) — av1-vpu-dec only
- Consumers: `vainfo`, `ffmpeg -hwaccel vaapi`, `mpv --hwdec=v4l2request-copy`, `firefox-fourier` (with vendor-default prefs from marfrit-packages#8).
**Out of scope:**
- **VP9** on RK3588 rkvdec — kernel does not expose `V4L2_PIX_FMT_VP9_FRAME` on this device in mainline v7.0-rc3. Enabling it is a kernel-agent experiment (separate campaign / issue), **not part of the ampere baseline**.
- AV1 encoder, JPEG encoder, RGA — not decode work.
- Codec-side patches to `linux-ampere-fourier` — per user policy (2026-05-16), ampere stays on a clean mainline + board-DTS kernel. Anything else gets routed through kernel-agent as an **experiment** (separate branch / target), not the baseline package.
## Process
8(+1) phase loop per [`feedback_dev_process.md`](../../.claude/projects/-home-mfritsche-src/memory/feedback_dev_process.md). Phase 0 substrate in [`phase0_findings.md`](phase0_findings.md). Phase 5 review uses the sonnet-architect subagent pattern (`Plan` with `model: sonnet`).
Predecessor data carryover: fresnel-fourier reached iter38 close with the same backend (`libva-v4l2-request-fourier @ 7ac934e`, "iter38b"). Per `feedback_dev_process.md` Phase 0 rules, fresnel's per-codec FPS / bit-exactness numbers carry as **reference history only**; ampere binding cells anchor to in-session measurements on RK3588 hardware.
## Predecessor work this campaign builds on
- **[`../fresnel-fourier/`](../fresnel-fourier/)** — RK3399 peer campaign, closed iter38 at `e66c5c0`. Backend fork tip `7ac934e`. The libva backend, mpv-fourier, ffmpeg-v4l2-request-fourier, firefox-fourier packages all consumed unmodified on ampere.
- **[`../libva-multiplanar/libva-v4l2-request-fourier/`](../libva-multiplanar/libva-v4l2-request-fourier/)** — the backend fork itself.
- **[`marfrit/kernel-agent`](https://git.reauktion.de/marfrit/kernel-agent)** — issue #6 filed 2026-05-15 (bootstrap-missing + asks). Bootstrap part landed via PRs #8/#9/#10 on 2026-05-16, producing `linux-ampere-fourier 7.0rc3.kafr1-1`. Asks #2 (VP9) and #3 (AV1 integration) reframed as experiments per user policy — separate kernel-agent issues to follow.
- **[`marfrit/marfrit-packages` issue #17](https://git.reauktion.de/marfrit/marfrit-packages/issues/17)** — libva-v4l2-request-fourier CI build pkgrel=1 produces broken HEVC binary. ampere worked around by hand-rebuilding from source (md5 `0c9a7efa…`, 485 KB) over the packaged binary. Same workaround as fresnel until #17 is fixed.
## Operator-facing repo URL
`git.reauktion.de/marfrit/ampere-fourier` — to be created during the first iteration if there's something publish-worthy. For now this repo lives only at `~/src/ampere-fourier` on noether.
## Non-upstreaming default
Inherited from libva-multiplanar / `feedback_no_upstream.md`. Patches must be aligned to upstream in syntax and semantics; PR / MR / bug-report only on explicit operator instruction. The exception is `marfrit/kernel-agent` experiments, which exist precisely to track candidate upstream patches in a controlled fleet-rollout pipeline.
## Build infrastructure
ampere is itself a `kernel-agent` aarch64 build host (secondary, per kernel-agent README) — 8-core RK3588 + 32 GB RAM. For libva backend / mpv / ffmpeg builds the campaign uses ampere directly (small projects, hand-build is faster than packaging detour). For experimental kernel builds, kernel-agent dispatches to boltzmann (primary aarch64) or ampere (fallback).
+162
View File
@@ -0,0 +1,162 @@
# Phase 0 — Substrate / Motivation / Inventory
Closed 2026-05-16. Iteration 1 of the ampere-fourier campaign. Locks the research question, captures the substrate, anchors the in-session baseline, surfaces open questions.
## Research question
**Can the libva-v4l2-request-fourier backend (iter38b, fresnel-certified) deliver HW decode on RK3588 for every codec the SoC actually exposes via mainline-rc3 V4L2 stateless — and where it can't, what's the minimal kernel-agent experiment / backend iteration that closes each remaining codec?**
Operator-supplied mechanism (verbatim 2026-05-16):
> ampere should not have fourier-specific kernel patches. If any are necessary, those should be added as an experiment with kernel-agent.
So the campaign is **userspace-first**: validate the codecs that the clean kernel supports, and for everything else, file kernel-agent experiments (separate fleet target, not the baseline).
## Hardware substrate
| Property | Value |
|---|---|
| Board | CoolPi CM5 GenBook |
| Compatible chain | `coolpi,pi-cm5-genbook` `coolpi,pi-cm5` `rockchip,rk3588` |
| Booted kernel | `7.0.0-rc3-devices+` from `linux-ampere-fourier 7.0rc3.kafr1-1` (vanilla torvalds v7.0-rc3 + ampere DTS/board patches; no codec patches) |
| Boot stack | extlinux entry `arch_mainline` |
| kernel cmdline | `… cma=256M cgroup_enable=cpuset cgroup_memory=1 cgroup_enable=memory coherent_pool=2M memtest=2` |
| libva backend | `libva-v4l2-request-fourier 1.0.0.r348.7ac934e-1` packaged BUT **broken on HEVC** (campaign sibling issue `marfrit-packages#17`). Working hand-build (md5 `0c9a7efaab6a31b74536ac1abff18dfa`, 484 800 B) reinstalled via `sudo install -m644` over the package's file. |
| ffmpeg | `ffmpeg-v4l2-request-fourier 2:8.1.r123329.b57fbbe-4` |
| mpv | `mpv-fourier 1:0.41.0-10` (iter2, ships `--hwdec=v4l2request-copy`) |
| libva runtime | `libva 2.23.0-1`, `libva-utils 2.22.0-1`, `libdrm 2.4.133-1`, `linux-api-headers 6.19-1` |
`/usr/include/linux/v4l2-controls.h` sha256 `f70d5224…` — same as fresnel and fermi (verified during marfrit-packages#17 forensics). V4L2 control struct layout is byte-identical across the fleet.
## V4L2 decoder topology (`v4l2-ctl -d /dev/videoN --info`)
| Node | Media | Driver | Card type (DT compatible) | OUTPUT pixfmts |
|------|-------|--------|---------------------------|-----------------|
| `/dev/video0` | — | `rockchip-rga` | `rockchip-rga` | BA24 / AR24 / XR24 / RGB3 / BGR3 / AR12 / AR15 / RGBP / XR15 / XB24 / AB24 / BX24 / RA24 / BA15 / RA15 / YU12 / YV12 / NV12 / NV21 / YUYV / UYVY / NV16 / NV61 / YM12 / YM21 / NM12 / NM21 / YM16 / YM61 / TL44 / RM12 — 2D blit, out of scope |
| `/dev/video1` | `/dev/media0` | `rkvdec` | (no DT card name in this kernel) | **S265** (HEVC), **S264** (H.264) |
| `/dev/video2` | `/dev/media1` | `hantro-vpu` | `rockchip,rk3568-vpu-dec` | **S264** (H.264), **MG2S** (MPEG-2), **VP8F** (VP8) |
| `/dev/video3` | `/dev/media2` | `hantro-vpu` | `rockchip,rk3588-vepu121-enc` | YM12 / NM12 / YUYV / UYVY — encoder, out of scope |
| `/dev/video4` | `/dev/media3` | `hantro-vpu` | `rockchip,rk3588-av1-vpu-dec` | **AV1F** (AV1) |
| `/dev/video5`, `/dev/video6` | `/dev/media4` | `uvcvideo` | USB Camera | not decoders |
Three independent decoder cores. No VP9 pixfmt is exposed by any node on this kernel.
dmesg at boot also reports four `… missing multi-core support, ignoring this instance` lines (`fdc40100`, `fdba4000`, `fdba8000`, `fdbac000`) — RK3588 has additional rkvdec+hantro cores that mainline doesn't bind because the multi-core glue isn't implemented. So the *advertised* throughput cap is single-core per block, not full-SoC.
## libva backend behavior
```
$ LIBVA_DRIVER_NAME=v4l2_request vainfo
v4l2-request: auto-selected codec device: /dev/video1 + /dev/media0
v4l2-request: iter38: also opened hantro-vpu decoder at /dev/video2 + /dev/media1
vainfo: VA-API version: 1.23 (libva 2.22.0)
vainfo: Driver version: v4l2-request
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple : VAEntrypointVLD
VAProfileMPEG2Main : VAEntrypointVLD
VAProfileH264Main : VAEntrypointVLD
VAProfileH264High : VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointVLD
VAProfileH264MultiviewHigh : VAEntrypointVLD
VAProfileH264StereoHigh : VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointVLD
VAProfileVP8Version0_3 : VAEntrypointVLD
```
9 profiles — same count as iter38 pre-bounds-fix on fresnel (no VP9 because the kernel doesn't expose it for rkvdec to pick up; no AV1 because the iter38 auto-probe is hard-capped at 2 fds: one rkvdec + one hantro-vpu instance). The third decoder (`/dev/video4` AV1) is therefore invisible to the libva session even though the kernel exposes it.
## In-session baseline anchor (Phase 0 N=1)
Test material: BBB 60 s 720p, re-encoded on ampere via `ffmpeg-v4l2-request-fourier` with same encoder presets as fresnel-fourier iter1 measurements (x264 medium CRF 23, x265 fast CRF 28, mpeg2video qscale 4, libvpx CRF 30, libvpx-vp9 CRF 30). 5 clips in `~/measurements/encoded/`.
Probe (libva ffmpeg, `~/measurements/safe_probe.sh`), single-shot post-reboot, full clip decode:
```
=== ampere baseline: libva HW decode probe (H.264 + VP8 + MPEG-2) ===
h264 rkvdec /dev/video1 OK elapsed=3.15s
vp8 hantro /dev/video2 OK elapsed=6.62s v4l2-request: Unable to set control(s): Invalid argument
mpeg2 hantro /dev/video2 OK elapsed=7.23s v4l2-request: Unable to set control(s): Invalid argument
```
3-of-6 codecs decode end-to-end. The "Unable to set control(s)" warnings on VP8 / MPEG-2 are from the cross-codec capability probe phase (rkvdec rejects VP8 / MPEG-2 controls and hantro rejects HEVC controls; iter38 auto-probe walks both fds at startup, captures these as warnings, ignores). Not a decode-time failure.
dmesg clean post-probe — only the usual boot init lines, no OOPSes.
This is **N=1**, not enough to lock as a load-bearing floor. Phase 3 will repeat at N=3+ with real instrumentation (lsof on the v4l2 fds, perf counters on rkvdec/hantro IRQs, libva trace, byte-comparison of decoded YUV vs SW reference — not stdout-grepped FPS).
## Codecs blocked, and why
### HEVC — kernel OOPS in `rkvdec_hevc_prepare_hw_st_rps`
First HEVC decode attempt produced this kernel oops (full dmesg captured):
```
Modules linked in: ...rockchip_vdec...
lr : rkvdec_hevc_prepare_hw_st_rps+0x38/0x300 [rockchip_vdec]
Call trace:
__pi_memcmp+0x10/0x110 (P)
rkvdec_hevc_assemble_hw_rps+0x1c/0xac [rockchip_vdec]
rkvdec_hevc_run+0x12c/0x148 [rockchip_vdec]
rkvdec_device_run+0x48/0x104 [rockchip_vdec]
v4l2_m2m_try_run+0x80/0x13c [v4l2_mem2mem]
v4l2_m2m_request_queue+0xe8/0x140 [v4l2_mem2mem]
media_request_ioctl_queue+0x1bc/0x2f4 [mc]
media_request_ioctl+0xc8/0x160 [mc]
__arm64_sys_ioctl+0xa4/0x100
```
**Cascading effect**: after this oops the whole `v4l2_mem2mem` path wedges. Subsequent H.264 (rkvdec) AND VP8 / MPEG-2 (hantro) `ffmpeg -hwaccel vaapi` invocations block in futex wait inside libva, with the underlying ioctl thread blocked in the kernel m2m queue. The wedge persists until reboot.
This is a kernel bug in the **RK3588 mainline rkvdec HEVC** path — specifically the RPS (Reference Picture Set) preparation function. Not a libva backend bug (the working hand-built `0c9a7efa…` backend is the one in use; the libva controls are what fresnel sends successfully to its own RK3399 rkvdec). The faulting instruction is `__pi_memcmp` — likely a NULL or unmapped pointer dereference during RPS comparison.
Per memory `feedback_va_st_rps_bits_is_slice_field`: fresnel iter31 had a related RPS issue, but that one was on the *libva* side (writing `st_rps_bits` into `slice_params` rather than `decode_params`). The libva-side fix is already in place at backend `7ac934e`. The ampere oops is downstream of that — kernel-side.
Path to unblock: kernel-agent experiment with a candidate fix. Either:
- Find an existing upstream fix on linux-media or rockchip-linux trees (RK3588 rkvdec HEVC may have a fix in a later -rc).
- Bisect within mainline to localize the regression.
- Write a defensive guard around the memcmp call in `rkvdec_hevc_prepare_hw_st_rps`.
Filed as separate kernel-agent issue to follow.
### VP9 — kernel doesn't expose `V4L2_PIX_FMT_VP9_FRAME` on rkvdec
`v4l2-ctl -d /dev/video1 --list-formats-out` returns only S264 + S265. The kernel module `v4l2_vp9` *is* loaded (per `lsmod`: used by `rockchip_vdec` and `hantro_vpu`), but the RK3588 rkvdec binding does not register `V4L2_PIX_FMT_VP9_FRAME` as an output format. Likely needs DTS work + a driver patch to add the VP9 control set to the RK3588 rkvdec variant_ops, matching what RK3399 has.
Path to unblock: kernel-agent experiment with VP9 enablement patch for RK3588 rkvdec. Out of scope for the baseline per operator policy.
### AV1 — libva backend iter38 only probes 2 fds, not 3
`/dev/video4` (AV1) is exposed by the kernel — `v4l2-ctl --list-formats-out` shows `AV1F`. But the libva-v4l2-request-fourier auto-probe (iter38 multi-device-probe in `src/request.c::request_data`) tracks only `video_fd_rkvdec` + `video_fd_hantro` — there's no slot for a third decoder kind. The AV1 fd is never opened, so VAProfileAV1 never appears in vainfo.
Path to unblock: backend iter39. Extend `struct request_data` to a third fd (or to a table-driven dispatch over an arbitrary number of fds), update `request_device_kind_for_profile()` to return 'a' (av1) and route AV1 profiles to the new fd, add bounds and termination handling. Pure userspace work; no kernel involvement.
## Predecessor data — what carries vs what doesn't
Per `feedback_dev_process.md` Phase 0 rules:
**Carries (state):**
- libva-v4l2-request-fourier source pin `7ac934e` (iter38b, fresnel-certified)
- The marfrit-packages CI build bug for libva (`marfrit-packages#17`) — same `ae611d80…` packaged binary deployed to ampere, same workaround (hand-build over) applies
- The encode preset set (x264 medium CRF 23, x265 fast CRF 28, mpeg2video qscale 4, libvpx CRF 30, libvpx-vp9 CRF 30)
- Memory entries that govern decoder-output verification (`feedback_pixel_compare_in_yuv_not_png`, `feedback_hw_decode_engagement_check`, `feedback_rockchip_pixel_verify_path`)
**Does NOT carry (data):**
- Per-codec FPS numbers from fresnel-fourier iter1 measurements (RK3588 ≠ RK3399, different CPU + decoder block specs, no read-across)
- Per-codec bit-exact PASS criterion (fresnel's libva == kdirect held on RK3399; needs separate proof on RK3588)
- mpv `--hwdec=v4l2request-copy` 236 FPS for H.264 from fresnel — recompute on ampere
Ampere binding cells in this campaign anchor to **in-session-acquired numbers**, even when fresnel measured an identical condition.
## Open questions tabled into Phase 1
1. **What is the success metric?** "All 6 codecs HW-decode" requires three external dependencies (kernel HEVC fix, kernel VP9 enablement, backend iter39). Phase 1 must pick whether the iteration locks against (a) the 3 currently-decodable codecs, (b) all 5 web-relevant codecs (excludes MPEG-2 from a Firefox-consumer view since `video/mp2t` is not supported by Gecko), (c) all 6.
2. **What is bit-exactness anchored to on RK3588?** fresnel used `libva == kdirect` (the kernel-direct V4L2 stateless test rig that bypasses libva). Does kdirect still build / run on RK3588? Probably needs adaptation. Alternative: anchor on `libva == ffmpeg-SW-decode` byte-for-byte, accepting that some codecs (MPEG-2 IDCT, H.264 inter-pred drift) won't be bit-exact and need an SSIM-floor instead.
3. **Does the kernel HEVC OOPS cascade trip BEFORE actual decode, or only AFTER the first frame is queued?** Phase 3 instrumentation (ftrace on `rkvdec_hevc_run` entry, perf on `__pi_memcmp` site) will tell us — that determines whether we can sandbox HEVC via `try_decode_one_frame` without wedging the m2m subsystem.
4. **Are firefox-fourier vendor defaults sufficient on ampere?** Not yet installed. Fresnel: yes for 3/4 codecs after `firefox-fourier 150.0.1-5` shipped `rockchip-fourier-defaults.js`. Same package on RK3588 — probably yes, but unverified.
5. **What's the AV1 source clip?** None of BBB / Sintel are AV1-encoded in their standard distributions. Need to either (a) re-encode BBB to AV1 via libaom-av1 (slow but reproducible) or (b) source a short AV1 sample from elsewhere (e.g. AV1 sample bag) for the iter39 backend test once that lands.
## Phase 0 close
Substrate, motivation, inventory captured. Baseline anchor at N=1 confirms the working 3-codec subset; HEVC kernel oops + VP9 kernel gap + AV1 backend gap documented as the three open work items. Ready for Phase 1 goal lock.
Iteration log:
- iter1: 2026-05-16 — this document.