# Fourier — Hardware-Assisted Video Decoding on the Rockchip Fleet

One umbrella project to bring up working HW video decode on all four Rockchip
devices in Markus' fleet. Named 2026-04-24.

## Target fleet

| Host      | SoC    | Role                      | Parent umbrella |
|-----------|--------|---------------------------|-----------------|
| fresnel   | RK3399 | Pinebook Pro fleet laptop | —               |
| ohm       | RK3566 | PineTab2 tablet           | —               |
| boltzmann | RK3588 | Rock 5 ITX+ (always on)   | Volta           |
| ampere    | RK3588 | CoolPi GenBook laptop     | Coulomb         |

Current priority: **ohm**. The other three follow in the order of easiest-win
(fresnel, mature mainline) → incremental (ampere, RK3588 landed Feb 2026) →
hardest (boltzmann, currently on vendor kernel).

## Related Rockchip projects

Fourier touches hardware that already has its own umbrellas. Keep display/boot
concerns in those projects; keep video decode in Fourier.

- **Coulomb** — ampere / CoolPi GenBook stack. Subprojects:
  - **Bin** — u-boot, display bringup (eDP + keyboard)
  - **MegabitChip** — DDR init blob matching-decomp / HIL
  - **RockHard** — mainline kernel (Collabora TF-A, LP5-3200 OC)
- **Volta** — boltzmann / Rock 5 ITX+ stack. Subprojects:
  - **Quark** — edk2-rk3588 UEFI firmware
  - **Neutron** — mainline / vendor kernel, UEFI-booted
- **fresnel** — Pinebook Pro, custom overclocked kernel package (no umbrella)
- **ohm** — PineTab2, DanctNIX base, UART capture rig for ampere DDR work

Cross-cutting notes in `/home/mfritsche/.claude/projects/-home-mfritsche-claude/memory/`
— see `MEMORY.md` for the index.

## Meet His

When Fourier needs infra muscle (wake a sleeping host, reach a VPN peer, bump
DHCP on the Fritz, find the right lmcp token, set up distcc for a kernel
build), summon **His** — the Home Infrastructure Specialist.

- As a subagent: `Agent(subagent_type: "his", prompt: "...")` — takes over a
  task end-to-end and returns a report.
- As a skill: `Skill(skill: "his")` — loads the runbook cheatsheet into the
  current session.

His knows the distcc workers (tesla, CT108, dcc1), the Fritz!DECT plug blast
radii (careful with Himbeere + Office), the `/opt/herding/` layout on hertz,
the lmcp endpoint token map, and the VPN naming (`<host>.vpn` via shannon's
dnsmasq). Canonical runbook: `/home/mfritsche/claude/CLAUDE.md` on noether.

For Fourier specifically, expect to lean on His for: waking ohm over LAN +
VPN, confirming ampere is at .vpn (not the stale .88.x LAN IP), setting up
cross-builds of patched ffmpeg via distcc, and nudging kernel packages into
the marfrit repo.

## Two kernel paths

Rockchip hardware decode has **two incompatible kernel driver families**.
Userspace config diverges accordingly — don't mix flags.

### Path A — mainline V4L2 stateless

- Kernel: `rkvdec` / `rkvdec2` stateless V4L2 drivers (mainline or out-of-tree
  patch series).
- /dev node: V4L2 m2m `/dev/videoN` with compressed H.264/HEVC/VP9 queues +
  uncompressed output queue.
- FFmpeg: `ffmpeg-v4l2-request-git` from AUR (Jernej's patchset). Upstream
  v2 patchset posted on ffmpeg-devel 2024-08, **not yet merged** as of
  2026-04-24.
- GStreamer: `v4l2codecs` plugin in gst-plugins-bad. 1.26 added Rockchip
  buffer formats; 1.28 added the new HEVC UAPI controls.
- mpv: `mpv --hwdec=drm --vo=gpu-next` (Wayland compositor) or
  `--vo=dmabuf-wayland` (wlroots).
- Kodi: native V4L2-request API support, independent of FFmpeg.
- VA-API: `libva-v4l2-request` bridge for Firefox / Chromium paths.

### Path B — Rockchip MPP vendor

- Kernel: `mpp_rkvdec` / `mpp_rkvdec2` from Rockchip BSP (the rkr* kernel
  series).
- /dev node: `/dev/mpp_service` chardev, not V4L2.
- FFmpeg: build with `--enable-rkmpp` against `librockchip-mpp`.
- GStreamer: Rockchip-specific plugins (`rkximagesink`, `mppvideodec`, …).
- Present today on **boltzmann** (kernel 6.1.75-rkr3).

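The device-node difference between the two paths lends itself to a quick first check. The following is a minimal sketch, not a definitive probe: the function name `fourier_path` and its optional root-prefix argument are hypothetical, and the presence of some `/dev/video*` node is only a hint for Path A (a webcam would match too), so a format check must follow.

```sh
#!/bin/sh
# Guess which decode path a host exposes from its device nodes.
# $1 is a hypothetical root prefix for testing against a fake tree;
# leave it empty on a real host.
fourier_path() {
    root="${1:-}"
    if [ -e "${root}/dev/mpp_service" ]; then
        echo "B"        # vendor MPP chardev present
        return 0
    fi
    for node in "${root}"/dev/video*; do
        if [ -e "${node}" ]; then
            echo "A?"   # some V4L2 node exists; formats still need checking
            return 0
        fi
    done
    echo "none"
}

fourier_path
```

A result of `A?` only says a V4L2 node exists; whether it is a stateless decoder still has to be confirmed from its exposed formats.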
## rkvdec mainline status (2026-04-24)

| SoC    | Block             | Mainline status              | Codecs (mainline)                                    |
|--------|-------------------|------------------------------|------------------------------------------------------|
| RK3399 | Hantro G1/G2      | mature                       | MPEG-2, VP8, H.264, HEVC, VP9                        |
| RK3566 | rkvdec2 (VDPU346) | **not yet** (Collabora TODO) | rkvdec1: MPEG-2/VP8/H.264; HEVC/VP9 need out-of-tree |
| RK3588 | VDPU381           | **merged 2026-02-27**        | H.264, H.265 (no VP9/AV1 upstream yet)               |
| RK3576 | VDPU383           | merged 2026-02-27            | H.264, H.265                                         |

The RK3588 / RK3576 landing was the Feb 2026 Collabora drop. VDPU346 for the
RK356x family is on the roadmap but not merged; ohm depends on DanctNIX
carrying out-of-tree rkvdec2 patches if we want HEVC/VP9.

## ohm — first priority, known recipe

From Martin Chang's blog (clehaxze.tw, 2023, still mirrored by DanctNIX):

```sh
sudo pacman -S mpv
yay -S ffmpeg-v4l2-request-git
mpv --hwdec=drm --vo=gpu-next --wayland-disable-vsync=yes input.mp4
# wlroots compositor variant:
mpv --hwdec=drm --vo=dmabuf-wayland input.mp4
```

2023 coverage: **MPEG-2, VP8, H.264**. 1080p60 H.264 hit ~80% of one CPU core
— decode was on the hardware; the CPU was in the compositor path.

### Recon 2026-04-24 (ohm)

- Running kernel: `6.19.10-danctnix1-1-pinetab2`.
- Decoder: `hantro-vpu` on `/dev/video1` (DT `rockchip,rk3568-vpu` at
  `fdea0400`); encoder on `/dev/video2`.
- Exposed stateless formats: `S264` (H.264), `MG2S` (MPEG-2), `VP8F` (VP8)
  → `NV12`. No HEVC, no VP9.
- No rkvdec2 DT node, no rkvdec2 driver, no decoder firmware blob
  (`/lib/firmware/rockchip/` only carries `dptx.bin`). HEVC/VP9 hardware
  decode is **not available on this image**. Tree-side confirmed: the
  kernel build ships only the mainline `rockchip-vdec` module
  (`CONFIG_VIDEO_ROCKCHIP_VDEC=m`, Collabora, rkvdec1) whose DT aliases are
  `rockchip,rk{3288,3328,3399}-vdec` — none of which match RK3566. No
  rkvdec2 module in `/lib/modules/.../rockchip/`; DanctNIX carries no
  out-of-tree rkvdec2 patches. HEVC/VP9 on ohm needs either (a) our own
  carrier patch series (VDPU346 driver + DT + firmware, rebuild
  `linux-pinetab2`) or (b) waiting for VDPU346 to land upstream.
- Userspace: `ffmpeg 8.1` (stock, v4l2m2m only — **no v4l2request hwaccel**,
  does not drive the stateless decoder), `gstreamer 1.28.2` with
  `v4l2codecs` exposing `v4l2sl{h264,mpeg2,vp8}dec`, `mpv 0.41.0`,
  `libva 2.23.0`. No `ffmpeg-v4l2-request-git`, no `libva-v4l2-request`,
  no `kodi`.
- Noise: `bes2600` Wi-Fi OOT driver spams `WARN` every ~30 s. Unrelated to
  decode; ignore for this umbrella.

So: 2023 recipe works verbatim for H.264 / MPEG-2 / VP8 once
`ffmpeg-v4l2-request-git` is installed. GStreamer `v4l2slh264dec` is the
shortest path to a first decode proof. HEVC/VP9 park until the kernel side
picks up rkvdec2 patches.

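The "stock ffmpeg has no v4l2request hwaccel" finding can be re-checked on any host by looking at the list that `ffmpeg -hwaccels` prints. A minimal sketch follows; the helper name `has_hwaccel` is hypothetical, and it reads the list from stdin so it can be exercised against canned output as well as a live binary.

```sh
#!/bin/sh
# Succeeds if the named hwaccel appears as a whole line in
# `ffmpeg -hwaccels` output supplied on stdin.
has_hwaccel() {
    grep -qx -- "$1"
}

# On a real host (not run here):
#   ffmpeg -hide_banner -hwaccels | has_hwaccel v4l2request

# Self-check against canned output so the snippet runs anywhere.
if printf 'Hardware acceleration methods:\ndrm\nv4l2request\n' | has_hwaccel v4l2request; then
    echo "v4l2request available"
fi
```

A missing entry means the installed ffmpeg cannot drive the stateless decoder, whatever the kernel exposes.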
### Plan (tasks #18–24 in the noether task list)

1. **Live recon** — kernel, `/dev/video*`, `v4l2-ctl --list-devices`, dmesg
   rkvdec, `/lib/firmware/rockchip/`, installed ffmpeg/gstreamer/mpv/kodi
2. **SW baseline** — 1080p H.264 / HEVC / VP9 benchmark with ffmpeg and mpv
   hwdec=no, record CPU% + fps
3. **Driver binding** — confirm V4L2 stateless decode node exposed; if not,
   diagnose kconfig / DT / firmware
4. **GStreamer v4l2codecs** — cleanest proof; `v4l2slh264dec` pipeline with
   `fakesink`, then with display sink
5. **FFmpeg v4l2-request** — install `ffmpeg-v4l2-request-git` (AUR) or
   verify a DanctNIX-shipped ffmpeg with v4l2-request enabled
6. **Kodi + mpv validation** — 1080p HEVC at <30% of one core target
7. **Document** — freeze the final recipe in this README

**Status 2026-04-24**: ohm reachable; steps 1–5 done for
H.264. GStreamer `v4l2slh264dec → waylandsink` is 6–7 % CPU with
zero-copy dmabuf NV12. FFmpeg with `-hwaccel v4l2request -hwaccel_output_format
drm_prime` costs 14 % CPU at realtime pace and peaks at 93.5 fps — hantro's
1080p H.264 ceiling on RK3566 is ≈95 fps. marfrit `ffmpeg-v4l2-request-git`
lives in the repo and installs cleanly via `pacman -S`. Task (b) (Brave HW
decode) is blocked on a multiplanar `libva-v4l2-request` rework — see the
browser section below.
**Note**: DanctNIX already ships `ffmpeg-v4l2-request 2:8.1-3` in its
`danctnix` repo; the marfrit build is our stripped variant (no X11/AMF/
CUDA/etc, Kwiboo tip pinned by commit) and sorts strictly newer in pacman
vercmp.

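Step 4 of the plan (a `v4l2slh264dec` pipeline first into `fakesink`, then into a display sink) can be sketched as a small command builder. This is an illustration under assumptions, not the exact pipeline used for the measurements: the helper name `gst_h264_cmd` is hypothetical, the demuxer and parser elements (`qtdemux`, `h264parse`) are assumed for an MP4 container, and the clip path is printed unquoted, so it must not contain spaces.

```sh
#!/bin/sh
# Print a gst-launch-1.0 command line for the stateless H.264 decoder.
# $1 = clip path, $2 = sink description (defaults to a discard sink).
gst_h264_cmd() {
    clip="$1"
    sink="${2:-fakesink sync=false}"
    printf 'gst-launch-1.0 filesrc location=%s ! qtdemux ! h264parse ! v4l2slh264dec ! %s\n' \
        "${clip}" "${sink}"
}

# Decode-only proof, then the display variant with a progress report.
gst_h264_cmd bbb_1080p30_h264.mp4
gst_h264_cmd bbb_1080p30_h264.mp4 'progressreport update-freq=5 ! waylandsink'
```

Printing instead of executing keeps the sketch safe to run off-device; on ohm the printed line can be pasted into a shell or piped to `sh`.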
### Baseline numbers (2026-04-24, ohm, `bbb_1080p30_h264.mp4`)

Clip SHA-16 `dcf8a7170fbd49bb` pulled from doppler
`/moviedata/fourier-test/` via hertz `lxc file pull` → http bridge → ohm
(same bytes on every fleet device). **Source frame rate is 24 fps** per
H.264 caps (`framerate=24/1`) — the `1080p30` in the file name is a
misnomer carried from Blender's encode metadata; the actual media rate is
24 fps. That's the number the target/delivered columns below are measured
against.

| Test | CPU% | Frames / wall | Effective fps | Dropped |
|------|------|---------------|---------------|---------|
| SW: `ffmpeg -hwaccel none -f null -` | 319 % | 1440 / 18.55 s | 77.6 | n/a |
| SW: `ffmpeg -re -hwaccel none -f null -` | 90 % | 1440 / 60.27 s | 24.0 (paced) | n/a |
| SW: `mpv --hwdec=no --vo=gpu-next`, DSI-1 | 127 % | 1440 source / 60 s | ~7.8 delivered | **973** |
| HW: `gst v4l2slh264dec → fakesink sync=false` | 89 % | 1800 / 48.9 s | 36.8 | n/a |
| HW: `gst v4l2slh264dec → waylandsink` (dmabuf), DSI-1 1:1 | 7 % | 1488 / 62 s | 24.0 (paced) | 0 (progressreport 1:1) |
| HW: `gst v4l2slh264dec → waylandsink fullscreen=true`, scaled | 6 % | 1488 / 62 s | 24.0 (paced) | 0 (progressreport 1:1) |
| HW: `ffmpeg -hwaccel v4l2request -f null -` | 105 % | 1440 / 38.4 s | 37.5 | n/a |
| HW: `ffmpeg -re -hwaccel v4l2request -f null -` | 67 % | 1440 / 59.9 s | 24.0 (paced) | n/a |
| HW: `ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime -f null -` | 51 % | 1440 / 15.4 s | **93.5** | n/a |
| HW: `ffmpeg -re + drm_prime -f null -` | **14 %** | 1440 / 59.9 s | 24.0 (paced) | n/a |
| HW: `mpv --hwdec=v4l2request --vo=gpu-next`, DSI-1 | 122 % | 1440 source / 60 s | ~8.5 delivered | **930** |

Reading:

- SW decode alone has ~3.2× headroom over source rate (77.6 / 24 fps) but
  costs ~3.2 cores unpaced or 1 core at realtime pace.
- SW mpv via `gpu-next` through KWin drops **68 %** of frames while only
  using 127 % CPU — the compositor / VO path is the bottleneck, not
  decode. Exactly the "compositor-bound ≠ decode-bound" gotcha below.
- HW decode to `fakesink` clocks ~36.8 fps (~1.5× source rate), 89 % CPU.
- **HW decode to `waylandsink` via zero-copy dmabuf (`DMA_DRM` NV12)
  drops the CPU to 6–7 %** — an ≈18× drop from the SW mpv number. The
  GStreamer `v4l2codecs → waylandsink` path on KWin negotiates
  dmabuf-direct, bypassing any GL upload.
- **Fit-to-screen scaling is effectively free.** Native 1920×1080 in a
  window is 7 % CPU; `fullscreen=true` with KWin scaling the dmabuf down
  to 1280×800 is 6 % CPU. Almost certainly the VOP2 HW scaling planes on
  RK3566 are doing the downscale during scanout, not the GPU.
- Frame-drop count validated by `progressreport update-freq=5`: stream
  position advances 1:1 with wall clock for the full 62 s run — zero
  drops, full 24 fps delivery.
- **FFmpeg path needs `-hwaccel_output_format drm_prime`** to avoid the
  default CPU readback. Without it, `-hwaccel v4l2request` costs 67 % CPU
  at realtime 24 fps; with it, 14 %. Peak throughput 93.5 fps vs 37.5 fps.
  The hantro VPU's real 1080p H.264 ceiling on RK3566 is ≈95 fps — enough
  for 1080p60 with headroom. GStreamer's `v4l2codecs → waylandsink` path
  still wins on CPU (6–7 %) because its dmabuf-direct scanout avoids every
  copy; ffmpeg's `null` muxer costs a few percent even with drm_prime.
- **mpv with `--hwdec=v4l2request` does engage the HW decoder** (mpv
  loads the `drmprime` hwdec driver, mpv's `--hwdec=help` lists
  `h264-v4l2request` after the marfrit ffmpeg install), but with
  `--vo=gpu-next` we hit the same compositor bottleneck as SW mpv: 122 %
  CPU + 930 / 1440 dropped frames. The decode is essentially free; KWin /
  gpu-next's GL composition path still drops most frames waiting on the
  compositor. `--vo=dmabuf-wayland` would be the right VO for zero-copy
  to KWin, but mpv's hwdec → dmabuf-wayland format negotiation fails
  ("hardware format not supported", `yuv420p → drm_prime` upload fails).
  Until that's untangled, the **recommended ohm playback recipe is
  GStreamer `v4l2slh264dec → waylandsink`** for graphical use, and
  `ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime` for any
  scripted / headless work.

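The "Effective fps" column and the drop percentage quoted in the reading are plain ratios of the table's raw counts. As a worked check, the sketch below recomputes three of them with `awk`; the helper names `effective_fps` and `drop_pct` are hypothetical.

```sh
#!/bin/sh
# Effective fps = frames / wall-clock seconds, one decimal place.
effective_fps() {
    awk -v f="$1" -v s="$2" 'BEGIN { printf "%.1f\n", f / s }'
}

# Share of source frames dropped, rounded to a whole percentage.
drop_pct() {
    awk -v d="$1" -v n="$2" 'BEGIN { printf "%.0f\n", 100 * d / n }'
}

effective_fps 1440 18.55   # SW ffmpeg, unpaced
effective_fps 1440 15.4    # HW ffmpeg + drm_prime, unpaced
drop_pct 973 1440          # SW mpv via gpu-next
```

These reproduce the 77.6 fps, 93.5 fps and 68 % figures above, so new rows can be added to the table from raw frame counts and wall times alone.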
### Browser HW decode (Brave / Chromium) — partially wired, library-blocked

Attempted as task (b) on 2026-04-24. State reached:

- Installed bootlin `libva-v4l2-request-git` from AUR with local patches
  (patch archive on ohm: `~/fourier-test/libva-patches/fourier-local.patch`):
  - HEVC stripped from the build (`h265.c` + `h265.h` removed from
    `src/meson.build`, HEVC case in `picture.c` returns
    `UNSUPPORTED_PROFILE`). The library's HEVC UAPI binding predates the
    mainline rename (`V4L2_CID_MPEG_VIDEO_HEVC_*` → `V4L2_CID_STATELESS_HEVC_*`)
    and ohm has no HW HEVC anyway.
  - Missing `#include "utils.h"` added to `src/h264.c` (pre-existing bug
    the library grew into as compilers got stricter about implicit
    declarations).
  - `src/config.c` probe extended to try both `V4L2_BUF_TYPE_VIDEO_OUTPUT`
    and `V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE` when enumerating profiles.
- `LIBVA_DRIVER_NAME=v4l2_request LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1 vainfo`
  enumerates profiles: **MPEG-2 Simple/Main + H.264 Main/High/ConstrainedBaseline/
  MultiviewHigh/StereoHigh**. Library is up and talking to the hantro
  decoder.
- Brave launched with `--enable-features=VaapiVideoDecoder,VaapiIgnoreDriverChecks,AcceleratedVideoDecodeLinuxGL --use-gl=angle --use-angle=gl-egl --ozone-platform=wayland`
  visibly activates VaapiVideoDecoder in the GPU process argv and attempts
  VA context creation on the video file.

Failure mode:

```
ERROR:media/gpu/vaapi/vaapi_wrapper.cc:2407] vaCreateContext failed,
VA error: operation failed
ERROR:media/gpu/vaapi/vaapi_video_decoder.cc:1219] failed creating VAContext
```

Root cause: the bootlin `libva-v4l2-request` was written for Allwinner's
sunxi-cedrus (single-plane V4L2). The probe patch gets it past format
enumeration, but `context.c` / `picture.c` / `v4l2.c` still hardcode
single-plane buffer setup, `VIDIOC_S_FMT` paths, and request-API frame
submission in ways that don't match a multiplanar decoder. A real port to
multiplanar is a side project — the upstream is effectively unmaintained
(last meaningful commit years ago; Collabora talked about a replacement
but it has not shipped).

Decision: **browser HW video decode on ohm is parked until a multiplanar
`libva-v4l2-request` rework exists**, either ours or someone else's.
Non-browser HW decode (GStreamer `v4l2codecs`, FFmpeg via
`ffmpeg-v4l2-request-git`) is unaffected and works. The patch set on
ohm is a useful starting point for anyone who wants to pick up the
multiplanar port.

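For anyone picking the port back up, the launch used in the attempt can be captured as a printable command line. This is a sketch under assumptions: the helper name `brave_vaapi_cmd` and the `FOURIER_BROWSER` override are hypothetical, the binary name `brave` is assumed, and passing the same two `LIBVA_*` variables as the `vainfo` check to the browser is an assumption rather than something recorded in the notes above.

```sh
#!/bin/sh
# Print the environment and flags for the VA-API browser attempt.
# $1 = V4L2 decoder node (defaults to /dev/video1, as on ohm).
# FOURIER_BROWSER is a hypothetical override; "brave" is an assumed name.
brave_vaapi_cmd() {
    browser="${FOURIER_BROWSER:-brave}"
    features="VaapiVideoDecoder,VaapiIgnoreDriverChecks,AcceleratedVideoDecodeLinuxGL"
    printf '%s %s %s %s %s %s %s\n' \
        "LIBVA_DRIVER_NAME=v4l2_request" \
        "LIBVA_V4L2_REQUEST_VIDEO_PATH=${1:-/dev/video1}" \
        "${browser}" \
        "--enable-features=${features}" \
        "--use-gl=angle" "--use-angle=gl-egl" "--ozone-platform=wayland"
}

brave_vaapi_cmd
```

The output is only printed, never executed, so the line can be reviewed and pasted on the device once a multiplanar backend exists.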
#### Why not "Brave via our system ffmpeg" or Chromium's own V4L2 decoder?

- **System ffmpeg is not a decoder source for Chromium/Brave.** Chromium
  links a vendored ffmpeg fork; even the Arch-style `use_system_ffmpeg`
  builds use system `libavcodec` only as a pure software decoder
  (`FfmpegVideoDecoder`), with no `hw_device_ctx` / hwaccel wiring.
  Installing our `ffmpeg-v4l2-request-git` would not give Brave HW decode.
- **Chromium's own `V4L2VideoDecoder` class is compiled into `brave-bin`**
  (verified via `strings`) but gated behind ChromeOS-only runtime factory
  checks. `UseChromeOSDirectVideoDecoder` / `V4L2FlatStatelessVideoDecoder`
  feature flag names do **not** appear as strings in the brave-bin binary
  — only `AcceleratedVideoDecodeLinuxGL` does, which is a VA-API gate, not
  a V4L2 gate. Enabling those flags at the command line is a no-op on this
  build. Unlocking V4L2 on Linux Brave would require either (a) a Chromium
  rebuild with different buildflags, or (b) upstream patches to extend the
  V4L2 decoder factory to Linux non-ChromeOS targets.

#### chromium-fourier (parallel side-project)

Workspace at `marfrit-packages/arch/chromium-fourier/STUDY.md` —
patching upstream Chromium so it reaches `VaapiVideoDecoder` directly
on Linux Wayland (skipping the chromeos pipeline that fails on Brave),
then talks to our `marfrit/libva-v4l2-request-fourier` libva backend.
No shipping fork covers this niche today (every ARM-Linux Chromium-with-
HW-decode goes through the **vendor MPP path** on **5.10 BSP kernel +
X11 + Mali blob + panfork** — opposite of where Fourier is going). See
that STUDY for the patch shape, reference forks, build plan on fermi
with distcc-avahi acceleration, and validation path.

### Acceptance criterion

**1080p @ 30 fps, no dropped frames.** Applies to every device and every
codec we claim support for. Same bar everywhere — don't grade RK3566 on a
curve vs RK3588. Dropped-frame count comes from `mpv --msg-level=all=info`
or `ffmpeg -f null -`; "no dropped frames" means zero across a 60 s clip.

### Test corpus

**Big Buck Bunny** is the canonical open test clip (Blender Foundation 2008,
CC-BY), ubiquitous in HW-decode testing across a decade of embedded
silicon. Various encodes readily available at
[peach.blender.org](https://peach.blender.org/download/) and mirrors.

Plan: park a fixed set on doppler under `/moviedata/fourier-test/` so Gerbera
re-indexes it and every fleet device on LAN + VPN streams identical bits.
Canonical encodes:

| File                 | Codec  | Res   | Notes                                       |
|----------------------|--------|-------|---------------------------------------------|
| bbb_1080p30_h264.mp4 | H.264  | 1080p | covers all four devices                     |
| bbb_1080p30_hevc.mp4 | HEVC   | 1080p | ampere/boltzmann full; ohm pending rkvdec2  |
| bbb_1080p30_vp9.webm | VP9    | 1080p | ohm pending; RK3588 pending upstream        |
| bbb_1080p30_mpeg2.ts | MPEG-2 | 1080p | Path A fresnel + rkvdec1 baseline           |
| bbb_1080p30_vp8.webm | VP8    | 1080p | fresnel + RK356x rkvdec1 baseline           |
| bbb_2160p60_hevc.mp4 | HEVC   | 4K60  | RK3588 stretch goal (ampere/boltzmann only) |

Stretch: add **Tears of Steel** (2012, also CC-BY) for HDR / high-bitrate
HEVC once the stretch goal is viable.

## After ohm

- **ampere** (RK3588) — once RockHard kernel tracks a tree with VDPU381
  merged (>= Linux 6.14-ish with the Feb 2026 backport), add `v4l2codecs` and
  `ffmpeg-v4l2-request` packaging to RockHard's output. Should be a small
  step.
- **boltzmann** (RK3588) — dual-path decision. Short-term: use Path B with
  rkmpp (the kernel already exposes it). Long-term: migrate Neutron to a
  mainline kernel with VDPU381 and move to Path A for symmetry with ampere.
- **fresnel** (RK3399) — Hantro is mature upstream; likely only needs
  userspace install + recipe validation. Endeavour OS package names TBD.

Patches upstreamed along the way (v4l2-request to FFmpeg, VDPU346 to
linux-media) count double — they benefit the whole fleet.

## Known gotchas / watch-for

Sign-posting only — details in linked sources. If you hit one of these,
expand the bullet with your specific find.

- **IOMMU faults.** Collabora's Feb 2026 VDPU381 landing included specific
  IOMMU fixes. If dmesg shows rockchip-iommu faults during decode, check
  you're on a kernel that carries the fixes, not just the driver.
  ([Collabora post](https://www.collabora.com/news-and-blog/news-and-events/rk3588-and-rk3576-video-decoders-support-merged-in-the-upstream-linux-kernel.html))
- **Firmware blob.** rkvdec / rkvdec2 need a firmware file under
  `/lib/firmware/rockchip/` (varies by SoC). Missing blob → driver probe
  fails silently (no /dev/videoN). Check linux-firmware package currency.
- **HEVC UAPI version gating.** New explicit RPS controls arrived with
  the Feb 2026 landing; GStreamer 1.28 + FFmpeg-preliminary honour them.
  Older userspace on a newer kernel (or vice versa) fails in unintuitive
  ways. Pin both sides when diagnosing.
- **Compositor-bound ≠ decode-bound.** The 2023 clehaxze result saw 80%
  CPU at 1080p60 H.264 — that was the compositor + dmabuf path, not
  decode. Always attribute CPU with `perf top` / `top -H` before calling
  decode slow. Use `mpv --vo=null` to isolate the decode path.
- **VOP2 cfg_done (RK3588).** Decoded frames appear via VP0/VP1/VP2; if
  the shadow-register latch dance is wrong, frames don't show. See
  `feedback_rk3588_cfg_done.md` + `project_bin_vp0_theory.md` in noether
  memory. Touches Coulomb / Bin territory — if it gets there, ask.
- **Multi-core VDPU381 on RK3588.** Silicon has two decoder cores;
  upstream multi-core support is still landing. Until then, ampere /
  boltzmann run on one core and under-deliver vs headline specs.

## Build infrastructure

Fourier touches two kinds of builds: userspace (FFmpeg + patches, GStreamer,
libva-v4l2-request) and kernel (DT + module compiles if we carry out-of-tree
rkvdec2 / VDPU346). Both benefit from distcc — CT108 (14-core aarch64 on
data, wake via `wake-data` on hertz) plus always-on tesla on hertz.

Zeroconf is broken in containers — hand-wire `DISTCC_HOSTS`.

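Hand-wiring `DISTCC_HOSTS` amounts to listing the local slot count followed by each worker as `host:port/slots` with the `,lzo,cpp` options. A minimal sketch of that assembly follows; the helper name `distcc_hosts` is hypothetical, and the addresses and slot counts are the ones used in the recipes in this section.

```sh
#!/bin/sh
# Assemble DISTCC_HOSTS by hand: local slots first, then each remote
# "host:port/slots" entry with ,lzo,cpp appended (needed for pump mode).
distcc_hosts() {
    hosts="localhost/$1"
    shift
    for remote in "$@"; do
        hosts="${hosts} ${remote},lzo,cpp"
    done
    printf '%s\n' "${hosts}"
}

DISTCC_HOSTS="$(distcc_hosts 4 192.168.88.208:3632/14 tesla.lxd:3632/4)"
export DISTCC_HOSTS
echo "${DISTCC_HOSTS}"
```

Keeping the worker list as function arguments makes it easy to drop CT108 from the line when it is asleep.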
### Kernel / DT modules (cross-build from x86 host)

```sh
DISTCC_HOSTS="localhost/4 192.168.88.208:3632/14,lzo,cpp tesla.lxd:3632/4,lzo,cpp" \
  pump make -j80 O=build/ ARCH=arm64 \
  CROSS_COMPILE='distcc aarch64-linux-gnu-' HOSTCC='ccache gcc'
```

Pump mode sometimes misbehaves on aarch64 cross builds — fall back to plain
`make -j80` without `pump` if include-server throws.

### FFmpeg / userspace (native aarch64 host — ohm, ampere, fermi build CT)

```sh
export DISTCC_HOSTS="localhost/4 192.168.88.208:3632/14,lzo,cpp tesla.lxd:3632/4,lzo,cpp"
export MAKEFLAGS="-j80"
makepkg -s            # AUR / marfrit-packages workflow
# or, from raw tree:
./configure && pump make -j80
```

`makepkg` honours `MAKEFLAGS` but not `DISTCC_HOSTS` for all build hooks —
put `DISTCC_HOSTS` in `/etc/makepkg.conf` or export before invocation. CC/CXX
wrappers: `/usr/lib/distcc/` symlinks on native aarch64 (NOT
`/usr/lib/distcc/bin/`).

Remember to re-run `wake-data` if CT108 is asleep — see `his` skill or the
`project_distcc_infra.md` memory for the full recipe. When that memory and
this README diverge, fix both.

## Packaging — marfrit-packages

New fleet hosts should get Fourier userspace via `pacman -S` / `apt install`,
not per-device AUR rebuilds. Mirror plan in `marfrit-packages` (layout:
`arch/<pkg>/PKGBUILD`, `debian/<pkg>/build-deb.sh + debian/`):

| Package | Source | CI runner | Target | Status |
|---------|--------|-----------|--------|--------|
| `ffmpeg-v4l2-request-git` | Kwiboo `v4l2-request-n8.1` (rebase of Jernej's series onto ffmpeg 8.1), pinned to `_commit` | **fermi** (Arch ARM aarch64) | Path A: ohm, fresnel, ampere | scaffolded 2026-04-24 |
| `ffmpeg-v4l2-request` | same patches against Debian ffmpeg src | **feynman** (Debian aarch64) | Path A Debian hosts | pending |
| `gst-plugins-bad-fourier` | only if stock distro package is <1.28 | fermi / feynman | 1.28 HEVC UAPI uplift | not needed yet (stock ALARM ships 1.28.2) |
| `libva-v4l2-request` | upstream tag | fermi / feynman | VA-API bridge | pending |

Why not the AUR `ffmpeg-v4l2-request-git`: it tracks a 6.1.1 branch with
`epoch=2`, which makes it older than stock Arch `2:8.1-3` — install would
**downgrade** system ffmpeg. It also pulls X11/AMF/CUDA/FireWire/AviSynth/
OpenMPT/Bluray/etc, none relevant on a Wayland ARM video-decode fleet, and
has no commit pin. The marfrit fork tracks Kwiboo's `v4l2-request-n8.1`
(ffmpeg 8.1 base, sideways swap with stock), pins a commit for
reproducibility, and strips the unused deps.

The Arch package is the priority (ohm + fresnel are Arch-based). Debian
package comes second (relevant if a fleet host ever runs Debian — kepler or
similar). Both are `provides=(ffmpeg) conflicts=(ffmpeg)`, so install
deliberately replaces the distro ffmpeg on a given host.

CI triggers on each push. Artifacts land in packages.reauktion.de
(`reference_marfrit_repo_bootstrap.md` in noether memory has the client
bootstrap).

Out of scope: rkmpp-based FFmpeg (Path B). Boltzmann stays on rkr* until
Neutron migrates; that host doesn't need a marfrit-packages entry until
migration.

## Upstreaming policy

**Default: upstream-first.** Anything generic (FFmpeg v4l2-request polish,
GStreamer fixes, linux-media DT nodes for rkvdec2 / VDPU346 on RK356x,
kernel compatible-string additions) goes to the relevant maintainer list
(`ffmpeg-devel`, `linux-media`, `linux-rockchip`, `gstreamer-devel`,
gitlab.fd.o) — not to marfrit-packages as a downstream carrier.

**Fallback: when upstream ideology blocks success**, stop fighting the
review. Mark the patch `# NOT FOR UPSTREAM`, carry it in marfrit-packages,
and document it in this README under a "fleet-local patches" section. The
stance then becomes: *Markus has the only devices in the world which are
hardware-decoding-capable, and whoever wants to follow should get a Claude
plan*. Fleet keeps working; replication path is reproducible; no energy
burnt arguing with maintainers who won't merge.

The bar for switching: a concrete rejection on ideological grounds (licensing
stance, design-religion, NIH), not a normal review cycle. Normal review
friction — respin until it lands.

Known RockHard precedent for downstream-only: the `GCC_PLUGINS` patch
(`project_linux_rk3588.md`). Tag new carrier-only patches the same way for
consistency.

## Working agreements

Standing rules for how we run this project — inherited from the broader
collaboration canon (`feedback_*` / `project_*` in noether's memory system),
captured here so a cold-start Fourier session doesn't re-learn them.

### ReCAP — ReContextualization After Pruning

Claude's context gets compacted when long sessions hit the window limit, and
any live-session state not in durable storage vanishes. Our counter:

- Memory files (`/home/mfritsche/.claude/projects/-home-mfritsche-claude/memory/`,
  `MEMORY.md` is the index) are the long-term substrate. Don't keep
  load-bearing facts only in conversation.
- Project READMEs (this file) carry the research dossier + current plan.
  Update them when state changes, not later.
- Task list on noether carries active work. Status transitions are cheap —
  mark in_progress when starting, completed when done.
- After a `/compact`, re-read relevant memory files + README + recent git log
  before reasoning. Don't infer from conversation alone.

See `project_recap.md` in memory for the full protocol.

### Commit per experiment

Every experiment that touches the tree — a kconfig change, a DT tweak, a
ffmpeg build flag, a test clip benchmark — gets its own commit on a WIP
branch, with a short message naming what changed and what was observed.

- Dirty trees are tech debt. If the tree is dirty for >30 min, commit.
- Include benchmark numbers / dmesg excerpts / observation in the commit
  body, not just the code diff.
- Rebase / squash later if a clean series matters; don't delay the commit
  waiting for it.

See `feedback_commit_cadence.md` in memory.

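The "what changed in the subject, what was observed in the body" convention maps onto git's two `-m` options. A minimal sketch, assuming the whole working tree belongs to one experiment; the helper name `commit_experiment` is hypothetical, and the example message in the comment is illustrative only.

```sh
#!/bin/sh
# Commit the whole working tree as one experiment: the subject names the
# change, the body carries the observation (benchmark numbers, dmesg excerpt).
commit_experiment() {
    subject="$1"
    observation="$2"
    git add -A && git commit -q -m "${subject}" -m "${observation}"
}

# Example, run on a WIP branch of the tree under test:
#   commit_experiment "ffmpeg: add -hwaccel_output_format drm_prime" \
#       "ohm, bbb_1080p30_h264.mp4: 1440 frames / 15.4 s, 51 % CPU"
```

Because `git add -A` stages everything, unrelated dirt in the tree ends up in the same commit; stage selectively instead when two experiments overlap.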
### Ask before flash / reboot

This project spans fleet laptops and always-on hosts. We have **real blast
radius**.

- Never flash anything without a verified, tested backup. Fritz!Box 7490 was
  bricked by flashing with an empty backup on 2026-04-12 — we don't repeat
  that class of mistake. See `feedback_flash_critical.md`.
- Shared hardware reboots pause the user's other work. Ask before rebooting
  data (Proxmox), hertz (home-LAN spine), boltzmann (always-on build host).
  See `feedback_no_bulldoze_reboots.md`.
- For kernel / u-boot / firmware changes: simulate first where possible
  (QEMU, module-style load) before flashing silicon. See
  `feedback_simulate_first.md`.
- **Never** run `update-initramfs` or equivalent on a remote host that boots
  from a non-standard root (ZFS, btrfs-with-subvols). See
  `feedback_initramfs.md`.

### Off-machine backups

Fleet laptops back up to data via backintime (Anacron mode, daily attempts,
VPN-guarded, 30-day staleness alert by email). Hertz backs up to data
weekly / quarterly via `backup-hertz.sh` (vitruvius dashboard shows
progress). Data itself lives on ZFS RAIDZ2 + snapshots.

- Before any invasive change on a fleet laptop, confirm its last successful
  backintime snapshot in `/opt/herding/var/backintime-status/<host>` on
  hertz.
- New per-device state (kernel patch series, ffmpeg build tree) lives in a
  Gitea repo + gets backed up with the rest of the working tree — don't
  leave unique work on a single disk.
- External photos / family data on hertz is mirrored via the hertz backup to
  data. Restore recipe is in the two-step `restore-hertz-step1-sd.sh` /
  `restore-hertz-step2-lxd.sh` on data.

### See also — broader feedback canon

- **Test the observer first** (`feedback_observer_first.md`) — before
  drawing decode-performance conclusions, confirm the rig (v4l2-ctl reports
  sensible caps? ffmpeg actually routing through v4l2request?).
- **Three strikes then verify** (`feedback_three_strikes.md`) — after two
  failed fixes, stop guessing; verify the binary / wire / protocol.
- **TRM or nothing** (`feedback_trm_or_nothing.md`) — register writes need
  documentation backing. Relevant for any DT or driver patches we ship.
- **Trust Markus' eyes** (`feedback_trust_user_eyes.md`) — if he reports
  "it plays smoothly", that's primary evidence. Don't over-qualify.
- **No budget framing** (`feedback_no_budget_framing.md`) — don't pre-shrink
  scope citing "session cost"; Markus sets pace.

## References

### Mainline kernel state
- [Rockchip RK3588 / RK3576 H.264 and H.265 video decoders gain mainline Linux support (CNX Software, 2026-02-27)](https://www.cnx-software.com/2026/02/27/rockchip-rk3588-rk3576-h-264-and-h-265-video-decoders-mainline-linux/)
- [RK3588 and RK3576 video decoders support merged in the upstream Linux Kernel (Collabora)](https://www.collabora.com/news-and-blog/news-and-events/rk3588-and-rk3576-video-decoders-support-merged-in-the-upstream-linux-kernel.html)
- [Upstream support for Rockchip's RK3588: Progress and future plans (Collabora)](https://www.collabora.com/news-and-blog/news-and-events/rockchip-rk3588-upstream-support-progress-future-plans.html)
- [media: rkvdec: Add support for VDPU381 and VDPU383 (LWN)](https://lwn.net/Articles/1053556/)
- [RKVDEC2 Driver Posted For Accelerated Video Decoding On Newer Rockchip SoCs (Phoronix)](https://www.phoronix.com/news/RKVDEC2-Rockchip-Video-Decode)

### FFmpeg V4L2 request API
- [PATCH v2 0/8 Add V4L2 Request API hwaccels for MPEG2, H.264 and HEVC (ffmpeg-devel, 2024-08)](https://ffmpeg.org/pipermail/ffmpeg-devel/2024-August/332034.html)
- [Miouyouyou/FFmpeg-V4L2-Request (build script)](https://github.com/Miouyouyou/FFmpeg-V4L2-Request)
- [ffmpeg v4l2 requests 4.4.3 patchset (artemis.sh, 2023-03)](https://artemis.sh/2023/03/06/ffmpeg-v4l2-requests-4.4.3.html)

### GStreamer v4l2codecs
- [GStreamer v4l2codecs plugin docs](https://gstreamer.freedesktop.org/documentation/v4l2codecs/index.html)
- [Adding VP9 and MPEG2 stateless support in v4l2codecs for GStreamer (Collabora, 2021)](https://www.collabora.com/news-and-blog/blog/2021/06/23/adding-vp9-and-mpeg2-stateless-support-in-v4l2codecs-for-gstreamer/)
- [GStreamer 1.26 — improved hardware efficiency (Collabora)](https://www.collabora.com/news-and-blog/news-and-events/gstreamer-126-improved-hardware-efficiency.html)

### PineTab2 specific
- [Hardware accelerated playback on PineTab 2 (clehaxze.tw, 2023-09)](https://clehaxze.tw/gemlog/2023/09-17-hardware-accelerated-playback-on-pinetab2.gmi)
- [PINE64 Mainline Hardware Decoding wiki](https://wiki.pine64.org/wiki/Mainline_Hardware_Decoding)
- [dreemurrs-embedded/linux-pinetab2 releases (6.15.2-danctnix2 latest)](https://github.com/dreemurrs-embedded/linux-pinetab2/releases)