marfrit-packages/arch/chromium-fourier/STUDY.md now holds the plan for
patching upstream Chromium to reach VaapiVideoDecoder directly on
Linux Wayland, talking to libva-v4l2-request-fourier. README points at
it from the existing 'why not Brave's V4L2 / use-system-ffmpeg path'
section so a cold-start session sees the next step.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mpv on ohm (with the marfrit ffmpeg-v4l2-request-git installed) auto-
exposes h264-v4l2request as an hwdec choice and loads the drmprime VO
helper when --hwdec=v4l2request is set. Decode IS hardware. But the VO
side is still gpu-next + KWin, so we hit the same compositor bottleneck
the SW mpv test exposed: 122% CPU, 930/1440 dropped frames (~64% drop
rate, ~8.5 fps delivered).
--vo=dmabuf-wayland would be the canonical fix (gst's waylandsink path
gives 6-7% CPU through the same compositor), but mpv's hwdec → dmabuf-
wayland negotiation fails today: "hardware format not supported", with
mpv treating the decode output as yuv420p and trying to upload it to
drm_prime. Disentangling that is its own dive; not pursuing now.
Recommended ohm playback recipe stays:
- Graphical: gst-launch-1.0 ... ! v4l2slh264dec ! waylandsink (6% CPU)
- Headless / scripted: ffmpeg -hwaccel v4l2request
-hwaccel_output_format drm_prime (14% CPU)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Installed marfrit/ffmpeg-v4l2-request-git on ohm (pacman -S replaces
stock ffmpeg via provides/conflicts=ffmpeg; yes|pacman past the interactive
conflict prompt). ffmpeg -hwaccels now reports v4l2request alongside
vaapi/drm.
Four timing runs on bbb_1080p30_h264.mp4:
H1 -hwaccel v4l2request (unpaced): 37.5 fps, 105% CPU
H2 -re -hwaccel v4l2request (realtime): 24.0 fps, 67% CPU
H3 -hwaccel v4l2request -hwaccel_output_format drm_prime:
(unpaced): 93.5 fps, 51% CPU
H4 -re + drm_prime (realtime): 24.0 fps, 14% CPU
The drm_prime output format is decisive — 5x lower CPU at realtime,
2.5x the peak throughput. Without it, ffmpeg's v4l2request hwaccel maps
every decoded dmabuf back to a CPU-readable NV12 plane before handing
off to the muxer, which blocks the decoder pipeline and burns CPU.
With drm_prime, frames stay as dmabuf handles through the null muxer.
93.5 fps peak means the hantro VPU on RK3566 can sustain ≈95 fps of
1080p H.264 — plenty of headroom for 1080p60, and 4x over 24 fps
source rate.
GStreamer v4l2slh264dec -> waylandsink still wins on CPU (6-7%) because
its dmabuf-direct scanout via KWin->VOP2 avoids every copy. ffmpeg
drm_prime -f null - loses a few % in the null muxer layer, which would
disappear entirely in a decode->drm_prime->waylandsink filter pipeline.
Also discovered during install: DanctNIX's own repo already ships
ffmpeg-v4l2-request 2:8.1-3 (missed in initial recon — I only looked
at the installed "ffmpeg" name, not at package alternatives under
"ffmpeg-*"). The marfrit variant is still the better fit for Fourier
(stripped deps, commit-pinned, our CI control plane) and sorts
strictly newer in vercmp (2:8.1.r123329.b57fbbe-1 > 2:8.1-3).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Followup to the user's "can Brave use our ffmpeg instead?" question.
Both alternatives investigated, both dead ends:
- System ffmpeg path: Chromium/Brave link a vendored ffmpeg. Even the
use_system_ffmpeg builds use system libavcodec only as a pure SW decoder
(FfmpegVideoDecoder — no hw_device_ctx / hwaccel wiring). Our
ffmpeg-v4l2-request-git would be used as SW. No HW speedup.
- Chromium's own V4L2VideoDecoder: class IS compiled into brave-bin
(strings /opt/brave-bin/brave | grep V4L2VideoDecoder hits), but the
factory is ChromeOS-gated. Feature flag strings UseChromeOSDirectVideoDecoder
/ V4L2FlatStatelessVideoDecoder / V4L2FlatVideoDecoder do NOT appear as
strings in the binary — only AcceleratedVideoDecodeLinuxGL does, and
that's the VA-API gate, not a V4L2 gate. Setting those flags on the
cmdline is a no-op on this build.
Conclusion: unlocking browser HW decode on ohm requires either (a) Chromium
rebuilt with V4L2 decoder factory extended to Linux non-ChromeOS, (b) a
multiplanar-capable libva-v4l2-request, or (c) a new chromium-on-Linux HW
decode path that hasn't shipped yet.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
End-to-end status of browser HW decode:
- Built bootlin libva-v4l2-request-git from AUR with local patches:
- HEVC stripped (old V4L2_CID_MPEG_VIDEO_HEVC_* names; kernel renamed
them to V4L2_CID_STATELESS_HEVC_*; ohm has no HW HEVC anyway)
- Missing #include "utils.h" in h264.c added
- config.c probe extended to try both OUTPUT and OUTPUT_MPLANE types
(patch archive parked on ohm at ~/fourier-test/libva-patches/)
- vainfo enumerates MPEG-2 + H.264 profiles (Main/High/ConstrainedBaseline/
MultiviewHigh/StereoHigh) via LIBVA_DRIVER_NAME=v4l2_request
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1.
- Brave with --enable-features=VaapiVideoDecoder,VaapiIgnoreDriverChecks,
AcceleratedVideoDecodeLinuxGL + --use-gl=angle --use-angle=gl-egl picks
up VAAPI in the GPU process and calls vaCreateContext on the video.
- vaCreateContext fails. Root cause: bootlin library was written for
Allwinner sunxi-cedrus (single-plane V4L2); context.c / picture.c / v4l2.c
hardcode single-plane paths that don't match Rockchip hantro multiplanar.
Profile probe is now multiplanar-aware (my patch), but buffer setup and
frame-submission paths aren't.
Decision: browser HW video decode on ohm parked until a multiplanar port
of libva-v4l2-request exists. Non-browser paths (gst v4l2codecs, ffmpeg
v4l2-request) are unaffected. Patch set on ohm is a starting point for
anyone picking up the multiplanar rework; also a candidate for a future
marfrit libva-v4l2-request-fourier PKGBUILD once multiplanar works.
Also updated the Status line with fermi CI success: first push built,
signed, and published ffmpeg-v4l2-request-git-2:8.1.r123329.b57fbbe-1
on packages.reauktion.de in one shot.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Added two HW rows to the baseline table with hard fps telemetry from
progressreport update-freq=5:
HW 1:1 windowed (1920x1080): 7% CPU, 1488/62 s stream 1:1 wall
HW fullscreen=true (scaled 1280x800): 6% CPU, 1488/62 s stream 1:1 wall
DSI-1 is 800x1280 at 59.99 Hz with KWin applying output_transform 270°,
effective geometry 1280x800 landscape. Video is 1920x1080 so fullscreen
requires downscale.
The surprise: fullscreen scaling is actually LOWER CPU than native
windowed, within noise. KWin on RK3566 must be handing the dmabuf to
VOP2's HW scaling planes for direct scanout — a zero-CPU, zero-copy
scale on the display controller. That's the "compositor-bound" bottleneck
from the SW baseline going away entirely once the whole path is dmabuf.
progressreport stream-position vs wall-clock: 4@5s, 9@10s, ... 62@62s.
Exact 1:1 for the full run means zero drops and real-time delivery.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GStreamer v4l2codecs path validated on ohm: HW H.264 decode plus
zero-copy dmabuf NV12 direct to waylandsink via KWin on DSI-1. 62 s
playback, 7% CPU total (≈4 s of user+sys over 62 s wall), no pipeline
errors. Negotiated caps:
video/x-raw(memory:DMABuf), format=DMA_DRM, drm-format=NV12,
width=1920, height=1080, framerate=24/1
— so the hantro-vpu decoder emits dmabufs that waylandsink imports
directly. That explains the massive CPU drop vs the SW mpv baseline
(127 % with 973 / 1440 frames dropped at gpu-next).
Side finding: the bbb_1080p30_h264.mp4 clip on doppler is actually 24 fps
per the SPS caps — the "1080p30" in the filename is Blender's encode
metadata, not the real rate. Recalibrated all baseline rows against
24 fps source. SW mpv drop number revised: 973 of 1440 → 68 %, not 54 %
I wrote earlier.
Also added an HW throughput row (fakesink sync=false): 36.8 fps, 89 %
CPU — hantro G1 on RK3566 can sustain ~1.5× realtime for 1080p H.264.
Frame-drop count for the HW waylandsink run is still unknown —
fpsdisplaysink signal-fps-measurements doesn't wire through gst-launch.
Next pass can use GST_DEBUG=fpsdisplaysink:5 or a progressreport element
to get a precise delivered-fps number.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
marfrit-packages commit 9994c4e adds arch/ffmpeg-v4l2-request-git/ plus
the CI job (chained after claude-his-any, continue-on-error). Tracks
Kwiboo's v4l2-request-n8.1 branch, pinned to commit b57fbbe, strips
the AUR package's X11/AMF/CUDA/FireWire/etc deps, keeps encoders.
Fleshes out the packaging table with Status column and the "why not the
AUR package" rationale so a cold-start session doesn't re-discover the
downgrade trap.
Status: scaffolded, not yet pushed — next step is to push marfrit-packages
main, let fermi CI build, install on ohm, then run the v4l2-request hwaccel
validation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ffmpeg -hwaccel none, bbb_1080p30_h264.mp4 (fleet-canonical, SHA-16
dcf8a7170fbd49bb, pulled from doppler /moviedata/fourier-test/ via hertz
lxc file pull + ephemeral http bridge).
- Max throughput (no -re, no display): 1440 frames in 18.55 s → 77.6 fps,
319% CPU (≈3.2 of 4 A55 cores). 2.6× realtime headroom.
- Realtime -re (no display): 1440 frames in 60.27 s, 90% CPU (≈1 core).
Per-frame decode cost ≈40 ms.
- mpv --hwdec=no --vo=gpu-next through KWin on DSI-1: 127% CPU but 973
dropped frames out of 1800 target → 54% drop rate, ~14 fps delivered.
Decode is not the bottleneck; the compositor/VO path is. HW decode will
cut decode CPU but not by itself fix delivery drops — the "compositor-
bound ≠ decode-bound" gotcha the README flagged. Direct KMS scanout
(--vo=drm) or dmabuf-wayland will be the lever for the delivery side.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
modinfo on the sole rkvdec-family module in the built image
(/lib/modules/6.19.10-danctnix1-1-pinetab2/.../rockchip/rkvdec/rockchip-vdec.ko)
reports author Boris Brezillon, alias set of/rockchip,rk{3288,3328,3399}-vdec
only — that is the mainline rkvdec1 driver, not VDPU346. No rkvdec2
sibling directory, no extra module, CONFIG_VIDEO_ROCKCHIP_VDEC=m is the
only relevant config symbol.
Conclusion: DanctNIX ships a vanilla rkvdec-family config and does not
carry out-of-tree rkvdec2 patches for RK3566. HEVC/VP9 on ohm requires
either our own carrier patch series (VDPU346 driver + DT + firmware,
rebuild linux-pinetab2) or waiting for VDPU346 to land upstream.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Step 1 (live recon) on ohm. Findings folded into README under new
"Recon 2026-04-24" subsection, replacing the pre-recon "Open question"
paragraph and the now-stale "Current blocker: ohm offline" note.
- Kernel 6.19.10-danctnix1-1-pinetab2 (README had 6.15 — stale by 4 minors).
- Only Hantro VPU decoder bound (rockchip,rk3568-vpu @ fdea0400); exposes
stateless H.264 (S264), MPEG-2 (MG2S), VP8 (VP8F). No HEVC/VP9.
- No rkvdec2 DT node, no rkvdec2 driver, /lib/firmware/rockchip/ has only
dptx.bin. HEVC/VP9 HW decode unavailable on this image.
- Stock ffmpeg 8.1 has v4l2m2m only (no v4l2request hwaccel) — cannot drive
the stateless decoder; need ffmpeg-v4l2-request-git. GStreamer 1.28.2
v4l2codecs already exposes v4l2sl{h264,mpeg2,vp8}dec.
- Unrelated noise: bes2600 Wi-Fi OOT driver WARNs every ~30 s.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Umbrella for HW-assisted video decode across fresnel/ohm/boltzmann/ampere.
Captures Path A (mainline V4L2 stateless) vs Path B (Rockchip MPP vendor),
rkvdec mainline status as of 2026-04-24 (VDPU381/383 landed Feb 2026;
VDPU346 for RK356x still pending), the 2023 working recipe from
clehaxze.tw, plan for ohm as first priority, and working-agreement
reminders (ReCAP, commit-per-experiment, ask-before-flash/reboot,
off-machine backups).