Commit Graph

16 Commits

Author SHA1 Message Date
test0r 2808d486fa ohm step 5 validated: ffmpeg -hwaccel v4l2request + drm_prime = 14% CPU
Installed marfrit/ffmpeg-v4l2-request-git on ohm (pacman -S replaces
stock ffmpeg via provides/conflicts=ffmpeg; piping yes into pacman gets
past the interactive conflict prompt). ffmpeg -hwaccels now reports
v4l2request alongside vaapi/drm.

Four timing runs on bbb_1080p30_h264.mp4:

  H1  -hwaccel v4l2request            (unpaced):    37.5 fps, 105% CPU
  H2  -re -hwaccel v4l2request         (realtime):   24.0 fps,  67% CPU
  H3  -hwaccel v4l2request -hwaccel_output_format drm_prime:
                                        (unpaced):   93.5 fps,  51% CPU
  H4  -re + drm_prime                  (realtime):   24.0 fps,  14% CPU

The drm_prime output format is decisive — 5x lower CPU at realtime,
2.5x the peak throughput. Without it, ffmpeg's v4l2request hwaccel maps
every decoded dmabuf back to a CPU-readable NV12 plane before handing
off to the muxer, which blocks the decoder pipeline and burns CPU.

With drm_prime, frames stay as dmabuf handles through the null muxer.
93.5 fps peak means the hantro VPU on RK3566 can sustain ≈95 fps of
1080p H.264 — plenty of headroom for 1080p60, and 4x over 24 fps
source rate.
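
The ratios quoted above follow directly from the H1-H4 table. A minimal
Python sketch of the arithmetic; the dictionary layout and variable names
are illustrative, only the numbers come from the runs:

```python
# (fps, cpu_percent) per run, copied from the timing table in this commit.
runs = {
    "H1": (37.5, 105.0),  # -hwaccel v4l2request, unpaced
    "H2": (24.0, 67.0),   # -re -hwaccel v4l2request
    "H3": (93.5, 51.0),   # v4l2request + drm_prime, unpaced
    "H4": (24.0, 14.0),   # -re + drm_prime
}
SOURCE_FPS = 24.0  # real rate of the clip

# CPU saved at realtime by keeping frames as dmabuf handles.
cpu_ratio_realtime = runs["H2"][1] / runs["H4"][1]
# Peak throughput gained by skipping the NV12 map-back.
peak_ratio = runs["H3"][0] / runs["H1"][0]
# Headroom of the unpaced drm_prime run over the source rate and over 60 fps.
headroom_vs_source = runs["H3"][0] / SOURCE_FPS
headroom_vs_60 = runs["H3"][0] / 60.0

print(f"realtime CPU ratio:  {cpu_ratio_realtime:.1f}x")
print(f"peak throughput:     {peak_ratio:.1f}x")
print(f"headroom vs source:  {headroom_vs_source:.1f}x")
print(f"headroom vs 1080p60: {headroom_vs_60:.1f}x")
```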

GStreamer v4l2slh264dec -> waylandsink still wins on CPU (6-7%) because
its dmabuf-direct scanout via KWin->VOP2 avoids every copy. ffmpeg
drm_prime -f null - loses a few % in the null muxer layer, which would
disappear entirely in a decode->drm_prime->waylandsink filter pipeline.

Also discovered during install: DanctNIX's own repo already ships
ffmpeg-v4l2-request 2:8.1-3 (missed in initial recon — I only looked
at the installed "ffmpeg" name, not at package alternatives under
"ffmpeg-*"). The marfrit variant is still the better fit for Fourier
(stripped deps, commit-pinned, our CI control plane) and sorts
strictly newer in vercmp (2:8.1.r123329.b57fbbe-1 > 2:8.1-3).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 21:25:38 +00:00
test0r 7517fb7e9b ohm task (b.2): neither system ffmpeg nor Chromium's V4L2 decoder reachable
Followup to the user's "can Brave use our ffmpeg instead?" question.
Both alternatives investigated, both dead ends:

- System ffmpeg path: Chromium/Brave link a vendored ffmpeg. Even the
  use_system_ffmpeg builds use system libavcodec only as a pure SW decoder
  (FFmpegVideoDecoder, with no hw_device_ctx / hwaccel wiring). Our
  ffmpeg-v4l2-request-git would be used as SW. No HW speedup.
- Chromium's own V4L2VideoDecoder: class IS compiled into brave-bin
  (strings /opt/brave-bin/brave | grep V4L2VideoDecoder hits), but the
  factory is ChromeOS-gated. Feature flag strings UseChromeOSDirectVideoDecoder
  / V4L2FlatStatelessVideoDecoder / V4L2FlatVideoDecoder do NOT appear as
  strings in the binary — only AcceleratedVideoDecodeLinuxGL does, and
  that's the VA-API gate, not a V4L2 gate. Setting those flags on the
  cmdline is a no-op on this build.
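
The strings | grep check above can be repeated for the whole flag list in
one pass. A rough Python equivalent, assuming a plain byte-substring search
is an acceptable stand-in for strings | grep; flags_present and scan_file
are hypothetical helper names, not part of any Chromium tooling:

```python
from pathlib import Path

# Names taken from this commit message; presence means the string is
# embedded in the binary, not that the feature is actually usable.
FLAGS = [
    b"V4L2VideoDecoder",
    b"UseChromeOSDirectVideoDecoder",
    b"V4L2FlatStatelessVideoDecoder",
    b"V4L2FlatVideoDecoder",
    b"AcceleratedVideoDecodeLinuxGL",
]


def flags_present(blob: bytes, names=FLAGS) -> dict:
    """Map each name to True if it occurs anywhere in the blob."""
    return {name.decode(): name in blob for name in names}


def scan_file(path: str) -> dict:
    """Read the whole binary into memory and scan it (fine for one-off checks)."""
    return flags_present(Path(path).read_bytes())


if __name__ == "__main__":
    # Synthetic blob standing in for a real binary.
    demo = b"\x00V4L2VideoDecoder\x00AcceleratedVideoDecodeLinuxGL\x00"
    for name, found in flags_present(demo).items():
        print(f"{name}: {'present' if found else 'absent'}")
```

On ohm this would be called as scan_file("/opt/brave-bin/brave").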

Conclusion: unlocking browser HW decode on ohm requires either (a) Chromium
rebuilt with V4L2 decoder factory extended to Linux non-ChromeOS, (b) a
multiplanar-capable libva-v4l2-request, or (c) a new chromium-on-Linux HW
decode path that hasn't shipped yet.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 20:57:20 +00:00
test0r 7301067108 ohm task (b): Brave reached vaCreateContext, blocked on multiplanar libva
End-to-end status of browser HW decode:

- Built bootlin libva-v4l2-request-git from AUR with local patches:
  - HEVC stripped (old V4L2_CID_MPEG_VIDEO_HEVC_* names; kernel renamed
    them to V4L2_CID_STATELESS_HEVC_*; ohm has no HW HEVC anyway)
  - Missing #include "utils.h" in h264.c added
  - config.c probe extended to try both OUTPUT and OUTPUT_MPLANE types
  (patch archive parked on ohm at ~/fourier-test/libva-patches/)
- vainfo enumerates MPEG-2 + H.264 profiles (Main/High/ConstrainedBaseline/
  MultiviewHigh/StereoHigh) via LIBVA_DRIVER_NAME=v4l2_request
  LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1.
- Brave with --enable-features=VaapiVideoDecoder,VaapiIgnoreDriverChecks,
  AcceleratedVideoDecodeLinuxGL + --use-gl=angle --use-angle=gl-egl picks
  up VAAPI in the GPU process and calls vaCreateContext on the video.
- vaCreateContext fails. Root cause: bootlin library was written for
  Allwinner sunxi-cedrus (single-plane V4L2); context.c / picture.c / v4l2.c
  hardcode single-plane paths that don't match Rockchip hantro multiplanar.
  Profile probe is now multiplanar-aware (my patch), but buffer setup and
  frame-submission paths aren't.

Decision: browser HW video decode on ohm parked until a multiplanar port
of libva-v4l2-request exists. Non-browser paths (gst v4l2codecs, ffmpeg
v4l2-request) are unaffected. Patch set on ohm is a starting point for
anyone picking up the multiplanar rework; also a candidate for a future
marfrit libva-v4l2-request-fourier PKGBUILD once multiplanar works.

Also updated the Status line with fermi CI success: first push built,
signed, and published ffmpeg-v4l2-request-git-2:8.1.r123329.b57fbbe-1
on packages.reauktion.de in one shot.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 20:52:09 +00:00
test0r 670f2ad6ee ohm task (a): fit-to-screen HW decode is free — 6% CPU, zero drops
Added two HW rows to the baseline table with hard fps telemetry from
progressreport update-freq=5:

  HW 1:1 windowed (1920x1080):          7% CPU, 1488 frames / 62 s, 1:1 vs wall
  HW fullscreen=true (scaled 1280x800): 6% CPU, 1488 frames / 62 s, 1:1 vs wall

DSI-1 is 800x1280 at 59.99 Hz with KWin applying output_transform 270°,
effective geometry 1280x800 landscape. Video is 1920x1080 so fullscreen
requires downscale.
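
The geometry reasoning can be sketched in a few lines of Python. The helper
name effective_geometry is hypothetical, and the aspect-preserving fit
factor is an assumption about how the sink scales, not something measured
here:

```python
def effective_geometry(width: int, height: int, transform_deg: int):
    """Return (width, height) after an output transform; 90/270 swap the axes."""
    if transform_deg % 180 == 90:
        return height, width
    return width, height


panel = effective_geometry(800, 1280, 270)  # DSI-1 as seen by clients
video = (1920, 1080)

# Fullscreen needs a downscale if the video exceeds the panel on either axis.
needs_downscale = video[0] > panel[0] or video[1] > panel[1]

# Assumed aspect-preserving fit: the tighter of the two axis ratios wins.
fit_scale = min(panel[0] / video[0], panel[1] / video[1])

print(panel, needs_downscale, round(fit_scale, 3))
```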

The surprise: fullscreen scaling is actually LOWER CPU than native
windowed, within noise. KWin on RK3566 must be handing the dmabuf to
VOP2's HW scaling planes for direct scanout — a zero-CPU, zero-copy
scale on the display controller. That's the "compositor-bound" bottleneck
from the SW baseline going away entirely once the whole path is dmabuf.

progressreport stream-position vs wall-clock: 4@5s, 9@10s, ... 62@62s.
Exact 1:1 for the full run means zero drops and real-time delivery.
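
The 1:1 check is just stream position against wall clock. A small Python
sketch using only the three samples quoted above (the elided ones are left
out); variable names are illustrative:

```python
SOURCE_FPS = 24  # real rate of the clip

# (wall_clock_s, stream_position_s) pairs from the progressreport output.
samples = [(5, 4), (10, 9), (62, 62)]

# Lag of the stream behind the wall clock at each sample.
lags = [wall - stream for wall, stream in samples]

final_wall, final_stream = samples[-1]
# Frames that must have been delivered if the final position is exact.
expected_frames = final_stream * SOURCE_FPS
# Real-time delivery: stream position caught up with the wall clock.
realtime = final_stream == final_wall

print(f"lags: {lags} s, expected frames: {expected_frames}, realtime: {realtime}")
```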

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 20:30:33 +00:00
test0r 3ef29d615e ohm step 4: HW decode proof via gst v4l2slh264dec — 7% CPU on Wayland
GStreamer v4l2codecs path validated on ohm: HW H.264 decode plus
zero-copy dmabuf NV12 direct to waylandsink via KWin on DSI-1. 62 s
playback, 7% CPU total (≈4 s of user+sys over 62 s wall), no pipeline
errors. Negotiated caps:

  video/x-raw(memory:DMABuf), format=DMA_DRM, drm-format=NV12,
  width=1920, height=1080, framerate=24/1

— so the hantro-vpu decoder emits dmabufs that waylandsink imports
directly. That explains the massive CPU drop vs the SW mpv baseline
(127 % with 973 / 1440 frames dropped at gpu-next).

Side finding: the bbb_1080p30_h264.mp4 clip on doppler is actually 24 fps
per the SPS caps — the "1080p30" in the filename is Blender's encode
metadata, not the real rate. Recalibrated all baseline rows against
24 fps source. SW mpv drop number revised: 973 of 1440 → 68 %, not the
54 % I wrote earlier.
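
The recalibration is a change of denominator only. A minimal Python sketch,
assuming the 60 s run length implied by 1440 frames at 24 fps; drop_rate is
a hypothetical helper:

```python
DROPPED = 973
DURATION_S = 60  # 1440 frames at 24 fps


def drop_rate(dropped: int, fps: int, duration_s: int) -> float:
    """Fraction of target frames that were dropped."""
    return dropped / (fps * duration_s)


assumed = drop_rate(DROPPED, 30, DURATION_S)  # filename rate, 1800 frames
actual = drop_rate(DROPPED, 24, DURATION_S)   # SPS rate, 1440 frames

print(f"against 30 fps: {assumed:.0%}, against 24 fps: {actual:.0%}")
```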

Also added an HW throughput row (fakesink sync=false): 36.8 fps, 89 %
CPU — hantro G1 on RK3566 can sustain ~1.5× realtime for 1080p H.264.

Frame-drop count for the HW waylandsink run is still unknown —
fpsdisplaysink signal-fps-measurements doesn't wire through gst-launch.
Next pass can use GST_DEBUG=fpsdisplaysink:5 or a progressreport element
to get a precise delivered-fps number.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 20:18:00 +00:00
test0r 1be49951a7 ohm step 5: scaffold ffmpeg-v4l2-request-git PKGBUILD in marfrit-packages
marfrit-packages commit 9994c4e adds arch/ffmpeg-v4l2-request-git/ plus
the CI job (chained after claude-his-any, continue-on-error). It tracks
Kwiboo's v4l2-request-n8.1 branch, pinned to commit b57fbbe, strips
the AUR package's X11/AMF/CUDA/FireWire/etc deps, and keeps encoders.

Fleshes out the packaging table with a Status column and the "why not the
AUR package" rationale so a cold-start session doesn't re-discover the
downgrade trap.

Status: scaffolded, not yet pushed — next step is to push marfrit-packages
main, let fermi CI build, install on ohm, then run the v4l2-request hwaccel
validation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 20:04:55 +00:00
test0r 1266c9da8e ohm step 2: SW baseline — 127% CPU, 54% frame drops via KWin/gpu-next
ffmpeg -hwaccel none, bbb_1080p30_h264.mp4 (fleet-canonical, SHA-16
dcf8a7170fbd49bb, pulled from doppler /moviedata/fourier-test/ via hertz
lxc file pull + ephemeral http bridge).

- Max throughput (no -re, no display): 1440 frames in 18.55 s → 77.6 fps,
  319% CPU (≈3.2 of 4 A55 cores). 2.6× realtime headroom.
- Realtime -re (no display): 1440 frames in 60.27 s, 90% CPU (≈1 core).
  Per-frame decode cost ≈40 ms.
- mpv --hwdec=no --vo=gpu-next through KWin on DSI-1: 127% CPU but 973
  dropped frames out of 1800 target → 54% drop rate, ~14 fps delivered.
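
The derived figures in the list above can be reproduced from the raw
measurements. A short Python sketch; the 30 fps nominal rate and the 60 s
mpv run length are the assumptions in force for this commit, and
cpu_ms_per_frame is a hypothetical helper:

```python
FRAMES = 1440
NOMINAL_FPS = 30  # rate assumed from the filename in this commit


def cpu_ms_per_frame(wall_s: float, cpu_fraction: float, frames: int) -> float:
    """CPU time per frame in ms: wall time x CPU load (1.0 = one core) / frames."""
    return wall_s * cpu_fraction / frames * 1000


# Unpaced run: 18.55 s wall at 319% CPU.
throughput_fps = FRAMES / 18.55
headroom = throughput_fps / NOMINAL_FPS
unpaced_ms = cpu_ms_per_frame(18.55, 3.19, FRAMES)

# Realtime (-re) run: 60.27 s wall at 90% CPU.
paced_ms = cpu_ms_per_frame(60.27, 0.90, FRAMES)

# mpv through KWin: frames that made it to the screen over an assumed 60 s.
delivered_fps = (1800 - 973) / 60

print(f"{throughput_fps:.1f} fps, {headroom:.1f}x headroom")
print(f"per-frame CPU: {unpaced_ms:.0f} ms unpaced, {paced_ms:.0f} ms paced")
print(f"mpv delivered: {delivered_fps:.1f} fps")
```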

Decode is not the bottleneck; the compositor/VO path is. HW decode will
cut decode CPU but not by itself fix delivery drops — the "compositor-
bound ≠ decode-bound" gotcha the README flagged. Direct KMS scanout
(--vo=drm) or dmabuf-wayland will be the lever for the delivery side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:53:57 +00:00
test0r 06ee15fc32 ohm recon: tree-side confirms no rkvdec2 patches in danctnix kernel
modinfo on the sole rkvdec-family module in the built image
(/lib/modules/6.19.10-danctnix1-1-pinetab2/.../rockchip/rkvdec/rockchip-vdec.ko)
reports author Boris Brezillon, alias set of/rockchip,rk{3288,3328,3399}-vdec
only — that is the mainline rkvdec1 driver, not VDPU346. No rkvdec2
sibling directory, no extra module, CONFIG_VIDEO_ROCKCHIP_VDEC=m is the
only relevant config symbol.

Conclusion: DanctNIX ships a vanilla rkvdec-family config and does not
carry out-of-tree rkvdec2 patches for RK3566. HEVC/VP9 on ohm requires
either our own carrier patch series (VDPU346 driver + DT + firmware,
rebuild linux-pinetab2) or waiting for VDPU346 to land upstream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:28:48 +00:00
test0r 67ffb43fc1 ohm recon 2026-04-24: hantro-only, no rkvdec2; kernel 6.19.10-danctnix1
Step 1 (live recon) on ohm. Findings folded into README under new
"Recon 2026-04-24" subsection, replacing the pre-recon "Open question"
paragraph and the now-stale "Current blocker: ohm offline" note.

- Kernel 6.19.10-danctnix1-1-pinetab2 (README had 6.15 — stale by 4 minors).
- Only Hantro VPU decoder bound (rockchip,rk3568-vpu @ fdea0400); exposes
  stateless H.264 (S264), MPEG-2 (MG2S), VP8 (VP8F). No HEVC/VP9.
- No rkvdec2 DT node, no rkvdec2 driver, /lib/firmware/rockchip/ has only
  dptx.bin. HEVC/VP9 HW decode unavailable on this image.
- Stock ffmpeg 8.1 has v4l2m2m only (no v4l2request hwaccel) — cannot drive
  the stateless decoder; need ffmpeg-v4l2-request-git. GStreamer 1.28.2
  v4l2codecs already exposes v4l2sl{h264,mpeg2,vp8}dec.
- Unrelated noise: bes2600 Wi-Fi OOT driver WARNs every ~30 s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:25:33 +00:00
test0r fe597bbf6d Known gotchas: IOMMU, firmware blob, HEVC UAPI, compositor vs decode, VOP2 cfg_done, multi-core VDPU381 2026-04-24 19:07:45 +00:00
test0r f60315797c Build infra: distcc recipe (CT108 + tesla) for kernel + userspace 2026-04-24 19:04:38 +00:00
test0r 9b54e3dcfe Upstreaming policy: upstream-first with ideological-fork fallback 2026-04-24 19:03:28 +00:00
test0r b1c68e4eca Packaging plan: ffmpeg-v4l2-request via marfrit-packages (fermi+feynman) 2026-04-24 19:01:00 +00:00
test0r 56a3aa2889 Test corpus: Big Buck Bunny set on doppler /moviedata/fourier-test/ 2026-04-24 18:49:13 +00:00
test0r 8bdbc65e95 Acceptance: 1080p @ 30 fps, zero dropped frames 2026-04-24 18:48:24 +00:00
test0r c51760dae1 Fourier — initial dossier + plan
Umbrella for HW-assisted video decode across fresnel/ohm/boltzmann/ampere.
Captures Path A (mainline V4L2 stateless) vs Path B (Rockchip MPP vendor),
rkvdec mainline status as of 2026-04-24 (VDPU381/383 landed Feb 2026;
VDPU346 for RK356x still pending), the 2023 working recipe from
clehaxze.tw, plan for ohm as first priority, and working-agreement
reminders (ReCAP, commit-per-experiment, ask-before-flash/reboot,
off-machine backups).
2026-04-24 18:46:07 +00:00