fourier

Author	SHA1	Message	Date
test0r	3144b31ac9	ohm browser HW decode: link to chromium-fourier side-project marfrit-packages/arch/chromium-fourier/STUDY.md now holds the plan for patching upstream Chromium to reach VaapiVideoDecoder directly on Linux Wayland, talking to libva-v4l2-request-fourier. README points at it from the existing 'why not Brave's V4L2 / use-system-ffmpeg path' section so a cold-start session sees the next step. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 22:51:30 +00:00
test0r	bc1e281328	ohm mpv HW result: --hwdec=v4l2request engages, gpu-next still drops mpv on ohm (with the marfrit ffmpeg-v4l2-request-git installed) auto-exposes h264-v4l2request as an hwdec choice and loads the drmprime VO helper when --hwdec=v4l2request is set. Decode IS hardware. But the VO side is still gpu-next + KWin, so we hit the same compositor bottleneck the SW mpv test exposed: 122% CPU, 930/1440 dropped frames (~64% drop rate, ~8.5 fps delivered). --vo=dmabuf-wayland would be the canonical fix (gst's waylandsink path gives 6-7% CPU through the same compositor), but mpv's hwdec → dmabuf-wayland negotiation fails today: "hardware format not supported", with mpv treating the decode output as yuv420p and trying to upload it to drm_prime. Disentangling that is its own dive; not pursuing now. Recommended ohm playback recipe stays: - Graphical: gst-launch-1.0 ... ! v4l2slh264dec ! waylandsink (6% CPU) - Headless / scripted: ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime (14% CPU) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-25 22:17:38 +00:00
test0r	2808d486fa	ohm step 5 validated: ffmpeg -hwaccel v4l2request + drm_prime = 14% CPU Installed marfrit/ffmpeg-v4l2-request-git on ohm (pacman -S replaces stock ffmpeg via provides/conflicts=ffmpeg; yes|pacman past the interactive conflict prompt). ffmpeg -hwaccels now reports v4l2request alongside vaapi/drm. Four timing runs on bbb_1080p30_h264.mp4: H1 -hwaccel v4l2request (unpaced): 37.5 fps, 105% CPU H2 -re -hwaccel v4l2request (realtime): 24.0 fps, 67% CPU H3 -hwaccel v4l2request -hwaccel_output_format drm_prime (unpaced): 93.5 fps, 51% CPU H4 -re + drm_prime (realtime): 24.0 fps, 14% CPU The drm_prime output format is decisive: 5x lower CPU at realtime, 2.5x the peak throughput. Without it, ffmpeg's v4l2request hwaccel maps every decoded dmabuf back to a CPU-readable NV12 plane before handing off to the muxer, which blocks the decoder pipeline and burns CPU. With drm_prime, frames stay as dmabuf handles through the null muxer. 93.5 fps peak means the hantro VPU on RK3566 can sustain ≈95 fps of 1080p H.264, plenty of headroom for 1080p60, and 4x over 24 fps source rate. GStreamer v4l2slh264dec -> waylandsink still wins on CPU (6-7%) because its dmabuf-direct scanout via KWin->VOP2 avoids every copy. ffmpeg drm_prime -f null - loses a few % in the null muxer layer, which would disappear entirely in a decode->drm_prime->waylandsink filter pipeline. Also discovered during install: DanctNIX's own repo already ships ffmpeg-v4l2-request 2:8.1-3 (missed in initial recon: I only looked at the installed "ffmpeg" name, not at package alternatives under "ffmpeg-*"). The marfrit variant is still the better fit for Fourier (stripped deps, commit-pinned, our CI control plane) and sorts strictly newer in vercmp (2:8.1.r123329.b57fbbe-1 > 2:8.1-3). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 21:25:38 +00:00
test0r	7517fb7e9b	ohm task (b.2): neither system ffmpeg nor Chromium's V4L2 decoder reachable Followup to the user's "can Brave use our ffmpeg instead?" question. Both alternatives investigated, both dead ends: - System ffmpeg path: Chromium/Brave link a vendored ffmpeg. Even the use_system_ffmpeg builds use system libavcodec only as a pure SW decoder (FfmpegVideoDecoder, with no hw_device_ctx / hwaccel wiring). Our ffmpeg-v4l2-request-git would be used as SW. No HW speedup. - Chromium's own V4L2VideoDecoder: class IS compiled into brave-bin (strings /opt/brave-bin/brave | grep V4L2VideoDecoder hits), but the factory is ChromeOS-gated. Feature flag strings UseChromeOSDirectVideoDecoder / V4L2FlatStatelessVideoDecoder / V4L2FlatVideoDecoder do NOT appear as strings in the binary; only AcceleratedVideoDecodeLinuxGL does, and that's the VA-API gate, not a V4L2 gate. Setting those flags on the cmdline is a no-op on this build. Conclusion: unlocking browser HW decode on ohm requires either (a) Chromium rebuilt with V4L2 decoder factory extended to Linux non-ChromeOS, (b) a multiplanar-capable libva-v4l2-request, or (c) a new chromium-on-Linux HW decode path that hasn't shipped yet. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 20:57:20 +00:00
test0r	7301067108	ohm task (b): Brave reached vaCreateContext, blocked on multiplanar libva End-to-end status of browser HW decode: - Built bootlin libva-v4l2-request-git from AUR with local patches: - HEVC stripped (old V4L2_CID_MPEG_VIDEO_HEVC_* names; kernel renamed them to V4L2_CID_STATELESS_HEVC_*; ohm has no HW HEVC anyway) - Missing #include "utils.h" in h264.c added - config.c probe extended to try both OUTPUT and OUTPUT_MPLANE types (patch archive parked on ohm at ~/fourier-test/libva-patches/) - vainfo enumerates MPEG-2 + H.264 profiles (Main/High/ConstrainedBaseline/ MultiviewHigh/StereoHigh) via LIBVA_DRIVER_NAME=v4l2_request LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video1. - Brave with --enable-features=VaapiVideoDecoder,VaapiIgnoreDriverChecks, AcceleratedVideoDecodeLinuxGL + --use-gl=angle --use-angle=gl-egl picks up VAAPI in the GPU process and calls vaCreateContext on the video. - vaCreateContext fails. Root cause: bootlin library was written for Allwinner sunxi-cedrus (single-plane V4L2); context.c / picture.c / v4l2.c hardcode single-plane paths that don't match Rockchip hantro multiplanar. Profile probe is now multiplanar-aware (my patch), but buffer setup and frame-submission paths aren't. Decision: browser HW video decode on ohm parked until a multiplanar port of libva-v4l2-request exists. Non-browser paths (gst v4l2codecs, ffmpeg v4l2-request) are unaffected. Patch set on ohm is a starting point for anyone picking up the multiplanar rework; also a candidate for a future marfrit libva-v4l2-request-fourier PKGBUILD once multiplanar works. Also updated the Status line with fermi CI success: first push built, signed, and published ffmpeg-v4l2-request-git-2:8.1.r123329.b57fbbe-1 on packages.reauktion.de in one shot. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 20:52:09 +00:00
test0r	670f2ad6ee	ohm task (a): fit-to-screen HW decode is free — 6% CPU, zero drops Added two HW rows to the baseline table with hard fps telemetry from progressreport update-freq=5: HW 1:1 windowed (1920x1080): 7% CPU, 1488/62 s stream 1:1 wall HW fullscreen=true (scaled 1280x800): 6% CPU, 1488/62 s stream 1:1 wall DSI-1 is 800x1280 at 59.99 Hz with KWin applying output_transform 270°, effective geometry 1280x800 landscape. Video is 1920x1080 so fullscreen requires downscale. The surprise: fullscreen scaling is actually LOWER CPU than native windowed, within noise. KWin on RK3566 must be handing the dmabuf to VOP2's HW scaling planes for direct scanout — a zero-CPU, zero-copy scale on the display controller. That's the "compositor-bound" bottleneck from the SW baseline going away entirely once the whole path is dmabuf. progressreport stream-position vs wall-clock: 4@5s, 9@10s, ... 62@62s. Exact 1:1 for the full run means zero drops and real-time delivery. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 20:30:33 +00:00
test0r	3ef29d615e	ohm step 4: HW decode proof via gst v4l2slh264dec — 7% CPU on Wayland GStreamer v4l2codecs path validated on ohm: HW H.264 decode plus zero-copy dmabuf NV12 direct to waylandsink via KWin on DSI-1. 62 s playback, 7% CPU total (≈4 s of user+sys over 62 s wall), no pipeline errors. Negotiated caps: video/x-raw(memory:DMABuf), format=DMA_DRM, drm-format=NV12, width=1920, height=1080, framerate=24/1 — so the hantro-vpu decoder emits dmabufs that waylandsink imports directly. That explains the massive CPU drop vs the SW mpv baseline (127 % with 973 / 1440 frames dropped at gpu-next). Side finding: the bbb_1080p30_h264.mp4 clip on doppler is actually 24 fps per the SPS caps — the "1080p30" in the filename is Blender's encode metadata, not the real rate. Recalibrated all baseline rows against 24 fps source. SW mpv drop number revised: 973 of 1440 → 68 %, not 54 % I wrote earlier. Also added an HW throughput row (fakesink sync=false): 36.8 fps, 89 % CPU — hantro G1 on RK3566 can sustain ~1.5× realtime for 1080p H.264. Frame-drop count for the HW waylandsink run is still unknown — fpsdisplaysink signal-fps-measurements doesn't wire through gst-launch. Next pass can use GST_DEBUG=fpsdisplaysink:5 or a progressreport element to get a precise delivered-fps number. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 20:18:00 +00:00
test0r	1be49951a7	ohm step 5: scaffold ffmpeg-v4l2-request-git PKGBUILD in marfrit-packages marfrit-packages commit 9994c4e adds arch/ffmpeg-v4l2-request-git/ plus the CI job (chained after claude-his-any, continue-on-error). Tracks Kwiboo's v4l2-request-n8.1 branch, pinned to commit b57fbbe, strips the AUR package's X11/AMF/CUDA/FireWire/etc deps, keeps encoders. Fleshes out the packaging table with Status column and the "why not the AUR package" rationale so a cold-start session doesn't re-discover the downgrade trap. Status: scaffolded, not yet pushed — next step is to push marfrit-packages main, let fermi CI build, install on ohm, then run the v4l2-request hwaccel validation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 20:04:55 +00:00
test0r	1266c9da8e	ohm step 2: SW baseline: 127% CPU, 54% frame drops via KWin/gpu-next ffmpeg -hwaccel none, bbb_1080p30_h264.mp4 (fleet-canonical, SHA-16 dcf8a7170fbd49bb, pulled from doppler /moviedata/fourier-test/ via hertz lxc file pull + ephemeral http bridge). - Max throughput (no -re, no display): 1440 frames in 18.55 s → 77.6 fps, 319% CPU (≈3.2 of 4 A55 cores). 2.6× realtime headroom. - Realtime -re (no display): 1440 frames in 60.27 s, 90% CPU (≈1 core). Per-frame decode cost ≈40 ms. - mpv --hwdec=no --vo=gpu-next through KWin on DSI-1: 127% CPU but 973 dropped frames out of 1800 target → 54% drop rate, ~14 fps delivered. Decode is not the bottleneck; the compositor/VO path is. HW decode will cut decode CPU but not by itself fix delivery drops: the "compositor-bound ≠ decode-bound" gotcha the README flagged. Direct KMS scanout (--vo=drm) or dmabuf-wayland will be the lever for the delivery side. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 19:53:57 +00:00
test0r	06ee15fc32	ohm recon: tree-side confirms no rkvdec2 patches in danctnix kernel modinfo on the sole rkvdec-family module in the built image (/lib/modules/6.19.10-danctnix1-1-pinetab2/.../rockchip/rkvdec/rockchip-vdec.ko) reports author Boris Brezillon, alias set of/rockchip,rk{3288,3328,3399}-vdec only — that is the mainline rkvdec1 driver, not VDPU346. No rkvdec2 sibling directory, no extra module, CONFIG_VIDEO_ROCKCHIP_VDEC=m is the only relevant config symbol. Conclusion: DanctNIX ships a vanilla rkvdec-family config and does not carry out-of-tree rkvdec2 patches for RK3566. HEVC/VP9 on ohm requires either our own carrier patch series (VDPU346 driver + DT + firmware, rebuild linux-pinetab2) or waiting for VDPU346 to land upstream. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 19:28:48 +00:00
test0r	67ffb43fc1	ohm recon 2026-04-24: hantro-only, no rkvdec2; kernel 6.19.10-danctnix1 Step 1 (live recon) on ohm. Findings folded into README under new "Recon 2026-04-24" subsection, replacing the pre-recon "Open question" paragraph and the now-stale "Current blocker: ohm offline" note. - Kernel 6.19.10-danctnix1-1-pinetab2 (README had 6.15 — stale by 4 minors). - Only Hantro VPU decoder bound (rockchip,rk3568-vpu @ fdea0400); exposes stateless H.264 (S264), MPEG-2 (MG2S), VP8 (VP8F). No HEVC/VP9. - No rkvdec2 DT node, no rkvdec2 driver, /lib/firmware/rockchip/ has only dptx.bin. HEVC/VP9 HW decode unavailable on this image. - Stock ffmpeg 8.1 has v4l2m2m only (no v4l2request hwaccel) — cannot drive the stateless decoder; need ffmpeg-v4l2-request-git. GStreamer 1.28.2 v4l2codecs already exposes v4l2sl{h264,mpeg2,vp8}dec. - Unrelated noise: bes2600 Wi-Fi OOT driver WARNs every ~30 s. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-24 19:25:33 +00:00
test0r	fe597bbf6d	Known gotchas: IOMMU, firmware blob, HEVC UAPI, compositor vs decode, VOP2 cfg_done, multi-core VDPU381	2026-04-24 19:07:45 +00:00
test0r	f60315797c	Build infra: distcc recipe (CT108 + tesla) for kernel + userspace	2026-04-24 19:04:38 +00:00
test0r	9b54e3dcfe	Upstreaming policy: upstream-first with ideological-fork fallback	2026-04-24 19:03:28 +00:00
test0r	b1c68e4eca	Packaging plan: ffmpeg-v4l2-request via marfrit-packages (fermi+feynman)	2026-04-24 19:01:00 +00:00
test0r	56a3aa2889	Test corpus: Big Buck Bunny set on doppler /moviedata/fourier-test/	2026-04-24 18:49:13 +00:00
test0r	8bdbc65e95	Acceptance: 1080p @ 30 fps, zero dropped frames	2026-04-24 18:48:24 +00:00
test0r	c51760dae1	Fourier — initial dossier + plan Umbrella for HW-assisted video decode across fresnel/ohm/boltzmann/ampere. Captures Path A (mainline V4L2 stateless) vs Path B (Rockchip MPP vendor), rkvdec mainline status as of 2026-04-24 (VDPU381/383 landed Feb 2026; VDPU346 for RK356x still pending), the 2023 working recipe from clehaxze.tw, plan for ohm as first priority, and working-agreement reminders (ReCAP, commit-per-experiment, ask-before-flash/reboot, off-machine backups).	2026-04-24 18:46:07 +00:00

18 Commits