diff --git a/README.md b/README.md index bb4286f..032b200 100644 --- a/README.md +++ b/README.md @@ -158,14 +158,17 @@ picks up rkvdec2 patches. 6. **Kodi + mpv validation** — 1080p HEVC at <30% of one core target 7. **Document** — freeze the final recipe in this README -**Status 2026-04-24**: ohm reachable; steps 1 + 2 + 3 + 4 done for H.264 on -ohm (GStreamer `v4l2slh264dec → waylandsink` at 7 % CPU with zero-copy -dmabuf NV12). Step 5 scaffold pushed; fermi CI **built + signed + -published** `ffmpeg-v4l2-request-git-2:8.1.r123329.b57fbbe-1-aarch64` on -first push. Immediate next: install on ohm via `pacman -Sy`, validate the -ffmpeg v4l2-request hwaccel path, compare CPU / fps to the GStreamer -numbers. Task (b) (Brave HW decode) blocked on multiplanar -`libva-v4l2-request` rework — see section below. +**Status 2026-04-24**: ohm reachable; steps 1 + 2 + 3 + 4 + 5 done for +H.264. GStreamer `v4l2slh264dec → waylandsink` is 6–7 % CPU with +zero-copy dmabuf NV12. FFmpeg with `-hwaccel v4l2request -hwaccel_output_format +drm_prime` is 14 % realtime / 93.5 fps peak — hantro's 1080p H.264 +ceiling on RK3566 is ≈95 fps. marfrit `ffmpeg-v4l2-request-git` lives in +the repo and installs cleanly via `pacman -S`. Task (b) (Brave HW decode) +blocked on multiplanar `libva-v4l2-request` rework — see section below. +**Note**: DanctNIX already ships `ffmpeg-v4l2-request 2:8.1-3` in its +`danctnix` repo; the marfrit build is our stripped variant (no X11/AMF/ +CUDA/etc, Kwiboo tip pinned by commit) and sorts strictly newer in pacman +vercmp. ### Baseline numbers (2026-04-24, ohm, `bbb_1080p30_h264.mp4`) @@ -185,6 +188,10 @@ against. | HW: `gst v4l2slh264dec → fakesink sync=false` | 89 % | 1800 / 48.9 s | 36.8 | n/a | | HW: `gst v4l2slh264dec → waylandsink` (dmabuf), DSI-1 1:1 | 7 % | 1488 / 62 s | 24.0 (paced) | 0 (progressreport 1:1) | | HW: `gst v4l2slh264dec → waylandsink fullscreen=true`, scaled | 6 % | 1488 / 62 s | 24.0 (paced) | 0 (progressreport 1:1) | +| HW: `ffmpeg -hwaccel v4l2request -f null -` | 105 % | 1440 / 38.4 s | 37.5 | n/a | +| HW: `ffmpeg -re -hwaccel v4l2request -f null -` | 67 % | 1440 / 59.9 s | 24.0 (paced) | n/a | +| HW: `ffmpeg -hwaccel v4l2request -hwaccel_output_format drm_prime -f null -` | 51 % | 1440 / 15.4 s | **93.5** | n/a | +| HW: `ffmpeg -re + drm_prime -f null -` | **14 %** | 1440 / 59.9 s | 24.0 (paced) | n/a | Reading: - SW decode alone has ~3.2× headroom over source rate (77.6 / 24 fps) but @@ -204,6 +211,13 @@ Reading: - Frame-drop count validated by `progressreport update-freq=5`: stream position advances 1:1 with wall clock for the full 62 s run — zero drops, full 24 fps delivery. +- **FFmpeg path needs `-hwaccel_output_format drm_prime`** to avoid the + default CPU readback. Without it, `-hwaccel v4l2request` costs 67 % CPU + at realtime 24 fps; with it, 14 %. Peak throughput 93.5 fps vs 37.5 fps. + The hantro VPU's real 1080p H.264 ceiling on RK3566 is ≈95 fps — enough + for 1080p60 with headroom. GStreamer's `v4l2codecs → waylandsink` path + still wins on CPU (6–7 %) because its dmabuf-direct scanout avoids every + copy; ffmpeg's `null` muxer costs a few percent even with drm_prime. ### Browser HW decode (Brave / Chromium) — partially wired, library-blocked