ohm step 2: SW baseline — 127% CPU, 54% frame drops via KWin/gpu-next

ffmpeg -hwaccel none, bbb_1080p30_h264.mp4 (fleet-canonical, SHA-16 dcf8a7170fbd49bb, pulled from doppler /moviedata/fourier-test/ via hertz lxc file pull + ephemeral http bridge). - Max throughput (no -re, no display): 1440 frames in 18.55 s → 77.6 fps, 319% CPU (≈3.2 of 4 A55 cores). 2.6× realtime headroom. - Realtime -re (no display): 1440 frames in 60.27 s, 90% CPU (≈1 core). Per-frame decode cost ≈40 ms. - mpv --hwdec=no --vo=gpu-next through KWin on DSI-1: 127% CPU but 973 dropped frames out of 1800 target → 54% drop rate, ~14 fps delivered. Decode is not the bottleneck; the compositor/VO path is. HW decode will cut decode CPU but not by itself fix delivery drops — the "compositor- bound ≠ decode-bound" gotcha the README flagged. Direct KMS scanout (--vo=drm) or dmabuf-wayland will be the lever for the delivery side. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 19:53:57 +00:00
parent 06ee15fc32
commit 1266c9da8e
1 changed files with 23 additions and 4 deletions
@@ -158,10 +158,29 @@ picks up rkvdec2 patches.
 6. **Kodi + mpv validation** — 1080p HEVC at <30% of one core target
 7. **Document** — freeze the final recipe in this README

-**Status 2026-04-24**: ohm reachable; step 1 (live recon) complete; step 3
-effectively done for H.264 / MPEG-2 / VP8 (driver bound, node exposed).
-Open threads: step 2 (SW baseline numbers) and step 5 (ffmpeg-v4l2-request
-build + install).
+**Status 2026-04-24**: ohm reachable; steps 1 + 2 + 3 done for H.264 on
+ohm. SW baseline numbers below; step 5 (ffmpeg-v4l2-request build +
+install via marfrit-packages) is next.
+
+### SW baseline numbers (2026-04-24, ohm, `bbb_1080p30_h264.mp4`)
+
+Clip SHA-16 `dcf8a7170fbd49bb` pulled from doppler
+`/moviedata/fourier-test/` via hertz `lxc file pull` → http bridge → ohm
+(same bytes on every fleet device).
+
+| Test                                   | CPU%  | Frames / wall  | Effective fps | Dropped |
+|----------------------------------------|-------|----------------|---------------|---------|
+| `ffmpeg -hwaccel none -f null -`       | 319 % | 1440 / 18.55 s | 77.6          | n/a     |
+| `ffmpeg -re -hwaccel none -f null -`   |  90 % | 1440 / 60.27 s | 23.9 (paced)  | n/a     |
+| `mpv --hwdec=no --vo=gpu-next`, DSI-1  | 127 % | 1800 target    | ~14 delivered | **973** |
+
+Reading: raw SW decode has ~2.6× headroom over realtime on the 4× A55
+(≈1 core saturated per 30 fps stream). But mpv with `gpu-next` through
+KWin drops **54 %** of frames while only using 127 % CPU — the
+compositor / VO path is the bottleneck, not decode. Exactly the
+"compositor-bound ≠ decode-bound" gotcha below. HW decode will cut the
+decode cost but won't by itself fix the composition drops; `--vo=drm`
+(direct KMS scanout) or a dmabuf-capable VO is the lever for that.

 ### Acceptance criterion