From 1266c9da8ea89c820cdd2814165ed3a9276963de Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Fri, 24 Apr 2026 19:53:57 +0000 Subject: [PATCH] =?UTF-8?q?ohm=20step=202:=20SW=20baseline=20=E2=80=94=201?= =?UTF-8?q?27%=20CPU,=2054%=20frame=20drops=20via=20KWin/gpu-next?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit ffmpeg -hwaccel none, bbb_1080p30_h264.mp4 (fleet-canonical, SHA-16 dcf8a7170fbd49bb, pulled from doppler /moviedata/fourier-test/ via hertz lxc file pull + ephemeral http bridge). - Max throughput (no -re, no display): 1440 frames in 18.55 s → 77.6 fps, 319% CPU (≈3.2 of 4 A55 cores). 2.6× realtime headroom. - Realtime -re (no display): 1440 frames in 60.27 s, 90% CPU (≈1 core). Per-frame decode cost ≈40 ms. - mpv --hwdec=no --vo=gpu-next through KWin on DSI-1: 127% CPU but 973 dropped frames out of 1800 target → 54% drop rate, ~14 fps delivered. Decode is not the bottleneck; the compositor/VO path is. HW decode will cut decode CPU but not by itself fix delivery drops — the "compositor- bound ≠ decode-bound" gotcha the README flagged. Direct KMS scanout (--vo=drm) or dmabuf-wayland will be the lever for the delivery side. Co-Authored-By: Claude Opus 4.7 (1M context) --- README.md | 27 +++++++++++++++++++++++---- 1 file changed, 23 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index efb5cde..941cc70 100644 --- a/README.md +++ b/README.md @@ -158,10 +158,29 @@ picks up rkvdec2 patches. 6. **Kodi + mpv validation** — 1080p HEVC at <30% of one core target 7. **Document** — freeze the final recipe in this README -**Status 2026-04-24**: ohm reachable; step 1 (live recon) complete; step 3 -effectively done for H.264 / MPEG-2 / VP8 (driver bound, node exposed). -Open threads: step 2 (SW baseline numbers) and step 5 (ffmpeg-v4l2-request -build + install). +**Status 2026-04-24**: ohm reachable; steps 1 + 2 + 3 done for H.264 on +ohm. SW baseline numbers below; step 5 (ffmpeg-v4l2-request build + +install via marfrit-packages) is next. + +### SW baseline numbers (2026-04-24, ohm, `bbb_1080p30_h264.mp4`) + +Clip SHA-16 `dcf8a7170fbd49bb` pulled from doppler +`/moviedata/fourier-test/` via hertz `lxc file pull` → http bridge → ohm +(same bytes on every fleet device). + +| Test | CPU% | Frames / wall | Effective fps | Dropped | +|----------------------------------------|-------|----------------|---------------|---------| +| `ffmpeg -hwaccel none -f null -` | 319 % | 1440 / 18.55 s | 77.6 | n/a | +| `ffmpeg -re -hwaccel none -f null -` | 90 % | 1440 / 60.27 s | 23.9 (paced) | n/a | +| `mpv --hwdec=no --vo=gpu-next`, DSI-1 | 127 % | 1800 target | ~14 delivered | **973** | + +Reading: raw SW decode has ~2.6× headroom over realtime on the 4× A55 +(≈1 core saturated per 30 fps stream). But mpv with `gpu-next` through +KWin drops **54 %** of frames while only using 127 % CPU — the +compositor / VO path is the bottleneck, not decode. Exactly the +"compositor-bound ≠ decode-bound" gotcha below. HW decode will cut the +decode cost but won't by itself fix the composition drops; `--vo=drm` +(direct KMS scanout) or a dmabuf-capable VO is the lever for that. ### Acceptance criterion