ohm step 2: SW baseline — 127% CPU, 54% frame drops via KWin/gpu-next

ffmpeg -hwaccel none, bbb_1080p30_h264.mp4 (fleet-canonical, SHA-16
dcf8a7170fbd49bb, pulled from doppler /moviedata/fourier-test/ via hertz
lxc file pull + ephemeral http bridge).

- Max throughput (no -re, no display): 1440 frames in 18.55 s → 77.6 fps,
  319% CPU (≈3.2 of 4 A55 cores). 2.6× realtime headroom.
- Realtime -re (no display): 1440 frames in 60.27 s, 90% CPU (≈1 core).
  Per-frame decode cost ≈40 ms.
- mpv --hwdec=no --vo=gpu-next through KWin on DSI-1: 127% CPU but 973
  dropped frames out of 1800 target → 54% drop rate, ~14 fps delivered.

Decode is not the bottleneck; the compositor/VO path is. HW decode will
cut decode CPU but not by itself fix delivery drops — the "compositor-
bound ≠ decode-bound" gotcha the README flagged. Direct KMS scanout
(--vo=drm) or dmabuf-wayland will be the lever for the delivery side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-24 19:53:57 +00:00
parent 06ee15fc32
commit 1266c9da8e
+23 -4
View File
@@ -158,10 +158,29 @@ picks up rkvdec2 patches.
6. **Kodi + mpv validation** — 1080p HEVC at <30% of one core target
7. **Document** — freeze the final recipe in this README
**Status 2026-04-24**: ohm reachable; step 1 (live recon) complete; step 3
effectively done for H.264 / MPEG-2 / VP8 (driver bound, node exposed).
Open threads: step 2 (SW baseline numbers) and step 5 (ffmpeg-v4l2-request
build + install).
**Status 2026-04-24**: ohm reachable; steps 1 + 2 + 3 done for H.264 on
ohm. SW baseline numbers below; step 5 (ffmpeg-v4l2-request build +
install via marfrit-packages) is next.
### SW baseline numbers (2026-04-24, ohm, `bbb_1080p30_h264.mp4`)
Clip SHA-16 `dcf8a7170fbd49bb` pulled from doppler
`/moviedata/fourier-test/` via hertz `lxc file pull` → http bridge → ohm
(same bytes on every fleet device).
| Test | CPU% | Frames / wall | Effective fps | Dropped |
|----------------------------------------|-------|----------------|---------------|---------|
| `ffmpeg -hwaccel none -f null -` | 319 % | 1440 / 18.55 s | 77.6 | n/a |
| `ffmpeg -re -hwaccel none -f null -` | 90 % | 1440 / 60.27 s | 23.9 (paced) | n/a |
| `mpv --hwdec=no --vo=gpu-next`, DSI-1 | 127 % | 1800 target | ~14 delivered | **973** |
Reading: raw SW decode has ~2.6× headroom over realtime on the 4× A55
(≈1 core saturated per 30 fps stream). But mpv with `gpu-next` through
KWin drops **54 %** of frames while only using 127 % CPU — the
compositor / VO path is the bottleneck, not decode. Exactly the
"compositor-bound ≠ decode-bound" gotcha below. HW decode will cut the
decode cost but won't by itself fix the composition drops; `--vo=drm`
(direct KMS scanout) or a dmabuf-capable VO is the lever for that.
### Acceptance criterion