From 1266c9da8ea89c820cdd2814165ed3a9276963de Mon Sep 17 00:00:00 2001
From: Markus Fritsche <fritsche.markus@gmail.com>
Date: Fri, 24 Apr 2026 19:53:57 +0000
Subject: [PATCH] =?UTF-8?q?ohm=20step=202:=20SW=20baseline=20=E2=80=94=201?=
 =?UTF-8?q?27%=20CPU,=2054%=20frame=20drops=20via=20KWin/gpu-next?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

ffmpeg -hwaccel none, bbb_1080p30_h264.mp4 (fleet-canonical, SHA-16
dcf8a7170fbd49bb, pulled from doppler /moviedata/fourier-test/ via hertz
lxc file pull + ephemeral http bridge).

- Max throughput (no -re, no display): 1440 frames in 18.55 s → 77.6 fps,
  319% CPU (≈3.2 of 4 A55 cores). 2.6× realtime headroom.
- Realtime -re (no display): 1440 frames in 60.27 s, 90% CPU (≈1 core).
  Per-frame decode cost ≈40 ms.
- mpv --hwdec=no --vo=gpu-next through KWin on DSI-1: 127% CPU but 973
  dropped frames out of 1800 target → 54% drop rate, ~14 fps delivered.

Decode is not the bottleneck; the compositor/VO path is. HW decode will
cut decode CPU but not by itself fix delivery drops — the "compositor-
bound ≠ decode-bound" gotcha the README flagged. Direct KMS scanout
(--vo=drm) or dmabuf-wayland will be the lever for the delivery side.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---
 README.md | 27 +++++++++++++++++++++++----
 1 file changed, 23 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index efb5cde..941cc70 100644
--- a/README.md
+++ b/README.md
@@ -158,10 +158,29 @@ picks up rkvdec2 patches.
 6. **Kodi + mpv validation** — 1080p HEVC at <30% of one core target
 7. **Document** — freeze the final recipe in this README
 
-**Status 2026-04-24**: ohm reachable; step 1 (live recon) complete; step 3
-effectively done for H.264 / MPEG-2 / VP8 (driver bound, node exposed).
-Open threads: step 2 (SW baseline numbers) and step 5 (ffmpeg-v4l2-request
-build + install).
+**Status 2026-04-24**: ohm reachable; steps 1 + 2 + 3 done for H.264 on
+ohm. SW baseline numbers below; step 5 (ffmpeg-v4l2-request build +
+install via marfrit-packages) is next.
+
+### SW baseline numbers (2026-04-24, ohm, `bbb_1080p30_h264.mp4`)
+
+Clip SHA-16 `dcf8a7170fbd49bb` pulled from doppler
+`/moviedata/fourier-test/` via hertz `lxc file pull` → http bridge → ohm
+(same bytes on every fleet device).
+
+| Test                                   | CPU%  | Frames / wall  | Effective fps | Dropped |
+|----------------------------------------|-------|----------------|---------------|---------|
+| `ffmpeg -hwaccel none -f null -`       | 319 % | 1440 / 18.55 s | 77.6          | n/a     |
+| `ffmpeg -re -hwaccel none -f null -`   |  90 % | 1440 / 60.27 s | 23.9 (paced)  | n/a     |
+| `mpv --hwdec=no --vo=gpu-next`, DSI-1  | 127 % | 1800 target    | ~14 delivered | **973** |
+
+Reading: raw SW decode has ~2.6× headroom over realtime on the 4× A55
+(≈1 core saturated per 30 fps stream). But mpv with `gpu-next` through
+KWin drops **54 %** of frames while only using 127 % CPU — the
+compositor / VO path is the bottleneck, not decode. Exactly the
+"compositor-bound ≠ decode-bound" gotcha below. HW decode will cut the
+decode cost but won't by itself fix the composition drops; `--vo=drm`
+(direct KMS scanout) or a dmabuf-capable VO is the lever for that.
 
 ### Acceptance criterion