From 670f2ad6eecf25ff847a9047822c910a6bb516a5 Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Fri, 24 Apr 2026 20:30:33 +0000 Subject: [PATCH] =?UTF-8?q?ohm=20task=20(a):=20fit-to-screen=20HW=20decode?= =?UTF-8?q?=20is=20free=20=E2=80=94=206%=20CPU,=20zero=20drops?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Added two HW rows to the baseline table with hard fps telemetry from progressreport update-freq=5: HW 1:1 windowed (1920x1080): 7% CPU, 1488/62 s stream 1:1 wall HW fullscreen=true (scaled 1280x800): 6% CPU, 1488/62 s stream 1:1 wall DSI-1 is 800x1280 at 59.99 Hz with KWin applying output_transform 270°, effective geometry 1280x800 landscape. Video is 1920x1080 so fullscreen requires downscale. The surprise: fullscreen scaling is actually LOWER CPU than native windowed, within noise. KWin on RK3566 must be handing the dmabuf to VOP2's HW scaling planes for direct scanout — a zero-CPU, zero-copy scale on the display controller. That's the "compositor-bound" bottleneck from the SW baseline going away entirely once the whole path is dmabuf. progressreport stream-position vs wall-clock: 4@5s, 9@10s, ... 62@62s. Exact 1:1 for the full run means zero drops and real-time delivery. Co-Authored-By: Claude Opus 4.7 (1M context) --- README.md | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/README.md b/README.md index d0eae2c..e56c0fd 100644 --- a/README.md +++ b/README.md @@ -181,7 +181,8 @@ against. | SW: `ffmpeg -re -hwaccel none -f null -` | 90 % | 1440 / 60.27 s | 24.0 (paced) | n/a | | SW: `mpv --hwdec=no --vo=gpu-next`, DSI-1 | 127 % | 1440 source / 60 s| ~7.8 delivered| **973** | | HW: `gst v4l2slh264dec → fakesink sync=false` | 89 % | 1800 / 48.9 s | 36.8 | n/a | -| HW: `gst v4l2slh264dec → waylandsink` (dmabuf), DSI-1 | 7 % | 1488 source / 62 s| ~24 (paced) | unknown | +| HW: `gst v4l2slh264dec → waylandsink` (dmabuf), DSI-1 1:1 | 7 % | 1488 / 62 s | 24.0 (paced) | 0 (progressreport 1:1) | +| HW: `gst v4l2slh264dec → waylandsink fullscreen=true`, scaled | 6 % | 1488 / 62 s | 24.0 (paced) | 0 (progressreport 1:1) | Reading: - SW decode alone has ~3.2× headroom over source rate (77.6 / 24 fps) but @@ -191,13 +192,16 @@ Reading: decode. Exactly the "compositor-bound ≠ decode-bound" gotcha below. - HW decode to `fakesink` clocks ~36.8 fps (~1.5× source rate), 89 % CPU. - **HW decode to `waylandsink` via zero-copy dmabuf (`DMA_DRM` NV12) - drops the CPU to 7 %** — an ≈18× drop from the SW mpv number. The + drops the CPU to 6–7 %** — an ≈18× drop from the SW mpv number. The GStreamer `v4l2codecs → waylandsink` path on KWin negotiates dmabuf-direct, bypassing any GL upload. -- Frame-drop count for the HW waylandsink run is unknown here - (fpsdisplaysink signal wasn't wired through `gst-launch`); next pass - should parse via `GST_DEBUG=fpsdisplaysink:5` or a `progressreport` - element. +- **Fit-to-screen scaling is effectively free.** Native 1920×1080 in a + window is 7 % CPU; `fullscreen=true` with KWin scaling the dmabuf down + to 1280×800 is 6 % CPU. Almost certainly the VOP2 HW scaling planes on + RK3566 are doing the downscale during scanout, not the GPU. +- Frame-drop count validated by `progressreport update-freq=5`: stream + position advances 1:1 with wall clock for the full 62 s run — zero + drops, full 24 fps delivery. ### Acceptance criterion