ohm task (a): fit-to-screen HW decode is free — 6% CPU, zero drops

Updated the HW waylandsink row and added a fullscreen row to the baseline
table, both backed by hard frame-delivery telemetry from progressreport
update-freq=5:

  HW 1:1 windowed (1920x1080):          7% CPU, 1488 frames / 62 s, stream 1:1 with wall clock
  HW fullscreen=true (scaled 1280x800): 6% CPU, 1488 frames / 62 s, stream 1:1 with wall clock
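For reference, the frame counts reduce to delivered frame rate by plain
division. A minimal Python sketch using the numbers from the rows above plus
the `fakesink sync=false` row from the table; the 24 fps source rate is the
one the table already uses, and the helper name is illustrative only:

```python
SOURCE_FPS = 24.0  # source rate used throughout the baseline table


def delivered_fps(frames: int, seconds: float) -> float:
    """Average frame rate over a run: frames delivered / wall-clock seconds."""
    return frames / seconds


runs = {
    "HW waylandsink, 1:1 or fullscreen": (1488, 62.0),
    "HW fakesink sync=false": (1800, 48.9),
}

for name, (frames, seconds) in runs.items():
    fps = delivered_fps(frames, seconds)
    print(f"{name}: {fps:.1f} fps, {fps / SOURCE_FPS:.2f}x source rate")
```

1.00x means the sink consumes at exactly source pace; the unpaced fakesink
run shows the decoder's ceiling of roughly 1.5x.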

DSI-1 is 800x1280 at 59.99 Hz with KWin applying output_transform 270°,
for an effective geometry of 1280x800 landscape. The video is 1920x1080,
so fullscreen requires a downscale.
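The geometry arithmetic as a minimal Python sketch. It assumes the usual
Wayland convention that a 90° or 270° output transform swaps width and
height, and shows two ways the 1920x1080 frame could land on the 1280x800
output: an aspect-preserving fit (which would letterbox to 1280x720) or a
stretch to the full 1280x800. Which one waylandsink plus KWin actually does
isn't recorded here; both helper names are hypothetical.

```python
def logical_size(width: int, height: int, transform_deg: int) -> tuple[int, int]:
    """Output size after rotation; 90 and 270 degrees swap the axes."""
    if transform_deg % 180 == 90:
        return height, width
    return width, height


def fit_uniform(src: tuple[int, int], dst: tuple[int, int]) -> tuple[int, int]:
    """Largest aspect-preserving size of src that fits inside dst."""
    scale = min(dst[0] / src[0], dst[1] / src[1])
    return round(src[0] * scale), round(src[1] * scale)


panel = logical_size(800, 1280, 270)  # DSI-1 native mode, rotated by KWin
video = (1920, 1080)

print("output:", panel)
print(f"per-axis stretch: {panel[0] / video[0]:.3f} x {panel[1] / video[1]:.3f}")
print("aspect-preserving fit:", fit_uniform(video, panel))
```

Either way the horizontal axis is downscaled by 1.5x, which is the work the
VOP2 planes would be absorbing.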

The surprise: fullscreen scaling costs no more CPU than native windowed;
it even reads 1 point lower, which is within noise. KWin on RK3566 is
almost certainly handing the dmabuf to VOP2's HW scaling planes for direct
scanout, i.e. a zero-CPU, zero-copy scale on the display controller. That
is the "compositor-bound" bottleneck from the SW baseline going away
entirely once the whole path is dmabuf.

progressreport stream position vs wall clock: 4@5s, 9@10s, ... 62@62s.
Stream position tracks wall clock 1:1 for the full run, which means zero
drops and real-time delivery.
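A minimal Python sketch of that check, assuming progressreport readings are
whole seconds (as the `4@5s` notation suggests) and treating a lag of up to
1 s as startup offset rather than falling behind. Only the three samples
quoted above are used; the elided middle of the run isn't reproduced. The
function name and tolerance are illustrative, not part of GStreamer.

```python
def keeps_real_time(samples, tolerance_s: float = 1.0) -> bool:
    """True if stream position never drifts more than tolerance_s from wall clock.

    samples: (wall_clock_s, stream_position_s) pairs, e.g. transcribed from
    progressreport's periodic output.
    """
    return all(abs(wall - pos) <= tolerance_s for wall, pos in samples)


quoted = [(5, 4), (10, 9), (62, 62)]  # the samples quoted in this message
falling_behind = [(5, 4), (10, 7)]    # hypothetical: lag grows to 3 s

print(keeps_real_time(quoted))          # True
print(keeps_real_time(falling_behind))  # False
```

Bounding the absolute lag, rather than fitting a slope, means a pipeline that
stalls and later catches up still fails, as long as the stall is visible at
one of the 5 s samples.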

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 20:30:33 +00:00
parent 3ef29d615e
commit 670f2ad6ee
+10 -6
@@ -181,7 +181,8 @@ against.
 | SW: `ffmpeg -re -hwaccel none -f null -` | 90 % | 1440 / 60.27 s | 24.0 (paced) | n/a |
 | SW: `mpv --hwdec=no --vo=gpu-next`, DSI-1 | 127 % | 1440 source / 60 s| ~7.8 delivered| **973** |
 | HW: `gst v4l2slh264dec → fakesink sync=false` | 89 % | 1800 / 48.9 s | 36.8 | n/a |
-| HW: `gst v4l2slh264dec → waylandsink` (dmabuf), DSI-1 | 7 % | 1488 source / 62 s| ~24 (paced) | unknown |
+| HW: `gst v4l2slh264dec → waylandsink` (dmabuf), DSI-1 1:1 | 7 % | 1488 / 62 s | 24.0 (paced) | 0 (progressreport 1:1) |
+| HW: `gst v4l2slh264dec → waylandsink fullscreen=true`, scaled | 6 % | 1488 / 62 s | 24.0 (paced) | 0 (progressreport 1:1) |
 
 Reading:
 - SW decode alone has ~3.2× headroom over source rate (77.6 / 24 fps) but
@@ -191,13 +192,16 @@ Reading:
 decode. Exactly the "compositor-bound ≠ decode-bound" gotcha below.
 - HW decode to `fakesink` clocks ~36.8 fps (~1.5× source rate), 89 % CPU.
 - **HW decode to `waylandsink` via zero-copy dmabuf (`DMA_DRM` NV12)
 drops the CPU to 7 %** — an ≈18× drop from the SW mpv number. The
 GStreamer `v4l2codecs → waylandsink` path on KWin negotiates
 dmabuf-direct, bypassing any GL upload.
-- Frame-drop count for the HW waylandsink run is unknown here
-(fpsdisplaysink signal wasn't wired through `gst-launch`); next pass
-should parse via `GST_DEBUG=fpsdisplaysink:5` or a `progressreport`
-element.
+- **Fit-to-screen scaling is effectively free.** Native 1920×1080 in a
+window is 7 % CPU; `fullscreen=true` with KWin scaling the dmabuf down
+to 1280×800 is 6 % CPU. Almost certainly the VOP2 HW scaling planes on
+RK3566 are doing the downscale during scanout, not the GPU.
+- Frame-drop count validated by `progressreport update-freq=5`: stream
+position advances 1:1 with wall clock for the full 62 s run — zero
+drops, full 24 fps delivery.
 ### Acceptance criterion