ohm task (a): fit-to-screen HW decode is free — 6% CPU, zero drops
Added two HW rows to the baseline table with hard fps telemetry from
progressreport update-freq=5:

  HW 1:1 windowed (1920x1080):           7% CPU, 1488/62 s stream 1:1 wall
  HW fullscreen=true (scaled 1280x800):  6% CPU, 1488/62 s stream 1:1 wall

DSI-1 is 800x1280 at 59.99 Hz with KWin applying output_transform 270°,
effective geometry 1280x800 landscape. Video is 1920x1080 so fullscreen
requires downscale.

The surprise: fullscreen scaling is actually LOWER CPU than native
windowed, within noise. KWin on RK3566 must be handing the dmabuf to
VOP2's HW scaling planes for direct scanout: a zero-CPU, zero-copy scale
on the display controller. That's the "compositor-bound" bottleneck from
the SW baseline going away entirely once the whole path is dmabuf.

progressreport stream-position vs wall-clock: 4@5s, 9@10s, ... 62@62s.
Exact 1:1 for the full run means zero drops and real-time delivery.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
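The 1:1 claim reduces to a simple check on (wall-clock, stream-position) pairs. A minimal Python sketch, assuming the samples have already been read off the progressreport output by hand. The three pairs are the ones quoted in this message; the helper names and the 1 s tolerance (chosen to absorb whole-second reporting and startup latency) are illustrative and not part of the repo.

```python
# (wall_clock_s, stream_position_s) pairs quoted in the commit message.
SAMPLES = [(5, 4), (10, 9), (62, 62)]


def max_lag(samples):
    """Largest amount (in seconds) by which stream position trails wall clock."""
    return max(wall - pos for wall, pos in samples)


def is_realtime(samples, tolerance_s=1.0):
    """True if position never falls more than tolerance_s behind wall clock.

    tolerance_s is an assumed allowance, not a value taken from the repo.
    """
    return max_lag(samples) <= tolerance_s


if __name__ == "__main__":
    print("max lag:", max_lag(SAMPLES), "s")
    print("real-time:", is_realtime(SAMPLES))
```

A run whose position falls steadily behind, for example `[(5, 4), (10, 7), (62, 40)]`, fails the same check.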
@@ -181,7 +181,8 @@ against.
 | SW: `ffmpeg -re -hwaccel none -f null -` | 90 % | 1440 / 60.27 s | 24.0 (paced) | n/a |
 | SW: `mpv --hwdec=no --vo=gpu-next`, DSI-1 | 127 % | 1440 source / 60 s| ~7.8 delivered| **973** |
 | HW: `gst v4l2slh264dec → fakesink sync=false` | 89 % | 1800 / 48.9 s | 36.8 | n/a |
-| HW: `gst v4l2slh264dec → waylandsink` (dmabuf), DSI-1 | 7 % | 1488 source / 62 s| ~24 (paced) | unknown |
+| HW: `gst v4l2slh264dec → waylandsink` (dmabuf), DSI-1 1:1 | 7 % | 1488 / 62 s | 24.0 (paced) | 0 (progressreport 1:1) |
+| HW: `gst v4l2slh264dec → waylandsink fullscreen=true`, scaled | 6 % | 1488 / 62 s | 24.0 (paced) | 0 (progressreport 1:1) |
 
 Reading:
 - SW decode alone has ~3.2× headroom over source rate (77.6 / 24 fps) but
@@ -191,13 +192,16 @@ Reading:
 decode. Exactly the "compositor-bound ≠ decode-bound" gotcha below.
 - HW decode to `fakesink` clocks ~36.8 fps (~1.5× source rate), 89 % CPU.
 - **HW decode to `waylandsink` via zero-copy dmabuf (`DMA_DRM` NV12)
-drops the CPU to 7 %** — an ≈18× drop from the SW mpv number. The
+drops the CPU to 6–7 %** — an ≈18× drop from the SW mpv number. The
 GStreamer `v4l2codecs → waylandsink` path on KWin negotiates
 dmabuf-direct, bypassing any GL upload.
-- Frame-drop count for the HW waylandsink run is unknown here
-(fpsdisplaysink signal wasn't wired through `gst-launch`); next pass
-should parse via `GST_DEBUG=fpsdisplaysink:5` or a `progressreport`
-element.
+- **Fit-to-screen scaling is effectively free.** Native 1920×1080 in a
+window is 7 % CPU; `fullscreen=true` with KWin scaling the dmabuf down
+to 1280×800 is 6 % CPU. Almost certainly the VOP2 HW scaling planes on
+RK3566 are doing the downscale during scanout, not the GPU.
+- Frame-drop count validated by `progressreport update-freq=5`: stream
+position advances 1:1 with wall clock for the full 62 s run — zero
+drops, full 24 fps delivery.
 
 ### Acceptance criterion
 
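The rates and ratios quoted in the table and the Reading list can be re-derived from the raw counts. A small Python sketch of that arithmetic follows. The dictionary keys are labels made up here, the numbers are copied from the table rows, and reading the bold 973 as the mpv dropped-frame count is an assumption based on the column's other entries.

```python
SOURCE_FPS = 24.0

# (frames, wall_seconds) copied from the table rows; keys are made-up labels.
RUNS = {
    "hw_fakesink": (1800, 48.9),
    "hw_waylandsink_1to1": (1488, 62.0),
    "hw_waylandsink_fullscreen": (1488, 62.0),
}


def fps(frames, seconds):
    """Average frame rate over the whole run."""
    return frames / seconds


def mpv_delivered_fps(source_frames=1440, dropped=973, seconds=60.0):
    """Delivered rate for the SW mpv run, assuming 973 is the drop count."""
    return (source_frames - dropped) / seconds


if __name__ == "__main__":
    for name, (frames, seconds) in RUNS.items():
        rate = fps(frames, seconds)
        print(f"{name}: {rate:.1f} fps ({rate / SOURCE_FPS:.2f}x source rate)")
    print(f"mpv delivered: {mpv_delivered_fps():.1f} fps")
    # CPU ratio of SW mpv (127 %) to the HW waylandsink runs (7 % and 6 %).
    print(f"CPU drop: {127 / 7:.1f}x to {127 / 6:.1f}x")
```

With 6 % as the denominator the CPU ratio comes out nearer 21×, so the "≈18×" in the bullet corresponds to the 7 % windowed run.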
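For reference, the fullscreen geometry described in the commit message works out as follows. This is a sketch only: the function names are hypothetical, and aspect-preserving (letterboxed) scaling is an assumption, since the commit only states that the output is 1280x800 and the video is 1920x1080.

```python
def effective_geometry(width, height, rotation_deg):
    """Panel size as seen by clients after a compositor output rotation."""
    if rotation_deg % 180 == 90:
        return (height, width)
    return (width, height)


def fit_inside(src_w, src_h, dst_w, dst_h):
    """Aspect-preserving fit of src into dst.

    Returns (scale, out_w, out_h, x_offset, y_offset). Letterboxing is an
    assumed policy, not something the commit confirms.
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    out_w, out_h = round(src_w * scale), round(src_h * scale)
    return scale, out_w, out_h, (dst_w - out_w) // 2, (dst_h - out_h) // 2


if __name__ == "__main__":
    screen = effective_geometry(800, 1280, 270)   # DSI-1 with a 270° transform
    print("effective screen:", screen)
    print("fit:", fit_inside(1920, 1080, *screen))
```

Under that assumption the scale factor is 2/3, giving a 1280x720 image with 40-pixel bars above and below on the 1280x800 output.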