diff --git a/README.md b/README.md index 59e564c..d0eae2c 100644 --- a/README.md +++ b/README.md @@ -158,30 +158,46 @@ picks up rkvdec2 patches. 6. **Kodi + mpv validation** — 1080p HEVC at <30% of one core target 7. **Document** — freeze the final recipe in this README -**Status 2026-04-24**: ohm reachable; steps 1 + 2 + 3 done for H.264 on -ohm; step 5 scaffold landed (`marfrit-packages/arch/ffmpeg-v4l2-request-git/` -on `main`, commit `9994c4e`, not yet pushed — first fermi CI build + ohm -install still to run). Immediate next: push, build, install, validate. +**Status 2026-04-24**: ohm reachable; steps 1 + 2 + 3 + 4 done for H.264 on +ohm (GStreamer `v4l2slh264dec → waylandsink` at 7 % CPU with zero-copy +dmabuf NV12). Step 5 scaffold landed +(`marfrit-packages/arch/ffmpeg-v4l2-request-git/` on `main`, commit +`9994c4e`, unpushed — fermi CI build + ohm install still to run). +Immediate next: push, build, install, validate the ffmpeg path. -### SW baseline numbers (2026-04-24, ohm, `bbb_1080p30_h264.mp4`) +### Baseline numbers (2026-04-24, ohm, `bbb_1080p30_h264.mp4`) Clip SHA-16 `dcf8a7170fbd49bb` pulled from doppler `/moviedata/fourier-test/` via hertz `lxc file pull` → http bridge → ohm -(same bytes on every fleet device). +(same bytes on every fleet device). **Source frame rate is 24 fps** per +H.264 caps (`framerate=24/1`) — the `1080p30` in the file name is a +misnomer carried from Blender's encode metadata; the actual media rate is +24 fps. That's the number the target/delivered columns below are measured +against. -| Test | CPU% | Frames / wall | Effective fps | Dropped | -|----------------------------------------|-------|----------------|---------------|---------| -| `ffmpeg -hwaccel none -f null -` | 319 % | 1440 / 18.55 s | 77.6 | n/a | -| `ffmpeg -re -hwaccel none -f null -` | 90 % | 1440 / 60.27 s | 23.9 (paced) | n/a | -| `mpv --hwdec=no --vo=gpu-next`, DSI-1 | 127 % | 1800 target | ~14 delivered | **973** | +| Test | CPU% | Frames / wall | Effective fps | Dropped | +|--------------------------------------------------------------|-------|-------------------|---------------|---------| +| SW: `ffmpeg -hwaccel none -f null -` | 319 % | 1440 / 18.55 s | 77.6 | n/a | +| SW: `ffmpeg -re -hwaccel none -f null -` | 90 % | 1440 / 60.27 s | 24.0 (paced) | n/a | +| SW: `mpv --hwdec=no --vo=gpu-next`, DSI-1 | 127 % | 1440 source / 60 s| ~7.8 delivered| **973** | +| HW: `gst v4l2slh264dec → fakesink sync=false` | 89 % | 1800 / 48.9 s | 36.8 | n/a | +| HW: `gst v4l2slh264dec → waylandsink` (dmabuf), DSI-1 | 7 % | 1488 source / 62 s| ~24 (paced) | unknown | -Reading: raw SW decode has ~2.6× headroom over realtime on the 4× A55 -(≈1 core saturated per 30 fps stream). But mpv with `gpu-next` through -KWin drops **54 %** of frames while only using 127 % CPU — the -compositor / VO path is the bottleneck, not decode. Exactly the -"compositor-bound ≠ decode-bound" gotcha below. HW decode will cut the -decode cost but won't by itself fix the composition drops; `--vo=drm` -(direct KMS scanout) or a dmabuf-capable VO is the lever for that. +Reading: +- SW decode alone has ~3.2× headroom over source rate (77.6 / 24 fps) but + costs ~3.2 cores unpaced or 1 core at realtime pace. +- SW mpv via `gpu-next` through KWin drops **68 %** of frames while only + using 127 % CPU — the compositor / VO path is the bottleneck, not + decode. Exactly the "compositor-bound ≠ decode-bound" gotcha below. +- HW decode to `fakesink` clocks ~36.8 fps (~1.5× source rate), 89 % CPU. +- **HW decode to `waylandsink` via zero-copy dmabuf (`DMA_DRM` NV12) + drops the CPU to 7 %** — an ≈18× drop from the SW mpv number. The + GStreamer `v4l2codecs → waylandsink` path on KWin negotiates + dmabuf-direct, bypassing any GL upload. +- Frame-drop count for the HW waylandsink run is unknown here + (fpsdisplaysink signal wasn't wired through `gst-launch`); next pass + should parse via `GST_DEBUG=fpsdisplaysink:5` or a `progressreport` + element. ### Acceptance criterion