1 Commits

Author SHA1 Message Date
marfrit 1d0db3b5a9 docs: pure ffmpeg vs daedalus pipeline CPU comparison
Measured on hertz (Pi 5, 6.12.75+rpt-rpi-2712, FFmpeg 7.1.3)
to quantify the architectural cost/benefit of routing decode
through the V4L2 m2m + chardev + dmabuf path vs running
ffmpeg standalone.

1080p × 150 frames, decode-as-fast-as-possible:

  VP9 8-bit:     ffmpeg 214.9% CPU / 1083ms wall
                 daedalus 96.3% CPU / 1229ms wall
  AV1 8-bit:     ffmpeg 201.5% CPU / 1162ms wall
                 daedalus 96.6% CPU / 1478ms wall
  H.264 8-bit:   ffmpeg 205.8% CPU / 1063ms wall
                 daedalus 100.1% CPU / 1020ms wall
  VP9 10-bit:    ffmpeg 155.8% CPU /  269ms wall
                 daedalus 91.6% CPU /  131ms wall
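
As a cross-check on the table above, CPU % multiplied by wall
time gives the total CPU time each run consumed. The sketch
below assumes the CPU figures are (user + sys) / wall, as a
tool like `time` reports them; that is an assumption, not
something the measurements state. The function names are
illustrative only.

```python
# (ffmpeg CPU %, ffmpeg wall ms, daedalus CPU %, daedalus wall ms),
# copied from the table in this commit message.
RUNS = {
    "VP9 8-bit":   (214.9, 1083, 96.3, 1229),
    "AV1 8-bit":   (201.5, 1162, 96.6, 1478),
    "H.264 8-bit": (205.8, 1063, 100.1, 1020),
    "VP9 10-bit":  (155.8, 269, 91.6, 131),
}


def cpu_ms(cpu_percent, wall_ms):
    """CPU time in ms, assuming cpu_percent = 100 * cpu_time / wall_time."""
    return cpu_percent / 100.0 * wall_ms


def summarize(runs=RUNS):
    """Per codec: CPU time for each path and daedalus/ffmpeg ratios."""
    rows = {}
    for codec, (f_cpu, f_wall, d_cpu, d_wall) in runs.items():
        f_ms = cpu_ms(f_cpu, f_wall)
        d_ms = cpu_ms(d_cpu, d_wall)
        rows[codec] = {
            "ffmpeg_cpu_ms": f_ms,
            "daedalus_cpu_ms": d_ms,
            "cpu_ratio": d_ms / f_ms,        # < 1 means daedalus used less CPU time
            "wall_ratio": d_wall / f_wall,   # < 1 means daedalus finished sooner
        }
    return rows


if __name__ == "__main__":
    for codec, r in summarize().items():
        print(f"{codec:12s} cpu {r['ffmpeg_cpu_ms']:6.0f} -> "
              f"{r['daedalus_cpu_ms']:6.0f} ms (x{r['cpu_ratio']:.2f}), "
              f"wall x{r['wall_ratio']:.2f}")
```

Comparing CPU time rather than CPU % keeps a run that is both
slower and lighter (AV1) from looking better than it is.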

Key takeaway: the daedalus pipeline uses roughly half the CPU
(92-100% vs 156-215%). Wall time is mixed: ~4% faster on
H.264 and ~2× faster on VP9 10-bit, but ~13% slower on VP9
8-bit and ~27% slower on AV1 8-bit. FFmpeg standalone
defaults to 2 threads; single-stream decode doesn't
parallelise well, so the 2× CPU usage is overhead, not
parallelism benefit. The daemon's single-threaded serialised
event loop avoids that tax.
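
The threading explanation above could be checked by re-running
the standalone decode pinned to one thread. A minimal sketch,
assuming a POSIX host and an `ffmpeg` binary on PATH; the clip
path is supplied by the caller, and the helper names `measure`
and `ffmpeg_decode_cmd` are illustrative, not part of
daedalus. It covers only the standalone side; the daemon path
would need the daemon's and the client's CPU time summed.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd to completion; return (cpu_percent, wall_ms) for the child."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    wall = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # user + system CPU time accumulated by waited-for children
    cpu = ((after.ru_utime - before.ru_utime)
           + (after.ru_stime - before.ru_stime))
    return 100.0 * cpu / wall, 1000.0 * wall


def ffmpeg_decode_cmd(path, threads=None):
    """Decode-only command: frames go to the null muxer, nothing is written."""
    cmd = ["ffmpeg", "-hide_banner", "-nostdin"]
    if threads is not None:
        # Placed before -i so the option applies to the input decoder.
        cmd += ["-threads", str(threads)]
    return cmd + ["-i", path, "-f", "null", "-"]


if __name__ == "__main__" and len(sys.argv) > 1:
    clip = sys.argv[1]  # path to a test clip, supplied by the caller
    for threads in (None, 1):
        cpu, wall = measure(ffmpeg_decode_cmd(clip, threads))
        label = "default" if threads is None else f"{threads} thread"
        print(f"{label:10s} {cpu:6.1f}% CPU / {wall:6.0f}ms wall")
```

If the single-thread run lands near 100% CPU with similar wall
time, that would support reading the extra CPU as threading
overhead.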

For the project's 30fps-floor-is-fine target ("daily YouTube
with CPU free for vscode"), daedalus needs roughly half the
CPU time per decoded frame, so more of the CPU stays free for
the rest of the desktop at the same playback rate.
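
To put the 30fps target in concrete terms, the as-fast-as-
possible numbers can be converted into the share of one core
needed at a paced 30 fps. This is a rough sketch: it assumes
the CPU cost per frame stays the same when playback is paced,
which the measurements here do not establish, and the function
name is illustrative.

```python
FRAMES = 150      # frames decoded in each measured run
TARGET_FPS = 30   # playback rate named as the project's floor


def core_share_at_fps(cpu_percent, wall_ms, frames=FRAMES, fps=TARGET_FPS):
    """Fraction of one core needed to decode at `fps`.

    Assumes cpu_percent = 100 * cpu_time / wall_time and that the
    per-frame CPU cost is unchanged when decode is paced.
    """
    cpu_ms_per_frame = cpu_percent / 100.0 * wall_ms / frames
    return cpu_ms_per_frame * fps / 1000.0


if __name__ == "__main__":
    # VP9 8-bit row from the table in this commit message.
    ffmpeg_share = core_share_at_fps(214.9, 1083)
    daedalus_share = core_share_at_fps(96.3, 1229)
    print(f"ffmpeg   {ffmpeg_share:.2f} of one core at {TARGET_FPS} fps")
    print(f"daedalus {daedalus_share:.2f} of one core at {TARGET_FPS} fps")
```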

VP9 10-bit stands out: daedalus is also faster in wall-clock
terms (131ms vs 269ms), likely because with so little work
per frame FFmpeg's thread pool spin-up dominates.

Note: "daedalus" still uses FFmpeg internally (Phase 8.8
explicitly deferred QPU substitution after measurement showed
30fps@1080p was already met). The benefit here is
architectural — single-threaded decode, out-of-process
daemon, dmabuf zero-copy — not QPU offload.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:20:22 +00:00