daedalus-v4l2

reauktion/daedalus-v4l2

Commit Graph

Author	SHA1	Message	Date

marfrit

1d0db3b5a9

docs: pure ffmpeg vs daedalus pipeline CPU comparison

Measured on hertz (Pi 5, 6.12.75+rpt-rpi-2712, FFmpeg 7.1.3)
to quantify the architectural cost/benefit of routing decode
through the V4L2 m2m + chardev + dmabuf path vs running
ffmpeg standalone.

1080p × 150 frames, decode-as-fast-as-possible:

  VP9 8-bit:     ffmpeg 214.9% CPU / 1083ms wall
                 daedalus 96.3% CPU / 1229ms wall
  AV1 8-bit:     ffmpeg 201.5% CPU / 1162ms wall
                 daedalus 96.6% CPU / 1478ms wall
  H.264 8-bit:   ffmpeg 205.8% CPU / 1063ms wall
                 daedalus 100.1% CPU / 1020ms wall
  VP9 10-bit:    ffmpeg 155.8% CPU /  269ms wall
                 daedalus 91.6% CPU /  131ms wall
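The table can be reduced to total CPU time per run, which makes the two pipelines directly comparable. The sketch below is only illustrative arithmetic on the figures above; it assumes the reported CPU% is an average over the whole wall time, which the commit message does not state explicitly. The names `RESULTS`, `cpu_ms`, and `summarise` are made up for this example.

```python
# Figures copied from the table above: (cpu_percent, wall_ms) per pipeline.
FRAMES = 150

RESULTS = {
    "VP9 8-bit":   {"ffmpeg": (214.9, 1083), "daedalus": (96.3, 1229)},
    "AV1 8-bit":   {"ffmpeg": (201.5, 1162), "daedalus": (96.6, 1478)},
    "H.264 8-bit": {"ffmpeg": (205.8, 1063), "daedalus": (100.1, 1020)},
    "VP9 10-bit":  {"ffmpeg": (155.8, 269),  "daedalus": (91.6, 131)},
}


def cpu_ms(cpu_percent, wall_ms):
    """Total CPU time in ms, assuming CPU% is averaged over the wall time."""
    return cpu_percent / 100.0 * wall_ms


def summarise(results=RESULTS, frames=FRAMES):
    """Return per-codec CPU cost per frame and daedalus/ffmpeg ratios."""
    rows = {}
    for codec, runs in results.items():
        f_cpu = cpu_ms(*runs["ffmpeg"])
        d_cpu = cpu_ms(*runs["daedalus"])
        rows[codec] = {
            "ffmpeg_cpu_ms_per_frame": f_cpu / frames,
            "daedalus_cpu_ms_per_frame": d_cpu / frames,
            "cpu_ratio": d_cpu / f_cpu,  # below 1.0: daedalus burns less CPU
            "wall_ratio": runs["daedalus"][1] / runs["ffmpeg"][1],  # below 1.0: daedalus finishes sooner
        }
    return rows


if __name__ == "__main__":
    for codec, row in summarise().items():
        print(f"{codec:12s} cpu_ratio={row['cpu_ratio']:.2f} "
              f"wall_ratio={row['wall_ratio']:.2f}")
```

Under that assumption the daedalus/ffmpeg CPU-time ratio is about 0.51 for VP9 8-bit, 0.61 for AV1, 0.47 for H.264, and 0.29 for VP9 10-bit.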

Key takeaway: the daedalus pipeline uses about half the CPU
for comparable wall time on the 8-bit clips (4% faster on
H.264, 13% slower on VP9, 27% slower on AV1). FFmpeg
standalone defaults to 2 threads; single-stream decode
doesn't parallelise well, so the 2× CPU usage is overhead,
not parallelism benefit. The daemon's single-threaded
serialised event loop avoids that tax.
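One way to probe the threading claim would be to time standalone ffmpeg at different decoder thread counts and compare CPU% against wall time. The following is a minimal sketch, not the method used for the numbers above (the commit message does not say how they were collected). It is Unix-only, `measure` is a hypothetical helper, and the clip path is supplied by the caller. It counts only CPU time of child processes that were waited on, so a separately running daemon would not be captured this way.

```python
import resource
import subprocess
import sys
import time


def measure(cmd):
    """Run cmd to completion and return (cpu_percent, wall_ms).

    CPU% is (user + system CPU time of the child) / wall time, so a
    value above 100 means more than one core was busy on average.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.monotonic()
    subprocess.run(cmd, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=True)
    wall_s = time.monotonic() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu_s = ((after.ru_utime - before.ru_utime)
             + (after.ru_stime - before.ru_stime))
    return 100.0 * cpu_s / wall_s, 1000.0 * wall_s


if __name__ == "__main__" and len(sys.argv) > 1:
    clip = sys.argv[1]  # path to a test clip, chosen by the caller
    for threads in ("1", "2"):
        # -threads before -i applies to the decoder; -f null - discards output.
        cmd = ["ffmpeg", "-v", "error", "-threads", threads,
               "-i", clip, "-f", "null", "-"]
        cpu, wall = measure(cmd)
        print(f"threads={threads}: {cpu:.1f}% CPU / {wall:.0f}ms wall")
```

If wall time barely changes between the two runs while CPU% roughly doubles, that would be consistent with the extra thread adding overhead rather than throughput.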

For the project's 30fps-floor-is-fine target ("daily YouTube
with CPU free for vscode"), daedalus needs roughly half the
CPU time per decoded frame, which leaves correspondingly
more headroom for the rest of the desktop at the same
playback rate.
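To relate the as-fast-as-possible measurements to the 30fps target, the CPU time per frame can be scaled to a paced playback rate. This is a rough projection only: it assumes the CPU cost per frame stays the same when decode is paced at 30fps instead of running flat out, which was not measured here. `projected_cpu_percent` is a name invented for this sketch.

```python
def projected_cpu_percent(cpu_percent, wall_ms, frames, target_fps):
    """Estimate CPU% (of one core) when decoding at target_fps.

    Assumes the CPU time per frame is unchanged from the
    decode-as-fast-as-possible run.
    """
    cpu_ms_per_frame = cpu_percent / 100.0 * wall_ms / frames
    cpu_ms_per_second = cpu_ms_per_frame * target_fps
    return cpu_ms_per_second / 1000.0 * 100.0


if __name__ == "__main__":
    # VP9 8-bit figures from the measurements above, 150 frames each.
    ffmpeg = projected_cpu_percent(214.9, 1083, 150, 30)
    daedalus = projected_cpu_percent(96.3, 1229, 150, 30)
    print(f"ffmpeg   at 30fps: {ffmpeg:.1f}% of one core")
    print(f"daedalus at 30fps: {daedalus:.1f}% of one core")
```

For VP9 8-bit this projects to roughly 46.5% of one core for standalone ffmpeg and 23.7% for daedalus at 30fps.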

VP9 10-bit is striking: daedalus is also faster in wall time
(131ms vs 269ms), because when per-frame work is small,
FFmpeg's thread pool spin-up dominates.

Note: "daedalus" still uses FFmpeg internally (Phase 8.8
explicitly deferred QPU substitution after measurement showed
30fps@1080p was already met). The benefit here is
architectural — single-threaded decode, out-of-process
daemon, dmabuf zero-copy — not QPU offload.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-05-18 17:20:22 +00:00

1 Commits