V4L2 stateless decoder for Pi 5, backed by sibling
daedalus-fourier kernel library (VP9 + AV1 CDEF + H.264 video
decode kernels on VideoCore VII compute + ARM NEON).
Architecture locked 2026-05-18 by mfritsche per
daedalus-fourier/docs/phase8_scoping.md:
- Option B: Linux kernel V4L2 shim + userspace daemon (not
v4l2loopback). Real /dev/videoNN; proper DRM PRIME for
browser zero-copy.
- Option γ: dlopen FFmpeg at runtime as parser. No vendoring;
fastest to v1.
- Sibling repo (this repo): V4L2-side work outside of
daedalus-fourier so kernel-library API stays clean.
Components:
kernel/ - Linux out-of-tree kernel module (GPLv2; V4L2
device + chardev bridge to userspace daemon)
daemon/ - userspace decoder daemon (BSD-2-Clause; links
libdaedalus_core.a from sibling; dlopens FFmpeg)
docs/ - architecture + 7-phase roadmap (8.1..8.7)
include/ - shared headers between kernel and daemon
Roadmap (7 sub-phases, ~1 week each):
8.1 kernel skeleton (/dev/videoNN with no-op ioctls)
8.2 chardev bridge (kernel ↔ daemon ping-pong)
8.3 daemon FFmpeg dlopen + parse path
8.4 VP9 end-to-end via daedalus_dispatch_*
8.5 dmabuf / DRM PRIME for zero-copy
8.6 AV1 + H.264 codec support
8.7 performance: hit 30fps@1080p (project floor)
No code yet — only README + design docs + directory structure.
First implementation work starts in Phase 8.1 next session.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
daedalus-v4l2 — architecture
Components
1. Kernel module (kernel/)
A Linux V4L2 stateless decoder driver. Registers as
/dev/videoNN with VFL_TYPE_VIDEO and supports VP9, AV1,
and H.264 stateless decoder controls (matching the existing
V4L2 stateless uAPI used by libva-v4l2-request-fourier).
Internally it does NOT decode bitstream. It:
- Accepts V4L2 ioctls (VIDIOC_S_FMT, VIDIOC_S_EXT_CTRLS with stateless codec controls, VIDIOC_QBUF, etc.)
- Marshals the bitstream and per-frame/per-slice control structs onto a chardev (or netlink) channel.
- Pulls decoded frames back from the userspace daemon.
- Returns them via VIDIOC_DQBUF.
Out-of-tree kernel module (built with make against the
running kernel's headers; loaded via insmod).
2. Userspace daemon (daemon/)
A long-running daemon that:
- Connects to the kernel module's chardev.
- Pulls bitstream + control blobs.
- Drives FFmpeg parsers via dlopen to get per-block metadata (block positions, MVs, coefficients, tc0 arrays, etc.).
- Calls daedalus_dispatch_* from libdaedalus_core.a (sibling repo) to do the actual per-block work on NEON / V3D.
- Posts decoded frames back to the kernel module via the chardev.
Architecture: single-threaded event loop initially; per-stream worker threads later if needed.
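A minimal sketch of one iteration of such a single-threaded loop, assuming the bridge is an ordinary pollable file descriptor. The names dd_loop_once, dd_msg_handler and dd_count_bytes are hypothetical and not part of this repo; the 4096-byte read buffer is also an assumption, since real messages carrying bitstream data would need a larger or dynamically sized buffer.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical callback: handles one message read from the bridge fd. */
typedef int (*dd_msg_handler)(const unsigned char *buf, size_t len, void *user);

/*
 * Wait up to timeout_ms for fd to become readable, read one chunk,
 * and pass it to the handler.
 * Returns 1 if a message was handled, 0 on timeout, -1 on error or EOF.
 */
int dd_loop_once(int fd, int timeout_ms, dd_msg_handler handler, void *user)
{
    unsigned char buf[4096];            /* assumed maximum message size */
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ready;
    ssize_t n;

    assert(handler != NULL);

    /* Retry poll() if a signal interrupts it. */
    do {
        ready = poll(&pfd, 1, timeout_ms);
    } while (ready < 0 && errno == EINTR);
    if (ready < 0)
        return -1;
    if (ready == 0)
        return 0;

    /* Retry read() if a signal interrupts it. */
    do {
        n = read(fd, buf, sizeof buf);
    } while (n < 0 && errno == EINTR);
    if (n <= 0)
        return -1;                      /* read error or peer closed */

    return handler(buf, (size_t)n, user) < 0 ? -1 : 1;
}

/* Example handler: adds the message length to a size_t counter. */
int dd_count_bytes(const unsigned char *buf, size_t len, void *user)
{
    (void)buf;
    *(size_t *)user += len;
    return 0;
}
```

The daemon's main loop would call dd_loop_once repeatedly with a handler that parses and dispatches the message. Moving to per-stream worker threads later would only change what the handler does, not the loop itself.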
3. Bitstream parser layer (Option γ — runtime dlopen)
Instead of vendoring FFmpeg's parsers, the daemon loads the system FFmpeg at runtime. Two integration patterns:
- a. AVCodec/AVPacket through libavcodec: feed packets to avcodec_send_packet, intercept the parse-only stage and pull out block metadata before the actual decode runs.
- b. Custom parser via libavcodec internal APIs: messier but avoids running FFmpeg's full decode path.
Plan: try (a) first. If that path doesn't expose the per-block info we need, fall back to (b) or vendor a minimal parser per codec.
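The runtime-loading step itself can be sketched as follows. This is an illustration under stated assumptions, not the daemon's actual loader: struct dd_ffmpeg, dd_ffmpeg_open and dd_ffmpeg_close are hypothetical names, and the list of library names to try is supplied by the caller because libavcodec sonames differ between FFmpeg releases and distributions. Only dlopen, dlsym, dlclose and libavcodec's public avcodec_version symbol are real APIs here.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Signature of libavcodec's public avcodec_version(). */
typedef unsigned (*avcodec_version_fn)(void);

/* Hypothetical holder for the loaded library and resolved symbols. */
struct dd_ffmpeg {
    void *handle;                /* dlopen handle, NULL if not loaded */
    avcodec_version_fn version;  /* resolved avcodec_version, or NULL */
};

/*
 * Try each candidate library name in order and keep the first one that
 * loads and exports avcodec_version.
 * Returns 0 on success, -1 if no candidate could be used.
 */
int dd_ffmpeg_open(struct dd_ffmpeg *ff, const char *const *names, size_t count)
{
    assert(ff != NULL && names != NULL);
    ff->handle = NULL;
    ff->version = NULL;

    for (size_t i = 0; i < count; i++) {
        void *h = dlopen(names[i], RTLD_NOW | RTLD_LOCAL);
        void *sym;

        if (h == NULL)
            continue;                   /* not installed under this name */
        sym = dlsym(h, "avcodec_version");
        if (sym == NULL) {
            dlclose(h);                 /* loaded, but not a usable libavcodec */
            continue;
        }
        ff->handle = h;
        /* POSIX idiom for storing a dlsym() result in a function pointer. */
        *(void **)(&ff->version) = sym;
        return 0;
    }
    return -1;
}

/* Release the library; safe to call when nothing was loaded. */
void dd_ffmpeg_close(struct dd_ffmpeg *ff)
{
    assert(ff != NULL);
    if (ff->handle != NULL)
        dlclose(ff->handle);
    ff->handle = NULL;
    ff->version = NULL;
}
```

A caller might pass names such as "libavcodec.so.61", "libavcodec.so.60" and "libavcodec.so" (examples only) and then read the major version as ff.version() >> 16 to decide which struct layouts to expect. On glibc older than 2.34 the program has to be linked with -ldl.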
Communication: kernel ↔ daemon
Initial plan: single chardev /dev/daedalus-v4l2 with a
simple request/response protocol:
REQ_DECODE { stream_id, frame_idx, codec, controls[], bitstream_blob }
RESP_FRAME { stream_id, frame_idx, dma_buf_fd, w, h, format }
Alternative: netlink socket (more standard for kernel-userspace IPC, but more boilerplate). Chardev is simpler for v1.
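One possible wire layout for the REQ_DECODE and RESP_FRAME messages above, as a sketch only. The field widths, the numeric message type values, the explicit length fields for controls[] and bitstream_blob, and all dd_* names are assumptions made for illustration; the source fixes only the field names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum dd_msg_type { DD_REQ_DECODE = 1, DD_RESP_FRAME = 2 };  /* assumed values */

/* Fixed header; controls bytes, then bitstream bytes, follow it directly. */
struct dd_req_decode {
    uint32_t type;           /* DD_REQ_DECODE */
    uint32_t stream_id;
    uint32_t frame_idx;
    uint32_t codec;          /* hypothetical codec identifier */
    uint32_t controls_len;   /* bytes of control structs after the header */
    uint32_t bitstream_len;  /* bytes of bitstream after the controls */
};

struct dd_resp_frame {
    uint32_t type;           /* DD_RESP_FRAME */
    uint32_t stream_id;
    uint32_t frame_idx;
    int32_t  dma_buf_fd;
    uint32_t w;
    uint32_t h;
    uint32_t format;         /* assumed to be a pixel-format code */
};

/* Serialize header + payloads into out. Returns bytes written, 0 if cap is too small. */
size_t dd_pack_req(unsigned char *out, size_t cap, const struct dd_req_decode *hdr,
                   const void *controls, const void *bitstream)
{
    uint64_t total;

    assert(out != NULL && hdr != NULL);
    total = (uint64_t)sizeof *hdr + hdr->controls_len + hdr->bitstream_len;
    if (total > cap)
        return 0;

    memcpy(out, hdr, sizeof *hdr);
    if (hdr->controls_len > 0)
        memcpy(out + sizeof *hdr, controls, hdr->controls_len);
    if (hdr->bitstream_len > 0)
        memcpy(out + sizeof *hdr + hdr->controls_len, bitstream, hdr->bitstream_len);
    return (size_t)total;
}

/* Validate a received message and point at its payloads. Returns 0 or -1. */
int dd_unpack_req(const unsigned char *in, size_t len, struct dd_req_decode *hdr,
                  const unsigned char **controls, const unsigned char **bitstream)
{
    uint64_t total;

    assert(in != NULL && hdr != NULL && controls != NULL && bitstream != NULL);
    if (len < sizeof *hdr)
        return -1;
    memcpy(hdr, in, sizeof *hdr);
    total = (uint64_t)sizeof *hdr + hdr->controls_len + hdr->bitstream_len;
    if (hdr->type != DD_REQ_DECODE || total != len)
        return -1;                      /* wrong type or inconsistent lengths */

    *controls = in + sizeof *hdr;
    *bitstream = *controls + hdr->controls_len;
    return 0;
}
```

Carrying explicit lengths lets the receiver reject truncated or oversized messages before touching the payload, which matters on the kernel side where the data comes from userspace. Both ends run on the same machine, so the sketch uses native byte order.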
Memory: DRM PRIME / dmabuf
For browser zero-copy, the kernel module needs to register the decoded frame buffers as dmabuf handles that V4L2 hands out via PRIME export. The daedalus-fourier kernel library writes pixels to CPU-mapped memory; the kernel module manages the DMA-coherent allocation and PRIME export.
Two strategies:
- Strategy A: kernel module allocates dmabuf, mmaps it into daemon via the chardev, daemon writes pixels there.
- Strategy B: daemon allocates via libdrm, transfers dmabuf-fd to kernel via chardev, kernel exposes via PRIME.
Strategy A is simpler; B is more flexible. Start with A.
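For Strategy A, the daemon side could look roughly like the sketch below, assuming the chardev implements mmap and that each frame buffer is selected by a file offset. Both points are assumptions, as are the names dd_map_frame, dd_unmap_frame and dd_copy_plane; only mmap and munmap are real APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>

/*
 * Map len bytes of a kernel-allocated frame buffer exposed through fd.
 * The offset that selects a buffer is assumed to come from the kernel module.
 * Returns the mapping, or NULL on failure.
 */
void *dd_map_frame(int fd, size_t len, off_t offset)
{
    void *p;

    assert(len > 0);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    return p == MAP_FAILED ? NULL : p;
}

/* Drop a mapping created by dd_map_frame. Returns 0 on success, -1 on error. */
int dd_unmap_frame(void *p, size_t len)
{
    return munmap(p, len);
}

/*
 * Copy one image plane row by row, honouring different strides on each
 * side, e.g. from a decoder work buffer into the mapped frame.
 */
void dd_copy_plane(unsigned char *dst, size_t dst_stride,
                   const unsigned char *src, size_t src_stride,
                   size_t width, size_t height)
{
    assert(dst != NULL && src != NULL);
    assert(dst_stride >= width && src_stride >= width);

    for (size_t y = 0; y < height; y++)
        memcpy(dst + y * dst_stride, src + y * src_stride, width);
}
```

MAP_SHARED is what makes the daemon's writes visible to the kernel module and to whoever later imports the exported dmabuf. Any cache maintenance the platform needs is not shown here.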
Build & deploy
- Kernel module: cd kernel && make (out-of-tree against running kernel headers; /lib/modules/$(uname -r)/build path).
- Daemon: CMake, depends on installed libdaedalus_core.a from sibling repo. Run as systemd service or under direct user invocation.
What's NOT in this repo
- The cycles 1-9 video kernels: those live in sibling daedalus-fourier and are consumed via include/daedalus.h.
- The browser side (firefox-fourier / chromium-fourier): those are their own sibling projects.
- libva-v4l2-request-fourier: sibling, talks to our /dev/videoNN via V4L2 ioctls.
Sub-phases (roadmap excerpt; see docs/roadmap.md for the full plan)
- 8.1: kernel module skeleton — register /dev/videoNN with stub ioctls, no decoding.
- 8.2: chardev bridge — kernel ↔ daemon round-trip with dummy bitstream/dummy frame data.
- 8.3: daemon FFmpeg dlopen + parse path — pull per-frame info from FFmpeg without decoding.
- 8.4: dispatch one codec end-to-end via daedalus-fourier (VP9 first since it has the most QPU-deployed kernels).
- 8.5: dmabuf integration — first browser zero-copy frame.
- 8.6: AV1 + H.264 added.
- 8.7: performance tuning; 30fps@1080p target.
Per-phase effort: each is roughly a week of focused work.
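To make the 8.7 target concrete, the per-frame time budget and pixel throughput implied by 30fps@1080p can be worked out as below. The helper names are hypothetical, and 1920x1080 is assumed as the meaning of "1080p".

```c
#include <assert.h>
#include <stdint.h>

/* Nanoseconds available to produce one frame at the given frame rate. */
uint64_t dd_frame_budget_ns(uint32_t fps)
{
    assert(fps > 0);
    return UINT64_C(1000000000) / fps;
}

/* Luma samples that must be produced per second at w x h and fps. */
uint64_t dd_luma_rate(uint32_t w, uint32_t h, uint32_t fps)
{
    return (uint64_t)w * h * fps;
}
```

At 30 fps the budget is about 33.3 ms per frame (33,333,333 ns), and 1920x1080 at 30 fps works out to 62,208,000 luma samples per second. Parsing, dispatch and the kernel round-trip all have to fit inside that window.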