daedalus-v4l2/docs/architecture.md
marfrit c7d8050cc9 Initial scaffold: daedalus-v4l2 sibling repo
V4L2 stateless decoder for Pi 5, backed by sibling
daedalus-fourier kernel library (VP9 + AV1 CDEF + H.264 video
decode kernels on VideoCore VII compute + ARM NEON).

Architecture locked 2026-05-18 by mfritsche per
daedalus-fourier/docs/phase8_scoping.md:
- Option B: Linux kernel V4L2 shim + userspace daemon (not
  v4l2loopback). Real /dev/videoNN; proper DRM PRIME for
  browser zero-copy.
- Option γ: dlopen FFmpeg at runtime as parser. No vendoring;
  fastest to v1.
- Sibling repo (this repo): V4L2-side work outside of
  daedalus-fourier so kernel-library API stays clean.

Components:
  kernel/ - Linux out-of-tree kernel module (GPLv2; V4L2
    device + chardev bridge to userspace daemon)
  daemon/ - userspace decoder daemon (BSD-2-Clause; links
    libdaedalus_core.a from sibling; dlopens FFmpeg)
  docs/   - architecture + 7-phase roadmap (8.1..8.7)
  include/ - shared headers between kernel and daemon

Roadmap (7 sub-phases, ~1 week each):
  8.1 kernel skeleton (/dev/videoNN with no-op ioctls)
  8.2 chardev bridge (kernel ↔ daemon ping-pong)
  8.3 daemon FFmpeg dlopen + parse path
  8.4 VP9 end-to-end via daedalus_dispatch_*
  8.5 dmabuf / DRM PRIME for zero-copy
  8.6 AV1 + H.264 codec support
  8.7 performance: hit 30fps@1080p (project floor)

No code yet — only README + design docs + directory structure.
First implementation work starts in Phase 8.1 next session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 14:54:56 +00:00


daedalus-v4l2 — architecture

Components

1. Kernel module (kernel/)

A Linux V4L2 stateless decoder driver. Registers as /dev/videoNN with VFL_TYPE_VIDEO and supports VP9, AV1, and H.264 stateless decoder controls (matching the existing V4L2 stateless uAPI used by libva-v4l2-request-fourier).

Internally it does NOT decode the bitstream. Instead, it:

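  1. Accepts V4L2 ioctls (VIDIOC_S_FMT, VIDIOC_S_EXT_CTRLS with the stateless codec controls, VIDIOC_QBUF, etc.). The stateless controls are compound controls, so they go through the extended-control ioctl rather than VIDIOC_S_CTRL.
  2. Marshals the bitstream and per-frame/per-slice control structs onto a chardev (or netlink) channel.
  3. Receives decoded frames back from the userspace daemon.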
  4. Returns them via VIDIOC_DQBUF.

It is an out-of-tree kernel module, built with make against the running kernel's headers and loaded via insmod.

2. Userspace daemon (daemon/)

A long-running daemon that:

  1. Connects to the kernel module's chardev.
  2. Pulls bitstream + control blobs.
  3. Drives FFmpeg parsers via dlopen to get per-block metadata (block positions, MVs, coefficients, tc0 arrays, etc.).
  4. Calls daedalus_dispatch_* from libdaedalus_core.a (sibling repo) to do the actual per-block work on NEON / V3D.
  5. Posts decoded frames back to the kernel module via the chardev.

Architecture: single-threaded event loop initially; per-stream worker threads later if needed.
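A single-threaded event loop of this kind could be built around poll(). The sketch below is an assumption about shape only: the names `bridge_poll_once`, `bridge_run`, and `count_bytes` are hypothetical, the 4096-byte read size is arbitrary, and the real daemon would parse messages rather than count bytes.

```c
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical callback invoked with each chunk read from the bridge fd. */
typedef void (*bridge_msg_cb)(const unsigned char *data, size_t len, void *ctx);

/* Example handler: accumulates the number of bytes seen into *(size_t *)ctx. */
static void count_bytes(const unsigned char *data, size_t len, void *ctx)
{
    (void)data;
    *(size_t *)ctx += len;
}

/* Wait up to timeout_ms for fd to become readable and dispatch one read.
 * Returns 1 if data was dispatched, 0 on timeout or peer close, -1 on error. */
static int bridge_poll_once(int fd, int timeout_ms, bridge_msg_cb cb, void *ctx)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    unsigned char buf[4096];
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms);
    } while (rc < 0 && errno == EINTR);   /* retry if a signal interrupted us */
    if (rc <= 0)
        return rc;
    if (!(pfd.revents & POLLIN))
        return (pfd.revents & (POLLERR | POLLNVAL)) ? -1 : 0;

    ssize_t n = read(fd, buf, sizeof(buf));
    if (n < 0)
        return -1;
    if (n == 0)
        return 0;
    cb(buf, (size_t)n, ctx);
    return 1;
}

/* Single-threaded loop: run until the peer closes (0) or an error occurs (-1). */
static int bridge_run(int fd, bridge_msg_cb cb, void *ctx)
{
    int rc;
    while ((rc = bridge_poll_once(fd, -1, cb, ctx)) > 0)
        ;
    return rc;
}
```

Keeping the per-iteration step in its own function makes it straightforward to move to per-stream worker threads later, since each worker could run the same step on its own descriptor.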

3. Bitstream parser layer (Option γ — runtime dlopen)

Instead of vendoring FFmpeg's parsers, the daemon loads the system FFmpeg at runtime. Two integration patterns:

  • a. AVCodec/AVPacket through libavcodec: feed packets to avcodec_send_packet, intercept the parse-only stage and pull out block metadata before the actual decode runs.
  • b. Custom parser via libavcodec internal APIs: messier but avoids running FFmpeg's full decode path.

Plan: try (a) first. If that path doesn't expose the per-block info we need, fall back to (b) or vendor a minimal parser per codec.
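The runtime-loading step itself can be sketched as follows. This only covers locating libavcodec and resolving one public symbol, `avcodec_version`; it does not attempt the metadata interception described in (a). The struct and function names are hypothetical, and the list of sonames to try is left to the caller because the installed soname differs between FFmpeg releases.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Function-pointer type matching libavcodec's public avcodec_version(). */
typedef unsigned (*avcodec_version_fn)(void);

/* Hypothetical holder for the loaded library and the symbols we resolved. */
struct ffmpeg_handle {
    void *lib;                   /* handle returned by dlopen() */
    avcodec_version_fn version;  /* resolved avcodec_version symbol */
};

/* Try each candidate soname in order (NULL-terminated list).
 * Returns 0 on success, -1 if no candidate loads or the symbol is missing. */
static int ffmpeg_open(struct ffmpeg_handle *h, const char *const *candidates)
{
    h->lib = NULL;
    h->version = NULL;
    for (size_t i = 0; candidates[i] != NULL; i++) {
        void *lib = dlopen(candidates[i], RTLD_NOW | RTLD_LOCAL);
        if (lib == NULL)
            continue;
        /* POSIX permits converting the dlsym() result to a function pointer. */
        avcodec_version_fn fn = (avcodec_version_fn)dlsym(lib, "avcodec_version");
        if (fn != NULL) {
            h->lib = lib;
            h->version = fn;
            return 0;
        }
        dlclose(lib);   /* loaded, but not a usable libavcodec */
    }
    return -1;
}

/* Release the library; safe to call on a handle that failed to open. */
static void ffmpeg_close(struct ffmpeg_handle *h)
{
    if (h->lib != NULL)
        dlclose(h->lib);
    h->lib = NULL;
    h->version = NULL;
}
```

A caller might pass a list such as `{"libavcodec.so.61", "libavcodec.so.60", "libavcodec.so", NULL}` and then read the major version as `h.version() >> 16` to decide which struct layouts to expect. Older glibc versions need `-ldl` at link time.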

Communication: kernel ↔ daemon

Initial plan: single chardev /dev/daedalus-v4l2 with a simple request/response protocol:

REQ_DECODE { stream_id, frame_idx, codec, controls[], bitstream_blob }
RESP_FRAME { stream_id, frame_idx, dma_buf_fd, w, h, format }
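One possible wire layout for these two messages is sketched below. Only the field names come from the protocol outline above; the field widths, the `DAEDALUS_MSG_*` tags, the explicit length fields, the `reserved` word, and the helper names are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical message tags. */
enum daedalus_msg_type {
    DAEDALUS_MSG_REQ_DECODE = 1,
    DAEDALUS_MSG_RESP_FRAME = 2
};

/* Fixed-size header; controls[] and bitstream_blob follow it back to back. */
struct daedalus_req_decode {
    uint32_t type;           /* DAEDALUS_MSG_REQ_DECODE */
    uint32_t stream_id;
    uint64_t frame_idx;
    uint32_t codec;          /* assumed: a codec identifier such as a FourCC */
    uint32_t controls_len;   /* bytes of control structs after the header */
    uint32_t bitstream_len;  /* bytes of bitstream after the controls */
    uint32_t reserved;       /* must be 0; keeps the size a multiple of 8 */
};

struct daedalus_resp_frame {
    uint32_t type;           /* DAEDALUS_MSG_RESP_FRAME */
    uint32_t stream_id;
    uint64_t frame_idx;
    int32_t  dma_buf_fd;
    uint32_t w;
    uint32_t h;
    uint32_t format;
};

/* Both sides must agree on layout, so pin the sizes at compile time. */
static_assert(sizeof(struct daedalus_req_decode) == 32, "unexpected padding");
static_assert(sizeof(struct daedalus_resp_frame) == 32, "unexpected padding");

/* Total bytes a REQ_DECODE occupies on the wire, or 0 on overflow. */
static size_t daedalus_req_total_size(const struct daedalus_req_decode *h)
{
    size_t total = sizeof(*h);
    if (h->controls_len > SIZE_MAX - total)
        return 0;
    total += h->controls_len;
    if (h->bitstream_len > SIZE_MAX - total)
        return 0;
    return total + h->bitstream_len;
}

/* Validate a received buffer, copy its header into *out, and return a
 * pointer to the bitstream payload. Returns NULL if the buffer is malformed. */
static const uint8_t *daedalus_req_bitstream(const void *buf, size_t len,
                                             struct daedalus_req_decode *out)
{
    if (len < sizeof(*out))
        return NULL;
    memcpy(out, buf, sizeof(*out));
    if (out->type != DAEDALUS_MSG_REQ_DECODE || out->reserved != 0)
        return NULL;
    size_t total = daedalus_req_total_size(out);
    if (total == 0 || total != len)
        return NULL;
    return (const uint8_t *)buf + sizeof(*out) + out->controls_len;
}
```

Carrying explicit lengths in the header lets the receiver reject truncated or oversized messages before touching the payload, which matters because the daemon consumes data that ultimately originates from untrusted media.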

Alternative: netlink socket (more standard for kernel-userspace IPC, but more boilerplate). Chardev is simpler for v1.

Memory: DRM PRIME / dmabuf

For browser zero-copy, the kernel module needs to export the decoded frame buffers as dmabufs, which clients obtain through V4L2 (VIDIOC_EXPBUF) and import via DRM PRIME. The daedalus-fourier kernel library writes pixels to CPU-mapped memory; the kernel module manages the DMA-coherent allocation and the export.

Two strategies:

  • Strategy A: kernel module allocates dmabuf, mmaps it into daemon via the chardev, daemon writes pixels there.
  • Strategy B: daemon allocates via libdrm, transfers dmabuf-fd to kernel via chardev, kernel exposes via PRIME.

Strategy A is simpler; B is more flexible. Start with A.
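On the daemon side, Strategy A reduces to an mmap() of a kernel-owned buffer. The sketch below assumes the chardev implements mmap and that the kernel module hands out a page-aligned offset identifying each buffer; both are design assumptions, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>

/* Map `len` bytes of a kernel-owned frame buffer into the daemon.
 * `fd` would be the chardev; `offset` is a hypothetical per-buffer cookie
 * chosen by the kernel module (must be page-aligned, as mmap requires).
 * Returns NULL on failure. */
static void *frame_map(int fd, size_t len, off_t offset)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    return (p == MAP_FAILED) ? NULL : p;
}

/* Drop the mapping once the frame has been handed back to the kernel. */
static int frame_unmap(void *p, size_t len)
{
    return munmap(p, len);
}
```

MAP_SHARED is what makes the daemon's writes visible to the kernel module and, through the exported dmabuf, to the consumer. A private mapping would silently write into a copy.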

Build & deploy

  • Kernel module: cd kernel && make (out-of-tree against running kernel headers; /lib/modules/$(uname -r)/build path).
  • Daemon: CMake, depends on an installed libdaedalus_core.a from the sibling repo. Runs as a systemd service or is started directly by a user.

What's NOT in this repo

  • The cycles 1-9 video kernels — those live in sibling daedalus-fourier and are consumed via include/daedalus.h.
  • The browser side (firefox-fourier / chromium-fourier) — those are their own sibling projects.
  • libva-v4l2-request-fourier — sibling, talks to our /dev/videoNN via V4L2 ioctls.

Sub-phases (roadmap excerpt; see docs/roadmap.md for the full plan)

  1. 8.1: kernel module skeleton — register /dev/videoNN with stub ioctls, no decoding.
  2. 8.2: chardev bridge — kernel ↔ daemon round-trip with dummy bitstream/dummy frame data.
  3. 8.3: daemon FFmpeg dlopen + parse path — pull per-frame info from FFmpeg without decoding.
  4. 8.4: dispatch one codec end-to-end via daedalus-fourier (VP9 first since it has the most QPU-deployed kernels).
  5. 8.5: dmabuf integration — first browser zero-copy frame.
  6. 8.6: AV1 + H.264 added.
  7. 8.7: performance tuning; 30fps@1080p target.

Per-phase effort: each is roughly a week of focused work.