c7d8050cc9
V4L2 stateless decoder for Pi 5, backed by sibling
daedalus-fourier kernel library (VP9 + AV1 CDEF + H.264 video
decode kernels on VideoCore VII compute + ARM NEON).
Architecture locked 2026-05-18 by mfritsche per
daedalus-fourier/docs/phase8_scoping.md:
- Option B: Linux kernel V4L2 shim + userspace daemon (not
  v4l2loopback). Real /dev/videoNN; proper DRM PRIME for
  browser zero-copy.
- Option γ: dlopen FFmpeg at runtime as parser. No vendoring;
  fastest to v1.
- Sibling repo (this repo): V4L2-side work outside of
  daedalus-fourier so kernel-library API stays clean.
Components:
  kernel/  - Linux out-of-tree kernel module (GPLv2; V4L2
             device + chardev bridge to userspace daemon)
  daemon/  - userspace decoder daemon (BSD-2-Clause; links
             libdaedalus_core.a from sibling; dlopens FFmpeg)
  docs/    - architecture + 7-phase roadmap (8.1..8.7)
  include/ - shared headers between kernel and daemon
Roadmap (7 sub-phases, ~1 week each):
8.1 kernel skeleton (/dev/videoNN with no-op ioctls)
8.2 chardev bridge (kernel ↔ daemon ping-pong)
8.3 daemon FFmpeg dlopen + parse path
8.4 VP9 end-to-end via daedalus_dispatch_*
8.5 dmabuf / DRM PRIME for zero-copy
8.6 AV1 + H.264 codec support
8.7 performance: hit 30fps@1080p (project floor)
No code yet — only README + design docs + directory structure.
First implementation work starts in Phase 8.1 next session.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
116 lines
4.2 KiB
Markdown
# daedalus-v4l2 — architecture

## Components

### 1. Kernel module (`kernel/`)

A Linux V4L2 stateless decoder driver. Registers as
`/dev/videoNN` with `VFL_TYPE_VIDEO` and supports VP9, AV1,
and H.264 stateless decoder controls (matching the existing
V4L2 stateless uAPI used by libva-v4l2-request-fourier).

Internally it does NOT decode the bitstream. It:
1. Accepts V4L2 ioctls (VIDIOC_S_FMT, VIDIOC_S_EXT_CTRLS with
   STATELESS controls, VIDIOC_QBUF, etc.).
2. Marshals the bitstream and per-frame/per-slice control
   structs onto a chardev (or netlink) channel.
3. Pulls decoded frames back from the userspace daemon.
4. Returns them via VIDIOC_DQBUF.

It is an out-of-tree kernel module (built with `make` against the
running kernel's headers; loaded via `insmod`).

### 2. Userspace daemon (`daemon/`)

A long-running daemon that:
1. Connects to the kernel module's chardev.
2. Pulls bitstream + control blobs.
3. Drives FFmpeg parsers via `dlopen` to get per-block
   metadata (block positions, MVs, coefficients, tc0
   arrays, etc.).
4. Calls `daedalus_dispatch_*` from `libdaedalus_core.a`
   (sibling repo) to do the actual per-block work on
   NEON / V3D.
5. Posts decoded frames back to the kernel module via
   the chardev.

Architecture: single-threaded event loop initially; per-stream
worker threads later if needed.

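A minimal sketch of the single-threaded event loop, assuming the chardev supports `poll()` for readability (the design does not specify this yet). The names `dv_event_loop`, `dv_on_readable`, and `dv_count_bytes` are hypothetical; a real handler would parse a request and dispatch it instead of counting bytes.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Handler contract (an assumption of this sketch): return 0 to keep
 * looping, non-zero to stop. */
typedef int (*dv_on_readable)(int fd, void *ctx);

/* Example handler: adds the number of bytes it reads to a size_t. */
static int dv_count_bytes(int fd, void *ctx)
{
    char buf[4096];
    ssize_t n = read(fd, buf, sizeof(buf));
    if (n <= 0)
        return 1; /* EOF or read error: ask the loop to stop */
    *(size_t *)ctx += (size_t)n;
    return 0;
}

/* Returns 0 on idle timeout or handler-requested stop, -1 on poll
 * failure or when the peer hangs up with nothing left to read. */
static int dv_event_loop(int fd, dv_on_readable handler, void *ctx,
                         int idle_timeout_ms)
{
    assert(fd >= 0 && handler != NULL);
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };

    for (;;) {
        int n = poll(&pfd, 1, idle_timeout_ms);
        if (n < 0) {
            if (errno == EINTR)
                continue; /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            return 0; /* nothing arrived within the timeout */
        if (pfd.revents & POLLIN) {
            if (handler(fd, ctx) != 0)
                return 0;
        } else {
            return -1; /* POLLERR, POLLHUP or POLLNVAL without data */
        }
    }
}
```

Moving to per-stream worker threads later would mostly mean giving each stream its own loop and context, so keeping all state behind `ctx` from the start may make that step easier.
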
### 3. Bitstream parser layer (Option γ: runtime dlopen)

Instead of vendoring FFmpeg's parsers, the daemon loads the
system FFmpeg at runtime. Two integration patterns:

- **a. AVCodec/AVPacket through libavcodec**: feed packets to
  `avcodec_send_packet`, intercept the parse-only stage and
  pull out block metadata before the actual decode runs.
- **b. Custom parser via libavcodec internal APIs**: messier
  but avoids running FFmpeg's full decode path.

Plan: try (a) first. If FFmpeg's internal API doesn't expose
the per-block info we need, fall back to (b) or vendor a
minimal parser per codec.

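The runtime-loading step can be sketched as below. It resolves `avcodec_version()`, a public libavcodec function, through `dlopen`/`dlsym` to confirm the library is present before any parser work starts. The helper name `dv_probe_avcodec` is hypothetical, and the soname is left to the caller because it differs between distributions and FFmpeg releases.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

typedef unsigned (*dv_avcodec_version_fn)(void);

/* Hypothetical helper: loads `soname`, calls avcodec_version() and
 * stores the result. Returns 0 on success, -1 if the library or the
 * symbol is missing. */
static int dv_probe_avcodec(const char *soname, unsigned *version_out)
{
    assert(soname != NULL && version_out != NULL);

    void *handle = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        return -1;

    /* POSIX-recommended idiom for turning dlsym's void* into a
     * function pointer without a direct cast. */
    dv_avcodec_version_fn fn = NULL;
    *(void **)(&fn) = dlsym(handle, "avcodec_version");
    if (fn == NULL) {
        dlclose(handle);
        return -1;
    }

    *version_out = fn();
    /* A real daemon would keep the handle open for later dlsym calls;
     * this probe only needs the version, so it closes right away. */
    dlclose(handle);
    return 0;
}
```

The returned value packs the version as `major << 16 | minor << 8 | micro`, so the daemon could refuse to start against a libavcodec major version it has not been tested with. Older glibc versions need `-ldl` at link time.
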
## Communication: kernel ↔ daemon

Initial plan: single chardev `/dev/daedalus-v4l2` with a
simple request/response protocol:

```
REQ_DECODE { stream_id, frame_idx, codec, controls[], bitstream_blob }
RESP_FRAME { stream_id, frame_idx, dma_buf_fd, w, h, format }
```

Alternative: netlink socket (more standard for kernel-userspace
IPC, but more boilerplate). Chardev is simpler for v1.

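One possible C layout for the two messages above, as it might appear in `include/`. Everything beyond the field names from the protocol listing is an assumption of this sketch: the magic value, the type codes, the 32-bit field widths, and the convention that the controls and bitstream follow the fixed header as trailing bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DV_MAGIC 0x44563432u /* hypothetical tag, ASCII "DV42" */

enum dv_msg_type { DV_REQ_DECODE = 1, DV_RESP_FRAME = 2 };

struct dv_msg_hdr {
    uint32_t magic;
    uint32_t type;      /* enum dv_msg_type */
    uint32_t stream_id;
    uint32_t frame_idx;
};

struct dv_req_decode {
    struct dv_msg_hdr hdr;
    uint32_t codec;
    uint32_t controls_len;  /* bytes of control structs after the header */
    uint32_t bitstream_len; /* bytes of bitstream after the controls */
};

struct dv_resp_frame {
    struct dv_msg_hdr hdr;
    int32_t  dma_buf_fd;
    uint32_t width;
    uint32_t height;
    uint32_t format;
};

/* Both sides must agree on the layout, so reject hidden padding. */
static_assert(sizeof(struct dv_req_decode) == 28, "unexpected padding");
static_assert(sizeof(struct dv_resp_frame) == 32, "unexpected padding");

/* Copies the fixed header out of `buf` and checks that the declared
 * payload lengths match the message size exactly.
 * Returns 0 if the message is well formed, -1 otherwise. */
static int dv_parse_req_decode(const uint8_t *buf, size_t len,
                               struct dv_req_decode *out)
{
    if (buf == NULL || out == NULL || len < sizeof(*out))
        return -1;
    memcpy(out, buf, sizeof(*out)); /* avoids unaligned access */
    if (out->hdr.magic != DV_MAGIC || out->hdr.type != DV_REQ_DECODE)
        return -1;
    /* 64-bit sum so oversized lengths cannot wrap around. */
    uint64_t need = (uint64_t)sizeof(*out) + out->controls_len +
                    out->bitstream_len;
    return need == (uint64_t)len ? 0 : -1;
}
```

Validating lengths before touching the payload matters on both sides of the chardev, since neither end should trust sizes supplied by the other.
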
## Memory: DRM PRIME / dmabuf

For browser zero-copy, the kernel module needs to register the
decoded frame buffers as dmabuf handles that V4L2 hands out via
PRIME export (`VIDIOC_EXPBUF`). The daedalus-fourier kernel library
writes pixels to CPU-mapped memory; the kernel module manages the
DMA-coherent allocation and PRIME export.

Two strategies:
- **Strategy A**: kernel module allocates dmabuf, mmaps it into
  the daemon via the chardev, daemon writes pixels there.
- **Strategy B**: daemon allocates via libdrm, transfers
  dmabuf-fd to kernel via chardev, kernel exposes via PRIME.

Strategy A is simpler; B is more flexible. Start with A.

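On the daemon side, Strategy A reduces to an `mmap` of the frame buffer followed by plain stores. The sketch below assumes the chardev implements `mmap` with the buffer at offset 0, which is a design choice still to be made. The names `dv_map_frame` and `dv_unmap_frame` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

struct dv_frame_map {
    uint8_t *base; /* CPU-visible start of the frame buffer */
    size_t   len;  /* mapped length in bytes */
};

/* Maps `len` bytes of the buffer exposed by `path` (the chardev in the
 * real design) read/write and shared, so stores reach the kernel-owned
 * memory. Returns 0 on success, -1 on failure. */
static int dv_map_frame(const char *path, size_t len, struct dv_frame_map *out)
{
    assert(path != NULL && len > 0 && out != NULL);

    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping keeps its own reference to the file */
    if (p == MAP_FAILED)
        return -1;

    out->base = p;
    out->len = len;
    return 0;
}

static void dv_unmap_frame(struct dv_frame_map *m)
{
    if (m->base != NULL)
        munmap(m->base, m->len);
    m->base = NULL;
    m->len = 0;
}
```

The decode path would then pass `base` as the destination pointer for its output, so no copy is needed before the kernel module exports the same memory as a dmabuf.
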
## Build & deploy

- Kernel module: `cd kernel && make` (out-of-tree against running
  kernel headers; `/lib/modules/$(uname -r)/build` path).
- Daemon: CMake, depends on installed `libdaedalus_core.a` from
  sibling repo. Run as systemd service or under direct user
  invocation.

## What's NOT in this repo

- The cycles 1-9 video kernels: those live in sibling
  daedalus-fourier and are consumed via `include/daedalus.h`.
- The browser side (firefox-fourier / chromium-fourier): those
  are their own sibling projects.
- libva-v4l2-request-fourier: a sibling project that talks to our
  `/dev/videoNN` via V4L2 ioctls.

## Sub-phases (roadmap excerpt; see docs/roadmap.md for the full plan)

1. **8.1**: kernel module skeleton; register /dev/videoNN with
   stub ioctls, no decoding.
2. **8.2**: chardev bridge; kernel ↔ daemon round-trip with
   dummy bitstream/dummy frame data.
3. **8.3**: daemon FFmpeg dlopen + parse path; pull per-frame
   info from FFmpeg without decoding.
4. **8.4**: dispatch one codec end-to-end via daedalus-fourier
   (VP9 first since it has the most QPU-deployed kernels).
5. **8.5**: dmabuf integration; first browser zero-copy frame.
6. **8.6**: AV1 + H.264 added.
7. **8.7**: performance tuning; 30fps@1080p target.

Per-phase effort: each is roughly a week of focused work.

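To make the 8.7 target concrete, the arithmetic below turns 30fps@1080p into a per-frame time budget and an output bandwidth. It assumes 8-bit 4:2:0 output (for example NV12) at exactly 1920x1080; this document does not fix the pixel format, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one 8-bit 4:2:0 frame: a full-size luma plane plus two
 * chroma planes at quarter resolution, i.e. 1.5 bytes per pixel. */
static uint64_t dv_frame_bytes_420(uint32_t width, uint32_t height)
{
    return (uint64_t)width * height * 3 / 2;
}

/* Wall-clock time available to produce one frame, in milliseconds. */
static double dv_frame_budget_ms(unsigned fps)
{
    assert(fps > 0);
    return 1000.0 / fps;
}

/* Decoded output written per second at the given frame rate. */
static uint64_t dv_output_bytes_per_sec(uint32_t width, uint32_t height,
                                        unsigned fps)
{
    return dv_frame_bytes_420(width, height) * fps;
}
```

Under these assumptions a frame is 3,110,400 bytes, the budget is about 33.3 ms per frame, and the daemon writes roughly 93.3 MB/s of decoded pixels, before counting reference-frame reads.
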