Initial scaffold: daedalus-v4l2 sibling repo

V4L2 stateless decoder for Pi 5, backed by sibling
daedalus-fourier kernel library (VP9 + AV1 CDEF + H.264 video
decode kernels on VideoCore VII compute + ARM NEON).

Architecture locked 2026-05-18 by mfritsche per
daedalus-fourier/docs/phase8_scoping.md:
- Option B: Linux kernel V4L2 shim + userspace daemon (not
  v4l2loopback). Real /dev/videoNN; proper DRM PRIME for
  browser zero-copy.
- Option γ: dlopen FFmpeg at runtime as parser. No vendoring;
  fastest to v1.
- Sibling repo (this repo): V4L2-side work outside of
  daedalus-fourier so kernel-library API stays clean.

Components:
  kernel/  - Linux out-of-tree kernel module (GPLv2; V4L2
             device + chardev bridge to userspace daemon)
  daemon/  - userspace decoder daemon (BSD-2-Clause; links
             libdaedalus_core.a from sibling; dlopens FFmpeg)
  docs/    - architecture + 7-phase roadmap (8.1..8.7)
  include/ - shared headers between kernel and daemon

Roadmap (7 sub-phases, ~1 week each):
  8.1 kernel skeleton (/dev/videoNN with no-op ioctls)
  8.2 chardev bridge (kernel ↔ daemon ping-pong)
  8.3 daemon FFmpeg dlopen + parse path
  8.4 VP9 end-to-end via daedalus_dispatch_*
  8.5 dmabuf / DRM PRIME for zero-copy
  8.6 AV1 + H.264 codec support
  8.7 performance: hit 30fps@1080p (project floor)

No code yet — only README + design docs + directory structure.
First implementation work starts in Phase 8.1 next session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,11 @@
build/
*.o
*.ko
*.mod
*.mod.c
*.cmd
*.symvers
*.order
.tmp_versions/
Module.symvers
modules.order
@@ -0,0 +1,83 @@
# daedalus-v4l2

V4L2 stateless decoder for the Raspberry Pi 5 / CM5, backed by the
`daedalus-fourier` kernel library (VP9 + AV1 CDEF + H.264 video
decode kernels on VideoCore VII compute + ARM NEON).

**Status:** scaffold (2026-05-18). Architecture locked per
[daedalus-fourier session memory](https://git.reauktion.de/marfrit/daedalus-fourier);
implementation not yet begun.

## What this is

A two-piece userspace + kernel-module stack that exposes a V4L2
stateless decoder interface (`/dev/videoNN`) so that
`libva-v4l2-request-fourier` → `firefox-fourier` /
`chromium-fourier` can drive it the same way they drive existing
hardware-decode pipelines on Pi 5 / RK3588.

```
+-----------------------------------------------------------+
| firefox-fourier / chromium-fourier (existing)             |
+-----------------------------------------------------------+
| VA-API                                                    |
+-----------------------------------------------------------+
| libva-v4l2-request-fourier (existing, sibling project)    |
+-----------------------------------------------------------+
| V4L2 stateless ioctl uAPI                                 |
+-----------------------------------------------------------+
| daedalus-v4l2 kernel module (`kernel/`)                   |
| - registers /dev/videoNN                                  |
| - parses V4L2 stateless ioctls (VP9/AV1/H.264 controls)   |
| - forwards bitstream + controls to userspace daemon       |
|   via chardev or netlink                                  |
+-----------------------------------------------------------+
| daedalus-v4l2 userspace daemon (`daemon/`)                |
| - takes bitstream blobs + per-slice controls              |
| - drives FFmpeg parsers via dlopen (Option γ)             |
| - dispatches per-block ops via daedalus-fourier           |
|   public API (daedalus_dispatch_*)                        |
| - posts decoded frames back to kernel module              |
+-----------------------------------------------------------+
| daedalus-fourier kernel library (sibling project)         |
| - exports include/daedalus.h public API                   |
| - per-kernel CPU NEON + opportunistic V3D QPU dispatch    |
| - 9 closed cycles across VP9, AV1 CDEF, H.264             |
+-----------------------------------------------------------+
| V3D 7.1 (Mesa userspace v3dv) + ARM NEON (BCM2712)        |
+-----------------------------------------------------------+
```

## Why this architecture (Option B + γ + sibling)

Locked by user 2026-05-18 from 3 options in
`daedalus-fourier/docs/phase8_scoping.md`:

- **Option B** over A (userspace v4l2loopback): real `/dev/videoNN`,
  proper DRM PRIME / dmabuf for browser zero-copy.
- **Option γ**: dlopen FFmpeg as parser at runtime. No vendoring,
  fastest to v1.
- **Sibling repo**: per `project_consumer_target` convention,
  V4L2-side work lives outside daedalus-fourier so the
  kernel-library has a clean API boundary.

## Status

Initial scaffold only. See `docs/architecture.md` for the
deeper design and `docs/roadmap.md` for the
sub-phase breakdown.

## Repo layout

- `kernel/` — Linux kernel module (V4L2 device registration +
  ioctl handling + userspace chardev bridge). Out-of-tree.
- `daemon/` — userspace decoder daemon (links
  `libdaedalus_core.a` from sibling daedalus-fourier; uses
  dlopen for FFmpeg parser).
- `include/` — shared headers between kernel and daemon.
- `docs/` — architecture + roadmap.

## License

Kernel module: GPLv2 (required for kernel-tree compatibility).
Userspace daemon: BSD-2-Clause (matches daedalus-fourier).
@@ -0,0 +1,53 @@
# daemon/ — daedalus-v4l2 userspace decoder daemon

Userspace daemon that:
1. Connects to the kernel module's chardev
2. Receives bitstream + V4L2 control blobs
3. Parses bitstream via dlopen'd FFmpeg
4. Dispatches per-block work via `daedalus_dispatch_*`
   from sibling `daedalus-fourier`
5. Returns decoded frames to kernel

## Status

Scaffold only. Phase 8.3 not yet started.

## Build dependencies (planned)

- libdaedalus_core.a from sibling daedalus-fourier (static link)
- FFmpeg dev headers (for AVPacket/AVCodec interface types) +
  runtime FFmpeg .so (loaded via dlopen)
- Linux uAPI headers (`linux/videodev2.h`, `linux/v4l2-controls.h`)
  for V4L2 control struct definitions
- pthread

## Build (when implemented)

```sh
mkdir build && cd build
cmake .. -DDAEDALUS_FOURIER_DIR=/path/to/daedalus-fourier
make
```

## Layout (planned)

- `CMakeLists.txt`
- `src/main.c` — event loop, chardev connection
- `src/parser.c` — FFmpeg dlopen wrapper + per-codec dispatch
- `src/decode_vp9.c`, `src/decode_av1.c`, `src/decode_h264.c` —
  per-codec block walkers
- `src/frame_io.c` — frame allocation, return to kernel

## License

BSD-2-Clause (matches daedalus-fourier sibling).

## Phase 8.3 starting point

A standalone program that:
1. Opens a .ivf or .mp4
2. Pulls codec packets via dlopen'd avformat
3. Calls dlopen'd avcodec to parse (without decoding)
4. Walks the block-level metadata
5. Validates output structure

No kernel involvement yet — just confirm the parse path works.
@@ -0,0 +1,115 @@
# daedalus-v4l2 — architecture

## Components

### 1. Kernel module (`kernel/`)

A Linux V4L2 stateless decoder driver. Registers as
`/dev/videoNN` with `VFL_TYPE_VIDEO` and supports VP9, AV1,
and H.264 stateless decoder controls (matching the existing
V4L2 stateless uAPI used by libva-v4l2-request-fourier).

Internally it does NOT decode the bitstream. It:
1. Accepts V4L2 ioctls (VIDIOC_S_FMT, VIDIOC_S_EXT_CTRLS with
   stateless codec controls, VIDIOC_QBUF, etc.)
2. Marshals the bitstream and per-frame/per-slice control
   structs onto a chardev (or netlink) channel.
3. Pulls decoded frames back from userspace daemon.
4. Returns them via VIDIOC_DQBUF.

Out-of-tree kernel module (built with `make` against the
running kernel's headers; loaded via `insmod`).

### 2. Userspace daemon (`daemon/`)

A long-running daemon that:
1. Connects to the kernel module's chardev.
2. Pulls bitstream + control blobs.
3. Drives FFmpeg parsers via `dlopen` to get per-block
   metadata (block positions, MVs, coefficients, tc0
   arrays, etc.).
4. Calls `daedalus_dispatch_*` from `libdaedalus_core.a`
   (sibling repo) to do the actual per-block work on
   NEON / V3D.
5. Posts decoded frames back to the kernel module via
   the chardev.

Architecture: single-threaded event loop initially; per-stream
worker threads later if needed.

### 3. Bitstream parser layer (Option γ — runtime dlopen)

Instead of vendoring FFmpeg's parsers, the daemon loads the
system FFmpeg at runtime. Two integration patterns:

- **a. AVCodec/AVPacket through libavcodec**: feed packets to
  `avcodec_send_packet`, intercept the parse-only stage and
  pull out block metadata before the actual decode runs.
- **b. Custom parser via libavcodec internal APIs**: messier
  but avoids running FFmpeg's full decode path.

Plan: try (a) first. If FFmpeg's public API doesn't expose
the per-block info we need, fall back to (b) or vendor a
minimal parser per codec.
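
The runtime-loading half of Option γ can be sketched in a few lines. This is a minimal illustration, not the daemon's actual parser code: it only resolves `avcodec_version()`, a public libavcodec function, to confirm that the library can be loaded and called through `dlopen`/`dlsym`. The function names `probe_avcodec_version` and `avcodec_major` are hypothetical, and the library name passed in is an assumption (the versioned soname differs between distributions).

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Signature of avcodec_version(), which returns the packed
 * library version (major << 16 | minor << 8 | micro). */
typedef unsigned (*avcodec_version_fn)(void);

/* Load libavcodec at runtime and query its version.
 * `soname` is an assumption, for example "libavcodec.so.61".
 * Returns the packed version, or 0 if the library or the
 * symbol cannot be found. */
unsigned probe_avcodec_version(const char *soname)
{
    assert(soname != NULL);

    void *handle = dlopen(soname, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL)
        return 0;

    /* Store the dlsym() result through a void* view of the
     * function pointer, the conversion idiom POSIX documents. */
    avcodec_version_fn fn = NULL;
    *(void **)(&fn) = dlsym(handle, "avcodec_version");

    unsigned version = (fn != NULL) ? fn() : 0;
    dlclose(handle);
    return version;
}

/* Extract the major version from the packed value. */
unsigned avcodec_major(unsigned packed)
{
    return packed >> 16;
}
```

A real daemon would keep the handle open and resolve the rest of the entry points it needs (such as `avcodec_send_packet`) the same way, checking the major version first so that struct layouts from the build-time headers match the library found at runtime.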

## Communication: kernel ↔ daemon

Initial plan: single chardev `/dev/daedalus-v4l2` with a
simple request/response protocol:

```
REQ_DECODE { stream_id, frame_idx, codec, controls[], bitstream_blob }
RESP_FRAME { stream_id, frame_idx, dma_buf_fd, w, h, format }
```

Alternative: netlink socket (more standard for kernel-userspace
IPC, but more boilerplate). Chardev is simpler for v1.
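
One way the two messages could be laid out as fixed-width C structs is sketched here. Everything beyond the field names listed above is an assumption: the `dv4l2_` prefix, the type and codec codes, the 32-bit field widths, and the choice to send `controls[]` and the bitstream as length-prefixed payloads after a fixed header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical message type codes. */
enum dv4l2_msg_type {
    DV4L2_REQ_DECODE = 1,
    DV4L2_RESP_FRAME = 2,
};

/* Hypothetical values for the `codec` field. */
enum dv4l2_codec {
    DV4L2_CODEC_VP9  = 1,
    DV4L2_CODEC_AV1  = 2,
    DV4L2_CODEC_H264 = 3,
};

/* Fixed header of REQ_DECODE. The control structs and the
 * bitstream blob follow it, in that order. */
struct dv4l2_req_decode {
    uint32_t type;          /* DV4L2_REQ_DECODE */
    uint32_t stream_id;
    uint32_t frame_idx;
    uint32_t codec;         /* enum dv4l2_codec */
    uint32_t controls_len;  /* bytes of control data that follow */
    uint32_t bitstream_len; /* bytes of bitstream after the controls */
};

/* RESP_FRAME has no variable-length payload. */
struct dv4l2_resp_frame {
    uint32_t type;          /* DV4L2_RESP_FRAME */
    uint32_t stream_id;
    uint32_t frame_idx;
    int32_t  dma_buf_fd;
    uint32_t width;
    uint32_t height;
    uint32_t format;        /* pixel format code, e.g. a V4L2 fourcc */
};

/* Only 32-bit members, so neither struct contains padding. */
static_assert(sizeof(struct dv4l2_req_decode) == 24, "unexpected padding");
static_assert(sizeof(struct dv4l2_resp_frame) == 28, "unexpected padding");

/* Total size of a REQ_DECODE message including payloads.
 * Returns 0 if the type is wrong or the payloads together
 * exceed `max_payload`, so callers can reject it early. */
size_t dv4l2_req_total_size(const struct dv4l2_req_decode *req,
                            size_t max_payload)
{
    assert(req != NULL);
    if (req->type != DV4L2_REQ_DECODE)
        return 0;
    if (req->controls_len > max_payload ||
        req->bitstream_len > max_payload - req->controls_len)
        return 0;
    return sizeof(*req) + req->controls_len + req->bitstream_len;
}
```

Keeping every header field the same width avoids layout differences between the kernel and userspace builds, and validating the lengths before reading the payload keeps a malformed request from driving an oversized allocation.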

## Memory: DRM PRIME / dmabuf

For browser zero-copy, the kernel module needs to register the
decoded frame buffers as dmabuf handles that V4L2 hands out via
PRIME export. The daedalus-fourier kernel library writes pixels
to CPU-mapped memory; the kernel module manages the
DMA-coherent allocation and PRIME export.

Two strategies:
- **Strategy A**: kernel module allocates dmabuf, mmaps it into
  daemon via the chardev, daemon writes pixels there.
- **Strategy B**: daemon allocates via libdrm, transfers
  dmabuf-fd to kernel via chardev, kernel exposes via PRIME.

Strategy A is simpler; B is more flexible. Start with A.
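
The daemon side of Strategy A might look roughly like the following. This is a sketch under stated assumptions: the frame format is taken to be NV12 (the source does not fix a format), the chardev is assumed to implement `mmap` at offset 0 for the current frame buffer, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Bytes in one NV12 frame: a full-resolution luma plane
 * followed by an interleaved CbCr plane at half resolution. */
size_t nv12_frame_size(uint32_t width, uint32_t height)
{
    return (size_t)width * height * 3 / 2;
}

/* Map `len` bytes of the buffer behind `fd` for writing.
 * With Strategy A, `fd` would be the chardev and the kernel
 * module's mmap handler would back the mapping with the
 * frame buffer (assumption). Returns NULL on failure. */
uint8_t *map_frame(int fd, size_t len)
{
    assert(len > 0);
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return (p == MAP_FAILED) ? NULL : (uint8_t *)p;
}

/* Fill an NV12 frame with limited-range black (Y = 16,
 * Cb = Cr = 128), standing in for real decoder output. */
void fill_nv12_black(uint8_t *frame, uint32_t width, uint32_t height)
{
    size_t luma = (size_t)width * height;
    memset(frame, 16, luma);
    memset(frame + luma, 128, luma / 2);
}
```

Because the mapping is `MAP_SHARED`, pixels written by the daemon land directly in the kernel-owned buffer, which is what makes the later export step copy-free.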

## Build & deploy

- Kernel module: `cd kernel && make` (out-of-tree against running
  kernel headers; `/lib/modules/$(uname -r)/build` path).
- Daemon: CMake, depends on installed `libdaedalus_core.a` from
  sibling repo. Run as systemd service or under direct user
  invocation.

## What's NOT in this repo

- The cycles 1-9 video kernels — those live in sibling
  daedalus-fourier and are consumed via `include/daedalus.h`.
- The browser side (firefox-fourier / chromium-fourier) — those
  are their own sibling projects.
- libva-v4l2-request-fourier — sibling, talks to our
  `/dev/videoNN` via V4L2 ioctls.

## Sub-phases (roadmap excerpt; see docs/roadmap.md for the full plan)

1. **8.1**: kernel module skeleton — register /dev/videoNN with
   stub ioctls, no decoding.
2. **8.2**: chardev bridge — kernel ↔ daemon round-trip with
   dummy bitstream/dummy frame data.
3. **8.3**: daemon FFmpeg dlopen + parse path — pull per-frame
   info from FFmpeg without decoding.
4. **8.4**: dispatch one codec end-to-end via daedalus-fourier
   (VP9 first since it has the most QPU-deployed kernels).
5. **8.5**: dmabuf integration — first browser zero-copy frame.
6. **8.6**: AV1 + H.264 added.
7. **8.7**: performance tuning; 30fps@1080p target.

Per-phase effort: each is roughly a week of focused work.
@@ -0,0 +1,80 @@
# daedalus-v4l2 — roadmap

## Sub-phases

### Phase 8.1 — kernel module skeleton

Out-of-tree kernel module that:
- Registers `/dev/videoNN` with `VFL_TYPE_VIDEO` + a no-op
  V4L2 stateless dispatch table.
- Accepts open/close, S_FMT, REQBUFS ioctls without doing
  anything (yet).
- Builds against `/lib/modules/$(uname -r)/build`.

Deliverable: `modprobe daedalus_v4l2` works, `v4l2-ctl --list-devices`
shows the new device.

### Phase 8.2 — kernel ↔ daemon chardev bridge

- Kernel module creates `/dev/daedalus-v4l2` chardev.
- Defines a simple req/resp protocol in `include/daedalus_v4l2_proto.h`.
- Daemon connects, exchanges echo requests.

Deliverable: ping-pong test passes.
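
The echo exchange can be prototyped before the kernel module exists. In this sketch a Unix `socketpair` stands in for the `/dev/daedalus-v4l2` chardev, which is an assumption made purely so the code runs in userspace today; the message layout, the type code, and all function names are hypothetical as well.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

#define DV4L2_MSG_ECHO 0x45u /* hypothetical type code */

struct dv4l2_echo {
    uint32_t type; /* DV4L2_MSG_ECHO */
    uint32_t seq;  /* sequence number echoed back unchanged */
};
static_assert(sizeof(struct dv4l2_echo) == 8, "unexpected padding");

/* Read exactly `len` bytes, retrying on EINTR and short reads. */
static int read_full(int fd, void *buf, size_t len)
{
    uint8_t *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1; /* error or peer closed */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Write exactly `len` bytes, retrying on EINTR and short writes. */
static int write_full(int fd, const void *buf, size_t len)
{
    const uint8_t *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Daemon side of one round: read a request, send it back. */
int dv4l2_echo_once(int fd)
{
    struct dv4l2_echo msg;
    if (read_full(fd, &msg, sizeof(msg)) < 0)
        return -1;
    if (msg.type != DV4L2_MSG_ECHO)
        return -1;
    return write_full(fd, &msg, sizeof(msg));
}

/* One full ping-pong over a socketpair. Returns 0 if the
 * reply matches the request, -1 otherwise. */
int dv4l2_pingpong_selftest(uint32_t seq)
{
    int sv[2];
    struct dv4l2_echo ping = { DV4L2_MSG_ECHO, seq };
    struct dv4l2_echo pong = { 0, 0 };
    int rc = -1;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;
    if (write_full(sv[0], &ping, sizeof(ping)) == 0 &&
        dv4l2_echo_once(sv[1]) == 0 &&
        read_full(sv[0], &pong, sizeof(pong)) == 0 &&
        pong.type == ping.type && pong.seq == ping.seq)
        rc = 0;
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Once the chardev exists, `dv4l2_echo_once` could be pointed at its file descriptor unchanged, with the kernel module playing the role the selftest plays here.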

### Phase 8.3 — daemon FFmpeg dlopen + parse

- Daemon links `libdaedalus_core.a` from sibling.
- Daemon dlopens FFmpeg.
- Test program: feed a VP9 IVF file to FFmpeg parsers,
  extract block-level metadata, validate against expected.

Deliverable: daemon can parse a VP9 frame and walk the
block-level info.

### Phase 8.4 — VP9 end-to-end via daedalus-fourier

- Wire daemon's per-block walker to `daedalus_dispatch_*` calls.
- Kernel module passes bitstream + controls to daemon over
  chardev.
- Daemon decodes, writes pixels to a shared buffer, returns
  result to kernel.
- Kernel returns via DQBUF.

Deliverable: `v4l2-ctl --stream-from=foo.ivf` produces
decoded frames (output via `--stream-to` raw frame dump).

### Phase 8.5 — dmabuf / DRM PRIME

- Kernel module allocates dma-coherent buffers.
- Export via VIDIOC_EXPBUF.
- Daemon writes via mmap into kernel-allocated dmabuf.
- Test: a buffer exported through VIDIOC_EXPBUF streams
  correctly when imported as a dmabuf (for example with
  `v4l2-ctl`'s dmabuf streaming options).

Deliverable: dmabuf-fd is exportable; first browser-friendly
frame.

### Phase 8.6 — AV1 + H.264

- Add codec support for AV1 (using CDEF QPU helper) and
  H.264 (using deblock QPU helper for the one cycle 8 path,
  everything else CPU).

Deliverable: real AV1/H.264 clips decode end-to-end.

### Phase 8.7 — performance + 30fps@1080p

- Profile end-to-end pipeline.
- Eliminate copies where possible.
- Hit 30fps@1080p for daily YouTube videos
  (the project's user-facing success criterion per
  `30fps-floor-is-fine` memory).

Deliverable: 30fps stable on real content.
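
To make the 30fps@1080p target concrete, the per-frame budget can be worked out with a little arithmetic. The helpers below are illustrative only; the names are hypothetical, and the NV12 output format is an assumption (the 64x64 superblock size is VP9's).

```c
#include <assert.h>
#include <stdint.h>

/* Time available per frame, in microseconds. */
uint32_t frame_budget_us(uint32_t fps)
{
    assert(fps > 0);
    return 1000000u / fps;
}

/* Number of 64x64 superblocks covering a frame, rounding
 * partial blocks at the right and bottom edges up. */
uint32_t superblocks_64(uint32_t width, uint32_t height)
{
    return ((width + 63) / 64) * ((height + 63) / 64);
}

/* Bytes per second moved by one pass over NV12 output
 * (12 bits per pixel), i.e. the cost of one extra copy. */
uint64_t nv12_bytes_per_second(uint32_t width, uint32_t height,
                               uint32_t fps)
{
    return (uint64_t)width * height * 3 / 2 * fps;
}
```

For 1920x1080 at 30 fps this gives about 33.3 ms per frame, 510 superblocks per frame (so roughly 65 microseconds per superblock if work were spread evenly), and about 93 MB/s of memory traffic for every additional full-frame copy in the pipeline.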

## Effort estimate

Each phase: ~1 week of focused work (~40 hours).
Total: 7 weeks for v1.

Could be split across multiple sessions / contributors.
@@ -0,0 +1,34 @@
# kernel/ — daedalus-v4l2 Linux kernel module

Out-of-tree kernel module providing a V4L2 stateless decoder
device that forwards work to a userspace daemon.

## Status

Scaffold only. Phase 8.1 not yet started.

## Build (when implemented)

```sh
make -C /lib/modules/$(uname -r)/build M=$(pwd)
sudo insmod daedalus_v4l2.ko
v4l2-ctl --list-devices   # confirm /dev/videoNN appears
```

## Layout (planned)

- `Makefile` — kbuild stub
- `daedalus_v4l2_main.c` — module init + V4L2 device registration
- `daedalus_v4l2_chardev.c` — `/dev/daedalus-v4l2` chardev for
  daemon communication
- `daedalus_v4l2_v4l2.c` — V4L2 ioctl dispatch (stateless controls)

## License

GPLv2. Required for kernel module symbol compatibility.

## Phase 8.1 starting point

Minimal example: register a /dev/videoNN that returns -ENOSYS on
every ioctl. Validates that the kernel build works and
v4l2-ctl can see the device.