marfrit c7d8050cc9 Initial scaffold: daedalus-v4l2 sibling repo
V4L2 stateless decoder for Pi 5, backed by sibling
daedalus-fourier kernel library (VP9 + AV1 CDEF + H.264 video
decode kernels on VideoCore VII compute + ARM NEON).

Architecture locked 2026-05-18 by mfritsche per
daedalus-fourier/docs/phase8_scoping.md:
- Option B: Linux kernel V4L2 shim + userspace daemon (not
  v4l2loopback). Real /dev/videoNN; proper DRM PRIME for
  browser zero-copy.
- Option γ: dlopen FFmpeg at runtime as parser. No vendoring;
  fastest to v1.
- Sibling repo (this repo): V4L2-side work outside of
  daedalus-fourier so kernel-library API stays clean.

Components:
  kernel/ - Linux out-of-tree kernel module (GPLv2; V4L2
    device + chardev bridge to userspace daemon)
  daemon/ - userspace decoder daemon (BSD-2-Clause; links
    libdaedalus_core.a from sibling; dlopens FFmpeg)
  docs/   - architecture + 7-phase roadmap (8.1..8.7)
  include/ - shared headers between kernel and daemon

Roadmap (7 sub-phases, ~1 week each):
  8.1 kernel skeleton (/dev/videoNN with no-op ioctls)
  8.2 chardev bridge (kernel ↔ daemon ping-pong)
  8.3 daemon FFmpeg dlopen + parse path
  8.4 VP9 end-to-end via daedalus_dispatch_*
  8.5 dmabuf / DRM PRIME for zero-copy
  8.6 AV1 + H.264 codec support
  8.7 performance: hit 30fps@1080p (project floor)

No code yet — only README + design docs + directory structure.
First implementation work starts in Phase 8.1 next session.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 14:54:56 +00:00

# daedalus-v4l2 — architecture
## Components
### 1. Kernel module (`kernel/`)
A Linux V4L2 stateless decoder driver. Registers as
`/dev/videoNN` with `VFL_TYPE_VIDEO` and supports VP9, AV1,
and H.264 stateless decoder controls (matching the existing
V4L2 stateless uAPI used by libva-v4l2-request-fourier).
Internally it does NOT decode the bitstream. Instead it:
1. Accepts the V4L2 ioctls (VIDIOC_S_FMT, VIDIOC_S_EXT_CTRLS with
the stateless codec controls attached to a media request,
VIDIOC_QBUF, etc.).
2. Marshals the bitstream and the per-frame/per-slice control
structs onto a chardev (or netlink) channel.
3. Pulls decoded frames back from the userspace daemon.
4. Returns them to the application via VIDIOC_DQBUF.

It is an out-of-tree kernel module, built with `make` against the
running kernel's headers and loaded via `insmod`.
### 2. Userspace daemon (`daemon/`)
A long-running daemon that:
1. Connects to the kernel module's chardev.
2. Pulls bitstream + control blobs.
3. Drives FFmpeg parsers via `dlopen` to get per-block
metadata (block positions, MVs, coefficients, tc0
arrays, etc.).
4. Calls `daedalus_dispatch_*` from `libdaedalus_core.a`
(sibling repo) to do the actual per-block work on
NEON / V3D.
5. Posts decoded frames back to the kernel module via
the chardev.

Threading model: a single-threaded event loop initially; per-stream
worker threads later if needed.
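The single-threaded loop can be sketched with plain `poll(2)` and `read(2)`. This is a minimal illustration rather than the daemon's actual design: `run_event_loop`, `handle_fn` and `echo_handler` are hypothetical names, and the sketch assumes the chardev returns exactly one whole message per `read()`, which a chardev can enforce but a byte-stream pipe (a convenient stand-in when testing) does not. `echo_handler` mirrors the Phase 8.2 dummy round-trip by sending each request straight back.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler type: consume one request, write any reply to
 * out_fd. Returns 0 on success, -1 on error. */
typedef int (*handle_fn)(const uint8_t *msg, size_t len, int out_fd);

/* Dummy round-trip handler: send the request bytes straight back. */
static int echo_handler(const uint8_t *msg, size_t len, int out_fd)
{
    ssize_t w = write(out_fd, msg, len);
    return (w >= 0 && (size_t)w == len) ? 0 : -1;
}

/* Block in poll() on the bridge fd and dispatch one message per wakeup.
 * Returns 0 when the peer closes (read() hits EOF), -1 on error. */
static int run_event_loop(int bridge_fd, int out_fd, handle_fn handle)
{
    uint8_t msg[4096]; /* assumed upper bound on one message */
    struct pollfd pfd;

    pfd.fd = bridge_fd;
    pfd.events = POLLIN;

    for (;;) {
        pfd.revents = 0;
        if (poll(&pfd, 1, -1) < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (pfd.revents & (POLLERR | POLLNVAL))
            return -1;
        if (!(pfd.revents & (POLLIN | POLLHUP)))
            continue;

        ssize_t n = read(bridge_fd, msg, sizeof msg);
        if (n == 0)
            return 0; /* peer closed */
        if (n < 0) {
            if (errno == EINTR || errno == EAGAIN)
                continue;
            return -1;
        }
        if (handle(msg, (size_t)n, out_fd) < 0)
            return -1;
    }
}
```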
### 3. Bitstream parser layer (Option γ — runtime dlopen)
Instead of vendoring FFmpeg's parsers, the daemon loads the
system FFmpeg at runtime. Two integration patterns:
- **a. AVCodec/AVPacket through libavcodec**: feed packets to
`avcodec_send_packet`, intercept the parse-only stage and
pull out block metadata before the actual decode runs.
- **b. Custom parser via libavcodec internal APIs**: messier
but avoids running FFmpeg's full decode path.

Plan: try (a) first. If FFmpeg's public API doesn't expose
the per-block info we need, fall back to (b) or vendor a
minimal parser per codec.
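As a sketch of the runtime-dlopen approach, the daemon can probe the installed libavcodec by soname and resolve entry points with `dlsym()`. Only `avcodec_version()` is resolved here; it is a real libavcodec function returning `LIBAVCODEC_VERSION_INT`, packed as `(major << 16) | (minor << 8) | micro`. The `ffmpeg_syms` table, the helper names, and the list of sonames tried are assumptions. One practical consequence for pattern (b): `dlsym()` can only find symbols the shared library exports, and FFmpeg's shared libraries generally export their public `av*`-prefixed API rather than internal `ff_*` functions.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <string.h>

/* Real libavcodec entry point: unsigned avcodec_version(void). */
typedef unsigned (*avcodec_version_fn)(void);

/* Hypothetical symbol table; a real daemon would add avcodec_send_packet etc. */
struct ffmpeg_syms {
    void *handle;
    avcodec_version_fn avcodec_version;
};

/* Soname majors: 58 = FFmpeg 4.x, 59 = 5.x, 60 = 6.x, 61 = 7.x;
 * 62 is tried for newer releases. Newest first. */
static const char *const k_avcodec_sonames[] = {
    "libavcodec.so.62", "libavcodec.so.61", "libavcodec.so.60",
    "libavcodec.so.59", "libavcodec.so.58",
};

/* Major component of an AV_VERSION_INT-packed version. */
static unsigned av_major(unsigned v) { return v >> 16; }

/* Try each soname and resolve symbols; returns 0 and fills *s on success. */
static int ffmpeg_load(struct ffmpeg_syms *s)
{
    memset(s, 0, sizeof *s);
    for (size_t i = 0; i < sizeof k_avcodec_sonames / sizeof k_avcodec_sonames[0]; i++) {
        void *h = dlopen(k_avcodec_sonames[i], RTLD_NOW | RTLD_LOCAL);
        if (h == NULL)
            continue;
        void *sym = dlsym(h, "avcodec_version");
        if (sym == NULL) {
            dlclose(h);
            continue;
        }
        /* memcpy sidesteps the ISO C object-to-function-pointer cast warning. */
        assert(sizeof sym == sizeof s->avcodec_version);
        memcpy(&s->avcodec_version, &sym, sizeof sym);
        s->handle = h;
        return 0;
    }
    return -1;
}

static void ffmpeg_unload(struct ffmpeg_syms *s)
{
    if (s->handle != NULL)
        dlclose(s->handle);
    memset(s, 0, sizeof *s);
}
```

On glibc older than 2.34, link with `-ldl`; newer glibc provides `dlopen` in libc itself.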
## Communication: kernel ↔ daemon
Initial plan: single chardev `/dev/daedalus-v4l2` with a
simple request/response protocol:
```
REQ_DECODE { stream_id, frame_idx, codec, controls[], bitstream_blob }
RESP_FRAME { stream_id, frame_idx, dma_buf_fd, w, h, format }
```
Alternative: netlink socket (more standard for kernel-userspace
IPC, but more boilerplate). Chardev is simpler for v1.
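One possible encoding of the REQ_DECODE / RESP_FRAME messages is a fixed header, then a fixed-size body, then the variable-length payloads. Everything in this sketch is an assumption for illustration: the `dv_*` names, the magic value, the 32-bit field widths, the codec enum numbering, and placing controls before the bitstream. Host byte order is used because both ends run on the same machine. Note that `dma_buf_fd` is only meaningful in the sending process, so the kernel would have to resolve it (for example with `dma_buf_get()`) while still inside the daemon's `write()` call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum dv_msg_type { DV_REQ_DECODE = 1, DV_RESP_FRAME = 2 };
enum dv_codec { DV_CODEC_VP9 = 1, DV_CODEC_AV1 = 2, DV_CODEC_H264 = 3 };

#define DV_MAGIC 0x44563432u /* arbitrary tag */

struct dv_hdr {
    uint32_t magic;
    uint32_t type;        /* enum dv_msg_type */
    uint32_t payload_len; /* bytes following this header */
};

struct dv_req_decode {
    uint32_t stream_id;
    uint32_t frame_idx;
    uint32_t codec;         /* enum dv_codec */
    uint32_t controls_len;  /* serialized stateless control structs */
    uint32_t bitstream_len; /* compressed frame bytes, after the controls */
};

struct dv_resp_frame {
    uint32_t stream_id;
    uint32_t frame_idx;
    int32_t dma_buf_fd; /* valid only in the sender's fd table */
    uint32_t width, height;
    uint32_t fourcc;    /* pixel format */
};

/* All-u32 layouts: no padding, so sizeof matches the wire size. */
static_assert(sizeof(struct dv_hdr) == 12, "dv_hdr padded");
static_assert(sizeof(struct dv_req_decode) == 20, "dv_req_decode padded");
static_assert(sizeof(struct dv_resp_frame) == 24, "dv_resp_frame padded");

/* Serialize header | body | controls | bitstream. Returns bytes written,
 * or 0 if `cap` is too small. */
static size_t dv_pack_req(uint8_t *out, size_t cap, const struct dv_req_decode *r,
                          const void *controls, const void *bitstream)
{
    size_t body = sizeof *r + r->controls_len + r->bitstream_len;
    size_t total = sizeof(struct dv_hdr) + body;
    if (total > cap)
        return 0;
    struct dv_hdr h = { DV_MAGIC, DV_REQ_DECODE, (uint32_t)body };
    uint8_t *p = out;
    memcpy(p, &h, sizeof h);                p += sizeof h;
    memcpy(p, r, sizeof *r);                p += sizeof *r;
    memcpy(p, controls, r->controls_len);   p += r->controls_len;
    memcpy(p, bitstream, r->bitstream_len);
    return total;
}

/* Validate a REQ_DECODE; on success point *controls / *bitstream into buf. */
static int dv_unpack_req(const uint8_t *buf, size_t len, struct dv_req_decode *r,
                         const uint8_t **controls, const uint8_t **bitstream)
{
    struct dv_hdr h;
    if (len < sizeof h)
        return -1;
    memcpy(&h, buf, sizeof h);
    if (h.magic != DV_MAGIC || h.type != DV_REQ_DECODE)
        return -1;
    if (h.payload_len != len - sizeof h || h.payload_len < sizeof *r)
        return -1;
    memcpy(r, buf + sizeof h, sizeof *r);
    if ((uint64_t)r->controls_len + r->bitstream_len != h.payload_len - sizeof *r)
        return -1;
    *controls = buf + sizeof h + sizeof *r;
    *bitstream = *controls + r->controls_len;
    return 0;
}
```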
## Memory: DRM PRIME / dmabuf
For browser zero-copy, the kernel module needs to register the
decoded frame buffers as dmabuf handles that V4L2 hands out via
PRIME export. The daedalus-fourier kernel library writes pixels
to CPU-mapped memory; the kernel module manages the
DMA-coherent allocation and PRIME export.
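On the client side, the V4L2 half of that hand-off is `VIDIOC_EXPBUF`, which turns a CAPTURE buffer into a dmabuf fd that a browser or compositor can then import into EGL (`EGL_EXT_image_dma_buf_import`) or DRM (PRIME). A minimal sketch, assuming a multi-planar CAPTURE queue with MMAP buffers already allocated; `export_capture_dmabuf` is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>

/* Export plane `plane` of CAPTURE buffer `index` as a dmabuf fd.
 * Returns the new fd, or -1 with errno set. */
static int export_capture_dmabuf(int video_fd, unsigned index, unsigned plane)
{
    struct v4l2_exportbuffer eb;

    memset(&eb, 0, sizeof eb);
    eb.type = V4L2_BUF_TYPE_VIDEO_CAPTURE_MPLANE;
    eb.index = index;
    eb.plane = plane;
    eb.flags = O_RDONLY | O_CLOEXEC; /* the consumer only reads decoded pixels */
    if (ioctl(video_fd, VIDIOC_EXPBUF, &eb) < 0)
        return -1;
    return eb.fd;
}
```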
Two strategies:
- **Strategy A**: the kernel module allocates the dmabuf and lets
the daemon mmap it through the chardev; the daemon writes pixels there.
- **Strategy B**: the daemon allocates via libdrm and passes the
dmabuf fd to the kernel via the chardev; the kernel exposes it for PRIME import.

Strategy A is simpler; B is more flexible. Start with A.
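Seen from the daemon, Strategy A amounts to mapping the frame buffer the kernel exposes through the chardev and writing pixels into it. A minimal sketch, assuming NV12 output with stride equal to width and even dimensions; `nv12_size`, `write_frame_strategy_a` and `make_standin_buffer` are hypothetical, and the constant fill values stand in for what `daedalus_dispatch_*` would produce. A real dma-buf mapping should bracket CPU writes with `DMA_BUF_IOCTL_SYNC`; that is left out here because the sketch is exercised against an ordinary temporary file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* NV12 (assumed format): a w*h luma plane followed by an interleaved CbCr
 * plane at half resolution in both axes (w*h/2 bytes). Stride == width. */
static size_t nv12_size(uint32_t w, uint32_t h)
{
    assert(w % 2 == 0 && h % 2 == 0);
    return (size_t)w * h * 3 / 2;
}

/* Map the buffer the kernel exposes at (fd, offset), write one frame,
 * unmap. offset must be page-aligned. Returns 0 on success, -1 on error. */
static int write_frame_strategy_a(int fd, off_t offset, uint32_t w, uint32_t h,
                                  uint8_t fill_y, uint8_t fill_uv)
{
    size_t len = nv12_size(w, h);
    size_t luma = (size_t)w * h;
    uint8_t *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, offset);
    if (p == MAP_FAILED)
        return -1;
    memset(p, fill_y, luma);               /* Y plane */
    memset(p + luma, fill_uv, len - luma); /* interleaved CbCr plane */
    return munmap(p, len);
}

/* Stand-in for the kernel allocation when testing without the module:
 * an unlinked temporary file of len bytes. */
static int make_standin_buffer(size_t len)
{
    char path[] = "/tmp/daedalus-standin-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path); /* keep the fd, drop the name */
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

For the 1080p target this is 1920 × 1080 × 3 / 2 = 3,110,400 bytes per frame under the NV12 assumption.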
## Build & deploy
- Kernel module: `cd kernel && make` (out-of-tree against running
kernel headers; `/lib/modules/$(uname -r)/build` path).
- Daemon: CMake; depends on the installed `libdaedalus_core.a` from
the sibling repo. Runs as a systemd service or can be launched
directly by the user.
## What's NOT in this repo
- The cycles 1-9 video kernels — those live in sibling
daedalus-fourier and are consumed via `include/daedalus.h`.
- The browser side (firefox-fourier / chromium-fourier) — those
are their own sibling projects.
- libva-v4l2-request-fourier — sibling, talks to our
`/dev/videoNN` via V4L2 ioctls.
## Sub-phases (roadmap excerpt; see docs/roadmap.md for the full plan)
1. **8.1**: kernel module skeleton — register /dev/videoNN with
stub ioctls, no decoding.
2. **8.2**: chardev bridge — kernel ↔ daemon round-trip with
dummy bitstream/dummy frame data.
3. **8.3**: daemon FFmpeg dlopen + parse path — pull per-frame
info from FFmpeg without decoding.
4. **8.4**: dispatch one codec end-to-end via daedalus-fourier
(VP9 first since it has the most QPU-deployed kernels).
5. **8.5**: dmabuf integration — first browser zero-copy frame.
6. **8.6**: AV1 + H.264 added.
7. **8.7**: performance tuning; 30fps@1080p target.

Per-phase effort: each is roughly a week of focused work.