From c7d8050cc97c4b3c0dce7702481ddf3ab7259a0a Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Mon, 18 May 2026 14:54:56 +0000
Subject: [PATCH] Initial scaffold: daedalus-v4l2 sibling repo
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

V4L2 stateless decoder for Pi 5, backed by sibling daedalus-fourier
kernel library (VP9 + AV1 CDEF + H.264 video decode kernels on
VideoCore VII compute + ARM NEON).

Architecture locked 2026-05-18 by mfritsche per
daedalus-fourier/docs/phase8_scoping.md:

- Option B: Linux kernel V4L2 shim + userspace daemon (not
  v4l2loopback). Real /dev/videoNN; proper DRM PRIME for browser
  zero-copy.
- Option γ: dlopen FFmpeg at runtime as parser. No vendoring;
  fastest to v1.
- Sibling repo (this repo): V4L2-side work outside of
  daedalus-fourier so kernel-library API stays clean.

Components:
  kernel/  - Linux out-of-tree kernel module (GPLv2; V4L2 device +
             chardev bridge to userspace daemon)
  daemon/  - userspace decoder daemon (BSD-2-Clause; links
             libdaedalus_core.a from sibling; dlopens FFmpeg)
  docs/    - architecture + 7-phase roadmap (8.1..8.7)
  include/ - shared headers between kernel and daemon

Roadmap (7 sub-phases, ~1 week each):
  8.1 kernel skeleton (/dev/videoNN with no-op ioctls)
  8.2 chardev bridge (kernel ↔ daemon ping-pong)
  8.3 daemon FFmpeg dlopen + parse path
  8.4 VP9 end-to-end via daedalus_dispatch_*
  8.5 dmabuf / DRM PRIME for zero-copy
  8.6 AV1 + H.264 codec support
  8.7 performance: hit 30fps@1080p (project floor)

No code yet — only README + design docs + directory structure.
First implementation work starts in Phase 8.1 next session.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 .gitignore           |  11 +++++
 README.md            |  83 +++++++++++++++++++++++++++++++
 daemon/README.md     |  53 ++++++++++++++++++++
 docs/architecture.md | 115 +++++++++++++++++++++++++++++++++++++++++++
 docs/roadmap.md      |  80 ++++++++++++++++++++++++++++++
 kernel/README.md     |  34 +++++++++++++
 6 files changed, 376 insertions(+)
 create mode 100644 .gitignore
 create mode 100644 README.md
 create mode 100644 daemon/README.md
 create mode 100644 docs/architecture.md
 create mode 100644 docs/roadmap.md
 create mode 100644 kernel/README.md

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..f28a329
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,11 @@
+build/
+*.o
+*.ko
+*.mod
+*.mod.c
+*.cmd
+*.symvers
+*.order
+.tmp_versions/
+Module.symvers
+modules.order
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..664f7d4
--- /dev/null
+++ b/README.md
@@ -0,0 +1,83 @@
+# daedalus-v4l2
+
+V4L2 stateless decoder for the Raspberry Pi 5 / CM5, backed by the
+`daedalus-fourier` kernel library (VP9 + AV1 CDEF + H.264 video
+decode kernels on VideoCore VII compute + ARM NEON).
+
+**Status:** scaffold (2026-05-18). Architecture locked per
+[daedalus-fourier session memory](https://git.reauktion.de/marfrit/daedalus-fourier);
+implementation not yet begun.
+
+## What this is
+
+A two-piece userspace + kernel-module stack that exposes a V4L2
+stateless decoder interface (`/dev/videoNN`) so that
+`libva-v4l2-request-fourier` → `firefox-fourier` /
+`chromium-fourier` can drive it the same way they drive existing
+hardware-decode pipelines on Pi 5 / RK3588.
+
+```
++-----------------------------------------------------------+
+| firefox-fourier / chromium-fourier (existing)             |
++-----------------------------------------------------------+
+| VA-API                                                    |
++-----------------------------------------------------------+
+| libva-v4l2-request-fourier (existing, sibling project)    |
++-----------------------------------------------------------+
+| V4L2 stateless ioctl uAPI                                 |
++-----------------------------------------------------------+
+| daedalus-v4l2 kernel module (`kernel/`)                   |
+|   - registers /dev/videoNN                                |
+|   - parses V4L2 stateless ioctls (VP9/AV1/H.264 controls) |
+|   - forwards bitstream + controls to userspace daemon     |
+|     via chardev or netlink                                |
++-----------------------------------------------------------+
+| daedalus-v4l2 userspace daemon (`daemon/`)                |
+|   - takes bitstream blobs + per-slice controls            |
+|   - drives FFmpeg parsers via dlopen (Option γ)           |
+|   - dispatches per-block ops via daedalus-fourier         |
+|     public API (daedalus_dispatch_*)                      |
+|   - posts decoded frames back to kernel module            |
++-----------------------------------------------------------+
+| daedalus-fourier kernel library (sibling project)         |
+|   - exports include/daedalus.h public API                 |
+|   - per-kernel CPU NEON + opportunistic V3D QPU dispatch  |
+|   - 9 closed cycles across VP9, AV1 CDEF, H.264           |
++-----------------------------------------------------------+
+| V3D 7.1 (Mesa userspace v3dv) + ARM NEON (BCM2712)        |
++-----------------------------------------------------------+
+```
+
+## Why this architecture (Option B + γ + sibling)
+
+Locked by user 2026-05-18 from 3 options in
+`daedalus-fourier/docs/phase8_scoping.md`:
+
+- **Option B** over A (userspace v4l2loopback): real `/dev/videoNN`,
+  proper DRM PRIME / dmabuf for browser zero-copy.
+- **Option γ**: dlopen FFmpeg as parser at runtime. No vendoring,
+  fastest to v1.
+- **Sibling repo**: per `project_consumer_target` convention,
+  V4L2-side work lives outside daedalus-fourier so the
+  kernel-library has a clean API boundary.
+
+## Status
+
+Initial scaffold only. See `docs/architecture.md` for the
+deeper design and `docs/roadmap.md` for the
+sub-phase breakdown.
+
+## Repo layout
+
+- `kernel/` — Linux kernel module (V4L2 device registration +
+  ioctl handling + userspace chardev bridge). Out-of-tree.
+- `daemon/` — userspace decoder daemon (links
+  `libdaedalus_core.a` from sibling daedalus-fourier; uses
+  dlopen for FFmpeg parser).
+- `include/` — shared headers between kernel and daemon.
+- `docs/` — architecture + roadmap.
+
+## License
+
+Kernel module: GPLv2 (required for kernel-tree compatibility).
+Userspace daemon: BSD-2-Clause (matches daedalus-fourier).
diff --git a/daemon/README.md b/daemon/README.md
new file mode 100644
index 0000000..158cc08
--- /dev/null
+++ b/daemon/README.md
@@ -0,0 +1,53 @@
+# daemon/ — daedalus-v4l2 userspace decoder daemon
+
+Userspace daemon that:
+1. Connects to the kernel module's chardev
+2. Receives bitstream + V4L2 control blobs
+3. Parses bitstream via dlopen'd FFmpeg
+4. Dispatches per-block work via `daedalus_dispatch_*`
+   from sibling `daedalus-fourier`
+5. Returns decoded frames to kernel
+
+## Status
+
+Scaffold only. Phase 8.3 not yet started.
+
+## Build dependencies (planned)
+
+- libdaedalus_core.a from sibling daedalus-fourier (static link)
+- FFmpeg dev headers (for AVPacket/AVCodec interface types) +
+  runtime FFmpeg .so (loaded via dlopen)
+- libv4l2 (for V4L2 control struct definitions)
+- pthread
+
+## Build (when implemented)
+
+```sh
+mkdir build && cd build
+cmake .. -DDAEDALUS_FOURIER_DIR=/path/to/daedalus-fourier
+make
+```
+
+## Layout (planned)
+
+- `CMakeLists.txt`
+- `src/main.c` — event loop, chardev connection
+- `src/parser.c` — FFmpeg dlopen wrapper + per-codec dispatch
+- `src/decode_vp9.c`, `src/decode_av1.c`, `src/decode_h264.c` —
+  per-codec block walkers
+- `src/frame_io.c` — frame allocation, return to kernel
+
+## License
+
+BSD-2-Clause (matches daedalus-fourier sibling).
+
+## Phase 8.3 starting point
+
+A standalone program that:
+1. Opens a .ivf or .mp4
+2. Pulls codec packets via dlopen'd avformat
+3. Calls dlopen'd avcodec to parse (without decoding)
+4. Walks the block-level metadata
+5. Validates output structure
+
+No kernel involvement yet — just confirm the parse path works.
diff --git a/docs/architecture.md b/docs/architecture.md
new file mode 100644
index 0000000..283ef31
--- /dev/null
+++ b/docs/architecture.md
@@ -0,0 +1,115 @@
+# daedalus-v4l2 — architecture
+
+## Components
+
+### 1. Kernel module (`kernel/`)
+
+A Linux V4L2 stateless decoder driver. Registers as
+`/dev/videoNN` with `VFL_TYPE_VIDEO` and supports VP9, AV1,
+and H.264 stateless decoder controls (matching the existing
+V4L2 stateless uAPI used by libva-v4l2-request-fourier).
+
+Internally it does NOT decode bitstream. It:
+1. Accepts V4L2 ioctls (VIDIOC_S_FMT, VIDIOC_S_CTRL with
+   STATELESS controls, VIDIOC_QBUF, etc.)
+2. Marshals the bitstream and per-frame/per-slice control
+   structs onto a chardev (or netlink) channel.
+3. Pulls decoded frames back from userspace daemon.
+4. Returns them via VIDIOC_DQBUF.
+
+Out-of-tree kernel module (built with `make` against the
+running kernel's headers; loaded via `insmod`).
+
+### 2. Userspace daemon (`daemon/`)
+
+A long-running daemon that:
+1. Connects to the kernel module's chardev.
+2. Pulls bitstream + control blobs.
+3. Drives FFmpeg parsers via `dlopen` to get per-block
+   metadata (block positions, MVs, coefficients, tc0
+   arrays, etc.).
+4. Calls `daedalus_dispatch_*` from `libdaedalus_core.a`
+   (sibling repo) to do the actual per-block work on
+   NEON / V3D.
+5. Posts decoded frames back to the kernel module via
+   the chardev.
+
+Architecture: single-threaded event loop initially; per-stream
+worker threads later if needed.
+
+### 3. Bitstream parser layer (Option γ — runtime dlopen)
+
+Instead of vendoring FFmpeg's parsers, the daemon loads the
+system FFmpeg at runtime. Two integration patterns:
+
+- **a. AVCodec/AVPacket through libavcodec**: feed packets to
+  `avcodec_send_packet`, intercept the parse-only stage and
+  pull out block metadata before the actual decode runs.
+- **b. Custom parser via libavcodec internal APIs**: messier
+  but avoids running FFmpeg's full decode path.
+
+Plan: try (a) first. If FFmpeg's internal API doesn't expose
+the per-block info we need, fall back to (b) or vendor a
+minimal parser per codec.
+
+## Communication: kernel ↔ daemon
+
+Initial plan: single chardev `/dev/daedalus-v4l2` with a
+simple request/response protocol:
+
+```
+REQ_DECODE { stream_id, frame_idx, codec, controls[], bitstream_blob }
+RESP_FRAME { stream_id, frame_idx, dma_buf_fd, w, h, format }
+```
+
+Alternative: netlink socket (more standard for kernel-userspace
+IPC, but more boilerplate). Chardev is simpler for v1.
+
+## Memory: DRM PRIME / dmabuf
+
+For browser zero-copy, the kernel module needs to register the
+decoded frame buffers as dmabuf handles that V4L2 hands out via
+PRIME export. The daedalus-fourier kernel library writes pixels
+to CPU-mapped memory; the kernel module manages the
+DMA-coherent allocation and PRIME export.
+
+Two strategies:
+- **Strategy A**: kernel module allocates dmabuf, mmaps it into
+  daemon via the chardev, daemon writes pixels there.
+- **Strategy B**: daemon allocates via libdrm, transfers
+  dmabuf-fd to kernel via chardev, kernel exposes via PRIME.
+
+Strategy A is simpler; B is more flexible. Start with A.
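To make Strategy A a little more concrete, the daemon side could look roughly like the C sketch below: size a frame, map the buffer behind a file descriptor, and write pixels into it. This is only an illustration under assumptions the design does not settle: NV12 as the output format, and `mmap` at offset 0 on the chardev fd as the way the kernel-allocated buffer reaches the daemon. The helper names (`nv12_frame_size`, `map_frame_buffer`, `fill_gray_frame`) are hypothetical. A real dmabuf mapping would likely also need cache synchronisation around CPU writes, which is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

/* Bytes needed for one NV12 frame: a full-size Y plane followed by a
 * half-height interleaved CbCr plane. NV12 is an assumption here. */
size_t nv12_frame_size(uint32_t width, uint32_t height)
{
    assert(width % 2 == 0 && height % 2 == 0);
    size_t luma = (size_t)width * height;
    return luma + luma / 2;
}

/* Map `len` bytes of the buffer behind `fd` for shared writing.
 * Under Strategy A `fd` would be the chardev (hypothetical mmap
 * support); any mmap-able fd works for experiments.
 * Returns NULL on failure. */
uint8_t *map_frame_buffer(int fd, size_t len)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : (uint8_t *)p;
}

/* Stand-in for "daemon writes pixels there": a mid-gray NV12 frame. */
void fill_gray_frame(uint8_t *buf, uint32_t width, uint32_t height)
{
    size_t luma = (size_t)width * height;
    memset(buf, 0x80, luma);            /* Y plane, mid-gray */
    memset(buf + luma, 0x80, luma / 2); /* CbCr plane, neutral chroma */
}
```

In the daemon this would be called once per negotiated format: compute the size, map the buffer, then let the per-block walkers write into the mapping instead of `fill_gray_frame`.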
+
+## Build & deploy
+
+- Kernel module: `cd kernel && make` (out-of-tree against running
+  kernel headers; `/lib/modules/$(uname -r)/build` path).
+- Daemon: CMake, depends on installed `libdaedalus_core.a` from
+  sibling repo. Run as systemd service or under direct user
+  invocation.
+
+## What's NOT in this repo
+
+- The cycles 1-9 video kernels — those live in sibling
+  daedalus-fourier and are consumed via `include/daedalus.h`.
+- The browser side (firefox-fourier / chromium-fourier) — those
+  are their own sibling projects.
+- libva-v4l2-request-fourier — sibling, talks to our
+  `/dev/videoNN` via V4L2 ioctls.
+
+## Sub-phases (roadmap excerpt; see docs/roadmap.md for the full plan)
+
+1. **8.1**: kernel module skeleton — register /dev/videoNN with
+   stub ioctls, no decoding.
+2. **8.2**: chardev bridge — kernel ↔ daemon round-trip with
+   dummy bitstream/dummy frame data.
+3. **8.3**: daemon FFmpeg dlopen + parse path — pull per-frame
+   info from FFmpeg without decoding.
+4. **8.4**: dispatch one codec end-to-end via daedalus-fourier
+   (VP9 first since it has the most QPU-deployed kernels).
+5. **8.5**: dmabuf integration — first browser zero-copy frame.
+6. **8.6**: AV1 + H.264 added.
+7. **8.7**: performance tuning; 30fps@1080p target.
+
+Per-phase effort: each is roughly a week of focused work.
diff --git a/docs/roadmap.md b/docs/roadmap.md
new file mode 100644
index 0000000..24ccd2e
--- /dev/null
+++ b/docs/roadmap.md
@@ -0,0 +1,80 @@
+# daedalus-v4l2 — roadmap
+
+## Sub-phases
+
+### Phase 8.1 — kernel module skeleton
+
+Out-of-tree kernel module that:
+- Registers `/dev/videoNN` with `VFL_TYPE_VIDEO` + a no-op
+  V4L2 stateless dispatch table.
+- Accepts open/close, S_FMT, REQBUFS ioctls without doing
+  anything (yet).
+- Builds against `/lib/modules/$(uname -r)/build`.
+
+Deliverable: `modprobe daedalus_v4l2` works, `v4l2-ctl --list-devices`
+shows the new device.
+
+### Phase 8.2 — kernel ↔ daemon chardev bridge
+
+- Kernel module creates `/dev/daedalus-v4l2` chardev.
+- Defines a simple req/resp protocol in `include/daedalus_v4l2_proto.h`.
+- Daemon connects, exchanges echo requests.
+
+Deliverable: ping-pong test passes.
+
+### Phase 8.3 — daemon FFmpeg dlopen + parse
+
+- Daemon links `libdaedalus_core.a` from sibling.
+- Daemon dlopens FFmpeg.
+- Test program: feed a VP9 IVF file to FFmpeg parsers,
+  extract block-level metadata, validate against expected.
+
+Deliverable: daemon can parse a VP9 frame and walk the
+block-level info.
+
+### Phase 8.4 — VP9 end-to-end via daedalus-fourier
+
+- Wire daemon's per-block walker to `daedalus_dispatch_*` calls.
+- Kernel module passes bitstream + controls to daemon over
+  chardev.
+- Daemon decodes, writes pixels to a shared buffer, returns
+  result to kernel.
+- Kernel returns via DQBUF.
+
+Deliverable: `v4l2-ctl --stream-from=foo.ivf` produces
+decoded frames (output via `--stream-to` PNG dump).
+
+### Phase 8.5 — dmabuf / DRM PRIME
+
+- Kernel module allocates dma-coherent buffers.
+- Export via VIDIOC_EXPBUF.
+- Daemon writes via mmap into kernel-allocated dmabuf.
+- Test: `v4l2-ctl --capture-mmap-dmabuf` works.
+
+Deliverable: dmabuf-fd is exportable; first browser-friendly
+frame.
+
+### Phase 8.6 — AV1 + H.264
+
+- Add codec support for AV1 (using CDEF QPU helper) and
+  H.264 (using deblock QPU helper for the one cycle 8 path,
+  everything else CPU).
+
+Deliverable: real AV1/H.264 clips decode end-to-end.
+
+### Phase 8.7 — performance + 30fps@1080p
+
+- Profile end-to-end pipeline.
+- Eliminate copies where possible.
+- Hit 30fps@1080p for daily YouTube videos
+  (the project's user-facing success criterion per
+  `30fps-floor-is-fine` memory).
+
+Deliverable: 30fps stable on real content.
+
+## Effort estimate
+
+Each phase: ~1 week of focused work (~40 hours).
+Total: 7 weeks for v1.
+
+Could be split across multiple sessions / contributors.
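The Phase 8.2 ping-pong exchange listed earlier in this roadmap could start from a header along the following lines. This is a sketch only: the roadmap names the file `include/daedalus_v4l2_proto.h` but does not define its contents, so the magic value, message types, field layout, and function names below are all hypothetical. It uses `<stdint.h>` types so it can be exercised in userspace; a header shared with the kernel would more likely use the `__u32` family.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire header; every name and value here is illustrative. */
#define DAEDALUS_PROTO_MAGIC 0x44563432u /* ASCII "DV42", arbitrary */

enum daedalus_msg_type {
    DAEDALUS_MSG_PING = 1,
    DAEDALUS_MSG_PONG = 2,
};

struct daedalus_msg_hdr {
    uint32_t magic;       /* DAEDALUS_PROTO_MAGIC */
    uint32_t type;        /* enum daedalus_msg_type */
    uint32_t seq;         /* echoed back unchanged in the reply */
    uint32_t payload_len; /* bytes following the header */
};

/* Fixed-width fields only, so both sides agree on the layout. */
static_assert(sizeof(struct daedalus_msg_hdr) == 16, "unexpected padding");

/* Requester side: build a ping with the given sequence number. */
void daedalus_make_ping(struct daedalus_msg_hdr *h, uint32_t seq)
{
    memset(h, 0, sizeof(*h));
    h->magic = DAEDALUS_PROTO_MAGIC;
    h->type = DAEDALUS_MSG_PING;
    h->seq = seq;
}

/* Responder side: answer a ping. Returns 0 on success, or -1 if the
 * request is not a well-formed ping (resp is left untouched). */
int daedalus_make_pong(const struct daedalus_msg_hdr *req,
                       struct daedalus_msg_hdr *resp)
{
    if (req->magic != DAEDALUS_PROTO_MAGIC ||
        req->type != DAEDALUS_MSG_PING || req->payload_len != 0)
        return -1;
    *resp = *req;
    resp->type = DAEDALUS_MSG_PONG;
    return 0;
}

/* Requester side: does resp answer req? */
int daedalus_pong_matches(const struct daedalus_msg_hdr *req,
                          const struct daedalus_msg_hdr *resp)
{
    return resp->magic == DAEDALUS_PROTO_MAGIC &&
           resp->type == DAEDALUS_MSG_PONG && resp->seq == req->seq;
}
```

Keeping a sequence number in the header from the start would let the later decode messages reuse the same matching logic when several requests are in flight.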
diff --git a/kernel/README.md b/kernel/README.md
new file mode 100644
index 0000000..64d0fda
--- /dev/null
+++ b/kernel/README.md
@@ -0,0 +1,34 @@
+# kernel/ — daedalus-v4l2 Linux kernel module
+
+Out-of-tree kernel module providing a V4L2 stateless decoder
+device that forwards work to a userspace daemon.
+
+## Status
+
+Scaffold only. Phase 8.1 not yet started.
+
+## Build (when implemented)
+
+```sh
+make -C /lib/modules/$(uname -r)/build M=$(pwd)
+sudo insmod daedalus_v4l2.ko
+v4l2-ctl --list-devices # confirm /dev/videoNN appears
+```
+
+## Layout (planned)
+
+- `Makefile` — kbuild stub
+- `daedalus_v4l2_main.c` — module init + V4L2 device registration
+- `daedalus_v4l2_chardev.c` — `/dev/daedalus-v4l2` chardev for
+  daemon communication
+- `daedalus_v4l2_v4l2.c` — V4L2 ioctl dispatch (stateless controls)
+
+## License
+
+GPLv2. Required for kernel module symbol compatibility.
+
+## Phase 8.1 starting point
+
+Minimal example: register a /dev/videoNN that returns -ENOSYS on
+every ioctl. Validates that the kernel build works and
+v4l2-ctl can see the device.