# daedalus-v4l2: architecture

## Components

### 1. Kernel module (`kernel/`)

A Linux V4L2 stateless decoder driver. Registers as `/dev/videoNN` with `VFL_TYPE_VIDEO` and supports VP9, AV1, and H.264 stateless decoder controls (matching the existing V4L2 stateless uAPI used by libva-v4l2-request-fourier).

Internally it does NOT decode bitstream. It:

1. Accepts V4L2 ioctls (VIDIOC_S_FMT, VIDIOC_S_EXT_CTRLS with stateless controls, VIDIOC_QBUF, etc.).
2. Marshals the bitstream and per-frame/per-slice control structs onto a chardev (or netlink) channel.
3. Pulls decoded frames back from the userspace daemon.
4. Returns them via VIDIOC_DQBUF.

Out-of-tree kernel module (built with `make` against the running kernel's headers; loaded via `insmod`).

### 2. Userspace daemon (`daemon/`)

A long-running daemon that:

1. Connects to the kernel module's chardev.
2. Pulls bitstream + control blobs.
3. Drives FFmpeg parsers via `dlopen` to get per-block metadata (block positions, MVs, coefficients, tc0 arrays, etc.).
4. Calls `daedalus_dispatch_*` from `libdaedalus_core.a` (sibling repo) to do the actual per-block work on NEON / V3D.
5. Posts decoded frames back to the kernel module via the chardev.

Architecture: single-threaded event loop initially; per-stream worker threads later if needed.

### 3. Bitstream parser layer (Option γ: runtime dlopen)

Instead of vendoring FFmpeg's parsers, the daemon loads the system FFmpeg at runtime. Two integration patterns:

- **a. AVCodec/AVPacket through libavcodec**: feed packets to `avcodec_send_packet`, intercept the parse-only stage and pull out block metadata before the actual decode runs.
- **b. Custom parser via libavcodec internal APIs**: messier but avoids running FFmpeg's full decode path.

Plan: try (a) first. If FFmpeg's internal API doesn't expose the per-block info we need, fall back to (b) or vendor a minimal parser per codec.
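The runtime-dlopen approach described above can be sketched roughly as follows. This is a minimal illustration, not the daemon's actual loader: the helper names `open_first` and `probe_libavcodec_major` are hypothetical, and the list of candidate sonames is an assumption that depends on which FFmpeg release the distro ships. The only FFmpeg symbol it resolves is `avcodec_version()`, used as a cheap check that the library loaded.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Signature of avcodec_version(), a public libavcodec function that returns
 * the library version packed as (major << 16 | minor << 8 | micro). */
typedef unsigned (*avcodec_version_fn)(void);

/* Hypothetical helper: try each candidate soname in order and return the
 * first handle that opens. `names` is NULL-terminated; if `chosen` is
 * non-NULL it receives the name that was loaded. */
static void *open_first(const char *const *names, const char **chosen)
{
    for (size_t i = 0; names[i] != NULL; i++) {
        void *handle = dlopen(names[i], RTLD_NOW | RTLD_LOCAL);
        if (handle != NULL) {
            if (chosen != NULL)
                *chosen = names[i];
            return handle;
        }
    }
    return NULL;
}

/* Hypothetical helper: returns the libavcodec major version, or -1 if no
 * candidate library could be loaded or the symbol is missing. */
static int probe_libavcodec_major(void)
{
    /* ASSUMPTION: candidate sonames; the right list depends on the distro. */
    static const char *const candidates[] = {
        "libavcodec.so.61", "libavcodec.so.60", "libavcodec.so.59",
        "libavcodec.so", NULL
    };
    avcodec_version_fn get_version = NULL;
    int major = -1;

    void *handle = open_first(candidates, NULL);
    if (handle == NULL)
        return -1;

    /* POSIX-recommended idiom for storing a dlsym() result in a
     * function pointer without a pedantic cast warning. */
    *(void **)(&get_version) = dlsym(handle, "avcodec_version");
    if (get_version != NULL)
        major = (int)(get_version() >> 16);

    dlclose(handle);
    return major;
}
```

A real loader would keep the handle open for the daemon's lifetime and resolve the decode entry points (for example `avcodec_send_packet`) the same way. On older glibc versions the program needs to be linked with `-ldl`.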

## Communication: kernel ↔ daemon

Initial plan: single chardev `/dev/daedalus-v4l2` with a simple request/response protocol:

```
REQ_DECODE  { stream_id, frame_idx, codec, controls[], bitstream_blob }
RESP_FRAME  { stream_id, frame_idx, dma_buf_fd, w, h, format }
```

Alternative: netlink socket (more standard for kernel-userspace IPC, but more boilerplate). Chardev is simpler for v1.

## Memory: DRM PRIME / dmabuf

For browser zero-copy, the kernel module needs to register the decoded frame buffers as dmabuf handles that V4L2 hands out via PRIME export. The daedalus-fourier kernel library writes pixels to CPU-mapped memory; the kernel module manages the DMA-coherent allocation and PRIME export.

Two strategies:

- **Strategy A**: kernel module allocates dmabuf, mmaps it into daemon via the chardev, daemon writes pixels there.
- **Strategy B**: daemon allocates via libdrm, transfers dmabuf-fd to kernel via chardev, kernel exposes via PRIME.

Strategy A is simpler; B is more flexible. Start with A.

## Build & deploy

- Kernel module: `cd kernel && make` (out-of-tree against running kernel headers; `/lib/modules/$(uname -r)/build` path).
- Daemon: CMake, depends on installed `libdaedalus_core.a` from sibling repo. Run as systemd service or under direct user invocation.

## What's NOT in this repo

- The cycles 1-9 video kernels: those live in sibling daedalus-fourier and are consumed via `include/daedalus.h`.
- The browser side (firefox-fourier / chromium-fourier): those are their own sibling projects.
- libva-v4l2-request-fourier: sibling, talks to our `/dev/videoNN` via V4L2 ioctls.

## Sub-phases (roadmap excerpt; see docs/roadmap.md for the full plan)

1. **8.1**: kernel module skeleton; register /dev/videoNN with stub ioctls, no decoding.
2. **8.2**: chardev bridge; kernel ↔ daemon round-trip with dummy bitstream/dummy frame data.
3. **8.3**: daemon FFmpeg dlopen + parse path; pull per-frame info from FFmpeg without decoding.
4. **8.4**: dispatch one codec end-to-end via daedalus-fourier (VP9 first since it has the most QPU-deployed kernels).
5. **8.5**: dmabuf integration; first browser zero-copy frame.
6. **8.6**: AV1 + H.264 added.
7. **8.7**: performance tuning; 30fps@1080p target.

Per-phase effort: each is roughly a week of focused work.
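
For the 8.2 round-trip with dummy frame data, one possible fixed-size framing of the `RESP_FRAME` message is sketched below. The wire layout (six little-endian 32-bit fields, 24 bytes), the `ddl_` prefix, and all function names are assumptions made for illustration; this document only fixes the field list, not the encoding.

```c
#include <stddef.h>
#include <stdint.h>

/* ASSUMPTION: hypothetical wire size, six little-endian 32-bit fields. */
#define DDL_RESP_FRAME_SIZE 24u

/* Field names follow RESP_FRAME in the protocol sketch. Note that an fd
 * number is only meaningful inside the process that owns it; how the
 * handle is really transferred is left open here. */
struct ddl_resp_frame {
    uint32_t stream_id;
    uint32_t frame_idx;
    int32_t  dma_buf_fd;
    uint32_t w;
    uint32_t h;
    uint32_t format;   /* pixel format code, e.g. a V4L2 fourcc */
};

/* Write a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xffu);
    p[1] = (uint8_t)((v >> 8) & 0xffu);
    p[2] = (uint8_t)((v >> 16) & 0xffu);
    p[3] = (uint8_t)((v >> 24) & 0xffu);
}

/* Read a 32-bit little-endian value. */
static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Serialize a message; returns bytes written, or 0 if buf is too small. */
static size_t ddl_resp_frame_pack(const struct ddl_resp_frame *m,
                                  uint8_t *buf, size_t len)
{
    if (len < DDL_RESP_FRAME_SIZE)
        return 0;
    put_le32(buf + 0,  m->stream_id);
    put_le32(buf + 4,  m->frame_idx);
    put_le32(buf + 8,  (uint32_t)m->dma_buf_fd);
    put_le32(buf + 12, m->w);
    put_le32(buf + 16, m->h);
    put_le32(buf + 20, m->format);
    return DDL_RESP_FRAME_SIZE;
}

/* Parse a message; returns 0 on success, -1 if buf is too short. */
static int ddl_resp_frame_unpack(struct ddl_resp_frame *m,
                                 const uint8_t *buf, size_t len)
{
    if (len < DDL_RESP_FRAME_SIZE)
        return -1;
    m->stream_id  = get_le32(buf + 0);
    m->frame_idx  = get_le32(buf + 4);
    m->dma_buf_fd = (int32_t)get_le32(buf + 8);
    m->w          = get_le32(buf + 12);
    m->h          = get_le32(buf + 16);
    m->format     = get_le32(buf + 20);
    return 0;
}
```

Packing field by field, instead of writing the struct directly, keeps the byte layout independent of compiler padding and host endianness, so the kernel and daemon sides can be built separately. `REQ_DECODE` would additionally need length prefixes for `controls[]` and `bitstream_blob`, which this sketch does not cover.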