---
phase: 8
status: scoping (architecture options + tractable-first-step picked)
date_opened: 2026-05-18
prereqs: cycles 1-5 closed (IDCT, LPF wd=4, MC, LPF wd=8, CDEF)
consumer_target: libva-v4l2-request-fourier → firefox/chromium-fourier
---

# Phase 8: V4L2 deployment scoping

## What Phase 8 is

The "deliver the work" phase. Cycles 1-5 produced 5 individually measured per-block kernels (3 deployed on QPU, 2 on CPU per the deployment recipe). Phase 8 makes those kernels add up to a decoded video at the user's display.

Per `project_consumer_target.md`, the integration target is **libva-v4l2-request-fourier**: a V4L2 stateless decoder node exposing a VP9 (later AV1) contract, bridged via VA-API to browser-fourier builds. Same plumbing mfritsche already runs for HEVC/RK3588, different decoder backend.

## Architecture stack

```
+-------------------------------------------------------+
| firefox-fourier / chromium-fourier (already builds)   |
+-------------------------------------------------------+
| VA-API                                                |
+-------------------------------------------------------+
| libva-v4l2-request-fourier (already runs for HEVC)    |
+-------------------------------------------------------+
| V4L2 stateless ioctl interface (kernel uAPI)          |
+-------------------------------------------------------+
| daedalus-fourier V4L2 shim (NEW: Phase 8 work)        |
|   ↳ Parses bitstream control structs (V4L2_CID_STATELESS_VP9_*)
|   ↳ Drives per-superblock decode loop
|   ↳ Dispatches per-kernel to CPU NEON or V3D QPU (recipe)
+-------------------------------------------------------+
| daedalus-fourier core library (NEW Phase 8: wraps     |
|   ↳ kernels from cycles 1-5)                          |
+-------------------------------------------------------+
| V3D 7.1 Mesa userspace + ARM NEON                     |
+-------------------------------------------------------+
```

## Three architecture options

### Option A: Userspace V4L2 emulation (recommended for v1)

Implement a userspace `videodev2`-compatible loopback device (via `v4l2loopback` or a custom
UIO-style approach) that exposes `/dev/videoNN` with the VP9 stateless contract. libva-v4l2-request-fourier talks to this normally.

**Pros**: stays entirely in userspace; no kernel module work; can iterate quickly; isolation from kernel crash domain. The daedalus-fourier daemon runs as a regular Linux process, taking V4L2 ioctls (via the loopback shim) and emitting decoded frames.

**Cons**: v4l2loopback is loosely maintained; userspace V4L2 has some semantic quirks (DRM/PRIME buffer sharing is harder than in a real kernel driver).

### Option B: Tiny kernel V4L2 shim

A small kernel module that registers as a V4L2 device, takes the ioctls, and forwards bitstream blobs + control structs to a userspace daemon (the actual decoder) over a UNIX socket or a character device. The daemon decodes and posts frames back.

**Pros**: a real `/dev/videoNN` with proper VFL_TYPE_VIDEO semantics. DRM PRIME buffer sharing works correctly.

**Cons**: kernel module work. Cross-process buffer marshaling adds latency. Out-of-tree maintenance burden.

### Option C: Direct libva integration (not recommended)

Skip V4L2 entirely; implement a libva backend module directly.

**Pros**: avoids the V4L2 wrapper layer entirely.

**Cons**: contradicts `project_consumer_target.md` (decision to use the V4L2 path is locked in). The libva backend maintenance burden is roughly equivalent to that of a V4L2 shim, with no portability gain.

**Pick A** for v1; revisit if userspace V4L2 semantics block DRM PRIME / dmabuf for browser zero-copy.

## What's tractable this session

Phase 8 in full is **days of work** (V4L2 ioctl glue, bitstream parser, superblock loop, frame buffer management, dmabuf handling, end-to-end test against a real VP9 clip). Out of scope for a single-session continuation.

What IS tractable now:

1. **Public C API header** (`include/daedalus.h`): declare the library's stable function surface for the 5 kernels + substrate selection + init/teardown. The future Phase 8 V4L2 shim consumes this header.
   This:
   - Locks the API shape so V4L2 work doesn't need to plumb through internal types.
   - Documents which kernels deploy where (recipe encoded in API).
   - Forces a clean separation between "kernel work" (cycles 1-5) and "decoder pipeline" (Phase 8).
2. **A minimal core library** (`src/daedalus_core.{h,c}`): a skeleton that compiles and has the right typedefs and dispatch tables, but the body of each function is `assert(0 && "TODO")`. Builds against existing kernel implementations.
3. **One integration test** (`tests/test_idct_through_api.c`): exercise the public API for ONE kernel end-to-end. Proves the API can connect to existing benches.

This commit gives the integration target something concrete to hook into without prejudging V4L2 architecture (A/B/C).

## Out of scope for this session

- v4l2loopback setup (Option A specifics).
- VP9 bitstream parser (huge; borrow from FFmpeg / VP9 reference).
- Superblock-level decode loop.
- Frame buffer / dmabuf integration.
- libva-v4l2-request-fourier modifications (separate sibling repo).

These are tracked as future phases / issues.

## Acceptance for this Phase 8 scoping deliverable

- `include/daedalus.h` exists and is documented.
- `src/daedalus_core.{h,c}` skeleton compiles + links into the existing CMake build.
- One pass-through test (`test_idct_through_api`) runs and exercises the public API path for at least one kernel, producing the same M1 bit-exact result the cycle 1 bench did.
- Recipe table (which kernel runs where) is documented in the header, and the `docs/k*` phase7 docs cross-reference it.
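
As a rough illustration of the last acceptance item, the recipe could be encoded in the public header as a kernel-to-substrate table. This is only a sketch under stated assumptions: every identifier (`daedalus_kernel`, `daedalus_substrate`, `daedalus_recipe_substrate`, `daedalus_recipe_count`) is hypothetical and not taken from the real `include/daedalus.h`. This document states only that 3 kernels deploy on QPU and 2 on CPU, so the specific per-kernel assignment below is a placeholder, not the measured recipe.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch; none of these names come from the real daedalus.h. */

/* The five kernels from cycles 1-5, in the order the prereqs list them. */
typedef enum {
    DAEDALUS_KERNEL_IDCT = 0,
    DAEDALUS_KERNEL_LPF_WD4,
    DAEDALUS_KERNEL_MC,
    DAEDALUS_KERNEL_LPF_WD8,
    DAEDALUS_KERNEL_CDEF,
    DAEDALUS_KERNEL_COUNT
} daedalus_kernel;

/* The two execution substrates named in the architecture stack. */
typedef enum {
    DAEDALUS_SUBSTRATE_CPU_NEON = 0,
    DAEDALUS_SUBSTRATE_V3D_QPU
} daedalus_substrate;

/*
 * PLACEHOLDER recipe. Only the 3-on-QPU / 2-on-CPU split comes from the
 * scoping note; which kernel lands on which substrate is invented here
 * for illustration and must be replaced with the real deployment recipe.
 */
static const daedalus_substrate daedalus_recipe[DAEDALUS_KERNEL_COUNT] = {
    [DAEDALUS_KERNEL_IDCT]    = DAEDALUS_SUBSTRATE_V3D_QPU,
    [DAEDALUS_KERNEL_LPF_WD4] = DAEDALUS_SUBSTRATE_V3D_QPU,
    [DAEDALUS_KERNEL_MC]      = DAEDALUS_SUBSTRATE_V3D_QPU,
    [DAEDALUS_KERNEL_LPF_WD8] = DAEDALUS_SUBSTRATE_CPU_NEON,
    [DAEDALUS_KERNEL_CDEF]    = DAEDALUS_SUBSTRATE_CPU_NEON,
};

/* Look up where a kernel is deployed according to the table above. */
static daedalus_substrate daedalus_recipe_substrate(daedalus_kernel k)
{
    assert((int)k >= 0 && (int)k < DAEDALUS_KERNEL_COUNT);
    return daedalus_recipe[k];
}

/* Count how many kernels the table assigns to a given substrate. */
static size_t daedalus_recipe_count(daedalus_substrate s)
{
    size_t n = 0;
    for (int k = 0; k < DAEDALUS_KERNEL_COUNT; k++) {
        if (daedalus_recipe[k] == s) {
            n++;
        }
    }
    return n;
}
```

With a table like this, a shim could call `daedalus_recipe_substrate(DAEDALUS_KERNEL_IDCT)` before dispatching a block, and a unit test could check that `daedalus_recipe_count` still matches the documented split whenever the recipe changes.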