daedalus-fourier/docs/phase8_scoping.md

---
phase: 8
status: scoping (architecture options + tractable-first-step picked)
date_opened: 2026-05-18
prereqs: cycles 1-5 closed (IDCT, LPF wd=4, MC, LPF wd=8, CDEF)
consumer_target: libva-v4l2-request-fourier → firefox/chromium-fourier
---

# Phase 8 — V4L2 deployment scoping

## What Phase 8 is

The "deliver the work" phase. Cycles 1-5 produced five individually
measured per-block kernels (three deployed on the QPU, two on the
CPU, per the deployment recipe). Phase 8 makes those kernels add up
to decoded video on the user's display.

Per `project_consumer_target.md`, the integration target is
**libva-v4l2-request-fourier**: a V4L2 stateless decoder node
exposing a VP9 (later AV1) contract, bridged via VA-API to
browser-fourier builds. Same plumbing mfritsche already runs for
HEVC/RK3588, different decoder backend.

## Architecture stack

```
+-------------------------------------------------------+
| firefox-fourier / chromium-fourier  (already builds)  |
+-------------------------------------------------------+
| VA-API                                                |
+-------------------------------------------------------+
| libva-v4l2-request-fourier  (already runs for HEVC)   |
+-------------------------------------------------------+
| V4L2 stateless ioctl interface  (kernel uAPI)         |
+-------------------------------------------------------+
| daedalus-fourier V4L2 shim  (NEW: Phase 8 work)       |
| ↳ Parses bitstream control structs                    |
|   (V4L2_CID_STATELESS_VP9_*)                          |
| ↳ Drives per-superblock decode loop                   |
| ↳ Dispatches per-kernel to CPU NEON or V3D QPU        |
|   (recipe)                                            |
+-------------------------------------------------------+
| daedalus-fourier core library  (NEW: Phase 8)         |
| ↳ Wraps kernels from cycles 1-5                       |
+-------------------------------------------------------+
| V3D 7.1 Mesa userspace + ARM NEON                     |
+-------------------------------------------------------+
```
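
The "dispatches per-kernel to CPU NEON or V3D QPU (recipe)" step in
the shim layer could be as small as a lookup table. The following is
a minimal sketch, not the project's actual code: `dae_kernel`,
`dae_substrate`, `dae_recipe` and the `dae_recipe_*` functions are
hypothetical names. The kernel list follows the cycle 1-5 prereqs;
every entry starts on the CPU because this document does not record
which three kernels the recipe places on the QPU.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifiers: one per kernel from cycles 1-5. */
typedef enum {
    DAE_K_IDCT,
    DAE_K_LPF_WD4,
    DAE_K_MC,
    DAE_K_LPF_WD8,
    DAE_K_CDEF,
    DAE_K_COUNT
} dae_kernel;

/* The two substrates named in the architecture stack. */
typedef enum { DAE_SUB_CPU_NEON, DAE_SUB_V3D_QPU } dae_substrate;

/* The recipe: which substrate each kernel runs on. */
typedef struct {
    dae_substrate where[DAE_K_COUNT];
} dae_recipe;

/* Assumption: default everything to CPU until the measured recipe is loaded. */
void dae_recipe_init(dae_recipe *r)
{
    assert(r != NULL);
    for (size_t k = 0; k < DAE_K_COUNT; k++)
        r->where[k] = DAE_SUB_CPU_NEON;
}

/* Returns 0 on success, -1 on a bad argument. */
int dae_recipe_set(dae_recipe *r, dae_kernel k, dae_substrate s)
{
    if (r == NULL || (int)k < 0 || (int)k >= (int)DAE_K_COUNT)
        return -1;
    r->where[k] = s;
    return 0;
}

/* Looks up the substrate for one kernel. */
dae_substrate dae_recipe_get(const dae_recipe *r, dae_kernel k)
{
    assert(r != NULL && (int)k >= 0 && (int)k < (int)DAE_K_COUNT);
    return r->where[k];
}

/* Counts how many kernels are assigned to a given substrate. */
size_t dae_recipe_count(const dae_recipe *r, dae_substrate s)
{
    size_t n = 0;
    assert(r != NULL);
    for (size_t k = 0; k < DAE_K_COUNT; k++)
        if (r->where[k] == s)
            n++;
    return n;
}
```

A shim would call `dae_recipe_init()` once, apply the measured
assignments with `dae_recipe_set()`, and consult `dae_recipe_get()`
inside the superblock loop. `dae_recipe_count()` gives a cheap sanity
check against the "3 on QPU, 2 on CPU" split.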

## Three architecture options

### Option A — Userspace V4L2 emulation (recommended for v1)

Implement a `videodev2`-compatible loopback device (via
`v4l2loopback` or a custom UIO-style approach) that exposes
`/dev/videoNN` with the VP9 stateless contract.
libva-v4l2-request-fourier talks to this node as it would to any
other stateless decoder.

**Pros**: the decoder stays entirely in userspace; no new kernel
code to write; quick iteration; isolation from the kernel crash
domain. The daedalus-fourier daemon runs as a regular Linux
process, taking V4L2 ioctls (via the loopback shim) and emitting
decoded frames.

**Cons**: `v4l2loopback` is itself an out-of-tree kernel module and
is loosely maintained, and whether it can carry the stateless
(Request API) contract still has to be verified; userspace V4L2 has
some semantic quirks (DRM/PRIME buffer sharing is harder than in a
real kernel driver).

### Option B — Tiny kernel V4L2 shim

A small kernel module that registers as a V4L2 device, takes the
ioctls, and forwards bitstream blobs + control structs to a
userspace daemon (the actual decoder) over a UNIX socket or a
character device. The daemon decodes and posts frames back.

**Pros**: a real `/dev/videoNN` with proper VFL_TYPE_VIDEO
semantics. DRM PRIME buffer sharing works correctly.

**Cons**: kernel module work. Cross-process buffer marshaling
adds latency. Out-of-tree maintenance burden.
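
To make the marshaling cost in Option B concrete, here is one way the
"bitstream blob + control struct" hand-off could be framed before it
goes over the socket or character device. This is a sketch under
assumptions: the header layout, the `DAE_MSG_MAGIC` value, and the
`dae_msg_*` names are all hypothetical, and the control payload is
treated as opaque bytes rather than a real V4L2 control struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DAE_MSG_MAGIC    0x44414531u  /* arbitrary tag, hypothetical */
#define DAE_MSG_HDR_SIZE 16u          /* four little-endian uint32 fields */

typedef struct {
    uint32_t magic;
    uint32_t frame_seq;      /* request sequence number */
    uint32_t ctrl_len;       /* bytes of control payload */
    uint32_t bitstream_len;  /* bytes of compressed frame data */
} dae_msg_hdr;

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xffu);
    p[1] = (uint8_t)((v >> 8) & 0xffu);
    p[2] = (uint8_t)((v >> 16) & 0xffu);
    p[3] = (uint8_t)((v >> 24) & 0xffu);
}

static uint32_t get_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Writes header + controls + bitstream into dst.
 * Returns the number of bytes written, or 0 if dst is too small. */
size_t dae_msg_pack(uint8_t *dst, size_t cap, uint32_t frame_seq,
                    const void *ctrl, uint32_t ctrl_len,
                    const void *bits, uint32_t bits_len)
{
    if (dst == NULL || cap < DAE_MSG_HDR_SIZE) return 0;
    if (ctrl_len > cap - DAE_MSG_HDR_SIZE) return 0;
    if (bits_len > cap - DAE_MSG_HDR_SIZE - ctrl_len) return 0;
    put_le32(dst + 0, DAE_MSG_MAGIC);
    put_le32(dst + 4, frame_seq);
    put_le32(dst + 8, ctrl_len);
    put_le32(dst + 12, bits_len);
    if (ctrl_len) memcpy(dst + DAE_MSG_HDR_SIZE, ctrl, ctrl_len);
    if (bits_len) memcpy(dst + DAE_MSG_HDR_SIZE + ctrl_len, bits, bits_len);
    return (size_t)DAE_MSG_HDR_SIZE + ctrl_len + bits_len;
}

/* Validates a received message and points ctrl/bits into src.
 * Returns 0 on success, -1 if the message is malformed or truncated. */
int dae_msg_unpack(const uint8_t *src, size_t len, dae_msg_hdr *hdr,
                   const uint8_t **ctrl, const uint8_t **bits)
{
    if (src == NULL || hdr == NULL || ctrl == NULL || bits == NULL) return -1;
    if (len < DAE_MSG_HDR_SIZE) return -1;
    hdr->magic = get_le32(src + 0);
    hdr->frame_seq = get_le32(src + 4);
    hdr->ctrl_len = get_le32(src + 8);
    hdr->bitstream_len = get_le32(src + 12);
    if (hdr->magic != DAE_MSG_MAGIC) return -1;
    if (hdr->ctrl_len > len - DAE_MSG_HDR_SIZE) return -1;
    if (hdr->bitstream_len > len - DAE_MSG_HDR_SIZE - hdr->ctrl_len) return -1;
    *ctrl = src + DAE_MSG_HDR_SIZE;
    *bits = src + DAE_MSG_HDR_SIZE + hdr->ctrl_len;
    return 0;
}
```

Note that this copies the bitstream once on pack; that copy is part
of the latency listed under Cons, and a real design would likely pass
a dmabuf or shared-memory handle instead of the payload itself.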

### Option C — Direct libva integration (not recommended)

Skip V4L2 entirely; implement a libva backend module directly.

**Pros**: avoids the V4L2 wrapper layer entirely.

**Cons**: contradicts `project_consumer_target.md` (the decision to
use the V4L2 path is locked in). The libva backend maintenance
burden is roughly equivalent to that of a V4L2 shim, with no
portability gain.

**Pick A** for v1; revisit if userspace V4L2 semantics block
DRM PRIME / dmabuf for browser zero-copy.

## What's tractable this session

Phase 8 in full is **days of work** (V4L2 ioctl glue, bitstream
parser, superblock loop, frame buffer management, dmabuf handling,
end-to-end test against a real VP9 clip). That is out of scope for
a single continued session.

What IS tractable now:

1. **Public C API header** (`include/daedalus.h`): declare the
   library's stable function surface for the 5 kernels +
   substrate selection + init/teardown. Future Phase 8 V4L2 shim
   consumes this header. This:
   - Locks the API shape so V4L2 work doesn't need to plumb
     through internal types.
   - Documents which kernels deploy where (recipe encoded in API).
   - Forces a clean separation between "kernel work" (cycles 1-5)
     and "decoder pipeline" (Phase 8).

2. **A minimal core library** (`src/daedalus_core.{h,c}`):
   a skeleton that compiles and has the right typedefs and dispatch
   tables, but the body of each function is `assert(0 && "TODO")`.
   It builds against the existing kernel implementations.

3. **One integration test** (`tests/test_idct_through_api.c`):
   exercise the public API for ONE kernel end-to-end. Proves the
   API can connect to existing benches.
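
The three items above could fit together roughly as follows. This is
a sketch only: every identifier (`daedalus_ctx`, `daedalus_bind`,
`daedalus_run`, the uniform `daedalus_kernel_fn` signature) is a
hypothetical placeholder, not the contents of `include/daedalus.h`.
It also swaps the `assert(0 && "TODO")` bodies for a
`DAEDALUS_ERR_UNIMPLEMENTED` return so that an unbound kernel can be
observed from a test instead of aborting. `daedalus_stub_copy` is a
stand-in that copies its input; it is not an IDCT.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum {
    DAEDALUS_OK = 0,
    DAEDALUS_ERR_ARG = -1,
    DAEDALUS_ERR_UNIMPLEMENTED = -2
} daedalus_status;

/* One slot per kernel from cycles 1-5. */
typedef enum {
    DAEDALUS_KERNEL_IDCT,
    DAEDALUS_KERNEL_LPF_WD4,
    DAEDALUS_KERNEL_MC,
    DAEDALUS_KERNEL_LPF_WD8,
    DAEDALUS_KERNEL_CDEF,
    DAEDALUS_KERNEL_COUNT
} daedalus_kernel;

/* Assumption: one uniform signature; real kernels need per-kernel args. */
typedef daedalus_status (*daedalus_kernel_fn)(const int16_t *in,
                                              int16_t *out, size_t n);

typedef struct {
    daedalus_kernel_fn table[DAEDALUS_KERNEL_COUNT];  /* dispatch table */
    int ready;
} daedalus_ctx;

static int kernel_in_range(daedalus_kernel k)
{
    return (int)k >= 0 && (int)k < (int)DAEDALUS_KERNEL_COUNT;
}

/* Init: clear the dispatch table and mark the context usable. */
daedalus_status daedalus_init(daedalus_ctx *ctx)
{
    if (ctx == NULL) return DAEDALUS_ERR_ARG;
    for (size_t k = 0; k < DAEDALUS_KERNEL_COUNT; k++)
        ctx->table[k] = NULL;
    ctx->ready = 1;
    return DAEDALUS_OK;
}

/* Attach an existing kernel implementation to a slot. */
daedalus_status daedalus_bind(daedalus_ctx *ctx, daedalus_kernel k,
                              daedalus_kernel_fn fn)
{
    if (ctx == NULL || !ctx->ready || fn == NULL || !kernel_in_range(k))
        return DAEDALUS_ERR_ARG;
    ctx->table[k] = fn;
    return DAEDALUS_OK;
}

/* Run one kernel through the public path. */
daedalus_status daedalus_run(const daedalus_ctx *ctx, daedalus_kernel k,
                             const int16_t *in, int16_t *out, size_t n)
{
    if (ctx == NULL || !ctx->ready || in == NULL || out == NULL ||
        !kernel_in_range(k))
        return DAEDALUS_ERR_ARG;
    if (ctx->table[k] == NULL)
        return DAEDALUS_ERR_UNIMPLEMENTED;
    return ctx->table[k](in, out, n);
}

void daedalus_teardown(daedalus_ctx *ctx)
{
    if (ctx != NULL) ctx->ready = 0;
}

/* Placeholder kernel for wiring tests: copies input to output. */
daedalus_status daedalus_stub_copy(const int16_t *in, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
    return DAEDALUS_OK;
}
```

An integration test in the spirit of `test_idct_through_api.c` would
call `daedalus_init()`, bind the real cycle 1 IDCT in place of the
stub, run it through `daedalus_run()`, and compare the output with
the bench result.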

This commit gives the integration target something concrete to
hook into without prejudging the V4L2 architecture choice (A/B/C).

## Out of scope for this session

- v4l2loopback setup (Option A specifics).
- VP9 bitstream parser (huge — borrow from FFmpeg / VP9 reference).
- Superblock-level decode loop.
- Frame buffer / dmabuf integration.
- libva-v4l2-request-fourier modifications (separate sibling repo).

These are tracked as future phases / issues.

## Acceptance for this Phase 8 scoping deliverable

- `include/daedalus.h` exists and is documented.
- `src/daedalus_core.{h,c}` skeleton compiles + links into the
  existing CMake build.
- One pass-through test (`test_idct_through_api`) runs and
  exercises the public API path for at least one kernel,
  producing the same M1 bit-exact result the cycle 1 bench did.
- Recipe table (which kernel runs where) is documented in the
  header and the docs/k* phase7 docs cross-reference it.
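
The bit-exact criterion in the third bullet amounts to an
element-wise comparison between the API output and the reference
output from the cycle 1 bench. A minimal sketch of such a check,
assuming 16-bit samples and a hypothetical helper name
(`dae_first_mismatch`); it reports where the first difference occurs
so that a failing test is easier to debug:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns -1 if got[0..n) and ref[0..n) are identical,
 * otherwise the index of the first differing sample. */
ptrdiff_t dae_first_mismatch(const int16_t *got, const int16_t *ref, size_t n)
{
    assert(n == 0 || (got != NULL && ref != NULL));
    for (size_t i = 0; i < n; i++) {
        if (got[i] != ref[i])
            return (ptrdiff_t)i;
    }
    return -1;
}
```

A test would assert that `dae_first_mismatch(api_out, bench_out, n)`
returns `-1`, where `api_out` and `bench_out` are placeholders for
the two buffers being compared.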