daedalus-v4l2/docs/roadmap.md

# daedalus-v4l2 — roadmap

## Sub-phases

### Phase 8.1 — kernel module skeleton

Out-of-tree kernel module that:
- Registers `/dev/videoNN` with `VFL_TYPE_VIDEO` + a no-op
  V4L2 stateless dispatch table.
- Accepts open/close, S_FMT, REQBUFS ioctls without doing
  anything (yet).
- Builds against `/lib/modules/$(uname -r)/build`.

Deliverable: `modprobe daedalus_v4l2` works, `v4l2-ctl --list-devices`
shows the new device.

### Phase 8.2 — kernel ↔ daemon chardev bridge

- Kernel module creates `/dev/daedalus-v4l2` chardev.
- Defines a simple req/resp protocol in `include/daedalus_v4l2_proto.h`.
- Daemon connects, exchanges echo requests.

Deliverable: ping-pong test passes.

### Phase 8.3 — daemon FFmpeg dlopen + parse

- Daemon links `libdaedalus_core.a` from sibling.
- Daemon dlopens FFmpeg.
- Test program: feed a VP9 IVF file to FFmpeg parsers,
  extract block-level metadata, validate against expected.

Deliverable: daemon can parse a VP9 frame and walk the
block-level info.

### Phase 8.4 — daemon ↔ kernel decode round-trip (closed 2026-05-18)

Shipped as a debugfs-triggered chardev round-trip rather than
the original V4L2-ioctl plan (which moved to Phase 8.5).

- REQ_DECODE / RESP_FRAME wire protocol
- Daemon decodes VP9 via FFmpeg dlopen, returns FNV-1a digest
- Verified content-dependent + deterministic; structured
  error handling for bad bitstreams

See `docs/phase_8_4_closure.md`.

### Phase 8.5 — full V4L2 m2m driver (closed 2026-05-18)

Real V4L2 m2m driver — userspace clients drive
`S_FMT`/`REQBUFS`/`QBUF`/`DQBUF` the standard way.  Bitstream
flows kernel→daemon as inline REQ_DECODE payload; decoded NV12
pixels flow daemon→kernel as inline RESP_FRAME payload.  Works
end-to-end for small frames (≤ ~64 KiB NV12).

Deliverable hit: kernel m2m driver passes most v4l2-compliance
checks; `tools/test_m2m_decode` produces a NV12 frame that's
byte-for-byte identical to `ffmpeg -pix_fmt nv12` reference.

See `docs/phase_8_5_closure.md`.

### Phase 8.6 — dmabuf + AV1 + H.264 + stateless controls (closed 2026-05-18)

- CAPTURE (and OUTPUT) on `vb2_dma_contig_memops`.
- New `DAEDALUS_IOC_GET_DMABUF` chardev ioctl — daemon
  mmaps the in-flight CAPTURE buffer, decodes pixels in
  place, sends RESP_FRAME metadata-only.
- 64 KiB frame-size cap removed.  1080p VP9 + 128×96 AV1
  + 128×96 H.264 all byte-exact against reference FFmpeg
  decode.
- V4L2 stateless controls registered for VP9 / AV1 /
  H.264 (11 controls visible to userspace).
- Colorspace round-trip fix (TRY_FMT preserve, S_FMT
  OUTPUT→CAPTURE propagation).
- Cookie unified across V4L2 + debugfs paths.
- v4l2-compliance: 47/48 (only DECODER_CMD remains,
  needs media controller — moved to 8.7).

See `docs/phase_8_6_closure.md`.

### Phase 8.7 — media controller + multi-frame streaming (closed 2026-05-18)

- Media controller bound via
  `v4l2_m2m_register_media_controller` +
  `media_device_register`; `/dev/mediaN` published.
- `tools/test_m2m_stream` parses IVF and pushes frames
  sequentially through a 4-deep buffer ring; daemon
  AVCodecContext preserves reference frames across calls.
- 30-frame VP9 320×240 stream byte-exact (3.46 MB across
  1 keyframe + 29 P-frames).
- 10-frame VP9 1080p stream byte-exact (31 MB across
  10 frames at full HD).
- v4l2-compliance: **49/49 passing** (was 47/48 in 8.6;
  media controller added a 49th test and closed DECODER_CMD).

See `docs/phase_8_7_closure.md`.

### Phase 8.8 — throughput baseline + multi-codec streams + HDR (closed 2026-05-18)

- Per-frame µs timing in test_m2m_stream; multi-codec
  baseline:
  - VP9 1080p: 83.1 fps
  - AV1 1080p: 65.0 fps
  - H.264 1080p: 88.3 fps
  All byte-exact vs ffmpeg reference; all 2-3× over the
  30fps-floor-is-fine criterion.
- QPU dispatch substitution explicitly **not needed** — measurement
  shows the FFmpeg software path already clears the target on
  Pi 5's Cortex-A76.  Substitution moves to the
  optimisation roadmap.
- Annex-B H.264 access-unit splitter in the test harness
  (NALs grouped by VCL boundary).
- HDR / 10-bit: V4L2_PIX_FMT_P010 added as CAPTURE format;
  daemon pack_p010_to_plane handles YUV420P10LE → P010
  with MSB-aligned 10-bit data.  10-bit 1080p byte-exact
  at 48.8 fps.

See `docs/phase_8_8_closure.md`.

### Phase 8.9 — long-form stress + multi-codec HDR + libva scoping (closed 2026-05-18)

- libva-v4l2-request investigation: upstream supports only
  MPEG-2 / H.264 / HEVC (no VP9 or AV1) and expects the
  older `V4L2_PIX_FMT_H264_SLICE_RAW` fourcc.  Real
  integration requires adding VP9 + AV1 support to the
  library itself — pushed to Phase 8.10.
- Long-form stress: 1800-frame VP9 1080p (60s @ 30fps),
  120.9 fps sustained, p99 17.3 ms/frame, no errors, no
  leaks, daemon alive after 3620 cookies across two runs.
- HDR multi-codec byte-exact: VP9-10bit (48.8 fps,
  from 8.8), AV1-10bit (17.1 fps), H.264-10bit (26.9 fps).
  10-bit is intrinsically more expensive — AV1 falls
  short of 30fps but acceptable for the user-facing
  goal (mostly SDR YouTube).

See `docs/phase_8_9_closure.md`.

### Phase 8.10 — libva-v4l2-request VP9/AV1 patch + end-to-end consumer

1. Build libva-v4l2-request from source on hertz.
2. Add VP9_FRAME + AV1_FRAME profile mappings; add
   V4L2_PIX_FMT_NV12 (single-plane) to our CAPTURE so
   the library's video.c picks us.
3. End-to-end: `mpv --hwdec=vaapi` against test files;
   then Firefox.
4. (Stretch) Upstream the patches to bootlin.

After 8.10 the project's user-facing loop is closed.
Optimisation phases (QPU dispatch, 4K) ship when motivated.

## Effort estimate

Each phase: ~1 week of focused work (~40 hours).
Total: 7 weeks for v1.

Could be split across multiple sessions / contributors.