# daedalus-v4l2 — roadmap ## Sub-phases ### Phase 8.1 — kernel module skeleton Out-of-tree kernel module that: - Registers `/dev/videoNN` with `VFL_TYPE_VIDEO` + a no-op V4L2 stateless dispatch table. - Accepts open/close, S_FMT, REQBUFS ioctls without doing anything (yet). - Builds against `/lib/modules/$(uname -r)/build`. Deliverable: `modprobe daedalus_v4l2` works, `v4l2-ctl --list-devices` shows the new device. ### Phase 8.2 — kernel ↔ daemon chardev bridge - Kernel module creates `/dev/daedalus-v4l2` chardev. - Defines a simple req/resp protocol in `include/daedalus_v4l2_proto.h`. - Daemon connects, exchanges echo requests. Deliverable: ping-pong test passes. ### Phase 8.3 — daemon FFmpeg dlopen + parse - Daemon links `libdaedalus_core.a` from sibling. - Daemon dlopens FFmpeg. - Test program: feed a VP9 IVF file to FFmpeg parsers, extract block-level metadata, validate against expected. Deliverable: daemon can parse a VP9 frame and walk the block-level info. ### Phase 8.4 — daemon ↔ kernel decode round-trip (closed 2026-05-18) Shipped as a debugfs-triggered chardev round-trip rather than the original V4L2-ioctl plan (which moved to Phase 8.5). - REQ_DECODE / RESP_FRAME wire protocol - Daemon decodes VP9 via FFmpeg dlopen, returns FNV-1a digest - Verified content-dependent + deterministic; structured error handling for bad bitstreams See `docs/phase_8_4_closure.md`. ### Phase 8.5 — full V4L2 m2m driver (closed 2026-05-18) Real V4L2 m2m driver — userspace clients drive `S_FMT`/`REQBUFS`/`QBUF`/`DQBUF` the standard way. Bitstream flows kernel→daemon as inline REQ_DECODE payload; decoded NV12 pixels flow daemon→kernel as inline RESP_FRAME payload. Works end-to-end for small frames (≤ ~64 KiB NV12). Deliverable hit: kernel m2m driver passes most v4l2-compliance checks; `tools/test_m2m_decode` produces a NV12 frame that's byte-for-byte identical to `ffmpeg -pix_fmt nv12` reference. See `docs/phase_8_5_closure.md`. ### Phase 8.6 — dmabuf + AV1 + H.264 + stateless controls (closed 2026-05-18) - CAPTURE (and OUTPUT) on `vb2_dma_contig_memops`. - New `DAEDALUS_IOC_GET_DMABUF` chardev ioctl — daemon mmaps the in-flight CAPTURE buffer, decodes pixels in place, sends RESP_FRAME metadata-only. - 64 KiB frame-size cap removed. 1080p VP9 + 128×96 AV1 + 128×96 H.264 all byte-exact against reference FFmpeg decode. - V4L2 stateless controls registered for VP9 / AV1 / H.264 (11 controls visible to userspace). - Colorspace round-trip fix (TRY_FMT preserve, S_FMT OUTPUT→CAPTURE propagation). - Cookie unified across V4L2 + debugfs paths. - v4l2-compliance: 47/48 (only DECODER_CMD remains, needs media controller — moved to 8.7). See `docs/phase_8_6_closure.md`. ### Phase 8.7 — media controller + multi-frame streaming (closed 2026-05-18) - Media controller bound via `v4l2_m2m_register_media_controller` + `media_device_register`; `/dev/mediaN` published. - `tools/test_m2m_stream` parses IVF and pushes frames sequentially through a 4-deep buffer ring; daemon AVCodecContext preserves reference frames across calls. - 30-frame VP9 320×240 stream byte-exact (3.46 MB across 1 keyframe + 29 P-frames). - 10-frame VP9 1080p stream byte-exact (31 MB across 10 frames at full HD). - v4l2-compliance: **49/49 passing** (was 47/48 in 8.6; media controller added a 49th test and closed DECODER_CMD). See `docs/phase_8_7_closure.md`. ### Phase 8.8 — throughput baseline + multi-codec streams + HDR (closed 2026-05-18) - Per-frame µs timing in test_m2m_stream; multi-codec baseline: - VP9 1080p: 83.1 fps - AV1 1080p: 65.0 fps - H.264 1080p: 88.3 fps All byte-exact vs ffmpeg reference; all 2-3× over the 30fps-floor-is-fine criterion. - QPU dispatch substitution explicitly **not needed** — measurement shows the FFmpeg software path already clears the target on Pi 5's Cortex-A76. Substitution moves to the optimisation roadmap. - Annex-B H.264 access-unit splitter in the test harness (NALs grouped by VCL boundary). - HDR / 10-bit: V4L2_PIX_FMT_P010 added as CAPTURE format; daemon pack_p010_to_plane handles YUV420P10LE → P010 with MSB-aligned 10-bit data. 10-bit 1080p byte-exact at 48.8 fps. See `docs/phase_8_8_closure.md`. ### Phase 8.9 — long-form stress + multi-codec HDR + libva scoping (closed 2026-05-18) - libva-v4l2-request investigation: upstream supports only MPEG-2 / H.264 / HEVC (no VP9 or AV1) and expects the older `V4L2_PIX_FMT_H264_SLICE_RAW` fourcc. Real integration requires adding VP9 + AV1 support to the library itself — pushed to Phase 8.10. - Long-form stress: 1800-frame VP9 1080p (60s @ 30fps), 120.9 fps sustained, p99 17.3 ms/frame, no errors, no leaks, daemon alive after 3620 cookies across two runs. - HDR multi-codec byte-exact: VP9-10bit (48.8 fps, from 8.8), AV1-10bit (17.1 fps), H.264-10bit (26.9 fps). 10-bit is intrinsically more expensive — AV1 falls short of 30fps but acceptable for the user-facing goal (mostly SDR YouTube). See `docs/phase_8_9_closure.md`. ### Phase 8.10 + 8.11 — libva consumer integration scaffold (closed 2026-05-18) - Forked bootlin/libva-v4l2-request to marfrit/libva-v4l2- request-fourier (gitea); discovered the existing fourier fork already had VP9/AV1/HEVC support on Rockchip. - Added daedalus_v4l2 to known_decoder_drivers + meson build option. - Added V4L2_PIX_FMT_NV12 single-plane + request API media ops + stateless control registration to our kernel. - vainfo enumerates 7 VAProfile entries via our driver. See `docs/phase_8_10_11_closure.md`. ### Phase 8.12 — first VP9 frame decoded via libva (closed 2026-05-18) - vb2_queue supports_requests; vb2_ops buf_out_validate + buf_request_complete; v4l2_ctrl_new_custom for stateless ctrl registration. - Daemon decoded byte-exact VP9 keyframe via the full libva path (FNV-1a 0x1eb34bfe matches standalone). - ffmpeg still timed out waiting for media_request completion (request bind state). See `docs/phase_8_12_closure.md`. ### Phase 8.13 — byte-exact end-to-end via libva (closed 2026-05-18) - Traced the misleading "Unable to set control(s): Invalid argument" — actually libva probing H264/HEVC PROFILE/LEVEL we don't expose (harmless); the real VP9 stateless control SET succeeds. - Diagnosed the "Timeout when waiting for media request" root cause: per-request control object stays bound because vb2's normal buf_done path doesn't fire buf_request_complete (only queue-cancel does). - Fix: capture media_request from src_buf in inflight entry, call v4l2_ctrl_request_complete from the completion path before buf_done_and_job_finish. - **Byte-exact end-to-end**: ffmpeg -hwaccel vaapi → libva-v4l2-request-fourier → /dev/video0 → daedalus_v4l2 → daemon → 18432-byte NV12 byte-for-byte identical to ffmpeg software reference. - **Project consumer target hit**: V4L2 stateless node consumed by a real VAAPI client. See `docs/phase_8_13_closure.md`. ### Phase 8.14 — multi-frame + AV1/H.264 + higher-level consumers 1. Multi-frame VP9 stream via libva (vp9_60s.ivf from 8.9 stress test) — P-frame references across requests. 2. AV1 + H.264 single-frame via libva. 3. mpv --hwdec=vaapi end-to-end. 4. Firefox / WebRTC if motivated. Optimisation work (QPU dispatch, 4K, HDR-in-libva) ships when there's a concrete user-facing need. ## Effort estimate Each phase: ~1 week of focused work (~40 hours). Total: 7 weeks for v1. Could be split across multiple sessions / contributors.