Files
daedalus-v4l2/docs/roadmap.md
T
marfrit d84efdb125 Phase 8.9: long-form stress + multi-codec HDR + libva scoping
Three verification deliverables; no production code changes
(infrastructure from 8.8 was sufficient).

1. libva-v4l2-request consumer investigation (task 95):
   - bootlin/libva-v4l2-request@master supports MPEG-2 /
     H.264 / HEVC only. No VP9, no AV1.
   - H264 expects V4L2_PIX_FMT_H264_SLICE_RAW (older
     fourcc); we advertise V4L2_PIX_FMT_H264_SLICE.
   - CAPTURE expects V4L2_PIX_FMT_NV12 (single-plane);
     we advertise NV12M + P010.
   - Real integration = patch libva-v4l2-request to add
     VP9 + AV1 mappings + accept the newer H.264 fourcc.
     Multi-session work — pushed to Phase 8.10.

2. Long-form stress test (task 96):
   - Built a 1800-frame (60s @ 30fps) VP9 1080p stream
     by Python concat of vp9_5s.ivf × 12 with PTS
     adjustment and re-muxed IVF header.
   - 1800 / 1800 frames decoded cleanly through
     test_m2m_stream + daemon, fps=120.9 sustained
     across 14.9 s wall, p99=17.3 ms/frame (well inside
     the 33 ms 30fps budget).
   - Daemon alive after 3620 cookies across two
     back-to-back runs, RSS=23 MiB — no leak.
   - No kernel oops/WARN, no fps degradation across
     the long run.

3. Multi-codec HDR (task 97):
   - AV1 1080p 10-bit → P010: byte-exact vs ffmpeg
     p010le. fps 17.1 (below 30fps target; AV1 10-bit
     is intrinsically expensive).
   - H.264 1080p 10-bit (high10) → P010: byte-exact
     vs ffmpeg p010le. fps 26.9 (close to target).
   - Combined with 8.8's VP9-10bit P010 result
     (48.8 fps): all three codecs' 10-bit paths
     produce byte-exact P010 output.

Roadmap update (docs/roadmap.md):
- 8.9 marked closed with the scope-cut explained.
- 8.10 = libva-v4l2-request VP9/AV1 patch + end-to-end
  consumer integration (the actual user-facing loop:
  mpv --hwdec=vaapi → libva-v4l2-request → /dev/video0
  → daemon → decoded frame).

Per correctness-before-speed: characterised the libva
integration scope rigorously rather than starting a
multi-session battle in this phase. The bounded
deliverables (stress test + HDR matrix) ship clean and
prove the existing infrastructure handles real-world
workloads stably.

Phase 8.10 next: build + patch libva-v4l2-request on
hertz; end-to-end with mpv.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 17:26:42 +00:00

5.6 KiB
Raw Blame History

daedalus-v4l2 — roadmap

Sub-phases

Phase 8.1 — kernel module skeleton

Out-of-tree kernel module that:

  • Registers /dev/videoNN with VFL_TYPE_VIDEO + a no-op V4L2 stateless dispatch table.
  • Accepts open/close, S_FMT, REQBUFS ioctls without doing anything (yet).
  • Builds against /lib/modules/$(uname -r)/build.

Deliverable: modprobe daedalus_v4l2 works, v4l2-ctl --list-devices shows the new device.

Phase 8.2 — kernel ↔ daemon chardev bridge

  • Kernel module creates /dev/daedalus-v4l2 chardev.
  • Defines a simple req/resp protocol in include/daedalus_v4l2_proto.h.
  • Daemon connects, exchanges echo requests.

Deliverable: ping-pong test passes.

Phase 8.3 — daemon FFmpeg dlopen + parse

  • Daemon links libdaedalus_core.a from sibling.
  • Daemon dlopens FFmpeg.
  • Test program: feed a VP9 IVF file to FFmpeg parsers, extract block-level metadata, validate against expected.

Deliverable: daemon can parse a VP9 frame and walk the block-level info.

Phase 8.4 — daemon ↔ kernel decode round-trip (closed 2026-05-18)

Shipped as a debugfs-triggered chardev round-trip rather than the original V4L2-ioctl plan (which moved to Phase 8.5).

  • REQ_DECODE / RESP_FRAME wire protocol
  • Daemon decodes VP9 via FFmpeg dlopen, returns FNV-1a digest
  • Verified content-dependent + deterministic; structured error handling for bad bitstreams

See docs/phase_8_4_closure.md.

Phase 8.5 — full V4L2 m2m driver (closed 2026-05-18)

Real V4L2 m2m driver — userspace clients drive S_FMT/REQBUFS/QBUF/DQBUF the standard way. Bitstream flows kernel→daemon as inline REQ_DECODE payload; decoded NV12 pixels flow daemon→kernel as inline RESP_FRAME payload. Works end-to-end for small frames (≤ ~64 KiB NV12).

Deliverable hit: kernel m2m driver passes most v4l2-compliance checks; tools/test_m2m_decode produces a NV12 frame that's byte-for-byte identical to ffmpeg -pix_fmt nv12 reference.

See docs/phase_8_5_closure.md.

Phase 8.6 — dmabuf + AV1 + H.264 + stateless controls (closed 2026-05-18)

  • CAPTURE (and OUTPUT) on vb2_dma_contig_memops.
  • New DAEDALUS_IOC_GET_DMABUF chardev ioctl — daemon mmaps the in-flight CAPTURE buffer, decodes pixels in place, sends RESP_FRAME metadata-only.
  • 64 KiB frame-size cap removed. 1080p VP9 + 128×96 AV1
    • 128×96 H.264 all byte-exact against reference FFmpeg decode.
  • V4L2 stateless controls registered for VP9 / AV1 / H.264 (11 controls visible to userspace).
  • Colorspace round-trip fix (TRY_FMT preserve, S_FMT OUTPUT→CAPTURE propagation).
  • Cookie unified across V4L2 + debugfs paths.
  • v4l2-compliance: 47/48 (only DECODER_CMD remains, needs media controller — moved to 8.7).

See docs/phase_8_6_closure.md.

Phase 8.7 — media controller + multi-frame streaming (closed 2026-05-18)

  • Media controller bound via v4l2_m2m_register_media_controller + media_device_register; /dev/mediaN published.
  • tools/test_m2m_stream parses IVF and pushes frames sequentially through a 4-deep buffer ring; daemon AVCodecContext preserves reference frames across calls.
  • 30-frame VP9 320×240 stream byte-exact (3.46 MB across 1 keyframe + 29 P-frames).
  • 10-frame VP9 1080p stream byte-exact (31 MB across 10 frames at full HD).
  • v4l2-compliance: 49/49 passing (was 47/48 in 8.6; media controller added a 49th test and closed DECODER_CMD).

See docs/phase_8_7_closure.md.

Phase 8.8 — throughput baseline + multi-codec streams + HDR (closed 2026-05-18)

  • Per-frame µs timing in test_m2m_stream; multi-codec baseline:
    • VP9 1080p: 83.1 fps
    • AV1 1080p: 65.0 fps
    • H.264 1080p: 88.3 fps All byte-exact vs ffmpeg reference; all 2-3× over the 30fps-floor-is-fine criterion.
  • QPU dispatch substitution explicitly not needed — measurement shows the FFmpeg software path already clears the target on Pi 5's Cortex-A76. Substitution moves to the optimisation roadmap.
  • Annex-B H.264 access-unit splitter in the test harness (NALs grouped by VCL boundary).
  • HDR / 10-bit: V4L2_PIX_FMT_P010 added as CAPTURE format; daemon pack_p010_to_plane handles YUV420P10LE → P010 with MSB-aligned 10-bit data. 10-bit 1080p byte-exact at 48.8 fps.

See docs/phase_8_8_closure.md.

Phase 8.9 — long-form stress + multi-codec HDR + libva scoping (closed 2026-05-18)

  • libva-v4l2-request investigation: upstream supports only MPEG-2 / H.264 / HEVC (no VP9 or AV1) and expects the older V4L2_PIX_FMT_H264_SLICE_RAW fourcc. Real integration requires adding VP9 + AV1 support to the library itself — pushed to Phase 8.10.
  • Long-form stress: 1800-frame VP9 1080p (60s @ 30fps), 120.9 fps sustained, p99 17.3 ms/frame, no errors, no leaks, daemon alive after 3620 cookies across two runs.
  • HDR multi-codec byte-exact: VP9-10bit (48.8 fps, from 8.8), AV1-10bit (17.1 fps), H.264-10bit (26.9 fps). 10-bit is intrinsically more expensive — AV1 falls short of 30fps but acceptable for the user-facing goal (mostly SDR YouTube).

See docs/phase_8_9_closure.md.

Phase 8.10 — libva-v4l2-request VP9/AV1 patch + end-to-end consumer

  1. Build libva-v4l2-request from source on hertz.
  2. Add VP9_FRAME + AV1_FRAME profile mappings; add V4L2_PIX_FMT_NV12 (single-plane) to our CAPTURE so the library's video.c picks us.
  3. End-to-end: mpv --hwdec=vaapi against test files; then Firefox.
  4. (Stretch) Upstream the patches to bootlin.

After 8.10 the project's user-facing loop is closed. Optimisation phases (QPU dispatch, 4K) ship when motivated.

Effort estimate

Each phase: ~1 week of focused work (~40 hours). Total: 7 weeks for v1.

Could be split across multiple sessions / contributors.