Files
daedalus-v4l2/docs/roadmap.md
T
marfrit c7f6fb90cb Phase 8.6: dmabuf + AV1 + H.264 + stateless controls
Removes the Phase 8.5 64 KiB frame-size cap by exporting CAPTURE
buffers as dmabuf-fds the daemon mmaps and writes pixels into
directly. Adds AV1 + H.264 codec support, V4L2 stateless control
registration, and the compliance polish that brings the driver
to 47/48 v4l2-compliance pass.

Protocol (include/daedalus_v4l2_proto.h):
- struct daedalus_req_decode grew capture-buffer metadata
  (width/height/pix_fmt/num_planes + per-plane size+stride).
- New DAEDALUS_IOC_GET_DMABUF ioctl on the chardev: daemon
  asks for a per-plane dmabuf fd, kernel calls vb2_core_expbuf
  in daemon task context so the fd lands in the daemon's table.

Kernel m2m driver (kernel/daedalus_v4l2_main.c):
- Both queues switched to vb2_dma_contig_memops. OUTPUT was
  vmalloc in 8.5; the switch is needed because vmalloc doesn't
  honour V4L2_MEMORY_FLAG_NON_COHERENT and v4l2-compliance's
  REQBUFS test rejected the driver because of it. We still
  read bitstream via vb2_plane_vaddr (dma_contig gives a
  kernel virtual address just like vmalloc did).
- dma_coerce_mask_and_coherent(DMA_BIT_MASK(32)) in probe.
- queue_setup populates alloc_devs[plane] = &pdev->dev for
  both queues; allow_cache_hints=1 on both.
- daedalus_export_capture_dmabuf(cookie, plane, flags, *fd):
  walks inflight list, calls vb2_core_expbuf on the CAPTURE
  buffer in the caller's (daemon's) task context.
- device_run fills the new REQ_DECODE capture fields from
  ctx->dst_fmt and maps ctx->src_fmt.pixelformat to
  DAEDALUS_CODEC_VP9 / _AV1 / _H264 (was hard-wired to VP9).
- daedalus_complete_resp_frame handles both the 8.5 inline
  path (kept for debugging) and the 8.6 dmabuf path (pixels
  already in CAPTURE buffer, just set payload from metadata).
- enum_fmt advertises all 3 OUTPUT formats (VP9F, AV1F, S264).
- try_fmt preserves userspace colorspace fields instead of
  overwriting with REC709 defaults (fixes 8.5 compliance fail).
- s_fmt propagates OUTPUT colorspace → CAPTURE (stateless
  decoder round-trip test at v4l2-test-formats.cpp:958).
- 12 V4L2 stateless controls registered per open (VP9_FRAME,
  VP9_COMPRESSED_HDR, H264_SPS/PPS/SCALING/PRED_WEIGHTS/
  SLICE_PARAMS/DECODE_PARAMS, AV1_FRAME/SEQUENCE/
  TILE_GROUP_ENTRY/FILM_GRAIN). Daemon ignores values (FFmpeg
  re-parses); registration is what makes libva-v4l2-request
  see us.

Kernel chardev (kernel/daedalus_v4l2_chardev.c):
- New unlocked_ioctl dispatching DAEDALUS_IOC_GET_DMABUF to
  daedalus_export_capture_dmabuf.
- debugfs test_decode cookies unified with the m2m cookie
  allocator via shared daedalus_next_cookie() — kills the
  Phase 8.5 namespace collision.

Daemon (daemon/src/...):
- New dmabuf_capture.{c,h}: GET_DMABUF + mmap each plane on
  REQ_DECODE; munmap + close on completion. O_RDWR | O_CLOEXEC
  is essential — vb2_core_expbuf extracts O_ACCMODE from flags
  and exports read-only by default (caught on first run; mmap
  -EACCES on PROT_WRITE).
- decoder.{c,h}: lazily opens AV1 + H.264 AVCodecContexts in
  addition to VP9 (dropped the -ENOSYS stubs). pack_nv12_to_planes
  writes Y line-by-line into planes[0] with planes[0].stride;
  interleaves Cb/Cr into planes[1] with planes[1].stride.
- chardev_client.c handle_req_decode: opens dmabuf planes,
  runs decode (pixels land in CAPTURE buffer directly), closes
  planes, sends metadata-only RESP_FRAME. No wire-pixel
  allocation.

Test harness (tools/test_m2m_decode.c):
- Optional 5th arg `codec` (vp9 | av1 | h264). Same client
  drives all three codecs.

Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):

Bit-exact end-to-end vs `ffmpeg -pix_fmt nv12`:
  VP9   1920x1080  3,110,400 bytes  MATCH
  AV1     128x96      18,432 bytes  MATCH
  H.264   128x96      18,432 bytes  MATCH

VP9 1080p went through the full dmabuf path with no chardev
payload bloat — the same chardev that capped at 64 KiB in 8.5
now ferries metadata only and lets the daemon mmap+write a
3.1 MB frame directly into the V4L2 client's buffer.

v4l2-compliance:
  Phase 8.1: 44/48
  Phase 8.5: 44/48 (different fails after m2m landed)
  Phase 8.6: 47/48
  Only remaining: VIDIOC_(TRY_)DECODER_CMD (needs media
  controller — explicitly Phase 8.7 work).

11 standard compound controls visible:
  vp9_frame_decode_parameters, vp9_probabilities_updates,
  h264_sequence_parameter_set, h264_picture_parameter_set,
  h264_scaling_matrix, h264_prediction_weight_table,
  h264_slice_parameters, h264_decode_parameters,
  av1_sequence_parameters, av1_frame_parameters,
  av1_film_grain (av1_tile_group_entry refused by hdl->error
  on this kernel — skipped silently).

Clean SIGTERM + rmmod, no oops/WARN.

Roadmap update (docs/roadmap.md):
- Phase 8.6 marked closed with the closure-doc reference.
- Phase 8.7 reshaped to (1) media controller, (2) perf +
  daedalus_dispatch_* substitution, (3) HDR/10-bit, (4)
  long-form multi-frame streaming.

Per correctness-before-speed:
- Real V4L2 dmabuf via vb2_core_expbuf (not a sideband
  fd-passing hack).
- O_RDWR access mode threaded through correctly.
- Strict pixel-byte comparison against ffmpeg, not "looks
  right" eyeballing.
- Each compliance edge documented with the underlying test
  source-line + the fix.
- All resource paths cleaned (munmap + close per plane on
  every exit, including error paths).

Phase 8.7 next: media controller binding (closes last
compliance fail), per-frame profiling, QPU dispatch
substitution targeting 30fps@1080p from
30fps-floor-is-fine memory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:16:06 +00:00

105 lines
3.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# daedalus-v4l2 — roadmap
## Sub-phases
### Phase 8.1 — kernel module skeleton
Out-of-tree kernel module that:
- Registers `/dev/videoNN` with `VFL_TYPE_VIDEO` + a no-op
V4L2 stateless dispatch table.
- Accepts open/close, S_FMT, REQBUFS ioctls without doing
anything (yet).
- Builds against `/lib/modules/$(uname -r)/build`.
Deliverable: `modprobe daedalus_v4l2` works, `v4l2-ctl --list-devices`
shows the new device.
### Phase 8.2 — kernel ↔ daemon chardev bridge
- Kernel module creates `/dev/daedalus-v4l2` chardev.
- Defines a simple req/resp protocol in `include/daedalus_v4l2_proto.h`.
- Daemon connects, exchanges echo requests.
Deliverable: ping-pong test passes.
### Phase 8.3 — daemon FFmpeg dlopen + parse
- Daemon links `libdaedalus_core.a` from sibling.
- Daemon dlopens FFmpeg.
- Test program: feed a VP9 IVF file to FFmpeg parsers,
extract block-level metadata, validate against expected.
Deliverable: daemon can parse a VP9 frame and walk the
block-level info.
### Phase 8.4 — daemon ↔ kernel decode round-trip (closed 2026-05-18)
Shipped as a debugfs-triggered chardev round-trip rather than
the original V4L2-ioctl plan (which moved to Phase 8.5).
- REQ_DECODE / RESP_FRAME wire protocol
- Daemon decodes VP9 via FFmpeg dlopen, returns FNV-1a digest
- Verified content-dependent + deterministic; structured
error handling for bad bitstreams
See `docs/phase_8_4_closure.md`.
### Phase 8.5 — full V4L2 m2m driver (closed 2026-05-18)
Real V4L2 m2m driver — userspace clients drive
`S_FMT`/`REQBUFS`/`QBUF`/`DQBUF` the standard way. Bitstream
flows kernel→daemon as inline REQ_DECODE payload; decoded NV12
pixels flow daemon→kernel as inline RESP_FRAME payload. Works
end-to-end for small frames (≤ ~64 KiB NV12).
Deliverable hit: kernel m2m driver passes most v4l2-compliance
checks; `tools/test_m2m_decode` produces a NV12 frame that's
byte-for-byte identical to `ffmpeg -pix_fmt nv12` reference.
See `docs/phase_8_5_closure.md`.
### Phase 8.6 — dmabuf + AV1 + H.264 + stateless controls (closed 2026-05-18)
- CAPTURE (and OUTPUT) on `vb2_dma_contig_memops`.
- New `DAEDALUS_IOC_GET_DMABUF` chardev ioctl — daemon
mmaps the in-flight CAPTURE buffer, decodes pixels in
place, sends RESP_FRAME metadata-only.
- 64 KiB frame-size cap removed. 1080p VP9 + 128×96 AV1
+ 128×96 H.264 all byte-exact against reference FFmpeg
decode.
- V4L2 stateless controls registered for VP9 / AV1 /
H.264 (11 controls visible to userspace).
- Colorspace round-trip fix (TRY_FMT preserve, S_FMT
OUTPUT→CAPTURE propagation).
- Cookie unified across V4L2 + debugfs paths.
- v4l2-compliance: 47/48 (only DECODER_CMD remains,
needs media controller — moved to 8.7).
See `docs/phase_8_6_closure.md`.
### Phase 8.7 — media controller, perf, HDR, long-form streams
1. Media controller binding via
`v4l2_m2m_register_media_controller` (closes the last
v4l2-compliance fail and unlocks the request API).
2. Profile the daemon's per-frame cost on hertz; substitute
`daedalus_dispatch_*` for FFmpeg's per-block paths where
the kernel implementation matches. Target the
`30fps-floor-is-fine` memory's daily-YouTube criterion:
30fps@1080p with CPU left over for vscode.
3. HDR / 10-bit support — P010M CAPTURE, depth-aware
`pack_nv12_to_planes`.
4. Long-form multi-frame streaming tests (B-frame refs,
GOP boundaries) — current test client is one keyframe
per run.
Deliverable: 30fps stable on real content + full
compliance pass.
## Effort estimate
Each phase: ~1 week of focused work (~40 hours).
Total: 7 weeks for v1.
Could be split across multiple sessions / contributors.