Commit Graph

3 Commits

Author SHA1 Message Date
marfrit c7f6fb90cb Phase 8.6: dmabuf + AV1 + H.264 + stateless controls
Removes the Phase 8.5 64 KiB frame-size cap by exporting CAPTURE
buffers as dmabuf-fds the daemon mmaps and writes pixels into
directly. Adds AV1 + H.264 codec support, V4L2 stateless control
registration, and the compliance polish that brings the driver
to 47/48 v4l2-compliance pass.

Protocol (include/daedalus_v4l2_proto.h):
- struct daedalus_req_decode grew capture-buffer metadata
  (width/height/pix_fmt/num_planes + per-plane size+stride).
- New DAEDALUS_IOC_GET_DMABUF ioctl on the chardev: daemon
  asks for a per-plane dmabuf fd, kernel calls vb2_core_expbuf
  in daemon task context so the fd lands in the daemon's table.

Kernel m2m driver (kernel/daedalus_v4l2_main.c):
- Both queues switched to vb2_dma_contig_memops. OUTPUT was
  vmalloc in 8.5; the switch is needed because vmalloc doesn't
  honour V4L2_MEMORY_FLAG_NON_COHERENT and v4l2-compliance's
  REQBUFS test rejected the driver because of it. We still
  read bitstream via vb2_plane_vaddr (dma_contig gives a
  kernel virtual address just like vmalloc did).
- dma_coerce_mask_and_coherent(DMA_BIT_MASK(32)) in probe.
- queue_setup populates alloc_devs[plane] = &pdev->dev for
  both queues; allow_cache_hints=1 on both.
- daedalus_export_capture_dmabuf(cookie, plane, flags, *fd):
  walks inflight list, calls vb2_core_expbuf on the CAPTURE
  buffer in the caller's (daemon's) task context.
- device_run fills the new REQ_DECODE capture fields from
  ctx->dst_fmt and maps ctx->src_fmt.pixelformat to
  DAEDALUS_CODEC_VP9 / _AV1 / _H264 (was hard-wired to VP9).
- daedalus_complete_resp_frame handles both the 8.5 inline
  path (kept for debugging) and the 8.6 dmabuf path (pixels
  already in CAPTURE buffer, just set payload from metadata).
- enum_fmt advertises all 3 OUTPUT formats (VP9F, AV1F, S264).
- try_fmt preserves userspace colorspace fields instead of
  overwriting with REC709 defaults (fixes 8.5 compliance fail).
- s_fmt propagates OUTPUT colorspace → CAPTURE (stateless
  decoder round-trip test at v4l2-test-formats.cpp:958).
- 12 V4L2 stateless controls registered per open (VP9_FRAME,
  VP9_COMPRESSED_HDR, H264_SPS/PPS/SCALING/PRED_WEIGHTS/
  SLICE_PARAMS/DECODE_PARAMS, AV1_FRAME/SEQUENCE/
  TILE_GROUP_ENTRY/FILM_GRAIN). Daemon ignores values (FFmpeg
  re-parses); registration is what makes libva-v4l2-request
  see us.

Kernel chardev (kernel/daedalus_v4l2_chardev.c):
- New unlocked_ioctl dispatching DAEDALUS_IOC_GET_DMABUF to
  daedalus_export_capture_dmabuf.
- debugfs test_decode cookies unified with the m2m cookie
  allocator via shared daedalus_next_cookie() — kills the
  Phase 8.5 namespace collision.

Daemon (daemon/src/...):
- New dmabuf_capture.{c,h}: GET_DMABUF + mmap each plane on
  REQ_DECODE; munmap + close on completion. O_RDWR | O_CLOEXEC
  is essential — vb2_core_expbuf extracts O_ACCMODE from flags
  and exports read-only by default (caught on first run; mmap
  -EACCES on PROT_WRITE).
- decoder.{c,h}: lazily opens AV1 + H.264 AVCodecContexts in
  addition to VP9 (dropped the -ENOSYS stubs). pack_nv12_to_planes
  writes Y line-by-line into planes[0] with planes[0].stride;
  interleaves Cb/Cr into planes[1] with planes[1].stride.
- chardev_client.c handle_req_decode: opens dmabuf planes,
  runs decode (pixels land in CAPTURE buffer directly), closes
  planes, sends metadata-only RESP_FRAME. No wire-pixel
  allocation.

Test harness (tools/test_m2m_decode.c):
- Optional 5th arg `codec` (vp9 | av1 | h264). Same client
  drives all three codecs.

Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):

Bit-exact end-to-end vs `ffmpeg -pix_fmt nv12`:
  VP9   1920x1080  3,110,400 bytes  MATCH
  AV1     128x96      18,432 bytes  MATCH
  H.264   128x96      18,432 bytes  MATCH

VP9 1080p went through the full dmabuf path with no chardev
payload bloat — the same chardev that capped at 64 KiB in 8.5
now ferries metadata only and lets the daemon mmap+write a
3.1 MB frame directly into the V4L2 client's buffer.

v4l2-compliance:
  Phase 8.1: 44/48
  Phase 8.5: 44/48 (different fails after m2m landed)
  Phase 8.6: 47/48
  Only remaining: VIDIOC_(TRY_)DECODER_CMD (needs media
  controller — explicitly Phase 8.7 work).

11 standard compound controls visible:
  vp9_frame_decode_parameters, vp9_probabilities_updates,
  h264_sequence_parameter_set, h264_picture_parameter_set,
  h264_scaling_matrix, h264_prediction_weight_table,
  h264_slice_parameters, h264_decode_parameters,
  av1_sequence_parameters, av1_frame_parameters,
  av1_film_grain (av1_tile_group_entry refused by hdl->error
  on this kernel — skipped silently).

Clean SIGTERM + rmmod, no oops/WARN.

Roadmap update (docs/roadmap.md):
- Phase 8.6 marked closed with the closure-doc reference.
- Phase 8.7 reshaped to (1) media controller, (2) perf +
  daedalus_dispatch_* substitution, (3) HDR/10-bit, (4)
  long-form multi-frame streaming.

Per correctness-before-speed:
- Real V4L2 dmabuf via vb2_core_expbuf (not a sideband
  fd-passing hack).
- O_RDWR access mode threaded through correctly.
- Strict pixel-byte comparison against ffmpeg, not "looks
  right" eyeballing.
- Each compliance edge documented with the underlying test
  source-line + the fix.
- All resource paths cleaned (munmap + close per plane on
  every exit, including error paths).

Phase 8.7 next: media controller binding (closes last
compliance fail), per-frame profiling, QPU dispatch
substitution targeting 30fps@1080p from
30fps-floor-is-fine memory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:16:06 +00:00
marfrit 2a449632b9 Phase 8.4: daemon ↔ kernel decode round-trip (VP9 end-to-end)
Wires the Phase 8.3 FFmpeg loader through the Phase 8.2 chardev
bridge: kernel injects REQ_DECODE carrying a raw VP9 access unit,
daemon hands the bitstream to libavcodec via dlopen, sends
RESP_FRAME back with a content-dependent FNV-1a digest of the
decoded YUV planes. Pure CPU decode for now — Phase 8.5 swaps in
dmabuf + QPU dispatch.

Protocol (include/daedalus_v4l2_proto.h):
- New REQ_DECODE (kernel→daemon) and RESP_FRAME (daemon→kernel)
  message types, with fixed-size payload structs.
- New DAEDALUS_CODEC_VP9/AV1/H264 enum (wire-stable so 8.6's
  AV1+H.264 work doesn't move existing values).
- New DAEDALUS_DECODE_* status enum (OK / NO_FRAME / ERR_OPEN /
  ERR_SEND / ERR_RECV / ERR_CODEC).
- Converted the prior `enum daedalus_msg_type` to #defines —
  high-bit values exceed INT_MAX and tripped -Wpedantic on
  userspace; kernel uABI headers use the same idiom.

Kernel (kernel/daedalus_v4l2_chardev.c):
- New debugfs entry /sys/kernel/debug/daedalus_v4l2/test_decode:
  writing raw bitstream bytes wraps them in a REQ_DECODE
  (codec=VP9 for Phase 8.4) and enqueues with an
  auto-incrementing cookie.
- daedalus_chardev_write learned RESP_FRAME: parses the payload
  and emits a single pr_info line with decode metadata. Keeps
  existing PONG handling on the default arm.

Daemon (daemon/src/...):
- chardev_client.{c,h} — opens /dev/daedalus-v4l2, blocking read
  loop, single-buffer write() responses (kernel chardev has only
  .write, not .write_iter, so writev lands as -EINVAL —
  discovered the hard way during first run).
- decoder.{c,h} — lazily-opened AVCodecContext per codec, shared
  AVPacket/AVFrame pair, descriptor-driven plane walker
  (av_pix_fmt_desc_get) so the same hash path covers YUV420P,
  YUV422P, YUV444P, GBRP and other 8-bit planar layouts.
  Generalised after first run decoded testsrc as GBRP (71)
  rather than the assumed YUV420P.
- `daemon` command in main.c opens the chardev and runs the loop
  until SIGINT/SIGTERM. Cookie correlation handled end-to-end.
- ffmpeg_loader gained av_pix_fmt_desc_get (23 symbols total).

Build:
- CMakeLists adds chardev_client.c + decoder.c; explicit
  -I../include for the shared protocol header.
- Still -Wall -Wextra -Wpedantic clean.

Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):

  $ ffmpeg ... -pix_fmt yuv420p -c:v libvpx-vp9 -frames:v 1 \
           -y /tmp/vp9_test.ivf
  $ python3 ... strip IVF framing → vp9_keyframe.bin (3268 B)

  $ sudo insmod kernel/daedalus_v4l2.ko
  $ daedalus_v4l2_daemon -v daemon &
  $ sudo dd if=vp9_keyframe.bin \
         of=/sys/kernel/debug/daedalus_v4l2/test_decode

  daemon: REQ_DECODE cookie=2 → decoded yuv420p 320x240
          fnv1a=0x6ef10d71 luma=76800 chroma=38400
  kernel: RESP_FRAME cookie=2 status=0 320x240 pixfmt=0
          fnv1a=0x6ef10d71  ← matches daemon ✓

Hash properties verified:
  cookie=2  testsrc 3268 B → 0x6ef10d71  (first decode)
  cookie=3  red     44 B   → 0x7f6e5dc5  (content-dependent ✓)
  cookie=4  testsrc 3268 B → 0x6ef10d71  (deterministic ✓)
  cookie=5  64 B random    → status=101  (ERR_SEND, daemon alive)

Daemon survives bad input (FFmpeg "Invalid sync code" wrapped
into structured ERR_SEND response). Clean SIGTERM shutdown,
clean rmmod.

Phase 8.4 acceptance criteria met:
- ✓ end-to-end kernel→daemon→FFmpeg→kernel round-trip
- ✓ cookie correlation per request/response pair
- ✓ content-dependent + deterministic digest
- ✓ structured error responses (no daemon crash on bad input)
- ✓ clean teardown (SIGTERM + rmmod)
- ✓ builds clean on both kernel kbuild and daemon CMake

Per correctness-before-speed:
- Real chardev I/O (no shortcuts, no select-loop hacks)
- Real FFmpeg AVCodecContext lifecycle (lazily opened, properly
  freed on cleanup)
- Descriptor-driven plane walk (generalises across pix_fmts)
- Structured error path (not just log-and-continue)
- All resource paths cleaned up on every error branch
- Documented why FNV-1a digest, why write() not writev(), why
  pix_desc walk in docs/phase_8_4_closure.md

Phase 8.5 next: V4L2 m2m queue submits REQ_DECODE from
vidioc_qbuf; dmabuf carries actual pixel data so the chardev's
64 KiB cap doesn't gate frame size; begin substituting
daedalus_dispatch_* into the daemon's decode path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 15:22:16 +00:00
marfrit 895f57c63a Phase 8.2: kernel ↔ daemon chardev bridge with round-trip test
Adds /dev/daedalus-v4l2 misc chardev to the kernel module. The
chardev is the IPC channel for the future userspace decoder
daemon: kernel enqueues REQ_* messages, daemon read()s them,
processes, write()s RESP_* back.

Wire protocol (pre-1.0, header in include/daedalus_v4l2_proto.h):
- struct daedalus_msg_hdr: magic (D04V) + version + type +
  cookie + payload_len + reserved
- Request/response separated by high bit of type field
- Max 64 KiB payload per message
- Cookie correlates request with matching response

Kernel implementation (kernel/daedalus_v4l2_chardev.{c,h}):
- Single-instance chardev (-EBUSY on second open)
- In-kernel FIFO bounded at 64 messages
- Blocking + non-blocking read; poll() with EPOLLIN on queued
- write() parses + validates header, logs response at pr_debug
- Bad magic → -EBADMSG, bad version → -EPROTO, oversize → -EMSGSIZE
- All error paths free resources

Phase 8.2 test trigger via debugfs:
- /sys/kernel/debug/daedalus_v4l2/test_ping — any byte
  enqueues a PING with a fixed 24-byte payload. Removed in
  Phase 8.4 when real REQ_DECODE from V4L2 path takes over.

Userspace verification tool (tools/test_chardev_pingpong.c):
- Real C program, proper error reporting via strerror
- Validates the 6-step round-trip: open → empty-queue EAGAIN →
  trigger ping → read PING → verify all fields → write PONG → close
- Builds with -Wall -Wextra clean

Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):
  $ sudo insmod daedalus_v4l2.ko
  $ sudo tools/test_chardev_pingpong
  opening /dev/daedalus-v4l2...
    non-blocking read on empty queue: EAGAIN ✓
    injected PING via debugfs ✓
    read PING: magic ✓ version ✓ type=PING ✓ cookie=0x1234 ✓ payload=24 bytes
      payload: "DAEDALUS-V4L2-PING-PL"
    wrote PONG (cookie=0x1234) ✓
  ALL TESTS PASSED.
  $ sudo rmmod daedalus_v4l2      # clean

Per correctness-before-speed: full kerneldoc on structs, 8-tab
kernel style, SPDX headers, proper error paths, real test
program (not "I ran it once"), failure-mode coverage documented.

Phase 8.3 next: userspace daemon with dlopen'd FFmpeg parse path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 15:05:54 +00:00