Files
daedalus-v4l2/docs/phase_8_7_closure.md
marfrit 5965805d86 Phase 8.7: media controller + multi-frame streaming verification
Two pieces — both shipped:

1. Media controller binding closes the last v4l2-compliance
   failure from 8.6 (DECODER_CMD, which requires has_media on
   stateless decoders) and unlocks the V4L2 request API for
   libva-v4l2-request.

2. Multi-frame streaming test exercises the daemon's
   AVCodecContext state preservation across many REQ_DECODE
   calls — Phase 8.6's tests pushed exactly one keyframe per
   invocation; real content has P-frame references.

Compliance now reaches **49/49 passing.**

Kernel (kernel/daedalus_v4l2_main.{c,h}):
- Added `struct media_device mdev` to daedalus_dev.
- media_device_init(&mdev) BEFORE v4l2_device_register so
  v4l2-core sees v4l2_dev.mdev = &mdev and binds the m2m
  entities into the graph during register.
- After video_register_device:
  v4l2_m2m_register_media_controller(..., MEDIA_ENT_F_PROC_VIDEO_DECODER)
  then media_device_register so userspace sees the complete
  graph in /dev/mediaN with the decoder entity tagged.
- daedalus_remove unwinds in reverse: unregister media,
  unregister mc, unregister video, release m2m, unregister
  v4l2, cleanup mdev.
- Error paths added for both new failure points.

Test harness (tools/test_m2m_stream.c, new):
- Multi-frame V4L2 m2m client: parses IVF → 4-deep buffer
  rings on both queues → per-frame QBUF/DQBUF loop →
  concatenates decoded NV12 to output file. Returns 0 only
  if every input frame decoded without error.
- Same codec vocabulary as test_m2m_decode (vp9 | av1 |
  h264 via 5th arg).

Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):

v4l2-compliance: 49 tests, 49 passed, 0 failed, 0 warnings.

  $ v4l2-ctl --list-devices
  daedalus-fourier V3D7+NEON (platform:daedalus_v4l2):
        /dev/video0
        /dev/media3

VP9 320×240 30 frames (1 keyframe + 29 P-frames, 3.46 MB
NV12): byte-for-byte match vs `ffmpeg -i in.ivf -pix_fmt
nv12 -f rawvideo`.

VP9 1920×1080 10 frames (31 MB NV12 through the dmabuf
path): byte-for-byte match vs same reference command.

Daemon log shows cookies 1..30 all completing cleanly in
order; lazily-opened AVCodecContext maintains reference
frames across the chardev round-trips.

Clean SIGTERM + rmmod, no oops/WARN.

Roadmap update (docs/roadmap.md):
- 8.7 marked closed with closure-doc reference.
- 8.8 reshaped: perf profiling, QPU dispatch substitution
  via daedalus-fourier, multi-frame AV1/H.264, HDR (P010M).

Per correctness-before-speed:
- Order-correct media controller lifecycle (init → bind
  v4l2_dev → register video → register mc → register
  media; reverse for teardown).
- 4-deep buffer rings on both queues — the scheduler
  actually pipelines multiple in-flight cookies through
  the chardev (not just one-at-a-time as in 8.5/8.6 tests).
- Bit-exact comparison against ffmpeg, not "looks right."
- All resource paths cleaned on every error branch.

Phase 8.8 next: profile daemon hot loops, dlopen
daedalus-fourier from the daemon, swap FFmpeg per-block
calls for daedalus_dispatch_* where the kernel matches,
target 30fps@1080p from 30fps-floor-is-fine memory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:21:58 +00:00

7.0 KiB
Raw Permalink Blame History

Phase 8.7 closure — media controller + multi-frame streaming

Status: closed 2026-05-18.

Two pieces:

  1. Media controller binding — closes the last v4l2-compliance failure from Phase 8.6 (DECODER_CMD, which requires has_media on stateless decoders) and unlocks the V4L2 request API for libva-v4l2-request.
  2. Multi-frame streaming verification — Phase 8.6's end-to-end tests pushed exactly one keyframe per invocation. Real-world content has reference frames (P-frames in VP9, etc.), so the daemon must maintain AVCodecContext state across many REQ_DECODE calls. This phase exercises that path with 30+10-frame streams and checks pixel-bit-exact equivalence to a reference FFmpeg decode.

Compliance now reaches 49/49 passing.

What lands

Kernel media controller (kernel/daedalus_v4l2_main.{c,h})

  • New struct media_device mdev field in daedalus_dev.
  • media_device_init(&dev->mdev) before v4l2_device_register so v4l2-core picks up the mdev binding when it sees v4l2_dev.mdev = &dev->mdev.
  • After video_register_device succeeds: v4l2_m2m_register_media_controller(m2m_dev, &dev->vdev, MEDIA_ENT_F_PROC_VIDEO_DECODER) wires up the m2m entities; then media_device_register(&dev->mdev) makes it visible to userspace as /dev/mediaN.
  • Error paths cleaned for the new failure points (err_m2m_mc → unregister mc + vdev; v4l2_device_register failure also cleans mdev).
  • daedalus_remove reverses the order: unregister media, unregister mc, unregister video, release m2m, unregister v4l2, cleanup mdev.
  • Phase banner updated from 8.5 → 8.7.

Test harness (tools/test_m2m_stream.c, new)

  • Multi-frame V4L2 m2m client that:
    1. Parses an IVF file into a struct ivf_frame[] (per- frame size + bitstream blob).
    2. Allocates 4 OUTPUT + 4 CAPTURE buffers (ring of mmap'd OUTPUT mappings; CAPTURE buffers all QBUF'd up front).
    3. For each frame: copy bitstream into OUTPUT[i%N], QBUF, poll, DQBUF OUTPUT, DQBUF CAPTURE, dump NV12 to output file, recycle CAPTURE via QBUF.
    4. Returns 0 only if all input frames decoded without error.
  • Accepts [w] [h] [codec] overrides; same codec vocabulary as test_m2m_decode.

tools/Makefile

  • Adds test_m2m_stream to the build target list.

Verification

v4l2-compliance — full pass

Total for daedalus_v4l2 device /dev/video0:
  49 tests, Succeeded: 49, Failed: 0, Warnings: 0

Progression:

  • 8.1: 44/48
  • 8.5: 44/48 (different fails)
  • 8.6: 47/48
  • 8.7: 49/49 (media controller added one more pass-eligible test and it passes)
$ v4l2-ctl --list-devices
daedalus-fourier V3D7+NEON (platform:daedalus_v4l2):
        /dev/video0
        /dev/media3

30-frame VP9 320×240 stream, byte-exact

$ ffmpeg -f lavfi -i 'testsrc=duration=1.2:size=320x240:rate=25' \
       -pix_fmt yuv420p -c:v libvpx-vp9 -y /tmp/vp9_stream.ivf
$ ffmpeg -i /tmp/vp9_stream.ivf -pix_fmt nv12 -f rawvideo \
       -y /tmp/vp9_stream_ref.nv12

$ sudo ./tools/test_m2m_stream /tmp/vp9_stream.ivf \
       /tmp/vp9_stream_out.nv12
  parsed 30 frames, 320x240
  OUTPUT reqbufs -> 4
  CAPTURE reqbufs -> 4
  STREAMON both
  decoded 30 / 30 frames to /tmp/vp9_stream_out.nv12

$ cmp /tmp/vp9_stream_out.nv12 /tmp/vp9_stream_ref.nv12
$ echo $?
0   # 3.46 MB across 30 frames, byte-for-byte match

1 keyframe + 29 P-frames. The daemon's lazily-opened AVCodecContext maintains reference frames across the 30 sequential REQ_DECODE calls — pixel-bit-exact equivalence proves the decoder state is preserved correctly across the chardev round-trips.

10-frame VP9 1080p stream, byte-exact

$ sudo ./tools/test_m2m_stream /tmp/vp9_1080_stream.ivf \
       /tmp/vp9_1080_stream_out.nv12 1920 1080
  parsed 10 frames, 1920x1080
  ...
  decoded 10 / 10 frames

$ cmp /tmp/vp9_1080_stream_out.nv12 /tmp/vp9_1080_stream_ref.nv12
0   # 31 MB across 10 frames, byte-for-byte match

Combined with Phase 8.6's single-frame 1080p test: real video content streams correctly through the dmabuf-driven m2m path at 1080p.

Clean teardown

$ pkill -TERM daedalus_v4l2_daemon
$ sudo rmmod daedalus_v4l2
$ sudo dmesg | grep -E 'BUG|oops'
(empty)

No kernel oops / WARN. 4-deep buffer rings on both queues mean the scheduler is actually pipelining requests through the chardev (multiple in-flight cookies) — no deadlocks, no fd leaks, no buffer-state corruption.

Design decisions

Why register the media controller in this order?

media_device_init must run before v4l2_device_register because v4l2-core uses v4l2_dev.mdev to add the v4l2 entities into the media graph during register.

v4l2_m2m_register_media_controller must run after video_register_device because it creates the I/O entity tied to vdev->num, which only exists after register.

media_device_register must be last so userspace sees the graph in a consistent state (all entities + links present).

Reverse for tear-down. Caught by reading drivers/cedrus, hantro, rkvdec.

Why 4-deep buffer rings?

Phase 8.5 used REQBUFS count=1 — just enough to prove QBUF/DQBUF works. Real-world workloads pipeline: while the daemon decodes frame N, the client wants to QBUF N+1. A ring of 4 OUTPUT and 4 CAPTURE buffers gives the scheduler slack to maintain forward progress under poll() latency spikes. Bumping further has diminishing returns because the test serialises DQBUF after each QBUF anyway — but the infrastructure is in place for asynchronous pipelining when Phase 8.8 starts profiling throughput.

Why test_m2m_stream is its own binary

test_m2m_decode is the per-frame smoke test (single QBUF/ DQBUF cycle, exits on first error, ideal for bisecting). test_m2m_stream is the soak test (long-running, exercises the FFmpeg reference-frame plumbing, ideal for regression). Keeping them separate keeps each binary's intent and exit semantics clear.

What's NOT here (deferred to Phase 8.8)

  • Performance profiling and QPU dispatch. Currently the daemon decodes entirely on CPU via FFmpeg. Substituting per-block daedalus_dispatch_* calls into FFmpeg's hot path for the kernels where our V3D7 implementation matches is the road to 30fps@1080p (the 30fps-floor-is-fine memory's user-facing criterion).
  • HDR / 10-bit. CAPTURE is NV12M only; no P010M or YUV420P10LE pack path.
  • Multi-codec stream tests. Phase 8.7's streaming tests are VP9 only (multi-frame test files are easy to make). AV1 + H.264 multi-frame round-trips should be done by Phase 8.8.

Phase 8.8 plan

  1. Profile daemon end-to-end on hertz: identify FFmpeg hot functions for VP9 / AV1 / H.264.
  2. Map matching daedalus_dispatch_* calls (IDCT, MC, deblock, qpel — from cycles 1, 2, 4, 9).
  3. dlopen-binding from the daemon into daedalus-fourier's per-kernel entry points (sibling repo).
  4. Validate bit-exactness after each substitution.
  5. Measure fps@1080p; target 30fps stable on VP9 (8.6's single-frame proved 1080p works; 8.8 makes it real-time).
  6. Multi-frame AV1 + H.264 round-trips.
  7. P010M / YUV420P10LE for HDR sources.