Files
daedalus-v4l2/docs/phase_8_7_closure.md
T
marfrit 5965805d86 Phase 8.7: media controller + multi-frame streaming verification
Two pieces — both shipped:

1. Media controller binding closes the last v4l2-compliance
   failure from 8.6 (DECODER_CMD, which requires has_media on
   stateless decoders) and unlocks the V4L2 request API for
   libva-v4l2-request.

2. Multi-frame streaming test exercises the daemon's
   AVCodecContext state preservation across many REQ_DECODE
   calls — Phase 8.6's tests pushed exactly one keyframe per
   invocation; real content has P-frame references.

Compliance now reaches **49/49 passing.**

Kernel (kernel/daedalus_v4l2_main.{c,h}):
- Added `struct media_device mdev` to daedalus_dev.
- media_device_init(&mdev) BEFORE v4l2_device_register so
  v4l2-core sees v4l2_dev.mdev = &mdev and binds the m2m
  entities into the graph during register.
- After video_register_device:
  v4l2_m2m_register_media_controller(..., MEDIA_ENT_F_PROC_VIDEO_DECODER)
  then media_device_register so userspace sees the complete
  graph in /dev/mediaN with the decoder entity tagged.
- daedalus_remove unwinds in reverse: unregister media,
  unregister mc, unregister video, release m2m, unregister
  v4l2, cleanup mdev.
- Error paths added for both new failure points.

Test harness (tools/test_m2m_stream.c, new):
- Multi-frame V4L2 m2m client: parses IVF → 4-deep buffer
  rings on both queues → per-frame QBUF/DQBUF loop →
  concatenates decoded NV12 to output file. Returns 0 only
  if every input frame decoded without error.
- Same codec vocabulary as test_m2m_decode (vp9 | av1 |
  h264 via 5th arg).

Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):

v4l2-compliance: 49 tests, 49 passed, 0 failed, 0 warnings.

  $ v4l2-ctl --list-devices
  daedalus-fourier V3D7+NEON (platform:daedalus_v4l2):
        /dev/video0
        /dev/media3

VP9 320×240 30 frames (1 keyframe + 29 P-frames, 3.46 MB
NV12): byte-for-byte match vs `ffmpeg -i in.ivf -pix_fmt
nv12 -f rawvideo`.

VP9 1920×1080 10 frames (31 MB NV12 through the dmabuf
path): byte-for-byte match vs same reference command.

Daemon log shows cookies 1..30 all completing cleanly in
order; lazily-opened AVCodecContext maintains reference
frames across the chardev round-trips.

Clean SIGTERM + rmmod, no oops/WARN.

Roadmap update (docs/roadmap.md):
- 8.7 marked closed with closure-doc reference.
- 8.8 reshaped: perf profiling, QPU dispatch substitution
  via daedalus-fourier, multi-frame AV1/H.264, HDR (P010M).

Per correctness-before-speed:
- Order-correct media controller lifecycle (init → bind
  v4l2_dev → register video → register mc → register
  media; reverse for teardown).
- 4-deep buffer rings on both queues — the scheduler
  actually pipelines multiple in-flight cookies through
  the chardev (not just one-at-a-time as in 8.5/8.6 tests).
- Bit-exact comparison against ffmpeg, not "looks right."
- All resource paths cleaned on every error branch.

Phase 8.8 next: profile daemon hot loops, dlopen
daedalus-fourier from the daemon, swap FFmpeg per-block
calls for daedalus_dispatch_* where the kernel matches,
target 30fps@1080p from 30fps-floor-is-fine memory.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 16:21:58 +00:00

208 lines
7.0 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Phase 8.7 closure — media controller + multi-frame streaming
**Status:** closed 2026-05-18.
Two pieces:
1. **Media controller binding** — closes the last
v4l2-compliance failure from Phase 8.6 (DECODER_CMD,
which requires `has_media` on stateless decoders) and
unlocks the V4L2 request API for libva-v4l2-request.
2. **Multi-frame streaming verification** — Phase 8.6's
end-to-end tests pushed exactly one keyframe per
invocation. Real-world content has reference frames
(P-frames in VP9, etc.), so the daemon must maintain
AVCodecContext state across many REQ_DECODE calls.
This phase exercises that path with 30+10-frame
streams and checks pixel-bit-exact equivalence to a
reference FFmpeg decode.
Compliance now reaches **49/49 passing.**
## What lands
### Kernel media controller (`kernel/daedalus_v4l2_main.{c,h}`)
- New `struct media_device mdev` field in `daedalus_dev`.
- `media_device_init(&dev->mdev)` before
`v4l2_device_register` so v4l2-core picks up the
mdev binding when it sees `v4l2_dev.mdev = &dev->mdev`.
- After `video_register_device` succeeds:
`v4l2_m2m_register_media_controller(m2m_dev, &dev->vdev,
MEDIA_ENT_F_PROC_VIDEO_DECODER)` wires up the m2m
entities; then `media_device_register(&dev->mdev)` makes
it visible to userspace as `/dev/mediaN`.
- Error paths cleaned for the new failure points
(err_m2m_mc → unregister mc + vdev; v4l2_device_register
failure also cleans mdev).
- `daedalus_remove` reverses the order: unregister media,
unregister mc, unregister video, release m2m, unregister
v4l2, cleanup mdev.
- Phase banner updated from 8.5 → 8.7.
### Test harness (`tools/test_m2m_stream.c`, new)
- Multi-frame V4L2 m2m client that:
1. Parses an IVF file into a `struct ivf_frame[]` (per-
frame size + bitstream blob).
2. Allocates 4 OUTPUT + 4 CAPTURE buffers (ring of
mmap'd OUTPUT mappings; CAPTURE buffers all QBUF'd
up front).
3. For each frame: copy bitstream into OUTPUT[i%N],
QBUF, poll, DQBUF OUTPUT, DQBUF CAPTURE, dump NV12
to output file, recycle CAPTURE via QBUF.
4. Returns 0 only if **all** input frames decoded
without error.
- Accepts `[w] [h] [codec]` overrides; same codec
vocabulary as `test_m2m_decode`.
### tools/Makefile
- Adds `test_m2m_stream` to the build target list.
## Verification
### v4l2-compliance — full pass
```
Total for daedalus_v4l2 device /dev/video0:
49 tests, Succeeded: 49, Failed: 0, Warnings: 0
```
Progression:
- 8.1: 44/48
- 8.5: 44/48 (different fails)
- 8.6: 47/48
- **8.7: 49/49** (media controller added one more pass-eligible
test and it passes)
```
$ v4l2-ctl --list-devices
daedalus-fourier V3D7+NEON (platform:daedalus_v4l2):
/dev/video0
/dev/media3
```
### 30-frame VP9 320×240 stream, byte-exact
```
$ ffmpeg -f lavfi -i 'testsrc=duration=1.2:size=320x240:rate=25' \
-pix_fmt yuv420p -c:v libvpx-vp9 -y /tmp/vp9_stream.ivf
$ ffmpeg -i /tmp/vp9_stream.ivf -pix_fmt nv12 -f rawvideo \
-y /tmp/vp9_stream_ref.nv12
$ sudo ./tools/test_m2m_stream /tmp/vp9_stream.ivf \
/tmp/vp9_stream_out.nv12
parsed 30 frames, 320x240
OUTPUT reqbufs -> 4
CAPTURE reqbufs -> 4
STREAMON both
decoded 30 / 30 frames to /tmp/vp9_stream_out.nv12
$ cmp /tmp/vp9_stream_out.nv12 /tmp/vp9_stream_ref.nv12
$ echo $?
0 # 3.46 MB across 30 frames, byte-for-byte match
```
1 keyframe + 29 P-frames. The daemon's lazily-opened
AVCodecContext maintains reference frames across the
30 sequential REQ_DECODE calls — pixel-bit-exact equivalence
proves the decoder state is preserved correctly across the
chardev round-trips.
### 10-frame VP9 1080p stream, byte-exact
```
$ sudo ./tools/test_m2m_stream /tmp/vp9_1080_stream.ivf \
/tmp/vp9_1080_stream_out.nv12 1920 1080
parsed 10 frames, 1920x1080
...
decoded 10 / 10 frames
$ cmp /tmp/vp9_1080_stream_out.nv12 /tmp/vp9_1080_stream_ref.nv12
0 # 31 MB across 10 frames, byte-for-byte match
```
Combined with Phase 8.6's single-frame 1080p test: real video
content streams correctly through the dmabuf-driven m2m path
at 1080p.
### Clean teardown
```
$ pkill -TERM daedalus_v4l2_daemon
$ sudo rmmod daedalus_v4l2
$ sudo dmesg | grep -E 'BUG|oops'
(empty)
```
No kernel oops / WARN. 4-deep buffer rings on both queues
mean the scheduler is actually pipelining requests through
the chardev (multiple in-flight cookies) — no deadlocks, no
fd leaks, no buffer-state corruption.
## Design decisions
### Why register the media controller in this order?
`media_device_init` must run before `v4l2_device_register`
because v4l2-core uses `v4l2_dev.mdev` to add the v4l2
entities into the media graph during register.
`v4l2_m2m_register_media_controller` must run *after*
`video_register_device` because it creates the I/O entity
tied to `vdev->num`, which only exists after register.
`media_device_register` must be last so userspace sees the
graph in a consistent state (all entities + links present).
Reverse for tear-down. Caught by reading drivers/cedrus,
hantro, rkvdec.
### Why 4-deep buffer rings?
Phase 8.5 used REQBUFS count=1 — just enough to prove
QBUF/DQBUF works. Real-world workloads pipeline: while the
daemon decodes frame N, the client wants to QBUF N+1. A ring
of 4 OUTPUT and 4 CAPTURE buffers gives the scheduler
slack to maintain forward progress under poll() latency
spikes. Bumping further has diminishing returns because the
test serialises DQBUF after each QBUF anyway — but the
infrastructure is in place for asynchronous pipelining when
Phase 8.8 starts profiling throughput.
### Why test_m2m_stream is its own binary
`test_m2m_decode` is the per-frame smoke test (single QBUF/
DQBUF cycle, exits on first error, ideal for bisecting).
`test_m2m_stream` is the soak test (long-running, exercises
the FFmpeg reference-frame plumbing, ideal for regression).
Keeping them separate keeps each binary's intent and exit
semantics clear.
## What's NOT here (deferred to Phase 8.8)
- **Performance profiling and QPU dispatch.** Currently the
daemon decodes entirely on CPU via FFmpeg. Substituting
per-block `daedalus_dispatch_*` calls into FFmpeg's hot
path for the kernels where our V3D7 implementation matches
is the road to 30fps@1080p (the
`30fps-floor-is-fine` memory's user-facing criterion).
- **HDR / 10-bit.** CAPTURE is NV12M only; no P010M or
YUV420P10LE pack path.
- **Multi-codec stream tests.** Phase 8.7's streaming tests
are VP9 only (multi-frame test files are easy to make).
AV1 + H.264 multi-frame round-trips should be done by
Phase 8.8.
## Phase 8.8 plan
1. Profile daemon end-to-end on hertz: identify FFmpeg hot
functions for VP9 / AV1 / H.264.
2. Map matching `daedalus_dispatch_*` calls (IDCT, MC,
deblock, qpel — from cycles 1, 2, 4, 9).
3. dlopen-binding from the daemon into daedalus-fourier's
per-kernel entry points (sibling repo).
4. Validate bit-exactness after each substitution.
5. Measure fps@1080p; target 30fps stable on VP9 (8.6's
single-frame proved 1080p works; 8.8 makes it real-time).
6. Multi-frame AV1 + H.264 round-trips.
7. P010M / YUV420P10LE for HDR sources.