5965805d86
Two pieces — both shipped:
1. Media controller binding closes the last v4l2-compliance
failure from 8.6 (DECODER_CMD, which requires has_media on
stateless decoders) and unlocks the V4L2 request API for
libva-v4l2-request.
2. Multi-frame streaming test exercises the daemon's
AVCodecContext state preservation across many REQ_DECODE
calls — Phase 8.6's tests pushed exactly one keyframe per
invocation; real content has P-frame references.
Compliance now reaches **49/49 passing.**
Kernel (kernel/daedalus_v4l2_main.{c,h}):
- Added `struct media_device mdev` to daedalus_dev.
- media_device_init(&mdev) BEFORE v4l2_device_register so
v4l2-core sees v4l2_dev.mdev = &mdev and binds the m2m
entities into the graph during register.
- After video_register_device:
v4l2_m2m_register_media_controller(..., MEDIA_ENT_F_PROC_VIDEO_DECODER)
then media_device_register so userspace sees the complete
graph in /dev/mediaN with the decoder entity tagged.
- daedalus_remove unwinds in reverse: unregister media,
unregister mc, unregister video, release m2m, unregister
v4l2, cleanup mdev.
- Error paths added for both new failure points.
Test harness (tools/test_m2m_stream.c, new):
- Multi-frame V4L2 m2m client: parses IVF → 4-deep buffer
rings on both queues → per-frame QBUF/DQBUF loop →
concatenates decoded NV12 to output file. Returns 0 only
if every input frame decoded without error.
- Same codec vocabulary as test_m2m_decode (vp9 | av1 |
h264 via 5th arg).
Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):
v4l2-compliance: 49 tests, 49 passed, 0 failed, 0 warnings.
$ v4l2-ctl --list-devices
daedalus-fourier V3D7+NEON (platform:daedalus_v4l2):
/dev/video0
/dev/media3
VP9 320×240 30 frames (1 keyframe + 29 P-frames, 3.46 MB
NV12): byte-for-byte match vs `ffmpeg -i in.ivf -pix_fmt
nv12 -f rawvideo`.
VP9 1920×1080 10 frames (31 MB NV12 through the dmabuf
path): byte-for-byte match vs same reference command.
Daemon log shows cookies 1..30 all completing cleanly in
order; lazily-opened AVCodecContext maintains reference
frames across the chardev round-trips.
Clean SIGTERM + rmmod, no oops/WARN.
Roadmap update (docs/roadmap.md):
- 8.7 marked closed with closure-doc reference.
- 8.8 reshaped: perf profiling, QPU dispatch substitution
via daedalus-fourier, multi-frame AV1/H.264, HDR (P010M).
Per correctness-before-speed:
- Order-correct media controller lifecycle (init → bind
v4l2_dev → register video → register mc → register
media; reverse for teardown).
- 4-deep buffer rings on both queues — the scheduler
actually pipelines multiple in-flight cookies through
the chardev (not just one-at-a-time as in 8.5/8.6 tests).
- Bit-exact comparison against ffmpeg, not "looks right."
- All resource paths cleaned on every error branch.
Phase 8.8 next: profile daemon hot loops, dlopen
daedalus-fourier from the daemon, swap FFmpeg per-block
calls for daedalus_dispatch_* where the kernel matches,
target 30fps@1080p from 30fps-floor-is-fine memory.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
208 lines
7.0 KiB
Markdown
208 lines
7.0 KiB
Markdown
# Phase 8.7 closure — media controller + multi-frame streaming
|
||
|
||
**Status:** closed 2026-05-18.
|
||
|
||
Two pieces:
|
||
|
||
1. **Media controller binding** — closes the last
|
||
v4l2-compliance failure from Phase 8.6 (DECODER_CMD,
|
||
which requires `has_media` on stateless decoders) and
|
||
unlocks the V4L2 request API for libva-v4l2-request.
|
||
2. **Multi-frame streaming verification** — Phase 8.6's
|
||
end-to-end tests pushed exactly one keyframe per
|
||
invocation. Real-world content has reference frames
|
||
(P-frames in VP9, etc.), so the daemon must maintain
|
||
AVCodecContext state across many REQ_DECODE calls.
|
||
This phase exercises that path with 30+10-frame
|
||
streams and checks pixel-bit-exact equivalence to a
|
||
reference FFmpeg decode.
|
||
|
||
Compliance now reaches **49/49 passing.**
|
||
|
||
## What lands
|
||
|
||
### Kernel media controller (`kernel/daedalus_v4l2_main.{c,h}`)
|
||
- New `struct media_device mdev` field in `daedalus_dev`.
|
||
- `media_device_init(&dev->mdev)` before
|
||
`v4l2_device_register` so v4l2-core picks up the
|
||
mdev binding when it sees `v4l2_dev.mdev = &dev->mdev`.
|
||
- After `video_register_device` succeeds:
|
||
`v4l2_m2m_register_media_controller(m2m_dev, &dev->vdev,
|
||
MEDIA_ENT_F_PROC_VIDEO_DECODER)` wires up the m2m
|
||
entities; then `media_device_register(&dev->mdev)` makes
|
||
it visible to userspace as `/dev/mediaN`.
|
||
- Error paths cleaned for the new failure points
|
||
(err_m2m_mc → unregister mc + vdev; v4l2_device_register
|
||
failure also cleans mdev).
|
||
- `daedalus_remove` reverses the order: unregister media,
|
||
unregister mc, unregister video, release m2m, unregister
|
||
v4l2, cleanup mdev.
|
||
- Phase banner updated from 8.5 → 8.7.
|
||
|
||
### Test harness (`tools/test_m2m_stream.c`, new)
|
||
- Multi-frame V4L2 m2m client that:
|
||
1. Parses an IVF file into a `struct ivf_frame[]` (per-
|
||
frame size + bitstream blob).
|
||
2. Allocates 4 OUTPUT + 4 CAPTURE buffers (ring of
|
||
mmap'd OUTPUT mappings; CAPTURE buffers all QBUF'd
|
||
up front).
|
||
3. For each frame: copy bitstream into OUTPUT[i%N],
|
||
QBUF, poll, DQBUF OUTPUT, DQBUF CAPTURE, dump NV12
|
||
to output file, recycle CAPTURE via QBUF.
|
||
4. Returns 0 only if **all** input frames decoded
|
||
without error.
|
||
- Accepts `[w] [h] [codec]` overrides; same codec
|
||
vocabulary as `test_m2m_decode`.
|
||
|
||
### tools/Makefile
|
||
- Adds `test_m2m_stream` to the build target list.
|
||
|
||
## Verification
|
||
|
||
### v4l2-compliance — full pass
|
||
|
||
```
|
||
Total for daedalus_v4l2 device /dev/video0:
|
||
49 tests, Succeeded: 49, Failed: 0, Warnings: 0
|
||
```
|
||
|
||
Progression:
|
||
- 8.1: 44/48
|
||
- 8.5: 44/48 (different fails)
|
||
- 8.6: 47/48
|
||
- **8.7: 49/49** (media controller added one more pass-eligible
|
||
test and it passes)
|
||
|
||
```
|
||
$ v4l2-ctl --list-devices
|
||
daedalus-fourier V3D7+NEON (platform:daedalus_v4l2):
|
||
/dev/video0
|
||
/dev/media3
|
||
```
|
||
|
||
### 30-frame VP9 320×240 stream, byte-exact
|
||
|
||
```
|
||
$ ffmpeg -f lavfi -i 'testsrc=duration=1.2:size=320x240:rate=25' \
|
||
-pix_fmt yuv420p -c:v libvpx-vp9 -y /tmp/vp9_stream.ivf
|
||
$ ffmpeg -i /tmp/vp9_stream.ivf -pix_fmt nv12 -f rawvideo \
|
||
-y /tmp/vp9_stream_ref.nv12
|
||
|
||
$ sudo ./tools/test_m2m_stream /tmp/vp9_stream.ivf \
|
||
/tmp/vp9_stream_out.nv12
|
||
parsed 30 frames, 320x240
|
||
OUTPUT reqbufs -> 4
|
||
CAPTURE reqbufs -> 4
|
||
STREAMON both
|
||
decoded 30 / 30 frames to /tmp/vp9_stream_out.nv12
|
||
|
||
$ cmp /tmp/vp9_stream_out.nv12 /tmp/vp9_stream_ref.nv12
|
||
$ echo $?
|
||
0 # 3.46 MB across 30 frames, byte-for-byte match
|
||
```
|
||
|
||
1 keyframe + 29 P-frames. The daemon's lazily-opened
|
||
AVCodecContext maintains reference frames across the
|
||
30 sequential REQ_DECODE calls — pixel-bit-exact equivalence
|
||
proves the decoder state is preserved correctly across the
|
||
chardev round-trips.
|
||
|
||
### 10-frame VP9 1080p stream, byte-exact
|
||
|
||
```
|
||
$ sudo ./tools/test_m2m_stream /tmp/vp9_1080_stream.ivf \
|
||
/tmp/vp9_1080_stream_out.nv12 1920 1080
|
||
parsed 10 frames, 1920x1080
|
||
...
|
||
decoded 10 / 10 frames
|
||
|
||
$ cmp /tmp/vp9_1080_stream_out.nv12 /tmp/vp9_1080_stream_ref.nv12
|
||
0 # 31 MB across 10 frames, byte-for-byte match
|
||
```
|
||
|
||
Combined with Phase 8.6's single-frame 1080p test: real video
|
||
content streams correctly through the dmabuf-driven m2m path
|
||
at 1080p.
|
||
|
||
### Clean teardown
|
||
|
||
```
|
||
$ pkill -TERM daedalus_v4l2_daemon
|
||
$ sudo rmmod daedalus_v4l2
|
||
$ sudo dmesg | grep -E 'BUG|oops'
|
||
(empty)
|
||
```
|
||
|
||
No kernel oops / WARN. 4-deep buffer rings on both queues
|
||
mean the scheduler is actually pipelining requests through
|
||
the chardev (multiple in-flight cookies) — no deadlocks, no
|
||
fd leaks, no buffer-state corruption.
|
||
|
||
## Design decisions
|
||
|
||
### Why register the media controller in this order?
|
||
|
||
`media_device_init` must run before `v4l2_device_register`
|
||
because v4l2-core uses `v4l2_dev.mdev` to add the v4l2
|
||
entities into the media graph during register.
|
||
|
||
`v4l2_m2m_register_media_controller` must run *after*
|
||
`video_register_device` because it creates the I/O entity
|
||
tied to `vdev->num`, which only exists after register.
|
||
|
||
`media_device_register` must be last so userspace sees the
|
||
graph in a consistent state (all entities + links present).
|
||
|
||
Reverse for tear-down. Caught by reading drivers/cedrus,
|
||
hantro, rkvdec.
|
||
|
||
### Why 4-deep buffer rings?
|
||
|
||
Phase 8.5 used REQBUFS count=1 — just enough to prove
|
||
QBUF/DQBUF works. Real-world workloads pipeline: while the
|
||
daemon decodes frame N, the client wants to QBUF N+1. A ring
|
||
of 4 OUTPUT and 4 CAPTURE buffers gives the scheduler
|
||
slack to maintain forward progress under poll() latency
|
||
spikes. Bumping further has diminishing returns because the
|
||
test serialises DQBUF after each QBUF anyway — but the
|
||
infrastructure is in place for asynchronous pipelining when
|
||
Phase 8.8 starts profiling throughput.
|
||
|
||
### Why test_m2m_stream is its own binary
|
||
|
||
`test_m2m_decode` is the per-frame smoke test (single QBUF/
|
||
DQBUF cycle, exits on first error, ideal for bisecting).
|
||
`test_m2m_stream` is the soak test (long-running, exercises
|
||
the FFmpeg reference-frame plumbing, ideal for regression).
|
||
Keeping them separate keeps each binary's intent and exit
|
||
semantics clear.
|
||
|
||
## What's NOT here (deferred to Phase 8.8)
|
||
|
||
- **Performance profiling and QPU dispatch.** Currently the
|
||
daemon decodes entirely on CPU via FFmpeg. Substituting
|
||
per-block `daedalus_dispatch_*` calls into FFmpeg's hot
|
||
path for the kernels where our V3D7 implementation matches
|
||
is the road to 30fps@1080p (the
|
||
`30fps-floor-is-fine` memory's user-facing criterion).
|
||
- **HDR / 10-bit.** CAPTURE is NV12M only; no P010M or
|
||
YUV420P10LE pack path.
|
||
- **Multi-codec stream tests.** Phase 8.7's streaming tests
|
||
are VP9 only (multi-frame test files are easy to make).
|
||
AV1 + H.264 multi-frame round-trips should be done by
|
||
Phase 8.8.
|
||
|
||
## Phase 8.8 plan
|
||
|
||
1. Profile daemon end-to-end on hertz: identify FFmpeg hot
|
||
functions for VP9 / AV1 / H.264.
|
||
2. Map matching `daedalus_dispatch_*` calls (IDCT, MC,
|
||
deblock, qpel — from cycles 1, 2, 4, 9).
|
||
3. dlopen-binding from the daemon into daedalus-fourier's
|
||
per-kernel entry points (sibling repo).
|
||
4. Validate bit-exactness after each substitution.
|
||
5. Measure fps@1080p; target 30fps stable on VP9 (8.6's
|
||
single-frame proved 1080p works; 8.8 makes it real-time).
|
||
6. Multi-frame AV1 + H.264 round-trips.
|
||
7. P010M / YUV420P10LE for HDR sources.
|