f04d7000f8
The project's consumer-side goal landed: a real VAAPI consumer
(ffmpeg with -hwaccel vaapi) drives our libva backend → V4L2
driver → daemon → byte-exact NV12 output back to ffmpeg.
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
-hwaccel_output_format nv12 -i vp9_small.ivf \
-f rawvideo -y /tmp/vp9_via_libva.nv12
cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12 → match
18432-byte NV12 byte-for-byte identical to plain ffmpeg
-pix_fmt nv12 software decode. The project_consumer_target
memory's deliverable shape — "V4L2 stateless node consumed by
a real VAAPI client" — is achieved.
Two related kernel changes:
1. v4l2_ctrl_handler_setup(&ctx->hdl) after registration —
matches rkvdec/cedrus/hantro. Brings each registered
compound control out of "uninitialised" state via
std_init_compound defaults.
2. Per-request control completion in the decode path —
the real fix for "Timeout when waiting for media request".
vb2-core's vb2_buffer_done unbinds the BUFFER's req_obj
on normal decode completion, but the per-request CONTROL
object stays bound. buf_request_complete fires only from
queue-cancel paths (vb2-core line 2284), NOT from normal
buf_done. The driver must call
v4l2_ctrl_request_complete(req, hdl) explicitly from the
completion path.
struct daedalus_inflight gained a `struct media_request
*req` field, captured from src_buf->vb2_buf.req_obj.req
in device_run. daedalus_complete_resp_frame then calls
v4l2_ctrl_request_complete before
v4l2_m2m_buf_done_and_job_finish — triggers
MEDIA_REQUEST_STATE_COMPLETE and wakes the request fd
poll.
For non-request flows (test_m2m_stream direct QBUF)
inf->req is NULL; the conditional skips the call.
Both consumer styles work concurrently.
Diagnostic clarification (was Phase 8.13a):
strace traced three S_EXT_CTRLS calls per frame:
1. H264_PROFILE + H264_LEVEL → EINVAL (we don't register)
2. HEVC_PROFILE + HEVC_LEVEL → EINVAL (we don't register)
3. VP9_FRAME + VP9_COMPRESSED_HDR → SUCCESS
The first two are harmless: libva probes whether we support
H264/HEVC integer profile/level controls during config
negotiation; we don't (we expose them as stateless), so EINVAL
just falls through. The actual VP9 stateless controls (#3)
succeeded all along — the libva-side "Unable to set control(s)"
log was misleading us into thinking the control path was the
bug.
Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):
daemon log:
REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes
decoder: opened vp9 context
decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe ...
ffmpeg side:
no Timeout, no Decoding error
/tmp/vp9_via_libva.nv12: 18432 bytes
cmp vs reference: byte-for-byte identical.
Roadmap update:
- 8.10/8.11, 8.12, 8.13 marked closed with closure docs.
- 8.14 = multi-frame VP9 via libva, AV1 + H.264, mpv/Firefox
higher-level consumers.
Per correctness-before-speed:
- strace + kernel-source-reading found the actual root cause
rather than guessing.
- Conditional v4l2_ctrl_request_complete preserves the existing
test_m2m_stream non-request path — both consumer styles work
concurrently without per-flow branching elsewhere.
- Byte-exact pixel comparison, not "frame size matches."
Phase 8.14 next: multi-frame stream + multi-codec via libva +
mpv/Firefox.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
202 lines
7.4 KiB
Markdown
202 lines
7.4 KiB
Markdown
# daedalus-v4l2 — roadmap
|
||
|
||
## Sub-phases
|
||
|
||
### Phase 8.1 — kernel module skeleton
|
||
|
||
Out-of-tree kernel module that:
|
||
- Registers `/dev/videoNN` with `VFL_TYPE_VIDEO` + a no-op
|
||
V4L2 stateless dispatch table.
|
||
- Accepts open/close, S_FMT, REQBUFS ioctls without doing
|
||
anything (yet).
|
||
- Builds against `/lib/modules/$(uname -r)/build`.
|
||
|
||
Deliverable: `modprobe daedalus_v4l2` works, `v4l2-ctl --list-devices`
|
||
shows the new device.
|
||
|
||
### Phase 8.2 — kernel ↔ daemon chardev bridge
|
||
|
||
- Kernel module creates `/dev/daedalus-v4l2` chardev.
|
||
- Defines a simple req/resp protocol in `include/daedalus_v4l2_proto.h`.
|
||
- Daemon connects, exchanges echo requests.
|
||
|
||
Deliverable: ping-pong test passes.
|
||
|
||
### Phase 8.3 — daemon FFmpeg dlopen + parse
|
||
|
||
- Daemon links `libdaedalus_core.a` from sibling.
|
||
- Daemon dlopens FFmpeg.
|
||
- Test program: feed a VP9 IVF file to FFmpeg parsers,
|
||
extract block-level metadata, validate against expected.
|
||
|
||
Deliverable: daemon can parse a VP9 frame and walk the
|
||
block-level info.
|
||
|
||
### Phase 8.4 — daemon ↔ kernel decode round-trip (closed 2026-05-18)
|
||
|
||
Shipped as a debugfs-triggered chardev round-trip rather than
|
||
the original V4L2-ioctl plan (which moved to Phase 8.5).
|
||
|
||
- REQ_DECODE / RESP_FRAME wire protocol
|
||
- Daemon decodes VP9 via FFmpeg dlopen, returns FNV-1a digest
|
||
- Verified content-dependent + deterministic; structured
|
||
error handling for bad bitstreams
|
||
|
||
See `docs/phase_8_4_closure.md`.
|
||
|
||
### Phase 8.5 — full V4L2 m2m driver (closed 2026-05-18)
|
||
|
||
Real V4L2 m2m driver — userspace clients drive
|
||
`S_FMT`/`REQBUFS`/`QBUF`/`DQBUF` the standard way. Bitstream
|
||
flows kernel→daemon as inline REQ_DECODE payload; decoded NV12
|
||
pixels flow daemon→kernel as inline RESP_FRAME payload. Works
|
||
end-to-end for small frames (≤ ~64 KiB NV12).
|
||
|
||
Deliverable hit: kernel m2m driver passes most v4l2-compliance
|
||
checks; `tools/test_m2m_decode` produces a NV12 frame that's
|
||
byte-for-byte identical to `ffmpeg -pix_fmt nv12` reference.
|
||
|
||
See `docs/phase_8_5_closure.md`.
|
||
|
||
### Phase 8.6 — dmabuf + AV1 + H.264 + stateless controls (closed 2026-05-18)
|
||
|
||
- CAPTURE (and OUTPUT) on `vb2_dma_contig_memops`.
|
||
- New `DAEDALUS_IOC_GET_DMABUF` chardev ioctl — daemon
|
||
mmaps the in-flight CAPTURE buffer, decodes pixels in
|
||
place, sends RESP_FRAME metadata-only.
|
||
- 64 KiB frame-size cap removed. 1080p VP9 + 128×96 AV1
|
||
+ 128×96 H.264 all byte-exact against reference FFmpeg
|
||
decode.
|
||
- V4L2 stateless controls registered for VP9 / AV1 /
|
||
H.264 (11 controls visible to userspace).
|
||
- Colorspace round-trip fix (TRY_FMT preserve, S_FMT
|
||
OUTPUT→CAPTURE propagation).
|
||
- Cookie unified across V4L2 + debugfs paths.
|
||
- v4l2-compliance: 47/48 (only DECODER_CMD remains,
|
||
needs media controller — moved to 8.7).
|
||
|
||
See `docs/phase_8_6_closure.md`.
|
||
|
||
### Phase 8.7 — media controller + multi-frame streaming (closed 2026-05-18)
|
||
|
||
- Media controller bound via
|
||
`v4l2_m2m_register_media_controller` +
|
||
`media_device_register`; `/dev/mediaN` published.
|
||
- `tools/test_m2m_stream` parses IVF and pushes frames
|
||
sequentially through a 4-deep buffer ring; daemon
|
||
AVCodecContext preserves reference frames across calls.
|
||
- 30-frame VP9 320×240 stream byte-exact (3.46 MB across
|
||
1 keyframe + 29 P-frames).
|
||
- 10-frame VP9 1080p stream byte-exact (31 MB across
|
||
10 frames at full HD).
|
||
- v4l2-compliance: **49/49 passing** (was 47/48 in 8.6;
|
||
media controller added a 49th test and closed DECODER_CMD).
|
||
|
||
See `docs/phase_8_7_closure.md`.
|
||
|
||
### Phase 8.8 — throughput baseline + multi-codec streams + HDR (closed 2026-05-18)
|
||
|
||
- Per-frame µs timing in test_m2m_stream; multi-codec
|
||
baseline:
|
||
- VP9 1080p: 83.1 fps
|
||
- AV1 1080p: 65.0 fps
|
||
- H.264 1080p: 88.3 fps
|
||
All byte-exact vs ffmpeg reference; all 2-3× over the
|
||
30fps-floor-is-fine criterion.
|
||
- QPU dispatch substitution explicitly **not needed** — measurement
|
||
shows the FFmpeg software path already clears the target on
|
||
Pi 5's Cortex-A76. Substitution moves to the
|
||
optimisation roadmap.
|
||
- Annex-B H.264 access-unit splitter in the test harness
|
||
(NALs grouped by VCL boundary).
|
||
- HDR / 10-bit: V4L2_PIX_FMT_P010 added as CAPTURE format;
|
||
daemon pack_p010_to_plane handles YUV420P10LE → P010
|
||
with MSB-aligned 10-bit data. 10-bit 1080p byte-exact
|
||
at 48.8 fps.
|
||
|
||
See `docs/phase_8_8_closure.md`.
|
||
|
||
### Phase 8.9 — long-form stress + multi-codec HDR + libva scoping (closed 2026-05-18)
|
||
|
||
- libva-v4l2-request investigation: upstream supports only
|
||
MPEG-2 / H.264 / HEVC (no VP9 or AV1) and expects the
|
||
older `V4L2_PIX_FMT_H264_SLICE_RAW` fourcc. Real
|
||
integration requires adding VP9 + AV1 support to the
|
||
library itself — pushed to Phase 8.10.
|
||
- Long-form stress: 1800-frame VP9 1080p (60s @ 30fps),
|
||
120.9 fps sustained, p99 17.3 ms/frame, no errors, no
|
||
leaks, daemon alive after 3620 cookies across two runs.
|
||
- HDR multi-codec byte-exact: VP9-10bit (48.8 fps,
|
||
from 8.8), AV1-10bit (17.1 fps), H.264-10bit (26.9 fps).
|
||
10-bit is intrinsically more expensive — AV1 falls
|
||
short of 30fps but acceptable for the user-facing
|
||
goal (mostly SDR YouTube).
|
||
|
||
See `docs/phase_8_9_closure.md`.
|
||
|
||
### Phase 8.10 + 8.11 — libva consumer integration scaffold (closed 2026-05-18)
|
||
|
||
- Forked bootlin/libva-v4l2-request to marfrit/libva-v4l2-
|
||
request-fourier (gitea); discovered the existing fourier
|
||
fork already had VP9/AV1/HEVC support on Rockchip.
|
||
- Added daedalus_v4l2 to known_decoder_drivers + meson
|
||
build option.
|
||
- Added V4L2_PIX_FMT_NV12 single-plane + request API
|
||
media ops + stateless control registration to our
|
||
kernel.
|
||
- vainfo enumerates 7 VAProfile entries via our driver.
|
||
|
||
See `docs/phase_8_10_11_closure.md`.
|
||
|
||
### Phase 8.12 — first VP9 frame decoded via libva (closed 2026-05-18)
|
||
|
||
- vb2_queue supports_requests; vb2_ops buf_out_validate +
|
||
buf_request_complete; v4l2_ctrl_new_custom for stateless
|
||
ctrl registration.
|
||
- Daemon decoded byte-exact VP9 keyframe via the full
|
||
libva path (FNV-1a 0x1eb34bfe matches standalone).
|
||
- ffmpeg still timed out waiting for media_request
|
||
completion (request bind state).
|
||
|
||
See `docs/phase_8_12_closure.md`.
|
||
|
||
### Phase 8.13 — byte-exact end-to-end via libva (closed 2026-05-18)
|
||
|
||
- Traced the misleading "Unable to set control(s):
|
||
Invalid argument" — actually libva probing H264/HEVC
|
||
PROFILE/LEVEL we don't expose (harmless); the real VP9
|
||
stateless control SET succeeds.
|
||
- Diagnosed the "Timeout when waiting for media request"
|
||
root cause: per-request control object stays bound
|
||
because vb2's normal buf_done path doesn't fire
|
||
buf_request_complete (only queue-cancel does).
|
||
- Fix: capture media_request from src_buf in inflight
|
||
entry, call v4l2_ctrl_request_complete from the
|
||
completion path before buf_done_and_job_finish.
|
||
- **Byte-exact end-to-end**: ffmpeg -hwaccel vaapi →
|
||
libva-v4l2-request-fourier → /dev/video0 →
|
||
daedalus_v4l2 → daemon → 18432-byte NV12 byte-for-byte
|
||
identical to ffmpeg software reference.
|
||
- **Project consumer target hit**: V4L2 stateless node
|
||
consumed by a real VAAPI client.
|
||
|
||
See `docs/phase_8_13_closure.md`.
|
||
|
||
### Phase 8.14 — multi-frame + AV1/H.264 + higher-level consumers
|
||
|
||
1. Multi-frame VP9 stream via libva (vp9_60s.ivf from 8.9
|
||
stress test) — P-frame references across requests.
|
||
2. AV1 + H.264 single-frame via libva.
|
||
3. mpv --hwdec=vaapi end-to-end.
|
||
4. Firefox / WebRTC if motivated.
|
||
|
||
Optimisation work (QPU dispatch, 4K, HDR-in-libva) ships
|
||
when there's a concrete user-facing need.
|
||
|
||
## Effort estimate
|
||
|
||
Each phase: ~1 week of focused work (~40 hours).
|
||
Total: 7 weeks for v1.
|
||
|
||
Could be split across multiple sessions / contributors.
|