Files
daedalus-v4l2/docs/roadmap.md
T
marfrit f04d7000f8 Phase 8.13: byte-exact end-to-end via libva (consumer target hit)
The project's consumer-side goal landed: a real VAAPI consumer
(ffmpeg with -hwaccel vaapi) drives our libva backend → V4L2
driver → daemon → byte-exact NV12 output back to ffmpeg.

  ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
         -hwaccel_output_format nv12 -i vp9_small.ivf \
         -f rawvideo -y /tmp/vp9_via_libva.nv12
  cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12  → match

18432-byte NV12 byte-for-byte identical to plain ffmpeg
-pix_fmt nv12 software decode. The project_consumer_target
memory's deliverable shape — "V4L2 stateless node consumed by
a real VAAPI client" — is achieved.

Two related kernel changes:

1. v4l2_ctrl_handler_setup(&ctx->hdl) after registration —
   matches rkvdec/cedrus/hantro. Brings each registered
   compound control out of "uninitialised" state via
   std_init_compound defaults.

2. Per-request control completion in the decode path —
   the real fix for "Timeout when waiting for media request".
   vb2-core's vb2_buffer_done unbinds the BUFFER's req_obj
   on normal decode completion, but the per-request CONTROL
   object stays bound. buf_request_complete fires only from
   queue-cancel paths (vb2-core line 2284), NOT from normal
   buf_done. The driver must call
   v4l2_ctrl_request_complete(req, hdl) explicitly from the
   completion path.

   struct daedalus_inflight gained a `struct media_request
   *req` field, captured from src_buf->vb2_buf.req_obj.req
   in device_run. daedalus_complete_resp_frame then calls
   v4l2_ctrl_request_complete before
   v4l2_m2m_buf_done_and_job_finish — triggers
   MEDIA_REQUEST_STATE_COMPLETE and wakes the request fd
   poll.

   For non-request flows (test_m2m_stream direct QBUF)
   inf->req is NULL; the conditional skips the call.
   Both consumer styles work concurrently.

Diagnostic clarification (was Phase 8.13a):

strace traced three S_EXT_CTRLS calls per frame:
  1. H264_PROFILE + H264_LEVEL → EINVAL  (we don't register)
  2. HEVC_PROFILE + HEVC_LEVEL → EINVAL  (we don't register)
  3. VP9_FRAME + VP9_COMPRESSED_HDR → SUCCESS

The first two are harmless: libva probes whether we support
H264/HEVC integer profile/level controls during config
negotiation; we don't (we expose them as stateless), so EINVAL
just falls through. The actual VP9 stateless controls (#3)
succeeded all along — the libva-side "Unable to set control(s)"
log was misleading us into thinking the control path was the
bug.

Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):

  daemon log:
    REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes
    decoder: opened vp9 context
    decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe ...

  ffmpeg side:
    no Timeout, no Decoding error
    /tmp/vp9_via_libva.nv12: 18432 bytes

  cmp vs reference: byte-for-byte identical.

Roadmap update:
- 8.10/8.11, 8.12, 8.13 marked closed with closure docs.
- 8.14 = multi-frame VP9 via libva, AV1 + H.264, mpv/Firefox
  higher-level consumers.

Per correctness-before-speed:
- strace + kernel-source-reading found the actual root cause
  rather than guessing.
- Conditional v4l2_ctrl_request_complete preserves the existing
  test_m2m_stream non-request path — both consumer styles work
  concurrently without per-flow branching elsewhere.
- Byte-exact pixel comparison, not "frame size matches."

Phase 8.14 next: multi-frame stream + multi-codec via libva +
mpv/Firefox.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 18:14:34 +00:00

202 lines
7.4 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# daedalus-v4l2 — roadmap
## Sub-phases
### Phase 8.1 — kernel module skeleton
Out-of-tree kernel module that:
- Registers `/dev/videoNN` with `VFL_TYPE_VIDEO` + a no-op
V4L2 stateless dispatch table.
- Accepts open/close, S_FMT, REQBUFS ioctls without doing
anything (yet).
- Builds against `/lib/modules/$(uname -r)/build`.
Deliverable: `modprobe daedalus_v4l2` works, `v4l2-ctl --list-devices`
shows the new device.
### Phase 8.2 — kernel ↔ daemon chardev bridge
- Kernel module creates `/dev/daedalus-v4l2` chardev.
- Defines a simple req/resp protocol in `include/daedalus_v4l2_proto.h`.
- Daemon connects, exchanges echo requests.
Deliverable: ping-pong test passes.
### Phase 8.3 — daemon FFmpeg dlopen + parse
- Daemon links `libdaedalus_core.a` from sibling.
- Daemon dlopens FFmpeg.
- Test program: feed a VP9 IVF file to FFmpeg parsers,
extract block-level metadata, validate against expected.
Deliverable: daemon can parse a VP9 frame and walk the
block-level info.
### Phase 8.4 — daemon ↔ kernel decode round-trip (closed 2026-05-18)
Shipped as a debugfs-triggered chardev round-trip rather than
the original V4L2-ioctl plan (which moved to Phase 8.5).
- REQ_DECODE / RESP_FRAME wire protocol
- Daemon decodes VP9 via FFmpeg dlopen, returns FNV-1a digest
- Verified content-dependent + deterministic; structured
error handling for bad bitstreams
See `docs/phase_8_4_closure.md`.
### Phase 8.5 — full V4L2 m2m driver (closed 2026-05-18)
Real V4L2 m2m driver — userspace clients drive
`S_FMT`/`REQBUFS`/`QBUF`/`DQBUF` the standard way. Bitstream
flows kernel→daemon as inline REQ_DECODE payload; decoded NV12
pixels flow daemon→kernel as inline RESP_FRAME payload. Works
end-to-end for small frames (≤ ~64 KiB NV12).
Deliverable hit: kernel m2m driver passes most v4l2-compliance
checks; `tools/test_m2m_decode` produces a NV12 frame that's
byte-for-byte identical to `ffmpeg -pix_fmt nv12` reference.
See `docs/phase_8_5_closure.md`.
### Phase 8.6 — dmabuf + AV1 + H.264 + stateless controls (closed 2026-05-18)
- CAPTURE (and OUTPUT) on `vb2_dma_contig_memops`.
- New `DAEDALUS_IOC_GET_DMABUF` chardev ioctl — daemon
mmaps the in-flight CAPTURE buffer, decodes pixels in
place, sends RESP_FRAME metadata-only.
- 64 KiB frame-size cap removed. 1080p VP9 + 128×96 AV1
+ 128×96 H.264 all byte-exact against reference FFmpeg
decode.
- V4L2 stateless controls registered for VP9 / AV1 /
H.264 (11 controls visible to userspace).
- Colorspace round-trip fix (TRY_FMT preserve, S_FMT
OUTPUT→CAPTURE propagation).
- Cookie unified across V4L2 + debugfs paths.
- v4l2-compliance: 47/48 (only DECODER_CMD remains,
needs media controller — moved to 8.7).
See `docs/phase_8_6_closure.md`.
### Phase 8.7 — media controller + multi-frame streaming (closed 2026-05-18)
- Media controller bound via
`v4l2_m2m_register_media_controller` +
`media_device_register`; `/dev/mediaN` published.
- `tools/test_m2m_stream` parses IVF and pushes frames
sequentially through a 4-deep buffer ring; daemon
AVCodecContext preserves reference frames across calls.
- 30-frame VP9 320×240 stream byte-exact (3.46 MB across
1 keyframe + 29 P-frames).
- 10-frame VP9 1080p stream byte-exact (31 MB across
10 frames at full HD).
- v4l2-compliance: **49/49 passing** (was 47/48 in 8.6;
media controller added a 49th test and closed DECODER_CMD).
See `docs/phase_8_7_closure.md`.
### Phase 8.8 — throughput baseline + multi-codec streams + HDR (closed 2026-05-18)
- Per-frame µs timing in test_m2m_stream; multi-codec
baseline:
- VP9 1080p: 83.1 fps
- AV1 1080p: 65.0 fps
- H.264 1080p: 88.3 fps
All byte-exact vs ffmpeg reference; all 2-3× over the
30fps-floor-is-fine criterion.
- QPU dispatch substitution explicitly **not needed** — measurement
shows the FFmpeg software path already clears the target on
Pi 5's Cortex-A76. Substitution moves to the
optimisation roadmap.
- Annex-B H.264 access-unit splitter in the test harness
(NALs grouped by VCL boundary).
- HDR / 10-bit: V4L2_PIX_FMT_P010 added as CAPTURE format;
daemon pack_p010_to_plane handles YUV420P10LE → P010
with MSB-aligned 10-bit data. 10-bit 1080p byte-exact
at 48.8 fps.
See `docs/phase_8_8_closure.md`.
### Phase 8.9 — long-form stress + multi-codec HDR + libva scoping (closed 2026-05-18)
- libva-v4l2-request investigation: upstream supports only
MPEG-2 / H.264 / HEVC (no VP9 or AV1) and expects the
older `V4L2_PIX_FMT_H264_SLICE_RAW` fourcc. Real
integration requires adding VP9 + AV1 support to the
library itself — pushed to Phase 8.10.
- Long-form stress: 1800-frame VP9 1080p (60s @ 30fps),
120.9 fps sustained, p99 17.3 ms/frame, no errors, no
leaks, daemon alive after 3620 cookies across two runs.
- HDR multi-codec byte-exact: VP9-10bit (48.8 fps,
from 8.8), AV1-10bit (17.1 fps), H.264-10bit (26.9 fps).
10-bit is intrinsically more expensive — AV1 falls
short of 30fps but acceptable for the user-facing
goal (mostly SDR YouTube).
See `docs/phase_8_9_closure.md`.
### Phase 8.10 + 8.11 — libva consumer integration scaffold (closed 2026-05-18)
- Forked bootlin/libva-v4l2-request to marfrit/libva-v4l2-
request-fourier (gitea); discovered the existing fourier
fork already had VP9/AV1/HEVC support on Rockchip.
- Added daedalus_v4l2 to known_decoder_drivers + meson
build option.
- Added V4L2_PIX_FMT_NV12 single-plane + request API
media ops + stateless control registration to our
kernel.
- vainfo enumerates 7 VAProfile entries via our driver.
See `docs/phase_8_10_11_closure.md`.
### Phase 8.12 — first VP9 frame decoded via libva (closed 2026-05-18)
- vb2_queue supports_requests; vb2_ops buf_out_validate +
buf_request_complete; v4l2_ctrl_new_custom for stateless
ctrl registration.
- Daemon decoded byte-exact VP9 keyframe via the full
libva path (FNV-1a 0x1eb34bfe matches standalone).
- ffmpeg still timed out waiting for media_request
completion (request bind state).
See `docs/phase_8_12_closure.md`.
### Phase 8.13 — byte-exact end-to-end via libva (closed 2026-05-18)
- Traced the misleading "Unable to set control(s):
Invalid argument" — actually libva probing H264/HEVC
PROFILE/LEVEL we don't expose (harmless); the real VP9
stateless control SET succeeds.
- Diagnosed the "Timeout when waiting for media request"
root cause: per-request control object stays bound
because vb2's normal buf_done path doesn't fire
buf_request_complete (only queue-cancel does).
- Fix: capture media_request from src_buf in inflight
entry, call v4l2_ctrl_request_complete from the
completion path before buf_done_and_job_finish.
- **Byte-exact end-to-end**: ffmpeg -hwaccel vaapi →
libva-v4l2-request-fourier → /dev/video0 →
daedalus_v4l2 → daemon → 18432-byte NV12 byte-for-byte
identical to ffmpeg software reference.
- **Project consumer target hit**: V4L2 stateless node
consumed by a real VAAPI client.
See `docs/phase_8_13_closure.md`.
### Phase 8.14 — multi-frame + AV1/H.264 + higher-level consumers
1. Multi-frame VP9 stream via libva (vp9_60s.ivf from 8.9
stress test) — P-frame references across requests.
2. AV1 + H.264 single-frame via libva.
3. mpv --hwdec=vaapi end-to-end.
4. Firefox / WebRTC if motivated.
Optimisation work (QPU dispatch, 4K, HDR-in-libva) ships
when there's a concrete user-facing need.
## Effort estimate
Each phase: ~1 week of focused work (~40 hours).
Total: 7 weeks for v1.
Could be split across multiple sessions / contributors.