daedalus-v4l2/docs/phase_8_13_closure.md
marfrit f04d7000f8 Phase 8.13: byte-exact end-to-end via libva (consumer target hit)
The project's consumer-side goal landed: a real VAAPI consumer
(ffmpeg with -hwaccel vaapi) drives our libva backend → V4L2
driver → daemon → byte-exact NV12 output back to ffmpeg.

  ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
         -hwaccel_output_format nv12 -i vp9_small.ivf \
         -f rawvideo -y /tmp/vp9_via_libva.nv12
  cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12  → match

18432-byte NV12 byte-for-byte identical to plain ffmpeg
-pix_fmt nv12 software decode. The project_consumer_target
memory's deliverable shape — "V4L2 stateless node consumed by
a real VAAPI client" — is achieved.

Two related kernel changes:

1. v4l2_ctrl_handler_setup(&ctx->hdl) after registration —
   matches rkvdec/cedrus/hantro. Brings each registered
   compound control out of "uninitialised" state via
   std_init_compound defaults.

2. Per-request control completion in the decode path —
   the real fix for "Timeout when waiting for media request".
   vb2-core's vb2_buffer_done unbinds the BUFFER's req_obj
   on normal decode completion, but the per-request CONTROL
   object stays bound. buf_request_complete fires only from
   queue-cancel paths (vb2-core line 2284), NOT from normal
   buf_done. The driver must call
   v4l2_ctrl_request_complete(req, hdl) explicitly from the
   completion path.

   struct daedalus_inflight gained a `struct media_request
   *req` field, captured from src_buf->vb2_buf.req_obj.req
   in device_run. daedalus_complete_resp_frame then calls
   v4l2_ctrl_request_complete before
   v4l2_m2m_buf_done_and_job_finish — triggers
   MEDIA_REQUEST_STATE_COMPLETE and wakes the request fd
   poll.

   For non-request flows (test_m2m_stream direct QBUF)
   inf->req is NULL; the conditional skips the call.
   Both consumer styles work concurrently.

Diagnostic clarification (was Phase 8.13a):

strace traced three S_EXT_CTRLS calls per frame:
  1. H264_PROFILE + H264_LEVEL → EINVAL  (we don't register)
  2. HEVC_PROFILE + HEVC_LEVEL → EINVAL  (we don't register)
  3. VP9_FRAME + VP9_COMPRESSED_HDR → SUCCESS

The first two are harmless: libva probes whether we support
H264/HEVC integer profile/level controls during config
negotiation; we don't (we expose them as stateless), so EINVAL
just falls through. The actual VP9 stateless controls (#3)
succeeded all along — the libva-side "Unable to set control(s)"
log was misleading us into thinking the control path was the
bug.

Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):

  daemon log:
    REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes
    decoder: opened vp9 context
    decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe ...

  ffmpeg side:
    no Timeout, no Decoding error
    /tmp/vp9_via_libva.nv12: 18432 bytes

  cmp vs reference: byte-for-byte identical.

Roadmap update:
- 8.10/8.11, 8.12, 8.13 marked closed with closure docs.
- 8.14 = multi-frame VP9 via libva, AV1 + H.264, mpv/Firefox
  higher-level consumers.

Per correctness-before-speed:
- strace + kernel-source-reading found the actual root cause
  rather than guessing.
- Conditional v4l2_ctrl_request_complete preserves the existing
  test_m2m_stream non-request path — both consumer styles work
  concurrently without per-flow branching elsewhere.
- Byte-exact pixel comparison, not "frame size matches."

Phase 8.14 next: multi-frame stream + multi-codec via libva +
mpv/Firefox.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 18:14:34 +00:00

# Phase 8.13 closure — byte-exact end-to-end via libva
**Status:** closed 2026-05-18.
The project's consumer-side goal landed: a real VAAPI consumer
(ffmpeg with `-hwaccel vaapi`) drives our libva backend → V4L2
driver → daemon → byte-exact NV12 output back to ffmpeg.
```
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
       -hwaccel_output_format nv12 -i vp9_small.ivf \
       -f rawvideo -y /tmp/vp9_via_libva.nv12
cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12   # ← match
```
The output is 18432 bytes (a 128x96 NV12 frame), byte-for-byte
identical to a plain `ffmpeg -pix_fmt nv12 -f rawvideo` software
decode of the same VP9 keyframe. The `project_consumer_target`
memory's deliverable shape, "V4L2 stateless node consumed by a
real VAAPI client", is achieved.
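
The 18432-byte figure can be checked by hand from the 128x96 capture size. A minimal sketch, assuming a tightly packed 8-bit NV12 frame with no stride padding; `nv12_layout_for` is a hypothetical helper for illustration, not part of the daedalus sources:

```c
#include <stddef.h>

/* Plane sizes for a tightly packed 8-bit NV12 frame (no stride padding). */
struct nv12_layout {
	size_t luma;	/* Y plane: one byte per pixel */
	size_t chroma;	/* interleaved UV plane, subsampled 2x2 */
	size_t total;	/* luma + chroma */
};

/* Hypothetical helper: compute NV12 plane sizes for a given frame size. */
struct nv12_layout nv12_layout_for(size_t width, size_t height)
{
	struct nv12_layout l;
	size_t cw = (width + 1) / 2;	/* chroma samples per row, rounded up */
	size_t ch = (height + 1) / 2;	/* chroma rows, rounded up */

	l.luma = width * height;
	l.chroma = 2 * cw * ch;		/* one U and one V byte per chroma sample */
	l.total = l.luma + l.chroma;
	return l;
}
```

For 128x96 this gives 12288 luma bytes and 6144 chroma bytes, which agrees with the `luma=12288 chroma=6144` values in the daemon log.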
## What lands
Two related kernel changes unstick the libva request completion
handshake; a third, libva-side change improves diagnostics:
### 1. Stateless control handler initialisation
The driver now calls `v4l2_ctrl_handler_setup(&ctx->hdl)` after
registering its controls, matching rkvdec, cedrus, and hantro.
This brings each registered compound control out of its
"uninitialised" state via the `std_init_compound` defaults
(e.g. VP9_FRAME gets `profile=0, bit_depth=8`).
### 2. Per-request control completion in the decode path
The actual root cause of "Timeout when waiting for media
request":
- vb2-core's `vb2_buffer_done` unbinds the BUFFER's req_obj on
normal decode completion.
- But the per-request CONTROL object stays bound until
`v4l2_ctrl_request_complete` runs.
- The vb2 `buf_request_complete` op fires only from queue-cancel
paths (vb2-core line 2284), NOT from normal buf_done.
- The driver must call `v4l2_ctrl_request_complete(req, hdl)`
explicitly from its decode-completion path.
Fix (in `kernel/daedalus_v4l2_main.c`):
```c
struct daedalus_inflight {
	...
	struct media_request *req;	/* captured from src_buf; NULL for non-request flows */
};

static void daedalus_device_run(void *priv)
{
	...
	/* Remember the request (if any) that carried this OUTPUT buffer. */
	inf->req = src_buf->vb2_buf.req_obj.req;
	...
}

void daedalus_complete_resp_frame(...)
{
	...
	/* Complete the per-request control object before finishing the job. */
	if (inf->req)
		v4l2_ctrl_request_complete(inf->req, &inf->ctx->hdl);
	v4l2_m2m_buf_done_and_job_finish(...);
}
```
For non-request flows (test_m2m_stream's direct QBUF)
`inf->req` is NULL; the conditional skips the
`v4l2_ctrl_request_complete` call. Both consumer styles
work concurrently.
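
To make the failure mode concrete, here is a toy userspace model of the handshake. It is a sketch only: the `toy_*` names are invented for illustration and are not kernel types, and it assumes (as the bullets above describe) that a request completes only once every object bound to it has been completed.

```c
#include <stdbool.h>

/* Toy userspace model; names are illustrative, not the kernel's types. */
enum toy_req_state { TOY_REQ_QUEUED, TOY_REQ_COMPLETE };

struct toy_request {
	enum toy_req_state state;
	unsigned int incomplete_objects;	/* objects still bound to the request */
};

/* A queued request carrying one OUTPUT buffer and one control object. */
struct toy_request toy_request_queue(void)
{
	struct toy_request r = { TOY_REQ_QUEUED, 2 };
	return r;
}

/* Complete one object; the request completes only when none remain. */
bool toy_object_complete(struct toy_request *r)
{
	if (r->incomplete_objects > 0 && --r->incomplete_objects == 0)
		r->state = TOY_REQ_COMPLETE;
	return r->state == TOY_REQ_COMPLETE;
}

/* Decode completion path; ctrl_complete models the Phase 8.13 fix. */
bool toy_decode_done(struct toy_request *r, bool ctrl_complete)
{
	if (ctrl_complete)
		toy_object_complete(r);	/* stands in for v4l2_ctrl_request_complete */
	return toy_object_complete(r);	/* stands in for the buffer-done path */
}
```

With `ctrl_complete` false the model stays in `TOY_REQ_QUEUED` after the buffer is done, which corresponds to the poll timeout seen before this phase; with it true the request reaches `TOY_REQ_COMPLETE`.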
### 3. Diagnostic improvements
- libva-v4l2-request-fourier `src/v4l2.c`: better error
logging in `v4l2_set_controls` (logs `error_idx`, failing
control id, size). Made the diagnosis above tractable.
## Verification
### End-to-end via ffmpeg + libva, byte-exact
```
$ pkill -f daedalus_v4l2_daemon; sudo rmmod daedalus_v4l2
$ sudo insmod kernel/daedalus_v4l2.ko
$ daedalus_v4l2_daemon -v daemon &
$ LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video0 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media3 \
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
-hwaccel_output_format nv12 -i /tmp/vp9_small.ivf \
-f rawvideo -y /tmp/vp9_via_libva.nv12
v4l2-request: cap_pool_init: 24 slots ready
v4l2-request: Unable to set control(s): EINVAL (H264 probe — harmless)
v4l2-request: Unable to set control(s): EINVAL (HEVC probe — harmless)
(no timeout, no decode error)
daemon log:
REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes
decoder: opened vp9 context
decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe luma=12288 chroma=6144
$ ls -la /tmp/vp9_via_libva.nv12
-rw-r--r-- 1 root root 18432 May 18 20:13 /tmp/vp9_via_libva.nv12
$ ffmpeg -i /tmp/vp9_small.ivf -pix_fmt nv12 -f rawvideo \
-y /tmp/vp9_ref_for_libva.nv12
$ cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12
$ echo $?
0
```
Byte-for-byte match. The two `Unable to set control(s):
EINVAL` messages are libva probing H264 + HEVC PROFILE/LEVEL
integer controls during config negotiation — we don't
register those (since we expose VP9/AV1/H264 stateless), libva
gets EINVAL, logs it, and moves on. Functional flow is
unaffected.
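
The `cmp` step is stricter than checking that the two files have the same size. A minimal sketch of the same check on in-memory buffers; `first_mismatch` is a hypothetical helper, not something the test harness ships:

```c
#include <stddef.h>

/*
 * Returns -1 when both buffers have the same length and content.
 * Otherwise returns the offset of the first differing byte; if one
 * buffer is a strict prefix of the other, that is the shorter length.
 */
long first_mismatch(const unsigned char *a, size_t a_len,
		    const unsigned char *b, size_t b_len)
{
	size_t n = a_len < b_len ? a_len : b_len;

	for (size_t i = 0; i < n; i++)
		if (a[i] != b[i])
			return (long)i;
	return a_len == b_len ? -1 : (long)n;
}
```

Two 18432-byte frames that differ in a single chroma byte would pass a size check but fail this one, which is why the verification compares content.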
## Design analysis
### Why Phase 8.12 was close but not complete
8.12 wired all the request API hooks (supports_requests,
buf_out_validate, buf_request_complete) and the daemon-side
decode worked byte-exact — but the per-request control object
stayed bound forever because `buf_request_complete` only fires
from queue-cancel paths in vb2-core, not from normal buf_done.
Result: request never transitioned to COMPLETE, libva poll
timed out.
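
For context, the userspace side of that wait looks roughly like the sketch below. The Media Request API signals completion as `POLLPRI` on the request file descriptor; `wait_for_request` and its return convention are assumptions for illustration and are not copied from libva-v4l2-request.

```c
#include <errno.h>
#include <poll.h>

/*
 * Wait for a media request to complete.
 * Returns 1 when completed, 0 on timeout, negative errno on failure.
 */
int wait_for_request(int request_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = request_fd, .events = POLLPRI };
	int ret;

	/* Retry if a signal interrupts the wait. */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -errno;
	if (ret == 0)
		return 0;	/* the pre-8.13 symptom: request never completes */
	return (pfd.revents & POLLPRI) ? 1 : -EIO;
}
```

Before the fix, a wait of this kind would always hit the timeout branch even though the daemon had already decoded the frame.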
Phase 8.13 closes that loop by capturing the media_request from
the OUTPUT vb2_buffer's req_obj at device_run time and calling
`v4l2_ctrl_request_complete` explicitly when the decode
finishes (chardev RESP_FRAME path). Mirrors what rkvdec does
from its IRQ handler and cedrus from its device_run completion.
### Why the EINVAL noise was misleading
Earlier phases (8.10 through 8.12) kept seeing "Unable to set
control(s): Invalid argument" and assumed it pointed at our
stateless control registration. strace revealed three
separate S_EXT_CTRLS calls per frame:
| # | controls | result | meaning |
|---|----------|--------|---------|
| 1 | H264_PROFILE + H264_LEVEL | EINVAL | libva probes H264; we don't register it |
| 2 | HEVC_PROFILE + HEVC_LEVEL | EINVAL | libva probes HEVC; we don't register it |
| 3 | VP9_FRAME + VP9_COMPRESSED_HDR | OK | actual decode controls |
Calls 1 and 2 are harmless: libva finds that the H264/HEVC
integer profile/level controls are not registered and carries
on with the stateless controls the driver does expose. Call 3
(the actual VP9 stateless controls) succeeded all along. Only
the completion handshake was broken.
Phase 8.13's added `error_idx` logging in v4l2.c
(`failing_ctrl_id=0xa40900 size=0` etc.) is what made the
distinction visible.
### Why one fix unblocked both 8.13 and 8.14
The original plan split Phase 8.13 ("trace the EINVAL") from
Phase 8.14 ("call request_complete from the right place").
Once strace clarified that the EINVAL was probe noise, the
real fix was just the request_complete call from the decode
path — a 10-line change. Doing both in one shot avoided a
phase boundary that wouldn't have shipped anything additional.
## What's NOT here (Phase 8.14+ scope)
- **Multi-frame stream via libva.** Verified single keyframe;
P-frame reference handling across requests untested. Likely
works (the daemon's AVCodecContext is persistent across
REQ_DECODE calls — already proven via test_m2m_stream).
- **AV1 + H.264 via libva.** Different stateless control
sets; needs the same control-payload validation path. May
need similar `v4l2_ctrl_request_complete` adjustments per
codec.
- **mpv + Firefox end-to-end.** The lower-level harness
(ffmpeg vaapi) works; higher-level consumers should follow
but each has its own VAAPI quirks.
- **The two harmless EINVALs from H264/HEVC profile probes.**
Could be suppressed by registering those integer controls
too (always rejecting writes) but that's a polish item.
## Phase 8.14 plan
1. Multi-frame VP9 stream via libva (re-use vp9_60s.ivf from
Phase 8.9 stress test).
2. AV1 + H.264 single-frame via libva (likely needs codec-
specific tweaks).
3. Document any remaining libva-side quirks for higher-level
consumers (mpv, Firefox).