Phase 8.13: byte-exact end-to-end via libva (consumer target hit)

The project's consumer-side goal landed: a real VAAPI consumer
(ffmpeg with -hwaccel vaapi) drives our libva backend → V4L2
driver → daemon → byte-exact NV12 output back to ffmpeg.

  ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
         -hwaccel_output_format nv12 -i vp9_small.ivf \
         -f rawvideo -y /tmp/vp9_via_libva.nv12
  cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12  → match

18432-byte NV12 byte-for-byte identical to plain ffmpeg
-pix_fmt nv12 software decode. The project_consumer_target
memory's deliverable shape — "V4L2 stateless node consumed by
a real VAAPI client" — is achieved.

Two related kernel changes:

1. v4l2_ctrl_handler_setup(&ctx->hdl) after registration —
   matches rkvdec/cedrus/hantro. Brings each registered
   compound control out of "uninitialised" state via
   std_init_compound defaults.

2. Per-request control completion in the decode path —
   the real fix for "Timeout when waiting for media request".
   vb2-core's vb2_buffer_done unbinds the BUFFER's req_obj
   on normal decode completion, but the per-request CONTROL
   object stays bound. buf_request_complete fires only from
   queue-cancel paths (vb2-core line 2284), NOT from normal
   buf_done. The driver must call
   v4l2_ctrl_request_complete(req, hdl) explicitly from the
   completion path.

   struct daedalus_inflight gained a `struct media_request
   *req` field, captured from src_buf->vb2_buf.req_obj.req
   in device_run. daedalus_complete_resp_frame then calls
   v4l2_ctrl_request_complete before
   v4l2_m2m_buf_done_and_job_finish — triggers
   MEDIA_REQUEST_STATE_COMPLETE and wakes the request fd
   poll.

   For non-request flows (test_m2m_stream direct QBUF)
   inf->req is NULL; the conditional skips the call.
   Both consumer styles work concurrently.

Diagnostic clarification (was Phase 8.13a):

strace traced three S_EXT_CTRLS calls per frame:
  1. H264_PROFILE + H264_LEVEL → EINVAL  (we don't register)
  2. HEVC_PROFILE + HEVC_LEVEL → EINVAL  (we don't register)
  3. VP9_FRAME + VP9_COMPRESSED_HDR → SUCCESS

The first two are harmless: libva probes whether we support the
H264/HEVC integer profile/level controls during config
negotiation. We don't register them (we expose only the
stateless codec controls), so the EINVAL just falls through.
The actual VP9 stateless controls (#3) succeeded all along; the
libva-side "Unable to set control(s)" log misled us into
thinking the control path was the bug.

Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):

  daemon log:
    REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes
    decoder: opened vp9 context
    decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe ...

  ffmpeg side:
    no Timeout, no Decoding error
    /tmp/vp9_via_libva.nv12: 18432 bytes

  cmp vs reference: byte-for-byte identical.

Roadmap update:
- 8.10/8.11, 8.12, 8.13 marked closed with closure docs.
- 8.14 = multi-frame VP9 via libva, AV1 + H.264, mpv/Firefox
  higher-level consumers.

Per correctness-before-speed:
- strace + kernel-source-reading found the actual root cause
  rather than guessing.
- Conditional v4l2_ctrl_request_complete preserves the existing
  test_m2m_stream non-request path — both consumer styles work
  concurrently without per-flow branching elsewhere.
- Byte-exact pixel comparison, not "frame size matches."

Phase 8.14 next: multi-frame stream + multi-codec via libva +
mpv/Firefox.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 18:14:34 +00:00
parent a7d585eee8
commit f04d7000f8
3 changed files with 297 additions and 10 deletions
@@ -0,0 +1,200 @@
# Phase 8.13 closure — byte-exact end-to-end via libva
**Status:** closed 2026-05-18.
The project's consumer-side goal landed: a real VAAPI consumer
(ffmpeg with `-hwaccel vaapi`) drives our libva backend → V4L2
driver → daemon → byte-exact NV12 output back to ffmpeg.
```
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
       -hwaccel_output_format nv12 -i vp9_small.ivf \
       -f rawvideo -y /tmp/vp9_via_libva.nv12
cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12 # ← match
```
18432 bytes, byte-for-byte identical to plain
`ffmpeg -pix_fmt nv12 -f rawvideo` software decode of the same
VP9 keyframe. The `project_consumer_target` memory's deliverable
shape — "V4L2 stateless node consumed by a real VAAPI client" —
is achieved.
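
The 18432-byte figure follows directly from the NV12 layout of a 128x96 frame, and it agrees with the `luma=12288 chroma=6144` values in the daemon log. A minimal sketch of that arithmetic, assuming 8-bit samples and tightly packed planes with no stride padding (the helper names are illustrative and not part of the project):

```c
#include <stddef.h>

/* NV12: a full-resolution Y plane followed by one interleaved CbCr plane
 * that is subsampled by 2 in both directions. Assumes 8-bit samples and
 * no per-row padding. */
static size_t nv12_luma_bytes(size_t width, size_t height)
{
    return width * height;
}

static size_t nv12_chroma_bytes(size_t width, size_t height)
{
    /* (width+1)/2 chroma columns of 2 bytes (Cb, Cr), (height+1)/2 rows. */
    return ((width + 1) / 2) * 2 * ((height + 1) / 2);
}

static size_t nv12_frame_bytes(size_t width, size_t height)
{
    return nv12_luma_bytes(width, height) + nv12_chroma_bytes(width, height);
}
```

For the test clip, `nv12_frame_bytes(128, 96)` gives 12288 + 6144 = 18432, so a file of any other size can be rejected before running `cmp` at all.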
## What lands
Two related kernel changes that unstick the libva request
completion handshake, plus one libva-side logging improvement:
### 1. Stateless control handler initialisation
`v4l2_ctrl_handler_setup(&ctx->hdl)` after registration —
matches rkvdec/cedrus/hantro. Brings each registered compound
control out of "uninitialised" state via the std_init_compound
defaults (e.g. VP9_FRAME gets `profile=0, bit_depth=8`).
### 2. Per-request control completion in the decode path
The actual root cause of "Timeout when waiting for media
request":
- vb2-core's `vb2_buffer_done` unbinds the BUFFER's req_obj on
normal decode completion.
- But the per-request CONTROL object stays bound until
`v4l2_ctrl_request_complete` runs.
- The vb2 `buf_request_complete` op fires only from queue-cancel
paths (vb2-core line 2284), NOT from normal buf_done.
- The driver must call `v4l2_ctrl_request_complete(req, hdl)`
explicitly from its decode-completion path.
Fix (in `kernel/daedalus_v4l2_main.c`):
```c
struct daedalus_inflight {
    ...
    struct media_request *req; /* captured from src_buf in device_run */
};

static void daedalus_device_run(void *priv)
{
    ...
    /* NULL when the buffer was queued without a request (direct QBUF). */
    inf->req = src_buf->vb2_buf.req_obj.req;
    ...
}

void daedalus_complete_resp_frame(...)
{
    ...
    /* Complete the per-request control object before finishing the buffers. */
    if (inf->req)
        v4l2_ctrl_request_complete(inf->req, &inf->ctx->hdl);
    v4l2_m2m_buf_done_and_job_finish(...);
}
```
For non-request flows (test_m2m_stream's direct QBUF)
`inf->req` is NULL; the conditional skips the
`v4l2_ctrl_request_complete` call. Both consumer styles
work concurrently.
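
Why the missing call stalls the request can be shown with a small userspace model. This is a simplified sketch, not kernel code: it assumes a request completes only once every object bound to it (here one buffer object and one control object) has been completed or unbound, and all `model_*` names are hypothetical.

```c
#include <stdbool.h>

/* Userspace model of the request lifecycle described above. */
enum req_state { REQ_QUEUED, REQ_COMPLETE };

struct model_request {
    enum req_state state;
    unsigned int incomplete_objects; /* bound objects not yet completed */
};

/* Queue a request carrying one buffer object and one control object. */
static void model_request_queue(struct model_request *req)
{
    req->state = REQ_QUEUED;
    req->incomplete_objects = 2;
}

/* Complete one bound object; the request completes when none remain. */
static void model_object_complete(struct model_request *req)
{
    if (req->incomplete_objects == 0)
        return;
    if (--req->incomplete_objects == 0)
        req->state = REQ_COMPLETE;
}

/* Decode-completion path. call_ctrl_complete models the Phase 8.13 fix. */
static enum req_state model_decode_done(struct model_request *req,
                                        bool call_ctrl_complete)
{
    if (call_ctrl_complete)
        model_object_complete(req); /* stands in for v4l2_ctrl_request_complete() */
    model_object_complete(req);     /* stands in for vb2_buffer_done() releasing the buffer object */
    return req->state;
}
```

With `call_ctrl_complete` set to `false` the model stays in `REQ_QUEUED` with one object outstanding, which corresponds to the poll timeout seen before the fix. With `true` it reaches `REQ_COMPLETE`.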
### 3. Diagnostic improvements
- libva-v4l2-request-fourier `src/v4l2.c`: better error
logging in `v4l2_set_controls` (logs `error_idx`, failing
control id, size). Made the diagnosis above tractable.
## Verification
### End-to-end via ffmpeg + libva, byte-exact
```
$ pkill -f daedalus_v4l2_daemon; sudo rmmod daedalus_v4l2
$ sudo insmod kernel/daedalus_v4l2.ko
$ daedalus_v4l2_daemon -v daemon &
$ LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
  LIBVA_DRIVER_NAME=v4l2_request \
  LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video0 \
  LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media3 \
  ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
         -hwaccel_output_format nv12 -i /tmp/vp9_small.ivf \
         -f rawvideo -y /tmp/vp9_via_libva.nv12
v4l2-request: cap_pool_init: 24 slots ready
v4l2-request: Unable to set control(s): EINVAL (H264 probe — harmless)
v4l2-request: Unable to set control(s): EINVAL (HEVC probe — harmless)
(no timeout, no decode error)
daemon log:
REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes
decoder: opened vp9 context
decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe luma=12288 chroma=6144
$ ls -la /tmp/vp9_via_libva.nv12
-rw-r--r-- 1 root root 18432 May 18 20:13 /tmp/vp9_via_libva.nv12
$ ffmpeg -i /tmp/vp9_small.ivf -pix_fmt nv12 -f rawvideo \
-y /tmp/vp9_ref_for_libva.nv12
$ cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12
$ echo $?
0
```
Byte-for-byte match. The two `Unable to set control(s):
EINVAL` messages are libva probing H264 + HEVC PROFILE/LEVEL
integer controls during config negotiation — we don't
register those (since we expose VP9/AV1/H264 stateless), libva
gets EINVAL, logs it, and moves on. Functional flow is
unaffected.
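
If a future run does not match, it helps to know where the first difference sits. The following is a minimal sketch of a `cmp`-style check that also reports which NV12 plane the first differing byte falls in. It assumes tightly packed 8-bit NV12, and the function and enum names are hypothetical, not part of the project's tooling.

```c
#include <stddef.h>

enum nv12_plane { NV12_MATCH, NV12_LUMA, NV12_CHROMA };

/* Return the offset of the first differing byte, or len if the two
 * buffers are identical over len bytes. */
static size_t first_mismatch(const unsigned char *a, const unsigned char *b,
                             size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        if (a[i] != b[i])
            return i;
    return len;
}

/* Classify a mismatch offset: the Y plane occupies the first
 * width*height bytes, the interleaved CbCr plane follows it. */
static enum nv12_plane mismatch_plane(size_t offset, size_t len,
                                      size_t width, size_t height)
{
    if (offset >= len)
        return NV12_MATCH;
    return offset < width * height ? NV12_LUMA : NV12_CHROMA;
}
```

For the 128x96 test frame, any offset below 12288 is a luma difference and anything from 12288 up to 18431 is a chroma difference.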
## Design analysis
### Why Phase 8.12 was close but not complete
8.12 wired all the request API hooks (supports_requests,
buf_out_validate, buf_request_complete) and the daemon-side
decode worked byte-exact — but the per-request control object
stayed bound forever because `buf_request_complete` only fires
from queue-cancel paths in vb2-core, not from normal buf_done.
Result: request never transitioned to COMPLETE, libva poll
timed out.
Phase 8.13 closes that loop by capturing the media_request from
the OUTPUT vb2_buffer's req_obj at device_run time and calling
`v4l2_ctrl_request_complete` explicitly when the decode
finishes (chardev RESP_FRAME path). Mirrors what rkvdec does
from its IRQ handler and cedrus from its device_run completion.
### Why the EINVAL noise was misleading
Earlier phases (8.10-8.12) kept seeing "Unable to set
control(s): Invalid argument" and assumed it pointed at our
stateless control registration. strace revealed three
separate S_EXT_CTRLS calls per frame:

| # | controls | result | meaning |
|---|----------|--------|---------|
| 1 | H264_PROFILE + H264_LEVEL | EINVAL | libva probes H264; we don't register it |
| 2 | HEVC_PROFILE + HEVC_LEVEL | EINVAL | libva probes HEVC; we don't register it |
| 3 | VP9_FRAME + VP9_COMPRESSED_HDR | OK | actual decode controls |

Calls 1 and 2 are harmless: libva detects we don't support
H264/HEVC integer probes and falls back to the stateless
controls it does have. Call 3 (the actual VP9 stateless
controls) succeeded all along. Only the completion handshake
was broken.
Phase 8.13's added `error_idx` logging in v4l2.c
(`failing_ctrl_id=0xa40900 size=0` etc.) is what made the
distinction visible.
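
The idea behind that logging can be sketched as follows. This is not the actual `v4l2_set_controls` code from libva-v4l2-request-fourier. It is a hypothetical, self-contained helper that relies only on the documented V4L2 rule that `error_idx` is the index of the failing control, or equals `count` when the failure is not tied to one specific control.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: turn the (count, error_idx) pair left behind by a
 * failed VIDIOC_S_EXT_CTRLS call into a readable diagnostic.
 * ids[] holds the control IDs that were passed in, in order. */
static int describe_ctrl_failure(char *buf, size_t buflen,
                                 const uint32_t *ids, uint32_t count,
                                 uint32_t error_idx)
{
    if (error_idx < count)
        return snprintf(buf, buflen,
                        "control %" PRIu32 " of %" PRIu32
                        " failed: id=0x%08" PRIx32,
                        error_idx, count, ids[error_idx]);

    /* error_idx == count: the failure is not attributed to one control. */
    return snprintf(buf, buflen,
                    "no specific control identified (error_idx=%" PRIu32
                    ", count=%" PRIu32 ")",
                    error_idx, count);
}
```

Printing the control ID next to the errno is what separates a rejected optional probe from a rejected decode control, since both surface as the same `EINVAL`.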
### Why one fix closed both 8.13 and the originally planned 8.14
The original plan split Phase 8.13 ("trace the EINVAL") from
Phase 8.14 ("call request_complete from the right place").
Once strace clarified that the EINVAL was probe noise, the
real fix was just the request_complete call from the decode
path — a 10-line change. Doing both in one shot avoided a
phase boundary that wouldn't have shipped anything additional.
## What's NOT here (Phase 8.14+ scope)
- **Multi-frame stream via libva.** Verified single keyframe;
P-frame reference handling across requests untested. Likely
works (the daemon's AVCodecContext is persistent across
REQ_DECODE calls — already proven via test_m2m_stream).
- **AV1 + H.264 via libva.** Different stateless control
sets; needs the same control-payload validation path. May
need similar `v4l2_ctrl_request_complete` adjustments per
codec.
- **mpv + Firefox end-to-end.** The lower-level harness
(ffmpeg vaapi) works; higher-level consumers should follow
but each has its own VAAPI quirks.
- **The two harmless EINVALs from H264/HEVC profile probes.**
Could be suppressed by registering those integer controls
too (always rejecting writes) but that's a polish item.
## Phase 8.14 plan
1. Multi-frame VP9 stream via libva (re-use vp9_60s.ivf from
Phase 8.9 stress test).
2. AV1 + H.264 single-frame via libva (likely needs codec-
specific tweaks).
3. Document any remaining libva-side quirks for higher-level
consumers (mpv, Firefox).