f04d7000f8
The project's consumer-side goal landed: a real VAAPI consumer
(ffmpeg with -hwaccel vaapi) drives our libva backend → V4L2
driver → daemon → byte-exact NV12 output back to ffmpeg.

  ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
         -hwaccel_output_format nv12 -i vp9_small.ivf \
         -f rawvideo -y /tmp/vp9_via_libva.nv12
  cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12  → match

18432-byte NV12 byte-for-byte identical to plain ffmpeg
-pix_fmt nv12 software decode. The project_consumer_target
memory's deliverable shape — "V4L2 stateless node consumed by
a real VAAPI client" — is achieved.

Two related kernel changes:

1. v4l2_ctrl_handler_setup(&ctx->hdl) after registration —
   matches rkvdec/cedrus/hantro. Brings each registered
   compound control out of "uninitialised" state via
   std_init_compound defaults.

2. Per-request control completion in the decode path —
   the real fix for "Timeout when waiting for media request".
   vb2-core's vb2_buffer_done unbinds the BUFFER's req_obj
   on normal decode completion, but the per-request CONTROL
   object stays bound. buf_request_complete fires only from
   queue-cancel paths (vb2-core line 2284), NOT from normal
   buf_done. The driver must call
   v4l2_ctrl_request_complete(req, hdl) explicitly from the
   completion path.

   struct daedalus_inflight gained a `struct media_request
   *req` field, captured from src_buf->vb2_buf.req_obj.req
   in device_run. daedalus_complete_resp_frame then calls
   v4l2_ctrl_request_complete before
   v4l2_m2m_buf_done_and_job_finish — triggers
   MEDIA_REQUEST_STATE_COMPLETE and wakes the request fd
   poll.

   For non-request flows (test_m2m_stream direct QBUF)
   inf->req is NULL; the conditional skips the call.
   Both consumer styles work concurrently.

Diagnostic clarification (was Phase 8.13a):

strace traced three S_EXT_CTRLS calls per frame:

  1. H264_PROFILE + H264_LEVEL      → EINVAL (we don't register)
  2. HEVC_PROFILE + HEVC_LEVEL      → EINVAL (we don't register)
  3. VP9_FRAME + VP9_COMPRESSED_HDR → SUCCESS

The first two are harmless: libva probes whether we support
H264/HEVC integer profile/level controls during config
negotiation; we don't (we expose them as stateless), so EINVAL
just falls through. The actual VP9 stateless controls (#3)
succeeded all along — the libva-side "Unable to set control(s)"
log was misleading us into thinking the control path was the
bug.

Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):

  daemon log:
    REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes
    decoder: opened vp9 context
    decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe ...

  ffmpeg side:
    no Timeout, no Decoding error
    /tmp/vp9_via_libva.nv12: 18432 bytes
    cmp vs reference: byte-for-byte identical.

Roadmap update:

- 8.10/8.11, 8.12, 8.13 marked closed with closure docs.
- 8.14 = multi-frame VP9 via libva, AV1 + H.264, mpv/Firefox
  higher-level consumers.

Per correctness-before-speed:

- strace + kernel-source-reading found the actual root cause
  rather than guessing.
- Conditional v4l2_ctrl_request_complete preserves the existing
  test_m2m_stream non-request path — both consumer styles work
  concurrently without per-flow branching elsewhere.
- Byte-exact pixel comparison, not "frame size matches."

Phase 8.14 next: multi-frame stream + multi-codec via libva +
mpv/Firefox.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Phase 8.13 closure: byte-exact end-to-end via libva

**Status:** closed 2026-05-18.

The project's consumer-side goal landed: a real VAAPI consumer
(ffmpeg with `-hwaccel vaapi`) drives our libva backend → V4L2
driver → daemon → byte-exact NV12 output back to ffmpeg.

```
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
       -hwaccel_output_format nv12 -i vp9_small.ivf \
       -f rawvideo -y /tmp/vp9_via_libva.nv12

cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12   # ← match
```

The output is 18432 bytes, byte-for-byte identical to a plain
`ffmpeg -pix_fmt nv12 -f rawvideo` software decode of the same
VP9 keyframe. The deliverable shape recorded in the
`project_consumer_target` memory, "V4L2 stateless node consumed
by a real VAAPI client", is achieved.
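
As a sanity check on the 18432-byte figure, the expected size of one 128x96 NV12 frame can be worked out by hand. The sketch below is illustrative only: the helper names are hypothetical (they are not part of the driver or daemon), and it assumes even dimensions and tightly packed rows with no stride padding.

```c
#include <stddef.h>

/* Luma plane: one byte per pixel at full resolution. */
static size_t nv12_luma_bytes(size_t width, size_t height)
{
    return width * height;
}

/* Chroma plane: interleaved Cb/Cr samples, subsampled 2x in both
 * directions, so each of the height/2 rows holds width/2 Cb bytes
 * plus width/2 Cr bytes. */
static size_t nv12_chroma_bytes(size_t width, size_t height)
{
    return 2 * (width / 2) * (height / 2);
}

/* Whole frame, assuming no row padding (stride == width). */
static size_t nv12_frame_bytes(size_t width, size_t height)
{
    return nv12_luma_bytes(width, height) + nv12_chroma_bytes(width, height);
}
```

For 128x96 this gives 12288 luma bytes plus 6144 chroma bytes, which agrees with the `luma=12288 chroma=6144` values in the daemon log and with the 18432-byte output file.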

## What lands

Two related kernel changes that unstick the libva request
completion handshake, plus one libva-side logging improvement:

### 1. Stateless control handler initialisation

Call `v4l2_ctrl_handler_setup(&ctx->hdl)` after control
registration, matching rkvdec, cedrus and hantro. This brings
each registered compound control out of its "uninitialised"
state via the `std_init_compound` defaults (e.g. VP9_FRAME gets
`profile=0, bit_depth=8`).

### 2. Per-request control completion in the decode path

The actual root cause of "Timeout when waiting for media
request":

- vb2-core's `vb2_buffer_done` unbinds the buffer's `req_obj` on
  normal decode completion.
- The per-request control object, however, stays bound until
  `v4l2_ctrl_request_complete` runs.
- The vb2 `buf_request_complete` op fires only from queue-cancel
  paths (vb2-core line 2284), not from normal `buf_done`.
- The driver must therefore call
  `v4l2_ctrl_request_complete(req, hdl)` explicitly from its
  decode-completion path.

Fix (in `kernel/daedalus_v4l2_main.c`):

```c
struct daedalus_inflight {
    /* ... */
    struct media_request *req;   /* captured from src_buf; NULL without a request */
};

static void daedalus_device_run(void *priv) {
    /* ... */
    inf->req = src_buf->vb2_buf.req_obj.req;
    /* ... */
}

void daedalus_complete_resp_frame(...) {
    /* ... */
    /* Complete the control object before the buffers are returned. */
    if (inf->req)
        v4l2_ctrl_request_complete(inf->req, &inf->ctx->hdl);
    v4l2_m2m_buf_done_and_job_finish(...);
}
```

Completing the controls before `v4l2_m2m_buf_done_and_job_finish`
lets the request reach `MEDIA_REQUEST_STATE_COMPLETE`, which wakes
the request fd poll. For non-request flows (test_m2m_stream's
direct QBUF) `inf->req` is NULL, so the conditional skips the
`v4l2_ctrl_request_complete` call. Both consumer styles work
concurrently.
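
The failure mode is easier to see in a toy userspace model. The sketch below is not kernel code: every name in it is hypothetical, and it assumes only the behaviour described above, namely that a request carries one buffer object and one control object and reaches COMPLETE when the last bound object is completed.

```c
#include <stddef.h>

/* Toy model of a media request; not the kernel's struct media_request. */
enum toy_req_state { TOY_REQ_QUEUED, TOY_REQ_COMPLETE };

struct toy_request {
    unsigned int incomplete_objs;   /* objects still bound to the request */
    enum toy_req_state state;
};

static void toy_request_init(struct toy_request *req, unsigned int nobjs)
{
    req->incomplete_objs = nobjs;
    req->state = TOY_REQ_QUEUED;
}

/* Complete one bound object; the request completes with the last one. */
static void toy_object_complete(struct toy_request *req)
{
    if (req->incomplete_objs == 0)
        return;
    if (--req->incomplete_objs == 0)
        req->state = TOY_REQ_COMPLETE;
}

/* Completion as in Phase 8.12: only the buffer object is released. */
static void toy_finish_buffer_only(struct toy_request *req)
{
    if (req != NULL)
        toy_object_complete(req);   /* stands in for the buffer-done path */
}

/* Completion with the Phase 8.13 fix: controls first, then the buffer. */
static void toy_finish_with_ctrls(struct toy_request *req)
{
    if (req != NULL) {              /* NULL models the direct-QBUF flow */
        toy_object_complete(req);   /* stands in for v4l2_ctrl_request_complete */
        toy_object_complete(req);   /* stands in for the buffer-done path */
    }
}
```

With two bound objects, `toy_finish_buffer_only` leaves the request in `TOY_REQ_QUEUED` indefinitely, which corresponds to the poll timeout seen by libva; `toy_finish_with_ctrls` moves it to `TOY_REQ_COMPLETE`. Passing NULL mirrors the non-request path, where nothing needs completing.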

### 3. Diagnostic improvements

- libva-v4l2-request-fourier `src/v4l2.c`: better error
  logging in `v4l2_set_controls` (it now logs `error_idx`, the
  failing control id, and its size). This made the diagnosis
  above tractable.

## Verification

### End-to-end via ffmpeg + libva, byte-exact

```
$ pkill -f daedalus_v4l2_daemon; sudo rmmod daedalus_v4l2
$ sudo insmod kernel/daedalus_v4l2.ko
$ daedalus_v4l2_daemon -v daemon &

$ LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
  LIBVA_DRIVER_NAME=v4l2_request \
  LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video0 \
  LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media3 \
  ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
         -hwaccel_output_format nv12 -i /tmp/vp9_small.ivf \
         -f rawvideo -y /tmp/vp9_via_libva.nv12

v4l2-request: cap_pool_init: 24 slots ready
v4l2-request: Unable to set control(s): EINVAL    (H264 probe — harmless)
v4l2-request: Unable to set control(s): EINVAL    (HEVC probe — harmless)
(no timeout, no decode error)

daemon log:
  REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes
  decoder: opened vp9 context
  decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe luma=12288 chroma=6144

$ ls -la /tmp/vp9_via_libva.nv12
-rw-r--r-- 1 root root 18432 May 18 20:13 /tmp/vp9_via_libva.nv12

$ ffmpeg -i /tmp/vp9_small.ivf -pix_fmt nv12 -f rawvideo \
         -y /tmp/vp9_ref_for_libva.nv12
$ cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12
$ echo $?
0
```

Byte-for-byte match. The two `Unable to set control(s):
EINVAL` messages are libva probing the H264 and HEVC
PROFILE/LEVEL integer controls during config negotiation. We
do not register those (we expose VP9/AV1/H264 as stateless
codecs), so libva gets EINVAL, logs it, and moves on. The
functional flow is unaffected.
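
The `cmp` step checks every byte, not just the file size. A minimal sketch of the same check in C follows; the function name is hypothetical and is not part of the project's test harness. It reports the offset of the first differing byte, which helps localise a mismatch to the luma or chroma plane.

```c
#include <stddef.h>

/* Return the offset of the first differing byte, or -1 if both buffers
 * have the same length and content. A length difference after a matching
 * common prefix is reported at the length of the shorter buffer. */
static long first_mismatch(const unsigned char *a, size_t a_len,
                           const unsigned char *b, size_t b_len)
{
    size_t n = a_len < b_len ? a_len : b_len;

    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return (long)i;
    }
    return a_len == b_len ? -1 : (long)n;
}
```

For a 128x96 NV12 frame, an offset below 12288 would point at the luma plane and anything from 12288 upward at the interleaved chroma plane.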

## Design analysis

### Why was Phase 8.12 close but not complete

8.12 wired up all the request API hooks (`supports_requests`,
`buf_out_validate`, `buf_request_complete`) and the daemon-side
decode worked byte-exact. The per-request control object,
however, stayed bound forever because `buf_request_complete`
only fires from queue-cancel paths in vb2-core, not from normal
`buf_done`. As a result the request never transitioned to
COMPLETE and the libva poll timed out.

Phase 8.13 closes that loop by capturing the `media_request`
from the OUTPUT `vb2_buffer`'s `req_obj` at `device_run` time
and calling `v4l2_ctrl_request_complete` explicitly when the
decode finishes (chardev RESP_FRAME path). This mirrors what
rkvdec does from its IRQ handler and cedrus from its
device_run completion.

### Why the EINVAL noise was misleading

Earlier phases (8.10–8.12) kept seeing "Unable to set
control(s): Invalid argument" and assumed it pointed at our
stateless control registration. strace revealed three
separate S_EXT_CTRLS calls per frame:

| # | controls | result | meaning |
|---|----------|--------|---------|
| 1 | H264_PROFILE + H264_LEVEL | EINVAL | libva probes H264; we don't register it |
| 2 | HEVC_PROFILE + HEVC_LEVEL | EINVAL | libva probes HEVC; we don't register it |
| 3 | VP9_FRAME + VP9_COMPRESSED_HDR | OK | actual decode controls |

Calls 1 and 2 are harmless: libva detects that we don't support
the H264/HEVC integer probes and falls back to the stateless
controls it does have. Call 3 (the actual VP9 stateless
controls) succeeded all along. Only the completion handshake
was broken.

The `error_idx` logging added to `v4l2.c` in Phase 8.13
(`failing_ctrl_id=0xa40900 size=0` etc.) is what made the
distinction visible.
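
When reading such a log line, it helps to know which control class a raw id belongs to. The sketch below applies the same mask as the kernel's `V4L2_CTRL_ID2CLASS` macro; the two class constants are copied from `linux/v4l2-controls.h` so that it builds without kernel headers, and the function and macro names are hypothetical.

```c
#include <stdint.h>

/* Class values as defined in linux/v4l2-controls.h (copied, not included). */
#define SKETCH_CTRL_CLASS_CODEC           0x00990000u
#define SKETCH_CTRL_CLASS_CODEC_STATELESS 0x00a40000u

/* Same masking as the kernel's V4L2_CTRL_ID2CLASS(id). */
static uint32_t ctrl_id_to_class(uint32_t id)
{
    return id & 0x0fff0000u;
}

/* Human-readable label for the two classes relevant to this diagnosis. */
static const char *ctrl_class_label(uint32_t id)
{
    switch (ctrl_id_to_class(id)) {
    case SKETCH_CTRL_CLASS_CODEC:
        return "codec";
    case SKETCH_CTRL_CLASS_CODEC_STATELESS:
        return "codec-stateless";
    default:
        return "other";
    }
}
```

Applied to the id quoted above, `ctrl_id_to_class(0xa40900)` yields `0x00a40000`, the stateless codec class. Logging the class label next to `error_idx` is one possible way to make probe failures easier to tell apart from failures on the controls a decode actually needs.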

### Why one fix unblocked both 8.13 and 8.14

The original plan split Phase 8.13 ("trace the EINVAL") from
Phase 8.14 ("call request_complete from the right place").
Once strace clarified that the EINVAL was probe noise, the
real fix was just the request_complete call from the decode
path, a 10-line change. Doing both in one shot avoided a
phase boundary that wouldn't have shipped anything additional.

## What's NOT here (Phase 8.14+ scope)

- **Multi-frame stream via libva.** Only a single keyframe has
  been verified; P-frame reference handling across requests is
  untested. It likely works, since the daemon's AVCodecContext
  is persistent across REQ_DECODE calls (already proven via
  test_m2m_stream).
- **AV1 + H.264 via libva.** These use different stateless
  control sets and need the same control-payload validation
  path. They may need similar `v4l2_ctrl_request_complete`
  adjustments per codec.
- **mpv + Firefox end-to-end.** The lower-level harness
  (ffmpeg vaapi) works; higher-level consumers should follow,
  but each has its own VAAPI quirks.
- **The two harmless EINVALs from H264/HEVC profile probes.**
  These could be suppressed by registering those integer
  controls too (always rejecting writes), but that's a polish
  item.

## Phase 8.14 plan

1. Multi-frame VP9 stream via libva (re-use vp9_60s.ivf from
   Phase 8.9 stress test).
2. AV1 + H.264 single-frame via libva (likely needs
   codec-specific tweaks).
3. Document any remaining libva-side quirks for higher-level
   consumers (mpv, Firefox).