Phase 8.13: byte-exact end-to-end via libva (consumer target hit)

The project's consumer-side goal landed: a real VAAPI consumer
(ffmpeg with -hwaccel vaapi) drives our libva backend → V4L2
driver → daemon → byte-exact NV12 output back to ffmpeg.

    ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
           -hwaccel_output_format nv12 -i vp9_small.ivf \
           -f rawvideo -y /tmp/vp9_via_libva.nv12
    cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12  → match

18432-byte NV12 byte-for-byte identical to plain ffmpeg
-pix_fmt nv12 software decode. The project_consumer_target
memory's deliverable shape — "V4L2 stateless node consumed by
a real VAAPI client" — is achieved.

Two related kernel changes:

1. v4l2_ctrl_handler_setup(&ctx->hdl) after registration —
   matches rkvdec/cedrus/hantro. Brings each registered
   compound control out of "uninitialised" state via
   std_init_compound defaults.

2. Per-request control completion in the decode path —
   the real fix for "Timeout when waiting for media request".
   vb2-core's vb2_buffer_done unbinds the BUFFER's req_obj
   on normal decode completion, but the per-request CONTROL
   object stays bound. buf_request_complete fires only from
   queue-cancel paths (vb2-core line 2284), NOT from normal
   buf_done. The driver must call
   v4l2_ctrl_request_complete(req, hdl) explicitly from the
   completion path.

   struct daedalus_inflight gained a `struct media_request
   *req` field, captured from src_buf->vb2_buf.req_obj.req
   in device_run. daedalus_complete_resp_frame then calls
   v4l2_ctrl_request_complete before
   v4l2_m2m_buf_done_and_job_finish — triggers
   MEDIA_REQUEST_STATE_COMPLETE and wakes the request fd
   poll.

   For non-request flows (test_m2m_stream direct QBUF)
   inf->req is NULL; the conditional skips the call.
   Both consumer styles work concurrently.

Diagnostic clarification (was Phase 8.13a):

strace traced three S_EXT_CTRLS calls per frame:

  1. H264_PROFILE + H264_LEVEL → EINVAL (we don't register)
  2. HEVC_PROFILE + HEVC_LEVEL → EINVAL (we don't register)
  3. VP9_FRAME + VP9_COMPRESSED_HDR → SUCCESS

The first two are harmless: libva probes whether we support
H264/HEVC integer profile/level controls during config
negotiation; we don't (we expose them as stateless), so EINVAL
just falls through. The actual VP9 stateless controls (#3)
succeeded all along — the libva-side "Unable to set control(s)"
log was misleading us into thinking the control path was the
bug.

Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):

  daemon log:
    REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes
    decoder: opened vp9 context
    decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe ...
  ffmpeg side:
    no Timeout, no Decoding error
    /tmp/vp9_via_libva.nv12: 18432 bytes
    cmp vs reference: byte-for-byte identical.

Roadmap update:

- 8.10/8.11, 8.12, 8.13 marked closed with closure docs.
- 8.14 = multi-frame VP9 via libva, AV1 + H.264, mpv/Firefox
  higher-level consumers.

Per correctness-before-speed:

- strace + kernel-source-reading found the actual root cause
  rather than guessing.
- Conditional v4l2_ctrl_request_complete preserves the existing
  test_m2m_stream non-request path — both consumer styles work
  concurrently without per-flow branching elsewhere.
- Byte-exact pixel comparison, not "frame size matches."

Phase 8.14 next: multi-frame stream + multi-codec via libva +
mpv/Firefox.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,200 @@
# Phase 8.13 closure — byte-exact end-to-end via libva

**Status:** closed 2026-05-18.

The project's consumer-side goal landed: a real VAAPI consumer
(ffmpeg with `-hwaccel vaapi`) drives our libva backend → V4L2
driver → daemon → byte-exact NV12 output back to ffmpeg.

```
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
       -hwaccel_output_format nv12 -i vp9_small.ivf \
       -f rawvideo -y /tmp/vp9_via_libva.nv12

cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12   # ← match
```

18432 bytes, byte-for-byte identical to plain
`ffmpeg -pix_fmt nv12 -f rawvideo` software decode of the same
VP9 keyframe. The `project_consumer_target` memory's deliverable
shape — "V4L2 stateless node consumed by a real VAAPI client" —
is achieved.
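
The 18432-byte figure follows from the NV12 layout of a 128x96 frame: one full-resolution luma plane plus one interleaved, half-resolution chroma plane. A minimal sketch of that arithmetic, assuming 8-bit samples, even dimensions, and no row padding; `nv12_frame_size` is a hypothetical helper for illustration, not part of the driver or daemon:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: size in bytes of a tightly packed 8-bit NV12 frame. */
static size_t nv12_frame_size(size_t width, size_t height)
{
	/* 4:2:0 subsampling: this simple formula assumes even dimensions. */
	assert(width % 2 == 0 && height % 2 == 0);

	size_t luma = width * height;                   /* one Y byte per pixel */
	size_t chroma = (width / 2) * (height / 2) * 2; /* interleaved U,V pairs */

	return luma + chroma;
}
```

For 128x96 this gives 12288 luma bytes plus 6144 chroma bytes, i.e. 18432 in total, which is the file size the comparison is run on.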

## What lands

Two related kernel changes that unstick the libva request
completion handshake:

### 1. Stateless control handler initialisation

`v4l2_ctrl_handler_setup(&ctx->hdl)` after registration —
matches rkvdec/cedrus/hantro. Brings each registered compound
control out of "uninitialised" state via the std_init_compound
defaults (e.g. VP9_FRAME gets `profile=0, bit_depth=8`).

### 2. Per-request control completion in the decode path

The actual root cause of "Timeout when waiting for media
request":

- vb2-core's `vb2_buffer_done` unbinds the BUFFER's req_obj on
  normal decode completion.
- But the per-request CONTROL object stays bound until
  `v4l2_ctrl_request_complete` runs.
- The vb2 `buf_request_complete` op fires only from queue-cancel
  paths (vb2-core line 2284), NOT from normal buf_done.
- The driver must call `v4l2_ctrl_request_complete(req, hdl)`
  explicitly from its decode-completion path.
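
The behaviour in the list above can be pictured as a counter of objects still bound to the request: the request only reaches COMPLETE, and only then wakes the poller, once the last bound object is completed. The following is a toy userspace model of that rule, not kernel code; every name in it (`toy_request`, `toy_object_complete`, and so on) is made up for illustration:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model of request completion; not the kernel's types. */
enum toy_req_state { TOY_REQ_QUEUED, TOY_REQ_COMPLETE };

struct toy_request {
	enum toy_req_state state;
	unsigned int incomplete_objects; /* bound objects not yet completed */
};

/* Queue a request carrying n bound objects (e.g. 1 buffer + 1 control). */
static void toy_request_queue(struct toy_request *req, unsigned int n)
{
	req->state = TOY_REQ_QUEUED;
	req->incomplete_objects = n;
}

/* Complete one bound object; returns true if this completed the request. */
static bool toy_object_complete(struct toy_request *req)
{
	assert(req->incomplete_objects > 0);

	if (--req->incomplete_objects == 0) {
		req->state = TOY_REQ_COMPLETE; /* this is where poll would wake */
		return true;
	}
	return false;
}
```

With two bound objects, completing only the buffer leaves the model in `TOY_REQ_QUEUED`, which corresponds to the timeout symptom; the second call, standing in for `v4l2_ctrl_request_complete`, is what moves it to `TOY_REQ_COMPLETE`.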

Fix (in `kernel/daedalus_v4l2_main.c`):

```c
struct daedalus_inflight {
    ...
    struct media_request *req;  /* captured from src_buf */
};

static void daedalus_device_run(void *priv) {
    ...
    inf->req = src_buf->vb2_buf.req_obj.req;
    ...
}

void daedalus_complete_resp_frame(...) {
    ...
    if (inf->req)
        v4l2_ctrl_request_complete(inf->req, &inf->ctx->hdl);
    v4l2_m2m_buf_done_and_job_finish(...);
}
```

For non-request flows (test_m2m_stream's direct QBUF)
`inf->req` is NULL; the conditional skips the
`v4l2_ctrl_request_complete` call. Both consumer styles
work concurrently.

### 3. Diagnostic improvements

- libva-v4l2-request-fourier `src/v4l2.c`: better error
  logging in `v4l2_set_controls` (logs `error_idx`, failing
  control id, size). Made the diagnosis above tractable.

## Verification

### End-to-end via ffmpeg + libva, byte-exact

```
$ pkill -f daedalus_v4l2_daemon; sudo rmmod daedalus_v4l2
$ sudo insmod kernel/daedalus_v4l2.ko
$ daedalus_v4l2_daemon -v daemon &

$ LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
  LIBVA_DRIVER_NAME=v4l2_request \
  LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video0 \
  LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media3 \
  ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
         -hwaccel_output_format nv12 -i /tmp/vp9_small.ivf \
         -f rawvideo -y /tmp/vp9_via_libva.nv12

v4l2-request: cap_pool_init: 24 slots ready
v4l2-request: Unable to set control(s): EINVAL  (H264 probe — harmless)
v4l2-request: Unable to set control(s): EINVAL  (HEVC probe — harmless)
(no timeout, no decode error)

daemon log:
  REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes
  decoder: opened vp9 context
  decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe luma=12288 chroma=6144

$ ls -la /tmp/vp9_via_libva.nv12
-rw-r--r-- 1 root root 18432 May 18 20:13 /tmp/vp9_via_libva.nv12

$ ffmpeg -i /tmp/vp9_small.ivf -pix_fmt nv12 -f rawvideo \
         -y /tmp/vp9_ref_for_libva.nv12
$ cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12
$ echo $?
0
```

Byte-for-byte match. The two `Unable to set control(s):
EINVAL` messages are libva probing H264 + HEVC PROFILE/LEVEL
integer controls during config negotiation — we don't
register those (since we expose VP9/AV1/H264 stateless), libva
gets EINVAL, logs it, and moves on. Functional flow is
unaffected.
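
The `fnv1a=0x1eb34bfe` field in the daemon log is a content fingerprint of the decoded frame, used across phases to compare decodes without shipping the pixels around. As a sketch only, here is the standard 32-bit FNV-1a hash; that the daemon uses exactly this variant, and which bytes it feeds in, are assumptions, and `fnv1a32` is a name chosen for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Standard 32-bit FNV-1a. Whether the daemon's fnv1a= value is computed
 * with exactly this variant is an assumption.
 */
static uint32_t fnv1a32(const uint8_t *data, size_t len)
{
	uint32_t hash = 0x811c9dc5u; /* FNV-1a 32-bit offset basis */

	assert(data != NULL || len == 0);

	for (size_t i = 0; i < len; i++) {
		hash ^= data[i];      /* xor the byte in first ... */
		hash *= 0x01000193u;  /* ... then multiply by the FNV prime */
	}
	return hash;
}
```

A hash match is a quick sanity check; the `cmp` run above remains the authoritative byte-exact comparison.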

## Design analysis

### Why was Phase 8.12 close but not complete

8.12 wired all the request API hooks (supports_requests,
buf_out_validate, buf_request_complete) and the daemon-side
decode worked byte-exact — but the per-request control object
stayed bound forever because `buf_request_complete` only fires
from queue-cancel paths in vb2-core, not from normal buf_done.
Result: request never transitioned to COMPLETE, libva poll
timed out.
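
From userspace, that symptom looks like a wait on the request fd that never sees the completion event. A minimal sketch of such a wait, assuming the consumer polls the request fd for `POLLPRI` with a timeout; `wait_for_request` is a hypothetical helper and the actual wait logic in the libva backend may differ:

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>

/*
 * Wait for a media request fd to signal completion.
 * Returns 0 on completion, -ETIMEDOUT on timeout, another negative errno
 * on failure. Sketch only: not the libva backend's actual code.
 */
static int wait_for_request(int request_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = request_fd, .events = POLLPRI };
	int ret;

	assert(timeout_ms >= 0);

	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -errno;
	if (ret == 0)
		return -ETIMEDOUT; /* request never reached COMPLETE in time */
	if (pfd.revents & POLLPRI)
		return 0;          /* request completed */
	return -EIO;               /* POLLERR / POLLHUP / POLLNVAL */
}
```

With the 8.12 driver, a wait like this would run into the timeout branch even though the daemon had already produced the correct frame.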

Phase 8.13 closes that loop by capturing the media_request from
the OUTPUT vb2_buffer's req_obj at device_run time and calling
`v4l2_ctrl_request_complete` explicitly when the decode
finishes (chardev RESP_FRAME path). Mirrors what rkvdec does
from its IRQ handler and cedrus from its device_run completion.

### Why the EINVAL noise was misleading

Earlier phases (8.10–8.12) kept seeing "Unable to set
control(s): Invalid argument" and assumed it pointed at our
stateless control registration. strace revealed three
separate S_EXT_CTRLS calls per frame:

| # | controls | result | meaning |
|---|----------|--------|---------|
| 1 | H264_PROFILE + H264_LEVEL | EINVAL | libva probes H264; we don't register it |
| 2 | HEVC_PROFILE + HEVC_LEVEL | EINVAL | libva probes HEVC; we don't register it |
| 3 | VP9_FRAME + VP9_COMPRESSED_HDR | OK | actual decode controls |

Calls 1 and 2 are harmless: libva detects we don't support
H264/HEVC integer probes and falls back to the stateless
controls it does have. Call 3 (the actual VP9 stateless
controls) succeeded all along. Only the completion handshake
was broken.

Phase 8.13's added `error_idx` logging in v4l2.c
(`failing_ctrl_id=0xa40900 size=0` etc.) is what made the
distinction visible.

### Why one fix unblocked both 8.13 and 8.14

The original plan split Phase 8.13 ("trace the EINVAL") from
Phase 8.14 ("call request_complete from the right place").
Once strace clarified that the EINVAL was probe noise, the
real fix was just the request_complete call from the decode
path — a 10-line change. Doing both in one shot avoided a
phase boundary that wouldn't have shipped anything additional.

## What's NOT here (Phase 8.14+ scope)

- **Multi-frame stream via libva.** Verified single keyframe;
  P-frame reference handling across requests untested. Likely
  works (the daemon's AVCodecContext is persistent across
  REQ_DECODE calls — already proven via test_m2m_stream).
- **AV1 + H.264 via libva.** Different stateless control
  sets; needs the same control-payload validation path. May
  need similar `v4l2_ctrl_request_complete` adjustments per
  codec.
- **mpv + Firefox end-to-end.** The lower-level harness
  (ffmpeg vaapi) works; higher-level consumers should follow
  but each has its own VAAPI quirks.
- **The two harmless EINVALs from H264/HEVC profile probes.**
  Could be suppressed by registering those integer controls
  too (always rejecting writes) but that's a polish item.

## Phase 8.14 plan

1. Multi-frame VP9 stream via libva (re-use vp9_60s.ivf from
   Phase 8.9 stress test).
2. AV1 + H.264 single-frame via libva (likely needs codec-
   specific tweaks).
3. Document any remaining libva-side quirks for higher-level
   consumers (mpv, Firefox).
+56
-10
@@ -134,18 +134,64 @@ See `docs/phase_8_8_closure.md`.
 
 See `docs/phase_8_9_closure.md`.
 
-### Phase 8.10 — libva-v4l2-request VP9/AV1 patch + end-to-end consumer
+### Phase 8.10 + 8.11 — libva consumer integration scaffold (closed 2026-05-18)
 
-1. Build libva-v4l2-request from source on hertz.
-2. Add VP9_FRAME + AV1_FRAME profile mappings; add
-   V4L2_PIX_FMT_NV12 (single-plane) to our CAPTURE so
-   the library's video.c picks us.
-3. End-to-end: `mpv --hwdec=vaapi` against test files;
-   then Firefox.
-4. (Stretch) Upstream the patches to bootlin.
+- Forked bootlin/libva-v4l2-request to marfrit/libva-v4l2-
+  request-fourier (gitea); discovered the existing fourier
+  fork already had VP9/AV1/HEVC support on Rockchip.
+- Added daedalus_v4l2 to known_decoder_drivers + meson
+  build option.
+- Added V4L2_PIX_FMT_NV12 single-plane + request API
+  media ops + stateless control registration to our
+  kernel.
+- vainfo enumerates 7 VAProfile entries via our driver.
 
-After 8.10 the project's user-facing loop is closed.
-Optimisation phases (QPU dispatch, 4K) ship when motivated.
+See `docs/phase_8_10_11_closure.md`.
+
+### Phase 8.12 — first VP9 frame decoded via libva (closed 2026-05-18)
+
+- vb2_queue supports_requests; vb2_ops buf_out_validate +
+  buf_request_complete; v4l2_ctrl_new_custom for stateless
+  ctrl registration.
+- Daemon decoded byte-exact VP9 keyframe via the full
+  libva path (FNV-1a 0x1eb34bfe matches standalone).
+- ffmpeg still timed out waiting for media_request
+  completion (request bind state).
+
+See `docs/phase_8_12_closure.md`.
+
+### Phase 8.13 — byte-exact end-to-end via libva (closed 2026-05-18)
+
+- Traced the misleading "Unable to set control(s):
+  Invalid argument" — actually libva probing H264/HEVC
+  PROFILE/LEVEL we don't expose (harmless); the real VP9
+  stateless control SET succeeds.
+- Diagnosed the "Timeout when waiting for media request"
+  root cause: per-request control object stays bound
+  because vb2's normal buf_done path doesn't fire
+  buf_request_complete (only queue-cancel does).
+- Fix: capture media_request from src_buf in inflight
+  entry, call v4l2_ctrl_request_complete from the
+  completion path before buf_done_and_job_finish.
+- **Byte-exact end-to-end**: ffmpeg -hwaccel vaapi →
+  libva-v4l2-request-fourier → /dev/video0 →
+  daedalus_v4l2 → daemon → 18432-byte NV12 byte-for-byte
+  identical to ffmpeg software reference.
+- **Project consumer target hit**: V4L2 stateless node
+  consumed by a real VAAPI client.
+
+See `docs/phase_8_13_closure.md`.
+
+### Phase 8.14 — multi-frame + AV1/H.264 + higher-level consumers
+
+1. Multi-frame VP9 stream via libva (vp9_60s.ivf from 8.9
+   stress test) — P-frame references across requests.
+2. AV1 + H.264 single-frame via libva.
+3. mpv --hwdec=vaapi end-to-end.
+4. Firefox / WebRTC if motivated.
+
+Optimisation work (QPU dispatch, 4K, HDR-in-libva) ships
+when there's a concrete user-facing need.
 
 ## Effort estimate
 
@@ -520,6 +520,14 @@ struct daedalus_inflight {
 	struct daedalus_ctx *ctx;
 	struct vb2_v4l2_buffer *src_buf;
 	struct vb2_v4l2_buffer *dst_buf;
+	/*
+	 * Captured media_request the src_buf was bound to (if any).
+	 * Set by device_run from src_buf->vb2_buf.req_obj.req;
+	 * consumed by the completion path to call
+	 * v4l2_ctrl_request_complete + signal request fd. NULL for
+	 * non-request flows (e.g. test_m2m_stream direct QBUF).
+	 */
+	struct media_request *req;
 };
 
 static struct daedalus_inflight *
@@ -666,6 +674,14 @@ static void daedalus_device_run(void *priv)
 	inf->ctx = ctx;
 	inf->src_buf = src_buf;
 	inf->dst_buf = dst_buf;
+	/*
+	 * Capture the bound media_request (if any) so the
+	 * completion path can call v4l2_ctrl_request_complete +
+	 * trigger MEDIA_REQUEST_STATE_COMPLETE. vb2-core's normal
+	 * buf_done path unbinds the buffer's req_obj but leaves the
+	 * control object bound — the driver has to complete it.
+	 */
+	inf->req = src_buf->vb2_buf.req_obj.req;
 
 	mutex_lock(&dev->inflight_lock);
 	list_add_tail(&inf->list, &dev->inflight);
@@ -789,6 +805,22 @@ void daedalus_complete_resp_frame(u32 cookie,
 		}
 	}
 
+	/*
+	 * Phase 8.14: if the src_buf was bound to a media_request
+	 * (libva-driven decode path), complete the per-request
+	 * control state BEFORE buf_done_and_job_finish. vb2-core's
+	 * buf_done unbinds the buffer's req_obj on its own, but the
+	 * control object stays bound until v4l2_ctrl_request_complete
+	 * runs — only after BOTH objects unbind does the request
+	 * transition to MEDIA_REQUEST_STATE_COMPLETE and wake any
+	 * userspace poll on the request fd.
+	 *
+	 * For non-request flows (test_m2m_stream direct QBUF) inf->req
+	 * is NULL and v4l2_ctrl_request_complete just no-ops.
+	 */
+	if (inf->req)
+		v4l2_ctrl_request_complete(inf->req, &inf->ctx->hdl);
+
 	/*
 	 * Use the buf_done_and_job_finish helper rather than plain
 	 * buf_done + job_finish: the helper pops the buffers off
@@ -968,6 +1000,15 @@ static int daedalus_open(struct file *file)
 
 	v4l2_ctrl_handler_init(&ctx->hdl, ARRAY_SIZE(daedalus_stateless_ctrls));
 	daedalus_register_stateless_ctrls(&ctx->hdl);
+	/*
+	 * v4l2_ctrl_handler_setup runs s_ctrl for every registered
+	 * control with its default value — required to bring each
+	 * control out of "uninitialised" state. Without this the
+	 * per-request handler clone path returns EINVAL on
+	 * VIDIOC_S_EXT_CTRLS(which=REQUEST_VAL). rkvdec/cedrus/
+	 * hantro all call this after registration.
+	 */
+	v4l2_ctrl_handler_setup(&ctx->hdl);
 	ctx->fh.ctrl_handler = &ctx->hdl;
 
 	daedalus_fill_output_fmt(&ctx->src_fmt,