Phase 8.13: byte-exact end-to-end via libva (consumer target hit)

The project's consumer-side goal landed: a real VAAPI consumer
(ffmpeg with -hwaccel vaapi) drives our libva backend → V4L2
driver → daemon → byte-exact NV12 output back to ffmpeg.

  ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
         -hwaccel_output_format nv12 -i vp9_small.ivf \
         -f rawvideo -y /tmp/vp9_via_libva.nv12
  cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12  → match

The 18432-byte NV12 output is byte-for-byte identical to a plain
ffmpeg -pix_fmt nv12 software decode. The deliverable shape from
the project_consumer_target memory, "V4L2 stateless node consumed
by a real VAAPI client", is achieved.

Two related kernel changes:

1. v4l2_ctrl_handler_setup(&ctx->hdl) after registration —
   matches rkvdec/cedrus/hantro. Brings each registered
   compound control out of "uninitialised" state via
   std_init_compound defaults.

2. Per-request control completion in the decode path —
   the real fix for "Timeout when waiting for media request".
   vb2-core's vb2_buffer_done unbinds the BUFFER's req_obj
   on normal decode completion, but the per-request CONTROL
   object stays bound. buf_request_complete fires only from
   queue-cancel paths (vb2-core line 2284), NOT from normal
   buf_done. The driver must call
   v4l2_ctrl_request_complete(req, hdl) explicitly from the
   completion path.

   struct daedalus_inflight gained a `struct media_request
   *req` field, captured from src_buf->vb2_buf.req_obj.req
   in device_run. daedalus_complete_resp_frame then calls
   v4l2_ctrl_request_complete before
   v4l2_m2m_buf_done_and_job_finish — triggers
   MEDIA_REQUEST_STATE_COMPLETE and wakes the request fd
   poll.

   For non-request flows (test_m2m_stream direct QBUF)
   inf->req is NULL; the conditional skips the call.
   Both consumer styles work concurrently.

Diagnostic clarification (was Phase 8.13a):

strace traced three S_EXT_CTRLS calls per frame:
  1. H264_PROFILE + H264_LEVEL → EINVAL  (we don't register)
  2. HEVC_PROFILE + HEVC_LEVEL → EINVAL  (we don't register)
  3. VP9_FRAME + VP9_COMPRESSED_HDR → SUCCESS

The first two are harmless: libva probes for H264/HEVC integer
profile/level controls during config negotiation. We don't
register those (we expose only the stateless codec controls), so
the EINVAL just falls through. The actual VP9 stateless controls
(#3) succeeded all along; the libva-side "Unable to set
control(s)" log misled us into thinking the control path was the
bug.

Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):

  daemon log:
    REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes
    decoder: opened vp9 context
    decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe ...

  ffmpeg side:
    no Timeout, no Decoding error
    /tmp/vp9_via_libva.nv12: 18432 bytes

  cmp vs reference: byte-for-byte identical.

Roadmap update:
- 8.10/8.11, 8.12, 8.13 marked closed with closure docs.
- 8.14 = multi-frame VP9 via libva, AV1 + H.264, mpv/Firefox
  higher-level consumers.

Per correctness-before-speed:
- strace + kernel-source-reading found the actual root cause
  rather than guessing.
- Conditional v4l2_ctrl_request_complete preserves the existing
  test_m2m_stream non-request path — both consumer styles work
  concurrently without per-flow branching elsewhere.
- Byte-exact pixel comparison, not "frame size matches."

Phase 8.14 next: multi-frame stream + multi-codec via libva +
mpv/Firefox.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 18:14:34 +00:00
parent a7d585eee8
commit f04d7000f8
3 changed files with 297 additions and 10 deletions
+200
@@ -0,0 +1,200 @@
# Phase 8.13 closure — byte-exact end-to-end via libva
**Status:** closed 2026-05-18.
The project's consumer-side goal landed: a real VAAPI consumer
(ffmpeg with `-hwaccel vaapi`) drives our libva backend → V4L2
driver → daemon → byte-exact NV12 output back to ffmpeg.
```
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
-hwaccel_output_format nv12 -i vp9_small.ivf \
-f rawvideo -y /tmp/vp9_via_libva.nv12
cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12 # ← match
```
18432 bytes, byte-for-byte identical to plain
`ffmpeg -pix_fmt nv12 -f rawvideo` software decode of the same
VP9 keyframe. The `project_consumer_target` memory's deliverable
shape — "V4L2 stateless node consumed by a real VAAPI client" —
is achieved.
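The 18432-byte figure follows from the NV12 layout of a 128x96 frame: a full-resolution luma plane followed by an interleaved chroma plane at half the size. A minimal sketch of that arithmetic is below; the helper name `nv12_frame_size` is illustrative (not from the project) and it assumes even dimensions with no stride padding.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: size in bytes of a tightly packed NV12 frame.
 * Assumes even width/height and no per-row stride padding.
 */
size_t nv12_frame_size(size_t width, size_t height)
{
	size_t luma = width * height;	/* Y plane: one byte per pixel */
	size_t chroma = luma / 2;	/* interleaved CbCr, 2x2 subsampled */

	return luma + chroma;
}
```

For 128x96 the two terms are 12288 and 6144 bytes, which matches the `luma=12288 chroma=6144` values in the daemon log.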
## What lands
Two related kernel changes that unstick the libva request
completion handshake:
### 1. Stateless control handler initialisation
`v4l2_ctrl_handler_setup(&ctx->hdl)` after registration —
matches rkvdec/cedrus/hantro. Brings each registered compound
control out of "uninitialised" state via the std_init_compound
defaults (e.g. VP9_FRAME gets `profile=0, bit_depth=8`).
### 2. Per-request control completion in the decode path
The actual root cause of "Timeout when waiting for media
request":
- vb2-core's `vb2_buffer_done` unbinds the BUFFER's req_obj on
normal decode completion.
- But the per-request CONTROL object stays bound until
`v4l2_ctrl_request_complete` runs.
- The vb2 `buf_request_complete` op fires only from queue-cancel
paths (vb2-core line 2284), NOT from normal buf_done.
- The driver must call `v4l2_ctrl_request_complete(req, hdl)`
explicitly from its decode-completion path.
Fix (in `kernel/daedalus_v4l2_main.c`):
```c
struct daedalus_inflight {
	...
	struct media_request *req;	/* captured from src_buf */
};

static void daedalus_device_run(void *priv)
{
	...
	inf->req = src_buf->vb2_buf.req_obj.req;
	...
}

void daedalus_complete_resp_frame(...)
{
	...
	if (inf->req)
		v4l2_ctrl_request_complete(inf->req, &inf->ctx->hdl);
	v4l2_m2m_buf_done_and_job_finish(...);
}
```
For non-request flows (test_m2m_stream's direct QBUF)
`inf->req` is NULL; the conditional skips the
`v4l2_ctrl_request_complete` call. Both consumer styles
work concurrently.
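The handshake described in this section can be pictured with a small userspace model. This is only a toy sketch, not kernel code: every name in it (`toy_request`, `toy_complete_frame`, and so on) is made up for illustration, and it assumes a request carries exactly two objects, one buffer object and one control object, and completes once both are done.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model of a media request; all names are illustrative. */
enum toy_req_state { TOY_REQ_QUEUED, TOY_REQ_COMPLETE };

struct toy_request {
	int incomplete_objects;		/* buffer object + control object */
	enum toy_req_state state;
};

void toy_request_init(struct toy_request *req)
{
	req->incomplete_objects = 2;
	req->state = TOY_REQ_QUEUED;
}

/* Each object completion drops the count; the last one completes the request. */
void toy_object_complete(struct toy_request *req)
{
	if (req->incomplete_objects > 0 && --req->incomplete_objects == 0)
		req->state = TOY_REQ_COMPLETE;
}

/*
 * Models the completion path. req may be NULL for plain QBUF flows.
 * Returns true when the request reached the COMPLETE state.
 */
bool toy_complete_frame(struct toy_request *req, bool complete_controls)
{
	if (req && complete_controls)
		toy_object_complete(req);	/* stands in for v4l2_ctrl_request_complete */
	if (req)
		toy_object_complete(req);	/* stands in for the unbind done by vb2_buffer_done */

	return req && req->state == TOY_REQ_COMPLETE;
}
```

Calling `toy_complete_frame(&req, false)` leaves one object outstanding, which corresponds to the pre-fix behaviour where the request never completed. Passing `true` models the fixed path.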
### 3. Diagnostic improvements
- libva-v4l2-request-fourier `src/v4l2.c`: better error
logging in `v4l2_set_controls` (logs `error_idx`, failing
control id, size). Made the diagnosis above tractable.
## Verification
### End-to-end via ffmpeg + libva, byte-exact
```
$ pkill -f daedalus_v4l2_daemon; sudo rmmod daedalus_v4l2
$ sudo insmod kernel/daedalus_v4l2.ko
$ daedalus_v4l2_daemon -v daemon &
$ LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video0 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media3 \
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
-hwaccel_output_format nv12 -i /tmp/vp9_small.ivf \
-f rawvideo -y /tmp/vp9_via_libva.nv12
v4l2-request: cap_pool_init: 24 slots ready
v4l2-request: Unable to set control(s): EINVAL (H264 probe — harmless)
v4l2-request: Unable to set control(s): EINVAL (HEVC probe — harmless)
(no timeout, no decode error)
daemon log:
REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes
decoder: opened vp9 context
decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe luma=12288 chroma=6144
$ ls -la /tmp/vp9_via_libva.nv12
-rw-r--r-- 1 root root 18432 May 18 20:13 /tmp/vp9_via_libva.nv12
$ ffmpeg -i /tmp/vp9_small.ivf -pix_fmt nv12 -f rawvideo \
-y /tmp/vp9_ref_for_libva.nv12
$ cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12
$ echo $?
0
```
Byte-for-byte match. The two `Unable to set control(s):
EINVAL` messages are libva probing H264 + HEVC PROFILE/LEVEL
integer controls during config negotiation — we don't
register those (since we expose VP9/AV1/H264 stateless), libva
gets EINVAL, logs it, and moves on. Functional flow is
unaffected.
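The check performed by `cmp` above is stricter than comparing file sizes: any differing byte, or one file ending early, counts as a mismatch. A minimal sketch of the same check in C is below; `compare_files` is an illustrative helper, not part of the project, and its return values are chosen here only to resemble `cmp`'s exit status.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Illustrative helper (not part of the project). Returns 0 when both
 * files have identical contents, 1 when they differ in any byte or in
 * length, and -1 when either file cannot be opened.
 */
int compare_files(const char *path_a, const char *path_b)
{
	FILE *a = fopen(path_a, "rb");
	FILE *b = fopen(path_b, "rb");
	int result = 0;

	if (!a || !b) {
		result = -1;
	} else {
		int ca, cb;

		do {
			ca = fgetc(a);
			cb = fgetc(b);
			if (ca != cb) {	/* byte mismatch, or one file hit EOF first */
				result = 1;
				break;
			}
		} while (ca != EOF);
	}

	if (a)
		fclose(a);
	if (b)
		fclose(b);
	return result;
}
```

With the paths used in this document, `compare_files("/tmp/vp9_via_libva.nv12", "/tmp/vp9_ref_for_libva.nv12")` returning 0 would correspond to the `echo $?` result shown above.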
## Design analysis
### Why Phase 8.12 was close but not complete
8.12 wired all the request API hooks (supports_requests,
buf_out_validate, buf_request_complete) and the daemon-side
decode worked byte-exact — but the per-request control object
stayed bound forever because `buf_request_complete` only fires
from queue-cancel paths in vb2-core, not from normal buf_done.
Result: request never transitioned to COMPLETE, libva poll
timed out.
Phase 8.13 closes that loop by capturing the media_request from
the OUTPUT vb2_buffer's req_obj at device_run time and calling
`v4l2_ctrl_request_complete` explicitly when the decode
finishes (chardev RESP_FRAME path). Mirrors what rkvdec does
from its IRQ handler and cedrus from its device_run completion.
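For context, a client of the Media Request API typically waits for completion by polling the request fd for `POLLPRI`. The sketch below shows that wait in isolation. It is an assumption about the general shape of the client-side wait, not a copy of the libva backend's code, and the function name `wait_request_done` is hypothetical.

```c
#include <assert.h>
#include <poll.h>

/*
 * Illustrative client-side wait (function name is hypothetical).
 * Returns 1 if the fd signalled POLLPRI, 0 on timeout, -1 on error
 * or on an unexpected event.
 */
int wait_request_done(int request_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = request_fd, .events = POLLPRI };
	int ret = poll(&pfd, 1, timeout_ms);

	if (ret < 0)
		return -1;
	if (ret == 0)
		return 0;	/* the "Timeout when waiting for media request" case */

	return (pfd.revents & POLLPRI) ? 1 : -1;
}
```

Before the fix, a wait of this kind would hit the timeout branch because the request never reached the COMPLETE state. After the fix, the explicit `v4l2_ctrl_request_complete` call lets the request complete and the poll return.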
### Why the EINVAL noise was misleading
Earlier phases (8.10–8.12) kept seeing "Unable to set
control(s): Invalid argument" and assumed it pointed at our
stateless control registration. strace revealed three
separate S_EXT_CTRLS calls per frame:
| # | controls | result | meaning |
|---|----------|--------|---------|
| 1 | H264_PROFILE + H264_LEVEL | EINVAL | libva probes H264; we don't register it |
| 2 | HEVC_PROFILE + HEVC_LEVEL | EINVAL | libva probes HEVC; we don't register it |
| 3 | VP9_FRAME + VP9_COMPRESSED_HDR | OK | actual decode controls |
Calls 1 and 2 are harmless: libva detects we don't support
H264/HEVC integer probes and falls back to the stateless
controls it does have. Call 3 (the actual VP9 stateless
controls) succeeded all along. Only the completion handshake
was broken.
Phase 8.13's added `error_idx` logging in v4l2.c
(`failing_ctrl_id=0xa40900 size=0` etc.) is what made the
distinction visible.
### Why one fix covered both the originally planned 8.13 and 8.14
The original plan split Phase 8.13 ("trace the EINVAL") from
Phase 8.14 ("call request_complete from the right place").
Once strace clarified that the EINVAL was probe noise, the
real fix was just the request_complete call from the decode
path — a 10-line change. Doing both in one shot avoided a
phase boundary that wouldn't have shipped anything additional.
## What's NOT here (Phase 8.14+ scope)
- **Multi-frame stream via libva.** Verified single keyframe;
P-frame reference handling across requests untested. Likely
works (the daemon's AVCodecContext is persistent across
REQ_DECODE calls — already proven via test_m2m_stream).
- **AV1 + H.264 via libva.** Different stateless control
sets; needs the same control-payload validation path. May
need similar `v4l2_ctrl_request_complete` adjustments per
codec.
- **mpv + Firefox end-to-end.** The lower-level harness
(ffmpeg vaapi) works; higher-level consumers should follow
but each has its own VAAPI quirks.
- **The two harmless EINVALs from H264/HEVC profile probes.**
Could be suppressed by registering those integer controls
too (always rejecting writes) but that's a polish item.
## Phase 8.14 plan
1. Multi-frame VP9 stream via libva (re-use vp9_60s.ivf from
Phase 8.9 stress test).
2. AV1 + H.264 single-frame via libva (likely needs codec-
specific tweaks).
3. Document any remaining libva-side quirks for higher-level
consumers (mpv, Firefox).
+56 -10
@@ -134,18 +134,64 @@ See `docs/phase_8_8_closure.md`.
 See `docs/phase_8_9_closure.md`.
-### Phase 8.10 — libva-v4l2-request VP9/AV1 patch + end-to-end consumer
-1. Build libva-v4l2-request from source on hertz.
-2. Add VP9_FRAME + AV1_FRAME profile mappings; add
-V4L2_PIX_FMT_NV12 (single-plane) to our CAPTURE so
-the library's video.c picks us.
-3. End-to-end: `mpv --hwdec=vaapi` against test files;
-then Firefox.
-4. (Stretch) Upstream the patches to bootlin.
-After 8.10 the project's user-facing loop is closed.
-Optimisation phases (QPU dispatch, 4K) ship when motivated.
+### Phase 8.10 + 8.11 — libva consumer integration scaffold (closed 2026-05-18)
+- Forked bootlin/libva-v4l2-request to marfrit/libva-v4l2-
+request-fourier (gitea); discovered the existing fourier
+fork already had VP9/AV1/HEVC support on Rockchip.
+- Added daedalus_v4l2 to known_decoder_drivers + meson
+build option.
+- Added V4L2_PIX_FMT_NV12 single-plane + request API
+media ops + stateless control registration to our
+kernel.
+- vainfo enumerates 7 VAProfile entries via our driver.
+See `docs/phase_8_10_11_closure.md`.
+### Phase 8.12 — first VP9 frame decoded via libva (closed 2026-05-18)
+- vb2_queue supports_requests; vb2_ops buf_out_validate +
+buf_request_complete; v4l2_ctrl_new_custom for stateless
+ctrl registration.
+- Daemon decoded byte-exact VP9 keyframe via the full
+libva path (FNV-1a 0x1eb34bfe matches standalone).
+- ffmpeg still timed out waiting for media_request
+completion (request bind state).
+See `docs/phase_8_12_closure.md`.
+### Phase 8.13 — byte-exact end-to-end via libva (closed 2026-05-18)
+- Traced the misleading "Unable to set control(s):
+Invalid argument" — actually libva probing H264/HEVC
+PROFILE/LEVEL we don't expose (harmless); the real VP9
+stateless control SET succeeds.
+- Diagnosed the "Timeout when waiting for media request"
+root cause: per-request control object stays bound
+because vb2's normal buf_done path doesn't fire
+buf_request_complete (only queue-cancel does).
+- Fix: capture media_request from src_buf in inflight
+entry, call v4l2_ctrl_request_complete from the
+completion path before buf_done_and_job_finish.
+- **Byte-exact end-to-end**: ffmpeg -hwaccel vaapi →
+libva-v4l2-request-fourier → /dev/video0 →
+daedalus_v4l2 → daemon → 18432-byte NV12 byte-for-byte
+identical to ffmpeg software reference.
+- **Project consumer target hit**: V4L2 stateless node
+consumed by a real VAAPI client.
+See `docs/phase_8_13_closure.md`.
+### Phase 8.14 — multi-frame + AV1/H.264 + higher-level consumers
+1. Multi-frame VP9 stream via libva (vp9_60s.ivf from 8.9
+stress test) — P-frame references across requests.
+2. AV1 + H.264 single-frame via libva.
+3. mpv --hwdec=vaapi end-to-end.
+4. Firefox / WebRTC if motivated.
+Optimisation work (QPU dispatch, 4K, HDR-in-libva) ships
+when there's a concrete user-facing need.
 ## Effort estimate
+41
@@ -520,6 +520,14 @@ struct daedalus_inflight {
struct daedalus_ctx *ctx;
struct vb2_v4l2_buffer *src_buf;
struct vb2_v4l2_buffer *dst_buf;
/*
* Captured media_request the src_buf was bound to (if any).
* Set by device_run from src_buf->vb2_buf.req_obj.req;
* consumed by the completion path to call
* v4l2_ctrl_request_complete + signal request fd. NULL for
* non-request flows (e.g. test_m2m_stream direct QBUF).
*/
struct media_request *req;
};
static struct daedalus_inflight *
@@ -666,6 +674,14 @@ static void daedalus_device_run(void *priv)
inf->ctx = ctx;
inf->src_buf = src_buf;
inf->dst_buf = dst_buf;
/*
* Capture the bound media_request (if any) so the
* completion path can call v4l2_ctrl_request_complete +
* trigger MEDIA_REQUEST_STATE_COMPLETE. vb2-core's normal
* buf_done path unbinds the buffer's req_obj but leaves the
* control object bound — the driver has to complete it.
*/
inf->req = src_buf->vb2_buf.req_obj.req;
mutex_lock(&dev->inflight_lock);
list_add_tail(&inf->list, &dev->inflight);
@@ -789,6 +805,22 @@ void daedalus_complete_resp_frame(u32 cookie,
}
}
/*
* Phase 8.14: if the src_buf was bound to a media_request
* (libva-driven decode path), complete the per-request
* control state BEFORE buf_done_and_job_finish. vb2-core's
* buf_done unbinds the buffer's req_obj on its own, but the
* control object stays bound until v4l2_ctrl_request_complete
* runs — only after BOTH objects unbind does the request
* transition to MEDIA_REQUEST_STATE_COMPLETE and wake any
* userspace poll on the request fd.
*
* For non-request flows (test_m2m_stream direct QBUF) inf->req
* is NULL and v4l2_ctrl_request_complete just no-ops.
*/
if (inf->req)
v4l2_ctrl_request_complete(inf->req, &inf->ctx->hdl);
/*
* Use the buf_done_and_job_finish helper rather than plain
* buf_done + job_finish: the helper pops the buffers off
@@ -968,6 +1000,15 @@ static int daedalus_open(struct file *file)
v4l2_ctrl_handler_init(&ctx->hdl, ARRAY_SIZE(daedalus_stateless_ctrls));
daedalus_register_stateless_ctrls(&ctx->hdl);
/*
* v4l2_ctrl_handler_setup runs s_ctrl for every registered
* control with its default value — required to bring each
* control out of "uninitialised" state. Without this the
* per-request handler clone path returns EINVAL on
* VIDIOC_S_EXT_CTRLS(which=REQUEST_VAL). rkvdec/cedrus/
* hantro all call this after registration.
*/
v4l2_ctrl_handler_setup(&ctx->hdl);
ctx->fh.ctrl_handler = &ctx->hdl;
daedalus_fill_output_fmt(&ctx->src_fmt,