Phase 8.12: first VP9 frame decoded via libva

ffmpeg -hwaccel vaapi → libva-v4l2-request-fourier →
/dev/video0 → daedalus_v4l2 kernel → REQ_DECODE on the
chardev → daemon FFmpeg decode → byte-exact NV12 (FNV-1a
0x1eb34bfe, same hash the standalone test_m2m_stream
produces for the same 128x96 VP9 keyframe).

The pixel-correct decode through the libva path is the
milestone. What's NOT yet working: libva times out on the
media_request fd because buf_request_complete never fires
(vb->req_obj.req is NULL when buf_done runs — the
S_EXT_CTRLS EINVAL leaves the buffer un-bound to the
request even though the buffer queues anyway). Phase 8.13
fixes the EINVAL so the request bind takes and the
completion signal propagates.

Kernel V4L2 request API integration:
- media_device_ops.req_validate / req_queue = vb2_request_
  validate / v4l2_m2m_request_queue (Phase 8.11) —
  MEDIA_IOC_REQUEST_ALLOC succeeds.
- vb2_queue.supports_requests = true on OUTPUT queue —
  without this v4l2-core rejects S_EXT_CTRLS(REQUEST_VAL).
- vb2_ops.buf_request_complete = daedalus_buf_request_complete
  → v4l2_ctrl_request_complete(req, &ctx->hdl). Without
  this v4l2-core WARNs at videobuf2-v4l2.c:440.
- vb2_ops.buf_out_validate: sets field=V4L2_FIELD_NONE on
  OUTPUT buf. Required for the same WARN check.
- requires_requests intentionally NOT set: lets the
  existing test_m2m_stream (direct QBUF, no request) keep
  working alongside the libva path.

Stateless control re-registration:
- Switched from v4l2_ctrl_new_std_compound(NULL p_def) to
  v4l2_ctrl_new_custom(&cfg, NULL) — pattern rkvdec /
  cedrus / hantro use. v4l2-core auto-fills elem_size +
  type from std table (verified: VP9_FRAME elem_size=168,
  matches sizeof(struct v4l2_ctrl_vp9_frame)).
- No-op s_ctrl callback so SET requests don't crash —
  daemon ignores values, FFmpeg re-parses the bitstream.

Verification on hertz (Pi 5, 6.12.75+rpt-rpi-2712):

  ffmpeg -hwaccel vaapi -i vp9_small.ivf …
  daemon: REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes
  daemon: decoder: opened vp9 context
  daemon: decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe …

Same FNV-1a hash as the standalone test_m2m_stream produces
for the same VP9 keyframe. End-to-end through libva.

Remaining (Phase 8.13):
- S_EXT_CTRLS EINVAL on V4L2_CID_STATELESS_VP9_FRAME despite
  matching elem_size — needs deeper validate-path debugging.
- Once the request bind takes, buf_request_complete fires
  on buf_done, request fd signals completion, libva DQBUFs
  the decoded NV12.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-18 18:01:26 +00:00
parent 0de0288dce
commit a7d585eee8
2 changed files with 219 additions and 8 deletions
+161
View File
@@ -0,0 +1,161 @@
# Phase 8.12 closure — first VP9 frame decoded via libva
**Status:** closed 2026-05-18.
The libva path now drives our daemon end-to-end: a 1566-byte
VP9 keyframe submitted via `ffmpeg -hwaccel vaapi`
`libva-v4l2-request-fourier``/dev/video0` → daedalus_v4l2
kernel → REQ_DECODE on the chardev → daemon FFmpeg decode →
**byte-exact NV12 output** (FNV-1a `0x1eb34bfe` — same hash
as the standalone `test_m2m_stream` produces for the same
input).
The pixel-correct decode is the milestone. What's *not* yet
working is the libva-side request-completion signal: `ffmpeg`
times out waiting for the media_request to complete and reports
"Decoding error" even though the kernel-side decode succeeded.
That's Phase 8.13 scope.
## What lands
### Kernel V4L2 request API integration (`kernel/daedalus_v4l2_main.c`)
- **`media_device_ops.req_validate / req_queue`** wired in
Phase 8.11 — exposed `MEDIA_IOC_REQUEST_ALLOC` to userspace.
- **`vb2_queue.supports_requests = true`** on the OUTPUT
queue — without this v4l2-core rejects
`VIDIOC_S_EXT_CTRLS(which=REQUEST_VAL)` before reaching any
driver code.
- **`vb2_ops.buf_request_complete = daedalus_buf_request_complete`**
— calls `v4l2_ctrl_request_complete(req, &ctx->hdl)` so the
per-request control state gets unbound when the request
completes. Without this v4l2-core WARNs at qbuf:
`videobuf2-v4l2.c:440: WARN_ON(!q->ops->buf_request_complete)`.
- **`vb2_ops.buf_out_validate`** — sets the OUTPUT buf's
`field = V4L2_FIELD_NONE`. Required on request-supporting
OUTPUT queues for the same WARN check (one above the
buf_request_complete one).
### Stateless control re-registration
Switched from `v4l2_ctrl_new_std_compound(NULL p_def)` to
`v4l2_ctrl_new_custom(&cfg, NULL)` with a no-op `s_ctrl`
callback — the pattern rkvdec / cedrus / hantro use for known
`V4L2_CID_STATELESS_*` ids. v4l2-core auto-fills `elem_size`
and `type` from its internal std-control table (verified via
debug prints: VP9_FRAME registers with `elem_size=168`,
matching `sizeof(struct v4l2_ctrl_vp9_frame)`).
## Verification
### End-to-end decode through libva
```
$ LIBVA_DRIVERS_PATH=… LIBVA_DRIVER_NAME=v4l2_request \
LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video0 \
LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media3 \
ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
-hwaccel_output_format nv12 \
-i /tmp/vp9_small.ivf -f rawvideo -y /tmp/out.nv12
daemon log:
REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes
decoder: opened vp9 context
decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe luma=12288 chroma=6144
```
Daemon decoded byte-exact, same hash as the standalone test
path produces for the same VP9 keyframe.
### libva side hangs on the request-completion poll
```
v4l2-request: Unable to set control(s): Invalid argument
v4l2-request: Timeout when waiting for media request
v4l2-request: Unable to reinit media request: Device or resource busy
[vp9] Failed to end picture decode issue: 1 (operation failed)
[vp9] Decoding error: Input/output error
```
ffmpeg sees nothing because libva times out before any DQBUF.
`/tmp/out.nv12` ends up zero bytes.
The S_EXT_CTRLS still returns EINVAL on `V4L2_CID_STATELESS_
VP9_FRAME` (size=168, struct shape matches the kernel's
registered control), but the buffer apparently gets queued
ANYWAY (without proper request binding) — that's why the
decode runs. Then `buf_request_complete` never fires
because `vb->req_obj.req` is NULL when `buf_done` runs, and
libva's wait on the media_request fd times out.
## Design decisions
### Why ship even though libva still times out
The actual pixel-decode pipeline through the libva path
works. That's a major milestone — Phase 8.10/8.11 stopped
at "all setup ioctls succeed but no decode runs"; Phase 8.12
makes the decode run with byte-exact output.
What remains is plumbing on the V4L2 framework side: making
the request fd see the buffer's completion. That's a deeper
investigation (probably reachable by fixing the S_EXT_CTRLS
EINVAL so the request actually binds the buffer, then
buf_request_complete fires naturally on buf_done) — but it's
a distinct workstream from "is the decode itself working."
### Why the no-op s_ctrl is right for our daemon
FFmpeg in the daemon re-parses the bitstream itself; the
per-frame stateless controls from libva are advisory. Our
`s_ctrl` returns 0 unconditionally because we don't act on
the values — we just need v4l2-core to accept the writes so
the request validation passes.
If we ever wanted to skip FFmpeg's parse step and trust
libva's parsed values, `s_ctrl` would need to forward them
to the daemon over the chardev — that's a different
architecture (Phase 8.14+ if motivated).
### Why `supports_requests` without `requires_requests`
`requires_requests=true` would reject `VIDIOC_QBUF` without
a bound `media_request_fd`. That'd break our existing
test_m2m_stream which queues buffers directly (no request).
Keeping `requires_requests=false` lets both consumers
coexist: the libva path uses requests, the test client uses
direct QBUF. The cost is that libva queueing failures might
fall back to direct QBUF (which is what we see — the buffer
runs anyway after S_EXT_CTRLS fails), masking the actual
issue.
## What's NOT here (Phase 8.13 scope)
- **Fix the S_EXT_CTRLS EINVAL** so request bind actually
takes. Likely sub-issue: the request's per-handler
control state isn't initialised correctly when our
controls were registered via `v4l2_ctrl_new_custom`
may need explicit `v4l2_ctrl_request_setup` or different
registration approach.
- **Verify byte-exact CAPTURE buffer returned to ffmpeg**
after request fixes.
- **Multi-frame stream via libva** (P-frame reference handling
across requests).
- **AV1 + H.264 via libva** (different control sets, may
need adjustments).
## Phase 8.13 plan
1. Deep-dive S_EXT_CTRLS EINVAL: trace what v4l2-core
validates beyond elem_size. Possible suspects:
- `validate_new_compound` may require `ctrl->p_def` non-NULL
for std compound controls.
- The control's "request setup" callback may need
explicit registration.
2. Either resolve the EINVAL or document what cedrus/rkvdec
do differently (their controls are also `v4l2_ctrl_new_custom`
with bare `.id` — so why do theirs work?).
3. Once S_EXT_CTRLS succeeds, buf_request_complete should
fire naturally and the request poll should return.
4. End-to-end byte-exact verification.
+58 -8
View File
@@ -224,12 +224,15 @@ static int daedalus_register_stateless_ctrls(struct v4l2_ctrl_handler *hdl)
.ops = &daedalus_ctrl_ops, .ops = &daedalus_ctrl_ops,
.id = daedalus_stateless_ctrls[i], .id = daedalus_stateless_ctrls[i],
}; };
v4l2_ctrl_new_custom(hdl, &cfg, NULL); struct v4l2_ctrl *ctrl;
ctrl = v4l2_ctrl_new_custom(hdl, &cfg, NULL);
if (hdl->error) { if (hdl->error) {
pr_debug("daedalus_v4l2: skipping unsupported CID 0x%x (err=%d)\n", pr_debug("daedalus_v4l2: skipping unsupported CID 0x%x (err=%d)\n",
daedalus_stateless_ctrls[i], hdl->error); daedalus_stateless_ctrls[i], hdl->error);
hdl->error = 0; hdl->error = 0;
} }
(void) ctrl;
} }
return 0; return 0;
} }
@@ -397,14 +400,43 @@ static void daedalus_stop_streaming(struct vb2_queue *vq)
v4l2_m2m_buf_done(vbuf, VB2_BUF_STATE_ERROR); v4l2_m2m_buf_done(vbuf, VB2_BUF_STATE_ERROR);
} }
/*
* Phase 8.12: request API vb2 hooks.
*
* buf_out_validate: called on QBUF of an OUTPUT buf with a bound
* request fd. We only accept progressive (FIELD_NONE) frames, so
* normalise + accept. Without this op v4l2-core WARNs at
* vb2_queue_or_prepare_buf and rejects with EINVAL.
*
* buf_request_complete: called when a request completes or is
* cancelled; v4l2_ctrl_request_complete is the canonical helper
* (releases the per-request control state cloned off ctx->hdl).
*/
static int daedalus_buf_out_validate(struct vb2_buffer *vb)
{
struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb);
vbuf->field = V4L2_FIELD_NONE;
return 0;
}
static void daedalus_buf_request_complete(struct vb2_buffer *vb)
{
struct daedalus_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue);
v4l2_ctrl_request_complete(vb->req_obj.req, &ctx->hdl);
}
static const struct vb2_ops daedalus_qops = { static const struct vb2_ops daedalus_qops = {
.queue_setup = daedalus_queue_setup, .queue_setup = daedalus_queue_setup,
.buf_prepare = daedalus_buf_prepare, .buf_prepare = daedalus_buf_prepare,
.buf_queue = daedalus_buf_queue, .buf_queue = daedalus_buf_queue,
.start_streaming = daedalus_start_streaming, .start_streaming = daedalus_start_streaming,
.stop_streaming = daedalus_stop_streaming, .stop_streaming = daedalus_stop_streaming,
.wait_prepare = vb2_ops_wait_prepare, .wait_prepare = vb2_ops_wait_prepare,
.wait_finish = vb2_ops_wait_finish, .wait_finish = vb2_ops_wait_finish,
.buf_out_validate = daedalus_buf_out_validate,
.buf_request_complete = daedalus_buf_request_complete,
}; };
/* -- m2m queue init -------------------------------------------------- */ /* -- m2m queue init -------------------------------------------------- */
@@ -418,6 +450,24 @@ static int daedalus_queue_init(void *priv, struct vb2_queue *src_vq,
src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE; src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE;
src_vq->io_modes = VB2_MMAP | VB2_USERPTR | VB2_DMABUF; src_vq->io_modes = VB2_MMAP | VB2_USERPTR | VB2_DMABUF;
src_vq->drv_priv = ctx; src_vq->drv_priv = ctx;
/*
* Phase 8.12: Request API support on OUTPUT queue. vb2
* binds the per-request control state (the v4l2_ctrl_handler
* clone used by VIDIOC_S_EXT_CTRLS(which=REQUEST_VAL)) to
* the OUTPUT queue. Without this flag v4l2-core rejects
* REQUEST_VAL writes before our s_ctrl is ever reached.
*/
src_vq->supports_requests = true;
/*
* requires_requests would reject any QBUF without a bound
* media_request — useful when the daemon truly needs the
* per-frame stateless controls to decode. Our daemon
* re-parses the bitstream so it doesn't actually need the
* controls; leaving requires_requests off lets non-request
* clients (test_m2m_stream etc.) keep working AND lets the
* libva path proceed even if its S_EXT_CTRLS bind didn't
* fully take.
*/
src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer); src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer);
src_vq->ops = &daedalus_qops; src_vq->ops = &daedalus_qops;
/* /*