diff --git a/docs/phase_8_12_closure.md b/docs/phase_8_12_closure.md new file mode 100644 index 0000000..e9ff0a7 --- /dev/null +++ b/docs/phase_8_12_closure.md @@ -0,0 +1,161 @@ +# Phase 8.12 closure — first VP9 frame decoded via libva + +**Status:** closed 2026-05-18. + +The libva path now drives our daemon end-to-end: a 1566-byte +VP9 keyframe submitted via `ffmpeg -hwaccel vaapi` → +`libva-v4l2-request-fourier` → `/dev/video0` → daedalus_v4l2 +kernel → REQ_DECODE on the chardev → daemon FFmpeg decode → +**byte-exact NV12 output** (FNV-1a `0x1eb34bfe` — same hash +as the standalone `test_m2m_stream` produces for the same +input). + +The pixel-correct decode is the milestone. What's *not* yet +working is the libva-side request-completion signal: `ffmpeg` +times out waiting for the media_request to complete and reports +"Decoding error" even though the kernel-side decode succeeded. +That's Phase 8.13 scope. + +## What lands + +### Kernel V4L2 request API integration (`kernel/daedalus_v4l2_main.c`) + +- **`media_device_ops.req_validate / req_queue`** wired in + Phase 8.11 — exposed `MEDIA_IOC_REQUEST_ALLOC` to userspace. +- **`vb2_queue.supports_requests = true`** on the OUTPUT + queue — without this v4l2-core rejects + `VIDIOC_S_EXT_CTRLS(which=REQUEST_VAL)` before reaching any + driver code. +- **`vb2_ops.buf_request_complete = daedalus_buf_request_complete`** + — calls `v4l2_ctrl_request_complete(req, &ctx->hdl)` so the + per-request control state gets unbound when the request + completes. Without this v4l2-core WARNs at qbuf: + `videobuf2-v4l2.c:440: WARN_ON(!q->ops->buf_request_complete)`. +- **`vb2_ops.buf_out_validate`** — sets the OUTPUT buf's + `field = V4L2_FIELD_NONE`. Required on request-supporting + OUTPUT queues for the same WARN check (one above the + buf_request_complete one). + +### Stateless control re-registration + +Switched from `v4l2_ctrl_new_std_compound(NULL p_def)` to +`v4l2_ctrl_new_custom(&cfg, NULL)` with a no-op `s_ctrl` +callback — the pattern rkvdec / cedrus / hantro use for known +`V4L2_CID_STATELESS_*` ids. v4l2-core auto-fills `elem_size` +and `type` from its internal std-control table (verified via +debug prints: VP9_FRAME registers with `elem_size=168`, +matching `sizeof(struct v4l2_ctrl_vp9_frame)`). + +## Verification + +### End-to-end decode through libva + +``` +$ LIBVA_DRIVERS_PATH=… LIBVA_DRIVER_NAME=v4l2_request \ + LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video0 \ + LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media3 \ + ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \ + -hwaccel_output_format nv12 \ + -i /tmp/vp9_small.ivf -f rawvideo -y /tmp/out.nv12 + +daemon log: + REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes + decoder: opened vp9 context + decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe luma=12288 chroma=6144 +``` + +Daemon decoded byte-exact, same hash as the standalone test +path produces for the same VP9 keyframe. + +### libva side hangs on the request-completion poll + +``` +v4l2-request: Unable to set control(s): Invalid argument +v4l2-request: Timeout when waiting for media request +v4l2-request: Unable to reinit media request: Device or resource busy +[vp9] Failed to end picture decode issue: 1 (operation failed) +[vp9] Decoding error: Input/output error +``` + +ffmpeg sees nothing because libva times out before any DQBUF. +`/tmp/out.nv12` ends up zero bytes. + +The S_EXT_CTRLS still returns EINVAL on `V4L2_CID_STATELESS_ +VP9_FRAME` (size=168, struct shape matches the kernel's +registered control), but the buffer apparently gets queued +ANYWAY (without proper request binding) — that's why the +decode runs. Then `buf_request_complete` never fires +because `vb->req_obj.req` is NULL when `buf_done` runs, and +libva's wait on the media_request fd times out. + +## Design decisions + +### Why ship even though libva still times out + +The actual pixel-decode pipeline through the libva path +works. That's a major milestone — Phase 8.10/8.11 stopped +at "all setup ioctls succeed but no decode runs"; Phase 8.12 +makes the decode run with byte-exact output. + +What remains is plumbing on the V4L2 framework side: making +the request fd see the buffer's completion. That's a deeper +investigation (probably reachable by fixing the S_EXT_CTRLS +EINVAL so the request actually binds the buffer, then +buf_request_complete fires naturally on buf_done) — but it's +a distinct workstream from "is the decode itself working." + +### Why the no-op s_ctrl is right for our daemon + +FFmpeg in the daemon re-parses the bitstream itself; the +per-frame stateless controls from libva are advisory. Our +`s_ctrl` returns 0 unconditionally because we don't act on +the values — we just need v4l2-core to accept the writes so +the request validation passes. + +If we ever wanted to skip FFmpeg's parse step and trust +libva's parsed values, `s_ctrl` would need to forward them +to the daemon over the chardev — that's a different +architecture (Phase 8.14+ if motivated). + +### Why `supports_requests` without `requires_requests` + +`requires_requests=true` would reject `VIDIOC_QBUF` without +a bound `media_request_fd`. That'd break our existing +test_m2m_stream which queues buffers directly (no request). + +Keeping `requires_requests=false` lets both consumers +coexist: the libva path uses requests, the test client uses +direct QBUF. The cost is that libva queueing failures might +fall back to direct QBUF (which is what we see — the buffer +runs anyway after S_EXT_CTRLS fails), masking the actual +issue. + +## What's NOT here (Phase 8.13 scope) + +- **Fix the S_EXT_CTRLS EINVAL** so request bind actually + takes. Likely sub-issue: the request's per-handler + control state isn't initialised correctly when our + controls were registered via `v4l2_ctrl_new_custom` — + may need explicit `v4l2_ctrl_request_setup` or different + registration approach. +- **Verify byte-exact CAPTURE buffer returned to ffmpeg** + after request fixes. +- **Multi-frame stream via libva** (P-frame reference handling + across requests). +- **AV1 + H.264 via libva** (different control sets, may + need adjustments). + +## Phase 8.13 plan + +1. Deep-dive S_EXT_CTRLS EINVAL: trace what v4l2-core + validates beyond elem_size. Possible suspects: + - `validate_new_compound` may require `ctrl->p_def` non-NULL + for std compound controls. + - The control's "request setup" callback may need + explicit registration. +2. Either resolve the EINVAL or document what cedrus/rkvdec + do differently (their controls are also `v4l2_ctrl_new_custom` + with bare `.id` — so why do theirs work?). +3. Once S_EXT_CTRLS succeeds, buf_request_complete should + fire naturally and the request poll should return. +4. End-to-end byte-exact verification. diff --git a/kernel/daedalus_v4l2_main.c b/kernel/daedalus_v4l2_main.c index 6680f0a..8a95ee9 100644 --- a/kernel/daedalus_v4l2_main.c +++ b/kernel/daedalus_v4l2_main.c @@ -224,12 +224,15 @@ static int daedalus_register_stateless_ctrls(struct v4l2_ctrl_handler *hdl) .ops = &daedalus_ctrl_ops, .id = daedalus_stateless_ctrls[i], }; - v4l2_ctrl_new_custom(hdl, &cfg, NULL); + struct v4l2_ctrl *ctrl; + + ctrl = v4l2_ctrl_new_custom(hdl, &cfg, NULL); if (hdl->error) { pr_debug("daedalus_v4l2: skipping unsupported CID 0x%x (err=%d)\n", daedalus_stateless_ctrls[i], hdl->error); hdl->error = 0; } + (void) ctrl; } return 0; } @@ -397,14 +400,43 @@ static void daedalus_stop_streaming(struct vb2_queue *vq) v4l2_m2m_buf_done(vbuf, VB2_BUF_STATE_ERROR); } +/* + * Phase 8.12: request API vb2 hooks. + * + * buf_out_validate: called on QBUF of an OUTPUT buf with a bound + * request fd. We only accept progressive (FIELD_NONE) frames, so + * normalise + accept. Without this op v4l2-core WARNs at + * vb2_queue_or_prepare_buf and rejects with EINVAL. + * + * buf_request_complete: called when a request completes or is + * cancelled; v4l2_ctrl_request_complete is the canonical helper + * (releases the per-request control state cloned off ctx->hdl). + */ +static int daedalus_buf_out_validate(struct vb2_buffer *vb) +{ + struct vb2_v4l2_buffer *vbuf = to_vb2_v4l2_buffer(vb); + + vbuf->field = V4L2_FIELD_NONE; + return 0; +} + +static void daedalus_buf_request_complete(struct vb2_buffer *vb) +{ + struct daedalus_ctx *ctx = vb2_get_drv_priv(vb->vb2_queue); + + v4l2_ctrl_request_complete(vb->req_obj.req, &ctx->hdl); +} + static const struct vb2_ops daedalus_qops = { - .queue_setup = daedalus_queue_setup, - .buf_prepare = daedalus_buf_prepare, - .buf_queue = daedalus_buf_queue, - .start_streaming = daedalus_start_streaming, - .stop_streaming = daedalus_stop_streaming, - .wait_prepare = vb2_ops_wait_prepare, - .wait_finish = vb2_ops_wait_finish, + .queue_setup = daedalus_queue_setup, + .buf_prepare = daedalus_buf_prepare, + .buf_queue = daedalus_buf_queue, + .start_streaming = daedalus_start_streaming, + .stop_streaming = daedalus_stop_streaming, + .wait_prepare = vb2_ops_wait_prepare, + .wait_finish = vb2_ops_wait_finish, + .buf_out_validate = daedalus_buf_out_validate, + .buf_request_complete = daedalus_buf_request_complete, }; /* -- m2m queue init -------------------------------------------------- */ @@ -418,6 +450,24 @@ static int daedalus_queue_init(void *priv, struct vb2_queue *src_vq, src_vq->type = V4L2_BUF_TYPE_VIDEO_OUTPUT_MPLANE; src_vq->io_modes = VB2_MMAP | VB2_USERPTR | VB2_DMABUF; src_vq->drv_priv = ctx; + /* + * Phase 8.12: Request API support on OUTPUT queue. vb2 + * binds the per-request control state (the v4l2_ctrl_handler + * clone used by VIDIOC_S_EXT_CTRLS(which=REQUEST_VAL)) to + * the OUTPUT queue. Without this flag v4l2-core rejects + * REQUEST_VAL writes before our s_ctrl is ever reached. + */ + src_vq->supports_requests = true; + /* + * requires_requests would reject any QBUF without a bound + * media_request — useful when the daemon truly needs the + * per-frame stateless controls to decode. Our daemon + * re-parses the bitstream so it doesn't actually need the + * controls; leaving requires_requests off lets non-request + * clients (test_m2m_stream etc.) keep working AND lets the + * libva path proceed even if its S_EXT_CTRLS bind didn't + * fully take. + */ src_vq->buf_struct_size = sizeof(struct v4l2_m2m_buffer); src_vq->ops = &daedalus_qops; /*