diff --git a/docs/phase_8_13_closure.md b/docs/phase_8_13_closure.md
new file mode 100644
index 0000000..f8b234e
--- /dev/null
+++ b/docs/phase_8_13_closure.md
@@ -0,0 +1,200 @@
+# Phase 8.13 closure — byte-exact end-to-end via libva
+
+**Status:** closed 2026-05-18.
+
+The project's consumer-side goal landed: a real VAAPI consumer
+(ffmpeg with `-hwaccel vaapi`) drives our libva backend → V4L2
+driver → daemon → byte-exact NV12 output back to ffmpeg.
+
+```
+ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
+    -hwaccel_output_format nv12 -i vp9_small.ivf \
+    -f rawvideo -y /tmp/vp9_via_libva.nv12
+
+cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12 # ← match
+```
+
+18432 bytes, byte-for-byte identical to plain
+`ffmpeg -pix_fmt nv12 -f rawvideo` software decode of the same
+VP9 keyframe. The `project_consumer_target` memory's deliverable
+shape — "V4L2 stateless node consumed by a real VAAPI client" —
+is achieved.
+
+## What lands
+
+Two related kernel changes that unstick the libva request
+completion handshake:
+
+### 1. Stateless control handler initialisation
+
+`v4l2_ctrl_handler_setup(&ctx->hdl)` after registration —
+matches rkvdec/cedrus/hantro. Brings each registered compound
+control out of "uninitialised" state via the std_init_compound
+defaults (e.g. VP9_FRAME gets `profile=0, bit_depth=8`).
+
+### 2. Per-request control completion in the decode path
+
+The actual root cause of "Timeout when waiting for media
+request":
+
+- vb2-core's `vb2_buffer_done` unbinds the BUFFER's req_obj on
+  normal decode completion.
+- But the per-request CONTROL object stays bound until
+  `v4l2_ctrl_request_complete` runs.
+- The vb2 `buf_request_complete` op fires only from queue-cancel
+  paths (vb2-core line 2284), NOT from normal buf_done.
+- The driver must call `v4l2_ctrl_request_complete(req, hdl)`
+  explicitly from its decode-completion path.
+
+Fix (in `kernel/daedalus_v4l2_main.c`):
+
+```c
+struct daedalus_inflight {
+    ...
+    struct media_request *req;    /* captured from src_buf */
+};
+
+static void daedalus_device_run(void *priv) {
+    ...
+    inf->req = src_buf->vb2_buf.req_obj.req;
+    ...
+}
+
+void daedalus_complete_resp_frame(...) {
+    ...
+    if (inf->req)
+        v4l2_ctrl_request_complete(inf->req, &inf->ctx->hdl);
+    v4l2_m2m_buf_done_and_job_finish(...);
+}
+```
+
+For non-request flows (test_m2m_stream's direct QBUF)
+`inf->req` is NULL; the conditional skips the
+`v4l2_ctrl_request_complete` call. Both consumer styles
+work concurrently.
+
+### 3. Diagnostic improvements
+
+- libva-v4l2-request-fourier `src/v4l2.c`: better error
+  logging in `v4l2_set_controls` (logs `error_idx`, failing
+  control id, size). Made the diagnosis above tractable.
+
+## Verification
+
+### End-to-end via ffmpeg + libva, byte-exact
+
+```
+$ pkill -f daedalus_v4l2_daemon; sudo rmmod daedalus_v4l2
+$ sudo insmod kernel/daedalus_v4l2.ko
+$ daedalus_v4l2_daemon -v daemon &
+
+$ LIBVA_DRIVERS_PATH=/home/mfritsche/src/libva-v4l2-request-fourier/build/src \
+  LIBVA_DRIVER_NAME=v4l2_request \
+  LIBVA_V4L2_REQUEST_VIDEO_PATH=/dev/video0 \
+  LIBVA_V4L2_REQUEST_MEDIA_PATH=/dev/media3 \
+  ffmpeg -hwaccel vaapi -hwaccel_device /dev/dri/renderD128 \
+    -hwaccel_output_format nv12 -i /tmp/vp9_small.ivf \
+    -f rawvideo -y /tmp/vp9_via_libva.nv12
+
+  v4l2-request: cap_pool_init: 24 slots ready
+  v4l2-request: Unable to set control(s): EINVAL (H264 probe — harmless)
+  v4l2-request: Unable to set control(s): EINVAL (HEVC probe — harmless)
+  (no timeout, no decode error)
+
+daemon log:
+  REQ_DECODE cookie=1 codec=1 bitstream=1566 bytes capture=128x96 1 planes
+  decoder: opened vp9 context
+  decoder: OK 128x96 fmt=0 (yuv420p) fnv1a=0x1eb34bfe luma=12288 chroma=6144
+
+$ ls -la /tmp/vp9_via_libva.nv12
+-rw-r--r-- 1 root root 18432 May 18 20:13 /tmp/vp9_via_libva.nv12
+
+$ ffmpeg -i /tmp/vp9_small.ivf -pix_fmt nv12 -f rawvideo \
+    -y /tmp/vp9_ref_for_libva.nv12
+$ cmp /tmp/vp9_via_libva.nv12 /tmp/vp9_ref_for_libva.nv12
+$ echo $?
+0
+```
+
+Byte-for-byte match. The two `Unable to set control(s):
+EINVAL` messages are libva probing H264 + HEVC PROFILE/LEVEL
+integer controls during config negotiation — we don't
+register those (since we expose VP9/AV1/H264 stateless), libva
+gets EINVAL, logs it, and moves on. Functional flow is
+unaffected.
+
+## Design analysis
+
+### Why was Phase 8.12 close but not complete
+
+8.12 wired all the request API hooks (supports_requests,
+buf_out_validate, buf_request_complete) and the daemon-side
+decode worked byte-exact — but the per-request control object
+stayed bound forever because `buf_request_complete` only fires
+from queue-cancel paths in vb2-core, not from normal buf_done.
+Result: request never transitioned to COMPLETE, libva poll
+timed out.
+
+Phase 8.13 closes that loop by capturing the media_request from
+the OUTPUT vb2_buffer's req_obj at device_run time and calling
+`v4l2_ctrl_request_complete` explicitly when the decode
+finishes (chardev RESP_FRAME path). Mirrors what rkvdec does
+from its IRQ handler and cedrus from its device_run completion.
+
+### Why the EINVAL noise was misleading
+
+Earlier phases (8.10–8.12) kept seeing "Unable to set
+control(s): Invalid argument" and assumed it pointed at our
+stateless control registration. strace revealed three
+separate S_EXT_CTRLS calls per frame:
+
+| # | controls | result | meaning |
+|---|----------|--------|---------|
+| 1 | H264_PROFILE + H264_LEVEL | EINVAL | libva probes H264; we don't register it |
+| 2 | HEVC_PROFILE + HEVC_LEVEL | EINVAL | libva probes HEVC; we don't register it |
+| 3 | VP9_FRAME + VP9_COMPRESSED_HDR | OK | actual decode controls |
+
+Calls 1 and 2 are harmless: libva detects we don't support
+H264/HEVC integer probes and falls back to the stateless
+controls it does have. Call 3 (the actual VP9 stateless
+controls) succeeded all along. Only the completion handshake
+was broken.
+
+Phase 8.13's added `error_idx` logging in v4l2.c
+(`failing_ctrl_id=0xa40900 size=0` etc.) is what made the
+distinction visible.
+
+### Why one fix unblocked both 8.13 and 8.14
+
+The original plan split Phase 8.13 ("trace the EINVAL") from
+Phase 8.14 ("call request_complete from the right place").
+Once strace clarified that the EINVAL was probe noise, the
+real fix was just the request_complete call from the decode
+path — a 10-line change. Doing both in one shot avoided a
+phase boundary that wouldn't have shipped anything additional.
+
+## What's NOT here (Phase 8.14+ scope)
+
+- **Multi-frame stream via libva.** Verified single keyframe;
+  P-frame reference handling across requests untested. Likely
+  works (the daemon's AVCodecContext is persistent across
+  REQ_DECODE calls — already proven via test_m2m_stream).
+- **AV1 + H.264 via libva.** Different stateless control
+  sets; needs the same control-payload validation path. May
+  need similar `v4l2_ctrl_request_complete` adjustments per
+  codec.
+- **mpv + Firefox end-to-end.** The lower-level harness
+  (ffmpeg vaapi) works; higher-level consumers should follow
+  but each has its own VAAPI quirks.
+- **The two harmless EINVALs from H264/HEVC profile probes.**
+  Could be suppressed by registering those integer controls
+  too (always rejecting writes) but that's a polish item.
+
+## Phase 8.14 plan
+
+1. Multi-frame VP9 stream via libva (re-use vp9_60s.ivf from
+   Phase 8.9 stress test).
+2. AV1 + H.264 single-frame via libva (likely needs codec-
+   specific tweaks).
+3. Document any remaining libva-side quirks for higher-level
+   consumers (mpv, Firefox).
diff --git a/docs/roadmap.md b/docs/roadmap.md
index c22d1ec..20a8262 100644
--- a/docs/roadmap.md
+++ b/docs/roadmap.md
@@ -134,18 +134,64 @@ See `docs/phase_8_8_closure.md`.
 
 See `docs/phase_8_9_closure.md`.
 
-### Phase 8.10 — libva-v4l2-request VP9/AV1 patch + end-to-end consumer
+### Phase 8.10 + 8.11 — libva consumer integration scaffold (closed 2026-05-18)
 
-1. Build libva-v4l2-request from source on hertz.
-2. Add VP9_FRAME + AV1_FRAME profile mappings; add
-   V4L2_PIX_FMT_NV12 (single-plane) to our CAPTURE so
-   the library's video.c picks us.
-3. End-to-end: `mpv --hwdec=vaapi` against test files;
-   then Firefox.
-4. (Stretch) Upstream the patches to bootlin.
+- Forked bootlin/libva-v4l2-request to marfrit/libva-v4l2-
+  request-fourier (gitea); discovered the existing fourier
+  fork already had VP9/AV1/HEVC support on Rockchip.
+- Added daedalus_v4l2 to known_decoder_drivers + meson
+  build option.
+- Added V4L2_PIX_FMT_NV12 single-plane + request API +
+  media ops + stateless control registration to our
+  kernel.
+- vainfo enumerates 7 VAProfile entries via our driver.
 
-After 8.10 the project's user-facing loop is closed.
-Optimisation phases (QPU dispatch, 4K) ship when motivated.
+See `docs/phase_8_10_11_closure.md`.
+
+### Phase 8.12 — first VP9 frame decoded via libva (closed 2026-05-18)
+
+- vb2_queue supports_requests; vb2_ops buf_out_validate +
+  buf_request_complete; v4l2_ctrl_new_custom for stateless
+  ctrl registration.
+- Daemon decoded byte-exact VP9 keyframe via the full
+  libva path (FNV-1a 0x1eb34bfe matches standalone).
+- ffmpeg still timed out waiting for media_request
+  completion (request bind state).
+
+See `docs/phase_8_12_closure.md`.
+
+### Phase 8.13 — byte-exact end-to-end via libva (closed 2026-05-18)
+
+- Traced the misleading "Unable to set control(s):
+  Invalid argument" — actually libva probing H264/HEVC
+  PROFILE/LEVEL we don't expose (harmless); the real VP9
+  stateless control SET succeeds.
+- Diagnosed the "Timeout when waiting for media request"
+  root cause: per-request control object stays bound
+  because vb2's normal buf_done path doesn't fire
+  buf_request_complete (only queue-cancel does).
+- Fix: capture media_request from src_buf in inflight
+  entry, call v4l2_ctrl_request_complete from the
+  completion path before buf_done_and_job_finish.
+- **Byte-exact end-to-end**: ffmpeg -hwaccel vaapi →
+  libva-v4l2-request-fourier → /dev/video0 →
+  daedalus_v4l2 → daemon → 18432-byte NV12 byte-for-byte
+  identical to ffmpeg software reference.
+- **Project consumer target hit**: V4L2 stateless node
+  consumed by a real VAAPI client.
+
+See `docs/phase_8_13_closure.md`.
+
+### Phase 8.14 — multi-frame + AV1/H.264 + higher-level consumers
+
+1. Multi-frame VP9 stream via libva (vp9_60s.ivf from 8.9
+   stress test) — P-frame references across requests.
+2. AV1 + H.264 single-frame via libva.
+3. mpv --hwdec=vaapi end-to-end.
+4. Firefox / WebRTC if motivated.
+
+Optimisation work (QPU dispatch, 4K, HDR-in-libva) ships
+when there's a concrete user-facing need.
 
 ## Effort estimate
 
diff --git a/kernel/daedalus_v4l2_main.c b/kernel/daedalus_v4l2_main.c
index 8a95ee9..6dfd6cf 100644
--- a/kernel/daedalus_v4l2_main.c
+++ b/kernel/daedalus_v4l2_main.c
@@ -520,6 +520,14 @@ struct daedalus_inflight {
 	struct daedalus_ctx *ctx;
 	struct vb2_v4l2_buffer *src_buf;
 	struct vb2_v4l2_buffer *dst_buf;
+	/*
+	 * Captured media_request the src_buf was bound to (if any).
+	 * Set by device_run from src_buf->vb2_buf.req_obj.req;
+	 * consumed by the completion path to call
+	 * v4l2_ctrl_request_complete + signal request fd. NULL for
+	 * non-request flows (e.g. test_m2m_stream direct QBUF).
+	 */
+	struct media_request *req;
 };
 
 static struct daedalus_inflight *
@@ -666,6 +674,14 @@ static void daedalus_device_run(void *priv)
 	inf->ctx = ctx;
 	inf->src_buf = src_buf;
 	inf->dst_buf = dst_buf;
+	/*
+	 * Capture the bound media_request (if any) so the
+	 * completion path can call v4l2_ctrl_request_complete +
+	 * trigger MEDIA_REQUEST_STATE_COMPLETE. vb2-core's normal
+	 * buf_done path unbinds the buffer's req_obj but leaves the
+	 * control object bound — the driver has to complete it.
+	 */
+	inf->req = src_buf->vb2_buf.req_obj.req;
 
 	mutex_lock(&dev->inflight_lock);
 	list_add_tail(&inf->list, &dev->inflight);
@@ -789,6 +805,22 @@ void daedalus_complete_resp_frame(u32 cookie,
 		}
 	}
 
+	/*
+	 * Phase 8.14: if the src_buf was bound to a media_request
+	 * (libva-driven decode path), complete the per-request
+	 * control state BEFORE buf_done_and_job_finish. vb2-core's
+	 * buf_done unbinds the buffer's req_obj on its own, but the
+	 * control object stays bound until v4l2_ctrl_request_complete
+	 * runs — only after BOTH objects unbind does the request
+	 * transition to MEDIA_REQUEST_STATE_COMPLETE and wake any
+	 * userspace poll on the request fd.
+	 *
+	 * For non-request flows (test_m2m_stream direct QBUF) inf->req
+	 * is NULL and v4l2_ctrl_request_complete just no-ops.
+	 */
+	if (inf->req)
+		v4l2_ctrl_request_complete(inf->req, &inf->ctx->hdl);
+
 	/*
 	 * Use the buf_done_and_job_finish helper rather than plain
 	 * buf_done + job_finish: the helper pops the buffers off
@@ -968,6 +1000,15 @@ static int daedalus_open(struct file *file)
 	v4l2_ctrl_handler_init(&ctx->hdl,
 			       ARRAY_SIZE(daedalus_stateless_ctrls));
 	daedalus_register_stateless_ctrls(&ctx->hdl);
+	/*
+	 * v4l2_ctrl_handler_setup runs s_ctrl for every registered
+	 * control with its default value — required to bring each
+	 * control out of "uninitialised" state. Without this the
+	 * per-request handler clone path returns EINVAL on
+	 * VIDIOC_S_EXT_CTRLS(which=REQUEST_VAL). rkvdec/cedrus/
+	 * hantro all call this after registration.
+	 */
+	v4l2_ctrl_handler_setup(&ctx->hdl);
 	ctx->fh.ctrl_handler = &ctx->hdl;
 
 	daedalus_fill_output_fmt(&ctx->src_fmt,
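
Review note on the userspace side of the handshake this patch completes. The Media Request API documents that a consumer can wait for a queued request by calling poll() on the request fd and watching for POLLPRI. The sketch below illustrates that wait under this assumption only: `wait_request_complete` is a hypothetical helper, not code from this patch or from libva-v4l2-request-fourier, and the error mapping is an arbitrary choice for the example.

```c
#include <errno.h>
#include <poll.h>

/*
 * Hypothetical helper (illustration only): block until a media request
 * fd reports completion via POLLPRI, or until timeout_ms elapses.
 *
 * Returns 0 on completion, -ETIMEDOUT if nothing was signalled in time,
 * -EIO if poll reported only an error condition, or -errno if poll failed.
 */
static int wait_request_complete(int request_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = request_fd, .events = POLLPRI };
	int ret;

	/* Retry if a signal interrupts the wait. */
	do {
		ret = poll(&pfd, 1, timeout_ms);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -errno;
	if (ret == 0)
		return -ETIMEDOUT;
	if (pfd.revents & POLLPRI)
		return 0;

	/* POLLERR, POLLHUP or POLLNVAL without POLLPRI. */
	return -EIO;
}
```

Read against the diagnosis in the closure doc: before this patch the per-request control object never unbound, so a wait of this shape would be expected to end in the timeout branch. With `v4l2_ctrl_request_complete` called from the completion path, the request can reach MEDIA_REQUEST_STATE_COMPLETE and the wait can return 0.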