## Iteration 21 — Phase 8 (close)

Closes 2026-05-14. iter21 = kernel printk at top of `v4l2_ctrl_request_setup` + per-ref dump. FULL close.

**Smoking-gun finding: libva's request-clone-handler is missing 6 HEVC stateless controls registered in main_hdl.**

### Method

`linux-fresnel-fourier 7.0-5` (pkgrel 4→5) adds two `pr_info` calls to `v4l2_ctrl_request_setup` in `drivers/media/v4l2-core/v4l2-ctrls-request.c`:

```c
obj = media_request_object_find(req, &req_ops, main_hdl);
/* iter21: log the lookup result once per setup call */
pr_info("iter21_setup: req=%p main_hdl=%p obj=%p\n", req, main_hdl, obj);
...
list_for_each_entry(ref, &hdl->ctrl_refs, node) {
	...
	/* iter21: dump every ref in the handler being applied */
	pr_info("iter21_setup_ref: ctrl_id=0x%x p_req_valid=%d have_new=%d\n",
		ctrl->id, ref->p_req_valid, have_new_data);
	...
}
```

Built in ~1 min via ccache reuse. Deployed via scp + `pacman -U` + reboot.

### α-24 result (predicate: kernel-only path required)

α-24 (libva G_EXT_CTRLS readback after S_EXT_CTRLS) was implemented as 1547a5d, amended as a9c897f, and reverted in e109306. The kernel returned **EACCES** for all 13 libva HEVC frames: this V4L2 build disallows userspace probing of `req->p_new` for an uncompleted request. The probe path must therefore run inside the kernel.
### Result: definitive (libva vs kdirect)

**libva HEVC frame 1 setup** (clone-hdl ctrl_refs in ID order, 14 entries):

```
0x990a67 p_req_valid=0
0x990a6b p_req_valid=0
0x990b00 p_req_valid=0
0x990b67 p_req_valid=0
0x990b68 p_req_valid=0             (5 codec-class menu controls)
0xa40900 p_req_valid=0             H264_DECODE_MODE
0xa40901 p_req_valid=0             H264_START_CODE
0xa40902 p_req_valid=0             H264_SPS
0xa40903 p_req_valid=0             H264_PPS
0xa40904 p_req_valid=0             H264_SCALING_MATRIX
0xa40907 p_req_valid=0             H264_DECODE_PARAMS
0xa40a2c p_req_valid=0             (misc stateless)
0xa40a2d p_req_valid=0             (misc stateless)
0xa40a90 p_req_valid=1 have_new=1  HEVC_SPS (CLONE STOPS HERE)
```

**Missing from libva clone (vs kdirect):**

- 0xa40905 H264_PRED_WEIGHTS (compound)
- 0xa40906 H264_SLICE_PARAMS (compound, dyn_array)
- 0xa40a91 HEVC_PPS (compound)
- 0xa40a92 HEVC_SLICE_PARAMS (compound, dyn_array)
- 0xa40a93 HEVC_SCALING_MATRIX (compound)
- 0xa40a94 HEVC_DECODE_PARAMS (compound)
- 0xa40a95 HEVC_DECODE_MODE (menu)
- 0xa40a96 HEVC_START_CODE (menu)

**kdirect HEVC frame 1 setup** (same hdl, 21 entries: all of the above plus the 8 missing):

```
... 14 entries as above ...
0xa40a91 p_req_valid=1 have_new=1  HEVC_PPS
0xa40a92 p_req_valid=1 have_new=1  HEVC_SLICE_PARAMS
0xa40a93 p_req_valid=1 have_new=1  HEVC_SCALING_MATRIX
0xa40a94 p_req_valid=1 have_new=1  HEVC_DECODE_PARAMS
0xa40a95 p_req_valid=0             HEVC_DECODE_MODE (device-init only)
0xa40a96 p_req_valid=0             HEVC_START_CODE (device-init only)
```

### What this means

`v4l2_ctrl_request_setup(req, main_hdl)`:

- finds `obj` for both libva and kdirect (non-NULL), so the request is properly bound.
- iterates `hdl->ctrl_refs`, but **libva's hdl is the request-clone-hdl, and it contains only 14 of the 21 source controls**.
- libva's HEVC_SPS has `p_req_valid=1`, so staging worked for that one control.
- The other 6 HEVC controls (PPS, SLICE_PARAMS, SCALING_MATRIX, DECODE_PARAMS, DECODE_MODE, START_CODE) **don't exist in the clone-hdl at all**, so they cannot be staged.
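
The libva-vs-kdirect comparison above can be reproduced mechanically from two dmesg captures. The following is a minimal sketch under assumptions: the `iter21_setup_ref:` line format is taken from the `pr_info` in the Method section, while `struct ref_entry`, `parse_setup_ref` and `missing_ids` are hypothetical helper names, and both ID lists are assumed to be sorted ascending (as the dumps are).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* One parsed "iter21_setup_ref:" line. */
struct ref_entry {
	uint32_t id;
	int p_req_valid;
	int have_new;
};

/* Parse one dmesg line; returns 1 if it is a setup_ref line, 0 otherwise. */
static int parse_setup_ref(const char *line, struct ref_entry *out)
{
	const char *p = strstr(line, "iter21_setup_ref:");
	unsigned int id;

	if (!p)
		return 0;
	if (sscanf(p, "iter21_setup_ref: ctrl_id=0x%x p_req_valid=%d have_new=%d",
		   &id, &out->p_req_valid, &out->have_new) != 3)
		return 0;
	out->id = id;
	return 1;
}

/*
 * Write to `out` every ID that is in `full` but not in `clone`.
 * Both inputs must be sorted ascending; `out` needs room for nfull IDs.
 * Returns the number of missing IDs.
 */
static size_t missing_ids(const uint32_t *full, size_t nfull,
			  const uint32_t *clone, size_t nclone,
			  uint32_t *out)
{
	size_t i = 0, j = 0, n = 0;

	while (i < nfull) {
		if (j < nclone && clone[j] < full[i]) {
			j++;			/* ID only in clone: ignore */
		} else if (j < nclone && clone[j] == full[i]) {
			i++;			/* present in both */
			j++;
		} else {
			out[n++] = full[i++];	/* absent from clone */
		}
	}
	return n;
}
```

Feeding the kdirect IDs as `full` and the libva IDs as `clone` should yield the "Missing from libva clone" list directly, which removes one manual step when the dump is repeated on a later kernel build.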
When libva submits its 5-control S_EXT_CTRLS batch (SPS, PPS, SLICE_PARAMS, SCALING_MATRIX, DECODE_PARAMS), only SPS is registered in the clone-hdl. PPS, SLICE_PARAMS, SCALING_MATRIX and DECODE_PARAMS find no ref, so `prepare_ext_ctrls` returns `-EINVAL`. (This contradicts iter18 α-22's rc=0. The error_idx semantics for the request path need re-investigation; the userspace observation of rc=0 may not reflect the actual kernel error for compound-control lookups in request clones.)

### iter20's "zero SPS bytes" explained

iter20 showed `rkvdec sees sps[0..16]=00..00` for libva. That's because:

- HEVC_SPS *is* in the clone-hdl with `p_req_valid=1`, so it got STAGED.
- But the **content** in `req->p_new[SPS]` is all-zero.

Two possible reasons for zero content despite `p_req_valid=1`:

1. `user_to_new` ran on a zero payload from libva. iter15 strace ruled this out: libva's SPS payload is non-zero at ioctl entry.
2. `new_to_req` ran, but the data flow is somehow corrupted. This is possible if the master/cluster lookup is wrong on the clone-hdl.

iter22 candidate: add a printk in `new_to_req` and `req_to_new` to log the copy: source pointer, dest pointer, first 4 bytes, payload size.

### Mechanism status (post-iter21)

| # | Mechanism | Status |
|---|---|---|
| 1 | request_fd mismatch | DISPROVED iter17/18 |
| 2 | REINIT clears | DISPROVED iter19 |
| 3 | Stack-locals stale | DISPROVED iter18 |
| 4 | ctrl_hdl mismatch | REFRAMED iter20 |
| 5 | error_idx silent failure | DISPROVED iter18 (but warrants re-check given iter21 finding) |
| 6 | req->p_new staging incomplete | **CONFIRMED iter21**: clone-hdl missing controls = staging cannot occur for 6 of 7 HEVC controls |
| 7 | **NEW iter21**: clone-hdl is missing controls that main_hdl has registered | **Root question for iter22** |

### Why is the clone incomplete?

`v4l2_ctrl_request_clone(new_hdl, from=main_hdl)` iterates `main_hdl->ctrl_refs` in ID-sorted order.
After cloning HEVC_SPS (0xa40a90), the loop **stops** before HEVC_PPS (0xa40a91). An equivalent gap appears at H264_PRED_WEIGHTS (0xa40905): in both codec blocks the first missing control is a compound control.

Hypothesis: `handler_new_ref` returns a non-zero error at the first compound control AFTER an SPS-like single-struct compound, but **only when called from the request-clone path**. Or: `kzalloc(sizeof(*new_ref) + size_extra_req)` fails for controls with a larger `elem_size` (HEVC_PPS = 64 bytes, H264_PRED_WEIGHTS = 32 bytes; small and unlikely to OOM, but worth verifying).

Alt hypothesis: `handler_new_ref`'s auto-class-control insertion (`v4l2_ctrl_new_std`) fails for non-compound HEVC menu controls in the request-clone path, which propagates `hdl->error` and breaks subsequent iterations.

The same kernel succeeds for kdirect on the same `from` hdl, so something is **per-request-bind specific**, possibly related to request lifecycle timing in libva (iter6 permanent request_fd at CreateContext) vs kdirect (per-frame request_fd).

### Substrate state at iter21 close

- Backend SHA on fresnel: `c1d4bb53…` (iter15 stable, unchanged).
- Fork tip `e109306` (α-24 reverted).
- Kernel `linux-fresnel-fourier 7.0-5` with iter17 + iter20 + iter21 printks. NOT a shipping kernel.
- 5-codec anchors: unchanged. Zero regression.

### iter22 candidate

Add printks to `v4l2_ctrl_request_clone` and `handler_new_ref`:

```c
// in v4l2_ctrl_request_clone
pr_info("iter22_clone_start: new_hdl=%p from=%p\n", hdl, from);

// per iteration
err = handler_new_ref(hdl, ctrl, &new_ref, false, true);
pr_info("iter22_clone_step: id=0x%x err=%d from_other=%d\n",
	ctrl->id, err, ref->from_other_dev);
if (err) {
	pr_info("iter22_clone_break: at id=0x%x err=%d hdl_error=%d\n",
		ctrl->id, err, hdl->error);
	break;
}
```

After 7.0-6 deploys, a libva HEVC run will show exactly which ctrl_id breaks the loop, and with which error code. From there the failure can be localized to a `kzalloc` failure, a `v4l2_ctrl_new_std` failure (auto-class), or some other condition.
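
Before the 7.0-6 logs arrive, the iter21 dump already allows one cheap check: whether a missing control has later IDs present after it in the clone (a skip) or nothing after it (a truncated tail). The sketch below is only an analysis aid with hypothetical names (`gap_kind`, `classify_gap`, `libva_clone_ids`); the ID table is copied from the libva frame 1 dump, and the classification assumes that dump lists the clone's refs completely and in ID order.

```c
#include <stddef.h>
#include <stdint.h>

enum gap_kind {
	GAP_NONE = 0,	/* id is present in the clone */
	GAP_SKIPPED,	/* id is absent, but a larger id is present */
	GAP_TRUNCATED	/* id is absent and no larger id is present */
};

/* The 14 control IDs seen in the libva HEVC frame 1 clone-hdl dump. */
static const uint32_t libva_clone_ids[] = {
	0x990a67, 0x990a6b, 0x990b00, 0x990b67, 0x990b68,
	0xa40900, 0xa40901, 0xa40902, 0xa40903, 0xa40904,
	0xa40907, 0xa40a2c, 0xa40a2d, 0xa40a90,
};

/* Classify how `id` relates to the clone's ID list. */
static enum gap_kind classify_gap(uint32_t id, const uint32_t *clone,
				  size_t nclone)
{
	int later_present = 0;

	for (size_t k = 0; k < nclone; k++) {
		if (clone[k] == id)
			return GAP_NONE;
		if (clone[k] > id)
			later_present = 1;
	}
	return later_present ? GAP_SKIPPED : GAP_TRUNCATED;
}
```

On the iter21 data this reports 0xa40905 as skipped (0xa40907 is present after it) and 0xa40a91 as truncated. If that holds up, the H264 gap and the HEVC tail may not share a single mechanism, so the `iter22_clone_step` lines for 0xa40905 and 0xa40906 are worth reading alongside the `iter22_clone_break` line.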
### Lesson

iter21 overturns the iter11–iter18 hypothesis space entirely. The S_EXT_CTRLS ioctl wire-byte payload analysis was correct: libva's bytes match kdirect's. But **at the v4l2_ctrl framework level, libva's request-clone is missing the registered controls libva tries to stage**. The bug is in how the V4L2 control framework handles libva's specific request-binding pattern, NOT in libva's ioctl content.

This is the strongest narrowing since iter17. We've gone from "anywhere in kernel" → "kernel control framework" → "request-clone path specifically" → "iteration stops right after the first compound HEVC control (HEVC_SPS)".