iter20-26: kernel-side root-cause localization, α-25/α-26 fix Bug 4, partial Bug 5

iter20-23: kernel printk in rkvdec_hevc_run + v4l2_ctrl_request_setup
iter24:    pinpointed rkvdec_s_ctrl returning -EBUSY for HEVC_SPS due
           to vb2_is_busy(CAPTURE) — libva pre-allocates 24 CAPTURE bufs
           before first per-frame S_EXT_CTRLS, blocking image_fmt reset
iter25 α-25: synthetic SPS injection before cap_pool_init seeds
           ctx->image_fmt to RKVDEC_IMG_FMT_420_8BIT while CAPTURE is
           still empty. H264 Bug 4 fully fixed (byte-equal kdirect).
           HEVC Bug 5 frame 1 fixed (byte-equal kdirect).
iter26 α-26: populate decode_params.short_term_ref_pic_set_size from
           picture->st_rps_bits (VAAPI does expose it). Bytes 4-5 of
           dp now match kdirect. HEVC frame 2+ still diverges
           (separate bug, likely DPB entry mapping).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-14 10:10:56 +00:00
parent a443ad73d3
commit bf67900cd8
9 changed files with 911 additions and 0 deletions
+60
View File
@@ -0,0 +1,60 @@
## Iteration 21 — Phase 4 (plan)
Opens 2026-05-14. Continues iter20's localization: rkvdec sees all-zero ctx->ctrl_hdl SPS for libva, real bytes for kdirect. The break is between userspace S_EXT_CTRLS and rkvdec's read of `ctx->ctrl_hdl[SPS].p_cur.p`.
### Locked research question (iter21)
> *"At the v4l2_ctrl_request_setup() entry for libva's per-frame request_fd, is the V4L2 control-handler object (`obj`) found? For each control_ref in the request's hdl->ctrl_refs, is `p_req_valid == true`?"*
### What this narrows
`v4l2_ctrl_request_setup(req, main_hdl)` at IOC_QUEUE time iterates `hdl->ctrl_refs` and only applies controls where `ref->p_req_valid == true`. The bit gets set by the staging path in `try_set_ext_ctrls_common` (called from `try_set_ext_ctrls_request`) when which=V4L2_CTRL_WHICH_REQUEST_VAL.
If libva's S_EXT_CTRLS staged correctly, p_req_valid is true for each ctrl libva submitted. If staging failed silently, p_req_valid is false and `v4l2_ctrl_request_setup` skips them — ctx->ctrl_hdl stays at zero (matches iter20 evidence).
### Approach (α-24 inert → kernel path)
α-24 (libva G_EXT_CTRLS readback after S_EXT_CTRLS) was implemented in 1547a5d+a9c897f, returned `EACCES` for all 13 libva HEVC frames. Reverted in e109306. The kernel disallows G_EXT_CTRLS against a not-yet-completed request — userspace can't probe `req->p_new`. Mechanism distinguishing requires kernel printk.
iter21 patches `v4l2_ctrl_request_setup` with two printk lines:
```c
pr_info("iter21_setup: req=%p main_hdl=%p obj=%p\n",
req, main_hdl, obj);
pr_info("iter21_setup_ref: ctrl_id=0x%x p_req_valid=%d have_new=%d\n",
ctrl->id, ref->p_req_valid, have_new_data);
```
Build `linux-fresnel-fourier 7.0-5` (pkgrel 4→5), deploy, reboot, run libva-HEVC + kdirect-HEVC, capture dmesg.
### Outcome interpretation
| obj at setup entry | p_req_valid (libva run) | Diagnosis |
|---|---|---|
| NULL | n/a | req has no v4l2_ctrl_handler bound at queue time. libva's S_EXT_CTRLS never staged. Bug in libva's request lifecycle. |
| non-NULL | all false | obj found, but staging path never set `p_req_valid`. Bug in `try_set_ext_ctrls_common` for libva's invocation. |
| non-NULL | true for SPS | staging worked but `req_to_new` / `try_or_set_cluster` failed silently. Bug in apply path. Needs another printk after `req_to_new`. |
iter21 finishes when one of these is confirmed for libva. Compare to kdirect baseline (should always show p_req_valid=true for SPS).
### Substrate state at iter21 open
- Kernel `linux-fresnel-fourier 7.0-5` building on boltzmann (PID 1584834, log /tmp/iter21-kbuild.log).
- Backend SHA `c1d4bb53…` (iter15 stable) — backend unchanged from iter15.
- Fork tip `e109306` (α-24 reverted).
- 5-codec anchors: unchanged. Zero regression.
### Phase 5 review note
Diagnostic kernel patch (2 pr_info calls in well-known V4L2 framework function, no behavior change). Phase 5 review skipped per iter17 precedent for diagnostic-only kernel work.
### Phase 7 plan
After 7.0-5 deploys:
1. Reboot fresnel; sddm autologin reseats mfritsche.
2. `sudo dmesg -C`.
3. Run libva HEVC; capture rkvdec_iter20 + iter21_setup lines.
4. `sudo dmesg -C`.
5. Run kdirect HEVC; capture same.
6. Diff; localize bug to one of the three table-row diagnoses.