iter20-26: kernel-side root-cause localization, α-25/α-26 fix Bug 4, partial Bug 5

iter20-23: kernel printk in rkvdec_hevc_run + v4l2_ctrl_request_setup
iter24:    pinpointed rkvdec_s_ctrl returning -EBUSY for HEVC_SPS due
           to vb2_is_busy(CAPTURE) — libva pre-allocates 24 CAPTURE bufs
           before first per-frame S_EXT_CTRLS, blocking image_fmt reset
iter25 α-25: synthetic SPS injection before cap_pool_init seeds
           ctx->image_fmt to RKVDEC_IMG_FMT_420_8BIT while CAPTURE is
           still empty. H264 Bug 4 fully fixed (byte-equal kdirect).
           HEVC Bug 5 frame 1 fixed (byte-equal kdirect).
iter26 α-26: populate decode_params.short_term_ref_pic_set_size from
           picture->st_rps_bits (VAAPI does expose it). Bytes 4-5 of
           dp now match kdirect. HEVC frame 2+ still diverges
           (separate bug, likely DPB entry mapping).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 10:10:56 +00:00
parent a443ad73d3
commit bf67900cd8
9 changed files with 911 additions and 0 deletions
@@ -0,0 +1,62 @@
## Iteration 22 — Phase 4 (plan)
Opens 2026-05-14 following iter21's smoking-gun finding: libva's request-clone-handler is missing 6 of 7 HEVC stateless controls registered in main_hdl.
### Locked research question (iter22)
> *"At which control_id does `v4l2_ctrl_request_clone`'s iteration break for libva, and what error code does `handler_new_ref` return?"*
### Approach
Add three printks to `v4l2_ctrl_request_clone` in `drivers/media/v4l2-core/v4l2-ctrls-request.c`:
```c
pr_info("iter22_clone_start: new_hdl=%p from=%p\n", hdl, from);
// per iteration:
pr_info("iter22_clone_step: id=0x%x err=%d hdl_error=%d new_ref=%p\n",
        ctrl->id, err, hdl->error, new_ref);
// on break:
pr_info("iter22_clone_break: at id=0x%x err=%d hdl_error=%d\n",
        ctrl->id, err, hdl->error);
// on end:
pr_info("iter22_clone_end: hdl=%p err=%d\n", hdl, err);
```
Built as `linux-fresnel-fourier 7.0-6` (pkgrel 5→6). Deploy, reboot, run libva HEVC + kdirect HEVC. Diff.
### Outcome interpretation
| handler_new_ref return, new_ref | hdl->error | Diagnosis |
|---|---|---|
| 0, new_ref=valid | 0 | Loop step succeeded — clone wouldn't break here. Look further. |
| 0, new_ref=NULL | 0 | Duplicate (skip silently). Means main_hdl has duplicate ctrl_refs — unlikely. |
| -ENOMEM | -ENOMEM | kzalloc failed. Memory pressure analysis needed. |
| 0, hdl->error=X | non-zero | Earlier auto-class-control insertion failed; subsequent handler_new_ref short-circuits. |
| -EINVAL | varies | Validation failed (e.g., overlapping ID range). |
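The table can also be read as a small decision function. The sketch below is a user-space illustration only, not kernel code: `enum iter22_diag` and `iter22_classify` are hypothetical names, and the inputs are the `err`, `hdl_error`, and `new_ref` values as they would be read back from an `iter22_clone_step` line. It mirrors the table rows as written and adds a catch-all for anything the table does not cover.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical labels, one per row of the table above. */
enum iter22_diag {
    ITER22_STEP_OK,       /* 0, new_ref valid, hdl->error == 0 */
    ITER22_DUPLICATE,     /* 0, new_ref NULL, hdl->error == 0 */
    ITER22_NOMEM,         /* -ENOMEM */
    ITER22_STICKY_ERROR,  /* 0 returned, hdl->error already set */
    ITER22_INVALID,       /* -EINVAL */
    ITER22_UNCLASSIFIED,  /* any other return code */
};

/* Map one logged step to a table row; checks the return code first. */
static enum iter22_diag iter22_classify(int err, int hdl_error, bool new_ref_valid)
{
    if (err == -ENOMEM)
        return ITER22_NOMEM;
    if (err == -EINVAL)
        return ITER22_INVALID;
    if (err != 0)
        return ITER22_UNCLASSIFIED;
    if (hdl_error != 0)
        return ITER22_STICKY_ERROR;
    return new_ref_valid ? ITER22_STEP_OK : ITER22_DUPLICATE;
}
```

An `ITER22_UNCLASSIFIED` result would mean the table needs another row before the diagnosis can proceed.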
### Coordinate with iter21 finding
If iter22 shows libva's iteration never reaching 0xa40905 (H264_PRED_WEIGHTS) or 0xa40a91 (HEVC_PPS), so that no step or break is logged at either ID, then the **source main_hdl itself** doesn't have these controls.
If iter22 shows the loop reaches 0xa40a91 with err=0 (i.e., NOT a break), then libva's clone-hdl actually DOES contain HEVC_PPS, and our iter21 printk was missing it (e.g., a list-ordering bug in the iteration). Unlikely but worth checking.
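The two cases above can be separated mechanically once the per-step lines are reduced to (id, err) pairs. A minimal sketch under that assumption; `struct clone_step`, `enum reach`, and `iter22_reach` are hypothetical helper names, not kernel or libva API.

```c
#include <stddef.h>

/* One logged iteration of the clone loop (hypothetical user-space record). */
struct clone_step {
    unsigned int id;  /* control ID from the iter22_clone_step line */
    int err;          /* err value logged for that step */
};

enum reach {
    REACH_ABSENT,  /* ID never appears: source handler lacks the control */
    REACH_OK,      /* ID appears with err == 0: clone contains the control */
    REACH_FAILED,  /* ID appears with err != 0: the loop broke at this ID */
};

/* Report how the trace treats the given control ID (first match wins). */
static enum reach iter22_reach(const struct clone_step *steps, size_t n,
                               unsigned int id)
{
    for (size_t i = 0; i < n; i++) {
        if (steps[i].id != id)
            continue;
        return steps[i].err == 0 ? REACH_OK : REACH_FAILED;
    }
    return REACH_ABSENT;
}
```

Run against the libva trace for 0xa40a91, `REACH_ABSENT` would point at main_hdl, while `REACH_OK` would point back at the iter21 printk.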
### Substrate state at iter22 open
- Kernel `linux-fresnel-fourier 7.0-6` building on boltzmann (PID 1613982, log /tmp/iter22-kbuild.log).
- Backend SHA `c1d4bb53…` — unchanged from iter15.
- Fork tip `e109306` — unchanged.
- 5-codec anchors: unchanged.
### Phase 5 review
Diagnostic-only kernel patch (printk-only, no behavior change). Skipped per iter17 precedent.
### Phase 7 plan
After 7.0-6 deploys:
1. Reboot fresnel; sddm autologin reseats mfritsche.
2. `sudo dmesg -C`.
3. Run libva HEVC; capture iter22_clone_* lines.
4. `sudo dmesg -C`.
5. Run kdirect HEVC; capture same.
6. Diff. Localize the break or absence-from-source.
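
For steps 3 and 5, the captured lines can be reduced to their numeric fields before diffing. A sketch, assuming each dmesg line may carry a timestamp prefix and uses the `iter22_clone_step` format string from the Approach section; `struct step_fields` and `parse_clone_step` are hypothetical names. The `new_ref` field is deliberately skipped, since pointer values are not expected to match between the two runs.

```c
#include <stdio.h>
#include <string.h>

/* Numeric fields of one iter22_clone_step line (hypothetical record). */
struct step_fields {
    unsigned int id;
    int err;
    int hdl_error;
};

/* Returns 1 and fills *out if the line holds an iter22_clone_step record, else 0. */
static int parse_clone_step(const char *line, struct step_fields *out)
{
    /* Skip whatever prefix dmesg put in front of the tag. */
    const char *tag = strstr(line, "iter22_clone_step:");

    if (!tag)
        return 0;
    return sscanf(tag, "iter22_clone_step: id=0x%x err=%d hdl_error=%d",
                  &out->id, &out->err, &out->hdl_error) == 3;
}
```

Lines that fail to parse (start, break, and end markers) can be kept aside as context for the diff in step 6.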