iter20-26: kernel-side root-cause localization, α-25/α-26 fix Bug 4, partial Bug 5

iter20-23: kernel printk in rkvdec_hevc_run + v4l2_ctrl_request_setup
iter24:    pinpointed rkvdec_s_ctrl returning -EBUSY for HEVC_SPS due
           to vb2_is_busy(CAPTURE) — libva pre-allocates 24 CAPTURE bufs
           before first per-frame S_EXT_CTRLS, blocking image_fmt reset
iter25 α-25: synthetic SPS injection before cap_pool_init seeds
           ctx->image_fmt to RKVDEC_IMG_FMT_420_8BIT while CAPTURE is
           still empty. H264 Bug 4 fully fixed (byte-equal kdirect).
           HEVC Bug 5 frame 1 fixed (byte-equal kdirect).
iter26 α-26: populate decode_params.short_term_ref_pic_set_size from
           picture->st_rps_bits (VAAPI does expose it). Bytes 4-5 of
           dp now match kdirect. HEVC frame 2+ still diverges
           (separate bug, likely DPB entry mapping).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 10:10:56 +00:00
parent a443ad73d3
commit bf67900cd8
9 changed files with 911 additions and 0 deletions
@@ -0,0 +1,94 @@
## Iteration 25 — Phase 8 (close)
Closed 2026-05-14. iter25 = α-25: synthetic SPS injection before `cap_pool_init`. **MAJOR WIN**, but a PARTIAL close: HEVC libva frame 1 is byte-identical to kdirect; frames 2+ have a separate wire-byte issue (decode_params).
### α-25 implementation
`src/context.c::RequestCreateContext` — after S_FMT(OUTPUT) + S_FMT(CAPTURE) + G_FMT(CAPTURE) sanity, BEFORE `cap_pool_init`:
```c
switch (config_object->profile) {
case VAProfileHEVCMain: {
    struct v4l2_ctrl_hevc_sps dummy_sps;

    /* Synthetic SPS: just enough for rkvdec to derive image_fmt. */
    memset(&dummy_sps, 0, sizeof(dummy_sps));
    dummy_sps.chroma_format_idc = 1;        /* 4:2:0 */
    dummy_sps.bit_depth_luma_minus8 = 0;    /* 8-bit */
    dummy_sps.bit_depth_chroma_minus8 = 0;
    dummy_sps.pic_width_in_luma_samples = picture_width;
    dummy_sps.pic_height_in_luma_samples = picture_height;
    /* ... v4l2_set_controls(video_fd, request_fd=-1, &SPS, 1) ... */
    break;  /* do not fall through into the H264 case */
}
/* case VAProfileH264*: similar, with V4L2_CID_STATELESS_H264_SPS */
default:
    break;  /* other codecs: skip */
}
```
Fork commit `db0b7f9` (single commit).
### Result — definitive
**Frame 1**: libva CAPTURE bytes = kdirect CAPTURE bytes (cmp identical for first 1382400 bytes, the entire frame 1 NV12 payload of 1280×720).
**Frame 2+**: diverge starting at byte 1382401.
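The 1382400-byte figure follows from the NV12 layout: a full-resolution Y plane plus an interleaved CbCr plane at half resolution in both dimensions. A minimal sketch of that arithmetic, and of mapping a `cmp` byte position back to a frame number, is below. The helper names are hypothetical, and it assumes `cmp` reports 1-based byte positions and that frames are packed back to back with no padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* NV12 is 12 bits per pixel: width*height luma bytes plus
 * width*height/2 interleaved chroma bytes. */
size_t nv12_frame_size(uint32_t width, uint32_t height)
{
    assert(width % 2 == 0 && height % 2 == 0); /* 4:2:0 needs even dimensions */
    return (size_t)width * height * 3 / 2;
}

/* Map a 1-based byte position (as printed by cmp) to a 1-based frame
 * number, assuming tightly packed frames of equal size. */
size_t frame_of_cmp_byte(size_t cmp_byte, size_t frame_size)
{
    assert(cmp_byte >= 1 && frame_size > 0);
    return (cmp_byte - 1) / frame_size + 1;
}
```

With 1280×720 input, `nv12_frame_size` gives 1382400, so a first difference at byte 1382401 is the very first byte of frame 2.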
### Kernel printk evidence (post-α-25)
```
iter24_req_to_new: id=0xa40a90 ret=0 p_req_valid=1 p_req_elems=1
iter24_try_or_set: master_id=0xa40a90 ret=0 ← was -16 (EBUSY) before
iter24_req_to_new: id=0xa40a91 ret=0
iter24_try_or_set: master_id=0xa40a91 ret=0
iter24_req_to_new: id=0xa40a92 ret=0
iter24_try_or_set: master_id=0xa40a92 ret=0
iter24_req_to_new: id=0xa40a93 ret=0
iter24_try_or_set: master_id=0xa40a93 ret=0
iter24_req_to_new: id=0xa40a94 ret=0
iter24_try_or_set: master_id=0xa40a94 ret=0
rkvdec_iter20: sps[0..16]=00 00 00 05 d0 02 00 00 04 04 04 00 01 01 00 03
← non-zero, w=1280, h=720
rkvdec_hevc_run: w=1280 h=720 chroma=1 nal_unit_type=20 slice_type=2 decode_flags=0x3
← rkvdec sees CORRECT SPS for the first time
```
`iter24_loop_break-count = 0` — the setup loop NEVER breaks. All 5 staged HEVC controls commit to ctx->ctrl_hdl successfully.
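The five `id=0xa40a9x` values in the log can be read back as control names. The sketch below is an illustration only: the constants are copied by hand from my reading of the mainline `<linux/v4l2-controls.h>` (stateless codec base `0x00a40000 | 0x900`, HEVC controls starting at base + 400) and should be checked against the header of the kernel actually in use; `hevc_ctrl_name` is a hypothetical helper, not part of libva or the kernel.

```c
#include <stdint.h>

/* Mirrored by hand from the V4L2 uapi header so the sketch builds
 * without kernel headers; verify against <linux/v4l2-controls.h>. */
#define CID_CODEC_STATELESS_BASE (0x00a40000u | 0x900u)
#define CID_HEVC_SPS             (CID_CODEC_STATELESS_BASE + 400u)

/* Translate a control id from the printk output into a short name. */
const char *hevc_ctrl_name(uint32_t id)
{
    static const char *const names[] = {
        "HEVC_SPS",            /* 0xa40a90 */
        "HEVC_PPS",            /* 0xa40a91 */
        "HEVC_SLICE_PARAMS",   /* 0xa40a92 */
        "HEVC_SCALING_MATRIX", /* 0xa40a93 */
        "HEVC_DECODE_PARAMS",  /* 0xa40a94 */
    };

    if (id < CID_HEVC_SPS ||
        id - CID_HEVC_SPS >= sizeof(names) / sizeof(names[0]))
        return "unknown";
    return names[id - CID_HEVC_SPS];
}
```

Under those assumptions the log shows SPS, PPS, slice params, scaling matrix, and decode params all returning 0, which matches the "5 staged HEVC controls" count above.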
### Bug 5 root cause: FIXED
The -EBUSY block from rkvdec_s_ctrl's vb2_is_busy check is gone. ctx->image_fmt is pre-seeded to RKVDEC_IMG_FMT_420_8BIT by the synthetic SPS injection before any CAPTURE buffer is allocated. Per-frame SPS submissions find image_fmt_changed=false → skip reset → commit succeeds.
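The ordering dependency can be shown with a toy userspace model. This is not the rkvdec source: `toy_ctx`, `toy_s_ctrl_sps`, and the enum are hypothetical stand-ins that only mimic the behaviour described above (reject an image_fmt change while CAPTURE buffers exist, accept an SPS that leaves image_fmt unchanged).

```c
#include <errno.h>

enum toy_img_fmt { TOY_IMG_FMT_ANY = 0, TOY_IMG_FMT_420_8BIT };

struct toy_ctx {
    enum toy_img_fmt image_fmt; /* stands in for ctx->image_fmt */
    unsigned capture_bufs;      /* nonzero stands in for vb2_is_busy(CAPTURE) */
};

/* Only the 4:2:0 8-bit case is modelled. */
enum toy_img_fmt toy_fmt_from_sps(unsigned chroma_format_idc,
                                  unsigned bit_depth_luma_minus8)
{
    if (chroma_format_idc == 1 && bit_depth_luma_minus8 == 0)
        return TOY_IMG_FMT_420_8BIT;
    return TOY_IMG_FMT_ANY;
}

/* Models the SPS path of s_ctrl: an unchanged format commits at once,
 * a changed format is refused while buffers are allocated. */
int toy_s_ctrl_sps(struct toy_ctx *ctx, unsigned chroma_format_idc,
                   unsigned bit_depth_luma_minus8)
{
    enum toy_img_fmt fmt = toy_fmt_from_sps(chroma_format_idc,
                                            bit_depth_luma_minus8);

    if (fmt == ctx->image_fmt)
        return 0;               /* image_fmt_changed == false */
    if (ctx->capture_bufs)
        return -EBUSY;          /* cannot reset the format under live buffers */
    ctx->image_fmt = fmt;
    return 0;
}
```

In this model, calling `toy_s_ctrl_sps` once while `capture_bufs` is 0 (the α-25 ordering) makes every later call succeed even after 24 buffers are allocated, whereas allocating first makes the first SPS fail with -EBUSY.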
### Frame 2+ divergence (separate Bug)
`decode_params.short_term_ref_pic_set_size`:
- libva frame 2: bytes 4-5 = `00 00` → 0
- kdirect frame 2: bytes 4-5 = `0a 00` → 10
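libva's `h265_fill_decode_params` doesn't populate `short_term_ref_pic_set_size`, so the field goes out as 0. At iter25 close this was attributed to VAAPI not exposing the value; iter26 later sourced it from `picture->st_rps_bits` (see the commit message). kdirect parses it from the HEVC NAL directly. The missing value affects DPB reference resolution for P/B frames. iter26 candidate.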
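The "bytes 4-5" reading assumes the control payload starts with a 32-bit `pic_order_cnt_val` followed by the 16-bit `short_term_ref_pic_set_size`, stored little-endian. A small sketch of that decode follows; `dp_head` is a hypothetical mirror of only the leading fields of `struct v4l2_ctrl_hevc_decode_params`, written from memory of the uapi header, so the offsets should be confirmed there.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the first fields of the decode_params
 * control; only used here to name the byte offset. */
struct dp_head {
    int32_t  pic_order_cnt_val;            /* bytes 0-3 */
    uint16_t short_term_ref_pic_set_size;  /* bytes 4-5 */
    uint16_t long_term_ref_pic_set_size;   /* bytes 6-7 */
};

/* Read a little-endian 16-bit value from a raw byte dump. */
uint16_t read_le16(const uint8_t *buf, size_t off)
{
    return (uint16_t)(buf[off] | (buf[off + 1] << 8));
}

/* Extract short_term_ref_pic_set_size from a dumped control payload. */
uint16_t dp_st_rps_size(const uint8_t *dp)
{
    return read_le16(dp, offsetof(struct dp_head,
                                  short_term_ref_pic_set_size));
}
```

Applied to the two dumps, bytes `0a 00` decode to 10 and `00 00` to 0, which is the difference listed above.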
### Mechanism status
| # | Mechanism | Status |
|---|---|---|
| 9 | rkvdec_s_ctrl -EBUSY on first SPS | **FIXED iter25 α-25** |
| 10 | decode_params.short_term_ref_pic_set_size = 0 | **NEW iter26 candidate** |
### Substrate state at iter25 close
- Backend SHA on fresnel: post-α-25 build (commit `db0b7f9`).
- Fork tip `db0b7f9` (α-25).
- Kernel `linux-fresnel-fourier 7.0-8` (diagnostic printks; should eventually revert to clean 7.0-1 + RFC v2 + iter12 baseline).
- HEVC libva frame 1 = kdirect frame 1 byte-identical. ✓✓✓
- HEVC libva frame 2+: differs.
### Anchors check pending
Need to re-run 5-codec anchors to verify α-25 didn't regress VP9/MPEG-2/VP8 (it shouldn't — guard is `case VAProfileHEVCMain` / `case VAProfileH264*` only).
### Lesson
After 15 iterations chasing wire-byte hypotheses (iter11-iter18) and 5 iterations of kernel printk (iter20-iter24), the actual bug turned out to be an interaction between libva's CAPTURE-pre-allocate design and rkvdec's lazy image_fmt determination. The fix is 90 LOC in libva. The kernel was correct all along; it just needed a way to commit the image_fmt before buffers were locked in.
This validates [[feedback-libva-byte-correct-kernel-bug]] only partially: libva WAS byte-correct in its ioctl content, but it had a CAPTURE-pool-allocation TIMING bug that interacted with kernel state. The bug is in libva, not the kernel, but the symptom only manifested because of kernel-side -EBUSY semantics that aren't well documented.
### iter26 candidate
Fix `h265_fill_decode_params` to populate `short_term_ref_pic_set_size`. VAAPI doesn't expose this directly, but it can be derived from `surface_object->params.h265.slices[0].short_term_ref_pic_set_size` (if VAAPI provides it) or parsed from the slice header.