iter20-26: kernel-side root-cause localization, α-25/α-26 fix Bug 4, partial Bug 5

iter20-23: kernel printk in rkvdec_hevc_run + v4l2_ctrl_request_setup
iter24:    pinpointed rkvdec_s_ctrl returning -EBUSY for HEVC_SPS due
           to vb2_is_busy(CAPTURE) — libva pre-allocates 24 CAPTURE bufs
           before first per-frame S_EXT_CTRLS, blocking image_fmt reset
iter25 α-25: synthetic SPS injection before cap_pool_init seeds
           ctx->image_fmt to RKVDEC_IMG_FMT_420_8BIT while CAPTURE is
           still empty. H264 Bug 4 fully fixed (byte-equal kdirect).
           HEVC Bug 5 frame 1 fixed (byte-equal kdirect).
iter26 α-26: populate decode_params.short_term_ref_pic_set_size from
           picture->st_rps_bits (VAAPI does expose it). Bytes 4-5 of
           dp now match kdirect. HEVC frame 2+ still diverges
           (separate bug, likely DPB entry mapping).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 10:10:56 +00:00
parent a443ad73d3
commit bf67900cd8
9 changed files with 911 additions and 0 deletions
@@ -0,0 +1,76 @@
## Iteration 26 — Phase 8 (close)
Closes 2026-05-14. iter26 = α-26: fill `decode_params.short_term_ref_pic_set_size` from `VAPictureParameterBufferHEVC.st_rps_bits`. PARTIAL close.
### α-26 fix
`src/h265.c::h265_fill_decode_params` — replaced the comment "VAAPI doesn't expose" with the actual assignment:
```c
decode_params->short_term_ref_pic_set_size = picture->st_rps_bits;
```
VAAPI's `VAPictureParameterBufferHEVC` exposes `st_rps_bits` (u32) as the bit-count of the inline `short_term_ref_pic_set` syntax element in the slice header. The previous comment in libva was wrong — the field IS exposed.
Fork `66ef848`.
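The assignment narrows VAAPI's 32-bit `st_rps_bits` into the kernel-side field. A minimal sketch of a guarded version, assuming the kernel field is 16 bits wide as in mainline's `struct v4l2_ctrl_hevc_decode_params`; `st_rps_bits_to_size` is a hypothetical helper, not something present in the fork:

```c
#include <stdint.h>

/* Hypothetical helper (not in the fork): narrow VAAPI's 32-bit
 * st_rps_bits to the short_term_ref_pic_set_size field, assumed here
 * to be 16 bits wide. Saturates instead of silently wrapping. */
static uint16_t st_rps_bits_to_size(uint32_t st_rps_bits)
{
    return st_rps_bits > UINT16_MAX ? UINT16_MAX : (uint16_t)st_rps_bits;
}
```

For realistic slice headers the bit count is small, so the clamp should never trigger; it only makes the narrowing explicit.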
### Empirical result
- **HEVC frame 1** (IDR): libva CAPTURE = kdirect CAPTURE byte-identical. ✓
- **HEVC frames 2-10**: still diverge. Hash unchanged from iter25 (`700aa52d…`).
- **decode_params bytes** (per iter20 kernel printk) NOW match kdirect for frames 1-3:
  - libva frame 2: `dp[0..16] = 04 00 00 00 0a 00 00 00 01 01 00 00 00 00 00 00`
  - kdirect frame 2: `dp[0..16] = 04 00 00 00 0a 00 00 00 01 01 00 00 00 00 00 00`
α-26 fixed the first 16 bytes of decode_params for libva. But the output is identical to iter25 — so the divergence-causing bytes are NOT in `decode_params[0..16]`.
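To read the dumped bytes as fields rather than raw hex, the head of `decode_params` can be decoded as sketched below. This assumes the mainline `struct v4l2_ctrl_hevc_decode_params` layout and a little-endian dump; `struct dp_head` and `dp_head_parse` are illustrative names, not code from the fork or the kernel.

```c
#include <stdint.h>

/* First 12 bytes of decode_params, under the assumed mainline layout. */
struct dp_head {
    int32_t  pic_order_cnt_val;           /* dp[0..3]  */
    uint16_t short_term_ref_pic_set_size; /* dp[4..5]  */
    uint16_t long_term_ref_pic_set_size;  /* dp[6..7]  */
    uint8_t  num_active_dpb_entries;      /* dp[8]     */
    uint8_t  num_poc_st_curr_before;      /* dp[9]     */
    uint8_t  num_poc_st_curr_after;       /* dp[10]    */
    uint8_t  num_poc_lt_curr;             /* dp[11]    */
};

/* Decode the little-endian head of a decode_params dump (>= 12 bytes). */
static struct dp_head dp_head_parse(const uint8_t *dp)
{
    struct dp_head h;
    uint32_t poc = (uint32_t)dp[0] | (uint32_t)dp[1] << 8 |
                   (uint32_t)dp[2] << 16 | (uint32_t)dp[3] << 24;

    h.pic_order_cnt_val = (int32_t)poc;
    h.short_term_ref_pic_set_size = (uint16_t)(dp[4] | dp[5] << 8);
    h.long_term_ref_pic_set_size = (uint16_t)(dp[6] | dp[7] << 8);
    h.num_active_dpb_entries = dp[8];
    h.num_poc_st_curr_before = dp[9];
    h.num_poc_st_curr_after = dp[10];
    h.num_poc_lt_curr = dp[11];
    return h;
}
```

Under that layout, the frame 2 dump above reads as POC 4, a 10-bit short-term RPS, one active DPB entry, and one entry in `poc_st_curr_before`.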
### What still differs
12,234,632 of 13,824,000 bytes diverge (frames 2-10 nearly all bytes off). Frame 1 is 1,382,400 bytes — byte-identical.
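The byte counts follow from the frame geometry. Assuming the clip is 1280x720 in 8-bit 4:2:0 (inferred from the `bbb_720p10s` name and `RKVDEC_IMG_FMT_420_8BIT`), one frame is 1280 × 720 × 3/2 = 1,382,400 bytes, ten frames are 13,824,000 bytes, and frames 2-10 hold 12,441,600 bytes, so the 12,234,632 differing bytes are about 98.3% of them. A sketch of a per-frame diff counter for two raw dumps follows; the function names are illustrative, not part of any existing tool:

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes per frame for 8-bit 4:2:0: full-size luma plus half-size chroma. */
static size_t frame_bytes_420_8bit(size_t width, size_t height)
{
    return width * height * 3 / 2;
}

/* Count differing bytes inside frame `idx` (0-based) of two raw dumps.
 * Both buffers must hold at least (idx + 1) * frame_bytes bytes. */
static size_t frame_diff_count(const uint8_t *a, const uint8_t *b,
                               size_t frame_bytes, size_t idx)
{
    const size_t off = idx * frame_bytes;
    size_t n = 0;

    for (size_t i = 0; i < frame_bytes; i++)
        n += a[off + i] != b[off + i];
    return n;
}
```

Running the counter per frame rather than over the whole dump would show whether any frame after the first is partially correct, which the aggregate number hides.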
`rkvdec_hevc_run` printk for libva still shows `reorder=4` (libva's incorrect `sps_max_num_reorder_pics = sps_max_dec_pic_buffering_minus1`) vs kdirect's `reorder=2`. But kernel source search shows `sps_max_num_reorder_pics` is referenced ONLY in our diagnostic printk — rkvdec_hevc_run hardware setup doesn't use it. So that's not the cause.
The likely candidates for frame 2+ divergence:
1. **DPB entry mapping**: dpb[i].timestamp must match the CAPTURE buffer's timestamp. iter26 didn't probe this. Need to dump `dp[64..96]` (dpb[0..1] entries) and compare libva vs kdirect.
2. **CAPTURE buffer reuse pattern**: libva's iter5b-β has 24-slot LRU, kdirect has different pool size. Maybe the kernel's reference buffer association differs.
3. **slice_params bytes**: not yet inspected by printk.
### Regression check (non-HEVC anchors)
Run with α-25 + α-26 changes:
| Codec | Result | Notes |
|---|---|---|
| H.264 (bbb_1080p30_h264, 3 frames) | **HW = kdirect byte-equal** | Bug 4 FIXED |
| HEVC (bbb_720p10s_hevc, 1 frame) | HW = kdirect byte-equal | Frame 1 FIXED |
| HEVC (bbb_720p10s_hevc, 10 frames) | HW ≠ kdirect | Frame 2+ separate bug |
| VP9 (bbb_720p10s_vp9) | HW = SW byte-equal | Unchanged |
| MPEG-2 (bbb_720p10s_mpeg2) | **Not testable this boot** | Pre-existing: vainfo only advertises rkvdec profiles, hantro paths not multi-probed |
| VP8 (bbb_720p10s_vp8) | **Not testable this boot** | Same pre-existing issue |
The MPEG-2 / VP8 "not testable" state is due to libva backend's single-device auto-select (chose rkvdec at /dev/video1+/dev/media0 this boot). rkvdec doesn't advertise MPEG-2 / VP8 profiles. To test these, would need libva-side multi-device profile-probe or explicit env override.
### Major campaign milestone
**Bug 4 (H.264 keyframe-partial → all-correct)** and **Bug 5 (HEVC libva all-zero → frame 1 correct)** root causes identified. Bug 4 is **FIXED** and Bug 5 is **PARTIALLY FIXED**, both in two iterations after the 6 kernel-printk diagnostic iterations narrowed the failure to `rkvdec_s_ctrl -EBUSY` on first SPS.
H.264 is now fully byte-equivalent to kdirect. HEVC has one remaining bug (frame 2+).
### Substrate state at iter26 close
- Backend fork tip: `66ef848` (α-25 + H264-flag-fix + α-26).
- Kernel `7.0-8` with diagnostic printks (will eventually revert to clean baseline).
- 5-codec status: H264 ✅, HEVC ⚠️ (frame 1 ✅), VP9 ✅, MPEG-2/VP8 untestable this boot.
### iter27 candidate
Add iter20-style kernel printk extension to dump `dp[64..96]` covering `dpb[0..1]` entries. Compare libva vs kdirect DPB entries for HEVC frame 2 to identify if it's:
- (a) timestamp mismatch → libva references a non-existent CAPTURE buffer.
- (b) pic_order_cnt_val mismatch.
- (c) DPB entry flags mismatch.
Alternatively, inspect the slice_params bytes for frame 2 first (rkvdec's `run.slices_params[0]` printk extension).
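The three-way DPB check can be sketched offline against the dumped bytes. This assumes the DPB array starts at `dp[64]` and that each entry follows mainline's 16-byte `struct v4l2_hevc_dpb_entry` (u64 timestamp, u8 flags, u8 field_pic, u16 reserved, s32 pic_order_cnt_val), dumped little-endian; all names below are illustrative.

```c
#include <stdint.h>

/* Fields of interest from one DPB entry (assumed 16-byte layout). */
struct dpb_entry {
    uint64_t timestamp;
    uint8_t  flags;
    int32_t  pic_order_cnt_val;
};

enum dpb_mismatch { DPB_MATCH, DPB_TIMESTAMP, DPB_POC, DPB_FLAGS };

/* Decode dpb[idx] from a decode_params dump; dpb[] assumed at offset 64. */
static struct dpb_entry dpb_entry_parse(const uint8_t *dp, unsigned idx)
{
    const uint8_t *p = dp + 64 + 16 * idx;
    struct dpb_entry e = {0};
    uint32_t poc;

    for (int i = 7; i >= 0; i--)            /* little-endian u64 */
        e.timestamp = e.timestamp << 8 | p[i];
    e.flags = p[8];
    poc = (uint32_t)p[12] | (uint32_t)p[13] << 8 |
          (uint32_t)p[14] << 16 | (uint32_t)p[15] << 24;
    e.pic_order_cnt_val = (int32_t)poc;
    return e;
}

/* Report the first differing field, in the order (a), (b), (c). */
static enum dpb_mismatch dpb_entry_compare(const struct dpb_entry *a,
                                           const struct dpb_entry *b)
{
    if (a->timestamp != b->timestamp)
        return DPB_TIMESTAMP;
    if (a->pic_order_cnt_val != b->pic_order_cnt_val)
        return DPB_POC;
    if (a->flags != b->flags)
        return DPB_FLAGS;
    return DPB_MATCH;
}
```

One caveat: timestamps are chosen by each userspace client, so a raw timestamp difference between libva and kdirect is only a hint. The meaningful check for (a) is probably whether the libva timestamp matches a CAPTURE buffer queued in the same run.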
### Lesson
Two iterations of libva-side patches (α-25 = synthetic SPS, α-26 = st_rps_bits), coming after the kernel-printk localization that concluded at iter24, fixed Bug 4 fully and Bug 5 partially. The campaign's wire-byte hypothesis arc (iter11-iter18) was overturned by kernel printk, but the actual fix then landed almost entirely on the libva side. The kernel was correct.