Update campaign session doc: full-day arc closes at 3/3 PASS

2026-05-14 15:34:22 +00:00
parent fde8a25779
commit 85cc1781e1
+57 -18
@@ -1,9 +1,9 @@
## Campaign Session 2026-05-14 — Final Summary
## Campaign Session 2026-05-14 — Final Summary (post iter31)
### Starting state
- Bug 4 (H.264 keyframe-partial) and Bug 5 (HEVC libva all-zero CAPTURE) unresolved.
- iter11..iter18 had eliminated 8 wire-byte hypotheses without finding the root cause.
- iter17 (this session) added kernel printk in `rkvdec_hevc_run` and found rkvdec sees all-zero SPS contents for libva.
- iter17 (start of this campaign day) added kernel printk in `rkvdec_hevc_run` and found rkvdec sees all-zero SPS contents for libva.
### Iterations completed
| Iter | Type | Output |
@@ -18,21 +18,29 @@
| 24 | kernel printk for req_to_new + try_or_set_cluster returns | `try_or_set_cluster ret=-16` (-EBUSY) for HEVC_SPS |
| | | **ROOT CAUSE: rkvdec_s_ctrl returns -EBUSY when SPS triggers image_fmt reset on busy CAPTURE queue** |
| 25 | **α-25 synthetic SPS injection in libva** | **H.264 fully fixed (10F byte-equal to SW)**; **HEVC frame 1 fixed (byte-equal to SW)** |
| 26 | α-26 decode_params.short_term_ref_pic_set_size from VAAPI | Wire-correct; rkvdec doesn't use field |
| 26 | α-26 decode_params.short_term_ref_pic_set_size from VAAPI | Wire-correct field name match — but rkvdec ignores this field (mis-route) |
| 27 | α-27 num_entry_point_offsets from VAAPI | No-op (VAAPI returns 0; rkvdec doesn't use) |
| 28 | α-28 bit_size = (slice_data_size - data_byte_offset) * 8 | No-op (rkvdec doesn't use bit_size) |
| 28b | env-gated trim=40 (DIAG) | Refuted "universal 40-byte trim" theory; reverted |
| 29 | env-gated dump of HEVC slice_data trailing 80 bytes | Refuted "40-byte inflation" theory — trailing bytes are real entropy |
| 30 | env-gated `LIBVA_TS_SCALE` timestamp multiplier | Refuted "timestamp magnitude" theory; same wrong output for scales 1/1k/1M |
| 31 | extend kernel printk to dpb[2..3] + sl[32..64]; trace rkvdec assemble_sw_rps | Located `sl_params->short_term_ref_pic_set_size` consumer in `rkvdec_hevc.c:386-389` |
| | | **α-29: slice_params.short_term_ref_pic_set_size = picture->st_rps_bits** |
| | | **Bug 5 FULLY FIXED — HEVC 10F byte-equal to SW** |
### Final 5-codec state
| Codec | Status | Notes |
|---|---|---|
| H.264 | **PASS** (byte-equal SW, 10 frames) | Bug 4 fixed |
| HEVC | **frame 1 PASS** (byte-equal SW); frames 2+ DIVERGE | Bug 5 partial; frame 2+ rooted in ffmpeg-vaapi slice_data buffer 40-byte inflation vs ffmpeg-v4l2request — deferred |
| VP9 | **PASS** (byte-equal SW) | Unchanged |
| H.264 | **PASS** (10F byte-equal SW) | Bug 4 fixed iter25 α-25 |
| HEVC | **PASS** (10F byte-equal SW) | Bug 5 frame 1 fixed iter25 α-25; frames 2+ fixed iter31 α-29 |
| VP9 | **PASS** (10F byte-equal SW) | Unchanged through both fixes (no regression) |
| MPEG-2 | untestable on this kernel boot | Pre-existing libva single-device profile-probe limitation |
| VP8 | untestable on this kernel boot | Same |
### Root cause discovered
### Root causes discovered
**Bug 4 + Bug 5 frame 1** (`rkvdec_s_ctrl` -EBUSY):
`rkvdec_s_ctrl` on first HEVC_SPS / H264_SPS resolves `image_fmt` via `get_image_fmt()` and, if it differs from cached `ctx->image_fmt` (default `RKVDEC_IMG_FMT_ANY`), tries to reset the CAPTURE format. Reset blocked by `vb2_is_busy` (any CAPTURE buffer allocated → returns true). libva pre-allocates 24 CAPTURE buffers at CreateContext (iter5b-β design) BEFORE the first per-frame S_EXT_CTRLS, so:
- First per-frame SPS staged via `try_or_set_cluster` → `s_ctrl` → `rkvdec_s_ctrl` returns -EBUSY.
@@ -40,27 +48,58 @@
- `rkvdec_hevc_run_preamble` reads `ctx->ctrl_hdl[SPS]->p_cur` which is zero.
- Hardware sees w=0 h=0 → all-zero CAPTURE.
### Fix delivered
**Bug 5 frame 2+** (`sl_params->short_term_ref_pic_set_size` zero):
`src/context.c::RequestCreateContext` (α-25 commit `db0b7f9`, fixed `d062fec`):
`rkvdec_hevc.c::assemble_sw_rps` lines 386-389:
```c
if (!(decode_params->flags & V4L2_HEVC_DECODE_PARAM_FLAG_IDR_PIC)) {
	if (sl_params->short_term_ref_pic_set_size)
		st_bit_offset = sl_params->short_term_ref_pic_set_size;
	else if (sps->num_short_term_ref_pic_sets > 1)
		st_bit_offset = fls(sps->num_short_term_ref_pic_sets - 1);
}
```
libva set `slice_params->short_term_ref_pic_set_size = 0` (stale "VAAPI doesn't expose" comment). For BBB's `num_short_term_ref_pic_sets == 1`, fallback gives `fls(0)=0`. HW reads slice-header bits starting at offset 0 → consumes st_ref_pic_set() bytes as long-term-RPS / slice-header continuation → entropy decoder spins onto garbage state for every non-IDR slice. IDR is gated by `!IDR_PIC` check → frame 1 unaffected → consistent with iter25 "frame 1 PASS, frame 2+ FAIL" observation.
The mis-direction: α-26 routed `picture->st_rps_bits` into `decode_params->short_term_ref_pic_set_size` based on field name. But V4L2 has the field in BOTH `decode_params` and `slice_params` with the same name but DIFFERENT semantics:
- `decode_params` version: bit count of SPS-side st_ref_pic_set syntax (rkvdec doesn't read this).
- `slice_params` version: bit count of slice-header-side st_ref_pic_set syntax (rkvdec reads this).
VAAPI's `picture->st_rps_bits` per `/usr/include/va/va_dec_hevc.h:177-185` is documented as the slice-header bit count → belongs in slice_params.
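The offset selection quoted above can be reproduced in user space to see why BBB's non-IDR slices ended up at bit offset 0. This is a minimal sketch, not the rkvdec source: `fls_u32` is a portable stand-in for the kernel's `fls()`, `st_bit_offset_for` and its parameter names are hypothetical, and it assumes `st_bit_offset` starts at 0 before the branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for the kernel's fls(): 1-based index of the highest set bit,
 * 0 when x == 0. */
static int fls_u32(uint32_t x)
{
	int n = 0;

	while (x) {
		n++;
		x >>= 1;
	}
	return n;
}

/* Hypothetical helper mirroring the quoted branch.
 * Assumption: the offset is 0 unless one of the two branches sets it. */
static uint32_t st_bit_offset_for(bool idr_pic,
				  uint32_t slice_st_rps_size,
				  uint32_t num_short_term_ref_pic_sets)
{
	uint32_t st_bit_offset = 0;

	if (!idr_pic) {
		if (slice_st_rps_size)
			/* slice_params value wins when non-zero */
			st_bit_offset = slice_st_rps_size;
		else if (num_short_term_ref_pic_sets > 1)
			/* fallback: bits needed to index an SPS-side set */
			st_bit_offset = (uint32_t)fls_u32(num_short_term_ref_pic_sets - 1);
	}
	return st_bit_offset;
}
```

With `slice_st_rps_size == 0` and a single SPS-side set, neither branch fires and the offset stays 0, which matches the frame 2+ failure. Passing a non-zero slice-header bit count (the α-29 change) makes the first branch win, and IDR pictures never enter the branch at all.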
### Fixes delivered
**α-25** (`src/context.c::RequestCreateContext`, commits `db0b7f9` + `d062fec`):
inject one S_EXT_CTRLS with a synthetic minimal HEVC_SPS or H264_SPS (chroma + bit_depth from profile) at CreateContext, BEFORE `cap_pool_init`. CAPTURE queue is empty at this point → `vb2_is_busy=false` → `rkvdec_s_ctrl` resets and updates `ctx->image_fmt` → from then on per-frame SPS submissions see `image_fmt_changed=false` → no reset → no -EBUSY → SPS commits correctly.
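The ordering dependency behind α-25 can be shown with a toy model. This is a sketch under stated assumptions, not kernel or backend code: `toy_ctx`, `toy_s_ctrl_sps`, the `IMG_FMT_*` values and both `run_*` helpers are hypothetical names, and "busy" is reduced to "at least one CAPTURE buffer allocated", as described for `vb2_is_busy` above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical image formats; IMG_FMT_ANY models the cached default. */
enum img_fmt { IMG_FMT_ANY = 0, IMG_FMT_420_8BIT, IMG_FMT_420_10BIT };

/* Toy decoder context: cached format, CAPTURE buffer count, SPS state. */
struct toy_ctx {
	enum img_fmt image_fmt;
	unsigned int capture_buffers;
	bool sps_committed;
};

/* Models the described s_ctrl behavior: a format change needs a CAPTURE
 * reset, and the reset is refused while any CAPTURE buffer exists. */
static int toy_s_ctrl_sps(struct toy_ctx *ctx, enum img_fmt sps_fmt)
{
	if (sps_fmt != ctx->image_fmt) {
		if (ctx->capture_buffers > 0)
			return -EBUSY;	/* SPS is never committed */
		ctx->image_fmt = sps_fmt;
	}
	ctx->sps_committed = true;
	return 0;
}

/* Pre-fix ordering: allocate the CAPTURE pool, then send the first SPS. */
static int run_without_priming(void)
{
	struct toy_ctx ctx = { IMG_FMT_ANY, 0, false };

	ctx.capture_buffers = 24;
	return toy_s_ctrl_sps(&ctx, IMG_FMT_420_8BIT);
}

/* α-25 ordering: synthetic SPS first, pool second, real SPS third. */
static int run_with_priming(void)
{
	struct toy_ctx ctx = { IMG_FMT_ANY, 0, false };
	int ret = toy_s_ctrl_sps(&ctx, IMG_FMT_420_8BIT);

	if (ret)
		return ret;
	ctx.capture_buffers = 24;
	return toy_s_ctrl_sps(&ctx, IMG_FMT_420_8BIT);
}
```

In this model `run_without_priming()` returns `-EBUSY` and `run_with_priming()` returns 0. The only difference between the two is whether the format was settled before the pool existed.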
**α-29** (`src/h265.c::h265_fill_slice_params`, commit `23eb1bd`):
```c
- slice_params->short_term_ref_pic_set_size = 0; /* VAAPI doesn't expose */
+ slice_params->short_term_ref_pic_set_size = picture->st_rps_bits;
```
### Remaining work
1. **HEVC frame 2+ divergence**: 40-byte slice_data buffer inflation between ffmpeg-vaapi vs ffmpeg-v4l2request. Need either (a) ffmpeg-vaapi-side investigation/patch to ensure consistent `size` parameter, (b) libva-backend bitstream parser to find HEVC rbsp_trailing_bits and trim. Deferred.
2. **MPEG-2 / VP8 multi-device probe**: libva backend's `find_codec_device` picks ONE device for the entire session. For RK3399 with both rkvdec (H.264/HEVC/VP9) and hantro (MPEG-2/VP8), the backend should multi-probe and aggregate profiles. Deferred.
1. **Kernel substrate cleanup** (item underway as iter32): 7.0-10 has diagnostic printks. Build clean 7.0-11.
2. **MPEG-2 / VP8 multi-device probe**: libva backend's `find_codec_device` picks ONE device for the entire session. For RK3399 with both rkvdec (H.264/HEVC/VP9) and hantro (MPEG-2/VP8), the backend should multi-probe and aggregate profiles. Orthogonal to Bug 4/5; design decision required from user.
3. **Backend env-gated diagnostics**: iter29 `LIBVA_HEVC_DUMP_SLICE_TAIL` and iter30 `LIBVA_TS_SCALE` are env-gated (no behavior change without env). Leave for future regression debugging or clean up at user's discretion. Low priority.
4. **α-26 dead-code**: cosmetic revert of `decode_params->short_term_ref_pic_set_size = picture->st_rps_bits` (mis-route; rkvdec ignores). Low priority.
### Commits
Campaign repo: `bf67900`, `02c4192`.
Campaign repo: `bf67900`, `02c4192`, `8b17bf7`, `c15fc6c`, `422ecaf`, `c1f9738`, `fde8a25`.
Backend fork: `db0b7f9`, `d062fec`, `66ef848`, `719d813`, `c9bfa21`, `754be1d`, `cd286d9` (final tip).
Backend fork: `db0b7f9`, `d062fec`, `66ef848`, `719d813`, `c9bfa21`, `754be1d`, `cd286d9`, `c555788` (reverted), `6646b16` (revert), `0eca3ff`, `68dbbdd`, `23eb1bd` (final tip).
Kernel substrate: `linux-fresnel-fourier 7.0-1` (clean baseline) was used; 7.0-2..7.0-9 added incremental diagnostic printks for iter12 RFC v2 + iter17..iter27 root-cause investigation. The diagnostic kernels are NOT shipping; should revert to clean 7.0-X for production once campaign exits diagnostic mode.
Kernel substrate: `linux-fresnel-fourier 7.0-1` was the clean baseline. 7.0-2..7.0-10 added incremental diagnostic printks for iter12 RFC v2 + iter17..iter31 root-cause investigation. Building clean 7.0-11 now (iter32). NOT shipping the diagnostic builds.
### Lesson
### Lessons
The wire-byte hypothesis arc (iter11..iter18) chased an empirical illusion: libva's ioctl payloads WERE byte-correct but the BUG was in the interaction between libva's CAPTURE-pool TIMING and rkvdec's lazy `image_fmt` determination. 6 kernel-printk iterations narrowed the failure to one function returning one error code. The fix is 90 LOC in libva. The kernel was correct all along.
1. **Wire-byte equivalence ≠ behavior equivalence.** Iter11..iter18's wire-byte hypotheses were chasing payload-content correctness, but the real bug was payload-RECEPTION at the kernel (image_fmt -EBUSY for Bug 4, st_rps_bits=0 for Bug 5 frame 2+). Both bugs are about field semantics that rkvdec READS, not bytes that libva WRITES.
The `[[feedback-libva-byte-correct-kernel-bug]]` memory entry was partially overturned: kernel-side -EBUSY semantics interact with libva-side allocation TIMING. Memory entry updated.
2. **Field name match is not field semantic match.** α-26 routed the right value (`picture->st_rps_bits`) into the wrong V4L2 field (`decode_params` vs `slice_params`). Both have the same field name. Different semantics. Reading the V4L2 spec docs + reading the kernel consumer's code (rkvdec's `assemble_sw_rps`) was what surfaced the mis-route.
3. **Theory iteration ROI**: in this campaign, iter27/28/29/30 spent significant effort on hypotheses (40-byte inflation, timestamp magnitude) that turned out wrong. iter31 reverse-traced from the kernel-side consumer (`assemble_sw_rps`) backward to the libva-side producer — finding the bug in one iteration. **The forward-from-libva theory loop was less efficient than the backward-from-kernel-consumer approach.** [[trace-fix-mechanism-to-consumer]] is a memory entry that captures this.
4. **The memory entry `[[libva-byte-correct-kernel-bug]]` was fully overturned**: both Bug 4 and Bug 5 are libva-side fixes, despite the empirical bytewise correctness of libva's OUTPUT buffers. The kernel was correct all along given the inputs it actually received. The libva inputs differed from kdirect's in subtle ways (image_fmt reset timing, slice_params field that rkvdec reads but libva zeroed).