## Campaign Session 2026-05-14 — Final Summary (post iter31)
### Starting state
- Bug 4 (H.264 keyframe-partial) and Bug 5 (HEVC libva all-zero CAPTURE) unresolved.
- iter11 through iter18 had eliminated 8 wire-byte hypotheses without finding the root cause.
- iter17 (start of this campaign day) added kernel printk in `rkvdec_hevc_run` and found rkvdec sees all-zero SPS contents for libva.
### Iterations completed
| Iter | Type | Output |
|---|---|---|
| 17 | kernel printk in rkvdec_hevc_run | rkvdec sees w=0 h=0 for libva, w=1280 h=720 for kdirect |
| 18 | α-21/22 mechanism eliminations | Mechanisms 3 (stale stack), 5 (error_idx) DISPROVED |
| 19 | α-23 REINIT test | Mechanism 2 (REINIT clears) DISPROVED |
| 20 | kernel printk for ctrl_hdl pointer + ctrl bytes | ctrl_hdl pointers stable; SPS bytes all-zero for libva |
| 21 | kernel printk in v4l2_ctrl_request_setup loop | HEVC_SPS has p_req_valid=1; loop exits after SPS |
| 22 | kernel printk in v4l2_ctrl_request_clone | Clone IS complete (22 controls cloned with err=0) |
| 23 | kernel printk for skip-reason | Loop EXITS at HEVC_SPS, doesn't skip |
| 24 | kernel printk for req_to_new + try_or_set_cluster returns | `try_or_set_cluster ret=-16` (-EBUSY) for HEVC_SPS |
| | | **ROOT CAUSE: rkvdec_s_ctrl returns -EBUSY when SPS triggers image_fmt reset on busy CAPTURE queue** |
| 25 | **α-25 synthetic SPS injection in libva** | **H.264 fully fixed (10F byte-equal to SW)**; **HEVC frame 1 fixed (byte-equal to SW)** |
| 26 | α-26 decode_params.short_term_ref_pic_set_size from VAAPI | Wire-correct field name match — but rkvdec ignores this field (mis-route) |
| 27 | α-27 num_entry_point_offsets from VAAPI | No-op (VAAPI returns 0; rkvdec doesn't use) |
| 28 | α-28 bit_size = (slice_data_size - data_byte_offset) * 8 | No-op (rkvdec doesn't use bit_size) |
| 28b | env-gated trim=40 (DIAG) | Refuted "universal 40-byte trim" theory; reverted |
| 29 | env-gated dump of HEVC slice_data trailing 80 bytes | Refuted "40-byte inflation" theory — trailing bytes are real entropy |
| 30 | env-gated `LIBVA_TS_SCALE` timestamp multiplier | Refuted "timestamp magnitude" theory; same wrong output for scales 1/1k/1M |
| 31 | extend kernel printk to dpb[2..3] + sl[32..64]; trace rkvdec assemble_sw_rps | Located `sl_params->short_term_ref_pic_set_size` consumer in `rkvdec_hevc.c:386-389` |
| | | **α-29: slice_params.short_term_ref_pic_set_size = picture->st_rps_bits** |
| | | **Bug 5 FULLY FIXED — HEVC 10F byte-equal to SW** |
### Final 5-codec state
| Codec | Status | Notes |
|---|---|---|
| H.264 | **PASS** (10F byte-equal SW) | Bug 4 fixed iter25 α-25 |
| HEVC | **PASS** (10F byte-equal SW) | Bug 5 frame 1 fixed iter25 α-25; frames 2+ fixed iter31 α-29 |
| VP9 | **PASS** (10F byte-equal SW) | Unchanged through both fixes (no regression) |
| MPEG-2 | untestable on this kernel boot | Pre-existing libva single-device profile-probe limitation |
| VP8 | untestable on this kernel boot | Same |
### Root causes discovered
**Bug 4 + Bug 5 frame 1** (`rkvdec_s_ctrl` -EBUSY):
On the first HEVC_SPS / H264_SPS, `rkvdec_s_ctrl` resolves `image_fmt` via `get_image_fmt()`. If the result differs from the cached `ctx->image_fmt` (default `RKVDEC_IMG_FMT_ANY`), it tries to reset the CAPTURE format. The reset is blocked whenever `vb2_is_busy` returns true, which it does as soon as any CAPTURE buffer is allocated. libva pre-allocates 24 CAPTURE buffers at CreateContext (iter5b-β design), BEFORE the first per-frame S_EXT_CTRLS, so:
- First per-frame SPS staged via `try_or_set_cluster` → `s_ctrl` → `rkvdec_s_ctrl` returns -EBUSY.
- `v4l2_ctrl_request_setup` outer loop breaks → SPS never committed to `ctx->ctrl_hdl`.
- `rkvdec_hevc_run_preamble` reads `ctx->ctrl_hdl[SPS]->p_cur` which is zero.
- Hardware sees w=0 h=0 → all-zero CAPTURE.
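The break-on-first-error behaviour in the chain above can be illustrated with a small stand-alone model. This is a sketch, not kernel code: `model_ctrl` and `model_request_setup` are hypothetical names, and the result of each driver `s_ctrl` call is simply stored in the struct instead of being computed.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one control staged in a request. */
struct model_ctrl {
    const char *name;
    int set_result;  /* what the driver's s_ctrl would return: 0 or -errno */
    bool committed;  /* true once the value reached the "current" state */
};

/*
 * Simplified model of the apply loop: walk the staged controls in order
 * and stop at the first error. The failing control and everything after
 * it are never committed.
 */
int model_request_setup(struct model_ctrl *ctrls, size_t n)
{
    int ret = 0;

    for (size_t i = 0; i < n; i++) {
        ret = ctrls[i].set_result;
        if (ret)
            break;
        ctrls[i].committed = true;
    }
    return ret;
}
```

In this model, a control list whose SPS entry returns `-EBUSY` leaves the SPS uncommitted, so a later read of the "current" value sees whatever was there before (all zeros on a fresh context).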
**Bug 5 frame 2+** (`sl_params->short_term_ref_pic_set_size` zero):
`rkvdec_hevc.c::assemble_sw_rps` lines 386-389:
```c
if (!(decode_params->flags & V4L2_HEVC_DECODE_PARAM_FLAG_IDR_PIC)) {
	if (sl_params->short_term_ref_pic_set_size)
		st_bit_offset = sl_params->short_term_ref_pic_set_size;
	else if (sps->num_short_term_ref_pic_sets > 1)
		st_bit_offset = fls(sps->num_short_term_ref_pic_sets - 1);
}
```
libva set `slice_params->short_term_ref_pic_set_size = 0` (stale "VAAPI doesn't expose" comment). For BBB, `num_short_term_ref_pic_sets == 1`, so the fallback cannot help: its `> 1` guard is false, and `fls(0)` would be 0 in any case. The HW therefore reads slice-header bits starting at offset 0, consumes the st_ref_pic_set() bits as long-term-RPS / slice-header continuation, and the entropy decoder starts from a garbage state on every non-IDR slice. IDR slices are excluded by the `!IDR_PIC` check, so frame 1 is unaffected, which is consistent with the iter25 "frame 1 PASS, frame 2+ FAIL" observation.
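The offset selection can be checked by hand with a user-space model. This sketch assumes `st_bit_offset` starts at 0 (the excerpt does not show its initialisation). `model_fls` mimics the kernel's `fls()` convention (1-based index of the highest set bit, 0 for input 0). All `model_*` names are illustrative only.

```c
/* Model of the kernel's fls(): 1-based index of the most significant
 * set bit; returns 0 when x == 0. */
int model_fls(unsigned int x)
{
    int pos = 0;

    while (x) {
        pos++;
        x >>= 1;
    }
    return pos;
}

/*
 * Mirrors the branch structure quoted above.
 * is_idr            : non-zero for an IDR picture
 * slice_st_rps_size : slice_params short_term_ref_pic_set_size
 * num_st_rps        : sps num_short_term_ref_pic_sets
 * Assumption: the offset is 0 unless one of the branches assigns it.
 */
int model_st_bit_offset(int is_idr, unsigned int slice_st_rps_size,
                        unsigned int num_st_rps)
{
    int st_bit_offset = 0;

    if (!is_idr) {
        if (slice_st_rps_size)
            st_bit_offset = (int)slice_st_rps_size;
        else if (num_st_rps > 1)
            st_bit_offset = model_fls(num_st_rps - 1);
    }
    return st_bit_offset;
}
```

With a zeroed slice field and a single RPS in the SPS, `model_st_bit_offset(0, 0, 1)` yields 0, which is the failing case. Any non-zero slice-side bit count is passed through unchanged.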
The mis-route: α-26 routed `picture->st_rps_bits` into `decode_params->short_term_ref_pic_set_size` because the field names match. But V4L2 defines a field with this name in BOTH `decode_params` and `slice_params`, with DIFFERENT semantics:
- `decode_params` version: bit count of SPS-side st_ref_pic_set syntax (rkvdec doesn't read this).
- `slice_params` version: bit count of slice-header-side st_ref_pic_set syntax (rkvdec reads this).
VAAPI's `picture->st_rps_bits` per `/usr/include/va/va_dec_hevc.h:177-185` is documented as the slice-header bit count → belongs in slice_params.
### Fixes delivered
**α-25** (`src/context.c::RequestCreateContext`, commits `db0b7f9` + `d062fec`):
Inject one S_EXT_CTRLS with a synthetic minimal HEVC_SPS or H264_SPS (chroma + bit_depth from profile) at CreateContext, BEFORE `cap_pool_init`. The CAPTURE queue is empty at this point → `vb2_is_busy=false` → `rkvdec_s_ctrl` resets and updates `ctx->image_fmt` → from then on per-frame SPS submissions see `image_fmt_changed=false` → no reset → no -EBUSY → SPS commits correctly.
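Why the ordering matters can be shown with a toy model of the cached-format check. This is a sketch under assumptions: `model_ctx`, `model_s_ctrl_sps` and the two scenario helpers are hypothetical, the image format is reduced to a plain integer, and "busy" is reduced to "at least one CAPTURE buffer allocated".

```c
#include <errno.h>

#define MODEL_IMG_FMT_ANY 0  /* stands in for RKVDEC_IMG_FMT_ANY */

struct model_ctx {
    int image_fmt;                /* cached format, starts at ANY */
    unsigned int capture_buffers; /* > 0 models vb2_is_busy() == true */
};

/* Model of the SPS s_ctrl path: a format change needs an idle queue. */
int model_s_ctrl_sps(struct model_ctx *ctx, int sps_image_fmt)
{
    if (ctx->image_fmt == sps_image_fmt)
        return 0;               /* no change, no reset needed */
    if (ctx->capture_buffers > 0)
        return -EBUSY;          /* reset refused on a busy queue */
    ctx->image_fmt = sps_image_fmt;
    return 0;
}

/* Original order: allocate the pool first, then send the first SPS. */
int model_first_frame_unprimed(int stream_fmt)
{
    struct model_ctx ctx = { MODEL_IMG_FMT_ANY, 0 };

    ctx.capture_buffers = 24;
    return model_s_ctrl_sps(&ctx, stream_fmt);
}

/* Fixed order: synthetic SPS first, then the pool, then the real SPS. */
int model_first_frame_primed(int synthetic_fmt, int stream_fmt)
{
    struct model_ctx ctx = { MODEL_IMG_FMT_ANY, 0 };
    int ret = model_s_ctrl_sps(&ctx, synthetic_fmt);

    if (ret)
        return ret;
    ctx.capture_buffers = 24;
    return model_s_ctrl_sps(&ctx, stream_fmt);
}
```

In this model the primed path succeeds only when the synthetic SPS resolves to the same format as the stream's real SPS, which is why the synthetic SPS carries the chroma and bit depth implied by the profile.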
**α-29** (`src/h265.c::h265_fill_slice_params`, commit `23eb1bd`):
```diff
- slice_params->short_term_ref_pic_set_size = 0; /* VAAPI doesn't expose */
+ slice_params->short_term_ref_pic_set_size = picture->st_rps_bits;
```
### Remaining work
1. **Kernel substrate cleanup** (item underway as iter32): 7.0-10 has diagnostic printks. Build clean 7.0-11.
2. **MPEG-2 / VP8 multi-device probe**: libva backend's `find_codec_device` picks ONE device for the entire session. For RK3399 with both rkvdec (H.264/HEVC/VP9) and hantro (MPEG-2/VP8), the backend should multi-probe and aggregate profiles. Orthogonal to Bug 4/5; design decision required from user.
3. **Backend env-gated diagnostics**: iter29 `LIBVA_HEVC_DUMP_SLICE_TAIL` and iter30 `LIBVA_TS_SCALE` are env-gated (no behavior change without env). Leave for future regression debugging or clean up at user's discretion. Low priority.
4. **α-26 dead-code**: cosmetic revert of `decode_params->short_term_ref_pic_set_size = picture->st_rps_bits` (mis-route; rkvdec ignores). Low priority.
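For item 3, the "no behavior change without env" property can be sketched as follows. This is an illustration, not the backend's actual code: only the variable name `LIBVA_TS_SCALE` comes from this campaign. The decimal-integer format and the `model_*` helpers are assumptions.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a scale string. Anything that is not a plain positive decimal
 * integer falls back to 1, i.e. the identity multiplier.
 */
uint64_t model_parse_scale(const char *s)
{
    char *end = NULL;
    unsigned long long v;

    if (!s || s[0] < '0' || s[0] > '9')
        return 1;
    errno = 0;
    v = strtoull(s, &end, 10);
    if (errno || *end != '\0' || v == 0)
        return 1;
    return (uint64_t)v;
}

/* Env gate: with the variable unset, timestamps pass through unscaled. */
uint64_t model_ts_scale_from_env(void)
{
    return model_parse_scale(getenv("LIBVA_TS_SCALE"));
}
```

Falling back to 1 on unset or malformed input keeps the diagnostic inert in normal runs, which is the property that makes leaving it in place low-risk.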
### Commits
Campaign repo: `bf67900`, `02c4192`, `8b17bf7`, `c15fc6c`, `422ecaf`, `c1f9738`, `fde8a25`.
Backend fork: `db0b7f9`, `d062fec`, `66ef848`, `719d813`, `c9bfa21`, `754be1d`, `cd286d9`, `c555788` (reverted), `6646b16` (revert), `0eca3ff`, `68dbbdd`, `23eb1bd` (final tip).
Kernel substrate: `linux-fresnel-fourier 7.0-1` was the clean baseline. 7.0-2..7.0-10 added incremental diagnostic printks for iter12 RFC v2 + the iter17 through iter31 root-cause investigation. Building clean 7.0-11 now (iter32). NOT shipping the diagnostic builds.
### Lessons
1. **Wire-byte equivalence ≠ behavior equivalence.** The iter11 through iter18 wire-byte hypotheses chased payload-content correctness, but the real bug was payload-RECEPTION at the kernel (image_fmt -EBUSY for Bug 4 and Bug 5 frame 1, st_rps_bits=0 for Bug 5 frame 2+). Both bugs are about field semantics that rkvdec READS, not bytes that libva WRITES.
2. **Field name match is not field semantic match.** α-26 routed the right value (`picture->st_rps_bits`) into the wrong V4L2 field (`decode_params` vs `slice_params`). Both have the same field name. Different semantics. Reading the V4L2 spec docs + reading the kernel consumer's code (rkvdec's `assemble_sw_rps`) was what surfaced the mis-route.
3. **Theory iteration ROI**: in this campaign, iter27/28/29/30 spent significant effort on hypotheses (40-byte inflation, timestamp magnitude) that turned out wrong. iter31 reverse-traced from the kernel-side consumer (`assemble_sw_rps`) backward to the libva-side producer — finding the bug in one iteration. **The forward-from-libva theory loop was less efficient than the backward-from-kernel-consumer approach.** [[trace-fix-mechanism-to-consumer]] is a memory entry that captures this.
4. **The memory entry `[[libva-byte-correct-kernel-bug]]` was fully overturned**: both Bug 4 and Bug 5 are libva-side fixes, despite the empirical bytewise correctness of libva's OUTPUT buffers. The kernel was correct all along given the inputs it actually received. The libva inputs differed from kdirect's in subtle ways (image_fmt reset timing, slice_params field that rkvdec reads but libva zeroed).