## Campaign Session 2026-05-14: Final Summary

### Starting state

- Bug 4 (H.264 keyframe-partial) and Bug 5 (HEVC libva all-zero CAPTURE) unresolved.
- iter11–iter18 had eliminated 8 wire-byte hypotheses without finding the root cause.
- iter17 (this session) added kernel printk in `rkvdec_hevc_run` and found rkvdec sees all-zero SPS contents for libva.

### Iterations completed

| Iter | Type | Output |
|---|---|---|
| 17 | kernel printk in rkvdec_hevc_run | rkvdec sees w=0 h=0 for libva, w=1280 h=720 for kdirect |
| 18 | α-21/22 mechanism eliminations | Mechanisms 3 (stale stack), 5 (error_idx) DISPROVED |
| 19 | α-23 REINIT test | Mechanism 2 (REINIT clears) DISPROVED |
| 20 | kernel printk for ctrl_hdl pointer + ctrl bytes | ctrl_hdl pointers stable; SPS bytes all-zero for libva |
| 21 | kernel printk in v4l2_ctrl_request_setup loop | HEVC_SPS has p_req_valid=1; loop exits after SPS |
| 22 | kernel printk in v4l2_ctrl_request_clone | Clone IS complete (22 controls cloned with err=0) |
| 23 | kernel printk for skip-reason | Loop EXITS at HEVC_SPS, doesn't skip |
| 24 | kernel printk for req_to_new + try_or_set_cluster returns | `try_or_set_cluster ret=-16` (-EBUSY) for HEVC_SPS |
| | | **ROOT CAUSE: rkvdec_s_ctrl returns -EBUSY when SPS triggers image_fmt reset on busy CAPTURE queue** |
| 25 | **α-25 synthetic SPS injection in libva** | **H.264 fully fixed (10F byte-equal to SW)**; **HEVC frame 1 fixed (byte-equal to SW)** |
| 26 | α-26 decode_params.short_term_ref_pic_set_size from VAAPI | Wire-correct; rkvdec doesn't use field |
| 27 | α-27 num_entry_point_offsets from VAAPI | No-op (VAAPI returns 0; rkvdec doesn't use) |
| 28 | α-28 bit_size = (slice_data_size - data_byte_offset) * 8 | No-op (rkvdec doesn't use bit_size) |

### Final 5-codec state

| Codec | Status | Notes |
|---|---|---|
| H.264 | **PASS** (byte-equal SW, 10 frames) | Bug 4 fixed |
| HEVC | **frame 1 PASS** (byte-equal SW); frames 2+ DIVERGE | Bug 5 partial; frame 2+ rooted in ffmpeg-vaapi slice_data buffer 40-byte inflation vs ffmpeg-v4l2request; deferred |
| VP9 | **PASS** (byte-equal SW) | Unchanged |
| MPEG-2 | untestable on this kernel boot | Pre-existing libva single-device profile-probe limitation |
| VP8 | untestable on this kernel boot | Same |

### Root cause discovered

`rkvdec_s_ctrl` on first HEVC_SPS / H264_SPS resolves `image_fmt` via `get_image_fmt()` and, if it differs from cached `ctx->image_fmt` (default `RKVDEC_IMG_FMT_ANY`), tries to reset the CAPTURE format. Reset blocked by `vb2_is_busy` (any CAPTURE buffer allocated → returns true). libva pre-allocates 24 CAPTURE buffers at CreateContext (iter5b-β design) BEFORE the first per-frame S_EXT_CTRLS, so:

- First per-frame SPS staged via `try_or_set_cluster` → `s_ctrl` → `rkvdec_s_ctrl` returns -EBUSY.
- `v4l2_ctrl_request_setup` outer loop breaks → SPS never committed to `ctx->ctrl_hdl`.
- `rkvdec_hevc_run_preamble` reads `ctx->ctrl_hdl[SPS]->p_cur` which is zero.
- Hardware sees w=0 h=0 → all-zero CAPTURE.

### Fix delivered

`src/context.c::RequestCreateContext` (α-25 commit `db0b7f9`, fixed `d062fec`): inject one S_EXT_CTRLS with a synthetic minimal HEVC_SPS or H264_SPS (chroma + bit_depth from profile) at CreateContext, BEFORE `cap_pool_init`. CAPTURE queue is empty at this point → `vb2_is_busy=false` → `rkvdec_s_ctrl` resets and updates `ctx->image_fmt` → from then on per-frame SPS submissions see `image_fmt_changed=false` → no reset → no -EBUSY → SPS commits correctly.

### Remaining work

1. **HEVC frame 2+ divergence**: 40-byte slice_data buffer inflation between ffmpeg-vaapi vs ffmpeg-v4l2request. Need either (a) ffmpeg-vaapi-side investigation/patch to ensure consistent `size` parameter, or (b) libva-backend bitstream parser to find HEVC rbsp_trailing_bits and trim. Deferred.
2. **MPEG-2 / VP8 multi-device probe**: libva backend's `find_codec_device` picks ONE device for the entire session. For RK3399 with both rkvdec (H.264/HEVC/VP9) and hantro (MPEG-2/VP8), the backend should multi-probe and aggregate profiles. Deferred.

### Commits

- Campaign repo: `bf67900`, `02c4192`.
- Backend fork: `db0b7f9`, `d062fec`, `66ef848`, `719d813`, `c9bfa21`, `754be1d`, `cd286d9` (final tip).

Kernel substrate: `linux-fresnel-fourier 7.0-1` (clean baseline) was used; 7.0-2..7.0-9 added incremental diagnostic printks for iter12 RFC v2 + iter17–iter27 root-cause investigation. The diagnostic kernels are NOT shipping; should revert to clean 7.0-X for production once campaign exits diagnostic mode.

### Lesson

The wire-byte hypothesis arc (iter11–iter18) chased an empirical illusion: libva's ioctl payloads WERE byte-correct but the BUG was in the interaction between libva's CAPTURE-pool TIMING and rkvdec's lazy `image_fmt` determination. 6 kernel-printk iterations narrowed the failure to one function returning one error code. The fix is 90 LOC in libva. The kernel was correct all along.

The `[[feedback-libva-byte-correct-kernel-bug]]` memory entry was partially overturned: kernel-side -EBUSY semantics interact with libva-side allocation TIMING. Memory entry updated.
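
The ordering interaction from "Root cause discovered" and "Fix delivered" can be illustrated with a small userspace model. This is only a sketch: every type and function below (`model_ctx`, `model_s_ctrl_sps`, `model_alloc_capture`, and so on) is hypothetical and stands in for the kernel and libva behavior described above; none of it is the actual rkvdec or backend code. It also assumes the synthetic SPS resolves to the same image format as the stream's real SPS, which is what makes later per-frame submissions see no format change.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model types; these do not exist in the kernel or in libva. */
enum model_image_fmt {
    MODEL_IMG_FMT_ANY = 0,     /* stands in for RKVDEC_IMG_FMT_ANY */
    MODEL_IMG_FMT_420_8BIT     /* format resolved from an SPS */
};

struct model_ctx {
    enum model_image_fmt image_fmt; /* cached format, starts as ANY */
    int capture_buffers;            /* > 0 models vb2_is_busy() == true */
    bool sps_committed;             /* set once an SPS is accepted */
};

static void model_ctx_init(struct model_ctx *ctx)
{
    ctx->image_fmt = MODEL_IMG_FMT_ANY;
    ctx->capture_buffers = 0;
    ctx->sps_committed = false;
}

/* Models CAPTURE pool allocation (the 24 buffers at CreateContext). */
static void model_alloc_capture(struct model_ctx *ctx, int count)
{
    ctx->capture_buffers += count;
}

/*
 * Models the s_ctrl decision: a format change needs a CAPTURE reset,
 * and the reset is refused while any CAPTURE buffer is allocated.
 */
static int model_s_ctrl_sps(struct model_ctx *ctx, enum model_image_fmt fmt)
{
    if (fmt != ctx->image_fmt) {
        if (ctx->capture_buffers > 0)
            return -EBUSY;          /* SPS is never committed */
        ctx->image_fmt = fmt;       /* reset allowed on an idle queue */
    }
    ctx->sps_committed = true;
    return 0;
}

/* Original ordering: allocate the pool, then send the first per-frame SPS. */
int model_original_order(void)
{
    struct model_ctx ctx;

    model_ctx_init(&ctx);
    model_alloc_capture(&ctx, 24);
    return model_s_ctrl_sps(&ctx, MODEL_IMG_FMT_420_8BIT);
}

/* Fixed ordering: synthetic SPS first, then the pool, then per-frame SPS. */
int model_fixed_order(void)
{
    struct model_ctx ctx;
    int ret;

    model_ctx_init(&ctx);
    ret = model_s_ctrl_sps(&ctx, MODEL_IMG_FMT_420_8BIT); /* synthetic SPS */
    if (ret)
        return ret;
    model_alloc_capture(&ctx, 24);
    return model_s_ctrl_sps(&ctx, MODEL_IMG_FMT_420_8BIT); /* per-frame SPS */
}
```

In this model `model_original_order()` returns `-EBUSY` and `model_fixed_order()` returns 0, mirroring the before and after behavior reported in iter24 and iter25. The model deliberately ignores everything else in the request path (cloning, cluster handling, the request setup loop), since the summary attributes the failure to this one decision.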