Campaign Session 2026-05-14 — Final Summary (post iter31)

Starting state

  • Bug 4 (H.264 keyframe-partial) and Bug 5 (HEVC libva all-zero CAPTURE) unresolved.
  • iter11 through iter18 had eliminated 8 wire-byte hypotheses without finding the root cause.
  • iter17 (start of this campaign day) added kernel printk in rkvdec_hevc_run and found rkvdec sees all-zero SPS contents for libva.

Iterations completed

| Iter | Type | Output |
|------|------|--------|
| 17 | kernel printk in rkvdec_hevc_run | rkvdec sees w=0 h=0 for libva, w=1280 h=720 for kdirect |
| 18 | α-21/22 mechanism eliminations | Mechanisms 3 (stale stack), 5 (error_idx) DISPROVED |
| 19 | α-23 REINIT test | Mechanism 2 (REINIT clears) DISPROVED |
| 20 | kernel printk for ctrl_hdl pointer + ctrl bytes | ctrl_hdl pointers stable; SPS bytes all-zero for libva |
| 21 | kernel printk in v4l2_ctrl_request_setup loop | HEVC_SPS has p_req_valid=1; loop exits after SPS |
| 22 | kernel printk in v4l2_ctrl_request_clone | Clone IS complete (22 controls cloned with err=0) |
| 23 | kernel printk for skip-reason | Loop EXITS at HEVC_SPS, doesn't skip |
| 24 | kernel printk for req_to_new + try_or_set_cluster returns | try_or_set_cluster ret=-16 (-EBUSY) for HEVC_SPS. ROOT CAUSE: rkvdec_s_ctrl returns -EBUSY when SPS triggers image_fmt reset on busy CAPTURE queue |
| 25 | α-25 synthetic SPS injection in libva | H.264 fully fixed (10F byte-equal to SW); HEVC frame 1 fixed (byte-equal to SW) |
| 26 | α-26 decode_params.short_term_ref_pic_set_size from VAAPI | Wire-correct field name match, but rkvdec ignores this field (mis-route) |
| 27 | α-27 num_entry_point_offsets from VAAPI | No-op (VAAPI returns 0; rkvdec doesn't use) |
| 28 | α-28 bit_size = (slice_data_size - data_byte_offset) * 8 | No-op (rkvdec doesn't use bit_size) |
| 28b | env-gated trim=40 (DIAG) | Refuted "universal 40-byte trim" theory; reverted |
| 29 | env-gated dump of HEVC slice_data trailing 80 bytes | Refuted "40-byte inflation" theory: trailing bytes are real entropy |
| 30 | env-gated LIBVA_TS_SCALE timestamp multiplier | Refuted "timestamp magnitude" theory; same wrong output for scales 1/1k/1M |
| 31 | extend kernel printk to dpb[2..3] + sl[32..64]; trace rkvdec assemble_sw_rps | Located sl_params->short_term_ref_pic_set_size consumer in rkvdec_hevc.c:386-389. α-29: slice_params.short_term_ref_pic_set_size = picture->st_rps_bits. Bug 5 FULLY FIXED: HEVC 10F byte-equal to SW |

Final 5-codec state

| Codec | Status | Notes |
|-------|--------|-------|
| H.264 | PASS (10F byte-equal SW) | Bug 4 fixed iter25 α-25 |
| HEVC | PASS (10F byte-equal SW) | Bug 5 frame 1 fixed iter25 α-25; frames 2+ fixed iter31 α-29 |
| VP9 | PASS (10F byte-equal SW) | Unchanged through both fixes (no regression) |
| MPEG-2 | untestable on this kernel boot | Pre-existing libva single-device profile-probe limitation |
| VP8 | untestable on this kernel boot | Same |

Root causes discovered

Bug 4 + Bug 5 frame 1 (rkvdec_s_ctrl -EBUSY):

On the first HEVC_SPS / H264_SPS, rkvdec_s_ctrl resolves image_fmt via get_image_fmt() and, if it differs from the cached ctx->image_fmt (default RKVDEC_IMG_FMT_ANY), tries to reset the CAPTURE format. The reset is blocked whenever vb2_is_busy returns true, which it does as soon as any CAPTURE buffer is allocated. libva pre-allocates 24 CAPTURE buffers at CreateContext (iter5b-β design), BEFORE the first per-frame S_EXT_CTRLS, so:

  • First per-frame SPS is staged via try_or_set_cluster → s_ctrl → rkvdec_s_ctrl, which returns -EBUSY.
  • v4l2_ctrl_request_setup outer loop breaks → SPS never committed to ctx->ctrl_hdl.
  • rkvdec_hevc_run_preamble reads ctx->ctrl_hdl[SPS]->p_cur which is zero.
  • Hardware sees w=0 h=0 → all-zero CAPTURE.
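
The break-on-first-error behaviour in the second bullet can be illustrated with a small userspace model. This is a minimal sketch, not kernel code: `struct model_ctrl`, `model_ok`, `model_busy` and `model_request_setup` are hypothetical names, and the control order is invented for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one control staged in a request. */
struct model_ctrl {
    const char *name;
    int (*s_ctrl)(void);  /* driver callback: 0 on success, -errno on failure */
    bool committed;       /* true once the staged value became the current value */
};

/* Callbacks modelling a driver that accepts or rejects a control. */
int model_ok(void)   { return 0; }
int model_busy(void) { return -EBUSY; }

/*
 * Apply the staged controls in order and stop at the first failure.
 * A control whose callback fails is not committed, and neither is
 * anything that comes after it in the list.
 */
int model_request_setup(struct model_ctrl *ctrls, size_t n)
{
    assert(ctrls != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        int ret = ctrls[i].s_ctrl();

        if (ret)
            return ret;            /* loop exits here; nothing is retried */
        ctrls[i].committed = true;
    }
    return 0;
}
```

With a list such as {ok, busy, ok}, the call returns -EBUSY and only the first entry ends up committed, which is the shape of the failure the iter21 to iter24 printks exposed: the driver later reads a current value that was never written.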

Bug 5 frame 2+ (sl_params->short_term_ref_pic_set_size zero):

rkvdec_hevc.c::assemble_sw_rps lines 386-389:

if (!(decode_params->flags & V4L2_HEVC_DECODE_PARAM_FLAG_IDR_PIC)) {
    if (sl_params->short_term_ref_pic_set_size)
        st_bit_offset = sl_params->short_term_ref_pic_set_size;
    else if (sps->num_short_term_ref_pic_sets > 1)
        st_bit_offset = fls(sps->num_short_term_ref_pic_sets - 1);
}

libva set slice_params->short_term_ref_pic_set_size = 0 (stale "VAAPI doesn't expose" comment). For BBB, num_short_term_ref_pic_sets == 1, so the fallback does not help: the > 1 guard is false (and fls(0) would be 0 in any case), leaving the offset at 0. HW reads slice-header bits starting at offset 0 → consumes st_ref_pic_set() bytes as long-term-RPS / slice-header continuation → entropy decoder ends up in a garbage state for every non-IDR slice. IDR slices are excluded by the !IDR_PIC check → frame 1 unaffected → consistent with the iter25 "frame 1 PASS, frame 2+ FAIL" observation.
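
The arithmetic of the branch quoted above can be checked in userspace. This is a sketch under stated assumptions: `model_fls` re-implements the kernel's fls() semantics (1-based index of the highest set bit, 0 for input 0), `model_st_bit_offset` takes plain arguments instead of the V4L2 structs, and the initial offset of 0 is assumed rather than taken from the driver source.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace equivalent of fls(): position of the highest set bit,
 * counting from 1; returns 0 when x is 0. */
int model_fls(uint32_t x)
{
    int pos = 0;

    while (x) {
        pos++;
        x >>= 1;
    }
    return pos;
}

/* Same branch structure as the quoted snippet, with struct fields
 * replaced by arguments. The starting value 0 is an assumption. */
unsigned int model_st_bit_offset(int is_idr,
                                 unsigned int slice_st_rps_size,
                                 unsigned int num_st_rps)
{
    unsigned int st_bit_offset = 0;

    if (!is_idr) {
        if (slice_st_rps_size)
            st_bit_offset = slice_st_rps_size;
        else if (num_st_rps > 1)
            st_bit_offset = (unsigned int)model_fls(num_st_rps - 1);
    }
    return st_bit_offset;
}

/* Usage example covering the cases discussed in the text. */
void model_st_bit_offset_demo(void)
{
    assert(model_st_bit_offset(0, 0, 1) == 0);    /* size zeroed, one RPS: offset 0 */
    assert(model_st_bit_offset(0, 0, 4) == 2);    /* fallback: fls(3) == 2 */
    assert(model_st_bit_offset(0, 37, 1) == 37);  /* slice-side size wins when set */
    assert(model_st_bit_offset(1, 37, 4) == 0);   /* IDR slices skip the branch */
}
```

The value 37 is an arbitrary placeholder, not a measurement from the BBB stream.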

The mis-direction: α-26 routed picture->st_rps_bits into decode_params->short_term_ref_pic_set_size based on field name. But V4L2 has the field in BOTH decode_params and slice_params with the same name but DIFFERENT semantics:

  • decode_params version: bit count of SPS-side st_ref_pic_set syntax (rkvdec doesn't read this).
  • slice_params version: bit count of slice-header-side st_ref_pic_set syntax (rkvdec reads this).

VAAPI's picture->st_rps_bits per /usr/include/va/va_dec_hevc.h:177-185 is documented as the slice-header bit count → belongs in slice_params.

Fixes delivered

α-25 (src/context.c::RequestCreateContext, commits db0b7f9 + d062fec): inject one S_EXT_CTRLS with a synthetic minimal HEVC_SPS or H264_SPS (chroma + bit_depth from profile) at CreateContext, BEFORE cap_pool_init. CAPTURE queue is empty at this point → vb2_is_busy=false → rkvdec_s_ctrl resets and updates ctx->image_fmt → from then on per-frame SPS submissions see image_fmt_changed=false → no reset → no -EBUSY → SPS commits correctly.
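
Why the ordering matters can be shown with a toy state machine. This is a simplified model, not the driver or the backend: every `model_*` and `MODEL_*` name is hypothetical, the busy check is reduced to "any CAPTURE buffer allocated", and the count of 24 buffers is taken from the description above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum model_img_fmt { MODEL_FMT_ANY, MODEL_FMT_420_8BIT, MODEL_FMT_420_10BIT };

struct model_ctx {
    enum model_img_fmt image_fmt;  /* cached format, starts as ANY */
    unsigned int capture_bufs;     /* number of allocated CAPTURE buffers */
};

/* Simplified SPS handler: a format change needs an idle CAPTURE queue. */
int model_s_ctrl_sps(struct model_ctx *ctx, enum model_img_fmt fmt)
{
    assert(ctx != NULL);

    if (fmt != ctx->image_fmt) {
        if (ctx->capture_bufs > 0)  /* stands in for the vb2_is_busy check */
            return -EBUSY;
        ctx->image_fmt = fmt;       /* format reset is allowed here */
    }
    return 0;
}

/* Session start. With prime == true the synthetic SPS is sent before the
 * buffer pool exists (the α-25 ordering); otherwise it is skipped. */
int model_create_context(struct model_ctx *ctx, enum model_img_fmt fmt, bool prime)
{
    int ret = 0;

    assert(ctx != NULL);
    ctx->image_fmt = MODEL_FMT_ANY;
    ctx->capture_bufs = 0;
    if (prime)
        ret = model_s_ctrl_sps(ctx, fmt);  /* queue still empty: succeeds */
    ctx->capture_bufs = 24;                /* pool allocated afterwards */
    return ret;
}
```

Without priming, the first per-frame `model_s_ctrl_sps` call returns -EBUSY; with priming it returns 0 because the cached format already matches. In this model the synthetic SPS has to resolve to the same format as the real stream, which is consistent with deriving chroma and bit depth from the profile.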

α-29 (src/h265.c::h265_fill_slice_params, commit 23eb1bd):

- slice_params->short_term_ref_pic_set_size = 0;  /* VAAPI doesn't expose */
+ slice_params->short_term_ref_pic_set_size = picture->st_rps_bits;

Remaining work

  1. Kernel substrate cleanup (item underway as iter32): 7.0-10 has diagnostic printks. Build clean 7.0-11.
  2. MPEG-2 / VP8 multi-device probe: libva backend's find_codec_device picks ONE device for the entire session. For RK3399 with both rkvdec (H.264/HEVC/VP9) and hantro (MPEG-2/VP8), the backend should multi-probe and aggregate profiles. Orthogonal to Bug 4/5; design decision required from user.
  3. Backend env-gated diagnostics: iter29 LIBVA_HEVC_DUMP_SLICE_TAIL and iter30 LIBVA_TS_SCALE are env-gated (no behavior change without env). Leave for future regression debugging or clean up at user's discretion. Low priority.
  4. α-26 dead-code: cosmetic revert of decode_params->short_term_ref_pic_set_size = picture->st_rps_bits (mis-route; rkvdec ignores). Low priority.
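
For item 3, the "no behavior change without env" property can be sketched as follows. This is an illustration of the gating pattern only: `diag_parse_scale` and `diag_env_scale` are hypothetical helpers, and how the backend actually parses LIBVA_TS_SCALE is not shown in these notes.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Parse a decimal multiplier. Anything missing, malformed, or zero
 * yields 1, i.e. the identity scale. */
uint64_t diag_parse_scale(const char *value)
{
    char *end;
    unsigned long long v;

    if (!value || value[0] < '0' || value[0] > '9')
        return 1;                  /* unset, empty, signed, or non-numeric */
    errno = 0;
    v = strtoull(value, &end, 10);
    if (errno || *end != '\0' || v == 0)
        return 1;                  /* overflow, trailing junk, or zero */
    return (uint64_t)v;
}

/* Read the multiplier from the environment variable `name`. */
uint64_t diag_env_scale(const char *name)
{
    assert(name != NULL);
    return diag_parse_scale(getenv(name));
}
```

A caller would multiply the timestamp by `diag_env_scale("LIBVA_TS_SCALE")`; with the variable unset the factor is 1 and the normal path is unchanged.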

Commits

Campaign repo: bf67900, 02c4192, 8b17bf7, c15fc6c, 422ecaf, c1f9738, fde8a25.

Backend fork: db0b7f9, d062fec, 66ef848, 719d813, c9bfa21, 754be1d, cd286d9, c555788 (reverted), 6646b16 (revert), 0eca3ff, 68dbbdd, 23eb1bd (final tip).

Kernel substrate: linux-fresnel-fourier 7.0-1 was the clean baseline. 7.0-2..7.0-10 added incremental diagnostic printks for iter12 RFC v2 + the iter17 through iter31 root-cause investigation. Building clean 7.0-11 now (iter32). NOT shipping the diagnostic builds.

Lessons

  1. Wire-byte equivalence ≠ behavior equivalence. The wire-byte hypotheses of iter11 through iter18 were chasing payload-content correctness, but the real bug was payload-RECEPTION at the kernel (image_fmt -EBUSY for Bug 4, st_rps_bits=0 for Bug 5 frame 2+). Both bugs are about field semantics that rkvdec READS, not bytes that libva WRITES.

  2. Field name match is not field semantic match. α-26 routed the right value (picture->st_rps_bits) into the wrong V4L2 field (decode_params vs slice_params). Both have the same field name. Different semantics. Reading the V4L2 spec docs + reading the kernel consumer's code (rkvdec's assemble_sw_rps) was what surfaced the mis-route.

  3. Theory iteration ROI: in this campaign, iter27/28/29/30 spent significant effort on hypotheses (40-byte inflation, timestamp magnitude) that turned out wrong. iter31 reverse-traced from the kernel-side consumer (assemble_sw_rps) backward to the libva-side producer — finding the bug in one iteration. The forward-from-libva theory loop was less efficient than the backward-from-kernel-consumer approach. trace-fix-mechanism-to-consumer is a memory entry that captures this.

  4. The memory entry [[libva-byte-correct-kernel-bug]] was fully overturned: both Bug 4 and Bug 5 are libva-side fixes, despite the empirical bytewise correctness of libva's OUTPUT buffers. The kernel was correct all along given the inputs it actually received. The libva inputs differed from kdirect's in subtle ways (image_fmt reset timing, slice_params field that rkvdec reads but libva zeroed).