fresnel-fourier/CAMPAIGN_SESSION_2026_05_14.md
marfrit 8b17bf797a Final session summary: H264 + VP9 + HEVC frame 1 byte-equal to SW
Bug 4 (H264 keyframe-partial): FIXED.
Bug 5 (HEVC libva all-zero): partial fix, frame 1 byte-equal.
Root cause: rkvdec_s_ctrl -EBUSY when first SPS triggers image_fmt
reset on busy CAPTURE queue (libva pre-allocates buffers at
CreateContext, kernel blocks the reset).
Fix: 90-LOC synthetic SPS injection in libva CreateContext before
cap_pool_init pre-seeds ctx->image_fmt.

Remaining: HEVC frame 2+ (ffmpeg-vaapi slice_data 40-byte inflation),
MPEG-2/VP8 (libva multi-device probe). Both deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 12:10:08 +00:00


Campaign Session 2026-05-14 — Final Summary

Starting state

  • Bug 4 (H.264 keyframe-partial) and Bug 5 (HEVC libva all-zero CAPTURE) unresolved.
  • iter11 through iter18 had eliminated 8 wire-byte hypotheses without finding the root cause.
  • iter17 (this session) added a kernel printk to rkvdec_hevc_run and found that rkvdec sees all-zero SPS contents when driven by libva.

Iterations completed

| Iter | Type | Output |
|---|---|---|
| 17 | kernel printk in rkvdec_hevc_run | rkvdec sees w=0 h=0 for libva, w=1280 h=720 for kdirect |
| 18 | α-21/22 mechanism eliminations | Mechanisms 3 (stale stack) and 5 (error_idx) DISPROVED |
| 19 | α-23 REINIT test | Mechanism 2 (REINIT clears) DISPROVED |
| 20 | kernel printk for ctrl_hdl pointer + ctrl bytes | ctrl_hdl pointers stable; SPS bytes all-zero for libva |
| 21 | kernel printk in v4l2_ctrl_request_setup loop | HEVC_SPS has p_req_valid=1; loop exits after SPS |
| 22 | kernel printk in v4l2_ctrl_request_clone | Clone IS complete (22 controls cloned with err=0) |
| 23 | kernel printk for skip reason | Loop EXITS at HEVC_SPS, doesn't skip |
| 24 | kernel printk for req_to_new + try_or_set_cluster return values | try_or_set_cluster returns ret=-16 (-EBUSY) for HEVC_SPS. ROOT CAUSE: rkvdec_s_ctrl returns -EBUSY when the SPS triggers an image_fmt reset on a busy CAPTURE queue |
| 25 | α-25 synthetic SPS injection in libva | H.264 fully fixed (10 frames byte-equal to SW); HEVC frame 1 fixed (byte-equal to SW) |
| 26 | α-26 decode_params.short_term_ref_pic_set_size from VAAPI | Wire-correct; rkvdec doesn't use the field |
| 27 | α-27 num_entry_point_offsets from VAAPI | No-op (VAAPI returns 0; rkvdec doesn't use it) |
| 28 | α-28 bit_size = (slice_data_size - data_byte_offset) * 8 | No-op (rkvdec doesn't use bit_size) |
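
Row 28's formula is simple arithmetic. It also shows what the 40-byte slice_data inflation would do to any consumer that reads bit_size. Below is a minimal sketch in C. slice_bit_size is a hypothetical helper name, not a function from the backend fork.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the iter28 formula:
 *   bit_size = (slice_data_size - data_byte_offset) * 8
 * slice_data_size:  bytes in the slice buffer handed over by VAAPI
 * data_byte_offset: bytes that precede the slice data in that buffer
 */
uint64_t slice_bit_size(size_t slice_data_size, size_t data_byte_offset)
{
    /* The offset must lie inside the buffer, otherwise the subtraction wraps. */
    assert(data_byte_offset <= slice_data_size);
    return (uint64_t)(slice_data_size - data_byte_offset) * 8u;
}
```

With slice_data_size = 1000 and data_byte_offset = 10, the result is 7920 bits. Inflating the buffer by 40 bytes adds exactly 320 bits. That is harmless here only because rkvdec does not read bit_size.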

Final 5-codec state

| Codec | Status | Notes |
|---|---|---|
| H.264 | PASS (byte-equal to SW, 10 frames) | Bug 4 fixed |
| HEVC | Frame 1 PASS (byte-equal to SW); frames 2+ DIVERGE | Bug 5 partial; frames 2+ traced to a 40-byte inflation of the ffmpeg-vaapi slice_data buffer vs ffmpeg-v4l2request; deferred |
| VP9 | PASS (byte-equal to SW) | Unchanged |
| MPEG-2 | Untestable on this kernel boot | Pre-existing libva single-device profile-probe limitation |
| VP8 | Untestable on this kernel boot | Same as MPEG-2 |

Root cause discovered

On the first HEVC_SPS / H264_SPS, rkvdec_s_ctrl resolves image_fmt via get_image_fmt(). If the result differs from the cached ctx->image_fmt (default RKVDEC_IMG_FMT_ANY), it tries to reset the CAPTURE format. The reset is blocked by vb2_is_busy(), which returns true as soon as any CAPTURE buffer is allocated. libva pre-allocates 24 CAPTURE buffers at CreateContext (iter5b-β design), BEFORE the first per-frame S_EXT_CTRLS, so:

  • The first per-frame SPS is staged via try_or_set_cluster → s_ctrl → rkvdec_s_ctrl, which returns -EBUSY.
  • The outer loop of v4l2_ctrl_request_setup breaks → the SPS is never committed to ctx->ctrl_hdl.
  • rkvdec_hevc_run_preamble reads ctx->ctrl_hdl[SPS]->p_cur, which is still zero.
  • The hardware sees w=0 h=0 → all-zero CAPTURE.

Fix delivered

src/context.c::RequestCreateContext (α-25 commit db0b7f9, fixed in d062fec): inject one S_EXT_CTRLS at CreateContext, BEFORE cap_pool_init, carrying a synthetic minimal HEVC_SPS or H264_SPS (chroma + bit_depth derived from the profile). The chain is then:

  • The CAPTURE queue is empty at this point → vb2_is_busy=false.
  • rkvdec_s_ctrl resets the format and updates ctx->image_fmt.
  • From then on, per-frame SPS submissions see image_fmt_changed=false → no reset → no -EBUSY → the SPS commits correctly.
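
The α-25 ordering can be sketched the same way. Everything here is hypothetical and simplified:

  • the profile enum stands in for the real VAProfile mapping;
  • the min_sps struct keeps only chroma_format_idc and bit_depth_luma_minus8;
  • the fmt_of() encoding stands in for the driver's get_image_fmt().

The point is only that seeding the format while the CAPTURE queue is empty removes the later -EBUSY.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical profile list; the real backend works from VAProfile values. */
enum profile { PROF_HEVC_MAIN, PROF_HEVC_MAIN10, PROF_H264_HIGH };

/* Only the SPS fields the format decision is assumed to depend on. */
struct min_sps {
    unsigned chroma_format_idc;     /* 1 = 4:2:0 */
    unsigned bit_depth_luma_minus8; /* 0 = 8-bit, 2 = 10-bit */
};

struct model_ctx {
    int image_fmt;            /* 0 = "ANY" (nothing cached yet) */
    unsigned capture_buffers; /* CAPTURE buffers currently allocated */
};

/* Build a synthetic minimal SPS from the profile alone. */
struct min_sps synthetic_sps(enum profile p)
{
    struct min_sps s = { .chroma_format_idc = 1, .bit_depth_luma_minus8 = 0 };
    if (p == PROF_HEVC_MAIN10)
        s.bit_depth_luma_minus8 = 2;
    return s;
}

/* Toy stand-in for get_image_fmt(): a nonzero id per (chroma, depth). */
static int fmt_of(struct min_sps s)
{
    return 1 + (int)(s.chroma_format_idc * 16u + s.bit_depth_luma_minus8);
}

/* Same -EBUSY rule as the root-cause description. */
static int s_ctrl(struct model_ctx *ctx, struct min_sps s)
{
    int fmt = fmt_of(s);
    if (fmt == ctx->image_fmt)
        return 0;                  /* unchanged: no reset, SPS commits */
    if (ctx->capture_buffers > 0)
        return -EBUSY;             /* reset refused on a busy queue */
    ctx->image_fmt = fmt;
    return 0;
}

/* iter25 ordering: seed image_fmt while CAPTURE is empty, then fill the pool. */
int model_create_context(struct model_ctx *ctx, enum profile p, unsigned pool)
{
    assert(ctx->capture_buffers == 0);
    int ret = s_ctrl(ctx, synthetic_sps(p)); /* empty queue: cannot -EBUSY */
    if (ret)
        return ret;
    ctx->capture_buffers = pool;             /* stand-in for cap_pool_init */
    return 0;
}

/* A per-frame SPS submission after CreateContext. */
int model_submit_sps(struct model_ctx *ctx, struct min_sps real)
{
    return s_ctrl(ctx, real);
}
```

In this model, a Main-profile context accepts its real 8-bit 4:2:0 SPS after the 24-buffer pool exists. A stream whose SPS disagrees with the profile-derived guess (for example a 10-bit SPS on a Main context) would still hit -EBUSY. So the synthetic SPS only helps when the profile predicts chroma and bit depth correctly.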

Remaining work

  1. HEVC frame 2+ divergence: the ffmpeg-vaapi slice_data buffer is 40 bytes larger than the ffmpeg-v4l2request one. Options: (a) investigate and patch the ffmpeg-vaapi side so it passes a consistent size parameter, or (b) add a bitstream parser to the libva backend that finds the HEVC rbsp_trailing_bits and trims the buffer. Deferred.

  2. MPEG-2 / VP8 multi-device probe: the libva backend's find_codec_device picks ONE device for the entire session. On RK3399, which has both rkvdec (H.264/HEVC/VP9) and hantro (MPEG-2/VP8), the backend should probe all devices and aggregate their profiles. Deferred.
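
Option (b) from item 1 could start from a trailing-bits scan like the one below. This is a hedged sketch, and trim_rbsp_trailing is a hypothetical name. It ASSUMES the 40 extra bytes are zero padding (or cabac_zero_words). It also assumes the buffer holds RBSP bytes, with no emulation-prevention 0x03 bytes in the trailing run. Neither assumption has been checked against the actual inflation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch of option (b): trim trailing zero bytes after
 * rbsp_trailing_bits. ASSUMES the extra bytes are zero padding or
 * cabac_zero_words and that no emulation-prevention 0x03 bytes sit in
 * the trailing run.
 *
 * Returns the trimmed length (through the byte holding rbsp_stop_one_bit),
 * or 0 if no stop bit exists. If payload_bits is non-NULL, it receives the
 * number of bits that precede the stop bit.
 */
size_t trim_rbsp_trailing(const uint8_t *buf, size_t len, uint64_t *payload_bits)
{
    assert(buf != NULL || len == 0);
    size_t end = len;
    while (end > 0 && buf[end - 1] == 0x00)
        end--;                       /* drop zero padding / cabac_zero_words */
    if (end == 0)
        return 0;                    /* all zero: no rbsp_stop_one_bit */

    /* The stop bit is the lowest set bit of the last non-zero byte. */
    uint8_t last = buf[end - 1];
    unsigned trailing_zero_bits = 0;
    while ((last & 1u) == 0) {
        last >>= 1;
        trailing_zero_bits++;
    }
    if (payload_bits)
        *payload_bits = (uint64_t)(end - 1) * 8u + (7u - trailing_zero_bits);
    return end;
}
```

Appending 40 zero bytes to a buffer that ends in its stop-bit byte leaves the returned length unchanged. The trimmed size could therefore replace the inflated one. If the inflation turns out to be non-zero data, this approach does not apply.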

Commits

Campaign repo: bf67900, 02c4192.

Backend fork: db0b7f9, d062fec, 66ef848, 719d813, c9bfa21, 754be1d, cd286d9 (final tip).

Kernel substrate: linux-fresnel-fourier 7.0-1 (clean baseline) was used. Releases 7.0-2..7.0-9 added incremental diagnostic printks for the iter12 RFC v2 work and the iter17 through iter27 root-cause investigation. The diagnostic kernels are NOT for shipping; production should revert to a clean 7.0-X once the campaign exits diagnostic mode.

Lesson

The wire-byte hypothesis arc (iter11 through iter18) chased an empirical illusion. libva's ioctl payloads WERE byte-correct, but the BUG was in the interaction between libva's CAPTURE-pool TIMING and rkvdec's lazy image_fmt determination. Six kernel-printk iterations narrowed the failure to one function returning one error code. The fix is 90 LOC in libva; the kernel was correct all along.

The [[feedback-libva-byte-correct-kernel-bug]] memory entry was partially overturned: kernel-side -EBUSY semantics interact with libva-side allocation TIMING. Memory entry updated.