Bug 4 (H264 keyframe-partial): FIXED. Bug 5 (HEVC libva all-zero): partial fix, frame 1 byte-equal. Root cause: rkvdec_s_ctrl -EBUSY when first SPS triggers image_fmt reset on busy CAPTURE queue (libva pre-allocates buffers at CreateContext, kernel blocks the reset). Fix: 90-LOC synthetic SPS injection in libva CreateContext before cap_pool_init pre-seeds ctx->image_fmt. Remaining: HEVC frame 2+ (ffmpeg-vaapi slice_data 40-byte inflation), MPEG-2/VP8 (libva multi-device probe). Both deferred. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Campaign Session 2026-05-14 — Final Summary
Starting state
- Bug 4 (H.264 keyframe-partial) and Bug 5 (HEVC libva all-zero CAPTURE) unresolved.
- iter11–iter18 had eliminated 8 wire-byte hypotheses without finding the root cause.
- iter17 (this session) added a kernel printk in rkvdec_hevc_run and found that rkvdec sees all-zero SPS contents for libva.
Iterations completed
| Iter | Type | Output |
|---|---|---|
| 17 | kernel printk in rkvdec_hevc_run | rkvdec sees w=0 h=0 for libva, w=1280 h=720 for kdirect |
| 18 | α-21/22 mechanism eliminations | Mechanisms 3 (stale stack), 5 (error_idx) DISPROVED |
| 19 | α-23 REINIT test | Mechanism 2 (REINIT clears) DISPROVED |
| 20 | kernel printk for ctrl_hdl pointer + ctrl bytes | ctrl_hdl pointers stable; SPS bytes all-zero for libva |
| 21 | kernel printk in v4l2_ctrl_request_setup loop | HEVC_SPS has p_req_valid=1; loop exits after SPS |
| 22 | kernel printk in v4l2_ctrl_request_clone | Clone IS complete (22 controls cloned with err=0) |
| 23 | kernel printk for skip-reason | Loop EXITS at HEVC_SPS, doesn't skip |
| 24 | kernel printk for req_to_new + try_or_set_cluster returns | try_or_set_cluster ret=-16 (-EBUSY) for HEVC_SPS |
| | ROOT CAUSE | rkvdec_s_ctrl returns -EBUSY when SPS triggers image_fmt reset on busy CAPTURE queue |
| 25 | α-25 synthetic SPS injection in libva | H.264 fully fixed (10F byte-equal to SW); HEVC frame 1 fixed (byte-equal to SW) |
| 26 | α-26 decode_params.short_term_ref_pic_set_size from VAAPI | Wire-correct; rkvdec doesn't use field |
| 27 | α-27 num_entry_point_offsets from VAAPI | No-op (VAAPI returns 0; rkvdec doesn't use) |
| 28 | α-28 bit_size = (slice_data_size - data_byte_offset) * 8 | No-op (rkvdec doesn't use bit_size) |
Final 5-codec state
| Codec | Status | Notes |
|---|---|---|
| H.264 | PASS (byte-equal SW, 10 frames) | Bug 4 fixed |
| HEVC | frame 1 PASS (byte-equal SW); frames 2+ DIVERGE | Bug 5 partial; frame 2+ rooted in ffmpeg-vaapi slice_data buffer 40-byte inflation vs ffmpeg-v4l2request — deferred |
| VP9 | PASS (byte-equal SW) | Unchanged |
| MPEG-2 | untestable on this kernel boot | Pre-existing libva single-device profile-probe limitation |
| VP8 | untestable on this kernel boot | Same |
Root cause discovered
On the first HEVC_SPS / H264_SPS, rkvdec_s_ctrl resolves image_fmt via get_image_fmt() and, if it differs from the cached ctx->image_fmt (default RKVDEC_IMG_FMT_ANY), tries to reset the CAPTURE format. The reset is blocked whenever vb2_is_busy returns true, which happens as soon as any CAPTURE buffer is allocated. libva pre-allocates 24 CAPTURE buffers at CreateContext (iter5b-β design) BEFORE the first per-frame S_EXT_CTRLS, so:
- First per-frame SPS staged via try_or_set_cluster → s_ctrl → rkvdec_s_ctrl returns -EBUSY.
- v4l2_ctrl_request_setup outer loop breaks → SPS never committed to ctx->ctrl_hdl.
- rkvdec_hevc_run_preamble reads ctx->ctrl_hdl[SPS]->p_cur, which is zero.
- Hardware sees w=0 h=0 → all-zero CAPTURE.
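The failure chain above can be sketched as a small user-space model. This is not the rkvdec or V4L2 control-framework source: the struct, the enum, and the names model_s_ctrl and model_request_setup are hypothetical stand-ins, and the capture_busy flag only approximates what vb2_is_busy reports for the CAPTURE queue.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Simplified model only; every name here is illustrative, not kernel code. */
enum img_fmt { IMG_FMT_ANY = 0, IMG_FMT_420_8BIT, IMG_FMT_420_10BIT };

struct sps { unsigned width, height, bit_depth; };

struct model_ctx {
    enum img_fmt image_fmt; /* cached format, starts as "any" */
    bool capture_busy;      /* stands in for vb2_is_busy() on CAPTURE */
    struct sps p_cur;       /* committed SPS the decoder reads at run time */
};

static enum img_fmt fmt_from_sps(const struct sps *s)
{
    return s->bit_depth == 10 ? IMG_FMT_420_10BIT : IMG_FMT_420_8BIT;
}

/* Models the s_ctrl step: a format change needs an idle CAPTURE queue. */
static int model_s_ctrl(struct model_ctx *ctx, const struct sps *s)
{
    assert(ctx && s);
    enum img_fmt fmt = fmt_from_sps(s);

    if (fmt == ctx->image_fmt)
        return 0;           /* no format change, nothing to reset */
    if (ctx->capture_busy)
        return -EBUSY;      /* reset refused while buffers are allocated */
    ctx->image_fmt = fmt;
    return 0;
}

/* Models request setup: the SPS is committed only if s_ctrl succeeds. */
static int model_request_setup(struct model_ctx *ctx, const struct sps *s)
{
    int ret = model_s_ctrl(ctx, s);

    if (ret)
        return ret;         /* p_cur is left untouched (all-zero) */
    ctx->p_cur = *s;
    return 0;
}
```

In this model, a context whose capture_busy is already true rejects the first real SPS and keeps a zeroed p_cur. If the same format is seeded while the queue is still idle, later submissions take the "no format change" path and commit normally.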
Fix delivered
src/context.c::RequestCreateContext (α-25 commit db0b7f9, fixed d062fec):
inject one S_EXT_CTRLS with a synthetic minimal HEVC_SPS or H264_SPS (chroma + bit_depth from profile) at CreateContext, BEFORE cap_pool_init. CAPTURE queue is empty at this point → vb2_is_busy=false → rkvdec_s_ctrl resets and updates ctx->image_fmt → from then on per-frame SPS submissions see image_fmt_changed=false → no reset → no -EBUSY → SPS commits correctly.
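As a rough illustration of the "chroma + bit_depth from profile" step, the sketch below derives the few SPS fields that drive the image-format decision. The enum sketch_profile, the struct synthetic_sps, and the function name are hypothetical and not taken from the backend. The field names mirror those in the V4L2 stateless SPS controls, and the mapping assumes 4:2:0 content with 10-bit depth only for HEVC Main10.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the VA profiles the backend would receive. */
enum sketch_profile {
    SKETCH_H264_HIGH,
    SKETCH_HEVC_MAIN,
    SKETCH_HEVC_MAIN10,
};

/* Only the fields that influence the image format; all others stay zero. */
struct synthetic_sps {
    uint8_t chroma_format_idc;       /* 1 means 4:2:0 */
    uint8_t bit_depth_luma_minus8;
    uint8_t bit_depth_chroma_minus8;
};

/* Assumption: the profile alone is enough to guess chroma and bit depth. */
static struct synthetic_sps synthetic_sps_for_profile(enum sketch_profile p)
{
    struct synthetic_sps sps = { .chroma_format_idc = 1 };

    assert(p == SKETCH_H264_HIGH || p == SKETCH_HEVC_MAIN ||
           p == SKETCH_HEVC_MAIN10);

    if (p == SKETCH_HEVC_MAIN10) {
        sps.bit_depth_luma_minus8 = 2;   /* 10-bit luma */
        sps.bit_depth_chroma_minus8 = 2; /* 10-bit chroma */
    }
    return sps;
}
```

The result would be copied into the real control payload and submitted once, before any CAPTURE buffer exists. A stream whose actual bit depth differs from the profile-based guess would still trigger a format change later, so this sketch only covers the common case.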
Remaining work
- HEVC frame 2+ divergence: 40-byte slice_data buffer inflation between ffmpeg-vaapi and ffmpeg-v4l2request. Need either (a) ffmpeg-vaapi-side investigation/patch to ensure a consistent size parameter, or (b) a libva-backend bitstream parser to find HEVC rbsp_trailing_bits and trim. Deferred.
- MPEG-2 / VP8 multi-device probe: libva backend's find_codec_device picks ONE device for the entire session. For RK3399 with both rkvdec (H.264/HEVC/VP9) and hantro (MPEG-2/VP8), the backend should multi-probe and aggregate profiles. Deferred.
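One possible shape for the deferred multi-device probe is sketched below, assuming each probed device reports a bitmask of codecs. The types and function names are hypothetical and do not reflect the backend's current find_codec_device. The codec split between rkvdec and hantro is the one stated above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical codec bits; a real backend would map these to VA profiles. */
enum codec_bit {
    CODEC_H264  = 1u << 0,
    CODEC_HEVC  = 1u << 1,
    CODEC_VP9   = 1u << 2,
    CODEC_MPEG2 = 1u << 3,
    CODEC_VP8   = 1u << 4,
};

struct probed_device {
    const char *name;   /* e.g. "rkvdec" or "hantro" */
    unsigned codecs;    /* OR of codec_bit values found while probing */
};

/* Union of all codecs across devices: what the profile query would report. */
static unsigned aggregate_codecs(const struct probed_device *devs, size_t n)
{
    unsigned all = 0;

    assert(devs || n == 0);
    for (size_t i = 0; i < n; i++)
        all |= devs[i].codecs;
    return all;
}

/* First device that handles the codec, or NULL when none does. */
static const struct probed_device *
device_for_codec(const struct probed_device *devs, size_t n, unsigned codec)
{
    assert(devs || n == 0);
    for (size_t i = 0; i < n; i++)
        if (devs[i].codecs & codec)
            return &devs[i];
    return NULL;
}
```

With this split, the device choice would move from session start to context creation, where the requested profile is known.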
Commits
Campaign repo: bf67900, 02c4192.
Backend fork: db0b7f9, d062fec, 66ef848, 719d813, c9bfa21, 754be1d, cd286d9 (final tip).
Kernel substrate: linux-fresnel-fourier 7.0-1 (clean baseline) was used; 7.0-2..7.0-9 added incremental diagnostic printks for iter12 RFC v2 + iter17–iter27 root-cause investigation. The diagnostic kernels are NOT shipping; production should revert to a clean 7.0-X once the campaign exits diagnostic mode.
Lesson
The wire-byte hypothesis arc (iter11–iter18) chased an empirical illusion: libva's ioctl payloads WERE byte-correct but the BUG was in the interaction between libva's CAPTURE-pool TIMING and rkvdec's lazy image_fmt determination. 6 kernel-printk iterations narrowed the failure to one function returning one error code. The fix is 90 LOC in libva. The kernel was correct all along.
The [[feedback-libva-byte-correct-kernel-bug]] memory entry was partially overturned: kernel-side -EBUSY semantics interact with libva-side allocation TIMING. Memory entry updated.