marfrit 8b17bf797a Final session summary: H264 + VP9 + HEVC frame 1 byte-equal to SW
Bug 4 (H264 keyframe-partial): FIXED.
Bug 5 (HEVC libva all-zero): partial fix, frame 1 byte-equal.
Root cause: rkvdec_s_ctrl -EBUSY when first SPS triggers image_fmt
reset on busy CAPTURE queue (libva pre-allocates buffers at
CreateContext, kernel blocks the reset).
Fix: 90-LOC synthetic SPS injection in libva CreateContext before
cap_pool_init pre-seeds ctx->image_fmt.

Remaining: HEVC frame 2+ (ffmpeg-vaapi slice_data 40-byte inflation),
MPEG-2/VP8 (libva multi-device probe). Both deferred.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 12:10:08 +00:00

## Campaign Session 2026-05-14 — Final Summary
### Starting state
- Bug 4 (H.264 keyframe-partial) and Bug 5 (HEVC libva all-zero CAPTURE) unresolved.
- iter11 through iter18 had eliminated 8 wire-byte hypotheses without finding the root cause.
- iter17 (this session) added kernel printk in `rkvdec_hevc_run` and found rkvdec sees all-zero SPS contents for libva.
### Iterations completed
| Iter | Type | Output |
|---|---|---|
| 17 | kernel printk in rkvdec_hevc_run | rkvdec sees w=0 h=0 for libva, w=1280 h=720 for kdirect |
| 18 | α-21/22 mechanism eliminations | Mechanisms 3 (stale stack), 5 (error_idx) DISPROVED |
| 19 | α-23 REINIT test | Mechanism 2 (REINIT clears) DISPROVED |
| 20 | kernel printk for ctrl_hdl pointer + ctrl bytes | ctrl_hdl pointers stable; SPS bytes all-zero for libva |
| 21 | kernel printk in v4l2_ctrl_request_setup loop | HEVC_SPS has p_req_valid=1; loop exits after SPS |
| 22 | kernel printk in v4l2_ctrl_request_clone | Clone IS complete (22 controls cloned with err=0) |
| 23 | kernel printk for skip-reason | Loop EXITS at HEVC_SPS, doesn't skip |
| 24 | kernel printk for req_to_new + try_or_set_cluster returns | `try_or_set_cluster ret=-16` (-EBUSY) for HEVC_SPS |
| | | **ROOT CAUSE: rkvdec_s_ctrl returns -EBUSY when SPS triggers image_fmt reset on busy CAPTURE queue** |
| 25 | **α-25 synthetic SPS injection in libva** | **H.264 fully fixed (10F byte-equal to SW)**; **HEVC frame 1 fixed (byte-equal to SW)** |
| 26 | α-26 decode_params.short_term_ref_pic_set_size from VAAPI | Wire-correct; rkvdec doesn't use field |
| 27 | α-27 num_entry_point_offsets from VAAPI | No-op (VAAPI returns 0; rkvdec doesn't use) |
| 28 | α-28 bit_size = (slice_data_size - data_byte_offset) * 8 | No-op (rkvdec doesn't use bit_size) |
### Final 5-codec state
| Codec | Status | Notes |
|---|---|---|
| H.264 | **PASS** (byte-equal SW, 10 frames) | Bug 4 fixed |
| HEVC | **frame 1 PASS** (byte-equal SW); frames 2+ DIVERGE | Bug 5 partial; frame 2+ rooted in ffmpeg-vaapi slice_data buffer 40-byte inflation vs ffmpeg-v4l2request — deferred |
| VP9 | **PASS** (byte-equal SW) | Unchanged |
| MPEG-2 | untestable on this kernel boot | Pre-existing libva single-device profile-probe limitation |
| VP8 | untestable on this kernel boot | Same |
### Root cause discovered
On the first HEVC_SPS / H264_SPS, `rkvdec_s_ctrl` resolves `image_fmt` via `get_image_fmt()` and, if it differs from the cached `ctx->image_fmt` (default `RKVDEC_IMG_FMT_ANY`), tries to reset the CAPTURE format. The reset is blocked by `vb2_is_busy`, which returns true as soon as any CAPTURE buffer is allocated. libva pre-allocates 24 CAPTURE buffers at CreateContext (iter5b-β design) BEFORE the first per-frame S_EXT_CTRLS, so:
- The first per-frame SPS is staged via `try_or_set_cluster` → `s_ctrl` → `rkvdec_s_ctrl`, which returns -EBUSY.
- `v4l2_ctrl_request_setup` outer loop breaks → SPS never committed to `ctx->ctrl_hdl`.
- `rkvdec_hevc_run_preamble` reads `ctx->ctrl_hdl[SPS]->p_cur` which is zero.
- Hardware sees w=0 h=0 → all-zero CAPTURE.
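
The sequence above can be reproduced in a small userspace model. This is a sketch only: every `toy_*` name is hypothetical, the format enum is a stand-in rather than the rkvdec definition, and the model keeps just the three pieces of state the failure depends on (cached format, CAPTURE buffer count, committed SPS).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the driver's cached image format (hypothetical names). */
enum toy_image_fmt { TOY_FMT_ANY = 0, TOY_FMT_420_8BIT, TOY_FMT_420_10BIT };

struct toy_sps {
    unsigned width, height;
    unsigned bit_depth;            /* 8 or 10 */
};

struct toy_ctx {
    enum toy_image_fmt image_fmt;  /* starts as ANY */
    unsigned capture_bufs;         /* buffers allocated on the CAPTURE queue */
    struct toy_sps cur_sps;        /* what the decode run would read */
};

void toy_ctx_init(struct toy_ctx *ctx)
{
    assert(ctx != NULL);
    ctx->image_fmt = TOY_FMT_ANY;
    ctx->capture_bufs = 0;
    ctx->cur_sps = (struct toy_sps){0};
}

/* Models vb2_is_busy(): any allocated buffer makes the queue busy. */
bool toy_queue_busy(const struct toy_ctx *ctx)
{
    return ctx->capture_bufs > 0;
}

enum toy_image_fmt toy_get_image_fmt(const struct toy_sps *sps)
{
    return sps->bit_depth > 8 ? TOY_FMT_420_10BIT : TOY_FMT_420_8BIT;
}

/* Models the s_ctrl step: a format change needs a reset, and the
 * reset is refused while the CAPTURE queue is busy. */
int toy_s_ctrl_sps(struct toy_ctx *ctx, const struct toy_sps *sps)
{
    enum toy_image_fmt fmt = toy_get_image_fmt(sps);

    if (fmt != ctx->image_fmt) {
        if (toy_queue_busy(ctx))
            return -EBUSY;
        ctx->image_fmt = fmt;
    }
    return 0;
}

/* Models request setup: the SPS is committed only if s_ctrl succeeds. */
int toy_request_setup(struct toy_ctx *ctx, const struct toy_sps *sps)
{
    int ret = toy_s_ctrl_sps(ctx, sps);

    if (ret)
        return ret;                /* cur_sps keeps its zeroed contents */
    ctx->cur_sps = *sps;
    return 0;
}
```

With `capture_bufs` set to 24 before the first `toy_request_setup()` call, the model returns -EBUSY and `cur_sps` stays at w=0 h=0. Calling it once while the queue is still empty caches the format, after which the same SPS commits even with buffers allocated.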
### Fix delivered
`src/context.c::RequestCreateContext` (α-25 commit `db0b7f9`, fixed `d062fec`):
inject one S_EXT_CTRLS with a synthetic minimal HEVC_SPS or H264_SPS (chroma + bit_depth from profile) at CreateContext, BEFORE `cap_pool_init`. The CAPTURE queue is empty at this point → `vb2_is_busy=false` → `rkvdec_s_ctrl` resets and updates `ctx->image_fmt` → from then on per-frame SPS submissions see `image_fmt_changed=false` → no reset → no -EBUSY → SPS commits correctly.
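
As an illustration of the "chroma + bit_depth from profile" step, the following sketch derives the format-determining SPS fields from a codec profile. It is not the code in `src/context.c`: the `toy_*` names and the profile enum are hypothetical, and mapping HEVC Main 10 to a 10-bit seed is an assumption (that profile also permits lower bit depths). The field names mirror the H.264/HEVC SPS syntax elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical profile tags; a real backend would switch on VAProfile values. */
enum toy_profile {
    TOY_PROFILE_H264_HIGH,
    TOY_PROFILE_HEVC_MAIN,
    TOY_PROFILE_HEVC_MAIN10,
};

/* Only the fields that decide the image format. */
struct toy_sps_seed {
    uint8_t chroma_format_idc;        /* 1 = 4:2:0 */
    uint8_t bit_depth_luma_minus8;
    uint8_t bit_depth_chroma_minus8;
};

/* Fills *seed for a known profile and returns 0; returns -1 otherwise. */
int toy_seed_from_profile(enum toy_profile profile, struct toy_sps_seed *seed)
{
    assert(seed != NULL);

    seed->chroma_format_idc = 1;      /* all three profiles are 4:2:0 */

    switch (profile) {
    case TOY_PROFILE_H264_HIGH:
    case TOY_PROFILE_HEVC_MAIN:
        seed->bit_depth_luma_minus8 = 0;    /* 8-bit */
        seed->bit_depth_chroma_minus8 = 0;
        return 0;
    case TOY_PROFILE_HEVC_MAIN10:
        /* Assumption: seed with the profile's maximum depth, 10-bit. */
        seed->bit_depth_luma_minus8 = 2;
        seed->bit_depth_chroma_minus8 = 2;
        return 0;
    default:
        return -1;
    }
}
```

The seed only has to make the driver settle on the same image format that the real stream will later request. If the first real SPS resolved to a different format, the per-frame path would hit the busy queue again.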
### Remaining work
1. **HEVC frame 2+ divergence**: 40-byte slice_data buffer inflation in ffmpeg-vaapi relative to ffmpeg-v4l2request. Needs either (a) an ffmpeg-vaapi-side investigation/patch to ensure a consistent `size` parameter, or (b) a libva-backend bitstream parser that finds the HEVC rbsp_trailing_bits and trims. Deferred.
2. **MPEG-2 / VP8 multi-device probe**: libva backend's `find_codec_device` picks ONE device for the entire session. For RK3399 with both rkvdec (H.264/HEVC/VP9) and hantro (MPEG-2/VP8), the backend should multi-probe and aggregate profiles. Deferred.
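
For item 2, one possible shape of the multi-probe is sketched below: visit every decoder device, OR the codec masks together, and remember which device owns each codec. The `toy_*` names, the bitmask representation, and the first-match policy are all assumptions for illustration; the sketch does not show how `find_codec_device` currently works.

```c
#include <assert.h>
#include <stddef.h>

enum toy_codec {
    TOY_H264, TOY_HEVC, TOY_VP9, TOY_MPEG2, TOY_VP8,
    TOY_CODEC_COUNT
};

struct toy_device {
    const char *name;
    unsigned codec_mask;              /* bit n set = supports codec n */
};

/* Records, per codec, the index of the first device that supports it
 * (-1 if none) and returns the union of all codec masks. */
unsigned toy_aggregate(const struct toy_device *devs, size_t ndevs,
                       int owner[TOY_CODEC_COUNT])
{
    unsigned all = 0;

    assert(devs != NULL || ndevs == 0);

    for (int c = 0; c < TOY_CODEC_COUNT; c++)
        owner[c] = -1;

    for (size_t d = 0; d < ndevs; d++) {
        for (int c = 0; c < TOY_CODEC_COUNT; c++) {
            if ((devs[d].codec_mask & (1u << c)) && owner[c] < 0)
                owner[c] = (int)d;
        }
        all |= devs[d].codec_mask;
    }
    return all;
}
```

On an RK3399-like layout with rkvdec advertising H.264/HEVC/VP9 and hantro advertising MPEG-2/VP8, the aggregated mask covers all five codecs and each context can be routed to the owning device at creation time.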
### Commits
Campaign repo: `bf67900`, `02c4192`.
Backend fork: `db0b7f9`, `d062fec`, `66ef848`, `719d813`, `c9bfa21`, `754be1d`, `cd286d9` (final tip).
Kernel substrate: `linux-fresnel-fourier 7.0-1` (clean baseline) was used; 7.0-2..7.0-9 added incremental diagnostic printks for iter12 RFC v2 + iter17 through iter27 root-cause investigation. The diagnostic kernels are NOT shipping; should revert to clean 7.0-X for production once campaign exits diagnostic mode.
### Lesson
The wire-byte hypothesis arc (iter11 through iter18) chased an empirical illusion: libva's ioctl payloads WERE byte-correct, but the BUG was in the interaction between libva's CAPTURE-pool TIMING and rkvdec's lazy `image_fmt` determination. Six kernel-printk iterations narrowed the failure to one function returning one error code. The fix is 90 LOC in libva. The kernel was correct all along.
The `[[feedback-libva-byte-correct-kernel-bug]]` memory entry was partially overturned: kernel-side -EBUSY semantics interact with libva-side allocation TIMING. Memory entry updated.