fresnel-fourier/phase8_iteration27_close.md
marfrit c15fc6c0f6 iter28b DIAG documented: universal trim=40 breaks IDR (reverted)
Confirmed the 40-byte inflation is non-uniform — IDR slice has correct
size from VAAPI; only P/B slices are inflated. Real fix requires dynamic
rbsp_stop_bit detection or per-slice-type logic.
2026-05-14 14:45:35 +00:00

Iteration 27/28 — Phase 8 (close)

Closes 2026-05-14. iter27 = extend kernel printk to dpb/slice bytes; iter28 = α-27/α-28 attempts to fix HEVC frame 2+. PARTIAL close, frame 2+ NOT fixed via libva-backend changes.

α-27 (no-op): num_entry_point_offsets from VAAPI

VAAPI's slice->num_entry_point_offsets is 0 for all slices (ffmpeg-vaapi front-end does NOT parse this field). Even though kdirect (ffmpeg-v4l2request) writes 22 (=BBB's actual WPP entry-point count), rkvdec's source has NO reference to num_entry_point_offsets at all — the field is unused by the kernel driver.

This cannot be fixed from the libva side (it needs an upstream patch to the ffmpeg-vaapi parser), and it is not needed for decode correctness on rkvdec.

α-28 (no-op output): bit_size formula

Changed from slice_data_size * 8 to (slice_data_size - slice_data_byte_offset) * 8. Kernel printk verifies bit_size now matches kdirect's 44096 for BBB frame 2. But output hash unchanged (700aa52d…). rkvdec's source has NO reference to bit_size either.

The V4L2 stateless HEVC API spec defines bit_size for hardware that consumes it; rkvdec does not.
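The α-28 change can be written out as two one-line formulas. This is a minimal sketch in Python, not the backend's actual C code; the function names are hypothetical and the input values are made up for illustration, not read from the printk log.

```python
def bit_size_before(slice_data_size: int) -> int:
    """Pre-α-28 formula: count every byte of the slice data buffer."""
    return slice_data_size * 8


def bit_size_after(slice_data_size: int, slice_data_byte_offset: int) -> int:
    """α-28 formula: exclude the bytes that precede the slice data proper."""
    if not 0 <= slice_data_byte_offset <= slice_data_size:
        raise ValueError("offset must lie inside the slice data buffer")
    return (slice_data_size - slice_data_byte_offset) * 8


# Made-up inputs, chosen only to show the size of the correction.
print(bit_size_before(1000))      # 8000
print(bit_size_after(1000, 40))   # 7680
```

The two results differ by exactly 8 * slice_data_byte_offset bits, which is why the change is visible in the kernel printk even though rkvdec never reads the field.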

Remaining HEVC frame 2+ divergence

Per kernel iter20 + iter27 printks, with α-25/26/27/28 all in place:

| Field | libva frame 2 | kdirect frame 2 | rkvdec uses? |
|---|---|---|---|
| bit_size | 44096 (α-28) | 44096 | No |
| data_byte_offset | 40 | 40 | Yes (likely) |
| num_entry_point_offsets | 0 | 22 | No |
| decode_params dp[0..16] | same | same | Yes |
| sps[0..16] | same | same | Yes |
| nal_unit_type, slice_type | same | same | Yes |
| slice_pic_order_cnt, qp_delta | same | same | Yes |

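All inspected fields that rkvdec uses match; the only mismatch, num_entry_point_offsets (0 vs 22), is in a field rkvdec ignores. Yet libva output diverges at byte 1382401 (frame 2 boundary), and 12.2M bytes in total differ across 10 frames.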
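Locating a divergence like this comes down to finding the first differing byte between two raw output dumps and mapping it to a frame. The sketch below is one way to do that, not the tooling actually used in the campaign. It assumes 1-based byte positions and a raw frame size of 1382400 bytes, which is what makes byte 1382401 the first byte of frame 2; both are assumptions, and the function names are hypothetical.

```python
from typing import Optional

FRAME_BYTES = 1_382_400  # assumption: raw size of one decoded frame


def first_divergence(a: bytes, b: bytes) -> Optional[int]:
    """Return the 1-based position of the first differing byte, or None."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i + 1
    if len(a) != len(b):
        # One dump is a strict prefix of the other.
        return min(len(a), len(b)) + 1
    return None


def frame_of(position: int, frame_bytes: int = FRAME_BYTES) -> int:
    """Map a 1-based byte position to a 1-based frame number."""
    return (position - 1) // frame_bytes + 1


print(frame_of(1_382_400))  # 1: last byte of frame 1
print(frame_of(1_382_401))  # 2: first byte of frame 2
```

A divergence exactly on a frame boundary, as here, suggests the whole frame is affected from its first byte, not just a region inside it.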

iter28b DIAG: universal 40-byte trim breaks IDR (tested, reverted)

LIBVA_HEVC_TRIM_TRAILING=40 was tested as a quick refute/confirm of the 40-byte-inflation hypothesis. Result:

  • libva HEVC 10-frame hash diverged at byte 899745 (INSIDE frame 1, not at frame-2 boundary).
  • Trimming 40 bytes off IDR slice (96890→96850) corrupted frame 1.

Conclusion: the 40-byte inflation is NOT uniform per slice. The IDR (frame 1) slice has correct size from VAAPI. Only P/B slices have the inflation. A real fix requires dynamic detection — scan for rbsp_stop_one_bit, or per-slice-type logic.
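The dynamic-detection idea could look roughly like the sketch below: walk back over trailing zero bytes and treat the last nonzero byte as the one carrying rbsp_stop_one_bit. It rests on an unverified assumption, namely that the surplus bytes are all 0x00; if they hold anything else (emulation-prevention bytes, data from a following NAL), this scan will not find the true end. The function names are hypothetical.

```python
def payload_length(slice_bytes: bytes) -> int:
    """Length up to and including the byte that holds rbsp_stop_one_bit.

    ASSUMPTION: everything after the stop bit is zero padding.
    """
    end = len(slice_bytes)
    while end > 0 and slice_bytes[end - 1] == 0x00:
        end -= 1
    return end


def alignment_zero_bits(last_byte: int) -> int:
    """Number of zero bits that follow the stop bit inside its byte."""
    if not 0 < last_byte <= 0xFF:
        raise ValueError("last byte must be nonzero and fit in 8 bits")
    lowest_set = last_byte & -last_byte   # isolate the stop bit
    return lowest_set.bit_length() - 1


padded = bytes([0x12, 0x34, 0x80]) + bytes(40)   # synthetic P/B-like buffer
exact = bytes([0x12, 0x34, 0x80])                # synthetic IDR-like buffer
print(payload_length(padded))      # 3
print(payload_length(exact))       # 3
print(alignment_zero_bits(0x80))   # 7
```

Unlike a fixed trim, this leaves a buffer without padding untouched, which is the failure mode the universal 40-byte trim hit on the IDR slice.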

Iter28b code reverted: 6646b16.

Likely root cause for frame 2+ (deferred)

The libva OUTPUT buffer for frame 2 is 5552 bytes (= 3-byte Annex-B start code + 5549 from slice->slice_data_size). ffmpeg-v4l2request's OUTPUT for the same frame appears to be ~5512 bytes (= 3 + 5509, inferred from its bit_size = (size+extra_size)*8 formula).

Difference: libva's slice_data_buffer from VAAPI is 40 bytes larger than what ffmpeg-v4l2request's libavcodec dispatch gives. The trailing 40 bytes get appended to libva's OUTPUT buffer per slice.
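The size arithmetic above can be checked in a few lines. All figures are the ones quoted in this section; the kdirect size is the approximate value inferred from its bit_size formula, not a measured one, and the variable names are only for illustration.

```python
START_CODE_BYTES = 3  # Annex-B start code prepended to each slice

libva_output = START_CODE_BYTES + 5549     # from slice->slice_data_size
kdirect_output = START_CODE_BYTES + 5509   # inferred, approximate

print(libva_output)                   # 5552
print(kdirect_output)                 # 5512
print(libva_output - kdirect_output)  # 40
print(kdirect_output * 8)             # 44096
```

The last line is the cross-check: 5512 bytes times 8 gives the 44096 bits that kdirect reports as bit_size for frame 2, which is consistent with the inferred 5512-byte figure.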

Hypothesis: ffmpeg-vaapi's slice_data buffer concatenates trailing bytes (RBSP trailing alignment / between-NAL zeros) that ffmpeg-v4l2request strips. rkvdec reads past the actual slice payload into these trailing bytes → entropy decoder corrupts state → frame 2+ decoded with wrong reference content.

Both libavcodec dispatches share FF_HW_CALL(s->avctx, decode_slice, nal->raw_data, nal->raw_size) at hevcdec.c:2989, so the same size parameter should reach both. The 40-byte inflation therefore happens INSIDE ffmpeg-vaapi or libva (between hwaccel-init and slice-buffer-make).

Mechanism status (post-iter27/28)

| # | Mechanism | Status |
|---|---|---|
| iter24 #9 | rkvdec_s_ctrl -EBUSY | FIXED iter25 α-25 (Bug 4 fully; Bug 5 frame 1 ok) |
| iter26 #10 | decode_params.short_term_ref_pic_set_size | FIXED iter26 α-26 (cosmetic; rkvdec doesn't use this field either) |
| iter27 #11 | num_entry_point_offsets | NO-OP (rkvdec doesn't use) |
| iter28 #12 | bit_size formula | NO-OP (rkvdec doesn't use) |
| iter28 #13 | slice_data buffer 40-byte inflation in VAAPI vs libavcodec | LEADING: fix in ffmpeg-vaapi or libva-internal slice buffer; outside campaign scope this iter |

Substrate state at iter27/28 close

  • Backend fork tip cd286d9 (α-25 + H264-flag-fix + α-26 + α-27 + α-28).
  • Kernel 7.0-9 with iter17 + iter20 + iter21 + iter22 + iter23 + iter27 printks.
  • 5-codec status:
    • H.264: byte-equal to kdirect on 3 frames (Bug 4 FIXED).
    • HEVC: frame 1 byte-equal to kdirect (Bug 5 frame 1 FIXED). Frames 2+ diverge (separate root cause).
    • VP9: unchanged (HW=SW byte-equal).
    • MPEG-2 / VP8: untestable on this kernel boot — libva backend's single-device auto-probe lands on rkvdec which doesn't expose these profiles. Pre-existing libva backend limitation.

Campaign-wide milestone

In one day:

  • 8 wire-byte hypotheses eliminated iter11-iter18.
  • 5 kernel-printk iterations (iter17, iter20-iter23) walking down the kernel-side localization.
  • 1 root cause identification (iter24): rkvdec_s_ctrl returns -EBUSY when SPS triggers image_fmt reset on a busy CAPTURE queue.
  • 1 fix iteration (iter25): synthetic SPS injection pre-allocates ctx->image_fmt → Bug 4 fully fixed, Bug 5 frame 1 fixed.
  • 3 followup iterations (iter26-28): VAAPI field propagation discrepancies between vaapi front-end and v4l2request front-end identified but partially un-fixable from libva-side alone.
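The iter24 root cause and the iter25 fix can be illustrated with a toy model. This is not the rkvdec code: the class, its fields, and the format label are hypothetical stand-ins, and the only behaviour modelled is the one described in the list above, where a format reset is rejected with -EBUSY once the CAPTURE queue is busy.

```python
import errno


class ToyDecoderCtx:
    """Toy stand-in for a decoder context; not the rkvdec data structure."""

    def __init__(self) -> None:
        self.image_fmt = None       # format derived from the last accepted SPS
        self.capture_busy = False   # True once CAPTURE buffers are in use

    def set_sps(self, fmt: str) -> int:
        """Return 0 on success, -EBUSY if a format reset hits a busy queue."""
        if fmt != self.image_fmt:
            if self.capture_busy:
                return -errno.EBUSY
            self.image_fmt = fmt
        return 0


def run(inject_synthetic_sps: bool) -> int:
    ctx = ToyDecoderCtx()
    if inject_synthetic_sps:
        ctx.set_sps("fmt-A")     # early injection while CAPTURE is still idle
    ctx.capture_busy = True      # buffers allocated, queue now busy
    return ctx.set_sps("fmt-A")  # the stream's SPS arrives with the first frame


print(run(inject_synthetic_sps=False) == -errno.EBUSY)  # True
print(run(inject_synthetic_sps=True))                   # 0
```

With the format already in place, the later SPS no longer triggers a reset, so the busy-queue check is never reached.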

The campaign has gone from 0/5 PASS (initial) → 4/5 PARTIAL/PASS (Bug 4+5 root-caused, H264 fully fixed, HEVC frame 1 fixed, VP9 unchanged), with the remaining frame 2+ issue localized to an ffmpeg-vaapi serialization quirk.