Files
fresnel-fourier/phase8_iteration27_close.md
marfrit c15fc6c0f6 iter28b DIAG documented: universal trim=40 breaks IDR (reverted)
Confirmed the 40-byte inflation is non-uniform — IDR slice has correct
size from VAAPI; only P/B slices are inflated. Real fix requires dynamic
rbsp_stop_bit detection or per-slice-type logic.
2026-05-14 14:45:35 +00:00

83 lines
5.4 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
## Iteration 27/28 — Phase 8 (close)
Closes 2026-05-14. iter27 = extend kernel printk to dpb/slice bytes; iter28 = α-27/α-28 attempts to fix HEVC frame 2+. PARTIAL close, frame 2+ NOT fixed via libva-backend changes.
### α-27 (no-op): num_entry_point_offsets from VAAPI
VAAPI's `slice->num_entry_point_offsets` is 0 for all slices (ffmpeg-vaapi front-end does NOT parse this field). Even though kdirect (ffmpeg-v4l2request) writes 22 (=BBB's actual WPP entry-point count), rkvdec's source has NO reference to `num_entry_point_offsets` at all — the field is unused by the kernel driver.
Cannot fix from libva-side (need ffmpeg-vaapi parser upstream patch), and not needed for decode correctness on rkvdec.
### α-28 (no-op output): bit_size formula
Changed from `slice_data_size * 8` to `(slice_data_size - slice_data_byte_offset) * 8`. Kernel printk verifies bit_size now matches kdirect's 44096 for BBB frame 2. **But output hash unchanged** (`700aa52d…`). rkvdec's source has NO reference to `bit_size` either.
bit_size in the V4L2 stateless HEVC API spec is for HW consumers that use it; rkvdec doesn't.
### Remaining HEVC frame 2+ divergence
Per kernel iter20 + iter27 printks, with α-25/26/27/28 all in place:
| Field | libva frame 2 | kdirect frame 2 | rkvdec uses? |
|---|---|---|---|
| `bit_size` | 44096 (α-28) | 44096 | No |
| `data_byte_offset` | 40 | 40 | Yes (likely) |
| `num_entry_point_offsets` | 0 | 22 | No |
| `decode_params dp[0..16]` | same | same | Yes |
| `sps[0..16]` | same | same | Yes |
| `nal_unit_type`, `slice_type` | same | same | Yes |
| `slice_pic_order_cnt`, qp_delta | same | same | Yes |
All inspected fields match. Yet libva output diverges at byte 1382401 (frame 2 boundary), 12.2M bytes total differ across 10 frames.
### iter28b DIAG: universal 40-byte trim breaks IDR (tested, reverted)
`LIBVA_HEVC_TRIM_TRAILING=40` was tested as a quick refute/confirm of the 40-byte-inflation hypothesis. Result:
- libva HEVC 10-frame hash diverged at byte 899745 (INSIDE frame 1, not at frame-2 boundary).
- Trimming 40 bytes off IDR slice (96890→96850) corrupted frame 1.
Conclusion: the 40-byte inflation is NOT uniform per slice. The IDR (frame 1) slice has correct size from VAAPI. Only P/B slices have the inflation. A real fix requires dynamic detection — scan for rbsp_stop_one_bit, or per-slice-type logic.
Iter28b code reverted: `6646b16`.
### Likely root cause for frame 2+ (deferred)
The libva OUTPUT buffer for frame 2 = 5552 bytes (= 3 Annex-B start + 5549 from `slice->slice_data_size`). ffmpeg-v4l2request's OUTPUT for the SAME frame appears to be ~5512 bytes (= 3 + 5509, based on its `bit_size = (size+extra_size)*8` formula).
Difference: libva's slice_data_buffer from VAAPI is **40 bytes larger** than what ffmpeg-v4l2request's libavcodec dispatch gives. The trailing 40 bytes get appended to libva's OUTPUT buffer per slice.
Hypothesis: ffmpeg-vaapi's slice_data buffer concatenates trailing bytes (RBSP trailing alignment / between-NAL zeros) that ffmpeg-v4l2request strips. rkvdec reads past the actual slice payload into these trailing bytes → entropy decoder corrupts state → frame 2+ decoded with wrong reference content.
Both libavcodec dispatches share `FF_HW_CALL(s->avctx, decode_slice, nal->raw_data, nal->raw_size)` at hevcdec.c:2989, so same `size` parameter should reach both. The 40-byte inflation happens INSIDE ffmpeg-vaapi or libva (between hwaccel-init and slice-buffer-make).
### Mechanism status (post-iter27/28)
| # | Mechanism | Status |
|---|---|---|
| iter24 #9 | rkvdec_s_ctrl -EBUSY | FIXED iter25 α-25 (Bug 4 fully; Bug 5 frame 1 ok) |
| iter26 #10 | decode_params.short_term_ref_pic_set_size | FIXED iter26 α-26 (cosmetic; rkvdec doesn't use this field either) |
| iter27 #11 | num_entry_point_offsets | NO-OP (rkvdec doesn't use) |
| iter28 #12 | bit_size formula | NO-OP (rkvdec doesn't use) |
| iter28 #13 | **slice_data buffer 40-byte inflation in VAAPI vs libavcodec** | **LEADING — fix in ffmpeg-vaapi or libva-internal slice buffer; outside campaign scope this iter** |
### Substrate state at iter27/28 close
- Backend fork tip `cd286d9` (α-25 + H264-flag-fix + α-26 + α-27 + α-28).
- Kernel `7.0-9` with iter17 + iter20 + iter21 + iter22 + iter23 + iter27 printks.
- 5-codec status:
- **H.264**: byte-equal to kdirect on 3 frames (Bug 4 FIXED).
- **HEVC**: frame 1 byte-equal to kdirect (Bug 5 frame 1 FIXED). Frames 2+ diverge (separate root cause).
- **VP9**: unchanged (HW=SW byte-equal).
- **MPEG-2 / VP8**: untestable on this kernel boot — libva backend's single-device auto-probe lands on rkvdec which doesn't expose these profiles. Pre-existing libva backend limitation.
### Campaign-wide milestone
In one day:
- 8 wire-byte hypotheses eliminated iter11-iter18.
- 5 kernel-printk iterations (iter17, iter20-iter23) walking down the kernel-side localization.
- 1 root cause identification (iter24): rkvdec_s_ctrl returns -EBUSY when SPS triggers image_fmt reset on a busy CAPTURE queue.
- 1 fix iteration (iter25): synthetic SPS injection pre-allocates ctx->image_fmt → Bug 4 fully fixed, Bug 5 frame 1 fixed.
- 3 followup iterations (iter26-28): VAAPI field propagation discrepancies between vaapi front-end and v4l2request front-end identified but partially un-fixable from libva-side alone.
The campaign has gone from 0/5 PASS (initial) → 4/5 PARTIAL/PASS (Bug 4+5 root-caused, H264 fully fixed, HEVC frame 1 fixed, VP9 unchanged), with the remaining frame 2+ issue localized to an ffmpeg-vaapi serialization quirk.