fresnel-fourier/phase7_iter8_verification.md
marfrit 84c939692f iter8 Phase 7 (γ + IMP-1): root cause confirmed kernel-side
γ dump confirms libva reads buffer correctly; the 16x32 patch and
stride-4 UV markers appear at YUV output exactly as in the dump.

IMP-1 memset-before-QBUF test: pre-zeroing buffer does NOT change output
(identical hash). The 512 bytes ARE deterministic kernel writes, not
stale residue.

Bug root cause: rkvdec accepts libva's H.264 decode request without
error flags but writes only 16x32 of luma-neutral data + stride-4 UV
scratch. Kernel decoded a tiny bit then stopped.

Phase 3 SPS diff: libva SPS.constraint_set_flags=0x00 vs kdirect's
0x02 — likely the kernel hint that triggers rkvdec's full decode path
for Main profile. Phase 4b α-1 fix: derive constraint_set_flags per
VAProfile in h264_set_controls. ~10 LOC. Phase 5b review required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:23:55 +00:00


Iteration 8 — Phase 7 (γ + IMP-1 verification)

Captured 2026-05-13 on fresnel via fresnel.vpn. Backend SHA e4649a48… (iter8 P6 + IMP-1 exp + stdlib include fix). Three commits on fork: 7eae6ea (γ dump), 66ecbef (memset gate), 6f4e583 (stdlib.h fix-fwd).

What γ revealed

Running libva H.264 sweep with LIBVA_V4L2_DUMP_CAPTURE=1:

  • Slot v4l2_index=0 (surface 67108864, first decode): plane[0] sz=2088960, scan=30720 (one MB row), non_zero=256. Head bytes: 81 81 80 80 80 7f 7f 7f 7f 7f 7f 80 80 80 81 81 00 00 ... 00.
  • Slots v4l2_index=1..9 (subsequent decodes): plane[0] non_zero=0. Head bytes: all 0.
  • Slot v4l2_index=10 (cap_pool wraps, surface 67108864 reused): plane[0] non_zero=256, head: 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 00 00....
  • All slots plane[1] (UV): non_zero in 16-32 byte range, stride-4 pattern 01 00 00 00 00 00 00 00 01 00 00 00 ... — looks like kernel scratch markers (frame indices, ref counts, or similar).
  • Plane[0] tail bytes: all zero across all slots.
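The per-plane numbers above (sz, scan, non_zero, head bytes) can be produced by a very small scan over the mapped CAPTURE plane. The following is a minimal sketch of that kind of summary, not the backend's actual dump code; the function names `count_nonzero` and `dump_plane_summary` and the output format are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Count the non-zero bytes in the first `len` bytes of a mapped plane. */
static size_t count_nonzero(const uint8_t *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += (buf[i] != 0);
    return n;
}

/* Print a one-line summary of a plane: total size, scanned length,
 * non-zero count within the scanned window, and the first 16 bytes.
 * (Hypothetical helper; the format only imitates the lines quoted above.) */
static void dump_plane_summary(FILE *out, unsigned plane, const uint8_t *buf,
                               size_t size, size_t scan)
{
    if (scan > size)
        scan = size;
    fprintf(out, "plane[%u] sz=%zu scan=%zu non_zero=%zu head:",
            plane, size, scan, count_nonzero(buf, scan));
    for (size_t i = 0; i < 16 && i < size; i++)
        fprintf(out, " %02x", buf[i]);
    fputc('\n', out);
}
```

For a 1920-wide luma plane, a scan length of 1920 × 16 = 30720 bytes covers exactly one macroblock row, which is the window used in the figures above.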

Key conclusion: what γ shows in destination_data[] is EXACTLY what appears in the libva-derived YUV output. The hash of the dump-on YUV (71ac099b…) is identical to baseline. Libva is reading the buffer correctly.

What IMP-1 (memset-before-QBUF) revealed

Pre-zeroing each CAPTURE slot's mmap region in BeginPicture (after cap_pool_acquire, before QBUF) with LIBVA_V4L2_ZERO_CAPTURE=1:

| Run | Hash | Frame 1 non-zero | Frame 2 non-zero | Frame 3 non-zero |
|---|---|---|---|---|
| baseline (no pre-zero) | 71ac099b… | 512 | 32 | 16 |
| pre-zero (LIBVA_V4L2_ZERO_CAPTURE=1) | 71ac099b… (identical) | 512 | 32 | 16 |

The 16×32 patch is NOT stale residue. The kernel deterministically writes those 512 bytes regardless of pre-state. Same goes for the 32/16-byte stride-4 UV-plane markers in frames 2/3.
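The pre-zero experiment amounts to an environment-gated memset on the mapped CAPTURE plane before it is queued. Below is a minimal sketch of such a gate, assuming the variable is treated as "on" when its value starts with `1`; the helper names (`gate_is_on`, `maybe_zero_capture`, `zero_capture_from_env`) are hypothetical and the real gate in commit 66ecbef may be written differently.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Interpret an environment value: on only if it starts with '1'.
 * (Assumed convention; the backend may parse the variable differently.) */
static int gate_is_on(const char *value)
{
    return value != NULL && value[0] == '1';
}

/* Zero a mapped CAPTURE plane when the gate is enabled.
 * Returns 1 if the memory was cleared, 0 otherwise. */
static int maybe_zero_capture(void *map, size_t length, int enabled)
{
    if (map == NULL || length == 0 || !enabled)
        return 0;
    memset(map, 0, length);
    return 1;
}

/* Convenience wrapper that reads the gate from the environment.
 * Intended call site: after the slot is acquired, before QBUF. */
static int zero_capture_from_env(void *map, size_t length)
{
    return maybe_zero_capture(map, length,
                              gate_is_on(getenv("LIBVA_V4L2_ZERO_CAPTURE")));
}
```

Because the buffer is zeroed before the kernel sees it, any non-zero byte that survives into the output must have been written during the decode, which is what makes the identical hash conclusive.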

Definitive root cause classification

The bug is confirmed kernel-side. rkvdec accepts libva's H.264 decode request and acknowledges it via MEDIA_REQUEST_IOC_QUEUE/DQBUF without error flags, but writes only a tiny structured patch and then stops.

This rules out two rows of Phase 4's outcome table:

  • Plane[0] fully populated → libva mis-reads — eliminated (γ shows only 256 non-zero in scan).
  • Plane[0] populated with other content → slot binding wrong — eliminated (γ shows the buffer holds exactly what YUV output displays).

Confirmed:

  • Plane[0] only 16×32 populated → kernel didn't decode properly — the leading hypothesis. The 16-byte head and the row-wise repetition match a tiled output where the kernel wrote partial state.

The 0x80-class luma-neutral values in frame 1's 16×32 patch and the kernel's stride-4 UV markers tell us rkvdec ran some initialization/scratch DMA but did NOT complete decoding the bitstream. The kernel may be aborting after a control-validation failure or an internal state error not surfaced as a DQBUF FLAG_ERROR.

Comparison with kdirect

kdirect's S_EXT_CTRLS payloads (frame 1 SPS dump from Phase 3):

libva   SPS[0..32]: 4d 00 29 00 01 00 00 01 00 03 02 00 [zeros...]
kdirect SPS[0..32]: 4d 02 29 00 01 00 00 01 00 03 02 00 [zeros...]
                       ^^
                       constraint_set_flags differs

Both backends:

  • profile_idc = 0x4D = 77 (Main).
  • level_idc = 0x29 = 41 (Level 4.1).
  • seq_parameter_set_id = 0.
  • chroma_format_idc = 1 (4:2:0).
  • bit_depth_*_minus8 = 0.
  • log2_max_frame_num_minus4 = 1.
  • pic_order_cnt_type = 0.
  • log2_max_pic_order_cnt_lsb_minus4 = 3.
  • max_num_ref_frames = 2.

Trailing bytes (1040..1047): identical in both. pic_width_in_mbs_minus1=119, pic_height_in_map_units_minus1=67 (120×68 macroblocks, i.e. 1920×1088), flags=0x50 (FRAME_MBS_ONLY|DIRECT_8X8_INFERENCE).
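The byte offsets quoted for the SPS payload can be sanity-checked with a local mirror of the control struct. The sketch below is written from the field order of `struct v4l2_ctrl_h264_sps` in the kernel's V4L2 stateless H.264 uAPI as best understood here; treat that layout, and the names `sps_mirror` and `luma_bytes`, as assumptions to verify against the headers the backend actually builds with.

```c
#include <stddef.h>
#include <stdint.h>

/* Local mirror of the SPS control payload (assumed layout, see lead-in). */
struct sps_mirror {
    uint8_t  profile_idc;                       /* byte 0  */
    uint8_t  constraint_set_flags;              /* byte 1  */
    uint8_t  level_idc;                         /* byte 2  */
    uint8_t  seq_parameter_set_id;
    uint8_t  chroma_format_idc;
    uint8_t  bit_depth_luma_minus8;
    uint8_t  bit_depth_chroma_minus8;
    uint8_t  log2_max_frame_num_minus4;
    uint8_t  pic_order_cnt_type;
    uint8_t  log2_max_pic_order_cnt_lsb_minus4;
    uint8_t  max_num_ref_frames;
    uint8_t  num_ref_frames_in_pic_order_cnt_cycle;
    int32_t  offset_for_ref_frame[255];         /* bytes 12..1031 */
    int32_t  offset_for_non_ref_pic;
    int32_t  offset_for_top_to_bottom_field;
    uint16_t pic_width_in_mbs_minus1;           /* bytes 1040..1041 */
    uint16_t pic_height_in_map_units_minus1;    /* bytes 1042..1043 */
    uint32_t flags;                             /* bytes 1044..1047 */
};

/* Compile-time checks of the offsets referenced in the dump. */
_Static_assert(offsetof(struct sps_mirror, constraint_set_flags) == 1,
               "constraint_set_flags should be the second byte");
_Static_assert(offsetof(struct sps_mirror, pic_width_in_mbs_minus1) == 1040,
               "trailing fields should start at byte 1040");
_Static_assert(sizeof(struct sps_mirror) == 1048,
               "payload should end at byte 1047");

/* Luma plane size implied by the macroblock counts (16x16 samples per MB). */
static size_t luma_bytes(uint16_t w_mbs_minus1, uint16_t h_mbs_minus1)
{
    return ((size_t)w_mbs_minus1 + 1) * 16 * (((size_t)h_mbs_minus1 + 1) * 16);
}
```

With the dumped values 119 and 67 this gives 1920 × 1088 = 2088960 bytes, the same figure as the plane[0] size reported by the γ dump.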

The only meaningful SPS diff is constraint_set_flags: 0x00 (libva) vs 0x02 (kdirect).
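A byte-wise comparison is enough to reproduce this finding from the two dumps. The sketch below hard-codes the first 12 bytes of each payload as quoted in the dump; the helper name `diff_offsets` is hypothetical and stands in for whatever comparison was used in Phase 3.

```c
#include <stddef.h>
#include <stdint.h>

/* First 12 bytes of each frame-1 SPS payload, copied from the Phase 3 dump. */
static const uint8_t libva_sps_head[12] = {
    0x4d, 0x00, 0x29, 0x00, 0x01, 0x00, 0x00, 0x01, 0x00, 0x03, 0x02, 0x00
};
static const uint8_t kdirect_sps_head[12] = {
    0x4d, 0x02, 0x29, 0x00, 0x01, 0x00, 0x00, 0x01, 0x00, 0x03, 0x02, 0x00
};

/* Record up to `max_out` offsets at which a[] and b[] differ.
 * Returns the total number of differing bytes, which may exceed max_out. */
static size_t diff_offsets(const uint8_t *a, const uint8_t *b, size_t len,
                           size_t *out, size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            if (n < max_out)
                out[n] = i;
            n++;
        }
    }
    return n;
}
```

Run over the two heads, this reports a single differing byte at offset 1, the constraint_set_flags field.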

constraint_set1_flag (bit 1) signals that the stream obeys the Main-profile constraints (H.264 Annex A.2.2). For a stream that already carries profile_idc 77 it is largely informational per the H.264 spec, but the kernel may be using it as a profile-detection hint or as a branch in its parser.

Phase 4b plan: α-1 SPS constraint_set_flags fix

Mechanism

In h264.c::h264_set_controls, after setting sps.profile_idc, derive constraint_set_flags based on the VAAPI profile:

sps.profile_idc = h264_profile_to_idc(profile);
sps.constraint_set_flags = h264_constraint_set_flags(profile);

Mapping for typical streams, per the constraint_set flag semantics in H.264 §7.4.2.1.1, expressed in the V4L2 bit layout where constraint_setN_flag is bit N (0x01 for set0 up to 0x20 for set5):

  • Main (77): 0x02 (constraint_set1_flag: the stream obeys the Main-profile constraints).
  • Baseline (66): 0x01 (constraint_set0_flag).
  • ConstrainedBaseline (66): 0x03 (constraint_set0_flag + constraint_set1_flag; Constrained Baseline is signalled as profile_idc 66 with constraint_set1_flag set).
  • High (100), MultiviewHigh (118), StereoHigh (128): 0x00 (no constraint flags set).

For BBB H.264 Main, this is 0x02, matching what kdirect sends.

LOC

~10 LOC: a new h264_constraint_set_flags(profile) helper in h264.c similar to existing h264_profile_to_idc, plus 1 line in h264_set_controls.
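A minimal sketch of what such a helper could look like is given below. It covers only Main and the High family; the Baseline-family values are left out of the sketch. The enum is a hypothetical stand-in for the VAProfile values the real helper would switch on, and the flag macro is redefined locally (using the bit value 0x02 that kdirect sends for constraint_set1_flag) so the example builds without libva or kernel headers.

```c
#include <stdint.h>

/* Local copy of the constraint_set1_flag bit as it appears in the SPS
 * control payload (0x02, the value seen in kdirect's dump). */
#define SKETCH_CONSTRAINT_SET1_FLAG 0x02u

/* Hypothetical stand-in for the H.264 VAProfile values. */
enum sketch_h264_profile {
    SKETCH_H264_MAIN,
    SKETCH_H264_HIGH,
    SKETCH_H264_MULTIVIEW_HIGH,
    SKETCH_H264_STEREO_HIGH
};

/* Derive constraint_set_flags from the profile, in the same spirit as
 * the existing h264_profile_to_idc helper. */
static uint8_t sketch_constraint_set_flags(enum sketch_h264_profile profile)
{
    switch (profile) {
    case SKETCH_H264_MAIN:
        return SKETCH_CONSTRAINT_SET1_FLAG;
    case SKETCH_H264_HIGH:
    case SKETCH_H264_MULTIVIEW_HIGH:
    case SKETCH_H264_STEREO_HIGH:
    default:
        return 0x00; /* no constraint flags set */
    }
}
```

Keeping the mapping in one switch makes the Phase 5b review a matter of reading four cases against the spec rather than tracing call sites.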

Expected outcome

Three possibilities:

  1. Best case: libva H.264 starts producing correct YUV. Hash matches kdirect. iter8 PASS (criterion 1).
  2. Middle case: the output changes (different hash or non-zero count) but is still a partial fill. constraint_set_flags was not the sole cause; further diffing is needed (DECODE_PARAMS fields).
  3. Worst case: no change. Same hash. Continue Phase 4b with next α candidate.

Phase 5b review requirement

Per CLAUDE.md "reviews are never skippable," a Phase 5b review of α-1 is required before Phase 6b implementation. Since the change is mechanical (10 LOC), the review is short — the reviewer should verify the H.264 spec mapping is correct and that the per-profile lookup doesn't break VP9/HEVC/MPEG-2/VP8 (it can't — it's H.264-specific).

Phase 7 outcome summary for Phase 8 close

Iter8 Phase 7 has narrowed Bug 4 from "H.264 inter race-loss (vague)" to "kernel writes only 16×32 luma-neutral patch + stride-4 UV scratch; libva readback is correct; SPS constraint_set_flags is missing".

If α-1 (Phase 4b/6b) closes the bug:

  • iter8 = clean PASS.
  • Bug 4 closed.
  • Memory rule worth recording: SPS constraint_set_flags is load-bearing for rkvdec H.264.

If α-1 doesn't close:

  • iter8 = PARTIAL — Bug 4 narrowed but not fixed.
  • Defer to iter9 with refined hypothesis (e.g., DECODE_PARAMS fields, or slice-data encoding).

Tracking

| Artifact | Hash |
|---|---|
| h264_baseline.yuv | 71ac099b8d007836… |
| h264_zeroed.yuv | 71ac099b8d007836… |
| h264_gamma.yuv | 71ac099b8d007836… |
| kdirect_h264.yuv | 1e7a0bc98d5bd83f… (target, correct) |

Backend SHA e4649a48… carries γ + IMP-1 + stdlib include. Phase 4b α-1 commit will land next.