γ dump confirms libva reads buffer correctly; the 16x32 patch and stride-4 UV markers appear at YUV output exactly as in the dump. IMP-1 memset-before-QBUF test: pre-zeroing buffer does NOT change output (identical hash). The 512 bytes ARE deterministic kernel writes, not stale residue. Bug root cause: rkvdec accepts libva's H.264 decode request without error flags but writes only 16x32 of luma-neutral data + stride-4 UV scratch. Kernel decoded a tiny bit then stopped. Phase 3 SPS diff: libva SPS.constraint_set_flags=0x00 vs kdirect's 0x02 — likely the kernel hint that triggers rkvdec's full decode path for Main profile. Phase 4b α-1 fix: derive constraint_set_flags per VAProfile in h264_set_controls. ~10 LOC. Phase 5b review required. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Iteration 8 — Phase 7 (γ + IMP-1 verification)
Captured 2026-05-13 on fresnel via fresnel.vpn. Backend SHA e4649a48… (iter8 P6 + IMP-1 exp + stdlib include fix). Three commits on fork: 7eae6ea (γ dump), 66ecbef (memset gate), 6f4e583 (stdlib.h fix-fwd).
What γ revealed
Running libva H.264 sweep with LIBVA_V4L2_DUMP_CAPTURE=1:
- Slot v4l2_index=0 (surface 67108864, first decode): plane[0] sz=2088960, scan=30720 (one MB row), non_zero=256. Head bytes: 81 81 80 80 80 7f 7f 7f 7f 7f 7f 80 80 80 81 81 00 00 ... 00.
- Slots v4l2_index=1..9 (subsequent decodes): plane[0] non_zero=0. Head bytes: all 0.
- Slot v4l2_index=10 (cap_pool wraps, surface 67108864 reused): plane[0] non_zero=256, head: 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 00 00....
- All slots plane[1] (UV): non_zero in 16-32 byte range, stride-4 pattern 01 00 00 00 00 00 00 00 01 00 00 00 ...; this looks like kernel scratch markers (frame indices, ref counts, or similar).
- Plane[0] tail bytes: all zero across all slots.
Key conclusion: what γ shows in destination_data[] is EXACTLY what appears in the libva-derived YUV output. The hash of the dump-on YUV (71ac099b…) is identical to baseline. Libva is reading the buffer correctly.
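The per-slot figures above (non_zero within the scan window, plus the head bytes) can be reproduced with a few lines of C. This is a minimal sketch, not the γ dump code itself: the function names `count_non_zero` and `print_head` are hypothetical, and it assumes the plane is already mapped into the process as a linear byte buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Count the non-zero bytes in the first scan_len bytes of a mapped plane.
 * Hypothetical helper; the real dump code may scan differently. */
size_t count_non_zero(const uint8_t *plane, size_t plane_len, size_t scan_len)
{
    assert(plane != NULL);
    if (scan_len > plane_len)
        scan_len = plane_len;          /* never read past the mapping */

    size_t n = 0;
    for (size_t i = 0; i < scan_len; i++)
        n += (plane[i] != 0);
    return n;
}

/* Print the first head_len bytes as space-separated hex on one line. */
void print_head(FILE *out, const uint8_t *plane, size_t plane_len, size_t head_len)
{
    assert(out != NULL && plane != NULL);
    if (head_len > plane_len)
        head_len = plane_len;
    for (size_t i = 0; i < head_len; i++)
        fprintf(out, "%02x%s", plane[i], i + 1 < head_len ? " " : "\n");
}
```

With a 30720-byte scan window (one macroblock row at 1920 bytes per line), a 16-byte-wide patch repeated over 16 lines yields exactly the 256 non-zero bytes reported for slots 0 and 10.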
What IMP-1 (memset-before-QBUF) revealed
Pre-zeroing each CAPTURE slot's mmap region in BeginPicture (after cap_pool_acquire, before QBUF) with LIBVA_V4L2_ZERO_CAPTURE=1:
| Run | Hash | Frame 1 non-zero | Frame 2 non-zero | Frame 3 non-zero |
|---|---|---|---|---|
| baseline (no pre-zero) | 71ac099b… | 512 | 32 | 16 |
| pre-zero (LIBVA_V4L2_ZERO_CAPTURE=1) | 71ac099b… (identical) | 512 | 32 | 16 |
The 16×32 patch is NOT stale residue. The kernel deterministically writes those 512 bytes regardless of pre-state. Same goes for the 32/16-byte stride-4 UV-plane markers in frames 2/3.
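For reference, the shape of the pre-zero gate can be sketched as follows. This is an illustration under assumptions, not the code in commit 66ecbef: the function names are hypothetical, and treating only the exact value "1" as enabled is an assumption about how the environment variable is parsed. The gate takes the variable's value as a parameter so the logic can be exercised without touching the process environment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>   /* getenv */
#include <string.h>   /* memset, strcmp */

/* Zero a CAPTURE slot's mapping when env_value is exactly "1".
 * Returns 1 if the region was zeroed, 0 if it was left untouched.
 * Hypothetical helper; the "1"-only rule is an assumption. */
int maybe_zero_capture(void *map, size_t len, const char *env_value)
{
    assert(map != NULL);
    if (env_value == NULL || strcmp(env_value, "1") != 0)
        return 0;
    memset(map, 0, len);
    return 1;
}

/* Convenience wrapper that reads the gate from the environment. */
int maybe_zero_capture_from_env(void *map, size_t len)
{
    return maybe_zero_capture(map, len, getenv("LIBVA_V4L2_ZERO_CAPTURE"));
}
```

In the experiment described here, the equivalent call sits between cap_pool_acquire and QBUF, so any bytes present after DQBUF must have been written after the zeroing.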
Definitive root cause classification
The bug is confirmed kernel-side. rkvdec accepts libva's H.264 decode request, acks it via MEDIA_REQUEST_IOC_QUEUE/DQBUF without error flags, but writes only a tiny structured patch and stops.
This rules out two outcomes from Phase 4's outcome table:
- Plane[0] fully populated → libva mis-reads: eliminated (γ shows only 256 non-zero in scan).
- Plane[0] populated with other content → slot binding wrong: eliminated (γ shows the buffer holds exactly what YUV output displays).
Confirmed:
- Plane[0] only 16×32 populated → kernel didn't decode properly — the leading hypothesis. The 16-byte head and the row-wise repetition match a tiled output where the kernel wrote partial state.
The 0x80-class luma-neutral values in frame 1's 16×32 patch and the kernel's stride-4 UV markers tell us rkvdec ran some initialization/scratch DMA but did NOT complete decoding the bitstream. The kernel may be aborting after a control-validation failure or an internal state error that is not surfaced as V4L2_BUF_FLAG_ERROR on DQBUF.
Comparison with kdirect
kdirect's S_EXT_CTRLS payloads (frame 1 SPS dump from Phase 3):
libva   SPS[0..32]: 4d 00 29 00 01 00 00 01 00 03 02 00 [zeros...]
kdirect SPS[0..32]: 4d 02 29 00 01 00 00 01 00 03 02 00 [zeros...]
                       ^^ constraint_set_flags differs (byte offset 1)
Both backends:
- profile_idc = 0x4D = 77 (Main).
- level_idc = 0x29 = 41 (Level 4.1).
- seq_parameter_set_id = 0.
- chroma_format_idc = 1 (4:2:0).
- bit_depth_*_minus8 = 0.
- log2_max_frame_num_minus4 = 1.
- pic_order_cnt_type = 0.
- log2_max_pic_order_cnt_lsb_minus4 = 3.
- max_num_ref_frames = 2.
Trailing bytes (1040..1047): identical in both: pic_width_in_mbs_minus1=119, pic_height_in_map_units_minus1=67 (120×68 macroblocks, i.e. 1920×1088 coded), flags=0x50 (FRAME_MBS_ONLY|DIRECT_8X8_INFERENCE).
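As a cross-check, the two dimension values tie back to the buffer sizes in the γ dump. The sketch below assumes the dumped values 119 and 67 are the `_minus1` fields of the V4L2 SPS control, that the stream is frame_mbs_only, and that the luma plane is 8-bit with no line padding; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Luma geometry implied by the two *_minus1 SPS fields. */
struct luma_geometry {
    uint32_t width;        /* coded width in pixels */
    uint32_t height;       /* coded height in pixels */
    uint32_t mb_row_bytes; /* bytes in one 16-line macroblock row */
    uint32_t plane_bytes;  /* bytes in the whole luma plane */
};

struct luma_geometry luma_geometry_from_sps(uint16_t width_in_mbs_minus1,
                                            uint16_t height_in_map_units_minus1)
{
    struct luma_geometry g;
    g.width = ((uint32_t)width_in_mbs_minus1 + 1u) * 16u;
    g.height = ((uint32_t)height_in_map_units_minus1 + 1u) * 16u;
    g.mb_row_bytes = g.width * 16u;        /* 16 lines of `width` bytes */
    g.plane_bytes = g.width * g.height;    /* assumes stride == width */
    return g;
}

/* Cross-check against the figures quoted in the gamma dump. */
void check_gamma_figures(void)
{
    struct luma_geometry g = luma_geometry_from_sps(119, 67);
    assert(g.width == 1920 && g.height == 1088);
    assert(g.mb_row_bytes == 30720);    /* scan= in the dump */
    assert(g.plane_bytes == 2088960);   /* plane[0] sz= in the dump */
}
```

Both numbers match, which is consistent with the SPS dimensions and the CAPTURE buffer allocation agreeing with each other in the libva path.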
The only meaningful SPS diff is constraint_set_flags: 0x00 (libva) vs 0x02 (kdirect).
constraint_set1_flag (bit 1, value 0x02) indicates that the stream obeys the Main profile constraints of H.264 Annex A.2.2. For a stream that already carries profile_idc = 77 this is largely informational per the H.264 spec. But the kernel may be using it as a profile-detection hint or as a branch in its parser.
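The byte-level comparison behind this finding is simple enough to sketch. The two arrays below are the first 12 bytes of each payload, copied from the Phase 3 dump; the function name `diff_payloads` is hypothetical and this is not the Phase 3 tooling itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* First 12 bytes of each SPS payload, as shown in the Phase 3 dump. */
const uint8_t libva_sps_head[12] = {
    0x4d, 0x00, 0x29, 0x00, 0x01, 0x00, 0x00, 0x01, 0x00, 0x03, 0x02, 0x00
};
const uint8_t kdirect_sps_head[12] = {
    0x4d, 0x02, 0x29, 0x00, 0x01, 0x00, 0x00, 0x01, 0x00, 0x03, 0x02, 0x00
};

/* Print every differing byte and return how many bytes differ. */
size_t diff_payloads(FILE *out, const uint8_t *a, const uint8_t *b, size_t len)
{
    assert(out != NULL && a != NULL && b != NULL);
    size_t diffs = 0;
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            fprintf(out, "offset %zu: 0x%02x vs 0x%02x\n", i, a[i], b[i]);
            diffs++;
        }
    }
    return diffs;
}
```

Calling `diff_payloads(stdout, libva_sps_head, kdirect_sps_head, 12)` prints `offset 1: 0x00 vs 0x02` and returns 1. Offset 1 is the constraint_set_flags byte, directly after profile_idc.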
Phase 4b plan: α-1 SPS constraint_set_flags fix
Mechanism
In h264.c::h264_set_controls, after setting sps.profile_idc, derive constraint_set_flags based on the VAAPI profile:
sps.profile_idc = h264_profile_to_idc(profile);
sps.constraint_set_flags = h264_constraint_set_flags(profile);
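Mapping (per H.264 Annex A.2 for typical contents, using the V4L2 bit layout in which constraint_setN_flag is bit N of constraint_set_flags, 0x01 through 0x20):
- Main (77): 0x02 (constraint_set1_flag: the stream obeys the Main profile constraints).
- Baseline (66): 0x01 (constraint_set0_flag: the stream obeys the Baseline profile constraints).
- ConstrainedBaseline (66): 0x03 (constraint_set0_flag + constraint_set1_flag; Constrained Baseline is signalled as profile_idc 66 with constraint_set1_flag set).
- High (100): 0x00 (no constraint flags set for typical High streams).
- MultiviewHigh (118), StereoHigh (128): 0x00.
Simplest correct mapping per H.264 §7.4.2.1.1 for typical streams:
- Main → 0x02
- Baseline → 0x01
- ConstrainedBaseline → 0x03
- High / MultiviewHigh / StereoHigh → 0x00
The Baseline-family values are not exercised by the BBB test stream and should be confirmed in the Phase 5b review.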
For BBB H.264 Main, this is 0x02 — matching what kdirect sends.
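A minimal sketch of the helper, limited to the Main and High-family cases. The enum is a hypothetical stand-in for the VAProfile values the backend switches on, and the function and macro names are illustrative rather than the ones that will land in h264.c. Baseline-family profiles are deliberately left out of this sketch and would be added as further cases once their values are reviewed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the H.264 VAProfile values. */
enum sketch_h264_profile {
    SKETCH_H264_MAIN,
    SKETCH_H264_HIGH,
    SKETCH_H264_MULTIVIEW_HIGH,
    SKETCH_H264_STEREO_HIGH,
    SKETCH_H264_PROFILE_COUNT
};

/* Bit 1 of constraint_set_flags, i.e. constraint_set1_flag. */
#define SKETCH_CONSTRAINT_SET1_FLAG 0x02u

/* Derive constraint_set_flags from the profile (Main/High family only). */
uint8_t sketch_constraint_set_flags(enum sketch_h264_profile profile)
{
    assert(profile < SKETCH_H264_PROFILE_COUNT);
    switch (profile) {
    case SKETCH_H264_MAIN:
        return SKETCH_CONSTRAINT_SET1_FLAG;   /* matches kdirect's 0x02 */
    case SKETCH_H264_HIGH:
    case SKETCH_H264_MULTIVIEW_HIGH:
    case SKETCH_H264_STEREO_HIGH:
    default:
        return 0x00;                          /* no constraint flags */
    }
}
```

A switch keeps the helper parallel to a profile-to-idc lookup and makes each profile's value visible in one place for the review.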
LOC
~10 LOC: a new h264_constraint_set_flags(profile) helper in h264.c similar to existing h264_profile_to_idc, plus 1 line in h264_set_controls.
Expected outcome
Three possibilities:
- Best case: libva H.264 starts producing correct YUV. Hash matches kdirect. iter8 PASS (criterion 1).
- Middle case: libva H.264 still 512-byte partial-fill OR different non-zero count. The constraint_set_flags wasn't the cause; further diff needed (DECODE_PARAMS fields).
- Worst case: no change. Same hash. Continue Phase 4b with next α candidate.
Phase 5b review requirement
Per CLAUDE.md "reviews are never skippable," a Phase 5b review of α-1 is required before Phase 6b implementation. Since the change is mechanical (10 LOC), the review is short — the reviewer should verify the H.264 spec mapping is correct and that the per-profile lookup doesn't break VP9/HEVC/MPEG-2/VP8 (it can't — it's H.264-specific).
Phase 7 outcome summary for Phase 8 close
Iter8 Phase 7 has narrowed Bug 4 from "H.264 inter race-loss (vague)" to "kernel writes only 16×32 luma-neutral patch + stride-4 UV scratch; libva readback is correct; SPS constraint_set_flags is missing".
If α-1 (Phase 4b/6b) closes the bug:
- iter8 = clean PASS.
- Bug 4 closed.
- Memory rule worth recording: SPS constraint_set_flags is load-bearing for rkvdec H.264.
If α-1 doesn't close:
- iter8 = PARTIAL — Bug 4 narrowed but not fixed.
- Defer to iter9 with refined hypothesis (e.g., DECODE_PARAMS fields, or slice-data encoding).
Tracking
| Artifact | Hash |
|---|---|
| h264_baseline.yuv | 71ac099b8d007836… |
| h264_zeroed.yuv | 71ac099b8d007836… |
| h264_gamma.yuv | 71ac099b8d007836… |
| kdirect_h264.yuv | 1e7a0bc98d5bd83f… (target, correct) |
Backend SHA e4649a48… carries γ + IMP-1 + stdlib include. Phase 4b α-1 commit will land next.