# Iteration 8 — Phase 7 (γ + IMP-1 verification) Captured 2026-05-13 on fresnel via `fresnel.vpn`. Backend SHA `e4649a48…` (iter8 P6 + IMP-1 exp + stdlib include fix). Three commits on fork: `7eae6ea` (γ dump), `66ecbef` (memset gate), `6f4e583` (stdlib.h fix-fwd). ## What γ revealed Running libva H.264 sweep with `LIBVA_V4L2_DUMP_CAPTURE=1`: - **Slot v4l2_index=0** (surface 67108864, first decode): plane[0] sz=2088960, scan=30720 (one MB row), **non_zero=256**. Head bytes: `81 81 80 80 80 7f 7f 7f 7f 7f 7f 80 80 80 81 81 00 00 ... 00`. - **Slots v4l2_index=1..9** (subsequent decodes): plane[0] **non_zero=0**. Head bytes: all 0. - **Slot v4l2_index=10** (cap_pool wraps, surface 67108864 reused): plane[0] non_zero=256, head: `80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 00 00...`. - **All slots plane[1] (UV)**: non_zero in 16-32 byte range, **stride-4 pattern** `01 00 00 00 00 00 00 00 01 00 00 00 ...` — looks like kernel scratch markers (frame indices, ref counts, or similar). - **Plane[0] tail bytes**: all zero across all slots. Key conclusion: **what γ shows in destination_data[] is EXACTLY what appears in the libva-derived YUV output**. The hash of the dump-on YUV (`71ac099b…`) is identical to baseline. Libva is reading the buffer correctly. ## What IMP-1 (memset-before-QBUF) revealed Pre-zeroing each CAPTURE slot's mmap region in BeginPicture (after cap_pool_acquire, before QBUF) with `LIBVA_V4L2_ZERO_CAPTURE=1`: | Run | Hash | Frame 1 non-zero | Frame 2 non-zero | Frame 3 non-zero | |---|---|---|---|---| | baseline (no pre-zero) | `71ac099b…` | 512 | 32 | 16 | | pre-zero (LIBVA_V4L2_ZERO_CAPTURE=1) | `71ac099b…` (identical) | 512 | 32 | 16 | **The 16×32 patch is NOT stale residue.** The kernel deterministically writes those 512 bytes regardless of pre-state. Same goes for the 32/16-byte stride-4 UV-plane markers in frames 2/3. ## Definitive root cause classification The bug is confirmed kernel-side. **rkvdec accepts libva's H.264 decode request, ack's it via MEDIA_REQUEST_IOC_QUEUE/DQBUF without error flags, but writes only a tiny structured patch and stops.** This rules out from Phase 4's outcome table: - ~~Plane[0] fully populated → libva mis-reads~~ — eliminated (γ shows only 256 non-zero in scan). - ~~Plane[0] populated with other content → slot binding wrong~~ — eliminated (γ shows the buffer holds exactly what YUV output displays). Confirmed: - **Plane[0] only 16×32 populated → kernel didn't decode properly** — the leading hypothesis. The 16-byte head and the row-wise repetition match a tiled output where the kernel wrote partial state. The 0x80-class luma-neutral values in frame 1's 16×32 patch and the kernel's stride-4 UV markers tell us **rkvdec ran some initialization/scratch DMA but did NOT complete decoding the bitstream**. The kernel may be aborting after a control-validation failure or an internal state error not surfaced as a DQBUF FLAG_ERROR. ## Comparison with kdirect kdirect's S_EXT_CTRLS payloads (frame 1 SPS dump from Phase 3): ``` libva SPS[0..32]: 4d 00 29 00 01 00 00 01 00 03 02 00 [zeros...] kdirect SPS[0..32]: 4d 02 29 00 01 00 00 01 00 03 02 00 [zeros...] ^^ constraint_set_flags differs ``` Both backends: - profile_idc = 0x4D = 77 (Main). - level_idc = 0x29 = 41 (Level 4.1). - seq_parameter_set_id = 0. - chroma_format_idc = 1 (4:2:0). - bit_depth_*_minus8 = 0. - log2_max_frame_num_minus4 = 1. - pic_order_cnt_type = 0. - log2_max_pic_order_cnt_lsb_minus4 = 3. - max_num_ref_frames = 2. Trailing bytes (1040..1047): identical in both — pic_width_in_mbs=119, pic_height=67, flags=0x50 (FRAME_MBS_ONLY|DIRECT_8X8_INFERENCE). **The only meaningful SPS diff is `constraint_set_flags`: 0x00 (libva) vs 0x02 (kdirect).** `constraint_set1_flag` (bit 1) indicates "this Main-profile stream conforms to Baseline constraints" — informational per H.264 spec. But the kernel may be using it as a profile-detection hint or branch in its parser. ## Phase 4b plan: α-1 SPS constraint_set_flags fix ### Mechanism In `h264.c::h264_set_controls`, after setting `sps.profile_idc`, derive `constraint_set_flags` based on the VAAPI profile: ```c sps.profile_idc = h264_profile_to_idc(profile); sps.constraint_set_flags = h264_constraint_set_flags(profile); ``` Mapping (per H.264 spec for typical contents): - **Main** (77): `0x02` (constraint_set1_flag — Main subset of Baseline conformance hint). - **Baseline / ConstrainedBaseline** (66): `0x42` (constraint_set1_flag + constraint_set6_flag for constrained baseline) or `0x40`. - **High** (100): `0x00` (no constraints — high profile fields are unconstrained). - **MultiviewHigh** (118), **StereoHigh** (128): `0x00`. Simplest correct mapping per H.264 §7.4.2.1.1 for typical streams: - Main → 0x02 - ConstrainedBaseline → 0x42 - Baseline → 0x40 - High / MultiviewHigh / StereoHigh → 0x00 For BBB H.264 Main, this is 0x02 — matching what kdirect sends. ### LOC ~10 LOC: a new `h264_constraint_set_flags(profile)` helper in `h264.c` similar to existing `h264_profile_to_idc`, plus 1 line in `h264_set_controls`. ### Expected outcome Three possibilities: 1. **Best case**: libva H.264 starts producing correct YUV. Hash matches kdirect. iter8 PASS (criterion 1). 2. **Middle case**: libva H.264 still 512-byte partial-fill OR different non-zero count. The constraint_set_flags wasn't the cause; further diff needed (DECODE_PARAMS fields). 3. **Worst case**: no change. Same hash. Continue Phase 4b with next α candidate. ### Phase 5b review requirement Per CLAUDE.md "reviews are never skippable," a Phase 5b review of α-1 is required before Phase 6b implementation. Since the change is mechanical (10 LOC), the review is short — the reviewer should verify the H.264 spec mapping is correct and that the per-profile lookup doesn't break VP9/HEVC/MPEG-2/VP8 (it can't — it's H.264-specific). ## Phase 7 outcome summary for Phase 8 close Iter8 Phase 7 has narrowed Bug 4 from "H.264 inter race-loss (vague)" to **"kernel writes only 16×32 luma-neutral patch + stride-4 UV scratch; libva readback is correct; SPS constraint_set_flags is missing"**. If α-1 (Phase 4b/6b) closes the bug: - iter8 = clean PASS. - Bug 4 closed. - Memory rule worth recording: SPS constraint_set_flags is load-bearing for rkvdec H.264. If α-1 doesn't close: - iter8 = PARTIAL — Bug 4 narrowed but not fixed. - Defer to iter9 with refined hypothesis (e.g., DECODE_PARAMS fields, or slice-data encoding). ## Tracking | Artifact | Hash | |---|---| | h264_baseline.yuv | `71ac099b8d007836…` | | h264_zeroed.yuv | `71ac099b8d007836…` | | h264_gamma.yuv | `71ac099b8d007836…` | | kdirect_h264.yuv | `1e7a0bc98d5bd83f…` (target, correct) | Backend SHA `e4649a48…` carries γ + IMP-1 + stdlib include. Phase 4b α-1 commit will land next.