From 84c939692f7b119b0540b7ed66fe874dcd963826 Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Wed, 13 May 2026 12:23:55 +0000 Subject: [PATCH] =?UTF-8?q?iter8=20Phase=207=20(=CE=B3=20+=20IMP-1):=20roo?= =?UTF-8?q?t=20cause=20confirmed=20kernel-side?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit γ dump confirms libva reads buffer correctly; the 16x32 patch and stride-4 UV markers appear at YUV output exactly as in the dump. IMP-1 memset-before-QBUF test: pre-zeroing buffer does NOT change output (identical hash). The 512 bytes ARE deterministic kernel writes, not stale residue. Bug root cause: rkvdec accepts libva's H.264 decode request without error flags but writes only 16x32 of luma-neutral data + stride-4 UV scratch. Kernel decoded a tiny bit then stopped. Phase 3 SPS diff: libva SPS.constraint_set_flags=0x00 vs kdirect's 0x02 — likely the kernel hint that triggers rkvdec's full decode path for Main profile. Phase 4b α-1 fix: derive constraint_set_flags per VAProfile in h264_set_controls. ~10 LOC. Phase 5b review required. Co-Authored-By: Claude Opus 4.7 (1M context) --- phase7_iter8_verification.md | 132 +++++++++++++++++++++++++++++++++++ 1 file changed, 132 insertions(+) create mode 100644 phase7_iter8_verification.md diff --git a/phase7_iter8_verification.md b/phase7_iter8_verification.md new file mode 100644 index 0000000..c935e41 --- /dev/null +++ b/phase7_iter8_verification.md @@ -0,0 +1,132 @@ +# Iteration 8 — Phase 7 (γ + IMP-1 verification) + +Captured 2026-05-13 on fresnel via `fresnel.vpn`. Backend SHA `e4649a48…` (iter8 P6 + IMP-1 exp + stdlib include fix). Three commits on fork: `7eae6ea` (γ dump), `66ecbef` (memset gate), `6f4e583` (stdlib.h fix-fwd). + +## What γ revealed + +Running libva H.264 sweep with `LIBVA_V4L2_DUMP_CAPTURE=1`: + +- **Slot v4l2_index=0** (surface 67108864, first decode): plane[0] sz=2088960, scan=30720 (one MB row), **non_zero=256**. Head bytes: `81 81 80 80 80 7f 7f 7f 7f 7f 7f 80 80 80 81 81 00 00 ... 00`. +- **Slots v4l2_index=1..9** (subsequent decodes): plane[0] **non_zero=0**. Head bytes: all 0. +- **Slot v4l2_index=10** (cap_pool wraps, surface 67108864 reused): plane[0] non_zero=256, head: `80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 00 00...`. +- **All slots plane[1] (UV)**: non_zero in 16-32 byte range, **stride-4 pattern** `01 00 00 00 00 00 00 00 01 00 00 00 ...` — looks like kernel scratch markers (frame indices, ref counts, or similar). +- **Plane[0] tail bytes**: all zero across all slots. + +Key conclusion: **what γ shows in destination_data[] is EXACTLY what appears in the libva-derived YUV output**. The hash of the dump-on YUV (`71ac099b…`) is identical to baseline. Libva is reading the buffer correctly. + +## What IMP-1 (memset-before-QBUF) revealed + +Pre-zeroing each CAPTURE slot's mmap region in BeginPicture (after cap_pool_acquire, before QBUF) with `LIBVA_V4L2_ZERO_CAPTURE=1`: + +| Run | Hash | Frame 1 non-zero | Frame 2 non-zero | Frame 3 non-zero | +|---|---|---|---|---| +| baseline (no pre-zero) | `71ac099b…` | 512 | 32 | 16 | +| pre-zero (LIBVA_V4L2_ZERO_CAPTURE=1) | `71ac099b…` (identical) | 512 | 32 | 16 | + +**The 16×32 patch is NOT stale residue.** The kernel deterministically writes those 512 bytes regardless of pre-state. Same goes for the 32/16-byte stride-4 UV-plane markers in frames 2/3. + +## Definitive root cause classification + +The bug is confirmed kernel-side. **rkvdec accepts libva's H.264 decode request, ack's it via MEDIA_REQUEST_IOC_QUEUE/DQBUF without error flags, but writes only a tiny structured patch and stops.** + +This rules out from Phase 4's outcome table: +- ~~Plane[0] fully populated → libva mis-reads~~ — eliminated (γ shows only 256 non-zero in scan). +- ~~Plane[0] populated with other content → slot binding wrong~~ — eliminated (γ shows the buffer holds exactly what YUV output displays). + +Confirmed: +- **Plane[0] only 16×32 populated → kernel didn't decode properly** — the leading hypothesis. The 16-byte head and the row-wise repetition match a tiled output where the kernel wrote partial state. + +The 0x80-class luma-neutral values in frame 1's 16×32 patch and the kernel's stride-4 UV markers tell us **rkvdec ran some initialization/scratch DMA but did NOT complete decoding the bitstream**. The kernel may be aborting after a control-validation failure or an internal state error not surfaced as a DQBUF FLAG_ERROR. + +## Comparison with kdirect + +kdirect's S_EXT_CTRLS payloads (frame 1 SPS dump from Phase 3): + +``` +libva SPS[0..32]: 4d 00 29 00 01 00 00 01 00 03 02 00 [zeros...] +kdirect SPS[0..32]: 4d 02 29 00 01 00 00 01 00 03 02 00 [zeros...] + ^^ + constraint_set_flags differs +``` + +Both backends: +- profile_idc = 0x4D = 77 (Main). +- level_idc = 0x29 = 41 (Level 4.1). +- seq_parameter_set_id = 0. +- chroma_format_idc = 1 (4:2:0). +- bit_depth_*_minus8 = 0. +- log2_max_frame_num_minus4 = 1. +- pic_order_cnt_type = 0. +- log2_max_pic_order_cnt_lsb_minus4 = 3. +- max_num_ref_frames = 2. + +Trailing bytes (1040..1047): identical in both — pic_width_in_mbs=119, pic_height=67, flags=0x50 (FRAME_MBS_ONLY|DIRECT_8X8_INFERENCE). + +**The only meaningful SPS diff is `constraint_set_flags`: 0x00 (libva) vs 0x02 (kdirect).** + +`constraint_set1_flag` (bit 1) indicates "this Main-profile stream conforms to Baseline constraints" — informational per H.264 spec. But the kernel may be using it as a profile-detection hint or branch in its parser. + +## Phase 4b plan: α-1 SPS constraint_set_flags fix + +### Mechanism + +In `h264.c::h264_set_controls`, after setting `sps.profile_idc`, derive `constraint_set_flags` based on the VAAPI profile: + +```c +sps.profile_idc = h264_profile_to_idc(profile); +sps.constraint_set_flags = h264_constraint_set_flags(profile); +``` + +Mapping (per H.264 spec for typical contents): +- **Main** (77): `0x02` (constraint_set1_flag — Main subset of Baseline conformance hint). +- **Baseline / ConstrainedBaseline** (66): `0x42` (constraint_set1_flag + constraint_set6_flag for constrained baseline) or `0x40`. +- **High** (100): `0x00` (no constraints — high profile fields are unconstrained). +- **MultiviewHigh** (118), **StereoHigh** (128): `0x00`. + +Simplest correct mapping per H.264 §7.4.2.1.1 for typical streams: +- Main → 0x02 +- ConstrainedBaseline → 0x42 +- Baseline → 0x40 +- High / MultiviewHigh / StereoHigh → 0x00 + +For BBB H.264 Main, this is 0x02 — matching what kdirect sends. + +### LOC + +~10 LOC: a new `h264_constraint_set_flags(profile)` helper in `h264.c` similar to existing `h264_profile_to_idc`, plus 1 line in `h264_set_controls`. + +### Expected outcome + +Three possibilities: + +1. **Best case**: libva H.264 starts producing correct YUV. Hash matches kdirect. iter8 PASS (criterion 1). +2. **Middle case**: libva H.264 still 512-byte partial-fill OR different non-zero count. The constraint_set_flags wasn't the cause; further diff needed (DECODE_PARAMS fields). +3. **Worst case**: no change. Same hash. Continue Phase 4b with next α candidate. + +### Phase 5b review requirement + +Per CLAUDE.md "reviews are never skippable," a Phase 5b review of α-1 is required before Phase 6b implementation. Since the change is mechanical (10 LOC), the review is short — the reviewer should verify the H.264 spec mapping is correct and that the per-profile lookup doesn't break VP9/HEVC/MPEG-2/VP8 (it can't — it's H.264-specific). + +## Phase 7 outcome summary for Phase 8 close + +Iter8 Phase 7 has narrowed Bug 4 from "H.264 inter race-loss (vague)" to **"kernel writes only 16×32 luma-neutral patch + stride-4 UV scratch; libva readback is correct; SPS constraint_set_flags is missing"**. + +If α-1 (Phase 4b/6b) closes the bug: +- iter8 = clean PASS. +- Bug 4 closed. +- Memory rule worth recording: SPS constraint_set_flags is load-bearing for rkvdec H.264. + +If α-1 doesn't close: +- iter8 = PARTIAL — Bug 4 narrowed but not fixed. +- Defer to iter9 with refined hypothesis (e.g., DECODE_PARAMS fields, or slice-data encoding). + +## Tracking + +| Artifact | Hash | +|---|---| +| h264_baseline.yuv | `71ac099b8d007836…` | +| h264_zeroed.yuv | `71ac099b8d007836…` | +| h264_gamma.yuv | `71ac099b8d007836…` | +| kdirect_h264.yuv | `1e7a0bc98d5bd83f…` (target, correct) | + +Backend SHA `e4649a48…` carries γ + IMP-1 + stdlib include. Phase 4b α-1 commit will land next.