fresnel-fourier/phase7_iter8_verification.md
marfrit 84c939692f iter8 Phase 7 (γ + IMP-1): root cause confirmed kernel-side
γ dump confirms libva reads buffer correctly; the 16x32 patch and
stride-4 UV markers appear at YUV output exactly as in the dump.

IMP-1 memset-before-QBUF test: pre-zeroing buffer does NOT change output
(identical hash). The 512 bytes ARE deterministic kernel writes, not
stale residue.

Bug root cause: rkvdec accepts libva's H.264 decode request without
error flags but writes only 16x32 of luma-neutral data + stride-4 UV
scratch. Kernel decoded a tiny bit then stopped.

Phase 3 SPS diff: libva SPS.constraint_set_flags=0x00 vs kdirect's
0x02 — likely the kernel hint that triggers rkvdec's full decode path
for Main profile. Phase 4b α-1 fix: derive constraint_set_flags per
VAProfile in h264_set_controls. ~10 LOC. Phase 5b review required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 12:23:55 +00:00

# Iteration 8 — Phase 7 (γ + IMP-1 verification)
Captured 2026-05-13 on fresnel via `fresnel.vpn`. Backend SHA `e4649a48…` (iter8 P6 + IMP-1 exp + stdlib include fix). Three commits on fork: `7eae6ea` (γ dump), `66ecbef` (memset gate), `6f4e583` (stdlib.h fix-fwd).
## What γ revealed
Running the libva H.264 sweep with `LIBVA_V4L2_DUMP_CAPTURE=1`:
- **Slot v4l2_index=0** (surface 67108864, first decode): plane[0] sz=2088960, scan=30720 (one MB row), **non_zero=256**. Head bytes: `81 81 80 80 80 7f 7f 7f 7f 7f 7f 80 80 80 81 81 00 00 ... 00`.
- **Slots v4l2_index=1..9** (subsequent decodes): plane[0] **non_zero=0**. Head bytes: all 0.
- **Slot v4l2_index=10** (cap_pool wraps, surface 67108864 reused): plane[0] non_zero=256, head: `80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 00 00...`.
- **All slots plane[1] (UV)**: non_zero in 16-32 byte range, **stride-4 pattern** `01 00 00 00 00 00 00 00 01 00 00 00 ...` — looks like kernel scratch markers (frame indices, ref counts, or similar).
- **Plane[0] tail bytes**: all zero across all slots.
Key conclusion: **what γ shows in destination_data[] is EXACTLY what appears in the libva-derived YUV output**. The hash of the dump-on YUV (`71ac099b…`) is identical to baseline. Libva is reading the buffer correctly.
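The per-slot `non_zero` and `scan` figures above can be reproduced with a helper along these lines. This is a minimal sketch, not the backend's actual dump code: `count_nonzero` and `mb_row_scan_len` are hypothetical names, and the 1920-byte luma stride is an assumption inferred from `scan=30720` covering 16 lines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Count the non-zero bytes in the first `len` bytes of a mapped plane. */
static size_t count_nonzero(const uint8_t *plane, size_t len)
{
    assert(plane != NULL);
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        n += (plane[i] != 0);
    return n;
}

/* Length of a scan window covering one macroblock row (16 luma lines). */
static size_t mb_row_scan_len(size_t stride_bytes)
{
    return stride_bytes * 16;
}
```

Under that stride assumption, `mb_row_scan_len(1920)` yields the 30720-byte window, and a patch 16 bytes wide would contribute 16 × 16 = 256 non-zero bytes inside it.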
## What IMP-1 (memset-before-QBUF) revealed
Pre-zeroing each CAPTURE slot's mmap region in BeginPicture (after cap_pool_acquire, before QBUF) with `LIBVA_V4L2_ZERO_CAPTURE=1`:
| Run | Hash | Frame 1 non-zero | Frame 2 non-zero | Frame 3 non-zero |
|---|---|---|---|---|
| baseline (no pre-zero) | `71ac099b…` | 512 | 32 | 16 |
| pre-zero (LIBVA_V4L2_ZERO_CAPTURE=1) | `71ac099b…` (identical) | 512 | 32 | 16 |
**The 16×32 patch is NOT stale residue.** The kernel deterministically writes those 512 bytes regardless of pre-state. Same goes for the 32/16-byte stride-4 UV-plane markers in frames 2/3.
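The gate itself is small. The following is a minimal sketch under stated assumptions rather than the code in `66ecbef`: `gate_enabled` and `zero_capture_if_enabled` are hypothetical names, and treating only the exact value `1` as "enabled" is an assumption about how the environment variable is parsed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed semantics: the gate is on only when the variable equals "1". */
static int gate_enabled(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

/* Zero a CAPTURE slot's mapping before it is queued, if the gate is on.
 * `map` and `len` describe the mmap'd region of the slot. */
static void zero_capture_if_enabled(void *map, size_t len, const char *gate)
{
    assert(map != NULL);
    if (gate_enabled(gate))
        memset(map, 0, len);
}
```

A caller would pass `getenv("LIBVA_V4L2_ZERO_CAPTURE")` (from `<stdlib.h>`) as `gate`, after acquiring the slot and before QBUF. If the kernel overwrote nothing, the pre-zeroed run would read back all zeros; identical hashes in both runs mean the non-zero bytes are written by the kernel on every decode.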
## Definitive root cause classification
The bug is confirmed kernel-side. **rkvdec accepts libva's H.264 decode request and acknowledges it via MEDIA_REQUEST_IOC_QUEUE/DQBUF without error flags, but writes only a tiny structured patch and stops.**
This rules out two rows of Phase 4's outcome table:
- ~~Plane[0] fully populated → libva mis-reads~~ — eliminated (γ shows only 256 non-zero in scan).
- ~~Plane[0] populated with other content → slot binding wrong~~ — eliminated (γ shows the buffer holds exactly what YUV output displays).
Confirmed:
- **Plane[0] only 16×32 populated → kernel didn't decode properly** — the leading hypothesis. The 16-byte head and the row-wise repetition match a tiled output where the kernel wrote partial state.
The 0x80-class luma-neutral values in frame 1's 16×32 patch and the kernel's stride-4 UV markers tell us **rkvdec ran some initialization/scratch DMA but did NOT complete decoding the bitstream**. The kernel may be aborting after a control-validation failure or an internal state error that is not surfaced as `V4L2_BUF_FLAG_ERROR` on DQBUF.
## Comparison with kdirect
kdirect's S_EXT_CTRLS payloads (frame 1 SPS dump from Phase 3):
```
libva   SPS[0..32]: 4d 00 29 00 01 00 00 01 00 03 02 00 [zeros...]
kdirect SPS[0..32]: 4d 02 29 00 01 00 00 01 00 03 02 00 [zeros...]
                       ^^
                       constraint_set_flags differs
```
Both backends:
- profile_idc = 0x4D = 77 (Main).
- level_idc = 0x29 = 41 (Level 4.1).
- seq_parameter_set_id = 0.
- chroma_format_idc = 1 (4:2:0).
- bit_depth_*_minus8 = 0.
- log2_max_frame_num_minus4 = 1.
- pic_order_cnt_type = 0.
- log2_max_pic_order_cnt_lsb_minus4 = 3.
- max_num_ref_frames = 2.
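Trailing bytes (1040..1047) are identical in both: pic_width_in_mbs_minus1=119 (120 MBs, 1920 px), pic_height_in_map_units_minus1=67 (68 MBs, 1088 px), flags=0x50 (FRAME_MBS_ONLY|DIRECT_8X8_INFERENCE). The geometry agrees with the 2088960-byte (1920 × 1088) plane[0] size in the γ dump.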
**The only meaningful SPS diff is `constraint_set_flags`: 0x00 (libva) vs 0x02 (kdirect).**
`constraint_set1_flag` (bit 1) indicates that the stream obeys the Main profile constraints (H.264 Annex A.2.2). For a stream that already declares profile_idc 77 it is informational per the H.264 spec, but the kernel may be using it as a profile-detection hint or branch in its parser.
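The byte-level comparison can be reproduced mechanically. The sketch below is illustrative only: `first_diff_offset` is a hypothetical helper, and the two arrays are the twelve leading bytes copied from the Phase 3 dump in this section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the offset of the first differing byte, or -1 if the buffers match. */
static long first_diff_offset(const uint8_t *a, const uint8_t *b, size_t len)
{
    assert(a != NULL && b != NULL);
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i])
            return (long)i;
    }
    return -1;
}

/* Leading SPS control bytes as dumped for each backend (frame 1). */
static const uint8_t libva_sps_head[12] = {
    0x4d, 0x00, 0x29, 0x00, 0x01, 0x00, 0x00, 0x01, 0x00, 0x03, 0x02, 0x00
};
static const uint8_t kdirect_sps_head[12] = {
    0x4d, 0x02, 0x29, 0x00, 0x01, 0x00, 0x00, 0x01, 0x00, 0x03, 0x02, 0x00
};
```

`first_diff_offset(libva_sps_head, kdirect_sps_head, 12)` returns 1, the offset of `constraint_set_flags` in `struct v4l2_ctrl_h264_sps` (the second `__u8` field, directly after `profile_idc`).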
## Phase 4b plan: α-1 SPS constraint_set_flags fix
### Mechanism
In `h264.c::h264_set_controls`, after setting `sps.profile_idc`, derive `constraint_set_flags` based on the VAAPI profile:
```c
sps.profile_idc = h264_profile_to_idc(profile);
sps.constraint_set_flags = h264_constraint_set_flags(profile);
```
Mapping for typical streams (flag semantics per H.264 §7.4.2.1.1; in the V4L2 control, bit N carries `constraint_setN_flag`, and only bits 0..5 are defined):
- **Main** (77): `0x02` (`constraint_set1_flag`: the stream obeys the Main profile constraints).
- **ConstrainedBaseline** (66): `0x03` (`constraint_set0_flag` + `constraint_set1_flag`; Constrained Baseline is profile_idc 66 with `constraint_set1_flag` set).
- **Baseline** (66): `0x01` (`constraint_set0_flag`).
- **High** (100), **MultiviewHigh** (118), **StereoHigh** (128): `0x00` (no constraint flags set for typical content).
For BBB H.264 Main, this is 0x02 — matching what kdirect sends.
### LOC
~10 LOC: a new `h264_constraint_set_flags(profile)` helper in `h264.c` similar to existing `h264_profile_to_idc`, plus 1 line in `h264_set_controls`.
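A sketch of what the helper could look like, under stated assumptions: `enum h264_profile` is a hypothetical stand-in for the VAProfile values the backend actually receives, and the two `CONSTRAINT_SET*_FLAG` macros are local copies of the values of `V4L2_H264_SPS_CONSTRAINT_SET0_FLAG` (0x01) and `V4L2_H264_SPS_CONSTRAINT_SET1_FLAG` (0x02) from `<linux/v4l2-controls.h>`. Because the V4L2 layout places `constraint_setN_flag` at bit N, Baseline and Constrained Baseline come out as 0x01 and 0x03 here; treat those two values as assumptions to confirm in the Phase 5b review. Main (0x02) and High (0x00) are the values this document relies on.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the V4L2 uAPI bit values (bit N = constraint_setN_flag). */
#define CONSTRAINT_SET0_FLAG 0x01u
#define CONSTRAINT_SET1_FLAG 0x02u

/* Hypothetical stand-in for the VAProfile values handled by the backend. */
enum h264_profile {
    H264_PROFILE_BASELINE,
    H264_PROFILE_CONSTRAINED_BASELINE,
    H264_PROFILE_MAIN,
    H264_PROFILE_HIGH,
    H264_PROFILE_MULTIVIEW_HIGH,
    H264_PROFILE_STEREO_HIGH
};

/* Derive constraint_set_flags for typical content of each profile. */
static uint8_t h264_constraint_set_flags(enum h264_profile profile)
{
    switch (profile) {
    case H264_PROFILE_BASELINE:
        return CONSTRAINT_SET0_FLAG;
    case H264_PROFILE_CONSTRAINED_BASELINE:
        return CONSTRAINT_SET0_FLAG | CONSTRAINT_SET1_FLAG;
    case H264_PROFILE_MAIN:
        return CONSTRAINT_SET1_FLAG;
    case H264_PROFILE_HIGH:
    case H264_PROFILE_MULTIVIEW_HIGH:
    case H264_PROFILE_STEREO_HIGH:
        return 0;
    default:
        assert(!"unhandled H.264 profile");
        return 0;
    }
}
```

Keeping the mapping in one switch mirrors the shape of `h264_profile_to_idc` and makes the per-profile values easy to audit against the spec during review.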
### Expected outcome
Three possibilities:
1. **Best case**: libva H.264 starts producing correct YUV. Hash matches kdirect. iter8 PASS (criterion 1).
2. **Middle case**: the hash changes but the output is still a partial fill (512 bytes or some other non-zero count). constraint_set_flags was not the sole cause; further diffing is needed (DECODE_PARAMS fields).
3. **Worst case**: no change, same hash. Continue Phase 4b with the next α candidate.
### Phase 5b review requirement
Per CLAUDE.md "reviews are never skippable," a Phase 5b review of α-1 is required before Phase 6b implementation. Since the change is mechanical (10 LOC), the review is short — the reviewer should verify the H.264 spec mapping is correct and that the per-profile lookup doesn't break VP9/HEVC/MPEG-2/VP8 (it can't — it's H.264-specific).
## Phase 7 outcome summary for Phase 8 close
Iter8 Phase 7 has narrowed Bug 4 from "H.264 inter race-loss (vague)" to **"kernel writes only 16×32 luma-neutral patch + stride-4 UV scratch; libva readback is correct; libva sends SPS constraint_set_flags=0x00 where kdirect sends 0x02"**.
If α-1 (Phase 4b/6b) closes the bug:
- iter8 = clean PASS.
- Bug 4 closed.
- Memory rule worth recording: SPS constraint_set_flags is load-bearing for rkvdec H.264.
If α-1 doesn't close:
- iter8 = PARTIAL — Bug 4 narrowed but not fixed.
- Defer to iter9 with refined hypothesis (e.g., DECODE_PARAMS fields, or slice-data encoding).
## Tracking
| Artifact | Hash |
|---|---|
| h264_baseline.yuv | `71ac099b8d007836…` |
| h264_zeroed.yuv | `71ac099b8d007836…` |
| h264_gamma.yuv | `71ac099b8d007836…` |
| kdirect_h264.yuv | `1e7a0bc98d5bd83f…` (target, correct) |
Backend SHA `e4649a48…` carries γ + IMP-1 + stdlib include. Phase 4b α-1 commit will land next.