iter8 Phase 7 (γ + IMP-1): root cause confirmed kernel-side
γ dump confirms libva reads the buffer correctly; the 16x32 patch and stride-4 UV markers appear in the YUV output exactly as in the dump.

IMP-1 memset-before-QBUF test: pre-zeroing the buffer does NOT change the output (identical hash). The 512 bytes ARE deterministic kernel writes, not stale residue.

Bug root cause: rkvdec accepts libva's H.264 decode request without error flags but writes only 16x32 of luma-neutral data + stride-4 UV scratch. The kernel decoded a tiny bit, then stopped.

Phase 3 SPS diff: libva SPS.constraint_set_flags=0x00 vs kdirect's 0x02, likely the kernel hint that triggers rkvdec's full decode path for Main profile.

Phase 4b α-1 fix: derive constraint_set_flags per VAProfile in h264_set_controls. ~10 LOC. Phase 5b review required.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,132 @@
# Iteration 8: Phase 7 (γ + IMP-1 verification)

Captured 2026-05-13 on fresnel via `fresnel.vpn`. Backend SHA `e4649a48…` (iter8 P6 + IMP-1 experiment + stdlib include fix). Three commits on the fork: `7eae6ea` (γ dump), `66ecbef` (memset gate), `6f4e583` (stdlib.h fix-forward).
## What γ revealed

Running the libva H.264 sweep with `LIBVA_V4L2_DUMP_CAPTURE=1`:

- **Slot v4l2_index=0** (surface 67108864, first decode): plane[0] sz=2088960, scan=30720 (one MB row, 1920 × 16 bytes), **non_zero=256**. Head bytes: `81 81 80 80 80 7f 7f 7f 7f 7f 7f 80 80 80 81 81 00 00 ... 00`.
- **Slots v4l2_index=1..9** (subsequent decodes): plane[0] **non_zero=0**. Head bytes: all 0.
- **Slot v4l2_index=10** (cap_pool wraps, surface 67108864 reused): plane[0] non_zero=256, head: `80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 80 00 00...`.
- **All slots, plane[1] (UV)**: non_zero in the 16-32 byte range, with a **stride-4 pattern** `01 00 00 00 00 00 00 00 01 00 00 00 ...` that looks like kernel scratch markers (frame indices, ref counts, or similar).
- **Plane[0] tail bytes**: all zero across all slots.

Key conclusion: **what γ shows in destination_data[] is EXACTLY what appears in the libva-derived YUV output**. The hash of the dump-on YUV (`71ac099b…`) is identical to the baseline. Libva is reading the buffer correctly.
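The `non_zero` and `scan` figures above can be reproduced with a very small counter. The following is a minimal sketch, not the backend's actual dump code (which is not shown here); the function names `count_nonzero` and `mb_row_scan_len` are hypothetical, and the 16-line scan window is an assumption taken from the "one MB row" note.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the bytes that are not zero in the first `len` bytes of a plane. */
static size_t count_nonzero(const uint8_t *plane, size_t len)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++)
        n += (plane[i] != 0);
    return n;
}

/* Length of a scan window covering one macroblock row:
 * 16 luma lines of `stride` bytes each (assumed window size). */
static size_t mb_row_scan_len(size_t stride)
{
    return stride * 16;
}
```

With a 1920-byte stride, `mb_row_scan_len(1920)` gives the 30720-byte window quoted for plane[0]; passing the mapped plane pointer and that length to `count_nonzero` would yield the `non_zero` figure.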
## What IMP-1 (memset-before-QBUF) revealed

Pre-zeroing each CAPTURE slot's mmap region in BeginPicture (after cap_pool_acquire, before QBUF) with `LIBVA_V4L2_ZERO_CAPTURE=1`:

| Run | Hash | Frame 1 non-zero bytes | Frame 2 non-zero bytes | Frame 3 non-zero bytes |
|---|---|---|---|---|
| baseline (no pre-zero) | `71ac099b…` | 512 | 32 | 16 |
| pre-zero (`LIBVA_V4L2_ZERO_CAPTURE=1`) | `71ac099b…` (identical) | 512 | 32 | 16 |

**The 16×32 patch is NOT stale residue.** The kernel deterministically writes those 512 bytes regardless of the buffer's prior state. The same goes for the 32/16-byte stride-4 UV-plane markers in frames 2/3.
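For reference, an environment-gated pre-zero of this kind can be sketched as below. This is an illustration under assumptions, not the code from commit `66ecbef`: the helper names are hypothetical, and treating only the exact value `1` as "enabled" is a guess at how the gate is parsed.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Treat the gate as enabled only when the variable is set to exactly "1"
 * (assumed parsing rule). */
static int env_flag_enabled(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

/* Zero a mapped CAPTURE plane when LIBVA_V4L2_ZERO_CAPTURE=1.
 * Returns 1 if the buffer was zeroed, 0 if the gate was off. */
static int zero_capture_if_enabled(void *map, size_t len)
{
    if (!env_flag_enabled(getenv("LIBVA_V4L2_ZERO_CAPTURE")))
        return 0;
    memset(map, 0, len);
    return 1;
}
```

In the experiment described above, the call would sit between acquiring the slot and queueing it, so any bytes present after DQBUF must have been written by the kernel.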
## Definitive root cause classification

The bug is confirmed kernel-side. **rkvdec accepts libva's H.264 decode request and acknowledges it via MEDIA_REQUEST_IOC_QUEUE/DQBUF without error flags, but writes only a tiny structured patch and stops.**

This rules out the following rows from Phase 4's outcome table:

- ~~Plane[0] fully populated → libva mis-reads~~: eliminated (γ shows only 256 non-zero bytes in the scan window).
- ~~Plane[0] populated with other content → slot binding wrong~~: eliminated (γ shows the buffer holds exactly what the YUV output displays).

Confirmed:

- **Plane[0] only 16×32 populated → kernel didn't decode properly**: the leading hypothesis. The 16-byte head and the row-wise repetition match a tiled output where the kernel wrote partial state.

The 0x80-class luma-neutral values in frame 1's 16×32 patch and the kernel's stride-4 UV markers tell us **rkvdec ran some initialization/scratch DMA but did NOT complete decoding the bitstream**. The kernel may be aborting after a control-validation failure or an internal state error that is not surfaced as `V4L2_BUF_FLAG_ERROR` on DQBUF.
## Comparison with kdirect

S_EXT_CTRLS payloads from both backends (frame 1 SPS dump from Phase 3):

```
libva SPS[0..32]:   4d 00 29 00 01 00 00 01 00 03 02 00 [zeros...]
kdirect SPS[0..32]: 4d 02 29 00 01 00 00 01 00 03 02 00 [zeros...]
                       ^^
                       constraint_set_flags differs
```

Both backends:

- profile_idc = 0x4D = 77 (Main).
- level_idc = 0x29 = 41 (Level 4.1).
- seq_parameter_set_id = 0.
- chroma_format_idc = 1 (4:2:0).
- bit_depth_*_minus8 = 0.
- log2_max_frame_num_minus4 = 1.
- pic_order_cnt_type = 0.
- log2_max_pic_order_cnt_lsb_minus4 = 3.
- max_num_ref_frames = 2.

Trailing bytes (1040..1047): identical in both. pic_width_in_mbs_minus1=119 and pic_height_in_map_units_minus1=67 (120 × 68 macroblocks, i.e. 1920 × 1088), flags=0x50 (FRAME_MBS_ONLY|DIRECT_8X8_INFERENCE).

**The only meaningful SPS diff is `constraint_set_flags`: 0x00 (libva) vs 0x02 (kdirect).**

`constraint_set1_flag` (bit 1, `0x02`) signals that the stream obeys the Main profile constraints (H.264 Annex A.2.2). For a stream whose profile_idc is already 77 this is informational per the H.264 spec, but the kernel may be using it as a profile-detection hint or as a branch in its parser.
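The byte-level comparison behind this finding is easy to repeat on any pair of control payloads. Below is a minimal sketch of such a diff; `diff_payloads` is a hypothetical helper, and the two arrays hold only the first 12 bytes quoted in the dump above rather than the full control structure.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Print every offset where the two payloads differ; return the count. */
static size_t diff_payloads(const uint8_t *a, const uint8_t *b, size_t len)
{
    size_t diffs = 0;

    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            printf("offset %zu: 0x%02x vs 0x%02x\n",
                   i, (unsigned)a[i], (unsigned)b[i]);
            diffs++;
        }
    }
    return diffs;
}

/* First 12 SPS payload bytes as quoted in the Phase 3 dump. */
static const uint8_t libva_sps_head[12] = {
    0x4d, 0x00, 0x29, 0x00, 0x01, 0x00, 0x00, 0x01, 0x00, 0x03, 0x02, 0x00,
};
static const uint8_t kdirect_sps_head[12] = {
    0x4d, 0x02, 0x29, 0x00, 0x01, 0x00, 0x00, 0x01, 0x00, 0x03, 0x02, 0x00,
};
```

Calling `diff_payloads(libva_sps_head, kdirect_sps_head, 12)` reports a single difference at offset 1, the `constraint_set_flags` byte.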
## Phase 4b plan: α-1 SPS constraint_set_flags fix

### Mechanism

In `h264.c::h264_set_controls`, after setting `sps.profile_idc`, derive `constraint_set_flags` from the VA-API profile:

```c
/* Existing: map the VAProfile to the H.264 profile_idc. */
sps.profile_idc = h264_profile_to_idc(profile);
/* New (α-1): derive the constraint flags from the same profile. */
sps.constraint_set_flags = h264_constraint_set_flags(profile);
```
Mapping (per the H.264 spec for typical content), using the V4L2 bit layout in which `constraint_setN_flag` is bit N (`V4L2_H264_SPS_CONSTRAINT_SET0_FLAG` = `0x01` through `V4L2_H264_SPS_CONSTRAINT_SET5_FLAG` = `0x20`):

- **Main** (77): `0x02` (constraint_set1_flag: the stream obeys the Main profile constraints).
- **Baseline / ConstrainedBaseline** (66): Constrained Baseline is profile_idc 66 with constraint_set1_flag set, typically alongside constraint_set0_flag, so `0x03`; plain Baseline typically sets only constraint_set0_flag, so `0x01`. H.264 defines constraint_set0_flag through constraint_set5_flag only, so there is no constraint_set6_flag to set.
- **High** (100): `0x00` (no constraint flags set for typical content).
- **MultiviewHigh** (118), **StereoHigh** (128): `0x00`.

Simplest correct mapping per H.264 §7.4.2.1.1 for typical streams:

- Main → 0x02
- ConstrainedBaseline → 0x03
- Baseline → 0x01
- High / MultiviewHigh / StereoHigh → 0x00

For BBB H.264 Main, this is 0x02, matching what kdirect sends.
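A per-profile lookup of this shape could look like the sketch below. It is an illustration only: the enum is a hypothetical stand-in for the `VAProfile` values so the block compiles without `va/va.h`, the flag macros mirror the V4L2 bit layout (constraint_setN_flag in bit N) but are defined locally, and the Baseline entries are assumptions about typical streams that the Phase 5b review would need to confirm. Only the Main value (`0x02`) is backed by the kdirect dump.

```c
#include <stdint.h>

/* Local copies of the V4L2 bit positions: constraint_setN_flag is bit N. */
#define SKETCH_CONSTRAINT_SET0_FLAG 0x01u
#define SKETCH_CONSTRAINT_SET1_FLAG 0x02u

/* Hypothetical stand-in for the VAProfile values the backend handles. */
enum sketch_h264_profile {
    SKETCH_H264_BASELINE,
    SKETCH_H264_CONSTRAINED_BASELINE,
    SKETCH_H264_MAIN,
    SKETCH_H264_HIGH,
    SKETCH_H264_MULTIVIEW_HIGH,
    SKETCH_H264_STEREO_HIGH,
};

/* Derive SPS constraint_set_flags from the profile. */
static uint8_t sketch_constraint_set_flags(enum sketch_h264_profile profile)
{
    switch (profile) {
    case SKETCH_H264_MAIN:
        /* Matches the 0x02 that kdirect sends for Main. */
        return SKETCH_CONSTRAINT_SET1_FLAG;
    case SKETCH_H264_CONSTRAINED_BASELINE:
        /* Assumed typical value: set0 + set1. */
        return SKETCH_CONSTRAINT_SET0_FLAG | SKETCH_CONSTRAINT_SET1_FLAG;
    case SKETCH_H264_BASELINE:
        /* Assumed typical value: set0 only. */
        return SKETCH_CONSTRAINT_SET0_FLAG;
    default:
        /* High, MultiviewHigh, StereoHigh: no constraint flags. */
        return 0;
    }
}
```

A `switch` with a zero default keeps the helper safe for any profile added later: an unknown profile falls back to today's behaviour (`0x00`) rather than sending an unreviewed flag to the kernel.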
### LOC

~10 LOC: a new `h264_constraint_set_flags(profile)` helper in `h264.c`, similar to the existing `h264_profile_to_idc`, plus 1 line in `h264_set_controls`.
### Expected outcome

Three possibilities:

1. **Best case**: libva H.264 starts producing correct YUV. Hash matches kdirect. iter8 PASS (criterion 1).
2. **Middle case**: libva H.264 still shows the 512-byte partial fill or a different non-zero count. constraint_set_flags wasn't the cause; further diffing is needed (DECODE_PARAMS fields).
3. **Worst case**: no change. Same hash. Continue Phase 4b with the next α candidate.
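The three cases above can be told apart mechanically from the output hash alone. The sketch below is one hedged way to encode that triage; the enum and function names are hypothetical, and it simplifies the middle case to "output changed but does not match kdirect", leaving the non-zero count to be inspected separately.

```c
#include <string.h>

enum alpha1_outcome {
    ALPHA1_BEST,    /* hash matches kdirect: decode is correct */
    ALPHA1_MIDDLE,  /* output changed, but is still not correct */
    ALPHA1_WORST,   /* hash identical to the pre-fix baseline */
};

/* Classify an α-1 run by comparing its YUV hash to the two known hashes. */
static enum alpha1_outcome classify_alpha1(const char *run_hash,
                                           const char *kdirect_hash,
                                           const char *baseline_hash)
{
    if (strcmp(run_hash, kdirect_hash) == 0)
        return ALPHA1_BEST;
    if (strcmp(run_hash, baseline_hash) == 0)
        return ALPHA1_WORST;
    return ALPHA1_MIDDLE;
}
```

The kdirect comparison runs first so that a correct decode is never misreported, even in the unlikely event that the two reference hashes were passed in swapped or equal.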
### Phase 5b review requirement

Per CLAUDE.md ("reviews are never skippable"), a Phase 5b review of α-1 is required before Phase 6b implementation. Since the change is mechanical (~10 LOC), the review is short: the reviewer should verify that the H.264 spec mapping is correct and that the per-profile lookup doesn't break VP9/HEVC/MPEG-2/VP8 (it can't, because it is H.264-specific).
## Phase 7 outcome summary for Phase 8 close

Iter8 Phase 7 has narrowed Bug 4 from "H.264 inter race-loss (vague)" to **"kernel writes only 16×32 luma-neutral patch + stride-4 UV scratch; libva readback is correct; SPS constraint_set_flags is missing"**.

If α-1 (Phase 4b/6b) closes the bug:

- iter8 = clean PASS.
- Bug 4 closed.
- Memory rule worth recording: SPS constraint_set_flags is load-bearing for rkvdec H.264.

If α-1 doesn't close the bug:

- iter8 = PARTIAL: Bug 4 narrowed but not fixed.
- Defer to iter9 with a refined hypothesis (e.g., DECODE_PARAMS fields, or slice-data encoding).
## Tracking

| Artifact | Hash |
|---|---|
| h264_baseline.yuv | `71ac099b8d007836…` |
| h264_zeroed.yuv | `71ac099b8d007836…` |
| h264_gamma.yuv | `71ac099b8d007836…` |
| kdirect_h264.yuv | `1e7a0bc98d5bd83f…` (target, correct) |

Backend SHA `e4649a48…` carries γ + IMP-1 + the stdlib include fix. The Phase 4b α-1 commit will land next.