diff --git a/phase3_iter11_findings.md b/phase3_iter11_findings.md new file mode 100644 index 0000000..a208b3b --- /dev/null +++ b/phase3_iter11_findings.md @@ -0,0 +1,104 @@ +# Iteration 11 — Phase 3 (HEVC libva vs kdirect deep strace) + +Captured 2026-05-14 on fresnel (bare hostname, back on LAN). Backend SHA `a17e3c39…` (iter9 close). Fixture `bbb_720p10s_hevc.mp4` 3-frame sweep. + +## Anchors + +| Path | Hash | +|---|---| +| libva HEVC | `06b2c5a0c01e515d…` (all-zero, Bug 5 reproduced) | +| kdirect HEVC | `9340b832e08c23c4d4b2f732cecffb1d4104f2fc3257e561fac217c2d70d84c2` (correct decoded BBB) | + +## HEVC control payload comparison — frame 1 + +Per-request `S_EXT_CTRLS` (ctrl_class=0xf010000): + +| Control | libva size | kdirect size | Diff? | +|---|---|---|---| +| HEVC_SPS (0xa40a90) | 40 | 40 | **DIFF at bytes 10-11** | +| HEVC_PPS (0xa40a91) | 64 | 64 | **DIFF at bytes 56-57** | +| HEVC_SLICE_PARAMS (0xa40a92) | 280 | 280 | **DIFF at bytes 0-3 + 8-11 + 22** | +| HEVC_SCALING_MATRIX (0xa40a93) | 1000 | 1000 | identical (all-zero) | +| HEVC_DECODE_PARAMS (0xa40a94) | 328 | 328 | DIFF at tail (poc / flags) | + +### Diff 1 — SPS bytes 10-11 + +``` +libva bytes 0..15: 00 00 00 05 d0 02 00 00 04 04 [00 00] 01 01 00 03 +kdirect bytes 0..15: 00 00 00 05 d0 02 00 00 04 04 [02 04] 01 01 00 03 +``` + +Field mapping per `struct v4l2_ctrl_hevc_sps`: +- byte 10: `sps_max_num_reorder_pics` (u8) + - libva: 0 + - kdirect: 2 +- byte 11: `sps_max_latency_increase_plus1` (u8) + - libva: 0 + - kdirect: 4 + +**Highly suspect**. `sps_max_num_reorder_pics = 0` tells the decoder "no reordering needed in this stream" — but BBB HEVC has B-frames in IBBP order. The kernel may refuse to set up reorder buffers, producing zero output for B-frames. + +### Diff 2 — PPS bytes 56-63 (flags u64) + +``` +libva PPS flags: 0x0000_0000_0000_7040 +kdirect PPS flags: 0x0000_0000_0010_7040 +``` + +Diff = bit 20 = `V4L2_HEVC_PPS_FLAG_UNIFORM_SPACING` (set in kdirect, missing in libva). UNIFORM_SPACING is meaningful only when `num_tile_columns_minus1 > 0 || num_tile_rows_minus1 > 0`. For non-tiled streams it's don't-care. BBB HEVC may or may not use tiles — check `num_entry_point_offsets` in slice (Diff 3). + +### Diff 3 — SLICE_PARAMS bytes 0-31 + +``` +libva head32: d0d30b00 2d000000 00000000 14010200 00000000 0000ff00 07000000 00000000 +kdirect head32: 80d20b00 2d000000 16000000 14010200 00000000 00000000 07000000 00000000 + ^^^^^^^^ ^^^^^^^^ ^^ + bit_size num_entry_pt byte 22 +``` + +- bytes 0-3 `bit_size` u32 LE: libva=0x000bd3d0 (774096), kdirect=0x000bd280 (773760). Diff = 336 bits ≈ 42 bytes. Small but present. +- bytes 8-11 `num_entry_point_offsets` u32 LE: **libva=0, kdirect=22**. BBB HEVC slice IS tiled / wavefront-parallel — kdirect submits 22 entry points; libva hardcodes 0. +- byte 22: libva=0xff, kdirect=0x00. Per struct layout this is `slice_qp_delta` (s8): libva=-1, kdirect=0. + +### libva source code — where the zeros come from + +`src/h265.c::h265_fill_sps` lines 110-111: + +```c +sps->sps_max_num_reorder_pics = 0; /* not exposed */ +sps->sps_max_latency_increase_plus1 = 0; /* not exposed */ +``` + +VAAPI's `VAPictureParameterBufferHEVC` does not include these fields. Libva fills 0 by default. + +`src/h265.c::h265_fill_slice_params` line 356: + +```c +slice_params->num_entry_point_offsets = 0; /* iter2 doesn't do tiles */ +``` + +Hardcoded 0 with a comment from iter2 acknowledging the scope-cut. + +## Conclusion + +Bug 5 is caused by **at least one** of: +- SPS `sps_max_num_reorder_pics` hardcoded 0 (vs kdirect's 2 from SPS-NAL parse). Strongest suspect — directly blocks B-frame reordering. +- SLICE_PARAMS `num_entry_point_offsets` hardcoded 0 (vs kdirect's 22). Affects tile/WPP parallel decode. +- PPS UNIFORM_SPACING flag missing. Don't-care for non-tiled but BBB does use tiles per Diff 3. + +The classes are analogous to iter8's H.264 `constraint_set_flags` situation: VAAPI doesn't forward the field, libva fills a placeholder, the kernel cares. + +## Phase 4 plan (preview) + +Single iter11 commit, 5-10 LOC change in h265.c: +- Set `sps->sps_max_num_reorder_pics` from `picture->sps_max_dec_pic_buffering_minus1` (safe upper bound). +- Leave `sps->sps_max_latency_increase_plus1 = 0` (spec default: "no constraint"). + +Secondary attempts if first doesn't fix: +- Set PPS UNIFORM_SPACING flag when tiles are used (`num_tile_columns_minus1 > 0 || num_tile_rows_minus1 > 0`). +- Parse slice header for `num_entry_point_offsets` (analogous to h264_slice_header.c). Costlier; defer to Phase 4b if needed. + +Phase 5 review will scrutinize: +- Whether `sps_max_num_reorder_pics = sps_max_dec_pic_buffering_minus1` is safe vs producing too many reorder pictures. +- Whether reaching for the bitstream's true value (SPS NAL parse) is needed for correctness. +- That the change doesn't regress VP9 / VP8 / H.264 / MPEG-2. diff --git a/phase4_iter11_plan.md b/phase4_iter11_plan.md new file mode 100644 index 0000000..d537635 --- /dev/null +++ b/phase4_iter11_plan.md @@ -0,0 +1,90 @@ +# Iteration 11 — Phase 4 plan + +Drafted 2026-05-14 after Phase 3 narrowed Bug 5 to three concrete wire-byte diffs (SPS bytes 10-11, PPS flags bit 20, SLICE_PARAMS bytes 0-3 + 8-11 + 22). + +## Mechanism + +`src/h265.c::h265_fill_sps` lines 110-111 hardcode: +```c +sps->sps_max_num_reorder_pics = 0; /* not exposed */ +sps->sps_max_latency_increase_plus1 = 0; /* not exposed */ +``` + +The kernel uses `sps_max_num_reorder_pics` to size reorder buffers / allow B-frame decode reordering. `= 0` blocks reordering for streams that need it (BBB HEVC has B-frames in IBBP). kdirect derives the real value (2 for BBB) by parsing the SPS NAL directly; libva can't see the SPS NAL since VAAPI consumes it client-side. + +Spec constraint: `sps_max_num_reorder_pics[i] ≤ sps_max_dec_pic_buffering_minus1[i]` (allowed reorder ≤ DPB capacity). Using `dec_pic_buffering_minus1` as the value is a safe upper bound — allows full DPB-depth reordering, which is always sufficient and never violates the spec inequality. + +For `sps_max_latency_increase_plus1 = 0`: per spec, 0 means "SpsMaxLatencyPictures is not specified" (i.e., no constraint). Leaving at 0 is correct. + +## Proposed fix (α-13) + +Two-line change in `h265_fill_sps`: + +```c +/* + * iter11 α-13 (Bug 5 fix): VAAPI's VAPictureParameterBufferHEVC does + * not expose sps_max_num_reorder_pics. Hardcoding 0 told rkvdec "no + * reordering needed" and blocked B-frame decode — observed empirically + * as all-zero CAPTURE output through libva for BBB HEVC (B-frames in + * IBBP pattern). kdirect parses the SPS NAL directly and submits the + * stream's actual value (2 for BBB level 4.x Main). + * + * Use sps_max_dec_pic_buffering_minus1 as a safe upper bound: per + * H.265 §A.4.2, sps_max_num_reorder_pics ≤ sps_max_dec_pic_buffering_minus1 + * always holds, so substituting the latter is conformant and gives the + * kernel enough budget to reorder up to the DPB depth. May allocate + * slightly more reorder slots than the bitstream's tight constraint + * (kdirect's 2 vs our 4 for BBB), but kernel-side allocation cost is + * negligible for typical content. + * + * sps_max_latency_increase_plus1 stays at 0 — spec default meaning + * "SpsMaxLatencyPictures unconstrained" (§A.4.2). Setting kdirect's + * value of 4 (latency_increase = 3) would be more restrictive, not + * more correct. + */ +sps->sps_max_num_reorder_pics = picture->sps_max_dec_pic_buffering_minus1; +sps->sps_max_latency_increase_plus1 = 0; +``` + +LOC: ~2 functional + ~20 comment. + +## Scope + +In scope: `src/h265.c::h265_fill_sps`. + +Out of scope: +- `num_entry_point_offsets` (slice-header field; needs full slice header parse). +- PPS `UNIFORM_SPACING` flag (don't-care for typical content). +- VP9 / VP8 / H.264 / MPEG-2 paths (HEVC-specific change by construction). + +## Phase 5 review concerns + +- Is `sps_max_num_reorder_pics = sps_max_dec_pic_buffering_minus1` safe? Reviewer should verify the spec inequality. +- Could it produce wrong decode for streams where actual reorder < dec_pic_buf? Reviewer should consider whether over-reporting reorder causes any kernel-side issue. +- Per `feedback_unconditional_codec_state.md`: HEVC fill is gated by VAProfileHEVCMain dispatch in picture.c, so no risk to non-HEVC codecs. +- Per `feedback_wire_vs_behavior.md`: Phase 7 must verify criterion-1 hash match, not just wire-byte diff vs kdirect. + +## Phase 7 verification matrix + +1. Build, install on fresnel. +2. Run libva HEVC sweep. Compare hash against kdirect's `9340b832…`. +3. Re-strace to confirm SPS byte 10 now = `sps_max_dec_pic_buffering_minus1` value. +4. Run 5-codec regression sweep. H.264 / VP9 / MPEG-2 / VP8 anchors must hold. + +Iter11 PASS = libva_hevc.yuv == kdirect_hevc.yuv + zero regression. + +## Risks + +- **R-1**: Setting sps_max_num_reorder_pics to dec_pic_buf may over-report and confuse the kernel. Probability: low (spec inequality always holds; kernel uses field as upper bound). Mitigation: rollback is 1 line. +- **R-2**: The `num_entry_point_offsets = 0` issue is the actual cause. α-13 then doesn't fix Bug 5. Probability: medium. Mitigation: Phase 7 hash mismatch → Phase 4b parses slice header for entry points. +- **R-3**: PPS UNIFORM_SPACING / other byte diff is the actual cause. Probability: low (UNIFORM_SPACING is don't-care for non-tiled; BBB use of tiles surfaced via num_entry_point_offsets — see R-2). +- **R-4**: Regression on VP9/MPEG-2/H.264/VP8. Zero by construction (HEVC-only change path). + +## Predicted iter11 cadence + +- Phase 5b review: 15-20 min. +- Phase 6: 10 min. +- Phase 7: 15 min. +- Phase 8: 10 min. + +Total: ~1 hour wallclock if α-13 lands fix.