fresnel-fourier/phase4_iter11_plan.md
marfrit a18ba53d6b iter11 Phase 3 + 4: HEVC SPS wire-byte diff narrows Bug 5 to α-13
Phase 3 deep strace: only meaningful SPS diff is bytes 10-11.
  libva   bytes 10-11 = 00 00 (sps_max_num_reorder_pics=0, latency=0)
  kdirect bytes 10-11 = 02 04 (reorder=2, latency=4)

Hardcoded at h265.c:110-111 with comment "/* not exposed */". VAAPI's
VAPictureParameterBufferHEVC doesn't forward these; kdirect parses
SPS NAL directly. sps_max_num_reorder_pics = 0 tells rkvdec "no
reordering" -> B-frame decode blocked -> all-zero output (Bug 5 fits).

Secondary diffs (Phase 4b candidates if α-13 doesn't close):
  - SLICE_PARAMS num_entry_point_offsets = 0 (hardcoded at h265.c:356
    with "iter2 doesn't do tiles" comment); kdirect submits 22.
  - PPS UNIFORM_SPACING flag bit 20 (don't-care for non-tiled).

Phase 4 α-13: ~2 LOC fix. Set sps_max_num_reorder_pics =
sps_max_dec_pic_buffering_minus1 (safe upper bound per H.265 §A.4.2).
Leave sps_max_latency_increase_plus1 = 0 (spec "unconstrained").

Phase 5b review required before Phase 6b implementation per
"reviews never skippable".

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-14 01:58:03 +00:00


Iteration 11 — Phase 4 plan

Drafted 2026-05-14 after Phase 3 narrowed Bug 5 to three concrete wire-byte diffs (SPS bytes 10-11, PPS flags bit 20, SLICE_PARAMS bytes 0-3 + 8-11 + 22).
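The wire-byte comparison behind these diffs can be reproduced off-target with a small helper. This is a minimal sketch, not the tooling used in Phase 3: `payload_diff` and `print_payload_diff` are hypothetical names, and the payloads are assumed to be two equal-length byte dumps of the same control (for example, the SPS payload captured from the libva path and from kdirect).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: collect the offsets at which two equal-length
 * control payloads differ. Stores at most max_out offsets in out and
 * returns the total number of differing bytes. */
size_t payload_diff(const uint8_t *a, const uint8_t *b, size_t len,
                    size_t *out, size_t max_out)
{
    size_t n = 0;

    assert(a && b && (out || max_out == 0));
    for (size_t i = 0; i < len; i++) {
        if (a[i] != b[i]) {
            if (n < max_out)
                out[n] = i;
            n++;
        }
    }
    return n;
}

/* Print one line per differing byte, in the "libva vs kdirect" order. */
void print_payload_diff(const char *name, const uint8_t *libva,
                        const uint8_t *kdirect, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (libva[i] != kdirect[i])
            printf("%s byte %zu: libva=%02x kdirect=%02x\n",
                   name, i, libva[i], kdirect[i]);
    }
}
```

Fed the two SPS dumps from Phase 3, the helper would be expected to report offsets 10 and 11 only.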

Mechanism

src/h265.c::h265_fill_sps lines 110-111 hardcode:

sps->sps_max_num_reorder_pics = 0;       /* not exposed */
sps->sps_max_latency_increase_plus1 = 0; /* not exposed */

The kernel uses sps_max_num_reorder_pics to size reorder buffers / allow B-frame decode reordering. A value of 0 blocks reordering for streams that need it (BBB HEVC has B-frames in an IBBP pattern). kdirect derives the real value (2 for BBB) by parsing the SPS NAL directly; the libva backend never sees the SPS NAL, because the VAAPI client parses it and passes on only the fields that VAPictureParameterBufferHEVC carries.
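The link between the two hardcoded fields and wire bytes 10-11 can be checked with a layout mirror. This is a sketch under an assumption: `sps_head_mirror` is a hypothetical local struct that copies the leading fields of the kernel's HEVC SPS control as I understand the UAPI header, and it should be verified against the header of the kernel actually running on fresnel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL mirror of the first 12 bytes of the HEVC SPS control
 * payload. ASSUMPTION: field order and widths match the kernel UAPI
 * struct; check the target kernel's header before relying on this. */
struct sps_head_mirror {
    uint8_t  video_parameter_set_id;            /* byte 0     */
    uint8_t  seq_parameter_set_id;              /* byte 1     */
    uint16_t pic_width_in_luma_samples;         /* bytes 2-3  */
    uint16_t pic_height_in_luma_samples;        /* bytes 4-5  */
    uint8_t  bit_depth_luma_minus8;             /* byte 6     */
    uint8_t  bit_depth_chroma_minus8;           /* byte 7     */
    uint8_t  log2_max_pic_order_cnt_lsb_minus4; /* byte 8     */
    uint8_t  sps_max_dec_pic_buffering_minus1;  /* byte 9     */
    uint8_t  sps_max_num_reorder_pics;          /* byte 10    */
    uint8_t  sps_max_latency_increase_plus1;    /* byte 11    */
};

/* Compile-time checks: the two diffing bytes are the two hardcoded fields. */
static_assert(offsetof(struct sps_head_mirror, sps_max_num_reorder_pics) == 10,
              "reorder field expected at wire byte 10");
static_assert(offsetof(struct sps_head_mirror, sps_max_latency_increase_plus1) == 11,
              "latency field expected at wire byte 11");
```

If the assumed layout holds, byte 9 is the DPB field that α-13 copies into byte 10.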

Spec constraint: sps_max_num_reorder_pics[i] ≤ sps_max_dec_pic_buffering_minus1[i] (allowed reorder ≤ DPB capacity). Using sps_max_dec_pic_buffering_minus1 as the value is therefore a safe upper bound: it allows full DPB-depth reordering, which is always sufficient, and it never violates the spec inequality.

For sps_max_latency_increase_plus1 = 0: per spec, 0 means "SpsMaxLatencyPictures is not specified" (i.e., no constraint). Leaving at 0 is correct.
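The two spec rules above can be written down as a pair of small checks. This is a minimal sketch with hypothetical function names; the latency formula is the H.265 derivation SpsMaxLatencyPictures = sps_max_num_reorder_pics + sps_max_latency_increase_plus1 - 1, and the -1 return value for "no limit expressed" is a convention of this sketch, not of the spec.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Spec range rule: reorder depth must not exceed DPB capacity minus 1. */
bool sps_reorder_in_range(uint8_t max_num_reorder_pics,
                          uint8_t max_dec_pic_buffering_minus1)
{
    return max_num_reorder_pics <= max_dec_pic_buffering_minus1;
}

/* Derived latency limit. Returns -1 (sketch convention) when
 * latency_increase_plus1 is 0, i.e. no limit is expressed. */
int sps_max_latency_pictures(uint8_t max_num_reorder_pics,
                             uint32_t latency_increase_plus1)
{
    if (latency_increase_plus1 == 0)
        return -1;
    return (int)max_num_reorder_pics + (int)latency_increase_plus1 - 1;
}

/* Usage example with the BBB values quoted in this plan
 * (DPB minus1 = 4; kdirect reorder = 2, latency plus1 = 4). */
void check_bbb_values(void)
{
    assert(sps_reorder_in_range(2, 4));            /* kdirect's value   */
    assert(sps_reorder_in_range(4, 4));            /* α-13 upper bound  */
    assert(sps_max_latency_pictures(2, 4) == 5);   /* kdirect: limit 5  */
    assert(sps_max_latency_pictures(4, 0) == -1);  /* α-13: no limit    */
}
```

With kdirect's bytes the stream expresses a latency limit of 5 pictures; with α-13 no limit is expressed, which is the less restrictive of the two.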

Proposed fix (α-13)

Two-line change in h265_fill_sps:

/*
 * iter11 α-13 (Bug 5 fix): VAAPI's VAPictureParameterBufferHEVC does
 * not expose sps_max_num_reorder_pics. Hardcoding 0 told rkvdec "no
 * reordering needed" and blocked B-frame decode — observed empirically
 * as all-zero CAPTURE output through libva for BBB HEVC (B-frames in
 * IBBP pattern). kdirect parses the SPS NAL directly and submits the
 * stream's actual value (2 for BBB level 4.x Main).
 *
 * Use sps_max_dec_pic_buffering_minus1 as a safe upper bound: per
 * the SPS semantics in H.265 §7.4.3.2, sps_max_num_reorder_pics must
 * not exceed sps_max_dec_pic_buffering_minus1, so substituting the
 * latter is conformant and gives the
 * kernel enough budget to reorder up to the DPB depth. May allocate
 * slightly more reorder slots than the bitstream's tight constraint
 * (kdirect's 2 vs our 4 for BBB), but kernel-side allocation cost is
 * negligible for typical content.
 *
 * sps_max_latency_increase_plus1 stays at 0, which the spec defines as
 * "no latency limit expressed" (§7.4.3.2). Setting kdirect's
 * value of 4 (latency_increase = 3) would be more restrictive, not
 * more correct.
 */
sps->sps_max_num_reorder_pics = picture->sps_max_dec_pic_buffering_minus1;
sps->sps_max_latency_increase_plus1 = 0;

LOC: ~2 functional + ~20 comment.
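The change can be exercised off-target before it goes through Phase 6. The sketch below is not the real h265_fill_sps: `va_pic_mock`, `sps_ctrl_mock` and `fill_reorder_fields` are hypothetical stand-ins that carry only the fields α-13 touches, plus an assertion that the filled values satisfy the spec inequality.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL stand-in for the VAAPI picture parameters: only the one
 * field α-13 reads. */
struct va_pic_mock {
    uint8_t sps_max_dec_pic_buffering_minus1;
};

/* HYPOTHETICAL stand-in for the kernel SPS control: only the fields
 * relevant to α-13. */
struct sps_ctrl_mock {
    uint8_t sps_max_dec_pic_buffering_minus1;
    uint8_t sps_max_num_reorder_pics;
    uint8_t sps_max_latency_increase_plus1;
};

/* Mirrors the proposed two-line change on the mock structures. */
void fill_reorder_fields(struct sps_ctrl_mock *sps,
                         const struct va_pic_mock *picture)
{
    sps->sps_max_dec_pic_buffering_minus1 =
        picture->sps_max_dec_pic_buffering_minus1;

    /* α-13: report DPB depth as the reorder bound; leave latency unset. */
    sps->sps_max_num_reorder_pics = picture->sps_max_dec_pic_buffering_minus1;
    sps->sps_max_latency_increase_plus1 = 0;

    /* Spec inequality holds by construction; keep it as a guard. */
    assert(sps->sps_max_num_reorder_pics <=
           sps->sps_max_dec_pic_buffering_minus1);
}
```

For BBB (DPB minus1 = 4) this yields reorder = 4 and latency = 0, i.e. wire bytes 10-11 = 04 00.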

Scope

In scope: src/h265.c::h265_fill_sps.

Out of scope:

  • num_entry_point_offsets (slice-header field; needs full slice header parse).
  • PPS UNIFORM_SPACING flag (don't-care for non-tiled content).
  • VP9 / VP8 / H.264 / MPEG-2 paths (HEVC-specific change by construction).

Phase 5 review concerns

  • Is sps_max_num_reorder_pics = sps_max_dec_pic_buffering_minus1 safe? Reviewer should verify the spec inequality.
  • Could it produce wrong decode for streams where the actual reorder depth is below sps_max_dec_pic_buffering_minus1? Reviewer should consider whether over-reporting reorder causes any kernel-side issue.
  • Per feedback_unconditional_codec_state.md: HEVC fill is gated by VAProfileHEVCMain dispatch in picture.c, so no risk to non-HEVC codecs.
  • Per feedback_wire_vs_behavior.md: Phase 7 must verify criterion-1 hash match, not just wire-byte diff vs kdirect.

Phase 7 verification matrix

  1. Build, install on fresnel.
  2. Run libva HEVC sweep. Compare hash against kdirect's 9340b832….
  3. Re-strace to confirm SPS byte 10 now equals the sps_max_dec_pic_buffering_minus1 value (4 for BBB) and byte 11 stays 00.
  4. Run 5-codec regression sweep. H.264 / VP9 / MPEG-2 / VP8 anchors must hold.

Iter11 PASS = libva_hevc.yuv == kdirect_hevc.yuv + zero regression.
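The PASS condition and the Bug 5 symptom can both be checked without a hashing tool. The plan compares hashes; the sketch below swaps in a direct byte-for-byte comparison, which answers the same question. `files_identical` and `file_is_all_zero` are hypothetical helper names, and the file names in the usage note are the ones used in the PASS line above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 if both files have identical contents, 0 if they differ,
 * -1 if either file cannot be opened or read. */
int files_identical(const char *path_a, const char *path_b)
{
    assert(path_a && path_b);
    FILE *fa = fopen(path_a, "rb");
    FILE *fb = fopen(path_b, "rb");
    int result = -1;

    if (fa && fb) {
        unsigned char ba[4096], bb[4096];
        result = 1;
        for (;;) {
            size_t na = fread(ba, 1, sizeof ba, fa);
            size_t nb = fread(bb, 1, sizeof bb, fb);
            if (na != nb || memcmp(ba, bb, na) != 0) {
                result = 0;
                break;
            }
            if (na == 0)
                break; /* both files ended at the same offset */
        }
        if (ferror(fa) || ferror(fb))
            result = -1;
    }
    if (fa) fclose(fa);
    if (fb) fclose(fb);
    return result;
}

/* Returns 1 if every byte is zero (an empty file also counts),
 * 0 if any byte is non-zero, -1 on open or read error. */
int file_is_all_zero(const char *path)
{
    assert(path);
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    unsigned char buf[4096];
    int result = 1;
    size_t n;
    while (result == 1 && (n = fread(buf, 1, sizeof buf, f)) > 0) {
        for (size_t i = 0; i < n; i++) {
            if (buf[i] != 0) {
                result = 0;
                break;
            }
        }
    }
    if (ferror(f))
        result = -1;
    fclose(f);
    return result;
}
```

Usage: `files_identical("libva_hevc.yuv", "kdirect_hevc.yuv") == 1` corresponds to the PASS condition, and `file_is_all_zero("libva_hevc.yuv") == 1` would indicate that the Bug 5 symptom is still present.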

Risks

  • R-1: Setting sps_max_num_reorder_pics to sps_max_dec_pic_buffering_minus1 may over-report and confuse the kernel. Probability: low (spec inequality always holds; kernel uses field as upper bound). Mitigation: rollback is 1 line.
  • R-2: The num_entry_point_offsets = 0 issue is the actual cause. α-13 then doesn't fix Bug 5. Probability: medium. Mitigation: Phase 7 hash mismatch → Phase 4b parses slice header for entry points.
  • R-3: PPS UNIFORM_SPACING or another byte diff is the actual cause. Probability: low (UNIFORM_SPACING is don't-care for non-tiled streams; if BBB does use tiles, that surfaces through num_entry_point_offsets, see R-2).
  • R-4: Regression on VP9/MPEG-2/H.264/VP8. Zero by construction (HEVC-only change path).

Predicted iter11 cadence

  • Phase 5b review: 15-20 min.
  • Phase 6: 10 min.
  • Phase 7: 15 min.
  • Phase 8: 10 min.

Total: ~1 hour wallclock if α-13 lands the fix.