diff --git a/phase4c_iter8_plan.md b/phase4c_iter8_plan.md new file mode 100644 index 0000000..fdb28d6 --- /dev/null +++ b/phase4c_iter8_plan.md @@ -0,0 +1,117 @@ +# Iteration 8 — Phase 4c plan (α-2 POC strip audit) + +Drafted 2026-05-13 after Phase 5b CRIT-1 killed α-1 (rkvdec ignores `constraint_set_flags`). Phase 4c shifts to the reviewer's IMP-1 redirect: audit DPB entry construction. Empirical re-read of Phase 3 strace bytes pins the leading hypothesis to **POC sentinel strip**, not flags. + +## Strace re-decode for kdirect frame 2 dpb[0] + +``` +\370*\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0\0\0\0\0\1\0\0\0\1\0\3\0\0\0 +``` + +Struct field decode (32 bytes per `struct v4l2_h264_dpb_entry` from ``): + +| Offset | Field | Width | Hex | Value | Interpretation | +|---|---|---|---|---|---| +| 0–7 | reference_ts | u64 | f8 2a 00 00 00 00 00 00 | 0x2af8 = 10,968 | small relative ns | +| 8–11 | pic_num | u32 | 00 00 00 00 | 0 | | +| 12–13 | frame_num | u16 | 00 00 | 0 | | +| 14 | fields | u8 | 03 | V4L2_H264_FRAME_REF | | +| 15–19 | reserved | u8 [5] | 00 × 5 | — | | +| 20–23 | top_field_order_cnt | s32 | 00 00 01 00 | **0x00010000 = 65536** | **FFmpeg sentinel-encoded POC** | +| 24–27 | bottom_field_order_cnt | s32 | 00 00 01 00 | **0x00010000 = 65536** | **FFmpeg sentinel-encoded POC** | +| 28–31 | flags | u32 | 03 00 00 00 | 0x03 = VALID \| ACTIVE | | + +**libva sends** for the same reference frame: +``` +\310\224,\33\256\34\257\30\0\0\0\0\0\0\3\0\0\0\0\0\0\0\0\0\0\0\0\0\3\0\0\0 +``` +- top_field_order_cnt = 0 +- bottom_field_order_cnt = 0 +- flags = 0x03 (VALID|ACTIVE) — same as kdirect + +**The diff that matters: libva's POC fields are 0 (sentinel stripped); kdirect's are 65536 (sentinel preserved).** + +flags match between libva and kdirect (both 0x03). My earlier Phase 3 / Phase 5b interpretation of "kdirect flags = 0x30001" was incorrect — those bytes are tfoc + bfoc + flags, not just flags. + +## Why this could be the kernel-side bug + +Per rkvdec-h264.c lines 975-978 (config_registers): +```c +writel_relaxed(dpb[i].top_field_order_cnt, + rkvdec->regs + poc_reg_tbl_top_field[i]); +writel_relaxed(dpb[i].bottom_field_order_cnt, + rkvdec->regs + poc_reg_tbl_bottom_field[i]); +``` + +The driver writes POC values directly to MMIO registers. The hardware uses POC for motion-compensation reference picture lookup. If libva strips bit 16 and kdirect preserves it, the values sent to hardware differ by 65536 — a large numerical difference. + +For the current frame: +```c +reg = RKVDEC_CUR_POC(dec_params->top_field_order_cnt); +writel_relaxed(reg, rkvdec->regs + RKVDEC_REG_CUR_POC0); +``` + +Same direct MMIO write. RKVDEC_CUR_POC may mask the value (e.g., to 32-bit signed), but if the strip doesn't match what hardware expects, the comparison "is this reference frame's POC equal to current frame's POC delta" fails → wrong reference chosen → garbage motion compensation. + +## Why the strip was added + +h264.c:191-218 docblock says the strip was added for hantro's `prepare_table` which feeds tbl->poc[] directly: + +> Working VAAPI backends (intel-iHD, i965 verified empirically on meitner 2026-05-02) tolerate the high word — they either mask it or treat POCs as relative comparisons. V4L2 stateless H.264 driver-side consumers (hantro_h264.c::prepare_table feeds the value direct to tbl->poc[]) need the spec value, so we strip the sentinel here at the libva-v4l2-request boundary. + +So the strip was hantro-specific. **For rkvdec, kdirect (via ffmpeg-v4l2request) preserves the sentinel and decodes correctly.** Stripping for rkvdec may be the wrong move. + +## Proposed fix + +### α-2 — Remove POC sentinel strip OR make it driver-aware + +**Option A (simplest)**: Remove the strip entirely. Match what kdirect (ffmpeg-v4l2request) sends. If hantro+H.264 still works after, great; if it regresses, fall back to Option B. + +**Option B (per-driver gating)**: Add driver-detect logic. For rkvdec: don't strip. For hantro: strip. Requires knowing which driver is active (already known via `info.driver` from MEDIA_IOC_DEVICE_INFO at init). + +Hantro+H.264 is not in the campaign scoreboard (hantro-decoder doesn't support H.264 on RK3399 anyway, and the iter4 transitive path didn't exercise hantro+H.264 directly). So Option A is safe in practice. + +### LOC + +~5 LOC for Option A: change `h264_strip_ffmpeg_poc_sentinel` to return `poc` unchanged, OR delete the function and call sites. + +For Option B: ~20 LOC (driver-detect bool + per-driver branch). + +## Risks + +- **R-1**: POC strip removal breaks hantro+H.264 elsewhere (deferred backend that does decode H.264). Probability: irrelevant on RK3399 (hantro-decoder doesn't support H.264). Mitigation: Option B. +- **R-2**: The POC diff isn't the root cause; H.264 still fails. Probability: medium. Mitigation: Phase 7c γ dump shows full-plane writes or not. +- **R-3**: VP9 / HEVC / MPEG-2 / VP8 regression. Probability: zero. POC strip is H.264-specific (only in h264.c). + +## Scope + +In scope: `src/h264.c` — `h264_strip_ffmpeg_poc_sentinel` and its 4 call sites (h264.c:309, 312, 462, 465). + +Out of scope: VP9 / VP8 / HEVC / MPEG-2 paths. + +## Phase 5c review + +I'd like a quick review of this plan. The change is small (5 LOC for Option A). Reviewer should: +- Confirm Option A doesn't break anything else. +- Validate the strace re-decode of POC bytes is correct. +- Suggest whether to skip α-2's Phase 6c if confidence is high. + +If reviewer comes back with "looks good" — Phase 6c is a 5-LOC change + build + test. Very fast. + +## Phase 7c success criteria + +- C1: libva_h264 == kdirect_h264 (criterion 1 PASS). +- C2-C6: 5 other codecs unchanged. + +If γ dump confirms full-plane writes after fix → Bug 4 closed. + +## Predicted cadence + +- Phase 5c review: 5-10 min (quick). +- Phase 6c implementation: 5 min. +- Phase 7c verification: 10 min. +- Phase 8 close: 10 min. + +Total: 30 min if α-2 works. + +If α-2 doesn't work → iter8 PARTIAL close with Bug 4 narrowed further (POC isn't load-bearing either; iter9 candidate explores PPS/SLICE deeper diff).