diff --git a/phase3_final_close.md b/phase3_final_close.md
new file mode 100644
index 0000000..0e4f4e3
--- /dev/null
+++ b/phase3_final_close.md
@@ -0,0 +1,70 @@
+# Phase 3 FINAL close: 10 iterations, structural stall, PIVOT recommended
+
+Date: 2026-05-17 ~08:35. Phase 3 closes at first-light-impossible after 10 register-tuning iterations and 2 architect-review cycles.
+
+## What Janet's structural review (post phase3_close.md) added
+
+Janet's prescription: zero `reg197_cabactbl_base` and add a struct-size sanity check, then PIVOT if still stuck. We acted on it; the resulting diagnostic IOMMU faults revealed substantial new information:
+
+| Iter | Change | Result |
+|---|---|---|
+| 7 | `reg197 = 0` + sizeof asserts | IOMMU fault @ iova=0x700 (HW reads reg197 at offset 0x700!) |
+| 8 | `reg197 = 16KB zero scratch` | IOMMU fault @ iova=0x300 (next register HW reads) |
+| 9 | iter8 + `reg161/163 = scratch` (pps_base / rps_base) | No faults; HW hangs silently |
+| 10 | iter9 + `reg160/162/172 = scratch` (zero-byte prob bases) | Identical hang to iter9 |
+
+**Key new findings:**
+- HW reads `reg197_cabactbl_base`, `reg161_pps_base`, `reg163_rps_base`, and the prob bases UNCONDITIONALLY for VP9, even though BSP does not explicitly populate most of them. BSP works because its kernel driver treats `fd=0` as "no buffer" and skips translation; mainline writes raw IOVAs, so HW treats any register left at 0 as a real base address and reads from it.
+- **Probability buffer content does NOT matter for the stall.** Zero-byte scratch (iter10) and legacy-format `priv_tbl.probs` (iter1-6) both hang HW identically. The "wrong prob format" hypothesis is falsified.
+- Struct size check: `vp9_param = 196 bytes` vs the needed `256 bytes` (60-byte tail unwritten for reg113..127); `common_addr = 60 bytes` vs the full `128 bytes` (68-byte gap covering reg143..159). HEVC has the same gaps and works, so the unwritten tail is not the cause.
+
+## Final hypothesis space (all 3 require pivot or major effort)
+
+1. **BSP-specific kernel-side init we're missing**: cache config (`RKVDEC_REG_CACHE0/1/2_SIZE_BASE` + clear), AXI/QoS setup (`reg256/257/270`), or SRAM pool routing. Mainline HEVC works without these, so they may not be VP9-relevant, but VP9's HW pipeline stages may differ.
+
+2. **vdpu381 mode-2 (VP9) was reserved but not validated**: Casanova's v7.0 series shipped HEVC + H.264 register definitions; the `VDPU381_MODE_VP9=2` constant is in the header but no working reference exists. As far as we can tell, we are the first to drive it via mainline.
+
+3. **Probability buffer requires VP9-specific INITIALIZATION** (per BSP `hal_vp9d_prob_default`), not just any valid pointer. Iter10 (zero-byte scratch) hung the same way as the legacy-format buffer, but HW may still need SPECIFIC byte patterns (CABAC-like lookup tables) that we would have to reverse-engineer.
+
+## Branch state
+
+`boltzmann:~/src/linux-rockchip:vp9-enablement-iter1`, head `3d7ffae30626`. 7 commits total. 1620 LoC across 4 files. Compiles clean. Format enumerates correctly. HW does not decode.
+
+## Ampere state
+
+Currently loaded: iter10 (HW hangs on every VP9 frame). Recovery to sibling-campaign close: `sudo cp ~/vp9-iter1-backup/rockchip-vdec.ko.sibling-campaign-close /lib/modules/$(uname -r)/kernel/drivers/media/platform/rockchip/rkvdec/rockchip-vdec.ko && sudo depmod -a && sudo modprobe -r rockchip-vdec && sudo modprobe rockchip-vdec`. This restores bit-perfect HEVC decode.
+
+## Pivot options (Janet recommended PIVOT)
+
+| Option | Effort | Outcome |
+|---|---|---|
+| **A: AV1 on vdpu383** | 1-2 days port (Casanova has full vdpu383 + AV1 already in v7.0) | New HW codec on ampere working; VP9 stays SW-only |
+| **B: Add full BSP `mpp_rkvdec2` `rkvdec2_run` sequence to mainline** | Weeks; won't upstream | VP9 might work; not maintainable |
+| **C: Coordinate with Collabora upstream** | Indefinite wait | Eventually a clean port exists |
+| **D: Reverse-engineer probability buffer init from BSP `hal_vp9d_prob_default`** | Days of careful work | Maybe completes the current attempt |
+| **E: Document findings + abandon** | 0 | Campaign closes at "structural impossibility identified" |
+
+## Campaign total
+
+- 0:50 - sibling campaign ampere-kernel-decoders closes (HEVC bit-perfect)
+- 1:00 - Phase 0 opens
+- 1:35 - Phase 1 plan v1 → Janet AMEND (2 BLOCKERs)
+- 2:00 - Phase 1 plan v2 → Janet AMEND (2 amendments)
+- 2:15 - Phase 1 v3 amendment → PROCEED
+- 2:30 - Phase 2 implementation begins
+- 3:00 - Phase 2.1 complete (full register-packing translation, compiles clean)
+- 8:00 - Phase 3 install begins after morning resume
+- 8:15 - Phase 3 first-light fails (6 iterations)
+- 8:20 - Janet structural review → PIVOT (with one final test prescription)
+- 8:30 - Iters 7-10 confirm structural impossibility
+- 8:35 - Phase 3 final close
+
+Total: ~7h45m wall-clock (0:50 to 8:35), which includes the overnight break between the 3:00 and 8:00 entries.
+
+## Honest assessment for the user
+
+We have a structurally-correct kernel module that exposes the VP9F format on `/dev/video1`, compiles clean, has the right segmented register layout per the BSP, and writes to all the right register addresses. The hardware reads our register configuration but never decodes; it hangs at some stage we cannot see through the IRQ status interface.
+
+Without HW logic-analyzer access or Collabora's internal validation suite, the gap between "register state we provide" and "register state HW needs" is opaque to mainline. BSP works because it includes complete kernel + userspace + firmware-table init; mainline's vdpu381 path is HEVC + H.264 only, and that is not a coincidence.
+
+Per Janet's PIVOT verdict, we recommend Option A (AV1 on vdpu383) as the shortest path to working HW decode on ampere. VP9-on-vdpu381 should be coordinated with Collabora or deferred. Our work is preserved as the substrate for whoever takes this forward.
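
The struct-size figures in the key findings can be re-derived from the register ranges alone. The sketch below is only an arithmetic cross-check, not driver code. It assumes 32-bit (4-byte) registers, and it assumes the `vp9_param` block spans reg64..reg127 and the `common_addr` block spans reg128..reg159; those block boundaries are inferred from the byte counts in this report rather than taken from the driver source, and the helper names are hypothetical.

```python
REG_BYTES = 4  # assumption: each register is one 32-bit word


def span_bytes(first_reg: int, last_reg: int) -> int:
    """Bytes covered by the inclusive register range first_reg..last_reg."""
    return (last_reg - first_reg + 1) * REG_BYTES


# Assumed block boundaries (inferred from the sizes quoted in the report).
VP9_PARAM_BLOCK = (64, 127)
COMMON_ADDR_BLOCK = (128, 159)

# Unwritten ranges exactly as stated in the key findings.
VP9_PARAM_TAIL = (113, 127)
COMMON_ADDR_GAP = (143, 159)


def written_bytes(block, unwritten):
    """Bytes the driver struct covers: full block minus the unwritten range."""
    return span_bytes(*block) - span_bytes(*unwritten)


if __name__ == "__main__":
    for name, block, hole in (
        ("vp9_param", VP9_PARAM_BLOCK, VP9_PARAM_TAIL),
        ("common_addr", COMMON_ADDR_BLOCK, COMMON_ADDR_GAP),
    ):
        print(
            f"{name}: full={span_bytes(*block)} "
            f"written={written_bytes(block, hole)} "
            f"unwritten={span_bytes(*hole)}"
        )
```

Under those assumptions the arithmetic agrees with the reported sizes: 256 bytes full and 196 written for `vp9_param` (60 unwritten), and 128 bytes full and 60 written for `common_addr` (68 unwritten).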