ampere-vp9-enablement/phase3_final_close.md
claude-noether 117ee76762 Phase 3 FINAL close: structural stall confirmed across 10 iterations
Janet structural review + 4 more diagnostic iters (7-10) revealed:
- HW reads reg197_cabactbl_base, reg161/163, and prob bases
  unconditionally for VP9 (BSP works because its kernel skips fd=0;
  mainline writes raw IOVAs so 0 = IOMMU fault)
- Prob content does NOT matter for stall (zero-byte scratch and
  legacy-format priv_tbl.probs hang identically)
- Struct size gaps (60-byte unwritten tail in vp9_param) match HEVC's
  layout and aren't the cause

Final hypothesis space:
1. BSP-specific kernel init we're missing (cache config / AXI QoS)
2. vdpu381 mode-2 reserved but never validated by Casanova
3. Prob buffer needs VP9-specific INITIALIZATION (not just any pointer)

All 3 require structural rework, not register tuning. Per Janet
PIVOT verdict, recommend Option A: pivot to AV1 on vdpu383 (Casanova
ships it complete in v7.0; 1-2 days port).

Branch boltzmann:~/src/linux-rockchip:vp9-enablement-iter1 head
3d7ffae30626 — 7 commits, 1620 LoC, compiles clean, VP9F enumerates
on /dev/video1, HW does not decode.

Ampere recovery to sibling-campaign close:
  sudo cp ~/vp9-iter1-backup/rockchip-vdec.ko.sibling-campaign-close \
    /lib/modules/$(uname -r)/kernel/drivers/media/platform/rockchip/rkvdec/
  sudo depmod -a && sudo modprobe -r rockchip-vdec && sudo modprobe rockchip-vdec

Campaign closes at "structural impossibility identified", with
substantial substrate preserved for future Collabora coordination or
RE work.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 06:21:30 +00:00


Phase 3 FINAL close — 10 iterations, structural stall, PIVOT recommended

Date: 2026-05-17 ~08:35. Phase 3 closes with first light judged impossible after 10 register-tuning iterations and 2 architect-review cycles.

What Janet's structural review (post phase3_close.md) added

Janet's prescription: zero reg197_cabactbl_base + struct-size sanity check, then PIVOT if still stuck. We acted on it, and the resulting diagnostic IOMMU faults revealed substantial new information:

| Iter | Change | Result |
|---|---|---|
| 7 | reg197 = 0 + sizeof asserts | IOMMU fault @ iova=0x700 (HW dereferences the reg197 base and reads at offset 0x700!) |
| 8 | reg197 = 16KB zero-filled scratch | IOMMU fault @ iova=0x300 (the next zero base register HW reads) |
| 9 | iter8 + reg161/163 = scratch (pps_base / rps_base) | No faults; HW hangs silently |
| 10 | iter9 + reg160/162/172 = scratch (prob bases on zero-filled scratch) | Identical hang to iter9 |
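
Iters 8-10 reduce to a single rule: no base register the HW dereferences for VP9 may be left at 0. The following is a minimal C sketch of that rule. It assumes a hypothetical flat array of staged 32-bit registers and a caller-provided scratch IOVA; `hyp_fill_zero_bases`, `hyp_vp9_read_bases`, and `HYP_NUM_REGS` are illustrative names, not rkvdec driver code. The scratch buffer must also extend past every offset the HW reads (iter7 faulted at offset 0x700 from the reg197 base).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical staging array of 32-bit decoder registers, indexed by
 * register number (not the real rkvdec register file layout). */
#define HYP_NUM_REGS 256

/* Base registers iters 7-10 showed the HW dereferences for VP9:
 * reg160/162/172 (prob bases), reg161 (pps_base), reg163 (rps_base),
 * reg197 (cabactbl_base). */
static const unsigned hyp_vp9_read_bases[] = { 160, 161, 162, 163, 172, 197 };

/* Redirect every listed base register that is still 0 to a zero-filled
 * scratch buffer (shared here for simplicity), so the HW never
 * dereferences IOVA 0. Registers that already hold an IOVA are left
 * untouched. Returns the number of registers patched. */
unsigned hyp_fill_zero_bases(uint32_t regs[HYP_NUM_REGS], uint32_t scratch_iova)
{
	size_t n = sizeof(hyp_vp9_read_bases) / sizeof(hyp_vp9_read_bases[0]);
	unsigned patched = 0;

	for (size_t i = 0; i < n; i++) {
		unsigned r = hyp_vp9_read_bases[i];

		assert(r < HYP_NUM_REGS);
		if (regs[r] == 0) {
			regs[r] = scratch_iova;
			patched++;
		}
	}
	return patched;
}
```

It would be called on the staged registers just before they are written to the hardware. As iters 9-10 showed, this only converts IOMMU faults into the silent hang; it does not make VP9 decode.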

Key new findings:

  • HW reads reg197_cabactbl_base, reg161_pps_base, reg163_rps_base, and the prob bases UNCONDITIONALLY for VP9, even though BSP does not explicitly populate most of these. BSP works because its kernel driver treats fd=0 as "no buffer" and skips translation; mainline writes raw IOVAs, so HW dereferences any base register left at 0 as IOVA 0.
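  • Probability buffer content does NOT matter for the stall, at least among the contents tested: zero-filled scratch (iter10) and legacy-format priv_tbl.probs (iter1-6) hang HW identically. The "wrong prob format" hypothesis is falsified; a correctly initialized VP9 table remains untested (hypothesis 3 below).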
  • Struct size check: vp9_param = 196 bytes vs the 256-byte segment (60-byte tail unwritten, reg113..127); common_addr = 60 bytes vs the full 128 bytes (68-byte gap, reg143..159). HEVC has the same gaps and works → the unwritten tail isn't the cause.
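
The tail sizes follow from 4-byte registers: 60 bytes is 15 registers (reg113..127), and the common_addr gap is 128 − 60 = 68 bytes, or 17 registers (reg143..159). Here is a C sketch of the iter7 sizeof asserts plus that arithmetic. It assumes flat `uint32_t` stand-in structs and segment start registers (reg64 and reg128) inferred from these numbers rather than read from the driver headers; all `hyp_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins sized to match the measurements above; the real
 * driver structs use named bitfields, not flat arrays. */
struct hyp_vp9_param   { uint32_t reg[49]; }; /* 196 bytes written */
struct hyp_common_addr { uint32_t reg[15]; }; /*  60 bytes written */

/* Segment geometry inferred from the numbers above (assumption):
 * a 256-byte segment ending at reg127 starts at reg64, and a
 * 128-byte segment ending at reg159 starts at reg128. */
#define HYP_VP9_PARAM_FIRST_REG    64u
#define HYP_VP9_PARAM_SEG_BYTES    256u
#define HYP_COMMON_ADDR_FIRST_REG  128u
#define HYP_COMMON_ADDR_SEG_BYTES  128u

/* Iter7-style sanity checks: the build fails if either struct drifts. */
static_assert(sizeof(struct hyp_vp9_param) == 196, "vp9_param size changed");
static_assert(sizeof(struct hyp_common_addr) == 60, "common_addr size changed");

/* First register left unwritten when struct_bytes are copied to the
 * start of a segment beginning at first_reg (registers are 4 bytes). */
static inline unsigned hyp_first_unwritten(unsigned first_reg, unsigned struct_bytes)
{
	return first_reg + struct_bytes / 4u;
}

/* Last register inside a segment of seg_bytes beginning at first_reg. */
static inline unsigned hyp_last_in_segment(unsigned first_reg, unsigned seg_bytes)
{
	return first_reg + seg_bytes / 4u - 1u;
}
```

Because HEVC ships with the same unwritten tails and decodes correctly, asserts like these guard against accidental layout drift rather than explain the VP9 stall.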

Final hypothesis space (all 3 require pivot or major effort)

  1. BSP-specific kernel-side init we're missing: cache config (RKVDEC_REG_CACHE0/1/2_SIZE_BASE + clear), AXI/QoS setup (reg256/257/270), or SRAM pool routing. Mainline HEVC works without these, so they may not be VP9-relevant, but VP9's HW pipeline stages may differ.

  2. vdpu381 mode-2 (VP9) was reserved but not validated: Casanova's v7.0 series shipped HEVC + H.264 register definitions; the VDPU381_MODE_VP9=2 constant is in the header, but no working reference exists. As far as we know, we're the first to drive it via mainline.

  3. Probability buffer requires VP9-specific INITIALIZATION (per BSP hal_vp9d_prob_default), not just any pointer. Iter10 (zero-filled scratch) hung the same way as the legacy-format table, but HW may need SPECIFIC byte patterns (CABAC-like lookup tables) that we would have to reverse-engineer.

Branch state

boltzmann:~/src/linux-rockchip:vp9-enablement-iter1 — head 3d7ffae30626. 7 commits total. 1620 LoC across 4 files. Compiles clean. Format enumerates correctly. HW does not decode.

Ampere state

Currently loaded: iter10 (HW hangs on every VP9 frame). To recover to the sibling-campaign close, which restores bit-perfect HEVC decode:
  sudo cp ~/vp9-iter1-backup/rockchip-vdec.ko.sibling-campaign-close \
    /lib/modules/$(uname -r)/kernel/drivers/media/platform/rockchip/rkvdec/rockchip-vdec.ko
  sudo depmod -a && sudo modprobe -r rockchip-vdec && sudo modprobe rockchip-vdec

| Option | Effort | Outcome |
|---|---|---|
| A: AV1 on vdpu383 | 1-2 days port (Casanova has full vdpu383 + AV1 already in v7.0) | New HW codec on ampere working; VP9 stays SW-only |
| B: Add full BSP mpp_rkvdec2 rkvdec2_run sequence to mainline | Weeks; won't upstream | VP9 might work; not maintainable |
| C: Coordinate with Collabora upstream | Indefinite wait | Eventually a clean port exists |
| D: Reverse-engineer probability buffer init from BSP hal_vp9d_prob_default | Days of careful work | Maybe completes the current attempt |
| E: Document findings + abandon | 0 | Campaign closes at "structural impossibility identified" |

Campaign total

  • 0:50 — sibling campaign ampere-kernel-decoders closes (HEVC bit-perfect)
  • 1:00 — Phase 0 opens
  • 1:35 — Phase 1 plan v1 → Janet AMEND (2 BLOCKERs)
  • 2:00 — Phase 1 plan v2 → Janet AMEND (2 amendments)
  • 2:15 — Phase 1 v3 amendment → PROCEED
  • 2:30 — Phase 2 implementation begins
  • 3:00 — Phase 2.1 complete (full register-packing translation, compiles clean)
  • 8:00 — Phase 3 install begins after morning resume
  • 8:15 — Phase 3 first-light fails (6 iterations)
  • 8:20 — Janet structural review → PIVOT (with one final test prescription)
  • 8:30 — Iters 7-10 confirm structural impossibility
  • 8:35 — Phase 3 final close

Total: ~7h45m wall-clock (0:50 to 8:35); excluding the ~5h overnight break (3:00 to 8:00), about 2h45m of active work.

Honest assessment for the user

We have a structurally correct kernel module that exposes VP9F format on /dev/video1, compiles clean, has the right segmented register layout per the BSP, and writes to all the right register addresses. The hardware reads our register configuration but never decodes: it hangs at some stage we cannot see through the IRQ status interface.

Without HW logic-analyzer access or Collabora's internal validation suite, the gap between "register state we provide" and "register state HW needs" is opaque to mainline. BSP works because it includes complete kernel + userspace + firmware-table init; mainline's vdpu381 path is HEVC + H.264 only, and that is likely no coincidence.

Per Janet's PIVOT verdict, we recommend Option A (AV1 on vdpu383) as the shortest path to working HW decode on ampere. VP9-on-vdpu381 should be coordinated with Collabora or deferred. Our work is preserved as the substrate for whoever takes this forward.