diff --git a/phase3_close.md b/phase3_close.md new file mode 100644 index 0000000..1a70a2b --- /dev/null +++ b/phase3_close.md @@ -0,0 +1,95 @@ +# Phase 3 close — first-light HW STALL across 6 iterations + +Date: 2026-05-17 ~08:15. Phase 3 (install + first decode) did NOT achieve first-light. 6 iterations of register-tuning all fail with the same pattern. + +## What works + +- Module builds + loads + format enumerates: `v4l2-ctl --device=/dev/video1 --list-formats-out` shows `VP9F` alongside `S265 (HEVC)` and `S264 (H264)`. Kernel-side wiring (vdpu381_coded_fmts[] entry, ops table, sysfs/uapi exposure) is correct. +- libva backend (iter38b) accepts VP9 frames and pushes to V4L2 OUTPUT. +- DMA addresses look reasonable (IOMMU-virtual high addrs). +- HEVC + H.264 unchanged from sibling-campaign-close (bit-perfect). + +## What fails + +Every VP9 decode attempt times out at the hardware level. With `reg011.dec_timeout_e = 1` (HW timeout enabled): + +``` +rkvdec vdpu381 IRQ #0: STA_INT=0x00000023 DEC_RDY=0 ERROR=0 TIMEOUT=1 SOFTRST=0 +``` + +0x23 = `IRQ_RAW (BIT 1) | TIMEOUT (BIT 5) | IRQ (BIT 0)`. HW engaged, ran its internal timer to threshold, fired IRQ with TIMEOUT bit. + +With `dec_timeout_e = 0` (HW timeout disabled): **HW never fires any IRQ**. Kernel software watchdog reports `Frame processing timed out!` every 40ms but no hardware IRQ. Confirms HW is genuinely stuck, not just slow. + +## Iterations attempted (all on `boltzmann:~/src/linux-rockchip:vp9-enablement-iter1`) + +| Iter | Change | Result | +|---|---|---| +| 1 | Baseline (Phase 2.1 complete: 4-segment writes, RCB, block-gating, deltas, prob aliasing) | TIMEOUT | +| 2 | +12 BSP fields (reg010.dec_e, reg011 err_fills, reg012.colmv_compress, reg013.cur_pic_is_idr, reg026.busifd_auto_gating, reg028.poc_arb, full reg103 flag set, lasttile = stream_len - first_part_size, stream_len padding to +0x80) | TIMEOUT (same bit pattern) | +| 3 | iter2 + `prob_update_en = 0` (rule out priv_tbl.probs format mismatch) | TIMEOUT (no change) | +| 3b | iter3 + `cprhdr_offset = 0` (BSP semantic) | TIMEOUT (no change) | +| 4 | HEVC-aligned (slice_num=1, pix_range_det, raw stream_len no padding, bytesperline-based strides) | TIMEOUT (no change) | +| 5 | iter4 + reg160/162/172 = 0 (NULL all prob bases — rule out prob buffer entirely) | TIMEOUT (no change) | +| 6 | iter5 + dec_timeout_e = 0 | **NO IRQ AT ALL** (genuine HW stall) | + +## Diagnostic data captured + +``` +vp9 #0: stream_len=140 uncomp_hdr=18 comp_hdr=10 cprhdr_off=0 lasttile=130 + rlc_base=0xfcf00000 decout=0xffe00000 first16=... +``` + +- 140-byte first frame is plausible (black fade-in keyframe per sibling-campaign discovery). +- 18-byte uncompressed + 10-byte compressed header agree with VP9 spec. +- DMA addrs are IOMMU-virtual (rkvdec0_mmu enabled). +- `vb2_plane_vaddr` returned NULL — backend uses dmabuf import, not MMAP. CPU-side byte-dump can't validate stream contents but HW reads via DMA addr regardless. + +## Most-likely cause (architect review territory) + +The hypothesis space narrowed to three structural-level possibilities: + +1. **Probability buffer format mismatch**: legacy `struct rkvdec_vp9_probs` (vdpu34x layout) ≠ what vdpu381 HW expects. BSP MPP has `hal_vp9d_prob_default()` writing a different format. Iter5 (NULL bases) didn't help — but HW may still try to read at address 0 and fault silently. + +2. **Missing kernel-side init (cache / SRAM / AXI-QoS)**: BSP `mpp_rkvdec2.c::rkvdec2_run` writes RKVDEC_REG_CACHE0/1/2_SIZE_BASE and clears caches before each decode; also writes statistic-block setup (reg256/257/270 — QoS). Mainline HEVC doesn't do these and works — but maybe VP9 specifically requires them. + +3. **vdpu381 register layout doesn't fully expose VP9 control surface**: Casanova's v7.0 mainline series added vdpu381 register definitions for HEVC + H.264 only. The struct definitions we extended in `rkvdec-vdpu381-regs.h` are based on BSP `vdpu382_vp9d.h` — but maybe the BSP MPP also relies on mpp_rkvdec2 DRIVER-side translations (`mpp_dev_set_reg_offset(cfg->dev, 167, hw_ctx->offset_count)` for example) that mainline doesn't replicate. + +## Recommended next path forward + +**Sonnet/Janet architect review** before further iteration. The 6 iterations of register-tuning have established: +- No bit-flip-level fix is sufficient +- HW is genuinely stuck on something structural we're missing + +The review should consider: +- Pull mainline RKVDEC2 series WIP (Collabora) for VP9 hints +- Compare BSP `mpp_dev_set_reg_offset` usage to mainline `rkvdec_memcpy_toio` — does BSP do per-register address translation we're missing? +- Whether mainline-port-target should be the RKVDEC2 driver path (newer) instead of extending Casanova's vdpu381 +- Add SRAM/cache init alongside the regs writes + +## Tonight's persistent state + +- **Branch**: `boltzmann:~/src/linux-rockchip:vp9-enablement-iter1` — 6 commits, 1517 LoC + - `47431635801d` regs header VP9 struct definitions + - `da8482271938` shared codec-spec helpers in rkvdec-vp9-common.h + - `71cc0d96d212` new vdpu381-vp9 backend + - `63c4db0095ca` Phase 2.1 register-packing + - `feaae8743450` iter1-6 first-light diagnostic state (this iteration) +- **Ampere module**: `/lib/modules/7.0.0-rc3-devices+/kernel/drivers/media/platform/rockchip/rkvdec/rockchip-vdec.ko` (iter6 = 6th first-light attempt). To revert to sibling-campaign close: `sudo cp ~/vp9-iter1-backup/rockchip-vdec.ko.sibling-campaign-close /lib/modules/$(uname -r)/kernel/drivers/media/platform/rockchip/rkvdec/rockchip-vdec.ko && sudo depmod -a && sudo modprobe -r rockchip-vdec && sudo modprobe rockchip-vdec` (HEVC bit-perfect restored). +- **Diagnostic instrumentation in kernel**: vdpu381_irq_handler logs first 10 IRQs; vdpu381_config_vp9_regs logs first 3 frames. Removable in iter7. +- **Off-device backup**: `fresnel:~/backups/ampere/vp9-iter1/rockchip-vdec.ko.sibling-campaign-close` md5 5f5cfba42750e8d70902eb1381017d92. +- **NOT touched**: legacy `rkvdec-vp9.c` (RK3399 path untouched), kernel patches from sibling campaign (still in place). + +## Campaign summary so far + +- Phase 0: closed (substrate, architectural correction) +- Phase 1: closed (3 plan revisions, Janet PROCEED) +- Phase 2: closed (compile-clean backend, 1390 LoC) +- Phase 2.1: closed (full register-packing translation) +- Phase 3 (this): **FAILED first-light**, 6 register-tuning iterations all stall HW + +Total session work: ~10 hours (00:50 sibling-campaign close → 08:15 Phase 3 first-light failure). Substantial code artifact + comprehensive failure data; structural rethink required next. + +## Honest assessment + +We've translated BSP's MPP register layout to mainline vdpu381 register structs and applied every documented BSP setup step. The HW still refuses to decode. The remaining failure modes are structural: either the prob buffer needs a totally-different format we haven't reverse-engineered, OR the kernel-side init has gaps mainline never had to fill for HEVC. Both require architect-level rethinking rather than more iteration.