# Phase 3 close — first-light HW STALL across 6 iterations Date: 2026-05-17 ~08:15. Phase 3 (install + first decode) did NOT achieve first-light. 6 iterations of register-tuning all fail with the same pattern. ## What works - Module builds + loads + format enumerates: `v4l2-ctl --device=/dev/video1 --list-formats-out` shows `VP9F` alongside `S265 (HEVC)` and `S264 (H264)`. Kernel-side wiring (vdpu381_coded_fmts[] entry, ops table, sysfs/uapi exposure) is correct. - libva backend (iter38b) accepts VP9 frames and pushes to V4L2 OUTPUT. - DMA addresses look reasonable (IOMMU-virtual high addrs). - HEVC + H.264 unchanged from sibling-campaign-close (bit-perfect). ## What fails Every VP9 decode attempt times out at the hardware level. With `reg011.dec_timeout_e = 1` (HW timeout enabled): ``` rkvdec vdpu381 IRQ #0: STA_INT=0x00000023 DEC_RDY=0 ERROR=0 TIMEOUT=1 SOFTRST=0 ``` 0x23 = `IRQ_RAW (BIT 1) | TIMEOUT (BIT 5) | IRQ (BIT 0)`. HW engaged, ran its internal timer to threshold, fired IRQ with TIMEOUT bit. With `dec_timeout_e = 0` (HW timeout disabled): **HW never fires any IRQ**. Kernel software watchdog reports `Frame processing timed out!` every 40ms but no hardware IRQ. Confirms HW is genuinely stuck, not just slow. ## Iterations attempted (all on `boltzmann:~/src/linux-rockchip:vp9-enablement-iter1`) | Iter | Change | Result | |---|---|---| | 1 | Baseline (Phase 2.1 complete: 4-segment writes, RCB, block-gating, deltas, prob aliasing) | TIMEOUT | | 2 | +12 BSP fields (reg010.dec_e, reg011 err_fills, reg012.colmv_compress, reg013.cur_pic_is_idr, reg026.busifd_auto_gating, reg028.poc_arb, full reg103 flag set, lasttile = stream_len - first_part_size, stream_len padding to +0x80) | TIMEOUT (same bit pattern) | | 3 | iter2 + `prob_update_en = 0` (rule out priv_tbl.probs format mismatch) | TIMEOUT (no change) | | 3b | iter3 + `cprhdr_offset = 0` (BSP semantic) | TIMEOUT (no change) | | 4 | HEVC-aligned (slice_num=1, pix_range_det, raw stream_len no padding, bytesperline-based strides) | TIMEOUT (no change) | | 5 | iter4 + reg160/162/172 = 0 (NULL all prob bases — rule out prob buffer entirely) | TIMEOUT (no change) | | 6 | iter5 + dec_timeout_e = 0 | **NO IRQ AT ALL** (genuine HW stall) | ## Diagnostic data captured ``` vp9 #0: stream_len=140 uncomp_hdr=18 comp_hdr=10 cprhdr_off=0 lasttile=130 rlc_base=0xfcf00000 decout=0xffe00000 first16=... ``` - 140-byte first frame is plausible (black fade-in keyframe per sibling-campaign discovery). - 18-byte uncompressed + 10-byte compressed header agree with VP9 spec. - DMA addrs are IOMMU-virtual (rkvdec0_mmu enabled). - `vb2_plane_vaddr` returned NULL — backend uses dmabuf import, not MMAP. CPU-side byte-dump can't validate stream contents but HW reads via DMA addr regardless. ## Most-likely cause (architect review territory) The hypothesis space narrowed to three structural-level possibilities: 1. **Probability buffer format mismatch**: legacy `struct rkvdec_vp9_probs` (vdpu34x layout) ≠ what vdpu381 HW expects. BSP MPP has `hal_vp9d_prob_default()` writing a different format. Iter5 (NULL bases) didn't help — but HW may still try to read at address 0 and fault silently. 2. **Missing kernel-side init (cache / SRAM / AXI-QoS)**: BSP `mpp_rkvdec2.c::rkvdec2_run` writes RKVDEC_REG_CACHE0/1/2_SIZE_BASE and clears caches before each decode; also writes statistic-block setup (reg256/257/270 — QoS). Mainline HEVC doesn't do these and works — but maybe VP9 specifically requires them. 3. **vdpu381 register layout doesn't fully expose VP9 control surface**: Casanova's v7.0 mainline series added vdpu381 register definitions for HEVC + H.264 only. The struct definitions we extended in `rkvdec-vdpu381-regs.h` are based on BSP `vdpu382_vp9d.h` — but maybe the BSP MPP also relies on mpp_rkvdec2 DRIVER-side translations (`mpp_dev_set_reg_offset(cfg->dev, 167, hw_ctx->offset_count)` for example) that mainline doesn't replicate. ## Recommended next path forward **Sonnet/Janet architect review** before further iteration. The 6 iterations of register-tuning have established: - No bit-flip-level fix is sufficient - HW is genuinely stuck on something structural we're missing The review should consider: - Pull mainline RKVDEC2 series WIP (Collabora) for VP9 hints - Compare BSP `mpp_dev_set_reg_offset` usage to mainline `rkvdec_memcpy_toio` — does BSP do per-register address translation we're missing? - Whether mainline-port-target should be the RKVDEC2 driver path (newer) instead of extending Casanova's vdpu381 - Add SRAM/cache init alongside the regs writes ## Tonight's persistent state - **Branch**: `boltzmann:~/src/linux-rockchip:vp9-enablement-iter1` — 6 commits, 1517 LoC - `47431635801d` regs header VP9 struct definitions - `da8482271938` shared codec-spec helpers in rkvdec-vp9-common.h - `71cc0d96d212` new vdpu381-vp9 backend - `63c4db0095ca` Phase 2.1 register-packing - `feaae8743450` iter1-6 first-light diagnostic state (this iteration) - **Ampere module**: `/lib/modules/7.0.0-rc3-devices+/kernel/drivers/media/platform/rockchip/rkvdec/rockchip-vdec.ko` (iter6 = 6th first-light attempt). To revert to sibling-campaign close: `sudo cp ~/vp9-iter1-backup/rockchip-vdec.ko.sibling-campaign-close /lib/modules/$(uname -r)/kernel/drivers/media/platform/rockchip/rkvdec/rockchip-vdec.ko && sudo depmod -a && sudo modprobe -r rockchip-vdec && sudo modprobe rockchip-vdec` (HEVC bit-perfect restored). - **Diagnostic instrumentation in kernel**: vdpu381_irq_handler logs first 10 IRQs; vdpu381_config_vp9_regs logs first 3 frames. Removable in iter7. - **Off-device backup**: `fresnel:~/backups/ampere/vp9-iter1/rockchip-vdec.ko.sibling-campaign-close` md5 5f5cfba42750e8d70902eb1381017d92. - **NOT touched**: legacy `rkvdec-vp9.c` (RK3399 path untouched), kernel patches from sibling campaign (still in place). ## Campaign summary so far - Phase 0: closed (substrate, architectural correction) - Phase 1: closed (3 plan revisions, Janet PROCEED) - Phase 2: closed (compile-clean backend, 1390 LoC) - Phase 2.1: closed (full register-packing translation) - Phase 3 (this): **FAILED first-light**, 6 register-tuning iterations all stall HW Total session work: ~10 hours (00:50 sibling-campaign close → 08:15 Phase 3 first-light failure). Substantial code artifact + comprehensive failure data; structural rethink required next. ## Honest assessment We've translated BSP's MPP register layout to mainline vdpu381 register structs and applied every documented BSP setup step. The HW still refuses to decode. The remaining failure modes are structural: either the prob buffer needs a totally-different format we haven't reverse-engineered, OR the kernel-side init has gaps mainline never had to fill for HEVC. Both require architect-level rethinking rather than more iteration.