Files
claude-noether 53df917afb Phase 3 close: first-light FAILED across 6 iterations — HW stalls
Module loads + format enumerates (kernel-side wiring works) but every
VP9 decode attempt stalls the HW: STA_INT=0x23 (IRQ_RAW + TIMEOUT) when
HW timeout enabled; NO IRQ at all when timeout disabled (genuine stall).

6 iterations of register-tuning all failed identically. Hypothesis
space narrowed to 3 structural-level possibilities:

1. Probability buffer format mismatch (legacy struct rkvdec_vp9_probs
   vs BSP hal_vp9d_prob_default format)
2. Missing kernel-side init (cache config, SRAM/QoS, AXI setup that
   BSP mpp_rkvdec2 does but mainline HEVC happens not to need)
3. vdpu381 register layout doesn't fully expose VP9 control surface
   (BSP MPP may rely on mpp_dev_set_reg_offset translations we don't
   replicate)

Recommended next path: Sonnet/Janet architect review BEFORE more
iteration. The 6 single-bit tuning rounds established that no bit-flip
fix is sufficient — structural rethink required.

Sibling-campaign close state (HEVC bit-perfect) recoverable on ampere
via backup at ~/vp9-iter1-backup/rockchip-vdec.ko.sibling-campaign-close
+ depmod cycle.

Phase 0-2.1 work preserved at boltzmann:~/src/linux-rockchip:
vp9-enablement-iter1 (6 commits, 1517 LoC). Format enumeration on
/dev/video1 confirms kernel-side wiring is correct.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-17 06:14:06 +00:00

6.7 KiB

Phase 3 close — first-light HW STALL across 6 iterations

Date: 2026-05-17 ~08:15. Phase 3 (install + first decode) did NOT achieve first-light. 6 iterations of register-tuning all fail with the same pattern.

What works

  • Module builds + loads + format enumerates: v4l2-ctl --device=/dev/video1 --list-formats-out shows VP9F alongside S265 (HEVC) and S264 (H264). Kernel-side wiring (vdpu381_coded_fmts[] entry, ops table, sysfs/uapi exposure) is correct.
  • libva backend (iter38b) accepts VP9 frames and pushes to V4L2 OUTPUT.
  • DMA addresses look reasonable (IOMMU-virtual high addrs).
  • HEVC + H.264 unchanged from sibling-campaign-close (bit-perfect).

What fails

Every VP9 decode attempt times out at the hardware level. With reg011.dec_timeout_e = 1 (HW timeout enabled):

rkvdec vdpu381 IRQ #0: STA_INT=0x00000023  DEC_RDY=0 ERROR=0 TIMEOUT=1 SOFTRST=0

0x23 = IRQ_RAW (BIT 1) | TIMEOUT (BIT 5) | IRQ (BIT 0). HW engaged, ran its internal timer to threshold, fired IRQ with TIMEOUT bit.

With dec_timeout_e = 0 (HW timeout disabled): HW never fires any IRQ. Kernel software watchdog reports Frame processing timed out! every 40ms but no hardware IRQ. Confirms HW is genuinely stuck, not just slow.

Iterations attempted (all on boltzmann:~/src/linux-rockchip:vp9-enablement-iter1)

Iter Change Result
1 Baseline (Phase 2.1 complete: 4-segment writes, RCB, block-gating, deltas, prob aliasing) TIMEOUT
2 +12 BSP fields (reg010.dec_e, reg011 err_fills, reg012.colmv_compress, reg013.cur_pic_is_idr, reg026.busifd_auto_gating, reg028.poc_arb, full reg103 flag set, lasttile = stream_len - first_part_size, stream_len padding to +0x80) TIMEOUT (same bit pattern)
3 iter2 + prob_update_en = 0 (rule out priv_tbl.probs format mismatch) TIMEOUT (no change)
3b iter3 + cprhdr_offset = 0 (BSP semantic) TIMEOUT (no change)
4 HEVC-aligned (slice_num=1, pix_range_det, raw stream_len no padding, bytesperline-based strides) TIMEOUT (no change)
5 iter4 + reg160/162/172 = 0 (NULL all prob bases — rule out prob buffer entirely) TIMEOUT (no change)
6 iter5 + dec_timeout_e = 0 NO IRQ AT ALL (genuine HW stall)

Diagnostic data captured

vp9 #0: stream_len=140 uncomp_hdr=18 comp_hdr=10 cprhdr_off=0 lasttile=130
        rlc_base=0xfcf00000 decout=0xffe00000 first16=...
  • 140-byte first frame is plausible (black fade-in keyframe per sibling-campaign discovery).
  • 18-byte uncompressed + 10-byte compressed header agree with VP9 spec.
  • DMA addrs are IOMMU-virtual (rkvdec0_mmu enabled).
  • vb2_plane_vaddr returned NULL — backend uses dmabuf import, not MMAP. CPU-side byte-dump can't validate stream contents but HW reads via DMA addr regardless.

Most-likely cause (architect review territory)

The hypothesis space narrowed to three structural-level possibilities:

  1. Probability buffer format mismatch: legacy struct rkvdec_vp9_probs (vdpu34x layout) ≠ what vdpu381 HW expects. BSP MPP has hal_vp9d_prob_default() writing a different format. Iter5 (NULL bases) didn't help — but HW may still try to read at address 0 and fault silently.

  2. Missing kernel-side init (cache / SRAM / AXI-QoS): BSP mpp_rkvdec2.c::rkvdec2_run writes RKVDEC_REG_CACHE0/1/2_SIZE_BASE and clears caches before each decode; also writes statistic-block setup (reg256/257/270 — QoS). Mainline HEVC doesn't do these and works — but maybe VP9 specifically requires them.

  3. vdpu381 register layout doesn't fully expose VP9 control surface: Casanova's v7.0 mainline series added vdpu381 register definitions for HEVC + H.264 only. The struct definitions we extended in rkvdec-vdpu381-regs.h are based on BSP vdpu382_vp9d.h — but maybe the BSP MPP also relies on mpp_rkvdec2 DRIVER-side translations (mpp_dev_set_reg_offset(cfg->dev, 167, hw_ctx->offset_count) for example) that mainline doesn't replicate.

Sonnet/Janet architect review before further iteration. The 6 iterations of register-tuning have established:

  • No bit-flip-level fix is sufficient
  • HW is genuinely stuck on something structural we're missing

The review should consider:

  • Pull mainline RKVDEC2 series WIP (Collabora) for VP9 hints
  • Compare BSP mpp_dev_set_reg_offset usage to mainline rkvdec_memcpy_toio — does BSP do per-register address translation we're missing?
  • Whether mainline-port-target should be the RKVDEC2 driver path (newer) instead of extending Casanova's vdpu381
  • Add SRAM/cache init alongside the regs writes

Tonight's persistent state

  • Branch: boltzmann:~/src/linux-rockchip:vp9-enablement-iter1 — 6 commits, 1517 LoC
    • 47431635801d regs header VP9 struct definitions
    • da8482271938 shared codec-spec helpers in rkvdec-vp9-common.h
    • 71cc0d96d212 new vdpu381-vp9 backend
    • 63c4db0095ca Phase 2.1 register-packing
    • feaae8743450 iter1-6 first-light diagnostic state (this iteration)
  • Ampere module: /lib/modules/7.0.0-rc3-devices+/kernel/drivers/media/platform/rockchip/rkvdec/rockchip-vdec.ko (iter6 = 6th first-light attempt). To revert to sibling-campaign close: sudo cp ~/vp9-iter1-backup/rockchip-vdec.ko.sibling-campaign-close /lib/modules/$(uname -r)/kernel/drivers/media/platform/rockchip/rkvdec/rockchip-vdec.ko && sudo depmod -a && sudo modprobe -r rockchip-vdec && sudo modprobe rockchip-vdec (HEVC bit-perfect restored).
  • Diagnostic instrumentation in kernel: vdpu381_irq_handler logs first 10 IRQs; vdpu381_config_vp9_regs logs first 3 frames. Removable in iter7.
  • Off-device backup: fresnel:~/backups/ampere/vp9-iter1/rockchip-vdec.ko.sibling-campaign-close md5 5f5cfba42750e8d70902eb1381017d92.
  • NOT touched: legacy rkvdec-vp9.c (RK3399 path untouched), kernel patches from sibling campaign (still in place).

Campaign summary so far

  • Phase 0: closed (substrate, architectural correction)
  • Phase 1: closed (3 plan revisions, Janet PROCEED)
  • Phase 2: closed (compile-clean backend, 1390 LoC)
  • Phase 2.1: closed (full register-packing translation)
  • Phase 3 (this): FAILED first-light, 6 register-tuning iterations all stall HW

Total session work: ~10 hours (00:50 sibling-campaign close → 08:15 Phase 3 first-light failure). Substantial code artifact + comprehensive failure data; structural rethink required next.

Honest assessment

We've translated BSP's MPP register layout to mainline vdpu381 register structs and applied every documented BSP setup step. The HW still refuses to decode. The remaining failure modes are structural: either the prob buffer needs a totally-different format we haven't reverse-engineered, OR the kernel-side init has gaps mainline never had to fill for HEVC. Both require architect-level rethinking rather than more iteration.