Phase 3 close: first-light FAILED across 6 iterations — HW stalls
Module loads + format enumerates (kernel-side wiring works) but every VP9 decode attempt stalls the HW: STA_INT=0x23 (IRQ_RAW + TIMEOUT) when HW timeout enabled; NO IRQ at all when timeout disabled (genuine stall). 6 iterations of register-tuning all failed identically. Hypothesis space narrowed to 3 structural-level possibilities: 1. Probability buffer format mismatch (legacy struct rkvdec_vp9_probs vs BSP hal_vp9d_prob_default format) 2. Missing kernel-side init (cache config, SRAM/QoS, AXI setup that BSP mpp_rkvdec2 does but mainline HEVC happens not to need) 3. vdpu381 register layout doesn't fully expose VP9 control surface (BSP MPP may rely on mpp_dev_set_reg_offset translations we don't replicate) Recommended next path: Sonnet/Janet architect review BEFORE more iteration. The 6 single-bit tuning rounds established that no bit-flip fix is sufficient — structural rethink required. Sibling-campaign close state (HEVC bit-perfect) recoverable on ampere via backup at ~/vp9-iter1-backup/rockchip-vdec.ko.sibling-campaign-close + depmod cycle. Phase 0-2.1 work preserved at boltzmann:~/src/linux-rockchip: vp9-enablement-iter1 (6 commits, 1517 LoC). Format enumeration on /dev/video1 confirms kernel-side wiring is correct. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,95 @@
|
|||||||
|
# Phase 3 close — first-light HW STALL across 6 iterations
|
||||||
|
|
||||||
|
Date: 2026-05-17 ~08:15. Phase 3 (install + first decode) did NOT achieve first-light. 6 iterations of register-tuning all fail with the same pattern.
|
||||||
|
|
||||||
|
## What works
|
||||||
|
|
||||||
|
- Module builds + loads + format enumerates: `v4l2-ctl --device=/dev/video1 --list-formats-out` shows `VP9F` alongside `S265 (HEVC)` and `S264 (H264)`. Kernel-side wiring (vdpu381_coded_fmts[] entry, ops table, sysfs/uapi exposure) is correct.
|
||||||
|
- libva backend (iter38b) accepts VP9 frames and pushes to V4L2 OUTPUT.
|
||||||
|
- DMA addresses look reasonable (IOMMU-virtual high addrs).
|
||||||
|
- HEVC + H.264 unchanged from sibling-campaign-close (bit-perfect).
|
||||||
|
|
||||||
|
## What fails
|
||||||
|
|
||||||
|
Every VP9 decode attempt times out at the hardware level. With `reg011.dec_timeout_e = 1` (HW timeout enabled):
|
||||||
|
|
||||||
|
```
|
||||||
|
rkvdec vdpu381 IRQ #0: STA_INT=0x00000023 DEC_RDY=0 ERROR=0 TIMEOUT=1 SOFTRST=0
|
||||||
|
```
|
||||||
|
|
||||||
|
0x23 = `IRQ_RAW (BIT 1) | TIMEOUT (BIT 5) | IRQ (BIT 0)`. HW engaged, ran its internal timer to threshold, fired IRQ with TIMEOUT bit.
|
||||||
|
|
||||||
|
With `dec_timeout_e = 0` (HW timeout disabled): **HW never fires any IRQ**. Kernel software watchdog reports `Frame processing timed out!` every 40ms but no hardware IRQ. Confirms HW is genuinely stuck, not just slow.
|
||||||
|
|
||||||
|
## Iterations attempted (all on `boltzmann:~/src/linux-rockchip:vp9-enablement-iter1`)
|
||||||
|
|
||||||
|
| Iter | Change | Result |
|
||||||
|
|---|---|---|
|
||||||
|
| 1 | Baseline (Phase 2.1 complete: 4-segment writes, RCB, block-gating, deltas, prob aliasing) | TIMEOUT |
|
||||||
|
| 2 | +12 BSP fields (reg010.dec_e, reg011 err_fills, reg012.colmv_compress, reg013.cur_pic_is_idr, reg026.busifd_auto_gating, reg028.poc_arb, full reg103 flag set, lasttile = stream_len - first_part_size, stream_len padding to +0x80) | TIMEOUT (same bit pattern) |
|
||||||
|
| 3 | iter2 + `prob_update_en = 0` (rule out priv_tbl.probs format mismatch) | TIMEOUT (no change) |
|
||||||
|
| 3b | iter3 + `cprhdr_offset = 0` (BSP semantic) | TIMEOUT (no change) |
|
||||||
|
| 4 | HEVC-aligned (slice_num=1, pix_range_det, raw stream_len no padding, bytesperline-based strides) | TIMEOUT (no change) |
|
||||||
|
| 5 | iter4 + reg160/162/172 = 0 (NULL all prob bases — rule out prob buffer entirely) | TIMEOUT (no change) |
|
||||||
|
| 6 | iter5 + dec_timeout_e = 0 | **NO IRQ AT ALL** (genuine HW stall) |
|
||||||
|
|
||||||
|
## Diagnostic data captured
|
||||||
|
|
||||||
|
```
|
||||||
|
vp9 #0: stream_len=140 uncomp_hdr=18 comp_hdr=10 cprhdr_off=0 lasttile=130
|
||||||
|
rlc_base=0xfcf00000 decout=0xffe00000 first16=...
|
||||||
|
```
|
||||||
|
|
||||||
|
- 140-byte first frame is plausible (black fade-in keyframe per sibling-campaign discovery).
|
||||||
|
- 18-byte uncompressed + 10-byte compressed header agree with VP9 spec.
|
||||||
|
- DMA addrs are IOMMU-virtual (rkvdec0_mmu enabled).
|
||||||
|
- `vb2_plane_vaddr` returned NULL — backend uses dmabuf import, not MMAP. CPU-side byte-dump can't validate stream contents but HW reads via DMA addr regardless.
|
||||||
|
|
||||||
|
## Most-likely cause (architect review territory)
|
||||||
|
|
||||||
|
The hypothesis space narrowed to three structural-level possibilities:
|
||||||
|
|
||||||
|
1. **Probability buffer format mismatch**: legacy `struct rkvdec_vp9_probs` (vdpu34x layout) ≠ what vdpu381 HW expects. BSP MPP has `hal_vp9d_prob_default()` writing a different format. Iter5 (NULL bases) didn't help — but HW may still try to read at address 0 and fault silently.
|
||||||
|
|
||||||
|
2. **Missing kernel-side init (cache / SRAM / AXI-QoS)**: BSP `mpp_rkvdec2.c::rkvdec2_run` writes RKVDEC_REG_CACHE0/1/2_SIZE_BASE and clears caches before each decode; also writes statistic-block setup (reg256/257/270 — QoS). Mainline HEVC doesn't do these and works — but maybe VP9 specifically requires them.
|
||||||
|
|
||||||
|
3. **vdpu381 register layout doesn't fully expose VP9 control surface**: Casanova's v7.0 mainline series added vdpu381 register definitions for HEVC + H.264 only. The struct definitions we extended in `rkvdec-vdpu381-regs.h` are based on BSP `vdpu382_vp9d.h` — but maybe the BSP MPP also relies on mpp_rkvdec2 DRIVER-side translations (`mpp_dev_set_reg_offset(cfg->dev, 167, hw_ctx->offset_count)` for example) that mainline doesn't replicate.
|
||||||
|
|
||||||
|
## Recommended next path forward
|
||||||
|
|
||||||
|
**Sonnet/Janet architect review** before further iteration. The 6 iterations of register-tuning have established:
|
||||||
|
- No bit-flip-level fix is sufficient
|
||||||
|
- HW is genuinely stuck on something structural we're missing
|
||||||
|
|
||||||
|
The review should consider:
|
||||||
|
- Pull mainline RKVDEC2 series WIP (Collabora) for VP9 hints
|
||||||
|
- Compare BSP `mpp_dev_set_reg_offset` usage to mainline `rkvdec_memcpy_toio` — does BSP do per-register address translation we're missing?
|
||||||
|
- Whether mainline-port-target should be the RKVDEC2 driver path (newer) instead of extending Casanova's vdpu381
|
||||||
|
- Add SRAM/cache init alongside the regs writes
|
||||||
|
|
||||||
|
## Tonight's persistent state
|
||||||
|
|
||||||
|
- **Branch**: `boltzmann:~/src/linux-rockchip:vp9-enablement-iter1` — 6 commits, 1517 LoC
|
||||||
|
- `47431635801d` regs header VP9 struct definitions
|
||||||
|
- `da8482271938` shared codec-spec helpers in rkvdec-vp9-common.h
|
||||||
|
- `71cc0d96d212` new vdpu381-vp9 backend
|
||||||
|
- `63c4db0095ca` Phase 2.1 register-packing
|
||||||
|
- `feaae8743450` iter1-6 first-light diagnostic state (this iteration)
|
||||||
|
- **Ampere module**: `/lib/modules/7.0.0-rc3-devices+/kernel/drivers/media/platform/rockchip/rkvdec/rockchip-vdec.ko` (iter6 = 6th first-light attempt). To revert to sibling-campaign close: `sudo cp ~/vp9-iter1-backup/rockchip-vdec.ko.sibling-campaign-close /lib/modules/$(uname -r)/kernel/drivers/media/platform/rockchip/rkvdec/rockchip-vdec.ko && sudo depmod -a && sudo modprobe -r rockchip-vdec && sudo modprobe rockchip-vdec` (HEVC bit-perfect restored).
|
||||||
|
- **Diagnostic instrumentation in kernel**: vdpu381_irq_handler logs first 10 IRQs; vdpu381_config_vp9_regs logs first 3 frames. Removable in iter7.
|
||||||
|
- **Off-device backup**: `fresnel:~/backups/ampere/vp9-iter1/rockchip-vdec.ko.sibling-campaign-close` md5 5f5cfba42750e8d70902eb1381017d92.
|
||||||
|
- **NOT touched**: legacy `rkvdec-vp9.c` (RK3399 path untouched), kernel patches from sibling campaign (still in place).
|
||||||
|
|
||||||
|
## Campaign summary so far
|
||||||
|
|
||||||
|
- Phase 0: closed (substrate, architectural correction)
|
||||||
|
- Phase 1: closed (3 plan revisions, Janet PROCEED)
|
||||||
|
- Phase 2: closed (compile-clean backend, 1390 LoC)
|
||||||
|
- Phase 2.1: closed (full register-packing translation)
|
||||||
|
- Phase 3 (this): **FAILED first-light**, 6 register-tuning iterations all stall HW
|
||||||
|
|
||||||
|
Total session work: ~10 hours (00:50 sibling-campaign close → 08:15 Phase 3 first-light failure). Substantial code artifact + comprehensive failure data; structural rethink required next.
|
||||||
|
|
||||||
|
## Honest assessment
|
||||||
|
|
||||||
|
We've translated BSP's MPP register layout to mainline vdpu381 register structs and applied every documented BSP setup step. The HW still refuses to decode. The remaining failure modes are structural: either the prob buffer needs a totally-different format we haven't reverse-engineered, OR the kernel-side init has gaps mainline never had to fill for HEVC. Both require architect-level rethinking rather than more iteration.
|
||||||
Reference in New Issue
Block a user