# Phase 1 plan v2 — VP9 backend port to vdpu381 (post-Janet-review) Date: 2026-05-17 ~01:35. Phase 1 v1 was AMENDED by Janet architect review (verdict: AMEND, two BLOCKERs + one AMENDMENT). ## Why v2 — the Phase-0 wiring-patch thesis is dead Janet's review surfaced two source-code blockers that overturn the "VP9 is < 100 lines wiring" claim from Phase 0/Phase 1 v1: ### Blocker A: `vdpu381_variant.num_regs` is unset `rkvdec.c:1707-1715` defines `vdpu381_variant` with no `.num_regs` field — defaults to 0. The legacy VP9 path at `rkvdec-vp9.c:660-661` writes the register file via: ```c rkvdec_memcpy_toio(rkvdec->regs, regs, MIN(sizeof(*regs), sizeof(u32) * rkvdec->variant->num_regs)); ``` `MIN(sizeof(*regs), 0) = 0`. **Zero bytes hit MMIO.** If we naively added VP9 to `vdpu381_coded_fmts[]` pointing at `rkvdec_vp9_fmt_ops`, the hardware would never be configured. The watchdog would fire 2s later with no IRQ. ### Blocker B: vdpu381 register layout is SEGMENTED, legacy VP9 is FLAT Legacy `rkvdec_vp9.c:660` writes the entire register file at flat `rkvdec->regs + 0`. vdpu381 paths (`rkvdec-vdpu381-hevc.c`, `rkvdec-vdpu381-h264.c`) use **five distinct register-block offsets** within the MMIO: ```c #define OFFSET_COMMON_REGS (8 * sizeof(u32)) // 0x020 #define OFFSET_CODEC_PARAMS_REGS (64 * sizeof(u32)) // 0x100 #define OFFSET_COMMON_ADDR_REGS (128 * sizeof(u32)) // 0x200 #define OFFSET_CODEC_ADDR_REGS (160 * sizeof(u32)) // 0x280 #define OFFSET_POC_HIGHBIT_REGS (200 * sizeof(u32)) // 0x320 ``` Plus the decode trigger at `VDPU381_REG_DEC_E = 0x028` (NOT `RKVDEC_REG_INTERRUPT = 0x004` like legacy). A flat memcpy at offset 0 lands in the wrong fields — would overwrite control/status registers and never hit the codec-params block where the bitstream config goes. **Confirmed via BSP `mpp_rkvdec2.c:467 rkvdec2_run`**: writes are dispatched via `mpp_write_req` iterating over `task->w_reqs[i].offset / sizeof(u32)` — segmented per request, matching mainline's vdpu381 model. **Confirmed via BSP `mpp_rkvdec2.c:107 trans_tbl_vp9d[]`**: VP9's DMA-translated register indices are 128..199 (offset 0x200..0x320 = `OFFSET_COMMON_ADDR_REGS` through `OFFSET_POC_HIGHBIT_REGS`) — IDENTICAL to HEVC's range (also 128..199). VP9 uses the same segmented register layout as HEVC on this hardware. ### Amendment C: IRQ status bit semantics BSP `rkvdec2_irq` checks `RKVDEC_IRQ_RAW = BIT(1)`. Mainline `vdpu381_irq_handler` checks `VDPU381_STA_INT_DEC_RDY_STA = BIT(2)`. Same register (0x380), different bits. If VP9 sets BIT(1) instead of BIT(2) on success, every IRQ gets misclassified as ERROR and `rkvdec_iommu_restore` fires. Mitigation: instrument the IRQ handler with `pr_warn(status)` for the first 5 decodes pre-flash, compare to expected `BIT(2) | BIT(1)` mask. ## Revised architecture VP9 on RK3588's vdpu381 IP needs a **new backend file** `rkvdec-vdpu381-vp9.c`, modelled structurally on `rkvdec-vdpu381-hevc.c` (639 lines) but borrowing the SPEC LAYER from legacy `rkvdec-vp9.c`. Estimated 500-900 lines new code. ### File breakdown | File | Action | Lines | |---|---|---| | `rkvdec-vdpu381-vp9.c` | NEW: vdpu381 register-block packing + run/start/stop/done ops | ~600-800 | | `rkvdec-vp9-common.{c,h}` | NEW: extract from legacy `rkvdec-vp9.c` the codec-spec layer (probability tables, frame_ctx state machine, segmap mgmt, V4L2 vp9 helpers, `rkvdec_vp9_priv_tbl`, `rkvdec_vp9_start/stop` body) | ~600 (mostly moved from legacy) | | `rkvdec-vp9.c` | REFACTOR: replace flat register-dump body with thin shim that calls common helpers; keep the legacy `rkvdec_vp9_fmt_ops` ABI intact for RK3399 | ~400 (down from 1042) | | `rkvdec-vdpu381-regs.h` | EXTEND: add VP9 register-struct definitions (`struct rkvdec_vp9_common_regs`, `_codec_params_regs`, `_addr_regs`) | +200 | | `rkvdec.c` | wire: add `vdpu38x_vp9_ctrls`, `rkvdec_vdpu381_vp9_fmt_ops` extern, add to `vdpu381_coded_fmts[]`, set `vdpu381_variant.num_regs` if any legacy code still consults it | +30 | ### Reuse boundaries REUSED FROM LEGACY (in common file): - VP9 probability tables (`vp9_default_*_probs`) - `struct rkvdec_vp9_run` augmenting `rkvdec_run` - `rkvdec_vp9_priv_tbl` (probe + count buffers, segmap) - `init_v4l2_vp9_count_tbl`, `rkvdec_vp9_done` (entropy update from HW count buffer) - `rkvdec_vp9_adjust_fmt` (frame-size clamping) - `rkvdec_vp9_start` / `rkvdec_vp9_stop` (DMA alloc for probe/count/segmap) - V4L2 vp9 helpers from `` REWRITTEN FOR VDPU381 (in new backend file): - `rkvdec_vdpu381_vp9_run` — register packing into the 5 segmented blocks, trigger at `VDPU381_REG_DEC_E = 0x028` - Register struct definitions (bit-packed `swreg_*` fields per Rockchip BSP MPP `Vdpu382Vp9dCtx_t` pattern from `mpp/hal/rkdec/vp9d/hal_vp9d_vdpu382.c`) REUSED FROM EXISTING VDPU381 INFRASTRUCTURE: - `vdpu381_irq_handler` (single handler, codec-agnostic — confirmed via BSP `rkvdec2_irq`) - `vdpu381_rcb_sizes[]` (HEVC-tuned, oversized for 4K VP9 but harmless — sub-optimal SRAM use, not a correctness issue) - IOMMU (`rkvdec0_mmu`) — `dma_alloc_coherent` works transparently through the IOMMU - V4L2 control descriptors (`rkvdec_vp9_ctrl_descs[]` line 419) — HW-agnostic ## Register-struct sourcing The bit-packed `struct rkvdec_vp9_common_regs` etc. will be derived from Rockchip BSP MPP `mpp/hal/rkdec/vp9d/hal_vp9d_vdpu382.c` (vdpu382 = sibling IP of vdpu381). Pattern: each `swreg_NNN` becomes a u32 field; bit-packed sub-fields use C bitfield syntax exactly like `struct rkvdec_hevc_sps` in `rkvdec-vdpu381-hevc.c`. Concrete starting point: `vp9_hw_regs->common.reg028.swreg_vp9_rd_prob_idx` (BSP line 595) shows the register at common-block offset 0x028 has a `vp9_rd_prob_idx` bit-field. Inventory all such `.common.regNNN` accesses in `hal_vp9d_vdpu382.c` → reproduce as a single `struct rkvdec_vp9_common_regs` in `rkvdec-vdpu381-regs.h`. ## Test plan (unchanged from v1, plus a new pre-flight) Add a Pre-Phase-3 step: with the NEW backend compiled, instrument `vdpu381_irq_handler` to `pr_warn` the status register value for the first 10 IRQs after VP9 enable, to resolve Amendment C empirically (do VP9 IRQs set BIT(2)?). One module build cycle; takes ~5 min added to Phase 3. ## Test fixture - **Test bitstream**: bbb_30s_720p.vp9.webm — need to generate on fresnel via `ffmpeg -i bbb-1080p.mkv -c:v libvpx-vp9 -b:v 1500k -t 30 -vf scale=1280:720 -y bbb_30s_720p.vp9.webm`. Use 720p so it's well under the 4096 max-width. - **SW reference**: `ffmpeg -c:v vp9` on x86 fresnel — known good. - **Verification**: SHA the raw NV12 byte stream, not PNG ([feedback_pixel_compare_in_yuv_not_png](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_pixel_compare_in_yuv_not_png.md)). ## Updated phase sequence 1. **Phase 1 v2** (this doc) → Sonnet/Janet re-review for sign-off. 2. **Phase 2** — implement the 3 new files + edits. Boltzmann tree, `linux-rk3588-marfrit` branch, sibling commits to make review easier. 3. **Phase 3** — build module (~5 min via distcc), install on ampere with backup ([feedback_backup_before_module_replace](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_backup_before_module_replace.md)), `modprobe -r/modprobe`. **Pre-flight**: enable IRQ diagnostic `pr_warn` for first 10 statuses. 4. **Phase 4** — decode test bbb_30s_720p.vp9.webm at frame 100, byte-compare against SW reference. 5. **Phase 5** — Sonnet code review of the actual implementation diff. 6. **Phase 6** — verification iteration; widen test vectors (resolutions, segmentation, super-frames). 7. **Phase 7** — strip diagnostic `pr_warn`, finalise commits, prepare upstream patch series via `b4`, route via kernel-agent (issues #16+). 8. **Phase 8** — backend (`libva-v4l2-request-fourier`) VP9 dispatch path added (sibling, iter33 VP8 pattern). Fluster `VP9-TEST-VECTORS` validation. ## Risk register v2 | # | Risk | Status | |---|---|---| | R1 | Same physical IP, two register dialects — hidden mode-switch register? | RESOLVED: BSP `rkvdec_rk3588_hw_ops.init` = `rkvdec2_init` (codec-generic); no mode switch. | | R2 | libva backend lacks rkvdec VP9 dispatch | Phase 8, independent of kernel patch | | R3 | RK3588 VP9 max-resolution | 4096-max confirmed; cap entry at 4096×2304 conservatively | | R4 | dirac RK3399 regression | Mitigated by keeping legacy `rkvdec_vp9_fmt_ops` ABI intact via the common-file split; legacy path unchanged at the ops level. Refactor risk = the common-file extraction itself. Cross-test required before commit | | R5 | Casanova upstream VP9 series mid-effort | Monitor weekly | | R6 | RCB sizing oversizes for VP9 (HEVC sized for 8K) | Resolved as PROCEED-with-note (sub-optimal SRAM, not a correctness issue) | | R7 (new) | Bit-packed register struct may not match HW exactly — endianness, bit-ordering, packing | Use exact BSP layout from MPP `hal_vp9d_vdpu382.c`; instrument `pr_warn` writes for first decode to verify expected register values | | R8 (new) | `rkvdec_vp9.c` refactor extracts code that legacy path still needs as inlined — breaks RK3399 build | Compile-test rk3399 defconfig after the refactor; before VP9-on-vdpu381 work begins | ## Open questions for v2 review - Q-P1v2.1: Should we extract `rkvdec-vp9-common.c` upfront as a clean separate patch (easier review, but no immediate user benefit), or inline the reuse via `#include` directives inside `rkvdec-vdpu381-vp9.c` (less clean, but a single self-contained patch)? - Q-P1v2.2: Can the new file use `static inline` helpers in a shared header to avoid the common-file refactor entirely? - Q-P1v2.3: Is there value in cross-validating against the BSP MPP `Vdpu382Vp9dCtx_t` semi-public ABI before declaring our register struct definitive? Sending to Janet for the AMEND-cycle re-review.