# Phase 1 plan — VP9 wiring patch for vdpu381 (RK3588) Date: 2026-05-17 ~01:15. Phase 0 closed `8dce724` with the architectural correction that VP9 is a wiring patch, not a backend port. ## Phase 1 substrate resolutions (open questions from Phase 0) | Q | Question | Resolution | |---|---|---| | O1 | Translate-table fits 0x400 MMIO? | Yes. `rkvdec_v2_trans[RKVDEC_FMT_VP9D]` offsets 128..232 × 4 = 0x200..0x3A0 < 0x400. Confirmed in `mpp_rkvdec2.c` BSP. | | O2 | Codec-aware IRQ split needed? | **NO**. BSP `rkvdec2_irq` (used for all v2-family devices including RK3588) reads ONE register — `RKVDEC_REG_INT_EN = 0x380`. Mainline `vdpu381_irq_handler` reads the **same offset** under the name `VDPU381_REG_STA_INT = 0x380`. Same hardware register; same handler works for HEVC, H.264, and VP9. | | O3 | RK3588-specific clock/reset requirements for VP9 beyond HEVC? | None observed. BSP `rkvdec_rk3588_hw_ops` uses generic `rkvdec2_init`, `rkvdec2_clk_on/off`, `rkvdec2_sip_reset` — codec-agnostic. Mainline `vdpu381_variant` already exposes the full RK3588 clock set (`aclk_vcodec`, `hclk_vcodec`, `clk_core`, `clk_cabac`, `clk_hevc_cabac`). | | O4 | Legacy `rkvdec_vp9_start`/`stop` against RK3588 IOMMU? | Should work transparently — vb2_dma_contig handles IOMMU. Verify at first decode attempt. | | O5 | RCB / SRAM — needed for VP9 on RK3588? | **Yes** — BSP MPP `vp9d_vdpu34x.c` allocates `rcb_buf`/`g_buf[i].rcb_buf`. Mainline `rkvdec_start_streaming` already calls `rkvdec_allocate_rcb(ctx, variant->rcb_sizes, variant->num_rcb_sizes)` **per-variant**, so a VP9 fmt added to `vdpu381_coded_fmts[]` will automatically inherit `vdpu381_rcb_sizes[]`. RCB sizes are HEVC-flavoured but the buffer-purpose names (intra, transd, stream, inter, dblk, sao, fbc, filt-col) are codec-generic primitives; HEVC sizing should oversize-but-work for VP9 (4096-max width is smaller than HEVC 8K max). | | O6 | Validate via Fluster `VP9-TEST-VECTORS` post-Phase-3 | Deferred; first pass uses ffmpeg-vaapi + byte-compare. | | O7 | Watch for Collabora upstream VP9 series | Monitor lore + Collabora gitlab weekly. | | R3 | RK3588 VP9 max-resolution | 4096×N. BSP `mpp/hal/rkdec/vp9d/hal_vp9d_vdpu34x.c:284` clamps at `width > 4096`. Mirror RK3399's `{4096, 2304, step 64}` frmsize in the new fmt entry. | ## The wiring patch Target file: `drivers/media/platform/rockchip/rkvdec/rkvdec.c`. ### Change 1: add `vdpu38x_vp9_ctrls` ctrl-set descriptor Insert near `vdpu38x_h264_ctrls` (~line 397): ```c static const struct rkvdec_ctrls vdpu38x_vp9_ctrls = { .ctrls = rkvdec_vp9_ctrl_descs, .num_ctrls = ARRAY_SIZE(rkvdec_vp9_ctrl_descs), }; ``` Reuses the existing `rkvdec_vp9_ctrl_descs[]` (line 419). V4L2 controls are HW-agnostic so no duplication needed; only a fresh `rkvdec_ctrls` wrapper. ### Change 2: add VP9 entry to `vdpu381_coded_fmts[]` Append after the H264 entry (~line 549): ```c { .fourcc = V4L2_PIX_FMT_VP9_FRAME, .frmsize = { .min_width = 64, .max_width = 4096, .step_width = 64, .min_height = 64, .max_height = 2304, .step_height = 64, }, .ctrls = &vdpu38x_vp9_ctrls, .ops = &rkvdec_vp9_fmt_ops, .num_decoded_fmts = ARRAY_SIZE(rkvdec_vp9_decoded_fmts), .decoded_fmts = rkvdec_vp9_decoded_fmts, }, ``` Reuses existing `rkvdec_vp9_fmt_ops` (legacy vdpu34x register layout, written for RK3399) and `rkvdec_vp9_decoded_fmts` (NV12_4L4 + NV12). No `subsystem_flags = VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF` because VP9 frames are self-contained (no slice fragmentation). ### Change 3 (conditional): instrumentation only if first decode silent-fails Add a one-shot DIAG `pr_warn` at the end of `rkvdec_vp9_run` if Phase 6 verification shows the IRQ never fires or fires with unexpected status. Otherwise the patch is **two structural additions and zero new logic**. ## Test plan | Step | Action | Pass criterion | |---|---|---| | 1 | Build `drivers/media/platform/rockchip/rkvdec/rockchip-vdec.ko` on boltzmann (already has tree + toolchain) | Builds clean | | 2 | Backup current `7.0.0-rc3-devices+/kernel/drivers/media/platform/rockchip/rkvdec/rockchip-vdec.ko` on ampere ([feedback_backup_before_module_replace](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_backup_before_module_replace.md)) | Backup md5'd + off-device copy on fresnel | | 3 | Install + `modprobe -r rockchip-vdec; modprobe rockchip-vdec` (no reboot needed) | `dmesg \| grep rkvdec` shows VP9 fmt enumerated. `v4l2-ctl --list-formats-out --device=/dev/video0` shows `VP9F` | | 4 | Decode test bitstream: `LIBVA_DRIVER_NAME=v4l2_request ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i -vf hwdownload,format=nv12 -frames:v 100 -f rawvideo -pix_fmt nv12 /tmp/hw-vp9.nv12` | rc=0, output size = 100 × W × H × 1.5 bytes | | 5 | SW reference: `ffmpeg -i -vf format=nv12,select='eq(n,100)' -frames:v 1 -f rawvideo -pix_fmt nv12 /tmp/sw-vp9-f100.nv12` | rc=0 | | 6 | Byte-compare frame 100 (per [feedback_compare_hw_against_sw_reference](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_compare_hw_against_sw_reference.md)) | ≥ 99% exact match | | 7 | Iteratively widen test vectors (different resolutions, frame-parallel decoding, super-frame, segmentation enabled) | Each VP9 feature within `vdpu381` capability passes | ## Risk register update | # | Risk | Mitigation | |---|---|---| | R1 | Same physical IP, two register dialects — hidden mode-switch register? | BSP `rkvdec_rk3588_hw_ops.init` = `rkvdec2_init` (generic). No mode switch. **Resolved**. | | R2 | `libva-v4l2-request-fourier` lacks rkvdec VP9 dispatch | Phase 7 task: add VP9 fmt to backend's `vendor_fourcc_to_driver_fourcc` switch. Sibling iter33 VP8 pattern. Independent of kernel patch. | | R3 | RK3588 VP9 max-resolution differs from RK3399's 4096×2304 | 4096-max confirmed; height limit unchecked. Open empirically; cap conservatively at 2304 first. | | R4 | `dirac` RK3399 cross-test fixture status | NOT regression-impacted — patch only ADDS to `vdpu381_coded_fmts[]`, doesn't touch `rk3399_coded_fmts[]` or `rkvdec-vp9.c`. No risk to RK3399 path. **Resolved**. | | R5 | Casanova posts upstream VP9 series mid-effort → fork divergence | Monitor; coordinate if posted. | | R6 (new) | RCB sizing oversizes for VP9 (HEVC sized for 8K, VP9 caps at 4K) — wastes SRAM but should still work | First attempt uses HEVC sizes. If sub-optimal, follow-up patch adds VP9-specific RCB descriptor table. Not a correctness issue. | ## Sequence Phase 1: this plan, then Sonnet Phase 5 architect review. Phase 2: implement Change 1 + 2 in `boltzmann:~/src/linux-rockchip` `linux-rk3588-marfrit` branch. Phase 3: build module on boltzmann via distcc. Phase 4: install on ampere + `modprobe -r/modprobe` cycle + sanity v4l2-ctl enumeration. Phase 5: SW review of code + Sonnet checkpoint. Phase 6: decode test + byte-compare verification. Phase 7: persistence — commit to `linux-rk3588-marfrit`, prepare upstream patch via b4, route through kernel-agent. Phase 8: monitor + extend test vectors; backend (libva) update tracked separately. ## Open questions remaining for Phase 1 review - Q-P1.1: Does mainline `vdpu381_irq_handler` correctly handle the case where the IRQ status doesn't set `VDPU381_STA_INT_DEC_RDY_STA` (BIT 2) on a successful VP9 decode? i.e. does VP9 set a *different* status bit than HEVC? Mitigation: add diagnostic `pr_warn` for first 5 IRQs. - Q-P1.2: Does `rkvdec_vp9_start` use `dma_alloc_*` that conflicts with rkvdec0's IOMMU mode? Likely no; vb2 handles this. - Q-P1.3: VP9 needs a `done` callback (`rkvdec_vp9_done` populates probability update for adaptive entropy). Verify the existing `done` runs in the correct workqueue context after IRQ — should "just work" since mainline has the dispatch.