Files
ampere-vp9-enablement/phase1_plan.md
T
claude-noether 5a1ba37791 Phase 1 plan + v2 amendment per Janet architect review
Phase 1 v1 claimed VP9 enablement was a < 100-line wiring patch
reusing legacy rkvdec_vp9_fmt_ops in vdpu381_coded_fmts[]. Janet
architect review (verdict AMEND) surfaced two BLOCKERs that overturn
the thesis:

A. vdpu381_variant.num_regs is unset (defaults 0); legacy VP9 path's
   rkvdec_memcpy_toio writes MIN(sizeof, 0)=0 bytes. Zero MMIO config.

B. vdpu381 register layout is segmented across 5 distinct OFFSET_*
   blocks (COMMON, CODEC_PARAMS, COMMON_ADDR, CODEC_ADDR, POC_HIGHBIT)
   plus decode trigger at VDPU381_REG_DEC_E=0x028. Legacy VP9 writes
   flat-from-base+0 and triggers via RKVDEC_REG_INTERRUPT=0x004 —
   wrong register-block layout AND wrong trigger offset.

Plus Amendment C: BSP rkvdec2_irq checks BIT(1); mainline
vdpu381_irq_handler checks BIT(2). Same register (0x380), different
bit. VP9-set BIT(1) status would misclassify as ERROR.

Both blockers confirmed empirically by reading mainline rkvdec.c and
BSP mpp_rkvdec2.c source (rkvdec2_run dispatches via mpp_write_req
on req->offset, NOT a flat dump; trans_tbl_vp9d shares HEVC's
128..199 register-DMA range).

v2 architecture: new rkvdec-vdpu381-vp9.c backend (~600-800 lines)
+ extracted rkvdec-vp9-common.{c,h} for the SPEC layer (probability
tables, frame_ctx, segmap) + VP9 register-struct extensions in
rkvdec-vdpu381-regs.h. Reuses vdpu381_irq_handler, V4L2 controls,
RCB infrastructure, IOMMU path — all confirmed as codec-agnostic.

Sourcing the bit-packed register struct definitions from Rockchip
BSP MPP hal_vp9d_vdpu382.c (vdpu382 = sibling IP family).

Next: Phase 1 v2 → Janet re-review for sign-off before Phase 2 code.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 23:08:52 +00:00

7.6 KiB
Raw Blame History

Phase 1 plan — VP9 wiring patch for vdpu381 (RK3588)

Date: 2026-05-17 ~01:15. Phase 0 closed 8dce724 with the architectural correction that VP9 is a wiring patch, not a backend port.

Phase 1 substrate resolutions (open questions from Phase 0)

Q Question Resolution
O1 Translate-table fits 0x400 MMIO? Yes. rkvdec_v2_trans[RKVDEC_FMT_VP9D] offsets 128..232 × 4 = 0x200..0x3A0 < 0x400. Confirmed in mpp_rkvdec2.c BSP.
O2 Codec-aware IRQ split needed? NO. BSP rkvdec2_irq (used for all v2-family devices including RK3588) reads ONE register — RKVDEC_REG_INT_EN = 0x380. Mainline vdpu381_irq_handler reads the same offset under the name VDPU381_REG_STA_INT = 0x380. Same hardware register; same handler works for HEVC, H.264, and VP9.
O3 RK3588-specific clock/reset requirements for VP9 beyond HEVC? None observed. BSP rkvdec_rk3588_hw_ops uses generic rkvdec2_init, rkvdec2_clk_on/off, rkvdec2_sip_reset — codec-agnostic. Mainline vdpu381_variant already exposes the full RK3588 clock set (aclk_vcodec, hclk_vcodec, clk_core, clk_cabac, clk_hevc_cabac).
O4 Legacy rkvdec_vp9_start/stop against RK3588 IOMMU? Should work transparently — vb2_dma_contig handles IOMMU. Verify at first decode attempt.
O5 RCB / SRAM — needed for VP9 on RK3588? Yes — BSP MPP vp9d_vdpu34x.c allocates rcb_buf/g_buf[i].rcb_buf. Mainline rkvdec_start_streaming already calls rkvdec_allocate_rcb(ctx, variant->rcb_sizes, variant->num_rcb_sizes) per-variant, so a VP9 fmt added to vdpu381_coded_fmts[] will automatically inherit vdpu381_rcb_sizes[]. RCB sizes are HEVC-flavoured but the buffer-purpose names (intra, transd, stream, inter, dblk, sao, fbc, filt-col) are codec-generic primitives; HEVC sizing should oversize-but-work for VP9 (4096-max width is smaller than HEVC 8K max).
O6 Validate via Fluster VP9-TEST-VECTORS post-Phase-3 Deferred; first pass uses ffmpeg-vaapi + byte-compare.
O7 Watch for Collabora upstream VP9 series Monitor lore + Collabora gitlab weekly.
R3 RK3588 VP9 max-resolution 4096×N. BSP mpp/hal/rkdec/vp9d/hal_vp9d_vdpu34x.c:284 clamps at width > 4096. Mirror RK3399's {4096, 2304, step 64} frmsize in the new fmt entry.

The wiring patch

Target file: drivers/media/platform/rockchip/rkvdec/rkvdec.c.

Change 1: add vdpu38x_vp9_ctrls ctrl-set descriptor

Insert near vdpu38x_h264_ctrls (~line 397):

static const struct rkvdec_ctrls vdpu38x_vp9_ctrls = {
	.ctrls = rkvdec_vp9_ctrl_descs,
	.num_ctrls = ARRAY_SIZE(rkvdec_vp9_ctrl_descs),
};

Reuses the existing rkvdec_vp9_ctrl_descs[] (line 419). V4L2 controls are HW-agnostic so no duplication needed; only a fresh rkvdec_ctrls wrapper.

Change 2: add VP9 entry to vdpu381_coded_fmts[]

Append after the H264 entry (~line 549):

	{
		.fourcc = V4L2_PIX_FMT_VP9_FRAME,
		.frmsize = {
			.min_width = 64,
			.max_width = 4096,
			.step_width = 64,
			.min_height = 64,
			.max_height = 2304,
			.step_height = 64,
		},
		.ctrls = &vdpu38x_vp9_ctrls,
		.ops = &rkvdec_vp9_fmt_ops,
		.num_decoded_fmts = ARRAY_SIZE(rkvdec_vp9_decoded_fmts),
		.decoded_fmts = rkvdec_vp9_decoded_fmts,
	},

Reuses existing rkvdec_vp9_fmt_ops (legacy vdpu34x register layout, written for RK3399) and rkvdec_vp9_decoded_fmts (NV12_4L4 + NV12). No subsystem_flags = VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF because VP9 frames are self-contained (no slice fragmentation).

Change 3 (conditional): instrumentation only if first decode silent-fails

Add a one-shot DIAG pr_warn at the end of rkvdec_vp9_run if Phase 6 verification shows the IRQ never fires or fires with unexpected status. Otherwise the patch is two structural additions and zero new logic.

Test plan

Step Action Pass criterion
1 Build drivers/media/platform/rockchip/rkvdec/rockchip-vdec.ko on boltzmann (already has tree + toolchain) Builds clean
2 Backup current 7.0.0-rc3-devices+/kernel/drivers/media/platform/rockchip/rkvdec/rockchip-vdec.ko on ampere (feedback_backup_before_module_replace) Backup md5'd + off-device copy on fresnel
3 Install + modprobe -r rockchip-vdec; modprobe rockchip-vdec (no reboot needed) dmesg | grep rkvdec shows VP9 fmt enumerated. v4l2-ctl --list-formats-out --device=/dev/video0 shows VP9F
4 Decode test bitstream: LIBVA_DRIVER_NAME=v4l2_request ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i <vp9.webm> -vf hwdownload,format=nv12 -frames:v 100 -f rawvideo -pix_fmt nv12 /tmp/hw-vp9.nv12 rc=0, output size = 100 × W × H × 1.5 bytes
5 SW reference: ffmpeg -i <vp9.webm> -vf format=nv12,select='eq(n,100)' -frames:v 1 -f rawvideo -pix_fmt nv12 /tmp/sw-vp9-f100.nv12 rc=0
6 Byte-compare frame 100 (per feedback_compare_hw_against_sw_reference) ≥ 99% exact match
7 Iteratively widen test vectors (different resolutions, frame-parallel decoding, super-frame, segmentation enabled) Each VP9 feature within vdpu381 capability passes

Risk register update

# Risk Mitigation
R1 Same physical IP, two register dialects — hidden mode-switch register? BSP rkvdec_rk3588_hw_ops.init = rkvdec2_init (generic). No mode switch. Resolved.
R2 libva-v4l2-request-fourier lacks rkvdec VP9 dispatch Phase 7 task: add VP9 fmt to backend's vendor_fourcc_to_driver_fourcc switch. Sibling iter33 VP8 pattern. Independent of kernel patch.
R3 RK3588 VP9 max-resolution differs from RK3399's 4096×2304 4096-max confirmed; height limit unchecked. Open empirically; cap conservatively at 2304 first.
R4 dirac RK3399 cross-test fixture status NOT regression-impacted — patch only ADDS to vdpu381_coded_fmts[], doesn't touch rk3399_coded_fmts[] or rkvdec-vp9.c. No risk to RK3399 path. Resolved.
R5 Casanova posts upstream VP9 series mid-effort → fork divergence Monitor; coordinate if posted.
R6 (new) RCB sizing oversizes for VP9 (HEVC sized for 8K, VP9 caps at 4K) — wastes SRAM but should still work First attempt uses HEVC sizes. If sub-optimal, follow-up patch adds VP9-specific RCB descriptor table. Not a correctness issue.

Sequence

Phase 1: this plan, then Sonnet Phase 5 architect review. Phase 2: implement Change 1 + 2 in boltzmann:~/src/linux-rockchip linux-rk3588-marfrit branch. Phase 3: build module on boltzmann via distcc. Phase 4: install on ampere + modprobe -r/modprobe cycle + sanity v4l2-ctl enumeration. Phase 5: SW review of code + Sonnet checkpoint. Phase 6: decode test + byte-compare verification. Phase 7: persistence — commit to linux-rk3588-marfrit, prepare upstream patch via b4, route through kernel-agent. Phase 8: monitor + extend test vectors; backend (libva) update tracked separately.

Open questions remaining for Phase 1 review

  • Q-P1.1: Does mainline vdpu381_irq_handler correctly handle the case where the IRQ status doesn't set VDPU381_STA_INT_DEC_RDY_STA (BIT 2) on a successful VP9 decode? i.e. does VP9 set a different status bit than HEVC? Mitigation: add diagnostic pr_warn for first 5 IRQs.
  • Q-P1.2: Does rkvdec_vp9_start use dma_alloc_* that conflicts with rkvdec0's IOMMU mode? Likely no; vb2 handles this.
  • Q-P1.3: VP9 needs a done callback (rkvdec_vp9_done populates probability update for adaptive entropy). Verify the existing done runs in the correct workqueue context after IRQ — should "just work" since mainline has the dispatch.