ampere-vp9-enablement/phase0_findings.md
claude-noether 8dce724b8c Phase 0: architectural correction — VP9 is a wiring patch, not a port
Rockchip BSP inspection (mpp_rkvdec2.c, soc.cmake, vp9d/CMakeLists.txt)
overturns the initial premise. The same physical rkvdec IP on RK3588
accepts two register-protocol dialects within its 0x400 MMIO window:

- vdpu381 dialect (Casanova mainline naming) — H.264 + HEVC
- vdpu34x dialect (Rockchip legacy naming)    — VP9 + AVS2

BSP rkvdec_rk3588_data uses rkvdec_v2_hw_info + rkvdec_v2_trans, the
same dispatch tables as RK356X. MPP userspace builds the vdpu34x VP9
backend for RK3588 because no vdpu381 VP9 backend exists; it isn't
needed — the existing vdpu34x register layout drives this hardware.

Implication: mainline rkvdec_vp9_fmt_ops (vdpu34x layout, written for
RK3399) can drive RK3588 rkvdec hardware as-is. VP9 enablement is a
< 100-line wiring patch (third entry in vdpu381_coded_fmts[] + maybe a
codec-aware IRQ split), not a 1000+ line backend port.

Open questions revised; risk register tightened.

Phase 1 starts by reading BSP rkvdec_rk3588_hw_ops IRQ + power-on
routines to resolve O2/O3 (codec-aware dispatch needed? mode-switch
register?) and BSP vp9d_vdpu34x.c for max-resolution + RCB usage.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 22:57:36 +00:00


Phase 0 findings — ampere VP9 enablement substrate

Date: 2026-05-17, opening session of the campaign (sibling: ampere-kernel-decoders closed at HEVC bit-perfect ~30 min earlier).

Goal

Bring VP9 hardware decode up on RK3588 ampere via rkvdec (vdpu381 register layout), upstream-aligned, suitable for a clean linux-media RFC. End-state criterion: bit-perfect against ffmpeg -c:v vp9 SW reference per feedback_compare_hw_against_sw_reference.
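
The bit-perfect criterion amounts to a frame-by-frame comparison of raw decoder output. The following is a minimal sketch, assuming both the hardware path and the software reference were dumped as raw 8-bit 4:2:0 frames cropped to the same visible size; the function names are hypothetical and not part of any existing tooling in this campaign.

```python
import io


def nv12_frame_size(width, height):
    """Bytes per 8-bit 4:2:0 frame: full-size luma plus half-size chroma."""
    return width * height * 3 // 2


def first_mismatch(hw, sw, frame_size):
    """Return the index of the first differing frame, or None if identical.

    hw and sw are binary file-like objects holding raw frames of
    frame_size bytes each. A length difference counts as a mismatch
    at the frame where it first shows up.
    """
    index = 0
    while True:
        a = hw.read(frame_size)
        b = sw.read(frame_size)
        if not a and not b:
            return None   # both streams ended together: bit-perfect
        if a != b:
            return index  # content or length differs in this frame
        index += 1


if __name__ == "__main__":
    # Synthetic 4x2 frames stand in for real decoder dumps.
    size = nv12_frame_size(4, 2)
    good = bytes(size) * 3
    bad = bytes(size) * 2 + b"\x01" * size
    print(first_mismatch(io.BytesIO(good), io.BytesIO(good), size))  # None
    print(first_mismatch(io.BytesIO(bad), io.BytesIO(good), size))   # 2
```

Reporting the first mismatching frame index, rather than a plain pass/fail, makes it easier to tell a broken first keyframe from drift that starts at a later inter frame.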

Upstream status (search round 1)

| Source | Result |
|---|---|
| Collabora blog 2026-05 (Panthor → RK3588) | "Going forward, Collabora will work on ... VP9 code support on RK3588"; roadmap item, no series posted |
| Collabora RK3588/RK3576 decoders merged | Linux 7.0 landed H.264 + HEVC for vdpu381/vdpu383 only |
| WebSearch "rk3588" OR "vdpu381" rkvdec vp9 patch site:lore.kernel.org | AV1 series + other unrelated; no VP9 vdpu381 series |
| WebSearch rkvdec2 vp9 rk3588 collabora linux-media kernel patch 2026 | Same conclusion; RKVDEC2 driver supports H.264 only at posting time |
| lore.kernel.org/linux-media WebFetch | Anubis access-denied (anti-bot block) |
| lore.kernel.org/linaro-mm-sig WebFetch | Anubis access-denied |
| git remote -v on boltzmann:~/src/linux-rockchip → collabora remote | `collabora/add-rkvdec2-driver*` branches exist (vdpu383-hevc variant); no `*-vp9*` branch |

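Conclusion: no VP9 series for RK3588 vdpu381 turned up in these searches. Both lore.kernel.org fetches were blocked, so this rests on the web search results and the Collabora branch listing; as far as they show, VP9 is not yet in flight upstream and we are first to implement.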

Existing code substrate (boltzmann:~/src/linux-rockchip @ linux-rk3588-marfrit)

Legacy reference (RK3399 / vdpu341)

  • drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c — 1042 lines (Brezillon 2019 + Pietrasiewicz 2021 + Alpha Lin 2016). Uses writel-style register access via rkvdec-regs.h.
  • rkvdec.c:419 defines rkvdec_vp9_ctrl_descs[] (V4L2_CID_STATELESS_VP9_FRAME + V4L2_CID_STATELESS_VP9_COMPRESSED_HDR — small ctrl set)
  • rkvdec.c:478..492 registers VP9 in rk3399_coded_fmts[] (4096×2304 max, 64×64 alignment step)
  • Ops: rkvdec_vp9_fmt_ops = { .adjust_fmt, .start, .stop, .run } (no try_ctrl)
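
To make the 4096×2304 limit and 64×64 step concrete, here is a small sketch of how a stepwise frame-size constraint of that shape would treat a requested resolution: clamp to the allowed range, then round up to the step. It is an illustration only, not the kernel's code; the 64-pixel minimum and the function names are assumptions.

```python
def constrain(value, minimum, maximum, step):
    """Clamp value into [minimum, maximum], then round up to a multiple of step."""
    value = max(minimum, min(value, maximum))
    return -(-value // step) * step  # ceiling division, then scale back up


def constrain_frame(width, height, max_w=4096, max_h=2304, step=64, min_dim=64):
    """Apply the same constraint to both dimensions.

    max_w, max_h and step come from the RK3399 VP9 entry described above;
    min_dim is an assumed lower bound for illustration.
    """
    return (constrain(width, min_dim, max_w, step),
            constrain(height, min_dim, max_h, step))


if __name__ == "__main__":
    print(constrain_frame(1920, 1080))  # (1920, 1088)
    print(constrain_frame(3840, 2160))  # (3840, 2176)
    print(constrain_frame(8192, 4320))  # (4096, 2304)
```

The 1080 → 1088 case shows why decoded buffers can be taller than the visible picture, which matters when byte-comparing hardware output against a software reference.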

vdpu381 reference (RK3588) — pattern to follow

  • rkvdec-vdpu381-hevc.c — 639 lines, 2025 Casanova. Struct-based register layout (rkvdec-vdpu381-regs.h), shared preamble via rkvdec-hevc-common.c/h.
  • rkvdec-vdpu381-h264.c — same pattern, h264-common shared file.
  • rkvdec.c:513..549 defines vdpu381_coded_fmts[] with HEVC + H.264 only — VP9 entry must be added here.
  • rkvdec.c:1701 vdpu381_variant_ops exposes single-IRQ handler + coded_fmts table — no per-codec dispatch needed.

Common helpers already in place

  • rkvdec-cabac.c/h — CABAC tables, codec-agnostic
  • rkvdec-rcb.c/h — Row Cache Buffer / SRAM management (vdpu38x has internal SRAM for line caches)
  • rkvdec-h264-common.c/h, rkvdec-hevc-common.c/h — codec spec parsing, RPS prep, control-batch helpers

VP9 has no rkvdec-vp9-common.* yet. Today the legacy rkvdec-vp9.c holds both the spec/probability logic AND the vdpu341 register code in one file.

Work plan outline (to be refined in Phase 1)

| Step | Output | Notes |
|---|---|---|
| 1 | rkvdec-vp9-common.{c,h}, extracted from legacy rkvdec-vp9.c | Probability tables, frame_ctx state, segmap mgmt, the kernel's v4l2-vp9 helpers (v4l2-vp9.h). Stays codec-spec-only, no register access. Legacy rkvdec-vp9.c then includes/links to it. |
| 2 | rkvdec-vdpu381-vp9.c, new backend | rkvdec_vdpu381_vp9_fmt_ops = { .adjust_fmt, .start, .stop, .run }. Re-implements register packing against struct vp9_regs in vdpu381 layout. |
| 3 | rkvdec-vdpu381-regs.h additions | VP9 register struct definitions (need Rockchip TRM or BSP reference; see open-question O1) |
| 4 | vdpu38x_vp9_ctrl_descs[] in rkvdec.c | Likely identical to legacy rkvdec_vp9_ctrl_descs[] (V4L2 controls are HW-agnostic), just renamed and possibly with vdpu38x-specific dims. |
| 5 | vdpu381_coded_fmts[] third entry | V4L2_PIX_FMT_VP9_FRAME pointing to the new ops + ctrls. Sizes likely 65472×65472 to match HEVC entry. |
| 6 | Test: ffmpeg-vaapi VP9 decode + byte-compare against SW reference | Per feedback_compare_hw_against_sw_reference.md. |
| 7 | Series-prep: split into individual reviewable patches | Eventually for linux-media submission via b4. |

Step 1 is the biggest single chunk (refactor + maintain bit-perfect behaviour on legacy path); steps 2-3 are where the unknown register layout dominates time.

Architectural correction (mid-Phase-0)

Original premise overturned by Rockchip BSP inspection. The work plan outline above assumed that vdpu381 (mainline RK3588) is a different IP from vdpu341 (RK3399) and therefore needs a new register backend. The BSP says otherwise:

| Source | Finding |
|---|---|
| BSP DTS rk3588s.dtsi (lines 5059, 5113) | rkvdec0@fdc38000, rkvdec1@fdc48000 carry compatible = "rockchip,rkv-decoder-v2" and a 0x400-byte MMIO window. No separate vdpu34x physical IP exists on RK3588. |
| BSP driver drivers/video/rockchip/mpp/mpp_rkvdec2.c:1659 | rkvdec_rk3588_data ties RK3588 to rkvdec_v2_hw_info and rkvdec_v2_trans (the same dispatch tables used for RK356X). RK3576 alone routes to `rkvdec_vdpu383_*`. RK3588 stays on the v2/vdpu34x family in BSP naming. |
| BSP MPP userspace mpp/soc.cmake | add_soc_config("RK3588" "VDPU381,VDPU34X,..."): RK3588 enables both register-protocol backends. Same physical rkvdec IP accepts two register-layout dialects within its MMIO window: vdpu381 dialect (Casanova mainline naming, used for H.264/HEVC) and vdpu34x dialect (Rockchip legacy naming, used for VP9 + AVS2). |
| BSP MPP mpp/hal/rkdec/vp9d/CMakeLists.txt | VP9 backends gated only on HAVE_VDPU34X / VDPU382 / VDPU383 / VDPU384B. No HAVE_VDPU381 VP9 backend exists; vdpu381-class hardware uses the vdpu34x VP9 backend. |

Implication: RK3588 VP9 enablement does NOT require porting rkvdec-vp9.c to a new register layout. The existing mainline rkvdec_vp9_fmt_ops (vdpu34x layout, written for RK3399) can drive RK3588's rkvdec hardware as-is. The missing work is a wiring patch, not a backend port.

Revised work plan

| Step | Output | Notes |
|---|---|---|
| 1 | Add VP9 entry to vdpu381_coded_fmts[] in rkvdec.c:513, pointing to existing rkvdec_vp9_fmt_ops and reusing rkvdec_vp9_ctrls | Frame-size limits from RK3399's entry (4096×2304, step 64). RK3588 VP9 hard limits may differ; cross-check vp9d_vdpu34x.c for max-resolution constants |
| 2 | Wire vdpu381 IRQ handler to recognise VP9 codec context | Legacy IRQ (rkvdec_irq) and vdpu381 IRQ (vdpu381_irq_handler) read different status registers, so the VP9 path may need legacy IRQ semantics. Verify against BSP mpp_rkvdec2.c IRQ + trans_tbl_vp9d register layout |
| 3 | Verify RK3588 rkvdec clock topology matches what legacy VP9 path needs | RK3588 DT has clk_core, clk_cabac, clk_hevc_cabac; legacy VP9 path uses clk_core/clk_cabac (subset) |
| 4 | Verify legacy rkvdec-regs.h register offsets are valid on RK3588 rkvdec MMIO (0xfdc38100 + 0x400) | Same physical IP, same register window. Smoke-test with devmem2 against ampere |
| 5 | Test: ffmpeg-vaapi VP9 decode + byte-compare against SW reference | Per feedback_compare_hw_against_sw_reference. |
| 6 | Single RFC patch to linux-media | If wiring is the only delta, an easy upstream sell |
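
For the register smoke test in step 4, a devmem2-style read boils down to mapping the page that contains the register and reading one 32-bit word at the in-page offset. The sketch below shows that mechanism in Python; it is an illustration under assumptions, not a replacement for devmem2. It assumes root access, a kernel that permits /dev/mem mappings of this range, and a decoder block that is powered and clocked at the time of the read. The function names are hypothetical.

```python
import mmap
import os
import struct


def split_address(phys_addr, page_size=mmap.ALLOCATIONGRANULARITY):
    """Split a physical address into a page-aligned base and an in-page offset.

    page_size must be a power of two.
    """
    base = phys_addr & ~(page_size - 1)
    return base, phys_addr - base


def read_reg32(path, phys_addr, page_size=mmap.ALLOCATIONGRANULARITY):
    """Read one little-endian 32-bit word at phys_addr from a /dev/mem-like file."""
    base, offset = split_address(phys_addr, page_size)
    fd = os.open(path, os.O_RDONLY | getattr(os, "O_SYNC", 0))
    try:
        # mmap offsets must be aligned, hence the base/offset split above.
        with mmap.mmap(fd, page_size, mmap.MAP_SHARED, mmap.PROT_READ,
                       offset=base) as window:
            return struct.unpack_from("<I", window, offset)[0]
    finally:
        os.close(fd)


if __name__ == "__main__":
    # 0xfdc38100 is the rkvdec0 register base quoted in step 4.
    base, offset = split_address(0xFDC38100, 4096)
    print(hex(base), hex(offset))  # 0xfdc38000 0x100
```

On the target, a call such as `read_reg32("/dev/mem", 0xFDC38100)` would return the first word of the window; which register that is, and what value to expect, still has to come from rkvdec-regs.h and the BSP.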

Expected size: < 100 lines if IRQ-handler doesn't need codec-aware split, ~200 lines if VP9 requires per-codec IRQ dispatch. A small fraction of the original "port 1042 lines" estimate.

Open questions (revised)

| # | Question | Resolution path |
|---|---|---|
| O1 | Does rkvdec_v2_trans[RKVDEC_FMT_VP9D] register-translate table (offsets 128..232 × 4 = 0x200..0x3A0) fit inside the rkvdec MMIO window (0x400 bytes)? | Yes by inspection (mpp_rkvdec2.c BSP). Confirmed. |
| O2 | Does the vdpu381 IRQ handler need codec-aware dispatch, or can it handle VP9 termination identically to HEVC/H.264? | Read rkvdec_rk3588_hw_ops IRQ in mpp_rkvdec2.c + compare to legacy rkvdec_irq in mainline. If different status-bit semantics, need if (ctx->coded_fmt == V4L2_PIX_FMT_VP9_FRAME) split |
| O3 | RK3588-specific clock/reset requirements for VP9 beyond HEVC? | Compare BSP mpp_rkvdec2.c IRQ + power-on routines for FMT_VP9D vs FMT_H265D |
| O4 | Does legacy rkvdec_vp9_start / rkvdec_vp9_stop (probe + segmap buffer alloc) work against RK3588's IOMMU configuration? | Most likely yes (vb2_dma_contig handles IOMMU transparently). Verify at first decode attempt |
| O5 | RCB / SRAM: legacy VP9 path doesn't use RCB; vdpu381 HEVC does. Is RCB needed for VP9 on RK3588? | Compare vp9d_vdpu34x.c MPP backend's RCB usage to vp9d_vdpu382.c. If vdpu34x backend works without RCB on RK3588 in BSP, mainline doesn't need it either for VP9 |
| O6 | Validate via Fluster VP9-TEST-VECTORS post-Phase-3 | Set up GStreamer-VP9-V4L2SL-Gst1.0 test rig |
| O7 | If a Collabora linux-rkvdec-vp9-on-rk3588 series appears, pivot to coordination | Monitor lore + Collabora gitlab weekly |
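
The O1 arithmetic can be checked mechanically. This sketch assumes 32-bit registers, so register index n sits at byte offset n × 4 and occupies four bytes; the helper names are hypothetical.

```python
REG_WIDTH = 4  # bytes per 32-bit register


def reg_span(first_index, last_index, reg_width=REG_WIDTH):
    """Byte range [start, end) covered by registers first_index..last_index inclusive."""
    return first_index * reg_width, (last_index + 1) * reg_width


def fits_window(first_index, last_index, window_size):
    """True if every byte of the register range lies inside a window of window_size bytes."""
    start, end = reg_span(first_index, last_index)
    return 0 <= start and end <= window_size


if __name__ == "__main__":
    start, end = reg_span(128, 232)
    print(hex(start), hex(end))           # 0x200 0x3a4
    print(fits_window(128, 232, 0x400))   # True
```

The last register starts at 0x3A0 and ends at 0x3A4, which leaves 0x5C bytes of headroom below the 0x400 window limit.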

Risk register (revised)

| # | Risk | Mitigation |
|---|---|---|
| R1 | Same physical IP accepting two register dialects is unusual; there may be a hidden mode-switch register that must be set before VP9 work | Inspect BSP rkvdec_rk3588_hw_ops for any pre-decode setup distinguishing VP9 from HEVC |
| R2 | Backend (libva-v4l2-request-fourier) doesn't yet have rkvdec VP9 dispatch path; only hantro VP8 exists | Mirror the iter33 VP8 pattern: profile-gated codec dispatch in RequestCreateConfig. Sibling: feedback_unconditional_codec_state.md (must per-codec gate) |
| R3 | RK3588 VP9 max-resolution may differ from RK3399's 4096×2304 | Read MPP vp9d_vdpu34x.c max-resolution constants for confirmation |
| R4 | dirac (RK3399) cross-test fixture status unknown; needed if we modify legacy rkvdec-vp9.c for any reason | If modification needed, verify dirac is reachable before commit. Mostly we should NOT need to touch legacy file |
| R5 | Casanova posts upstream VP9 series mid-effort → fork divergence | Monitor weekly; coordinate if posted |

Substrate locked

Phase 0 closes here. Phase 1 (architectural plan + Sonnet review) starts next session:

  • Read BSP mpp_rkvdec2.c rkvdec_rk3588_hw_ops and IRQ routine to resolve O2/O3 (mode-switch register? VP9-specific IRQ status?)
  • Read BSP MPP vp9d_vdpu34x.c to confirm O5 (RCB usage) and R3 (max-resolution)
  • Draft the wiring patch outline (3rd entry in vdpu381_coded_fmts[] + any IRQ split)
  • Decide ampere-side test fixture (which VP9 bitstreams; bbb-vp9 + a streaming-capable test vector)

Persistence

  • Repo: /home/mfritsche/src/ampere-vp9-enablement/ on fresnel
  • Gitea remote: TBD (file as claude-noether/ampere-vp9-enablement per feedback_gitea_as_claude_noether)
  • Kernel work: boltzmann:~/src/linux-rockchip branch linux-rk3588-marfrit (same tree as ampere-kernel-decoders campaign — separate iteration branches under vp9-* namespace recommended)
  • ampere current state: vanilla 7.0.0-rc3-devices+ kernel + iter3/iter4-fixed modules from sibling campaign; bit-perfect HEVC verified; backend v4l2_request_drv_video.so is iter38b. VP9 has not been exercised on this system since the kernel-agent rollout.