ampere-vp9-enablement/phase0_findings.md
claude-noether 8dce724b8c Phase 0: architectural correction — VP9 is a wiring patch, not a port
Rockchip BSP inspection (mpp_rkvdec2.c, soc.cmake, vp9d/CMakeLists.txt)
overturns the initial premise. The same physical rkvdec IP on RK3588
accepts two register-protocol dialects within its 0x400 MMIO window:

- vdpu381 dialect (Casanova mainline naming) — H.264 + HEVC
- vdpu34x dialect (Rockchip legacy naming)    — VP9 + AVS2

BSP rkvdec_rk3588_data uses rkvdec_v2_hw_info + rkvdec_v2_trans, the
same dispatch tables as RK356X. MPP userspace builds the vdpu34x VP9
backend for RK3588 because no vdpu381 VP9 backend exists; it isn't
needed — the existing vdpu34x register layout drives this hardware.

Implication: mainline rkvdec_vp9_fmt_ops (vdpu34x layout, written for
RK3399) can drive RK3588 rkvdec hardware as-is. VP9 enablement is a
< 100-line wiring patch (third entry in vdpu381_coded_fmts[] + maybe a
codec-aware IRQ split), not a 1000+ line backend port.

Open questions revised; risk register tightened.

Phase 1 starts by reading BSP rkvdec_rk3588_hw_ops IRQ + power-on
routines to resolve O2/O3 (codec-aware dispatch needed? mode-switch
register?) and BSP vp9d_vdpu34x.c for max-resolution + RCB usage.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 22:57:36 +00:00

# Phase 0 findings — ampere VP9 enablement substrate
Date: 2026-05-17, opening session of the campaign (sibling: ampere-kernel-decoders closed at HEVC bit-perfect ~30 min earlier).
## Goal
Bring VP9 hardware decode up on RK3588 ampere via rkvdec (vdpu381 register layout), upstream-aligned, suitable for a clean linux-media RFC. End-state criterion: bit-perfect against `ffmpeg -c:v vp9` SW reference per [feedback_compare_hw_against_sw_reference](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_compare_hw_against_sw_reference.md).
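
A minimal sketch of the bit-perfect criterion, assuming the hardware and software decodes have already been dumped to raw buffers of equal length. `first_mismatch`, `frame_of_offset`, and `NO_MISMATCH` are hypothetical helper names used for illustration only; they are not part of any existing test harness in this campaign.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Sentinel returned when the two buffers are byte-identical. */
#define NO_MISMATCH ((size_t)-1)

/*
 * Return the offset of the first byte where the hardware dump differs
 * from the software reference, or NO_MISMATCH if all len bytes match.
 */
size_t first_mismatch(const uint8_t *hw, const uint8_t *sw, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (hw[i] != sw[i])
            return i;
    }
    return NO_MISMATCH;
}

/*
 * Map a byte offset to a frame index, assuming fixed-size raw frames.
 * frame_bytes is whatever the chosen output pixel format implies.
 */
size_t frame_of_offset(size_t offset, size_t frame_bytes)
{
    assert(frame_bytes != 0);
    return offset / frame_bytes;
}
```

Reporting the first differing frame rather than a plain pass/fail makes it easier to tell a probability-update bug (drift after the first frame) from a register-setup bug (frame 0 already wrong).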
## Upstream status (search round 1)
| Source | Result |
|---|---|
| Collabora blog 2026-05 ([Panthor → RK3588](https://www.collabora.com/news-and-blog/news-and-events/from-panthor-to-rk3588-advancing-graphics-video-soc-support-linux-kernel-7.html)) | "Going forward, Collabora will work on ... VP9 code support on RK3588" — roadmap item, no series posted |
| [Collabora RK3588/RK3576 decoders merged](https://www.collabora.com/news-and-blog/news-and-events/rk3588-and-rk3576-video-decoders-support-merged-in-the-upstream-linux-kernel.html) | Linux 7.0 landed H.264 + HEVC for vdpu381/vdpu383 only |
| WebSearch `"rk3588" OR "vdpu381" rkvdec vp9 patch site:lore.kernel.org` | AV1 series and other unrelated hits; no VP9 vdpu381 series |
| WebSearch `rkvdec2 vp9 rk3588 collabora linux-media kernel patch 2026` | Same conclusion; RKVDEC2 driver supports H.264 only at posting time |
| `lore.kernel.org/linux-media` WebFetch | Anubis access-denied (anti-bot block) |
| `lore.kernel.org/linaro-mm-sig` WebFetch | Anubis access-denied |
| `git remote -v` on boltzmann:~/src/linux-rockchip → collabora remote | `collabora/add-rkvdec2-driver*` branches exist (vdpu383-hevc variant); **no `*-vp9*` branch** |

Conclusion: VP9 on RK3588 vdpu381 is **not yet in flight upstream**. As far as these searches show (the two direct lore fetches were blocked), this effort would be the first implementation.
## Existing code substrate (boltzmann:~/src/linux-rockchip @ `linux-rk3588-marfrit`)
### Legacy reference (RK3399 / vdpu341)
- `drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c` — 1042 lines (Brezillon 2019 + Pietrasiewicz 2021 + Alpha Lin 2016). Uses `writel`-style register access via `rkvdec-regs.h`.
- `rkvdec.c:419` defines `rkvdec_vp9_ctrl_descs[]` (V4L2_CID_STATELESS_VP9_FRAME + V4L2_CID_STATELESS_VP9_COMPRESSED_HDR — small ctrl set)
- `rkvdec.c:478..492` registers VP9 in `rk3399_coded_fmts[]` (4096×2304 max, 64×64 alignment step)
- Ops: `rkvdec_vp9_fmt_ops = { .adjust_fmt, .start, .stop, .run }` (no `try_ctrl`)
### vdpu381 reference (RK3588) — pattern to follow
- `rkvdec-vdpu381-hevc.c` — 639 lines, 2025 Casanova. **Struct-based register layout** (`rkvdec-vdpu381-regs.h`), shared preamble via `rkvdec-hevc-common.c/h`.
- `rkvdec-vdpu381-h264.c` — same pattern, h264-common shared file.
- `rkvdec.c:513..549` defines `vdpu381_coded_fmts[]` with HEVC + H.264 only — **VP9 entry must be added here**.
- `rkvdec.c:1701` `vdpu381_variant_ops` exposes single-IRQ handler + coded_fmts table — no per-codec dispatch needed.
### Common helpers already in place
- `rkvdec-cabac.c/h` — CABAC tables, codec-agnostic
- `rkvdec-rcb.c/h` — Row Cache Buffer / SRAM management (vdpu38x has internal SRAM for line caches)
- `rkvdec-h264-common.c/h`, `rkvdec-hevc-common.c/h` — codec spec parsing, RPS prep, control-batch helpers

VP9 has no `rkvdec-vp9-common.*` yet. Today the legacy `rkvdec-vp9.c` holds both the spec/probability logic AND the vdpu341 register code in one file.
## Work plan outline (to be refined in Phase 1)
| Step | Output | Notes |
|---|---|---|
| 1 | `rkvdec-vp9-common.{c,h}`, extracted from legacy `rkvdec-vp9.c` | Probability tables, frame_ctx state, segmap mgmt, use of the in-kernel V4L2 VP9 helper library (`v4l2-vp9.h`). Stays codec-spec-only, no register access. Legacy `rkvdec-vp9.c` then includes/links to it. |
| 2 | `rkvdec-vdpu381-vp9.c` — new backend | `rkvdec_vdpu381_vp9_fmt_ops = { .adjust_fmt, .start, .stop, .run }`. Re-implements register packing against `struct vp9_regs` in vdpu381 layout. |
| 3 | `rkvdec-vdpu381-regs.h` additions | VP9 register struct definitions (need Rockchip TRM or BSP reference — see open-question O1) |
| 4 | `vdpu38x_vp9_ctrl_descs[]` in `rkvdec.c` | Likely identical to legacy `rkvdec_vp9_ctrl_descs[]` (V4L2 controls are HW-agnostic) — just renamed and possibly with vdpu38x-specific dims. |
| 5 | `vdpu381_coded_fmts[]` third entry | V4L2_PIX_FMT_VP9_FRAME pointing to the new ops + ctrls. Sizes likely 65472×65472 to match HEVC entry. |
| 6 | Test: ffmpeg-vaapi VP9 decode + byte-compare against SW reference | Per `feedback_compare_hw_against_sw_reference.md`. |
| 7 | Series-prep: split into individual reviewable patches | Eventually for linux-media submission via `b4`. |

Step 1 is the largest single chunk (a refactor that must keep the legacy path bit-perfect); steps 2-3 are where the unknown register layout dominates the schedule.
## Architectural correction (mid-Phase-0)
**Original premise overturned by Rockchip BSP inspection.** The work plan outline above assumed that vdpu381 (mainline RK3588) is a *different IP* from vdpu341 (RK3399) and therefore needs a new register backend. The BSP sources contradict that assumption:

| Source | Finding |
|---|---|
| BSP DTS `rk3588s.dtsi` (lines 5059, 5113) | `rkvdec0@fdc38000`, `rkvdec1@fdc48000` carry `compatible = "rockchip,rkv-decoder-v2"` and a 0x400-byte MMIO window. **No separate vdpu34x physical IP exists on RK3588.** |
| BSP driver `drivers/video/rockchip/mpp/mpp_rkvdec2.c:1659` | `rkvdec_rk3588_data` ties RK3588 to `rkvdec_v2_hw_info` and `rkvdec_v2_trans` (the *same* dispatch tables used for RK356X). RK3576 alone routes to `rkvdec_vdpu383_*`. **RK3588 stays on the v2/vdpu34x family in BSP naming.** |
| BSP MPP userspace `mpp/soc.cmake` | `add_soc_config("RK3588" "VDPU381,VDPU34X,...")` — RK3588 *enables both* register-protocol backends. Same physical rkvdec IP accepts two register-layout dialects within its MMIO window: vdpu381 dialect (Casanova mainline naming, used for H.264/HEVC) and vdpu34x dialect (Rockchip legacy naming, used for VP9 + AVS2). |
| BSP MPP `mpp/hal/rkdec/vp9d/CMakeLists.txt` | VP9 backends gated only on `HAVE_VDPU34X / VDPU382 / VDPU383 / VDPU384B`. **No `HAVE_VDPU381` VP9 backend exists** — vdpu381-class hardware uses the vdpu34x VP9 backend. |

**Implication**: RK3588 VP9 enablement does NOT require porting `rkvdec-vp9.c` to a new register layout. The existing mainline `rkvdec_vp9_fmt_ops` (vdpu34x layout, written for RK3399) can drive RK3588's rkvdec hardware *as-is*. The missing work is a **wiring patch**, not a backend port.
## Revised work plan
| Step | Output | Notes |
|---|---|---|
| 1 | Add VP9 entry to `vdpu381_coded_fmts[]` in `rkvdec.c:513`, pointing to existing `rkvdec_vp9_fmt_ops` and reusing `rkvdec_vp9_ctrls` | Frame-size limits from RK3399's entry (4096×2304, step 64) — RK3588 VP9 hard limits may differ; cross-check `vp9d_vdpu34x.c` for max-resolution constants |
| 2 | Wire vdpu381 IRQ handler to recognise VP9 codec context | Legacy IRQ (`rkvdec_irq`) and vdpu381 IRQ (`vdpu381_irq_handler`) read different status registers — VP9 path may need legacy IRQ semantics. Verify against BSP `mpp_rkvdec2.c` IRQ + `trans_tbl_vp9d` register layout |
| 3 | Verify RK3588 rkvdec clock topology matches what legacy VP9 path needs | RK3588 DT has `clk_core`, `clk_cabac`, `clk_hevc_cabac` — legacy VP9 path uses `clk_core`/`clk_cabac` (subset) |
| 4 | Verify legacy `rkvdec-regs.h` register offsets are valid on RK3588 rkvdec MMIO (0xfdc38100 + 0x400) | Same physical IP, same register window. Smoke-test with devmem2 against ampere |
| 5 | Test: ffmpeg-vaapi VP9 decode + byte-compare against SW reference | Per [feedback_compare_hw_against_sw_reference](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_compare_hw_against_sw_reference.md). |
| 6 | Single RFC patch to linux-media | If wiring is the only delta — easy upstream sell |

**Expected size**: < 100 lines if the IRQ handler does not need a codec-aware split, ~200 lines if VP9 requires per-codec IRQ dispatch. Either way, this is a small fraction of the original "port 1042 lines" estimate.
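
The shape of step 1 can be sketched in userspace as follows. This is a mock under stated assumptions, not the driver patch: every `mock_*` / `MOCK_*` name is hypothetical, the structs are simplified stand-ins for the real coded-format descriptors, and the HEVC/H.264 limits are deliberately left at 0 because this note does not reproduce them. Only the fourcc packing and the VP9 limits (4096×2304, step 64, from the RK3399 entry) come from known sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same byte packing as the kernel's v4l2_fourcc() macro. */
#define MOCK_FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | \
     ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

#define MOCK_FMT_H264_SLICE MOCK_FOURCC('S', '2', '6', '4')
#define MOCK_FMT_HEVC_SLICE MOCK_FOURCC('S', '2', '6', '5')
#define MOCK_FMT_VP9_FRAME  MOCK_FOURCC('V', 'P', '9', 'F')

/* Hypothetical stand-ins; the real driver structs carry more fields. */
struct mock_fmt_ops {
    const char *name;
};

struct mock_coded_fmt {
    uint32_t fourcc;
    uint32_t max_width;   /* 0 = limit not reproduced in this sketch */
    uint32_t max_height;
    uint32_t step;
    const struct mock_fmt_ops *ops;
};

static const struct mock_fmt_ops mock_hevc_ops = { "vdpu381 hevc ops" };
static const struct mock_fmt_ops mock_h264_ops = { "vdpu381 h264 ops" };
/* Stands in for the existing rkvdec_vp9_fmt_ops, reused unchanged. */
static const struct mock_fmt_ops mock_vp9_ops = { "rkvdec_vp9_fmt_ops" };

static const struct mock_coded_fmt mock_vdpu381_coded_fmts[] = {
    { MOCK_FMT_HEVC_SLICE, 0, 0, 0, &mock_hevc_ops },
    { MOCK_FMT_H264_SLICE, 0, 0, 0, &mock_h264_ops },
    /* Proposed third entry: limits copied from the RK3399 entry. */
    { MOCK_FMT_VP9_FRAME, 4096, 2304, 64, &mock_vp9_ops },
};

/* Look up a coded format by fourcc; NULL means "not exposed". */
const struct mock_coded_fmt *mock_find_coded_fmt(uint32_t fourcc)
{
    size_t n = sizeof(mock_vdpu381_coded_fmts) /
               sizeof(mock_vdpu381_coded_fmts[0]);

    for (size_t i = 0; i < n; i++) {
        if (mock_vdpu381_coded_fmts[i].fourcc == fourcc)
            return &mock_vdpu381_coded_fmts[i];
    }
    return NULL;
}

/* True if a coded size is within the entry's maximum dimensions. */
bool mock_fits(const struct mock_coded_fmt *fmt, uint32_t w, uint32_t h)
{
    assert(fmt != NULL);
    return w <= fmt->max_width && h <= fmt->max_height;
}
```

The point of the mock is that the VP9 row references an ops object that already exists; nothing codec-specific is written anew. Whether the RK3399 limits are the right ones for RK3588 is still open (see R3).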
## Open questions (revised)
| # | Question | Resolution path |
|---|---|---|
| O1 | Does `rkvdec_v2_trans[RKVDEC_FMT_VP9D]` register-translate table (offsets 128..232 × 4 = 0x200..0x3A0) fit inside the rkvdec MMIO window (0x400 bytes)? | Yes by inspection (`mpp_rkvdec2.c` BSP). Confirmed. |
| O2 | Does the vdpu381 IRQ handler need codec-aware dispatch, or can it handle VP9 termination identically to HEVC/H.264? | Read `rkvdec_rk3588_hw_ops` IRQ in `mpp_rkvdec2.c` + compare to legacy `rkvdec_irq` in mainline. If different status-bit semantics, need `if (ctx->coded_fmt == V4L2_PIX_FMT_VP9_FRAME)` split |
| O3 | RK3588-specific clock/reset requirements for VP9 beyond HEVC? | Compare BSP `mpp_rkvdec2.c` IRQ + power-on routines for `FMT_VP9D` vs `FMT_H265D` |
| O4 | Does legacy `rkvdec_vp9_start` / `rkvdec_vp9_stop` (probability-table + segmap buffer alloc) work against RK3588's IOMMU configuration? | Most likely yes (vb2_dma_contig handles IOMMU transparently). Verify at first decode attempt |
| O5 | RCB / SRAM — legacy VP9 path doesn't use RCB; vdpu381 HEVC does. Is RCB needed for VP9 on RK3588? | Compare `vp9d_vdpu34x.c` MPP backend's RCB usage to `vp9d_vdpu382.c`. If vdpu34x backend works without RCB on RK3588 in BSP, mainline doesn't need it either for VP9 |
| O6 | Validate via Fluster `VP9-TEST-VECTORS` post-Phase-3 | Set up `GStreamer-VP9-V4L2SL-Gst1.0` test rig |
| O7 | If a Collabora `linux-rkvdec-vp9-on-rk3588` series appears, pivot to coordination | Monitor lore + Collabora gitlab weekly |
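
The arithmetic behind O1 can be checked mechanically. The sketch below assumes 32-bit registers indexed from the start of the MMIO window, which is how the "128..232 × 4" figure in the table reads; `reg_offset` and `regs_fit_window` are hypothetical helper names, not BSP or mainline functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: every register is 4 bytes wide. */
#define REG_BYTES 4u

/* Byte offset of register index idx from the window base. */
uint32_t reg_offset(uint32_t idx)
{
    return idx * REG_BYTES;
}

/*
 * True if registers first_idx..last_idx (inclusive) lie entirely
 * inside a window of window_bytes bytes.
 */
bool regs_fit_window(uint32_t first_idx, uint32_t last_idx,
                     uint32_t window_bytes)
{
    assert(first_idx <= last_idx);

    /* One past the last byte touched by register last_idx. */
    uint64_t end = ((uint64_t)last_idx + 1u) * REG_BYTES;

    return end <= window_bytes;
}
```

For the range quoted in O1, register 232 starts at 0x3A0 and ends at 0x3A4, which leaves 0x5C bytes of headroom below the 0x400 limit.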
## Risk register (revised)
| # | Risk | Mitigation |
|---|---|---|
| R1 | Same physical IP accepting two register dialects is unusual — there may be a hidden mode-switch register that must be set before VP9 work | Inspect BSP `rkvdec_rk3588_hw_ops` for any pre-decode setup distinguishing VP9 from HEVC |
| R2 | Backend (`libva-v4l2-request-fourier`) doesn't yet have an rkvdec VP9 dispatch path; only hantro VP8 exists | Mirror the iter33 VP8 pattern: profile-gated codec dispatch in `RequestCreateConfig`. Sibling: `feedback_unconditional_codec_state.md` (codec state must be gated per codec) |
| R3 | RK3588 VP9 max-resolution may differ from RK3399's 4096×2304 | Read MPP `vp9d_vdpu34x.c` max-resolution constants for confirmation |
| R4 | `dirac` (RK3399) cross-test fixture status is unknown; it is needed if we modify legacy `rkvdec-vp9.c` for any reason | If a modification is needed, verify dirac is reachable before committing. In most cases the legacy file should NOT need to be touched |
| R5 | Casanova posts upstream VP9 series mid-effort → fork divergence | Monitor weekly; coordinate if posted |
## Substrate locked
Phase 0 closes here. Phase 1 (architectural plan + Sonnet review) starts next session:
- Read BSP `mpp_rkvdec2.c` rkvdec_rk3588_hw_ops and IRQ routine to resolve O2/O3 (mode-switch register? VP9-specific IRQ status?)
- Read BSP MPP `vp9d_vdpu34x.c` to confirm O5 (RCB usage) and R3 (max-resolution)
- Draft the wiring patch outline (3rd entry in `vdpu381_coded_fmts[]` + any IRQ split)
- Decide ampere-side test fixture (which VP9 bitstreams; bbb-vp9 + a streaming-capable test vector)
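
If O2 resolves towards a codec-aware split, the dispatch decision itself is small. The following is a hypothetical model of that decision only: the enum names and `mock_select_irq_path` are invented for illustration, and the `vp9_uses_legacy_status` flag stands for the still-unanswered question of whether VP9 completion is reported through the legacy status semantics.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical codec and IRQ-path identifiers for this sketch. */
enum mock_codec {
    MOCK_CODEC_H264,
    MOCK_CODEC_HEVC,
    MOCK_CODEC_VP9,
};

enum mock_irq_path {
    MOCK_IRQ_VDPU381,   /* status handling as used for H.264/HEVC */
    MOCK_IRQ_LEGACY,    /* status handling as in the legacy rkvdec_irq */
};

/*
 * Pick the IRQ status handling for the codec currently running.
 * Only VP9 can be routed to the legacy path, and only if Phase 1
 * finds that its status bits differ from the vdpu381 ones.
 */
enum mock_irq_path mock_select_irq_path(enum mock_codec codec,
                                        bool vp9_uses_legacy_status)
{
    assert(codec <= MOCK_CODEC_VP9);

    if (codec == MOCK_CODEC_VP9 && vp9_uses_legacy_status)
        return MOCK_IRQ_LEGACY;
    return MOCK_IRQ_VDPU381;
}
```

With the flag false the function collapses to a single path, which corresponds to the < 100-line estimate; with it true, the extra branch is where the ~200-line estimate comes from.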
## Persistence
- Repo: `/home/mfritsche/src/ampere-vp9-enablement/` on fresnel
- Gitea remote: TBD (file as `claude-noether/ampere-vp9-enablement` per [feedback_gitea_as_claude_noether](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_gitea_as_claude_noether.md))
- Kernel work: `boltzmann:~/src/linux-rockchip` branch `linux-rk3588-marfrit` (same tree as ampere-kernel-decoders campaign — separate iteration branches under `vp9-*` namespace recommended)
- ampere current state: vanilla `7.0.0-rc3-devices+` kernel + iter3/iter4-fixed modules from sibling campaign; bit-perfect HEVC verified; backend `v4l2_request_drv_video.so` is iter38b. VP9 has not been exercised on this system since the kernel-agent rollout.