Phase 0: open campaign + substrate findings

Open ampere-vp9-enablement to enable VP9 hardware decode on RK3588 ampere
(rkvdec / vdpu381 register layout). Sibling to ampere-kernel-decoders
(closed at HEVC bit-perfect 2026-05-17 ~00:42).

Phase 0 substrate locked: upstream status (Collabora roadmap, no series
posted), legacy code reference (rkvdec-vp9.c 1042 lines, vdpu341),
vdpu381 pattern reference (rkvdec-vdpu381-hevc.c, struct-based regs +
common-file split), work-plan outline, open questions (chiefly: where
is the vdpu381 VP9 register layout documented), risk register.

Phase 1 (architectural plan + Sonnet review) next session.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 22:48:26 +00:00
commit eb60ecd224
2 changed files with 124 additions and 0 deletions
+28
@@ -0,0 +1,28 @@
# ampere-vp9-enablement
Stand-alone port + upstream-targeting work to enable VP9 hardware decode on Rockchip RK3588's rkvdec (vdpu381 register layout).
## Status (2026-05-17 ~01:00)
Mainline rkvdec for RK3588 ([Casanova v7.0 series](https://www.collabora.com/news-and-blog/news-and-events/rk3588-and-rk3576-video-decoders-support-merged-in-the-upstream-linux-kernel.html), landed in Linux 7.0) supports **H.264 + HEVC only**. VP9 is on Collabora's stated roadmap, but no WIP series had been posted to linux-media when this campaign opened. The legacy `rkvdec-vp9.c` (RK3399 / vdpu341 hardware) is feature-complete at 1042 lines, but its register-config logic does not translate directly to vdpu381.
This campaign:
1. Ports VP9 enablement to vdpu381 register layout (new file `rkvdec-vdpu381-vp9.c`)
2. Registers VP9 V4L2 controls in `vdpu38x_vp9_ctrl_descs[]`
3. Adds VP9 fmt to `vdpu381_coded_fmts[]` with the new ops
4. Verifies bit-perfect HW vs SW decode (per [feedback_compare_hw_against_sw_reference](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_compare_hw_against_sw_reference.md))
5. Proposes upstream via linux-media
Sibling campaign: [ampere-kernel-decoders](https://git.reauktion.de/claude-noether/ampere-kernel-decoders) closed at HEVC bit-perfect (kernel-agent#14 + #15 are the prerequisite kernel fixes).
## Out of scope
- VP9 on RK3399 (works via legacy `rkvdec-vp9.c` already in mainline)
- VP9 on hantro (the hantro decoder on RK3588 does not expose VP9; this campaign targets rkvdec)
- AV1 on RK3588 (separate work; per Collabora, AV1 is already handled by hantro at fdc70000)
- VP8 (already works via hantro)
- HEVC (closed in ampere-kernel-decoders)
## Process
8-phase loop (per ~/.claude/CLAUDE.md). All commits via `claude-noether` identity. Patches will be RFC-quality and routed via kernel-agent once ready.
+96
@@ -0,0 +1,96 @@
# Phase 0 findings — ampere VP9 enablement substrate
Date: 2026-05-17, opening session of the campaign (sibling: ampere-kernel-decoders closed at HEVC bit-perfect ~30 min earlier).
## Goal
Bring VP9 hardware decode up on RK3588 ampere via rkvdec (vdpu381 register layout), upstream-aligned, suitable for a clean linux-media RFC. End-state criterion: bit-perfect against `ffmpeg -c:v vp9` SW reference per [feedback_compare_hw_against_sw_reference](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_compare_hw_against_sw_reference.md).
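The bit-perfect criterion comes down to a byte-for-byte comparison of two raw frame dumps. Below is a minimal sketch of that check, assuming both the HW and SW paths write tightly packed NV12 frames with no stride padding (an assumption; HW output may need de-striding first). The names `demo_nv12_frame_size` and `demo_first_mismatch` are illustrative and do not come from any existing tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bytes in one tightly packed NV12 frame: a full-size luma plane plus a
 * half-size interleaved chroma plane. Assumes even width and height.
 */
static size_t demo_nv12_frame_size(size_t width, size_t height)
{
	return width * height * 3 / 2;
}

/*
 * Return the offset of the first byte that differs between the HW and SW
 * dumps, or -1 if the first `len` bytes are identical.
 */
static long long demo_first_mismatch(const uint8_t *hw, const uint8_t *sw,
				     size_t len)
{
	assert(hw != NULL && sw != NULL);

	for (size_t i = 0; i < len; i++) {
		if (hw[i] != sw[i])
			return (long long)i;
	}
	return -1;
}
```

Dividing a reported mismatch offset by the frame size gives the index of the first bad frame, which narrows down whether a failure starts at a keyframe or only after inter prediction kicks in.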
## Upstream status (search round 1)
| Source | Result |
|---|---|
| Collabora blog 2026-05 ([Panthor → RK3588](https://www.collabora.com/news-and-blog/news-and-events/from-panthor-to-rk3588-advancing-graphics-video-soc-support-linux-kernel-7.html)) | "Going forward, Collabora will work on ... VP9 code support on RK3588" — roadmap item, no series posted |
| [Collabora RK3588/RK3576 decoders merged](https://www.collabora.com/news-and-blog/news-and-events/rk3588-and-rk3576-video-decoders-support-merged-in-the-upstream-linux-kernel.html) | Linux 7.0 landed H.264 + HEVC for vdpu381/vdpu383 only |
| WebSearch `"rk3588" OR "vdpu381" rkvdec vp9 patch site:lore.kernel.org` | AV1 series and other unrelated results; no VP9 vdpu381 series |
| WebSearch `rkvdec2 vp9 rk3588 collabora linux-media kernel patch 2026` | Same conclusion; RKVDEC2 driver supports H.264 only at posting time |
| `lore.kernel.org/linux-media` WebFetch | Anubis access-denied (anti-bot block) |
| `lore.kernel.org/linaro-mm-sig` WebFetch | Anubis access-denied |
| `git remote -v` on boltzmann:~/src/linux-rockchip → collabora remote | `collabora/add-rkvdec2-driver*` branches exist (vdpu383-hevc variant); **no `*-vp9*` branch** |
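Conclusion: VP9 on RK3588 vdpu381 is **not yet in flight upstream**, as far as web search and the Collabora remote show (lore itself could not be fetched). We appear to be the first to implement it.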
## Existing code substrate (boltzmann:~/src/linux-rockchip @ `linux-rk3588-marfrit`)
### Legacy reference (RK3399 / vdpu341)
- `drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c` — 1042 lines (Brezillon 2019 + Pietrasiewicz 2021 + Alpha Lin 2016). Uses `writel`-style register access via `rkvdec-regs.h`.
- `rkvdec.c:419` defines `rkvdec_vp9_ctrl_descs[]` (V4L2_CID_STATELESS_VP9_FRAME + V4L2_CID_STATELESS_VP9_COMPRESSED_HDR — small ctrl set)
- `rkvdec.c:478..492` registers VP9 in `rk3399_coded_fmts[]` (4096×2304 max, 64×64 alignment step)
- Ops: `rkvdec_vp9_fmt_ops = { .adjust_fmt, .start, .stop, .run }` (no `try_ctrl`)
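The frame-size limits in the legacy table (4096×2304 max, 64×64 step) amount to a clamp followed by a round-up to the step. The arithmetic can be sketched as follows; `demo_clamp_align` is a hypothetical stand-alone helper, not the driver's code, and the minimum of 64 used in the usage note is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Clamp `v` into [min, max], then round it up to a multiple of `step`
 * without exceeding `max`. Illustrative only.
 */
static uint32_t demo_clamp_align(uint32_t v, uint32_t min, uint32_t max,
				 uint32_t step)
{
	assert(step != 0 && min <= max);

	if (v < min)
		v = min;
	if (v > max)
		v = max;

	v = (v + step - 1) / step * step;	/* round up to the step */
	if (v > max)
		v = max / step * step;		/* largest multiple within max */
	return v;
}
```

With these limits a 1920×1080 stream gets a 1920×1088 coded size: `demo_clamp_align(1080, 64, 2304, 64)` rounds the height up to the next multiple of 64, while the width is already aligned.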
### vdpu381 reference (RK3588) — pattern to follow
- `rkvdec-vdpu381-hevc.c` — 639 lines, 2025 Casanova. **Struct-based register layout** (`rkvdec-vdpu381-regs.h`), shared preamble via `rkvdec-hevc-common.c/h`.
- `rkvdec-vdpu381-h264.c` — same pattern, h264-common shared file.
- `rkvdec.c:513..549` defines `vdpu381_coded_fmts[]` with HEVC + H.264 only — **VP9 entry must be added here**.
- `rkvdec.c:1701` `vdpu381_variant_ops` exposes single-IRQ handler + coded_fmts table — no per-codec dispatch needed.
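To make the contrast with the legacy `writel`-style code concrete, here is a user-space sketch of the struct-based pattern: fill an in-memory register image field by field, then flush it to the hardware in one pass. Every register name, field name, and bit width below is hypothetical; the real vdpu381 VP9 layout is exactly what open question O1 is about. The sketch also relies on the compiler packing bitfields into 32-bit words, as GCC and Clang do.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* HYPOTHETICAL register image; names and widths are made up. */
struct demo_vp9_regs {
	struct {
		uint32_t dec_mode : 10;
		uint32_t reserved : 22;
	} reg_mode;
	struct {
		uint32_t pic_width_in_64  : 16;
		uint32_t pic_height_in_64 : 16;
	} reg_size;
};

/* The image must map one-to-one onto 32-bit registers. */
_Static_assert(sizeof(struct demo_vp9_regs) == 2 * sizeof(uint32_t),
	       "register image must be exactly two 32-bit words");

/* Copy the whole register image to the (here: fake) MMIO window. */
static void demo_write_regs(volatile uint32_t *mmio,
			    const struct demo_vp9_regs *regs)
{
	uint32_t words[sizeof(*regs) / sizeof(uint32_t)];

	assert(mmio != NULL && regs != NULL);

	memcpy(words, regs, sizeof(words));
	for (size_t i = 0; i < sizeof(words) / sizeof(words[0]); i++)
		mmio[i] = words[i];
}
```

The appeal of this pattern is that per-frame code only touches named fields, and the size assertion catches layout mistakes at build time rather than on hardware.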
### Common helpers already in place
- `rkvdec-cabac.c/h` — CABAC tables, codec-agnostic
- `rkvdec-rcb.c/h` — Row Cache Buffer / SRAM management (vdpu38x has internal SRAM for line caches)
- `rkvdec-h264-common.c/h`, `rkvdec-hevc-common.c/h` — codec spec parsing, RPS prep, control-batch helpers
VP9 has no `rkvdec-vp9-common.*` yet. Today the legacy `rkvdec-vp9.c` holds both the spec/probability logic AND the vdpu341 register code in one file.
## Work plan outline (to be refined in Phase 1)
| Step | Output | Notes |
|---|---|---|
| 1 | `rkvdec-vp9-common.{c,h}`, extracted from legacy `rkvdec-vp9.c` | Probability tables, frame_ctx state, segmap mgmt, use of the kernel's V4L2 VP9 helper library (`v4l2-vp9.h`). Stays codec-spec-only, no register access. Legacy `rkvdec-vp9.c` then includes/links to it. |
| 2 | `rkvdec-vdpu381-vp9.c` — new backend | `rkvdec_vdpu381_vp9_fmt_ops = { .adjust_fmt, .start, .stop, .run }`. Re-implements register packing against `struct vp9_regs` in vdpu381 layout. |
| 3 | `rkvdec-vdpu381-regs.h` additions | VP9 register struct definitions (need Rockchip TRM or BSP reference — see open-question O1) |
| 4 | `vdpu38x_vp9_ctrl_descs[]` in `rkvdec.c` | Likely identical to legacy `rkvdec_vp9_ctrl_descs[]` (V4L2 controls are HW-agnostic) — just renamed and possibly with vdpu38x-specific dims. |
| 5 | `vdpu381_coded_fmts[]` third entry | V4L2_PIX_FMT_VP9_FRAME pointing to the new ops + ctrls. Sizes likely 65472×65472 to match HEVC entry. |
| 6 | Test: ffmpeg-vaapi VP9 decode + byte-compare against SW reference | Per `feedback_compare_hw_against_sw_reference.md`. |
| 7 | Series-prep: split into individual reviewable patches | Eventually for linux-media submission via `b4`. |
Step 1 is the biggest single chunk (refactor + maintain bit-perfect behaviour on legacy path); steps 2-3 are where the unknown register layout dominates time.
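Steps 2 and 5 of the outline can be sketched as a table entry plus a lookup. This is a self-contained model rather than the driver's actual types: `demo_coded_fmt`, `demo_fmt_ops`, and `demo_find_coded_fmt` are hypothetical stand-ins, the ops are stubs, and the 65472×65472 limit for VP9 is the "likely" value from the table above, not a confirmed hardware limit. The fourcc values mirror the V4L2 stateless pixel formats.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_FOURCC(a, b, c, d) \
	((uint32_t)(a) | ((uint32_t)(b) << 8) | \
	 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

#define DEMO_FMT_HEVC_SLICE DEMO_FOURCC('S', '2', '6', '5')
#define DEMO_FMT_H264_SLICE DEMO_FOURCC('S', '2', '6', '4')
#define DEMO_FMT_VP9_FRAME  DEMO_FOURCC('V', 'P', '9', 'F')

/* Stand-in for the per-codec ops; only .run is modelled here. */
struct demo_fmt_ops {
	int (*run)(void *ctx);
};

struct demo_coded_fmt {
	uint32_t fourcc;
	uint32_t max_width, max_height;	/* 0 = not quoted in these notes */
	const struct demo_fmt_ops *ops;
};

static int demo_run_stub(void *ctx)
{
	(void)ctx;
	return 0;
}

static const struct demo_fmt_ops demo_hevc_ops = { .run = demo_run_stub };
static const struct demo_fmt_ops demo_h264_ops = { .run = demo_run_stub };
static const struct demo_fmt_ops demo_vp9_ops  = { .run = demo_run_stub };

static const struct demo_coded_fmt demo_vdpu381_coded_fmts[] = {
	{ DEMO_FMT_HEVC_SLICE, 65472, 65472, &demo_hevc_ops },
	{ DEMO_FMT_H264_SLICE, 0, 0, &demo_h264_ops },
	/* New third entry; limits assumed to match the HEVC entry. */
	{ DEMO_FMT_VP9_FRAME, 65472, 65472, &demo_vp9_ops },
};

/* Look a coded format up by fourcc; NULL if the variant lacks it. */
static const struct demo_coded_fmt *demo_find_coded_fmt(uint32_t fourcc)
{
	size_t n = sizeof(demo_vdpu381_coded_fmts) /
		   sizeof(demo_vdpu381_coded_fmts[0]);

	for (size_t i = 0; i < n; i++) {
		assert(demo_vdpu381_coded_fmts[i].ops != NULL);
		if (demo_vdpu381_coded_fmts[i].fourcc == fourcc)
			return &demo_vdpu381_coded_fmts[i];
	}
	return NULL;
}
```

Because dispatch goes through the table, adding VP9 should not require touching the IRQ handler or the variant ops, which matches the "no per-codec dispatch needed" observation above.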
## Open questions
| # | Question | Resolution path |
|---|---|---|
| O1 | **Where is the vdpu381 VP9 register layout documented?** Public Rockchip TRMs (RK3588 TRM v0.7 / v1.0) cover vdpu341 only. We need one of: (a) inspection of the Rockchip BSP kernel (linux-5.10-rkr or 6.1-rkr), which has a working VP9 path, (b) Casanova's WIP if it exists privately, (c) blind RE from hardware behaviour | First, pull the Rockchip BSP `kernel-5.10` or `kernel-6.1` rkvdec source. Expect the register programming to live in userspace MPP (`mpp_vp9d_vdpu*`); the BSP kernel side is typically minimal |
| O2 | **Does vdpu381 share enough VP9 hardware with vdpu341 that legacy register sequencing is largely portable, or is this a clean-sheet IP?** | Inspect a Rockchip BSP `rkvdec` node from RK3588 DTS — register-map size + interrupt + clock topology says a lot. Compare to RK3399's |
| O3 | **Probability table format/layout — same between IPs?** | VP9 spec is spec; HW prob-table layout is HW-specific. Need register doc. |
| O4 | **Is RCB / SRAM usage required for VP9 on vdpu381 same as for HEVC?** | Reuse `rkvdec-rcb` helper if so; new sizing constants if not |
| O5 | **Multicore disabled** (commit `e570307ac987`) — does that affect VP9? | Likely not — VP9 was never multicore-aware; single decoder core path will work |
| O6 | **Validate via Fluster (200/239 AV1 example) or VP9-TEST-VECTORS suite** | Set up fluster GStreamer-VP9-V4L2SL-Gst1.0 test post-Phase-3 |
| O7 | **Stretch: can we cross-port the RKVDEC2 (Casanova WIP) approach** if upstream `add-rkvdec2-driver-vp9` appears mid-campaign? | Watch lore + Collabora gitlab |
## Risk register
| # | Risk | Mitigation |
|---|---|---|
| R1 | Register layout unknown — could spend weeks reverse-engineering with no public docs | Lean hard on Rockchip BSP source; if blocked, file Collabora inquiry to short-circuit |
| R2 | Legacy `rkvdec-vp9.c` refactor (extract common) breaks RK3399 path | Cross-test the legacy build on dirac (RK3399 ROCK Pi 4) before merging; `dirac.fritz.box` should still have the old kernel for regression testing |
| R3 | VP9 spec features (compressed header, segmentation, frame parallel decode) not supported by vdpu381 HW | Determine empirically; document limitations upstream |
| R4 | Backend (`libva-v4l2-request-fourier`) already has VP9 path for hantro (per `feedback_vaapi_strips_vp8_uncompressed_header.md`) but rkvdec-vp9 VAAPI integration may need adaptation | Trace ffmpeg-vaapi VP9 OUTPUT layout vs the iter38b backend's VP9 dispatch; sibling: fresnel-fourier iter33 VP8 work |
| R5 | Casanova posts an upstream VP9 series mid-effort, causing fork divergence | Monitor `collabora/add-rkvdec2-driver-vp9` branch + lore weekly; pivot to coordination if so |
## Substrate locked
Phase 0 closes here. Phase 1 (architectural plan + Sonnet review) starts next session:
- Pull Rockchip BSP rkvdec source for VP9 register-layout reference (O1)
- Draft `rkvdec-vp9-common.c` split outline
- Draft `vdpu381-vp9.c` register-packing skeleton
- Identify any V4L2 uAPI additions needed (likely none — `V4L2_CID_STATELESS_VP9_*` already exist)
## Persistence
- Repo: `/home/mfritsche/src/ampere-vp9-enablement/` on fresnel
- Gitea remote: TBD (file as `claude-noether/ampere-vp9-enablement` per [feedback_gitea_as_claude_noether](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_gitea_as_claude_noether.md))
- Kernel work: `boltzmann:~/src/linux-rockchip` branch `linux-rk3588-marfrit` (same tree as ampere-kernel-decoders campaign — separate iteration branches under `vp9-*` namespace recommended)
- ampere current state: vanilla `7.0.0-rc3-devices+` kernel + iter3/iter4-fixed modules from sibling campaign; bit-perfect HEVC verified; backend `v4l2_request_drv_video.so` is iter38b. VP9 has not been exercised on this system since the kernel-agent rollout.