ampere-vp9-enablement/phase0_findings.md
claude-noether eb60ecd224 Phase 0: open campaign + substrate findings
Open ampere-vp9-enablement to enable VP9 hardware decode on RK3588 ampere
(rkvdec / vdpu381 register layout). Sibling to ampere-kernel-decoders
(closed at HEVC bit-perfect 2026-05-17 ~00:42).

Phase 0 substrate locked: upstream status (Collabora roadmap, no series
posted), legacy code reference (rkvdec-vp9.c 1042 lines, vdpu341),
vdpu381 pattern reference (rkvdec-vdpu381-hevc.c, struct-based regs +
common-file split), work-plan outline, open questions (chiefly: where
is the vdpu381 VP9 register layout documented), risk register.

Phase 1 (architectural plan + Sonnet review) next session.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 22:48:26 +00:00

# Phase 0 findings — ampere VP9 enablement substrate
Date: 2026-05-17, opening session of the campaign (sibling: ampere-kernel-decoders closed at HEVC bit-perfect ~30 min earlier).
## Goal
Bring VP9 hardware decode up on RK3588 ampere via rkvdec (vdpu381 register layout), upstream-aligned, suitable for a clean linux-media RFC. End-state criterion: bit-perfect against `ffmpeg -c:v vp9` SW reference per [feedback_compare_hw_against_sw_reference](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_compare_hw_against_sw_reference.md).
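
A minimal sketch of what the bit-perfect criterion amounts to in practice, assuming both the hardware and the software decode have been dumped as raw frames in the same pixel format (for example 8-bit NV12). The function names and the NV12 assumption are illustrative only and do not come from the campaign's tooling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size in bytes of one 8-bit 4:2:0 frame (assumed dump format: NV12). */
static size_t nv12_frame_size(unsigned int width, unsigned int height)
{
	return (size_t)width * height * 3 / 2;
}

/*
 * Compare two raw dumps frame by frame.
 * Returns -1 when the dumps are byte-identical, otherwise the index of the
 * first frame that differs. A length mismatch is reported at the first
 * frame that is missing or truncated on one side.
 */
static long first_mismatching_frame(const unsigned char *hw, size_t hw_len,
				    const unsigned char *sw, size_t sw_len,
				    size_t frame_size)
{
	assert(frame_size > 0);

	size_t common = hw_len < sw_len ? hw_len : sw_len;
	size_t full = common / frame_size;
	size_t tail = common - full * frame_size;

	for (size_t i = 0; i < full; i++) {
		if (memcmp(hw + i * frame_size, sw + i * frame_size,
			   frame_size) != 0)
			return (long)i;
	}

	if (hw_len != sw_len)
		return (long)full;
	if (tail && memcmp(hw + full * frame_size, sw + full * frame_size,
			   tail) != 0)
		return (long)full;

	return -1;
}
```

Reporting the first differing frame index, rather than a single pass/fail flag, makes it easier to tell a failure on the first keyframe apart from drift that only appears on later inter frames.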
## Upstream status (search round 1)
| Source | Result |
|---|---|
| Collabora blog 2026-05 ([Panthor → RK3588](https://www.collabora.com/news-and-blog/news-and-events/from-panthor-to-rk3588-advancing-graphics-video-soc-support-linux-kernel-7.html)) | "Going forward, Collabora will work on ... VP9 code support on RK3588" — roadmap item, no series posted |
| [Collabora RK3588/RK3576 decoders merged](https://www.collabora.com/news-and-blog/news-and-events/rk3588-and-rk3576-video-decoders-support-merged-in-the-upstream-linux-kernel.html) | Linux 7.0 landed H.264 + HEVC for vdpu381/vdpu383 only |
| WebSearch `"rk3588" OR "vdpu381" rkvdec vp9 patch site:lore.kernel.org` | AV1 series + other unrelated; no VP9 vdpu381 series |
| WebSearch `rkvdec2 vp9 rk3588 collabora linux-media kernel patch 2026` | Same conclusion; RKVDEC2 driver supports H.264 only at posting time |
| `lore.kernel.org/linux-media` WebFetch | Anubis access-denied (anti-bot block) |
| `lore.kernel.org/linaro-mm-sig` WebFetch | Anubis access-denied |
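| `git remote -v` on boltzmann:~/src/linux-rockchip → collabora remote | `collabora/add-rkvdec2-driver*` branches exist (vdpu383-hevc variant); **no `*-vp9*` branch** |

Conclusion: VP9 on RK3588 vdpu381 is **not yet in flight upstream**. No series has been posted and no public branch exists; it is only a Collabora roadmap item. As far as this search round shows, this campaign would be the first implementation.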
## Existing code substrate (boltzmann:~/src/linux-rockchip @ `linux-rk3588-marfrit`)
### Legacy reference (RK3399 / vdpu341)
- `drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c` — 1042 lines (Brezillon 2019 + Pietrasiewicz 2021 + Alpha Lin 2016). Uses `writel`-style register access via `rkvdec-regs.h`.
- `rkvdec.c:419` defines `rkvdec_vp9_ctrl_descs[]` (V4L2_CID_STATELESS_VP9_FRAME + V4L2_CID_STATELESS_VP9_COMPRESSED_HDR — small ctrl set)
- `rkvdec.c:478..492` registers VP9 in `rk3399_coded_fmts[]` (4096×2304 max, 64×64 alignment step)
- Ops: `rkvdec_vp9_fmt_ops = { .adjust_fmt, .start, .stop, .run }` (no `try_ctrl`)
### vdpu381 reference (RK3588) — pattern to follow
- `rkvdec-vdpu381-hevc.c` — 639 lines, 2025 Casanova. **Struct-based register layout** (`rkvdec-vdpu381-regs.h`), shared preamble via `rkvdec-hevc-common.c/h`.
- `rkvdec-vdpu381-h264.c` — same pattern, h264-common shared file.
- `rkvdec.c:513..549` defines `vdpu381_coded_fmts[]` with HEVC + H.264 only — **VP9 entry must be added here**.
- `rkvdec.c:1701` `vdpu381_variant_ops` exposes single-IRQ handler + coded_fmts table — no per-codec dispatch needed.
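
To illustrate the struct-based register pattern mentioned above, here is a small userspace sketch. Every field name, width, and offset in it is a placeholder invented for illustration; the real vdpu381 VP9 layout is unknown at this point (open question O1). The idea is only that the backend fills a plain C struct and then writes it out in one block, instead of issuing one `writel`-style call per register.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * HYPOTHETICAL register block. Names and bit positions are placeholders,
 * not the real vdpu381 VP9 registers.
 */
struct vp9_regs_sketch {
	struct {
		uint32_t pic_width_in_64  : 10;
		uint32_t reserved0        : 6;
		uint32_t pic_height_in_64 : 10;
		uint32_t reserved1        : 6;
	} reg_pic_size;
	uint32_t reg_stream_len;
};

/* The struct must map one-to-one onto consecutive 32-bit registers. */
static_assert(sizeof(struct vp9_regs_sketch) == 2 * sizeof(uint32_t),
	      "register block must be densely packed");

/* Fill the shadow copy from frame parameters (sizes rounded up to 64). */
static void vp9_regs_sketch_fill(struct vp9_regs_sketch *regs,
				 unsigned int width, unsigned int height,
				 uint32_t stream_len)
{
	memset(regs, 0, sizeof(*regs));
	regs->reg_pic_size.pic_width_in_64 = (width + 63) / 64;
	regs->reg_pic_size.pic_height_in_64 = (height + 63) / 64;
	regs->reg_stream_len = stream_len;
}

/*
 * Copy the shadow copy into a word array. This stands in for whatever
 * bulk MMIO write the real driver performs.
 */
static void vp9_regs_sketch_flush(const struct vp9_regs_sketch *regs,
				  uint32_t *mmio_words)
{
	memcpy(mmio_words, regs, sizeof(*regs));
}
```

Bit-field ordering is implementation-defined in C, so a sketch like this only mirrors hardware layout on the compilers and endianness the driver actually targets.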
### Common helpers already in place
- `rkvdec-cabac.c/h`: CABAC tables, codec-agnostic
- `rkvdec-rcb.c/h`: Row Cache Buffer / SRAM management (vdpu38x has internal SRAM for line caches)
- `rkvdec-h264-common.c/h`, `rkvdec-hevc-common.c/h`: codec spec parsing, RPS prep, control-batch helpers

VP9 has no `rkvdec-vp9-common.*` yet. Today the legacy `rkvdec-vp9.c` holds both the spec/probability logic and the vdpu341 register code in a single file.
## Work plan outline (to be refined in Phase 1)
| Step | Output | Notes |
|---|---|---|
| 1 | `rkvdec-vp9-common.{c,h}`, extracted from legacy `rkvdec-vp9.c` | Probability tables, frame_ctx state, segmap management, and use of the kernel's V4L2 VP9 helper library (`v4l2-vp9.h`). Stays codec-spec-only, no register access. Legacy `rkvdec-vp9.c` then includes/links to it. |
| 2 | `rkvdec-vdpu381-vp9.c` — new backend | `rkvdec_vdpu381_vp9_fmt_ops = { .adjust_fmt, .start, .stop, .run }`. Re-implements register packing against `struct vp9_regs` in vdpu381 layout. |
| 3 | `rkvdec-vdpu381-regs.h` additions | VP9 register struct definitions (need Rockchip TRM or BSP reference — see open-question O1) |
| 4 | `vdpu38x_vp9_ctrl_descs[]` in `rkvdec.c` | Likely identical to legacy `rkvdec_vp9_ctrl_descs[]` (V4L2 controls are HW-agnostic) — just renamed and possibly with vdpu38x-specific dims. |
| 5 | `vdpu381_coded_fmts[]` third entry | V4L2_PIX_FMT_VP9_FRAME pointing to the new ops + ctrls. Sizes likely 65472×65472 to match HEVC entry. |
| 6 | Test: ffmpeg-vaapi VP9 decode + byte-compare against SW reference | Per `feedback_compare_hw_against_sw_reference.md`. |
| 7 | Series-prep: split into individual reviewable patches | Eventually for linux-media submission via `b4`. |

Step 1 is the largest single chunk (refactor while keeping the legacy path bit-perfect); steps 2-3 are where the unknown register layout is expected to dominate the time spent.
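
Steps 2, 4, and 5 can be pictured with the following userspace sketch of a coded-format table that gains a third VP9 entry. The struct and function names are hypothetical stand-ins for the driver's own types, and `.adjust_fmt` is omitted for brevity. Only the fourcc values and the 65472 maximum for HEVC come from existing definitions or the notes above; the H.264 limits and the 64-pixel step on the vdpu381 entries are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same packing as v4l2_fourcc() in videodev2.h. */
#define FOURCC(a, b, c, d) \
	((uint32_t)(a) | ((uint32_t)(b) << 8) | \
	 ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

#define PIX_FMT_HEVC_SLICE FOURCC('S', '2', '6', '5')
#define PIX_FMT_H264_SLICE FOURCC('S', '2', '6', '4')
#define PIX_FMT_VP9_FRAME  FOURCC('V', 'P', '9', 'F')

/* Hypothetical stand-in for the driver's per-codec ops. */
struct fmt_ops_sketch {
	int (*start)(void);
	void (*stop)(void);
	int (*run)(void);
};

static int stub_start(void) { return 0; }
static void stub_stop(void) { }
static int stub_run(void) { return 0; }

static const struct fmt_ops_sketch hevc_ops_sketch = { stub_start, stub_stop, stub_run };
static const struct fmt_ops_sketch h264_ops_sketch = { stub_start, stub_stop, stub_run };
static const struct fmt_ops_sketch vp9_ops_sketch  = { stub_start, stub_stop, stub_run };

struct coded_fmt_sketch {
	uint32_t fourcc;
	unsigned int max_width, max_height;
	unsigned int step;
	const struct fmt_ops_sketch *ops;
};

static const struct coded_fmt_sketch vdpu381_coded_fmts_sketch[] = {
	{ PIX_FMT_HEVC_SLICE, 65472, 65472, 64, &hevc_ops_sketch },
	{ PIX_FMT_H264_SLICE, 65472, 65472, 64, &h264_ops_sketch }, /* placeholder limits */
	{ PIX_FMT_VP9_FRAME,  65472, 65472, 64, &vp9_ops_sketch },  /* new third entry */
};

/* Look up a coded format by fourcc; NULL if the variant lacks it. */
static const struct coded_fmt_sketch *find_coded_fmt_sketch(uint32_t fourcc)
{
	size_t n = sizeof(vdpu381_coded_fmts_sketch) /
		   sizeof(vdpu381_coded_fmts_sketch[0]);

	for (size_t i = 0; i < n; i++) {
		if (vdpu381_coded_fmts_sketch[i].fourcc == fourcc)
			return &vdpu381_coded_fmts_sketch[i];
	}
	return NULL;
}

/* Dispatch one decode run through the table; -1 for unknown formats. */
static int run_decode_sketch(uint32_t fourcc)
{
	const struct coded_fmt_sketch *fmt = find_coded_fmt_sketch(fourcc);

	if (!fmt)
		return -1;
	assert(fmt->ops && fmt->ops->run);
	return fmt->ops->run();
}
```

Because dispatch goes through the table, adding the VP9 entry needs no change to the lookup or run path, which matches the observation that `vdpu381_variant_ops` needs no per-codec dispatch.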
## Open questions
| # | Question | Resolution path |
|---|---|---|
| O1 | **Where is the vdpu381 VP9 register layout documented?** Public Rockchip TRMs (RK3588 TRM v0.7 / v1.0) cover vdpu341 only. We need one of: (a) inspection of the Rockchip BSP kernel (linux-5.10-rkr or 6.1-rkr), which has a working VP9 path; (b) Casanova's WIP, if it exists privately; (c) blind RE from hardware behaviour | First action: pull the Rockchip BSP `kernel-5.10` or `kernel-6.1` rkvdec source, plus the userspace MPP code (`mpp_vp9d_vdpu*`), since the kernel side of rkvdec is typically minimal |
| O2 | **Does vdpu381 share enough VP9 hardware with vdpu341 that legacy register sequencing is largely portable, or is this a clean-sheet IP?** | Inspect a Rockchip BSP `rkvdec` node from RK3588 DTS — register-map size + interrupt + clock topology says a lot. Compare to RK3399's |
| O3 | **Probability table format/layout — same between IPs?** | VP9 spec is spec; HW prob-table layout is HW-specific. Need register doc. |
| O4 | **Is RCB / SRAM usage required for VP9 on vdpu381 same as for HEVC?** | Reuse `rkvdec-rcb` helper if so; new sizing constants if not |
| O5 | **Multicore disabled** (commit `e570307ac987`): does that affect VP9? | Likely not: VP9 was never multicore-aware, so the single-decoder-core path should work |
| O6 | **Which conformance run to validate with: Fluster (200/239 AV1 example) or the VP9-TEST-VECTORS suite?** | Set up fluster GStreamer-VP9-V4L2SL-Gst1.0 test post-Phase-3 |
| O7 | **Stretch: can we cross-port the RKVDEC2 (Casanova WIP) approach** if upstream `add-rkvdec2-driver-vp9` appears mid-campaign? | Watch lore + Collabora gitlab |
## Risk register
| # | Risk | Mitigation |
|---|---|---|
| R1 | Register layout unknown — could spend weeks reverse-engineering with no public docs | Lean hard on Rockchip BSP source; if blocked, file Collabora inquiry to short-circuit |
| R2 | Legacy `rkvdec-vp9.c` refactor (extract common) breaks RK3399 path | Cross-test the legacy build on dirac (RK3399 ROCK Pi 4) before merging — sibling: `dirac.fritz.box` should still have the old kernel for regression testing |
| R3 | VP9 spec features (compressed header, segmentation, frame parallel decode) not supported by vdpu381 HW | Determine empirically; document limitations upstream |
| R4 | Backend (`libva-v4l2-request-fourier`) already has VP9 path for hantro (per `feedback_vaapi_strips_vp8_uncompressed_header.md`) but rkvdec-vp9 VAAPI integration may need adaptation | Trace ffmpeg-vaapi VP9 OUTPUT layout vs the iter38b backend's VP9 dispatch; sibling: fresnel-fourier iter33 VP8 work |
| R5 | Casanova posts an upstream VP9 series mid-effort, causing fork divergence | Monitor `collabora/add-rkvdec2-driver-vp9` branch + lore weekly; pivot to coordination if so |
## Substrate locked
Phase 0 closes here. Phase 1 (architectural plan + Sonnet review) starts next session:
- Pull Rockchip BSP rkvdec source for VP9 register-layout reference (O1)
- Draft `rkvdec-vp9-common.c` split outline
- Draft `vdpu381-vp9.c` register-packing skeleton
- Identify any V4L2 uAPI additions needed (likely none — `V4L2_CID_STATELESS_VP9_*` already exist)
## Persistence
- Repo: `/home/mfritsche/src/ampere-vp9-enablement/` on fresnel
- Gitea remote: TBD (file as `claude-noether/ampere-vp9-enablement` per [feedback_gitea_as_claude_noether](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_gitea_as_claude_noether.md))
- Kernel work: `boltzmann:~/src/linux-rockchip` branch `linux-rk3588-marfrit` (same tree as ampere-kernel-decoders campaign — separate iteration branches under `vp9-*` namespace recommended)
- ampere current state: vanilla `7.0.0-rc3-devices+` kernel + iter3/iter4-fixed modules from sibling campaign; bit-perfect HEVC verified; backend `v4l2_request_drv_video.so` is iter38b. VP9 has not been exercised on this system since the kernel-agent rollout.