diff --git a/phase0_findings.md b/phase0_findings.md
index 10bb165..d4b4e34 100644
--- a/phase0_findings.md
+++ b/phase0_findings.md
@@ -58,35 +58,61 @@ VP9 has no `rkvdec-vp9-common.*` yet. Today the legacy `rkvdec-vp9.c` holds both
 
 Step 1 is the biggest single chunk (refactor + maintain bit-perfect behaviour on legacy path); steps 2-3 are where the unknown register layout dominates time.
 
-## Open questions
+## Architectural correction (mid-Phase-0)
+
+**Original premise overturned by Rockchip BSP inspection.** The work-plan-outline above assumed vdpu381 (mainline RK3588) is a *different IP* from vdpu341 (RK3399), requiring a new register backend. The BSP investigation says no:
+
+| Source | Finding |
+|---|---|
+| BSP DTS `rk3588s.dtsi` (lines 5059, 5113) | `rkvdec0@fdc38000`, `rkvdec1@fdc48000` carry `compatible = "rockchip,rkv-decoder-v2"` and a 0x400-byte MMIO window. **No separate vdpu34x physical IP exists on RK3588.** |
+| BSP driver `drivers/video/rockchip/mpp/mpp_rkvdec2.c:1659` | `rkvdec_rk3588_data` ties RK3588 to `rkvdec_v2_hw_info` and `rkvdec_v2_trans` (the *same* dispatch tables used for RK356X). RK3576 alone routes to `rkvdec_vdpu383_*`. **RK3588 stays on the v2/vdpu34x family in BSP naming.** |
+| BSP MPP userspace `mpp/soc.cmake` | `add_soc_config("RK3588" "VDPU381,VDPU34X,...")` — RK3588 *enables both* register-protocol backends. Same physical rkvdec IP accepts two register-layout dialects within its MMIO window: vdpu381 dialect (Casanova mainline naming, used for H.264/HEVC) and vdpu34x dialect (Rockchip legacy naming, used for VP9 + AVS2). |
+| BSP MPP `mpp/hal/rkdec/vp9d/CMakeLists.txt` | VP9 backends gated only on `HAVE_VDPU34X / VDPU382 / VDPU383 / VDPU384B`. **No `HAVE_VDPU381` VP9 backend exists** — vdpu381-class hardware uses the vdpu34x VP9 backend. |
+
+**Implication**: RK3588 VP9 enablement does NOT require porting `rkvdec-vp9.c` to a new register layout. The existing mainline `rkvdec_vp9_fmt_ops` (vdpu34x layout, written for RK3399) can drive RK3588's rkvdec hardware *as-is*. The missing work is a **wiring patch**, not a backend port.
+
+## Revised work plan
+
+| Step | Output | Notes |
+|---|---|---|
+| 1 | Add VP9 entry to `vdpu381_coded_fmts[]` in `rkvdec.c:513`, pointing to existing `rkvdec_vp9_fmt_ops` and reusing `rkvdec_vp9_ctrls` | Frame-size limits from RK3399's entry (4096×2304, step 64) — RK3588 VP9 hard limits may differ; cross-check `vp9d_vdpu34x.c` for max-resolution constants |
+| 2 | Wire vdpu381 IRQ handler to recognise VP9 codec context | Legacy IRQ (`rkvdec_irq`) and vdpu381 IRQ (`vdpu381_irq_handler`) read different status registers — VP9 path may need legacy IRQ semantics. Verify against BSP `mpp_rkvdec2.c` IRQ + `trans_tbl_vp9d` register layout |
+| 3 | Verify RK3588 rkvdec clock topology matches what legacy VP9 path needs | RK3588 DT has `clk_core`, `clk_cabac`, `clk_hevc_cabac` — legacy VP9 path uses `clk_core`/`clk_cabac` (subset) |
+| 4 | Verify legacy `rkvdec-regs.h` register offsets are valid on RK3588 rkvdec MMIO (0xfdc38100 + 0x400) | Same physical IP, same register window. Smoke-test with devmem2 against ampere |
+| 5 | Test: ffmpeg-vaapi VP9 decode + byte-compare against SW reference | Per [feedback_compare_hw_against_sw_reference](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_compare_hw_against_sw_reference.md). |
+| 6 | Single RFC patch to linux-media | If wiring is the only delta — easy upstream sell |
+
+**Expected size**: < 100 lines if IRQ-handler doesn't need codec-aware split, ~200 lines if VP9 requires per-codec IRQ dispatch. A small fraction of the original "port 1042 lines" estimate.
+
+## Open questions (revised)
 
 | # | Question | Resolution path |
 |---|---|---|
-| O1 | **Where is the vdpu381 VP9 register layout documented?** Public Rockchip TRMs (RK3588 TRM v0.7 / v1.0) cover vdpu341 only. We need either: (a) Rockchip BSP kernel (linux-5.10-rkr or 6.1-rkr) inspection — they have a working VP9 path, (b) Casanova's WIP if it exists privately, (c) blind RE from hardware behaviour | Step 1: pull Rockchip BSP `kernel-5.10` or `kernel-6.1` rkvdec source (`mpp_vp9d_vdpu*` in mpp/userspace; rkvdec kernel side typically minimal) |
-| O2 | **Does vdpu381 share enough VP9 hardware with vdpu341 that legacy register sequencing is largely portable, or is this a clean-sheet IP?** | Inspect a Rockchip BSP `rkvdec` node from RK3588 DTS — register-map size + interrupt + clock topology says a lot. Compare to RK3399's |
-| O3 | **Probability table format/layout — same between IPs?** | VP9 spec is spec; HW prob-table layout is HW-specific. Need register doc. |
-| O4 | **Is RCB / SRAM usage required for VP9 on vdpu381 same as for HEVC?** | Reuse `rkvdec-rcb` helper if so; new sizing constants if not |
-| O5 | **Multicore disabled** (commit `e570307ac987`) — does that affect VP9? | Likely not — VP9 was never multicore-aware; single decoder core path will work |
-| O6 | **Validate via Fluster (200/239 AV1 example) or VP9-TEST-VECTORS suite** | Set up fluster GStreamer-VP9-V4L2SL-Gst1.0 test post-Phase-3 |
-| O7 | **Stretch: can we cross-port the RKVDEC2 (Casanova WIP) approach** if upstream `add-rkvdec2-driver-vp9` appears mid-campaign? | Watch lore + Collabora gitlab |
+| O1 | Does `rkvdec_v2_trans[RKVDEC_FMT_VP9D]` register-translate table (offsets 128..232 × 4 = 0x200..0x3A0) fit inside the rkvdec MMIO window (0x400 bytes)? | Yes by inspection (`mpp_rkvdec2.c` BSP). Confirmed. |
+| O2 | Does the vdpu381 IRQ handler need codec-aware dispatch, or can it handle VP9 termination identically to HEVC/H.264? | Read `rkvdec_rk3588_hw_ops` IRQ in `mpp_rkvdec2.c` + compare to legacy `rkvdec_irq` in mainline. If different status-bit semantics, need `if (ctx->coded_fmt == V4L2_PIX_FMT_VP9_FRAME)` split |
+| O3 | RK3588-specific clock/reset requirements for VP9 beyond HEVC? | Compare BSP `mpp_rkvdec2.c` IRQ + power-on routines for `FMT_VP9D` vs `FMT_H265D` |
+| O4 | Does legacy `rkvdec_vp9_start` / `rkvdec_vp9_stop` (probe + segmap buffer alloc) work against RK3588's IOMMU configuration? | Most likely yes (vb2_dma_contig handles IOMMU transparently). Verify at first decode attempt |
+| O5 | RCB / SRAM — legacy VP9 path doesn't use RCB; vdpu381 HEVC does. Is RCB needed for VP9 on RK3588? | Compare `vp9d_vdpu34x.c` MPP backend's RCB usage to `vp9d_vdpu382.c`. If vdpu34x backend works without RCB on RK3588 in BSP, mainline doesn't need it either for VP9 |
+| O6 | Validate via Fluster `VP9-TEST-VECTORS` post-Phase-3 | Set up `GStreamer-VP9-V4L2SL-Gst1.0` test rig |
+| O7 | If a Collabora `linux-rkvdec-vp9-on-rk3588` series appears, pivot to coordination | Monitor lore + Collabora gitlab weekly |
 
-## Risk register
+## Risk register (revised)
 
 | # | Risk | Mitigation |
 |---|---|---|
-| R1 | Register layout unknown — could spend weeks reverse-engineering with no public docs | Lean hard on Rockchip BSP source; if blocked, file Collabora inquiry to short-circuit |
-| R2 | Legacy `rkvdec-vp9.c` refactor (extract common) breaks RK3399 path | Cross-test the legacy build on dirac (RK3399 ROCK Pi 4) before merging — sibling: `dirac.fritz.box` should still have the old kernel for regression testing |
-| R3 | VP9 spec features (compressed header, segmentation, frame parallel decode) not supported by vdpu381 HW | Determine empirically; document limitations upstream |
-| R4 | Backend (`libva-v4l2-request-fourier`) already has VP9 path for hantro (per `feedback_vaapi_strips_vp8_uncompressed_header.md`) but rkvdec-vp9 VAAPI integration may need adaptation | Trace ffmpeg-vaapi VP9 OUTPUT layout vs the iter38b backend's VP9 dispatch; sibling: fresnel-fourier iter33 VP8 work |
-| R5 | Casanova posts an upstream VP9 series mid-effort, causing fork divergence | Monitor `collabora/add-rkvdec2-driver-vp9` branch + lore weekly; pivot to coordination if so |
+| R1 | Same physical IP accepting two register dialects is unusual — there may be a hidden mode-switch register that must be set before VP9 work | Inspect BSP `rkvdec_rk3588_hw_ops` for any pre-decode setup distinguishing VP9 from HEVC |
+| R2 | Backend (`libva-v4l2-request-fourier`) doesn't yet have rkvdec VP9 dispatch path; only hantro VP8 exists | Mirror the iter33 VP8 pattern: profile-gated codec dispatch in `RequestCreateConfig`. Sibling: `feedback_unconditional_codec_state.md` (must per-codec gate) |
+| R3 | RK3588 VP9 max-resolution may differ from RK3399's 4096×2304 | Read MPP `vp9d_vdpu34x.c` max-resolution constants for confirmation |
+| R4 | `dirac` (RK3399) cross-test fixture status unknown — needed if we modify legacy `rkvdec-vp9.c` for any reason | If modification needed, verify dirac is reachable before commit. Mostly we should NOT need to touch legacy file |
+| R5 | Casanova posts upstream VP9 series mid-effort → fork divergence | Monitor weekly; coordinate if posted |
 
 ## Substrate locked
 
 Phase 0 closes here. Phase 1 (architectural plan + Sonnet review) starts next session:
-- Pull Rockchip BSP rkvdec source for VP9 register-layout reference (O1)
-- Draft `rkvdec-vp9-common.c` split outline
-- Draft `vdpu381-vp9.c` register-packing skeleton
-- Identify any V4L2 uAPI additions needed (likely none — `V4L2_CID_STATELESS_VP9_*` already exist)
+- Read BSP `mpp_rkvdec2.c` rkvdec_rk3588_hw_ops and IRQ routine to resolve O2/O3 (mode-switch register? VP9-specific IRQ status?)
+- Read BSP MPP `vp9d_vdpu34x.c` to confirm O5 (RCB usage) and R3 (max-resolution)
+- Draft the wiring patch outline (3rd entry in `vdpu381_coded_fmts[]` + any IRQ split)
+- Decide ampere-side test fixture (which VP9 bitstreams; bbb-vp9 + a streaming-capable test vector)
 
 ## Persistence
 