Files
ampere-vp9-enablement/phase1_plan_v2.md
T
claude-noether 5a1ba37791 Phase 1 plan + v2 amendment per Janet architect review
Phase 1 v1 claimed VP9 enablement was a < 100-line wiring patch
reusing legacy rkvdec_vp9_fmt_ops in vdpu381_coded_fmts[]. Janet
architect review (verdict AMEND) surfaced two BLOCKERs that overturn
the thesis:

A. vdpu381_variant.num_regs is unset (defaults 0); legacy VP9 path's
   rkvdec_memcpy_toio writes MIN(sizeof, 0)=0 bytes. Zero MMIO config.

B. vdpu381 register layout is segmented across 5 distinct OFFSET_*
   blocks (COMMON, CODEC_PARAMS, COMMON_ADDR, CODEC_ADDR, POC_HIGHBIT)
   plus decode trigger at VDPU381_REG_DEC_E=0x028. Legacy VP9 writes
   flat-from-base+0 and triggers via RKVDEC_REG_INTERRUPT=0x004 —
   wrong register-block layout AND wrong trigger offset.

Plus Amendment C: BSP rkvdec2_irq checks BIT(1); mainline
vdpu381_irq_handler checks BIT(2). Same register (0x380), different
bit. VP9-set BIT(1) status would misclassify as ERROR.

Both blockers confirmed empirically by reading mainline rkvdec.c and
BSP mpp_rkvdec2.c source (rkvdec2_run dispatches via mpp_write_req
on req->offset, NOT a flat dump; trans_tbl_vp9d shares HEVC's
128..199 register-DMA range).

v2 architecture: new rkvdec-vdpu381-vp9.c backend (~600-800 lines)
+ extracted rkvdec-vp9-common.{c,h} for the SPEC layer (probability
tables, frame_ctx, segmap) + VP9 register-struct extensions in
rkvdec-vdpu381-regs.h. Reuses vdpu381_irq_handler, V4L2 controls,
RCB infrastructure, IOMMU path — all confirmed as codec-agnostic.

Sourcing the bit-packed register struct definitions from Rockchip
BSP MPP hal_vp9d_vdpu382.c (vdpu382 = sibling IP family).

Next: Phase 1 v2 → Janet re-review for sign-off before Phase 2 code.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 23:08:52 +00:00

122 lines
9.6 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Phase 1 plan v2 — VP9 backend port to vdpu381 (post-Janet-review)
Date: 2026-05-17 ~01:35. Phase 1 v1 was AMENDED by Janet architect review (verdict: AMEND, two BLOCKERs + one AMENDMENT).
## Why v2 — the Phase-0 wiring-patch thesis is dead
Janet's review surfaced two source-code blockers that overturn the "VP9 is < 100 lines wiring" claim from Phase 0/Phase 1 v1:
### Blocker A: `vdpu381_variant.num_regs` is unset
`rkvdec.c:1707-1715` defines `vdpu381_variant` with no `.num_regs` field — defaults to 0. The legacy VP9 path at `rkvdec-vp9.c:660-661` writes the register file via:
```c
rkvdec_memcpy_toio(rkvdec->regs, regs,
MIN(sizeof(*regs), sizeof(u32) * rkvdec->variant->num_regs));
```
`MIN(sizeof(*regs), 0) = 0`. **Zero bytes hit MMIO.** If we naively added VP9 to `vdpu381_coded_fmts[]` pointing at `rkvdec_vp9_fmt_ops`, the hardware would never be configured. The watchdog would fire 2s later with no IRQ.
### Blocker B: vdpu381 register layout is SEGMENTED, legacy VP9 is FLAT
Legacy `rkvdec_vp9.c:660` writes the entire register file at flat `rkvdec->regs + 0`. vdpu381 paths (`rkvdec-vdpu381-hevc.c`, `rkvdec-vdpu381-h264.c`) use **five distinct register-block offsets** within the MMIO:
```c
#define OFFSET_COMMON_REGS (8 * sizeof(u32)) // 0x020
#define OFFSET_CODEC_PARAMS_REGS (64 * sizeof(u32)) // 0x100
#define OFFSET_COMMON_ADDR_REGS (128 * sizeof(u32)) // 0x200
#define OFFSET_CODEC_ADDR_REGS (160 * sizeof(u32)) // 0x280
#define OFFSET_POC_HIGHBIT_REGS (200 * sizeof(u32)) // 0x320
```
Plus the decode trigger at `VDPU381_REG_DEC_E = 0x028` (NOT `RKVDEC_REG_INTERRUPT = 0x004` like legacy). A flat memcpy at offset 0 lands in the wrong fields — would overwrite control/status registers and never hit the codec-params block where the bitstream config goes.
**Confirmed via BSP `mpp_rkvdec2.c:467 rkvdec2_run`**: writes are dispatched via `mpp_write_req` iterating over `task->w_reqs[i].offset / sizeof(u32)` — segmented per request, matching mainline's vdpu381 model. **Confirmed via BSP `mpp_rkvdec2.c:107 trans_tbl_vp9d[]`**: VP9's DMA-translated register indices are 128..199 (offset 0x200..0x320 = `OFFSET_COMMON_ADDR_REGS` through `OFFSET_POC_HIGHBIT_REGS`) — IDENTICAL to HEVC's range (also 128..199). VP9 uses the same segmented register layout as HEVC on this hardware.
### Amendment C: IRQ status bit semantics
BSP `rkvdec2_irq` checks `RKVDEC_IRQ_RAW = BIT(1)`. Mainline `vdpu381_irq_handler` checks `VDPU381_STA_INT_DEC_RDY_STA = BIT(2)`. Same register (0x380), different bits. If VP9 sets BIT(1) instead of BIT(2) on success, every IRQ gets misclassified as ERROR and `rkvdec_iommu_restore` fires. Mitigation: instrument the IRQ handler with `pr_warn(status)` for the first 5 decodes pre-flash, compare to expected `BIT(2) | BIT(1)` mask.
## Revised architecture
VP9 on RK3588's vdpu381 IP needs a **new backend file** `rkvdec-vdpu381-vp9.c`, modelled structurally on `rkvdec-vdpu381-hevc.c` (639 lines) but borrowing the SPEC LAYER from legacy `rkvdec-vp9.c`. Estimated 500-900 lines new code.
### File breakdown
| File | Action | Lines |
|---|---|---|
| `rkvdec-vdpu381-vp9.c` | NEW: vdpu381 register-block packing + run/start/stop/done ops | ~600-800 |
| `rkvdec-vp9-common.{c,h}` | NEW: extract from legacy `rkvdec-vp9.c` the codec-spec layer (probability tables, frame_ctx state machine, segmap mgmt, V4L2 vp9 helpers, `rkvdec_vp9_priv_tbl`, `rkvdec_vp9_start/stop` body) | ~600 (mostly moved from legacy) |
| `rkvdec-vp9.c` | REFACTOR: replace flat register-dump body with thin shim that calls common helpers; keep the legacy `rkvdec_vp9_fmt_ops` ABI intact for RK3399 | ~400 (down from 1042) |
| `rkvdec-vdpu381-regs.h` | EXTEND: add VP9 register-struct definitions (`struct rkvdec_vp9_common_regs`, `_codec_params_regs`, `_addr_regs`) | +200 |
| `rkvdec.c` | wire: add `vdpu38x_vp9_ctrls`, `rkvdec_vdpu381_vp9_fmt_ops` extern, add to `vdpu381_coded_fmts[]`, set `vdpu381_variant.num_regs` if any legacy code still consults it | +30 |
### Reuse boundaries
REUSED FROM LEGACY (in common file):
- VP9 probability tables (`vp9_default_*_probs`)
- `struct rkvdec_vp9_run` augmenting `rkvdec_run`
- `rkvdec_vp9_priv_tbl` (probe + count buffers, segmap)
- `init_v4l2_vp9_count_tbl`, `rkvdec_vp9_done` (entropy update from HW count buffer)
- `rkvdec_vp9_adjust_fmt` (frame-size clamping)
- `rkvdec_vp9_start` / `rkvdec_vp9_stop` (DMA alloc for probe/count/segmap)
- V4L2 vp9 helpers from `<media/v4l2-vp9.h>`
REWRITTEN FOR VDPU381 (in new backend file):
- `rkvdec_vdpu381_vp9_run` — register packing into the 5 segmented blocks, trigger at `VDPU381_REG_DEC_E = 0x028`
- Register struct definitions (bit-packed `swreg_*` fields per Rockchip BSP MPP `Vdpu382Vp9dCtx_t` pattern from `mpp/hal/rkdec/vp9d/hal_vp9d_vdpu382.c`)
REUSED FROM EXISTING VDPU381 INFRASTRUCTURE:
- `vdpu381_irq_handler` (single handler, codec-agnostic — confirmed via BSP `rkvdec2_irq`)
- `vdpu381_rcb_sizes[]` (HEVC-tuned, oversized for 4K VP9 but harmless — sub-optimal SRAM use, not a correctness issue)
- IOMMU (`rkvdec0_mmu`) — `dma_alloc_coherent` works transparently through the IOMMU
- V4L2 control descriptors (`rkvdec_vp9_ctrl_descs[]` line 419) — HW-agnostic
## Register-struct sourcing
The bit-packed `struct rkvdec_vp9_common_regs` etc. will be derived from Rockchip BSP MPP `mpp/hal/rkdec/vp9d/hal_vp9d_vdpu382.c` (vdpu382 = sibling IP of vdpu381). Pattern: each `swreg_NNN` becomes a u32 field; bit-packed sub-fields use C bitfield syntax exactly like `struct rkvdec_hevc_sps` in `rkvdec-vdpu381-hevc.c`.
Concrete starting point: `vp9_hw_regs->common.reg028.swreg_vp9_rd_prob_idx` (BSP line 595) shows the register at common-block offset 0x028 has a `vp9_rd_prob_idx` bit-field. Inventory all such `.common.regNNN` accesses in `hal_vp9d_vdpu382.c` → reproduce as a single `struct rkvdec_vp9_common_regs` in `rkvdec-vdpu381-regs.h`.
## Test plan (unchanged from v1, plus a new pre-flight)
Add a Pre-Phase-3 step: with the NEW backend compiled, instrument `vdpu381_irq_handler` to `pr_warn` the status register value for the first 10 IRQs after VP9 enable, to resolve Amendment C empirically (do VP9 IRQs set BIT(2)?). One module build cycle; takes ~5 min added to Phase 3.
## Test fixture
- **Test bitstream**: bbb_30s_720p.vp9.webm — need to generate on fresnel via `ffmpeg -i bbb-1080p.mkv -c:v libvpx-vp9 -b:v 1500k -t 30 -vf scale=1280:720 -y bbb_30s_720p.vp9.webm`. Use 720p so it's well under the 4096 max-width.
- **SW reference**: `ffmpeg -c:v vp9` on x86 fresnel — known good.
- **Verification**: SHA the raw NV12 byte stream, not PNG ([feedback_pixel_compare_in_yuv_not_png](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_pixel_compare_in_yuv_not_png.md)).
## Updated phase sequence
1. **Phase 1 v2** (this doc) → Sonnet/Janet re-review for sign-off.
2. **Phase 2** — implement the 3 new files + edits. Boltzmann tree, `linux-rk3588-marfrit` branch, sibling commits to make review easier.
3. **Phase 3** — build module (~5 min via distcc), install on ampere with backup ([feedback_backup_before_module_replace](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_backup_before_module_replace.md)), `modprobe -r/modprobe`. **Pre-flight**: enable IRQ diagnostic `pr_warn` for first 10 statuses.
4. **Phase 4** — decode test bbb_30s_720p.vp9.webm at frame 100, byte-compare against SW reference.
5. **Phase 5** — Sonnet code review of the actual implementation diff.
6. **Phase 6** — verification iteration; widen test vectors (resolutions, segmentation, super-frames).
7. **Phase 7** — strip diagnostic `pr_warn`, finalise commits, prepare upstream patch series via `b4`, route via kernel-agent (issues #16+).
8. **Phase 8** — backend (`libva-v4l2-request-fourier`) VP9 dispatch path added (sibling, iter33 VP8 pattern). Fluster `VP9-TEST-VECTORS` validation.
## Risk register v2
| # | Risk | Status |
|---|---|---|
| R1 | Same physical IP, two register dialects — hidden mode-switch register? | RESOLVED: BSP `rkvdec_rk3588_hw_ops.init` = `rkvdec2_init` (codec-generic); no mode switch. |
| R2 | libva backend lacks rkvdec VP9 dispatch | Phase 8, independent of kernel patch |
| R3 | RK3588 VP9 max-resolution | 4096-max confirmed; cap entry at 4096×2304 conservatively |
| R4 | dirac RK3399 regression | Mitigated by keeping legacy `rkvdec_vp9_fmt_ops` ABI intact via the common-file split; legacy path unchanged at the ops level. Refactor risk = the common-file extraction itself. Cross-test required before commit |
| R5 | Casanova upstream VP9 series mid-effort | Monitor weekly |
| R6 | RCB sizing oversizes for VP9 (HEVC sized for 8K) | Resolved as PROCEED-with-note (sub-optimal SRAM, not a correctness issue) |
| R7 (new) | Bit-packed register struct may not match HW exactly — endianness, bit-ordering, packing | Use exact BSP layout from MPP `hal_vp9d_vdpu382.c`; instrument `pr_warn` writes for first decode to verify expected register values |
| R8 (new) | `rkvdec_vp9.c` refactor extracts code that legacy path still needs as inlined — breaks RK3399 build | Compile-test rk3399 defconfig after the refactor; before VP9-on-vdpu381 work begins |
## Open questions for v2 review
- Q-P1v2.1: Should we extract `rkvdec-vp9-common.c` upfront as a clean separate patch (easier review, but no immediate user benefit), or inline the reuse via `#include` directives inside `rkvdec-vdpu381-vp9.c` (less clean, but a single self-contained patch)?
- Q-P1v2.2: Can the new file use `static inline` helpers in a shared header to avoid the common-file refactor entirely?
- Q-P1v2.3: Is there value in cross-validating against the BSP MPP `Vdpu382Vp9dCtx_t` semi-public ABI before declaring our register struct definitive?
Sending to Janet for the AMEND-cycle re-review.