Phase 1 v1 claimed VP9 enablement was a < 100-line wiring patch
reusing legacy rkvdec_vp9_fmt_ops in vdpu381_coded_fmts[]. Janet
architect review (verdict AMEND) surfaced two BLOCKERs that overturn
the thesis:
A. vdpu381_variant.num_regs is unset (defaults 0); legacy VP9 path's
rkvdec_memcpy_toio writes MIN(sizeof, 0)=0 bytes. Zero MMIO config.
B. vdpu381 register layout is segmented across 5 distinct OFFSET_*
blocks (COMMON, CODEC_PARAMS, COMMON_ADDR, CODEC_ADDR, POC_HIGHBIT)
plus decode trigger at VDPU381_REG_DEC_E=0x028. Legacy VP9 writes
flat-from-base+0 and triggers via RKVDEC_REG_INTERRUPT=0x004 —
wrong register-block layout AND wrong trigger offset.
Plus Amendment C: BSP rkvdec2_irq checks BIT(1); mainline
vdpu381_irq_handler checks BIT(2). Same register (0x380), different
bit. VP9-set BIT(1) status would misclassify as ERROR.
Both blockers confirmed empirically by reading mainline rkvdec.c and
BSP mpp_rkvdec2.c source (rkvdec2_run dispatches via mpp_write_req
on req->offset, NOT a flat dump; trans_tbl_vp9d shares HEVC's
128..199 register-DMA range).
v2 architecture: new rkvdec-vdpu381-vp9.c backend (~600-800 lines)
+ extracted rkvdec-vp9-common.{c,h} for the SPEC layer (probability
tables, frame_ctx, segmap) + VP9 register-struct extensions in
rkvdec-vdpu381-regs.h. Reuses vdpu381_irq_handler, V4L2 controls,
RCB infrastructure, IOMMU path — all confirmed as codec-agnostic.
Sourcing the bit-packed register struct definitions from Rockchip
BSP MPP hal_vp9d_vdpu382.c (vdpu382 = sibling IP family).
Next: Phase 1 v2 → Janet re-review for sign-off before Phase 2 code.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
9.6 KiB
Phase 1 plan v2 — VP9 backend port to vdpu381 (post-Janet-review)
Date: 2026-05-17 ~01:35. Phase 1 v1 was AMENDED by Janet architect review (verdict: AMEND, two BLOCKERs + one AMENDMENT).
Why v2 — the Phase-0 wiring-patch thesis is dead
Janet's review surfaced two source-code blockers that overturn the "VP9 is < 100 lines wiring" claim from Phase 0/Phase 1 v1:
Blocker A: vdpu381_variant.num_regs is unset
rkvdec.c:1707-1715 defines vdpu381_variant with no .num_regs field — defaults to 0. The legacy VP9 path at rkvdec-vp9.c:660-661 writes the register file via:
rkvdec_memcpy_toio(rkvdec->regs, regs,
MIN(sizeof(*regs), sizeof(u32) * rkvdec->variant->num_regs));
MIN(sizeof(*regs), 0) = 0. Zero bytes hit MMIO. If we naively added VP9 to vdpu381_coded_fmts[] pointing at rkvdec_vp9_fmt_ops, the hardware would never be configured. The watchdog would fire 2s later with no IRQ.
Blocker B: vdpu381 register layout is SEGMENTED, legacy VP9 is FLAT
Legacy rkvdec_vp9.c:660 writes the entire register file at flat rkvdec->regs + 0. vdpu381 paths (rkvdec-vdpu381-hevc.c, rkvdec-vdpu381-h264.c) use five distinct register-block offsets within the MMIO:
#define OFFSET_COMMON_REGS (8 * sizeof(u32)) // 0x020
#define OFFSET_CODEC_PARAMS_REGS (64 * sizeof(u32)) // 0x100
#define OFFSET_COMMON_ADDR_REGS (128 * sizeof(u32)) // 0x200
#define OFFSET_CODEC_ADDR_REGS (160 * sizeof(u32)) // 0x280
#define OFFSET_POC_HIGHBIT_REGS (200 * sizeof(u32)) // 0x320
Plus the decode trigger at VDPU381_REG_DEC_E = 0x028 (NOT RKVDEC_REG_INTERRUPT = 0x004 like legacy). A flat memcpy at offset 0 lands in the wrong fields — would overwrite control/status registers and never hit the codec-params block where the bitstream config goes.
Confirmed via BSP mpp_rkvdec2.c:467 rkvdec2_run: writes are dispatched via mpp_write_req iterating over task->w_reqs[i].offset / sizeof(u32) — segmented per request, matching mainline's vdpu381 model. Confirmed via BSP mpp_rkvdec2.c:107 trans_tbl_vp9d[]: VP9's DMA-translated register indices are 128..199 (offset 0x200..0x320 = OFFSET_COMMON_ADDR_REGS through OFFSET_POC_HIGHBIT_REGS) — IDENTICAL to HEVC's range (also 128..199). VP9 uses the same segmented register layout as HEVC on this hardware.
Amendment C: IRQ status bit semantics
BSP rkvdec2_irq checks RKVDEC_IRQ_RAW = BIT(1). Mainline vdpu381_irq_handler checks VDPU381_STA_INT_DEC_RDY_STA = BIT(2). Same register (0x380), different bits. If VP9 sets BIT(1) instead of BIT(2) on success, every IRQ gets misclassified as ERROR and rkvdec_iommu_restore fires. Mitigation: instrument the IRQ handler with pr_warn(status) for the first 5 decodes pre-flash, compare to expected BIT(2) | BIT(1) mask.
Revised architecture
VP9 on RK3588's vdpu381 IP needs a new backend file rkvdec-vdpu381-vp9.c, modelled structurally on rkvdec-vdpu381-hevc.c (639 lines) but borrowing the SPEC LAYER from legacy rkvdec-vp9.c. Estimated 500-900 lines new code.
File breakdown
| File | Action | Lines |
|---|---|---|
rkvdec-vdpu381-vp9.c |
NEW: vdpu381 register-block packing + run/start/stop/done ops | ~600-800 |
rkvdec-vp9-common.{c,h} |
NEW: extract from legacy rkvdec-vp9.c the codec-spec layer (probability tables, frame_ctx state machine, segmap mgmt, V4L2 vp9 helpers, rkvdec_vp9_priv_tbl, rkvdec_vp9_start/stop body) |
~600 (mostly moved from legacy) |
rkvdec-vp9.c |
REFACTOR: replace flat register-dump body with thin shim that calls common helpers; keep the legacy rkvdec_vp9_fmt_ops ABI intact for RK3399 |
~400 (down from 1042) |
rkvdec-vdpu381-regs.h |
EXTEND: add VP9 register-struct definitions (struct rkvdec_vp9_common_regs, _codec_params_regs, _addr_regs) |
+200 |
rkvdec.c |
wire: add vdpu38x_vp9_ctrls, rkvdec_vdpu381_vp9_fmt_ops extern, add to vdpu381_coded_fmts[], set vdpu381_variant.num_regs if any legacy code still consults it |
+30 |
Reuse boundaries
REUSED FROM LEGACY (in common file):
- VP9 probability tables (
vp9_default_*_probs) struct rkvdec_vp9_runaugmentingrkvdec_runrkvdec_vp9_priv_tbl(probe + count buffers, segmap)init_v4l2_vp9_count_tbl,rkvdec_vp9_done(entropy update from HW count buffer)rkvdec_vp9_adjust_fmt(frame-size clamping)rkvdec_vp9_start/rkvdec_vp9_stop(DMA alloc for probe/count/segmap)- V4L2 vp9 helpers from
<media/v4l2-vp9.h>
REWRITTEN FOR VDPU381 (in new backend file):
rkvdec_vdpu381_vp9_run— register packing into the 5 segmented blocks, trigger atVDPU381_REG_DEC_E = 0x028- Register struct definitions (bit-packed
swreg_*fields per Rockchip BSP MPPVdpu382Vp9dCtx_tpattern frommpp/hal/rkdec/vp9d/hal_vp9d_vdpu382.c)
REUSED FROM EXISTING VDPU381 INFRASTRUCTURE:
vdpu381_irq_handler(single handler, codec-agnostic — confirmed via BSPrkvdec2_irq)vdpu381_rcb_sizes[](HEVC-tuned, oversized for 4K VP9 but harmless — sub-optimal SRAM use, not a correctness issue)- IOMMU (
rkvdec0_mmu) —dma_alloc_coherentworks transparently through the IOMMU - V4L2 control descriptors (
rkvdec_vp9_ctrl_descs[]line 419) — HW-agnostic
Register-struct sourcing
The bit-packed struct rkvdec_vp9_common_regs etc. will be derived from Rockchip BSP MPP mpp/hal/rkdec/vp9d/hal_vp9d_vdpu382.c (vdpu382 = sibling IP of vdpu381). Pattern: each swreg_NNN becomes a u32 field; bit-packed sub-fields use C bitfield syntax exactly like struct rkvdec_hevc_sps in rkvdec-vdpu381-hevc.c.
Concrete starting point: vp9_hw_regs->common.reg028.swreg_vp9_rd_prob_idx (BSP line 595) shows the register at common-block offset 0x028 has a vp9_rd_prob_idx bit-field. Inventory all such .common.regNNN accesses in hal_vp9d_vdpu382.c → reproduce as a single struct rkvdec_vp9_common_regs in rkvdec-vdpu381-regs.h.
Test plan (unchanged from v1, plus a new pre-flight)
Add a Pre-Phase-3 step: with the NEW backend compiled, instrument vdpu381_irq_handler to pr_warn the status register value for the first 10 IRQs after VP9 enable, to resolve Amendment C empirically (do VP9 IRQs set BIT(2)?). One module build cycle; takes ~5 min added to Phase 3.
Test fixture
- Test bitstream: bbb_30s_720p.vp9.webm — need to generate on fresnel via
ffmpeg -i bbb-1080p.mkv -c:v libvpx-vp9 -b:v 1500k -t 30 -vf scale=1280:720 -y bbb_30s_720p.vp9.webm. Use 720p so it's well under the 4096 max-width. - SW reference:
ffmpeg -c:v vp9on x86 fresnel — known good. - Verification: SHA the raw NV12 byte stream, not PNG (feedback_pixel_compare_in_yuv_not_png).
Updated phase sequence
- Phase 1 v2 (this doc) → Sonnet/Janet re-review for sign-off.
- Phase 2 — implement the 3 new files + edits. Boltzmann tree,
linux-rk3588-marfritbranch, sibling commits to make review easier. - Phase 3 — build module (~5 min via distcc), install on ampere with backup (feedback_backup_before_module_replace),
modprobe -r/modprobe. Pre-flight: enable IRQ diagnosticpr_warnfor first 10 statuses. - Phase 4 — decode test bbb_30s_720p.vp9.webm at frame 100, byte-compare against SW reference.
- Phase 5 — Sonnet code review of the actual implementation diff.
- Phase 6 — verification iteration; widen test vectors (resolutions, segmentation, super-frames).
- Phase 7 — strip diagnostic
pr_warn, finalise commits, prepare upstream patch series viab4, route via kernel-agent (issues #16+). - Phase 8 — backend (
libva-v4l2-request-fourier) VP9 dispatch path added (sibling, iter33 VP8 pattern). FlusterVP9-TEST-VECTORSvalidation.
Risk register v2
| # | Risk | Status |
|---|---|---|
| R1 | Same physical IP, two register dialects — hidden mode-switch register? | RESOLVED: BSP rkvdec_rk3588_hw_ops.init = rkvdec2_init (codec-generic); no mode switch. |
| R2 | libva backend lacks rkvdec VP9 dispatch | Phase 8, independent of kernel patch |
| R3 | RK3588 VP9 max-resolution | 4096-max confirmed; cap entry at 4096×2304 conservatively |
| R4 | dirac RK3399 regression | Mitigated by keeping legacy rkvdec_vp9_fmt_ops ABI intact via the common-file split; legacy path unchanged at the ops level. Refactor risk = the common-file extraction itself. Cross-test required before commit |
| R5 | Casanova upstream VP9 series mid-effort | Monitor weekly |
| R6 | RCB sizing oversizes for VP9 (HEVC sized for 8K) | Resolved as PROCEED-with-note (sub-optimal SRAM, not a correctness issue) |
| R7 (new) | Bit-packed register struct may not match HW exactly — endianness, bit-ordering, packing | Use exact BSP layout from MPP hal_vp9d_vdpu382.c; instrument pr_warn writes for first decode to verify expected register values |
| R8 (new) | rkvdec_vp9.c refactor extracts code that legacy path still needs as inlined — breaks RK3399 build |
Compile-test rk3399 defconfig after the refactor; before VP9-on-vdpu381 work begins |
Open questions for v2 review
- Q-P1v2.1: Should we extract
rkvdec-vp9-common.cupfront as a clean separate patch (easier review, but no immediate user benefit), or inline the reuse via#includedirectives insiderkvdec-vdpu381-vp9.c(less clean, but a single self-contained patch)? - Q-P1v2.2: Can the new file use
static inlinehelpers in a shared header to avoid the common-file refactor entirely? - Q-P1v2.3: Is there value in cross-validating against the BSP MPP
Vdpu382Vp9dCtx_tsemi-public ABI before declaring our register struct definitive?
Sending to Janet for the AMEND-cycle re-review.