Files
ampere-vp9-enablement/phase1_plan_v2.md
T
claude-noether 5a1ba37791 Phase 1 plan + v2 amendment per Janet architect review
Phase 1 v1 claimed VP9 enablement was a < 100-line wiring patch
reusing legacy rkvdec_vp9_fmt_ops in vdpu381_coded_fmts[]. Janet
architect review (verdict AMEND) surfaced two BLOCKERs that overturn
the thesis:

A. vdpu381_variant.num_regs is unset (defaults 0); legacy VP9 path's
   rkvdec_memcpy_toio writes MIN(sizeof, 0)=0 bytes. Zero MMIO config.

B. vdpu381 register layout is segmented across 5 distinct OFFSET_*
   blocks (COMMON, CODEC_PARAMS, COMMON_ADDR, CODEC_ADDR, POC_HIGHBIT)
   plus decode trigger at VDPU381_REG_DEC_E=0x028. Legacy VP9 writes
   flat-from-base+0 and triggers via RKVDEC_REG_INTERRUPT=0x004 —
   wrong register-block layout AND wrong trigger offset.

Plus Amendment C: BSP rkvdec2_irq checks BIT(1); mainline
vdpu381_irq_handler checks BIT(2). Same register (0x380), different
bit. VP9-set BIT(1) status would misclassify as ERROR.

Both blockers confirmed empirically by reading mainline rkvdec.c and
BSP mpp_rkvdec2.c source (rkvdec2_run dispatches via mpp_write_req
on req->offset, NOT a flat dump; trans_tbl_vp9d shares HEVC's
128..199 register-DMA range).

v2 architecture: new rkvdec-vdpu381-vp9.c backend (~600-800 lines)
+ extracted rkvdec-vp9-common.{c,h} for the SPEC layer (probability
tables, frame_ctx, segmap) + VP9 register-struct extensions in
rkvdec-vdpu381-regs.h. Reuses vdpu381_irq_handler, V4L2 controls,
RCB infrastructure, IOMMU path — all confirmed as codec-agnostic.

Sourcing the bit-packed register struct definitions from Rockchip
BSP MPP hal_vp9d_vdpu382.c (vdpu382 = sibling IP family).

Next: Phase 1 v2 → Janet re-review for sign-off before Phase 2 code.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 23:08:52 +00:00

9.6 KiB
Raw Blame History

Phase 1 plan v2 — VP9 backend port to vdpu381 (post-Janet-review)

Date: 2026-05-17 ~01:35. Phase 1 v1 was AMENDED by Janet architect review (verdict: AMEND, two BLOCKERs + one AMENDMENT).

Why v2 — the Phase-0 wiring-patch thesis is dead

Janet's review surfaced two source-code blockers that overturn the "VP9 is < 100 lines wiring" claim from Phase 0/Phase 1 v1:

Blocker A: vdpu381_variant.num_regs is unset

rkvdec.c:1707-1715 defines vdpu381_variant with no .num_regs field — defaults to 0. The legacy VP9 path at rkvdec-vp9.c:660-661 writes the register file via:

rkvdec_memcpy_toio(rkvdec->regs, regs,
                   MIN(sizeof(*regs), sizeof(u32) * rkvdec->variant->num_regs));

MIN(sizeof(*regs), 0) = 0. Zero bytes hit MMIO. If we naively added VP9 to vdpu381_coded_fmts[] pointing at rkvdec_vp9_fmt_ops, the hardware would never be configured. The watchdog would fire 2s later with no IRQ.

Blocker B: vdpu381 register layout is SEGMENTED, legacy VP9 is FLAT

Legacy rkvdec_vp9.c:660 writes the entire register file at flat rkvdec->regs + 0. vdpu381 paths (rkvdec-vdpu381-hevc.c, rkvdec-vdpu381-h264.c) use five distinct register-block offsets within the MMIO:

#define OFFSET_COMMON_REGS         (8   * sizeof(u32))  // 0x020
#define OFFSET_CODEC_PARAMS_REGS   (64  * sizeof(u32))  // 0x100
#define OFFSET_COMMON_ADDR_REGS    (128 * sizeof(u32))  // 0x200
#define OFFSET_CODEC_ADDR_REGS     (160 * sizeof(u32))  // 0x280
#define OFFSET_POC_HIGHBIT_REGS    (200 * sizeof(u32))  // 0x320

Plus the decode trigger at VDPU381_REG_DEC_E = 0x028 (NOT RKVDEC_REG_INTERRUPT = 0x004 like legacy). A flat memcpy at offset 0 lands in the wrong fields — would overwrite control/status registers and never hit the codec-params block where the bitstream config goes.

Confirmed via BSP mpp_rkvdec2.c:467 rkvdec2_run: writes are dispatched via mpp_write_req iterating over task->w_reqs[i].offset / sizeof(u32) — segmented per request, matching mainline's vdpu381 model. Confirmed via BSP mpp_rkvdec2.c:107 trans_tbl_vp9d[]: VP9's DMA-translated register indices are 128..199 (offset 0x200..0x320 = OFFSET_COMMON_ADDR_REGS through OFFSET_POC_HIGHBIT_REGS) — IDENTICAL to HEVC's range (also 128..199). VP9 uses the same segmented register layout as HEVC on this hardware.

Amendment C: IRQ status bit semantics

BSP rkvdec2_irq checks RKVDEC_IRQ_RAW = BIT(1). Mainline vdpu381_irq_handler checks VDPU381_STA_INT_DEC_RDY_STA = BIT(2). Same register (0x380), different bits. If VP9 sets BIT(1) instead of BIT(2) on success, every IRQ gets misclassified as ERROR and rkvdec_iommu_restore fires. Mitigation: instrument the IRQ handler with pr_warn(status) for the first 5 decodes pre-flash, compare to expected BIT(2) | BIT(1) mask.

Revised architecture

VP9 on RK3588's vdpu381 IP needs a new backend file rkvdec-vdpu381-vp9.c, modelled structurally on rkvdec-vdpu381-hevc.c (639 lines) but borrowing the SPEC LAYER from legacy rkvdec-vp9.c. Estimated 500-900 lines new code.

File breakdown

File Action Lines
rkvdec-vdpu381-vp9.c NEW: vdpu381 register-block packing + run/start/stop/done ops ~600-800
rkvdec-vp9-common.{c,h} NEW: extract from legacy rkvdec-vp9.c the codec-spec layer (probability tables, frame_ctx state machine, segmap mgmt, V4L2 vp9 helpers, rkvdec_vp9_priv_tbl, rkvdec_vp9_start/stop body) ~600 (mostly moved from legacy)
rkvdec-vp9.c REFACTOR: replace flat register-dump body with thin shim that calls common helpers; keep the legacy rkvdec_vp9_fmt_ops ABI intact for RK3399 ~400 (down from 1042)
rkvdec-vdpu381-regs.h EXTEND: add VP9 register-struct definitions (struct rkvdec_vp9_common_regs, _codec_params_regs, _addr_regs) +200
rkvdec.c wire: add vdpu38x_vp9_ctrls, rkvdec_vdpu381_vp9_fmt_ops extern, add to vdpu381_coded_fmts[], set vdpu381_variant.num_regs if any legacy code still consults it +30

Reuse boundaries

REUSED FROM LEGACY (in common file):

  • VP9 probability tables (vp9_default_*_probs)
  • struct rkvdec_vp9_run augmenting rkvdec_run
  • rkvdec_vp9_priv_tbl (probe + count buffers, segmap)
  • init_v4l2_vp9_count_tbl, rkvdec_vp9_done (entropy update from HW count buffer)
  • rkvdec_vp9_adjust_fmt (frame-size clamping)
  • rkvdec_vp9_start / rkvdec_vp9_stop (DMA alloc for probe/count/segmap)
  • V4L2 vp9 helpers from <media/v4l2-vp9.h>

REWRITTEN FOR VDPU381 (in new backend file):

  • rkvdec_vdpu381_vp9_run — register packing into the 5 segmented blocks, trigger at VDPU381_REG_DEC_E = 0x028
  • Register struct definitions (bit-packed swreg_* fields per Rockchip BSP MPP Vdpu382Vp9dCtx_t pattern from mpp/hal/rkdec/vp9d/hal_vp9d_vdpu382.c)

REUSED FROM EXISTING VDPU381 INFRASTRUCTURE:

  • vdpu381_irq_handler (single handler, codec-agnostic — confirmed via BSP rkvdec2_irq)
  • vdpu381_rcb_sizes[] (HEVC-tuned, oversized for 4K VP9 but harmless — sub-optimal SRAM use, not a correctness issue)
  • IOMMU (rkvdec0_mmu) — dma_alloc_coherent works transparently through the IOMMU
  • V4L2 control descriptors (rkvdec_vp9_ctrl_descs[] line 419) — HW-agnostic

Register-struct sourcing

The bit-packed struct rkvdec_vp9_common_regs etc. will be derived from Rockchip BSP MPP mpp/hal/rkdec/vp9d/hal_vp9d_vdpu382.c (vdpu382 = sibling IP of vdpu381). Pattern: each swreg_NNN becomes a u32 field; bit-packed sub-fields use C bitfield syntax exactly like struct rkvdec_hevc_sps in rkvdec-vdpu381-hevc.c.

Concrete starting point: vp9_hw_regs->common.reg028.swreg_vp9_rd_prob_idx (BSP line 595) shows the register at common-block offset 0x028 has a vp9_rd_prob_idx bit-field. Inventory all such .common.regNNN accesses in hal_vp9d_vdpu382.c → reproduce as a single struct rkvdec_vp9_common_regs in rkvdec-vdpu381-regs.h.

Test plan (unchanged from v1, plus a new pre-flight)

Add a Pre-Phase-3 step: with the NEW backend compiled, instrument vdpu381_irq_handler to pr_warn the status register value for the first 10 IRQs after VP9 enable, to resolve Amendment C empirically (do VP9 IRQs set BIT(2)?). One module build cycle; takes ~5 min added to Phase 3.

Test fixture

  • Test bitstream: bbb_30s_720p.vp9.webm — need to generate on fresnel via ffmpeg -i bbb-1080p.mkv -c:v libvpx-vp9 -b:v 1500k -t 30 -vf scale=1280:720 -y bbb_30s_720p.vp9.webm. Use 720p so it's well under the 4096 max-width.
  • SW reference: ffmpeg -c:v vp9 on x86 fresnel — known good.
  • Verification: SHA the raw NV12 byte stream, not PNG (feedback_pixel_compare_in_yuv_not_png).

Updated phase sequence

  1. Phase 1 v2 (this doc) → Sonnet/Janet re-review for sign-off.
  2. Phase 2 — implement the 3 new files + edits. Boltzmann tree, linux-rk3588-marfrit branch, sibling commits to make review easier.
  3. Phase 3 — build module (~5 min via distcc), install on ampere with backup (feedback_backup_before_module_replace), modprobe -r/modprobe. Pre-flight: enable IRQ diagnostic pr_warn for first 10 statuses.
  4. Phase 4 — decode test bbb_30s_720p.vp9.webm at frame 100, byte-compare against SW reference.
  5. Phase 5 — Sonnet code review of the actual implementation diff.
  6. Phase 6 — verification iteration; widen test vectors (resolutions, segmentation, super-frames).
  7. Phase 7 — strip diagnostic pr_warn, finalise commits, prepare upstream patch series via b4, route via kernel-agent (issues #16+).
  8. Phase 8 — backend (libva-v4l2-request-fourier) VP9 dispatch path added (sibling, iter33 VP8 pattern). Fluster VP9-TEST-VECTORS validation.

Risk register v2

# Risk Status
R1 Same physical IP, two register dialects — hidden mode-switch register? RESOLVED: BSP rkvdec_rk3588_hw_ops.init = rkvdec2_init (codec-generic); no mode switch.
R2 libva backend lacks rkvdec VP9 dispatch Phase 8, independent of kernel patch
R3 RK3588 VP9 max-resolution 4096-max confirmed; cap entry at 4096×2304 conservatively
R4 dirac RK3399 regression Mitigated by keeping legacy rkvdec_vp9_fmt_ops ABI intact via the common-file split; legacy path unchanged at the ops level. Refactor risk = the common-file extraction itself. Cross-test required before commit
R5 Casanova upstream VP9 series mid-effort Monitor weekly
R6 RCB sizing oversizes for VP9 (HEVC sized for 8K) Resolved as PROCEED-with-note (sub-optimal SRAM, not a correctness issue)
R7 (new) Bit-packed register struct may not match HW exactly — endianness, bit-ordering, packing Use exact BSP layout from MPP hal_vp9d_vdpu382.c; instrument pr_warn writes for first decode to verify expected register values
R8 (new) rkvdec_vp9.c refactor extracts code that legacy path still needs as inlined — breaks RK3399 build Compile-test rk3399 defconfig after the refactor; before VP9-on-vdpu381 work begins

Open questions for v2 review

  • Q-P1v2.1: Should we extract rkvdec-vp9-common.c upfront as a clean separate patch (easier review, but no immediate user benefit), or inline the reuse via #include directives inside rkvdec-vdpu381-vp9.c (less clean, but a single self-contained patch)?
  • Q-P1v2.2: Can the new file use static inline helpers in a shared header to avoid the common-file refactor entirely?
  • Q-P1v2.3: Is there value in cross-validating against the BSP MPP Vdpu382Vp9dCtx_t semi-public ABI before declaring our register struct definitive?

Sending to Janet for the AMEND-cycle re-review.