Files
claude-noether 5a1ba37791 Phase 1 plan + v2 amendment per Janet architect review
Phase 1 v1 claimed VP9 enablement was a < 100-line wiring patch
reusing legacy rkvdec_vp9_fmt_ops in vdpu381_coded_fmts[]. Janet
architect review (verdict AMEND) surfaced two BLOCKERs that overturn
the thesis:

A. vdpu381_variant.num_regs is unset (defaults 0); legacy VP9 path's
   rkvdec_memcpy_toio writes MIN(sizeof, 0)=0 bytes. Zero MMIO config.

B. vdpu381 register layout is segmented across 5 distinct OFFSET_*
   blocks (COMMON, CODEC_PARAMS, COMMON_ADDR, CODEC_ADDR, POC_HIGHBIT)
   plus decode trigger at VDPU381_REG_DEC_E=0x028. Legacy VP9 writes
   flat-from-base+0 and triggers via RKVDEC_REG_INTERRUPT=0x004 —
   wrong register-block layout AND wrong trigger offset.

Plus Amendment C: BSP rkvdec2_irq checks BIT(1); mainline
vdpu381_irq_handler checks BIT(2). Same register (0x380), different
bit. VP9-set BIT(1) status would misclassify as ERROR.

Both blockers confirmed empirically by reading mainline rkvdec.c and
BSP mpp_rkvdec2.c source (rkvdec2_run dispatches via mpp_write_req
on req->offset, NOT a flat dump; trans_tbl_vp9d shares HEVC's
128..199 register-DMA range).

v2 architecture: new rkvdec-vdpu381-vp9.c backend (~600-800 lines)
+ extracted rkvdec-vp9-common.{c,h} for the SPEC layer (probability
tables, frame_ctx, segmap) + VP9 register-struct extensions in
rkvdec-vdpu381-regs.h. Reuses vdpu381_irq_handler, V4L2 controls,
RCB infrastructure, IOMMU path — all confirmed as codec-agnostic.

Sourcing the bit-packed register struct definitions from Rockchip
BSP MPP hal_vp9d_vdpu382.c (vdpu382 = sibling IP family).

Next: Phase 1 v2 → Janet re-review for sign-off before Phase 2 code.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 23:08:52 +00:00

102 lines
7.6 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Phase 1 plan — VP9 wiring patch for vdpu381 (RK3588)
Date: 2026-05-17 ~01:15. Phase 0 closed `8dce724` with the architectural correction that VP9 is a wiring patch, not a backend port.
## Phase 1 substrate resolutions (open questions from Phase 0)
| Q | Question | Resolution |
|---|---|---|
| O1 | Translate-table fits 0x400 MMIO? | Yes. `rkvdec_v2_trans[RKVDEC_FMT_VP9D]` offsets 128..232 × 4 = 0x200..0x3A0 < 0x400. Confirmed in `mpp_rkvdec2.c` BSP. |
| O2 | Codec-aware IRQ split needed? | **NO**. BSP `rkvdec2_irq` (used for all v2-family devices including RK3588) reads ONE register — `RKVDEC_REG_INT_EN = 0x380`. Mainline `vdpu381_irq_handler` reads the **same offset** under the name `VDPU381_REG_STA_INT = 0x380`. Same hardware register; same handler works for HEVC, H.264, and VP9. |
| O3 | RK3588-specific clock/reset requirements for VP9 beyond HEVC? | None observed. BSP `rkvdec_rk3588_hw_ops` uses generic `rkvdec2_init`, `rkvdec2_clk_on/off`, `rkvdec2_sip_reset` — codec-agnostic. Mainline `vdpu381_variant` already exposes the full RK3588 clock set (`aclk_vcodec`, `hclk_vcodec`, `clk_core`, `clk_cabac`, `clk_hevc_cabac`). |
| O4 | Legacy `rkvdec_vp9_start`/`stop` against RK3588 IOMMU? | Should work transparently — vb2_dma_contig handles IOMMU. Verify at first decode attempt. |
| O5 | RCB / SRAM — needed for VP9 on RK3588? | **Yes** — BSP MPP `vp9d_vdpu34x.c` allocates `rcb_buf`/`g_buf[i].rcb_buf`. Mainline `rkvdec_start_streaming` already calls `rkvdec_allocate_rcb(ctx, variant->rcb_sizes, variant->num_rcb_sizes)` **per-variant**, so a VP9 fmt added to `vdpu381_coded_fmts[]` will automatically inherit `vdpu381_rcb_sizes[]`. RCB sizes are HEVC-flavoured but the buffer-purpose names (intra, transd, stream, inter, dblk, sao, fbc, filt-col) are codec-generic primitives; HEVC sizing should oversize-but-work for VP9 (4096-max width is smaller than HEVC 8K max). |
| O6 | Validate via Fluster `VP9-TEST-VECTORS` post-Phase-3 | Deferred; first pass uses ffmpeg-vaapi + byte-compare. |
| O7 | Watch for Collabora upstream VP9 series | Monitor lore + Collabora gitlab weekly. |
| R3 | RK3588 VP9 max-resolution | 4096×N. BSP `mpp/hal/rkdec/vp9d/hal_vp9d_vdpu34x.c:284` clamps at `width > 4096`. Mirror RK3399's `{4096, 2304, step 64}` frmsize in the new fmt entry. |
## The wiring patch
Target file: `drivers/media/platform/rockchip/rkvdec/rkvdec.c`.
### Change 1: add `vdpu38x_vp9_ctrls` ctrl-set descriptor
Insert near `vdpu38x_h264_ctrls` (~line 397):
```c
static const struct rkvdec_ctrls vdpu38x_vp9_ctrls = {
.ctrls = rkvdec_vp9_ctrl_descs,
.num_ctrls = ARRAY_SIZE(rkvdec_vp9_ctrl_descs),
};
```
Reuses the existing `rkvdec_vp9_ctrl_descs[]` (line 419). V4L2 controls are HW-agnostic so no duplication needed; only a fresh `rkvdec_ctrls` wrapper.
### Change 2: add VP9 entry to `vdpu381_coded_fmts[]`
Append after the H264 entry (~line 549):
```c
{
.fourcc = V4L2_PIX_FMT_VP9_FRAME,
.frmsize = {
.min_width = 64,
.max_width = 4096,
.step_width = 64,
.min_height = 64,
.max_height = 2304,
.step_height = 64,
},
.ctrls = &vdpu38x_vp9_ctrls,
.ops = &rkvdec_vp9_fmt_ops,
.num_decoded_fmts = ARRAY_SIZE(rkvdec_vp9_decoded_fmts),
.decoded_fmts = rkvdec_vp9_decoded_fmts,
},
```
Reuses existing `rkvdec_vp9_fmt_ops` (legacy vdpu34x register layout, written for RK3399) and `rkvdec_vp9_decoded_fmts` (NV12_4L4 + NV12). No `subsystem_flags = VB2_V4L2_FL_SUPPORTS_M2M_HOLD_CAPTURE_BUF` because VP9 frames are self-contained (no slice fragmentation).
### Change 3 (conditional): instrumentation only if first decode silent-fails
Add a one-shot DIAG `pr_warn` at the end of `rkvdec_vp9_run` if Phase 6 verification shows the IRQ never fires or fires with unexpected status. Otherwise the patch is **two structural additions and zero new logic**.
## Test plan
| Step | Action | Pass criterion |
|---|---|---|
| 1 | Build `drivers/media/platform/rockchip/rkvdec/rockchip-vdec.ko` on boltzmann (already has tree + toolchain) | Builds clean |
| 2 | Backup current `7.0.0-rc3-devices+/kernel/drivers/media/platform/rockchip/rkvdec/rockchip-vdec.ko` on ampere ([feedback_backup_before_module_replace](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_backup_before_module_replace.md)) | Backup md5'd + off-device copy on fresnel |
| 3 | Install + `modprobe -r rockchip-vdec; modprobe rockchip-vdec` (no reboot needed) | `dmesg \| grep rkvdec` shows VP9 fmt enumerated. `v4l2-ctl --list-formats-out --device=/dev/video0` shows `VP9F` |
| 4 | Decode test bitstream: `LIBVA_DRIVER_NAME=v4l2_request ffmpeg -hwaccel vaapi -hwaccel_output_format vaapi -i <vp9.webm> -vf hwdownload,format=nv12 -frames:v 100 -f rawvideo -pix_fmt nv12 /tmp/hw-vp9.nv12` | rc=0, output size = 100 × W × H × 1.5 bytes |
| 5 | SW reference: `ffmpeg -i <vp9.webm> -vf format=nv12,select='eq(n,100)' -frames:v 1 -f rawvideo -pix_fmt nv12 /tmp/sw-vp9-f100.nv12` | rc=0 |
| 6 | Byte-compare frame 100 (per [feedback_compare_hw_against_sw_reference](../../.claude/projects/-home-mfritsche-src-fresnel-fourier/memory/feedback_compare_hw_against_sw_reference.md)) | ≥ 99% exact match |
| 7 | Iteratively widen test vectors (different resolutions, frame-parallel decoding, super-frame, segmentation enabled) | Each VP9 feature within `vdpu381` capability passes |
## Risk register update
| # | Risk | Mitigation |
|---|---|---|
| R1 | Same physical IP, two register dialects — hidden mode-switch register? | BSP `rkvdec_rk3588_hw_ops.init` = `rkvdec2_init` (generic). No mode switch. **Resolved**. |
| R2 | `libva-v4l2-request-fourier` lacks rkvdec VP9 dispatch | Phase 7 task: add VP9 fmt to backend's `vendor_fourcc_to_driver_fourcc` switch. Sibling iter33 VP8 pattern. Independent of kernel patch. |
| R3 | RK3588 VP9 max-resolution differs from RK3399's 4096×2304 | 4096-max confirmed; height limit unchecked. Open empirically; cap conservatively at 2304 first. |
| R4 | `dirac` RK3399 cross-test fixture status | NOT regression-impacted — patch only ADDS to `vdpu381_coded_fmts[]`, doesn't touch `rk3399_coded_fmts[]` or `rkvdec-vp9.c`. No risk to RK3399 path. **Resolved**. |
| R5 | Casanova posts upstream VP9 series mid-effort → fork divergence | Monitor; coordinate if posted. |
| R6 (new) | RCB sizing oversizes for VP9 (HEVC sized for 8K, VP9 caps at 4K) — wastes SRAM but should still work | First attempt uses HEVC sizes. If sub-optimal, follow-up patch adds VP9-specific RCB descriptor table. Not a correctness issue. |
## Sequence
Phase 1: this plan, then Sonnet Phase 5 architect review.
Phase 2: implement Change 1 + 2 in `boltzmann:~/src/linux-rockchip` `linux-rk3588-marfrit` branch.
Phase 3: build module on boltzmann via distcc.
Phase 4: install on ampere + `modprobe -r/modprobe` cycle + sanity v4l2-ctl enumeration.
Phase 5: SW review of code + Sonnet checkpoint.
Phase 6: decode test + byte-compare verification.
Phase 7: persistence — commit to `linux-rk3588-marfrit`, prepare upstream patch via b4, route through kernel-agent.
Phase 8: monitor + extend test vectors; backend (libva) update tracked separately.
## Open questions remaining for Phase 1 review
- Q-P1.1: Does mainline `vdpu381_irq_handler` correctly handle the case where the IRQ status doesn't set `VDPU381_STA_INT_DEC_RDY_STA` (BIT 2) on a successful VP9 decode? i.e. does VP9 set a *different* status bit than HEVC? Mitigation: add diagnostic `pr_warn` for first 5 IRQs.
- Q-P1.2: Does `rkvdec_vp9_start` use `dma_alloc_*` that conflicts with rkvdec0's IOMMU mode? Likely no; vb2 handles this.
- Q-P1.3: VP9 needs a `done` callback (`rkvdec_vp9_done` populates probability update for adaptive entropy). Verify the existing `done` runs in the correct workqueue context after IRQ — should "just work" since mainline has the dispatch.