iter1 phase0: substrate + prior-art survey (HEVC reclassified)

Phase 0 ran the operator-mandated upstream prior-art survey FIRST,
before any source-read or hypothesis. Headline finding: the HEVC
OOPS is fundamentally re-scoped from 'kernel bug' to 'userspace UAPI
gap' against the new 7.0 controls
V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS / _LT_RPS.

Survey + verification:
- The Casanova/Collabora v8 series, merged in Linux 7.0, added the two
  new V4L2 controls for VDPU381 HEVC; backend 7ac934e (June pre-iter38)
  predates the UAPI, and grep finds zero references to these CIDs in it.
- ampere linux-api-headers is still 6.19-1, doesn't define the
  constants — the backend literally cannot reference them without a
  headers bump.
- ampere kernel source rkvdec-hevc-common.c:500-509 looks up the new
  CIDs; if backend never set them, rkvdec_hevc_prepare_hw_st_rps
  reads invalid memory via memcmp — exactly the __pi_memcmp OOPS
  symptom.

VP9 is still kernel-side per the v4 cover ('This patch only adds support
for H264 and H265 in both variants'). Multiple competing out-of-tree
starting trees: Sarma's android tree (working but Android-flavored),
dongioia/rock5bplus-rkvdec2 (mainline-style claims), Kwiboo (no VP9
on RK3588 yet). RKVDEC2 separate-driver path is dead — future VP9
extends the existing rkvdec driver's VDPU381 variant_ops.

Five open questions tabled for Phase 1, the most important being campaign
re-scope (HEVC moves to backend campaign; this stays VP9 kernel-only
OR becomes a meta-campaign coordinating both).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 08:10:40 +00:00
parent 72f658d7b9
commit 2a5f5c296e
+116
@@ -0,0 +1,116 @@
# Phase 0 — Substrate / Motivation / Inventory (iter1 of ampere-kernel-decoders)
Closed 2026-05-16 (afternoon). Locks the research question, captures the substrate, and — critically for this campaign — runs the **upstream prior-art survey** before writing or considering any patch.
## Research question
**Which of the two RK3588-decoder gaps surfaced by `ampere-fourier` iter1 (HEVC kernel OOPS, VP9 not exposed) actually need a *kernel* patch as their fix path, and for those that do, what's the minimal candidate patch — starting from existing upstream / out-of-tree work, not from a clean re-derivation?**
Operator-supplied mechanism (verbatim, in-session):
> Consult linux-rockchip and linux-mm mailing lists for prior art regarding enabling the video decoders.

Phase 0 follows that direction: a prior-art survey is the first item, *before* any source-read or hypothesis.
## Substrate
| Property | Value |
|---|---|
| Substrate kernel branch | `ampere:~/src/linux-rockchip` branch `ampere-minimal-devices`, tip `7c241f2e2835 arm64: dts: rockchip: rk3588-coolpi-cm5-genbook: add lid switch and USB3 PHY lane config` |
| Sister tree | `boltzmann:~/src/linux-rockchip` branch `linux-rk3588-marfrit`, tip `fccdf164bfec phy: rockchip-snps-pcie3: Only check PHY1 status when using it`; has Collabora remotes tracked (`add-rkvdec2-driver`, `add-rkvdec2-driver-iommu`, `add-rkvdec2-driver-sre`, `add-rkvdec2-driver-vdpu383-hevc`) |
| Baseline kernel package | `linux-ampere-fourier 7.0rc3.kafr1-1` — vanilla v7.0-rc3 + ampere DTS/board patches; built from the ampere tree above |
| ampere `linux-api-headers` | `6.19-1`, which **PREDATES** the 7.0 UAPI additions for `V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS` / `_LT_RPS` |
| libva backend installed | `libva-v4l2-request-fourier 1.0.0.r348.7ac934e-1` (hand-built `0c9a7efaab…` over the broken CI binary per marfrit-packages#17) |
| Available rkvdec sources | `drivers/media/platform/rockchip/rkvdec/{rkvdec.c, rkvdec-hevc.c, rkvdec-hevc-common.c, rkvdec-vdpu381-hevc.c, rkvdec-vdpu383-hevc.c, rkvdec-vdpu381-h264.c, rkvdec-vdpu383-h264.c, rkvdec-vp9.c, …}` on ampere |

The rkvdec source has **separate VDPU381 and VDPU383 HEVC files** (`rkvdec-vdpu381-hevc.c` + `rkvdec-vdpu383-hevc.c`) plus a shared `rkvdec-hevc-common.c` that contains the OOPSing function. The VP9 source (`rkvdec-vp9.c`) exists too but doesn't appear in the VDPU381/383 variant_ops registration, i.e. the file serves only the legacy RK3399 rkvdec path.
## Prior-art survey (operator-mandated Phase 0 step)
Surveyed 2026-05-16 by a general-purpose subagent against linux-rockchip / linux-media / linux-mm / lore.kernel.org / Kwiboo / Bootlin / Collabora / D.V.A.B. Sarma / dongioia. Key findings (full subagent transcript: `~/.../tasks/a0d583fc904274132.output`):
### Maturity baseline of RK3588 mainline decoder support
RK3588 (VDPU381) and RK3576 (VDPU383) decoder support was merged in **Linux 7.0** as a 17-patch series from Detlev Casanova / Collabora ("media: rkvdec: Add support for VDPU381 and VDPU383", v8 at lkml.org/lkml/2026/1/9/1334). The series adds **H.264 and HEVC only — VP9 is NOT in 7.0 mainline**, multi-core glue is NOT in 7.0 mainline, AV1 (RK3576 only) is preliminary. The Collabora blog (collabora.com news 2026-02-27) explicitly frames VP9 on RK3588 as **future work attributed to D.V.A.B. Sarma's existing out-of-tree driver**.

So the 7.0 cycle, and hence our `7.0-rc3` baseline, is the **very first kernel where RK3588 HEVC even exists upstream**. Regression-fix candidates in -rc4..-rc7 / -stable are plausible but not yet surveyed (lore.kernel.org was Anubis-gated during the survey; a manual recheck at `https://lore.kernel.org/linux-media/?q=rkvdec+VDPU381+RPS` is deferred).
### HEVC OOPS root cause — **reclassified from "kernel bug" to "userspace UAPI gap"**
The Casanova v8 series **introduces two new V4L2 controls**:
- `V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS` (short-term RPS)
- `V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS` (long-term RPS)
Per the survey, the VDPU381 HEVC path requires userspace to populate these. Verified by reading the actual code on ampere:
`drivers/media/platform/rockchip/rkvdec/rkvdec-hevc-common.c:500-509`:
```c
if (ctx->has_sps_st_rps) {
	ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS);
	run->ext_sps_st_rps = ctrl ? ctrl->p_cur.p : NULL;
}
if (ctx->has_sps_lt_rps) {
	ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS);
	run->ext_sps_lt_rps = ctrl ? ctrl->p_cur.p : NULL;
}
```
`drivers/media/platform/rockchip/rkvdec/rkvdec-hevc-common.c:380-410` (`rkvdec_hevc_prepare_hw_st_rps`, the OOPS site):
```c
if (!run->ext_sps_st_rps)
	return;
if (!memcmp(cache, run->ext_sps_st_rps, sizeof(struct v4l2_ctrl_hevc_ext_sps_st_rps)))
	return;
```
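To make the failure surface concrete, here is a userspace model of the two early returns quoted above. It is a sketch, not kernel code: `struct st_rps_model` is a hypothetical stand-in for `struct v4l2_ctrl_hevc_ext_sps_st_rps` (whose real layout is not reproduced), and the cache refresh after a mismatch is an assumption, since the excerpt stops before whatever the kernel does on that path. What it illustrates is that the NULL check is the only guard: once past it, `memcmp` reads a full struct's worth of bytes from wherever the pointer leads.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct v4l2_ctrl_hevc_ext_sps_st_rps.
 * The real 7.0 UAPI layout is NOT reproduced here. */
struct st_rps_model {
	unsigned char bytes[64];
};

/* Userspace model of the early returns in rkvdec_hevc_prepare_hw_st_rps.
 * Returns true when the (hypothetical) HW programming step would run. */
static bool model_prepare_st_rps(struct st_rps_model *cache,
				 const struct st_rps_model *ext_sps_st_rps)
{
	if (!ext_sps_st_rps)
		return false;	/* control absent: nothing to program */

	/* memcmp reads sizeof(*cache) bytes from ext_sps_st_rps with no
	 * length check: the pointer must reference that many valid bytes. */
	if (!memcmp(cache, ext_sps_st_rps, sizeof(*cache)))
		return false;	/* unchanged since the last frame: skip */

	/* Assumption: on a mismatch the cache is refreshed before the
	 * RPS tables are reprogrammed (not shown in the excerpt). */
	memcpy(cache, ext_sps_st_rps, sizeof(*cache));
	return true;
}
```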
Empirical state on ampere:
- `grep V4L2_CID_STATELESS_HEVC_EXT ~/src/libva-v4l2-request-fourier/src/` returns **zero hits**. Backend `7ac934e` (June pre-iter38) predates the 7.0 UAPI and never populates either control.
- `grep V4L2_CID_STATELESS_HEVC_EXT /usr/include/linux/v4l2-controls.h` returns **zero hits**. `linux-api-headers 6.19-1` doesn't even define the constants.

**Mechanism reconstruction (highly plausible, not yet test-verified):** ampere's `ctx->has_sps_st_rps` is true (VDPU381 variant_ops sets it), so the kernel calls `v4l2_ctrl_find` for the new CID. The control may be auto-registered with a non-NULL `p_cur.p` pointing to a kernel-allocated buffer that userspace never wrote. The early return `if (!run->ext_sps_st_rps) return;` doesn't fire (the pointer is non-NULL), so the function proceeds to `memcmp(cache, run->ext_sps_st_rps, sizeof(struct v4l2_ctrl_hevc_ext_sps_st_rps))`. Uninitialized but mapped memory would only produce a garbage comparison, so a fault in `__pi_memcmp` implies the region behind the pointer is smaller than that struct or not mapped at all.

**Alternative mechanism**: `ctx->has_sps_st_rps` is true but the kernel never allocates storage for the control, so `ctrl->p_cur.p` is a stale, non-NULL pointer that the kernel doesn't validate (a plain NULL would already be caught by the early return). Either way, the **fix path is userspace**: make the libva backend set the new CIDs with valid data parsed from the HEVC SPS.

**Reclassification**: kernel-agent#11 should be **closed and re-filed against `marfrit/libva-v4l2-request-fourier`** as a new issue: "extend backend to populate V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS / _LT_RPS for VDPU381 HEVC." There's still a kernel-side hardening case (add a NULL/uninit guard to `prepare_hw_st_rps` so a forgetful userspace can't OOPS the kernel), but it's an upstream defense-in-depth item, not the fix path for ampere HEVC HW decode.
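One possible shape for the hardening item, again as a userspace model under stated assumptions: before comparing, check not just that the pointer is non-NULL but that the payload is at least as large as the struct `memcmp` will read. `st_rps_payload_usable` is hypothetical; in-kernel, `payload_bytes` would be derived from the control's element count and element size (exact field names to be confirmed against `include/media/v4l2-ctrls.h`).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical defense-in-depth check to run before the memcmp in
 * prepare_hw_st_rps. payload_bytes: bytes actually backing `p`;
 * needed_bytes: bytes memcmp will read (the struct size). */
static bool st_rps_payload_usable(const void *p, size_t payload_bytes,
				  size_t needed_bytes)
{
	if (!p)
		return false;	/* the NULL check that already exists */
	if (payload_bytes < needed_bytes)
		return false;	/* empty or undersized: memcmp would over-read */
	return true;
}
```

A size check catches an empty or undersized payload, one way the `memcmp` could run off the end of valid memory. It cannot detect a full-size buffer that userspace simply never wrote (that yields a wrong cache comparison rather than an OOPS), nor a dangling pointer, so it complements rather than replaces the backend fix.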
### VP9 — kernel-side, multiple competing upstream starting points
**Confirmed**: VP9 is not enabled in any of v3..v8 of the Casanova series. The v4 cover (patchew.org/linux/20251022174508.284929-1-detlev.casanova@collabora.com/) explicitly says *"This patch only adds support for H264 and H265 in both variants."* So `S264`/`S265`-only on `/dev/video1` is documented 7.0 mainline behaviour, **not a build / config miss**.
Out-of-tree options to evaluate as iter2+ starting points:
| Source | Tree | Status | Notes |
|--------|------|--------|-------|
| **D.V.A.B. Sarma** | `dvab-sarma/android_kernel_rk_opi` (github.com/dvab-sarma) | Working VP9 on RK3588 ≤ 4K@30, profile 0 | Android-flavoured tree (not a clean mainline diff). Tracker: `github.com/dvab-sarma/android_local_manifest/issues/3`. Collabora has offered to coach Sarma on kernel-submission etiquette for a v1; nothing on list yet. |
| **dongioia/rock5bplus-rkvdec2** | github.com/dongioia/rock5bplus-rkvdec2 | Claims RKVDEC2/VDPU381 H.264 + HEVC + VP9 @ 4K via mainline-style patches | Worth reading as a rebase candidate if Sarma's android tree proves too far from mainline |
| **Kwiboo** | github.com/Kwiboo/linux-rockchip | Active HEVC work on `linuxtv-rkvdec-hevc-v3` (Sep 2025) | **No VP9 / RK3588 branch as of the survey**. Kwiboo's rkvdec contribution is HEVC for the RK3399-class decoder, not VP9 on RK3588 |
| **rcawston** | github.com/rcawston/rockchip-rk3588-mainline-patches | Encoder + HDMI only | No decoder content |

The **`rkvdec2` separate-driver approach** (Casanova June 2024 RFC, lwn.net/Articles/1015469) was abandoned in favour of extending the existing `rkvdec` driver, which is what landed in 7.0. So future VP9-on-RK3588 will extend `rkvdec`'s VDPU381 variant_ops, **not** introduce a separate `rkvdec2` module.
**DTS does NOT need to change to enable VP9** — once the variant_ops gains a VP9 backend, the same `compatible = "rockchip,rk3588-vdpu381"` node will advertise `V4L2_PIX_FMT_VP9_FRAME` automatically.
### Adjacent finding worth tracking
**`media: rkvdec: Restore iommu addresses on errors`** is the only known post-merge stability fix in the 7.0 VDPU381 path per Collabora's retrospective. The decoder's embedded IOMMU is reset alongside the decoder on error recovery, dropping mappings the kernel still considers live. **Verify this is present in our `7.0-rc3` checkout**: if absent, any decoder error recovery (e.g. one bad frame) wedges subsequent decode until reset. It is not yet known whether `ampere-minimal-devices @ 7c241f2e2835` includes it; this is a Phase 2 source-read item.
## Predecessor data — what carries vs what doesn't
Per `feedback_dev_process.md` Phase 0 rules:
- **Carries (state)**: `ampere-fourier` iter1 baseline numbers as *reference history* for Phase 7 regression checks; the operator-policy rule that codec patches stay OUT of `linux-ampere-fourier` baseline; the backend source pin `7ac934e`; memory entries about V4L2-control semantics (`feedback_unconditional_codec_state`, `feedback_per_driver_kludge_gating`, `feedback_va_st_rps_bits_is_slice_field`).
- **Does NOT carry**: the iter1 N=3 FPS numbers — those were for the 3-codec subset on this exact substrate; iter2's success metric (HEVC works) is independent.
## Open questions tabled into Phase 1
1. **Scope of this kernel campaign vs. spawning a sibling backend campaign**: HEVC is now established as fundamentally userspace work (extend backend to populate new CIDs). VP9 is kernel work. Phase 1 needs to decide whether (a) this campaign drops HEVC and focuses on VP9 only, (b) becomes a meta-campaign coordinating an HEVC backend-iter40 + a VP9 kernel-iter1, or (c) splits into two distinct campaigns (`ampere-kernel-decoders` for VP9, sibling backend campaign for HEVC).
2. **VP9 starting tree**: Sarma's Android branch, dongioia's mainline-style overlay, or wait for a v1 mainline VP9 submission (Sarma's, with Collabora coaching, per the survey)? Trade-off between time-to-validate-on-ampere and time-to-upstreamable-patch-quality.
3. **Test-verification of HEVC mechanism reconstruction**: the kernel-source read above strongly suggests the new-CID gap is the OOPS root cause, but it's not yet proved. Phase 1 might lock a sub-goal "write a minimal libva backend patch that registers the new CIDs (even with dummy data) — if HEVC oops vanishes / changes shape, hypothesis confirmed; if not, loop back to Phase 2 with the new evidence."
4. **IOMMU restore patch present?**: confirm whether `~/src/linux-rockchip @ ampere-minimal-devices` has the "Restore iommu addresses on errors" fix. Phase 2 source-read.
5. **lore.kernel.org Anubis re-check**: the survey couldn't enumerate lore directly. Phase 2 should manually re-check `https://lore.kernel.org/linux-media/?q=rkvdec+VDPU381+RPS` for any HEVC stability patches between v7.0-rc3 and v7.0 final, and in -stable.
## Phase 0 close
Research question locked. Substrate captured. Prior-art survey delivered the headline finding: **the HEVC OOPS is most likely a userspace UAPI gap, not a kernel bug**, which fundamentally re-scopes this campaign. VP9 remains kernel-side and has at least two viable out-of-tree starting trees. Five open questions tabled for Phase 1.

Iteration log:
- iter1: 2026-05-16 — this document.