marfrit 2a5f5c296e iter1 phase0: substrate + prior-art survey (HEVC reclassified)
Phase 0 ran the operator-mandated upstream prior-art survey FIRST,
before any source-read or hypothesis. Headline finding: the HEVC
OOPS is fundamentally re-scoped from 'kernel bug' to 'userspace UAPI
gap' against the new 7.0 controls
V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS / _LT_RPS.

Survey + verification:
- Casanova/Collabora v8 series merged in Linux 7.0 added the two
  new V4L2 controls for VDPU381 HEVC; backend 7ac934e (June pre-iter38)
  predates the UAPI and grep returns zero hits for these CIDs.
- ampere linux-api-headers is still 6.19-1, doesn't define the
  constants — the backend literally cannot reference them without a
  headers bump.
- ampere kernel source rkvdec-hevc-common.c:500-509 looks up the new
  CIDs; if backend never set them, rkvdec_hevc_prepare_hw_st_rps
  reads invalid memory via memcmp — exactly the __pi_memcmp OOPS
  symptom.

VP9 still kernel-side per the v4 cover ('This patch only adds support
for H264 and H265 in both variants'). Multiple competing out-of-tree
starting trees: Sarma's android tree (working but Android-flavored),
dongioia/rock5bplus-rkvdec2 (mainline-style claims), Kwiboo (no VP9
on RK3588 yet). RKVDEC2 separate-driver path is dead — future VP9
extends the existing rkvdec driver's VDPU381 variant_ops.

Five open questions tabled for Phase 1 — most important being campaign
re-scope (HEVC moves to backend campaign; this stays VP9 kernel-only
OR becomes a meta-campaign coordinating both).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-16 08:10:40 +00:00

Phase 0 — Substrate / Motivation / Inventory (iter1 of ampere-kernel-decoders)

Closed 2026-05-16 (afternoon). Locks the research question, captures the substrate, and — critically for this campaign — runs the upstream prior-art survey before writing or considering any patch.

Research question

Which of the two RK3588-decoder gaps surfaced by ampere-fourier iter1 (HEVC kernel OOPS, VP9 not exposed) actually need a kernel patch as their fix path, and for those that do, what's the minimal candidate patch — starting from existing upstream / out-of-tree work, not from a clean re-derivation?

Operator-supplied mechanism (verbatim, in-session):

Consult linux-rockchip and linux-mm mailing lists for prior art regarding enabling the video decoders.

Phase 0 follows that direction: a prior-art survey is the first item, before any source-read or hypothesis.

Substrate

| Property | Value |
| --- | --- |
| Substrate kernel branch | ampere:~/src/linux-rockchip branch ampere-minimal-devices, tip 7c241f2e2835 arm64: dts: rockchip: rk3588-coolpi-cm5-genbook: add lid switch and USB3 PHY lane config |
| Sister tree | boltzmann:~/src/linux-rockchip branch linux-rk3588-marfrit, tip fccdf164bfec phy: rockchip-snps-pcie3: Only check PHY1 status when using it; has collabora remotes tracked (add-rkvdec2-driver, add-rkvdec2-driver-iommu, add-rkvdec2-driver-sre, add-rkvdec2-driver-vdpu383-hevc) |
| Baseline kernel package | linux-ampere-fourier 7.0rc3.kafr1-1 (vanilla v7.0-rc3 + ampere DTS/board patches; built from the ampere tree above) |
| ampere linux-api-headers | 6.19-1; PREDATES the 7.0 UAPI additions for V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS / _LT_RPS |
| libva backend installed | libva-v4l2-request-fourier 1.0.0.r348.7ac934e-1 (hand-built 0c9a7efaab… over the broken CI binary per marfrit-packages#17) |
| Available rkvdec sources | drivers/media/platform/rockchip/rkvdec/{rkvdec.c, rkvdec-hevc.c, rkvdec-hevc-common.c, rkvdec-vdpu381-hevc.c, rkvdec-vdpu383-hevc.c, rkvdec-vdpu381-h264.c, rkvdec-vdpu383-h264.c, rkvdec-vp9.c, …} on ampere |

The rkvdec source has separate VDPU381 and VDPU383 HEVC files (rkvdec-vdpu381-hevc.c + rkvdec-vdpu383-hevc.c) plus a shared rkvdec-hevc-common.c that contains the OOPSing function. The VP9 source (rkvdec-vp9.c) exists too but doesn't appear in the VDPU381/383 variant_ops registration — i.e. the file exists for the RK3399 legacy rkvdec path only.

Prior-art survey (operator-mandated Phase 0 step)

Surveyed 2026-05-16 by general-purpose subagent against linux-rockchip / linux-media / linux-mm / lore.kernel.org / Kwiboo / Bootlin / Collabora / D.V.A.B. Sarma / dongioia. Key findings (full subagent transcript: ~/.../tasks/a0d583fc904274132.output):

Maturity baseline of RK3588 mainline decoder support

RK3588 (VDPU381) and RK3576 (VDPU383) decoder support was merged in Linux 7.0 as a 17-patch series from Detlev Casanova / Collabora ("media: rkvdec: Add support for VDPU381 and VDPU383", v8 at lkml.org/lkml/2026/1/9/1334). The series adds H.264 and HEVC only — VP9 is NOT in 7.0 mainline, multi-core glue is NOT in 7.0 mainline, AV1 (RK3576 only) is preliminary. The Collabora blog (collabora.com news 2026-02-27) explicitly frames VP9 on RK3588 as future work attributed to D.V.A.B. Sarma's existing out-of-tree driver.

So the 7.0 cycle is the very first in which RK3588 HEVC exists upstream at all, and our 7.0-rc3 baseline sits early in it. Regression-fix candidates in -rc4..-rc7 / -stable are plausible but not yet surveyed: lore.kernel.org was Anubis-gated during the survey, so the manual recheck at https://lore.kernel.org/linux-media/?q=rkvdec+VDPU381+RPS is deferred.

HEVC OOPS root cause — reclassified from "kernel bug" to "userspace UAPI gap"

The Casanova v8 series introduces two new V4L2 controls:

  • V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS (short-term RPS)
  • V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS (long-term RPS)

Per the survey, the VDPU381 HEVC path requires userspace to populate these. Verified by reading the actual code on ampere:

drivers/media/platform/rockchip/rkvdec/rkvdec-hevc-common.c:500-509:

if (ctx->has_sps_st_rps) {
    ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS);
    run->ext_sps_st_rps = ctrl ? ctrl->p_cur.p : NULL;
}
if (ctx->has_sps_lt_rps) {
    ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS);
    run->ext_sps_lt_rps = ctrl ? ctrl->p_cur.p : NULL;
}

drivers/media/platform/rockchip/rkvdec/rkvdec-hevc-common.c:380-410 (rkvdec_hevc_prepare_hw_st_rps, the OOPS site):

if (!run->ext_sps_st_rps)
    return;

if (!memcmp(cache, run->ext_sps_st_rps, sizeof(struct v4l2_ctrl_hevc_ext_sps_st_rps)))
    return;

Empirical state on ampere:

  • grep V4L2_CID_STATELESS_HEVC_EXT ~/src/libva-v4l2-request-fourier/src/ returns zero hits. Backend 7ac934e (June pre-iter38) predates the 7.0 UAPI and never populates either control.
  • grep V4L2_CID_STATELESS_HEVC_EXT /usr/include/linux/v4l2-controls.h returns zero hits. linux-api-headers 6.19-1 doesn't even define the constants.
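The header half of that check can also be expressed at compile time, which is roughly the guard the backend would need in order to keep building against pre-7.0 headers. This is a minimal sketch, assuming the two CIDs are plain preprocessor macros like the existing V4L2_CID_STATELESS_* constants; the function name headers_have_hevc_ext_rps is hypothetical.

```c
#include <assert.h>              /* for callers that want to hard-fail */
#include <linux/v4l2-controls.h>

/*
 * Compile-time probe: returns 1 if the installed UAPI headers define both
 * new HEVC RPS control IDs, 0 otherwise. No device is touched.
 * Assumption: the CIDs are #define macros, so #if defined() can see them.
 */
static int headers_have_hevc_ext_rps(void)
{
#if defined(V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS) && \
    defined(V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS)
    return 1;
#else
    return 0;
#endif
}
```

With linux-api-headers 6.19-1 installed, the grep result above implies this would take the `#else` branch.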

Mechanism reconstruction (highly plausible, not yet test-verified): ampere's ctx->has_sps_st_rps is true (VDPU381 variant_ops sets it), so the kernel calls v4l2_ctrl_find for the new CID. The control may be auto-registered with a non-NULL p_cur.p pointing to a kernel-allocated but never-written buffer (uninitialized data). The early-return if (!run->ext_sps_st_rps) return; doesn't fire (pointer is non-NULL), so the function proceeds to memcmp(cache, run->ext_sps_st_rps, sizeof(struct)) which reads from invalid / unmapped offsets and faults in __pi_memcmp.

Alternative mechanism: ctx->has_sps_st_rps is true but the kernel never allocates the control storage, so ctrl->p_cur.p is a stale, non-NULL pointer that the kernel does not validate (a plain NULL would be caught by the early return and could not reach memcmp). Either way, the fix path is userspace: make the libva backend set the new CIDs with valid data parsed from the HEVC SPS.

Reclassification: kernel-agent#11 should be closed and re-filed against marfrit/libva-v4l2-request-fourier as a new issue: "extend backend to populate V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS / _LT_RPS for VDPU381 HEVC." There's still a kernel-side hardening case (add NULL/uninit guard to prepare_hw_st_rps so a forgetful userspace doesn't OOPS the kernel) — but it's an upstream-defense-in-depth item, not the fix-path for ampere HEVC HW decode.
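To make the hardening idea concrete, here is a userspace model of a guarded prepare_hw_st_rps. It is not the kernel code: struct fake_st_rps stands in for the real control struct, the st_rps_was_set flag is a hypothetical way for the driver to know userspace actually supplied the control, and the cache update is an assumption about what follows the memcmp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for struct v4l2_ctrl_hevc_ext_sps_st_rps (real layout not shown). */
struct fake_st_rps { unsigned char raw[64]; };

struct fake_run {
    const struct fake_st_rps *ext_sps_st_rps; /* NULL if control not found */
    int st_rps_was_set;  /* hypothetical: userspace supplied this control */
};

/* Returns 1 if the hardware RPS table must be rewritten, 0 if skipped. */
static int prepare_hw_st_rps_guarded(const struct fake_run *run,
                                     struct fake_st_rps *cache)
{
    /* Guard: skip when the pointer is missing OR was never populated. */
    if (!run->ext_sps_st_rps || !run->st_rps_was_set)
        return 0;

    /* Unchanged since the last frame: nothing to rewrite. */
    if (!memcmp(cache, run->ext_sps_st_rps, sizeof(*cache)))
        return 0;

    /* Assumed follow-up step: remember the new table. */
    memcpy(cache, run->ext_sps_st_rps, sizeof(*cache));
    return 1;
}
```

How the real driver would learn that the control was never set (a per-request flag, a control flag, or validation in the try/set path) is an open design question for any upstream submission.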

VP9 — kernel-side, multiple competing upstream starting points

Confirmed: VP9 is not enabled in any v3..v8 of the Casanova series. The v4 cover (patchew.org/linux/20251022174508.284929-1-detlev.casanova@collabora.com/) explicitly says "This patch only adds support for H264 and H265 in both variants." So S264/S265-only on /dev/video1 is documented 7.0 mainline behaviour, not a build / config miss.

Out-of-tree options to evaluate as iter2+ starting points:

| Source | Tree | Status | Notes |
| --- | --- | --- | --- |
| D.V.A.B. Sarma | dvab-sarma/android_kernel_rk_opi (github.com/dvab-sarma) | Working VP9 on RK3588 ≤ 4K@30, profile 0 | Android-flavoured tree (not a clean mainline diff). Tracker: github.com/dvab-sarma/android_local_manifest/issues/3. Collabora has offered to coach Sarma on kernel-submission etiquette for a v1; nothing on list yet. |
| dongioia/rock5bplus-rkvdec2 | github.com/dongioia/rock5bplus-rkvdec2 | Claims RKVDEC2/VDPU381 H.264 + HEVC + VP9 @ 4K via mainline-style patches | Worth reading as a rebase candidate if Sarma's android tree proves too far from mainline |
| Kwiboo | github.com/Kwiboo/linux-rockchip | Active HEVC work on linuxtv-rkvdec-hevc-v3 (Sep 2025) | No VP9 / RK3588 branch as of the survey. Kwiboo's RK3588 contribution is HEVC for RK3399-class, not VP9 |
| rcawston | github.com/rcawston/rockchip-rk3588-mainline-patches | Encoder + HDMI only | No decoder content |

The rkvdec2 separate-driver approach (Casanova June 2024 RFC, lwn.net/Articles/1015469) was abandoned in favour of extending the existing rkvdec driver — which is what landed in 7.0. So future VP9-on-RK3588 will extend rkvdec's VDPU381 variant_ops, not introduce a separate rkvdec2 module.

DTS does NOT need to change to enable VP9 — once the variant_ops gains a VP9 backend, the same compatible = "rockchip,rk3588-vdpu381" node will advertise V4L2_PIX_FMT_VP9_FRAME automatically.

Adjacent finding worth tracking

media: rkvdec: Restore iommu addresses on errors — the only known post-merge stability fix in the 7.0 VDPU381 path per Collabora's retrospective. The decoder's embedded IOMMU is reset alongside the decoder on error recovery, dropping mappings the kernel still considers live. Verify this is present in our 7.0-rc3 checkout — if absent, any decoder error recovery (e.g. one bad frame) wedges subsequent decode until reset. Low-confidence whether ampere-minimal-devices @ 7c241f2e2835 includes it — Phase 2 source-read item.

Predecessor data — what carries vs what doesn't

Per feedback_dev_process.md Phase 0 rules:

  • Carries (state): ampere-fourier iter1 baseline numbers as reference history for Phase 7 regression checks; the operator-policy rule that codec patches stay OUT of linux-ampere-fourier baseline; the backend source pin 7ac934e; memory entries about V4L2-control semantics (feedback_unconditional_codec_state, feedback_per_driver_kludge_gating, feedback_va_st_rps_bits_is_slice_field).
  • Does NOT carry: the iter1 N=3 FPS numbers — those were for the 3-codec subset on this exact substrate; iter2's success metric (HEVC works) is independent.

Open questions tabled into Phase 1

  1. Scope of this kernel campaign vs. spawning a sibling backend campaign: HEVC is now established as fundamentally userspace work (extend backend to populate new CIDs). VP9 is kernel work. Phase 1 needs to decide whether (a) this campaign drops HEVC and focuses on VP9 only, (b) becomes a meta-campaign coordinating an HEVC backend-iter40 + a VP9 kernel-iter1, or (c) splits into two distinct campaigns (ampere-kernel-decoders for VP9, sibling backend campaign for HEVC).
  2. VP9 starting tree: Sarma's Android branch, dongioia's mainline-style overlay, or wait for a v1 mainline submission (Sarma's, with the Collabora coaching noted in the table above)? Trade-off between time-to-validate-on-ampere and time-to-upstreamable-patch-quality.
  3. Test-verification of HEVC mechanism reconstruction: the kernel-source read above strongly suggests the new-CID gap is the OOPS root cause, but it is not yet proven. Phase 1 might lock a sub-goal: "write a minimal libva backend patch that sets the new CIDs (even with dummy data); if the HEVC OOPS vanishes or changes shape, hypothesis confirmed; if not, loop back to Phase 2 with the new evidence."
  4. IOMMU restore patch present?: confirm whether ~/src/linux-rockchip @ ampere-minimal-devices has the "Restore iommu addresses on errors" fix. Phase 2 source-read.
  5. lore.kernel.org Anubis re-check: the survey couldn't enumerate lore directly. Phase 2 should manually re-check https://lore.kernel.org/linux-media/?q=rkvdec+VDPU381+RPS for any HEVC stability patches landed after v7.0-rc3 (-rc4..-rc7, v7.0 final, or -stable).

Phase 0 close

Research question locked. Substrate captured. Prior-art survey delivered the headline finding: the HEVC OOPS is most likely a userspace UAPI gap, not a kernel bug, which fundamentally re-scopes this campaign. VP9 remains kernel-side and has at least two viable out-of-tree starting trees. Five open questions tabled for Phase 1.

Iteration log:

  • iter1: 2026-05-16 — this document.