rkvdec HEVC: uninitialized run.ext_sps_st_rps/lt_rps causes OOPS on RK3588/RK3576 #14
Summary
The Casanova/Collabora v7.0 HEVC EXT_SPS_*_RPS series introduces an uninitialized-stack bug in rkvdec_hevc_run on vdpu381 (RK3588) and vdpu383 (RK3576). On every HEVC decode where the dispatcher enters the assemble path, prepare_hw_st_rps dereferences a non-NULL stack-garbage pointer and faults.
Repro
linux-fresnel-fourier-style 7.0-rc3 + mmind v7.0 + Casanova HEVC v7.0 patches.
LIBVA_DRIVER_NAME=v4l2_request ffmpeg -hwaccel vaapi -i any.hevc.mp4 -frames:v 3 -f rawvideo /tmp/o.nv12
0x51a0 is a deterministic stack leftover for this specific calling pattern, not a ctrl pointer.
Root cause
drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c:591:
drivers/media/platform/rockchip/rkvdec/rkvdec-hevc-common.c:498-508:
When ctx->has_sps_st_rps is false (the common case: userspace doesn't yet submit the new controls), run->ext_sps_st_rps keeps stack-leftover bytes. Downstream:
drivers/media/platform/rockchip/rkvdec/rkvdec-hevc-common.c:380:
Fix (one line)
Option A (recommended — fix at the producer):
Option B (defensive — zero-init at caller, applies to both vdpu381 + vdpu383):
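A small standalone C sketch can show how the two options relate. It is a hypothetical model, not the rkvdec source. The names hevc_run, hevc_ctx, run_preamble, st_rps_changed and decode_one are illustrative and only mirror the field names quoted in this issue. The NULL check in the consumer is an assumption about what rkvdec-hevc-common.c:380 guards on, inferred from the "non-NULL stack-garbage pointer" description. The sketch uses standard C `{0}` where kernel code would write `= {}`.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the pattern in this issue; not the rkvdec source. */
struct ext_sps_st_rps { unsigned char data[16]; };
struct ext_sps_lt_rps { unsigned char data[16]; };

/* Models struct rkvdec_hevc_run: NULL means "control not provided". */
struct hevc_run {
	const struct ext_sps_st_rps *ext_sps_st_rps;
	const struct ext_sps_lt_rps *ext_sps_lt_rps;
};

/* Models the ctx flag plus the current control payloads. */
struct hevc_ctx {
	bool has_sps_st_rps;
	const struct ext_sps_st_rps *st_ctrl;
	const struct ext_sps_lt_rps *lt_ctrl;
};

/* Option A: the producer writes both pointers on every path. */
void run_preamble(const struct hevc_ctx *ctx, struct hevc_run *run)
{
	if (ctx->has_sps_st_rps) {
		run->ext_sps_st_rps = ctx->st_ctrl;
		run->ext_sps_lt_rps = ctx->lt_ctrl;
	} else {
		/* Without this branch, a non-zeroed run keeps stack leftovers. */
		run->ext_sps_st_rps = NULL;
		run->ext_sps_lt_rps = NULL;
	}
}

/* Assumed consumer: the NULL check only helps if "absent" really is NULL. */
bool st_rps_changed(const struct hevc_run *run,
		    const struct ext_sps_st_rps *cached)
{
	if (!run->ext_sps_st_rps)
		return false; /* hypothetical fallback: skip the EXT_SPS path */
	return memcmp(run->ext_sps_st_rps, cached, sizeof(*cached)) != 0;
}

/* Option B: the caller zero-initializes the whole run struct. */
bool decode_one(const struct hevc_ctx *ctx,
		const struct ext_sps_st_rps *cached)
{
	struct hevc_run run = {0}; /* kernel style: = {} */

	run_preamble(ctx, &run);
	return st_rps_changed(&run, cached);
}
```

In this model, either change alone is enough. With Option B, a pointer the preamble never assigns starts as NULL. With Option A, the preamble never leaves a pointer unassigned. Option B also covers any pointer field added to the run struct later.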
Scope
vdpu381_variant (RK3588) + vdpu383_variant (RK3576): both call rkvdec_hevc_run.
The original RK3399 path (rkvdec-hevc.c::rkvdec_hevc_run) needs the same check: it has the same uninit pattern. Worth a sweep.
Campaign cross-ref
Found in: ampere-kernel-decoders iter3. Full forensic trace in iter3_close.md. Backend instrumentation that surfaced this: libva-v4l2-request-fourier iter3 diagnostic build (md5 404041ea).
Blocks: ampere-kernel-decoders iter4. Once this lands in the ampere kernel, the standard 5-control batch failure (EINVAL) can be debugged separately.
Empirical verification: Option A patch eliminates the OOPS
Applied Option A to ampere's running kernel tree (linux-rk3588-marfrit-equivalent, 7.0.0-rc3-devices+) and re-ran the exact repro.
Patch as applied
Built via make M=drivers/media/platform/rockchip/rkvdec modules, installed to /lib/modules/7.0.0-rc3-devices+/kernel/..., ran depmod, then did a fresh reboot for a clean slate. The prior OOPS had left a v4l2_release thread stuck in D-state holding the module refcount, so rmmod was blocked.
Test
Result
Kernel log matches for Internal error / Oops / Call trace / pgd=: none (was: deterministic __pi_memcmp fault at 0x51a0 per iter3_close.md).
populate_ext_sps_rps_cache exit: err=-61 (-ENODATA) cache_valid=0 → "Unable to set control(s): Invalid argument" per frame. BeginPicture/RenderPicture/EndPicture continues cleanly, with no kernel fault.
Caveat (out of scope for this issue)
/tmp/o.nv12 is all-zero: the EXT_SPS_*_RPS controls fail userspace-side with EINVAL, and ffmpeg-vaapi forwards empty CAPTURE buffers. This is the separate iter4 EINVAL flagged in iter3_close.md ("standard-5-controls EINVAL — likely a slice_params field shape or DECODE_PARAMS flag the iter2 backend isn't setting correctly"), unrelated to this patch. The OOPS this issue is about is gone.
Verdict
Option A is a clean fix for the OOPS and is ready to land in the Casanova v7.0 series. It is also worth applying the same NULL-init (or struct rkvdec_hevc_run run = {} at the caller) as a sweep over the original RK3399 path in rkvdec-hevc.c::rkvdec_hevc_run, as noted in the "Scope" section.
Substrate at verification:
7.0.0-rc3-devices+, branch ampere-minimal-devices
libva-v4l2-request-fourier iter3 diagnostic build (md5 404041ea)
rockchip-vdec.ko: 121872 bytes (was 121192 pre-patch)
Triage refresh 2026-05-18. No empirical re-test needed: the bug is closed-form (stack-uninitialized field, deterministic fault, kernel-side). Confirming the issue is still open / unfixed:
fleet/ampere.yaml in kernel-agent does NOT include this fix among ampere's 6 board-DTS patches.
The linux-ampere-fourier 7.0rc3.kafr1-1 baseline policy is "clean mainline + board-DTS only; fixes belong in experiment branch / target, not baseline" (per the ampere.yaml preamble + the ampere-fourier campaign convention).
Closing sibling #11 as a duplicate (symptom report; this issue carries the root cause + one-line fix).
Indirect mitigation path may exist via userspace work
There's a non-fix that could make the bug NOT FIRE under fleet conditions, without patching the kernel:
V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS/_LT_RPS in vdpu38x_hevc_ctrl_descs[]. This makes ctx->has_sps_st_rps = true in the kernel.
V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS. iter2 commits 393d02f + f0ef69d in the libva repo look like this work has landed.
If both conditions hold (kernel has_sps_st_rps=true AND userspace submits the control), the if (ctx->has_sps_st_rps) branch in rkvdec_hevc_run_preamble initializes run->ext_sps_st_rps properly. The stack-garbage path is then never reached, and the memcmp(0x51a0, ...) fault doesn't fire. The underlying bug is still there (any other userspace that doesn't submit EXT_SPS_*_RPS would still oops), but the fleet's libva backend would no longer trip it.
Decision points (operator)
a) Direct fix path: apply Option B (struct rkvdec_hevc_run run = {};, a one-liner in rkvdec-vdpu381-hevc.c:591 + rkvdec-vdpu383-hevc.c) as a kernel-agent experiment branch / scope-tagged patch under patches/driver/media/, and ship it via a non-baseline linux-ampere-fourier-rkvdec-hevc-fix variant. Bulletproof: it doesn't depend on userspace cooperation.
b) Indirect path: rely on the userspace coverage from libva iter2 + kernel-agent#15 landing. Cheaper if those are happening anyway, but the bug remains a footgun for any other v4l2-request consumer (e.g. ffmpeg -hwaccel v4l2request used directly, without libva).
c) Upstream-first: send Option A to linux-media (the Casanova/Collabora HEVC v7.0 series authors). This is the right long-term home, and it doesn't block local progress because (a) is still available as a stop-gap.
Recommend (c) as the right ultimate fix + (b) as the no-action interim, because the libva iter2 work appears to already be done. A reproducer on ampere would confirm this, but reproducing requires installing a libva backend with iter2 on ampere, which the current sandbox policy gates.
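The trade-off between the kernel fixes ((a)/(c)) and the indirect path (b) reduces to a small truth table. This is a hypothetical model of the conditions stated in this thread, not driver code. The struct deploy and garbage_path_reachable names are illustrative. The model assumes, as the mitigation section says, that without a kernel fix the garbage path is avoided only when the controls are registered and userspace also submits them.

```c
#include <stdbool.h>

/* Hypothetical model: which protections a kernel + userspace combo has. */
struct deploy {
	bool caller_zero_init;   /* Option B: struct rkvdec_hevc_run run = {}; */
	bool producer_null_init; /* Option A: NULL-init in the run preamble */
	bool ctrls_registered;   /* kernel ends up with ctx->has_sps_st_rps = true */
	bool userspace_submits;  /* backend submits EXT_SPS_*_RPS per request */
};

/* True if the stack-garbage memcmp path can still be reached. */
bool garbage_path_reachable(const struct deploy *d)
{
	/* A kernel-side fix holds no matter what userspace does. */
	if (d->caller_zero_init || d->producer_null_init)
		return false;
	/* Indirect path: safe only while both conditions hold. */
	return !(d->ctrls_registered && d->userspace_submits);
}
```

Under this model, path (b) stays safe only while every consumer submits the controls. A consumer that doesn't submit them flips the result back to reachable, which is the footgun described in (b).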
EWONTFIX 2026-05-18.
HEVC on ampere (RK3588) is scoped out indefinitely. Joint decision covering this issue plus kernel-agent#14 (kernel stack-uninit OOPS), kernel-agent#15 (HEVC_SLICE_PARAMS registration), and libva-v4l2-request-fourier#3 (libva backend EXT_SPS_*_RPS submission).
Rationale
daedalus-fourier README's "YouTube ∩ Pi5-HW = ∅" framing): HEVC on ampere doesn't move that needle.
Reopen criteria
Reopen any of the three if a concrete HEVC workflow emerges on ampere (a 4K local file collection, an HEVC-encoded archive that doesn't fit on fresnel, a specific app that requires HEVC and runs only on ampere). The kernel-side fixes in #14 + #15 are closed-form (a one-liner each); the libva side in #3 is straightforward bitstream-translation work. None of the analysis is lost; it's all archived in the closed issues' bodies + comments.
Closing.