rkvdec_hevc_run_preamble conditionally assigns run->ext_sps_st_rps and run->ext_sps_lt_rps only when ctx->has_sps_*_rps is true. The caller (vdpu381-hevc.c:591) allocates the run struct without zero-init, so when has_sps_st_rps is false, run.ext_sps_st_rps retains stack garbage (0x51a0 deterministically on RK3588 CoolPi CM5 GenBook). prepare_hw_st_rps's `if (!run->ext_sps_st_rps) return;` check passes (garbage is non-NULL), then memcmp dereferences 0x51a0 → OOPS. Backend diagnostic instrumentation (iter3-Q4) revealed the full dispatch path BeginPicture → RenderPicture → EndPicture → h265_set_controls → populate_ext_sps_rps_cache(ENODATA, because ffmpeg-vaapi strips SPS/VPS/PPS, leaving only slice NALs) → fallback to 5-ctrl batch (EINVAL) → MEDIA_REQUEST_IOC_QUEUE → kernel OOPS. The OOPS occurs even when our EXT_SPS_*_RPS controls are NEVER submitted — purely a kernel-side init bug. 1-line patches proposed (Option A — fix in preamble, Option B — zero-init run struct at caller). Q1-Q5 all answered. Userspace iter2 work stays — upstream-aligned path for streams that need EXT_SPS_*_RPS. Iter4 deferred until kernel patch lands via kernel-agent. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
8.5 KiB
Iter3 close — kernel-side root cause located
Date: 2026-05-16 (afternoon)
Branch: master
Substrate: ampere 7.0.0-rc3-devices+ (mmind v7.0 + Casanova HEVC 7.0 series + ampere DTS)
Backend: libva-v4l2-request-fourier iter2 + iter3 diagnostic instrumentation (md5 404041ea)
Bottom line
The HEVC OOPS on RK3588 is a one-line kernel bug in the Casanova/Collabora 7.0 HEVC series. Userspace work to vendor the GStreamer parser and submit V4L2_CID_STATELESS_HEVC_EXT_SPS_{ST,LT}_RPS controls is the correct upstream-aligned approach for HEVC streams that need it, BUT the OOPS occurs even without those controls because of a missing struct-field initialization in the kernel.
Falsifier outcome
F1 (no kernel OOPS): TRUE — F1 fires again, same as iter2. OOPS reproduced deterministically on fresh-boot run at 12:30:15 with iter3 instrumented build.
F2 (vendored GStreamer parser extracts SPS from libavcodec-stripped source): FALSE — verified empirically. Diagnostic log shows populate_ext_sps_rps_cache entry: src=0xffff7de54000 src_size=280 cache_valid=0 → nal_iter=0 pr=5 offset=0 nalu.type=20 nalu.size=277 → exit: err=-61 cache_valid=0. First NAL in source_data is type 20 (IDR_N_LP slice), not type 33 (SPS). ffmpeg-vaapi strips VPS/SPS/PPS — confirms Phase 5 reviewer's prediction. This is consistent with the iter2 strace finding and not the bug.
Root cause (Q1 + Q3 combined)
In drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c:591:
static int rkvdec_hevc_run(struct rkvdec_ctx *ctx)
{
struct rkvdec_dev *rkvdec = ctx->dev;
struct rkvdec_hevc_run run; // ← UNINITIALIZED
struct rkvdec_hevc_ctx *hevc_ctx = ctx->priv;
struct rkvdec_hevc_priv_tbl *tbl = hevc_ctx->priv_tbl.cpu;
rkvdec_hevc_run_preamble(ctx, &run);
...
}
rkvdec_hevc_run_preamble in rkvdec-hevc-common.c:478 assigns most run->* fields unconditionally, but the new EXT_SPS fields are conditional:
if (ctx->has_sps_st_rps) {
ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl,
V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS);
run->ext_sps_st_rps = ctrl ? ctrl->p_cur.p : NULL;
}
if (ctx->has_sps_lt_rps) {
ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl,
V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS);
run->ext_sps_lt_rps = ctrl ? ctrl->p_cur.p : NULL;
}
When ctx->has_sps_st_rps == false (the common case for any HEVC stream where userspace doesn't supply EXT_SPS_*_RPS), run.ext_sps_st_rps retains stack garbage — the leftover bytes from a previous stack frame at the same offset. Empirically on the CoolPi CM5 GenBook this garbage is deterministically 0x51a0 per call, because the calling pattern is fixed.
Downstream in rkvdec_hevc_prepare_hw_st_rps (rkvdec-hevc-common.c:380):
if (!run->ext_sps_st_rps)
return; /* ← garbage 0x51a0 is non-NULL */
if (!memcmp(cache, run->ext_sps_st_rps, sizeof(struct v4l2_ctrl_hevc_ext_sps_st_rps)))
return; /* ← memcmp derefs 0x51a0 → OOPS */
The NULL-check is bypassed by stack garbage; memcmp faults at the bogus address.
This is reachable on every HEVC decode that hits the assemble path on vdpu381_variant (RK3588) and vdpu383_variant (RK3576) — i.e., every stream where the dispatcher gate in rkvdec-vdpu{381,383}-hevc.c:606 enters the else branch.
Minimal kernel patch
Either of these one-line patches fixes the OOPS without changing any behavior for valid use cases:
Option A — fix at the producer (preamble):
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc-common.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc-common.c
@@ -495,6 +495,9 @@ void rkvdec_hevc_run_preamble(struct rkvdec_ctx *ctx,
ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl,
V4L2_CID_STATELESS_HEVC_SCALING_MATRIX);
run->scaling_matrix = ctrl ? ctrl->p_cur.p : NULL;
+
+ run->ext_sps_st_rps = NULL;
+ run->ext_sps_lt_rps = NULL;
if (ctx->has_sps_st_rps) {
ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl,
Option B — fix at the callers (zero-init the run struct):
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c
@@ -588,7 +588,7 @@ static int rkvdec_hevc_run(struct rkvdec_ctx *ctx)
{
struct rkvdec_dev *rkvdec = ctx->dev;
- struct rkvdec_hevc_run run;
+ struct rkvdec_hevc_run run = {};
struct rkvdec_hevc_ctx *hevc_ctx = ctx->priv;
struct rkvdec_hevc_priv_tbl *tbl = hevc_ctx->priv_tbl.cpu;
(same for rkvdec-vdpu383-hevc.c)
Option A is more localized to the cause (the preamble's incomplete initialization). Option B is more defensive (catches any future fields added without explicit init). Either works.
Why iter2's userspace fix matters separately
Independently of the kernel patch, the iter2 work (vendored GStreamer 1.28.2 parser + UAPI shim + h265 cache population) IS the upstream-aligned path for HEVC streams where the SPS parameter set semantics require the new ST/LT RPS controls. When ffmpeg-vaapi forwards a buffer that DOES contain the SPS (e.g. some bitstream wrappers, or future ffmpeg versions that don't strip VPS/SPS/PPS), our parser activates and submits the EXT_SPS_*_RPS controls — exactly the upstream contract from the Casanova series.
For the current ffmpeg-vaapi (which strips SPS), our backend correctly returns ENODATA from the parser and falls back to submitting only the standard 5 controls. With the kernel patch above, this fallback succeeds without OOPS.
(There remains a separate question: for bbb_60s_720p.hevc, the SPS encoded by libx265 may or may not actually require EXT_SPS_*_RPS — typical streams use simple ST_RPS that the kernel can derive from slice headers alone, which is what the current rkvdec dispatcher path expects. The user-pref direction is "do it correctly aligned with upstream architecture", so the parser+cache+inject path stays.)
Phase 6 question completion
| Q | Answer |
|---|---|
Q1 — what is 0x51a0 |
Stack garbage at run.ext_sps_st_rps offset from prior frame. Deterministic per calling pattern. |
Q2 — where is ctrl->p_cur.p = 0x51a0 set |
N/A — not a ctrl pointer corruption. The 0x51a0 is local stack memory, not from the V4L2 ctrl handler. |
Q3 — when ctx->has_sps_st_rps true |
FALSE in our case. The OOPS triggers via the missing-init-of-run, not via the has_sps_st_rps gate. |
| Q4 — backend log fires? | YES, after fflush patch + picture.c entry-point instrumentation. Confirmed full path BeginPicture → RenderPicture×2 → EndPicture → h265_set_controls → populate_ext_sps_rps_cache → ENODATA → fallback to 5-ctrl batch → EINVAL → MEDIA_REQUEST_IOC_QUEUE → kernel OOPS. |
| Q5 — vdpu381 vs vdpu383 on RK3588 | RK3588 → vdpu381_variant. Confirmed in drivers/media/platform/rockchip/rkvdec/rkvdec.c:1757 ("rockchip,rk3588-vdec" → &vdpu381_variant). vdpu383 binds only to RK3576. |
Open follow-ups
- kernel-agent issue (new): file the 1-line patch above as an upstream-aligned fix to the Casanova v7.0 series. Tag for: applies to vdpu381 + vdpu383 both; reproducer is "any HEVC stream where ffmpeg-vaapi strips SPS" (i.e., everything).
- iter4 plan: with kernel patched, re-run HEVC and confirm decode succeeds. Then iterate on the standard-5-controls EINVAL (separate issue — likely a slice_params field shape or DECODE_PARAMS flag the iter2 backend isn't setting correctly).
- Optional userspace robustness: after kernel patch, the EXT_SPS_*_RPS path stays dormant on standard streams; consider deferring h265_populate to only when SPS is detected in source_data (cheap optimization).
- Diagnostic instrumentation removal: iter3-Q4 logs in
picture.c+h265.c+utils.c::fflushshould be stripped once iter4 close confirms the kernel patch works. fflush in utils.c can stay (cheap, useful).
Substrate state at close
- Backend
.so: md5404041ea2dcc03c769e0ab8c43ddadd6on ampere/usr/lib/dri/ - Backend source: ampere
~/src/libva-v4l2-request-fourierworking tree has uncommitted iter3 diagnostics (picture.c + utils.c + h265.c) - Kernel: unchanged from iter2 substrate (waiting on kernel-agent issue + patch land)
Iter3 closes with root cause identified and a 1-line fix proposed, going from negative-result (iter2 close) to definitive-finding in one diagnostic iteration. The 8-phase loop's Phase 5 (review) recommendation in iter2 to add backend logging was the unblocking move.