From 659c08c81c5b3a969ef3011bd34f449918af15a9 Mon Sep 17 00:00:00 2001 From: Markus Fritsche Date: Sat, 16 May 2026 10:34:13 +0000 Subject: [PATCH] =?UTF-8?q?iter3=20close=20=E2=80=94=20kernel=20root=20cau?= =?UTF-8?q?se:=20uninitialized=20run.ext=5Fsps=5Fst=5Frps?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit rkvdec_hevc_run_preamble conditionally assigns run->ext_sps_st_rps and run->ext_sps_lt_rps only when ctx->has_sps_*_rps is true. The caller (vdpu381-hevc.c:591) allocates the run struct without zero-init, so when has_sps_st_rps is false, run.ext_sps_st_rps retains stack garbage (0x51a0 deterministically on RK3588 CoolPi CM5 GenBook). prepare_hw_st_rps's `if (!run->ext_sps_st_rps) return;` check passes (garbage is non-NULL), then memcmp dereferences 0x51a0 → OOPS. Backend diagnostic instrumentation (iter3-Q4) revealed the full dispatch path BeginPicture → RenderPicture → EndPicture → h265_set_controls → populate_ext_sps_rps_cache(ENODATA, because ffmpeg-vaapi strips SPS/VPS/PPS, leaving only slice NALs) → fallback to 5-ctrl batch (EINVAL) → MEDIA_REQUEST_IOC_QUEUE → kernel OOPS. The OOPS occurs even when our EXT_SPS_*_RPS controls are NEVER submitted — purely a kernel-side init bug. 1-line patches proposed (Option A — fix in preamble, Option B — zero-init run struct at caller). Q1-Q5 all answered. Userspace iter2 work stays — upstream-aligned path for streams that need EXT_SPS_*_RPS. Iter4 deferred until kernel patch lands via kernel-agent. Co-Authored-By: Claude Opus 4.7 --- iter3_close.md | 132 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 132 insertions(+) create mode 100644 iter3_close.md diff --git a/iter3_close.md b/iter3_close.md new file mode 100644 index 0000000..be98948 --- /dev/null +++ b/iter3_close.md @@ -0,0 +1,132 @@ +# Iter3 close — kernel-side root cause located + +Date: 2026-05-16 (afternoon) +Branch: `master` +Substrate: ampere `7.0.0-rc3-devices+` (mmind v7.0 + Casanova HEVC 7.0 series + ampere DTS) +Backend: `libva-v4l2-request-fourier` iter2 + iter3 diagnostic instrumentation (md5 `404041ea`) + +## Bottom line + +**The HEVC OOPS on RK3588 is a one-line kernel bug in the Casanova/Collabora 7.0 HEVC series.** Userspace work to vendor the GStreamer parser and submit `V4L2_CID_STATELESS_HEVC_EXT_SPS_{ST,LT}_RPS` controls is the *correct upstream-aligned approach* for HEVC streams that need it, BUT the OOPS occurs even without those controls because of a missing struct-field initialization in the kernel. + +## Falsifier outcome + +F1 (no kernel OOPS): **TRUE — F1 fires again**, same as iter2. OOPS reproduced deterministically on fresh-boot run at 12:30:15 with iter3 instrumented build. + +F2 (vendored GStreamer parser extracts SPS from libavcodec-stripped source): **FALSE — verified empirically.** Diagnostic log shows `populate_ext_sps_rps_cache entry: src=0xffff7de54000 src_size=280 cache_valid=0` → `nal_iter=0 pr=5 offset=0 nalu.type=20 nalu.size=277` → `exit: err=-61 cache_valid=0`. First NAL in source_data is type 20 (`IDR_N_LP slice`), not type 33 (`SPS`). ffmpeg-vaapi strips VPS/SPS/PPS — confirms Phase 5 reviewer's prediction. This is consistent with the iter2 strace finding and not the bug. + +## Root cause (Q1 + Q3 combined) + +In `drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c:591`: + +```c +static int rkvdec_hevc_run(struct rkvdec_ctx *ctx) +{ + struct rkvdec_dev *rkvdec = ctx->dev; + struct rkvdec_hevc_run run; // ← UNINITIALIZED + struct rkvdec_hevc_ctx *hevc_ctx = ctx->priv; + struct rkvdec_hevc_priv_tbl *tbl = hevc_ctx->priv_tbl.cpu; + + rkvdec_hevc_run_preamble(ctx, &run); + ... +} +``` + +`rkvdec_hevc_run_preamble` in `rkvdec-hevc-common.c:478` assigns most `run->*` fields unconditionally, but the new EXT_SPS fields are conditional: + +```c +if (ctx->has_sps_st_rps) { + ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, + V4L2_CID_STATELESS_HEVC_EXT_SPS_ST_RPS); + run->ext_sps_st_rps = ctrl ? ctrl->p_cur.p : NULL; +} +if (ctx->has_sps_lt_rps) { + ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, + V4L2_CID_STATELESS_HEVC_EXT_SPS_LT_RPS); + run->ext_sps_lt_rps = ctrl ? ctrl->p_cur.p : NULL; +} +``` + +When `ctx->has_sps_st_rps == false` (the common case for any HEVC stream where userspace doesn't supply EXT_SPS_*_RPS), `run.ext_sps_st_rps` retains **stack garbage** — the leftover bytes from a previous stack frame at the same offset. Empirically on the CoolPi CM5 GenBook this garbage is deterministically `0x51a0` per call, because the calling pattern is fixed. + +Downstream in `rkvdec_hevc_prepare_hw_st_rps` (rkvdec-hevc-common.c:380): + +```c +if (!run->ext_sps_st_rps) + return; /* ← garbage 0x51a0 is non-NULL */ +if (!memcmp(cache, run->ext_sps_st_rps, sizeof(struct v4l2_ctrl_hevc_ext_sps_st_rps))) + return; /* ← memcmp derefs 0x51a0 → OOPS */ +``` + +The NULL-check is bypassed by stack garbage; `memcmp` faults at the bogus address. + +This is reachable on every HEVC decode that hits the assemble path on `vdpu381_variant` (RK3588) and `vdpu383_variant` (RK3576) — i.e., every stream where the dispatcher gate in `rkvdec-vdpu{381,383}-hevc.c:606` enters the else branch. + +## Minimal kernel patch + +Either of these one-line patches fixes the OOPS without changing any behavior for valid use cases: + +**Option A — fix at the producer (preamble):** +```diff +--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc-common.c ++++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc-common.c +@@ -495,6 +495,9 @@ void rkvdec_hevc_run_preamble(struct rkvdec_ctx *ctx, + ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, + V4L2_CID_STATELESS_HEVC_SCALING_MATRIX); + run->scaling_matrix = ctrl ? ctrl->p_cur.p : NULL; ++ ++ run->ext_sps_st_rps = NULL; ++ run->ext_sps_lt_rps = NULL; + + if (ctx->has_sps_st_rps) { + ctrl = v4l2_ctrl_find(&ctx->ctrl_hdl, +``` + +**Option B — fix at the callers (zero-init the run struct):** +```diff +--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c ++++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c +@@ -588,7 +588,7 @@ static int rkvdec_hevc_run(struct rkvdec_ctx *ctx) + { + struct rkvdec_dev *rkvdec = ctx->dev; +- struct rkvdec_hevc_run run; ++ struct rkvdec_hevc_run run = {}; + struct rkvdec_hevc_ctx *hevc_ctx = ctx->priv; + struct rkvdec_hevc_priv_tbl *tbl = hevc_ctx->priv_tbl.cpu; +``` +(same for `rkvdec-vdpu383-hevc.c`) + +Option A is more localized to the *cause* (the preamble's incomplete initialization). Option B is more defensive (catches any future fields added without explicit init). Either works. + +## Why iter2's userspace fix matters separately + +Independently of the kernel patch, the iter2 work (vendored GStreamer 1.28.2 parser + UAPI shim + h265 cache population) IS the upstream-aligned path for HEVC streams where the SPS parameter set semantics require the new ST/LT RPS controls. When ffmpeg-vaapi forwards a buffer that DOES contain the SPS (e.g. some bitstream wrappers, or future ffmpeg versions that don't strip VPS/SPS/PPS), our parser activates and submits the EXT_SPS_*_RPS controls — exactly the upstream contract from the Casanova series. + +For the current ffmpeg-vaapi (which strips SPS), our backend correctly returns ENODATA from the parser and falls back to submitting only the standard 5 controls. With the kernel patch above, this fallback succeeds without OOPS. + +(There remains a separate question: for bbb_60s_720p.hevc, the SPS encoded by libx265 may or may not actually require EXT_SPS_*_RPS — typical streams use simple ST_RPS that the kernel can derive from slice headers alone, which is what the current rkvdec dispatcher path expects. The user-pref direction is "do it correctly aligned with upstream architecture", so the parser+cache+inject path stays.) + +## Phase 6 question completion + +| Q | Answer | +|---|--------| +| Q1 — what is `0x51a0` | Stack garbage at `run.ext_sps_st_rps` offset from prior frame. Deterministic per calling pattern. | +| Q2 — where is `ctrl->p_cur.p = 0x51a0` set | N/A — not a ctrl pointer corruption. The `0x51a0` is local stack memory, not from the V4L2 ctrl handler. | +| Q3 — when `ctx->has_sps_st_rps` true | FALSE in our case. The OOPS triggers via the missing-init-of-run, not via the has_sps_st_rps gate. | +| Q4 — backend log fires? | YES, after fflush patch + picture.c entry-point instrumentation. Confirmed full path BeginPicture → RenderPicture×2 → EndPicture → h265_set_controls → populate_ext_sps_rps_cache → ENODATA → fallback to 5-ctrl batch → EINVAL → MEDIA_REQUEST_IOC_QUEUE → kernel OOPS. | +| Q5 — vdpu381 vs vdpu383 on RK3588 | RK3588 → `vdpu381_variant`. Confirmed in `drivers/media/platform/rockchip/rkvdec/rkvdec.c:1757` (`"rockchip,rk3588-vdec"` → `&vdpu381_variant`). vdpu383 binds only to RK3576. | + +## Open follow-ups + +- **kernel-agent issue (new)**: file the 1-line patch above as an upstream-aligned fix to the Casanova v7.0 series. Tag for: applies to vdpu381 + vdpu383 both; reproducer is "any HEVC stream where ffmpeg-vaapi strips SPS" (i.e., everything). +- **iter4 plan**: with kernel patched, re-run HEVC and confirm decode succeeds. Then iterate on the standard-5-controls EINVAL (separate issue — likely a slice_params field shape or DECODE_PARAMS flag the iter2 backend isn't setting correctly). +- **Optional userspace robustness**: after kernel patch, the EXT_SPS_*_RPS path stays dormant on standard streams; consider deferring h265_populate to *only when SPS is detected in source_data* (cheap optimization). +- **Diagnostic instrumentation removal**: iter3-Q4 logs in `picture.c` + `h265.c` + `utils.c::fflush` should be stripped once iter4 close confirms the kernel patch works. fflush in utils.c can stay (cheap, useful). + +## Substrate state at close + +- Backend `.so`: md5 `404041ea2dcc03c769e0ab8c43ddadd6` on ampere `/usr/lib/dri/` +- Backend source: ampere `~/src/libva-v4l2-request-fourier` working tree has uncommitted iter3 diagnostics (picture.c + utils.c + h265.c) +- Kernel: unchanged from iter2 substrate (waiting on kernel-agent issue + patch land) + +Iter3 closes with **root cause identified and a 1-line fix proposed**, going from negative-result (iter2 close) to definitive-finding in one diagnostic iteration. The 8-phase loop's Phase 5 (review) recommendation in iter2 to add backend logging was the unblocking move.