iter6 post-mortem Phase 4 v2: per-codec 0007 + lockdep base kernel

Amendments per Sonnet architect review round 1:
- A1: 0007 v2 rewritten — 7 per-codec run() insertion points,
  matches hantro pattern (after preamble metadata copy, before HW kick).
  Old v1 (in rkvdec_device_run) REJECTED — wrong structural placement.
- A2: panthor ww_mutex/dma_resv contention added as primary hypothesis H1.
  Smoke test 1.9/2.x extended to exercise GPU compositor path.
- A3: CONFIG_LOCALVERSION=-lockdep so lockdep kernel uname differs from
  vanilla; prevents modules_install from overwriting the working modules.
- A4: pstore/serial gate is now HARD (one-of required); pre-flight aborts
  if neither serial nor ramoops is functional.
- A5: PROVE_RCU removed from initial config: the added boot latency risks
  tripping the watchdog before lockdep prints. Add back only if the first
  run is clean.

0007-v2 patch attached: 8 hunks across rkvdec-{h264,hevc,vdpu381-h264,
vdpu381-hevc,vdpu383-h264,vdpu383-hevc,vp9}.c + rkvdec.c queue_init flag.
25 line insertions.

Pending Phase 5 round 2 delta-review of v2 source + amended plan before
any execution.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Markus Fritsche
2026-05-16 16:33:08 +00:00
parent 0ef64406b6
commit b02baffca7
2 changed files with 254 additions and 0 deletions
@@ -0,0 +1,163 @@
From PENDING-COMMIT-SHA Mon Sep 17 00:00:00 2001
From: Markus Fritsche <mfritsche@reauktion.de>
Date: Sat May 16 2026
Subject: [PATCH RFC v2 0007] media: rkvdec: attach dma_resv release fence at per-codec run()

Opt the rkvdec CAPTURE queue into vb2 release-fence publishing,
following the validated hantro (0005) and rockchip-rga (0006) patterns.

The fence-attach is placed inside each per-codec `<codec>_run()` function
AFTER `<codec>_run_preamble()` returns and BEFORE the HW kick writel().
This matches hantro's pattern of attaching after `v4l2_m2m_buf_copy_metadata()`
(which `rkvdec_run_preamble()` calls internally).

Attach sites:
- drivers/media/platform/rockchip/rkvdec/rkvdec-h264.c::rkvdec_h264_run() (legacy RK3399)
- drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c::rkvdec_hevc_run() (legacy RK3399)
- drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-h264.c::rkvdec_h264_run() (RK3588)
- drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c::rkvdec_hevc_run() (RK3588)
- drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c::rkvdec_h264_run() (RK3576)
- drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c::rkvdec_hevc_run() (RK3576)
- drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c::rkvdec_vp9_run() (RK3399 + variants)

The vp9 path requires special handling: rkvdec_vp9_run_preamble() returns
int and can fail (on error, rkvdec_vp9_run() returns ret after cleanup via
rkvdec_run_postamble()). The fence attach must come AFTER the error check.
Likewise, in rkvdec-vdpu383-hevc.c the attach sits after the RPS validation
that can return -EINVAL.

The queue-init flag (dst_vq->supports_release_fences) opts the CAPTURE
queue in: vb2_buffer_attach_release_fence() is a no-op for queues that have
not opted in, and the whole path is a no-op without
CONFIG_VIDEOBUF2_RELEASE_FENCES=y.

Changes vs v1 0007 (which placed the fence-attach in rkvdec_device_run()
before desc->ops->run): moved INTO desc->ops->run after the preamble,
structurally matching hantro's prior art. Independent Phase 5 architect
review identified the v1 placement as architecturally inconsistent with the
validated pattern.

To be validated locally on RK3588 vdpu381 with PROVE_LOCKING enabled before
merge.

Cc: linux-media@vger.kernel.org
Signed-off-by: Markus Fritsche <mfritsche@reauktion.de>
---
drivers/media/platform/rockchip/rkvdec/rkvdec-h264.c | 3 +++
drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c | 3 +++
drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-h264.c | 3 +++
drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c | 3 +++
drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c | 3 +++
drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c | 3 +++
drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c | 3 +++
drivers/media/platform/rockchip/rkvdec/rkvdec.c | 4 ++++
8 files changed, 25 insertions(+)
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-h264.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-h264.c
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-h264.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-h264.c
@@ -412,6 +412,9 @@ static int rkvdec_h264_run(struct rkvdec_ctx *ctx)
struct rkvdec_h264_priv_tbl *tbl = h264_ctx->priv_tbl.cpu;
rkvdec_h264_run_preamble(ctx, &run);
+
+ /* iter6 v2: attach release fence after preamble metadata copy, before HW kick */
+ (void)vb2_buffer_attach_release_fence(&run.base.bufs.dst->vb2_buf);
/* Build the P/B{0,1} ref lists. */
v4l2_h264_init_reflist_builder(&reflist_builder, run.decode_params,
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c
@@ -561,6 +561,9 @@ static int rkvdec_hevc_run(struct rkvdec_ctx *ctx)
u32 reg;
rkvdec_hevc_run_preamble(ctx, &run);
+
+ /* iter6 v2: attach release fence after preamble metadata copy, before HW kick */
+ (void)vb2_buffer_attach_release_fence(&run.base.bufs.dst->vb2_buf);
rkvdec_hevc_assemble_hw_scaling_list(ctx, &run, &tbl->scaling_list,
&hevc_ctx->scaling_matrix_cache);
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-h264.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-h264.c
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-h264.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-h264.c
@@ -420,6 +420,9 @@ static int rkvdec_h264_run(struct rkvdec_ctx *ctx)
struct rkvdec_h264_run run;
rkvdec_h264_run_preamble(ctx, &run);
+
+ /* iter6 v2: attach release fence after preamble metadata copy, before HW kick */
+ (void)vb2_buffer_attach_release_fence(&run.base.bufs.dst->vb2_buf);
/* Build the P/B{0,1} ref lists. */
v4l2_h264_init_reflist_builder(&reflist_builder, run.decode_params,
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c
@@ -589,6 +589,9 @@ static int rkvdec_hevc_run(struct rkvdec_ctx *ctx)
struct rkvdec_hevc_priv_tbl *tbl = hevc_ctx->priv_tbl.cpu;
rkvdec_hevc_run_preamble(ctx, &run);
+
+ /* iter6 v2: attach release fence after preamble metadata copy, before HW kick */
+ (void)vb2_buffer_attach_release_fence(&run.base.bufs.dst->vb2_buf);
rkvdec_hevc_assemble_hw_scaling_list(ctx, &run, &tbl->scaling_list,
&hevc_ctx->scaling_matrix_cache);
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c
@@ -485,6 +485,9 @@ static int rkvdec_h264_run(struct rkvdec_ctx *ctx)
u32 timeout_threshold;
rkvdec_h264_run_preamble(ctx, &run);
+
+ /* iter6 v2: attach release fence after preamble metadata copy, before HW kick */
+ (void)vb2_buffer_attach_release_fence(&run.base.bufs.dst->vb2_buf);
/* Build the P/B{0,1} ref lists. */
v4l2_h264_init_reflist_builder(&reflist_builder, run.decode_params,
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c
@@ -605,6 +605,9 @@ static int rkvdec_hevc_run(struct rkvdec_ctx *ctx)
return -EINVAL;
}
+ /* iter6 v2: attach release fence after preamble metadata copy + RPS validation, before HW kick */
+ (void)vb2_buffer_attach_release_fence(&run.base.bufs.dst->vb2_buf);
+
rkvdec_hevc_assemble_hw_scaling_list(ctx, &run, &tbl->scaling_list,
&hevc_ctx->scaling_matrix_cache);
assemble_hw_pps(ctx, &run);
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c
@@ -773,6 +773,9 @@ static int rkvdec_vp9_run(struct rkvdec_ctx *ctx)
rkvdec_run_postamble(ctx, &run.base);
return ret;
}
+
+ /* iter6 v2: attach release fence after preamble succeeded, before HW kick */
+ (void)vb2_buffer_attach_release_fence(&run.base.bufs.dst->vb2_buf);
/* Prepare probs. */
init_probs(ctx, &run);
diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec.c b/drivers/media/platform/rockchip/rkvdec/rkvdec.c
--- a/drivers/media/platform/rockchip/rkvdec/rkvdec.c
+++ b/drivers/media/platform/rockchip/rkvdec/rkvdec.c
@@ -1241,6 +1241,10 @@ static int rkvdec_queue_init(void *priv,
dst_vq->lock = &rkvdec->vdev_lock;
dst_vq->dev = rkvdec->v4l2_dev.dev;
+ /* iter6: opt CAPTURE queue into vb2 release-fence publishing.
+ * No-op unless CONFIG_VIDEOBUF2_RELEASE_FENCES=y. */
+ dst_vq->supports_release_fences = true;
+
return vb2_queue_init(dst_vq);
}
--
2.53.0
@@ -0,0 +1,91 @@
# Phase 4 — iter6 post-mortem retry plan (v2, amended per Phase 5 architect review)
Plan ID: iter6-postmortem-attempt2
Author: Claude Opus 4.7
Reviewer: Sonnet (architect, Phase 5 round 1, returned 5 amendments + 0007 REJECT)
Status: v2 pending Phase 5 round 2 (delta-review) before any execution
Cross-ref: phase0_findings_iter6_postmortem.md (commit 11d2dde), phase4_plan_iter6_postmortem.md (commit 0ef6440 — superseded)
## Goal
Same as v1. Apply 0004 / 0005 / 0006 / 0007-v2 to ampere kernel WITHOUT silent watchdog reset. Either succeed, or fail with a self-diagnostic lockdep splat that survives to logs.
## Amendments vs v1 (architect findings)
| # | Finding | Resolution |
|---|---------|-----------|
| A1 | 0007 v1 placement (in `rkvdec_device_run`) inconsistent with hantro's pattern + metadata-copy ordering wrong | **0007 v2 written**: 7 per-codec `*_run()` insertion points (h264, hevc, vdpu381-h264, vdpu381-hevc, vdpu383-h264, vdpu383-hevc, vp9), each AFTER preamble + BEFORE HW kick. `supports_release_fences = true` stays in queue_init. See `0007-v2-rkvdec-opt-in.patch` (8 hunks, 25 +lines) |
| A2 | Missing risk: panthor `ww_acquire_ctx` contention on same resv as vb2 `dma_resv_lock(NULL)` under `dma_fence_begin_signalling`; primary RK3588 wedge hypothesis | Added to risk register as **Primary Hypothesis H1**. New step 1.9 GPU smoke test added and reused by every 2.x step: GPU compositor exercise (open kwin, mpv playback with hwdec=vaapi, drm-vblank-trigger pattern), not just headless ffmpeg |
| A3 | Step 1.5 `modules_install` overwrites working modules on `uname -r` collision | Added `CONFIG_LOCALVERSION="-lockdep"` to step 1.2 config block → distinct release suffix `7.0.0-rc3-devices-lockdep+` → separate `/lib/modules/.../`. Original modules untouched |
| A4 | Step 0.4 "consider adding ramoops" too soft — silent watchdog reset can't be diagnosed without pre-reset capture path | Step 0.4 reworked as **HARD GATE**. Pre-flight aborts if neither serial console (TTL-USB cable confirmed connected and capturing) NOR ramoops region (verified writable + producing test entry) is functional. User decision input required before pre-flight starts |
| A5 | `CONFIG_PROVE_RCU` slows boot + may push past watchdog before lockdep prints | Removed from step 1.2 config set. Kept: PROVE_LOCKING, DEBUG_ATOMIC_SLEEP, LOCKDEP, DEBUG_RT_MUTEXES, DEBUG_SPINLOCK, DEBUG_MUTEXES, DEBUG_LOCK_ALLOC, PROVE_RAW_LOCK_NESTING, DEBUG_WW_MUTEX_SLOWPATH. PROVE_RCU added only if first lockdep run passes clean |
## Pre-flight (REWRITTEN, hard gates marked)
| Step | Action | Verify | Gate |
|------|--------|--------|------|
| 0.1 | SDDM auto-login disabled | Done — `/etc/sddm.conf.d/autologin.conf.disabled-iter6postmortem` | ✓ |
| 0.2 | Backup `/lib/modules/7.0.0-rc3-devices+/kernel/drivers/media/{common/videobuf2,platform/verisilicon,platform/rockchip/{rga,rkvdec}}/*.ko` as `attempt2-pre-base-<ts>.bkp` AND scp tarball to `boltzmann:/home/mfritsche/iter6-postmortem-backups/` | `ls` shows .bkp + scp returned 0 | **HARD GATE** — abort if backup write fails |
| 0.3 | Backup `/boot/firmware/Image-7.0.0-rc3-devices+` and `initramfs-7.0.0-rc3-devices+` as `*.pre-attempt2.bkp` | `ls` | **HARD GATE** |
| 0.4a | Check pstore: `ls -la /sys/fs/pstore/` as root, `dmesg \| grep -i pstore`, look for `ramoops` reserved region in `/proc/iomem` or DT | pstore writable + ramoops region present | one-of (with 0.4b) |
| 0.4b | Check serial console: confirm user has TTL-USB cable connected to ampere's ttyS2 UART; run `screen /dev/ttyUSB0 1500000` (or equivalent) on a host that has it; ampere's extlinux already has `console=ttyS2,1500000` | user types confirmation after seeing serial output | one-of (with 0.4a) |
| 0.4-G | **HARD GATE**: if BOTH 0.4a and 0.4b fail (no pstore AND no serial), ABORT and ask user to obtain serial cable OR investigate ramoops DT addition before retry | gate enforced | mandatory |
| 0.5 | Verify ampere `~/src/linux-rockchip` working tree state: iter3+iter4+diag patches present, iter6 v1 reverted (per recovery), `git status` shows expected files modified | git status output matches | informational |
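
The one-of gate in 0.4a / 0.4b / 0.4-G can be sketched as a small shell helper. This is a minimal sketch, not part of the plan's tooling: the function names `pstore_ok` and `preflight_gate` are hypothetical, and the pstore check here is weaker than step 0.4a asks for (it only checks that the directory exists and that a `ramoops` region is listed, not that a test entry can be written).

```shell
#!/bin/sh
# Hypothetical helpers illustrating the 0.4-G hard gate.

# pstore_ok [PSTORE_DIR] [IOMEM_FILE]
# Succeeds if PSTORE_DIR exists and IOMEM_FILE mentions a ramoops region.
# Assumption: this is only a presence check, not a write test.
pstore_ok() {
    pstore_dir=${1:-/sys/fs/pstore}
    iomem=${2:-/proc/iomem}
    [ -d "$pstore_dir" ] && grep -qi ramoops "$iomem" 2>/dev/null
}

# preflight_gate PSTORE_STATUS SERIAL_STATUS
# Each status is the literal string "ok" or "fail".
# Passes if at least one capture path is available, aborts otherwise.
preflight_gate() {
    if [ "$1" = ok ] || [ "$2" = ok ]; then
        echo "PASS"
        return 0
    fi
    echo "ABORT: no pstore and no serial capture path"
    return 1
}

# Example use (serial status is set by hand after the user confirms output):
#   p=fail; pstore_ok && p=ok
#   s=fail
#   preflight_gate "$p" "$s" || exit 1
```

Keeping the serial result as an explicit input reflects that 0.4b depends on user confirmation rather than anything the script can probe.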
## Build the lockdep debug base kernel (~45 min one-time)
| Step | Action |
|------|--------|
| 1.1 | `cp .config .config.pre-iter6postmortem` |
| 1.2 | `./scripts/config --enable PROVE_LOCKING --enable DEBUG_ATOMIC_SLEEP --enable LOCKDEP --enable DEBUG_RT_MUTEXES --enable DEBUG_SPINLOCK --enable DEBUG_MUTEXES --enable DEBUG_LOCK_ALLOC --enable PROVE_RAW_LOCK_NESTING --enable DEBUG_WW_MUTEX_SLOWPATH` (NO PROVE_RCU per A5). Set `CONFIG_LOCALVERSION="-lockdep"` (A3). `make olddefconfig` |
| 1.3 | `time make -j8 Image modules dtbs` (~45 min) |
| 1.4 | `sudo make modules_install` with the kernel's default destination (no `INSTALL_MOD_PATH` override). The default respects LOCALVERSION and installs to `/lib/modules/7.0.0-rc3-devices-lockdep+/`, separate from the working modules. Verify the destination path (e.g. `make -s kernelrelease`) before running with sudo |
| 1.5 | `sudo cp arch/arm64/boot/Image /boot/firmware/Image-7.0.0-rc3-devices-lockdep+`. Generate initramfs for new release: `sudo mkinitcpio -k 7.0.0-rc3-devices-lockdep+ -g /boot/firmware/initramfs-7.0.0-rc3-devices-lockdep+` |
| 1.6 | Backup extlinux.conf, then `sudo` edit: ADD new label `arch_devices_lockdep` pointing at the new Image + initrd, leave `arch_devices` as the default. So system boots vanilla by default; user picks lockdep at U-Boot menu |
| 1.7 | Reboot. At U-Boot menu, manually select `arch_devices_lockdep`. Verify `uname -r` = `7.0.0-rc3-devices-lockdep+`. Verify journal has `Lockdep is enabled` |
| 1.8 | Smoke test: ffmpeg HEVC decode (iter5 baseline), check `journalctl -k -p warning -b 0` for any new lockdep splats produced by iter3+4 alone. Expectation: clean (or only pre-existing edp/vblank WARNs) |
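
Steps 1.2 and 1.4 could be scripted roughly as follows. This is a sketch under assumptions: `configure_lockdep` and `release_is_distinct` are hypothetical helper names, and `configure_lockdep` is meant to be run from the kernel source root. Because `make olddefconfig` resolves Kconfig dependencies, it is worth grepping the resulting `.config` afterwards to confirm which symbols actually ended up set.

```shell
#!/bin/sh
# Hypothetical helpers for step 1.2 (config) and step 1.4 (destination check).

LOCKDEP_OPTS="PROVE_LOCKING DEBUG_ATOMIC_SLEEP LOCKDEP DEBUG_RT_MUTEXES \
DEBUG_SPINLOCK DEBUG_MUTEXES DEBUG_LOCK_ALLOC PROVE_RAW_LOCK_NESTING \
DEBUG_WW_MUTEX_SLOWPATH"

# Enable the step 1.2 option set and the A3 release suffix.
configure_lockdep() {
    for opt in $LOCKDEP_OPTS; do
        ./scripts/config --enable "$opt" || return 1
    done
    ./scripts/config --set-str LOCALVERSION "-lockdep" || return 1
    make olddefconfig
}

# release_is_distinct NEW_RELEASE RUNNING_RELEASE
# Succeeds only if NEW_RELEASE carries the -lockdep suffix and differs
# from the release whose modules must not be overwritten.
release_is_distinct() {
    case "$1" in
        *-lockdep*) ;;
        *) return 1 ;;
    esac
    [ "$1" != "$2" ]
}

# Example use before step 1.4:
#   release_is_distinct "$(make -s kernelrelease)" "$(uname -r)" \
#       && sudo make modules_install
```

The check is deliberately conservative: it refuses to install both when the suffix is missing and when the build happens to be run from the lockdep kernel itself.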
**1.9 — A2 GPU smoke test**: open SDDM login → log in to plasma wayland → open mpv with `--hwdec=vaapi-copy ~/measurements/encoded/bbb_60s_720p.hevc.mp4`. Watch ~30 seconds. Check journal again. Reason: A2 hypothesis is panthor + kwin + V4L2 dmabuf contention, which only surfaces under active GPU composition. If iter3+4 alone (no fence helper yet) emits a lockdep splat with GPU active, the bug is even more upstream than iter6 patches — STOP and investigate.
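
The "check journal" part of steps 1.8 and 1.9 can be made less eyeball-dependent with a small filter. The sketch below counts kernel log lines matching a few well-known lockdep and atomic-sleep report headers. The helper name `count_splats` is hypothetical and the pattern list is an assumption that is not exhaustive; pre-existing edp/vblank WARNs would not match it and still need manual review.

```shell
#!/bin/sh
# count_splats: read kernel log text on stdin and print how many lines
# match common lockdep / DEBUG_ATOMIC_SLEEP report headers.
# Assumption: the pattern list is illustrative, not complete.
count_splats() {
    grep -c -E \
        -e 'possible circular locking dependency detected' \
        -e 'possible recursive locking detected' \
        -e 'inconsistent lock state' \
        -e 'sleeping function called from invalid context' \
        || true   # grep exits 1 when the count is 0; that is not an error here
}

# Example use after a smoke test:
#   journalctl -k -b 0 | count_splats
```

A nonzero count after step 1.9 on the unpatched lockdep base would correspond to the STOP condition described above.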
## Bisect-apply iter6 patches on lockdep base
Order: 0004 → 0005 → 0006 → 0007-v2. Between each: reboot into lockdep kernel → smoke test → if regression, REVERT only that step, capture splat, document, STOP.
| Step | Patch | Build effort | Smoke test |
|------|-------|--------------|------------|
| 2.1 | 0004 (vb2 core helper + Kconfig) | full kernel rebuild (~45 min — Kconfig + videobuf2-core.c) | Step 1.8 + Step 1.9 |
| 2.2 | 0005 (hantro opt-in) | hantro_vpu module (~3 min) | Step 1.8 + Step 1.9 |
| 2.3 | 0006 (rockchip-rga opt-in) | rockchip-rga module (~3 min) | Step 1.8 + Step 1.9 |
| 2.4 | 0007-v2 (rkvdec per-codec opt-in) | rockchip-vdec module (~3 min) | Step 1.8 + Step 1.9 |
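
The ordering and stop-at-first-regression rule can be sketched as a loop. This only models the control flow: in practice each step includes a reboot into the lockdep kernel, so the real procedure is manual or would need to be resumable across boots. `bisect_apply` and the smoke callback are hypothetical names.

```shell
#!/bin/sh
# bisect_apply SMOKE_CMD PATCH...
# Calls SMOKE_CMD with each patch name in order and stops at the first
# failure, reporting which step regressed. SMOKE_CMD stands in for
# "apply, rebuild, reboot, run steps 1.8 + 1.9".
bisect_apply() {
    smoke=$1
    shift
    for p in "$@"; do
        if ! "$smoke" "$p"; then
            echo "REGRESSION at $p"
            return 1
        fi
        echo "OK $p"
    done
    echo "ALL PASS"
}

# Example use with a hypothetical smoke function:
#   bisect_apply run_smoke 0004 0005 0006 0007-v2
```

Stopping at the first failing step keeps the splat attributable to exactly one patch, which is the point of applying them one at a time.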
## Post-bisect
- All 4 pass on the lockdep kernel: try the same set on the vanilla kernel (rebuild without lockdep). If vanilla also passes, iter6 succeeds; move to iter7 cache-coherency hypothesis test.
- Specific step regresses: capture lockdep splat via serial or pstore, attach to a new kernel-agent issue (#16), revert that step from lockdep kernel, document. If the failing step is 2.1 (0004 core helper), iter6 line is closed — switch iter7 to a non-fence cache-coherency approach.
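
For the pstore capture path, a sketch of collecting the surviving records after a reset might look like this. `collect_pstore` is a hypothetical helper; it assumes ramoops records appear as `dmesg-*` files under the pstore mount, and it copies them rather than deleting them so the originals remain until cleared by hand.

```shell
#!/bin/sh
# collect_pstore [SRC_DIR] DEST_DIR
# Copy dmesg-* records from the pstore mount into DEST_DIR and print
# how many were copied. Needs root when reading /sys/fs/pstore.
collect_pstore() {
    src=${1:-/sys/fs/pstore}
    dest=$2
    mkdir -p "$dest" || return 1
    n=0
    for f in "$src"/dmesg-*; do
        [ -e "$f" ] || continue      # glob did not match anything
        cp "$f" "$dest"/ && n=$((n + 1))
    done
    echo "$n"
}

# Example use on the first boot after a watchdog reset:
#   collect_pstore /sys/fs/pstore "$HOME/iter6-splats/$(date +%s)"
```

A count of 0 after a reset would suggest the ramoops path did not capture anything, leaving serial as the only evidence.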
## Risk register (v2 — added H1)
| # | Risk | Mitigation |
|---|------|-----------|
| **H1** | **PRIMARY**: panthor's `ww_acquire_ctx` locking on the same resv that vb2 takes with `dma_resv_lock(NULL)` inside a `dma_fence_begin_signalling()` section; a cross-class lockdep violation that stays invisible without PROVE_LOCKING on RK3588 + panthor. RK3566 + panfrost validation didn't surface this. | LOCKDEP base kernel (step 1.x) is specifically designed to catch this. Step 1.9 + 2.x smoke tests exercise GPU compositor path |
| R2 | Lockdep-debug kernel doesn't boot at all | Step 1.6 keeps vanilla as default; manual menu selection for lockdep; recovery stick handles total-loss |
| R3 | Lockdep-debug kernel release string (`uname -r`) collides with vanilla → modules_install overwrites the working modules | `CONFIG_LOCALVERSION="-lockdep"` (A3) makes the suffix distinct; verify the destination path before running modules_install with sudo |
| R4 | Watchdog fires before lockdep splat reaches journal | Step 0.4 HARD GATE: serial OR ramoops for pre-reset capture path |
| R5 | 0007-v2 has a different bug than v1 (e.g. `run.base.bufs.dst` access pattern wrong for some codec) | Phase 5 round 2 reviews v2 source. If reviewer rejects, write v3 informed by feedback |
| R6 | User wants to use ampere during 45-min kernel build → reboot conflict | Per `feedback_no_bulldoze_reboots.md` — ASK before kicking off |
## Estimated wall-time
| Phase | Time |
|-------|------|
| Pre-flight (0.1-0.5) + serial confirmation | 15-30 min (depends on serial cable hookup) |
| Lockdep base kernel build (1.1-1.9) | 60 min |
| 2.1 (0004 + Kconfig + full kernel rebuild) | 50 min |
| 2.2 (0005 hantro module) | 10 min |
| 2.3 (0006 rga module) | 10 min |
| 2.4 (0007-v2 rkvdec module) | 10 min |
| Total | ~3-3.5h wall-time clean-path |
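
As a quick cross-check of the table, the per-phase rows can be summed directly. The sketch below uses a hypothetical `sum_minutes` helper and takes the low and high ends of the pre-flight range. The rows alone add up to less than the stated total, so the total presumably carries slack for reboots and hand-offs; that reading is an assumption, not something the plan states.

```shell
#!/bin/sh
# sum_minutes N...: add integer minute values given as arguments.
sum_minutes() {
    total=0
    for m in "$@"; do
        total=$((total + m))
    done
    echo "$total"
}

# Rows: pre-flight (15 or 30), lockdep base build 60, step 2.1 50,
# steps 2.2 / 2.3 / 2.4 at 10 each.
low=$(sum_minutes 15 60 50 10 10 10)
high=$(sum_minutes 30 60 50 10 10 10)
echo "${low}-${high} min"   # prints "155-170 min"
```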
## Exit conditions
Same as v1. Success / Partial / Failure trees unchanged.