From b02baffca70e2d1c0f96403f545190c4c15a73b0 Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Sat, 16 May 2026 16:33:08 +0000
Subject: [PATCH] iter6 post-mortem Phase 4 v2: per-codec 0007 + lockdep base kernel
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Amendments per Sonnet architect review round 1:
- A1: 0007 v2 rewritten — 7 per-codec run() insertion points, matches
  hantro pattern (after preamble metadata copy, before HW kick). Old v1
  (in rkvdec_device_run) REJECTED — wrong structural placement.
- A2: panthor ww_mutex/dma_resv contention added as primary hypothesis H1.
  Smoke test 1.9/2.x extended to exercise GPU compositor path.
- A3: CONFIG_LOCALVERSION=-lockdep so lockdep kernel uname differs from
  vanilla — prevents modules_install overwriting working tree.
- A4: pstore/serial gate is now HARD (one-of required); pre-flight aborts
  if neither serial nor ramoops is functional.
- A5: PROVE_RCU removed from initial config — boot latency risk pushes
  past watchdog before lockdep prints. Add back only if first run clean.

0007-v2 patch attached: 8 hunks across rkvdec-{h264,hevc,vdpu381-h264,
vdpu381-hevc,vdpu383-h264,vdpu383-hevc,vp9}.c + rkvdec.c queue_init flag.
25 inserted lines.

Pending Phase 5 round 2 delta-review of v2 source + amended plan before
any execution.

Co-Authored-By: Claude Opus 4.7
---
 0007-v2-rkvdec-opt-in.patch        | 163 +++++++++++++++++++++++++++++
 phase4_plan_iter6_postmortem_v2.md |  91 ++++++++++++++++
 2 files changed, 254 insertions(+)
 create mode 100644 0007-v2-rkvdec-opt-in.patch
 create mode 100644 phase4_plan_iter6_postmortem_v2.md

diff --git a/0007-v2-rkvdec-opt-in.patch b/0007-v2-rkvdec-opt-in.patch
new file mode 100644
index 0000000..4571194
--- /dev/null
+++ b/0007-v2-rkvdec-opt-in.patch
@@ -0,0 +1,163 @@
+From PENDING-COMMIT-SHA Mon Sep 17 00:00:00 2001
+From: Markus Fritsche
+Date: Sat May 16 2026
+Subject: [PATCH RFC v2 0007] media: rkvdec: attach dma_resv release fence at per-codec run()
+
+Opt the rkvdec CAPTURE queue into vb2 release-fence publishing,
+following the validated hantro (0005) and rockchip-rga (0006) patterns.
+
+The fence-attach is placed inside each per-codec `_run()` function
+AFTER `_run_preamble()` returns and BEFORE the HW kick writel().
+This matches hantro's pattern of attaching after `v4l2_m2m_buf_copy_metadata()`
+(which `rkvdec_run_preamble()` calls internally).
+
+Attach sites:
+  - drivers/media/platform/rockchip/rkvdec/rkvdec-h264.c::rkvdec_h264_run() (legacy RK3399)
+  - drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c::rkvdec_hevc_run() (legacy RK3399)
+  - drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-h264.c::rkvdec_h264_run() (RK3588)
+  - drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c::rkvdec_hevc_run() (RK3588)
+  - drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c::rkvdec_h264_run() (RK3576)
+  - drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c::rkvdec_hevc_run() (RK3576)
+  - drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c::rkvdec_vp9_run() (RK3399 + variants)
+
+The vp9 path requires special handling: rkvdec_vp9_run_preamble() returns
+int and can fail (returns ret with cleanup via rkvdec_run_postamble). Fence
+attach must be AFTER the error check.
+
+Queue-init flag opts the dst_vq into vb2_buffer_attach_release_fence's
+no-op-unless-opted-in gate. No-op without CONFIG_VIDEOBUF2_RELEASE_FENCES=y.
+
+Changes vs v1 0007 (which placed fence-attach in rkvdec_device_run before
+desc->ops->run): moved INTO desc->ops->run after preamble, matching hantro's
+prior-art structurally. Independent Phase 5 architect review identified the
+v1 placement as architecturally inconsistent with the validated pattern.
+
+To be validated locally on RK3588 vdpu381 with PROVE_LOCKING enabled before merge.
+
+Cc: linux-media@vger.kernel.org
+Signed-off-by: Markus Fritsche
+---
+ drivers/media/platform/rockchip/rkvdec/rkvdec-h264.c | 3 +++
+ drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c | 3 +++
+ drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-h264.c | 3 +++
+ drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c | 3 +++
+ drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c | 3 +++
+ drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c | 3 +++
+ drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c | 3 +++
+ drivers/media/platform/rockchip/rkvdec/rkvdec.c | 4 ++++
+ 8 files changed, 25 insertions(+)
+
+diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-h264.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-h264.c
+--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-h264.c
++++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-h264.c
+@@ -412,6 +412,9 @@ static int rkvdec_h264_run(struct rkvdec_ctx *ctx)
+ 	struct rkvdec_h264_priv_tbl *tbl = h264_ctx->priv_tbl.cpu;
+ 
+ 	rkvdec_h264_run_preamble(ctx, &run);
++
++	/* iter6 v2: attach release fence after preamble metadata copy, before HW kick */
++	(void)vb2_buffer_attach_release_fence(&run.base.bufs.dst->vb2_buf);
+ 
+ 	/* Build the P/B{0,1} ref lists. */
+ 	v4l2_h264_init_reflist_builder(&reflist_builder, run.decode_params,
+
+diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c
+--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c
++++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-hevc.c
+@@ -561,6 +561,9 @@ static int rkvdec_hevc_run(struct rkvdec_ctx *ctx)
+ 	u32 reg;
+ 
+ 	rkvdec_hevc_run_preamble(ctx, &run);
++
++	/* iter6 v2: attach release fence after preamble metadata copy, before HW kick */
++	(void)vb2_buffer_attach_release_fence(&run.base.bufs.dst->vb2_buf);
+ 
+ 	rkvdec_hevc_assemble_hw_scaling_list(ctx, &run, &tbl->scaling_list,
+ 					     &hevc_ctx->scaling_matrix_cache);
+
+diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-h264.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-h264.c
+--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-h264.c
++++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-h264.c
+@@ -420,6 +420,9 @@ static int rkvdec_h264_run(struct rkvdec_ctx *ctx)
+ 	struct rkvdec_h264_run run;
+ 
+ 	rkvdec_h264_run_preamble(ctx, &run);
++
++	/* iter6 v2: attach release fence after preamble metadata copy, before HW kick */
++	(void)vb2_buffer_attach_release_fence(&run.base.bufs.dst->vb2_buf);
+ 
+ 	/* Build the P/B{0,1} ref lists. */
+ 	v4l2_h264_init_reflist_builder(&reflist_builder, run.decode_params,
+
+diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c
+--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c
++++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu381-hevc.c
+@@ -589,6 +589,9 @@ static int rkvdec_hevc_run(struct rkvdec_ctx *ctx)
+ 	struct rkvdec_hevc_priv_tbl *tbl = hevc_ctx->priv_tbl.cpu;
+ 
+ 	rkvdec_hevc_run_preamble(ctx, &run);
++
++	/* iter6 v2: attach release fence after preamble metadata copy, before HW kick */
++	(void)vb2_buffer_attach_release_fence(&run.base.bufs.dst->vb2_buf);
+ 
+ 	rkvdec_hevc_assemble_hw_scaling_list(ctx, &run, &tbl->scaling_list,
+ 					     &hevc_ctx->scaling_matrix_cache);
+
+diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c
+--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c
++++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-h264.c
+@@ -485,6 +485,9 @@ static int rkvdec_h264_run(struct rkvdec_ctx *ctx)
+ 	u32 timeout_threshold;
+ 
+ 	rkvdec_h264_run_preamble(ctx, &run);
++
++	/* iter6 v2: attach release fence after preamble metadata copy, before HW kick */
++	(void)vb2_buffer_attach_release_fence(&run.base.bufs.dst->vb2_buf);
+ 
+ 	/* Build the P/B{0,1} ref lists. */
+ 	v4l2_h264_init_reflist_builder(&reflist_builder, run.decode_params,
+
+diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c
+--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c
++++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vdpu383-hevc.c
+@@ -605,6 +605,9 @@ static int rkvdec_hevc_run(struct rkvdec_ctx *ctx)
+ 		return -EINVAL;
+ 	}
+ 
++	/* iter6 v2: attach release fence after preamble metadata copy + RPS validation, before HW kick */
++	(void)vb2_buffer_attach_release_fence(&run.base.bufs.dst->vb2_buf);
++
+ 	rkvdec_hevc_assemble_hw_scaling_list(ctx, &run, &tbl->scaling_list,
+ 					     &hevc_ctx->scaling_matrix_cache);
+ 	assemble_hw_pps(ctx, &run);
+
+diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c b/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c
+--- a/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c
++++ b/drivers/media/platform/rockchip/rkvdec/rkvdec-vp9.c
+@@ -773,6 +773,9 @@ static int rkvdec_vp9_run(struct rkvdec_ctx *ctx)
+ 		rkvdec_run_postamble(ctx, &run.base);
+ 		return ret;
+ 	}
++
++	/* iter6 v2: attach release fence after preamble succeeded, before HW kick */
++	(void)vb2_buffer_attach_release_fence(&run.base.bufs.dst->vb2_buf);
+ 
+ 	/* Prepare probs. */
+ 	init_probs(ctx, &run);
+
+diff --git a/drivers/media/platform/rockchip/rkvdec/rkvdec.c b/drivers/media/platform/rockchip/rkvdec/rkvdec.c
+--- a/drivers/media/platform/rockchip/rkvdec/rkvdec.c
++++ b/drivers/media/platform/rockchip/rkvdec/rkvdec.c
+@@ -1241,6 +1241,10 @@ static int rkvdec_queue_init(void *priv,
+ 	dst_vq->lock = &rkvdec->vdev_lock;
+ 	dst_vq->dev = rkvdec->v4l2_dev.dev;
+ 
++	/* iter6: opt CAPTURE queue into vb2 release-fence publishing.
++	 * No-op unless CONFIG_VIDEOBUF2_RELEASE_FENCES=y. */
++	dst_vq->supports_release_fences = true;
++
+ 	return vb2_queue_init(dst_vq);
+ }
+
+-- 
+2.53.0
diff --git a/phase4_plan_iter6_postmortem_v2.md b/phase4_plan_iter6_postmortem_v2.md
new file mode 100644
index 0000000..96b44d8
--- /dev/null
+++ b/phase4_plan_iter6_postmortem_v2.md
@@ -0,0 +1,91 @@
+# Phase 4 — iter6 post-mortem retry plan (v2, amended per Phase 5 architect review)
+
+Plan ID: iter6-postmortem-attempt2
+Author: Claude Opus 4.7
+Reviewer: Sonnet (architect, Phase 5 round 1, returned 5 amendments + 0007 REJECT)
+Status: v2 pending Phase 5 round 2 (delta-review) before any execution
+Cross-ref: phase0_findings_iter6_postmortem.md (commit 11d2dde), phase4_plan_iter6_postmortem.md (commit 0ef6440 — superseded)
+
+## Goal
+
+Same as v1. Apply 0004 / 0005 / 0006 / 0007-v2 to ampere kernel WITHOUT silent watchdog reset. Either succeed, or fail with a self-diagnostic lockdep splat that survives to logs.
+
+## Amendments vs v1 (architect findings)
+
+| # | Finding | Resolution |
+|---|---------|-----------|
+| A1 | 0007 v1 placement (in `rkvdec_device_run`) inconsistent with hantro's pattern + metadata-copy ordering wrong | **0007 v2 written**: 7 per-codec `*_run()` insertion points (h264, hevc, vdpu381-h264, vdpu381-hevc, vdpu383-h264, vdpu383-hevc, vp9), each AFTER preamble + BEFORE HW kick. `supports_release_fences = true` stays in queue_init. See `0007-v2-rkvdec-opt-in.patch` (8 hunks, 25 +lines) |
+| A2 | Missing risk: panthor `ww_acquire_ctx` contention on same resv as vb2 `dma_resv_lock(NULL)` under `dma_fence_begin_signalling` — primary RK3588 wedge hypothesis | Added to risk register as **Primary Hypothesis H1**. Smoke tests (step 1.9, rerun at every 2.x step) extended to include GPU compositor exercise (open kwin, mpv playback with hwdec=vaapi, drm-vblank-trigger pattern), not just headless ffmpeg |
+| A3 | Step 1.5 `modules_install` overwrites working modules on `uname -r` collision | Added `CONFIG_LOCALVERSION="-lockdep"` to step 1.2 config block → distinct release suffix `7.0.0-rc3-devices-lockdep+` → separate `/lib/modules/.../`. Original modules untouched |
+| A4 | Step 0.4 "consider adding ramoops" too soft — silent watchdog reset can't be diagnosed without pre-reset capture path | Step 0.4 reworked as **HARD GATE**. Pre-flight aborts if neither serial console (TTL-USB cable confirmed connected and capturing) NOR ramoops region (verified writable + producing test entry) is functional. User decision input required before pre-flight starts |
+| A5 | `CONFIG_PROVE_RCU` slows boot + may push past watchdog before lockdep prints | Removed from step 1.2 config set. Kept: PROVE_LOCKING, DEBUG_ATOMIC_SLEEP, LOCKDEP, DEBUG_RT_MUTEXES, DEBUG_SPINLOCK, DEBUG_MUTEXES, DEBUG_LOCK_ALLOC, PROVE_RAW_LOCK_NESTING, DEBUG_WW_MUTEX_SLOWPATH. PROVE_RCU added only if first lockdep run passes clean |
+
+## Pre-flight (REWRITTEN, hard gates marked)
+
+| Step | Action | Verify | Gate |
+|------|--------|--------|------|
+| 0.1 | SDDM auto-login disabled | Done — `/etc/sddm.conf.d/autologin.conf.disabled-iter6postmortem` | ✓ |
+| 0.2 | Backup `/lib/modules/7.0.0-rc3-devices+/kernel/drivers/media/{common/videobuf2,platform/verisilicon,platform/rockchip/{rga,rkvdec}}/*.ko` as `attempt2-pre-base-.bkp` AND scp tarball to `boltzmann:/home/mfritsche/iter6-postmortem-backups/` | `ls` shows .bkp + scp returned 0 | **HARD GATE** — abort if backup write fails |
+| 0.3 | Backup `/boot/firmware/Image-7.0.0-rc3-devices+` and `initramfs-7.0.0-rc3-devices+` as `*.pre-attempt2.bkp` | `ls` | **HARD GATE** |
+| 0.4a | Check pstore: `ls -la /sys/fs/pstore/` as root, `dmesg \| grep -i pstore`, look for `ramoops` reserved region in `/proc/iomem` or DT | pstore writable + ramoops region present | one-of (with 0.4b) |
+| 0.4b | Check serial console: confirm user has TTL-USB cable connected to ampere's ttyS2 UART; run `screen /dev/ttyUSB0 1500000` (or equivalent) on a host that has it; ampere's extlinux already has `console=ttyS2,1500000` | user types confirmation after seeing serial output | one-of (with 0.4a) |
+| 0.4-G | **HARD GATE**: if BOTH 0.4a and 0.4b fail (no pstore AND no serial), ABORT and ask user to obtain serial cable OR investigate ramoops DT addition before retry | gate enforced | mandatory |
+| 0.5 | Verify ampere `~/src/linux-rockchip` working tree state: iter3+iter4+diag patches present, iter6 v1 reverted (per recovery), `git status` shows expected files modified | git status output matches | informational |
+
+## Build the lockdep debug base kernel (~45 min one-time)
+
+| Step | Action |
+|------|--------|
+| 1.1 | `cp .config .config.pre-iter6postmortem` |
+| 1.2 | `./scripts/config --enable PROVE_LOCKING --enable DEBUG_ATOMIC_SLEEP --enable LOCKDEP --enable DEBUG_RT_MUTEXES --enable DEBUG_SPINLOCK --enable DEBUG_MUTEXES --enable DEBUG_LOCK_ALLOC --enable PROVE_RAW_LOCK_NESTING --enable DEBUG_WW_MUTEX_SLOWPATH` (NO PROVE_RCU per A5). Set `CONFIG_LOCALVERSION="-lockdep"` (A3). `make olddefconfig` |
+| 1.3 | `time make -j8 Image modules dtbs` (~45 min) |
+| 1.4 | `sudo make modules_install` with no `INSTALL_MOD_PATH` override: the default target respects LOCALVERSION and installs to `/lib/modules/7.0.0-rc3-devices-lockdep+/`, separate from the working modules. Verify the destination (e.g. `make -s kernelrelease` prints `7.0.0-rc3-devices-lockdep+`) before running it under sudo |
+| 1.5 | `sudo cp arch/arm64/boot/Image /boot/firmware/Image-7.0.0-rc3-devices-lockdep+`. Generate initramfs for new release: `sudo mkinitcpio -k 7.0.0-rc3-devices-lockdep+ -g /boot/firmware/initramfs-7.0.0-rc3-devices-lockdep+` |
+| 1.6 | Backup extlinux.conf, then `sudo` edit: ADD new label `arch_devices_lockdep` pointing at the new Image + initrd, leave `arch_devices` as the default. So system boots vanilla by default; user picks lockdep at U-Boot menu |
+| 1.7 | Reboot. At U-Boot menu, manually select `arch_devices_lockdep`. Verify `uname -r` = `7.0.0-rc3-devices-lockdep+`. Verify the kernel log contains the lockdep boot banner (starts with `Lock dependency validator:`) |
+| 1.8 | Smoke test: ffmpeg HEVC decode (iter5 baseline), check `journalctl -k -p warning -b 0` for any new lockdep splats produced by iter3+4 alone. Expectation: clean (or only pre-existing edp/vblank WARNs) |
+
+**1.9 — A2 GPU smoke test**: open SDDM login → log in to plasma wayland → open mpv with `--hwdec=vaapi-copy ~/measurements/encoded/bbb_60s_720p.hevc.mp4`. Watch ~30 seconds. Check journal again. Reason: A2 hypothesis is panthor + kwin + V4L2 dmabuf contention, which only surfaces under active GPU composition. If iter3+4 alone (no fence helper yet) emits a lockdep splat with GPU active, the bug is even more upstream than iter6 patches — STOP and investigate.
+
+## Bisect-apply iter6 patches on lockdep base
+
+Order: 0004 → 0005 → 0006 → 0007-v2. Between each: reboot into lockdep kernel → smoke test → if regression, REVERT only that step, capture splat, document, STOP.
+
+| Step | Patch | Build effort | Smoke test |
+|------|-------|--------------|------------|
+| 2.1 | 0004 (vb2 core helper + Kconfig) | full kernel rebuild (~45 min — Kconfig + videobuf2-core.c) | Step 1.8 + Step 1.9 |
+| 2.2 | 0005 (hantro opt-in) | hantro_vpu module (~3 min) | Step 1.8 + Step 1.9 |
+| 2.3 | 0006 (rockchip-rga opt-in) | rockchip-rga module (~3 min) | Step 1.8 + Step 1.9 |
+| 2.4 | 0007-v2 (rkvdec per-codec opt-in) | rockchip-vdec module (~3 min) | Step 1.8 + Step 1.9 |
+
+## Post-bisect
+
+- All 4 pass on the lockdep kernel: try the same set on the vanilla kernel (rebuild without lockdep). If vanilla also passes, iter6 succeeds; move to iter7 cache-coherency hypothesis test.
+- Specific step regresses: capture lockdep splat via serial or pstore, attach to a new kernel-agent issue (#16), revert that step from lockdep kernel, document. If the failing step is 2.1 (0004 core helper), iter6 line is closed — switch iter7 to a non-fence cache-coherency approach.
+
+## Risk register (v2 — added H1)
+
+| # | Risk | Mitigation |
+|---|------|-----------|
+| **H1** | **PRIMARY**: panthor `ww_acquire_ctx` on same resv as vb2's `dma_resv_lock(NULL)` published inside `dma_fence_begin_signalling()` — cross-class lockdep violation invisible without PROVE_LOCKING on RK3588 + panthor. RK3566 + panfrost validation didn't surface this. | LOCKDEP base kernel (step 1.x) is specifically designed to catch this. Step 1.9 + 2.x smoke tests exercise GPU compositor path |
+| R2 | Lockdep-debug kernel doesn't boot at all | Step 1.6 keeps vanilla as default; manual menu selection for lockdep; recovery stick handles total-loss |
+| R3 | Lockdep-debug kernel boots but `uname -r` collides with vanilla → modules_install overwrites | `CONFIG_LOCALVERSION="-lockdep"` (A3) makes the suffix distinct; verify the destination before `sudo make modules_install` |
+| R4 | Watchdog fires before lockdep splat reaches journal | Step 0.4 HARD GATE: serial OR ramoops for pre-reset capture path |
+| R5 | 0007-v2 has a different bug than v1 (e.g. `run.base.bufs.dst` access pattern wrong for some codec) | Phase 5 round 2 reviews v2 source. If reviewer rejects, write v3 informed by feedback |
+| R6 | User wants to use ampere during 45-min kernel build → reboot conflict | Per `feedback_no_bulldoze_reboots.md` — ASK before kicking off |
+
+## Estimated wall-time
+
+| Phase | Time |
+|-------|------|
+| Pre-flight (0.1-0.5) + serial confirmation | 15-30 min (depends on serial cable hookup) |
+| Lockdep base kernel build (1.1-1.9) | 60 min |
+| 2.1 (0004 + Kconfig + full kernel rebuild) | 50 min |
+| 2.2 (0005 hantro module) | 10 min |
+| 2.3 (0006 rga module) | 10 min |
+| 2.4 (0007-v2 rkvdec module) | 10 min |
+| Total | ~3-3.5h wall-time clean-path |
+
+## Exit conditions
+
+Same as v1. Success / Partial / Failure trees unchanged.