iter6 plan v2: ramoops amendments (user chose ramoops over serial)
User picked the ramoops path for the 0.4 hard gate. The current ampere kernel has CONFIG_PSTORE_RAM=m but lacks PSTORE_CONSOLE, so ramoops can only be made operational AFTER the lockdep kernel rebuild.

4 amendments:
- 0.4: restructured. 0.4a/b survey current state (informational only), 0.4c notes the accepted limitation (hard spinlock+IRQ-off won't flush), and the 0.4-G hard gate moves to step 1.8a (after the lockdep kernel boots).
- 1.2: add --enable PSTORE_CONSOLE --enable PSTORE_PMSG.
- 1.6: extend the lockdep extlinux append with the ramoops carve-out cmdline (memmap=0x100000$0x10000000 ramoops.mem_address=0x10000000 ramoops.mem_size=0x100000 ramoops.record_size=0x10000 ramoops.console_size=0x40000 ramoops.dump_oops=1). The DEFAULT override is mandatory per Q3 (ramoops-only operator).
- 1.7/1.8: split into 1.7 (boot + module load), 1.8a (sysrq-trigger ramoops verify, HARD GATE), and 1.8b (regular smoke test).

Documented limitation accepted by user: hard spinlock-with-IRQ-off deadlocks (the worst-case iter6 v1 wedge shape) may not flush to pstore before watchdog reset. Serial would catch those; ramoops may miss them. Bisect-apply 0004→0005→0006→0007v2 should surface lockdep splats BEFORE the deadlock becomes a hard hang anyway.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@@ -27,9 +27,10 @@ Same as v1. Apply 0004 / 0005 / 0006 / 0007-v2 to ampere kernel WITHOUT silent w
| 0.1 | SDDM auto-login disabled | Done — `/etc/sddm.conf.d/autologin.conf.disabled-iter6postmortem` | ✓ |
| 0.2 | Backup `/lib/modules/7.0.0-rc3-devices+/kernel/drivers/media/{common/videobuf2,platform/verisilicon,platform/rockchip/{rga,rkvdec}}/*.ko` as `attempt2-pre-base-<ts>.bkp` AND scp tarball to `boltzmann:/home/mfritsche/iter6-postmortem-backups/` | `ls` shows .bkp + scp returned 0 | **HARD GATE** — abort if backup write fails |
| 0.3 | Backup `/boot/firmware/Image-7.0.0-rc3-devices+` and `initramfs-7.0.0-rc3-devices+` as `*.pre-attempt2.bkp` | `ls` | **HARD GATE** |
| 0.4a | Check pstore: `ls -la /sys/fs/pstore/` as root, `dmesg \| grep -i pstore`, look for `ramoops` reserved region in `/proc/iomem` or DT | pstore writable + ramoops region present | one-of (with 0.4b) |
| 0.4b | Check serial console: confirm user has TTL-USB cable connected to ampere's ttyS2 UART; run `screen /dev/ttyUSB0 1500000` (or equivalent) on a host that has it; ampere's extlinux already has `console=ttyS2,1500000` | user types confirmation after seeing serial output | one-of (with 0.4a) |
| 0.4-G | **HARD GATE**: if BOTH 0.4a and 0.4b fail (no pstore AND no serial), ABORT and ask user to obtain serial cable OR investigate ramoops DT addition before retry | gate enforced | mandatory |
| 0.4a | Probe current kernel: `CONFIG_PSTORE_RAM=m` confirmed on ampere; `CONFIG_PSTORE_CONSOLE` currently `n` (will be enabled in step 1.2). `/sys/fs/pstore/` exists + empty | survey only | informational |
| 0.4b | Memory layout survey: `cat /proc/iomem`. Plan ramoops carve-out at `0x10000000` (1 MB inside the normal-RAM gap between `0x4500000` and `0xd7a00000`) | iomem confirms region is "System RAM" not pre-reserved | informational |
| 0.4c | **NOTE**: User chose ramoops over serial, accepted documented limitation that hard spinlock-with-IRQ-off deadlocks may not flush to pstore before watchdog reset. Serial would capture those; ramoops may not | acknowledged | informational |
| 0.4-G | **HARD GATE moved to step 1.8a**: ramoops verification is only possible after the lockdep kernel boots with PSTORE_CONSOLE=y. Pre-flight does NOT abort here. Step 1.8a runs `echo c > /proc/sysrq-trigger` to force a crash, reboots, and verifies `/sys/fs/pstore/` is non-empty; if empty, ABORT before bisect-apply 2.x and reconsider serial | step 1.8a enforces | mandatory at 1.8a |
| 0.5 | Verify ampere `~/src/linux-rockchip` working tree state: iter3+iter4+diag patches present, iter6 v1 reverted (per recovery), `git status` shows expected files modified | git status output matches | informational |
|
||||
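The memory layout survey in 0.4b can be scripted rather than read by eye. The following is a minimal sketch, not part of the plan itself: `region_in_system_ram` is a hypothetical helper name, and it assumes the listing uses the usual `start-end : name` format of `/proc/iomem`, with child entries indented. It accepts the carve-out only if it lies entirely inside one top-level `System RAM` range and no indented child entry (kernel code, reserved, and so on) overlaps it.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustration only).
# usage: region_in_system_ram <iomem-file> <start> <size>
# Returns 0 when [start, start+size) sits inside one top-level "System RAM"
# range and overlaps no indented child entry; returns 1 otherwise.
region_in_system_ram() {
    local file=$1
    local start=$(( $2 )) size=$(( $3 ))
    local end=$(( start + size - 1 ))
    local re='^[[:space:]]*([0-9a-f]+)-([0-9a-f]+) : (.*)$'
    local line lo hi name found=0 overlap=0

    while IFS= read -r line; do
        [[ $line =~ $re ]] || continue          # skip lines we cannot parse
        lo=$(( 16#${BASH_REMATCH[1]} ))
        hi=$(( 16#${BASH_REMATCH[2]} ))
        name=${BASH_REMATCH[3]}
        if [[ $line != [[:space:]]* ]]; then
            # Top-level entry: must be System RAM and fully contain the region.
            if [[ $name == "System RAM" ]] && (( start >= lo && end <= hi )); then
                found=1
            fi
        elif (( lo <= end && hi >= start )); then
            # Indented child entry that overlaps the planned carve-out.
            overlap=1
        fi
    done < "$file"

    (( found == 1 && overlap == 0 ))
}
```

On ampere this would be called as `sudo cat /proc/iomem > /tmp/iomem.txt && region_in_system_ram /tmp/iomem.txt 0x10000000 0x100000`. The listing has to be captured as root, because unprivileged reads of `/proc/iomem` show zeroed addresses.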
## Build the lockdep debug base kernel (~45 min one-time)
@@ -37,13 +38,14 @@ Same as v1. Apply 0004 / 0005 / 0006 / 0007-v2 to ampere kernel WITHOUT silent w
| Step | Action |
|------|--------|
| 1.1 | `cp .config .config.pre-iter6postmortem` |
| 1.2 | `./scripts/config --enable PROVE_LOCKING --enable DEBUG_ATOMIC_SLEEP --enable LOCKDEP --enable DEBUG_RT_MUTEXES --enable DEBUG_SPINLOCK --enable DEBUG_MUTEXES --enable DEBUG_LOCK_ALLOC --enable PROVE_RAW_LOCK_NESTING --enable DEBUG_WW_MUTEX_SLOWPATH` (NO PROVE_RCU per A5). Set `CONFIG_LOCALVERSION="-lockdep"` (A3). `make olddefconfig` |
| 1.2 | `./scripts/config --enable PROVE_LOCKING --enable DEBUG_ATOMIC_SLEEP --enable LOCKDEP --enable DEBUG_RT_MUTEXES --enable DEBUG_SPINLOCK --enable DEBUG_MUTEXES --enable DEBUG_LOCK_ALLOC --enable PROVE_RAW_LOCK_NESTING --enable DEBUG_WW_MUTEX_SLOWPATH` (NO PROVE_RCU per A5). **Ramoops/pstore**: `./scripts/config --enable PSTORE_CONSOLE --enable PSTORE_PMSG --module PSTORE_RAM` (PSTORE_RAM stays =m, console=y captures kernel printk to the ramoops region). Set `CONFIG_LOCALVERSION="-lockdep"` (A3). `make olddefconfig` |
| 1.3 | `time make -j8 Image modules dtbs` (~45 min) |
| 1.4 | Use the kernel's default `make modules_install` (no `INSTALL_MOD_PATH`), which respects LOCALVERSION and installs to `/lib/modules/7.0.0-rc3-devices-lockdep+/`, separate from the working tree. Verify the destination path before running it under `sudo` |
| 1.5 | `sudo cp arch/arm64/boot/Image /boot/firmware/Image-7.0.0-rc3-devices-lockdep+`. Generate initramfs for new release: `sudo mkinitcpio -k 7.0.0-rc3-devices-lockdep+ -g /boot/firmware/initramfs-7.0.0-rc3-devices-lockdep+` |
| 1.6 | Backup extlinux.conf, then `sudo` edit: ADD new label `arch_devices_lockdep` pointing at the new Image + initrd, leave `arch_devices` as the default. So system boots vanilla by default; user picks lockdep at U-Boot menu. **Remote-operator note (round 2 review)**: If ampere is accessed SSH-only and the serial console from 0.4b is the only OOB path (no physical keyboard / HDMI), temporarily set `DEFAULT arch_devices_lockdep` in extlinux.conf for this test boot. Restore `DEFAULT arch_devices` before any subsequent reboot where lockdep boot is not desired. If 0.4a (ramoops only, no serial), U-Boot menu selection is not possible — the `DEFAULT` override is MANDATORY |
| 1.7 | Reboot. At U-Boot menu, manually select `arch_devices_lockdep`. Verify `uname -r` = `7.0.0-rc3-devices-lockdep+`. Verify journal has `Lockdep is enabled` |
| 1.8 | Smoke test: ffmpeg HEVC decode (iter5 baseline), check `journalctl -k -p warning -b 0` for any new lockdep splats produced by iter3+4 alone. Expectation: clean (or only pre-existing edp/vblank WARNs) |
| 1.6 | Backup extlinux.conf, then `sudo` edit: ADD new label `arch_devices_lockdep` pointing at the new Image + initrd. Since we're ramoops-only (no serial), per round 2 Q3 the `DEFAULT arch_devices_lockdep` override is MANDATORY for this test boot. Restore `DEFAULT arch_devices` before any subsequent reboot where lockdep boot is not desired. **Ramoops cmdline (append)**: extend the lockdep label's `append` line with `memmap=0x100000$0x10000000 ramoops.mem_address=0x10000000 ramoops.mem_size=0x100000 ramoops.record_size=0x10000 ramoops.console_size=0x40000 ramoops.dump_oops=1`. Carves 1 MB at 256 MB physical → 256 KB console buffer + 11 × 64 KB oops records |
| 1.7 | Reboot. Since DEFAULT was set to `arch_devices_lockdep` in step 1.6, U-Boot loads the lockdep kernel automatically. Verify `uname -r` = `7.0.0-rc3-devices-lockdep+`. Verify journal has `Lockdep is enabled`. `sudo modprobe ramoops` (the module that `CONFIG_PSTORE_RAM=m` builds; or verify it auto-loaded). Verify `ls /sys/fs/pstore/` after triggering a benign console write to confirm the ramoops region is accessible |
| 1.8a | Ramoops verify (HARD GATE per 0.4-G): `sync; sudo bash -c "echo c > /proc/sysrq-trigger"`. Ampere panics → reboots (still on lockdep default). After ssh comes back, check `ls /sys/fs/pstore/` — must contain at least one `dmesg-ramoops-*` and one `console-ramoops-*` file. If empty, ABORT — ramoops doesn't capture; reconsider serial cable. Backup pstore contents to `/home/mfritsche/iter6-postmortem-ramoops-verify/` then `sudo rm /sys/fs/pstore/*` to clear for actual tests |
| 1.8b | Smoke test: ffmpeg HEVC decode (iter5 baseline), check `journalctl -k -p warning -b 0` for any new lockdep splats produced by iter3+4 alone. Expectation: clean (or only pre-existing edp/vblank WARNs) |
|
||||
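The carve-out arithmetic from step 1.6 and the pass/fail rule from step 1.8a can be written down as two small helpers. This is a sketch under stated assumptions, not the plan's own tooling: the function names are hypothetical, and the 4 KiB defaults for the ftrace and pmsg areas are an assumption about ramoops behaviour when those sizes are not given on the command line. Under that assumption the 1 MB region minus the 256 KB console buffer leaves room for 11 whole 64 KB records, which matches the figure in step 1.6.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (illustration only).

# usage: ramoops_dump_records <mem_size> <console_size> <record_size> [ftrace_size] [pmsg_size]
# Prints how many whole oops records fit after the other areas are subtracted.
# ASSUMPTION: ftrace and pmsg areas default to 0x1000 bytes each when unset.
ramoops_dump_records() {
    local mem=$(( $1 )) console=$(( $2 )) record=$(( $3 ))
    local ftrace=$(( ${4:-0x1000} )) pmsg=$(( ${5:-0x1000} ))
    local dump=$(( mem - console - ftrace - pmsg ))
    (( record > 0 && dump >= 0 )) || return 1
    echo $(( dump / record ))
}

# usage: pstore_gate [dir]
# Step 1.8a rule: pass only if at least one dmesg-ramoops-* file AND at least
# one console-ramoops-* file exist in the pstore directory.
pstore_gate() {
    local dir=${1:-/sys/fs/pstore}
    compgen -G "$dir/dmesg-ramoops-*" > /dev/null || return 1
    compgen -G "$dir/console-ramoops-*" > /dev/null || return 1
}
```

With the values from step 1.6, `ramoops_dump_records 0x100000 0x40000 0x10000` prints `11`. After the forced crash and reboot, `sudo bash -c 'source ./helpers.sh; pstore_gate' || echo ABORT` would apply the gate (the file name `helpers.sh` is likewise only an example).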
**1.9 — A2 GPU smoke test**: open SDDM login → log in to plasma wayland → open mpv with `--hwdec=vaapi-copy ~/measurements/encoded/bbb_60s_720p.hevc.mp4`. Watch ~30 seconds. Check journal again. Reason: A2 hypothesis is panthor + kwin + V4L2 dmabuf contention, which only surfaces under active GPU composition. If iter3+4 alone (no fence helper yet) emits a lockdep splat with GPU active, the bug is even more upstream than iter6 patches — STOP and investigate.
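The "check journal again" step in 1.8b and 1.9 can be made less error-prone with a small filter that counts locking-related report headers in kernel log text. This is a rough sketch: `count_lock_splats` is a hypothetical name, and the pattern covers only the common lockdep and atomic-sleep report headers, so a count of zero does not replace reading the journal.

```shell
#!/usr/bin/env bash
# Hypothetical helper (illustration only).
# Reads kernel log text on stdin and prints the number of lines that look like
# the first line of a lockdep or DEBUG_ATOMIC_SLEEP report.
count_lock_splats() {
    local pattern='possible (circular|recursive) locking'
    pattern+='|inconsistent lock state'
    pattern+='|irq lock inversion dependency'
    pattern+='|bad unlock balance'
    pattern+='|sleeping function called from invalid context'
    # grep -c exits non-zero when the count is 0; that is not an error here.
    grep -E -c "$pattern" || true
}
```

A typical call would be `journalctl -k -b 0 --no-pager | count_lock_splats`, run once after the ffmpeg decode and once after the mpv session, comparing the two counts.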