From b8930df80187f2a71d28bc2f771c3509f860e520 Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Sat, 16 May 2026 22:29:57 +0000
Subject: [PATCH] iter6 v6 substrate: source-trace points NULL deref at 0x20 to dma_fence->context

Decoded ESR 0x96000004 = DFSC level-0 (pure NULL deref) at virtual
address 0x20.

Structural offset analysis: struct dma_fence has u64 context at
offset 32 (=0x20). dma_buf->ops also at 0x20 but 0004's code guards
against NULL dbuf.

Leading hypothesis: dma_resv_add_fence iterates existing fences in
dbuf->resv->shared[] to merge-by-context. If RCU-managed fence cleanup
races with concurrent add, a freed slot becomes NULL and the iteration
dereferences NULL->context (offset 0x20).

Timing matches: 18-31 min uptime for first wedge (decode-cycle churn
needed); fast reboot loops after (BTRFS replays unflushed state).

KASAN doesn't catch (NULL deref is not UAF). Lockdep doesn't catch
(fence lifecycle race, not lock order).

Proposed 0004 v2 fix: use DMA_RESV_USAGE_KERNEL (single-slot, replaces
previous) instead of DMA_RESV_USAGE_WRITE (multi-slot list with race
window), OR dma_resv_replace_fences() for explicit context-keyed
atomic swap.

Confirmation path: when UART lands, look for pc inside
dma_resv_add_fence and the NULL-pointer register holding the stale
fence slot.

Co-Authored-By: Claude Opus 4.7
---
 iter6_v6_substrate_null_deref_at_0x20.md | 113 +++++++++++++++++++++++
 1 file changed, 113 insertions(+)
 create mode 100644 iter6_v6_substrate_null_deref_at_0x20.md

diff --git a/iter6_v6_substrate_null_deref_at_0x20.md b/iter6_v6_substrate_null_deref_at_0x20.md
new file mode 100644
index 0000000..fb7373b
--- /dev/null
+++ b/iter6_v6_substrate_null_deref_at_0x20.md
@@ -0,0 +1,113 @@
+# iter6 v6 substrate: source-trace of NULL deref at 0x20
+
+Date: 2026-05-17 ~00:30
+Cross-ref: iter6_v2_attempt2_close.md (boot -2/-7 panic), phase4_plan_iter6_v3.md
+Status: hypothesis, awaits UART confirmation
+
+## Symptom (recap)
+
+Boot -2/-7 (lockdep+0004) journal tail:
+```
+Unable to handle kernel NULL pointer dereference at virtual address 0000000000000020
+Mem abort info:
+  ESR = 0x0000000096000004
+```
+
+Decoded ESR:
+- EC = 0x25 = data abort from current EL (kernel-mode)
+- IL = 1 = 32-bit instruction
+- DFSC = 0x04 = **translation fault, level 0** = no valid level-0 translation table entry for the address, consistent with a plain NULL pointer deref
+- ISV = 0 = no valid instruction syndrome info
+
+Faulting virtual address = `0x20`. Therefore a `NULL` base pointer was dereferenced for a field at byte offset `0x20`. The register dump is not in the journal (panic-before-flush); UART will capture it.
+
+## Candidate structs whose field-at-0x20 matches
+
+Computed from upstream Linux 7.0-rc3 / ARM64 LP64:
+
+| Struct | Field at offset 0x20 | Probability |
+|--------|----------------------|-------------|
+| `struct dma_fence` | `u64 context` (offsets: lock 0, ops 8, cb_list/timestamp/rcu 16-31, **context 32 = 0x20**, seqno 40, flags 48, error 56) | **HIGH**: exactly 0x20, and it is the only ID-comparison field that is frequently read on the hot path |
+| `struct dma_buf` | `const struct dma_buf_ops *ops` (offsets: size 0, file 8, attachments list_head 16-31, **ops 32 = 0x20**, ...) | MEDIUM: right offset, but the 0004 patch has an `if (!dbuf) continue;` NULL guard, so unlikely |
+| `struct dma_resv` | `dma_resv_list *fences` (offset 40 = 0x28) | LOW: wrong offset |
+| `struct vb2_buffer` | varies; the typical `vb2_queue *vb2_queue` is at offset 0, and offset 0x20 inside vb2_buffer falls somewhere in the early fields | LOW |
+
+**Leading hypothesis: NULL `dma_fence *fence` dereferenced at `fence->context`** (offset 0x20) somewhere on the 0004 fence path.
+
+## Where is `dma_fence->context` accessed?
+
+`grep -rn "fence->context\|->context" linux/dma-fence` shows it's accessed in:
+1. `dma_resv_add_fence()`: compares the new fence's context with existing fences in the resv list to detect/merge duplicates
+2. `dma_fence_add_callback()`: checks context for cb routing in some flows
+3. `dma_fence_default_wait()` / `dma_fence_signal_*()` paths: when iterating callbacks
+4. `dma_fence_match_context()`: exists in some kernels
+
+In the 0004-introduced path:
+- `vb2_buffer_attach_release_fence` calls `dma_resv_add_fence(dbuf->resv, fence, DMA_RESV_USAGE_WRITE)`. Inside `dma_resv_add_fence`, the kernel iterates existing fences and compares contexts to merge. **If any existing fence in `dbuf->resv`'s list is freed-and-zeroed mid-iteration, reading its `->context` (offset 0x20) through a NULL pointer is exactly this bug.**
+
+## Trigger sequence (proposed)
+
+1. Buffer A's `dbuf->resv` accumulates fences over many decode cycles (each `attach_release_fence` adds one; `signal_release_fence` doesn't explicitly remove it from `dbuf->resv` and relies on dma_resv's own GC).
+2. dma_resv's RCU fence cleanup runs and frees one of the older fences while `dma_resv_add_fence` is iterating.
+3. The freed fence slot in the resv's `dma_resv_list->shared[i]` becomes a NULL pointer (or the cmpxchg races).
+4. The compare-context check reads `existing_fence->context` with `existing_fence == NULL`.
+5. NULL + 0x20 deref → panic.
+
+## Why this matches the observed timing
+
+- Boot -2/-7 lived **18-31 minutes** before the wedge: many decode cycles are needed to accumulate enough resv fence churn for the race window to open.
+- Subsequent boots wedge fast: BTRFS dirty state likely re-queues the unflushed decode requests at boot, hitting the race immediately.
+- KASAN didn't catch it: KASAN catches use-after-free where the freed pointer is non-NULL and points to poisoned memory, not a NULL deref; here the slot is already NULLed out.
+- Lockdep didn't catch it: this is a fence-lifecycle race, not a lock-order issue.
+
+## What 0004 does wrong (if hypothesis is right)
+
+0004's `vb2_buffer_attach_release_fence` calls `dma_resv_add_fence(dbuf->resv, fence, DMA_RESV_USAGE_WRITE)` **on every buffer cycle**. The dma_resv accumulates write fences. The producer (vb2) doesn't garbage-collect or replace old write fences explicitly. The dma_resv core does its own RCU-managed cleanup that may race with concurrent add operations.
+
+The upstream pattern for V4L2 producers (where the fix is): use `DMA_RESV_USAGE_KERNEL` (single-slot, replaces previous fence) instead of `DMA_RESV_USAGE_WRITE` (multi-slot list), OR explicitly call `dma_resv_replace_fences()` to atomically swap rather than add.
+
+## Confirmation path (UART)
+
+When the UART trace lands, look in the panic register dump for:
+- `pc` pointing inside `dma_resv_add_fence` or `dma_fence_add_callback`
+- The register that holds the NULL pointer (likely `x0` if first arg); that pointer is the `fence` being dereferenced
+- Modules linked in: should include `videobuf2_common`, `gpu_sched`, possibly `panthor`
+- Backtrace ascending should go via `dma_resv_add_fence` → `vb2_buffer_attach_release_fence` → some `rkvdec_*_run` or `hantro_device_run` (whichever opted in)
+
+If `pc` is NOT in dma_resv_add_fence territory, the hypothesis is falsified and we re-analyze with the real trace.
+
+## Proposed 0004 v2 fix (if hypothesis confirmed)
+
+```c
+// In vb2_buffer_attach_release_fence:
+// Replace:
+dma_resv_add_fence(dbuf->resv, fence, DMA_RESV_USAGE_WRITE);
+// With (option A):
+dma_resv_add_fence(dbuf->resv, fence, DMA_RESV_USAGE_KERNEL);
+// OR, more correctly (option B):
+dma_resv_replace_fences(dbuf->resv, q->dma_resv_fence_context, fence, DMA_RESV_USAGE_WRITE);
+```
+
+`USAGE_KERNEL` semantics: single-slot kernel fence, replaces previous. Each new producer-fence atomically supersedes the prior one. No list growth, no per-call iteration of stale slots, no race window.
+
+`replace_fences` semantics: same, but with explicit context-keyed replacement, which is what upstream's mantra "one writer at a time" actually maps to.
+
+## What to commit if hypothesis confirmed
+
+- File against kernel-agent (issue #16, the third Casanova v7.0 fix). The patch is a one-line change in vb2_buffer_attach_release_fence.
+- Backport the change for the rkvdec opt-in (0007 v2) too.
+- Re-run iter6 v7 with the fixed 0004 + 0007 on the lockdep-kasan substrate that's already built; it should now boot clean and run for hours under GPU compositor load.
+
+## What if hypothesis is wrong
+
+- The UART register dump will tell us which struct and which field. Re-analyze in the same shape (struct + offset 0x20 candidates) and propose a different fix.
+- The wedge is real and reproducible; the trace will pinpoint it once UART lands.
+
+## State of preparedness when UART connects
+
+- ampere kernel: vanilla `arch_devices` default (safe), lockdep + lockdep-kasan kernels installed but not booted
+- ampere /boot/firmware: 3 kernels (devices+, lockdep+, lockdep-kasan+) all present, /boot has 527MB free
+- Lockdep extlinux label: cmdline includes `console=ttyS2,1500000`, so UART will receive kernel printk from initcall sequence regardless of any other config
+- Recovery infra: WeChat stick verified working, restore-modules backup at `~/iter6-postmortem-backups-attempt2-pre-base-20260516-1853/`
+- Iter3+4 fixes (verified working): still in source tree (uncommitted) and active in vanilla kernel modules
+- Iter6 v6 plan (this doc): commit, ready