diff --git a/notes/observed-bugs.md b/notes/observed-bugs.md
index 7fbe8e7df..8016e5da0 100644
--- a/notes/observed-bugs.md
+++ b/notes/observed-bugs.md
@@ -82,25 +82,47 @@ without board power-cycle").
 **Status**: task c3 (indirectly, via bes_chardev removal which currently
 gates the signal/nosignal mode switch path).
 
-## Backlog — full architect review of bes2600 driver code quality
+## Architect review — now BUG-#5-blocking (was backlog)
 
-The Phase 0 perf trace for Bug #5 exposes a "when in doubt, add a lock"
-pattern in the BH path (~20 % CPU in `_raw_spin_unlock_irqrestore` even
-during healthy throughput). Markus has flagged this for a separate
-architect-review pass: have Claude Sonnet (or equivalent reviewer) do a
-top-to-bottom code-quality review of the bes2600 sources we have on
-boltzmann (`~/src/besser/bes2600-dkms-mobian/bes2600/`), looking for:
+The Phase 0 perf trace for Bug #5 first exposed a "when in doubt, add a
+lock" pattern (~20 % CPU in `_raw_spin_unlock_irqrestore`). The
+follow-up ftrace measurement (2026-05-07 17:00) refined the root cause
+to an architectural problem: **the bes2600 driver dispatches every
+SDIO transaction through the kernel workqueue**. Numbers from a 3-min
+4 MB/s ohm capture (post-reboot, srcversion `1B3B3ED0`):
 
-- needless lock proliferation
-- BH / workqueue dispatch shape
-- error-handling coverage
-- dead code / leftover-from-cw1200 cruft
+```
+wsm_cmd_send: 13/sec (host-to-chip command rate, surprisingly low)
+bes2600_rx_cb: 611/sec
+bes2600_bh_wakeup: 267/sec
+lock contention_begin: 50/sec
+workqueue_execute_start: 5,643/sec ← DOMINATES; matches the mmc
+ transaction rate from earlier perf
+```
+
+5.6 k workqueue dispatches per second is the throughput floor — not a
+specific lock, not WSM-command rate, not decrypt-state. A surgical fix
+to any single function won't move the floor; the architecture needs
+to be restructured to amortise SDIO transactions across fewer work-
+items (or move SDIO RX out of the workqueue entirely).
+
+This is where the **Claude Sonnet architect review** belongs: a
+top-to-bottom assessment of `~/src/besser/bes2600-dkms-mobian/bes2600/`
+focused on:
+
+- the workqueue dispatch shape (most actionable)
+- needless lock proliferation (the original signal)
+- BH / RX scheduling boundaries
+- error-handling coverage and dead code from the cw1200 ancestor
 - API contract violations relative to mainline mac80211
 
-Output: ranked list of cleanup targets that would make later patch series
-land more cleanly. Not blocking on Bug #5 — independent track.
+Output: ranked list of restructuring targets, with predicted-delta
+estimates against the Phase 1 metric (≥ 2 MB/s sustained @ 4 MB/s cap,
+< 10 % CPU in lock-cycling, no link cascade in 30 min).
 
-**Status**: backlog. Schedule when Bug #5's measurement pass finishes.
+**Status**: now blocking on Bug #5 (was independent track). Surgical
+patches B5-1, B5-2, B5-3 from the original Phase 4 candidate list are
+all DEFERRED until the architect review's restructuring map is in.
 
 ## Bug #5 — RX path degrades under attempted-throughput pressure