notes: Patch C v2 Phase 4 plan — atomic_t prep + direct-deliver

Phase 7 of Patch C (PR #9 → bes2600-dkms PR #3 → boot -1 of ohm 20:18:10) failed with a thread-safety race: wsm_release_tx_buffer's unlocked R-M-W on hw_bufs_used races against wsm_alloc_tx_buffer in the bh thread when Patch C moved the RX-confirm decrement into sdio_rx_work. WARN storm at +13s under stress, chip wedges, host off-network. Phase 6 contract analysis cited wsm_handle_rx's sleepability and held-lock invariants but stopped at the function signature. Did not enumerate hw_bufs_used as shared state mutated by the callee. Lesson saved as feedback_phase6_contract_threadsafety memory. Phase 4 v2 designs around that gap. Two-step: 1. Patch C-prep: NFC refactor — convert hw_bufs_used, hw_bufs_used_vif[], wsm_tx_pending[] from int / int[] to atomic_t / atomic_t[]. Use atomic_fetch_sub_release in wsm_release_tx_buffer (returns prior value for the >= numInpChBufs - 1 predicate). Mechanical atomic_read swap at ~58 read sites. Lands first; Phase 7 should show zero delta from baseline. 2. Patch C v2: re-apply the sdio_rx_work direct-deliver on top of C-prep. Identical structural change to the closed PR #3, but now the racing counter is safe. Contract block in bes2600_bh_handle_rx_skb expanded to include the shared-state delta table. Plan §2 is the shared-state delta table — every field bes2600_bh_handle_rx_skb mutates directly or transitively, with current protection and required action. 3 fields need atomic_t, the rest are already concurrency-safe or stay single-writer. Plan §6 lists 6 risks including memory-ordering choices, the inc/dec_pending_count timer-decision race, and the new wired-rig fallback (enu1 192.168.88.80) that survives bes2600 wedges so Phase 7 can capture dmesg / ftrace from a wedged ohm without reboot. PR superseded #3 closed with full verdict comment. Phase B rolled back on ohm at /lib/modules/.../extra/bes2600.ko. Markus's reboot button to land Patch B again before C-prep work begins.
Merge pull request 'notes: Patch C Phase 4 plan (item 1 only — collapse sdio_rx_work into BH)' (#9 ) from claude-noether-7 into main
2026-05-07 20:50:39 +02:00 · 2026-05-07 17:21:37 +00:00 · 2026-05-07 19:04:53 +02:00 · 2026-05-07 16:58:55 +00:00 · 2026-05-07 18:12:54 +02:00 · 2026-05-07 16:01:55 +00:00
8 changed files with 1159 additions and 0 deletions
@@ -0,0 +1,180 @@
 # BES2600 architecture review — Bug #5
 Date assembled: 2026-05-07
 Reviewer: Claude Sonnet (general-purpose subagent, model=sonnet)
 Driver source: `~/src/besser/bes2600-dkms-mobian/bes2600/` on boltzmann
 This is the architect-review pass requested in `notes/observed-bugs.md` after the Phase 0 measurement showed the throughput floor is set by per-SDIO-transaction workqueue dispatch overhead. The reviewer was given the measurement summary, source location, and a focused brief; output is a ranked restructuring map with file:line citations for every concrete claim.
 ---
 ## Measurement context (input to the reviewer)
 ```
 Reproduction:                pv -L 4M < /dev/zero | nc ohm 12345
 Module under test:           bes2600.ko srcversion 1B3B3ED0...  (cleanups + Patch A + Patch B)
 Hardware:                    PineTab2, RK3566 Cortex-A55 ARMv8.5, kernel 6.19.10-danctnix1
 Link rate:                   65 Mb/s ≈ 8 MB/s theoretical
 Observed throughput:         725 KB/s  (Phase 0 anchor at N=3)
                             rep 3 cascaded into beacon-loss disconnect at ~9 min in
 Per-second event rates (3-min capture under 4 MB/s pv-cap):
  workqueue_execute_start:  5,643/sec    ← architectural floor
  bes2600_rx_cb:              611/sec
  bes2600_bh_wakeup:          267/sec
  wsm_cmd_send:                13/sec    (host-to-chip command rate, surprisingly low)
  lock contention_begin:       50/sec    (modest)
  mmc_request_start:        ~5,800/sec   (matches workqueue rate — every SDIO transaction is its own work item)
 perf record top symbol:      _raw_spin_unlock_irqrestore  (~20 % CPU samples)
 Dominant callstack:          process_one_work → wsm_configuration → wsm_cmd_send → bes2600_bh.isra.0
 ```
 The implication: ~9 workqueue dispatches fire per frame delivered to mac80211. Items below address that ratio in descending order of predicted leverage.
 ---
 ## Item 1 — Two-hop workqueue relay: SDIO IRQ → `sdio_rx_work` → BH loop → mac80211
 **File:line:** `bes2600_sdio.c:416` (IRQ handler dispatches `rx_work`); `bes2600_sdio.c:829` (`sdio_rx_work` body); `bh.c:1330–1538` (BH main loop, `BES2600_RX_IN_BH` path); `bes2600_sdio.c:1267` (`bes_sdio` workqueue, `max_active=2`).
 **Current shape:** Every SDIO interrupt fires `queue_work(sdio_wq, &rx_work)`. `sdio_rx_work` reads up to `BES_SDIO_RX_MULTIPLE_NUM=16` frames (`hwio.h:294`) into per-frame SKBs, enqueues each onto `sbus_priv.rx_queue` under `rx_queue_lock`, then returns. Meanwhile the BH kthread (one work item queued at boot in `bh.c:93`, running an infinite loop inside `bes2600_bh()`) calls `pipe_read()` → `spin_lock(rx_queue_lock)` → `skb_dequeue()` → `wsm_handle_rx()` → `ieee80211_rx_irqsafe()` one frame at a time. When `pipe_read()` returns NULL and pending TX exists, `bes2600_sdio_pipe_read()` at `bes2600_sdio.c:941` re-dispatches `rx_work` — so a sustained RX stream fires **one `queue_work` per BH wakeup, not per burst**. That explains why `bh_wakeup` events are only 267/sec while `workqueue_execute_start` is 5,643/sec: the SDIO layer is firing a new `rx_work` item for every frame the BH loop drains.
 **Proposed shape:** Collapse `sdio_rx_work` and `pipe_read()` into the BH loop directly. The BH already runs in a dedicated `WQ_HIGHPRI | WQ_CPU_INTENSIVE` workqueue (`bh.c:66`) and (with `BES2600_RX_IN_BH` defined per `Makefile:159`) `bes2600_bh_rx_helper()` already dequeues from `rx_queue`. Merge `sdio_rx_work` into a function called synchronously from `bes2600_bh_rx_helper()` before the dequeue, guarded by a trylock so re-entry is safe. This eliminates O(N) `queue_work` calls per burst while keeping the BH as the single SDIO-access context.
 **Predicted delta vs Phase 1 metric:** Eliminates ~5 of the ~9 redundant workqueue dispatches per frame. 2–4× throughput improvement and a proportional drop in `_raw_spin_unlock_irqrestore` CPU cost.
 **Effort:** Medium. SDIO host-lock protocol (`sdio_claim_host`/`sdio_release_host`) is already managed inside `sdio_rx_work`; moving the body is mechanical but requires care around the `sdio_wq` `max_active=2` concurrency assumption.
 **Risks:** `sdio_rx_work` runs with `sdio_claim_host` held for the entire burst. Inside the BH it serialises all SDIO access fine. Watch `bes2600_sdio.c:1889` — flushes `rx_work` during teardown; that path must remain.
 ---
 ## Item 2 — `ieee80211_rx_irqsafe` instead of `ieee80211_rx` (pre-NAPI cw1200 ancestor pattern)
 **File:line:** `txrx.c:1947`, `txrx.c:1950`, `ap.c:99`, `sta.c:1487`, `wsm.c:2416`.
 **Current shape:** Every RX frame is delivered via `ieee80211_rx_irqsafe()`. This function enqueues the SKB onto a per-cpu `tasklet_rx` list and schedules a software IRQ. Under sustained load: one softirq wakeup per frame — 611 softirq wakeups/sec on top of the workqueue overhead.
 **Proposed shape:** Switch to `ieee80211_rx_ni()` (process context, which `wsm_handle_rx` is already in) or, better, batch-deliver frames using `ieee80211_rx_list()` (introduced in kernel 5.12, available in 6.19). Accumulate frames from a single `sdio_rx_work` burst into a `list_head`, then call `ieee80211_rx_list()` once per burst.
 **mac80211 contract:** `ieee80211_rx_list()` is safe from process context with the same `ieee80211_rx_status` rules as `ieee80211_rx_ni()`. Per `include/net/mac80211.h` — kerneldoc states it takes the RX path atomically only when called from softirq context; from process context it uses the same path as `ieee80211_rx_ni()`.
 **Predicted delta:** Reduces per-frame softirq overhead. Hard to isolate independently of item 1, but combined the two deliver the < 10 % CPU-in-lock target.
 **Effort:** Small (once item 1 is done — the batch list naturally exists at the burst boundary).
 **Risks:** Must hold `rcu_read_lock()` at call site; `skb->cb` (`IEEE80211_SKB_RXCB`) must be filled before the call, as today. The `early_data` path at `txrx.c:1942` uses `skb_queue_tail` into a per-link queue before calling `ieee80211_rx_irqsafe` — that path must be excluded from batch collection.
 ---
 ## Item 3 — Per-frame `queue_work(sdio_wq, &tx_work)` in the TX send path
 **File:line:** `bes2600_sdio.c:1236` (inside `bes2600_sdio_pipe_send()`).
 **Current shape:** Every call to `bes2600_sdio_pipe_send()` appends one descriptor to `tx_bufferlist` and immediately calls `queue_work(sdio_wq, &tx_work)`. `sdio_tx_work` then drains the list with scatterlist batching (up to `BES_SDIO_TX_MULTIPLE_NUM=16` frames per SDIO transfer). At low rates the workqueue's pending-but-not-started dedup means only one dispatch fires; at high TX rates — especially after `atomic_add(1, &hw_priv->bh_tx)` in `bh.c` reschedules TX — successive `pipe_send` calls each hit `queue_work` before the previous fires, multiplying dispatches.
 **Proposed shape:** Stage all frames into `tx_bufferlist` in the BH TX loop, then flush `sdio_tx_work` synchronously (call the work function body directly) before returning to the wait-event. The TX mirror of item 1.
 **Predicted delta:** Removes redundant TX-side `queue_work` calls. Lower priority than RX side given current TX rate (13 `wsm_cmd_send`/sec is host→chip control plane; data-plane TX is also limited by firmware buffer count `numInpChBufs`), but it does remove one source of the 5,643/sec workqueue count.
 **Effort:** Small.
 **Risks:** `sdio_tx_work` calls `sdio_claim_host`/`sdio_release_host` internally. Running directly from BH context requires confirming no deadlock with the SDIO bus claim that `sdio_rx_work` (now merged per item 1) holds. The TX flush must happen after the RX burst, matching the existing BH loop structure (`rx:` → `tx:` ordering in `bh.c:1444`).
 ---
 ## Item 4 — `ba_lock` per-frame acquisition in the RX path
 **File:line:** `txrx.c:998–1005` (`bes2600_rx_h_ba_stat()`); `txrx.c:1159–1164` and `txrx.c:1682–1698` (`tsm_lock`, `CONFIG_BES2600_TESTMODE`-gated).
 **Current shape:** `bes2600_rx_cb()` calls `bes2600_rx_h_ba_stat()` for every non-multicast data frame. That function acquires `ba_lock` under BH (`spin_lock_bh`) to increment `ba_acc_rx` and `ba_cnt_rx`, then sets a timer. At 611 frames/sec that's 611 lock acquisitions/sec on `ba_lock` alone.
 **Proposed shape:** Replace per-frame `ba_lock` with a per-cpu counter (or `atomic64_t`) for `ba_acc_rx` and `ba_cnt_rx`. The timer arm (`mod_timer`) is the actual reason for the lock — a `READ_ONCE`/`cmpxchg` on a flag to detect first-frame-in-interval is sufficient.
 **Predicted delta:** Removes 611 lock acquisitions/sec from the RX hot path. Not the dominant cost but the next bottleneck after items 1–2 land.
 **Effort:** Small.
 **Risks:** `ba_lock` also serialises TX-side block-ack accounting (`txrx.c:1632`). The per-cpu approach requires a fold step in the timer callback — cheap.
 ---
 ## Item 5 — Skip `ps_state_lock` acquisitions when PSM is known-disabled
 **File:line:** `bes2600.h:320` (decl); `txrx.c:1340–1365` (`vif_lock`); `txrx.c:1415–1426` (`ps_state_lock`); `txrx.c:1942–1948` (RX-side `ps_state_lock`).
 **Current shape:** `ps_state_lock` is taken on every TX frame if powersave is active. Per memory `reference_bes2600_firmware_no_psm.md`, **PSM is non-functional on this firmware** — c7 already self-detects this and latches `pm_unsupported = true`. The `ps_state_lock` guards in the RX callback and TX path are therefore taking dead overhead.
 **Proposed shape:** Add a `READ_ONCE()` check on `powersave_enabled` before taking `ps_state_lock`; if false, skip the lock and the PSM state update entirely. Since c7's `pm_unsupported` latches, this is safe.
 **Predicted delta:** Small absolute gain at current TX rate, but prevents fast path from regressing as throughput improves.
 **Effort:** Small.
 **Risks:** `powersave_enabled` is written from process context (`bh.c:403`). `READ_ONCE` without lock is safe — at worst one spurious PSM notification, not a state corruption.
 ---
 ## Item 6 — Firmware block-read size cap (`EFFECTIVE_BUF_SIZE = 8190 bytes`)
 **File:line:** `bh.c:33–36` (defines); `bes2600_sdio.c:721–783` (`bes2600_sdio_extract_packets()`); `hwio.h:294` (`BES_SDIO_RX_MULTIPLE_NUM=16`).
 **Current shape:** `BES_SDIO_RX_MULTIPLE_NUM=16` and `BES_SDIO_OPTIMIZED_LEN` both defined (`Makefile:90,92`). The RX burst reads `PACKET_TOTAL_LEN(ctrl_reg)` bytes in a single CMD53; each sub-packet bounded by `EFFECTIVE_BUF_SIZE = (0x1000-4)*2 - 2 = 8190` bytes. At 611 frames/sec ÷ 267 BH wakeups/sec ≈ **2.3 frames per wakeup** — well under the 16-frame limit. **Not the bottleneck today.**
 **Proposed shape:** No change needed now. Re-evaluate after items 1–3 land if throughput rises past ~3 MB/s. Verify ctrl_reg `PACKET_TOTAL_LEN` field values during high load — requires firmware-trace observation we don't currently have.
 **Effort:** N/A.
 **Risks:** Increasing beyond 16 requires a larger DMA allocation (currently `1632 × 16 = 26 KB`). Cortex-M4F firmware side is opaque.
 ---
 ## Item 7 — Duplicate workqueues (`hw_priv->workqueue` vs `hw_priv->bh_workqueue` vs `sbus_priv->sdio_wq`)
 **File:line:** `bes2600.h:323` (`workqueue`); `bes2600.h:385` (`bh_workqueue`); `bes2600_sdio.c:63` (`sdio_wq`). `txrx.c` has 10 `queue_work(hw_priv->workqueue, ...)` calls for control-plane work.
 **Current shape:** Three distinct workqueues. The 5,643 `workqueue_execute_start`/sec are dominated by `sdio_wq` items, not `workqueue`. `workqueue` items are control-plane events at rates well below the data-plane.
 **Proposed shape:** After item 1 (merging `sdio_rx_work` into BH), `sdio_wq` only carries `tx_work`. After item 3 (synchronous TX flush from BH), `sdio_wq` is idle during normal data-plane and could be replaced with `system_highpri_wq`.
 **Effort:** Small (follow-on to items 1 and 3).
 **Risks:** None if items 1 and 3 land first.
 ---
 ## Item 8 — `BH_RX_CONT_LIMIT=3` cap on RX burst per BH wakeup
 **File:line:** `bh.c:1380–1405` (timeout detection); `bh.c:1330` (`BH_RX_CONT_LIMIT=3`); `bh.c:1331` (`BH_TX_CONT_LIMIT=20`).
 **Current shape:** BH loop limits RX burst to 3 consecutive iterations before breaking back to wait-event. At 611 frames/sec ÷ 267 wakeups/sec ≈ 2.3 frames per wakeup → not the bottleneck today. **After items 1–3 land**, per-burst frame rate will rise and `BH_RX_CONT_LIMIT=3` becomes the ceiling.
 **Proposed shape:** Raise to 16 (matching `BES_SDIO_RX_MULTIPLE_NUM`) after items 1–3 are deployed and re-measured.
 **Effort:** Trivial (constant change), but must wait for Phase 7 measurements post 1–3.
 **Risks:** Too high a limit under firmware anomaly (corrupted ctrl_reg) can spin BH long enough to miss beacon ACK deadline. Bound to `BES_SDIO_RX_MULTIPLE_NUM` as safe ceiling.
 ---
 ## Ranking summary
 | Rank | Item | Predicted gain | Effort |
 |------|------|----------------|--------|
 | 1 | Collapse `sdio_rx_work` relay into BH loop | ~5x workqueue dispatch reduction | Medium |
 | 2 | Batch deliver via `ieee80211_rx_list()` | Removes per-frame softirq | Small |
 | 3 | Synchronous TX flush from BH | Removes TX-side dispatch noise | Small |
 | 4 | Replace `ba_lock` per-frame with atomic/per-cpu | Removes 611 lock/sec from RX hot path | Small |
 | 5 | Skip `ps_state_lock` when PSM-known-disabled | Removes dead overhead | Small |
 | 6 | Raise `BH_RX_CONT_LIMIT` after 1–3 land | Unlocks residual throughput | Trivial |
 | 7 | Consolidate workqueues post-items 1&3 | Cleanup | Small |
 | 8 | Firmware block-read size | Not bottleneck at current rates | N/A |
 **Items 1 + 2 together are the structural answer to the measurement**: ~9 workqueue events per delivered frame collapse to ~1, and the per-frame softirq cost disappears. Items 3–5 clean up the next layer. The beacon-loss cascade at 9 minutes is almost certainly starvation of the BH wait-event under the per-frame workqueue storm — item 1 removes the mechanism that makes the cascade possible.
 ---
 ## Next campaign step
 A Phase 4 plan locking item 1 (and possibly item 2) follows in a separate PR. The remaining items go on the campaign backlog as follow-on patches once the Phase 7 verification of item-1-or-1-plus-2 confirms the predicted delta.
@@ -82,6 +82,91 @@ without board power-cycle").
 **Status**: task c3 (indirectly, via bes_chardev removal which currently
 gates the signal/nosignal mode switch path).
 ## Architect review — now BUG-#5-blocking (was backlog)
 The Phase 0 perf trace for Bug #5 first exposed a "when in doubt, add a
 lock" pattern (~20 % CPU in `_raw_spin_unlock_irqrestore`). The
 follow-up ftrace measurement (2026-05-07 17:00) refined the root cause
 to an architectural problem: **the bes2600 driver dispatches every
 SDIO transaction through the kernel workqueue**. Numbers from a 3-min
 4 MB/s ohm capture (post-reboot, srcversion `1B3B3ED0`):
 ```
 wsm_cmd_send:               13/sec    (host-to-chip command rate, surprisingly low)
 bes2600_rx_cb:             611/sec
 bes2600_bh_wakeup:         267/sec
 lock contention_begin:      50/sec
 workqueue_execute_start: 5,643/sec    ← DOMINATES; matches the mmc
                                       transaction rate from earlier perf
 ```
 5.6 k workqueue dispatches per second is the throughput floor — not a
 specific lock, not WSM-command rate, not decrypt-state. A surgical fix
 to any single function won't move the floor; the architecture needs
 to be restructured to amortise SDIO transactions across fewer work-
 items (or move SDIO RX out of the workqueue entirely).
 This is where the **Claude Sonnet architect review** belongs: a
 top-to-bottom assessment of `~/src/besser/bes2600-dkms-mobian/bes2600/`
 focused on:
 - the workqueue dispatch shape (most actionable)
 - needless lock proliferation (the original signal)
 - BH / RX scheduling boundaries
 - error-handling coverage and dead-code from the cw1200 ancestor
 - API contract violations relative to mainline mac80211
 Output: ranked list of restructuring targets, with predicted-delta
 estimates against the Phase 1 metric (≥ 2 MB/s sustained @ 4 MB/s cap,
 < 10 % CPU in lock-cycling, no link cascade in 30 min).
 **Status**: now blocking on Bug #5 (was independent track). Surgical
 patches B5-1, B5-2, B5-3 from the original Phase 4 candidate list are
 all DEFERRED until the architect review's restructuring map is in.
 ## Bug #5 — RX path degrades under attempted-throughput pressure
 **Suspect file**: bes2600 RX path (`txrx.c bes2600_rx_cb`, `bh.c bes2600_bh_work`,
 SDIO RX scheduling) — pinpoint pending.
 **Symptom (observed 2026-05-07 13:43, srcversion `1B3B3ED0` = c-stack +
 Patch A + Patch B, ohm @ -57 dBm 2.4GHz ch11 5b:32, idle save for the
 netcat load):**
 ```
 sender cap 1 MB/s  →  ohm receives 1015 KB/s,  signal -57 dBm,  RX MCS 4
 sender cap 4 MB/s  →  ohm receives  563 KB/s,  signal -67 dBm,  RX MCS 3
                     (Send-Q on boltzmann backed up to 1.16 MB)
 ```
 Pushing the sender-side cap from 1 MB/s to 4 MB/s **decreased** observed
 throughput at the receiver and degraded the link metrics. Signal dropped
 ~10 dB and the chip downshifted MCS, suggesting the chip can't sustain
 the higher RX rate even with the link physically capable of more (link
 bitrate 65 Mb/s = ~8 MB/s theoretical).
 **Hypothesis (Markus, 2026-05-07): driver/firmware locks itself to death
 under busy reads** — possibly a busy-wait loop or lock contention on the
 RX SDIO path that prevents draining at line rate. Plausible reason it
 didn't surface for the c-stack tasks: those operated at typical
 browse-rate traffic, well below the saturation threshold this bug needs
 to fire.
 **May explain**: original Phase-0 observation that **YouTube DASH chunks
 drop ~10 frames per chunk fetch** on hardware-decoder playback. A chunk
 fetch is a brief burst at near-link-rate; if the driver throttles itself
 down during high-RX, the player buffer underruns for the duration of
 the fetch.
 **How to drill (when prioritized)**:
 - Capture trace_pipe with `mmc:*` and `sdio*` events enabled during a
  controlled rate-ramp (e.g., pv -L 500K, 1M, 2M, 4M each for 60 s).
 - Watch `/proc/sys/kernel/sched_*` and the `bes2600_bh_work` kworker for
  CPU saturation.
 - `perf top -p $(pgrep -f bes_sdio)` during 4 MB/s load.
 **Status**: backlog. No patch yet.
 ## Bug #4 — scan_complete_cb constant loop
 **File**: `scan.c:883-909` — `bes2600_scan_complete_cb()`.
@@ -0,0 +1,190 @@
 # BES2600 WiFi structural analysis and code critique
 **Author:** Claude (noether) — second-opinion as Opus 4.7 against Sonnet 4.6's review of 2026-05-07
 **Scope:** the WiFi half of the BES2600 driver as it lives in `bes2600-dkms-mobian/bes2600/` on top of the `cleanups` branch (srcversion `1B3B3ED0…`, c-stack + Patch A + Patch B deployed).
 **Reading frame:** Bug #5 prompted Sonnet's review; this writeup is independent — same source tree, different model, different priors. Where I concur I tighten; where I disagree I say so.
 ---
 ## 1. Top-line
 The BES2600 WiFi driver is **not a BES2600 driver**. It is a CW12xx driver wearing a BES2600 nameplate. That sentence is not rhetoric — it is the design fact that explains every other smell I will list below.
 - 30+ live references to `CW12XX_MAX_VIFS` across 9 files.
 - `cw12xx_hwpriv_to_vifpriv()` / `cw12xx_get_vif_from_ieee80211()` are the active vif accessors.
 - `is_hardware_cw1250(hw_priv) || is_hardware_cw1260(hw_priv)` is a runtime branch in `ap.c:1892` — the chip is BES2600, neither check ever matches, the branch is dead on this hardware but compiled in.
 - `CW1200_MAX_SW_RETRY_CNT` gates the active retry-decision logic in `bh.c:1269` (inside `KEY_FRAME_SW_RETRY`).
 - The header opens with "Based on the mac80211 Prism54 code, which is Copyright (c) 2006, Michael Wu" → **prism54 → islsm → ST-E CW1200 → CW1260 → CW12xx → BES2600**: at least five generations of vendor-SDK descent, with each generation preserving its predecessor as #if-0 blocks rather than removing it.
 This is the Phase 6 "transcription trap" from `CLAUDE.md`, frozen into the codebase: every generation copied behaviour rather than re-derive against the API contract. The result is a driver that *works*, but whose structural choices are decisions made for a 2010 ST-Ericsson chip, not a 2022 Bestechnic one.
 The downstream consequence — and the thing that actually pinches us in Bug #5 — is that the **hot path was designed for cw1200's IRQ-driven SPI bus, not for SDIO with multi-block coalescing**. Items 1 + 2 of Sonnet's review are the right surgical fix. The deep fix is bigger than the budget of any one campaign.
 ## 2. Concurrence with Sonnet — refined
 ### 2.1 RX relay (Sonnet item 1) — concur, refine
 The flow on this build (`-DBES2600_RX_IN_BH` in Makefile, so this is the *real* path):
 ```
 SDIO IRQ
  → bes2600_gpio_irq_handler                       (bes2600_sdio.c:413)
  → queue_work(self->sdio_wq, &self->rx_work)      (bes2600_sdio.c:416)
  → sdio_rx_work runs                              (bes2600_sdio.c:829)
      → bes2600_sdio_lock + memcpy_fromio
      → bes2600_sdio_extract_packets               (skb_queue_tail to self->rx_queue)
      → self->irq_handler(self->irq_priv)          (function call, not workqueue)
          → atomic_add_return(1, &hw_priv->bh_rx)  (bh.c:130)
          → wake_up(&hw_priv->bh_wq)
  → bh_work (already running, never re-queued):
      wait_event_interruptible_timeout returns
      → bes2600_bh_rx_helper                       (bh.c:961)
          → priv->sbus_ops->pipe_read              (skb_dequeue from self->rx_queue)
          → wsm_handle_rx                          (wsm.c)
              → bes2600_rx_cb                      (txrx.c:1642)
                  → ieee80211_rx_irqsafe(skb)      (txrx.c:1947 / 1950)
 ```
 **Where I refine Sonnet:** the "9 workqueue events per delivered RX frame" claim doesn't survive source reading. Per IRQ *batch* there is **one** workqueue dispatch (sdio_wq.rx_work). `bh_work` is registered once, runs as a long-lived work item using `wait_event_interruptible_timeout` to sleep — the wake-up path is a wait-queue, not a workqueue dispatch. `ieee80211_rx_irqsafe` schedules a mac80211 tasklet, not a workqueue. The 5,643 `workqueue_execute_start/sec` ftrace count from Bug #5 is **system-wide**, not bes2600-only — it should not be quoted as "per frame" without per-pid filtering.
 **What is real:** the indirection adds two synchronization points per frame (`skb_queue_tail` + `skb_dequeue`, each `&rx_queue->lock`) plus a cross-CPU wake-up plus a tasklet schedule. That's enough to dominate at 4 MB/s. The collapse is justified — just not by the 9× number.
 ### 2.2 ieee80211_rx_irqsafe from process context (Sonnet item 2) — concur, gated on contract verification
 Confirmed: `ieee80211_rx_irqsafe` is the right primitive only when called from hard-IRQ context — it defers to a tasklet. From process context (which is where `bh_work` and `sdio_rx_work` both live), it adds a tasklet hop for nothing.
 `ieee80211_rx_list(hw, sta, &skbs)` is the correct call shape if, and only if, two contract claims hold:
 1. callable from process context with `local_bh_disable()` wrap (or callable bare),
 2. SKB list invariants don't impose NAPI-poll semantics we can't honour.
 Sonnet asserted both; I have **not** verified them against `include/net/mac80211.h` kerneldoc on a 6.19-class kernel. **Task #19 blocks Patch C on that verification.** Until it lands, treat the API claim as unconfirmed — this is exactly the Phase 6 contract-citation rule, and skipping it would be the same trap the older driver fell into.
 ### 2.3 ba_lock per-frame (Sonnet item 4) — concur
 `txrx.c:998-1005` (TX path) and `txrx.c:1632-1640` (RX path): `spin_lock_bh(&hw_priv->ba_lock)` to bump 4 ints (`ba_acc`, `ba_cnt`, `ba_acc_rx`, `ba_cnt_rx`) and conditionally `mod_timer(&hw_priv->ba_timer, …)`. The TODO comment in `bes2600.h:359-365` literally says *"TODO: Same as above"* on every field — the original author flagged it as deferred work, then shipped.
 Replace with `atomic_t` for the four counters and `cmpxchg`-guarded `mod_timer` for the arm-once invariant. Patch D.
 ### 2.4 ps_state_lock when pm_unsupported (Sonnet item 5) — concur
 `txrx.c:1942-1948`: per-RX-frame `spin_lock_bh(&priv->ps_state_lock)` on the early-data path, protecting a check on `entry->status == BES2600_LINK_SOFT`. The lock exists to coordinate with the AP-side power-save state machine.
 c7's contribution (`pm_unsupported = true`) means we already know this firmware doesn't honour PSM; the LINK_SOFT branch is an AP-mode soft-link state that won't transition under us when PSM is dead. Gate the lock acquisition on `!hw_priv->pm_unsupported`. Patch E.
 (This patch is *narrower* than Sonnet framed it: it only applies when `pm_unsupported` latches on, which is at boot for our firmware. Production reality on this hardware = always; but the patch must remain conditional in case a future firmware fixes PSM and c7 self-clears the flag.)
 ## 3. Push-back against Sonnet
 ### 3.1 "BES_SDIO_OPTIMIZED_LEN config flag"
 Not a runtime/Kconfig knob on this build. `Makefile:18` hard-codes `ccflags-y += -DBES_SDIO_OPTIMIZED_LEN`. Whether to keep it is a separate question, but Sonnet's recommendation should not have framed it as toggleable.
 ### 3.2 "Multiple workqueues are unconditionally bad"
 There are three driver-side workqueues:
 | name | purpose | dispatch shape |
 |---|---|---|
 | `bh_workqueue` | hosts the single long-running `bh_work` | one-shot at register, wait-queue driven thereafter |
 | `sdio_wq` | sdio_rx_work + sdio_tx_work + sdio_scan_work | per-IRQ-batch dispatch |
 | `hw_priv->workqueue` | scan, AP, PM, multicast-start, link-id, set-tim, … | per-event dispatch (~20 producers) |
 **`bh_workqueue` is fine** — it runs a single work item forever, which is just a kthread-shaped-as-workqueue. The cost is one alloc_workqueue at register and zero ongoing dispatch overhead. Don't kill it.
 **`sdio_wq` is the actual surgical target** — collapsing item 1 means subsuming `sdio_rx_work` into the bh-loop, after which `sdio_wq` only hosts tx_work and scan_work and could be merged with `hw_priv->workqueue` for cleanup. But that merge is cosmetic; do it later or never.
 **`hw_priv->workqueue` shouldn't be touched.** It hosts ~20 unrelated producers; merging it into sdio_wq is the wrong direction (priority inversion risk under coex pressure).
 ### 3.3 "BH_RX_CONT_LIMIT=3 is the bottleneck"
 Half-true. The limit caps the burst-RX pass to 3 frames before yielding to TX work. Raising it past 3 only helps if RX has steady backlog, which under our 4 MB/s ramp it does. But there's also `BH_TX_CONT_LIMIT=20` paired with it — TX gets 20-frame bursts, RX gets 3. The asymmetry is from a previous campaign that found TX-starvation, and **flipping it without re-running that campaign is a regression risk**. Treat the constant as a phase-7-knob, not a one-liner.
 ## 4. New findings Sonnet did not surface
 ### 4.1 `bh.c` carries ~700 lines of `#if 0` dead code
 `bh.c:196-877` is the cw1200 ancestor `bes2600_bh()` preserved verbatim alongside the active impl at `bh.c:1332+`. Same function name, same `goto rx:` / `goto tx:` labels, same loop variables. The fossil block contains a typo (`if ((i = (CW12XX_MAX_VIFS - 1)) || !priv)` at lines 438 and 562 — single `=` is assignment-not-compare; live code at `ap.c:696` uses `==` correctly) which would be a real bug if compiled. **It is not compiled** — `#if 0` saves us — but this is the maintenance hazard you discover *first* when reading the file in a hurry.
 Action: kill the `#if 0` block. Standalone hygiene patch, not on the Bug-#5 critical path.
 ### 4.2 Allwinner-specific code in the SDIO bus path
 `bes2600_sdio.c:475` calls `sw_mci_check_r1_ready(self->func->card->host, 1000)` from inside the IRQ-setup error path. This is the Allwinner mmc driver's R1-ready helper — not portable to RK3566's `dw_mmc-rockchip` host driver.
 The call is reachable only on `set_func` cleanup (a comparatively rare error path), but it is a build-time portability hazard. Most likely a stub macro on non-Allwinner builds; verify on ohm or wrap behind `#ifdef CONFIG_MMC_SUNXI`.
 ### 4.3 `asm volatile ("nop")` placeholder in the live BH loop
 `bh.c:1518` is where IRQ re-enable used to be (`__bes2600_irq_enable(1)` is commented out two lines above). The author left a literal nop instruction "asm volatile" instead of removing the dead block. Either re-enable IRQs (if the code was deleted prematurely) or remove the nop (if IRQs are intentionally always-on). This is non-cosmetic — it indicates an unresolved IRQ-handling decision.
 ### 4.4 `BUG_ON` in the steady-state hot path
 `bh.c:1488`: `BUG_ON(hw_priv->hw_bufs_used > hw_priv->wsm_caps.numInpChBufs)` runs *every* BH iteration. Tripping it locks up the kernel during normal operation — by definition the wrong response to a bookkeeping bug. Should be `WARN_ON_ONCE` + bail-out. (Same critique applies to several other `BUG_ON`s in `bh.c` — search the active `#else` block.)
 ### 4.5 Build-system is a vendor SDK, not a kernel-style driver
 `Makefile:1-50` defaults: `CONFIG_BES2600_TESTMODE ?= y`, `WIFI_BT_COEXIST_EPTA_ENABLE ?= y`, `BES2600_INTEGRATED_MODULE_V1/V2/V3` for *xiaomi R329 wifi module*, *sicun QM215 wifi module*, *bes evb*. 86 `#ifdef CONFIG_BES2600_TESTMODE` sites — testmode is essentially compiled-in dead code in non-test builds.
 The driver was built by Bestechnic to ship per-customer board variants from one source tree. Upstreaming will require ripping that whole apparatus out, replacing with `Kconfig` toggles and platform-data lookups. This is **not** a Bug-#5 dependency, but it is a debt that pollutes every other patch — diff hunks land in `#ifdef`-walled territory and conflict on rebases for unrelated reasons.
 ### 4.6 8 `EXPORT_SYMBOL` declarations from a single-binary module
 The driver exports `bes2600_irq_handler`, `bes2600_bh_wakeup`, `bes2600_bh_suspend`, `bes2600_bh_resume`, etc. — for whom? The only known consumer is `bes2600_btuart`, the BT sibling module. Either the BT module needs a coherent shared-driver API surface (refactor target), or these exports should become `static`. Random sibling-module coupling via global symbols is a known kernel anti-pattern.
 ### 4.7 No `__must_check` on functions that obviously return errors
 Almost every `bes2600_data_read` / `bes2600_data_write` / `bes2600_reg_read*` call site is wrapped in `WARN_ON()`. That's defensive but not enforced. A single missed return-check (compiler will not warn) is a silent SDIO-path bug. Annotation cost is one keyword per declaration; benefit is a class of bugs caught at compile time.
 ### 4.8 `rx_queue` is per-sbus_priv, not per-vif
 Multi-vif RX serializes through one `skb_queue` on the sbus side (`bes2600_sdio.c:867` queues to `self->rx_queue`, only dequeued by the single bh thread). For STA-only operation this doesn't matter; for STA+AP concurrent or P2P-multivif it's a structural ceiling on aggregate RX throughput. Out of scope for Bug #5 but worth recording — Markus's "P2P_MULTIVIF=y" Makefile default makes this potentially observable.
 ## 5. Ordering recommendation for the cleanup roadmap
 Given (a) the current Bug-#5 budget, (b) Phase-7 stress-ramp cost per patch, (c) the constraint that the cleanups branch must rebase cleanly on Mobian's `mobian` for re-MR:
 | order | patch | scope | phase-7 cost | risk |
 |---|---|---|---|---|
 | 1 | **Patch C (items 1+2 wrapped)** | hot path: collapse sdio_rx_work into bh, batch deliver via ieee80211_rx_list | full ramp 1→4→8 MB/s | high — touches RX hot path |
 | 2 | **Patch D (item 4)** | ba_lock → atomics + cmpxchg-guarded mod_timer | minimal — lock-stat delta + 5min @ 4MB/s smoke | low |
 | 3 | **Patch E (item 5)** | ps_state_lock skip when pm_unsupported | minimal — same as D | low (gated on c7's existing latch) |
 | ∞ | bh.c #if 0 graveyard removal | pure delete | none — recompile + smoke | zero |
 | ∞ | CW12XX → BES2600 rename | mass rename | none — but every open patch conflicts | high churn cost, zero behaviour change |
 | **NOT** | Allwinner abstraction layer | wrap sw_mci_check_r1_ready | n/a | scope-creep; do only if RK3566 fails on it |
 | **NOT** | Vendor-SDK Makefile rewrite | Kconfigify | n/a | upstream-prep work, not Bug-#5 |
 | **NOT** | bh_workqueue / sdio_wq merge | structural | n/a | speculation, no measured win |
 Patch C is high-risk; merging items 1 and 2 into one patch is the user's call (made: "wrap them together") but should **be reviewed Phase-5 before Phase-6 implementation lands** — exactly the receipts-checklist that this CLAUDE.md exists to enforce. Splitting Patch C into 1-then-2 is *also* defensible; if Phase 7 finds item 1 regressed something, item 2 in isolation is harder to bisect.
 ## 6. Things I would explicitly NOT do
 - **Don't paint the bikeshed on naming.** CW12XX → BES2600 rename is a 30+ file mass-substitute that conflict-spams every open topic branch. It is the right fix *for upstreaming*, not for the cleanups branch.
 - **Don't refactor the workqueue topology.** Three workqueues is fine. Two workqueues for cosmetic reasons risks priority inversion under coex pressure.
 - **Don't replace the BH thread architecture.** It works, the wait-queue model is well-suited to the IRQ → drain pattern, and replacing it with NAPI or threaded-IRQ would re-do six years of debugging in a single patch.
 - **Don't strip the `#ifdef CONFIG_BES2600_TESTMODE` blocks** until upstream-prep. They are vendor-SDK debt but harmless dead code.
 - **Don't wrap the Allwinner helper** unless RK3566 actually trips it. The path is rare-error.
 ## 7. What I would tell a fresh reviewer in one paragraph
 > *This driver is genealogically a CW1200 driver (ST-Ericsson, ~2010) with chip-name search-and-replace done halfway. The hot path was designed for SPI with one-frame-per-IRQ; SDIO multi-block coalescing was bolted on with a worker-queue handoff that adds two synchronization points per frame. Bug #5's RX-throughput regression at 4 MB/s is a direct consequence: at low rate the handoff overhead is invisible; at high rate it dominates. Three small patches (Patches C, D, E) reclaim most of the floor without touching the genealogy. The genealogy itself is technical debt for upstreaming, not a Bug-#5 dependency. Don't conflate the two.*
 ---
 ## 8. Disagreements summary
 | Sonnet claim | My finding |
 |---|---|
 | "9 workqueue events per delivered RX frame" | overstated; per IRQ batch is 1 workqueue dispatch on this build. The 5,643/sec ftrace count is system-wide, needs per-pid filtering before claiming as bes2600 dispatch rate. |
 | "BES_SDIO_OPTIMIZED_LEN config flag" | hard-baked in Makefile as `-D…` ccflags, not toggleable |
 | Item 4 / Item 5 sized as one patch each | concur — separate small patches as Markus directed |
 | Item 1 + 2 mergeable | concur — directionally; predicated on `ieee80211_rx_list()` contract (Task #19) |
 ## 9. Open questions for Markus
 1. **Patch C split-or-merge:** user directive is "wrap together". I'd note that a Phase-7 regression in the merged patch is harder to bisect than two sequential Phase-7 runs. Keeping the directive but recording the bisect-cost as known.
 2. **`__bes2600_irq_enable(1)` commented out:** is IRQ re-enable intentionally always-on now, or is the `nop` a deletion-in-progress bug? Reading the c-stack history doesn't tell me. Worth a "what was this for" pass before any RX-architecture patch lands.
 3. **`sw_mci_check_r1_ready` on RK3566:** should we test or just trust the path is rare-error? My read is: trust + `WARN_ON` if it's ever called, then react.
 ---
 *Written 2026-05-07. Reviewing as Opus 4.7 against Sonnet 4.6's review of the same source tree. Independent reads of: bh.c, bes2600_sdio.c (sdio_rx_work + pipe_read + IRQ handler), txrx.c (RX delivery sites + ba_lock + ps_state_lock sites), bes2600.h (struct lock topology), Makefile (build-system shape). No simulator runs; this is a static-analysis writeup, the dynamic verification of any claim above belongs in Phase 7 of the corresponding patch.*
@@ -0,0 +1,184 @@
 # Patch C — Phase 4 Plan: collapse sdio_rx_work into BH
 **Author:** Claude (noether)
 **Status:** Phase 4 — pending Phase 5 second-model review before any Phase 6 code.
 **Scope:** **item 1 only** (per merged PR #8 inline review: "do it sequentially; we're not on the clock").
 **Item 2** (batch deliver via `ieee80211_rx_list`) splits to **Patch C2**, gated on Task #19 kerneldoc verification.
 ---
 ## §0 Substrate — anchored
 Bug #5 anchor (recorded 2026-05-07, see `notes/phase1-bug5-2026-05-07.md`):
 - Sender: netcat-over-WiFi, 4 MB/s cap, 2.4 GHz, AVM AP, single-STA
 - Receiver: ohm (PineTab2, RK3566 + BES2600WM-SDIO)
 - N=3 baseline reps: 725 / 663 / 75 KB/s (rep 3 saw link-death at ~9 min)
 - `perf record -g` during 4MB/s window: `_raw_spin_unlock_irqrestore` ≈ 20% CPU
 - ftrace lock-instrumentation, system-wide: `workqueue_execute_start` ≈ 5,643/sec
 - Driver-side count: `wsm_cmd_send` 13/sec — wsm command path is *not* the dispatch source; the contributor is the per-SDIO-transaction relay.
 Root cause traced in PR #7 (Sonnet review) and concurred in PR #8 (Opus review): RX path adds two synchronization points per frame and one wait-queue wake-up per IRQ batch via `sdio_rx_work` → `rx_queue` → `bh_work` indirection.
 ## §1 Goal (locked)
 Reduce per-RX-frame overhead enough that observed receive ≥ 1.0 MB/s sustained @ 4 MB/s sender, with `_raw_spin_unlock_irqrestore` < 15 % CPU during the 4 MB/s window. No 30-min cascade to link-death.
 (This is a partial step toward Phase 1's full target of ≥ 2 MB/s sustained @ 4 MB/s with < 10 % lock CPU. The full target is jointly addressed by Patch C + Patch C2; Patch C alone should *cross half the gap*.)
 ## §2 Situation
 - `bes2600.ko` srcversion `1B3B3ED0…` deployed on ohm (c-stack + Patch A + Patch B).
 - `cleanups` branch on `marfrit/bes2600-dkms` is the current source-of-truth.
 - Build sandbox `/var/tmp/c6-sandbox/` on ohm, native `make -j4`.
 - `BES2600_RX_IN_BH` is **defined** in Makefile — `bes2600_bh_rx_helper` is the active RX consumer.
 - ohm reachable. Markus pushes the reboot button; never me.
 - Test rig under `/root/bes2600-samples/` — `rep-trace.sh` per-rep capture script.
 ## §3 Baseline measurements
 Reused from Bug #5 Phase 0 (above). No re-anchor needed for Patch C — same regime.
 **Specific Phase-3-units that this plan's predictions reference:**
 | metric | tool | current value (4MB/s window) |
 |---|---|---|
 | observed receive throughput | netcat receiver byte-count | 75–725 KB/s, rep-variance high |
 | `_raw_spin_unlock_irqrestore` CPU% | perf record / report | ~20% |
 | `workqueue_execute_start`/sec | ftrace `workqueue:workqueue_execute_start` | ~5,643/sec system-wide |
 | `bes_sdio` workqueue dispatches | `cat /sys/kernel/tracing/events/workqueue/.../filter` filtered by `bes_sdio` | not measured pre-patch — **TODO before Phase 6** |
 | RX SKB rate at mac80211 boundary | trace `mac80211:drv_rx_irqsafe` count | not measured pre-patch — **TODO before Phase 6** |
 Phase 6 must not start until the two TODOs above are filled in — otherwise Phase 7 has no reference point for the predicted-delta comparison.
 ## §4 Plan
 ### §4.1 What will be touched
 - `bes2600_sdio.c::sdio_rx_work` — the relay loop. After this patch, it still drains the SDIO bus into SKBs but **delivers SKBs directly into `wsm_handle_rx`** instead of `skb_queue_tail`-ing them onto `self->rx_queue`.
 - `bes2600_sdio.c::bes2600_sdio_extract_packets` — the inner per-SKB extractor. Changes the in-loop action from `skb_queue_tail(&self->rx_queue, skb)` to a direct call (or callback) into the wsm dispatcher.
 - `bes2600_sdio.c::bes2600_sdio_pipe_read` — becomes unused, removed.
 - `bh.c::bes2600_bh_rx_helper` — its `BES_SDIO_RX_MULTIPLE_ENABLE` branch is no longer reachable for RX (RX path no longer feeds bh). Either gate the helper, or remove the helper outright if `bh_rx` atomic is no longer raised on RX.
 ### §4.2 What will NOT be touched
 - `ieee80211_rx_irqsafe()` call sites — that's Patch C2 (item 2).
 - TX path — `sdio_tx_work`, `bes2600_bh_tx_helper`, etc. Untouched.
 - `sdio_wq` workqueue alloc — stays. After patch it hosts only `tx_work` + `scan_work` + (briefly during patch) `rx_work`. Renaming is cosmetic and out of scope.
 - The bh thread itself — still runs, still handles TX, still watches the timeouts.
 - `bh.c` `#if 0` graveyard — separate hygiene patch, not bundled.
 - `__bes2600_irq_enable(1)` commented-out / `asm volatile("nop")` placeholder — **deferred** per `feedback_dont_patch_downstream_artifacts`. These are symptom-shaped; Patch C may dissolve them. Re-evaluate at Task #24 (post-Patch-E observation).
 - `bh_rx` / `bh_tx` atomic split — out of scope.
 ### §4.3 Approach choice — Option A (sdio_rx_work direct delivery)
 Two structural options surveyed in PR #8 §2.1; recap:
 | | Option A: direct delivery from sdio_rx_work | Option B: subsume sdio_rx_work into bh thread |
 |---|---|---|
 | diff size | small | medium |
 | eliminates `rx_queue->lock` × 2 per frame | yes | yes |
 | eliminates `sdio_wq.rx_work` workqueue dispatch per IRQ | no | yes |
 | changes who calls `wsm_handle_rx` | sdio_wq context (already process context) | bh thread |
 | TX/RX SDIO bus contention | unchanged (sdio_rx_work and sdio_tx_work already share `bes2600_sdio_lock`) | adds bh ↔ sdio_tx_work contention on the SDIO mutex |
 | bisection isolation | clean: only the rx_queue handoff is removed | mixes "remove handoff" with "subsume thread" |
 **Choosing Option A.** Reasons:
 1. Smaller diff = clearer Phase-7 attribution. If RX KB/s rises, we know it was the rx_queue handoff, not the workqueue topology.
 2. Per Markus's PR #8 review: split was for bisection clarity. Option A is narrower than Option B.
 3. The remaining cost (per-IRQ `sdio_wq.rx_work` dispatch) is ≤ 1 dispatch per IRQ batch; multi-RX coalescing means several frames per dispatch. If Phase 7 of Patch C shows that dispatch IS the residual cost, that becomes a concrete data point and motivates a *measured* Option-B follow-up, not a speculative one.
 ### §4.4 Implementation sketch (preview — actual code in Phase 6)
 **Today** (`bes2600_sdio.c:783–831`):
 ```c
 static int bes2600_sdio_extract_packets(...) {
    for each packet:
        skb = dev_alloc_skb(...);
        memcpy(skb->data, &data[pos], packet_len);
        spin_lock(&self->rx_queue_lock);
        skb_queue_tail(&self->rx_queue, skb);    // ← handoff
        spin_unlock(&self->rx_queue_lock);
 }
 static void sdio_rx_work(...) {
    bes2600_sdio_extract_packets(...);
    self->irq_handler(self->irq_priv);           // ← wakes bh_wq
 }
 // bh thread later: pipe_read = skb_dequeue(rx_queue) → wsm_handle_rx(skb)
 ```
 **After patch** (sketch):
 ```c
 static int bes2600_sdio_extract_packets(struct sbus_priv *self, u32 ctrl_reg, u8 *data) {
    for each packet:
        skb = dev_alloc_skb(...);
        memcpy(skb->data, &data[pos], packet_len);
        ret = wsm_handle_rx(self->core, wsm_id_from(skb), wsm_hdr_of(skb), &skb);
        if (skb) dev_kfree_skb(skb);
        // no rx_queue, no spinlock, no wake-up
 }
 static void sdio_rx_work(...) {
    bes2600_sdio_extract_packets(...);
    // self->irq_handler(...) is no longer called for RX-only wakes
    // (it remains called for TX-confirm-completion paths, if any)
 }
 ```
 Caveats discovered during sketch:
 - `wsm_handle_rx`'s signature wants `(hw_priv, id, wsm_hdr*, **skb)`. `extract_packets` doesn't currently parse the wsm header — we either parse it inline (cheap; the cost is one `__le16_to_cpu`) or defer parsing into a new `bes2600_sdio_deliver_rx(skb)` helper that wraps it.
 - `hw_priv` is reachable as `self->core`.
 - Need to verify `wsm_handle_rx` is callable from sdio_wq context. **Hypothesis:** yes, because today's bh thread is also process-context-via-workqueue and that's where wsm_handle_rx already runs. Phase 6 contract-cite from `wsm.h` / call-graph confirms.
 - The `irq_handler(self->irq_priv)` wakeup at sdio_rx_work:902 — keep it, but confirm whether bh actually has remaining work after RX is gone. Possibilities: TX-confirm completions (`wsm_release_tx_buffer`) still need a bh wake. Verify in Phase 6.
 ### §4.5 Predicted delta (Phase 3 units)
 Conservative because Patch C is item 1 only, not items 1+2.
 | metric | predicted change | confidence |
 |---|---|---|
 | `rx_queue->lock` acquire/release rate | → 0 (lock is removed entirely; struct field deleted) | high |
 | RX-path wait-queue wakes (`bh_wq` from sdio_rx_work for RX) | → 0 (TX-confirm wakes remain) | high |
 | `_raw_spin_unlock_irqrestore` CPU% | 20 % → 12–15 % | **medium** — the rx_queue lock is one of several contributors; I don't have per-lock breakdown pre-patch |
 | `workqueue_execute_start`/sec | marginal change (≤ 5 %) | high — sdio_wq dispatch still happens per IRQ |
 | observed receive @ 4 MB/s | floor lifts from 75 KB/s → ≥ 1.0 MB/s; rep-variance shrinks | **medium** — rep 3's link death has multiple causes (decrypt-storm path is Patch A's territory; AP-side `aid 30` rejection is also possible) |
 | Phase 7 N=3 outcome | all reps ≥ 1 MB/s sustained for 30 min @ 4 MB/s | **medium** |
 **Honest acknowledgement:** the medium-confidence predictions are the ones where Phase 7 either confirms the model or surfaces a new bug. If `_raw_spin_unlock_irqrestore` only drops to 18 %, the next-largest contributor was something else — `pool->lock` (workqueue infrastructure) or `ba_lock` — and Patch D/E/C2 become the answer.
 ### §4.6 Risks
 1. **`wsm_handle_rx` not callable from sdio_wq**: low probability (process context, same shape as today's bh), but a cite-failure here means revert to Option B. **Phase 6 must produce a `wsm.h` contract citation** before code lands.
 2. **TX-confirm wake-ups stop firing**: if `wsm_handle_rx` was the only thing that ultimately bumped `bh_tx`, removing it from bh's input causes TX-confirm starvation. Mitigation: keep `irq_handler(irq_priv)` call in sdio_rx_work for now; let the bh's wait_event re-evaluate `bh_tx` on every wake. **Verify in Phase 6 that `wsm_release_tx_buffer` still wakes bh.**
 3. **SKB allocation under memory pressure**: `dev_alloc_skb` in extract_packets currently `msleep(100)` retries up to 10×. Calling `wsm_handle_rx` directly from extract_packets keeps us in sdio_wq context during sleep; that's the same as today, so no new risk.
 4. **rcu / locking invariants in `wsm_handle_rx`**: it traverses `priv->vif_list`, may grab `priv->vif_lock`. Currently called from bh thread. After patch: called from sdio_wq context. Both are process context, both can sleep. No new risk *unless* there's a held lock at sdio_wq level that wsm_handle_rx tries to re-acquire. **Phase 6 lock-graph audit required.**
 5. **`bes2600_chrdev_is_bus_error()` early-return**: currently checked in `pipe_read`. After patch, must move into `extract_packets` or `sdio_rx_work` so RX during a bus-error window still gets dropped, not passed to mac80211.
 6. **Multi-vif RX serialization**: the `rx_queue` is per-sbus_priv, not per-vif. After patch, multi-vif demux happens inside `wsm_handle_rx` (same as today). No new risk; same ceiling.
 ### §4.7 Phase 5 review handover
 Goal/Situation/Measurements/Plan paste verbatim into DokuWiki when Markus initiates handover. **Do not curate** the plan for the reviewer — including the "medium-confidence" predictions and the §4.6 risk list verbatim. Reviewer should see the same uncertainty I have.
 ### §4.8 Phase 7 protocol (after Phase 6 lands)
 Per `feedback_phase7_stress_ramp.md` — **stress ramp, not steady cap**:
 1. Pre-patch baseline (re-anchor): 5 min @ 1 MB/s, 10 min @ 2 MB/s, 30 min @ 4 MB/s. Capture ftrace `workqueue/`, `lock/`, `mac80211/`, `mmc/`. perf record during the 4 MB/s window.
 2. Apply Patch C, install, reboot (Markus pushes).
 3. Post-patch: identical ramp, identical instrumentation.
 4. Compute deltas in **the same units** as §3 baseline. Compare to §4.5 predictions. Any unexplained delta is a finding, not a footnote — log it and loop back to Phase 4 if the model is wrong.
 5. **N=3 reps** post-patch. The user's stress-ramp memory and the receipts checklist both require this.
 6. Capture `sdio_work_debug` output and `dmesg` if any storm fires (Patch A's counter should hold steady).
 7. If Phase 7 numbers match prediction → Phase 8 memory update + proceed to Patch C2.
 8. If they don't match → loop back to Phase 4. Don't paper-fix.
 ## §5 Out-of-scope items recorded for follow-on patches
 - **Patch C2**: items 2 — `ieee80211_rx_list` batch delivery. Gated on Task #19 kerneldoc verification.
 - **Patch D**: ba_lock atomicization at `txrx.c:998-1005, 1632`. Independent.
 - **Patch E**: ps_state_lock skip when `pm_unsupported = true` at `txrx.c:1942-1948`. Independent, gated on c7 latch.
 - **Task #24**: post-Patch-E observation of bh.c `asm volatile("nop")`, commented-out `__bes2600_irq_enable(1)`, BUG_ON in steady-state hot path. Symptom-shaped; observe before patching.
 - **Task #25**: measure `sw_mci_check_r1_ready` on RK3566 during testing.
 ---
 *Plan written 2026-05-07 by Claude (noether). Awaiting Phase 5 second-model review on DokuWiki, initiated by Markus.*
@@ -0,0 +1,136 @@
 # Patch C v2 — Phase 4 Plan: atomic_t prep + direct-deliver
 **Author:** Claude (noether)
 **Status:** Phase 4 v2 — Phase 7 of Patch C (notes/patch-c-phase4-plan-2026-05-07.md, PR #9 merged) failed with a thread-safety race; this is the redesign.
 **Decision:** Option B from PR #3 close-out comment — `atomic_t` prep refactor first, direct-deliver on top.
 ---
 ## §0 What just happened (Phase 7 of Patch C)
 Reproduced verbatim from boot -1 of ohm 2026-05-07 20:18:10 CEST, ~13 s into a 4 MB/s nc stress:
 ```
 WARNING: at wsm_release_tx_buffer+0x84/0xa0 [bes2600], CPU#0: kworker/0:3H/3912
 Workqueue: bes_sdio sdio_rx_work [bes2600]
 pc : wsm_release_tx_buffer+0x84/0xa0 [bes2600]
 lr : bes2600_bh_handle_rx_skb+0x134/0x370 [bes2600]
 sdio_rx_work+0x2a8/0x540 [bes2600]
 bes2600_wlan: wsm_release_tx_buffer failed: -1
 ```
 Storm continued; chip wedged; ohm fell off the WiFi (wlan0).  Patch C module preserved at `/var/tmp/bes2600.patchC-broken.ko` for forensics.  Patch B rolled back, currently on disk on ohm.  Lesson saved as `feedback_phase6_contract_threadsafety` memory.
 ## §1 Why it failed
 `wsm_release_tx_buffer()` (bh.c:222–243) does **unlocked** read–modify–write on `hw_priv->hw_bufs_used`.  Pre-Patch-C invariant was single-writer = BH thread; the lock that mattered was structural, not annotated.  Patch C's direct-deliver moved one writer (RX-confirm decrement) into `sdio_rx_work` workqueue context.  BH thread + sdio_rx_work race on the int counter; underflow below zero, WARN, return -1, bookkeeping corrupt, TX wedges.
 Phase 6 contract block correctly cited `wsm_handle_rx`'s sleepability and held-lock invariants — but stopped at the called function's signature.  It did not enumerate `hw_bufs_used` as shared state mutated by the callee.  That's the gap.
 ## §2 Shared-state delta table (the thing missing from Patch C)
 Every field that `bes2600_bh_handle_rx_skb` mutates either directly or transitively, with current protection and required action:
 | field | declared at | written by (today) | written by (after Patch C v2) | current protection | action needed |
 |---|---|---|---|---|---|
 | `hw_priv->hw_bufs_used` | bes2600.h | `wsm_alloc_tx_buffer` (bh thread, TX submit), `wsm_release_tx_buffer` (bh thread, RX confirm), `main.c:543` (init) | + `wsm_release_tx_buffer` from sdio_rx_work | single-writer = BH thread (structural) | **convert to `atomic_t`** |
 | `hw_priv->hw_bufs_used_vif[i]` | bes2600.h | `wsm_release_vif_tx_buffer` (bh thread), `bh.c:1271` (vif TX submit), init | + `wsm_release_vif_tx_buffer` from sdio_rx_work | single-writer = BH thread | **convert to `atomic_t [N]`** |
 | `hw_priv->wsm_rx_seq[i]` | bes2600.h | bh thread RX | sdio_rx_work only | single-writer = BH/sdio_rx context (was BH, now is sdio_rx_work, but still **one writer**) | OK — single writer |
 | `hw_priv->wsm_tx_pending[i]` | bes2600.h | `bes2600_bh_inc_pending_count` (TX submit, BH thread), `bes2600_bh_dec_pending_count` (RX confirm) | dec moves to sdio_rx_work; inc stays BH | single-writer = BH | **also needs `atomic_t`** |
 | `hw_priv->lmac_mon_timer` / `mcu_mon_timer` | bes2600.h | mod_timer / del_timer_sync from BH | ditto from sdio_rx_work | timer API is internally locked | OK — `mod_timer` is concurrency-safe |
 | `hw_priv->wsm_cmd.lock` (taken inside wsm_handle_rx) | wsm_buf | bh thread (today) | sdio_rx_work | spinlock | OK — already protected |
 | `hw_priv->vif_lock` (taken inside wsm_handle_rx for some paths) | per vif | bh thread today | sdio_rx_work | spinlock | OK |
 | `priv->bh_evt_wq` wake-up | bes2600.h | wsm_release_tx_buffer when count hits 0 | ditto from sdio_rx_work | wake_up is concurrency-safe | OK |
 | `bes2600_pwr_clear_busy_event` (called inside release) | bes_pwr | bh thread | sdio_rx_work | internal locking via `bes_power.lock` | OK |
 | `hw_priv->buf_released` | bes2600.h | only `wsm_release_buffer_to_fw` (MCAST_FWDING ifdef, AP-only) | unchanged — BH only | single-writer = BH | OK — not on Patch C v2 hot path |
 **Three fields require atomic_t conversion:** `hw_bufs_used`, `hw_bufs_used_vif[]`, `wsm_tx_pending[]`.  Everything else is already concurrency-safe or moves cleanly to single-writer-in-sdio_rx_work.
 ## §3 Read-site survey (the rest of the work — atomic_read swaps)
 `grep -hE "hw_bufs_used\b|hw_bufs_used_vif\b" *.c *.h | wc -l` = **57 references** across the source tree:
 - 5 writers (above)
 - 52 readers — converted mechanically to `atomic_read()`.  Distribution:
  - `bh.c`: 22 read sites (most in the bh main loop, BUG_ON gates, idle / suspend predicates)
  - `sta.c`: 3 sites (PM idle check at sta.c:1231–1253)
  - `bes2600_sdio.c`: 1 site (PM idle check at line 958)
  - `main.c`: 2 sites (init zero, teardown wait)
  - `debug.c`: 1 site (debugfs stats)
  - `itp.c`: 1 site (test mode)
 `wsm_tx_pending[i]` site count is smaller — ~6 references, all in bh.c and the timer monitors.  Same mechanical conversion.
 ## §4 Plan v2 — two-step
 **Patch C-prep** (NFC, lands first):
 - Convert `hw_bufs_used` from `int` → `atomic_t`.
 - Convert `hw_bufs_used_vif[CW12XX_MAX_VIFS]` from `int[]` → `atomic_t[]`.
 - Convert `wsm_tx_pending[2]` from `int[]` → `atomic_t[]`.
 - Update writers:
  - `wsm_alloc_tx_buffer`: `atomic_inc(&hw_priv->hw_bufs_used)`.
  - `wsm_release_tx_buffer`: rewrite with `atomic_fetch_sub_release(count, &hw_priv->hw_bufs_used)` — returns prior value.  Re-derive the "tx restart" predicate (`prior >= numInpChBufs - 1`) and the "wake bh_evt_wq + clear busy" predicate (`prior - count == 0`) from that.  WARN if `prior - count < 0`.
  - `wsm_release_vif_tx_buffer`: same pattern on the array element.
  - `bes2600_bh_inc/dec_pending_count`: use `atomic_inc` and `atomic_dec_return` (need post-decrement value to decide whether to del_timer).
 - Update all 52+6 read sites: mechanical `atomic_read()` swap.
 - `main.c:543` init: `atomic_set(&hw_priv->hw_bufs_used_vif[i], 0)`.
 **Patch C-prep does NOT change behaviour.**  Same atomic ordering (`_release` / `_acquire` chosen to match the implicit memory ordering the BH-only path had).  Phase 7 of C-prep alone should show **identical** numbers to pre-patch baseline (`run-20260507-patchC-preflight`): 1.36 MB/s, 86.4 sdio_rx_work/sec, 90.3 dispatches per 1000 RX pkts, 0 bh_work redispatches.  If Phase 7 of C-prep shows a delta, the atomic ordering is wrong and we loop back here, not to C v2.
 **Patch C v2** (the actual structural change, lands on top of C-prep):
 - Identical to Patch C as merged in PR #3 (since closed): direct-deliver from `bes2600_sdio_extract_packets` into `bes2600_bh_handle_rx_skb`, no `rx_queue` indirection, no bh wake-up for RX.
 - The contract block in `bh.c::bes2600_bh_handle_rx_skb` is **expanded** to include the shared-state delta table from §2 of this plan, with explicit citations.
 - Same minimum-diff scope as Patch C: keep `rx_queue`, `pipe_read`, `bh_rx_helper` for clean bisection; remove in a follow-up hygiene patch.
 ## §5 What will NOT be touched (deferred or out of scope)
 - mac80211-side `ieee80211_rx_irqsafe` → `ieee80211_rx_list` migration: that's Patch C2, gated on Task #19 kerneldoc verification.
 - The `#if 0` graveyard in bh.c, the `asm volatile("nop")` placeholder, the BUG_ON in steady-state hot path: still symptom-shaped per `feedback_dont_patch_downstream_artifacts`.  Re-evaluate at Task #24 after C v2 / D / E land.
 - `ba_lock` (Patch D) and `ps_state_lock` (Patch E): independent.
 ## §6 Risk list (per Phase 6 contract-thread-safety memory)
 1. **C-prep memory ordering**: I've chosen `atomic_fetch_sub_release` for `wsm_release_tx_buffer` to mirror the implicit BH-thread ordering (release before subsequent atomic ops on `bh_evt_wq` / `bes_power`).  If the BH thread or other readers expect `_acquire` semantics on the value, we get reordering bugs that are hard to reproduce.  **Mitigation:** pair with `_acquire` reads where the read-then-decision pattern is critical (e.g., the bh main loop's `if (!hw_priv->hw_bufs_used)` idle predicate).  Cite the kerneldoc reference for `atomic_fetch_sub_release` in the commit message.
 2. **`wsm_tx_pending[]` decrement-side timer interaction**: `bes2600_bh_dec_pending_count` does `if (--hw_priv->wsm_tx_pending[idx] == 0) del_timer_sync(timer); else mod_timer(timer, ...)`.  After atomic_t conversion: `if (atomic_dec_return(&hw_priv->wsm_tx_pending[idx]) == 0) ...`.  But *another* thread could `atomic_inc` between our dec and the timer call, racing the del_timer.  `del_timer_sync` is internally safe (it can be called concurrently with `mod_timer`), but the **decision** "whether to delete vs mod" is racy.  **Mitigation:** even after atomic conversion, this function still needs to be called from a single context.  Verify `inc/dec_pending_count` callers — if both sides only fire from BH and sdio_rx_work and never overlap on the same idx, we're fine; if not, this needs a lock.
 3. **`hw_bufs_used_vif[]` array vs `wsm_alloc_tx_buffer`**: vif counter increment lives at bh.c:1271, called from bh thread TX-submit path.  Decrement (`wsm_release_vif_tx_buffer`) called from RX-confirm.  After Patch C v2 the decrement is in sdio_rx_work — same race shape as the global counter.  Already covered by the atomic_t array conversion.
 4. **PM idle predicate at sta.c:1239**: reads `hw_priv->hw_bufs_used_vif[priv->if_id]` to decide can-sleep.  Currently racy (was already reading BH-mutated state from a non-BH PM context).  Atomic conversion makes the read coherent.  PM context's read-then-decide is still fundamentally a snapshot — no change in semantics, just no torn-read.
 5. **Reboot / module-unload teardown** (`main.c:840`): `wait_event_timeout(... !hw_priv->hw_bufs_used ...)`.  Becomes `... !atomic_read(...)`.  No semantic change — the wait_event macro re-evaluates the predicate on each wake.
 6. **Phase 7 rig: Patch C v2 still wedges chip if I missed anything**: now mitigated by ohm's new wired interface (enu1, 192.168.88.80) — survives bes2600 wedges, lets us collect dmesg / ftrace / journalctl from a wedged ohm without reboot.  See `reference_ohm_wired_iface` memory.
 ## §7 Phase 5 review handover
 PR on git.reauktion.de/marfrit/besser, this file as the artifact (per `feedback_phase5_surface_is_pr`).  Specifically request reviewer focus on §2 shared-state delta table — that's the part that should have caught Patch C's bug.  Don't curate.
 ## §8 Phase 6 implementation order
 1. Branch off `cleanups` on bes2600-dkms-mobian: `bes2600/atomic-tx-buf-counters` (= Patch C-prep).
 2. Mechanical refactor: `int hw_bufs_used` → `atomic_t hw_bufs_used`, all reads → `atomic_read`, all writes → atomic ops.  Same for vif array and tx_pending array.  No other changes.
 3. Build, install, smoke-test.  Phase 7 of C-prep.  Should be a no-op delta.
 4. PR + Phase 5 review + merge.
 5. Branch off C-prep: `bes2600/sdio-rx-direct-deliver-v2` (= Patch C v2).
 6. Re-apply the Patch C delta (3 files: bh.h, bh.c, bes2600_sdio.c — same edits as PR #3).
 7. Build, install, Phase 7 N=3 stress ramp.
 8. PR + Phase 5 review + merge.
 ## §9 Phase 7 v2 protocol (per `feedback_phase7_stress_ramp` + wired-rig)
 1. Pre-C-prep baseline rep N=3 (re-anchor, since current N=1 baseline is from `run-20260507-patchC-preflight`).
 2. Apply C-prep, N=3.  Compare to pre.  Expect: zero meaningful delta.  If non-zero → memory-ordering bug, loop back to §4 atomic-ordering choice.
 3. Apply C v2, N=3.  Compare to C-prep baseline.  Expect: §4.5 of original Patch C plan's predicted delta (rx_queue lock acquires → 0, observed RX KB/s lifts toward ≥1 MB/s sustained @ 4MB/s).
 4. **All Phase 7 stress runs use the wired path (`ssh mfritsche@192.168.88.80`) for telemetry collection.**  When the chip wedges (it shouldn't this time, but planning for it), wlan0 stops responding but enu1 stays alive.  Collect dmesg / ftrace / journalctl over enu1 BEFORE rebooting.  This is the data we lost in Patch C boot -1 because wlan0 was the only path.
 5. N=3 reps per phase per `feedback_phase7_stress_ramp`.  Don't accept N=1 as verification.
 ## §10 Closeout
 If C-prep + C v2 both pass Phase 7: proceed to D (ba_lock atomicization), E (ps_state_lock skip).  Markus's "we're not on the clock" applies — sequencing per bisection clarity, not delivery deadline.
 ---
 *Plan written 2026-05-07 by Claude (noether), in response to Patch C Phase 7 failure.  Phase 5 review = PR comments on this artifact at git.reauktion.de/marfrit/besser.  Don't curate the shared-state delta table for the reviewer — that's the part the previous round's reviewer should have caught me on.*
@@ -0,0 +1,135 @@
 # Bug #5 campaign — Phase 1 metric + Phase 2 situation
 Date assembled: 2026-05-07
 Module under test (baseline): bes2600.ko srcversion `1B3B3ED096AAD7217FEDE11`
                              (cleanups + Patch A + Patch B)
 Phase 0 anchor at N=3 reps (10 min each, 4 MB/s pv-cap on boltzmann → ohm
 2.4GHz 5b:32) reproduces the throughput regression and traces it to lock-
 cycling cost in the bes2600 BH path. See `notes/observed-bugs.md` Bug #5
 for the original report. This document locks Phase 1 and prepares Phase 4
 candidates.
 ---
 ## Phase 1 — measurable target (locked)
 > *Reduce the bes2600 BH path's spin-unlock-irqrestore cost so that under sustained 4 MB/s sender pressure on a healthy 2.4GHz link (signal -55 to -65 dBm), ohm sustains ≥ 2 MB/s observed RX throughput (vs 663–725 KB/s baseline at N=3) AND the link survives ≥ 30 min continuous load without cascading into beacon-loss disconnect (vs rep3's failure at ~9 min).*
 Three measurable outcomes, single sentence:
 - **(a) Throughput floor**: ≥ 2 MB/s sustained RX at ohm
 - **(b) Lock-cycle ceiling**: % CPU in `_raw_spin_unlock_irqrestore` from `bes2600_bh.isra.0` callstack drops to < 10 % (currently ~10 % rep1, ~16 % rep3)
 - **(c) Cascade prevention**: no link death under continuous 30 min @ 4 MB/s
 ---
 ## Phase 0 anchor — receipts
 ### Reproduction protocol (same units as Phase 7 will use)
 1. boltzmann: `pv -L 4M -q < /dev/zero | nc ohm.fritz.box 12345`
 2. ohm: `sudo $RUN/rep-trace.sh 600` (10 min capture window)
 3. Rep dirs: `bug5/rep-<ts>/{mmc.log, perf.data, rx_bytes.tsv, start.txt, end.txt}`
 4. N=3 reps with 60 s cooldowns
 ### Observed (2026-05-07 15:36–16:08)
 | rep | duration | avg KB/s | near-zero ticks | end state |
 |---|---|---|---|---|
 | 1 | 600 s | 725 | 1/119 | associated, MCS 6 |
 | 2 | 600 s | 663 | 5/119 | associated, MCS 4 |
 | 3 | 600 s |  75 | 53/119 | **passive (link died at sample ~82, ~9 min in)** |
 mmc transaction rate: rep1 = 5793/s sustained, rep3 = 6000/s for first ~10s then collapse to <100/s.
 ### Hot-symbol receipts (perf record on `bes_sdio | bes2600_bh` kworkers)
 | symbol | rep 1 (healthy) | rep 3 (cascade) |
 |---|---|---|
 | `_raw_spin_unlock_irqrestore` (sum across kworker variants) | **~19 %** | **~21 %** |
 | `handle_softirqs` | 5.4 % | 4.3 % |
 | `__tasklet_schedule` | 2.4 % | 2.0 % |
 | `dw_mci_start_command` (SDIO host) | 1.5 % | < 0.6 % |
 | `bes2600_sw_retry_requeue` | 0.79 % | 0.70 % |
 Top callchain leading to `_raw_spin_unlock_irqrestore`:
 ```
 process_one_work → worker_thread → wsm_configuration → wsm_cmd_send → bes2600_bh.isra.0
 ```
 ---
 ## Phase 2 — situation analysis
 ### Relevant source pins
 - `bes2600/wsm.c:98` — `wsm_cmd_send()`, the function in the hot callstack. Body:
  - holds `&hw_priv->wsm_cmd.lock` (spin) only briefly to fill the cmd struct (lines ~145–152)
  - calls `bes2600_bh_wakeup()` then `wait_event_interruptible_timeout` for the response
  - outer lock: `down(&hw_priv->wsm_cmd_sema)` from callers (`wsm_cmd_lock()` at wsm.c:105)
 - `bes2600/bh.c:435,559,847` — `bes2600_bh_work()` takes `&hw_priv->vif_list_lock` 3× per pass through the dispatcher
 - `bes2600/bh.c:172,195,487,581,861,1361,1427` — multiple `wait_event*` calls; the BH thread cycles through wait/wake/dispatch
 ### What the lock-cycling cost is buying
 Each WSM command from the host (mac80211, NM, kernel scan etc.) goes through the same path:
 1. caller acquires `wsm_cmd_sema` (outer)
 2. `wsm_cmd_send()` acquires `wsm_cmd.lock`, fills the struct, releases the lock
 3. `bes2600_bh_wakeup()` schedules the BH
 4. BH dispatches the command, takes `vif_list_lock` to walk vifs
 5. BH talks to chip via SDIO
 6. response arrives, BH wakes the waiter
 7. caller releases `wsm_cmd_sema`
 Under sustained TCP RX, mac80211 issues lots of small WSM commands (TX-scheduling hints, rate updates, etc.) — every one cycles through this path. The spin-unlock-irqrestore cost is the floor on this cycle rate.
 ### What's been ruled out
 - AP-side bug (AVM Fritz!Box, reliable per Markus's testimony — see `feedback_phase7_stress_ramp.md` reasoning in the campaign-so-far prior).
 - Patch A / Patch B (target different triggers; would not address lock cost).
 - Decrypt-failure storm (Patch A handles; this regression occurs in rep1 with zero decrypt-fails).
 - mac80211 scan-fail / scan-comeback (cosmetic; doesn't account for the throughput floor).
 ---
 ## Phase 4 — candidate plans (preliminary, not locked)
 Three candidates surfaced from the perf data. Listed cheapest to most-invasive.
 ### B5-1 — Reduce `wsm_cmd_send` lock scope
 The spin_lock around the cmd-struct fill (wsm.c:145–152) can probably be replaced with `WRITE_ONCE` of a single sentinel field, since the BH thread reads these fields cooperatively (BH only reads after `bh_wakeup` schedules it, and only writes back via the response path). Eliminates the per-command spin-lock cycle for the host side.
 **Risk**: race with BH if the protocol isn't strictly happens-before. Need to read bh.c:486-500 (where BH reads wsm_cmd.ptr) carefully.
 **Predicted delta**: small but measurable. Maybe 2–3 % CPU off the lock floor.
 ### B5-2 — Coalesce `vif_list_lock` in BH dispatcher
 bh.c takes `vif_list_lock` 3× per dispatcher pass. If these 3 critical sections are within a single iteration, they should be merged into one acquire/release.
 **Risk**: vif teardown might depend on releasing the lock between iterations to allow concurrent vif removal. Needs careful audit.
 **Predicted delta**: significant under multi-vif (we're single-vif STA today, so probably small immediate gain).
 ### B5-3 — Move WSM send out of process context, use ringbuffer
 Replace the wsm_cmd_sema + wsm_cmd struct mechanism with an SPSC ringbuffer between caller and BH. Caller writes to ring, no lock needed (one producer); BH reads from ring, no lock needed (one consumer).
 **Risk**: significant rework. cw1200 ancestor doesn't have this; we'd be inventing it.
 **Predicted delta**: large — could halve the lock cost. But cost-to-implement is also large.
 ---
 ## Open question for the reviewer
 Which Phase 4 candidate to lock? My ranking by ROI:
 1. B5-1 (smallest, fastest, cleanest source pin) — try first
 2. B5-2 (medium, conditional on multi-vif applicability)
 3. B5-3 (largest rework, biggest potential)
 Or: instrument deeper before committing to a fix (e.g., add `tracepoints around wsm_cmd_send` enter/exit to measure lock holdtime distribution, not just CPU%).
@@ -0,0 +1,153 @@
 # BES2600 WiFi-stability campaign — Phase 4 plan (Patch B / Trigger A)
 Date assembled: 2026-05-07
 Run dir: /root/bes2600-samples/run-20260506-2113-patchA/ on ohm
 Module: bes2600.ko srcversion 21BD07B3782B144D478CE43 (c-stack + Patch A merged)
 This is the Phase 4 plan for **Patch B (Trigger A: beacon-loss / mac80211 `api_connection_loss` chain)**, drafted after the Phase 7 verification of Patch A on 2026-05-07. Per project CLAUDE.md: paste verbatim, do not curate.
 ---
 ## What changed since the merged Patch-A plan (`notes/phase4-2026-05-06.md`)
 Patch A is **landed (PR #1)** and **active on ohm** (srcversion `21BD07B3`). Phase 7 verification:
 ```
 duration:                     10h30m sustained 1 MB/s load on 2.4GHz (5b:32)
 DecryptStormRecoveries:       0
 Decrypt-fails total:          183  (~1 every 3.5 min — never bursted ≥5/5s)
 api_connection_loss events:   9    ← Trigger A
 unprotected deauth (AP):      7    ← AP-deauth-6 cluster at 02:42:11
 mac80211 reason 4 deauth:     yes  ← inactivity (Trigger A flavor)
 mac80211 reason 2 deauth:     no   ← what Patch A handles
 ```
 **Patch A's predicted delta is unobserved** (no decrypt-storm fired during Phase 7). Patch A is dormant but caused no harm. This Phase 4 pivots to **Trigger A** — the dominant failure path in the overnight rep.
 ---
 ## Phase 1 — revised metric (Trigger-A scope)
 > Per hour of operation: count `mac80211 api_connection_loss` events and the conditional probability that each escalates into a > 5 s reauth blackhole (assoc-comeback timeouts followed by AP unprotected-deauth-6 cluster).
 Observed rate from the Phase 7 rep: 9 events over 10h30m = **0.86/h** under sustained load. Not all of them escalated to a visible blackhole — some apparently recovered cleanly. But the 02:42 cluster (1/9 = 11 % escalation rate in this rep) shows the catastrophic shape.
 ---
 ## Phase 0 / 3 receipt — the 02:42 chain (verbatim from iw-event)
 ```
 02:40:32  scan started, full-band
 02:40:34  scan aborted
 02:40:45  del station 5b:32
 02:40:45  kernel: deauth 8a:2e:77:1f:ec:05 → 5b:32 reason 4
          (Disassociated due to inactivity)        ← TRIGGER A
 02:40:45  cfg80211: disconnected (local request) reason 4
 02:40:45  scan started → finished: 2462 2412, "newton"
 02:40:45  new station 61:b0
 02:40:45  AP→ohm: auth status 0: Successful
 02:40:45  AP→ohm: assoc comeback bssid 61:b0 timeout 1000  ← BSS load mgmt
 02:40:47  del station 61:b0
 02:40:47  assoc: timed out
 02:40:47  scan → new station cc:ce:1e:2b:74:17
 02:40:48  auth: timed out
 02:40:49  scan → new station 5b:32 (back to where we started)
 02:40:49  AP→ohm: auth status 0
 02:40:49  AP→ohm: assoc comeback bssid 5b:32 timeout 881
 02:40:51  assoc: timed out
 02:42:11  ── AP-deauth-6 cluster (×7 within 1 ms) from 61:b0 ──
          reason 6: Class 2 frame received from non-authenticated station
 02:42:11  reason 9 = STA_REQ_ASSOC_WITHOUT_AUTH
 ```
 **Net**: ~86 s in reauth-blackhole, recovery via cross-channel fallback. Same shape as Phase 3's 11:03 blackhole (~109 s), but trigger here is **inactivity-deauth → assoc-comeback rejection**, not decrypt-storm.
 ---
 ## Hypothesis on the mechanism
 Three plausible chains for why post-`api_connection_loss` reauth blackholes:
 1. **AP's assoc-comeback timer disrespected.** The AP says "wait 1000 TU before retrying", but mac80211 / wpa_supplicant retries fast. AP keeps deferring; eventually a stale frame triggers the AP's "Class 2 from unauth" reaction.
 2. **Driver state stale across deauth.** After mac80211 fires `ieee80211_connection_loss`, the bes2600 driver's per-vif state (link_id, key state, queues) may not be fully scrubbed. Subsequent reassoc starts with mixed state; AP rejects.
 3. **Chip-level RX wedge.** The chip's RX state machine got stuck during the inactivity period; reauth sends out frames, but RX of AP's responses is lossy. mac80211 perceives timeout when actually frames arrived but were dropped. Hard to verify without monitor mode (which the chip doesn't support concurrent with managed).
 Each hypothesis suggests a different fix surface.
 ---
 ## Phase 4 — Plan candidates
 ### Candidate B-1 — Chip-level reset on api_connection_loss flood
 **What touches:**
 - New work-item on `bes2600_common`: `api_connection_loss_recover_work`.
 - mac80211 → driver `disconnect()` op → bump a sliding-window counter.
 - On threshold (e.g., 3 events within 60 s): schedule the work that calls `bes2600_chrdev_do_bus_reset()` (the existing c5.2 LMAC-wedge recovery path).
 - After bus reset, NM auto-reconnects from a fresh chip state.
 **Why this candidate:** reuses the c5.2 infrastructure already deployed; small surface; if hypothesis 3 (chip wedge) is right, this fixes the root cause. If hypothesis 1 or 2 are right, this is overkill but harmless (a brief reset).
 **API contracts to confirm:**
 - `bes2600_chrdev_do_bus_reset()` re-entrancy and worker-context safety
 - mac80211 ops or callbacks around `ieee80211_connection_loss`/`disconnect`
 - cw1200/cw1260 ancestor for any similar pattern
 **Predicted delta (Phase 7 units):**
 - `api_connection_loss` rate: unchanged (we don't address the trigger)
 - conditional escalation to >5 s blackhole: target ≤ 30 % (need realistic)
 - worst-case recovery: 86 s → < 10 s
 ### Candidate B-2 — Respect assoc-comeback timer
 **What touches:**
 - Possibly NOT in bes2600 — this looks like a mac80211 / wpa_supplicant concern.
 - If the driver does anything assoc-related itself, audit for racing the comeback timer.
 **Status:** out of scope for a bes2600 patch unless the driver is observed sending frames during the comeback window.
 ### Candidate B-3 — Audit and scrub vif state on disconnect
 **What touches:**
 - `bes2600_unjoin_work` — verify link_id, keys, queues all reset
 - Add explicit reset on `ieee80211_disconnect`/`disconnect` ops
 **Status:** speculative without further instrumentation.
 ---
 ## Lock proposal
 **Lock Candidate B-1 first.** It has:
 - the cleanest re-use (c5.2's bus_reset)
 - the smallest patch surface
 - a measurable predicted delta against Phase 7's `api_connection_loss` rate
 If Phase 7-of-B-1 shows the rate unchanged but escalation still high → loop back to B-2/B-3 hypothesis space.
 ---
 ## What will NOT be touched
 - mac80211 / cfg80211 core — host-side STA driver only
 - The c-stack patches (c5.x, c6.x, c7) — independent recovery paths
 - Patch A — stays in place, untouched
 - AP / firmware
 ---
 ## Receipt checklist for Phase 4
 - [x] What will and will not be touched: stated above
 - [x] Predicted delta in Phase 3 units: stated for Candidate B-1
 - [x] Out-of-scope items explicitly listed
 - [x] Risk items: bus_reset has a known multi-function-SDIO consideration (handled by c5.2.1)
 ## Asks of the reviewer
 1. Candidate B-1 (bus_reset on api_connection_loss flood) the right scope, or should we instrument deeper before committing?
 2. Threshold (3 events / 60 s): too aggressive (false-positive bus_resets on transient RF issues) or about right?
 3. Should bus_reset be conditional on ALSO seeing post-deauth assoc-comeback timeouts, to avoid resetting on benign connection_loss events?
 4. Hypothesis 1 (assoc-comeback disrespected) — is this a mac80211/wpa_supplicant bug rather than a bes2600 bug? If yes, we file it elsewhere.
@@ -0,0 +1,96 @@
 # BES2600 WiFi-stability campaign — Phase 7 verdict (Patches A + B)
 Date assembled: 2026-05-07
 Module under test: bes2600.ko srcversion `1B3B3ED096AAD7217FEDE11`
                   (cleanups + Patch A + Patch B)
 Run dir: `/root/bes2600-samples/run-20260507-1248-patchB/` on ohm
 Phase 7 verification window: 2026-05-07 12:48 → ~15:13 CEST (≈ 2 h 25 m)
 of which: ~50 min @ 1 MB/s pv-cap, ~1 h 30 m @ 4 MB/s pv-cap on 2.4 GHz
 newton (5b:32, signal -57 to -67 dBm).
 ---
 ## Result table (vs the Phase 4 predicted delta)
 ### Patch A — decrypt-storm fast-recover (Trigger B)
 | metric | Phase 3 baseline | Phase 4 prediction | Phase 7-of-B observed |
 |---|---|---|---|
 | decrypt-burst rate | 8.18/h | unchanged | 2 bursts in ~22 min once 4MB/s pressure was on |
 | AP-deauth-6 rate following burst | 100 % escalation | ≤ 0.2 × baseline | **0/2 = 0 % escalation** |
 | recovery time given burst | up to 109 s | < 5 s | **~1 s** (×2) |
 **Verdict: predicted delta CONFIRMED at N=2.** CLAUDE.md ideal is N=3; we're directionally locked at 2 reproductions, both behaving as predicted (threshold trip → `[bes2600] decrypt-storm fast-recover: forcing reassoc` log line → mac80211 disassoc → userspace reauth in ≈1 s).
 #### Receipts (verbatim)
 ```
 13:47:56  bes2600_wlan: [bes2600] decrypt-storm fast-recover: forcing reassoc
 13:47:57  wlan0: associated to cc:ce:1e:2b:74:17     (cross-BSSID, 1 s)
 13:49:26  bes2600_wlan: [bes2600] decrypt-storm fast-recover: forcing reassoc
 13:49:27  wlan0: associated to c0:25:06:e6:5b:32     (back home, 1 s)
 ```
 `DecryptStormRecoveries: 2` exposed via debugfs at `/sys/kernel/debug/ieee80211/phy0/bes2600/vif_0/status`.
 ### Patch B — connection-loss-storm bus_reset (Trigger A)
 | metric | Phase 7-of-A observed | Phase 4 prediction | Phase 7-of-B observed |
 |---|---|---|---|
 | api_connection_loss rate | 0.86/h | unchanged | 2 events in ~2 h (≈ 1/h) |
 | ConnectionLossStormRecoveries | n/a | trips on 3-in-60s bursts | **0** |
 | Threshold trip events | n/a | (when burst occurs) | **0** (events spaced 91 s apart) |
 **Verdict: installed but UNTRIGGERED.** The 3-in-60s threshold was never reached (max-cluster observed: 2-in-91s). No false positives, no spurious bus_resets. Predicted delta unobserved — same shape as Patch A's first Phase 7 run.
 The threshold may be too conservative for typical event rates (we'd need a true api_connection_loss flood to trip it). Tuning is a future Phase-1 question if more reproductions accumulate.
 ### Trigger C — AP unprotected-deauth-6 cluster without preceding storm
 ```
 12:53:10.475 → 12:53:11.756  AP fires 17 unprotected-deauth-6 from 5b:32 over 1.3 s
                              (2 mgmt-TX no-ack from our chip in the middle)
 12:53:12.309  kernel: deauthenticating ... reason 2 = PREV_AUTH_NOT_VALID
 12:53:14–15  reauth via 61:b0 → 5b:32, recovery in ~4 s
 ```
 Neither Patch A (zero decrypt-fails preceded) nor Patch B (zero api_connection_loss) fired. Background: AVM Fritz!Boxes (newton) are reliable; the AP correctly classified ohm's frames as Class 2 from non-auth, meaning **bes2600 sent something the AP couldn't authenticate**. New backlog entry: `notes/observed-bugs.md` Bug #5 (RX path under throughput pressure) is the leading hypothesis surface.
 Recovery was fast (4 s) so this isn't a P0 — but a Patch C investigation is warranted when prioritized.
 ---
 ## Bug #5 — RX path degradation under attempted-throughput pressure (NEW)
 ```
 sender 1 MB/s  →  ohm receives 1015 KB/s,  -57 dBm,  RX MCS 4
 sender 4 MB/s  →  ohm receives  563 KB/s,  -67 dBm,  RX MCS 3
 ```
 Higher attempted-throughput on the sender side → LOWER observed throughput at ohm. Signal degraded ~10 dB, MCS dropped a notch. Link-physical max is ~8 MB/s; we're getting ~7 % of that under load.
 **Hypothesis (Markus): driver/firmware locks itself to death under busy reads.** Plausibly the same root-cause as the Phase 0 YouTube DASH chunk-fetch drops (~10 frames per chunk fetch on hardware-decoder playback). Documented as Bug #5 in `notes/observed-bugs.md`.
 ---
 ## Lessons captured for memory (Phase 8 anchor)
 1. **Stress-rate matters for verification.** Patch A's predicted delta only became observable when the netcat cap went 1 → 4 MB/s. The previous Phase 7 (10h30m @ 1 MB/s) saw zero decrypt-storms. Future Phase 7 protocols should plan a stress ramp from steady to near-saturation, not just the steady setting.
 2. **"Untriggered, no harm" is a valid Phase 7 verdict** for installed patches. Patch B fits this exactly. The patch is ready; the trigger pattern just doesn't fire often enough in this RF / load regime to verify the recovery delta. Don't let unobserved verifications block the loop.
 3. **Build infrastructure on `cleanups` not `mobian`.** The Phase 6 attempt to base Patch B on mobian forced a refactor mid-flight; the c-stack lives on cleanups, and re-using c5.2's `bes2600_chrdev_do_bus_reset` requires that. The cleanups branch is the campaign's working trunk.
 4. **AP-side bug is unlikely on AVM hardware.** AVM Fritz!Boxes don't fire spurious deauth-6 storms. When ohm sees AP-deauth-6 unprovoked, the suspect chain is bes2600 sending something the AP can't authenticate. The bias toward "bes2600 is the broken thing" is empirically validated.
 5. **AP-deauth-6 can fire without our local triggers.** Trigger C is a real failure mode neither Patch A nor B addresses. Adding a Phase-1-style metric for "AP-deauth-6 rate without preceding decrypt-storm or api_connection_loss" would surface Trigger C cleanly.
 6. **`pv -L` cap interacts with TCP retransmit recovery.** When the link can't sustain the cap, TCP backs off and pv blocks. Observed throughput is then a **floor on chip RX capacity at that signal level**, not the sender's intent. Useful for chip-load-characterization, but the cap should be set based on observed pull-rate, not on the link's nominal MCS rate.
 ---
 ## Loop status
 - Phase 7: closed.
 - Patch A: confirmed (N=2). Stays in.
 - Patch B: installed, dormant in this regime, no harm. Stays in.
 - Bug #5: backlog, no patch yet. Documented.
 - Trigger C: backlog candidate, no patch yet. Documented.
 Next campaign cycle would be re-anchoring Phase 0 around Bug #5 or Trigger C.
Author	SHA1	Message	Date
claude-noether	0b63ca3c24	notes: Patch C v2 Phase 4 plan — atomic_t prep + direct-deliver Phase 7 of Patch C (PR #9 → bes2600-dkms PR #3 → boot -1 of ohm 20:18:10) failed with a thread-safety race: wsm_release_tx_buffer's unlocked R-M-W on hw_bufs_used races against wsm_alloc_tx_buffer in the bh thread when Patch C moved the RX-confirm decrement into sdio_rx_work. WARN storm at +13s under stress, chip wedges, host off-network. Phase 6 contract analysis cited wsm_handle_rx's sleepability and held-lock invariants but stopped at the function signature. Did not enumerate hw_bufs_used as shared state mutated by the callee. Lesson saved as feedback_phase6_contract_threadsafety memory. Phase 4 v2 designs around that gap. Two-step: 1. Patch C-prep: NFC refactor — convert hw_bufs_used, hw_bufs_used_vif[], wsm_tx_pending[] from int / int[] to atomic_t / atomic_t[]. Use atomic_fetch_sub_release in wsm_release_tx_buffer (returns prior value for the >= numInpChBufs - 1 predicate). Mechanical atomic_read swap at ~58 read sites. Lands first; Phase 7 should show zero delta from baseline. 2. Patch C v2: re-apply the sdio_rx_work direct-deliver on top of C-prep. Identical structural change to the closed PR #3, but now the racing counter is safe. Contract block in bes2600_bh_handle_rx_skb expanded to include the shared-state delta table. Plan §2 is the shared-state delta table — every field bes2600_bh_handle_rx_skb mutates directly or transitively, with current protection and required action. 3 fields need atomic_t, the rest are already concurrency-safe or stay single-writer. Plan §6 lists 6 risks including memory-ordering choices, the inc/dec_pending_count timer-decision race, and the new wired-rig fallback (enu1 192.168.88.80) that survives bes2600 wedges so Phase 7 can capture dmesg / ftrace from a wedged ohm without reboot. PR superseded #3 closed with full verdict comment. Phase B rolled back on ohm at /lib/modules/.../extra/bes2600.ko. Markus's reboot button to land Patch B again before C-prep work begins.	2026-05-07 20:50:39 +02:00
marfrit	4666e03254	Merge pull request 'notes: Patch C Phase 4 plan (item 1 only — collapse sdio_rx_work into BH)' (#9 ) from claude-noether-7 into main Reviewed-on: #9	2026-05-07 17:21:37 +00:00
claude-noether	f232476240	notes: Patch C Phase 4 plan — collapse sdio_rx_work into BH (item 1 only) Per merged PR #8 inline review: items 1 and 2 split, sequential. Patch C is item-1-only (collapse the sdio_rx_work → rx_queue → bh_work indirection). Patch C2 (ieee80211_rx_list batch delivery) is split out and gated on Task #19 kerneldoc contract verification. Approach choice: Option A (sdio_rx_work delivers directly into wsm_handle_rx, removing rx_queue and its two synchronization points per frame) over Option B (subsume into bh thread). Option A has a smaller diff and clearer bisection story; the residual per-IRQ workqueue dispatch is preserved as a measurable Phase 7 data point that motivates or doesn't motivate a follow-on Option-B patch. Predicted delta in Phase 3 units, with confidence levels stated explicitly. §4.6 lists 6 risks, of which 2 require Phase 6 contract citations (wsm_handle_rx callability from sdio_wq context; wsm_release_tx_buffer's bh-wake invariant). §4.8 mandates a stress ramp in Phase 7, not a steady cap, per feedback_phase7_stress_ramp. Symptom-shaped findings (asm nop, commented-out IRQ re-enable, BUG_ON in hot path) explicitly deferred to Task #24 per feedback_dont_patch_downstream_artifacts. Awaiting Phase 5 second-model review on DokuWiki.	2026-05-07 19:04:53 +02:00
marfrit	08c7aafb48	Merge pull request 'notes: Opus second-opinion BES2600 WiFi structural critique' (#8 ) from claude-noether-6 into main Reviewed-on: #8 Reviewed-by: Markus Fritsche <mfritsche@reauktion.de>	2026-05-07 16:58:55 +00:00
claude-noether	809e3cce84	notes: opus second-opinion BES2600 WiFi structural critique Independent code-review writeup (Opus 4.7) against Sonnet's review of the same tree. Concurs with Sonnet on items 1+2 (RX relay, batch delivery) and items 4+5 (ba_lock atomics, ps_state_lock skip-when-pm_unsupported); pushes back on the "9 workqueue events per frame" quantification and records BES_SDIO_OPTIMIZED_LEN as hard-baked rather than togglable. New findings: cw12xx-not-bes2600 genealogy still active in source, ~700 lines of #if 0 fossil in bh.c, Allwinner-specific sw_mci_check_r1_ready in the SDIO bus path, asm volatile("nop") placeholder where IRQ re-enable used to live, BUG_ON in steady-state hot path, vendor-SDK Makefile shape that pollutes every diff, 8 EXPORT_SYMBOLs from a nominally-single-binary module. Recommends ordering: Patch C (1+2 wrapped) high-risk-first, Patches D+E as small individually-verifiable cleanups, explicit don't-touch list. Notes ieee80211_rx_list contract verification (task #19) blocks Patch C.	2026-05-07 18:12:54 +02:00
marfrit	4344873f2d	Merge pull request 'Sonnet architect review for Bug #5 — ranked restructuring map' (#7 ) from claude-noether-5 into main Reviewed-on: #7	2026-05-07 16:01:55 +00:00
claude-noether	679083d1aa	notes: Sonnet architect review for Bug #5 — ranked restructuring map Sonnet (general-purpose subagent, model=sonnet) reviewed ~/src/besser/bes2600-dkms-mobian/bes2600/ given the Phase 0 measurement context. Output: 8-item ranked restructuring map, file:line cited. Headline: - Item 1: collapse sdio_rx_work relay into BH loop (~5x workqueue dispatch reduction, medium effort) - Item 2: batch deliver via ieee80211_rx_list (small effort, removes per-frame softirq) - Items 1 + 2 together collapse "9 workqueue events per delivered frame" to ~1. Items 3-5 clean up next-layer overhead (TX-side queue_work, per-frame ba_lock, ps_state_lock under known-dead PSM). Items 6-8 are follow-ons to be re-measured after 1-3 land. Phase 4 plan locking the lead candidate(s) follows in a separate PR.	2026-05-07 17:38:16 +02:00
claude-noether	594f73c6b4	notes: Bug #5 root cause refined — workqueue-per-SDIO-transaction is the floor Follow-up ftrace measurement (post-reboot, 3-min 4MB/s capture): - workqueue_execute_start: 5,643/sec ← dominates - wsm_cmd_send: only 13/sec (host-to-chip command path NOT the hotspot) - lock contention: 50/sec (modest) The throughput floor is set by per-SDIO-transaction workqueue dispatch overhead. Surgical patches B5-1/B5-2/B5-3 from the prior Phase 4 plan all targeted the wrong layer; deferring those until an architectural restructuring map is produced. Promoting the Sonnet architect review from "backlog" to "blocking on Bug #5" — the next step is a restructuring assessment, not another patch.	2026-05-07 17:31:31 +02:00
claude-noether	928268f477	notes: backlog Sonnet architect review of bes2600 driver Per PR #6 review feedback. Independent track from Bug #5; scheduled once the Bug #5 measurement pass finishes.	2026-05-07 16:38:58 +02:00
marfrit	425eb92456	Merge pull request 'Bug #5 Phase 1 metric + Phase 0 anchor receipts' (#6 ) from claude-noether-4 into main Reviewed-on: #6	2026-05-07 14:37:29 +00:00
claude-noether	1830c17891	notes: Bug #5 Phase 1 metric + Phase 0 anchor receipts Phase 0 anchored at N=3 reps (10min @ 4MB/s pv-cap on 2.4GHz): - rep1+2: ~700 KB/s sustained (10% of link capacity) - rep3: link death at ~9 min in (passive mode, beacon-loss cascade) Hot symbol identified: _raw_spin_unlock_irqrestore at ~20% CPU in both healthy and failed reps, callstack process_one_work → wsm_configuration → wsm_cmd_send → bes2600_bh.isra.0 → spin-unlock. Phase 1 metric locked: ≥2 MB/s sustained throughput, <10% CPU in lock- cycling, no link death under 30 min continuous load. Three Phase 4 candidates drafted (B5-1: shrink wsm_cmd_send lock scope; B5-2: coalesce vif_list_lock in BH dispatcher; B5-3: SPSC ringbuffer for WSM commands). Locking pending review.	2026-05-07 16:32:45 +02:00
claude-noether	69a1d0f8b1	notes: phase 7 verdict — Patch A confirmed, Patch B dormant Phase 7 verification of cleanups + Patch A + Patch B (srcversion 1B3B3ED0) on ohm 2026-05-07 12:48 → 15:13 CEST under netcat load ramped 1 MB/s → 4 MB/s on 2.4GHz newton. Patch A: predicted delta CONFIRMED at N=2 reproductions. - 13:47:56 storm → 1 s reassoc, no AP-deauth-6 escalation - 13:49:26 storm → 1 s reassoc, no AP-deauth-6 escalation Patch B: installed, untriggered. 2 api_connection_loss events spaced 91 s apart, never tripping the 3-in-60s threshold. No false positives, no spurious bus_resets. Recovery delta unobserved (no harm done). Trigger C: 17-frame AP-deauth-6 cluster at 12:53 with no patch hooks firing — bes2600 TX-side glitch suspect. Recovery via mac80211 reauth in ~4 s. New backlog item. Bug #5 documented separately (RX path degrades under throughput pressure; possible root of the original Phase-0 YouTube frame drops).	2026-05-07 15:18:36 +02:00
claude-noether	458ad36f8b	notes: backlog Bug #5 — RX path degrades under throughput pressure Observed 2026-05-07: bumping the netcat sender from 1 MB/s to 4 MB/s DECREASED ohm's observed RX rate (1015 KB/s → 563 KB/s) and degraded the link (signal -57 → -67 dBm, MCS 4 → 3). Chip can't sustain near- link-rate RX even though theoretical capacity is ~8 MB/s. Hypothesis: driver/firmware lock contention or busy-wait on the RX SDIO path. Plausibly explains the original Phase-0 observation that YouTube DASH chunks drop ~10 frames per chunk fetch — chunk fetch is a brief near-line-rate burst that this bug would be triggered by.	2026-05-07 13:56:36 +02:00
marfrit	ea509e810f	Merge pull request 'Phase 4 plan: Patch B (Trigger A / api_connection_loss)' (#5 ) from claude-noether-3 into main Reviewed-on: #5	2026-05-07 10:45:28 +00:00
claude-noether	e53aad5013	notes: phase 4 plan for Patch B (Trigger A / api_connection_loss) Drafted after Phase 7 verification of Patch A (PR #1, srcversion 21BD07B3). 10h30m sustained load on 2.4GHz produced: - 0 DecryptStormRecoveries (Patch A dormant; no decrypt-storm fired) - 9 mac80211 api_connection_loss events - 1 catastrophic blackhole at 02:42 (reason 4 inactivity → reauth with assoc-comeback timeouts → AP unprotected-deauth-6 cluster) Phase 4 pivots to Trigger A (Patch B). Candidate B-1 lock proposal: extend c5.2 bus_reset infrastructure to fire on N consecutive api_connection_loss events; reuses existing recovery path. Pending Phase 5 review before Phase 6 implementation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 10:22:34 +02:00
marfrit	4acba3e707	Merge pull request #4 : Phase 4 plan: decrypt-storm fast-recover (Trigger B), with revised Phase 1	2026-05-06 17:30:48 +00:00