notes: Patch C Phase 4 plan — collapse sdio_rx_work into BH (item 1 only)

Per merged PR #8 inline review: items 1 and 2 split, sequential. Patch C is item-1-only (collapse the sdio_rx_work → rx_queue → bh_work indirection). Patch C2 (ieee80211_rx_list batch delivery) is split out and gated on Task #19 kerneldoc contract verification. Approach choice: Option A (sdio_rx_work delivers directly into wsm_handle_rx, removing rx_queue and its two synchronization points per frame) over Option B (subsume into bh thread). Option A has a smaller diff and clearer bisection story; the residual per-IRQ workqueue dispatch is preserved as a measurable Phase 7 data point that motivates or doesn't motivate a follow-on Option-B patch. Predicted delta in Phase 3 units, with confidence levels stated explicitly. §4.6 lists 6 risks, of which 2 require Phase 6 contract citations (wsm_handle_rx callability from sdio_wq context; wsm_release_tx_buffer's bh-wake invariant). §4.8 mandates a stress ramp in Phase 7, not a steady cap, per feedback_phase7_stress_ramp. Symptom-shaped findings (asm nop, commented-out IRQ re-enable, BUG_ON in hot path) explicitly deferred to Task #24 per feedback_dont_patch_downstream_artifacts. Awaiting Phase 5 second-model review on DokuWiki.
Merge pull request 'notes: Opus second-opinion BES2600 WiFi structural critique' (#8 ) from claude-noether-6 into main
2026-05-07 19:04:53 +02:00 · 2026-05-07 16:58:55 +00:00 · 2026-05-07 18:12:54 +02:00 · 2026-05-07 16:01:55 +00:00 · 2026-05-07 17:38:16 +02:00 · 2026-05-07 17:31:31 +02:00
6 changed files with 870 additions and 0 deletions
@@ -0,0 +1,180 @@
+# BES2600 architecture review — Bug #5
+
+Date assembled: 2026-05-07
+Reviewer: Claude Sonnet (general-purpose subagent, model=sonnet)
+Driver source: `~/src/besser/bes2600-dkms-mobian/bes2600/` on boltzmann
+
+This is the architect-review pass requested in `notes/observed-bugs.md` after the Phase 0 measurement showed the throughput floor is set by per-SDIO-transaction workqueue dispatch overhead. The reviewer was given the measurement summary, source location, and a focused brief; output is a ranked restructuring map with file:line citations for every concrete claim.
+
+---
+
+## Measurement context (input to the reviewer)
+
+```
+Reproduction:                pv -L 4M < /dev/zero | nc ohm 12345
+Module under test:           bes2600.ko srcversion 1B3B3ED0...  (cleanups + Patch A + Patch B)
+Hardware:                    PineTab2, RK3566 Cortex-A55 ARMv8.5, kernel 6.19.10-danctnix1
+Link rate:                   65 Mb/s ≈ 8 MB/s theoretical
+Observed throughput:         725 KB/s  (Phase 0 anchor at N=3)
+                             rep 3 cascaded into beacon-loss disconnect at ~9 min in
+
+Per-second event rates (3-min capture under 4 MB/s pv-cap):
+  workqueue_execute_start:  5,643/sec    ← architectural floor
+  bes2600_rx_cb:              611/sec
+  bes2600_bh_wakeup:          267/sec
+  wsm_cmd_send:                13/sec    (host-to-chip command rate, surprisingly low)
+  lock contention_begin:       50/sec    (modest)
+  mmc_request_start:        ~5,800/sec   (matches workqueue rate — every SDIO transaction is its own work item)
+
+perf record top symbol:      _raw_spin_unlock_irqrestore  (~20 % CPU samples)
+Dominant callstack:          process_one_work → wsm_configuration → wsm_cmd_send → bes2600_bh.isra.0
+```
+
+The implication: ~9 workqueue dispatches fire per frame delivered to mac80211. Items below address that ratio in descending order of predicted leverage.
+
+---
+
+## Item 1 — Two-hop workqueue relay: SDIO IRQ → `sdio_rx_work` → BH loop → mac80211
+
+**File:line:** `bes2600_sdio.c:416` (IRQ handler dispatches `rx_work`); `bes2600_sdio.c:829` (`sdio_rx_work` body); `bh.c:1330–1538` (BH main loop, `BES2600_RX_IN_BH` path); `bes2600_sdio.c:1267` (`bes_sdio` workqueue, `max_active=2`).
+
+**Current shape:** Every SDIO interrupt fires `queue_work(sdio_wq, &rx_work)`. `sdio_rx_work` reads up to `BES_SDIO_RX_MULTIPLE_NUM=16` frames (`hwio.h:294`) into per-frame SKBs, enqueues each onto `sbus_priv.rx_queue` under `rx_queue_lock`, then returns. Meanwhile the BH kthread (one work item queued at boot in `bh.c:93`, running an infinite loop inside `bes2600_bh()`) calls `pipe_read()` → `spin_lock(rx_queue_lock)` → `skb_dequeue()` → `wsm_handle_rx()` → `ieee80211_rx_irqsafe()` one frame at a time. When `pipe_read()` returns NULL and pending TX exists, `bes2600_sdio_pipe_read()` at `bes2600_sdio.c:941` re-dispatches `rx_work` — so a sustained RX stream fires **one `queue_work` per BH wakeup, not per burst**. That explains why `bh_wakeup` events are only 267/sec while `workqueue_execute_start` is 5,643/sec: the SDIO layer is firing a new `rx_work` item for every frame the BH loop drains.
+
+**Proposed shape:** Collapse `sdio_rx_work` and `pipe_read()` into the BH loop directly. The BH already runs in a dedicated `WQ_HIGHPRI | WQ_CPU_INTENSIVE` workqueue (`bh.c:66`) and (with `BES2600_RX_IN_BH` defined per `Makefile:159`) `bes2600_bh_rx_helper()` already dequeues from `rx_queue`. Merge `sdio_rx_work` into a function called synchronously from `bes2600_bh_rx_helper()` before the dequeue, guarded by a trylock so re-entry is safe. This eliminates O(N) `queue_work` calls per burst while keeping the BH as the single SDIO-access context.
+
+**Predicted delta vs Phase 1 metric:** Eliminates ~5 of the ~9 redundant workqueue dispatches per frame. 2–4× throughput improvement and a proportional drop in `_raw_spin_unlock_irqrestore` CPU cost.
+
+**Effort:** Medium. SDIO host-lock protocol (`sdio_claim_host`/`sdio_release_host`) is already managed inside `sdio_rx_work`; moving the body is mechanical but requires care around the `sdio_wq` `max_active=2` concurrency assumption.
+
+**Risks:** `sdio_rx_work` runs with `sdio_claim_host` held for the entire burst. Inside the BH it serialises all SDIO access fine. Watch `bes2600_sdio.c:1889` — flushes `rx_work` during teardown; that path must remain.
+
+---
+
+## Item 2 — `ieee80211_rx_irqsafe` instead of `ieee80211_rx` (pre-NAPI cw1200 ancestor pattern)
+
+**File:line:** `txrx.c:1947`, `txrx.c:1950`, `ap.c:99`, `sta.c:1487`, `wsm.c:2416`.
+
+**Current shape:** Every RX frame is delivered via `ieee80211_rx_irqsafe()`. This function enqueues the SKB onto a per-cpu `tasklet_rx` list and schedules a software IRQ. Under sustained load: one softirq wakeup per frame — 611 softirq wakeups/sec on top of the workqueue overhead.
+
+**Proposed shape:** Switch to `ieee80211_rx_ni()` (process context, which `wsm_handle_rx` is already in) or, better, batch-deliver frames using `ieee80211_rx_list()` (introduced in kernel 5.12, available in 6.19). Accumulate frames from a single `sdio_rx_work` burst into a `list_head`, then call `ieee80211_rx_list()` once per burst.
+
+**mac80211 contract:** `ieee80211_rx_list()` is safe from process context with the same `ieee80211_rx_status` rules as `ieee80211_rx_ni()`. Per `include/net/mac80211.h` — kerneldoc states it takes the RX path atomically only when called from softirq context; from process context it uses the same path as `ieee80211_rx_ni()`.
+
+**Predicted delta:** Reduces per-frame softirq overhead. Hard to isolate independently of item 1, but combined the two deliver the < 10 % CPU-in-lock target.
+
+**Effort:** Small (once item 1 is done — the batch list naturally exists at the burst boundary).
+
+**Risks:** Must hold `rcu_read_lock()` at call site; `skb->cb` (`IEEE80211_SKB_RXCB`) must be filled before the call, as today. The `early_data` path at `txrx.c:1942` uses `skb_queue_tail` into a per-link queue before calling `ieee80211_rx_irqsafe` — that path must be excluded from batch collection.
+
+---
+
+## Item 3 — Per-frame `queue_work(sdio_wq, &tx_work)` in the TX send path
+
+**File:line:** `bes2600_sdio.c:1236` (inside `bes2600_sdio_pipe_send()`).
+
+**Current shape:** Every call to `bes2600_sdio_pipe_send()` appends one descriptor to `tx_bufferlist` and immediately calls `queue_work(sdio_wq, &tx_work)`. `sdio_tx_work` then drains the list with scatterlist batching (up to `BES_SDIO_TX_MULTIPLE_NUM=16` frames per SDIO transfer). At low rates the workqueue's pending-but-not-started dedup means only one dispatch fires; at high TX rates — especially after `atomic_add(1, &hw_priv->bh_tx)` in `bh.c` reschedules TX — successive `pipe_send` calls each hit `queue_work` before the previous fires, multiplying dispatches.
+
+**Proposed shape:** Stage all frames into `tx_bufferlist` in the BH TX loop, then flush `sdio_tx_work` synchronously (call the work function body directly) before returning to the wait-event. The TX mirror of item 1.
+
+**Predicted delta:** Removes redundant TX-side `queue_work` calls. Lower priority than RX side given current TX rate (13 `wsm_cmd_send`/sec is host→chip control plane; data-plane TX is also limited by firmware buffer count `numInpChBufs`), but it does remove one source of the 5,643/sec workqueue count.
+
+**Effort:** Small.
+
+**Risks:** `sdio_tx_work` calls `sdio_claim_host`/`sdio_release_host` internally. Running directly from BH context requires confirming no deadlock with the SDIO bus claim that `sdio_rx_work` (now merged per item 1) holds. The TX flush must happen after the RX burst, matching the existing BH loop structure (`rx:` → `tx:` ordering in `bh.c:1444`).
+
+---
+
+## Item 4 — `ba_lock` per-frame acquisition in the RX path
+
+**File:line:** `txrx.c:998–1005` (`bes2600_rx_h_ba_stat()`); `txrx.c:1159–1164` and `txrx.c:1682–1698` (`tsm_lock`, `CONFIG_BES2600_TESTMODE`-gated).
+
+**Current shape:** `bes2600_rx_cb()` calls `bes2600_rx_h_ba_stat()` for every non-multicast data frame. That function acquires `ba_lock` under BH (`spin_lock_bh`) to increment `ba_acc_rx` and `ba_cnt_rx`, then sets a timer. At 611 frames/sec that's 611 lock acquisitions/sec on `ba_lock` alone.
+
+**Proposed shape:** Replace per-frame `ba_lock` with a per-cpu counter (or `atomic64_t`) for `ba_acc_rx` and `ba_cnt_rx`. The timer arm (`mod_timer`) is the actual reason for the lock — a `READ_ONCE`/`cmpxchg` on a flag to detect first-frame-in-interval is sufficient.
+
+**Predicted delta:** Removes 611 lock acquisitions/sec from the RX hot path. Not the dominant cost but the next bottleneck after items 1–2 land.
+
+**Effort:** Small.
+
+**Risks:** `ba_lock` also serialises TX-side block-ack accounting (`txrx.c:1632`). The per-cpu approach requires a fold step in the timer callback — cheap.
+
+---
+
+## Item 5 — Skip `ps_state_lock` acquisitions when PSM is known-disabled
+
+**File:line:** `bes2600.h:320` (decl); `txrx.c:1340–1365` (`vif_lock`); `txrx.c:1415–1426` (`ps_state_lock`); `txrx.c:1942–1948` (RX-side `ps_state_lock`).
+
+**Current shape:** `ps_state_lock` is taken on every TX frame if powersave is active. Per memory `reference_bes2600_firmware_no_psm.md`, **PSM is non-functional on this firmware** — c7 already self-detects this and latches `pm_unsupported = true`. The `ps_state_lock` guards in the RX callback and TX path are therefore taking dead overhead.
+
+**Proposed shape:** Add a `READ_ONCE()` check on `powersave_enabled` before taking `ps_state_lock`; if false, skip the lock and the PSM state update entirely. Since c7's `pm_unsupported` latches, this is safe.
+
+**Predicted delta:** Small absolute gain at current TX rate, but prevents fast path from regressing as throughput improves.
+
+**Effort:** Small.
+
+**Risks:** `powersave_enabled` is written from process context (`bh.c:403`). `READ_ONCE` without lock is safe — at worst one spurious PSM notification, not a state corruption.
+
+---
+
+## Item 6 — Firmware block-read size cap (`EFFECTIVE_BUF_SIZE = 8190 bytes`)
+
+**File:line:** `bh.c:33–36` (defines); `bes2600_sdio.c:721–783` (`bes2600_sdio_extract_packets()`); `hwio.h:294` (`BES_SDIO_RX_MULTIPLE_NUM=16`).
+
+**Current shape:** `BES_SDIO_RX_MULTIPLE_NUM=16` and `BES_SDIO_OPTIMIZED_LEN` both defined (`Makefile:90,92`). The RX burst reads `PACKET_TOTAL_LEN(ctrl_reg)` bytes in a single CMD53; each sub-packet bounded by `EFFECTIVE_BUF_SIZE = (0x1000-4)*2 - 2 = 8190` bytes. At 611 frames/sec ÷ 267 BH wakeups/sec ≈ **2.3 frames per wakeup** — well under the 16-frame limit. **Not the bottleneck today.**
+
+**Proposed shape:** No change needed now. Re-evaluate after items 1–3 land if throughput rises past ~3 MB/s. Verify ctrl_reg `PACKET_TOTAL_LEN` field values during high load — requires firmware-trace observation we don't currently have.
+
+**Effort:** N/A.
+
+**Risks:** Increasing beyond 16 requires a larger DMA allocation (currently `1632 × 16 = 26 KB`). Cortex-M4F firmware side is opaque.
+
+---
+
+## Item 7 — Duplicate workqueues (`hw_priv->workqueue` vs `hw_priv->bh_workqueue` vs `sbus_priv->sdio_wq`)
+
+**File:line:** `bes2600.h:323` (`workqueue`); `bes2600.h:385` (`bh_workqueue`); `bes2600_sdio.c:63` (`sdio_wq`). `txrx.c` has 10 `queue_work(hw_priv->workqueue, ...)` calls for control-plane work.
+
+**Current shape:** Three distinct workqueues. The 5,643 `workqueue_execute_start`/sec are dominated by `sdio_wq` items, not `workqueue`. `workqueue` items are control-plane events at rates well below the data-plane.
+
+**Proposed shape:** After item 1 (merging `sdio_rx_work` into BH), `sdio_wq` only carries `tx_work`. After item 3 (synchronous TX flush from BH), `sdio_wq` is idle during normal data-plane and could be replaced with `system_highpri_wq`.
+
+**Effort:** Small (follow-on to items 1 and 3).
+
+**Risks:** None if items 1 and 3 land first.
+
+---
+
+## Item 8 — `BH_RX_CONT_LIMIT=3` cap on RX burst per BH wakeup
+
+**File:line:** `bh.c:1380–1405` (timeout detection); `bh.c:1330` (`BH_RX_CONT_LIMIT=3`); `bh.c:1331` (`BH_TX_CONT_LIMIT=20`).
+
+**Current shape:** BH loop limits RX burst to 3 consecutive iterations before breaking back to wait-event. At 611 frames/sec ÷ 267 wakeups/sec ≈ 2.3 frames per wakeup → not the bottleneck today. **After items 1–3 land**, per-burst frame rate will rise and `BH_RX_CONT_LIMIT=3` becomes the ceiling.
+
+**Proposed shape:** Raise to 16 (matching `BES_SDIO_RX_MULTIPLE_NUM`) after items 1–3 are deployed and re-measured.
+
+**Effort:** Trivial (constant change), but must wait for Phase 7 measurements post 1–3.
+
+**Risks:** Too high a limit under firmware anomaly (corrupted ctrl_reg) can spin BH long enough to miss beacon ACK deadline. Bound to `BES_SDIO_RX_MULTIPLE_NUM` as safe ceiling.
+
+---
+
+## Ranking summary
+
+| Rank | Item | Predicted gain | Effort |
+|------|------|----------------|--------|
+| 1 | Collapse `sdio_rx_work` relay into BH loop | ~5x workqueue dispatch reduction | Medium |
+| 2 | Batch deliver via `ieee80211_rx_list()` | Removes per-frame softirq | Small |
+| 3 | Synchronous TX flush from BH | Removes TX-side dispatch noise | Small |
+| 4 | Replace `ba_lock` per-frame with atomic/per-cpu | Removes 611 lock/sec from RX hot path | Small |
+| 5 | Skip `ps_state_lock` when PSM-known-disabled | Removes dead overhead | Small |
+| 6 | Raise `BH_RX_CONT_LIMIT` after 1–3 land | Unlocks residual throughput | Trivial |
+| 7 | Consolidate workqueues post-items 1&3 | Cleanup | Small |
+| 8 | Firmware block-read size | Not bottleneck at current rates | N/A |
+
+**Items 1 + 2 together are the structural answer to the measurement**: ~9 workqueue events per delivered frame collapse to ~1, and the per-frame softirq cost disappears. Items 3–5 clean up the next layer. The beacon-loss cascade at 9 minutes is almost certainly starvation of the BH wait-event under the per-frame workqueue storm — item 1 removes the mechanism that makes the cascade possible.
+
+---
+
+## Next campaign step
+
+A Phase 4 plan locking item 1 (and possibly item 2) follows in a separate PR. The remaining items go on the campaign backlog as follow-on patches once the Phase 7 verification of item-1-or-1-plus-2 confirms the predicted delta.
@@ -82,6 +82,91 @@ without board power-cycle").
 **Status**: task c3 (indirectly, via bes_chardev removal which currently
 gates the signal/nosignal mode switch path).

+## Architect review — now BUG-#5-blocking (was backlog)
+
+The Phase 0 perf trace for Bug #5 first exposed a "when in doubt, add a
+lock" pattern (~20 % CPU in `_raw_spin_unlock_irqrestore`). The
+follow-up ftrace measurement (2026-05-07 17:00) refined the root cause
+to an architectural problem: **the bes2600 driver dispatches every
+SDIO transaction through the kernel workqueue**. Numbers from a 3-min
+4 MB/s ohm capture (post-reboot, srcversion `1B3B3ED0`):
+
+```
+wsm_cmd_send:               13/sec    (host-to-chip command rate, surprisingly low)
+bes2600_rx_cb:             611/sec
+bes2600_bh_wakeup:         267/sec
+lock contention_begin:      50/sec
+workqueue_execute_start: 5,643/sec    ← DOMINATES; matches the mmc
+                                       transaction rate from earlier perf
+```
+
+5.6 k workqueue dispatches per second is the throughput floor — not a
+specific lock, not WSM-command rate, not decrypt-state. A surgical fix
+to any single function won't move the floor; the architecture needs
+to be restructured to amortise SDIO transactions across fewer work-
+items (or move SDIO RX out of the workqueue entirely).
+
+This is where the **Claude Sonnet architect review** belongs: a
+top-to-bottom assessment of `~/src/besser/bes2600-dkms-mobian/bes2600/`
+focused on:
+
+- the workqueue dispatch shape (most actionable)
+- needless lock proliferation (the original signal)
+- BH / RX scheduling boundaries
+- error-handling coverage and dead-code from the cw1200 ancestor
+- API contract violations relative to mainline mac80211
+
+Output: ranked list of restructuring targets, with predicted-delta
+estimates against the Phase 1 metric (≥ 2 MB/s sustained @ 4 MB/s cap,
+< 10 % CPU in lock-cycling, no link cascade in 30 min).
+
+**Status**: now blocking on Bug #5 (was independent track). Surgical
+patches B5-1, B5-2, B5-3 from the original Phase 4 candidate list are
+all DEFERRED until the architect review's restructuring map is in.
+
+## Bug #5 — RX path degrades under attempted-throughput pressure
+
+**Suspect file**: bes2600 RX path (`txrx.c bes2600_rx_cb`, `bh.c bes2600_bh_work`,
+SDIO RX scheduling) — pinpoint pending.
+
+**Symptom (observed 2026-05-07 13:43, srcversion `1B3B3ED0` = c-stack +
+Patch A + Patch B, ohm @ -57 dBm 2.4GHz ch11 5b:32, idle save for the
+netcat load):**
+
+```
+sender cap 1 MB/s  →  ohm receives 1015 KB/s,  signal -57 dBm,  RX MCS 4
+sender cap 4 MB/s  →  ohm receives  563 KB/s,  signal -67 dBm,  RX MCS 3
+                     (Send-Q on boltzmann backed up to 1.16 MB)
+```
+
+Pushing the sender-side cap from 1 MB/s to 4 MB/s **decreased** observed
+throughput at the receiver and degraded the link metrics. Signal dropped
+~10 dB and the chip downshifted MCS, suggesting the chip can't sustain
+the higher RX rate even with the link physically capable of more (link
+bitrate 65 Mb/s = ~8 MB/s theoretical).
+
+**Hypothesis (Markus, 2026-05-07): driver/firmware locks itself to death
+under busy reads** — possibly a busy-wait loop or lock contention on the
+RX SDIO path that prevents draining at line rate. Plausible reason it
+didn't surface for the c-stack tasks: those operated at typical
+browse-rate traffic, well below the saturation threshold this bug needs
+to fire.
+
+**May explain**: original Phase-0 observation that **YouTube DASH chunks
+drop ~10 frames per chunk fetch** on hardware-decoder playback. A chunk
+fetch is a brief burst at near-link-rate; if the driver throttles itself
+down during high-RX, the player buffer underruns for the duration of
+the fetch.
+
+**How to drill (when prioritized)**:
+- Capture trace_pipe with `mmc:*` and `sdio*` events enabled during a
+  controlled rate-ramp (e.g., pv -L 500K, 1M, 2M, 4M each for 60 s).
+- Watch `/proc/sys/kernel/sched_*` and the `bes2600_bh_work` kworker for
+  CPU saturation.
+- `perf top -p $(pgrep -f bes_sdio)` during 4 MB/s load.
+
+**Status**: backlog. No patch yet.
+
 ## Bug #4 — scan_complete_cb constant loop

 **File**: `scan.c:883-909` — `bes2600_scan_complete_cb()`.
@@ -0,0 +1,190 @@
+# BES2600 WiFi structural analysis and code critique
+
+**Author:** Claude (noether) — second-opinion as Opus 4.7 against Sonnet 4.6's review of 2026-05-07
+**Scope:** the WiFi half of the BES2600 driver as it lives in `bes2600-dkms-mobian/bes2600/` on top of the `cleanups` branch (srcversion `1B3B3ED0…`, c-stack + Patch A + Patch B deployed).
+**Reading frame:** Bug #5 prompted Sonnet's review; this writeup is independent — same source tree, different model, different priors. Where I concur I tighten; where I disagree I say so.
+
+---
+
+## 1. Top-line
+
+The BES2600 WiFi driver is **not a BES2600 driver**. It is a CW12xx driver wearing a BES2600 nameplate. That sentence is not rhetoric — it is the design fact that explains every other smell I will list below.
+
+- 30+ live references to `CW12XX_MAX_VIFS` across 9 files.
+- `cw12xx_hwpriv_to_vifpriv()` / `cw12xx_get_vif_from_ieee80211()` are the active vif accessors.
+- `is_hardware_cw1250(hw_priv) || is_hardware_cw1260(hw_priv)` is a runtime branch in `ap.c:1892` — the chip is BES2600, neither check ever matches, the branch is dead on this hardware but compiled in.
+- `CW1200_MAX_SW_RETRY_CNT` gates the active retry-decision logic in `bh.c:1269` (inside `KEY_FRAME_SW_RETRY`).
+- The header opens with "Based on the mac80211 Prism54 code, which is Copyright (c) 2006, Michael Wu" → **prism54 → islsm → ST-E CW1200 → CW1260 → CW12xx → BES2600**: at least five generations of vendor-SDK descent, with each generation preserving its predecessor as #if-0 blocks rather than removing it.
+
+This is the Phase 6 "transcription trap" from `CLAUDE.md`, frozen into the codebase: every generation copied behaviour rather than re-derive against the API contract. The result is a driver that *works*, but whose structural choices are decisions made for a 2010 ST-Ericsson chip, not a 2022 Bestechnic one.
+
+The downstream consequence — and the thing that actually pinches us in Bug #5 — is that the **hot path was designed for cw1200's IRQ-driven SPI bus, not for SDIO with multi-block coalescing**. Items 1 + 2 of Sonnet's review are the right surgical fix. The deep fix is bigger than the budget of any one campaign.
+
+## 2. Concurrence with Sonnet — refined
+
+### 2.1 RX relay (Sonnet item 1) — concur, refine
+
+The flow on this build (`-DBES2600_RX_IN_BH` in Makefile, so this is the *real* path):
+
+```
+SDIO IRQ
+  → bes2600_gpio_irq_handler                       (bes2600_sdio.c:413)
+  → queue_work(self->sdio_wq, &self->rx_work)      (bes2600_sdio.c:416)
+  → sdio_rx_work runs                              (bes2600_sdio.c:829)
+      → bes2600_sdio_lock + memcpy_fromio
+      → bes2600_sdio_extract_packets               (skb_queue_tail to self->rx_queue)
+      → self->irq_handler(self->irq_priv)          (function call, not workqueue)
+          → atomic_add_return(1, &hw_priv->bh_rx)  (bh.c:130)
+          → wake_up(&hw_priv->bh_wq)
+  → bh_work (already running, never re-queued):
+      wait_event_interruptible_timeout returns
+      → bes2600_bh_rx_helper                       (bh.c:961)
+          → priv->sbus_ops->pipe_read              (skb_dequeue from self->rx_queue)
+          → wsm_handle_rx                          (wsm.c)
+              → bes2600_rx_cb                      (txrx.c:1642)
+                  → ieee80211_rx_irqsafe(skb)      (txrx.c:1947 / 1950)
+```
+
+**Where I refine Sonnet:** the "9 workqueue events per delivered RX frame" claim doesn't survive source reading. Per IRQ *batch* there is **one** workqueue dispatch (sdio_wq.rx_work). `bh_work` is registered once, runs as a long-lived work item using `wait_event_interruptible_timeout` to sleep — the wake-up path is a wait-queue, not a workqueue dispatch. `ieee80211_rx_irqsafe` schedules a mac80211 tasklet, not a workqueue. The 5,643 `workqueue_execute_start/sec` ftrace count from Bug #5 is **system-wide**, not bes2600-only — it should not be quoted as "per frame" without per-pid filtering.
+
+**What is real:** the indirection adds two synchronization points per frame (`skb_queue_tail` + `skb_dequeue`, each `&rx_queue->lock`) plus a cross-CPU wake-up plus a tasklet schedule. That's enough to dominate at 4 MB/s. The collapse is justified — just not by the 9× number.
+
+### 2.2 ieee80211_rx_irqsafe from process context (Sonnet item 2) — concur, gated on contract verification
+
+Confirmed: `ieee80211_rx_irqsafe` is the right primitive only when called from hard-IRQ context — it defers to a tasklet. From process context (which is where `bh_work` and `sdio_rx_work` both live), it adds a tasklet hop for nothing.
+
+`ieee80211_rx_list(hw, sta, &skbs)` is the correct call shape if, and only if, two contract claims hold:
+1. callable from process context with `local_bh_disable()` wrap (or callable bare),
+2. SKB list invariants don't impose NAPI-poll semantics we can't honour.
+
+Sonnet asserted both; I have **not** verified them against `include/net/mac80211.h` kerneldoc on a 6.19-class kernel. **Task #19 blocks Patch C on that verification.** Until it lands, treat the API claim as unconfirmed — this is exactly the Phase 6 contract-citation rule, and skipping it would be the same trap the older driver fell into.
+
+### 2.3 ba_lock per-frame (Sonnet item 4) — concur
+
+`txrx.c:998-1005` (TX path) and `txrx.c:1632-1640` (RX path): `spin_lock_bh(&hw_priv->ba_lock)` to bump 4 ints (`ba_acc`, `ba_cnt`, `ba_acc_rx`, `ba_cnt_rx`) and conditionally `mod_timer(&hw_priv->ba_timer, …)`. The TODO comment in `bes2600.h:359-365` literally says *"TODO: Same as above"* on every field — the original author flagged it as deferred work, then shipped.
+
+Replace with `atomic_t` for the four counters and `cmpxchg`-guarded `mod_timer` for the arm-once invariant. Patch D.
+
+### 2.4 ps_state_lock when pm_unsupported (Sonnet item 5) — concur
+
+`txrx.c:1942-1948`: per-RX-frame `spin_lock_bh(&priv->ps_state_lock)` on the early-data path, protecting a check on `entry->status == BES2600_LINK_SOFT`. The lock exists to coordinate with the AP-side power-save state machine.
+
+c7's contribution (`pm_unsupported = true`) means we already know this firmware doesn't honour PSM; the LINK_SOFT branch is an AP-mode soft-link state that won't transition under us when PSM is dead. Gate the lock acquisition on `!hw_priv->pm_unsupported`. Patch E.
+
+(This patch is *narrower* than Sonnet framed it: it only applies when `pm_unsupported` latches on, which is at boot for our firmware. Production reality on this hardware = always; but the patch must remain conditional in case a future firmware fixes PSM and c7 self-clears the flag.)
+
+## 3. Push-back against Sonnet
+
+### 3.1 "BES_SDIO_OPTIMIZED_LEN config flag"
+
+Not a runtime/Kconfig knob on this build. `Makefile:18` hard-codes `ccflags-y += -DBES_SDIO_OPTIMIZED_LEN`. Whether to keep it is a separate question, but Sonnet's recommendation should not have framed it as toggleable.
+
+### 3.2 "Multiple workqueues are unconditionally bad"
+
+There are three driver-side workqueues:
+
+| name | purpose | dispatch shape |
+|---|---|---|
+| `bh_workqueue` | hosts the single long-running `bh_work` | one-shot at register, wait-queue driven thereafter |
+| `sdio_wq` | sdio_rx_work + sdio_tx_work + sdio_scan_work | per-IRQ-batch dispatch |
+| `hw_priv->workqueue` | scan, AP, PM, multicast-start, link-id, set-tim, … | per-event dispatch (~20 producers) |
+
+**`bh_workqueue` is fine** — it runs a single work item forever, which is just a kthread-shaped-as-workqueue. The cost is one alloc_workqueue at register and zero ongoing dispatch overhead. Don't kill it.
+
+**`sdio_wq` is the actual surgical target** — collapsing item 1 means subsuming `sdio_rx_work` into the bh-loop, after which `sdio_wq` only hosts tx_work and scan_work and could be merged with `hw_priv->workqueue` for cleanup. But that merge is cosmetic; do it later or never.
+
+**`hw_priv->workqueue` shouldn't be touched.** It hosts ~20 unrelated producers; merging it into sdio_wq is the wrong direction (priority inversion risk under coex pressure).
+
+### 3.3 "BH_RX_CONT_LIMIT=3 is the bottleneck"
+
+Half-true. The limit caps the burst-RX pass to 3 frames before yielding to TX work. Raising it past 3 only helps if RX has steady backlog, which under our 4 MB/s ramp it does. But there's also `BH_TX_CONT_LIMIT=20` paired with it — TX gets 20-frame bursts, RX gets 3. The asymmetry is from a previous campaign that found TX-starvation, and **flipping it without re-running that campaign is a regression risk**. Treat the constant as a phase-7-knob, not a one-liner.
+
+## 4. New findings Sonnet did not surface
+
+### 4.1 `bh.c` carries ~700 lines of `#if 0` dead code
+
+`bh.c:196-877` is the cw1200 ancestor `bes2600_bh()` preserved verbatim alongside the active impl at `bh.c:1332+`. Same function name, same `goto rx:` / `goto tx:` labels, same loop variables. The fossil block contains a typo (`if ((i = (CW12XX_MAX_VIFS - 1)) || !priv)` at lines 438 and 562 — single `=` is assignment-not-compare; live code at `ap.c:696` uses `==` correctly) which would be a real bug if compiled. **It is not compiled** — `#if 0` saves us — but this is the maintenance hazard you discover *first* when reading the file in a hurry.
+
+Action: kill the `#if 0` block. Standalone hygiene patch, not on the Bug-#5 critical path.
+
+### 4.2 Allwinner-specific code in the SDIO bus path
+
+`bes2600_sdio.c:475` calls `sw_mci_check_r1_ready(self->func->card->host, 1000)` from inside the IRQ-setup error path. This is the Allwinner mmc driver's R1-ready helper — not portable to RK3566's `dw_mmc-rockchip` host driver.
+
+The call is reachable only on `set_func` cleanup (a comparatively rare error path), but it is a build-time portability hazard. Most likely a stub macro on non-Allwinner builds; verify on ohm or wrap behind `#ifdef CONFIG_MMC_SUNXI`.
+
+### 4.3 `asm volatile ("nop")` placeholder in the live BH loop
+
+`bh.c:1518` is where IRQ re-enable used to be (`__bes2600_irq_enable(1)` is commented out two lines above). The author left a literal nop instruction "asm volatile" instead of removing the dead block. Either re-enable IRQs (if the code was deleted prematurely) or remove the nop (if IRQs are intentionally always-on). This is non-cosmetic — it indicates an unresolved IRQ-handling decision.
+
+### 4.4 `BUG_ON` in the steady-state hot path
+
+`bh.c:1488`: `BUG_ON(hw_priv->hw_bufs_used > hw_priv->wsm_caps.numInpChBufs)` runs *every* BH iteration. Tripping it locks up the kernel during normal operation — by definition the wrong response to a bookkeeping bug. Should be `WARN_ON_ONCE` + bail-out. (Same critique applies to several other `BUG_ON`s in `bh.c` — search the active `#else` block.)
+
+### 4.5 Build-system is a vendor SDK, not a kernel-style driver
+
+`Makefile:1-50` defaults: `CONFIG_BES2600_TESTMODE ?= y`, `WIFI_BT_COEXIST_EPTA_ENABLE ?= y`, `BES2600_INTEGRATED_MODULE_V1/V2/V3` for *xiaomi R329 wifi module*, *sicun QM215 wifi module*, *bes evb*. 86 `#ifdef CONFIG_BES2600_TESTMODE` sites — testmode is essentially compiled-in dead code in non-test builds.
+
+The driver was built by Bestechnic to ship per-customer board variants from one source tree. Upstreaming will require ripping that whole apparatus out, replacing with `Kconfig` toggles and platform-data lookups. This is **not** a Bug-#5 dependency, but it is a debt that pollutes every other patch — diff hunks land in `#ifdef`-walled territory and conflict on rebases for unrelated reasons.
+
+### 4.6 8 `EXPORT_SYMBOL` declarations from a single-binary module
+
+The driver exports `bes2600_irq_handler`, `bes2600_bh_wakeup`, `bes2600_bh_suspend`, `bes2600_bh_resume`, etc. — for whom? The only known consumer is `bes2600_btuart`, the BT sibling module. Either the BT module needs a coherent shared-driver API surface (refactor target), or these exports should become `static`. Random sibling-module coupling via global symbols is a known kernel anti-pattern.
+
+### 4.7 No `__must_check` on functions that obviously return errors
+
+Almost every `bes2600_data_read` / `bes2600_data_write` / `bes2600_reg_read*` call site is wrapped in `WARN_ON()`. That's defensive but not enforced. A single missed return-check (compiler will not warn) is a silent SDIO-path bug. Annotation cost is one keyword per declaration; benefit is a class of bugs caught at compile time.
+
+### 4.8 `rx_queue` is per-sbus_priv, not per-vif
+
+Multi-vif RX serializes through one `skb_queue` on the sbus side (`bes2600_sdio.c:867` queues to `self->rx_queue`, only dequeued by the single bh thread). For STA-only operation this doesn't matter; for STA+AP concurrent or P2P-multivif it's a structural ceiling on aggregate RX throughput. Out of scope for Bug #5 but worth recording — Markus's "P2P_MULTIVIF=y" Makefile default makes this potentially observable.
+
+## 5. Ordering recommendation for the cleanup roadmap
+
+Given (a) the current Bug-#5 budget, (b) Phase-7 stress-ramp cost per patch, (c) the constraint that the cleanups branch must rebase cleanly on Mobian's `mobian` for re-MR:
+
+| order | patch | scope | phase-7 cost | risk |
+|---|---|---|---|---|
+| 1 | **Patch C (items 1+2 wrapped)** | hot path: collapse sdio_rx_work into bh, batch deliver via ieee80211_rx_list | full ramp 1→4→8 MB/s | high — touches RX hot path |
+| 2 | **Patch D (item 4)** | ba_lock → atomics + cmpxchg-guarded mod_timer | minimal — lock-stat delta + 5min @ 4MB/s smoke | low |
+| 3 | **Patch E (item 5)** | ps_state_lock skip when pm_unsupported | minimal — same as D | low (gated on c7's existing latch) |
+| ∞ | bh.c #if 0 graveyard removal | pure delete | none — recompile + smoke | zero |
+| ∞ | CW12XX → BES2600 rename | mass rename | none — but every open patch conflicts | high churn cost, zero behaviour change |
+| **NOT** | Allwinner abstraction layer | wrap sw_mci_check_r1_ready | n/a | scope-creep; do only if RK3566 fails on it |
+| **NOT** | Vendor-SDK Makefile rewrite | Kconfigify | n/a | upstream-prep work, not Bug-#5 |
+| **NOT** | bh_workqueue / sdio_wq merge | structural | n/a | speculation, no measured win |
+
+Patch C is high-risk; merging items 1 and 2 into one patch is the user's call (made: "wrap them together") but should **be reviewed Phase-5 before Phase-6 implementation lands** — exactly the receipts-checklist that this CLAUDE.md exists to enforce. Splitting Patch C into 1-then-2 is *also* defensible; if Phase 7 finds item 1 regressed something, item 2 in isolation is harder to bisect.
+
+## 6. Things I would explicitly NOT do
+
+- **Don't paint the bikeshed on naming.** CW12XX → BES2600 rename is a 30+ file mass-substitute that conflict-spams every open topic branch. It is the right fix *for upstreaming*, not for the cleanups branch.
+- **Don't refactor the workqueue topology.** Three workqueues is fine. Two workqueues for cosmetic reasons risks priority inversion under coex pressure.
+- **Don't replace the BH thread architecture.** It works, the wait-queue model is well-suited to the IRQ → drain pattern, and replacing it with NAPI or threaded-IRQ would re-do six years of debugging in a single patch.
+- **Don't strip the `#ifdef CONFIG_BES2600_TESTMODE` blocks** until upstream-prep. They are vendor-SDK debt but harmless dead code.
+- **Don't wrap the Allwinner helper** unless RK3566 actually trips it. The path is rare-error.
+
+## 7. What I would tell a fresh reviewer in one paragraph
+
+> *This driver is genealogically a CW1200 driver (ST-Ericsson, ~2010) with chip-name search-and-replace done halfway. The hot path was designed for SPI with one-frame-per-IRQ; SDIO multi-block coalescing was bolted on with a worker-queue handoff that adds two synchronization points per frame. Bug #5's RX-throughput regression at 4 MB/s is a direct consequence: at low rate the handoff overhead is invisible; at high rate it dominates. Three small patches (Patches C, D, E) reclaim most of the floor without touching the genealogy. The genealogy itself is technical debt for upstreaming, not a Bug-#5 dependency. Don't conflate the two.*
+
+---
+
+## 8. Disagreements summary
+
+| Sonnet claim | My finding |
+|---|---|
+| "9 workqueue events per delivered RX frame" | overstated; per IRQ batch is 1 workqueue dispatch on this build. The 5,643/sec ftrace count is system-wide, needs per-pid filtering before claiming as bes2600 dispatch rate. |
+| "BES_SDIO_OPTIMIZED_LEN config flag" | hard-baked in Makefile as `-D…` ccflags, not toggleable |
+| Item 4 / Item 5 sized as one patch each | concur — separate small patches as Markus directed |
+| Item 1 + 2 mergeable | concur — directionally; predicated on `ieee80211_rx_list()` contract (Task #19) |
+
+## 9. Open questions for Markus
+
+1. **Patch C split-or-merge:** user directive is "wrap together". I'd note that a Phase-7 regression in the merged patch is harder to bisect than two sequential Phase-7 runs. Keeping the directive but recording the bisect-cost as known.
+2. **`__bes2600_irq_enable(1)` commented out:** is IRQ re-enable intentionally always-on now, or is the `nop` a deletion-in-progress bug? Reading the c-stack history doesn't tell me. Worth a "what was this for" pass before any RX-architecture patch lands.
+3. **`sw_mci_check_r1_ready` on RK3566:** should we test or just trust the path is rare-error? My read is: trust + `WARN_ON` if it's ever called, then react.
+
+---
+
+*Written 2026-05-07. Reviewing as Opus 4.7 against Sonnet 4.6's review of the same source tree. Independent reads of: bh.c, bes2600_sdio.c (sdio_rx_work + pipe_read + IRQ handler), txrx.c (RX delivery sites + ba_lock + ps_state_lock sites), bes2600.h (struct lock topology), Makefile (build-system shape). No simulator runs; this is a static-analysis writeup, the dynamic verification of any claim above belongs in Phase 7 of the corresponding patch.*
@@ -0,0 +1,184 @@
+# Patch C — Phase 4 Plan: collapse sdio_rx_work into BH
+
+**Author:** Claude (noether)
+**Status:** Phase 4 — pending Phase 5 second-model review before any Phase 6 code.
+**Scope:** **item 1 only** (per merged PR #8 inline review: "do it sequentially; we're not on the clock").
+**Item 2** (batch deliver via `ieee80211_rx_list`) splits to **Patch C2**, gated on Task #19 kerneldoc verification.
+
+---
+
+## §0 Substrate — anchored
+
+Bug #5 anchor (recorded 2026-05-07, see `notes/phase1-bug5-2026-05-07.md`):
+
+- Sender: netcat-over-WiFi, 4 MB/s cap, 2.4 GHz, AVM AP, single-STA
+- Receiver: ohm (PineTab2, RK3566 + BES2600WM-SDIO)
+- N=3 baseline reps: 725 / 663 / 75 KB/s (rep 3 saw link-death at ~9 min)
+- `perf record -g` during 4MB/s window: `_raw_spin_unlock_irqrestore` ≈ 20% CPU
+- ftrace lock-instrumentation, system-wide: `workqueue_execute_start` ≈ 5,643/sec
+- Driver-side count: `wsm_cmd_send` 13/sec — wsm command path is *not* the dispatch source; the contributor is the per-SDIO-transaction relay.
+
+Root cause traced in PR #7 (Sonnet review) and concurred in PR #8 (Opus review): RX path adds two synchronization points per frame and one wait-queue wake-up per IRQ batch via `sdio_rx_work` → `rx_queue` → `bh_work` indirection.
+
+## §1 Goal (locked)
+
+Reduce per-RX-frame overhead enough that observed receive ≥ 1.0 MB/s sustained @ 4 MB/s sender, with `_raw_spin_unlock_irqrestore` < 15 % CPU during the 4 MB/s window. No 30-min cascade to link-death.
+
+(This is a partial step toward Phase 1's full target of ≥ 2 MB/s sustained @ 4 MB/s with < 10 % lock CPU. The full target is jointly addressed by Patch C + Patch C2; Patch C alone should *cross half the gap*.)
+
+## §2 Situation
+
+- `bes2600.ko` srcversion `1B3B3ED0…` deployed on ohm (c-stack + Patch A + Patch B).
+- `cleanups` branch on `marfrit/bes2600-dkms` is the current source-of-truth.
+- Build sandbox `/var/tmp/c6-sandbox/` on ohm, native `make -j4`.
+- `BES2600_RX_IN_BH` is **defined** in Makefile — `bes2600_bh_rx_helper` is the active RX consumer.
+- ohm reachable. Markus pushes the reboot button; never me.
+- Test rig under `/root/bes2600-samples/` — `rep-trace.sh` per-rep capture script.
+
+## §3 Baseline measurements
+
+Reused from Bug #5 Phase 0 (above). No re-anchor needed for Patch C — same regime.
+
+**Specific Phase-3-units that this plan's predictions reference:**
+
+| metric | tool | current value (4MB/s window) |
+|---|---|---|
+| observed receive throughput | netcat receiver byte-count | 75–725 KB/s, rep-variance high |
+| `_raw_spin_unlock_irqrestore` CPU% | perf record / report | ~20% |
+| `workqueue_execute_start`/sec | ftrace `workqueue:workqueue_execute_start` | ~5,643/sec system-wide |
+| `bes_sdio` workqueue dispatches | `cat /sys/kernel/tracing/events/workqueue/.../filter` filtered by `bes_sdio` | not measured pre-patch — **TODO before Phase 6** |
+| RX SKB rate at mac80211 boundary | trace `mac80211:drv_rx_irqsafe` count | not measured pre-patch — **TODO before Phase 6** |
+
+Phase 6 must not start until the two TODOs above are filled in — otherwise Phase 7 has no reference point for the predicted-delta comparison.
+
+## §4 Plan
+
+### §4.1 What will be touched
+
+- `bes2600_sdio.c::sdio_rx_work` — the relay loop. After this patch, it still drains the SDIO bus into SKBs but **delivers SKBs directly into `wsm_handle_rx`** instead of `skb_queue_tail`-ing them onto `self->rx_queue`.
+- `bes2600_sdio.c::bes2600_sdio_extract_packets` — the inner per-SKB extractor. Changes the in-loop action from `skb_queue_tail(&self->rx_queue, skb)` to a direct call (or callback) into the wsm dispatcher.
+- `bes2600_sdio.c::bes2600_sdio_pipe_read` — becomes unused, removed.
+- `bh.c::bes2600_bh_rx_helper` — its `BES_SDIO_RX_MULTIPLE_ENABLE` branch is no longer reachable for RX (RX path no longer feeds bh). Either gate the helper, or remove the helper outright if `bh_rx` atomic is no longer raised on RX.
+
+### §4.2 What will NOT be touched
+
+- `ieee80211_rx_irqsafe()` call sites — that's Patch C2 (item 2).
+- TX path — `sdio_tx_work`, `bes2600_bh_tx_helper`, etc. Untouched.
+- `sdio_wq` workqueue alloc — stays. After patch it hosts only `tx_work` + `scan_work` + (briefly during patch) `rx_work`. Renaming is cosmetic and out of scope.
+- The bh thread itself — still runs, still handles TX, still watches the timeouts.
+- `bh.c` `#if 0` graveyard — separate hygiene patch, not bundled.
+- `__bes2600_irq_enable(1)` commented-out / `asm volatile("nop")` placeholder — **deferred** per `feedback_dont_patch_downstream_artifacts`. These are symptom-shaped; Patch C may dissolve them. Re-evaluate at Task #24 (post-Patch-E observation).
+- `bh_rx` / `bh_tx` atomic split — out of scope.
+
+### §4.3 Approach choice — Option A (sdio_rx_work direct delivery)
+
+Two structural options surveyed in PR #8 §2.1; recap:
+
+| | Option A: direct delivery from sdio_rx_work | Option B: subsume sdio_rx_work into bh thread |
+|---|---|---|
+| diff size | small | medium |
+| eliminates `rx_queue->lock` × 2 per frame | yes | yes |
+| eliminates `sdio_wq.rx_work` workqueue dispatch per IRQ | no | yes |
+| changes who calls `wsm_handle_rx` | sdio_wq context (already process context) | bh thread |
+| TX/RX SDIO bus contention | unchanged (sdio_rx_work and sdio_tx_work already share `bes2600_sdio_lock`) | adds bh ↔ sdio_tx_work contention on the SDIO mutex |
+| bisection isolation | clean: only the rx_queue handoff is removed | mixes "remove handoff" with "subsume thread" |
+
+**Choosing Option A.** Reasons:
+1. Smaller diff = clearer Phase-7 attribution. If RX KB/s rises, we know it was the rx_queue handoff, not the workqueue topology.
+2. Per Markus's PR #8 review: split was for bisection clarity. Option A is narrower than Option B.
+3. The remaining cost (per-IRQ `sdio_wq.rx_work` dispatch) is ≤ 1 dispatch per IRQ batch; multi-RX coalescing means several frames per dispatch. If Phase 7 of Patch C shows that dispatch IS the residual cost, that becomes a concrete data point and motivates a *measured* Option-B follow-up, not a speculative one.
+
+### §4.4 Implementation sketch (preview — actual code in Phase 6)
+
+**Today** (`bes2600_sdio.c:783–831`):
+```c
+static int bes2600_sdio_extract_packets(...) {
+    for each packet:
+        skb = dev_alloc_skb(...);
+        memcpy(skb->data, &data[pos], packet_len);
+        spin_lock(&self->rx_queue_lock);
+        skb_queue_tail(&self->rx_queue, skb);    // ← handoff
+        spin_unlock(&self->rx_queue_lock);
+}
+static void sdio_rx_work(...) {
+    bes2600_sdio_extract_packets(...);
+    self->irq_handler(self->irq_priv);           // ← wakes bh_wq
+}
+// bh thread later: pipe_read = skb_dequeue(rx_queue) → wsm_handle_rx(skb)
+```
+
+**After patch** (sketch):
+```c
+static int bes2600_sdio_extract_packets(struct sbus_priv *self, u32 ctrl_reg, u8 *data) {
+    for each packet:
+        skb = dev_alloc_skb(...);
+        memcpy(skb->data, &data[pos], packet_len);
+        ret = wsm_handle_rx(self->core, wsm_id_from(skb), wsm_hdr_of(skb), &skb);
+        if (skb) dev_kfree_skb(skb);
+        // no rx_queue, no spinlock, no wake-up
+}
+static void sdio_rx_work(...) {
+    bes2600_sdio_extract_packets(...);
+    // self->irq_handler(...) is no longer called for RX-only wakes
+    // (it remains called for TX-confirm-completion paths, if any)
+}
+```
+
+Caveats discovered during sketch:
+- `wsm_handle_rx`'s signature wants `(hw_priv, id, wsm_hdr*, **skb)`. `extract_packets` doesn't currently parse the wsm header — we either parse it inline (cheap; the cost is one `__le16_to_cpu`) or defer parsing into a new `bes2600_sdio_deliver_rx(skb)` helper that wraps it.
+- `hw_priv` is reachable as `self->core`.
+- Need to verify `wsm_handle_rx` is callable from sdio_wq context. **Hypothesis:** yes, because today's bh thread is also process-context-via-workqueue and that's where wsm_handle_rx already runs. Phase 6 contract-cite from `wsm.h` / call-graph confirms.
+- The `irq_handler(self->irq_priv)` wakeup at sdio_rx_work:902 — keep it, but confirm whether bh actually has remaining work after RX is gone. Possibilities: TX-confirm completions (`wsm_release_tx_buffer`) still need a bh wake. Verify in Phase 6.
+
+### §4.5 Predicted delta (Phase 3 units)
+
+Conservative because Patch C is item 1 only, not items 1+2.
+
+| metric | predicted change | confidence |
+|---|---|---|
+| `rx_queue->lock` acquire/release rate | → 0 (lock is removed entirely; struct field deleted) | high |
+| RX-path wait-queue wakes (`bh_wq` from sdio_rx_work for RX) | → 0 (TX-confirm wakes remain) | high |
+| `_raw_spin_unlock_irqrestore` CPU% | 20 % → 12–15 % | **medium** — the rx_queue lock is one of several contributors; I don't have per-lock breakdown pre-patch |
+| `workqueue_execute_start`/sec | marginal change (≤ 5 %) | high — sdio_wq dispatch still happens per IRQ |
+| observed receive @ 4 MB/s | floor lifts from 75 KB/s → ≥ 1.0 MB/s; rep-variance shrinks | **medium** — rep 3's link death has multiple causes (decrypt-storm path is Patch A's territory; AP-side `aid 30` rejection is also possible) |
+| Phase 7 N=3 outcome | all reps ≥ 1 MB/s sustained for 30 min @ 4 MB/s | **medium** |
+
+**Honest acknowledgement:** the medium-confidence predictions are the ones where Phase 7 either confirms the model or surfaces a new bug. If `_raw_spin_unlock_irqrestore` only drops to 18 %, the next-largest contributor was something else — `pool->lock` (workqueue infrastructure) or `ba_lock` — and Patch D/E/C2 become the answer.
+
+### §4.6 Risks
+
+1. **`wsm_handle_rx` not callable from sdio_wq**: low probability (process context, same shape as today's bh), but a cite-failure here means revert to Option B. **Phase 6 must produce a `wsm.h` contract citation** before code lands.
+2. **TX-confirm wake-ups stop firing**: if `wsm_handle_rx` was the only thing that ultimately bumped `bh_tx`, removing it from bh's input causes TX-confirm starvation. Mitigation: keep `irq_handler(irq_priv)` call in sdio_rx_work for now; let the bh's wait_event re-evaluate `bh_tx` on every wake. **Verify in Phase 6 that `wsm_release_tx_buffer` still wakes bh.**
+3. **SKB allocation under memory pressure**: `dev_alloc_skb` in extract_packets currently `msleep(100)` retries up to 10×. Calling `wsm_handle_rx` directly from extract_packets keeps us in sdio_wq context during sleep; that's the same as today, so no new risk.
+4. **rcu / locking invariants in `wsm_handle_rx`**: it traverses `priv->vif_list`, may grab `priv->vif_lock`. Currently called from bh thread. After patch: called from sdio_wq context. Both are process context, both can sleep. No new risk *unless* there's a held lock at sdio_wq level that wsm_handle_rx tries to re-acquire. **Phase 6 lock-graph audit required.**
+5. **`bes2600_chrdev_is_bus_error()` early-return**: currently checked in `pipe_read`. After patch, must move into `extract_packets` or `sdio_rx_work` so RX during a bus-error window still gets dropped, not passed to mac80211.
+6. **Multi-vif RX serialization**: the `rx_queue` is per-sbus_priv, not per-vif. After patch, multi-vif demux happens inside `wsm_handle_rx` (same as today). No new risk; same ceiling.
+
+### §4.7 Phase 5 review handover
+
+Goal/Situation/Measurements/Plan paste verbatim into DokuWiki when Markus initiates handover. **Do not curate** the plan for the reviewer — including the "medium-confidence" predictions and the §4.6 risk list verbatim. Reviewer should see the same uncertainty I have.
+
+### §4.8 Phase 7 protocol (after Phase 6 lands)
+
+Per `feedback_phase7_stress_ramp.md` — **stress ramp, not steady cap**:
+
+1. Pre-patch baseline (re-anchor): 5 min @ 1 MB/s, 10 min @ 2 MB/s, 30 min @ 4 MB/s. Capture ftrace `workqueue/`, `lock/`, `mac80211/`, `mmc/`. perf record during the 4 MB/s window.
+2. Apply Patch C, install, reboot (Markus pushes).
+3. Post-patch: identical ramp, identical instrumentation.
+4. Compute deltas in **the same units** as §3 baseline. Compare to §4.5 predictions. Any unexplained delta is a finding, not a footnote — log it and loop back to Phase 4 if the model is wrong.
+5. **N=3 reps** post-patch. The user's stress-ramp memory and the receipts checklist both require this.
+6. Capture `sdio_work_debug` output and `dmesg` if any storm fires (Patch A's counter should hold steady).
+7. If Phase 7 numbers match prediction → Phase 8 memory update + proceed to Patch C2.
+8. If they don't match → loop back to Phase 4. Don't paper-fix.
+
+## §5 Out-of-scope items recorded for follow-on patches
+
+- **Patch C2**: items 2 — `ieee80211_rx_list` batch delivery. Gated on Task #19 kerneldoc verification.
+- **Patch D**: ba_lock atomicization at `txrx.c:998-1005, 1632`. Independent.
+- **Patch E**: ps_state_lock skip when `pm_unsupported = true` at `txrx.c:1942-1948`. Independent, gated on c7 latch.
+- **Task #24**: post-Patch-E observation of bh.c `asm volatile("nop")`, commented-out `__bes2600_irq_enable(1)`, BUG_ON in steady-state hot path. Symptom-shaped; observe before patching.
+- **Task #25**: measure `sw_mci_check_r1_ready` on RK3566 during testing.
+
+---
+
+*Plan written 2026-05-07 by Claude (noether). Awaiting Phase 5 second-model review on DokuWiki, initiated by Markus.*
@@ -0,0 +1,135 @@
+# Bug #5 campaign — Phase 1 metric + Phase 2 situation
+
+Date assembled: 2026-05-07
+Module under test (baseline): bes2600.ko srcversion `1B3B3ED096AAD7217FEDE11`
+                              (cleanups + Patch A + Patch B)
+
+Phase 0 anchor at N=3 reps (10 min each, 4 MB/s pv-cap on boltzmann → ohm
+2.4GHz 5b:32) reproduces the throughput regression and traces it to lock-
+cycling cost in the bes2600 BH path. See `notes/observed-bugs.md` Bug #5
+for the original report. This document locks Phase 1 and prepares Phase 4
+candidates.
+
+---
+
+## Phase 1 — measurable target (locked)
+
+> *Reduce the bes2600 BH path's spin-unlock-irqrestore cost so that under sustained 4 MB/s sender pressure on a healthy 2.4GHz link (signal -55 to -65 dBm), ohm sustains ≥ 2 MB/s observed RX throughput (vs 663–725 KB/s baseline at N=3) AND the link survives ≥ 30 min continuous load without cascading into beacon-loss disconnect (vs rep3's failure at ~9 min).*
+
+Three measurable outcomes, single sentence:
+
+- **(a) Throughput floor**: ≥ 2 MB/s sustained RX at ohm
+- **(b) Lock-cycle ceiling**: % CPU in `_raw_spin_unlock_irqrestore` from `bes2600_bh.isra.0` callstack drops to < 10 % (currently ~10 % rep1, ~16 % rep3)
+- **(c) Cascade prevention**: no link death under continuous 30 min @ 4 MB/s
+
+---
+
+## Phase 0 anchor — receipts
+
+### Reproduction protocol (same units as Phase 7 will use)
+
+1. boltzmann: `pv -L 4M -q < /dev/zero | nc ohm.fritz.box 12345`
+2. ohm: `sudo $RUN/rep-trace.sh 600` (10 min capture window)
+3. Rep dirs: `bug5/rep-<ts>/{mmc.log, perf.data, rx_bytes.tsv, start.txt, end.txt}`
+4. N=3 reps with 60 s cooldowns
+
+### Observed (2026-05-07 15:36–16:08)
+
+| rep | duration | avg KB/s | near-zero ticks | end state |
+|---|---|---|---|---|
+| 1 | 600 s | 725 | 1/119 | associated, MCS 6 |
+| 2 | 600 s | 663 | 5/119 | associated, MCS 4 |
+| 3 | 600 s |  75 | 53/119 | **passive (link died at sample ~82, ~9 min in)** |
+
+mmc transaction rate: rep1 = 5793/s sustained, rep3 = 6000/s for first ~10s then collapse to <100/s.
+
+### Hot-symbol receipts (perf record on `bes_sdio | bes2600_bh` kworkers)
+
+| symbol | rep 1 (healthy) | rep 3 (cascade) |
+|---|---|---|
+| `_raw_spin_unlock_irqrestore` (sum across kworker variants) | **~19 %** | **~21 %** |
+| `handle_softirqs` | 5.4 % | 4.3 % |
+| `__tasklet_schedule` | 2.4 % | 2.0 % |
+| `dw_mci_start_command` (SDIO host) | 1.5 % | < 0.6 % |
+| `bes2600_sw_retry_requeue` | 0.79 % | 0.70 % |
+
+Top callchain leading to `_raw_spin_unlock_irqrestore`:
+
+```
+process_one_work → worker_thread → wsm_configuration → wsm_cmd_send → bes2600_bh.isra.0
+```
+
+---
+
+## Phase 2 — situation analysis
+
+### Relevant source pins
+
+- `bes2600/wsm.c:98` — `wsm_cmd_send()`, the function in the hot callstack. Body:
+  - holds `&hw_priv->wsm_cmd.lock` (spin) only briefly to fill the cmd struct (lines ~145–152)
+  - calls `bes2600_bh_wakeup()` then `wait_event_interruptible_timeout` for the response
+  - outer lock: `down(&hw_priv->wsm_cmd_sema)` from callers (`wsm_cmd_lock()` at wsm.c:105)
+- `bes2600/bh.c:435,559,847` — `bes2600_bh_work()` takes `&hw_priv->vif_list_lock` 3× per pass through the dispatcher
+- `bes2600/bh.c:172,195,487,581,861,1361,1427` — multiple `wait_event*` calls; the BH thread cycles through wait/wake/dispatch
+
+### What the lock-cycling cost is buying
+
+Each WSM command from the host (mac80211, NM, kernel scan etc.) goes through the same path:
+1. caller acquires `wsm_cmd_sema` (outer)
+2. `wsm_cmd_send()` acquires `wsm_cmd.lock`, fills the struct, releases the lock
+3. `bes2600_bh_wakeup()` schedules the BH
+4. BH dispatches the command, takes `vif_list_lock` to walk vifs
+5. BH talks to chip via SDIO
+6. response arrives, BH wakes the waiter
+7. caller releases `wsm_cmd_sema`
+
+Under sustained TCP RX, mac80211 issues lots of small WSM commands (TX-scheduling hints, rate updates, etc.) — every one cycles through this path. The spin-unlock-irqrestore cost is the floor on this cycle rate.
+
+### What's been ruled out
+
+- AP-side bug (AVM Fritz!Box, reliable per Markus's testimony — see `feedback_phase7_stress_ramp.md` reasoning in the campaign-so-far prior).
+- Patch A / Patch B (target different triggers; would not address lock cost).
+- Decrypt-failure storm (Patch A handles; this regression occurs in rep1 with zero decrypt-fails).
+- mac80211 scan-fail / scan-comeback (cosmetic; doesn't account for the throughput floor).
+
+---
+
+## Phase 4 — candidate plans (preliminary, not locked)
+
+Three candidates surfaced from the perf data. Listed cheapest to most-invasive.
+
+### B5-1 — Reduce `wsm_cmd_send` lock scope
+
+The spin_lock around the cmd-struct fill (wsm.c:145–152) can probably be replaced with `WRITE_ONCE` of a single sentinel field, since the BH thread reads these fields cooperatively (BH only reads after `bh_wakeup` schedules it, and only writes back via the response path). Eliminates the per-command spin-lock cycle for the host side.
+
+**Risk**: race with BH if the protocol isn't strictly happens-before. Need to read bh.c:486-500 (where BH reads wsm_cmd.ptr) carefully.
+
+**Predicted delta**: small but measurable. Maybe 2–3 % CPU off the lock floor.
+
+### B5-2 — Coalesce `vif_list_lock` in BH dispatcher
+
+bh.c takes `vif_list_lock` 3× per dispatcher pass. If these 3 critical sections are within a single iteration, they should be merged into one acquire/release.
+
+**Risk**: vif teardown might depend on releasing the lock between iterations to allow concurrent vif removal. Needs careful audit.
+
+**Predicted delta**: significant under multi-vif (we're single-vif STA today, so probably small immediate gain).
+
+### B5-3 — Move WSM send out of process context, use ringbuffer
+
+Replace the wsm_cmd_sema + wsm_cmd struct mechanism with an SPSC ringbuffer between caller and BH. Caller writes to ring, no lock needed (one producer); BH reads from ring, no lock needed (one consumer).
+
+**Risk**: significant rework. cw1200 ancestor doesn't have this; we'd be inventing it.
+
+**Predicted delta**: large — could halve the lock cost. But cost-to-implement is also large.
+
+---
+
+## Open question for the reviewer
+
+Which Phase 4 candidate to lock? My ranking by ROI:
+
+1. B5-1 (smallest, fastest, cleanest source pin) — try first
+2. B5-2 (medium, conditional on multi-vif applicability)
+3. B5-3 (largest rework, biggest potential)
+
+Or: instrument deeper before committing to a fix (e.g., add `tracepoints around wsm_cmd_send` enter/exit to measure lock holdtime distribution, not just CPU%).
@@ -0,0 +1,96 @@
+# BES2600 WiFi-stability campaign — Phase 7 verdict (Patches A + B)
+
+Date assembled: 2026-05-07
+Module under test: bes2600.ko srcversion `1B3B3ED096AAD7217FEDE11`
+                   (cleanups + Patch A + Patch B)
+Run dir: `/root/bes2600-samples/run-20260507-1248-patchB/` on ohm
+
+Phase 7 verification window: 2026-05-07 12:48 → ~15:13 CEST (≈ 2 h 25 m)
+of which: ~50 min @ 1 MB/s pv-cap, ~1 h 30 m @ 4 MB/s pv-cap on 2.4 GHz
+newton (5b:32, signal -57 to -67 dBm).
+
+---
+
+## Result table (vs the Phase 4 predicted delta)
+
+### Patch A — decrypt-storm fast-recover (Trigger B)
+
+| metric | Phase 3 baseline | Phase 4 prediction | Phase 7-of-B observed |
+|---|---|---|---|
+| decrypt-burst rate | 8.18/h | unchanged | 2 bursts in ~22 min once 4MB/s pressure was on |
+| AP-deauth-6 rate following burst | 100 % escalation | ≤ 0.2 × baseline | **0/2 = 0 % escalation** |
+| recovery time given burst | up to 109 s | < 5 s | **~1 s** (×2) |
+
+**Verdict: predicted delta CONFIRMED at N=2.** CLAUDE.md ideal is N=3; we're directionally locked at 2 reproductions, both behaving as predicted (threshold trip → `[bes2600] decrypt-storm fast-recover: forcing reassoc` log line → mac80211 disassoc → userspace reauth in ≈1 s).
+
+#### Receipts (verbatim)
+
+```
+13:47:56  bes2600_wlan: [bes2600] decrypt-storm fast-recover: forcing reassoc
+13:47:57  wlan0: associated to cc:ce:1e:2b:74:17     (cross-BSSID, 1 s)
+13:49:26  bes2600_wlan: [bes2600] decrypt-storm fast-recover: forcing reassoc
+13:49:27  wlan0: associated to c0:25:06:e6:5b:32     (back home, 1 s)
+```
+
+`DecryptStormRecoveries: 2` exposed via debugfs at `/sys/kernel/debug/ieee80211/phy0/bes2600/vif_0/status`.
+
+### Patch B — connection-loss-storm bus_reset (Trigger A)
+
+| metric | Phase 7-of-A observed | Phase 4 prediction | Phase 7-of-B observed |
+|---|---|---|---|
+| api_connection_loss rate | 0.86/h | unchanged | 2 events in ~2 h (≈ 1/h) |
+| ConnectionLossStormRecoveries | n/a | trips on 3-in-60s bursts | **0** |
+| Threshold trip events | n/a | (when burst occurs) | **0** (events spaced 91 s apart) |
+
+**Verdict: installed but UNTRIGGERED.** The 3-in-60s threshold was never reached (max-cluster observed: 2-in-91s). No false positives, no spurious bus_resets. Predicted delta unobserved — same shape as Patch A's first Phase 7 run.
+
+The threshold may be too conservative for typical event rates (we'd need a true api_connection_loss flood to trip it). Tuning is a future Phase-1 question if more reproductions accumulate.
+
+### Trigger C — AP unprotected-deauth-6 cluster without preceding storm
+
+```
+12:53:10.475 → 12:53:11.756  AP fires 17 unprotected-deauth-6 from 5b:32 over 1.3 s
+                              (2 mgmt-TX no-ack from our chip in the middle)
+12:53:12.309  kernel: deauthenticating ... reason 2 = PREV_AUTH_NOT_VALID
+12:53:14–15  reauth via 61:b0 → 5b:32, recovery in ~4 s
+```
+
+Neither Patch A (zero decrypt-fails preceded) nor Patch B (zero api_connection_loss) fired. Background: AVM Fritz!Boxes (newton) are reliable; the AP correctly classified ohm's frames as Class 2 from non-auth, meaning **bes2600 sent something the AP couldn't authenticate**. New backlog entry: `notes/observed-bugs.md` Bug #5 (RX path under throughput pressure) is the leading hypothesis surface.
+
+Recovery was fast (4 s) so this isn't a P0 — but a Patch C investigation is warranted when prioritized.
+
+---
+
+## Bug #5 — RX path degradation under attempted-throughput pressure (NEW)
+
+```
+sender 1 MB/s  →  ohm receives 1015 KB/s,  -57 dBm,  RX MCS 4
+sender 4 MB/s  →  ohm receives  563 KB/s,  -67 dBm,  RX MCS 3
+```
+
+Higher attempted-throughput on the sender side → LOWER observed throughput at ohm. Signal degraded ~10 dB, MCS dropped a notch. Link-physical max is ~8 MB/s; we're getting ~7 % of that under load.
+
+**Hypothesis (Markus): driver/firmware locks itself to death under busy reads.** Plausibly the same root-cause as the Phase 0 YouTube DASH chunk-fetch drops (~10 frames per chunk fetch on hardware-decoder playback). Documented as Bug #5 in `notes/observed-bugs.md`.
+
+---
+
+## Lessons captured for memory (Phase 8 anchor)
+
+1. **Stress-rate matters for verification.** Patch A's predicted delta only became observable when the netcat cap went 1 → 4 MB/s. The previous Phase 7 (10h30m @ 1 MB/s) saw zero decrypt-storms. Future Phase 7 protocols should plan a stress ramp from steady to near-saturation, not just the steady setting.
+2. **"Untriggered, no harm" is a valid Phase 7 verdict** for installed patches. Patch B fits this exactly. The patch is ready; the trigger pattern just doesn't fire often enough in this RF / load regime to verify the recovery delta. Don't let unobserved verifications block the loop.
+3. **Build infrastructure on `cleanups` not `mobian`.** The Phase 6 attempt to base Patch B on mobian forced a refactor mid-flight; the c-stack lives on cleanups, and re-using c5.2's `bes2600_chrdev_do_bus_reset` requires that. The cleanups branch is the campaign's working trunk.
+4. **AP-side bug is unlikely on AVM hardware.** AVM Fritz!Boxes don't fire spurious deauth-6 storms. When ohm sees AP-deauth-6 unprovoked, the suspect chain is bes2600 sending something the AP can't authenticate. The bias toward "bes2600 is the broken thing" is empirically validated.
+5. **AP-deauth-6 can fire without our local triggers.** Trigger C is a real failure mode neither Patch A nor B addresses. Adding a Phase-1-style metric for "AP-deauth-6 rate without preceding decrypt-storm or api_connection_loss" would surface Trigger C cleanly.
+6. **`pv -L` cap interacts with TCP retransmit recovery.** When the link can't sustain the cap, TCP backs off and pv blocks. Observed throughput is then a **floor on chip RX capacity at that signal level**, not the sender's intent. Useful for chip-load-characterization, but the cap should be set based on observed pull-rate, not on the link's nominal MCS rate.
+
+---
+
+## Loop status
+
+- Phase 7: closed.
+- Patch A: confirmed (N=2). Stays in.
+- Patch B: installed, dormant in this regime, no harm. Stays in.
+- Bug #5: backlog, no patch yet. Documented.
+- Trigger C: backlog candidate, no patch yet. Documented.
+
+Next campaign cycle would be re-anchoring Phase 0 around Bug #5 or Trigger C.
Author	SHA1	Message	Date
claude-noether	f232476240	notes: Patch C Phase 4 plan — collapse sdio_rx_work into BH (item 1 only) Per merged PR #8 inline review: items 1 and 2 split, sequential. Patch C is item-1-only (collapse the sdio_rx_work → rx_queue → bh_work indirection). Patch C2 (ieee80211_rx_list batch delivery) is split out and gated on Task #19 kerneldoc contract verification. Approach choice: Option A (sdio_rx_work delivers directly into wsm_handle_rx, removing rx_queue and its two synchronization points per frame) over Option B (subsume into bh thread). Option A has a smaller diff and clearer bisection story; the residual per-IRQ workqueue dispatch is preserved as a measurable Phase 7 data point that motivates or doesn't motivate a follow-on Option-B patch. Predicted delta in Phase 3 units, with confidence levels stated explicitly. §4.6 lists 6 risks, of which 2 require Phase 6 contract citations (wsm_handle_rx callability from sdio_wq context; wsm_release_tx_buffer's bh-wake invariant). §4.8 mandates a stress ramp in Phase 7, not a steady cap, per feedback_phase7_stress_ramp. Symptom-shaped findings (asm nop, commented-out IRQ re-enable, BUG_ON in hot path) explicitly deferred to Task #24 per feedback_dont_patch_downstream_artifacts. Awaiting Phase 5 second-model review on DokuWiki.	2026-05-07 19:04:53 +02:00
marfrit	08c7aafb48	Merge pull request 'notes: Opus second-opinion BES2600 WiFi structural critique' (#8 ) from claude-noether-6 into main Reviewed-on: #8 Reviewed-by: Markus Fritsche <mfritsche@reauktion.de>	2026-05-07 16:58:55 +00:00
claude-noether	809e3cce84	notes: opus second-opinion BES2600 WiFi structural critique Independent code-review writeup (Opus 4.7) against Sonnet's review of the same tree. Concurs with Sonnet on items 1+2 (RX relay, batch delivery) and items 4+5 (ba_lock atomics, ps_state_lock skip-when-pm_unsupported); pushes back on the "9 workqueue events per frame" quantification and records BES_SDIO_OPTIMIZED_LEN as hard-baked rather than togglable. New findings: cw12xx-not-bes2600 genealogy still active in source, ~700 lines of #if 0 fossil in bh.c, Allwinner-specific sw_mci_check_r1_ready in the SDIO bus path, asm volatile("nop") placeholder where IRQ re-enable used to live, BUG_ON in steady-state hot path, vendor-SDK Makefile shape that pollutes every diff, 8 EXPORT_SYMBOLs from a nominally-single-binary module. Recommends ordering: Patch C (1+2 wrapped) high-risk-first, Patches D+E as small individually-verifiable cleanups, explicit don't-touch list. Notes ieee80211_rx_list contract verification (task #19) blocks Patch C.	2026-05-07 18:12:54 +02:00
marfrit	4344873f2d	Merge pull request 'Sonnet architect review for Bug #5 — ranked restructuring map' (#7 ) from claude-noether-5 into main Reviewed-on: #7	2026-05-07 16:01:55 +00:00
claude-noether	679083d1aa	notes: Sonnet architect review for Bug #5 — ranked restructuring map Sonnet (general-purpose subagent, model=sonnet) reviewed ~/src/besser/bes2600-dkms-mobian/bes2600/ given the Phase 0 measurement context. Output: 8-item ranked restructuring map, file:line cited. Headline: - Item 1: collapse sdio_rx_work relay into BH loop (~5x workqueue dispatch reduction, medium effort) - Item 2: batch deliver via ieee80211_rx_list (small effort, removes per-frame softirq) - Items 1 + 2 together collapse "9 workqueue events per delivered frame" to ~1. Items 3-5 clean up next-layer overhead (TX-side queue_work, per-frame ba_lock, ps_state_lock under known-dead PSM). Items 6-8 are follow-ons to be re-measured after 1-3 land. Phase 4 plan locking the lead candidate(s) follows in a separate PR.	2026-05-07 17:38:16 +02:00
claude-noether	594f73c6b4	notes: Bug #5 root cause refined — workqueue-per-SDIO-transaction is the floor Follow-up ftrace measurement (post-reboot, 3-min 4MB/s capture): - workqueue_execute_start: 5,643/sec ← dominates - wsm_cmd_send: only 13/sec (host-to-chip command path NOT the hotspot) - lock contention: 50/sec (modest) The throughput floor is set by per-SDIO-transaction workqueue dispatch overhead. Surgical patches B5-1/B5-2/B5-3 from the prior Phase 4 plan all targeted the wrong layer; deferring those until an architectural restructuring map is produced. Promoting the Sonnet architect review from "backlog" to "blocking on Bug #5" — the next step is a restructuring assessment, not another patch.	2026-05-07 17:31:31 +02:00
claude-noether	928268f477	notes: backlog Sonnet architect review of bes2600 driver Per PR #6 review feedback. Independent track from Bug #5; scheduled once the Bug #5 measurement pass finishes.	2026-05-07 16:38:58 +02:00
marfrit	425eb92456	Merge pull request 'Bug #5 Phase 1 metric + Phase 0 anchor receipts' (#6 ) from claude-noether-4 into main Reviewed-on: #6	2026-05-07 14:37:29 +00:00
claude-noether	1830c17891	notes: Bug #5 Phase 1 metric + Phase 0 anchor receipts Phase 0 anchored at N=3 reps (10min @ 4MB/s pv-cap on 2.4GHz): - rep1+2: ~700 KB/s sustained (10% of link capacity) - rep3: link death at ~9 min in (passive mode, beacon-loss cascade) Hot symbol identified: _raw_spin_unlock_irqrestore at ~20% CPU in both healthy and failed reps, callstack process_one_work → wsm_configuration → wsm_cmd_send → bes2600_bh.isra.0 → spin-unlock. Phase 1 metric locked: ≥2 MB/s sustained throughput, <10% CPU in lock- cycling, no link death under 30 min continuous load. Three Phase 4 candidates drafted (B5-1: shrink wsm_cmd_send lock scope; B5-2: coalesce vif_list_lock in BH dispatcher; B5-3: SPSC ringbuffer for WSM commands). Locking pending review.	2026-05-07 16:32:45 +02:00
claude-noether	69a1d0f8b1	notes: phase 7 verdict — Patch A confirmed, Patch B dormant Phase 7 verification of cleanups + Patch A + Patch B (srcversion 1B3B3ED0) on ohm 2026-05-07 12:48 → 15:13 CEST under netcat load ramped 1 MB/s → 4 MB/s on 2.4GHz newton. Patch A: predicted delta CONFIRMED at N=2 reproductions. - 13:47:56 storm → 1 s reassoc, no AP-deauth-6 escalation - 13:49:26 storm → 1 s reassoc, no AP-deauth-6 escalation Patch B: installed, untriggered. 2 api_connection_loss events spaced 91 s apart, never tripping the 3-in-60s threshold. No false positives, no spurious bus_resets. Recovery delta unobserved (no harm done). Trigger C: 17-frame AP-deauth-6 cluster at 12:53 with no patch hooks firing — bes2600 TX-side glitch suspect. Recovery via mac80211 reauth in ~4 s. New backlog item. Bug #5 documented separately (RX path degrades under throughput pressure; possible root of the original Phase-0 YouTube frame drops).	2026-05-07 15:18:36 +02:00
claude-noether	458ad36f8b	notes: backlog Bug #5 — RX path degrades under throughput pressure Observed 2026-05-07: bumping the netcat sender from 1 MB/s to 4 MB/s DECREASED ohm's observed RX rate (1015 KB/s → 563 KB/s) and degraded the link (signal -57 → -67 dBm, MCS 4 → 3). Chip can't sustain near- link-rate RX even though theoretical capacity is ~8 MB/s. Hypothesis: driver/firmware lock contention or busy-wait on the RX SDIO path. Plausibly explains the original Phase-0 observation that YouTube DASH chunks drop ~10 frames per chunk fetch — chunk fetch is a brief near-line-rate burst that this bug would be triggered by.	2026-05-07 13:56:36 +02:00
marfrit	ea509e810f	Merge pull request 'Phase 4 plan: Patch B (Trigger A / api_connection_loss)' (#5 ) from claude-noether-3 into main Reviewed-on: #5	2026-05-07 10:45:28 +00:00