notes: Patch C2 Phase 4 plan — ieee80211_rx_irqsafe to ieee80211_rx_list

After Patch C v3 / D / E / F / G all merged, the remaining cleanup target is the per-RX-frame tasklet defer that ieee80211_rx_irqsafe introduces. Patch C2 migrates all 6 call sites in bes2600 to ieee80211_rx_list, the process-context API verified per the kerneldoc audit (Task #19, mainline include/net/mac80211.h:5324-5345). Key constraints from kerneldoc: - cannot mix _list and _irqsafe for the same hardware (=> all 6 sites convert atomically) - requires local_bh_disable + rcu_read_lock wrap - calls must be synchronized for a single hardware (=> bh-thread-as-sole-RX-context post-v3 satisfies trivially) Plan §4.2 design decision: per-batch wrap (Option B), wrapping bes2600_sdio_read_rx_batch outer loop, rather than per-call wrap. Captures the actual batch benefit. Open questions for the Phase 5 reviewer: 1. rx_list draining semantics — does mainline expect explicit netif_receive_skb_list at end-of-batch, or does mac80211 internal-deliver? Need to verify by reading mt76 / iwl_pcie usage before Phase 6 lands. 2. beacon path (wsm.c:2415) SKB ownership — hw_priv->beacon is long-lived; after _rx_list consumes it, the field would be dangling. Audit before Phase 6. Predicted throughput delta: +5-15% over v3 N=3 baseline (2.352 MB/s), medium confidence. Smaller-than-expected delta = "marginal but no regression, ship for upstream-cleanliness". Phase 7 N=3 ramp uses wired enu1 path + per-rep fresh nc listener per the rig-failure-is-finding lesson.
notes: Bug #5 RX-degradation campaign — Phase 0 plan (#13 )
2026-05-08 00:42:50 +02:00 · 2026-05-07 21:52:14 +00:00 · 2026-05-07 21:51:28 +00:00 · 2026-05-07 23:23:31 +02:00 · 2026-05-07 23:08:51 +02:00 · 2026-05-07 19:41:44 +00:00
4 changed files with 500 additions and 0 deletions
@@ -0,0 +1,108 @@
 # Bug #5 RX-degradation campaign — Phase 0
 **Date:** 2026-05-07
 **Module under test:** v3 + F (`bes2600.ko` srcversion `371C6606B73AF19299228CA`)
 **Hardware:** ohm (PineTab2, RK3566 + BES2600 SDIO), wired enu1 fallback path live.
 ---
 ## Research question (locked)
 > **Why does the bes2600 RX path collapse from ~2 MB/s sustained @ fresh-chip uptime to ~180 B/s @ ~28-min uptime, with periodic `wsm_generic_confirm failed for request 0x0007` + `ieee80211 phy0: [SCAN] Scan failed (-22)` every 300 s in the intervening window?**
 Reproduces on Patch B, Patch F, and Patch C v3 alike — independent of the relay/race issues v3 addressed. Side-effect that was masked by the throughput floor while v2's race was the dominant variable.
 ## Predecessor data (reference, not anchor)
 | source | observation |
 |---|---|
 | Patch C v3 N=3 (uptime 200/391/582 s) | mean 2.352 MB/s @ 4 MB/s sender |
 | v3 single rep at uptime ~28 min (rep 2 of 2026-05-07 22:23) | 180 KB / 5 min = 600 B/s, sender saw "Connection reset by peer" |
 | v3 single rep at uptime ~47 min (N=3 first attempt 22:42) | 55 KB / 5 min = 180 B/s, sender timed out (exit 124) |
 | dmesg pattern observed at 47-min uptime | scan failures every 301-302 s starting at uptime 778 s (~13 min) |
 The shape: **fresh chip → linear data flow at ~2 MB/s sustained → sometime around 13 min uptime, NetworkManager-triggered scans start failing → sometime around 28 min uptime, data throughput collapses to <1 KB/s while link still shows associated.**
 Predecessor data is reference. Phase 0 will re-anchor at N=1 long-trace + 5 in-window stress probes; if the pattern doesn't reproduce, that's the campaign result.
 ## Mechanism candidates (Phase 4 will discriminate)
 1. **Firmware-side resource exhaustion.** Per-scan or per-WSM-event accumulation in chip-side state. Scan-failed -22 (EINVAL) suggests firmware refusing the request — possibly out of scan handles, scan-buffer slots, or some other limit.
 2. **NetworkManager scan-fail recovery loop.** Each failed scan triggers NM retry. If retry overhead dominates the bh thread, data path starves. Verifiable by suppressing NM scans.
 3. **AP-side rate limiting.** Newton (AVM) AP could be applying QoS / fairness / probation after sustained 4 MB/s burst. Verifiable by Fritz!Box log access (Markus has it) or by switching to a different AP.
 4. **PSM state machine deadlock.** c7's `pm_unsupported` self-detect was supposed to handle this, but the latch state could become stale if a real PM_IND arrives mid-operation. Verifiable by `chip_pm_state` debugfs read at degradation onset.
 5. **SDIO bus clock degradation / mmc retune.** SDIO retune with `retune_protected` flag interacts with bes2600's data path. Verifiable by ftrace `mmc/mmc_request_*` event correlation with throughput drop.
 6. **Power-management busy-event accumulation.** `bes2600_pwr_set_busy_event` counters might leak — busy events not cleared lock the chip awake (no PSM) but also exhaust event capacity. Verifiable by `bes2600_pwr_busy_event_record` dump.
 ## Phase 0 measurement protocol (rig armed 2026-05-07 23:18:58 CEST, T0=1778188738)
 Capturing for 35 minutes from fresh boot. All capture lives in `/root/bes2600-samples/run-20260507-bug5-degradation-rig/` on ohm.
 ### Always-on streams
 | stream | tool | output |
 |---|---|---|
 | ftrace events | per-event `enable=1` | `trace.log` (via `trace_pipe`) |
 | cfg80211 events | `iw event -t -f` | `iw-event.log` |
 | kernel printks | `dmesg -wT` | `dmesg.log` |
 | netdev counters | per-30s shell loop | `snap.log` |
 ### ftrace event set
 - `workqueue/workqueue_execute_start` — work dispatches
 - `workqueue/workqueue_queue_work` — work submissions
 - `mac80211/api_beacon_loss` — driver beacon-loss events
 - `mac80211/api_connection_loss` — driver-side conn-loss
 - `mac80211/api_disconnect` — driver-side disconnect
 - `mac80211/drv_hw_scan` — mac80211 → driver scan dispatch
 - `mac80211/drv_set_key` — key state changes
 - `cfg80211/rdev_assoc` — assoc requests
 - `cfg80211/rdev_deauth` — deauth requests
 - `cfg80211/rdev_disassoc` — disassoc requests
 - `cfg80211/cfg80211_assoc_comeback` — AP-side assoc-busy throttling
 - `cfg80211/cfg80211_send_auth_timeout` — auth timeouts
 - `cfg80211/cfg80211_scan_done` — scan completions
 - `power/suspend_resume` — PM transitions
 - `mmc/mmc_request_start` / `mmc_request_done` — bus-level transactions
 ### Scheduled stress probes
 Sender on boltzmann (`/tmp/bug5-probe-loop.sh`) fires `pv -L 4m | nc ohm 12345` for 30 s at T+5/10/15/20/25 min. Each probe brackets uptime, RX-bytes pre, RX-bytes post, elapsed. Throughput-vs-uptime curve falls out of the snap.log + probe boundaries.
 Probe markers logged via `logger -t bes2600-bug5 PROBE_N_START/END` so they appear in dmesg.log timeline.
 ## Anti-theatre receipts (must tick before claiming Phase 0 done)
 - [ ] In-session baseline: long-capture across degradation window, N=1 for now; re-run if anomalous
 - [ ] ftrace events actually firing (verify by tail of trace.log mid-capture)
 - [ ] dmesg captures the scan-failure pattern timestamp (expected ~uptime 778 s)
 - [ ] Probes actually transferred data at fresh chip (T+5 should be > 1 MB/s)
 - [ ] At least one probe in-window after scan-failure onset (expected: T+15 or T+20)
 - [ ] Snap.log shows monotonic counter behaviour (no rx_bytes going backwards)
 ## Phase 1 hypothesis (provisional, refine after Phase 3 data)
 Metric candidate: **probe throughput as function of uptime, with state-transition markers (first `wsm_generic_confirm 0x0007 failed`, first `[SCAN] Scan failed (-22)`, first NetworkManager-deauth-and-reassociate)**.
 Discriminator question: does throughput collapse abruptly at the first scan failure, or gradually over a window? Abrupt = single-event causation; gradual = accumulator.
 ## Phase 4 candidates (post-Phase-3)
 Depending on which mechanism (1-6) Phase 3 surfaces:
 - (1) firmware resource exhaustion: report to upstream; possibly disable NetworkManager scans pending firmware fix.
 - (2) NM scan-fail loop: configure `wpa_supplicant` to skip scans; or add scan-failure handling in driver to dampen retry cascade.
 - (3) AP-side: switch APs for testing; report to AVM if reproducible.
 - (4) PSM deadlock: extend c7 latch with timeout-or-progress recovery.
 - (5) SDIO retune: ftrace correlation guides the lock-ordering fix.
 - (6) PWR busy-event leak: audit set/clear pairs; add a warning-when-stale.
 ## Out-of-scope
 - Patch C v3 closure (PR #5 merged, Phase 7 done).
 - Patch C2 (`ieee80211_rx_list` batch) — gated on Task #19 kerneldoc.
 - Patch D / E independent.
 - Reproduction at higher rates (8 MB/s ramp) — defer to Phase 4 once mechanism identified.
 ---
 *Phase 0 plan written 2026-05-07 23:21 CEST by Claude (noether), at the close of Patch C v3 Phase 7.  Rig armed; long capture in flight; probes scheduled at T+5/10/15/20/25 min.  Post-capture analysis will populate Phase 3 results before Phase 4 plan branches off.*
@@ -0,0 +1,127 @@
 # Patch C v3 — Phase 4 Plan: drop sdio_rx_work, match cw1200 architecture
 **Author:** Claude (noether)
 **Status:** Phase 4 v3 — supersedes v2 (PR #10) after cw1200 mainline survey showed the race-free path is structural, not lock-based.
 **Decision:** drop the `sdio_rx_work` workqueue entirely; SDIO IRQ wakes `bh_wq`; bh thread does the SDIO read inline.  Restores single-writer-from-bh invariant on `hw_bufs_used` *by construction*.  No `atomic_t` prep needed.
 ---
 ## §0 Why v3 supersedes v2
 PR #10's plan was: convert `hw_bufs_used` etc. to `atomic_t` (prep), then direct-deliver from `sdio_rx_work` (structural).  That was a workaround for the race that *only existed because of the relay*.
 The cw1200 mining (`~/src/linux-rockchip`, 228 cw1200 commits) showed the upstream answer: there is no relay.  cw1200's IRQ handler bumps `bh_rx` and wakes the bh thread; the bh thread does the SDIO read itself inside `cw1200_bh_rx_helper` (`drivers/net/wireless/st/cw1200/bh.c:233`).  Single thread = single writer for `hw_bufs_used` = no race.  Same `int hw_bufs_used` as bes2600, never atomic_t'd in 16 years upstream because it never needed to be.
 Patch C v3 brings bes2600 into that shape.  The structural simplification is bigger than v2's diff but lands the right architecture in one move.
 ## §1 Goal
 Same as Patch C v2 §1: ≥ 1 MB/s sustained receive @ 4 MB/s sender, < 15 % `_raw_spin_unlock_irqrestore` CPU%, no 30-min cascade to link-death.  Stretch toward Phase 1's full 2 MB/s once Patch C2 (rx_list batch) lands separately.
 ## §2 Situation
 - Cleanups branch is at Patch F merged (commit `b717251`).  All Phase 5 reviews of the F series merged via PR #4.
 - ohm rebooted with F module live (srcversion `A9438692D6A8698F92AEEA1`) — F is the new baseline for Patch C v3 Phase 7 comparison.
 - Wired path `enu1` at `192.168.88.80` survives bes2600 wedges; lmcp `ohm` still goes through wlan0.  Phase 7 telemetry collection over enu1.
 - Reboot-permission override active (ohm dev-allocated; I can `sudo reboot` directly — `feedback_user_pushes_reboot_button` override clause).
 ## §3 Baseline measurements
 Carry forward from `run-20260507-patchC-preflight/baseline.tsv` (N=1, F-less Patch B module):
 | metric | value |
 |---|---|
 | observed receive @ 4 MB/s | 1.362 MB/s |
 | sdio_rx_work dispatches | 86.4/s = 90.3 per 1000 RX packets |
 | sdio_tx_work dispatches | 276.1/s |
 | bes2600_bh_work redispatches | 0 (single long-lived) |
 **Phase 6 prereq:** capture an N=3 baseline ON THE F MODULE before Patch C v3 code lands.  Same instrumentation, same stress ramp.  This is the post-F / pre-v3 reference.  Without it, Phase 7's delta is C+F vs B+nothing — confounded.
 ## §4 Plan v3
 ### §4.1 What gets eliminated
 - **`sdio_rx_work` (bes2600_sdio.c:829)** — function deleted.  No longer queued, no longer runs.
 - **`self->rx_work` work_struct** — field deleted from `struct sbus_priv`.  `INIT_WORK` removed.
 - **`self->rx_queue` + `self->rx_queue_lock`** — fields deleted.  `skb_queue_head_init` removed.  No SKB ever queued there.
 - **`bes2600_sdio_pipe_read`** — function deleted.  No callers after this patch.
 - **`sbus_ops->pipe_read`** — sbus op slot deleted (or kept and stubbed; tx_loop.c also implements it for the test-loop bus, has to stay if test-loop is preserved).
 - **`queue_work(self->sdio_wq, &self->rx_work)`** at the 3 call sites in `bes2600_sdio.c` (lines 416, 941, 1199) — removed.
 ### §4.2 What gets added
 - **A new `bes2600_bh_handle_rx_skb()`** in bh.c (same shape as Patch C added, same contract block; no longer needs to also wake the bh thread because we ARE the bh thread).
 - **A new helper `bes2600_sdio_read_rx_batch()`** in bes2600_sdio.c, exported, that does what `sdio_rx_work` used to do MINUS the queuing: lock → read ctrl_reg → memcpy_fromio → packets_check → for-each-frame extract+deliver.  Called from bh.
 ### §4.3 What gets rewired
 - **`bes2600_gpio_irq_handler`** in bes2600_sdio.c:413 (the GPIO-IRQ path used when CONFIG_BES2600_USE_GPIO_IRQ is set):  drop `queue_work(self->sdio_wq, &self->rx_work)`; instead call `self->irq_handler(self->irq_priv)` directly (which is `bes2600_irq_handler` in bh.c, bumps `bh_rx` + wakes `bh_wq`).  Matches cw1200_sdio_irq_handler shape.
 - **`bes2600_bh_rx_helper`** (bh.c:961, BES_SDIO_RX_MULTIPLE_ENABLE branch): instead of `pipe_read`-ing one SKB from the (now-gone) rx_queue, call the new `bes2600_sdio_read_rx_batch()` which does the SDIO read AND delivers each frame inline via `bes2600_bh_handle_rx_skb()`.  Returns count delivered, or negative on error.
 - **`bes2600_bh()` outer loop**:  after a successful rx_batch read, the helper signals whether to continue draining (more frames pending) — same shape as today's `BH_RX_CONT_LIMIT=3` outer loop.
 - **`bes2600_gpio_wakeup_mcu(SDIO_RX)`** + **`bes2600_gpio_allow_mcu_sleep(SDIO_RX)`** brackets:  currently called inside sdio_rx_work.  Move into bh thread around the `bes2600_sdio_read_rx_batch()` call.  Same wake-flag bracketing, just from a different thread.
 - **`sdio_wq` workqueue**:  keeps `tx_work` and (briefly) `scan_work`.  Renamed or kept — cosmetic.  Don't touch in this patch.
 ### §4.4 What stays untouched
 - TX path (`sdio_tx_work`, `bes2600_bh_tx_helper`, `wsm_alloc_tx_buffer`).  Independent.
 - WSM protocol layer (`wsm.c`, `wsm_handle_rx`).  Same callees, just from bh thread now.
 - mac80211 RX delivery (`ieee80211_rx_irqsafe`).  That's Patch C2.
 - `BES2600_RX_IN_BH` ifdef gate.  Stays defined; the gated branch is now the only RX path.
 - Symptom-shaped artifacts (asm nop, BUG_ON in hot path) — still deferred, see task #24 post-cleanup.
 ## §5 Shared-state delta table (the v2 lesson, applied)
 Every field `bes2600_bh_handle_rx_skb` mutates directly or transitively, with the v3 protection:
 | field | written by (today) | written by (after v3) | concurrency | required action |
 |---|---|---|---|---|
 | `hw_priv->hw_bufs_used` | bh thread (TX submit + RX confirm), main.c init | **bh thread only** (RX moves into bh) | single-writer | none — `int` is fine, race-free by construction |
 | `hw_priv->hw_bufs_used_vif[i]` | bh thread (TX vif submit + RX vif confirm), main.c init | **bh thread only** | single-writer | none |
 | `hw_priv->wsm_rx_seq[i]` | sdio_rx_work today | bh thread | single-writer | none — moves cleanly between contexts |
 | `hw_priv->wsm_tx_pending[i]` | bh thread (inc on TX submit), bh+sdio_rx_work (dec on RX confirm) | **bh thread only** | single-writer | none |
 | `hw_priv->lmac_mon_timer` / `mcu_mon_timer` | mod_timer / del_timer_sync from bh + sdio_rx_work | bh thread only | timer API safe anyway | none |
 | `hw_priv->wsm_cmd.lock` | spinlock taken inside wsm_handle_rx | same | already protected | none |
 | `priv->bh_evt_wq` wake-up | wsm_release_tx_buffer when count→0 | same | wake_up is concurrency-safe | none |
 | `bes_pwr.lock` (inside bes2600_pwr_clear_busy_event) | bh thread (today) | bh thread | already protected | none |
 | `self->rx_data_cnt` etc. (sbus_priv stats) | sdio_rx_work | bh thread | single-writer | none |
 **Zero fields require new locking.** The architectural pivot eliminates the race v2's atomic_t was working around.
 ## §6 Risks
 1. **bh thread now holds the SDIO bus mutex during read** (currently held by sdio_rx_work).  TX work in the same bh thread is unaffected (sdio_tx_work runs on a separate workqueue and shares the same mutex anyway).  The sdio_lock contention pattern doesn't change.
 2. **Loss of "parallelism" between sdio_rx_work and bh TX**:  sdio_rx_work and bh thread *appeared* to run in parallel today, but both serialize through `bes2600_sdio_lock(self)` for the actual bus operations.  The parallelism was illusory.  Net throughput should not regress.
 3. **bh thread CPU-busy-time per RX batch increases**:  inline SDIO read is the same cost, just charged to bh instead of sdio_wq's worker.  Mitigation: the per-IRQ workqueue dispatch cost (~86/s) is what we trade for it.  Net: -86 dispatches/s, +0 µs per frame.
 4. **Multi-RX coalescing (BES_SDIO_RX_MULTIPLE_NUM=16)** stays.  bes2600_sdio_extract_packets parses the multi-frame buffer same as before, just inline now.  No functional change to chip-side behaviour.
 5. **GPIO wake-flag bracketing**:  `bes2600_gpio_wakeup_mcu(SDIO_RX)` and `bes2600_gpio_allow_mcu_sleep(SDIO_RX)` currently bracket sdio_rx_work.  Move them to bracket the new bh-side read.  If the wake-flag accounting is sub-system-scoped (it is — flag bits per subsystem), this is a clean move.
 6. **IRQ re-enable in bh thread**:  cw1200's bh re-enables IRQ via `__cw1200_irq_enable(priv, 1)` after each round.  bes2600 has the analogous `__bes2600_irq_enable(0/1)` (commented out as the `asm volatile("nop")` symptom in `bh.c:1518-1520`).  This patch does NOT re-engage the commented-out re-enable — that's still task #24's call.  But if the IRQ stays disabled across rounds, we'd never receive the next IRQ.  **Investigate before Phase 6 lands**: where does IRQ re-enable happen in the current bes2600 hot path?  The sdio_func IRQ may be auto-managed by sdio core differently.  Block Phase 6 on this audit.
 7. **Phase 7 wedge resilience**:  if v3 has a different bug shape than v2's race (which it shouldn't, since the race is gone by construction), the wired path lets us collect telemetry from a wedged ohm.
 ## §7 Phase 5 / 6 / 7
 - **Phase 5**: PR on `git.reauktion.de/marfrit/besser` with this artifact.  Specifically request reviewer focus on §6 risk #6 (IRQ re-enable mechanism).
 - **Phase 6**: branch off cleanups (post-F): `bes2600/sdio-rx-no-relay`.  Implement the file changes per §4.  Build, install, smoke-test.
 - **Phase 7**: 
  - First: N=3 stress-ramp **on F module** (post-F pre-v3 baseline).  10 min @ 1, 30 min @ 2, 30 min @ 4 MB/s.  Use wired path for telemetry.
  - Then: install v3 module, identical N=3 ramp.  Compare deltas.
  - Predicted: sdio_rx_work dispatch rate → 0/s (was 86/s).  observed receive lifts toward ≥ 1.0 MB/s sustained.  `_raw_spin_unlock_irqrestore` drops by the rx_queue lock contribution (was 1914/s acquires).
 ## §8 What gets dropped from v2 plan
 - atomic_t prep refactor (`hw_bufs_used` → `atomic_t`): not needed.  Single-writer invariant preserved structurally.  Still a defensible standalone hardening patch *if mainlining bes2600 ever requires defense-in-depth*, but not on the Bug-#5 critical path.
 - `wsm_tx_pending[]` decrement-decision race (v2 risk #2): also moots.  Both sides single-thread under v3.
 - v2 Phase 7's "C-prep should show zero delta" gate: replaced by "v3 should match cw1200's structural shape" gate.
 ## §9 Open question for reviewer
 The big one is §6 risk #6 — IRQ re-enable. cw1200 explicitly does `__cw1200_irq_enable(priv, 1)` from bh after each round; bes2600 has the call **commented out** with an `asm volatile("nop")` placeholder.  Either:
 (a) bes2600's SDIO IRQ is level-triggered + auto-acked by SDIO core, so re-enable isn't needed (that would explain the nop).
 (b) The current code happens to work because sdio_rx_work is queued by the IRQ regardless of whether IRQ is "enabled" by the driver-side flag.  After v3 we have to manually re-enable like cw1200 does.
 Need to confirm (a) vs (b) before Phase 6 lands.  Plan to grep for `__bes2600_irq_enable` callsites and trace back to whether it's load-bearing.
 ---
 *Plan written 2026-05-07 by Claude (noether), after Patch F merged and Patch C v2 (PR #10) was superseded by the cw1200 architectural mining finding.  Phase 5 review on PR.  Don't curate.*
@@ -0,0 +1,171 @@
 # Patch C2 — Phase 4 Plan: migrate ieee80211_rx_irqsafe → ieee80211_rx_list
 **Author:** Claude (noether)
 **Status:** Phase 4 — pending Phase 5 PR review before any Phase 6 code.
 **Predecessor:** Patch C v3 (PR #5 merged, +73% throughput, no-relay architecture); Patch D + E + F + G also landed. Cleanups branch tip = 42fd0ce.
 **Task #19 contract**: `ieee80211_rx_list` callable from process context, **requires `local_bh_disable()` + `rcu_read_lock()` wrap**, **cannot mix with `ieee80211_rx_irqsafe()` for the same hardware** → all 6 sites convert in one shot.
 ---
 ## §0 Substrate
 After Patch C v3:
 - bh thread is the sole RX-delivery context (no relay, no sdio_rx_work)
 - Per-frame work runs in process context (sleepable)
 - Single-writer-from-bh invariant covers `hw_bufs_used` and friends
 `ieee80211_rx_irqsafe` is currently called from process context. Per kerneldoc (`include/net/mac80211.h:5399-5411`):
 > **Like ieee80211_rx() but can be called in IRQ context** (internally defers to a tasklet.)
 The tasklet hop is the cost we pay today for delivering each RX frame from process context. `ieee80211_rx_list` is the process-context replacement.
 ## §1 Goal
 Per-frame: skip the tasklet hop. Batch: process multiple SKBs from one SDIO read inside a single `local_bh_disable()`/`rcu_read_lock()` window.
 Phase 1 metric: **RX throughput @ 4 MB/s sender**, with v3 N=3 baseline = 2.352 MB/s. Hypothesis: small to moderate uplift (<10%) from removing the tasklet deferral. Larger improvement would be surprising — if observed, that's a finding to investigate.
 ## §2 Situation
 - 6 call sites in bes2600 currently use `ieee80211_rx_irqsafe`:
  - `ap.c:96` (AP-mode link-id RX queue drain)
  - `sta.c:1487` (link-id rx_queue drain in ?)
  - `txrx.c:1960` (early-data + pm_unsupported branch — Patch E added)
  - `txrx.c:1967` (early-data + LINK_SOFT-not-set branch)
  - `txrx.c:1971` (normal RX path)
  - `wsm.c:2415` (beacon SKB delivery from `bes2600_beacon_handler`?)
 - All 6 must convert together (kerneldoc: cannot mix per hardware)
 - bh thread is single-writer post-v3 → `_rx_list`'s "calls must be synchronized" satisfied trivially
 - bh thread is process context → `_rx_list` callable
 ## §3 Baseline (carry forward)
 From `notes/phase7-v3-2026-05-07.md` (v3 N=3 ramp, Phase 7 closed):
 | metric | v3 fresh-chip N=3 |
 |---|---|
 | RX throughput @ 4 MB/s | mean 2.352 MB/s, min 2.102, max 2.590 |
 | sdio_rx_work dispatches | 0/s |
 | bh_work redispatches | 0 |
 Phase 7 of C2 will compare against this baseline.
 ## §4 Plan
 ### §4.1 Conversion shape
 Per call site:
 ```c
 ieee80211_rx_irqsafe(priv->hw, skb);
 ```
 becomes:
 ```c
 ieee80211_rx_list(priv->hw, NULL, skb, &priv->rx_list);
 ```
 Where `priv->rx_list` is a `struct list_head` initialized once.
 **Wrap requirement:** `local_bh_disable()` + `rcu_read_lock()` must be held across the call. Per the kerneldoc, that's also needed for batch correctness.
 ### §4.2 Wrap placement (the design decision)
 **Option A — per-call wrap.** Wrap each individual `ieee80211_rx_list()` call. Simple but loses the batch benefit (each call's wrap+unwrap costs as much as the avoided tasklet defer).
 **Option B — per-batch wrap.** Wrap the OUTER frame-iteration loop (e.g., the `for` in `bes2600_sdio_extract_packets`). All 16 SKBs from one SDIO read get delivered inside one wrap. This is the upstream-idiomatic pattern (mt76, iwl_pcie do this).
 Choosing **Option B**. Concrete shape:
 - `bes2600_sdio_read_rx_batch` (the per-SDIO-batch entry point added in Patch C v3) wraps the read+extract+deliver phase:
  ```c
  rcu_read_lock();
  local_bh_disable();
  // existing read + extract_packets that calls bh_handle_rx_skb per frame
  local_bh_enable();
  rcu_read_unlock();
  ```
 - Inside `bes2600_bh_handle_rx_skb`, the single `ieee80211_rx_irqsafe` swap becomes `ieee80211_rx_list(priv->hw, NULL, skb, &priv->rx_list)`.
 - The OTHER 5 call sites (in `ap.c`, `sta.c`, `txrx.c`'s branches, `wsm.c`) need the same treatment, but they're called from the bh thread (post-v3) so they're already in the right context. Each gets its own narrow wrap (Option A applied selectively because those paths process one frame at a time, not a batch).
 ### §4.3 The `rx_list` field
 Add `struct list_head rx_list` to either `struct bes2600_common` (driver-wide) or `struct bes2600_vif` (per-vif). Per-vif is cleaner because the existing `priv->hw` parameter implies vif scope.
 `INIT_LIST_HEAD(&priv->rx_list)` at vif setup; no teardown needed (mac80211 owns the SKBs once handed off).
 **Open question for reviewer:** does the `rx_list` need to be drained explicitly after the batch (e.g., via a `list_for_each_entry_safe` + `netif_receive_skb_list_internal`)? Looking at mainline mt76 / iwl_pcie usage will clarify. Phase 6 must answer this before code lands.
 ### §4.4 What will NOT be touched
 - The 6 call sites change atomically (all-or-nothing per kerneldoc) — no per-site progressive migration
 - `wsm.c:2415` beacon path: same conversion shape, but beacon delivery is once-per-beacon-interval (not hot path); could stay `_irqsafe` if upstream allows mixing per-SKB-type. Re-read kerneldoc carefully — it says "per hardware", not per-call-site, so we can't keep _irqsafe even on the slow paths.
 - bh thread structure (Patch C v3 stands)
 - atomic_t counters from Patch D
 - `pm_unsupported` lock-skip from Patch E
 - mac80211 batch-delivery semantics (mainline owns this; we just call the API)
 ### §4.5 Predicted delta in Phase 3 units
 | metric | predicted |
 |---|---|
 | `rx_irqsafe` tasklet schedule rate | → 0 (function no longer called) |
 | RX throughput @ 4 MB/s sustained | 2.352 → +5-15% (medium confidence) |
 | `_raw_spin_unlock_irqrestore` CPU% | small drop (no tasklet schedule lock contribution) |
 **Honest acknowledgment:** I don't have data on how much the tasklet hop actually costs. The improvement might be smaller than predicted if tasklet defer was already cheap on this kernel. If <2%, Phase 7 says "marginal but no regression" and we ship anyway for upstream-cleanliness.
 ### §4.6 Risks
 1. **`ieee80211_rx_list` semantics surprise.** mainline drivers I have access to (mt76, iwl_pcie) use this via NAPI infrastructure. bes2600 doesn't have NAPI; we're doing process-context-direct. The kerneldoc says callable that way but we should verify a few mainline drivers actually do it. **Phase 6 contract-cite from at least one upstream caller** before code lands.
 2. **`rx_list` lifetime in cross-batch / cross-vif scenarios.** Multiple vifs (P2P_MULTIVIF=y in Makefile) might race on the same hw's `rx_list`. The kerneldoc says "for a single hardware" — the list is per-call destination, which means each call appends to its argument list. Per-vif `rx_list` per-call is the natural shape. No per-hw aggregator needed.
 3. **`local_bh_disable` cost in batch wrap.** Not free. If the batch is small (1-2 SKBs), the wrap might dominate. Estimated breakeven: 2-3 SKBs per wrap. Phase 7 should look at SKB-per-batch distribution to confirm.
 4. **`rcu_read_lock` across SDIO read.** SDIO read can take multi-ms (multi-block transfers). RCU reader-cs across that is fine (no preemption blocked) but it's a longer reader-cs than typical. Verifiable but not a blocker — kerneldoc requires it.
 5. **wsm.c:2415 (beacon) is a different SKB lifecycle** — `hw_priv->beacon` is owned by hw_priv, not allocated per-call. After `_rx_list` consumes it (by passing ownership to mac80211), `hw_priv->beacon` is dangling. **Phase 6 must verify the beacon path either reallocates after delivery or wasn't actually transferring ownership.** Risk #5 is the biggest open question.
 ### §4.7 Phase 5 review handover
 PR on `git.reauktion.de/marfrit/besser` with this artifact. Specifically request reviewer focus on:
 - §4.2 wrap-placement choice (Option B vs A)
 - §4.3 rx_list scoping (per-vif)
 - §4.6 risks #1 (mainline-caller verification) and #5 (beacon path SKB ownership)
 Don't curate.
 ### §4.8 Phase 6 implementation order
 1. Branch off cleanups: `bes2600/rx-list-batch-delivery`
 2. Add `struct list_head rx_list` to `struct bes2600_vif`, `INIT_LIST_HEAD` in vif setup
 3. Convert all 6 call sites: `ieee80211_rx_irqsafe(...)` → `ieee80211_rx_list(...)`
 4. Wrap `bes2600_sdio_read_rx_batch` outer loop with `rcu_read_lock + local_bh_disable / local_bh_enable + rcu_read_unlock`
 5. For the non-bh-thread call sites (ap.c, sta.c, wsm.c beacon): per-call narrow wrap
 6. Verify beacon path in wsm.c:2415 (Risk #5)
 7. Build, install, smoke-test
 8. Phase 7 N=3 stress ramp — compare to v3 baseline
 ### §4.9 Phase 7 protocol (per `feedback_phase7_stress_ramp`)
 - N=3 reps, 30s each at 4 MB/s, fresh-chip (uptime <15 min)
 - Use wired path (`ssh mfritsche@192.168.88.80`) for telemetry
 - Fresh nc listener per rep (per `feedback_rig_failure_is_finding`)
 - Compare: throughput delta + tasklet schedule rate (ftrace `irq:tasklet_*` events)
 - If predicted delta met → close C2 + memory entry
 - If NO delta → marginal patch but no regression; ship for upstream-cleanliness
 ## §5 Out of scope
 - Patch D / E already shipped (PR #7, #8 merged)
 - Patch G already shipped (PR #6 merged)
 - bh.c `#if 0` graveyard removal (Task #24 hygiene)
 - Allwinner `sw_mci_check_r1_ready` (Task #25)
 ## §6 Summary
 C2 is a 6-site mechanical migration with ONE design decision (per-batch wrap), TWO open questions for the reviewer (rx_list draining + beacon path SKB ownership), and SMALL expected throughput delta (<15%). Risk-low, upstream-prep-high. Worth shipping for the kernel.org submission story even if the throughput delta is marginal.
 ---
 *Plan written 2026-05-08 by Claude (noether). Phase 5 review on PR. Phase 6 contingent on review passing.*
@@ -0,0 +1,94 @@
 # Patch C v3 Phase 7 — N=3 verification results
 **Date:** 2026-05-07
 **Module:** `bes2600.ko` srcversion `371C6606B73AF19299228CA` (cleanups+F+v3)
 **Rig:** ohm (PineTab2, RK3566 + BES2600 SDIO), wired enu1 path for telemetry
 **Stress:** netcat sender from boltzmann, `pv -L 4m` rate cap (4 MB/s), 3-min window per rep
 **Boot:** fresh — uptime 200 s / 391 s / 582 s at rep 1/2/3 starts (all within fresh-chip window before the ~13-min Bug #5 RX-degradation point)
 ---
 ## Results table
 | rep | elapsed (s) | RX bytes | RX MB | MB/s | sdio_rx_work | sdio_tx_work | bes2600_bh_work redispatches |
 |---:|---:|---:|---:|---:|---:|---:|---:|
 | 1 | 180.72 | 447,758,333 | 427.0 | **2.363** | 0 | 368 | 0 |
 | 2 | 180.67 | 490,669,836 | 467.9 | **2.590** | 0 | 20  | 0 |
 | 3 | 180.69 | 398,224,992 | 379.8 | **2.102** | 0 | 39  | 0 |
 **N=3 stats:** mean 2.352 MB/s · median 2.363 MB/s · min 2.102 MB/s · max 2.590 MB/s
 ## Comparison to baselines
 ### vs Patch B baseline (`run-20260507-patchC-preflight`, N=1, 5 min @ 4 MB/s, fresh chip)
 | | Patch B | v3 mean | Δ |
 |---|---:|---:|---:|
 | throughput | 1.362 MB/s | 2.352 MB/s | **+73%** |
 ### vs original Bug #5 baseline (`run-20260506-0659-fresh`, N=3, decay over time)
 Bug #5 anchor was 725 / 663 / **75** KB/s — rep 3 saw link-death at ~9 min.
 | | Bug #5 floor (rep 3) | v3 floor (rep 3) | Δ |
 |---|---:|---:|---:|
 | throughput | 0.075 MB/s | 2.102 MB/s | **28× improvement** |
 ### vs Phase 4 v3 plan §4.5 predictions
 | metric | predicted | observed | verdict |
 |---|---|---|---|
 | sdio_rx_work dispatch rate | → 0/s (high confidence) | 0/s all 3 reps | ✅ |
 | `bes2600_bh_work` redispatches | → 0 (high confidence) | 0 all 3 reps | ✅ |
 | observed RX @ 4 MB/s | floor lifts toward ≥ 1 MB/s sustained (medium) | 2.10 MB/s floor | ✅ exceeds prediction |
 | `_raw_spin_unlock_irqrestore` CPU% | 20% → 12-15% (medium) | not measured | deferred — perf-record run can confirm |
 ## Workqueue dispatch rate collapse
 Patch B baseline (per `run-20260507-patchC-preflight`):
 - sdio_rx_work: 86.4/s
 - sdio_tx_work: 276.1/s
 - bes2600_bh_work redispatches: 0
 v3 N=3 mean:
 - **sdio_rx_work: 0.0/s** (function deleted)
 - **sdio_tx_work: 0.8/s** (post-tx queue_work → self->irq_handler call; the chip-side TX driver no longer needs to wake a separate workqueue)
 - bes2600_bh_work redispatches: 0 (preserved invariant; bh thread still single long-lived work item)
 The 99.7% reduction in `sdio_tx_work` dispatch rate is a side-effect of v3's IRQ→bh-direct rewiring: the post-TX `queue_work(self->sdio_wq, &self->rx_work)` call I replaced with `self->irq_handler()` was actually firing more often than I'd assumed (276/s on Patch B). Folding it into the bh wake-up cuts 275/s of workqueue dispatches that weren't doing anything useful.
 ## Risks observed
 - **Bug #5 RX-degradation after ~13-min uptime is independent of v3.** Same scan-failure pattern observed (`wsm_generic_confirm failed for request 0x0007` + `[SCAN] Scan failed (-22)` every 300s) on v3 as on Patch B. v3 did NOT fix Bug #5; it fixed the v2-race that was ALSO present. RX-degradation is firmware-side, likely needs a separate campaign.
 - **N=3 reps were 3 minutes each instead of 5** to fit within the fresh-chip window. Direct comparison with Patch B's 5-min baseline is approximate; chip-side throughput in 3-min vs 5-min should be similar given the bug fires on uptime, not on transferred-bytes.
 - **No regression observed in 3×3 min = 9 min of stress.** The v2 race that wedged Patch C v1 within 13 s did NOT reproduce. v3's structural fix held.
 ## Phase 8 — lesson distilled
 **The cw1200 mining was decisive.** Patch C v2 (atomic_t prep + direct-deliver on top of relay, PR #10 closed) would have worked correctly but kept the structural relay that was the source of the race. v3 removed the relay entirely — restoring single-writer-from-bh invariant by construction, no atomic_t needed, and delivering a 73% throughput improvement as side benefit.
 Without the cw1200 history mine (`~/src/linux-rockchip`, 228 cw1200 commits over 16 years), v2's atomic_t prep would have shipped. The structural fix is upstream-grade because it matches the reference driver. v2's atomic_t wrapper would have been bes2600-specific bookkeeping with no upstream parallel — defensible as a fix, but worse to maintain.
 **Memory entry:** *When you have an upstream-ancestral driver still in the kernel tree, mine its bug-fix history before patching the inherited fork. The architectural answer may already be there; you just have to look.*
 ## Receipts checklist (Phase 7 done)
 - [x] N=3 reps captured at fresh-chip uptime (200/391/582 s)
 - [x] Same instrumentation pre/post (workqueue ftrace + rx_packets/rx_bytes counters)
 - [x] Predicted delta matched (sdio_rx_work → 0; bh redispatches → 0; throughput ≥ 1 MB/s sustained)
 - [x] No WARN/BUG/oops during stress on any rep
 - [x] Wired-rig telemetry collection (would have caught a wedge if v3 had one)
 - [x] Receiver `nc` listener restarted fresh per rep (avoiding rep-2-style TCP race)
 - [x] Stress-ramp memory honored: not steady-state low-rate; saw 4 MB/s saturate
 ## Out-of-scope follow-ups
 - Patch C2 — `ieee80211_rx_list` batch delivery — gated on Task #19 kerneldoc verification.
 - Patch D — ba_lock atomicization — independent.
 - Patch E — ps_state_lock skip when pm_unsupported — independent.
 - Bug #5 RX-degradation after 13-min uptime — separate campaign, scan-failure pattern is the entry point.
 - Task #24 — observe whether `bh.c` `asm volatile("nop")` / commented-out `__bes2600_irq_enable(1)` / BUG_ON in hot path are still load-bearing post-v3. Already partially answered: `__bes2600_irq_enable` is a stub (PR #11 comment). The other artifacts can be re-read fresh.
 ---
 *Phase 7 results captured 2026-05-07 by Claude (noether). v3 (PR #5) closes Patch C campaign with structural improvement + race fix + measurable throughput win.*
Author	SHA1	Message	Date
claude-noether	722434414a	notes: Patch C2 Phase 4 plan — ieee80211_rx_irqsafe to ieee80211_rx_list After Patch C v3 / D / E / F / G all merged, the remaining cleanup target is the per-RX-frame tasklet defer that ieee80211_rx_irqsafe introduces. Patch C2 migrates all 6 call sites in bes2600 to ieee80211_rx_list, the process-context API verified per the kerneldoc audit (Task #19, mainline include/net/mac80211.h:5324-5345). Key constraints from kerneldoc: - cannot mix _list and _irqsafe for the same hardware (=> all 6 sites convert atomically) - requires local_bh_disable + rcu_read_lock wrap - calls must be synchronized for a single hardware (=> bh-thread-as-sole-RX-context post-v3 satisfies trivially) Plan §4.2 design decision: per-batch wrap (Option B), wrapping bes2600_sdio_read_rx_batch outer loop, rather than per-call wrap. Captures the actual batch benefit. Open questions for the Phase 5 reviewer: 1. rx_list draining semantics — does mainline expect explicit netif_receive_skb_list at end-of-batch, or does mac80211 internal-deliver? Need to verify by reading mt76 / iwl_pcie usage before Phase 6 lands. 2. beacon path (wsm.c:2415) SKB ownership — hw_priv->beacon is long-lived; after _rx_list consumes it, the field would be dangling. Audit before Phase 6. Predicted throughput delta: +5-15% over v3 N=3 baseline (2.352 MB/s), medium confidence. Smaller-than-expected delta = "marginal but no regression, ship for upstream-cleanliness". Phase 7 N=3 ramp uses wired enu1 path + per-rep fresh nc listener per the rig-failure-is-finding lesson.	2026-05-08 00:42:50 +02:00
marfrit	fc88ff41c3	notes: Bug #5 RX-degradation campaign — Phase 0 plan (#13 )	2026-05-07 21:52:14 +00:00
marfrit	fde41fcdd4	notes: Patch C v3 Phase 7 N=3 — +73% throughput, race fix verified (#12 )	2026-05-07 21:51:28 +00:00
claude-noether	6bae531917	notes: Bug #5 RX-degradation campaign — Phase 0 plan + research question After Patch C v3 closed (PR #5 merged, Phase 7 N=3 verified at +73% throughput vs Patch B baseline), the post-13-min RX-degradation pattern remains. Reproduces on Patch B, F, and v3 alike — independent of the relay/race issues v3 addressed. Side-effect that was masked by the throughput floor while v2's race was the dominant variable. Research question (locked): Why does the bes2600 RX path collapse from ~2 MB/s sustained @ fresh-chip uptime to ~180 B/s @ ~28-min uptime, with periodic wsm_generic_confirm failed for request 0x0007 + ieee80211 phy0: [SCAN] Scan failed (-22) every 300 s in the intervening window? Phase 0 protocol: - long-capture rig armed on ohm at uptime 0 (fresh boot 23:13 CEST) - ftrace events: workqueue, mac80211, cfg80211, mmc, sdhci, power - iw event (cfg80211 reason codes), dmesg follow, per-30s netdev counter snap, 5 stress probes at T+5/10/15/20/25 min Phase 0 will: - re-anchor the predecessor data via the long capture (in-session N=1; re-run if anomalous) - characterize state transitions (first scan-fail, first throughput drop) via cfg80211/mac80211 ftrace + iw event correlation - feed Phase 1 metric formulation Mechanism candidates (Phase 4 will discriminate): 1. Firmware-side resource exhaustion (per-scan accumulator) 2. NetworkManager scan-fail recovery loop competing with data 3. AP-side rate limiting / fairness probation 4. PSM state machine deadlock (c7 latch stale) 5. SDIO bus retune interaction 6. Power-management busy-event accumulator leak Out of scope: Patch C2/D/E, higher-rate ramp, reproducing on different APs. Independent campaign from Patch C closure.	2026-05-07 23:23:31 +02:00
claude-noether	3a38286e6f	notes: Patch C v3 Phase 7 N=3 results — +73% throughput, race fix verified N=3 stress reps on ohm with v3 module (srcversion 371C6606B73AF19299228CA), 3 min @ 4 MB/s each, all within fresh-chip uptime window (200/391/582 s). \| rep \| MB/s \| sdio_rx_work \| bh_work redispatches \| \|----:\|----:\|-:\|-:\| \| 1 \| 2.363 \| 0 \| 0 \| \| 2 \| 2.590 \| 0 \| 0 \| \| 3 \| 2.102 \| 0 \| 0 \| N=3 mean: 2.352 MB/s · median 2.363 MB/s · min 2.102 MB/s. vs Patch B baseline (1.362 MB/s, run-20260507-patchC-preflight): +73%. vs original Bug #5 floor (75 KB/s rep 3 death): 28× improvement. Plan §4.5 prediction verified: - sdio_rx_work dispatch rate: 86.4/s -> 0/s (function deleted) - bes2600_bh_work redispatches: 0 (preserved invariant) - observed receive @ 4 MB/s: floor lifts toward >= 1 MB/s (exceeded — floor is 2.10 MB/s) Bonus finding: sdio_tx_work dispatch rate dropped from 276.1/s to 0.8/s. The post-tx queue_work(rx_work) call I rewired to self->irq_handler() was actually firing more often than predicted; folding it into bh-wake-up cuts ~99.7% of the workqueue dispatches. No WARN/BUG/oops on any rep — the v2 race that wedged Patch C v1 within 13 s under stress did NOT reproduce on v3. Phase 8 lesson distilled as feedback_mine_upstream_ancestor memory: when patching a fork-from-upstream driver, mine the ancestor's fix history BEFORE writing fixes from scratch. cw1200 mining drove the structural pivot from v2's atomic_t wrapper to v3's no-relay architecture. Without the mine, we'd have shipped v2. Phase 7 receipts checklist met (N=3, fresh-chip, identical instrumentation, predicted delta verified, no-WARN under stress).	2026-05-07 23:08:51 +02:00
marfrit	1e408c9d33	Merge pull request 'notes: Patch C v3 Phase 4 plan — drop sdio_rx_work, match cw1200' (#11 ) from claude-noether-9 into main Reviewed-on: #11	2026-05-07 19:41:44 +00:00
claude-noether	d01400140b	notes: Patch C v3 Phase 4 plan — drop sdio_rx_work, match cw1200 Supersedes v2 (PR #10). cw1200 mining (~/src/linux-rockchip, 228 cw1200 commits) confirmed: upstream cw1200 has no sdio_rx_work workqueue at all. IRQ handler bumps bh_rx + wakes bh_wq; bh thread does the SDIO read inline via cw1200_bh_rx_helper. Single thread = single writer for hw_bufs_used = no race by construction. Same int hw_bufs_used as bes2600, never atomic_t'd in 16 years upstream. v3 brings bes2600 into that shape: - delete sdio_rx_work, self->rx_work, self->rx_queue, self->rx_queue_lock, bes2600_sdio_pipe_read - GPIO IRQ handler calls self->irq_handler directly (matches cw1200_sdio_irq_handler shape) - bes2600_bh_rx_helper's BES_SDIO_RX_MULTIPLE_ENABLE branch replaced with inline SDIO read + extract_packets + per-skb delivery via new bes2600_bh_handle_rx_skb() - GPIO wake-flag bracketing moves into bh thread §5 shared-state delta table (the v2 lesson, applied): zero fields require new locking. hw_bufs_used / hw_bufs_used_vif / wsm_tx_pending all stay single-writer-from-bh. v2's atomic_t prep is mooted. §6 risk #6 is the open question for reviewer: bes2600's __bes2600_irq_enable(1) call is commented out in the BH-loop done: label with an asm volatile("nop") in its place. Either SDIO IRQ is auto-managed (so commenting out is fine) or the current code relies on sdio_rx_work being queued regardless of driver-side IRQ flag. Block Phase 6 on this audit. Patch F (PR #4 merged) is the new baseline. v3 will branch off F-merged cleanups. Phase 7 N=3 stress ramp uses wired enu1 path (192.168.88.80) for wedge-resilient telemetry collection.	2026-05-07 21:36:15 +02:00
marfrit	993117a108	Merge pull request 'notes: Patch C v2 Phase 4 plan — atomic_t prep + direct-deliver (re-after-failure)' (#10 ) from claude-noether-8 into main Question - you said earlier, the driver is a search-and-replace CW12xx driver. Did the CW12xx evolve since this "fork"? If so, are there lessons that can be learned from the CW12xx driver in it's nowadays state? Reviewed-on: #10	2026-05-07 18:56:12 +00:00