Supersedes v2 (PR #10). cw1200 mining (~/src/linux-rockchip, 228 cw1200 commits) confirmed: upstream cw1200 has no sdio_rx_work workqueue at all. IRQ handler bumps bh_rx + wakes bh_wq; bh thread does the SDIO read inline via cw1200_bh_rx_helper. Single thread = single writer for hw_bufs_used = no race by construction. Same int hw_bufs_used as bes2600, never atomic_t'd in 16 years upstream. v3 brings bes2600 into that shape: - delete sdio_rx_work, self->rx_work, self->rx_queue, self->rx_queue_lock, bes2600_sdio_pipe_read - GPIO IRQ handler calls self->irq_handler directly (matches cw1200_sdio_irq_handler shape) - bes2600_bh_rx_helper's BES_SDIO_RX_MULTIPLE_ENABLE branch replaced with inline SDIO read + extract_packets + per-skb delivery via new bes2600_bh_handle_rx_skb() - GPIO wake-flag bracketing moves into bh thread §5 shared-state delta table (the v2 lesson, applied): zero fields require new locking. hw_bufs_used / hw_bufs_used_vif / wsm_tx_pending all stay single-writer-from-bh. v2's atomic_t prep is mooted. §6 risk #6 is the open question for reviewer: bes2600's __bes2600_irq_enable(1) call is commented out in the BH-loop done: label with an asm volatile("nop") in its place. Either SDIO IRQ is auto-managed (so commenting out is fine) or the current code relies on sdio_rx_work being queued regardless of driver-side IRQ flag. Block Phase 6 on this audit. Patch F (PR #4 merged) is the new baseline. v3 will branch off F-merged cleanups. Phase 7 N=3 stress ramp uses wired enu1 path (192.168.88.80) for wedge-resilient telemetry collection.
11 KiB
Patch C v3 — Phase 4 Plan: drop sdio_rx_work, match cw1200 architecture
Author: Claude (noether)
Status: Phase 4 v3 — supersedes v2 (PR #10) after cw1200 mainline survey showed the race-free path is structural, not lock-based.
Decision: drop the sdio_rx_work workqueue entirely; SDIO IRQ wakes bh_wq; bh thread does the SDIO read inline. Restores single-writer-from-bh invariant on hw_bufs_used by construction. No atomic_t prep needed.
§0 Why v3 supersedes v2
PR #10's plan was: convert hw_bufs_used etc. to atomic_t (prep), then direct-deliver from sdio_rx_work (structural). That was a workaround for the race that only existed because of the relay.
The cw1200 mining (~/src/linux-rockchip, 228 cw1200 commits) showed the upstream answer: there is no relay. cw1200's IRQ handler bumps bh_rx and wakes the bh thread; the bh thread does the SDIO read itself inside cw1200_bh_rx_helper (drivers/net/wireless/st/cw1200/bh.c:233). Single thread = single writer for hw_bufs_used = no race. Same int hw_bufs_used as bes2600, never atomic_t'd in 16 years upstream because it never needed to be.
Patch C v3 brings bes2600 into that shape. The structural simplification is bigger than v2's diff but lands the right architecture in one move.
§1 Goal
Same as Patch C v2 §1: ≥ 1 MB/s sustained receive @ 4 MB/s sender, < 15 % _raw_spin_unlock_irqrestore CPU%, no 30-min cascade to link-death. Stretch toward Phase 1's full 2 MB/s once Patch C2 (rx_list batch) lands separately.
§2 Situation
- Cleanups branch is at Patch F merged (commit
b717251). All Phase 5 reviews of the F series merged via PR #4. - ohm rebooted with F module live (srcversion
A9438692D6A8698F92AEEA1) — F is the new baseline for Patch C v3 Phase 7 comparison. - Wired path
enu1at192.168.88.80survives bes2600 wedges; lmcpohmstill goes through wlan0. Phase 7 telemetry collection over enu1. - Reboot-permission override active (ohm dev-allocated; I can
sudo rebootdirectly —feedback_user_pushes_reboot_buttonoverride clause).
§3 Baseline measurements
Carry forward from run-20260507-patchC-preflight/baseline.tsv (N=1, F-less Patch B module):
| metric | value |
|---|---|
| observed receive @ 4 MB/s | 1.362 MB/s |
| sdio_rx_work dispatches | 86.4/s = 90.3 per 1000 RX packets |
| sdio_tx_work dispatches | 276.1/s |
| bes2600_bh_work redispatches | 0 (single long-lived) |
Phase 6 prereq: capture an N=3 baseline ON THE F MODULE before Patch C v3 code lands. Same instrumentation, same stress ramp. This is the post-F / pre-v3 reference. Without it, Phase 7's delta is C+F vs B+nothing — confounded.
§4 Plan v3
§4.1 What gets eliminated
sdio_rx_work(bes2600_sdio.c:829) — function deleted. No longer queued, no longer runs.self->rx_workwork_struct — field deleted fromstruct sbus_priv.INIT_WORKremoved.self->rx_queue+self->rx_queue_lock— fields deleted.skb_queue_head_initremoved. No SKB ever queued there.bes2600_sdio_pipe_read— function deleted. No callers after this patch.sbus_ops->pipe_read— sbus op slot deleted (or kept and stubbed; tx_loop.c also implements it for the test-loop bus, has to stay if test-loop is preserved).queue_work(self->sdio_wq, &self->rx_work)at the 3 call sites inbes2600_sdio.c(lines 416, 941, 1199) — removed.
§4.2 What gets added
- A new
bes2600_bh_handle_rx_skb()in bh.c (same shape as Patch C added, same contract block; no longer needs to also wake the bh thread because we ARE the bh thread). - A new helper
bes2600_sdio_read_rx_batch()in bes2600_sdio.c, exported, that does whatsdio_rx_workused to do MINUS the queuing: lock → read ctrl_reg → memcpy_fromio → packets_check → for-each-frame extract+deliver. Called from bh.
§4.3 What gets rewired
bes2600_gpio_irq_handlerin bes2600_sdio.c:413 (the GPIO-IRQ path used when CONFIG_BES2600_USE_GPIO_IRQ is set): dropqueue_work(self->sdio_wq, &self->rx_work); instead callself->irq_handler(self->irq_priv)directly (which isbes2600_irq_handlerin bh.c, bumpsbh_rx+ wakesbh_wq). Matches cw1200_sdio_irq_handler shape.bes2600_bh_rx_helper(bh.c:961, BES_SDIO_RX_MULTIPLE_ENABLE branch): instead ofpipe_read-ing one SKB from the (now-gone) rx_queue, call the newbes2600_sdio_read_rx_batch()which does the SDIO read AND delivers each frame inline viabes2600_bh_handle_rx_skb(). Returns count delivered, or negative on error.bes2600_bh()outer loop: after a successful rx_batch read, the helper signals whether to continue draining (more frames pending) — same shape as today'sBH_RX_CONT_LIMIT=3outer loop.bes2600_gpio_wakeup_mcu(SDIO_RX)+bes2600_gpio_allow_mcu_sleep(SDIO_RX)brackets: currently called inside sdio_rx_work. Move into bh thread around thebes2600_sdio_read_rx_batch()call. Same wake-flag bracketing, just from a different thread.sdio_wqworkqueue: keepstx_workand (briefly)scan_work. Renamed or kept — cosmetic. Don't touch in this patch.
§4.4 What stays untouched
- TX path (
sdio_tx_work,bes2600_bh_tx_helper,wsm_alloc_tx_buffer). Independent. - WSM protocol layer (
wsm.c,wsm_handle_rx). Same callees, just from bh thread now. - mac80211 RX delivery (
ieee80211_rx_irqsafe). That's Patch C2. BES2600_RX_IN_BHifdef gate. Stays defined; the gated branch is now the only RX path.- Symptom-shaped artifacts (asm nop, BUG_ON in hot path) — still deferred, see task #24 post-cleanup.
§5 Shared-state delta table (the v2 lesson, applied)
Every field bes2600_bh_handle_rx_skb mutates directly or transitively, with the v3 protection:
| field | written by (today) | written by (after v3) | concurrency | required action |
|---|---|---|---|---|
hw_priv->hw_bufs_used |
bh thread (TX submit + RX confirm), main.c init | bh thread only (RX moves into bh) | single-writer | none — int is fine, race-free by construction |
hw_priv->hw_bufs_used_vif[i] |
bh thread (TX vif submit + RX vif confirm), main.c init | bh thread only | single-writer | none |
hw_priv->wsm_rx_seq[i] |
sdio_rx_work today | bh thread | single-writer | none — moves cleanly between contexts |
hw_priv->wsm_tx_pending[i] |
bh thread (inc on TX submit), bh+sdio_rx_work (dec on RX confirm) | bh thread only | single-writer | none |
hw_priv->lmac_mon_timer / mcu_mon_timer |
mod_timer / del_timer_sync from bh + sdio_rx_work | bh thread only | timer API safe anyway | none |
hw_priv->wsm_cmd.lock |
spinlock taken inside wsm_handle_rx | same | already protected | none |
priv->bh_evt_wq wake-up |
wsm_release_tx_buffer when count→0 | same | wake_up is concurrency-safe | none |
bes_pwr.lock (inside bes2600_pwr_clear_busy_event) |
bh thread (today) | bh thread | already protected | none |
self->rx_data_cnt etc. (sbus_priv stats) |
sdio_rx_work | bh thread | single-writer | none |
Zero fields require new locking. The architectural pivot eliminates the race v2's atomic_t was working around.
§6 Risks
- bh thread now holds the SDIO bus mutex during read (currently held by sdio_rx_work). TX work in the same bh thread is unaffected (sdio_tx_work runs on a separate workqueue and shares the same mutex anyway). The sdio_lock contention pattern doesn't change.
- Loss of "parallelism" between sdio_rx_work and bh TX: sdio_rx_work and bh thread appeared to run in parallel today, but both serialize through
bes2600_sdio_lock(self)for the actual bus operations. The parallelism was illusory. Net throughput should not regress. - bh thread CPU-busy-time per RX batch increases: inline SDIO read is the same cost, just charged to bh instead of sdio_wq's worker. Mitigation: the per-IRQ workqueue dispatch cost (~86/s) is what we trade for it. Net: -86 dispatches/s, +0 µs per frame.
- Multi-RX coalescing (BES_SDIO_RX_MULTIPLE_NUM=16) stays. bes2600_sdio_extract_packets parses the multi-frame buffer same as before, just inline now. No functional change to chip-side behaviour.
- GPIO wake-flag bracketing:
bes2600_gpio_wakeup_mcu(SDIO_RX)andbes2600_gpio_allow_mcu_sleep(SDIO_RX)currently bracket sdio_rx_work. Move them to bracket the new bh-side read. If the wake-flag accounting is sub-system-scoped (it is — flag bits per subsystem), this is a clean move. - IRQ re-enable in bh thread: cw1200's bh re-enables IRQ via
__cw1200_irq_enable(priv, 1)after each round. bes2600 has the analogous__bes2600_irq_enable(0/1)(commented out as theasm volatile("nop")symptom inbh.c:1518-1520). This patch does NOT re-engage the commented-out re-enable — that's still task #24's call. But if the IRQ stays disabled across rounds, we'd never receive the next IRQ. Investigate before Phase 6 lands: where does IRQ re-enable happen in the current bes2600 hot path? The sdio_func IRQ may be auto-managed by sdio core differently. Block Phase 6 on this audit. - Phase 7 wedge resilience: if v3 has a different bug shape than v2's race (which it shouldn't, since the race is gone by construction), the wired path lets us collect telemetry from a wedged ohm.
§7 Phase 5 / 6 / 7
- Phase 5: PR on
git.reauktion.de/marfrit/besserwith this artifact. Specifically request reviewer focus on §6 risk #6 (IRQ re-enable mechanism). - Phase 6: branch off cleanups (post-F):
bes2600/sdio-rx-no-relay. Implement the file changes per §4. Build, install, smoke-test. - Phase 7:
- First: N=3 stress-ramp on F module (post-F pre-v3 baseline). 10 min @ 1, 30 min @ 2, 30 min @ 4 MB/s. Use wired path for telemetry.
- Then: install v3 module, identical N=3 ramp. Compare deltas.
- Predicted: sdio_rx_work dispatch rate → 0/s (was 86/s). observed receive lifts toward ≥ 1.0 MB/s sustained.
_raw_spin_unlock_irqrestoredrops by the rx_queue lock contribution (was 1914/s acquires).
§8 What gets dropped from v2 plan
- atomic_t prep refactor (
hw_bufs_used→atomic_t): not needed. Single-writer invariant preserved structurally. Still a defensible standalone hardening patch if mainlining bes2600 ever requires defense-in-depth, but not on the Bug-#5 critical path. wsm_tx_pending[]decrement-decision race (v2 risk #2): also moots. Both sides single-thread under v3.- v2 Phase 7's "C-prep should show zero delta" gate: replaced by "v3 should match cw1200's structural shape" gate.
§9 Open question for reviewer
The big one is §6 risk #6 — IRQ re-enable. cw1200 explicitly does __cw1200_irq_enable(priv, 1) from bh after each round; bes2600 has the call commented out with an asm volatile("nop") placeholder. Either:
(a) bes2600's SDIO IRQ is level-triggered + auto-acked by SDIO core, so re-enable isn't needed (that would explain the nop). (b) The current code happens to work because sdio_rx_work is queued by the IRQ regardless of whether IRQ is "enabled" by the driver-side flag. After v3 we have to manually re-enable like cw1200 does.
Need to confirm (a) vs (b) before Phase 6 lands. Plan to grep for __bes2600_irq_enable callsites and trace back to whether it's load-bearing.
Plan written 2026-05-07 by Claude (noether), after Patch F merged and Patch C v2 (PR #10) was superseded by the cw1200 architectural mining finding. Phase 5 review on PR. Don't curate.