claude-noether d01400140b notes: Patch C v3 Phase 4 plan — drop sdio_rx_work, match cw1200

Supersedes v2 (PR #10).  cw1200 mining (~/src/linux-rockchip, 228
cw1200 commits) confirmed: upstream cw1200 has no sdio_rx_work
workqueue at all.  IRQ handler bumps bh_rx + wakes bh_wq; bh thread
does the SDIO read inline via cw1200_bh_rx_helper.  Single thread =
single writer for hw_bufs_used = no race by construction.  Same int
hw_bufs_used as bes2600, never atomic_t'd in 16 years upstream.

v3 brings bes2600 into that shape:

  - delete sdio_rx_work, self->rx_work, self->rx_queue,
    self->rx_queue_lock, bes2600_sdio_pipe_read
  - GPIO IRQ handler calls self->irq_handler directly (matches
    cw1200_sdio_irq_handler shape)
  - bes2600_bh_rx_helper's BES_SDIO_RX_MULTIPLE_ENABLE branch
    replaced with inline SDIO read + extract_packets + per-skb
    delivery via new bes2600_bh_handle_rx_skb()
  - GPIO wake-flag bracketing moves into bh thread

§5 shared-state delta table (the v2 lesson, applied):  zero fields
require new locking.  hw_bufs_used / hw_bufs_used_vif / wsm_tx_pending
all stay single-writer-from-bh.  v2's atomic_t prep is mooted.

§6 risk #6 is the open question for reviewer:  bes2600's
__bes2600_irq_enable(1) call is commented out in the BH-loop done:
label with an asm volatile("nop") in its place.  Either SDIO IRQ
is auto-managed (so commenting out is fine) or the current code
relies on sdio_rx_work being queued regardless of driver-side IRQ
flag.  Block Phase 6 on this audit.

Patch F (PR #4 merged) is the new baseline.  v3 will branch off
F-merged cleanups.  Phase 7 N=3 stress ramp uses wired enu1 path
(192.168.88.80) for wedge-resilient telemetry collection.

2026-05-07 21:36:15 +02:00

Patch C v3 — Phase 4 Plan: drop sdio_rx_work, match cw1200 architecture

Author: Claude (noether)
Status: Phase 4 v3. Supersedes v2 (PR #10) after the cw1200 mainline survey showed the race-free path is structural, not lock-based.
Decision: drop the sdio_rx_work workqueue entirely; the SDIO IRQ wakes bh_wq and the bh thread does the SDIO read inline. This restores the single-writer-from-bh invariant on hw_bufs_used by construction. No atomic_t prep needed.

§0 Why v3 supersedes v2

PR #10's plan was: convert hw_bufs_used etc. to atomic_t (prep), then direct-deliver from sdio_rx_work (structural). That was a workaround for a race that only existed because of the relay (sdio_rx_work reading the bus and handing SKBs to the bh thread via rx_queue).

The cw1200 mining (~/src/linux-rockchip, 228 cw1200 commits) showed the upstream answer: there is no relay. cw1200's IRQ handler bumps bh_rx and wakes the bh thread; the bh thread does the SDIO read itself inside cw1200_bh_rx_helper (drivers/net/wireless/st/cw1200/bh.c:233). Single thread = single writer for hw_bufs_used = no race. Same int hw_bufs_used as bes2600, never atomic_t'd in 16 years upstream because it never needed to be.
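The single-writer argument above can be modeled in plain userspace C. This is a minimal sketch, not kernel code: every name in it (model_priv, model_irq_handler, model_bh_round) is hypothetical, the "SDIO read" is reduced to a loop, and every frame is assumed to be a confirmation that releases one hardware buffer. It only illustrates why the counter can stay a plain int when one context does all the writing.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace model; none of these names exist in cw1200 or bes2600. */
struct model_priv {
	atomic_int bh_rx;     /* bumped from "IRQ" context, consumed by the bh round */
	int hw_bufs_used;     /* plain int: only the bh round ever writes it */
	int frames_delivered; /* stats, also written by the bh round only */
};

/* "IRQ handler": records that RX is pending. In a real driver this would be
 * followed by a wake-up of the bh wait queue; the model has no sleeping. */
void model_irq_handler(struct model_priv *p)
{
	atomic_fetch_add(&p->bh_rx, 1);
}

/* One bh round: claim all pending RX events, then do the "bus read" inline.
 * ASSUMPTION: each frame is a confirmation that frees one hardware buffer.
 * Returns the number of frames handled. */
int model_bh_round(struct model_priv *p, int frames_per_read)
{
	int pending = atomic_exchange(&p->bh_rx, 0);
	int handled = 0;

	if (!pending)
		return 0;

	for (int i = 0; i < frames_per_read && p->hw_bufs_used > 0; i++) {
		p->hw_bufs_used--; /* single writer: no lock, no atomic */
		handled++;
	}
	assert(p->hw_bufs_used >= 0);
	p->frames_delivered += handled;
	return handled;
}
```

The only shared variable is bh_rx, and it is the only atomic one. Everything the round mutates after claiming it is private to the bh context, which is the property v3 is meant to restore.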

Patch C v3 brings bes2600 into that shape. The structural simplification is bigger than v2's diff but lands the right architecture in one move.

§1 Goal

Same as Patch C v2 §1: ≥ 1 MB/s sustained receive @ 4 MB/s sender, < 15 % _raw_spin_unlock_irqrestore CPU%, no 30-min cascade to link-death. Stretch toward Phase 1's full 2 MB/s once Patch C2 (rx_list batch) lands separately.

§2 Situation

Cleanups branch is at Patch F merged (commit b717251). All Phase 5 reviews of the F series merged via PR #4.
ohm rebooted with F module live (srcversion A9438692D6A8698F92AEEA1) — F is the new baseline for Patch C v3 Phase 7 comparison.
Wired path enu1 at 192.168.88.80 survives bes2600 wedges; lmcp ohm still goes through wlan0. Phase 7 telemetry collection over enu1.
Reboot-permission override active (ohm dev-allocated; I can sudo reboot directly — feedback_user_pushes_reboot_button override clause).

§3 Baseline measurements

Carry forward from run-20260507-patchC-preflight/baseline.tsv (N=1, F-less Patch B module):

metric	value
observed receive @ 4 MB/s	1.362 MB/s
sdio_rx_work dispatches	86.4/s = 90.3 per 1000 RX packets
sdio_tx_work dispatches	276.1/s
bes2600_bh_work redispatches	0 (single long-lived)
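
The dispatch figures in the table imply a packet rate and a coalescing factor that are worth making explicit. The helpers below are a small sketch for that arithmetic; the function names are hypothetical, and the bytes-per-packet figure assumes 1 MB = 10^6 bytes and that "RX packets" counts the same frames that carry the measured throughput, neither of which the table states.

```c
#include <assert.h>

/* RX packets per second implied by a dispatch rate and a
 * dispatches-per-1000-packets ratio. */
double rx_packets_per_sec(double dispatches_per_sec,
			  double dispatches_per_1000_pkts)
{
	assert(dispatches_per_1000_pkts > 0.0);
	return dispatches_per_sec * 1000.0 / dispatches_per_1000_pkts;
}

/* Mean number of frames drained per sdio_rx_work dispatch. */
double frames_per_dispatch(double dispatches_per_1000_pkts)
{
	assert(dispatches_per_1000_pkts > 0.0);
	return 1000.0 / dispatches_per_1000_pkts;
}

/* Mean bytes per packet. ASSUMPTION: 1 MB = 1e6 bytes. */
double bytes_per_packet(double mb_per_sec, double pkts_per_sec)
{
	assert(pkts_per_sec > 0.0);
	return mb_per_sec * 1e6 / pkts_per_sec;
}
```

With the table's values (86.4 dispatches/s, 90.3 per 1000 packets, 1.362 MB/s) this works out to roughly 957 packets/s, about 11 frames per dispatch, and about 1.4 kB per packet.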

Phase 6 prereq: capture an N=3 baseline ON THE F MODULE before Patch C v3 code lands. Same instrumentation, same stress ramp. This is the post-F / pre-v3 reference. Without it, Phase 7's delta is C+F vs B+nothing — confounded.

§4 Plan v3

§4.1 What gets eliminated

sdio_rx_work (bes2600_sdio.c:829) — function deleted. No longer queued, no longer runs.
self->rx_work work_struct — field deleted from struct sbus_priv. INIT_WORK removed.
self->rx_queue + self->rx_queue_lock — fields deleted. skb_queue_head_init removed. No SKB ever queued there.
bes2600_sdio_pipe_read — function deleted. No callers after this patch.
sbus_ops->pipe_read — sbus op slot deleted, or kept and stubbed: tx_loop.c also implements it for the test-loop bus, so the slot has to stay if test-loop is preserved.
queue_work(self->sdio_wq, &self->rx_work) at the 3 call sites in bes2600_sdio.c (lines 416, 941, 1199) — removed.

§4.2 What gets added

A new bes2600_bh_handle_rx_skb() in bh.c (same shape as the one Patch C added, same contract block; it no longer needs to wake the bh thread because we ARE the bh thread).
A new helper bes2600_sdio_read_rx_batch() in bes2600_sdio.c, exported, that does what sdio_rx_work used to do MINUS the queuing: lock → read ctrl_reg → memcpy_fromio → packets_check → for-each-frame extract+deliver. Called from bh.
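
The extract-and-deliver half of the new helper can be sketched in userspace C. This is a model under stated assumptions, not the driver code: the names are hypothetical, the bus lock, ctrl_reg read and memcpy_fromio steps are left out (the buffer is assumed to be already read), and the frame layout (a 2-byte little-endian length followed by the payload, with a zero length ending the batch) is invented for illustration. The real bes2600 framing may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_RX_MULTIPLE_NUM 16 /* mirrors BES_SDIO_RX_MULTIPLE_NUM=16 from the plan */

/* Per-frame delivery hook; stands in for bes2600_bh_handle_rx_skb(). */
typedef int (*model_deliver_fn)(const uint8_t *frame, size_t len, void *ctx);

/*
 * Walk a multi-frame buffer and hand each frame to deliver().
 * ASSUMED layout: [u16 little-endian length][payload], repeated;
 * a zero length terminates the batch.
 * Returns the number of frames delivered, or a negative errno.
 */
int model_read_rx_batch(const uint8_t *buf, size_t buf_len,
			model_deliver_fn deliver, void *ctx)
{
	size_t off = 0;
	int delivered = 0;

	while (off + 2 <= buf_len && delivered < MODEL_RX_MULTIPLE_NUM) {
		size_t len = (size_t)buf[off] | ((size_t)buf[off + 1] << 8);
		int ret;

		if (len == 0)
			break;		/* end-of-batch marker */
		if (len > buf_len - off - 2)
			return -EINVAL;	/* stand-in for packets_check: frame overruns buffer */

		ret = deliver(buf + off + 2, len, ctx);
		if (ret < 0)
			return ret;	/* propagate delivery failure */

		off += 2 + len;
		delivered++;
	}
	assert(off <= buf_len);
	return delivered;
}

/* Example delivery hook: adds each payload length to the size_t behind ctx. */
int model_count_bytes(const uint8_t *frame, size_t len, void *ctx)
{
	(void)frame;
	*(size_t *)ctx += len;
	return 0;
}
```

The return convention follows §4.3: a frame count on success, a negative value on error. One design point the sketch leaves open is what to report when a later frame fails after earlier ones were already delivered; here the error wins and the partial count is lost.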

§4.3 What gets rewired

bes2600_gpio_irq_handler in bes2600_sdio.c:413 (the GPIO-IRQ path used when CONFIG_BES2600_USE_GPIO_IRQ is set): drop queue_work(self->sdio_wq, &self->rx_work); instead call self->irq_handler(self->irq_priv) directly (which is bes2600_irq_handler in bh.c, bumps bh_rx + wakes bh_wq). Matches cw1200_sdio_irq_handler shape.
bes2600_bh_rx_helper (bh.c:961, BES_SDIO_RX_MULTIPLE_ENABLE branch): instead of pipe_read-ing one SKB from the (now-gone) rx_queue, call the new bes2600_sdio_read_rx_batch() which does the SDIO read AND delivers each frame inline via bes2600_bh_handle_rx_skb(). Returns count delivered, or negative on error.
bes2600_bh() outer loop: after a successful rx_batch read, the helper signals whether to continue draining (more frames pending) — same shape as today's BH_RX_CONT_LIMIT=3 outer loop.
bes2600_gpio_wakeup_mcu(SDIO_RX) + bes2600_gpio_allow_mcu_sleep(SDIO_RX) brackets: currently called inside sdio_rx_work. Move into bh thread around the bes2600_sdio_read_rx_batch() call. Same wake-flag bracketing, just from a different thread.
sdio_wq workqueue: keeps tx_work and (briefly) scan_work. Renamed or kept — cosmetic. Don't touch in this patch.
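
One plausible reading of the bounded drain in the outer loop, sketched in userspace C. The names and the result struct are hypothetical, and the exact semantics of today's BH_RX_CONT_LIMIT loop are an assumption here: the sketch issues at most three batch reads per pass, stops early when a read returns nothing, and reports whether the limit was hit while reads were still returning frames.

```c
#include <assert.h>

#define MODEL_BH_RX_CONT_LIMIT 3 /* mirrors BH_RX_CONT_LIMIT=3 from the plan */

/* Stand-in for bes2600_sdio_read_rx_batch(): frames delivered,
 * 0 when nothing is pending, negative on error. */
typedef int (*model_batch_fn)(void *ctx);

struct model_drain_result {
	int frames; /* total frames delivered in this bh pass */
	int rounds; /* batch reads issued */
	int more;   /* nonzero: limit reached and the last read was non-empty */
	int err;    /* first negative return from the batch helper, else 0 */
};

struct model_drain_result model_bh_drain(model_batch_fn read_batch, void *ctx)
{
	struct model_drain_result r = { 0, 0, 0, 0 };

	while (r.rounds < MODEL_BH_RX_CONT_LIMIT) {
		int n = read_batch(ctx);

		r.rounds++;
		if (n < 0) {
			r.err = n;
			return r;
		}
		if (n == 0)
			return r;	/* bus drained, go back to waiting */
		r.frames += n;
	}
	assert(r.rounds == MODEL_BH_RX_CONT_LIMIT);
	r.more = 1;	/* yield to other bh work, but ask to be called again */
	return r;
}

/* Example source: hands out up to 4 frames per read from an int backlog. */
int model_backlog_source(void *ctx)
{
	int *backlog = ctx;
	int n = *backlog < 4 ? *backlog : 4;

	*backlog -= n;
	return n;
}
```

The cap keeps a long RX burst from starving the TX side of the same thread, which matters more under v3 because the bus read itself now runs in the bh context.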

§4.4 What stays untouched

TX path (sdio_tx_work, bes2600_bh_tx_helper, wsm_alloc_tx_buffer). Independent.
WSM protocol layer (wsm.c, wsm_handle_rx). Same callees, just from bh thread now.
mac80211 RX delivery (ieee80211_rx_irqsafe). That's Patch C2.
BES2600_RX_IN_BH ifdef gate. Stays defined; the gated branch is now the only RX path.
Symptom-shaped artifacts (asm nop, BUG_ON in hot path) — still deferred, see task #24 post-cleanup.

§5 Shared-state delta table (the v2 lesson, applied)

Every field bes2600_bh_handle_rx_skb mutates directly or transitively, with the v3 protection:

field	written by (today)	written by (after v3)	concurrency	required action
`hw_priv->hw_bufs_used`	bh thread (TX submit + RX confirm), main.c init	bh thread only (RX moves into bh)	single-writer	none — `int` is fine, race-free by construction
`hw_priv->hw_bufs_used_vif[i]`	bh thread (TX vif submit + RX vif confirm), main.c init	bh thread only	single-writer	none
`hw_priv->wsm_rx_seq[i]`	sdio_rx_work today	bh thread	single-writer	none — moves cleanly between contexts
`hw_priv->wsm_tx_pending[i]`	bh thread (inc on TX submit), bh+sdio_rx_work (dec on RX confirm)	bh thread only	single-writer	none
`hw_priv->lmac_mon_timer` / `mcu_mon_timer`	mod_timer / del_timer_sync from bh + sdio_rx_work	bh thread only	timer API safe anyway	none
`hw_priv->wsm_cmd.lock`	spinlock taken inside wsm_handle_rx	same	already protected	none
`priv->bh_evt_wq` wake-up	wsm_release_tx_buffer when count→0	same	wake_up is concurrency-safe	none
`bes_pwr.lock` (inside bes2600_pwr_clear_busy_event)	bh thread (today)	bh thread	already protected	none
`self->rx_data_cnt` etc. (sbus_priv stats)	sdio_rx_work	bh thread	single-writer	none

Zero fields require new locking. The architectural pivot eliminates the race v2's atomic_t was working around.

§6 Risks

1. The bh thread now holds the SDIO bus mutex during the read (today sdio_rx_work holds it). TX is unaffected: sdio_tx_work runs on a separate workqueue and already takes the same mutex. The sdio_lock contention pattern doesn't change.
2. Loss of "parallelism" between sdio_rx_work and bh TX: sdio_rx_work and the bh thread appear to run in parallel today, but both serialize through bes2600_sdio_lock(self) for the actual bus operations. The parallelism was illusory. Net throughput should not regress.
3. bh thread CPU-busy-time per RX batch increases: the inline SDIO read costs the same, it is just charged to bh instead of sdio_wq's worker. Mitigation: the per-IRQ workqueue dispatch cost (~86/s) is what we trade for it. Net: -86 dispatches/s, +0 µs per frame.
4. Multi-RX coalescing (BES_SDIO_RX_MULTIPLE_NUM=16) stays. bes2600_sdio_extract_packets parses the multi-frame buffer as before, just inline now. No functional change to chip-side behaviour.
5. GPIO wake-flag bracketing: bes2600_gpio_wakeup_mcu(SDIO_RX) and bes2600_gpio_allow_mcu_sleep(SDIO_RX) currently bracket sdio_rx_work. Move them to bracket the new bh-side read. The wake-flag accounting is subsystem-scoped (flag bits are per subsystem), so this is a clean move.
6. IRQ re-enable in bh thread: cw1200's bh re-enables the IRQ via __cw1200_irq_enable(priv, 1) after each round. bes2600 has the analogous __bes2600_irq_enable(0/1) (commented out as the asm volatile("nop") symptom in bh.c:1518-1520). This patch does NOT re-engage the commented-out re-enable; that is still task #24's call. But if the IRQ stays disabled across rounds, we'd never receive the next IRQ. Investigate before Phase 6 lands: where does IRQ re-enable happen in the current bes2600 hot path? The sdio_func IRQ may be auto-managed by sdio core differently. Block Phase 6 on this audit.
7. Phase 7 wedge resilience: if v3 has a different bug shape than v2's race (which it shouldn't, since the race is gone by construction), the wired path lets us collect telemetry from a wedged ohm.
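
The claim that the wake-flag bracket can move between threads rests on the accounting being keyed by subsystem rather than by caller. A minimal userspace sketch of that idea follows. It is an assumption about how the flags behave, not a copy of the driver: the bit values, struct, and function names are all hypothetical stand-ins for bes2600_gpio_wakeup_mcu() and bes2600_gpio_allow_mcu_sleep().

```c
#include <assert.h>
#include <stdbool.h>

/* ASSUMED subsystem bits; the driver's real values are not shown in the plan. */
enum model_wake_subsys {
	MODEL_WAKE_SDIO_RX = 1 << 0,
	MODEL_WAKE_SDIO_TX = 1 << 1,
};

struct model_wake_state {
	unsigned int flags; /* one bit per subsystem that needs the MCU awake */
};

/* Stand-in for bes2600_gpio_wakeup_mcu(): this subsystem votes "stay awake". */
void model_wakeup_mcu(struct model_wake_state *w, enum model_wake_subsys s)
{
	w->flags |= (unsigned int)s;
}

/* Stand-in for bes2600_gpio_allow_mcu_sleep(): drop this subsystem's vote.
 * Returns true when no subsystem is left holding the MCU awake. */
bool model_allow_mcu_sleep(struct model_wake_state *w, enum model_wake_subsys s)
{
	assert(w->flags & (unsigned int)s); /* unbalanced release is a bug */
	w->flags &= ~(unsigned int)s;
	return w->flags == 0;
}

/* Bracketed RX read, as the bh thread would issue it after the move. */
int model_bracketed_rx(struct model_wake_state *w, int (*read_batch)(void))
{
	int n;

	model_wakeup_mcu(w, MODEL_WAKE_SDIO_RX);
	n = read_batch();
	model_allow_mcu_sleep(w, MODEL_WAKE_SDIO_RX);
	return n;
}

/* Example read that pretends seven frames were delivered. */
int model_fake_read(void)
{
	return 7;
}
```

Nothing in the model records which thread set or cleared a bit, so the RX bracket behaves the same from the bh thread as from a worker, and a concurrent TX vote is left untouched. Whether the real helpers have hidden per-caller state is something the Phase 6 implementation should confirm.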

§7 Phase 5 / 6 / 7

Phase 5: PR on git.reauktion.de/marfrit/besser with this artifact. Specifically request reviewer focus on §6 risk #6 (IRQ re-enable mechanism).
Phase 6: branch off cleanups (post-F): bes2600/sdio-rx-no-relay. Implement the file changes per §4. Build, install, smoke-test.
Phase 7:
- First: N=3 stress-ramp on F module (post-F pre-v3 baseline). 10 min @ 1, 30 min @ 2, 30 min @ 4 MB/s. Use wired path for telemetry.
- Then: install v3 module, identical N=3 ramp. Compare deltas.
- Predicted: sdio_rx_work dispatch rate → 0/s (was 86/s). observed receive lifts toward ≥ 1.0 MB/s sustained. _raw_spin_unlock_irqrestore drops by the rx_queue lock contribution (was 1914/s acquires).

§8 What gets dropped from v2 plan

atomic_t prep refactor (hw_bufs_used → atomic_t): not needed. Single-writer invariant preserved structurally. Still a defensible standalone hardening patch if mainlining bes2600 ever requires defense-in-depth, but not on the Bug-#5 critical path.
wsm_tx_pending[] decrement-decision race (v2 risk #2): also moot. Both sides run on a single thread under v3.
v2 Phase 7's "C-prep should show zero delta" gate: replaced by "v3 should match cw1200's structural shape" gate.

§9 Open question for reviewer

The big one is §6 risk #6 — IRQ re-enable. cw1200 explicitly does __cw1200_irq_enable(priv, 1) from bh after each round; bes2600 has the call commented out with an asm volatile("nop") placeholder. Either:

(a) bes2600's SDIO IRQ is level-triggered + auto-acked by SDIO core, so re-enable isn't needed (that would explain the nop).
(b) The current code happens to work because sdio_rx_work is queued by the IRQ regardless of whether IRQ is "enabled" by the driver-side flag. After v3 we have to manually re-enable like cw1200 does.

Need to confirm (a) vs (b) before Phase 6 lands. Plan to grep for __bes2600_irq_enable callsites and trace back to whether it's load-bearing.

Plan written 2026-05-07 by Claude (noether), after Patch F merged and Patch C v2 (PR #10) was superseded by the cw1200 architectural mining finding. Phase 5 review on PR. Don't curate.