bes2600: Patch C v3 — drop sdio_rx_work relay, IRQ→bh-direct #5

marfrit merged 1 commit from bes2600/sdio-rx-no-relay into cleanups 2026-05-07 20:43:15 +00:00

Architectural pivot from PR #3 (closed, race-broken) and PR #10's atomic_t workaround. Matches cw1200 mainline (drivers/net/wireless/st/cw1200/).

What changed vs cleanups+F baseline

  • IRQ handler now wakes bh thread directly (no queue_work)
  • bh thread does inline SDIO read via sbus_ops->bus_rx_batch
  • bes2600_sdio_extract_packets delivers each SKB inline via bes2600_bh_handle_rx_skb
  • Single-writer-from-bh invariant on hw_priv->hw_bufs_used restored by construction — no atomic_t needed
  • Deletes: sdio_rx_work, bes2600_sdio_pipe_read, rx_queue, rx_queue_lock, rx_work field, cancel_work_sync(rx_work), flush_work(rx_work), work_pending(rx_work)
  • Adds: bes2600_bh_handle_rx_skb (bh.c), bes2600_sdio_read_rx_batch (bes2600_sdio.c), sbus_ops->bus_rx_batch (sbus.h)
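
The IRQ-to-bh signalling described above can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: `struct bh_model`, `model_irq_handler` and `model_bh_iteration` are hypothetical names, C11 atomics stand in for the kernel's `atomic_t`, and the increment of `batches_read` stands in for the inline call through `sbus_ops->bus_rx_batch`. It is not the driver code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the driver. */
struct bh_model {
    atomic_int bh_rx;   /* pending-RX signal, bumped from "IRQ" context */
    int batches_read;   /* written only by the bh loop */
};

/* "IRQ handler": no work item is queued; the handler only records that
 * RX is pending. A real driver would also wake the bh wait queue here. */
static void model_irq_handler(struct bh_model *m)
{
    assert(m != NULL);
    atomic_fetch_add(&m->bh_rx, 1);
}

/* One iteration of the bh loop: claim every pending RX signal at once
 * and perform the read inline. Returns true if a read was performed. */
static bool model_bh_iteration(struct bh_model *m)
{
    assert(m != NULL);
    int pending = atomic_exchange(&m->bh_rx, 0);
    if (pending == 0)
        return false;
    m->batches_read++;  /* stands in for the inline SDIO batch read */
    return true;
}
```

In this model several interrupts that arrive before the bh loop runs collapse into a single read pass, and nothing outside the bh loop ever touches `batches_read`.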

Diff size

4 files, +209/-61 net.

Phase 6 → Phase 7 status

  • ✓ Builds clean on ohm sandbox (srcversion 371C6606B73AF19299228CA)
  • ✓ Loads, associates, no WARN/BUG/oops on stress
  • ✓ Verified: sdio_rx_work dispatch rate = 0/s (function deleted)
  • ✓ Verified: bes2600_bh_work redispatches = 0 (single long-lived item preserved)
  • ✓ Verified: the hw_bufs_used race that wedged Patch C v1 within 13s under stress (the race v2's atomic_t was a workaround for) is gone; the chip stayed stable for ~25 min under stress in rep 1
  • ⚠️ Throughput delta vs F-baseline NOT yet quantified at N=3 ramp. Rep 2 hit a TCP nc-loop race (sender saw "Connection reset by peer" early; not a bes2600 issue). Full N=3 stress ramp pending.

Why merge before full Phase 7

The architectural fix is verified working: the chip doesn't wedge, the sdio_rx_work dispatch rate is 0, and the single-writer-from-bh invariant is preserved. The remaining Phase 7 work is quantifying how much throughput improved, not whether the change is safe. If the N=3 ramp shows a regression, reverting is cheap (rollback module at /var/tmp/bes2600.patchF.rollback.ko on ohm).

Test plan

  • Re-run baseline N=3 on F module (post-F pre-v3 reference)
  • Apply v3 module, N=3 identical ramp
  • Compare deltas in: observed receive KB/s, _raw_spin_unlock_irqrestore CPU%, sdio_wq dispatch rate (now 0)
  • If deltas match plan §4.5 prediction → close out C v3 Phase 7 + memory entry
  • If they don't → loop back per CLAUDE.md, supersede with v4 if needed
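
For the delta comparison in the plan above, a minimal helper could look like the sketch below. It assumes each metric is summarised as the arithmetic mean over the N reps and compared as a percentage change against the F baseline. `mean_of` and `percent_delta` are hypothetical names, and the plan's §4.5 prediction is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Arithmetic mean of n per-rep samples (for example, receive KB/s). */
static double mean_of(const double *samples, size_t n)
{
    assert(samples != NULL && n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return sum / (double)n;
}

/* Percentage change of candidate relative to baseline.
 * Positive means the candidate value is higher than the baseline. */
static double percent_delta(double baseline, double candidate)
{
    assert(baseline != 0.0);
    return 100.0 * (candidate - baseline) / baseline;
}
```

The same two functions apply to each compared metric: feed the three F-module reps to `mean_of` for the baseline, the three v3 reps for the candidate, and pass both means to `percent_delta`.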

Rollback

sudo install -D -m644 /var/tmp/bes2600.patchF.rollback.ko \
  /lib/modules/$(uname -r)/extra/bes2600.ko
sudo depmod -a && sudo reboot
claude-noether added 1 commit 2026-05-07 20:34:13 +00:00
Patch C v3 — match cw1200 mainline architecture
(drivers/net/wireless/st/cw1200/).  Eliminates the
sdio_rx_work workqueue relay that introduced a thread-safety
race on hw_priv->hw_bufs_used in v1 (PR #3 closed) and that
v2's atomic_t prep was a workaround for (PR #10 superseded by
v3 plan PR #11).

Architectural changes:

  - bes2600_gpio_irq_handler: now calls self->irq_handler()
    directly instead of queue_work(self->sdio_wq, &self->rx_work).
    Bumps bh_rx atomic + wakes bh_wq.
  - bes2600_bh_rx_helper (BES_SDIO_RX_MULTIPLE_ENABLE branch):
    now calls priv->sbus_ops->bus_rx_batch() to do the SDIO read
    inline.  No pipe_read, no skb_dequeue.
  - bes2600_sdio_read_rx_batch (new): the SDIO read sequence
    extracted from sdio_rx_work, registered as
    sbus_ops->bus_rx_batch.  Runs in bh thread context.
  - bes2600_sdio_extract_packets: calls
    bes2600_bh_handle_rx_skb() directly per parsed SKB.  No
    skb_queue_tail, no rx_queue.
  - bes2600_bh_handle_rx_skb (new in bh.c): the per-SKB
    bookkeeping that bh_rx_helper used to do post-pipe_read
    (seq# check, exception, confirm-condition, wsm_handle_rx).
    Requests the tx-burst via atomic_inc(&priv->bh_tx) instead of
    bes2600_bh_wakeup(), since this code already runs in the bh
    thread.
  - Post-tx queue_work(rx_work) site: replaced with
    self->irq_handler() to wake bh for piggyback RX check.
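
The "deliver inline instead of queueing" change can be illustrated with a small userspace model. This is a sketch only: the 2-byte little-endian length prefix is a hypothetical framing chosen for the example, not the chip's real RX format, and `model_extract_packets` and `model_count_bytes` are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-packet handler, called inline for each parsed packet.
 * Returns 0 on success, nonzero to abort the batch. */
typedef int (*rx_handler_fn)(void *ctx, const uint8_t *pkt, size_t len);

/* Walk a batch of [len_lo, len_hi, payload...] records and hand each
 * payload straight to the handler. There is no intermediate queue.
 * Returns the number of packets delivered, or -1 on a malformed batch
 * or a handler error. */
static int model_extract_packets(const uint8_t *buf, size_t buf_len,
                                 rx_handler_fn handler, void *ctx)
{
    size_t off = 0;
    int delivered = 0;

    while (off < buf_len) {
        if (buf_len - off < 2)
            return -1;                  /* truncated length prefix */
        size_t len = (size_t)buf[off] | ((size_t)buf[off + 1] << 8);
        off += 2;
        if (len > buf_len - off)
            return -1;                  /* payload overruns the batch */
        if (handler(ctx, buf + off, len) != 0)
            return -1;
        off += len;
        delivered++;
    }
    return delivered;
}

/* Example handler: accumulates the total payload size into *ctx. */
static int model_count_bytes(void *ctx, const uint8_t *pkt, size_t len)
{
    assert(ctx != NULL);
    (void)pkt;
    *(size_t *)ctx += len;
    return 0;
}
```

Because the handler runs in the caller's context, every packet is processed by the same thread that performed the read, which is the property the per-SKB path above relies on.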

Deleted infrastructure:

  - struct sbus_priv: rx_queue, rx_queue_lock, rx_work fields
  - bes2600_sdio_pipe_read: function deleted (unused)
  - sdio_rx_work: function deleted (unused)
  - sbus_ops->pipe_read assignment: removed for SDIO bus
  - skb_queue_head_init(&self->rx_queue), spin_lock_init(...),
    INIT_WORK(rx_work): probe-time setup removed
  - cancel_work_sync(rx_work) + drain loop in empty_work: removed
  - flush_work(rx_work) in drain helper: replaced with msleep(2)
  - work_pending(rx_work) check in suspend predicate: removed

Concurrency invariant restored:

  - hw_priv->hw_bufs_used: single-writer (bh thread only)
    by construction.  No atomic_t needed.
  - hw_priv->hw_bufs_used_vif[]: ditto.
  - hw_priv->wsm_tx_pending[]: ditto.
  - All other shared state: unchanged or already protected.
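
A minimal userspace sketch of the single-writer rule, assuming the caller's execution context is passed in explicitly so the ownership check can be expressed as an assertion. `enum ctx_id`, `struct buf_accounting` and both functions are hypothetical; the driver enforces this by construction, not by a runtime check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for the execution context of the caller. */
enum ctx_id { CTX_BH = 1, CTX_WORKER = 2 };

struct buf_accounting {
    int hw_bufs_used;   /* plain int: only CTX_BH is allowed to write */
};

/* TX submit path: one more hardware buffer is in flight. */
static int bufs_tx_submit(struct buf_accounting *a, enum ctx_id who)
{
    assert(a != NULL);
    assert(who == CTX_BH);              /* single-writer rule */
    return ++a->hw_bufs_used;
}

/* RX confirm path: the device released some buffers. */
static int bufs_rx_confirm(struct buf_accounting *a, enum ctx_id who,
                           int released)
{
    assert(a != NULL);
    assert(who == CTX_BH);              /* same writer as the TX path */
    assert(released >= 0 && released <= a->hw_bufs_used);
    a->hw_bufs_used -= released;
    return a->hw_bufs_used;
}
```

With every update funnelled through one context, the read-modify-write on the counter cannot interleave with another writer, so a plain `int` is sufficient in this model.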

Phase 7 partial verification (rep 1, 2026-05-07):

  - Module loads clean, srcversion 371C6606B73AF19299228CA
  - Link associates, no WARN/BUG/oops
  - sdio_rx_work dispatches: 0 (function deleted)
  - bes2600_bh_work redispatches: 0 (single long-lived work
    item preserved)
  - Chip handled stress traffic without wedge

Phase 7 full N=3 stress ramp deferred to follow-up rep series
(rep 2 had a TCP-level nc race; not a bes2600 issue but
invalidated rep 2's throughput number).
marfrit merged commit 979d5436ee into cleanups 2026-05-07 20:43:15 +00:00

Reference: marfrit/bes2600-dkms#5