Compare commits

..

5 Commits

Author SHA1 Message Date
marfrit 64fc309e26 Merge pull request 'bes2600: reset firmware state on wsm_join_confirm failure' (#12) from bes2600/wsm-join-confirm-reset into cleanups
Reviewed-on: #12
2026-05-21 08:45:22 +00:00
test0r cdb6bd07d3 bes2600: reset firmware state on wsm_join_confirm failure
When wsm_join_confirm() returns status != WSM_STATUS_SUCCESS (ret 1),
the driver cleared its bookkeeping but did not reset the firmware
interface, leaving it in an intermediate post-rejection state.  A rapid
second JOIN attempt (e.g. wpa_supplicant retrying after the
PREV_AUTH_NOT_VALID deauth that mac80211 emits to clean up) hits an
inconsistent firmware context, causing bes2600_sdio_read_rx_batch to
return SDIO error which cascades into wifi_force_close:

  wsm_join_confirm ret 1
  deauthenticating from <bssid> by local choice (Reason: 2=PREV_AUTH_NOT_VALID)
  [~10 min later]
  bes2600_sdio_read_rx_batch sdio read error
  WARNING: at bes2600_tx_loop_set_enable / bes2600_chrdev_wifi_force_close

Two additions to the failure path in bes2600_join_work():

1. wsm_reset (WSM_REQ_ID_RESET, 0x000A) with reset_statistics=false.
   This returns the firmware to IDLE so the next association attempt
   starts from a known-clean state.  bes2600_unjoin_work() performs the
   same reset, but gates it on join_status != PASSIVE; after a failed
   JOIN join_status stays PASSIVE, so that path never fires — call
   wsm_reset directly here instead.

   Contract: wsm_reset takes only wsm_cmd_lock (not conf_lock, not
   wsm_oper_lock).  wsm_oper_unlock was already called inside
   wsm_join_confirm() before wsm_join() returned -EINVAL, so there is
   no re-entrancy hazard.  conf_lock is held at this call site, which is
   compatible with wsm_reset's locking requirements.

2. queue_work(workqueue, &priv->unjoin_work) instead of direct
   wsm_unlock_tx().  Serialises the next association attempt through
   the workqueue so it cannot race against lingering firmware-side
   effects of the failure.  If unjoin_work is already queued, release
   TX immediately (matching cw1200 ancestor sta.c:1344 comment "Tx lock
   still held, unjoin will clear it.").

Ancestor reference: drivers/net/wireless/st/cw1200/sta.c, function
cw1200_join_work(), lines 1339-1344.  cw1200 queues unjoin_work on join
failure for the same reason.  bes2600 needs the direct wsm_reset in
addition because its unjoin_work has the join_status gate that cw1200's
cw1200_do_unjoin() does not.

Signed-off-by: Claude (noether) <claude@reauktion.de>
2026-05-21 10:43:42 +02:00
marfrit fc327b2ff6 Merge pull request 'bes2600: take pending_record_lock with _bh() — fix SOFTIRQ-safe → -unsafe inversion (closes besser#18)' (#11) from bes2600/queue-pending-record-lock-bh-fix into cleanups
Reviewed-on: #11
2026-05-18 19:18:08 +00:00
test0r d95453c98e bes2600: take pending_record_lock with _bh() to fix SOFTIRQ-safe → -unsafe inversion (besser#18)
PROVE_LOCKING reports:

  WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
  kworker/u16:1 is trying to acquire:
    &hw_priv->tx_loop.pending_record_lock at bes2600_queue_clear+0x80
  and this task is already holding:
    &queue->lock at bes2600_queue_clear+0x60

  which would create a new lock dependency:
    (&queue->lock){+.-.}   -> (&hw_priv->tx_loop.pending_record_lock){+.+.}

  but this new dependency connects a SOFTIRQ-irq-safe lock:
    (&queue->lock){+.-.}
  ... which became SOFTIRQ-irq-safe at:
    bes2600_tx -> ieee80211_handle_wake_tx_queue -> tasklet_action
  to a SOFTIRQ-irq-unsafe lock:
    (&hw_priv->tx_loop.pending_record_lock){+.+.}
  ... which became SOFTIRQ-irq-unsafe at:
    bes2600_queue_get_skb -> bes2600_join_work -> process_one_work

queue->lock is taken consistently with spin_lock_bh() at 22 sites;
the nested acquisition of pending_record_lock at queue.c:289 (inside
the outer queue->lock_bh held at line 285) had it implicitly BH-safe
via the outer scope. But pending_record_lock is ALSO taken from
non-BH-disabled contexts:

  bes2600_queue_get_skb  (queue.c:832)  — process context via
    bes2600_join_work (workqueue), no outer queue->lock held
  bes2600_tx_loop_item_pending_check (tx_loop.c:112)
                                     — TX-loop context, no outer
                                     queue->lock held

When CPU0 holds pending_record_lock from one of those non-BH paths
and a softirq fires that wants queue->lock, and CPU1 in softirq has
queue->lock and is about to acquire pending_record_lock — classic AB-BA
SOFTIRQ deadlock.

The fix is the conservative one: take pending_record_lock with _bh()
at every site that's not already inside a queue->lock_bh-held scope.
That makes the lock consistently SOFTIRQ-safe, eliminating the
inversion. queue.c:289/295 stays as plain spin_lock because BH is
already disabled by the outer queue->lock_bh acquired at queue.c:285.

Five sites converted:
  bes2600/queue.c:832 -- spin_lock      -> spin_lock_bh
  bes2600/queue.c:839 -- spin_unlock    -> spin_unlock_bh
  bes2600/queue.c:844 -- spin_unlock    -> spin_unlock_bh
  bes2600/tx_loop.c:112 -- spin_lock    -> spin_lock_bh
  bes2600/tx_loop.c:114 -- spin_unlock  -> spin_unlock_bh

Contract:
  - Documentation/locking/locktypes.rst spelling: spin_lock_bh() is
    the canonical way to make a non-IRQ spinlock safe against
    softirq preemption that might re-enter the same lock.
  - Same shape as queue->lock in this driver and as is_drv->lock
    in the cw1200 ancestor.

Closes: besser#18
Fixes: <bes2600 base import>
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-05-18 16:58:49 +02:00
marfrit 87a3d65960 bes2600: Patch H — bh.c hygiene cleanup (drop fossil blocks, dead stubs) (#10) 2026-05-08 06:30:40 +00:00
3 changed files with 48 additions and 9 deletions
+3 -3
View File
@@ -829,19 +829,19 @@ int bes2600_queue_get_skb(struct bes2600_queue *queue, u32 packetID,
bes2600_queue_parse_id(packetID, &queue_generation, &queue_id,
&item_generation, &item_id, &if_id, &link_id);
spin_lock(&queue->stats->hw_priv->tx_loop.pending_record_lock);
spin_lock_bh(&queue->stats->hw_priv->tx_loop.pending_record_lock);
if (!list_empty(&queue->stats->hw_priv->tx_loop.pending_record_list)) {
list_for_each_entry_safe(record_item, temp_record_item, &queue->stats->hw_priv->tx_loop.pending_record_list, head) {
if (record_item->packetID == packetID) {
list_del(&record_item->head);
dev_kfree_skb(record_item->skb);
kfree(record_item);
spin_unlock(&queue->stats->hw_priv->tx_loop.pending_record_lock);
spin_unlock_bh(&queue->stats->hw_priv->tx_loop.pending_record_lock);
return -EINVAL;
}
}
}
spin_unlock(&queue->stats->hw_priv->tx_loop.pending_record_lock);
spin_unlock_bh(&queue->stats->hw_priv->tx_loop.pending_record_lock);
item = &queue->pool[item_id];
+43 -4
View File
@@ -2209,9 +2209,10 @@ void bes2600_join_work(struct work_struct *work)
struct wsm_template_frame probe_tmp = {
.frame_type = WSM_FRAME_TYPE_PROBE_REQUEST,
};
/*struct wsm_reset reset = {
.reset_statistics = true,
};*/
struct wsm_reset join_fail_reset = {
.reset_statistics = false,
};
bool join_failed = false;
BUG_ON(queueId >= 4);
@@ -2390,6 +2391,33 @@ void bes2600_join_work(struct work_struct *work)
#endif /*CONFIG_BES2600_TESTMODE*/
cancel_delayed_work_sync(&priv->join_timeout);
bes2600_pwr_clear_busy_event(priv->hw_priv, BES_PWR_LOCK_ON_JOIN);
/*
* Firmware rejected WSM_JOIN (wsm_join_confirm ret 1).
* Issue wsm_reset so the firmware returns to a clean
* IDLE state before the next association attempt.
*
* Without this reset the firmware sits in an
* intermediate post-reject state. A rapid second
* JOIN (e.g. wpa_supplicant retrying after the
* PREV_AUTH_NOT_VALID deauth that follows) hits an
* inconsistent firmware context, causing
* bes2600_sdio_read_rx_batch to return SDIO error
* which cascades into wifi_force_close.
*
* cw1200 ancestor (drivers/net/wireless/st/cw1200/
* sta.c:1339) queues unjoin_work on join failure for
* the same reason; bes2600_unjoin_work gates its
* wsm_reset on join_status != PASSIVE, so after a
* failed JOIN (join_status stays PASSIVE) that path
* never fires — call wsm_reset directly here instead.
*
* Contract: wsm_reset takes only wsm_cmd_lock; safe
* to call while conf_lock is held. wsm_oper_unlock
* was already called in wsm_join_confirm() before
* wsm_join() returned the error.
*/
WARN_ON(wsm_reset(hw_priv, &join_fail_reset, priv->if_id));
join_failed = true;
} else {
/* Upload keys */
#ifdef CONFIG_BES2600_TESTMODE
@@ -2414,7 +2442,18 @@ void bes2600_join_work(struct work_struct *work)
up(&hw_priv->conf_lock);
if (bss)
cfg80211_put_bss(hw_priv->hw->wiphy, bss);
wsm_unlock_tx(hw_priv);
/*
* On join failure: queue unjoin_work so the next association
* attempt is serialised after any lingering cleanup, matching
* cw1200 sta.c:1344 "Tx lock still held, unjoin will clear it."
* If unjoin_work is already queued, release TX immediately.
*/
if (join_failed) {
if (queue_work(hw_priv->workqueue, &priv->unjoin_work) <= 0)
wsm_unlock_tx(hw_priv);
} else {
wsm_unlock_tx(hw_priv);
}
}
void bes2600_join_timeout(struct work_struct *work)
+2 -2
View File
@@ -109,9 +109,9 @@ void bes2600_tx_loop_set_enable(struct bes2600_common *hw_priv, bool need_warn)
bes2600_queue_iterate_pending_packet(&hw_priv->tx_queue[i],
bes2600_tx_loop_item_pending_item);
}
spin_lock(&hw_priv->tx_loop.pending_record_lock);
spin_lock_bh(&hw_priv->tx_loop.pending_record_lock);
bes2600_queue_iterate_record_pending_packet(hw_priv, bes2600_tx_loop_item_pending_item);
spin_unlock(&hw_priv->tx_loop.pending_record_lock);
spin_unlock_bh(&hw_priv->tx_loop.pending_record_lock);
if (atomic_read(&hw_priv->bh_rx) > 0)
wake_up(&hw_priv->bh_wq);