bes2600: reset firmware state on wsm_join_confirm failure

When wsm_join_confirm() returned a status other than WSM_STATUS_SUCCESS
(logged as "ret 1"), the driver cleared its bookkeeping but did not
reset the firmware interface, leaving it in an intermediate
post-rejection state.  A rapid second JOIN attempt (e.g. wpa_supplicant
retrying after the PREV_AUTH_NOT_VALID deauth that mac80211 emits to
clean up) then hit an inconsistent firmware context, causing
bes2600_sdio_read_rx_batch to return an SDIO error, which cascaded into
bes2600_chrdev_wifi_force_close:

  wsm_join_confirm ret 1
  deauthenticating from <bssid> by local choice (Reason: 2=PREV_AUTH_NOT_VALID)
  [~10 min later]
  bes2600_sdio_read_rx_batch sdio read error
  WARNING: at bes2600_tx_loop_set_enable / bes2600_chrdev_wifi_force_close

Two changes to the failure path in bes2600_join_work():

1. wsm_reset (WSM_REQ_ID_RESET, 0x000A) with reset_statistics=false.
   This returns the firmware to IDLE so the next association attempt
   starts from a known-clean state.  bes2600_unjoin_work() performs the
   same reset, but gates it on join_status != PASSIVE.  After a failed
   JOIN, join_status stays PASSIVE, so that path never fires; wsm_reset
   is therefore called directly here instead.

   Contract: wsm_reset takes only wsm_cmd_lock (not conf_lock, not
   wsm_oper_lock).  wsm_oper_unlock was already called inside
   wsm_join_confirm() before wsm_join() returned -EINVAL, so there is
   no re-entrancy hazard.  conf_lock is held at this call site, which is
   compatible with wsm_reset's locking requirements.

2. queue_work(hw_priv->workqueue, &priv->unjoin_work) instead of a
   direct wsm_unlock_tx().  This serialises the next association attempt
   through the workqueue so it cannot race against lingering
   firmware-side effects of the failure; the TX lock stays held until
   unjoin_work runs.  If unjoin_work is already queued, queue_work()
   returns false and the TX lock is released immediately (matching the
   cw1200 ancestor's sta.c:1344 comment "Tx lock still held, unjoin will
   clear it.").
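
The TX-lock handoff in item 2 can be sketched as a small userspace
model.  This is an illustration only: struct model and every model_*
function are hypothetical stand-ins, not driver code, and the sketch
assumes (per the cw1200 comment quoted above) that the unjoin worker
drops one TX-lock reference when it runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for driver state; not the real bes2600 types. */
struct model {
	int tx_lock;         /* number of outstanding TX-lock holders */
	bool unjoin_pending; /* true while unjoin_work sits on the queue */
};

/* Same contract as the kernel's queue_work(): false if already pending. */
static bool model_queue_unjoin(struct model *m)
{
	if (m->unjoin_pending)
		return false;
	m->unjoin_pending = true;
	return true;
}

static void model_unlock_tx(struct model *m)
{
	assert(m->tx_lock > 0); /* never release a lock nobody holds */
	m->tx_lock--;
}

/* Assumption: the unjoin worker releases one TX-lock reference. */
static void model_run_unjoin(struct model *m)
{
	if (!m->unjoin_pending)
		return;
	m->unjoin_pending = false;
	model_unlock_tx(m);
}

/* Tail of the join path, mirroring the new exit logic in the patch. */
static void model_join_exit(struct model *m, bool join_failed)
{
	if (join_failed) {
		/* Hand our reference to unjoin_work; if it was already
		 * queued, nobody will release ours, so drop it now. */
		if (!model_queue_unjoin(m))
			model_unlock_tx(m);
	} else {
		model_unlock_tx(m);
	}
}
```

In this model every path releases exactly one reference per join
attempt, whether or not unjoin_work was already pending.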

Ancestor reference: drivers/net/wireless/st/cw1200/sta.c, function
cw1200_join_work(), lines 1339-1344.  cw1200 queues unjoin_work on join
failure for the same reason.  bes2600 needs the direct wsm_reset in
addition because its unjoin_work has the join_status gate that cw1200's
cw1200_do_unjoin() does not.
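
Why the gate makes the direct reset necessary can be shown with a toy
state model.  It is a simplified sketch under stated assumptions: the
model_* names and the three firmware states are invented for
illustration, and only the gating behaviour described above for
bes2600_unjoin_work() is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified states; the real driver has more. */
enum model_join_status { MODEL_PASSIVE = 0, MODEL_STA };
enum model_fw_state { MODEL_FW_IDLE, MODEL_FW_JOIN_REJECTED, MODEL_FW_JOINED };

struct model_vif {
	enum model_join_status join_status; /* driver bookkeeping */
	enum model_fw_state fw_state;       /* assumed firmware state */
};

/* Stand-in for wsm_reset(): firmware goes back to IDLE. */
static void model_wsm_reset(struct model_vif *v)
{
	v->fw_state = MODEL_FW_IDLE;
}

/* Gated unjoin: the reset is skipped while join_status is PASSIVE. */
static void model_unjoin_work(struct model_vif *v)
{
	if (v->join_status == MODEL_PASSIVE)
		return;
	model_wsm_reset(v);
	v->join_status = MODEL_PASSIVE;
}

/* Failed JOIN: firmware rejected the request, bookkeeping stays PASSIVE. */
static void model_join_failed(struct model_vif *v, bool direct_reset)
{
	assert(v->join_status == MODEL_PASSIVE);
	v->fw_state = MODEL_FW_JOIN_REJECTED;
	if (direct_reset)
		model_wsm_reset(v); /* what this patch adds */
	model_unjoin_work(v);       /* gate prevents a reset here */
}
```

With direct_reset false the model ends in MODEL_FW_JOIN_REJECTED, since
the gated unjoin path never resets; with direct_reset true it ends in
MODEL_FW_IDLE.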

Signed-off-by: Claude (noether) <claude@reauktion.de>
2026-05-21 09:25:12 +02:00
parent 0750df2611
commit cdb6bd07d3
+43 -4
@@ -2209,9 +2209,10 @@ void bes2600_join_work(struct work_struct *work)
 	struct wsm_template_frame probe_tmp = {
 		.frame_type = WSM_FRAME_TYPE_PROBE_REQUEST,
 	};
-	/*struct wsm_reset reset = {
-		.reset_statistics = true,
-	};*/
+	struct wsm_reset join_fail_reset = {
+		.reset_statistics = false,
+	};
+	bool join_failed = false;
 	BUG_ON(queueId >= 4);
@@ -2390,6 +2391,33 @@ void bes2600_join_work(struct work_struct *work)
 #endif /*CONFIG_BES2600_TESTMODE*/
 			cancel_delayed_work_sync(&priv->join_timeout);
 			bes2600_pwr_clear_busy_event(priv->hw_priv, BES_PWR_LOCK_ON_JOIN);
+			/*
+			 * Firmware rejected WSM_JOIN (wsm_join_confirm ret 1).
+			 * Issue wsm_reset so the firmware returns to a clean
+			 * IDLE state before the next association attempt.
+			 *
+			 * Without this reset the firmware sits in an
+			 * intermediate post-reject state. A rapid second
+			 * JOIN (e.g. wpa_supplicant retrying after the
+			 * PREV_AUTH_NOT_VALID deauth that follows) hits an
+			 * inconsistent firmware context, causing
+			 * bes2600_sdio_read_rx_batch to return an SDIO error,
+			 * which cascades into wifi_force_close.
+			 *
+			 * cw1200 ancestor (drivers/net/wireless/st/cw1200/
+			 * sta.c:1339) queues unjoin_work on join failure for
+			 * the same reason; bes2600_unjoin_work gates its
+			 * wsm_reset on join_status != PASSIVE, so after a
+			 * failed JOIN (join_status stays PASSIVE) that path
+			 * never fires. Call wsm_reset directly here instead.
+			 *
+			 * Contract: wsm_reset takes only wsm_cmd_lock; safe
+			 * to call while conf_lock is held. wsm_oper_unlock
+			 * was already called in wsm_join_confirm() before
+			 * wsm_join() returned the error.
+			 */
+			WARN_ON(wsm_reset(hw_priv, &join_fail_reset, priv->if_id));
+			join_failed = true;
 		} else {
 			/* Upload keys */
 #ifdef CONFIG_BES2600_TESTMODE
@@ -2414,7 +2442,18 @@ void bes2600_join_work(struct work_struct *work)
 	up(&hw_priv->conf_lock);
 	if (bss)
 		cfg80211_put_bss(hw_priv->hw->wiphy, bss);
-	wsm_unlock_tx(hw_priv);
+	/*
+	 * On join failure: queue unjoin_work so the next association
+	 * attempt is serialised after any lingering cleanup, matching
+	 * cw1200 sta.c:1344 "Tx lock still held, unjoin will clear it."
+	 * If unjoin_work is already queued, release TX immediately.
+	 */
+	if (join_failed) {
+		if (queue_work(hw_priv->workqueue, &priv->unjoin_work) <= 0)
+			wsm_unlock_tx(hw_priv);
+	} else {
+		wsm_unlock_tx(hw_priv);
+	}
 }
 void bes2600_join_timeout(struct work_struct *work)