Merge pull request 'bes2600: reset firmware state on wsm_join_confirm failure' (#12) from bes2600/wsm-join-confirm-reset into cleanups

Reviewed-on: #12
bes2600: reset firmware state on wsm_join_confirm failure
3 changed files with 48 additions and 9 deletions
@@ -829,19 +829,19 @@ int bes2600_queue_get_skb(struct bes2600_queue *queue, u32 packetID,
 	bes2600_queue_parse_id(packetID, &queue_generation, &queue_id,
 				&item_generation, &item_id, &if_id, &link_id);

-	spin_lock(&queue->stats->hw_priv->tx_loop.pending_record_lock);
+	spin_lock_bh(&queue->stats->hw_priv->tx_loop.pending_record_lock);
 	if (!list_empty(&queue->stats->hw_priv->tx_loop.pending_record_list)) {
 		list_for_each_entry_safe(record_item, temp_record_item, &queue->stats->hw_priv->tx_loop.pending_record_list, head) {
 			if (record_item->packetID == packetID) {
 				list_del(&record_item->head);
 				dev_kfree_skb(record_item->skb);
 				kfree(record_item);
-				spin_unlock(&queue->stats->hw_priv->tx_loop.pending_record_lock);
+				spin_unlock_bh(&queue->stats->hw_priv->tx_loop.pending_record_lock);
 				return -EINVAL;
 			}
 		}
 	}
-	spin_unlock(&queue->stats->hw_priv->tx_loop.pending_record_lock);
+	spin_unlock_bh(&queue->stats->hw_priv->tx_loop.pending_record_lock);

 	item = &queue->pool[item_id];

@@ -2209,9 +2209,10 @@ void bes2600_join_work(struct work_struct *work)
 	struct wsm_template_frame probe_tmp = {
 		.frame_type = WSM_FRAME_TYPE_PROBE_REQUEST,
 	};
-	/*struct wsm_reset reset = {
-		.reset_statistics = true,
-	};*/
+	struct wsm_reset join_fail_reset = {
+		.reset_statistics = false,
+	};
+	bool join_failed = false;


 	BUG_ON(queueId >= 4);
@@ -2390,6 +2391,33 @@ void bes2600_join_work(struct work_struct *work)
 #endif /*CONFIG_BES2600_TESTMODE*/
 			cancel_delayed_work_sync(&priv->join_timeout);
 			bes2600_pwr_clear_busy_event(priv->hw_priv, BES_PWR_LOCK_ON_JOIN);
+			/*
+			 * Firmware rejected WSM_JOIN (wsm_join_confirm ret 1).
+			 * Issue wsm_reset so the firmware returns to a clean
+			 * IDLE state before the next association attempt.
+			 *
+			 * Without this reset the firmware sits in an
+			 * intermediate post-reject state.  A rapid second
+			 * JOIN (e.g. wpa_supplicant retrying after the
+			 * PREV_AUTH_NOT_VALID deauth that follows) hits an
+			 * inconsistent firmware context, causing
+			 * bes2600_sdio_read_rx_batch to return SDIO error
+			 * which cascades into wifi_force_close.
+			 *
+			 * cw1200 ancestor (drivers/net/wireless/st/cw1200/
+			 * sta.c:1339) queues unjoin_work on join failure for
+			 * the same reason; bes2600_unjoin_work gates its
+			 * wsm_reset on join_status != PASSIVE, so after a
+			 * failed JOIN (join_status stays PASSIVE) that path
+			 * never fires — call wsm_reset directly here instead.
+			 *
+			 * Contract: wsm_reset takes only wsm_cmd_lock; safe
+			 * to call while conf_lock is held.  wsm_oper_unlock
+			 * was already called in wsm_join_confirm() before
+			 * wsm_join() returned the error.
+			 */
+			WARN_ON(wsm_reset(hw_priv, &join_fail_reset, priv->if_id));
+			join_failed = true;
 		} else {
 			/* Upload keys */
 #ifdef CONFIG_BES2600_TESTMODE
@@ -2414,7 +2442,18 @@ void bes2600_join_work(struct work_struct *work)
 	up(&hw_priv->conf_lock);
 	if (bss)
 		cfg80211_put_bss(hw_priv->hw->wiphy, bss);
-	wsm_unlock_tx(hw_priv);
+	/*
+	 * On join failure: queue unjoin_work so the next association
+	 * attempt is serialised after any lingering cleanup, matching
+	 * cw1200 sta.c:1344 "Tx lock still held, unjoin will clear it."
+	 * If unjoin_work is already queued, release TX immediately.
+	 */
+	if (join_failed) {
+		if (queue_work(hw_priv->workqueue, &priv->unjoin_work) <= 0)
+			wsm_unlock_tx(hw_priv);
+	} else {
+		wsm_unlock_tx(hw_priv);
+	}
 }

 void bes2600_join_timeout(struct work_struct *work)
@@ -109,9 +109,9 @@ void bes2600_tx_loop_set_enable(struct bes2600_common *hw_priv, bool need_warn)
                bes2600_queue_iterate_pending_packet(&hw_priv->tx_queue[i],
 				                bes2600_tx_loop_item_pending_item);
        }
-	spin_lock(&hw_priv->tx_loop.pending_record_lock);
+	spin_lock_bh(&hw_priv->tx_loop.pending_record_lock);
        bes2600_queue_iterate_record_pending_packet(hw_priv, bes2600_tx_loop_item_pending_item);
-	spin_unlock(&hw_priv->tx_loop.pending_record_lock);
+	spin_unlock_bh(&hw_priv->tx_loop.pending_record_lock);

        if (atomic_read(&hw_priv->bh_rx) > 0)
 		wake_up(&hw_priv->bh_wq);
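
The lock-ordering problem that the `_bh()` conversions address can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not how lockdep is implemented: `struct lock_class`, `model_acquire`, `model_inversion`, and `model_report` are hypothetical names, and the two flags loosely mirror the softirq usage bits that the PROVE_LOCKING report prints as `{+.-.}` (used in softirq) and `{+.+.}` (taken with softirqs enabled).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for lockdep's per-lock usage bits. */
struct lock_class {
	bool used_in_softirq;   /* acquired while running in softirq context */
	bool taken_with_bh_on;  /* acquired in process context, softirqs enabled */
};

enum ctx { CTX_PROCESS, CTX_SOFTIRQ };

/*
 * Record one acquisition. bh_disabled models spin_lock_bh(), or a plain
 * spin_lock() nested inside an outer lock that was taken with _bh.
 */
void model_acquire(struct lock_class *l, enum ctx ctx, bool bh_disabled)
{
	assert(l != NULL);
	if (ctx == CTX_SOFTIRQ)
		l->used_in_softirq = true;
	else if (!bh_disabled)
		l->taken_with_bh_on = true;
}

/* Nesting outer -> inner is flagged when outer is softirq-safe and inner is not. */
bool model_inversion(const struct lock_class *outer,
		     const struct lock_class *inner)
{
	return outer->used_in_softirq && inner->taken_with_bh_on;
}

/* Replay the acquisitions named in the report; use_bh selects the fixed variant. */
bool model_report(bool use_bh)
{
	struct lock_class queue_lock = { false, false };
	struct lock_class pending_record_lock = { false, false };

	/* bes2600_tx via tasklet: queue->lock taken in softirq context. */
	model_acquire(&queue_lock, CTX_SOFTIRQ, true);
	/* bes2600_queue_get_skb from the join work: process context. */
	model_acquire(&pending_record_lock, CTX_PROCESS, use_bh);
	/* bes2600_queue_clear: pending_record_lock nested under queue->lock (_bh). */
	model_acquire(&queue_lock, CTX_PROCESS, true);
	model_acquire(&pending_record_lock, CTX_PROCESS, true);

	return model_inversion(&queue_lock, &pending_record_lock);
}
```

In this model, `model_report(false)` corresponds to the plain `spin_lock()` call sites and reports the inversion, while `model_report(true)` corresponds to the converted sites and does not. The nested acquisition is recorded as BH-disabled in both cases, which is why the commit leaves that site as a plain `spin_lock()`.
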
Author	SHA1	Message	Date
marfrit	64fc309e26	Merge pull request 'bes2600: reset firmware state on wsm_join_confirm failure' (#12) from bes2600/wsm-join-confirm-reset into cleanups	2026-05-21 08:45:22 +00:00

    Reviewed-on: #12

Markus Fritsche	cdb6bd07d3	bes2600: reset firmware state on wsm_join_confirm failure	2026-05-21 10:43:42 +02:00

    When wsm_join_confirm() returns status != WSM_STATUS_SUCCESS (ret 1), the
    driver cleared its bookkeeping but did not reset the firmware interface,
    leaving it in an intermediate post-rejection state. A rapid second JOIN
    attempt (e.g. wpa_supplicant retrying after the PREV_AUTH_NOT_VALID deauth
    that mac80211 emits to clean up) hits an inconsistent firmware context,
    causing bes2600_sdio_read_rx_batch to return SDIO error which cascades
    into wifi_force_close:

      wsm_join_confirm ret 1
      deauthenticating from <bssid> by local choice (Reason: 2=PREV_AUTH_NOT_VALID)
      [~10 min later]
      bes2600_sdio_read_rx_batch sdio read error
      WARNING: at bes2600_tx_loop_set_enable / bes2600_chrdev_wifi_force_close

    Two additions to the failure path in bes2600_join_work():

    1. wsm_reset (WSM_REQ_ID_RESET, 0x000A) with reset_statistics=false.
       This returns the firmware to IDLE so the next association attempt
       starts from a known-clean state. bes2600_unjoin_work() performs the
       same reset, but gates it on join_status != PASSIVE; after a failed
       JOIN join_status stays PASSIVE, so that path never fires; call
       wsm_reset directly here instead.

       Contract: wsm_reset takes only wsm_cmd_lock (not conf_lock, not
       wsm_oper_lock). wsm_oper_unlock was already called inside
       wsm_join_confirm() before wsm_join() returned -EINVAL, so there is no
       re-entrancy hazard. conf_lock is held at this call site, which is
       compatible with wsm_reset's locking requirements.

    2. queue_work(workqueue, &priv->unjoin_work) instead of direct
       wsm_unlock_tx(). Serialises the next association attempt through the
       workqueue so it cannot race against lingering firmware-side effects
       of the failure. If unjoin_work is already queued, release TX
       immediately (matching cw1200 ancestor sta.c:1344 comment "Tx lock
       still held, unjoin will clear it.").

    Ancestor reference: drivers/net/wireless/st/cw1200/sta.c, function
    cw1200_join_work(), lines 1339-1344. cw1200 queues unjoin_work on join
    failure for the same reason. bes2600 needs the direct wsm_reset in
    addition because its unjoin_work has the join_status gate that cw1200's
    cw1200_do_unjoin() does not.

    Signed-off-by: Claude (noether) <claude@reauktion.de>

marfrit	fc327b2ff6	Merge pull request 'bes2600: take pending_record_lock with _bh() — fix SOFTIRQ-safe → -unsafe inversion (closes besser#18)' (#11) from bes2600/queue-pending-record-lock-bh-fix into cleanups	2026-05-18 19:18:08 +00:00

    Reviewed-on: #11

Markus Fritsche	d95453c98e	bes2600: take pending_record_lock with _bh() to fix SOFTIRQ-safe → -unsafe inversion (besser#18)	2026-05-18 16:58:49 +02:00

    PROVE_LOCKING reports:

      WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
      kworker/u16:1 is trying to acquire:
        &hw_priv->tx_loop.pending_record_lock at bes2600_queue_clear+0x80
      and this task is already holding:
        &queue->lock at bes2600_queue_clear+0x60
      which would create a new lock dependency:
        (&queue->lock){+.-.} -> (&hw_priv->tx_loop.pending_record_lock){+.+.}
      but this new dependency connects a SOFTIRQ-irq-safe lock:
        (&queue->lock){+.-.}
      ... which became SOFTIRQ-irq-safe at:
        bes2600_tx -> ieee80211_handle_wake_tx_queue -> tasklet_action
      to a SOFTIRQ-irq-unsafe lock:
        (&hw_priv->tx_loop.pending_record_lock){+.+.}
      ... which became SOFTIRQ-irq-unsafe at:
        bes2600_queue_get_skb -> bes2600_join_work -> process_one_work

    queue->lock is taken consistently with spin_lock_bh() at 22 sites; the
    nested acquisition of pending_record_lock at queue.c:289 (inside the
    outer queue->lock_bh held at line 285) had it implicitly BH-safe via the
    outer scope. But pending_record_lock is ALSO taken from non-BH-disabled
    contexts:

      bes2600_queue_get_skb (queue.c:832): process context via
        bes2600_join_work (workqueue), no outer queue->lock held
      bes2600_tx_loop_item_pending_check (tx_loop.c:112): TX-loop context,
        no outer queue->lock held

    When CPU0 holds pending_record_lock from one of those non-BH paths and a
    softirq fires that wants queue->lock, and CPU1 in softirq has queue->lock
    and is about to acquire pending_record_lock: classic AB-BA SOFTIRQ
    deadlock.

    The fix is the conservative one: take pending_record_lock with _bh() at
    every site that's not already inside a queue->lock_bh-held scope. That
    makes the lock consistently SOFTIRQ-safe, eliminating the inversion.
    queue.c:289/295 stays as plain spin_lock because BH is already disabled
    by the outer queue->lock_bh acquired at queue.c:285.

    Five sites converted:

      bes2600/queue.c:832   -- spin_lock   -> spin_lock_bh
      bes2600/queue.c:839   -- spin_unlock -> spin_unlock_bh
      bes2600/queue.c:844   -- spin_unlock -> spin_unlock_bh
      bes2600/tx_loop.c:112 -- spin_lock   -> spin_lock_bh
      bes2600/tx_loop.c:114 -- spin_unlock -> spin_unlock_bh

    Contract:
    - Documentation/locking/locktypes.rst spelling: spin_lock_bh() is the
      canonical way to make a non-IRQ spinlock safe against softirq
      preemption that might re-enter the same lock.
    - Same shape as queue->lock in this driver and as is_drv->lock in the
      cw1200 ancestor.

    Closes: besser#18
    Fixes: <bes2600 base import>
    Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>

marfrit	87a3d65960	bes2600: Patch H — bh.c hygiene cleanup (drop fossil blocks, dead stubs) (#10)	2026-05-08 06:30:40 +00:00
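
The two additions in the join-failure commit (the direct reset and the TX lock hand-off to unjoin_work) can be sketched as a small userspace state model. Every name here is a hypothetical stand-in, not a driver or kernel symbol. The model assumes that the unjoin work releases the TX lock when it runs, as the quoted cw1200 comment implies, and that the queueing helper behaves like `queue_work()`, which returns a bool that is false when the work was already pending (so the `<= 0` test in the hunk is equivalent to `!queue_work(...)`).

```c
#include <assert.h>
#include <stdbool.h>

/* All names below are hypothetical stand-ins for illustration only. */
enum join_status { JOIN_PASSIVE, JOIN_STA };
enum fw_state { FW_IDLE, FW_POST_REJECT };

struct model_dev {
	enum join_status join_status;
	enum fw_state fw;
	int tx_lock;          /* nesting count of the TX lock */
	bool unjoin_pending;  /* unjoin work already on the workqueue */
};

/* Like queue_work(): true if newly queued, false if already pending. */
bool model_queue_unjoin(struct model_dev *d)
{
	if (d->unjoin_pending)
		return false;
	d->unjoin_pending = true;
	return true;
}

/* Stand-in for wsm_unlock_tx(). */
void model_unlock_tx(struct model_dev *d)
{
	assert(d->tx_lock > 0);
	d->tx_lock--;
}

/* Stand-in for wsm_reset(): the firmware returns to idle. */
void model_reset(struct model_dev *d)
{
	d->fw = FW_IDLE;
}

/* Unjoin work: the reset is gated on join_status, then the TX lock is dropped. */
void model_unjoin_work(struct model_dev *d)
{
	d->unjoin_pending = false;
	if (d->join_status != JOIN_PASSIVE) {
		model_reset(d);
		d->join_status = JOIN_PASSIVE;
	}
	model_unlock_tx(d);
}

/* Failure path of the join work, entered with the TX lock held. */
void model_join_failed(struct model_dev *d, bool direct_reset)
{
	d->fw = FW_POST_REJECT;      /* firmware rejected the JOIN */
	if (direct_reset)
		model_reset(d);      /* the reset added by the commit */
	if (!model_queue_unjoin(d))
		model_unlock_tx(d);  /* already queued: release our hold now */
}
```

With `direct_reset` set to false, the model leaves the firmware in `FW_POST_REJECT` even after the unjoin work has run, because `join_status` is still `JOIN_PASSIVE` and the gated reset is skipped. With it set to true, the firmware is idle before the next attempt, and the TX lock is released exactly once per holder in both the freshly queued and the already-pending case.
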