bes2600: wsm_join_confirm failure leaves firmware dirty — wsm_reset missing in join failure path (backlog) #24

Closed
opened 2026-05-21 07:30:29 +00:00 by marfrit · 0 comments
Owner

Backlog issue. Full analysis in commit 8dcacc4 on branch bes2600/join-confirm-failure-reset in marfrit/bes2600-dkms.

Root cause

After wsm_join_confirm() returns status 1 (failure), bes2600 clears bookkeeping but does not reset firmware. A rapid second JOIN hits inconsistent firmware state, causing bes2600_sdio_read_rx_batch SDIO error → wifi_force_close cascade.

Observed on boot -1 with pkgrel=5: two JOIN failures to the wohnzimmer 5 GHz AP (c0:25:06:e6:5b:32), 10 min apart; the second triggered the full cascade.

cw1200 ancestor (sta.c:1339-1344)

cw1200 queues unjoin_work on join failure: "Tx lock still held, unjoin will clear it." cw1200_do_unjoin() calls wsm_reset() when join_status == STA.

bes2600 divergence

bes2600_unjoin_work() gates wsm_reset on join_status != PASSIVE. After a failed JOIN, join_status stays PASSIVE (set to STA only on success) → wsm_reset never fires → firmware left in post-reject limbo.
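
To make the gate problem concrete, here is a minimal userspace model in C. It is a sketch only: the struct, enum and `model_*` function names are hypothetical and are not taken from the driver source. It only mirrors the behaviour described above, namely that join_status becomes STA on success alone and that the unjoin path resets firmware only when join_status is not PASSIVE.

```c
#include <stdbool.h>

/* Hypothetical model types; not the driver's real definitions. */
enum model_join_status { MODEL_PASSIVE, MODEL_STA };

struct model_vif {
    enum model_join_status join_status;
    int fw_resets;   /* how often the modelled wsm_reset ran */
    bool fw_dirty;   /* firmware still holds state from a rejected JOIN */
};

/* Models a JOIN attempt: join_status moves to STA only on success. */
static void model_join(struct model_vif *vif, bool confirm_ok)
{
    if (confirm_ok)
        vif->join_status = MODEL_STA;
    else
        vif->fw_dirty = true;   /* join_status stays PASSIVE */
}

/* Models the gate described for bes2600_unjoin_work(). */
static void model_unjoin_work(struct model_vif *vif)
{
    if (vif->join_status != MODEL_PASSIVE) {
        vif->fw_resets++;       /* stands in for wsm_reset() */
        vif->fw_dirty = false;
        vif->join_status = MODEL_PASSIVE;
    }
}
```

In this model, calling `model_join(&vif, false)` followed by `model_unjoin_work(&vif)` leaves `fw_resets` at 0 and `fw_dirty` set, which is the "post-reject limbo" case. After a successful join the same unjoin call does perform the reset.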

Fix (commit 8dcacc4)

  1. Direct wsm_reset(hw_priv, &join_fail_reset, priv->if_id) in the failure path (compensates for PASSIVE gate). Contract: wsm_reset takes only wsm_cmd_lock; conf_lock held here is compatible; wsm_oper_unlock already called in wsm_join_confirm before error return.

  2. queue_work(hw_priv->workqueue, &priv->unjoin_work) instead of direct wsm_unlock_tx() — serialises next association attempt through workqueue, preventing race between second JOIN and first failure aftermath.
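
The two steps can be sketched as a small userspace model in C. This is not the patch from commit 8dcacc4: all `fix_*` names are hypothetical, the "workqueue" is reduced to a single pending flag, and the assumption that the unjoin work is what releases the Tx lock is borrowed from the cw1200 comment quoted above.

```c
#include <stdbool.h>

/* Hypothetical model types; not the driver's real definitions. */
enum fix_join_status { FIX_PASSIVE, FIX_STA };

struct fix_vif {
    enum fix_join_status join_status;
    bool fw_dirty;       /* firmware holds state from a rejected JOIN */
    bool tx_locked;      /* Tx lock taken when the JOIN started */
    bool unjoin_queued;  /* stands in for a pending unjoin_work item */
    int fw_resets;       /* how often the modelled wsm_reset ran */
};

/* Stands in for wsm_reset(): firmware returns to a clean state. */
static void fix_reset(struct fix_vif *vif)
{
    vif->fw_resets++;
    vif->fw_dirty = false;
}

/* Failure path after the join confirm reports an error. */
static void fix_join_failed(struct fix_vif *vif)
{
    vif->fw_dirty = true;        /* firmware rejected the JOIN */
    fix_reset(vif);              /* step 1: direct reset, bypasses the PASSIVE gate */
    vif->unjoin_queued = true;   /* step 2: defer cleanup; Tx lock stays held */
}

/* Runs the deferred unjoin work, if any is pending. */
static void fix_run_unjoin_work(struct fix_vif *vif)
{
    if (!vif->unjoin_queued)
        return;
    vif->unjoin_queued = false;
    if (vif->join_status != FIX_PASSIVE) {   /* the original gate, unchanged */
        fix_reset(vif);
        vif->join_status = FIX_PASSIVE;
    }
    vif->tx_locked = false;      /* assumed: unjoin clears the Tx lock */
}

/* A new JOIN may start only once the failure aftermath has drained. */
static bool fix_can_start_join(const struct fix_vif *vif)
{
    return !vif->tx_locked && !vif->unjoin_queued;
}
```

In this model a second JOIN cannot start between `fix_join_failed()` and `fix_run_unjoin_work()`, and the firmware is already reset by the time it does start. The gate in the unjoin work does not fire a second reset because join_status is still PASSIVE.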

Testing needed

  • Reproduce wsm_join_confirm ret 1 scenario (near wohnzimmer 5 GHz), confirm no cascade follows
  • 8h soak, no wifi_force_close
  • Mobian DKMS: same patch applies
Reference: marfrit/besser#24