bes2600: wsm_join_confirm failure leaves firmware dirty — wsm_reset missing in join failure path (backlog) #25
Backlog issue. Full analysis in commit 8dcacc4 on branch bes2600/join-confirm-failure-reset in marfrit/bes2600-dkms.
Root cause
After wsm_join_confirm() returns status 1, bes2600 clears bookkeeping but does not reset firmware. A rapid second JOIN hits inconsistent firmware state, causing a bes2600_sdio_read_rx_batch SDIO error, then a wifi_force_close cascade.
Observed: boot -1 on pkgrel=5, two JOIN failures to wohnzimmer 5 GHz AP (c0:25:06:e6:5b:32) 10 min apart, second triggers full cascade.
cw1200 ancestor (sta.c:1339-1344)
cw1200 queues unjoin_work on join failure:
Tx lock still held, unjoin will clear it.
cw1200_do_unjoin() calls wsm_reset() when join_status == STA.
bes2600 divergence
bes2600_unjoin_work() gates wsm_reset on join_status != PASSIVE. After failed JOIN, join_status stays PASSIVE (set to STA only on success), so wsm_reset never fires and firmware sits in post-reject limbo.
Fix (commit 8dcacc4)
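To make the PASSIVE gate and the compensating direct reset concrete, here is a minimal user-space model in C. It is only a sketch: the struct, the enum and every model_* function are hypothetical stand-ins invented for illustration, not the real bes2600 or cw1200 code, and "firmware state" is reduced to a single flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for driver state; NOT the real bes2600 structs. */
enum join_status { JOIN_PASSIVE, JOIN_STA };

struct model_vif {
    enum join_status join_status;
    bool fw_has_join_state; /* firmware saw a JOIN and was not reset since */
    int reset_count;        /* number of modelled wsm_reset() calls */
};

/* Models wsm_reset(): firmware drops whatever JOIN state it was holding. */
static void model_wsm_reset(struct model_vif *v)
{
    v->fw_has_join_state = false;
    v->reset_count++;
}

/* Models the gate in bes2600_unjoin_work(): reset only if not PASSIVE. */
static void model_unjoin_work(struct model_vif *v)
{
    if (v->join_status != JOIN_PASSIVE) {
        model_wsm_reset(v);
        v->join_status = JOIN_PASSIVE;
    }
}

/* Models one JOIN attempt; confirm_status != 0 means the JOIN was rejected. */
static void model_join(struct model_vif *v, int confirm_status, bool with_fix)
{
    v->fw_has_join_state = true;   /* firmware has started processing JOIN */
    if (confirm_status == 0) {
        v->join_status = JOIN_STA; /* set to STA only on success */
        return;
    }
    /* Failure path: join_status is still PASSIVE at this point. */
    if (with_fix) {
        model_wsm_reset(v);        /* direct reset, bypasses the gate */
        model_unjoin_work(v);      /* queued cleanup; its reset stays gated off */
    }
}

/*
 * Runs a rejected JOIN followed by unjoin work and reports whether the
 * modelled firmware is still holding stale JOIN state afterwards.
 */
static bool model_failed_join_leaves_state(bool with_fix)
{
    struct model_vif v = { JOIN_PASSIVE, false, 0 };

    model_join(&v, 1, with_fix);
    model_unjoin_work(&v);         /* gate sees PASSIVE, so no reset here */
    return v.fw_has_join_state;
}
```

In this model, model_failed_join_leaves_state(false) reports stale state because the gated reset never runs, while model_failed_join_leaves_state(true) reports a clean firmware because the direct reset does not depend on join_status.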
Direct wsm_reset(hw_priv, &join_fail_reset, priv->if_id) in failure path (compensates for PASSIVE gate). Contract: wsm_reset takes only wsm_cmd_lock; conf_lock held here is compatible.
queue_work(hw_priv->workqueue, &priv->unjoin_work) instead of direct wsm_unlock_tx: serialises next association attempt, preventing race between second JOIN and first failure aftermath.
Testing needed
Phase 7 verified — closing
Build: linux-pinetab2-danctnix-besser 7.0.danctnix1-6 (pkgrel=6), srcversion 0E16463FA8D85F4704DE93F. Patch 0022 shipped.
Soak: 1h+ clean on ohm (8h gate yanked forward per user — fix is unambiguously working).
Result against acceptance criteria:
PREV_AUTH_NOT_VALID deauth observed from a Fritz!AP within 1.5 min of boot; fires the same trigger condition as the original repro
Zero wsm_join_confirm ret 1 text in subsequent logs (firmware accepted JOIN or recovered silently via the new reset path)
Zero bes2600_sdio_read_rx_batch sdio read error
Zero wifi_force_close, zero WARN_ON at bes2600_tx_loop_set_enable
Bonus finding
The periodic ~600ms latency jitter on ohm (~1 spike/min, hypothesised yesterday to be BT/WiFi ePTA coexistence) has the same root cause and is also resolved:
The bgscan-driven roam-attempt to a 5 GHz BSSID followed by wsm_join reject was briefly stalling TX every minute even when the cascade did not fire. The wsm_reset short-circuits the dirty-firmware window and TX flows uninterrupted.
Coordinates
bes2600/wsm-join-confirm-reset in marfrit/bes2600-dkms, PR #12 against cleanups
bes2600/join-confirm-failure-reset in marfrit/bes2600-dkms, commit 8dcacc4
noether/readme-pkgrel4-kernel-agent-flow in marfrit/besser, commit fa4a165 (patch 0022, pkgrel=6)
Closing.