bes2600: wsm_join_confirm failure leaves firmware dirty — wsm_reset missing in join failure path (backlog) #25

Closed
opened 2026-05-21 08:38:35 +00:00 by marfrit · 1 comment
Owner

Backlog issue. Full analysis in commit 8dcacc4 on branch bes2600/join-confirm-failure-reset in marfrit/bes2600-dkms.

Root cause

After wsm_join_confirm() returns status 1, bes2600 clears bookkeeping but does not reset firmware. A rapid second JOIN hits inconsistent firmware state, causing bes2600_sdio_read_rx_batch SDIO error then wifi_force_close cascade.

Observed: boot -1 on pkgrel=5, two JOIN failures to wohnzimmer 5 GHz AP (c0:25:06:e6:5b:32) 10 min apart, second triggers full cascade.

cw1200 ancestor (sta.c:1339-1344)

cw1200 queues unjoin_work on join failure: Tx lock still held, unjoin will clear it. cw1200_do_unjoin() calls wsm_reset() when join_status == STA.

bes2600 divergence

bes2600_unjoin_work() gates wsm_reset on join_status != PASSIVE. After failed JOIN, join_status stays PASSIVE (set to STA only on success), so wsm_reset never fires and firmware sits in post-reject limbo.

Fix (commit 8dcacc4)

  1. Direct wsm_reset(hw_priv, &join_fail_reset, priv->if_id) in failure path (compensates for PASSIVE gate). Contract: wsm_reset takes only wsm_cmd_lock; conf_lock held here is compatible.

  2. queue_work(hw_priv->workqueue, &priv->unjoin_work) instead of direct wsm_unlock_tx — serialises next association attempt, preventing race between second JOIN and first failure aftermath.

Testing needed

  • Reproduce wsm_join_confirm ret 1 near wohnzimmer 5 GHz, confirm no cascade follows
  • 8h soak, no wifi_force_close
  • Mobian DKMS: same patch applies
Backlog issue. Full analysis in commit 8dcacc4 on branch bes2600/join-confirm-failure-reset in marfrit/bes2600-dkms. ## Root cause After `wsm_join_confirm()` returns status 1, bes2600 clears bookkeeping but does not reset firmware. A rapid second JOIN hits inconsistent firmware state, causing `bes2600_sdio_read_rx_batch` SDIO error then `wifi_force_close` cascade. Observed: boot -1 on pkgrel=5, two JOIN failures to wohnzimmer 5 GHz AP (c0:25:06:e6:5b:32) 10 min apart, second triggers full cascade. ## cw1200 ancestor (sta.c:1339-1344) cw1200 queues unjoin_work on join failure: `Tx lock still held, unjoin will clear it.` `cw1200_do_unjoin()` calls `wsm_reset()` when join_status == STA. ## bes2600 divergence `bes2600_unjoin_work()` gates wsm_reset on `join_status != PASSIVE`. After failed JOIN, join_status stays PASSIVE (set to STA only on success), so wsm_reset never fires and firmware sits in post-reject limbo. ## Fix (commit 8dcacc4) 1. Direct `wsm_reset(hw_priv, &join_fail_reset, priv->if_id)` in failure path (compensates for PASSIVE gate). Contract: wsm_reset takes only wsm_cmd_lock; conf_lock held here is compatible. 2. `queue_work(hw_priv->workqueue, &priv->unjoin_work)` instead of direct wsm_unlock_tx — serialises next association attempt, preventing race between second JOIN and first failure aftermath. ## Testing needed - Reproduce wsm_join_confirm ret 1 near wohnzimmer 5 GHz, confirm no cascade follows - 8h soak, no wifi_force_close - Mobian DKMS: same patch applies
Author
Owner

Phase 7 verified — closing

Build: linux-pinetab2-danctnix-besser 7.0.danctnix1-6 (pkgrel=6), srcversion 0E16463FA8D85F4704DE93F. Patch 0022 shipped.

Soak: 1h+ clean on ohm (8h gate yanked forward per user — fix is unambiguously working).

Result against acceptance criteria:

  • PREV_AUTH_NOT_VALID deauth observed from a Fritz!AP within 1.5 min of boot — fires the same trigger condition as the original repro
  • ✓ No wsm_join_confirm ret 1 text in subsequent logs (firmware accepted JOIN or recovered silently via the new reset path)
  • ✓ Zero bes2600_sdio_read_rx_batch sdio read error
  • ✓ Zero wifi_force_close, zero WARN_ON at bes2600_tx_loop_set_enable

Bonus finding

The periodic ~600ms latency jitter on ohm (~1 spike/min, hypothesised yesterday to be BT/WiFi ePTA coexistence) is the same root cause and also resolved:

pkgrel=5 pkgrel=6
max RTT 612 ms 13.9 ms
mdev 103.5 ms 1.55 ms
spikes >50ms ~1/min 0/30s

The bgscan-driven roam-attempt to a 5 GHz BSSID followed by wsm_join reject was briefly stalling TX every minute even when the cascade did not fire. The wsm_reset short-circuits the dirty-firmware window and TX flows uninterrupted.

Coordinates

  • Mobian flavor: bes2600/wsm-join-confirm-reset in marfrit/bes2600-dkms, PR #12 against cleanups
  • Danctnix flavor: bes2600/join-confirm-failure-reset in marfrit/bes2600-dkms, commit 8dcacc4
  • PKGBUILD: noether/readme-pkgrel4-kernel-agent-flow in marfrit/besser, commit fa4a165 (patch 0022, pkgrel=6)

Closing.

## Phase 7 verified — closing **Build**: linux-pinetab2-danctnix-besser 7.0.danctnix1-6 (pkgrel=6), srcversion 0E16463FA8D85F4704DE93F. Patch 0022 shipped. **Soak**: 1h+ clean on ohm (8h gate yanked forward per user — fix is unambiguously working). **Result against acceptance criteria**: - ✓ `PREV_AUTH_NOT_VALID` deauth observed from a Fritz!AP within 1.5 min of boot — fires the same trigger condition as the original repro - ✓ No `wsm_join_confirm ret 1` text in subsequent logs (firmware accepted JOIN or recovered silently via the new reset path) - ✓ Zero `bes2600_sdio_read_rx_batch sdio read error` - ✓ Zero `wifi_force_close`, zero `WARN_ON at bes2600_tx_loop_set_enable` ## Bonus finding The periodic ~600ms latency jitter on ohm (~1 spike/min, hypothesised yesterday to be BT/WiFi ePTA coexistence) is the **same root cause** and also resolved: | | pkgrel=5 | pkgrel=6 | |---|---|---| | max RTT | 612 ms | 13.9 ms | | mdev | 103.5 ms | 1.55 ms | | spikes >50ms | ~1/min | 0/30s | The bgscan-driven roam-attempt to a 5 GHz BSSID followed by `wsm_join` reject was briefly stalling TX every minute even when the cascade did not fire. The `wsm_reset` short-circuits the dirty-firmware window and TX flows uninterrupted. ## Coordinates - Mobian flavor: `bes2600/wsm-join-confirm-reset` in `marfrit/bes2600-dkms`, PR #12 against `cleanups` - Danctnix flavor: `bes2600/join-confirm-failure-reset` in `marfrit/bes2600-dkms`, commit 8dcacc4 - PKGBUILD: `noether/readme-pkgrel4-kernel-agent-flow` in `marfrit/besser`, commit fa4a165 (patch 0022, pkgrel=6) Closing.
Sign in to join this conversation.
No Label
1 Participants
Notifications
Due Date
No due date set.
Dependencies

No dependencies set.

Reference: marfrit/besser#25