bes2600: bus_reset on connection-loss storm to dodge assoc-comeback blackhole #2
Patch B — Phase 6 implementation (Trigger A)
Follows the Phase 4 plan merged at marfrit/besser PR #5 (commit, notes/phase4-2026-05-07.md).
What it does
When 3 driver-side bes2600_connection_loss_work decisions fire within 60 s on the same vif, skip the regular ieee80211_connection_loss(vif) path and trigger a chip-level bes2600_chrdev_do_bus_reset() instead. SDIO removes and re-probes the chip; userspace reassociates from a fresh state, dodging the AP's assoc comeback rejection cycle.
Why
- 21BD07B3 = c-stack + Patch A) saw 9 api_connection_loss events, with one catastrophic at 02:42:11 → ~86 s of assoc comeback timeouts and AP unprotected-deauth-6 cluster, recovered only via cross-channel fallback.
- notes/phase4-2026-05-07.md and the run dir at ohm:/root/bes2600-samples/run-20260506-2113-patchA/.
Reviewer feedback (from marfrit/besser PR #5) folded in
- notes/phase4-2026-05-07.md only.
Predicted Phase 7 delta vs unpatched baseline
Files touched
- bes2600/bes2600.h: 3 counter fields on struct bes2600_vif, 1 work_struct on struct bes2600_common, 3 prototypes
- bes2600/sta.c: 3 helpers + storm-account hook in bes2600_connection_loss_work + storm-init in bes2600_vif_setup + cancel_work_sync in the hw_priv shutdown path
- bes2600/main.c: INIT_WORK alongside other hw_priv work_structs
- bes2600/debug.c: ConnectionLossStormRecoveries seq_printf
Verification status
- checkpatch.pl --no-tree --strict: clean (0/0/0).
- cleanups (which has the c-stack + Patch A cherry-picked just now). Targeted at cleanups not mobian because bes2600_chrdev_do_bus_reset() is c5.2-only.
Asks
- The recover work_struct lives on bes2600_common rather than bes2600_vif, chosen because bes2600_chrdev_do_bus_reset() triggers SDIO remove, which frees the per-vif state, and we don't want to schedule_work onto a freed work_struct. OK?
- cancel_work_sync placed adjacent to coex_work cancel in the hw_priv shutdown path. OK or should it move?
- The patch increments connection_loss_storm_recoveries BEFORE calling schedule_work, so the counter is visible in debugfs even though the chip is about to be reset (counter survives in vif memory until the bus_reset's remove() fires). OK?
🤖 Generated with Claude Code
When mac80211 declares connection loss against this AP (typically driven by inactivity-deauth or beacon-loss), the userspace reauth that follows sometimes enters a long blackhole: the AP responds to auth with success but defers assoc with the 802.11w "assoc comeback" timer; ohm retries faster than the comeback timer permits; the AP eventually fires an unprotected deauth-reason-6 ("Class 2 frame received from non-authenticated station"), and recovery only completes via cross-SSID or cross-channel fallback.

Receipts: ~86 s blackhole observed in the phase-7 rep on 2026-05-07 02:42, with three subsequent BSSIDs returning assoc comeback timeouts before reason-9 (STA_REQ_ASSOC_WITHOUT_AUTH) fired. Documented in marfrit/besser:notes/phase4-2026-05-07.md.

When N=3 driver-side connection_loss decisions fire within a 60 s window on the same vif, skip the ieee80211_connection_loss() path and trigger the c5.2-introduced bes2600_chrdev_do_bus_reset() instead. The bus reset removes and re-probes the chip; userspace re-associates with a fresh chip state, dodging the AP's comeback-timer rejection cycle.

Predicted Phase 7 delta vs current baseline:
- api_connection_loss rate: unchanged (we don't address the trigger)
- conditional probability of >5 s blackhole given event: <= 30 %
- worst-case recovery: 86 s -> < 10 s

Contract pin: bes2600_chrdev_do_bus_reset(sbus_ops, sbus_priv) at bes2600/bes_chardev.c:455, introduced by c5.2. The function is async-returning: sbus_ops->bus_reset() schedules an SDIO rescan; the helper waits up to 3 s for the remove() callback to clear sbus_priv, then returns. Per-vif state is gone after this point, so the recover work lives on bes2600_common (hw_priv) and uses the global bes2600_cdev for the bus_reset call rather than dereferencing per-vif state.

Threshold (3 / 60 s) is well above the steady-state per-vif connection_loss rate observed in the patch-A phase-7 rep (0.86/h under sustained load), so a true storm is required to trip it.
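The 3-in-60-s accounting described above can be sketched in plain userspace C. This is a minimal model, not the patch itself: the struct layout, the field names window_start_s and count, the helper names, and the choice of a fixed window that restarts once it expires are all assumptions made for illustration. Only the threshold, the window length, and the idea of a recoveries counter come from the description.

```c
#include <stdbool.h>
#include <stdint.h>

#define STORM_THRESHOLD 3   /* N from the commit message */
#define STORM_WINDOW_S  60  /* window length in seconds */

/* Hypothetical stand-in for the three per-vif counter fields. */
struct storm_state {
    uint64_t window_start_s;  /* time of the first event in the window */
    unsigned int count;       /* events seen in the current window */
    unsigned int recoveries;  /* models connection_loss_storm_recoveries */
};

static void storm_init(struct storm_state *s)
{
    s->window_start_s = 0;
    s->count = 0;
    s->recoveries = 0;
}

/*
 * Account one connection-loss decision at time now_s (seconds).
 * Returns true when the caller should take the bus-reset path
 * instead of the regular connection-loss path.
 */
static bool storm_account(struct storm_state *s, uint64_t now_s)
{
    if (s->count == 0 || now_s - s->window_start_s >= STORM_WINDOW_S) {
        /* First event, or the previous window expired: start over. */
        s->window_start_s = now_s;
        s->count = 0;
    }

    s->count++;
    if (s->count < STORM_THRESHOLD)
        return false;

    /* Storm detected: bump the counter, then clear the window. */
    s->recoveries++;
    s->count = 0;
    return true;
}
```

For scale, 3 events in 60 s corresponds to 180 events per hour, roughly 200 times the 0.86/h steady-state rate quoted above, so isolated losses spread across an hour never accumulate in one window in this model.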
Files touched:
- bes2600/bes2600.h: 3 counter fields on struct bes2600_vif, 1 work_struct on struct bes2600_common, 3 prototypes
- bes2600/sta.c: 3 helpers + storm-account hook in bes2600_connection_loss_work + storm-init in bes2600_vif_setup + cancel_work_sync in the hw_priv shutdown path; #include bes_chardev.h was already pulled in by an earlier c-stack patch
- bes2600/main.c: INIT_WORK alongside other hw_priv work_structs
- bes2600/debug.c: ConnectionLossStormRecoveries seq_printf in the per-vif status seq_file output

The cw1200/cw1260 ancestor has no equivalent; this is a clean addition.

checkpatch.pl --no-tree --strict: clean (0/0/0).

Signed-off-by: Claude (noether) <claude@reauktion.de>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cdfdac987a to ae556d49da
ae556d49da to e78beea2cf
e78beea2cf to f2cf586f89

The recover work_struct lives on bes2600_common rather than bes2600_vif, chosen because bes2600_chrdev_do_bus_reset() triggers SDIO remove, which frees the per-vif state, and we don't want to schedule_work onto a freed work_struct. OK!
cancel_work_sync placed adjacent to coex_work cancel in the hw_priv shutdown path. OK!
The patch increments connection_loss_storm_recoveries BEFORE calling schedule_work, so the counter is visible in debugfs even though the chip is about to be reset (counter survives in vif memory until the bus_reset's remove() fires). OK!
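The ownership and ordering points in the three asks can be illustrated with a small userspace model. Everything here is hypothetical: the model_* types and functions are invented for the sketch, a boolean stands in for the queued work_struct, and free() stands in for the SDIO remove() path. It only shows why the recover work is kept on the longer-lived object and why the counter is bumped before the work is queued. It does not capture the real driver's teardown or locking.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

struct model_common;

/* Hypothetical per-vif state; freed when the modelled remove() runs. */
struct model_vif {
    struct model_common *common;
    unsigned int connection_loss_storm_recoveries;
};

/* Hypothetical hw_priv-level state; owns the recover work. */
struct model_common {
    struct model_vif *vif;    /* NULL after the modelled remove() */
    bool recover_pending;     /* stands in for a queued work_struct */
    unsigned int bus_resets;  /* how many modelled resets have run */
};

/* Storm path: bump the per-vif counter first, then queue on common. */
static void model_storm_detected(struct model_vif *vif)
{
    vif->connection_loss_storm_recoveries++;
    vif->common->recover_pending = true;
}

/* Modelled remove(): per-vif state goes away here. */
static void model_remove(struct model_common *common)
{
    free(common->vif);
    common->vif = NULL;
}

/* Recover work: touches only common, never the vif that queued it. */
static void model_recover_work(struct model_common *common)
{
    if (!common->recover_pending)
        return;
    common->recover_pending = false;
    common->bus_resets++;
    model_remove(common);
}
```

Between model_storm_detected() and model_recover_work() the counter is still readable through the vif, which mirrors the window in which the debugfs value is expected to be visible.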