Compare commits

..

4 Commits

Author SHA1 Message Date
test0r f12e870025 bes2600: self-detect when firmware does not honor PSM and skip the cycle
The c6 series fixed several host-side bookkeeping bugs around PSM
transitions, but didn't address the underlying contract: this chip's
firmware (BES2600 with the Bestechnic Dec 2023 build that ships on
PineTab2 and most danctnix images) silently drops every WSM_set_pm
request without emitting the corresponding PM_INDICATION. The driver's
own power_down_work delayed work calls bes2600_pwr_enter_lp_mode every
~10s; without firmware acknowledgment each call burns 5s on
wait_for_completion_timeout(pm_enter_cmpl, 5*HZ) and produces a
recurring three-line cascade in dmesg:

  bes2600_pwr_enter_lp_mode, wait pm ind timeout
  bes2600_sdio_active failed, subsys:0
  bes2600_pwr_device_exit_lp_mode, active mcu fail

Confirmed by tripwire instrumentation on PineTab2 (linux-pinetab2
6.19.10-danctnix1, ohm) running the c5+c6 stack: zero
wsm_set_pm_indication() invocations across an entire boot, while
bes2600_pwr_enter_lp_mode timed out repeatedly, and
bes2600_sdio_active() consistently saw BES_SLAVE_STATUS_REG_ID return
0x2f (every "ready" bit set except MCU_WAKEUP_READY (bit 4) - the
firmware reports "I'm awake, there's nothing to wake from").

This patch makes the driver self-heal:

  * struct bes2600_pwr_t gains pm_unsupported (bool) and
    pm_consecutive_timeouts (unsigned int). Both initialised to
    0/false.

  * bes2600_pwr_enter_lp_mode early-returns -EOPNOTSUPP when
    pm_unsupported is set. Skips the per-VIF set_pm round-trip and
    the wait_for_completion entirely.

  * On the cmpxchg-success branch of the timeout path, we increment
    pm_consecutive_timeouts. When it crosses
    BES2600_PM_UNSUPPORTED_THRESHOLD (3, ~15s of trying), we latch
    pm_unsupported = true and force chip_pm_state = ACTIVE so that
    bes2600_pwr_device_exit_lp_mode's c6.2 skip branch covers the
    wake side (no gpio_wake / sbus_active / WSM_set_operational_mode
    reissue past the first one).

  * bes2600_pwr_notify_ps_changed resets pm_consecutive_timeouts to 0
    on any incoming PM indication, and clears pm_unsupported if it
    was previously latched. So a firmware update that fixes PM_IND
    delivery automatically re-enables PSM transitions without a
    driver rebuild.

mac80211's PSM requests via bes2600_set_pm() still flow to the
firmware unchanged; they just don't have host-side timeouts so they
remain silent regardless of firmware acknowledgment. Power
consumption goes up if the firmware actually CAN do PSM (we'd be
keeping the chip awake unnecessarily), but on a chip where the
counter trips this trade-off is forced anyway: the chip stayed awake
under the broken cascade as well, just with constant SDIO churn.

Net effect on dmesg: after ~15s of boot, the three-line cascade stops
firing entirely. The firmware-side wedge is observed once per boot
(captured by the pm_unsupported latch) instead of per-cycle.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-04-28 17:56:08 +02:00
test0r 822a5f1bab bes2600: short-circuit wake handshake when chip is confirmed ACTIVE
The previous patch ("bes2600: gate PM indication completion on pending
request and track chip state") added enum bes2600_chip_pm_state and the
chip_pm_state field tracking what the host has *seen the firmware
confirm*. This patch makes the wake side use it.

Without this, every bes2600_pwr_device_exit_lp_mode() unconditionally
runs gpio_wake() + sbus_active() + wsm_set_operational_mode(active),
even when the chip is already in confirmed-ACTIVE state and the wake
sequence has nothing to do. The visible failure mode on PineTab2:

  bes2600_pwr_enter_lp_mode, wait pm ind timeout
  repeat set gpio_wake_flag, sub_sys:0
  bes2600_sdio_active failed, subsys:0
  bes2600_pwr_device_exit_lp_mode, active mcu fail

cycling every ~9 s, ~22 cycles in 10 minutes. Three pieces:

  1. enter_lp_mode timed out (firmware indication lost). With c6.1,
     chip_pm_state is now UNKNOWN.
  2. lock_device fires exit_lp_mode.
  3. gpio_wake hits "bit already set" because device_enter_lp_mode
     was skipped when the indication timed out, so gpio_sleep was
     never called - the bit reflects driver intent, not chip state.
     gpio_wake silently no-ops (no GPIO edge), bit stays set.
  4. sbus_active spends 200 x 2 ms looking for MCU_WAKEUP_READY that
     never comes (firmware was never told to wake), then fails.
  5. Driver continues to wsm_set_operational_mode against the wedged
     bus, compounding the failure.

This patch's three moves:

  * bes2600_pwr_device_exit_lp_mode() reads chip_pm_state at entry.
    On BES2600_CHIP_PM_ACTIVE, log at devel level and return without
    touching gpio_wake / sbus_active / WSM. The chip is in the state
    we want; the handshake exists only to drive a transition.

  * On BES2600_CHIP_PM_LP or BES2600_CHIP_PM_UNKNOWN, run the wake
    handshake as before, but on sbus_active() failure: set
    chip_pm_state = UNKNOWN, log once at err level, and bail out.
    Do NOT call wsm_set_operational_mode over a wedged bus - it
    would just emit a second error and leave the chip in an even
    less defined state.

  * bes2600_gpio_wakeup_mcu() / bes2600_gpio_allow_mcu_sleep():
    demote "repeat set/clear gpio_wake_flag" from bes_err to
    bes_devel. Multi-subsystem wake-hold (e.g. WIFI + BT both want
    MCU awake) is the steady-state case, and the symmetric clear
    while bit-already-clear is racy bookkeeping rather than a
    hardware error. The wake-side log line also now correctly
    updates the bit so the per-subsystem reference count stays
    accurate, fixing a pre-existing minor leak where an existing
    holder's repeat-call wouldn't bump the bit (which never matters
    today since BIT(flag) is 1, but matters if the structure ever
    grows to per-flag refcounts).

Net effect on the cycle:

  * If chip is genuinely ACTIVE (chip_pm_state == ACTIVE), wake skips
    cleanly. Storm goes silent.
  * If chip is genuinely LP, behaviour is unchanged.
  * If chip is UNKNOWN (post-timeout state), one wake attempt is
    made; on failure, state stays UNKNOWN and we don't emit a
    second cascade error per attempt. Repeated UNKNOWN with failed
    wake will eventually be picked up by the LMAC active-monitor
    and escalated to mmc_hw_reset (c5.2).

No new locks, no new state. Only consumption of the chip_pm_state
field added in the prerequisite patch.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-04-28 16:11:08 +02:00
test0r c57c77e446 bes2600: gate PM indication completion on pending request and track chip state
When mac80211 toggles PSM on the BES2600, the host sends WSM set_pm
and waits up to 5 s on bes_power.pm_enter_cmpl for a firmware-side
PM-changed indication confirming the transition. Three sequenced
flaws make the wait-and-confirm racy and leave host/chip bookkeeping
desynced when anything misfires:

  1) bes2600_pwr_notify_ps_changed() unconditionally fires
     complete(pm_enter_cmpl) for any non-active psmode. It does not
     check whether a host-initiated set_pm is actually pending. A
     spontaneous indication (firmware-internal coex move,
     idle-driven aging) primes the completion, and the next host-
     driven enter_lp_mode sees a false success on its first
     wait_for_completion_timeout.

  2) The wait/reinit ordering in bes2600_pwr_enter_lp_mode is

         status = wait_for_completion_timeout(...);
         atomic_set(pm_set_in_process, 0);
         reinit_completion(...);

     If an indication arrives between wait_for_completion_timeout
     returning with status==1 and reinit_completion, the next
     enter_lp_mode iteration's wait can also see false success. The
     reinit must happen *before* we start the new request, not
     after handling the previous one.

  3) On wait_pm_ind timeout, the driver returns -ETIMEDOUT and walks
     away. It does not record that the firmware's actual PM state
     is no longer known to the host. Subsequent wake paths
     (gpio_wake / sbus_active) assume the chip is still active and
     hit deterministic SDIO failures when the firmware has
     transitioned anyway.

This patch is the safe-prerequisite half of a wider fix:

  * bes_pwr.h gains enum bes2600_chip_pm_state {ACTIVE, LP, UNKNOWN}
    and bes_power.chip_pm_state. Its job is to track what the host
    has *seen the firmware confirm*, not what the host has
    requested. Initialised to ACTIVE in bes2600_pwr_init().

  * bes2600_pwr_notify_ps_changed() unconditionally updates
    chip_pm_state on every indication, but only fires
    complete(pm_enter_cmpl) when atomic_cmpxchg(pm_set_in_process,
    1, 0) succeeds. A spontaneous indication can no longer prime a
    waiter that will only set up its request afterwards.

  * bes2600_pwr_enter_lp_mode() now reinit_completion()s before
    setting pm_set_in_process and sending wsm_set_pm. After a
    timeout, it cmpxchgs pm_set_in_process back to 0 (so a late
    indication cannot prime the next iteration) and on the win-
    cmpxchg branch records chip_pm_state=UNKNOWN.

A follow-up patch consumes chip_pm_state on the wake side
(bes2600_pwr_device_exit_lp_mode + bes2600_gpio_wakeup_mcu) to fix
the deterministic "active mcu fail" cycle this state-record
enables a fix for. Splitting the work this way keeps the lock-free
race fix small and reviewable on its own.

No new locks, no behaviour change on the success path. Only the
recovery path (timeout + spontaneous indication) gains correctness.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-04-28 16:11:08 +02:00
test0r db4ea70fb5 bes2600: widen scan-defer backoff to 30s and decay count on quiet
The scan-defer logic added in the previous patch ("bes2600: defer
scan and soften WARN on firmware reject") used a 10-second backoff
window and never cleared reject_count outside of a successful scan.
Field testing on a PineTab2 (linux-pinetab2 6.19.10-danctnix1) shows
two distinct mac80211 scan-retry cadences in practice:

  * Idle background scans every ~5 minutes when associated -- well
    outside any plausible backoff, the defer guard correctly falls
    through to a real WSM scan attempt.

  * Roam-evaluation bursts triggered when mac80211 wants to find a
    candidate AP for handover (signal degradation, beacon loss,
    locally-generated DEAUTH_LEAVING reason=3). Cadence is ~12 s, and
    one boot reproduced 14 such rejected scans in 3 minutes during a
    single burst, none of which engaged the defer guard because every
    retry landed just outside the 10 s window.

Two-line behaviour change to fix that:

  1. BES2600_SCAN_BACKOFF_JIFFIES grows from 10*HZ to 30*HZ, so a
     12 s-cadence burst stays inside the window across consecutive
     rejects and the third reject in the burst trips the threshold
     guard. The 5 min idle case is still naturally past the window
     and is unaffected.

  2. bes2600_scan_should_defer() resets reject_count to 0 when
     time_after(jiffies, backoff_until). Without this, reject_count
     accumulated indefinitely across the slow-cadence rejects, so an
     isolated reject after long quiet would have tripped the
     threshold the moment it arrived. After the change, count is
     latched only inside an active burst and decays cleanly when the
     burst ends.

Net effect on a roam burst:

  * t=0   reject #1 (count 1, backoff_until = t0 + 30s)
  * t=12  reject #2 (count 2, backoff_until = t1 + 30s)
  * t=24  reject #3 (count 3, threshold met, next scan deferred)
  * t=36  defer fires, no WSM round-trip, reject not sent
  * ...   defers continue until the firmware-policy state clears
  * scan succeeds -> reject_count = 0, normal cadence resumes

WSM 0x0007 confirm rejections in a burst drop from ~14 to ~3 (just
the scans needed to reach the threshold). wpa_supplicant's reason=3
locally-generated disconnects driven by exhausted roam candidates
during the same burst window also drop.

No new state, no new symbols, no change to mac80211-facing semantics:
the deferred scan still completes via the existing fail: path with
status=-EBUSY, the same response a real firmware-busy would produce.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-04-28 14:33:00 +02:00
9 changed files with 4 additions and 313 deletions
-21
View File
@@ -511,9 +511,6 @@ struct bes2600_common {
struct list_head coex_event_list;
spinlock_t coex_event_lock;
/* Connection-loss-storm fast-recover (Trigger A). See sta.c. */
struct work_struct connection_loss_storm_recover_work;
/* member for low power */
struct bes2600_pwr_t bes_power;
@@ -599,11 +596,6 @@ struct bes2600_vif {
unsigned long rx_timestamp;
u32 cipherType;
/* Decrypt-storm fast-recover (Trigger B). See txrx.c. */
unsigned long decrypt_storm_window_start;
unsigned int decrypt_storm_count;
unsigned int decrypt_storm_recoveries;
struct work_struct decrypt_storm_recover_work;
/* AP powersave */
u32 link_id_map;
@@ -630,10 +622,6 @@ struct bes2600_vif {
/* CQM Implementation */
struct delayed_work bss_loss_work;
struct delayed_work connection_loss_work;
/* Connection-loss-storm fast-recover (Trigger A). See sta.c. */
unsigned long connection_loss_storm_window_start;
unsigned int connection_loss_storm_count;
unsigned int connection_loss_storm_recoveries;
struct work_struct tx_failure_work;
int delayed_link_loss;
spinlock_t bss_loss_lock;
@@ -868,13 +856,4 @@ int bes2600_btusb_setup_pipes(struct sbus_priv *sbus_priv);
void bes2600_btusb_uninit(struct usb_interface *interface);
#endif
/* Decrypt-storm fast-recover helpers — see txrx.c. */
void bes2600_decrypt_storm_init(struct bes2600_vif *priv);
void bes2600_decrypt_storm_account(struct bes2600_vif *priv);
/* Connection-loss-storm fast-recover helpers — see sta.c. */
void bes2600_connection_loss_storm_init(struct bes2600_vif *priv);
bool bes2600_connection_loss_storm_account(struct bes2600_vif *priv);
void bes2600_connection_loss_storm_recover(struct work_struct *work);
#endif /* BES2600_H */
-51
View File
@@ -16,7 +16,6 @@
#include <linux/mmc/host.h>
#include <linux/mmc/sdio_func.h>
#include <linux/mmc/card.h>
#include <linux/mmc/core.h>
#include <linux/mmc/sdio.h>
#include <linux/spinlock.h>
#include <net/mac80211.h>
@@ -1789,55 +1788,6 @@ static void bes2600_sdio_halt_device(struct sbus_priv *self)
sdio_work_debug(self);
}
/*
* Trigger an SDIO bus reset via mmc_hw_reset().
*
* With multiple SDIO functions probed (PineTab2 binds func 1 for WLAN and
* func 2 for the BT-companion path) mmc_sdio_hw_reset() takes the
* remove-and-rescan path: it marks the card removed and schedules
* mmc_rescan, which tears down the bound function drivers and re-detects
* the card on the next sweep, in turn reinvoking bes2600_sdio_probe().
*
* With a single function probed it instead invokes mmc_power_cycle()
* directly, which on PineTab2 toggles the wifi-reset GPIO via sdio_pwrseq.
*
* In both cases the chip ends up in a freshly reset state, which is the
* goal of the recovery path.
*
* mmc_hw_reset() must be called without holding the SDIO host claim --
* the multi-func remove-and-rescan path acquires the host claim via the
* mmc workqueue.
*/
static int bes2600_sdio_bus_reset(struct sbus_priv *self)
{
struct mmc_host *host;
int ret;
if (!self || !self->func || !self->func->card)
return -EINVAL;
host = self->func->card->host;
ret = mmc_hw_reset(self->func->card);
/*
* On multi-function SDIO cards (BES2600 has WLAN func 1 + BT
* companion func 2), mmc_sdio_hw_reset() removes the card and
* returns 1 to signal "remove happened, caller must trigger
* rescan". The kernel does NOT auto-rescan in this case;
* single-function cards take the rescan path inline and return 0.
* Treat any non-negative return as success and force a rescan if
* mmc_hw_reset signalled the multi-function path - otherwise the
* card stays removed indefinitely after a wedge recovery,
* leaving wifi (and the BT companion) silent until reboot.
*/
if (ret > 0) {
bes_info("multi-func mmc_hw_reset removed card; scheduling rescan\n");
mmc_detect_change(host, 0);
ret = 0;
}
return ret;
}
static bool bes2600_sdio_wakeup_source(struct sbus_priv *self)
{
struct bes2600_platform_data_sdio *pdata = bes2600_get_platform_data();
@@ -1876,7 +1826,6 @@ static struct sbus_ops bes2600_sdio_sbus_ops = {
.gpio_sleep = bes2600_gpio_allow_mcu_sleep,
.halt_device = bes2600_sdio_halt_device,
.wakeup_source = bes2600_sdio_wakeup_source,
.bus_reset = bes2600_sdio_bus_reset,
};
static void bes2600_sdio_en_lp_cb(struct bes2600_common *hw_priv)
+2 -69
View File
@@ -442,60 +442,6 @@ int bes2600_chrdev_do_system_close(const struct sbus_ops *sbus_ops, struct sbus_
return ret;
}
/*
* Hard-reset the bus and wait for the bus core to remove the chip.
*
* Used by the firmware-wedge recovery path on platforms where the normal
* power_switch(0) sequence has no effective chip-reset signal. The bus
* implementation triggers an asynchronous re-detect; this helper waits for
* the resulting remove() callback to clear bes2600_cdev.sbus_priv so that a
* subsequent bes2600_switch_wifi(true) sees a clean state and can wait on
* the fresh probe.
*/
int bes2600_chrdev_do_bus_reset(const struct sbus_ops *sbus_ops, struct sbus_priv *priv)
{
int ret;
long status;
if (!sbus_ops || !priv)
return -EINVAL;
if (!sbus_ops->bus_reset)
return -EOPNOTSUPP;
bes_info("trigger bus reset to recover wedged firmware.\n");
ret = sbus_ops->bus_reset(priv);
if (ret) {
bes_err("bus_reset failed: %d\n", ret);
return ret;
}
/*
* The bus reset is asynchronous: the bus core schedules a rescan
* which removes the bound function drivers and then re-detects the
* chip. Wait for the remove callback to clear sbus_priv. Do not
* dereference 'priv' after this point -- it may already be freed.
*/
status = wait_event_timeout(bes2600_cdev.probe_done_wq,
!bes2600_cdev.sbus_priv, HZ * 3);
WARN_ON(status <= 0);
return 0;
}
/*
* Trigger bes2600_chrdev_do_bus_reset() against the file-global
* bes2600_cdev. Used by host-side recovery paths outside this
* compilation unit (e.g. sta.c connection-loss-storm fast-recover) so
* those callers do not need to reach the static bes2600_cdev directly.
*/
int bes2600_chrdev_trigger_bus_reset(void)
{
return bes2600_chrdev_do_bus_reset(bes2600_cdev.sbus_ops,
bes2600_cdev.sbus_priv);
}
bool bes2600_chrdev_is_wifi_opened(void)
{
bool wifi_opened = false;
@@ -594,21 +540,8 @@ static void bes2600_chrdev_wifi_force_close_work(struct work_struct *work)
/* unregister wifi */
bes2600_switch_wifi(0);
/*
* Hard exception with a bus_reset implementation: tear the
* bus down via mmc_hw_reset() (or equivalent) so the next
* bringup probes a freshly reset chip. On PineTab2 this is
* the only effective recovery path -- the existing
* power_switch(0)/(1) sequence has no chip-reset signal of
* its own (sdio_pwrseq owns wifi_reset).
*
* Soft close, or hard close on a board without bus_reset:
* fall back to the legacy power_switch(0) sequence.
*/
if (bes2600_cdev.halt_dev && bes2600_cdev.sbus_ops->bus_reset) {
bes2600_chrdev_do_bus_reset(bes2600_cdev.sbus_ops,
bes2600_cdev.sbus_priv);
} else if (bes2600_chrdev_check_system_close()) {
/* power down device if wifi is only opened */
if (bes2600_chrdev_check_system_close()) {
bes2600_chrdev_do_system_close(bes2600_cdev.sbus_ops,
bes2600_cdev.sbus_priv);
}
-2
View File
@@ -60,8 +60,6 @@ struct sbus_priv *bes2600_chrdev_get_sbus_priv_data(void);
/* used to control device power down */
int bes2600_chrdev_check_system_close(void);
int bes2600_chrdev_do_system_close(const struct sbus_ops *sbus_ops, struct sbus_priv *priv);
int bes2600_chrdev_do_bus_reset(const struct sbus_ops *sbus_ops, struct sbus_priv *priv);
int bes2600_chrdev_trigger_bus_reset(void);
void bes2600_chrdev_wakeup_bt(void);
void bes2600_chrdev_wifi_force_close(struct bes2600_common *hw_priv, bool halt_dev);
void bes2600_chrdev_usb_remove(struct bes2600_common *hw_priv);
-4
View File
@@ -542,10 +542,6 @@ static int bes2600_status_show_priv(struct seq_file *seq, void *v)
priv->listening ? " (listening)" : "");
seq_printf(seq, "Assoc: %s\n",
bes2600_debug_join_status[priv->join_status]);
seq_printf(seq, "DecryptStormRecoveries: %u\n",
priv->decrypt_storm_recoveries);
seq_printf(seq, "ConnectionLossStormRecoveries: %u\n",
priv->connection_loss_storm_recoveries);
if (priv->rx_filter.promiscuous)
seq_puts(seq, "Filter: promisc\n");
else if (priv->rx_filter.fcs)
-2
View File
@@ -484,8 +484,6 @@ static struct ieee80211_hw *bes2600_init_common(size_t hw_priv_data_len)
spin_lock_init(&hw_priv->rtsvalue_lock);
INIT_WORK(&hw_priv->dynamic_opt_txrx_work, bes2600_dynamic_opt_txrx_work);
INIT_WORK(&hw_priv->tx_policy_upload_work, tx_policy_upload_work);
INIT_WORK(&hw_priv->connection_loss_storm_recover_work,
bes2600_connection_loss_storm_recover);
spin_lock_init(&hw_priv->event_queue_lock);
INIT_LIST_HEAD(&hw_priv->event_queue);
INIT_WORK(&hw_priv->event_handler, bes2600_event_handler);
-8
View File
@@ -75,14 +75,6 @@ struct sbus_ops {
void (*halt_device)(struct sbus_priv *self);
bool (*wakeup_source)(struct sbus_priv *self);
int (*reboot)(struct sbus_priv *self);
/*
* Force the host bus to re-detect and re-probe the chip. Called
* from the firmware-wedge recovery path when power_switch() has no
* effective chip-reset signal of its own (e.g. PineTab2, where the
* wifi-reset GPIO is owned by sdio_pwrseq, not the bes2600 node).
* Returns 0 on success or a negative errno.
*/
int (*bus_reset)(struct sbus_priv *self);
};
void bes2600_irq_handler(struct bes2600_common *priv);
+2 -82
View File
@@ -266,7 +266,6 @@ void bes2600_stop(struct ieee80211_hw *dev, bool suspend)
cancel_work_sync(&hw_priv->coex_work);
coex_stop(hw_priv);
#endif
cancel_work_sync(&hw_priv->connection_loss_storm_recover_work);
bes2600_wifi_stop(hw_priv);
@@ -449,7 +448,6 @@ void bes2600_remove_interface(struct ieee80211_hw *dev,
cancel_delayed_work_sync(&priv->join_timeout);
cancel_delayed_work_sync(&priv->set_cts_work);
cancel_delayed_work_sync(&priv->pending_offchanneltx_work);
cancel_work_sync(&priv->decrypt_storm_recover_work);
del_timer_sync(&priv->mcast_timeout);
/* TODO:COMBO: May be reset of these variables "delayed_link_loss and
@@ -1660,70 +1658,6 @@ report:
spin_unlock(&priv->bss_loss_lock);
}
/*
* Connection-loss-storm fast-recover (Trigger A).
*
* bes2600_connection_loss_work below is the driver's own decision-point
* to give up on a BSS (after bss-loss detection accumulates beyond
* tolerance) and tell mac80211 via ieee80211_connection_loss(). On the
* deployed pinetab2 stack a single ieee80211_connection_loss() event
* sometimes triggers a userspace reauth blackhole (assoc-comeback
* timeouts followed by AP unprotected-deauth-reason-6) that ends only
* via cross-channel/cross-SSID fallback and can take 80+ s. Receipts at
* https://git.reauktion.de/marfrit/besser, notes/phase4-2026-05-07.md.
*
* When N connection-loss decisions land within WINDOW on the same vif,
* skip the ieee80211_connection_loss() path and trigger a chip-level
* bus_reset (the c5.2-introduced bes2600_chrdev_do_bus_reset). The chip
* is removed and re-probed; userspace re-associates from a fresh state,
* dodging the assoc-comeback loop.
*
* Threshold (3 / 60 s) is chosen well above the steady-state per-vif
* connection-loss rate observed in the patch-A Phase-7 rep
* (0.86/h under sustained load), so a true storm is required.
*
* The recover work_struct lives on bes2600_common (hw_priv) so that
* scheduling it does not race with vif teardown after bus_reset frees
* the per-vif state.
*/
#define BES2600_CONNECTION_LOSS_STORM_THRESHOLD 3
#define BES2600_CONNECTION_LOSS_STORM_WINDOW_MS 60000
void bes2600_connection_loss_storm_recover(struct work_struct *work)
{
bes_warn("[bes2600] connection-loss-storm fast-recover: bus_reset\n");
bes2600_chrdev_trigger_bus_reset();
/*
* After bes2600_chrdev_do_bus_reset() returns, the SDIO core has
* scheduled a remove + rescan; per-vif state may already be gone.
* Do not dereference any per-vif pointer here.
*/
}
void bes2600_connection_loss_storm_init(struct bes2600_vif *priv)
{
priv->connection_loss_storm_window_start = 0;
priv->connection_loss_storm_count = 0;
priv->connection_loss_storm_recoveries = 0;
}
bool bes2600_connection_loss_storm_account(struct bes2600_vif *priv)
{
unsigned long now = jiffies;
unsigned long window =
msecs_to_jiffies(BES2600_CONNECTION_LOSS_STORM_WINDOW_MS);
if (priv->connection_loss_storm_window_start == 0 ||
time_after(now, priv->connection_loss_storm_window_start + window)) {
priv->connection_loss_storm_window_start = now;
priv->connection_loss_storm_count = 1;
return false;
}
return ++priv->connection_loss_storm_count >=
BES2600_CONNECTION_LOSS_STORM_THRESHOLD;
}
void bes2600_connection_loss_work(struct work_struct *work)
{
struct bes2600_vif *priv =
@@ -1733,21 +1667,9 @@ void bes2600_connection_loss_work(struct work_struct *work)
bes_devel("[CQM] Reporting connection loss.\n");
bes2600_pwr_clear_busy_event(priv->hw_priv, BES_PWR_LOCK_ON_BSS_LOST);
if (bes2600_connection_loss_storm_account(priv)) {
bes_warn("[bes2600] connection-loss storm: %u in %u s, scheduling bus reset\n",
priv->connection_loss_storm_count,
BES2600_CONNECTION_LOSS_STORM_WINDOW_MS / 1000);
priv->connection_loss_storm_count = 0;
priv->connection_loss_storm_recoveries++;
schedule_work(&hw_priv->connection_loss_storm_recover_work);
/* bus_reset will tear the chip down; skip the mac80211 path. */
return;
}
if (bes2600_suspend_status_get(hw_priv))
if(bes2600_suspend_status_get(hw_priv)) {
bes2600_pending_unjoin_set(hw_priv, priv->if_id);
else
} else
ieee80211_connection_loss(priv->vif);
#ifdef WIFI_BT_COEXIST_EPTA_ENABLE
// set disconnected in BSS_CHANGED_ASSOC
@@ -2697,8 +2619,6 @@ int bes2600_vif_setup(struct bes2600_vif *priv)
/* Setup per vif workitems and locks */
spin_lock_init(&priv->vif_lock);
bes2600_decrypt_storm_init(priv);
bes2600_connection_loss_storm_init(priv);
INIT_WORK(&priv->join_work, bes2600_join_work);
INIT_DELAYED_WORK(&priv->join_timeout, bes2600_join_timeout);
INIT_WORK(&priv->unjoin_work, bes2600_unjoin_work);
-74
View File
@@ -25,78 +25,6 @@
#define BES2600_INVALID_RATE_ID (0xFF)
/*
* Decrypt-storm fast-recover (Trigger B).
*
* When the BES2600 firmware reports WSM_STATUS_DECRYPTFAILURE for a
* burst of received frames (typically because the host's PTK or GTK
* has fallen out of sync with the AP), the AP eventually concludes that
* the STA is not authenticated and emits an unprotected deauth-reason-6
* ("Class 2 frame received from non-authenticated station"). On the
* deployed pinetab2 + bes2600 stack this AP-initiated deauth has been
* observed to leave the link blackholed for up to 109 s before
* userspace finds a different SSID/channel to recover on. (Receipts at
* https://git.reauktion.de/marfrit/besser, notes/phase5-2026-05-06.md.)
*
* Recovery here pre-empts the AP: when we see THRESHOLD decrypt
* failures within WINDOW, we ask mac80211 for a clean reassoc via
* ieee80211_connection_loss(), which causes immediate disassociation
* and lets userspace auto-reconnect with fresh keys.
*
* mac80211 contract: ieee80211_connection_loss() may be called
* regardless of IEEE80211_HW_CONNECTION_MONITOR; it causes immediate
* disassociation without driver-side recovery attempts. See
* include/net/mac80211.h for the canonical doc-comment.
*
* The threshold is set well above the steady-state per-vif
* decrypt-fail rate observed in measurement (~1/min even under
* sustained 1 MB/s load), so a true storm is required to trip it.
*/
#define BES2600_DECRYPT_STORM_THRESHOLD 5
#define BES2600_DECRYPT_STORM_WINDOW_MS 5000
static void bes2600_decrypt_storm_recover_work(struct work_struct *work)
{
struct bes2600_vif *priv = container_of(work, struct bes2600_vif,
decrypt_storm_recover_work);
if (!priv->vif)
return;
bes_warn("[bes2600] decrypt-storm fast-recover: forcing reassoc\n");
ieee80211_connection_loss(priv->vif);
priv->decrypt_storm_recoveries++;
}
void bes2600_decrypt_storm_init(struct bes2600_vif *priv)
{
INIT_WORK(&priv->decrypt_storm_recover_work,
bes2600_decrypt_storm_recover_work);
priv->decrypt_storm_window_start = 0;
priv->decrypt_storm_count = 0;
priv->decrypt_storm_recoveries = 0;
}
void bes2600_decrypt_storm_account(struct bes2600_vif *priv)
{
unsigned long now = jiffies;
unsigned long window = msecs_to_jiffies(BES2600_DECRYPT_STORM_WINDOW_MS);
if (priv->decrypt_storm_window_start == 0 ||
time_after(now, priv->decrypt_storm_window_start + window)) {
priv->decrypt_storm_window_start = now;
priv->decrypt_storm_count = 1;
return;
}
if (++priv->decrypt_storm_count >= BES2600_DECRYPT_STORM_THRESHOLD) {
priv->decrypt_storm_count = 0;
/* Skew the window so we don't re-fire on the same storm. */
priv->decrypt_storm_window_start = now + window;
schedule_work(&priv->decrypt_storm_recover_work);
}
}
#ifdef CONFIG_BES2600_TESTMODE
#include "bes_nl80211_testmode_msg.h"
#endif /* CONFIG_BES2600_TESTMODE */
@@ -1744,8 +1672,6 @@ void bes2600_rx_cb(struct bes2600_vif *priv,
goto drop;
} else {
bes_warn("[RX] Receive failure: %d.\n", arg->status);
if (arg->status == WSM_STATUS_DECRYPTFAILURE)
bes2600_decrypt_storm_account(priv);
goto drop;
}
}