Compare commits

..

2 Commits

Author SHA1 Message Date
test0r c57c77e446 bes2600: gate PM indication completion on pending request and track chip state
When mac80211 toggles PSM on the BES2600, the host sends WSM set_pm
and waits up to 5 s on bes_power.pm_enter_cmpl for a firmware-side
PM-changed indication confirming the transition. Three sequenced
flaws make the wait-and-confirm racy and leave host/chip bookkeeping
desynced when anything misfires:

  1) bes2600_pwr_notify_ps_changed() unconditionally fires
     complete(pm_enter_cmpl) for any non-active psmode. It does not
     check whether a host-initiated set_pm is actually pending. A
     spontaneous indication (firmware-internal coex move,
     idle-driven aging) primes the completion, and the next host-
     driven enter_lp_mode sees a false success on its first
     wait_for_completion_timeout.

  2) The wait/reinit ordering in bes2600_pwr_enter_lp_mode is

         status = wait_for_completion_timeout(...);
         atomic_set(pm_set_in_process, 0);
         reinit_completion(...);

     If an indication arrives between wait_for_completion_timeout
     returning with status==1 and reinit_completion, the next
     enter_lp_mode iteration's wait can also see false success. The
     reinit must happen *before* we start the new request, not
     after handling the previous one.

  3) On wait_pm_ind timeout, the driver returns -ETIMEDOUT and walks
     away. It does not record that the firmware's actual PM state
     is no longer known to the host. Subsequent wake paths
     (gpio_wake / sbus_active) assume the chip is still active and
     hit deterministic SDIO failures when the firmware has
     transitioned anyway.

This patch is the safe-prerequisite half of a wider fix:

  * bes_pwr.h gains enum bes2600_chip_pm_state {ACTIVE, LP, UNKNOWN}
    and bes_power.chip_pm_state. Its job is to track what the host
    has *seen the firmware confirm*, not what the host has
    requested. Initialised to ACTIVE in bes2600_pwr_init().

  * bes2600_pwr_notify_ps_changed() unconditionally updates
    chip_pm_state on every indication, but only fires
    complete(pm_enter_cmpl) when atomic_cmpxchg(pm_set_in_process,
    1, 0) succeeds. A spontaneous indication can no longer prime a
    waiter that will only set up its request afterwards.

  * bes2600_pwr_enter_lp_mode() now reinit_completion()s before
    setting pm_set_in_process and sending wsm_set_pm. After a
    timeout, it cmpxchgs pm_set_in_process back to 0 (so a late
    indication cannot prime the next iteration) and on the win-
    cmpxchg branch records chip_pm_state=UNKNOWN.

A follow-up patch consumes chip_pm_state on the wake side
(bes2600_pwr_device_exit_lp_mode + bes2600_gpio_wakeup_mcu) to fix
the deterministic "active mcu fail" cycle this state-record
enables a fix for. Splitting the work this way keeps the lock-free
race fix small and reviewable on its own.

No new locks, no behaviour change on the success path. Only the
recovery path (timeout + spontaneous indication) gains correctness.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-04-28 16:11:08 +02:00
test0r db4ea70fb5 bes2600: widen scan-defer backoff to 30s and decay count on quiet
The scan-defer logic added in the previous patch ("bes2600: defer
scan and soften WARN on firmware reject") used a 10-second backoff
window and never cleared reject_count outside of a successful scan.
Field testing on a PineTab2 (linux-pinetab2 6.19.10-danctnix1) shows
two distinct mac80211 scan-retry cadences in practice:

  * Idle background scans every ~5 minutes when associated -- well
    outside any plausible backoff, the defer guard correctly falls
    through to a real WSM scan attempt.

  * Roam-evaluation bursts triggered when mac80211 wants to find a
    candidate AP for handover (signal degradation, beacon loss,
    locally-generated DEAUTH_LEAVING reason=3). Cadence is ~12 s, and
    one boot reproduced 14 such rejected scans in 3 minutes during a
    single burst, none of which engaged the defer guard because every
    retry landed just outside the 10 s window.

Two-line behaviour change to fix that:

  1. BES2600_SCAN_BACKOFF_JIFFIES grows from 10*HZ to 30*HZ, so a
     12 s-cadence burst stays inside the window across consecutive
     rejects and the third reject in the burst trips the threshold
     guard. The 5 min idle case is still naturally past the window
     and is unaffected.

  2. bes2600_scan_should_defer() resets reject_count to 0 when
     time_after(jiffies, backoff_until). Without this, reject_count
     accumulated indefinitely across the slow-cadence rejects, so an
     isolated reject after long quiet would have tripped the
     threshold the moment it arrived. After the change, count is
     latched only inside an active burst and decays cleanly when the
     burst ends.

Net effect on a roam burst:

  * t=0   reject #1 (count 1, backoff_until = t0 + 30s)
  * t=12  reject #2 (count 2, backoff_until = t1 + 30s)
  * t=24  reject #3 (count 3, threshold met, next scan deferred)
  * t=36  defer fires, no WSM round-trip, reject not sent
  * ...   defers continue until the firmware-policy state clears
  * scan succeeds -> reject_count = 0, normal cadence resumes

WSM 0x0007 confirm rejections in a burst drop from ~14 to ~3 (just
the scans needed to reach the threshold). wpa_supplicant's reason=3
locally-generated disconnects driven by exhausted roam candidates
during the same burst window also drop.

No new state, no new symbols, no change to mac80211-facing semantics:
the deferred scan still completes via the existing fail: path with
status=-EBUSY, the same response a real firmware-busy would produce.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-04-28 14:33:00 +02:00
11 changed files with 14 additions and 440 deletions
-21
View File
@@ -511,9 +511,6 @@ struct bes2600_common {
struct list_head coex_event_list;
spinlock_t coex_event_lock;
/* Connection-loss-storm fast-recover (Trigger A). See sta.c. */
struct work_struct connection_loss_storm_recover_work;
/* member for low power */
struct bes2600_pwr_t bes_power;
@@ -599,11 +596,6 @@ struct bes2600_vif {
unsigned long rx_timestamp;
u32 cipherType;
/* Decrypt-storm fast-recover (Trigger B). See txrx.c. */
unsigned long decrypt_storm_window_start;
unsigned int decrypt_storm_count;
unsigned int decrypt_storm_recoveries;
struct work_struct decrypt_storm_recover_work;
/* AP powersave */
u32 link_id_map;
@@ -630,10 +622,6 @@ struct bes2600_vif {
/* CQM Implementation */
struct delayed_work bss_loss_work;
struct delayed_work connection_loss_work;
/* Connection-loss-storm fast-recover (Trigger A). See sta.c. */
unsigned long connection_loss_storm_window_start;
unsigned int connection_loss_storm_count;
unsigned int connection_loss_storm_recoveries;
struct work_struct tx_failure_work;
int delayed_link_loss;
spinlock_t bss_loss_lock;
@@ -868,13 +856,4 @@ int bes2600_btusb_setup_pipes(struct sbus_priv *sbus_priv);
void bes2600_btusb_uninit(struct usb_interface *interface);
#endif
/* Decrypt-storm fast-recover helpers — see txrx.c. */
void bes2600_decrypt_storm_init(struct bes2600_vif *priv);
void bes2600_decrypt_storm_account(struct bes2600_vif *priv);
/* Connection-loss-storm fast-recover helpers — see sta.c. */
void bes2600_connection_loss_storm_init(struct bes2600_vif *priv);
bool bes2600_connection_loss_storm_account(struct bes2600_vif *priv);
void bes2600_connection_loss_storm_recover(struct work_struct *work);
#endif /* BES2600_H */
+2 -64
View File
@@ -16,7 +16,6 @@
#include <linux/mmc/host.h>
#include <linux/mmc/sdio_func.h>
#include <linux/mmc/card.h>
#include <linux/mmc/core.h>
#include <linux/mmc/sdio.h>
#include <linux/spinlock.h>
#include <net/mac80211.h>
@@ -1389,14 +1388,7 @@ static void bes2600_gpio_wakeup_mcu(struct sbus_priv *self, int flag)
/* error check */
if((self->gpio_wakup_flags & BIT(flag)) != 0) {
/*
* Multiple subsystems holding wake is the steady-state case
* (e.g. WIFI + BT both want MCU awake). Demoted from bes_err
* to bes_devel since it isn't an error - the GPIO is already
* asserted high and the subsystem is now also tracked.
*/
bes_devel("repeat set gpio_wake_flag, sub_sys:%d\n", flag);
self->gpio_wakup_flags |= BIT(flag);
bes_err( "repeat set gpio_wake_flag, sub_sys:%d", flag);
mutex_unlock(&self->io_mutex);
return;
}
@@ -1428,11 +1420,7 @@ static void bes2600_gpio_allow_mcu_sleep(struct sbus_priv *self, int flag)
/* error check */
if((self->gpio_wakup_flags & BIT(flag)) == 0) {
/*
* Mirror of the wake path: a clear when the bit is already
* clear is racy bookkeeping, not a hardware error.
*/
bes_devel("repeat clear gpio_wake_flag, sub_sys:%d\n", flag);
bes_err( "repeat clear gpio_wake_flag, sub_sys:%d", flag);
mutex_unlock(&self->io_mutex);
return;
}
@@ -1789,55 +1777,6 @@ static void bes2600_sdio_halt_device(struct sbus_priv *self)
sdio_work_debug(self);
}
/*
* Trigger an SDIO bus reset via mmc_hw_reset().
*
* With multiple SDIO functions probed (PineTab2 binds func 1 for WLAN and
* func 2 for the BT-companion path) mmc_sdio_hw_reset() takes the
* remove-and-rescan path: it marks the card removed and schedules
* mmc_rescan, which tears down the bound function drivers and re-detects
* the card on the next sweep, in turn reinvoking bes2600_sdio_probe().
*
* With a single function probed it instead invokes mmc_power_cycle()
* directly, which on PineTab2 toggles the wifi-reset GPIO via sdio_pwrseq.
*
* In both cases the chip ends up in a freshly reset state, which is the
* goal of the recovery path.
*
* mmc_hw_reset() must be called without holding the SDIO host claim --
* the multi-func remove-and-rescan path acquires the host claim via the
* mmc workqueue.
*/
static int bes2600_sdio_bus_reset(struct sbus_priv *self)
{
struct mmc_host *host;
int ret;
if (!self || !self->func || !self->func->card)
return -EINVAL;
host = self->func->card->host;
ret = mmc_hw_reset(self->func->card);
/*
* On multi-function SDIO cards (BES2600 has WLAN func 1 + BT
* companion func 2), mmc_sdio_hw_reset() removes the card and
* returns 1 to signal "remove happened, caller must trigger
* rescan". The kernel does NOT auto-rescan in this case;
* single-function cards take the rescan path inline and return 0.
* Treat any non-negative return as success and force a rescan if
* mmc_hw_reset signalled the multi-function path - otherwise the
* card stays removed indefinitely after a wedge recovery,
* leaving wifi (and the BT companion) silent until reboot.
*/
if (ret > 0) {
bes_info("multi-func mmc_hw_reset removed card; scheduling rescan\n");
mmc_detect_change(host, 0);
ret = 0;
}
return ret;
}
static bool bes2600_sdio_wakeup_source(struct sbus_priv *self)
{
struct bes2600_platform_data_sdio *pdata = bes2600_get_platform_data();
@@ -1876,7 +1815,6 @@ static struct sbus_ops bes2600_sdio_sbus_ops = {
.gpio_sleep = bes2600_gpio_allow_mcu_sleep,
.halt_device = bes2600_sdio_halt_device,
.wakeup_source = bes2600_sdio_wakeup_source,
.bus_reset = bes2600_sdio_bus_reset,
};
static void bes2600_sdio_en_lp_cb(struct bes2600_common *hw_priv)
+2 -57
View File
@@ -442,48 +442,6 @@ int bes2600_chrdev_do_system_close(const struct sbus_ops *sbus_ops, struct sbus_
return ret;
}
/*
* Hard-reset the bus and wait for the bus core to remove the chip.
*
* Used by the firmware-wedge recovery path on platforms where the normal
* power_switch(0) sequence has no effective chip-reset signal. The bus
* implementation triggers an asynchronous re-detect; this helper waits for
* the resulting remove() callback to clear bes2600_cdev.sbus_priv so that a
* subsequent bes2600_switch_wifi(true) sees a clean state and can wait on
* the fresh probe.
*/
int bes2600_chrdev_do_bus_reset(const struct sbus_ops *sbus_ops, struct sbus_priv *priv)
{
int ret;
long status;
if (!sbus_ops || !priv)
return -EINVAL;
if (!sbus_ops->bus_reset)
return -EOPNOTSUPP;
bes_info("trigger bus reset to recover wedged firmware.\n");
ret = sbus_ops->bus_reset(priv);
if (ret) {
bes_err("bus_reset failed: %d\n", ret);
return ret;
}
/*
* The bus reset is asynchronous: the bus core schedules a rescan
* which removes the bound function drivers and then re-detects the
* chip. Wait for the remove callback to clear sbus_priv. Do not
* dereference 'priv' after this point -- it may already be freed.
*/
status = wait_event_timeout(bes2600_cdev.probe_done_wq,
!bes2600_cdev.sbus_priv, HZ * 3);
WARN_ON(status <= 0);
return 0;
}
bool bes2600_chrdev_is_wifi_opened(void)
{
bool wifi_opened = false;
@@ -582,21 +540,8 @@ static void bes2600_chrdev_wifi_force_close_work(struct work_struct *work)
/* unregister wifi */
bes2600_switch_wifi(0);
/*
* Hard exception with a bus_reset implementation: tear the
* bus down via mmc_hw_reset() (or equivalent) so the next
* bringup probes a freshly reset chip. On PineTab2 this is
* the only effective recovery path -- the existing
* power_switch(0)/(1) sequence has no chip-reset signal of
* its own (sdio_pwrseq owns wifi_reset).
*
* Soft close, or hard close on a board without bus_reset:
* fall back to the legacy power_switch(0) sequence.
*/
if (bes2600_cdev.halt_dev && bes2600_cdev.sbus_ops->bus_reset) {
bes2600_chrdev_do_bus_reset(bes2600_cdev.sbus_ops,
bes2600_cdev.sbus_priv);
} else if (bes2600_chrdev_check_system_close()) {
/* power down device if wifi is only opened */
if (bes2600_chrdev_check_system_close()) {
bes2600_chrdev_do_system_close(bes2600_cdev.sbus_ops,
bes2600_cdev.sbus_priv);
}
-1
View File
@@ -60,7 +60,6 @@ struct sbus_priv *bes2600_chrdev_get_sbus_priv_data(void);
/* used to control device power down */
int bes2600_chrdev_check_system_close(void);
int bes2600_chrdev_do_system_close(const struct sbus_ops *sbus_ops, struct sbus_priv *priv);
int bes2600_chrdev_do_bus_reset(const struct sbus_ops *sbus_ops, struct sbus_priv *priv);
void bes2600_chrdev_wakeup_bt(void);
void bes2600_chrdev_wifi_force_close(struct bes2600_common *hw_priv, bool halt_dev);
void bes2600_chrdev_usb_remove(struct bes2600_common *hw_priv);
+8 -118
View File
@@ -467,45 +467,6 @@ static void bes2600_pwr_device_enter_lp_mode(struct bes2600_common *hw_priv)
bes_devel("device enter sleep\n");
}
/*
* Number of consecutive bes2600_pwr_enter_lp_mode timeouts (with zero
* PM_INDICATIONs received) before we conclude the firmware does not
* honor host-driven PSM and switch to a sticky skip path.
*/
#define BES2600_PM_UNSUPPORTED_THRESHOLD 3
/*
* Latch pm_unsupported = true and force chip_pm_state = ACTIVE so the
* c6.2 wake-side skip branch covers bes2600_pwr_device_exit_lp_mode.
* Called after BES2600_PM_UNSUPPORTED_THRESHOLD consecutive enter_lp_mode
* timeouts with zero PM_INDICATIONs.
*/
static void bes2600_pwr_latch_pm_unsupported(struct bes2600_common *hw_priv)
{
bes_warn("PSM not honored (%u timeouts), switching to skip mode\n",
hw_priv->bes_power.pm_consecutive_timeouts);
hw_priv->bes_power.pm_unsupported = true;
atomic_set(&hw_priv->bes_power.chip_pm_state,
BES2600_CHIP_PM_ACTIVE);
/*
* Hold the MCU wake-flag bit permanently. Without this, every
* sdio_rx_work invocation hits bes2600_gpio_wakeup_mcu(SDIO_RX)
* when gpio_wakup_flags == 0, drives the GPIO high and msleeps
* 10 ms per RX. With ~50 RX/s of beacons + multicast that's
* ~50%% of the bes_sdio workqueue thread blocked in msleep,
* which directly caps RX throughput. Holding the MCU bit makes
* those calls bit-only bookkeeping (gpio_wakeup = (flags == 0)
* stays false, no GPIO toggle, no msleep). The bit is never
* cleared once pm_unsupported is set because
* bes2600_pwr_device_enter_lp_mode is unreachable under the
* early-return.
*/
if (hw_priv->sbus_ops->gpio_wake)
hw_priv->sbus_ops->gpio_wake(hw_priv->sbus_priv,
GPIO_WAKE_FLAG_MCU);
}
static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
{
int i = 0;
@@ -515,17 +476,6 @@ static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
char ip_str[20];
unsigned long status = 0;
/*
* Sticky early-return when we've previously concluded the firmware
* doesn't honor PSM. Each attempt would otherwise burn 5s on a
* doomed wait_for_completion_timeout and produce a noisy three-line
* cascade in dmesg every time power_down_work retries (every
* ~10s). The chip stays in active mode, which on this firmware is
* the de-facto state anyway.
*/
if (hw_priv->bes_power.pm_unsupported)
return -EOPNOTSUPP;
/* set interface low power configuration */
bes2600_for_each_vif(hw_priv, priv, i) {
#ifdef P2P_MULTIVIF
@@ -621,9 +571,6 @@ static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
atomic_set(&hw_priv->bes_power.chip_pm_state,
BES2600_CHIP_PM_UNKNOWN);
timeouts++;
if (++hw_priv->bes_power.pm_consecutive_timeouts
>= BES2600_PM_UNSUPPORTED_THRESHOLD)
bes2600_pwr_latch_pm_unsupported(hw_priv);
}
}
} else {
@@ -662,8 +609,7 @@ static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
* GPIO stays high and the bit clear here is purely
* bookkeeping (so the next gpio_wake doesn't no-op).
*/
if (!hw_priv->bes_power.pm_unsupported &&
hw_priv->sbus_ops->gpio_sleep)
if (hw_priv->sbus_ops->gpio_sleep)
hw_priv->sbus_ops->gpio_sleep(hw_priv->sbus_priv,
GPIO_WAKE_FLAG_MCU);
ret = -ETIMEDOUT;
@@ -675,61 +621,19 @@ static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
static void bes2600_pwr_device_exit_lp_mode(struct bes2600_common *hw_priv)
{
int ret = 0;
enum bes2600_chip_pm_state state;
struct wsm_operational_mode mode = {
.power_mode = wsm_power_mode_active,
.disableMoreFlagUsage = true,
};
/*
* Consult chip_pm_state set by bes2600_pwr_notify_ps_changed().
* If we last saw the firmware confirm ACTIVE, skip ONLY the
* gpio_wake + sbus_active wake handshake - the GPIO is already
* asserted high and the SDIO MCU subsystem is already running,
* so another sbus_active() round-trip just hits its 200x2ms
* timeout because the firmware has nothing to do.
*
* wsm_set_operational_mode() below is NOT part of the wake
* handshake; it is the operational-mode setter the firmware
* tracks per call. Skipping it leaves the chip's SDIO state
* machine without a fresh operational-mode update, which on
* PineTab2 wedges the bus (-EBUSY on next sdio_rx_work read)
* within a few seconds of probe completion. So it must run
* unconditionally.
*/
state = atomic_read(&hw_priv->bes_power.chip_pm_state);
if (state == BES2600_CHIP_PM_ACTIVE) {
bes_devel("device_exit_lp_mode: chip already ACTIVE, skipping wake handshake\n");
} else {
bes_devel("host lock lmac\n");
if (hw_priv->sbus_ops->gpio_wake)
hw_priv->sbus_ops->gpio_wake(hw_priv->sbus_priv,
GPIO_WAKE_FLAG_MCU);
bes_devel("host lock lmac\n");
if(hw_priv->sbus_ops->gpio_wake)
hw_priv->sbus_ops->gpio_wake(hw_priv->sbus_priv, GPIO_WAKE_FLAG_MCU);
if (hw_priv->sbus_ops->sbus_active) {
ret = hw_priv->sbus_ops->sbus_active(hw_priv->sbus_priv,
SUBSYSTEM_MCU);
if (ret) {
/*
* MCU_WAKEUP_READY did not arrive within
* the SDIO handshake window. Record state
* as UNKNOWN so the next exit_lp_mode call
* also runs the full wake sequence (no
* skip), but still send operational_mode
* below to match pre-c6 behaviour - the
* WSM may succeed even if the SDIO active
* confirm was lost, and if it fails too,
* we just emit a second devel-level error.
* Repeated UNKNOWN is the signal for the
* LMAC active-monitor to eventually
* escalate to bus_reset (c5.2's
* mmc_hw_reset path).
*/
bes_err("%s, active mcu fail\n", __func__);
atomic_set(&hw_priv->bes_power.chip_pm_state,
BES2600_CHIP_PM_UNKNOWN);
}
}
if(hw_priv->sbus_ops->sbus_active) {
ret = hw_priv->sbus_ops->sbus_active(hw_priv->sbus_priv, SUBSYSTEM_MCU);
if (ret)
bes_err("%s, active mcu fail\n", __func__);
}
ret = wsm_set_operational_mode(hw_priv, &mode, 0);
@@ -986,8 +890,6 @@ void bes2600_pwr_init(struct bes2600_common *hw_priv)
mutex_init(&hw_priv->bes_power.pwr_mutex);
atomic_set(&hw_priv->bes_power.dev_state, 0);
atomic_set(&hw_priv->bes_power.chip_pm_state, BES2600_CHIP_PM_UNKNOWN);
hw_priv->bes_power.pm_unsupported = false;
hw_priv->bes_power.pm_consecutive_timeouts = 0;
init_completion(&hw_priv->bes_power.pm_enter_cmpl);
sema_init(&hw_priv->bes_power.sync_lock, 1);
device_set_wakeup_capable(hw_priv->pdev, true);
@@ -1377,18 +1279,6 @@ void bes2600_pwr_notify_ps_changed(struct bes2600_common *hw_priv, u8 psmode)
* indication can prime a future wait against a freshly
* reinit_completion()'ed state.
*/
/*
* Any PM indication, whatever its psmode, proves the firmware is
* actually emitting them. Reset the consecutive-timeout counter
* so a transient stall doesn't permanently disable PSM, and clear
* pm_unsupported if a previous run had latched it.
*/
hw_priv->bes_power.pm_consecutive_timeouts = 0;
if (hw_priv->bes_power.pm_unsupported) {
bes_warn("PM indication arrived after pm_unsupported was set; re-enabling PSM transitions\n");
hw_priv->bes_power.pm_unsupported = false;
}
if ((psmode & 0x01) != WSM_PSM_ACTIVE) {
atomic_set(&hw_priv->bes_power.chip_pm_state,
BES2600_CHIP_PM_LP);
-9
View File
@@ -121,15 +121,6 @@ struct bes2600_pwr_t
struct bes2600_pwr_event_t pwr_events[BES2600_DELAY_EVENT_NUM];
atomic_t pm_set_in_process;
atomic_t chip_pm_state;
/*
* Sticky flag set after BES2600_PM_UNSUPPORTED_THRESHOLD
* consecutive enter_lp_mode timeouts with zero PM_INDICATIONs
* received from firmware. Indicates this chip's firmware does
* not honor host-driven PSM transitions; further attempts are
* skipped to avoid the 5s timeout cascade.
*/
bool pm_unsupported;
unsigned int pm_consecutive_timeouts;
};
#ifdef CONFIG_BES2600_WOWLAN
-4
View File
@@ -542,10 +542,6 @@ static int bes2600_status_show_priv(struct seq_file *seq, void *v)
priv->listening ? " (listening)" : "");
seq_printf(seq, "Assoc: %s\n",
bes2600_debug_join_status[priv->join_status]);
seq_printf(seq, "DecryptStormRecoveries: %u\n",
priv->decrypt_storm_recoveries);
seq_printf(seq, "ConnectionLossStormRecoveries: %u\n",
priv->connection_loss_storm_recoveries);
if (priv->rx_filter.promiscuous)
seq_puts(seq, "Filter: promisc\n");
else if (priv->rx_filter.fcs)
-2
View File
@@ -484,8 +484,6 @@ static struct ieee80211_hw *bes2600_init_common(size_t hw_priv_data_len)
spin_lock_init(&hw_priv->rtsvalue_lock);
INIT_WORK(&hw_priv->dynamic_opt_txrx_work, bes2600_dynamic_opt_txrx_work);
INIT_WORK(&hw_priv->tx_policy_upload_work, tx_policy_upload_work);
INIT_WORK(&hw_priv->connection_loss_storm_recover_work,
bes2600_connection_loss_storm_recover);
spin_lock_init(&hw_priv->event_queue_lock);
INIT_LIST_HEAD(&hw_priv->event_queue);
INIT_WORK(&hw_priv->event_handler, bes2600_event_handler);
-8
View File
@@ -75,14 +75,6 @@ struct sbus_ops {
void (*halt_device)(struct sbus_priv *self);
bool (*wakeup_source)(struct sbus_priv *self);
int (*reboot)(struct sbus_priv *self);
/*
* Force the host bus to re-detect and re-probe the chip. Called
* from the firmware-wedge recovery path when power_switch() has no
* effective chip-reset signal of its own (e.g. PineTab2, where the
* wifi-reset GPIO is owned by sdio_pwrseq, not the bes2600 node).
* Returns 0 on success or a negative errno.
*/
int (*bus_reset)(struct sbus_priv *self);
};
void bes2600_irq_handler(struct bes2600_common *priv);
+2 -82
View File
@@ -266,7 +266,6 @@ void bes2600_stop(struct ieee80211_hw *dev, bool suspend)
cancel_work_sync(&hw_priv->coex_work);
coex_stop(hw_priv);
#endif
cancel_work_sync(&hw_priv->connection_loss_storm_recover_work);
bes2600_wifi_stop(hw_priv);
@@ -449,7 +448,6 @@ void bes2600_remove_interface(struct ieee80211_hw *dev,
cancel_delayed_work_sync(&priv->join_timeout);
cancel_delayed_work_sync(&priv->set_cts_work);
cancel_delayed_work_sync(&priv->pending_offchanneltx_work);
cancel_work_sync(&priv->decrypt_storm_recover_work);
del_timer_sync(&priv->mcast_timeout);
/* TODO:COMBO: May be reset of these variables "delayed_link_loss and
@@ -1660,70 +1658,6 @@ report:
spin_unlock(&priv->bss_loss_lock);
}
/*
* Connection-loss-storm fast-recover (Trigger A).
*
* bes2600_connection_loss_work below is the driver's own decision-point
* to give up on a BSS (after bss-loss detection accumulates beyond
* tolerance) and tell mac80211 via ieee80211_connection_loss(). On the
* deployed pinetab2 stack a single ieee80211_connection_loss() event
* sometimes triggers a userspace reauth blackhole (assoc-comeback
* timeouts followed by AP unprotected-deauth-reason-6) that ends only
* via cross-channel/cross-SSID fallback and can take 80+ s. Receipts at
* https://git.reauktion.de/marfrit/besser, notes/phase4-2026-05-07.md.
*
* When N connection-loss decisions land within WINDOW on the same vif,
* skip the ieee80211_connection_loss() path and trigger a chip-level
* bus_reset (the c5.2-introduced bes2600_chrdev_do_bus_reset). The chip
* is removed and re-probed; userspace re-associates from a fresh state,
* dodging the assoc-comeback loop.
*
* Threshold (3 / 60 s) is chosen well above the steady-state per-vif
* connection-loss rate observed in the patch-A Phase-7 rep
* (0.86/h under sustained load), so a true storm is required.
*
* The recover work_struct lives on bes2600_common (hw_priv) so that
* scheduling it does not race with vif teardown after bus_reset frees
* the per-vif state.
*/
#define BES2600_CONNECTION_LOSS_STORM_THRESHOLD 3
#define BES2600_CONNECTION_LOSS_STORM_WINDOW_MS 60000
void bes2600_connection_loss_storm_recover(struct work_struct *work)
{
bes_warn("[bes2600] connection-loss-storm fast-recover: bus_reset\n");
bes2600_chrdev_do_bus_reset(bes2600_cdev.sbus_ops, bes2600_cdev.sbus_priv);
/*
* After bes2600_chrdev_do_bus_reset() returns, the SDIO core has
* scheduled a remove + rescan; per-vif state may already be gone.
* Do not dereference any per-vif pointer here.
*/
}
void bes2600_connection_loss_storm_init(struct bes2600_vif *priv)
{
priv->connection_loss_storm_window_start = 0;
priv->connection_loss_storm_count = 0;
priv->connection_loss_storm_recoveries = 0;
}
bool bes2600_connection_loss_storm_account(struct bes2600_vif *priv)
{
unsigned long now = jiffies;
unsigned long window =
msecs_to_jiffies(BES2600_CONNECTION_LOSS_STORM_WINDOW_MS);
if (priv->connection_loss_storm_window_start == 0 ||
time_after(now, priv->connection_loss_storm_window_start + window)) {
priv->connection_loss_storm_window_start = now;
priv->connection_loss_storm_count = 1;
return false;
}
return ++priv->connection_loss_storm_count >=
BES2600_CONNECTION_LOSS_STORM_THRESHOLD;
}
void bes2600_connection_loss_work(struct work_struct *work)
{
struct bes2600_vif *priv =
@@ -1733,21 +1667,9 @@ void bes2600_connection_loss_work(struct work_struct *work)
bes_devel("[CQM] Reporting connection loss.\n");
bes2600_pwr_clear_busy_event(priv->hw_priv, BES_PWR_LOCK_ON_BSS_LOST);
if (bes2600_connection_loss_storm_account(priv)) {
bes_warn("[bes2600] connection-loss storm: %u in %u s, scheduling bus reset\n",
priv->connection_loss_storm_count,
BES2600_CONNECTION_LOSS_STORM_WINDOW_MS / 1000);
priv->connection_loss_storm_count = 0;
priv->connection_loss_storm_recoveries++;
schedule_work(&hw_priv->connection_loss_storm_recover_work);
/* bus_reset will tear the chip down; skip the mac80211 path. */
return;
}
if (bes2600_suspend_status_get(hw_priv))
if(bes2600_suspend_status_get(hw_priv)) {
bes2600_pending_unjoin_set(hw_priv, priv->if_id);
else
} else
ieee80211_connection_loss(priv->vif);
#ifdef WIFI_BT_COEXIST_EPTA_ENABLE
// set disconnected in BSS_CHANGED_ASSOC
@@ -2697,8 +2619,6 @@ int bes2600_vif_setup(struct bes2600_vif *priv)
/* Setup per vif workitems and locks */
spin_lock_init(&priv->vif_lock);
bes2600_decrypt_storm_init(priv);
bes2600_connection_loss_storm_init(priv);
INIT_WORK(&priv->join_work, bes2600_join_work);
INIT_DELAYED_WORK(&priv->join_timeout, bes2600_join_timeout);
INIT_WORK(&priv->unjoin_work, bes2600_unjoin_work);
-74
View File
@@ -25,78 +25,6 @@
#define BES2600_INVALID_RATE_ID (0xFF)
/*
* Decrypt-storm fast-recover (Trigger B).
*
* When the BES2600 firmware reports WSM_STATUS_DECRYPTFAILURE for a
* burst of received frames (typically because the host's PTK or GTK
* has fallen out of sync with the AP), the AP eventually concludes that
* the STA is not authenticated and emits an unprotected deauth-reason-6
* ("Class 2 frame received from non-authenticated station"). On the
* deployed pinetab2 + bes2600 stack this AP-initiated deauth has been
* observed to leave the link blackholed for up to 109 s before
* userspace finds a different SSID/channel to recover on. (Receipts at
* https://git.reauktion.de/marfrit/besser, notes/phase5-2026-05-06.md.)
*
* Recovery here pre-empts the AP: when we see THRESHOLD decrypt
* failures within WINDOW, we ask mac80211 for a clean reassoc via
* ieee80211_connection_loss(), which causes immediate disassociation
* and lets userspace auto-reconnect with fresh keys.
*
* mac80211 contract: ieee80211_connection_loss() may be called
* regardless of IEEE80211_HW_CONNECTION_MONITOR; it causes immediate
* disassociation without driver-side recovery attempts. See
* include/net/mac80211.h for the canonical doc-comment.
*
* The threshold is set well above the steady-state per-vif
* decrypt-fail rate observed in measurement (~1/min even under
* sustained 1 MB/s load), so a true storm is required to trip it.
*/
#define BES2600_DECRYPT_STORM_THRESHOLD 5
#define BES2600_DECRYPT_STORM_WINDOW_MS 5000
static void bes2600_decrypt_storm_recover_work(struct work_struct *work)
{
struct bes2600_vif *priv = container_of(work, struct bes2600_vif,
decrypt_storm_recover_work);
if (!priv->vif)
return;
bes_warn("[bes2600] decrypt-storm fast-recover: forcing reassoc\n");
ieee80211_connection_loss(priv->vif);
priv->decrypt_storm_recoveries++;
}
void bes2600_decrypt_storm_init(struct bes2600_vif *priv)
{
INIT_WORK(&priv->decrypt_storm_recover_work,
bes2600_decrypt_storm_recover_work);
priv->decrypt_storm_window_start = 0;
priv->decrypt_storm_count = 0;
priv->decrypt_storm_recoveries = 0;
}
void bes2600_decrypt_storm_account(struct bes2600_vif *priv)
{
unsigned long now = jiffies;
unsigned long window = msecs_to_jiffies(BES2600_DECRYPT_STORM_WINDOW_MS);
if (priv->decrypt_storm_window_start == 0 ||
time_after(now, priv->decrypt_storm_window_start + window)) {
priv->decrypt_storm_window_start = now;
priv->decrypt_storm_count = 1;
return;
}
if (++priv->decrypt_storm_count >= BES2600_DECRYPT_STORM_THRESHOLD) {
priv->decrypt_storm_count = 0;
/* Skew the window so we don't re-fire on the same storm. */
priv->decrypt_storm_window_start = now + window;
schedule_work(&priv->decrypt_storm_recover_work);
}
}
#ifdef CONFIG_BES2600_TESTMODE
#include "bes_nl80211_testmode_msg.h"
#endif /* CONFIG_BES2600_TESTMODE */
@@ -1744,8 +1672,6 @@ void bes2600_rx_cb(struct bes2600_vif *priv,
goto drop;
} else {
bes_warn("[RX] Receive failure: %d.\n", arg->status);
if (arg->status == WSM_STATUS_DECRYPTFAILURE)
bes2600_decrypt_storm_account(priv);
goto drop;
}
}