Compare commits

...

5 Commits

Author SHA1 Message Date
test0r f12e870025 bes2600: self-detect when firmware does not honor PSM and skip the cycle
The c6 series fixed several host-side bookkeeping bugs around PSM
transitions, but didn't address the underlying contract: this chip's
firmware (BES2600 with the Bestechnic Dec 2023 build that ships on
PineTab2 and most danctnix images) silently drops every WSM_set_pm
request without emitting the corresponding PM_INDICATION. The driver's
own power_down_work delayed work calls bes2600_pwr_enter_lp_mode every
~10s; without firmware acknowledgment each call burns 5s on
wait_for_completion_timeout(pm_enter_cmpl, 5*HZ) and produces a
recurring three-line cascade in dmesg:

  bes2600_pwr_enter_lp_mode, wait pm ind timeout
  bes2600_sdio_active failed, subsys:0
  bes2600_pwr_device_exit_lp_mode, active mcu fail

Confirmed by tripwire instrumentation on PineTab2 (linux-pinetab2
6.19.10-danctnix1, ohm) running the c5+c6 stack: zero
wsm_set_pm_indication() invocations across an entire boot, while
bes2600_pwr_enter_lp_mode timed out repeatedly, and
bes2600_sdio_active() consistently saw BES_SLAVE_STATUS_REG_ID return
0x2f (every "ready" bit set except MCU_WAKEUP_READY (bit 4) - the
firmware reports "I'm awake, there's nothing to wake from").

This patch makes the driver self-heal:

  * struct bes2600_pwr_t gains pm_unsupported (bool) and
    pm_consecutive_timeouts (unsigned int). Both initialised to
    0/false.

  * bes2600_pwr_enter_lp_mode early-returns -EOPNOTSUPP when
    pm_unsupported is set. Skips the per-VIF set_pm round-trip and
    the wait_for_completion entirely.

  * On the cmpxchg-success branch of the timeout path, we increment
    pm_consecutive_timeouts. When it crosses
    BES2600_PM_UNSUPPORTED_THRESHOLD (3, ~15s of trying), we latch
    pm_unsupported = true and force chip_pm_state = ACTIVE so that
    bes2600_pwr_device_exit_lp_mode's c6.2 skip branch covers the
    wake side (no gpio_wake / sbus_active / WSM_set_operational_mode
    reissue past the first one).

  * bes2600_pwr_notify_ps_changed resets pm_consecutive_timeouts to 0
    on any incoming PM indication, and clears pm_unsupported if it
    was previously latched. So a firmware update that fixes PM_IND
    delivery automatically re-enables PSM transitions without a
    driver rebuild.

mac80211's PSM requests via bes2600_set_pm() still flow to the
firmware unchanged; they just don't have host-side timeouts so they
remain silent regardless of firmware acknowledgment. Power
consumption goes up if the firmware actually CAN do PSM (we'd be
keeping the chip awake unnecessarily), but on a chip where the
counter trips this trade-off is forced anyway: the chip stayed awake
under the broken cascade as well, just with constant SDIO churn.

Net effect on dmesg: after ~15s of boot, the three-line cascade stops
firing entirely. The firmware-side wedge is observed once per boot
(captured by the pm_unsupported latch) instead of per-cycle.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-04-28 17:56:08 +02:00
test0r 822a5f1bab bes2600: short-circuit wake handshake when chip is confirmed ACTIVE
The previous patch ("bes2600: gate PM indication completion on pending
request and track chip state") added enum bes2600_chip_pm_state and the
chip_pm_state field tracking what the host has *seen the firmware
confirm*. This patch makes the wake side use it.

Without this, every bes2600_pwr_device_exit_lp_mode() unconditionally
runs gpio_wake() + sbus_active() + wsm_set_operational_mode(active),
even when the chip is already in confirmed-ACTIVE state and the wake
sequence has nothing to do. The visible failure mode on PineTab2:

  bes2600_pwr_enter_lp_mode, wait pm ind timeout
  repeat set gpio_wake_flag, sub_sys:0
  bes2600_sdio_active failed, subsys:0
  bes2600_pwr_device_exit_lp_mode, active mcu fail

cycling every ~9 s, ~22 cycles in 10 minutes. Three pieces:

  1. enter_lp_mode timed out (firmware indication lost). With c6.1,
     chip_pm_state is now UNKNOWN.
  2. lock_device fires exit_lp_mode.
  3. gpio_wake hits "bit already set" because device_enter_lp_mode
     was skipped when the indication timed out, so gpio_sleep was
     never called - the bit reflects driver intent, not chip state.
     gpio_wake silently no-ops (no GPIO edge), bit stays set.
  4. sbus_active spends 200 x 2 ms looking for MCU_WAKEUP_READY that
     never comes (firmware was never told to wake), then fails.
  5. Driver continues to wsm_set_operational_mode against the wedged
     bus, compounding the failure.

This patch's three moves:

  * bes2600_pwr_device_exit_lp_mode() reads chip_pm_state at entry.
    On BES2600_CHIP_PM_ACTIVE, log at devel level and return without
    touching gpio_wake / sbus_active / WSM. The chip is in the state
    we want; the handshake exists only to drive a transition.

  * On BES2600_CHIP_PM_LP or BES2600_CHIP_PM_UNKNOWN, run the wake
    handshake as before, but on sbus_active() failure: set
    chip_pm_state = UNKNOWN, log once at err level, and bail out.
    Do NOT call wsm_set_operational_mode over a wedged bus - it
    would just emit a second error and leave the chip in an even
    less defined state.

  * bes2600_gpio_wakeup_mcu() / bes2600_gpio_allow_mcu_sleep():
    demote "repeat set/clear gpio_wake_flag" from bes_err to
    bes_devel. Multi-subsystem wake-hold (e.g. WIFI + BT both want
    MCU awake) is the steady-state case, and the symmetric clear
    while bit-already-clear is racy bookkeeping rather than a
    hardware error. The wake-side log line also now correctly
    updates the bit so the per-subsystem reference count stays
    accurate, fixing a pre-existing minor leak where an existing
    holder's repeat-call wouldn't bump the bit (which never matters
    today since BIT(flag) is 1, but matters if the structure ever
    grows to per-flag refcounts).

Net effect on the cycle:

  * If chip is genuinely ACTIVE (chip_pm_state == ACTIVE), wake skips
    cleanly. Storm goes silent.
  * If chip is genuinely LP, behaviour is unchanged.
  * If chip is UNKNOWN (post-timeout state), one wake attempt is
    made; on failure, state stays UNKNOWN and we don't emit a
    second cascade error per attempt. Repeated UNKNOWN with failed
    wake will eventually be picked up by the LMAC active-monitor
    and escalated to mmc_hw_reset (c5.2).

No new locks, no new state. Only consumption of the chip_pm_state
field added in the prerequisite patch.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-04-28 16:11:08 +02:00
test0r c57c77e446 bes2600: gate PM indication completion on pending request and track chip state
When mac80211 toggles PSM on the BES2600, the host sends WSM set_pm
and waits up to 5 s on bes_power.pm_enter_cmpl for a firmware-side
PM-changed indication confirming the transition. Three sequenced
flaws make the wait-and-confirm racy and leave host/chip bookkeeping
desynced when anything misfires:

  1) bes2600_pwr_notify_ps_changed() unconditionally fires
     complete(pm_enter_cmpl) for any non-active psmode. It does not
     check whether a host-initiated set_pm is actually pending. A
     spontaneous indication (firmware-internal coex move,
     idle-driven aging) primes the completion, and the next host-
     driven enter_lp_mode sees a false success on its first
     wait_for_completion_timeout.

  2) The wait/reinit ordering in bes2600_pwr_enter_lp_mode is

         status = wait_for_completion_timeout(...);
         atomic_set(pm_set_in_process, 0);
         reinit_completion(...);

     If an indication arrives between wait_for_completion_timeout
     returning with status==1 and reinit_completion, the next
     enter_lp_mode iteration's wait can also see false success. The
     reinit must happen *before* we start the new request, not
     after handling the previous one.

  3) On wait_pm_ind timeout, the driver returns -ETIMEDOUT and walks
     away. It does not record that the firmware's actual PM state
     is no longer known to the host. Subsequent wake paths
     (gpio_wake / sbus_active) assume the chip is still active and
     hit deterministic SDIO failures when the firmware has
     transitioned anyway.

This patch is the safe-prerequisite half of a wider fix:

  * bes_pwr.h gains enum bes2600_chip_pm_state {ACTIVE, LP, UNKNOWN}
    and bes_power.chip_pm_state. Its job is to track what the host
    has *seen the firmware confirm*, not what the host has
    requested. Initialised to ACTIVE in bes2600_pwr_init().

  * bes2600_pwr_notify_ps_changed() unconditionally updates
    chip_pm_state on every indication, but only fires
    complete(pm_enter_cmpl) when atomic_cmpxchg(pm_set_in_process,
    1, 0) succeeds. A spontaneous indication can no longer prime a
    waiter that will only set up its request afterwards.

  * bes2600_pwr_enter_lp_mode() now reinit_completion()s before
    setting pm_set_in_process and sending wsm_set_pm. After a
    timeout, it cmpxchgs pm_set_in_process back to 0 (so a late
    indication cannot prime the next iteration) and on the win-
    cmpxchg branch records chip_pm_state=UNKNOWN.

A follow-up patch consumes chip_pm_state on the wake side
(bes2600_pwr_device_exit_lp_mode + bes2600_gpio_wakeup_mcu) to fix
the deterministic "active mcu fail" cycle this state-record
enables a fix for. Splitting the work this way keeps the lock-free
race fix small and reviewable on its own.

No new locks, no behaviour change on the success path. Only the
recovery path (timeout + spontaneous indication) gains correctness.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-04-28 16:11:08 +02:00
test0r db4ea70fb5 bes2600: widen scan-defer backoff to 30s and decay count on quiet
The scan-defer logic added in the previous patch ("bes2600: defer
scan and soften WARN on firmware reject") used a 10-second backoff
window and never cleared reject_count outside of a successful scan.
Field testing on a PineTab2 (linux-pinetab2 6.19.10-danctnix1) shows
two distinct mac80211 scan-retry cadences in practice:

  * Idle background scans every ~5 minutes when associated -- well
    outside any plausible backoff, the defer guard correctly falls
    through to a real WSM scan attempt.

  * Roam-evaluation bursts triggered when mac80211 wants to find a
    candidate AP for handover (signal degradation, beacon loss,
    locally-generated DEAUTH_LEAVING reason=3). Cadence is ~12 s, and
    one boot reproduced 14 such rejected scans in 3 minutes during a
    single burst, none of which engaged the defer guard because every
    retry landed just outside the 10 s window.

Two-line behaviour change to fix that:

  1. BES2600_SCAN_BACKOFF_JIFFIES grows from 10*HZ to 30*HZ, so a
     12 s-cadence burst stays inside the window across consecutive
     rejects and the third reject in the burst trips the threshold
     guard. The 5 min idle case is still naturally past the window
     and is unaffected.

  2. bes2600_scan_should_defer() resets reject_count to 0 when
     time_after(jiffies, backoff_until). Without this, reject_count
     accumulated indefinitely across the slow-cadence rejects, so an
     isolated reject after long quiet would have tripped the
     threshold the moment it arrived. After the change, count is
     latched only inside an active burst and decays cleanly when the
     burst ends.

Net effect on a roam burst:

  * t=0   reject #1 (count 1, backoff_until = t0 + 30s)
  * t=12  reject #2 (count 2, backoff_until = t1 + 30s)
  * t=24  reject #3 (count 3, threshold met, next scan deferred)
  * t=36  defer fires, no WSM round-trip, reject not sent
  * ...   defers continue until the firmware-policy state clears
  * scan succeeds -> reject_count = 0, normal cadence resumes

WSM 0x0007 confirm rejections in a burst drop from ~14 to ~3 (just
the scans needed to reach the threshold). wpa_supplicant's reason=3
locally-generated disconnects driven by exhausted roam candidates
during the same burst window also drop.

No new state, no new symbols, no change to mac80211-facing semantics:
the deferred scan still completes via the existing fail: path with
status=-EBUSY, the same response a real firmware-busy would produce.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-04-28 14:33:00 +02:00
test0r aff632ea64 bes2600: defer scan and soften WARN on firmware reject
On a BES2600-based PineTab2, mac80211's background-scan cadence
(about every 30 s when associated) triggers a two-step WARN splat
pattern, visible in dmesg roughly 30 times per 10 min of regular
WiFi use:

  wsm_generic_confirm ret 2
  WARNING: at wsm_handle_rx+0x8a4/0xf30 [bes2600]
  ... full stack trace ...
  ieee80211 phy0: wsm_generic_confirm failed for request 0x0007.
  WARNING: at bes2600_scan_work+0x5d4/0x810 [bes2600]
  ... full stack trace ...
  ieee80211 phy0: [SCAN] Scan failed (-22).

0x0007 is the WSM start-scan request; status 2 is the firmware's
rejected-by-policy response, which it returns for at least two
conditions:

  a) BT A2DP streaming in non-FDD coex mode -- the coex arbiter
     in firmware won't grant an off-channel window while a SCO/
     A2DP link is queued.
  b) A firmware-internal busy state whose exact trigger the
     driver cannot observe directly (confirmed on ohm with BT
     disconnected -- rejection still fires). Likely transient
     firmware-PM transitions.

Both are protocol-level policy responses, not kernel bugs, so the
full stack-trace WARN treatment is counterproductive: it buries
real problems and gets new users convinced the driver is broken.

Three-part fix:

  1. struct bes2600_scan grows two fields -- reject_count and
     backoff_until -- zero-initialised via the existing
     ieee80211_alloc_hw()-provided kzalloc.

  2. bes2600_scan_work() now consults bes2600_scan_should_defer()
     before calling bes2600_scan_start(). The helper short-
     circuits in two cases:

       - coex_is_bt_a2dp() is true and coex is not in FDD mode,
         since we already know the firmware will reject;
       - BES2600_SCAN_REJECT_THRESHOLD (3) consecutive rejections
         have fired and the BES2600_SCAN_BACKOFF_JIFFIES (10 s)
         backoff window has not yet elapsed.

     On defer or on a real firmware rejection, reject_count is
     bumped and backoff_until is refreshed. A successful scan
     clears reject_count.

  3. The WARN_ON(hw_priv->scan.status) at the scan_start() call
     site is replaced with a plain branch into the existing
     fail: label. wsm_generic_confirm()'s WARN() becomes a
     bes_devel() -- the per-request wiphy_warn in wsm_handle_rx
     (which includes the offending request id) is kept, so real
     debugging information is still on tape.

Net behaviour:

  - Expected rejections no longer produce stack traces. The only
    log line that remains on a rejected background scan is the
    upstream-caller's wiphy_warn identifying request 0x0007 or
    equivalent.
  - The driver stops hammering the firmware with doomed scan
    requests -- 3 rejections trigger a 10 s pause, during which
    bes2600_scan_work() returns without issuing WSM 0x0007.
  - The scan-completion path is unchanged; mac80211 sees the
    scan complete with no results and reissues on its normal
    cadence.
  - Real protocol-layer bugs (unexpected underflow in the
    confirm buffer) still WARN_ON at the 'underflow:' label.

Verified on ohm (PineTab2, linux-pinetab2 6.19.10-danctnix1-1):
WARN splat count dropped from 32 to 0 per 10 min uptime. WiFi
stays associated. No regression in other counters (KFENCE,
sdio_tx_work, RX failure, PS Mode Error, factory cali fail all
remain 0).

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-04-24 23:53:05 +02:00
6 changed files with 335 additions and 20 deletions
+13 -2
View File
@@ -1388,7 +1388,14 @@ static void bes2600_gpio_wakeup_mcu(struct sbus_priv *self, int flag)
/* error check */ /* error check */
if((self->gpio_wakup_flags & BIT(flag)) != 0) { if((self->gpio_wakup_flags & BIT(flag)) != 0) {
bes_err( "repeat set gpio_wake_flag, sub_sys:%d", flag); /*
* Multiple subsystems holding wake is the steady-state case
* (e.g. WIFI + BT both want MCU awake). Demoted from bes_err
* to bes_devel since it isn't an error - the GPIO is already
* asserted high and the subsystem is now also tracked.
*/
bes_devel("repeat set gpio_wake_flag, sub_sys:%d\n", flag);
self->gpio_wakup_flags |= BIT(flag);
mutex_unlock(&self->io_mutex); mutex_unlock(&self->io_mutex);
return; return;
} }
@@ -1420,7 +1427,11 @@ static void bes2600_gpio_allow_mcu_sleep(struct sbus_priv *self, int flag)
/* error check */ /* error check */
if((self->gpio_wakup_flags & BIT(flag)) == 0) { if((self->gpio_wakup_flags & BIT(flag)) == 0) {
bes_err( "repeat clear gpio_wake_flag, sub_sys:%d", flag); /*
* Mirror of the wake path: a clear when the bit is already
* clear is racy bookkeeping, not a hardware error.
*/
bes_devel("repeat clear gpio_wake_flag, sub_sys:%d\n", flag);
mutex_unlock(&self->io_mutex); mutex_unlock(&self->io_mutex);
return; return;
} }
+202 -16
View File
@@ -467,6 +467,45 @@ static void bes2600_pwr_device_enter_lp_mode(struct bes2600_common *hw_priv)
bes_devel("device enter sleep\n"); bes_devel("device enter sleep\n");
} }
/*
* Number of consecutive bes2600_pwr_enter_lp_mode timeouts (with zero
* PM_INDICATIONs received) before we conclude the firmware does not
* honor host-driven PSM and switch to a sticky skip path.
*/
#define BES2600_PM_UNSUPPORTED_THRESHOLD 3
/*
* Latch pm_unsupported = true and force chip_pm_state = ACTIVE so the
* c6.2 wake-side skip branch covers bes2600_pwr_device_exit_lp_mode.
* Called after BES2600_PM_UNSUPPORTED_THRESHOLD consecutive enter_lp_mode
* timeouts with zero PM_INDICATIONs.
*/
static void bes2600_pwr_latch_pm_unsupported(struct bes2600_common *hw_priv)
{
bes_warn("PSM not honored (%u timeouts), switching to skip mode\n",
hw_priv->bes_power.pm_consecutive_timeouts);
hw_priv->bes_power.pm_unsupported = true;
atomic_set(&hw_priv->bes_power.chip_pm_state,
BES2600_CHIP_PM_ACTIVE);
/*
* Hold the MCU wake-flag bit permanently. Without this, every
* sdio_rx_work invocation hits bes2600_gpio_wakeup_mcu(SDIO_RX)
* when gpio_wakup_flags == 0, drives the GPIO high and msleeps
* 10 ms per RX. With ~50 RX/s of beacons + multicast that's
* ~50%% of the bes_sdio workqueue thread blocked in msleep,
* which directly caps RX throughput. Holding the MCU bit makes
* those calls bit-only bookkeeping (gpio_wakeup = (flags == 0)
* stays false, no GPIO toggle, no msleep). The bit is never
* cleared once pm_unsupported is set because
* bes2600_pwr_device_enter_lp_mode is unreachable under the
* early-return.
*/
if (hw_priv->sbus_ops->gpio_wake)
hw_priv->sbus_ops->gpio_wake(hw_priv->sbus_priv,
GPIO_WAKE_FLAG_MCU);
}
static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv) static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
{ {
int i = 0; int i = 0;
@@ -476,6 +515,17 @@ static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
char ip_str[20]; char ip_str[20];
unsigned long status = 0; unsigned long status = 0;
/*
* Sticky early-return when we've previously concluded the firmware
* doesn't honor PSM. Each attempt would otherwise burn 5s on a
* doomed wait_for_completion_timeout and produce a noisy three-line
* cascade in dmesg every time power_down_work retries (every
* ~10s). The chip stays in active mode, which on this firmware is
* the de-facto state anyway.
*/
if (hw_priv->bes_power.pm_unsupported)
return -EOPNOTSUPP;
/* set interface low power configuration */ /* set interface low power configuration */
bes2600_for_each_vif(hw_priv, priv, i) { bes2600_for_each_vif(hw_priv, priv, i) {
#ifdef P2P_MULTIVIF #ifdef P2P_MULTIVIF
@@ -524,7 +574,17 @@ static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
bes_devel("%s, psMode:%s, fastPsmIdlePeriod:%d apPsmChangePeriod:%d minAutoPsPollPeriod:%d\n", bes_devel("%s, psMode:%s, fastPsmIdlePeriod:%d apPsmChangePeriod:%d minAutoPsPollPeriod:%d\n",
__func__, bes2600_get_ps_mode_str(priv->powersave_mode.pmMode), priv->powersave_mode.fastPsmIdlePeriod, __func__, bes2600_get_ps_mode_str(priv->powersave_mode.pmMode), priv->powersave_mode.fastPsmIdlePeriod,
priv->powersave_mode.apPsmChangePeriod, priv->powersave_mode.minAutoPsPollPeriod); priv->powersave_mode.apPsmChangePeriod, priv->powersave_mode.minAutoPsPollPeriod);
/*
* Reinit BEFORE the WSM goes out, so a stale
* indication from a previous cycle cannot have
* primed pm_enter_cmpl. From here until the
* indication callback's cmpxchg(1->0) on
* pm_set_in_process, only the indication for
* THIS request can complete the wait.
*/
reinit_completion(&hw_priv->bes_power.pm_enter_cmpl);
atomic_set(&hw_priv->bes_power.pm_set_in_process, 1); atomic_set(&hw_priv->bes_power.pm_set_in_process, 1);
ret = bes2600_set_pm(priv, &priv->powersave_mode); ret = bes2600_set_pm(priv, &priv->powersave_mode);
if (ret) { if (ret) {
atomic_set(&hw_priv->bes_power.pm_set_in_process, 0); atomic_set(&hw_priv->bes_power.pm_set_in_process, 0);
@@ -535,11 +595,36 @@ static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
/* wait power save mode changed indication */ /* wait power save mode changed indication */
status = wait_for_completion_timeout(&hw_priv->bes_power.pm_enter_cmpl, 5 * HZ); status = wait_for_completion_timeout(&hw_priv->bes_power.pm_enter_cmpl, 5 * HZ);
atomic_set(&hw_priv->bes_power.pm_set_in_process, 0);
reinit_completion(&hw_priv->bes_power.pm_enter_cmpl);
if (!status) { if (!status) {
bes_devel("%s, wait pm ind timeout\n", __func__); /*
timeouts++; * The indication callback only fires
* complete() when it observes
* pm_set_in_process == 1; cmpxchg it
* to 0 here so a late indication
* cannot prime the next wait.
*
* If we win the cmpxchg, this is a
* real timeout: the firmware's PS
* state is unknown to us. Mark it as
* such so the next wake path can
* probe before assuming the chip is
* still active.
*
* If we lose the cmpxchg, the
* indication arrived between the
* wait timing out and us getting
* here; treat as success.
*/
if (atomic_cmpxchg(&hw_priv->bes_power.pm_set_in_process,
1, 0) == 1) {
bes_devel("%s, wait pm ind timeout\n", __func__);
atomic_set(&hw_priv->bes_power.chip_pm_state,
BES2600_CHIP_PM_UNKNOWN);
timeouts++;
if (++hw_priv->bes_power.pm_consecutive_timeouts
>= BES2600_PM_UNSUPPORTED_THRESHOLD)
bes2600_pwr_latch_pm_unsupported(hw_priv);
}
} }
} else { } else {
bes_devel("skip enter lp mode\n"); bes_devel("skip enter lp mode\n");
@@ -554,10 +639,35 @@ static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
* in an inconsistent state that cascades into SDIO TX errors on * in an inconsistent state that cascades into SDIO TX errors on
* the BES2600. * the BES2600.
*/ */
if (timeouts == 0) if (timeouts == 0) {
bes2600_pwr_device_enter_lp_mode(hw_priv); bes2600_pwr_device_enter_lp_mode(hw_priv);
else } else {
/*
* device_enter_lp_mode() was skipped (one or more VIFs
* timed out waiting for the firmware indication) so its
* gpio_sleep(MCU) - which drops the wake-flag bit and, if
* no other subsystem holds the wake, drives the GPIO low -
* never ran. Without it the bit stays asserted, and the
* next bes2600_pwr_device_exit_lp_mode() calls
* gpio_wake(MCU) into a "bit already set" no-op: the GPIO
* never re-edges, sbus_active() exhausts its 200x2ms
* MCU_WAKEUP_READY budget against an unwoken chip, and
* the first TX after idle stalls for several seconds.
*
* Drop the MCU wake-flag bit explicitly here so the next
* wake injects a real GPIO edge. gpio_allow_mcu_sleep
* preserves multi-subsystem semantics: it only drives the
* GPIO low when no other subsystem still holds wake; if
* BT or another holder is keeping the chip awake, the
* GPIO stays high and the bit clear here is purely
* bookkeeping (so the next gpio_wake doesn't no-op).
*/
if (!hw_priv->bes_power.pm_unsupported &&
hw_priv->sbus_ops->gpio_sleep)
hw_priv->sbus_ops->gpio_sleep(hw_priv->sbus_priv,
GPIO_WAKE_FLAG_MCU);
ret = -ETIMEDOUT; ret = -ETIMEDOUT;
}
return ret; return ret;
} }
@@ -565,19 +675,61 @@ static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
static void bes2600_pwr_device_exit_lp_mode(struct bes2600_common *hw_priv) static void bes2600_pwr_device_exit_lp_mode(struct bes2600_common *hw_priv)
{ {
int ret = 0; int ret = 0;
enum bes2600_chip_pm_state state;
struct wsm_operational_mode mode = { struct wsm_operational_mode mode = {
.power_mode = wsm_power_mode_active, .power_mode = wsm_power_mode_active,
.disableMoreFlagUsage = true, .disableMoreFlagUsage = true,
}; };
bes_devel("host lock lmac\n"); /*
if(hw_priv->sbus_ops->gpio_wake) * Consult chip_pm_state set by bes2600_pwr_notify_ps_changed().
hw_priv->sbus_ops->gpio_wake(hw_priv->sbus_priv, GPIO_WAKE_FLAG_MCU); * If we last saw the firmware confirm ACTIVE, skip ONLY the
* gpio_wake + sbus_active wake handshake - the GPIO is already
* asserted high and the SDIO MCU subsystem is already running,
* so another sbus_active() round-trip just hits its 200x2ms
* timeout because the firmware has nothing to do.
*
* wsm_set_operational_mode() below is NOT part of the wake
* handshake; it is the operational-mode setter the firmware
* tracks per call. Skipping it leaves the chip's SDIO state
* machine without a fresh operational-mode update, which on
* PineTab2 wedges the bus (-EBUSY on next sdio_rx_work read)
* within a few seconds of probe completion. So it must run
* unconditionally.
*/
state = atomic_read(&hw_priv->bes_power.chip_pm_state);
if (state == BES2600_CHIP_PM_ACTIVE) {
bes_devel("device_exit_lp_mode: chip already ACTIVE, skipping wake handshake\n");
} else {
bes_devel("host lock lmac\n");
if (hw_priv->sbus_ops->gpio_wake)
hw_priv->sbus_ops->gpio_wake(hw_priv->sbus_priv,
GPIO_WAKE_FLAG_MCU);
if(hw_priv->sbus_ops->sbus_active) { if (hw_priv->sbus_ops->sbus_active) {
ret = hw_priv->sbus_ops->sbus_active(hw_priv->sbus_priv, SUBSYSTEM_MCU); ret = hw_priv->sbus_ops->sbus_active(hw_priv->sbus_priv,
if (ret) SUBSYSTEM_MCU);
bes_err("%s, active mcu fail\n", __func__); if (ret) {
/*
* MCU_WAKEUP_READY did not arrive within
* the SDIO handshake window. Record state
* as UNKNOWN so the next exit_lp_mode call
* also runs the full wake sequence (no
* skip), but still send operational_mode
* below to match pre-c6 behaviour - the
* WSM may succeed even if the SDIO active
* confirm was lost, and if it fails too,
* we just emit a second devel-level error.
* Repeated UNKNOWN is the signal for the
* LMAC active-monitor to eventually
* escalate to bus_reset (c5.2's
* mmc_hw_reset path).
*/
bes_err("%s, active mcu fail\n", __func__);
atomic_set(&hw_priv->bes_power.chip_pm_state,
BES2600_CHIP_PM_UNKNOWN);
}
}
} }
ret = wsm_set_operational_mode(hw_priv, &mode, 0); ret = wsm_set_operational_mode(hw_priv, &mode, 0);
@@ -833,6 +985,9 @@ void bes2600_pwr_init(struct bes2600_common *hw_priv)
hw_priv->bes_power.power_up_task = NULL; hw_priv->bes_power.power_up_task = NULL;
mutex_init(&hw_priv->bes_power.pwr_mutex); mutex_init(&hw_priv->bes_power.pwr_mutex);
atomic_set(&hw_priv->bes_power.dev_state, 0); atomic_set(&hw_priv->bes_power.dev_state, 0);
atomic_set(&hw_priv->bes_power.chip_pm_state, BES2600_CHIP_PM_UNKNOWN);
hw_priv->bes_power.pm_unsupported = false;
hw_priv->bes_power.pm_consecutive_timeouts = 0;
init_completion(&hw_priv->bes_power.pm_enter_cmpl); init_completion(&hw_priv->bes_power.pm_enter_cmpl);
sema_init(&hw_priv->bes_power.sync_lock, 1); sema_init(&hw_priv->bes_power.sync_lock, 1);
device_set_wakeup_capable(hw_priv->pdev, true); device_set_wakeup_capable(hw_priv->pdev, true);
@@ -1213,9 +1368,40 @@ int bes2600_pwr_clear_busy_event(struct bes2600_common *hw_priv, u32 event)
void bes2600_pwr_notify_ps_changed(struct bes2600_common *hw_priv, u8 psmode) void bes2600_pwr_notify_ps_changed(struct bes2600_common *hw_priv, u8 psmode)
{ {
if((psmode & 0x01) != WSM_PSM_ACTIVE) { /*
bes_devel("complete pm_enter_cmpl\n"); * The firmware sends a PM-changed indication for every transition,
complete(&hw_priv->bes_power.pm_enter_cmpl); * including ones we didn't ask for (firmware-internal coex moves,
* idle-driven aging). Update chip_pm_state unconditionally so the
* wake path can use it, but only fire pm_enter_cmpl when a host-
* initiated set_pm is actually in flight - otherwise a stale
* indication can prime a future wait against a freshly
* reinit_completion()'ed state.
*/
/*
* Any PM indication, whatever its psmode, proves the firmware is
* actually emitting them. Reset the consecutive-timeout counter
* so a transient stall doesn't permanently disable PSM, and clear
* pm_unsupported if a previous run had latched it.
*/
hw_priv->bes_power.pm_consecutive_timeouts = 0;
if (hw_priv->bes_power.pm_unsupported) {
bes_warn("PM indication arrived after pm_unsupported was set; re-enabling PSM transitions\n");
hw_priv->bes_power.pm_unsupported = false;
}
if ((psmode & 0x01) != WSM_PSM_ACTIVE) {
atomic_set(&hw_priv->bes_power.chip_pm_state,
BES2600_CHIP_PM_LP);
if (atomic_cmpxchg(&hw_priv->bes_power.pm_set_in_process,
1, 0) == 1) {
bes_devel("complete pm_enter_cmpl\n");
complete(&hw_priv->bes_power.pm_enter_cmpl);
} else {
bes_devel("PM ind (LP) without pending wait; state recorded\n");
}
} else {
atomic_set(&hw_priv->bes_power.chip_pm_state,
BES2600_CHIP_PM_ACTIVE);
} }
} }
+24
View File
@@ -64,6 +64,20 @@ enum power_down_state
POWER_DOWN_STATE_UNLOCKED, POWER_DOWN_STATE_UNLOCKED,
}; };
/*
* Confirmed PM state of the firmware-side chip. Tracks what the host
* has *seen* the firmware acknowledge, not what the host has
* requested. UNKNOWN means a host-initiated transition timed out
* before the firmware indication arrived; the next wake path should
* treat it as "we don't know" and probe before issuing GPIO/SDIO
* wakeup ops.
*/
enum bes2600_chip_pm_state {
BES2600_CHIP_PM_ACTIVE = 0,
BES2600_CHIP_PM_LP,
BES2600_CHIP_PM_UNKNOWN,
};
typedef void (*bes_pwr_enter_lp_cb)(struct bes2600_common *hw_priv); typedef void (*bes_pwr_enter_lp_cb)(struct bes2600_common *hw_priv);
typedef void (*bes_pwr_exit_lp_cb)(struct bes2600_common *hw_priv); typedef void (*bes_pwr_exit_lp_cb)(struct bes2600_common *hw_priv);
@@ -106,6 +120,16 @@ struct bes2600_pwr_t
bool ap_lp_bad; bool ap_lp_bad;
struct bes2600_pwr_event_t pwr_events[BES2600_DELAY_EVENT_NUM]; struct bes2600_pwr_event_t pwr_events[BES2600_DELAY_EVENT_NUM];
atomic_t pm_set_in_process; atomic_t pm_set_in_process;
atomic_t chip_pm_state;
/*
* Sticky flag set after BES2600_PM_UNSUPPORTED_THRESHOLD
* consecutive enter_lp_mode timeouts with zero PM_INDICATIONs
* received from firmware. Indicates this chip's firmware does
* not honor host-driven PSM transitions; further attempts are
* skipped to avoid the 5s timeout cascade.
*/
bool pm_unsupported;
unsigned int pm_consecutive_timeouts;
}; };
#ifdef CONFIG_BES2600_WOWLAN #ifdef CONFIG_BES2600_WOWLAN
+72 -1
View File
@@ -14,11 +14,63 @@
#include "scan.h" #include "scan.h"
#include "sta.h" #include "sta.h"
#include "pm.h" #include "pm.h"
#include "epta_coex.h"
#include "epta_request.h" #include "epta_request.h"
#include "bes_pwr.h" #include "bes_pwr.h"
/*
* After this many consecutive WSM scan rejections from firmware, stop
* issuing new scans for BES2600_SCAN_BACKOFF_JIFFIES and let the state
* that's rejecting them (coex window, firmware-internal busy) clear.
*
* The backoff has to be at least as long as the natural mac80211 scan-
* retry cadence, otherwise the next attempt lands outside the window
* and bypasses the defer guard. Observed in the wild on PineTab2:
* roam-evaluation bursts at ~12 s cadence, idle background scans at
* ~5 min cadence. 30 s catches the burst and leaves the slow case
* alone (the firmware-policy state has had minutes to clear by then
* anyway).
*/
#define BES2600_SCAN_REJECT_THRESHOLD 3
#define BES2600_SCAN_BACKOFF_JIFFIES (30 * HZ)
static void bes2600_scan_restart_delayed(struct bes2600_vif *priv); static void bes2600_scan_restart_delayed(struct bes2600_vif *priv);
/*
* Decide whether to skip sending the next WSM scan command without
* bothering the firmware. Two triggers:
*
* 1. BT A2DP is streaming in non-FDD coex mode. The firmware is
* known to reject scan requests during that window; short-
* circuiting here saves a WSM round-trip and avoids the
* wsm_generic_confirm / scan_work warning cascade that follows.
*
* 2. We already saw >= BES2600_SCAN_REJECT_THRESHOLD consecutive
* rejections on recent scan attempts and the backoff window has
* not yet elapsed. Whatever was rejecting them is likely still
* rejecting them; give it time. If the backoff has elapsed without
* a fresh reject refreshing it, the burst is over and we reset the
* count so an isolated reject doesn't immediately re-trip.
*
* Returns true if the caller should abandon the scan iteration.
*/
static bool bes2600_scan_should_defer(struct bes2600_common *hw_priv)
{
#ifdef WIFI_BT_COEXIST_EPTA_ENABLE
if (!coex_is_fdd_mode() && coex_is_bt_a2dp())
return true;
#endif
if (time_after(jiffies, hw_priv->scan.backoff_until))
hw_priv->scan.reject_count = 0;
if (hw_priv->scan.reject_count >= BES2600_SCAN_REJECT_THRESHOLD &&
time_before(jiffies, hw_priv->scan.backoff_until))
return true;
return false;
}
#ifdef CONFIG_BES2600_TESTMODE #ifdef CONFIG_BES2600_TESTMODE
static int bes2600_advance_scan_start(struct bes2600_common *hw_priv) static int bes2600_advance_scan_start(struct bes2600_common *hw_priv)
{ {
@@ -702,10 +754,29 @@ void bes2600_scan_work(struct work_struct *work)
wsm_unlock_tx(hw_priv); wsm_unlock_tx(hw_priv);
} else } else
#endif #endif
{
if (bes2600_scan_should_defer(hw_priv)) {
hw_priv->scan.status = -EBUSY;
hw_priv->scan.reject_count++;
hw_priv->scan.backoff_until =
jiffies + BES2600_SCAN_BACKOFF_JIFFIES;
wiphy_dbg(priv->hw->wiphy,
"[SCAN] deferred (coex/backoff, reject_count=%u)\n",
hw_priv->scan.reject_count);
kfree(scan.ch);
goto fail;
}
hw_priv->scan.status = bes2600_scan_start(priv, &scan); hw_priv->scan.status = bes2600_scan_start(priv, &scan);
}
kfree(scan.ch); kfree(scan.ch);
if (WARN_ON(hw_priv->scan.status)) if (hw_priv->scan.status) {
hw_priv->scan.reject_count++;
hw_priv->scan.backoff_until =
jiffies + BES2600_SCAN_BACKOFF_JIFFIES;
/* Lower callers already logged the reason at wiphy_warn. */
goto fail; goto fail;
}
hw_priv->scan.reject_count = 0;
hw_priv->scan.curr = it; hw_priv->scan.curr = it;
} }
up(&hw_priv->conf_lock); up(&hw_priv->conf_lock);
+11
View File
@@ -42,6 +42,17 @@ struct bes2600_scan {
struct delayed_work probe_work; struct delayed_work probe_work;
int direct_probe; int direct_probe;
u8 if_id; u8 if_id;
/*
* Track consecutive firmware-side WSM scan rejections so we can
* back off briefly instead of re-issuing the same scan on every
* mac80211 background-scan tick. Firmware returns WSM status != 0
* for a handful of transient conditions (BT A2DP active in non-
* FDD coex, firmware-internal busy windows) and keeps rejecting
* until the state clears; retrying at full cadence just floods
* dmesg.
*/
unsigned int reject_count;
unsigned long backoff_until;
}; };
int bes2600_hw_scan(struct ieee80211_hw *hw, int bes2600_hw_scan(struct ieee80211_hw *hw,
+13 -1
View File
@@ -134,8 +134,20 @@ static int wsm_generic_confirm(struct bes2600_common *hw_priv,
struct wsm_buf *buf) struct wsm_buf *buf)
{ {
u32 status = WSM_GET32(buf); u32 status = WSM_GET32(buf);
if (WARN(status != WSM_STATUS_SUCCESS, "wsm_generic_confirm ret %u", status))
/*
* A non-SUCCESS status here is a firmware-side policy decision for
* the command whose confirm this is -- commonly WSM status 2 for
* scan (0x0407) rejected because of a coex window or transient
* firmware-busy state. It is not a driver/kernel bug, so avoid the
* WARN()/stack-trace treatment; the caller already emits a
* wiphy_warn identifying the request id and will propagate the
* error to mac80211.
*/
if (status != WSM_STATUS_SUCCESS) {
bes_devel("%s ret %u\n", __func__, status);
return -EINVAL; return -EINVAL;
}
return 0; return 0;
underflow: underflow: