Compare commits

..

1 Commits

Author SHA1 Message Date
test0r 4604958033 bes2600: recover wedged firmware via mmc_hw_reset on link break
When the LMAC active monitor detects 'link break between lmac and host'
(the hw_buf_used==pending watchdog in bes2600_bh_lmac_active_monitor),
bes2600_chrdev_wifi_force_close(hw_priv, true) is invoked to tear the
device down and prepare for a fresh probe. On the wifi_force_close_work
side this calls bes2600_chrdev_do_system_close() which dispatches
sbus_ops->power_switch(0).

On PineTab2 (RK3566 + BES2600WM over SDIO) this recovery path is a
no-op:

  * bes2600_sdio_power_down() writes a SYSTEM_CLOSE host-int message,
    clears MMC_CAP_NONREMOVABLE, and schedules sdio_scan_work, which is
    the literal one-line stub bes_warn("...this function does
    nothing\n").
  * bes2600_sdio_on() (the eventual power_switch(1) counterpart)
    toggles pdata->powerup, which is NULL on PineTab2 because the
    wifi-reset GPIO is owned by sdio_pwrseq, not the bes2600 device
    tree node (see arch/arm64/boot/dts/rockchip/rk3566-pinetab2.dtsi:
    'The reset pin is claimed by sdio_mmcseq, It is better to move it
    to U-Boot so the OS can use it.').

Net result: the chip is never reset. The function drivers are not
removed (the SDIO core has no signal that the card is gone), the
firmware stays wedged, and a subsequent rmmod bes2600 leaves the SDIO
function in a half-torn-down state. modprobe bes2600 then fails with
'probe with driver bes2600_wlan failed with error -123' (-ENOMEDIUM)
on both functions (:1 wifi, :2 BT-companion) until a full system
reboot.

Observed on PineTab2 (linux-pinetab2 6.19.10-danctnix1-1) after ~150
minutes of background-scan rejects (wsm_generic_confirm 0x0007,
[SCAN] Scan failed (-22)) accumulating until the LMAC stopped
acknowledging TX buffers (hw_buf_used:24 pending:24). Reproducible
under sustained scan pressure.

Add a sbus operation bus_reset() that the recovery path can call when
power_switch() has no effective chip-reset signal of its own. Provide
an SDIO implementation that calls mmc_hw_reset(self->func->card),
which on a multi-function SDIO card (PineTab2 binds func 1 for WLAN
and func 2 for the BT-companion path) takes the remove-and-rescan
path: mmc_sdio_hw_reset() marks the card removed and schedules
mmc_rescan, which tears down the bound function drivers and re-detects
the card on the next sweep, in turn reinvoking bes2600_sdio_probe().
With a single function probed it instead invokes mmc_power_cycle()
directly, which on PineTab2 toggles the wifi-reset GPIO via
sdio_pwrseq.

Add bes2600_chrdev_do_bus_reset() as the chrdev-side helper. It
invokes the bus op and then waits on probe_done_wq for the SDIO
remove() callback to clear sbus_priv, mirroring the wait pattern
already used by bes2600_chrdev_do_system_close() so that a subsequent
bes2600_switch_wifi(true) sees a clean state and can wait on the
fresh probe.

Wire it into bes2600_chrdev_wifi_force_close_work(): when halt_dev is
set (the hard-exception path used by both
bes2600_bh_lmac_active_monitor and bes2600_bh_mcu_active_monitor) and
the underlying bus implements bus_reset, take the new recovery path;
otherwise fall back to the legacy power_switch(0) sequence so this
patch is a no-op on USB or any other future bus that does not provide
bus_reset.

mmc_hw_reset() is exported by the MMC core and is the canonical
recovery primitive; calling it without holding the SDIO host claim is
correct because the multi-func remove-and-rescan path acquires the
host claim via the mmc workqueue, and the single-func mmc_power_cycle
path does not require the host claim.

No DT change is required: this works against the existing PineTab2
DTS, where the wifi-reset GPIO and the optional sdio_pwrkey GPIO (on
v2.0 boards) are both already configured as MMC pwrseq resets.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-04-26 22:32:29 +02:00
7 changed files with 106 additions and 117 deletions
+29
View File
@@ -16,6 +16,7 @@
#include <linux/mmc/host.h>
#include <linux/mmc/sdio_func.h>
#include <linux/mmc/card.h>
#include <linux/mmc/core.h>
#include <linux/mmc/sdio.h>
#include <linux/spinlock.h>
#include <net/mac80211.h>
@@ -1777,6 +1778,33 @@ static void bes2600_sdio_halt_device(struct sbus_priv *self)
sdio_work_debug(self);
}
/*
* Trigger an SDIO bus reset via mmc_hw_reset().
*
* With multiple SDIO functions probed (PineTab2 binds func 1 for WLAN and
* func 2 for the BT-companion path) mmc_sdio_hw_reset() takes the
* remove-and-rescan path: it marks the card removed and schedules
* mmc_rescan, which tears down the bound function drivers and re-detects
* the card on the next sweep, in turn reinvoking bes2600_sdio_probe().
*
* With a single function probed it instead invokes mmc_power_cycle()
* directly, which on PineTab2 toggles the wifi-reset GPIO via sdio_pwrseq.
*
* In both cases the chip ends up in a freshly reset state, which is the
* goal of the recovery path.
*
* mmc_hw_reset() must be called without holding the SDIO host claim --
* the multi-func remove-and-rescan path acquires the host claim via the
* mmc workqueue.
*/
static int bes2600_sdio_bus_reset(struct sbus_priv *self)
{
if (!self || !self->func || !self->func->card)
return -EINVAL;
return mmc_hw_reset(self->func->card);
}
static bool bes2600_sdio_wakeup_source(struct sbus_priv *self)
{
struct bes2600_platform_data_sdio *pdata = bes2600_get_platform_data();
@@ -1815,6 +1843,7 @@ static struct sbus_ops bes2600_sdio_sbus_ops = {
.gpio_sleep = bes2600_gpio_allow_mcu_sleep,
.halt_device = bes2600_sdio_halt_device,
.wakeup_source = bes2600_sdio_wakeup_source,
.bus_reset = bes2600_sdio_bus_reset,
};
static void bes2600_sdio_en_lp_cb(struct bes2600_common *hw_priv)
+57 -2
View File
@@ -442,6 +442,48 @@ int bes2600_chrdev_do_system_close(const struct sbus_ops *sbus_ops, struct sbus_
return ret;
}
/*
* Hard-reset the bus and wait for the bus core to remove the chip.
*
* Used by the firmware-wedge recovery path on platforms where the normal
* power_switch(0) sequence has no effective chip-reset signal. The bus
* implementation triggers an asynchronous re-detect; this helper waits for
* the resulting remove() callback to clear bes2600_cdev.sbus_priv so that a
* subsequent bes2600_switch_wifi(true) sees a clean state and can wait on
* the fresh probe.
*/
int bes2600_chrdev_do_bus_reset(const struct sbus_ops *sbus_ops, struct sbus_priv *priv)
{
int ret;
long status;
if (!sbus_ops || !priv)
return -EINVAL;
if (!sbus_ops->bus_reset)
return -EOPNOTSUPP;
bes_info("trigger bus reset to recover wedged firmware.\n");
ret = sbus_ops->bus_reset(priv);
if (ret) {
bes_err("bus_reset failed: %d\n", ret);
return ret;
}
/*
* The bus reset is asynchronous: the bus core schedules a rescan
* which removes the bound function drivers and then re-detects the
* chip. Wait for the remove callback to clear sbus_priv. Do not
* dereference 'priv' after this point -- it may already be freed.
*/
status = wait_event_timeout(bes2600_cdev.probe_done_wq,
!bes2600_cdev.sbus_priv, HZ * 3);
WARN_ON(status <= 0);
return 0;
}
bool bes2600_chrdev_is_wifi_opened(void)
{
bool wifi_opened = false;
@@ -540,8 +582,21 @@ static void bes2600_chrdev_wifi_force_close_work(struct work_struct *work)
/* unregister wifi */
bes2600_switch_wifi(0);
/* power down device if wifi is only opened */
if (bes2600_chrdev_check_system_close()) {
/*
* Hard exception with a bus_reset implementation: tear the
* bus down via mmc_hw_reset() (or equivalent) so the next
* bringup probes a freshly reset chip. On PineTab2 this is
* the only effective recovery path -- the existing
* power_switch(0)/(1) sequence has no chip-reset signal of
* its own (sdio_pwrseq owns wifi_reset).
*
* Soft close, or hard close on a board without bus_reset:
* fall back to the legacy power_switch(0) sequence.
*/
if (bes2600_cdev.halt_dev && bes2600_cdev.sbus_ops->bus_reset) {
bes2600_chrdev_do_bus_reset(bes2600_cdev.sbus_ops,
bes2600_cdev.sbus_priv);
} else if (bes2600_chrdev_check_system_close()) {
bes2600_chrdev_do_system_close(bes2600_cdev.sbus_ops,
bes2600_cdev.sbus_priv);
}
+1
View File
@@ -60,6 +60,7 @@ struct sbus_priv *bes2600_chrdev_get_sbus_priv_data(void);
/* used to control device power down */
int bes2600_chrdev_check_system_close(void);
int bes2600_chrdev_do_system_close(const struct sbus_ops *sbus_ops, struct sbus_priv *priv);
int bes2600_chrdev_do_bus_reset(const struct sbus_ops *sbus_ops, struct sbus_priv *priv);
void bes2600_chrdev_wakeup_bt(void);
void bes2600_chrdev_wifi_force_close(struct bes2600_common *hw_priv, bool halt_dev);
void bes2600_chrdev_usb_remove(struct bes2600_common *hw_priv);
+9 -85
View File
@@ -524,17 +524,7 @@ static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
bes_devel("%s, psMode:%s, fastPsmIdlePeriod:%d apPsmChangePeriod:%d minAutoPsPollPeriod:%d\n",
__func__, bes2600_get_ps_mode_str(priv->powersave_mode.pmMode), priv->powersave_mode.fastPsmIdlePeriod,
priv->powersave_mode.apPsmChangePeriod, priv->powersave_mode.minAutoPsPollPeriod);
/*
* Reinit BEFORE the WSM goes out, so a stale
* indication from a previous cycle cannot have
* primed pm_enter_cmpl. From here until the
* indication callback's cmpxchg(1->0) on
* pm_set_in_process, only the indication for
* THIS request can complete the wait.
*/
reinit_completion(&hw_priv->bes_power.pm_enter_cmpl);
atomic_set(&hw_priv->bes_power.pm_set_in_process, 1);
ret = bes2600_set_pm(priv, &priv->powersave_mode);
if (ret) {
atomic_set(&hw_priv->bes_power.pm_set_in_process, 0);
@@ -545,33 +535,11 @@ static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
/* wait power save mode changed indication */
status = wait_for_completion_timeout(&hw_priv->bes_power.pm_enter_cmpl, 5 * HZ);
atomic_set(&hw_priv->bes_power.pm_set_in_process, 0);
reinit_completion(&hw_priv->bes_power.pm_enter_cmpl);
if (!status) {
/*
* The indication callback only fires
* complete() when it observes
* pm_set_in_process == 1; cmpxchg it
* to 0 here so a late indication
* cannot prime the next wait.
*
* If we win the cmpxchg, this is a
* real timeout: the firmware's PS
* state is unknown to us. Mark it as
* such so the next wake path can
* probe before assuming the chip is
* still active.
*
* If we lose the cmpxchg, the
* indication arrived between the
* wait timing out and us getting
* here; treat as success.
*/
if (atomic_cmpxchg(&hw_priv->bes_power.pm_set_in_process,
1, 0) == 1) {
bes_devel("%s, wait pm ind timeout\n", __func__);
atomic_set(&hw_priv->bes_power.chip_pm_state,
BES2600_CHIP_PM_UNKNOWN);
timeouts++;
}
bes_devel("%s, wait pm ind timeout\n", __func__);
timeouts++;
}
} else {
bes_devel("skip enter lp mode\n");
@@ -586,34 +554,10 @@ static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
* in an inconsistent state that cascades into SDIO TX errors on
* the BES2600.
*/
if (timeouts == 0) {
if (timeouts == 0)
bes2600_pwr_device_enter_lp_mode(hw_priv);
} else {
/*
* device_enter_lp_mode() was skipped (one or more VIFs
* timed out waiting for the firmware indication) so its
* gpio_sleep(MCU) - which drops the wake-flag bit and, if
* no other subsystem holds the wake, drives the GPIO low -
* never ran. Without it the bit stays asserted, and the
* next bes2600_pwr_device_exit_lp_mode() calls
* gpio_wake(MCU) into a "bit already set" no-op: the GPIO
* never re-edges, sbus_active() exhausts its 200x2ms
* MCU_WAKEUP_READY budget against an unwoken chip, and
* the first TX after idle stalls for several seconds.
*
* Drop the MCU wake-flag bit explicitly here so the next
* wake injects a real GPIO edge. gpio_allow_mcu_sleep
* preserves multi-subsystem semantics: it only drives the
* GPIO low when no other subsystem still holds wake; if
* BT or another holder is keeping the chip awake, the
* GPIO stays high and the bit clear here is purely
* bookkeeping (so the next gpio_wake doesn't no-op).
*/
if (hw_priv->sbus_ops->gpio_sleep)
hw_priv->sbus_ops->gpio_sleep(hw_priv->sbus_priv,
GPIO_WAKE_FLAG_MCU);
else
ret = -ETIMEDOUT;
}
return ret;
}
@@ -889,7 +833,6 @@ void bes2600_pwr_init(struct bes2600_common *hw_priv)
hw_priv->bes_power.power_up_task = NULL;
mutex_init(&hw_priv->bes_power.pwr_mutex);
atomic_set(&hw_priv->bes_power.dev_state, 0);
atomic_set(&hw_priv->bes_power.chip_pm_state, BES2600_CHIP_PM_UNKNOWN);
init_completion(&hw_priv->bes_power.pm_enter_cmpl);
sema_init(&hw_priv->bes_power.sync_lock, 1);
device_set_wakeup_capable(hw_priv->pdev, true);
@@ -1270,28 +1213,9 @@ int bes2600_pwr_clear_busy_event(struct bes2600_common *hw_priv, u32 event)
void bes2600_pwr_notify_ps_changed(struct bes2600_common *hw_priv, u8 psmode)
{
/*
* The firmware sends a PM-changed indication for every transition,
* including ones we didn't ask for (firmware-internal coex moves,
* idle-driven aging). Update chip_pm_state unconditionally so the
* wake path can use it, but only fire pm_enter_cmpl when a host-
* initiated set_pm is actually in flight - otherwise a stale
* indication can prime a future wait against a freshly
* reinit_completion()'ed state.
*/
if ((psmode & 0x01) != WSM_PSM_ACTIVE) {
atomic_set(&hw_priv->bes_power.chip_pm_state,
BES2600_CHIP_PM_LP);
if (atomic_cmpxchg(&hw_priv->bes_power.pm_set_in_process,
1, 0) == 1) {
bes_devel("complete pm_enter_cmpl\n");
complete(&hw_priv->bes_power.pm_enter_cmpl);
} else {
bes_devel("PM ind (LP) without pending wait; state recorded\n");
}
} else {
atomic_set(&hw_priv->bes_power.chip_pm_state,
BES2600_CHIP_PM_ACTIVE);
if((psmode & 0x01) != WSM_PSM_ACTIVE) {
bes_devel("complete pm_enter_cmpl\n");
complete(&hw_priv->bes_power.pm_enter_cmpl);
}
}
-15
View File
@@ -64,20 +64,6 @@ enum power_down_state
POWER_DOWN_STATE_UNLOCKED,
};
/*
* Confirmed PM state of the firmware-side chip. Tracks what the host
* has *seen* the firmware acknowledge, not what the host has
* requested. UNKNOWN means a host-initiated transition timed out
* before the firmware indication arrived; the next wake path should
* treat it as "we don't know" and probe before issuing GPIO/SDIO
* wakeup ops.
*/
enum bes2600_chip_pm_state {
BES2600_CHIP_PM_ACTIVE = 0,
BES2600_CHIP_PM_LP,
BES2600_CHIP_PM_UNKNOWN,
};
typedef void (*bes_pwr_enter_lp_cb)(struct bes2600_common *hw_priv);
typedef void (*bes_pwr_exit_lp_cb)(struct bes2600_common *hw_priv);
@@ -120,7 +106,6 @@ struct bes2600_pwr_t
bool ap_lp_bad;
struct bes2600_pwr_event_t pwr_events[BES2600_DELAY_EVENT_NUM];
atomic_t pm_set_in_process;
atomic_t chip_pm_state;
};
#ifdef CONFIG_BES2600_WOWLAN
+8
View File
@@ -75,6 +75,14 @@ struct sbus_ops {
void (*halt_device)(struct sbus_priv *self);
bool (*wakeup_source)(struct sbus_priv *self);
int (*reboot)(struct sbus_priv *self);
/*
* Force the host bus to re-detect and re-probe the chip. Called
* from the firmware-wedge recovery path when power_switch() has no
* effective chip-reset signal of its own (e.g. PineTab2, where the
* wifi-reset GPIO is owned by sdio_pwrseq, not the bes2600 node).
* Returns 0 on success or a negative errno.
*/
int (*bus_reset)(struct sbus_priv *self);
};
void bes2600_irq_handler(struct bes2600_common *priv);
+2 -15
View File
@@ -22,17 +22,9 @@
* After this many consecutive WSM scan rejections from firmware, stop
* issuing new scans for BES2600_SCAN_BACKOFF_JIFFIES and let the state
* that's rejecting them (coex window, firmware-internal busy) clear.
*
* The backoff has to be at least as long as the natural mac80211 scan-
* retry cadence, otherwise the next attempt lands outside the window
* and bypasses the defer guard. Observed in the wild on PineTab2:
* roam-evaluation bursts at ~12 s cadence, idle background scans at
* ~5 min cadence. 30 s catches the burst and leaves the slow case
* alone (the firmware-policy state has had minutes to clear by then
* anyway).
*/
#define BES2600_SCAN_REJECT_THRESHOLD 3
#define BES2600_SCAN_BACKOFF_JIFFIES (30 * HZ)
#define BES2600_SCAN_BACKOFF_JIFFIES (10 * HZ)
static void bes2600_scan_restart_delayed(struct bes2600_vif *priv);
@@ -48,9 +40,7 @@ static void bes2600_scan_restart_delayed(struct bes2600_vif *priv);
* 2. We already saw >= BES2600_SCAN_REJECT_THRESHOLD consecutive
* rejections on recent scan attempts and the backoff window has
* not yet elapsed. Whatever was rejecting them is likely still
* rejecting them; give it time. If the backoff has elapsed without
* a fresh reject refreshing it, the burst is over and we reset the
* count so an isolated reject doesn't immediately re-trip.
* rejecting them; give it time.
*
* Returns true if the caller should abandon the scan iteration.
*/
@@ -61,9 +51,6 @@ static bool bes2600_scan_should_defer(struct bes2600_common *hw_priv)
return true;
#endif
if (time_after(jiffies, hw_priv->scan.backoff_until))
hw_priv->scan.reject_count = 0;
if (hw_priv->scan.reject_count >= BES2600_SCAN_REJECT_THRESHOLD &&
time_before(jiffies, hw_priv->scan.backoff_until))
return true;