737f28e29c4b8253939e24b1d6b97d5605bb7ac4
6 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
dc1505f5ba |
bes2600: self-detect when firmware does not honor PSM and skip the cycle
The c6 series fixed several host-side bookkeeping bugs around PSM
transitions, but didn't address the underlying contract: this chip's
firmware (BES2600 with the Bestechnic Dec 2023 build that ships on
PineTab2 and most danctnix images) silently drops every WSM_set_pm
request without emitting the corresponding PM_INDICATION. The driver's
own power_down_work delayed work calls bes2600_pwr_enter_lp_mode every
~10s; without firmware acknowledgment each call burns 5s on
wait_for_completion_timeout(pm_enter_cmpl, 5*HZ) and produces a
recurring three-line cascade in dmesg:
bes2600_pwr_enter_lp_mode, wait pm ind timeout
bes2600_sdio_active failed, subsys:0
bes2600_pwr_device_exit_lp_mode, active mcu fail
Confirmed by tripwire instrumentation on PineTab2 (linux-pinetab2
6.19.10-danctnix1, ohm) running the c5+c6 stack: zero
wsm_set_pm_indication() invocations across an entire boot, while
bes2600_pwr_enter_lp_mode timed out repeatedly, and
bes2600_sdio_active() consistently saw BES_SLAVE_STATUS_REG_ID return
0x2f (every "ready" bit set except MCU_WAKEUP_READY (bit 4) - the
firmware reports "I'm awake, there's nothing to wake from").
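The 0x2f reading can be checked bit by bit. This is a minimal sketch that assumes MCU_WAKEUP_READY sits at bit 4, as stated above. The macro and helper names are illustrative and not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit position taken from the text above; the macro name is illustrative. */
#define DEMO_MCU_WAKEUP_READY (1u << 4)

/* Returns true when the wakeup-ready bit is set in a status byte. */
static bool demo_mcu_wakeup_ready(uint8_t status)
{
	return (status & DEMO_MCU_WAKEUP_READY) != 0;
}
```

0x2f is 0b00101111, so bits 0-3 and 5 are set while bit 4 is clear, and demo_mcu_wakeup_ready(0x2f) returns false.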
This patch makes the driver self-heal:
* struct bes2600_pwr_t gains pm_unsupported (bool) and
pm_consecutive_timeouts (unsigned int). Both initialised to
0/false.
* bes2600_pwr_enter_lp_mode early-returns -EOPNOTSUPP when
pm_unsupported is set. Skips the per-VIF set_pm round-trip and
the wait_for_completion entirely.
* On the cmpxchg-success branch of the timeout path, we increment
pm_consecutive_timeouts. When it crosses
BES2600_PM_UNSUPPORTED_THRESHOLD (3, ~15s of trying), we latch
pm_unsupported = true and force chip_pm_state = ACTIVE so that
bes2600_pwr_device_exit_lp_mode's c6.2 skip branch covers the
wake side (no gpio_wake / sbus_active / WSM_set_operational_mode
reissue past the first one).
* bes2600_pwr_notify_ps_changed resets pm_consecutive_timeouts to 0
on any incoming PM indication, and clears pm_unsupported if it
was previously latched. So a firmware update that fixes PM_IND
delivery automatically re-enables PSM transitions without a
driver rebuild.
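The latch described in the bullets above can be modelled in userspace roughly as follows. This is a sketch, not the driver code: the demo_ names are hypothetical, fw_acked stands in for the firmware delivering a PM indication before the wait expires, and latching on the third consecutive timeout is an assumption about what "crosses the threshold" means.

```c
#include <errno.h>
#include <stdbool.h>

#define DEMO_PM_UNSUPPORTED_THRESHOLD 3 /* value from the text above */

struct demo_pwr {
	bool pm_unsupported;                  /* latched after repeated timeouts */
	unsigned int pm_consecutive_timeouts; /* reset by any PM indication */
};

/* Models the indication handler: any indication resets and unlatches. */
static void demo_notify_ps_changed(struct demo_pwr *p)
{
	p->pm_consecutive_timeouts = 0;
	p->pm_unsupported = false;
}

/* Models one enter-LP attempt against a firmware that may or may not ack. */
static int demo_enter_lp_mode(struct demo_pwr *p, bool fw_acked)
{
	if (p->pm_unsupported)
		return -EOPNOTSUPP; /* skip set_pm and the wait entirely */

	if (fw_acked) {
		demo_notify_ps_changed(p);
		return 0;
	}

	/* Assumption: the latch trips on the 3rd consecutive timeout. */
	if (++p->pm_consecutive_timeouts >= DEMO_PM_UNSUPPORTED_THRESHOLD)
		p->pm_unsupported = true;
	return -ETIMEDOUT;
}
```

With a firmware that never acknowledges, the first three calls return -ETIMEDOUT and every later call returns -EOPNOTSUPP without waiting, until an indication arrives and clears the latch.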
mac80211's PSM requests via bes2600_set_pm() still flow to the
firmware unchanged; they just don't have host-side timeouts so they
remain silent regardless of firmware acknowledgment. Power
consumption goes up if the firmware actually CAN do PSM (we'd be
keeping the chip awake unnecessarily), but on a chip where the
counter trips this trade-off is forced anyway: the chip stayed awake
under the broken cascade as well, just with constant SDIO churn.
Net effect on dmesg: after ~15s of boot, the three-line cascade stops
firing entirely. The firmware-side wedge is observed once per boot
(captured by the pm_unsupported latch) instead of per-cycle.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||
|
|
7a65dc374c |
bes2600: short-circuit wake handshake when chip is confirmed ACTIVE
The previous patch ("bes2600: gate PM indication completion on pending
request and track chip state") added enum bes2600_chip_pm_state and the
chip_pm_state field tracking what the host has *seen the firmware
confirm*. This patch makes the wake side use it.
Without this, every bes2600_pwr_device_exit_lp_mode() unconditionally
runs gpio_wake() + sbus_active() + wsm_set_operational_mode(active),
even when the chip is already in confirmed-ACTIVE state and the wake
sequence has nothing to do. The visible failure mode on PineTab2:
bes2600_pwr_enter_lp_mode, wait pm ind timeout
repeat set gpio_wake_flag, sub_sys:0
bes2600_sdio_active failed, subsys:0
bes2600_pwr_device_exit_lp_mode, active mcu fail
cycling every ~9 s, ~22 cycles in 10 minutes. Five steps:
1. enter_lp_mode timed out (firmware indication lost). With c6.1,
chip_pm_state is now UNKNOWN.
2. lock_device fires exit_lp_mode.
3. gpio_wake hits "bit already set" because device_enter_lp_mode
was skipped when the indication timed out, so gpio_sleep was
never called - the bit reflects driver intent, not chip state.
gpio_wake silently no-ops (no GPIO edge), bit stays set.
4. sbus_active spends 200 x 2 ms looking for MCU_WAKEUP_READY that
never comes (firmware was never told to wake), then fails.
5. Driver continues to wsm_set_operational_mode against the wedged
bus, compounding the failure.
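Step 4 is a bounded poll. The sketch below models it with a status callback so it can run outside the kernel. The demo_ names are hypothetical, the bit position (bit 4) and the 0x2f stub value are taken from the status decode reported elsewhere in this series, and the 2 ms delay between reads is left out so the model stays deterministic.

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_POLL_TRIES 200            /* retry count from the text above */
#define DEMO_WAKEUP_READY_BIT (1u << 4) /* assumed bit position */

typedef uint8_t (*demo_read_status_fn)(void *ctx);

/*
 * Polls a status source until the wakeup-ready bit appears or the retry
 * budget is spent. The sleep between reads is intentionally omitted.
 */
static int demo_wait_mcu_ready(demo_read_status_fn read_status, void *ctx)
{
	for (int i = 0; i < DEMO_POLL_TRIES; i++) {
		if (read_status(ctx) & DEMO_WAKEUP_READY_BIT)
			return 0;
	}
	return -ETIMEDOUT;
}

/* Stub that always reports 0x2f and counts how often it was read. */
static uint8_t demo_stuck_status(void *ctx)
{
	(*(unsigned int *)ctx)++;
	return 0x2f;
}
```

With 200 tries at 2 ms each, a wake attempt against a chip that never raises the bit costs about 400 ms of sleeping before it fails, not counting bus access time.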
This patch's three moves:
* bes2600_pwr_device_exit_lp_mode() reads chip_pm_state at entry.
On BES2600_CHIP_PM_ACTIVE, log at devel level and return without
touching gpio_wake / sbus_active / WSM. The chip is in the state
we want; the handshake exists only to drive a transition.
* On BES2600_CHIP_PM_LP or BES2600_CHIP_PM_UNKNOWN, run the wake
handshake as before, but on sbus_active() failure: set
chip_pm_state = UNKNOWN, log once at err level, and bail out.
Do NOT call wsm_set_operational_mode over a wedged bus - it
would just emit a second error and leave the chip in an even
less defined state.
* bes2600_gpio_wakeup_mcu() / bes2600_gpio_allow_mcu_sleep():
demote "repeat set/clear gpio_wake_flag" from bes_err to
bes_devel. Multi-subsystem wake-hold (e.g. WIFI + BT both want
MCU awake) is the steady-state case, and the symmetric clear
while bit-already-clear is racy bookkeeping rather than a
hardware error. The wake-side branch that emits this log line
now also updates the bit, so the per-subsystem bookkeeping stays
accurate. This fixes a minor pre-existing leak where a repeat
call from an existing holder did not bump the bit (harmless
today since BIT(flag) is 1, but relevant if the structure ever
grows to per-flag refcounts).
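The first two moves can be modelled as a small state machine. This is a hedged sketch rather than the driver code: the demo_ names are hypothetical, sbus_ok stands in for the result of the bus activation step, the -EIO return value is an assumption, and marking the chip ACTIVE after a successful handshake is also an assumption (the text only says what happens on failure).

```c
#include <errno.h>
#include <stdbool.h>

enum demo_chip_pm_state {
	DEMO_CHIP_PM_ACTIVE,
	DEMO_CHIP_PM_LP,
	DEMO_CHIP_PM_UNKNOWN,
};

struct demo_wake_ctx {
	enum demo_chip_pm_state chip_pm_state;
	unsigned int handshakes; /* gpio_wake + sbus_active attempts */
	unsigned int wsm_calls;  /* set_operational_mode requests sent */
};

/* Models the wake path for one exit-LP call. */
static int demo_exit_lp_mode(struct demo_wake_ctx *c, bool sbus_ok)
{
	if (c->chip_pm_state == DEMO_CHIP_PM_ACTIVE)
		return 0; /* already in the wanted state: no handshake */

	c->handshakes++;
	if (!sbus_ok) {
		c->chip_pm_state = DEMO_CHIP_PM_UNKNOWN;
		return -EIO; /* bail out; no WSM over a wedged bus */
	}

	c->wsm_calls++;
	c->chip_pm_state = DEMO_CHIP_PM_ACTIVE; /* assumption, see lead-in */
	return 0;
}
```

In this model a confirmed-ACTIVE chip never touches the bus, and a failed bus activation produces exactly one failure with no follow-up WSM request.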
Net effect on the cycle:
* If chip is genuinely ACTIVE (chip_pm_state == ACTIVE), wake skips
cleanly. Storm goes silent.
* If chip is genuinely LP, behaviour is unchanged.
* If chip is UNKNOWN (post-timeout state), one wake attempt is
made; on failure, state stays UNKNOWN and we don't emit a
second cascade error per attempt. Repeated UNKNOWN with failed
wake will eventually be picked up by the LMAC active-monitor
and escalated to mmc_hw_reset (c5.2).
No new locks, no new state. Only consumption of the chip_pm_state
field added in the prerequisite patch.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||
|
|
40aec44a6e |
bes2600: gate PM indication completion on pending request and track chip state
When mac80211 toggles PSM on the BES2600, the host sends WSM set_pm
and waits up to 5 s on bes_power.pm_enter_cmpl for a firmware-side
PM-changed indication confirming the transition. Three sequenced
flaws make the wait-and-confirm racy and leave host/chip bookkeeping
desynced when anything misfires:
1) bes2600_pwr_notify_ps_changed() unconditionally fires
complete(pm_enter_cmpl) for any non-active psmode. It does not
check whether a host-initiated set_pm is actually pending. A
spontaneous indication (firmware-internal coex move,
idle-driven aging) primes the completion, and the next host-
driven enter_lp_mode sees a false success on its first
wait_for_completion_timeout.
2) The wait/reinit ordering in bes2600_pwr_enter_lp_mode is
status = wait_for_completion_timeout(...);
atomic_set(pm_set_in_process, 0);
reinit_completion(...);
If an indication arrives between wait_for_completion_timeout
returning a non-zero status and reinit_completion, the next
enter_lp_mode iteration's wait can also see false success. The
reinit must happen *before* we start the new request, not
after handling the previous one.
3) On wait_pm_ind timeout, the driver returns -ETIMEDOUT and walks
away. It does not record that the firmware's actual PM state
is no longer known to the host. Subsequent wake paths
(gpio_wake / sbus_active) assume the chip is still active and
hit deterministic SDIO failures when the firmware has
transitioned anyway.
This patch is the safe-prerequisite half of a wider fix:
* bes_pwr.h gains enum bes2600_chip_pm_state {ACTIVE, LP, UNKNOWN}
and bes_power.chip_pm_state. Its job is to track what the host
has *seen the firmware confirm*, not what the host has
requested. Initialised to ACTIVE in bes2600_pwr_init().
* bes2600_pwr_notify_ps_changed() unconditionally updates
chip_pm_state on every indication, but only fires
complete(pm_enter_cmpl) when atomic_cmpxchg(pm_set_in_process,
1, 0) succeeds. A spontaneous indication can no longer prime a
waiter that will only set up its request afterwards.
* bes2600_pwr_enter_lp_mode() now reinit_completion()s before
setting pm_set_in_process and sending wsm_set_pm. After a
timeout, it cmpxchgs pm_set_in_process back to 0 (so a late
indication cannot prime the next iteration) and on the win-
cmpxchg branch records chip_pm_state=UNKNOWN.
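The gating can be illustrated with C11 atomics standing in for the kernel's atomic_cmpxchg(). This is a single-threaded model under stated assumptions: the demo_ names are hypothetical, and a plain bool replaces the kernel completion, so only the ordering and the 1 -> 0 exchange are shown, not real blocking.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct demo_pm {
	atomic_int pm_set_in_process; /* 1 while a host request is pending */
	bool completed;               /* stand-in for pm_enter_cmpl */
};

/* Host side: clear stale completion first, then mark the request pending. */
static void demo_begin_request(struct demo_pm *p)
{
	p->completed = false; /* reinit_completion() analogue */
	atomic_store(&p->pm_set_in_process, 1);
}

/* Firmware indication: complete only if it wins the 1 -> 0 exchange. */
static bool demo_notify_ps_changed(struct demo_pm *p)
{
	int expected = 1;

	if (!atomic_compare_exchange_strong(&p->pm_set_in_process, &expected, 0))
		return false; /* spontaneous indication: nobody is waiting */
	p->completed = true;
	return true;
}

/* Timeout path: true means the host won the exchange (state is UNKNOWN). */
static bool demo_timeout(struct demo_pm *p)
{
	int expected = 1;

	return atomic_compare_exchange_strong(&p->pm_set_in_process, &expected, 0);
}
```

Because both the indication and the timeout path race for the same 1 -> 0 transition, exactly one of them wins per request: a late indication after a timeout finds the flag already 0 and cannot prime the next wait.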
A follow-up patch consumes chip_pm_state on the wake side
(bes2600_pwr_device_exit_lp_mode + bes2600_gpio_wakeup_mcu) to fix
the deterministic "active mcu fail" cycle, which only becomes
fixable once this state is recorded. Splitting the work this way
keeps the lock-free race fix small and reviewable on its own.
No new locks, no behaviour change on the success path. Only the
recovery path (timeout + spontaneous indication) gains correctness.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||
|
|
894c502cd5 |
bes2600: demote 'wait pm ind timeout' from bes_err to bes_devel
bes2600_pwr_enter_lp_mode() logs 'wait pm ind timeout' at bes_err
level every time wait_for_completion_timeout() on the firmware's
PM-change indication returns 0. The preceding patch ('bes2600:
gate device LP-mode entry on successful per-VIF firmware
handshake') already handles this case correctly: the per-VIF
timeouts counter is incremented, the function returns
-ETIMEDOUT, and the device-side LP transition is skipped -- the
cascade into sdio_tx_work splats and [RX] Receive failure
messages is prevented.
The timeout itself is benign steady-state noise on the PineTab2
(BES2600WM). Firmware occasionally misses the 5 s PM-change
deadline when mac80211 flips power-save rapidly during
association or roaming; observed rate on a quiet, associated
ohm is roughly 3-10 events per 10 min of uptime, with no
user-visible effect. Keeping it at bes_err() level (== KERN_ERR,
priority 3) floods dmesg with what is already a handled
condition and makes real SDIO / PM errors harder to spot.
Demote to bes_devel() (== KERN_DEBUG gated on the driver's debug
flag). The gate in the caller is unchanged, so the downstream
suppression behaviour introduced by the earlier patch remains.
Real pathologies -- bes_err("set operation mode fail") on the
same path, and the timeouts != 0 / -ETIMEDOUT return consumed
by callers -- still surface at bes_err() / return-value level.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||
|
|
e8550e55fc |
bes2600: gate device LP-mode entry on successful per-VIF firmware handshake
bes2600_pwr_enter_lp_mode() drives the transition to low-power for each
associated STA VIF: it pushes wsm_set_pm(), waits up to 5 seconds on
pm_enter_cmpl for the firmware to acknowledge, then unconditionally
calls bes2600_pwr_device_enter_lp_mode() to drop the device end of the
bus.
Two bugs:
1. A failed wsm_set_pm() only logs an error, then still falls into
wait_for_completion_timeout() on a completion the firmware will
never post (the set-mode command never reached it). The loop
therefore always blocks the full 5 s, logs a second error, and
proceeds.
2. A genuine wait-timeout (firmware received the set-mode command but
never posted the indication) also only logs a warning. The code
then drops to bes2600_pwr_device_enter_lp_mode(), handing the
device subsystem an inconsistent view of mac-layer state.
On PineTab2 (BES2600WM + RK3566) the second bug is the recurring
root-cause of the 'bes2600_pwr_enter_lp_mode, wait pm ind timeout'
message flooding dmesg every 5-10 s when the interface is associated
and idle. Sending the device to LP in that state cascades into the
SDIO TX path as the 'bes_sdio_memcpy_to_io_helper / sdio_tx_work'
WARN splat.
Fix:
- Add a 'timeouts' counter; bump it on both failure paths.
- Skip the wait_for_completion entirely when wsm_set_pm() failed
(there is no completion to wait for).
- Only call bes2600_pwr_device_enter_lp_mode() when every per-VIF
handshake reached firmware-ACKed completion; otherwise return
-ETIMEDOUT and leave the device in its current power state.
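The control flow of the fix can be sketched as a loop over per-VIF outcomes. This is a userspace model, not the driver code: the demo_ names are hypothetical, the two booleans per VIF are inputs that replace the real wsm_set_pm() call and the completion wait, and continuing to the next VIF after a failure (rather than stopping) is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Handshake outcome for one VIF; both fields are inputs to the model. */
struct demo_vif {
	bool set_pm_ok; /* wsm_set_pm() reached the firmware */
	bool fw_acked;  /* PM indication arrived within the wait */
};

/*
 * Returns 0 and sets *device_lp when every VIF completed its handshake;
 * otherwise returns -ETIMEDOUT and leaves *device_lp untouched.
 * *waits counts how often the model would have blocked on the completion.
 */
static int demo_enter_lp_mode(const struct demo_vif *vifs, size_t n,
			      bool *device_lp, unsigned int *waits)
{
	unsigned int timeouts = 0;

	for (size_t i = 0; i < n; i++) {
		if (!vifs[i].set_pm_ok) {
			timeouts++; /* nothing to wait for: skip the wait */
			continue;
		}
		(*waits)++;
		if (!vifs[i].fw_acked)
			timeouts++;
	}

	if (timeouts)
		return -ETIMEDOUT;

	*device_lp = true;
	return 0;
}
```

A VIF whose set_pm failed costs no wait at all in this model, and the device-side transition happens only when the timeouts counter is still zero after the loop.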
Tested-on: PineTab2 running linux-pinetab2 6.19.10-danctnix1-1.
Post-patch the handshake still fails on this particular firmware
revision (separate root-cause investigation outside this patch), but
the driver now returns -ETIMEDOUT cleanly instead of flooding dmesg
and destabilising the SDIO path.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||
|
|
ba20341e70 |
Upload
Source: https://github.com/cringeops/bes2600
Source: https://github.com/cringeops/bes2600/pull/14
Source: https://github.com/cringeops/bes2600/pull/17
Source: https://github.com/cringeops/bes2600/pull/20 |