When mac80211 toggles PSM on the BES2600, the host sends WSM set_pm
and waits up to 5 s on bes_power.pm_enter_cmpl for a firmware-side
PM-changed indication confirming the transition. Three related
flaws make this wait-and-confirm handshake racy and leave the host
and chip bookkeeping out of sync when anything misfires:
1) bes2600_pwr_notify_ps_changed() unconditionally fires
complete(pm_enter_cmpl) for any non-active psmode. It does not
check whether a host-initiated set_pm is actually pending. A
spontaneous indication (firmware-internal coex move,
idle-driven aging) primes the completion, and the next host-
driven enter_lp_mode sees a false success on its first
wait_for_completion_timeout.
2) The wait/reinit ordering in bes2600_pwr_enter_lp_mode is
    status = wait_for_completion_timeout(...);
    atomic_set(pm_set_in_process, 0);
    reinit_completion(...);
The completion is therefore only reset after the previous request
has been handled. An indication that lands after that
reinit_completion() but before the next iteration's wait (a late
or spontaneous one) primes the completion, and that wait can also
report false success. The reinit must happen *before* we start the
new request, not after handling the previous one.
3) On wait_pm_ind timeout, the driver returns -ETIMEDOUT and walks
away. It does not record that the firmware's actual PM state
is no longer known to the host. Subsequent wake paths
(gpio_wake / sbus_active) assume the chip is still active and
hit deterministic SDIO failures when the firmware has
transitioned anyway.
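The first two flaws can be replayed in a small userspace model. This is a sketch under stated assumptions, not driver code: struct model_completion and the model_*/old_* helpers are hypothetical stand-ins for the kernel completion API and the pre-patch paths, a completion is reduced to its done counter, the 5 s sleep is dropped, and the racing interleavings are replayed sequentially in one thread.

```c
#include <stdbool.h>

/* Hypothetical model: a kernel completion reduced to its "done" count. */
struct model_completion {
	unsigned int done;
};

static void model_complete(struct model_completion *c) { c->done++; }
static void model_reinit(struct model_completion *c) { c->done = 0; }

/* Models wait_for_completion_timeout(): consumes one "done" if present,
 * otherwise reports a timeout (0). The real call sleeps up to 5 s and
 * returns the remaining jiffies (>= 1) on success. */
static unsigned long model_wait(struct model_completion *c)
{
	if (c->done == 0)
		return 0;
	c->done--;
	return 1;
}

enum model_psmode { PSM_ACTIVE, PSM_PS };

/* Pre-patch notify path: completes on any non-active psmode,
 * whether or not the host has a set_pm request pending (flaw 1). */
static void old_notify_ps_changed(struct model_completion *c,
				  enum model_psmode m)
{
	if (m != PSM_ACTIVE)
		model_complete(c);
}

/* Pre-patch per-VIF step: set_pm is sent (elided), the firmware may or
 * may not answer, then wait, then reinit (flaw 2: reinit comes last).
 * Returns true when the wait reported success. */
static bool old_enter_lp_one(struct model_completion *c, bool fw_acks)
{
	if (fw_acks)
		old_notify_ps_changed(c, PSM_PS);
	unsigned long status = model_wait(c);
	model_reinit(c);
	return status != 0;
}
```

A spontaneous indication before the request, or a late one after the end-of-iteration reinit, leaves done at 1, so the next old_enter_lp_one(c, false) reports success although the firmware never answered that request.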
This patch is the safe-prerequisite half of a wider fix:
* bes_pwr.h gains enum bes2600_chip_pm_state {ACTIVE, LP, UNKNOWN}
and bes_power.chip_pm_state. Its job is to track what the host
has *seen the firmware confirm*, not what the host has
requested. Initialised to ACTIVE in bes2600_pwr_init().
* bes2600_pwr_notify_ps_changed() updates chip_pm_state on every
indication, but only fires complete(pm_enter_cmpl) when
atomic_cmpxchg(pm_set_in_process, 1, 0) succeeds. A spontaneous
indication can no longer prime a waiter that will only set up its
request afterwards.
* bes2600_pwr_enter_lp_mode() now calls reinit_completion() before
setting pm_set_in_process and sending wsm_set_pm. After a
timeout, it cmpxchgs pm_set_in_process back to 0 (so a late
indication cannot prime the next iteration) and, if that cmpxchg
wins (no indication raced in), records chip_pm_state = UNKNOWN.
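The reworked handshake can be sketched with the same kind of model. Again the names are hypothetical: struct model_pwr bundles a stand-in for pm_enter_cmpl with pm_set_in_process and chip_pm_state, C11 atomic_compare_exchange_strong() plays the role of the kernel's atomic_cmpxchg(), and the sketch assumes the patched notify path still only completes for non-active psmodes.

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

enum chip_pm_state { CHIP_PM_ACTIVE, CHIP_PM_LP, CHIP_PM_UNKNOWN };
enum model_psmode { PSM_ACTIVE, PSM_PS };

/* Hypothetical model of the relevant bes_power fields. */
struct model_pwr {
	unsigned int cmpl_done;          /* stands in for pm_enter_cmpl */
	atomic_int pm_set_in_process;    /* 1 while a host set_pm is pending */
	enum chip_pm_state chip_pm_state;
};

/* Models "atomic_cmpxchg(v, oldval, newval) == oldval". */
static bool model_cmpxchg(atomic_int *v, int oldval, int newval)
{
	return atomic_compare_exchange_strong(v, &oldval, newval);
}

static void model_pwr_init(struct model_pwr *p)
{
	p->cmpl_done = 0;
	atomic_init(&p->pm_set_in_process, 0);
	p->chip_pm_state = CHIP_PM_ACTIVE;
}

static void new_notify_ps_changed(struct model_pwr *p, enum model_psmode m)
{
	/* Always record what the firmware reports it did. */
	p->chip_pm_state = (m == PSM_ACTIVE) ? CHIP_PM_ACTIVE : CHIP_PM_LP;
	/* Only wake a waiter that really has a request in flight. */
	if (m != PSM_ACTIVE && model_cmpxchg(&p->pm_set_in_process, 1, 0))
		p->cmpl_done++;
}

/* One VIF's LP request; fw_acks says whether firmware answers in time. */
static int new_enter_lp_one(struct model_pwr *p, bool fw_acks)
{
	p->cmpl_done = 0;                        /* reinit_completion() first */
	atomic_store(&p->pm_set_in_process, 1);  /* then mark the request */
	/* wsm_set_pm() would be sent here. */
	if (fw_acks)
		new_notify_ps_changed(p, PSM_PS);

	if (p->cmpl_done > 0) {                  /* wait succeeded */
		p->cmpl_done--;
		return 0;
	}
	/* Timed out: withdraw the request; if we win, the chip's real
	 * PM state is no longer known to the host. */
	if (model_cmpxchg(&p->pm_set_in_process, 1, 0))
		p->chip_pm_state = CHIP_PM_UNKNOWN;
	return -ETIMEDOUT;
}
```

Because both the indication and the timeout path race to move pm_set_in_process from 1 to 0, exactly one of them wins; a stale indication can still update chip_pm_state, and the reinit_completion() at the start of each request discards anything left over.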
A follow-up patch consumes chip_pm_state on the wake side
(bes2600_pwr_device_exit_lp_mode + bes2600_gpio_wakeup_mcu) to fix
the deterministic "active mcu fail" cycle; recording the state here
is what makes that fix possible. Splitting the work this way keeps
the lock-free race fix small and reviewable on its own.
No new locks, no behaviour change on the success path. Only the
recovery path (timeout + spontaneous indication) gains correctness.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
bes2600_pwr_enter_lp_mode() logs 'wait pm ind timeout' at bes_err
level every time wait_for_completion_timeout() on the firmware's
PM-change indication returns 0. The preceding patch ('bes2600:
gate device LP-mode entry on successful per-VIF firmware
handshake') already handles this case correctly: the per-VIF
timeouts counter is incremented, the function returns
-ETIMEDOUT, and the device-side LP transition is skipped -- the
cascade into sdio_tx_work splats and [RX] Receive failure
messages is prevented.
The timeout itself is benign steady-state noise on the PineTab2
(BES2600WM). Firmware occasionally misses the 5 s PM-change
deadline when mac80211 flips power-save rapidly during
association or roaming; observed rate on a quiet, associated
ohm is roughly 3-10 events per 10 min of uptime, with no
user-visible effect. Keeping it at bes_err() level (== KERN_ERR,
priority 3) floods dmesg with what is already a handled
condition and makes real SDIO / PM errors harder to spot.
Demote to bes_devel() (== KERN_DEBUG gated on the driver's debug
flag). The gate in the caller is unchanged, so the downstream
suppression behaviour introduced by the earlier patch remains.
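The effect of the demotion on what reaches dmesg can be illustrated with hypothetical stand-in macros. The driver's real bes_err()/bes_devel() definitions are not reproduced here: model_bes_err() always emits, while model_bes_devel() emits only when model_debug_enabled is set, which is the gating described above. Both count the lines they emit so the behaviour can be checked, and the format strings are assumed from the logged text quoted in these messages.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the driver's debug flag and log sinks. */
static bool model_debug_enabled;
static unsigned int model_err_lines, model_dbg_lines;

/* Stand-in for bes_err(): always emitted (KERN_ERR in the driver). */
#define model_bes_err(...) \
	do { model_err_lines++; fprintf(stderr, __VA_ARGS__); } while (0)

/* Stand-in for bes_devel(): emitted only with the debug flag set. */
#define model_bes_devel(...) \
	do { \
		if (model_debug_enabled) { \
			model_dbg_lines++; \
			fprintf(stderr, __VA_ARGS__); \
		} \
	} while (0)

/* Handled timeout: now a debug message; the timeouts counter that
 * drives the -ETIMEDOUT return is still bumped. */
static void model_pm_ind_timeout(unsigned int *timeouts)
{
	model_bes_devel("%s, wait pm ind timeout\n", "bes2600_pwr_enter_lp_mode");
	(*timeouts)++;
}

/* Real failure on the same path: still reported at error level. */
static void model_set_pm_fail(unsigned int *timeouts)
{
	model_bes_err("%s, set operation mode fail\n", "bes2600_pwr_enter_lp_mode");
	(*timeouts)++;
}
```

With the debug flag off, a handled timeout leaves no trace in the error-level log, while a failed set_pm still does; both continue to count towards the caller-visible failure.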
Real pathologies -- bes_err("set operation mode fail") on the
same path, and the timeouts != 0 / -ETIMEDOUT return consumed
by callers -- still surface at bes_err() / return-value level.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
bes2600_pwr_enter_lp_mode() drives the transition to low-power for each
associated STA VIF: it pushes wsm_set_pm(), waits up to 5 seconds on
pm_enter_cmpl for the firmware to acknowledge, then unconditionally
calls bes2600_pwr_device_enter_lp_mode() to drop the device end of the
bus.
Two bugs:
1. A failed wsm_set_pm() only logs an error, then still falls into
wait_for_completion_timeout() on a completion the firmware will
never post (the set-mode command never reached it). The loop
therefore always blocks the full 5 s, logs a second error, and
proceeds.
2. A genuine wait-timeout (firmware received the set-mode command but
never posted the indication) also only logs an error. The code
then drops to bes2600_pwr_device_enter_lp_mode(), handing the
device subsystem an inconsistent view of mac-layer state.
On PineTab2 (BES2600WM + RK3566) the second bug is hit repeatedly: the
'bes2600_pwr_enter_lp_mode, wait pm ind timeout' message floods dmesg
every 5-10 s while the interface is associated and idle, and sending
the device to LP in that state cascades into the SDIO TX path as the
'bes_sdio_memcpy_to_io_helper / sdio_tx_work' WARN splat.
Fix:
- Add a 'timeouts' counter; bump it on both failure paths.
- Skip the wait_for_completion entirely when wsm_set_pm() failed
(there is no completion to wait for).
- Only call bes2600_pwr_device_enter_lp_mode() when every per-VIF
handshake reached firmware-ACKed completion; otherwise return
-ETIMEDOUT and leave the device in its current power state.
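The fixed control flow can be sketched as follows. struct model_vif, model_waits, model_device_in_lp and model_enter_lp_mode() are hypothetical: each VIF's wsm_set_pm() result and whether its PM-changed indication arrives are passed in as inputs instead of coming from the firmware, and the sketch assumes the loop moves on to the next VIF after a failed send rather than stopping early.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-VIF outcome, replacing the real wsm_set_pm() call
 * and the 5 s wait_for_completion_timeout() on pm_enter_cmpl. */
struct model_vif {
	int set_pm_ret;     /* what wsm_set_pm() would return (0 = sent) */
	bool fw_indicates;  /* whether the PM-changed indication arrives */
};

static unsigned int model_waits;   /* number of 5 s waits started */
static bool model_device_in_lp;    /* models bes2600_pwr_device_enter_lp_mode() */

static int model_enter_lp_mode(const struct model_vif *vifs, size_t n)
{
	unsigned int timeouts = 0;

	for (size_t i = 0; i < n; i++) {
		if (vifs[i].set_pm_ret) {
			/* Command never reached firmware: nothing to wait for. */
			timeouts++;
			continue;
		}
		model_waits++;
		if (!vifs[i].fw_indicates)
			timeouts++;  /* firmware got set_pm but never confirmed */
	}

	if (timeouts)
		return -ETIMEDOUT;   /* leave the device power state alone */

	model_device_in_lp = true;
	return 0;
}
```

Both failure modes are treated the same once counted: any non-zero timeouts keeps the device side untouched, so the device only enters LP when every per-VIF handshake was confirmed by firmware.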
Tested-on: PineTab2 running linux-pinetab2 6.19.10-danctnix1-1.
Post-patch the handshake still fails on this particular firmware
revision (separate root-cause investigation outside this patch), but
the driver now returns -ETIMEDOUT cleanly instead of flooding dmesg
and destabilising the SDIO path.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>