Commit Graph

4 Commits

Author SHA1 Message Date
test0r c57c77e446 bes2600: gate PM indication completion on pending request and track chip state
When mac80211 toggles PSM on the BES2600, the host sends WSM set_pm
and waits up to 5 s on bes_power.pm_enter_cmpl for a firmware-side
PM-changed indication confirming the transition. Three sequenced
flaws make the wait-and-confirm racy and leave host/chip bookkeeping
desynced when anything misfires:

  1) bes2600_pwr_notify_ps_changed() unconditionally fires
     complete(pm_enter_cmpl) for any non-active psmode. It does not
     check whether a host-initiated set_pm is actually pending. A
     spontaneous indication (firmware-internal coex move,
     idle-driven aging) primes the completion, and the next host-
     driven enter_lp_mode sees a false success on its first
     wait_for_completion_timeout.

  2) The wait/reinit ordering in bes2600_pwr_enter_lp_mode is

         status = wait_for_completion_timeout(...);
         atomic_set(pm_set_in_process, 0);
         reinit_completion(...);

     If an indication arrives between wait_for_completion_timeout
     returning with status==1 and reinit_completion, the next
     enter_lp_mode iteration's wait can also see false success. The
     reinit must happen *before* we start the new request, not
     after handling the previous one.

  3) On wait_pm_ind timeout, the driver returns -ETIMEDOUT and walks
     away. It does not record that the firmware's actual PM state
     is no longer known to the host. Subsequent wake paths
     (gpio_wake / sbus_active) assume the chip is still active and
     hit deterministic SDIO failures when the firmware has
     transitioned anyway.

This patch is the safe-prerequisite half of a wider fix:

  * bes_pwr.h gains enum bes2600_chip_pm_state {ACTIVE, LP, UNKNOWN}
    and bes_power.chip_pm_state. Its job is to track what the host
    has *seen the firmware confirm*, not what the host has
    requested. Initialised to ACTIVE in bes2600_pwr_init().

  * bes2600_pwr_notify_ps_changed() unconditionally updates
    chip_pm_state on every indication, but only fires
    complete(pm_enter_cmpl) when atomic_cmpxchg(pm_set_in_process,
    1, 0) succeeds. A spontaneous indication can no longer prime a
    waiter that will only set up its request afterwards.

  * bes2600_pwr_enter_lp_mode() now reinit_completion()s before
    setting pm_set_in_process and sending wsm_set_pm. After a
    timeout, it cmpxchgs pm_set_in_process back to 0 (so a late
    indication cannot prime the next iteration) and on the win-
    cmpxchg branch records chip_pm_state=UNKNOWN.

A follow-up patch consumes chip_pm_state on the wake side
(bes2600_pwr_device_exit_lp_mode + bes2600_gpio_wakeup_mcu) to fix
the deterministic "active mcu fail" cycle this state-record
enables a fix for. Splitting the work this way keeps the lock-free
race fix small and reviewable on its own.

No new locks, no behaviour change on the success path. Only the
recovery path (timeout + spontaneous indication) gains correctness.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-04-28 16:11:08 +02:00
test0r 8855718511 bes2600: demote 'wait pm ind timeout' from bes_err to bes_devel
bes2600_pwr_enter_lp_mode() logs 'wait pm ind timeout' at bes_err
level every time wait_for_completion_timeout() on the firmware's
PM-change indication returns 0. The preceding patch ('bes2600:
gate device LP-mode entry on successful per-VIF firmware
handshake') already handles this case correctly: the per-VIF
timeouts counter is incremented, the function returns
-ETIMEDOUT, and the device-side LP transition is skipped -- the
cascade into sdio_tx_work splats and [RX] Receive failure
messages is prevented.

The timeout itself is benign steady-state noise on the PineTab2
(BES2600WM). Firmware occasionally misses the 5 s PM-change
deadline when mac80211 flips power-save rapidly during
association or roaming; observed rate on a quiet, associated
ohm is roughly 3-10 events per 10 min of uptime, with no
user-visible effect. Keeping it at bes_err() level (== KERN_ERR,
priority 3) floods dmesg with what is already a handled
condition and makes real SDIO / PM errors harder to spot.

Demote to bes_devel() (== KERN_DEBUG gated on the driver's debug
flag). The gate in the caller is unchanged, so the downstream
suppression behaviour introduced by the earlier patch remains.
Real pathologies -- bes_err("set operation mode fail") on the
same path, and the timeouts != 0 / -ETIMEDOUT return consumed
by callers -- still surface at bes_err() / return-value level.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-04-24 08:55:10 +02:00
test0r 19feb8181a bes2600: gate device LP-mode entry on successful per-VIF firmware handshake
bes2600_pwr_enter_lp_mode() drives the transition to low-power for each
associated STA VIF: it pushes wsm_set_pm(), waits up to 5 seconds on
pm_enter_cmpl for the firmware to acknowledge, then unconditionally
calls bes2600_pwr_device_enter_lp_mode() to drop the device end of the
bus.

Two bugs:

1. A failed wsm_set_pm() only logs an error, then still falls into
   wait_for_completion_timeout() on a completion the firmware will
   never post (the set-mode command never reached it). The loop
   therefore always blocks the full 5 s, logs a second error, and
   proceeds.

2. A genuine wait-timeout (firmware received the set-mode command but
   never posted the indication) also only logs a warning. The code
   then drops to bes2600_pwr_device_enter_lp_mode(), handing the
   device subsystem an inconsistent view of mac-layer state.

On PineTab2 (BES2600WM + RK3566) the second bug is the recurring
root-cause of the 'bes2600_pwr_enter_lp_mode, wait pm ind timeout'
message flooding dmesg every 5-10 s when the interface is associated
and idle. Sending the device to LP in that state cascades into the
SDIO TX path as the 'bes_sdio_memcpy_to_io_helper / sdio_tx_work'
WARN splat.

Fix:
  - Add a 'timeouts' counter; bump it on both failure paths.
  - Skip the wait_for_completion entirely when wsm_set_pm() failed
    (there is no completion to wait for).
  - Only call bes2600_pwr_device_enter_lp_mode() when every per-VIF
    handshake reached firmware-ACKed completion; otherwise return
    -ETIMEDOUT and leave the device in its current power state.

Tested-on: PineTab2 running linux-pinetab2 6.19.10-danctnix1-1.
Post-patch the handshake still fails on this particular firmware
revision (separate root-cause investigation outside this patch), but
the driver now returns -ETIMEDOUT cleanly instead of flooding dmesg
and destabilising the SDIO path.

Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
2026-04-24 08:55:10 +02:00
Julian ba20341e70 Upload
Source: https://github.com/cringeops/bes2600
Source: https://github.com/cringeops/bes2600/pull/14
Source: https://github.com/cringeops/bes2600/pull/17
Source: https://github.com/cringeops/bes2600/pull/20
2025-09-17 16:35:45 +02:00