bes2600-dkms

10 Commits 38 Branches 0 Tags

Author	SHA1	Message	Date
Markus Fritsche	51d46a2e25	bes2600: short-circuit wake handshake when chip is confirmed ACTIVE The previous patch ("bes2600: gate PM indication completion on pending request and track chip state") added enum bes2600_chip_pm_state and the chip_pm_state field tracking what the host has seen the firmware confirm. This patch makes the wake side use it. Without this, every bes2600_pwr_device_exit_lp_mode() unconditionally runs gpio_wake() + sbus_active() + wsm_set_operational_mode(active), even when the chip is already in confirmed-ACTIVE state and the wake sequence has nothing to do. The visible failure mode on PineTab2: bes2600_pwr_enter_lp_mode, wait pm ind timeout repeat set gpio_wake_flag, sub_sys:0 bes2600_sdio_active failed, subsys:0 bes2600_pwr_device_exit_lp_mode, active mcu fail cycling every ~9 s, ~22 cycles in 10 minutes. The failure unfolds in five steps: 1. enter_lp_mode timed out (firmware indication lost). With c6.1, chip_pm_state is now UNKNOWN. 2. lock_device fires exit_lp_mode. 3. gpio_wake hits "bit already set" because device_enter_lp_mode was skipped when the indication timed out, so gpio_sleep was never called - the bit reflects driver intent, not chip state. gpio_wake silently no-ops (no GPIO edge), bit stays set. 4. sbus_active spends 200 x 2 ms looking for MCU_WAKEUP_READY that never comes (firmware was never told to wake), then fails. 5. Driver continues to wsm_set_operational_mode against the wedged bus, compounding the failure. This patch's three moves: * bes2600_pwr_device_exit_lp_mode() reads chip_pm_state at entry. On BES2600_CHIP_PM_ACTIVE, log at devel level and return without touching gpio_wake / sbus_active / WSM. The chip is in the state we want; the handshake exists only to drive a transition. * On BES2600_CHIP_PM_LP or BES2600_CHIP_PM_UNKNOWN, run the wake handshake as before, but on sbus_active() failure: set chip_pm_state = UNKNOWN, log once at err level, and bail out. 
Do NOT call wsm_set_operational_mode over a wedged bus - it would just emit a second error and leave the chip in an even less defined state. * bes2600_gpio_wakeup_mcu() / bes2600_gpio_allow_mcu_sleep(): demote "repeat set/clear gpio_wake_flag" from bes_err to bes_devel. Multi-subsystem wake-hold (e.g. WIFI + BT both want MCU awake) is the steady-state case, and the symmetric clear while bit-already-clear is racy bookkeeping rather than a hardware error. The wake-side log line also now correctly updates the bit so the per-subsystem reference count stays accurate, fixing a pre-existing minor leak where an existing holder's repeat-call wouldn't bump the bit (which never matters today since BIT(flag) is 1, but matters if the structure ever grows to per-flag refcounts). Net effect on the cycle: * If chip is genuinely ACTIVE (chip_pm_state == ACTIVE), wake skips cleanly. Storm goes silent. * If chip is genuinely LP, behaviour is unchanged. * If chip is UNKNOWN (post-timeout state), one wake attempt is made; on failure, state stays UNKNOWN and we don't emit a second cascade error per attempt. Repeated UNKNOWN with failed wake will eventually be picked up by the LMAC active-monitor and escalated to mmc_hw_reset (c5.2). No new locks, no new state. Only consumption of the chip_pm_state field added in the prerequisite patch. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-20 20:17:58 +02:00
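The wake-side short-circuit described in commit 51d46a2e25 can be sketched as a small userspace model. This is an illustration only, not the driver code: `struct pwr_model`, the `model_*` functions, the `bus_wakes` knob and the call counters are hypothetical names introduced here, and the real handshake steps are reduced to counters.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mirrors the three states named in the commit message. */
enum chip_pm_state { CHIP_PM_ACTIVE, CHIP_PM_LP, CHIP_PM_UNKNOWN };

/* Hypothetical stand-in for the driver's power context. */
struct pwr_model {
    enum chip_pm_state chip_pm_state;
    bool bus_wakes;        /* knob: does the modelled sbus_active() succeed? */
    int gpio_wake_calls;   /* counters showing which handshake steps ran */
    int sbus_active_calls;
    int wsm_calls;
};

/* Modelled sbus_active(): 0 on success, -1 if the bus never reports ready. */
static int model_sbus_active(struct pwr_model *p)
{
    p->sbus_active_calls++;
    return p->bus_wakes ? 0 : -1;
}

/* Returns 0 when the chip is (or becomes) usable, -1 when the bus did not wake. */
int model_exit_lp_mode(struct pwr_model *p)
{
    assert(p != NULL);

    /* Confirmed ACTIVE: the handshake only drives a transition, so skip it. */
    if (p->chip_pm_state == CHIP_PM_ACTIVE)
        return 0;

    /* LP or UNKNOWN: run the wake handshake as before. */
    p->gpio_wake_calls++;
    if (model_sbus_active(p) != 0) {
        /* Bus is wedged: record that we no longer know the chip state and
         * stop here instead of sending WSM traffic over a dead bus. */
        p->chip_pm_state = CHIP_PM_UNKNOWN;
        return -1;
    }

    /* Stand-in for wsm_set_operational_mode(active). The state field is left
     * alone here because, per the commit message, it tracks firmware
     * confirmations rather than host requests. */
    p->wsm_calls++;
    return 0;
}
```

In this model a context that starts in `CHIP_PM_ACTIVE` returns immediately with all three counters at zero, while a failed wake from `CHIP_PM_LP` ends in `CHIP_PM_UNKNOWN` with `wsm_calls` still zero.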
Markus Fritsche	7c4ad3b1d6	bes2600: gate PM indication completion on pending request and track chip state When mac80211 toggles PSM on the BES2600, the host sends WSM set_pm and waits up to 5 s on bes_power.pm_enter_cmpl for a firmware-side PM-changed indication confirming the transition. Three sequenced flaws make the wait-and-confirm racy and leave host/chip bookkeeping desynced when anything misfires: 1) bes2600_pwr_notify_ps_changed() unconditionally fires complete(pm_enter_cmpl) for any non-active psmode. It does not check whether a host-initiated set_pm is actually pending. A spontaneous indication (firmware-internal coex move, idle-driven aging) primes the completion, and the next host-driven enter_lp_mode sees a false success on its first wait_for_completion_timeout. 2) The wait/reinit ordering in bes2600_pwr_enter_lp_mode is status = wait_for_completion_timeout(...); atomic_set(pm_set_in_process, 0); reinit_completion(...); If an indication arrives between wait_for_completion_timeout returning with status==1 and reinit_completion, the next enter_lp_mode iteration's wait can also see false success. The reinit must happen before we start the new request, not after handling the previous one. 3) On wait_pm_ind timeout, the driver returns -ETIMEDOUT and walks away. It does not record that the firmware's actual PM state is no longer known to the host. Subsequent wake paths (gpio_wake / sbus_active) assume the chip is still active and hit deterministic SDIO failures when the firmware has transitioned anyway. This patch is the safe-prerequisite half of a wider fix: * bes_pwr.h gains enum bes2600_chip_pm_state {ACTIVE, LP, UNKNOWN} and bes_power.chip_pm_state. Its job is to track what the host has seen the firmware confirm, not what the host has requested. Initialised to ACTIVE in bes2600_pwr_init(). 
* bes2600_pwr_notify_ps_changed() unconditionally updates chip_pm_state on every indication, but only fires complete(pm_enter_cmpl) when atomic_cmpxchg(pm_set_in_process, 1, 0) succeeds. A spontaneous indication can no longer prime a waiter that will only set up its request afterwards. * bes2600_pwr_enter_lp_mode() now reinit_completion()s before setting pm_set_in_process and sending wsm_set_pm. After a timeout, it cmpxchgs pm_set_in_process back to 0 (so a late indication cannot prime the next iteration) and on the win-cmpxchg branch records chip_pm_state=UNKNOWN. A follow-up patch consumes chip_pm_state on the wake side (bes2600_pwr_device_exit_lp_mode + bes2600_gpio_wakeup_mcu) to fix the deterministic "active mcu fail" cycle; recording the state here is what makes that fix possible. Splitting the work this way keeps the lock-free race fix small and reviewable on its own. No new locks, no behaviour change on the success path. Only the recovery path (timeout + spontaneous indication) gains correctness. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-20 20:17:47 +02:00
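The gating idea in commit 7c4ad3b1d6 can be sketched in userspace with C11 atomics standing in for the kernel's atomic_cmpxchg(). This is a single-threaded model under stated assumptions: `struct pm_model`, the `model_*` functions and the `completions` counter (a stand-in for the kernel completion) are hypothetical, and keeping the "non-active psmode" condition on the completion is an assumption carried over from the description of the old code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

enum chip_pm_state { CHIP_PM_ACTIVE, CHIP_PM_LP, CHIP_PM_UNKNOWN };

/* Hypothetical model of the bookkeeping described in the commit message. */
struct pm_model {
    atomic_int pm_set_in_process;     /* 1 while a host set_pm awaits its indication */
    int completions;                  /* stand-in for complete(&pm_enter_cmpl) */
    enum chip_pm_state chip_pm_state; /* what the firmware has confirmed */
};

void model_init(struct pm_model *m)
{
    assert(m != NULL);
    atomic_init(&m->pm_set_in_process, 0);
    m->completions = 0;
    m->chip_pm_state = CHIP_PM_ACTIVE;
}

/* Host side: reset the completion first, then mark the request pending. */
void model_begin_set_pm(struct pm_model *m)
{
    m->completions = 0;                     /* models reinit_completion() */
    atomic_store(&m->pm_set_in_process, 1);
    /* the WSM set_pm request would be sent here */
}

/* Firmware indication: always record the state, complete only if pending. */
void model_notify_ps_changed(struct pm_model *m, bool active)
{
    int expected = 1;

    m->chip_pm_state = active ? CHIP_PM_ACTIVE : CHIP_PM_LP;
    if (!active &&
        atomic_compare_exchange_strong(&m->pm_set_in_process, &expected, 0))
        m->completions++;
}

/* Timeout path: returns true if this path, not an indication, cleared the flag. */
bool model_timeout(struct pm_model *m)
{
    int expected = 1;

    if (atomic_compare_exchange_strong(&m->pm_set_in_process, &expected, 0)) {
        m->chip_pm_state = CHIP_PM_UNKNOWN; /* host no longer knows the chip state */
        return true;
    }
    return false;
}
```

A spontaneous `model_notify_ps_changed(&m, false)` with no request pending updates `chip_pm_state` but leaves `completions` at zero, which is the property the commit message is after. A late indication after `model_timeout()` behaves the same way.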
Markus Fritsche	e0f664cbc9	bes2600: recover wedged firmware via mmc_hw_reset on link break When the LMAC active monitor detects 'link break between lmac and host' (the hw_buf_used==pending watchdog in bes2600_bh_lmac_active_monitor), bes2600_chrdev_wifi_force_close(hw_priv, true) is invoked to tear the device down and prepare for a fresh probe. On the wifi_force_close_work side this calls bes2600_chrdev_do_system_close() which dispatches sbus_ops->power_switch(0). On PineTab2 (RK3566 + BES2600WM over SDIO) this recovery path is a no-op: * bes2600_sdio_power_down() writes a SYSTEM_CLOSE host-int message, clears MMC_CAP_NONREMOVABLE, and schedules sdio_scan_work, which is the literal one-line stub bes_warn("...this function does nothing\n"). * bes2600_sdio_on() (the eventual power_switch(1) counterpart) toggles pdata->powerup, which is NULL on PineTab2 because the wifi-reset GPIO is owned by sdio_pwrseq, not the bes2600 device tree node (see arch/arm64/boot/dts/rockchip/rk3566-pinetab2.dtsi: 'The reset pin is claimed by sdio_mmcseq, It is better to move it to U-Boot so the OS can use it.'). Net result: the chip is never reset. The function drivers are not removed (the SDIO core has no signal that the card is gone), the firmware stays wedged, and a subsequent rmmod bes2600 leaves the SDIO function in a half-torn-down state. modprobe bes2600 then fails with 'probe with driver bes2600_wlan failed with error -123' (-ENOMEDIUM) on both functions (:1 wifi, :2 BT-companion) until a full system reboot. Observed on PineTab2 (linux-pinetab2 6.19.10-danctnix1-1) after ~150 minutes of background-scan rejects (wsm_generic_confirm 0x0007, [SCAN] Scan failed (-22)) accumulating until the LMAC stopped acknowledging TX buffers (hw_buf_used:24 pending:24). Reproducible under sustained scan pressure. Add a sbus operation bus_reset() that the recovery path can call when power_switch() has no effective chip-reset signal of its own. 
Provide an SDIO implementation that calls mmc_hw_reset(self->func->card), which on a multi-function SDIO card (PineTab2 binds func 1 for WLAN and func 2 for the BT-companion path) takes the remove-and-rescan path: mmc_sdio_hw_reset() marks the card removed and schedules mmc_rescan, which tears down the bound function drivers and re-detects the card on the next sweep, in turn reinvoking bes2600_sdio_probe(). With a single function probed it instead invokes mmc_power_cycle() directly, which on PineTab2 toggles the wifi-reset GPIO via sdio_pwrseq. Add bes2600_chrdev_do_bus_reset() as the chrdev-side helper. It invokes the bus op and then waits on probe_done_wq for the SDIO remove() callback to clear sbus_priv, mirroring the wait pattern already used by bes2600_chrdev_do_system_close() so that a subsequent bes2600_switch_wifi(true) sees a clean state and can wait on the fresh probe. Wire it into bes2600_chrdev_wifi_force_close_work(): when halt_dev is set (the hard-exception path used by both bes2600_bh_lmac_active_monitor and bes2600_bh_mcu_active_monitor) and the underlying bus implements bus_reset, take the new recovery path; otherwise fall back to the legacy power_switch(0) sequence so this patch is a no-op on USB or any other future bus that does not provide bus_reset. mmc_hw_reset() is exported by the MMC core and is the canonical recovery primitive; calling it without holding the SDIO host claim is correct because the multi-func remove-and-rescan path acquires the host claim via the mmc workqueue, and the single-func mmc_power_cycle path does not require the host claim. No DT change is required: this works against the existing PineTab2 DTS, where the wifi-reset GPIO and the optional sdio_pwrkey GPIO (on v2.0 boards) are both already configured as MMC pwrseq resets. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-20 20:16:59 +02:00
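The dispatch rule from commit e0f664cbc9 (use bus_reset when halt_dev is set and the bus provides it, otherwise fall back to power_switch(0)) can be modelled with an ops table whose reset hook is optional. The sketch below is a userspace illustration: the `*_model` types, the two ops tables and the counters are hypothetical, and the place where the SDIO implementation would call mmc_hw_reset() is only marked by a comment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-bus private data; counters record which path ran. */
struct sbus_priv_model {
    int resets;
    int power_offs;
};

/* Hypothetical ops table: bus_reset is optional and may be NULL. */
struct sbus_ops_model {
    int (*power_switch)(struct sbus_priv_model *self, int on);
    int (*bus_reset)(struct sbus_priv_model *self);
};

enum recovery_path { RECOVERY_BUS_RESET, RECOVERY_POWER_SWITCH };

static int model_power_switch(struct sbus_priv_model *self, int on)
{
    if (!on)
        self->power_offs++;
    return 0;
}

static int model_sdio_bus_reset(struct sbus_priv_model *self)
{
    /* The SDIO implementation in the commit calls mmc_hw_reset() here. */
    self->resets++;
    return 0;
}

/* SDIO-like bus: provides both hooks. */
const struct sbus_ops_model sdio_ops_model = {
    .power_switch = model_power_switch,
    .bus_reset = model_sdio_bus_reset,
};

/* USB-like bus: no bus_reset, so the legacy path must be taken. */
const struct sbus_ops_model usb_ops_model = {
    .power_switch = model_power_switch,
    .bus_reset = NULL,
};

/* Chooses the recovery path the way the commit message describes. */
enum recovery_path model_force_close(const struct sbus_ops_model *ops,
                                     struct sbus_priv_model *self,
                                     bool halt_dev)
{
    assert(ops != NULL && ops->power_switch != NULL && self != NULL);

    if (halt_dev && ops->bus_reset != NULL) {
        ops->bus_reset(self);
        return RECOVERY_BUS_RESET;
    }
    ops->power_switch(self, 0);
    return RECOVERY_POWER_SWITCH;
}
```

Keeping the hook optional is what makes the change a no-op for buses that do not implement it: only a table with a non-NULL `bus_reset` ever takes the new path.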
Markus Fritsche	bdb0450bdf	bes2600: widen scan-defer backoff to 30s and decay count on quiet The scan-defer logic added in the previous patch ("bes2600: defer scan and soften WARN on firmware reject") used a 10-second backoff window and never cleared reject_count outside of a successful scan. Field testing on a PineTab2 (linux-pinetab2 6.19.10-danctnix1) shows two distinct mac80211 scan-retry cadences in practice: * Idle background scans every ~5 minutes when associated -- well outside any plausible backoff, the defer guard correctly falls through to a real WSM scan attempt. * Roam-evaluation bursts triggered when mac80211 wants to find a candidate AP for handover (signal degradation, beacon loss, locally-generated DEAUTH_LEAVING reason=3). Cadence is ~12 s, and one boot reproduced 14 such rejected scans in 3 minutes during a single burst, none of which engaged the defer guard because every retry landed just outside the 10 s window. Two-line behaviour change to fix that: 1. BES2600_SCAN_BACKOFF_JIFFIES grows from 10*HZ to 30*HZ, so a 12 s-cadence burst stays inside the window across consecutive rejects and the third reject in the burst trips the threshold guard. The 5 min idle case is still naturally past the window and is unaffected. 2. bes2600_scan_should_defer() resets reject_count to 0 when time_after(jiffies, backoff_until). Without this, reject_count accumulated indefinitely across the slow-cadence rejects, so an isolated reject after long quiet would have tripped the threshold the moment it arrived. After the change, count is latched only inside an active burst and decays cleanly when the burst ends. Net effect on a roam burst: * t=0 reject #1 (count 1, backoff_until = t0 + 30s) * t=12 reject #2 (count 2, backoff_until = t1 + 30s) * t=24 reject #3 (count 3, threshold met, next scan deferred) * t=36 defer fires, no WSM round-trip, reject not sent * ... 
defers continue until the firmware-policy state clears * scan succeeds -> reject_count = 0, normal cadence resumes WSM 0x0007 confirm rejections in a burst drop from ~14 to ~3 (just the scans needed to reach the threshold). wpa_supplicant's reason=3 locally-generated disconnects driven by exhausted roam candidates during the same burst window also drop. No new state, no new symbols, no change to mac80211-facing semantics: the deferred scan still completes via the existing fail: path with status=-EBUSY, the same response a real firmware-busy would produce. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-20 20:16:59 +02:00
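The roam-burst timeline in commit bdb0450bdf can be replayed with a small userspace model of the backoff window and the decay rule. Everything here is a sketch under assumptions: time is counted in whole seconds instead of jiffies, `struct scan_model` and the `model_*` functions are hypothetical, and the coex check from the earlier patch is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_REJECT_THRESHOLD 3    /* consecutive rejects before deferring */
#define MODEL_BACKOFF_SECS     30UL /* widened window, in seconds for this model */

/* Hypothetical mirror of the two fields added to struct bes2600_scan. */
struct scan_model {
    unsigned int reject_count;
    unsigned long backoff_until;
};

/* Same wrap-safe comparison idea as the kernel's time_after(a, b). */
static bool model_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/* Bump the counter and refresh the window (used on reject and on defer). */
static void model_note_reject(struct scan_model *s, unsigned long now)
{
    s->reject_count++;
    s->backoff_until = now + MODEL_BACKOFF_SECS;
}

/* Returns true when the scan should be skipped without a WSM round-trip. */
bool model_scan_should_defer(struct scan_model *s, unsigned long now)
{
    assert(s != NULL);

    /* Window elapsed: the burst is over, so the count decays to zero. */
    if (model_time_after(now, s->backoff_until)) {
        s->reject_count = 0;
        return false;
    }
    if (s->reject_count >= MODEL_REJECT_THRESHOLD) {
        model_note_reject(s, now);
        return true;
    }
    return false;
}

/* One scan attempt; returns true if a scan request was actually issued. */
bool model_scan_attempt(struct scan_model *s, unsigned long now, bool fw_rejects)
{
    if (model_scan_should_defer(s, now))
        return false;
    if (fw_rejects)
        model_note_reject(s, now);
    else
        s->reject_count = 0; /* successful scan clears the count */
    return true;
}
```

Feeding rejects at t=0, 12 and 24 issues three scans, and the attempt at t=36 is deferred, matching the timeline in the commit message. A lone reject after a long quiet period starts again from a count of 1 because the window check resets the counter first.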
Markus Fritsche	4fec8b2ecc	bes2600: defer scan and soften WARN on firmware reject On a BES2600-based PineTab2, mac80211's background-scan cadence (about every 30 s when associated) triggers a two-step WARN splat pattern, visible in dmesg roughly 30 times per 10 min of regular WiFi use: wsm_generic_confirm ret 2 WARNING: at wsm_handle_rx+0x8a4/0xf30 [bes2600] ... full stack trace ... ieee80211 phy0: wsm_generic_confirm failed for request 0x0007. WARNING: at bes2600_scan_work+0x5d4/0x810 [bes2600] ... full stack trace ... ieee80211 phy0: [SCAN] Scan failed (-22). 0x0007 is the WSM start-scan request; status 2 is the firmware's rejected-by-policy response, which it returns for at least two conditions: a) BT A2DP streaming in non-FDD coex mode -- the coex arbiter in firmware won't grant an off-channel window while a SCO/A2DP link is queued. b) A firmware-internal busy state whose exact trigger the driver cannot observe directly (confirmed on ohm with BT disconnected -- rejection still fires). Likely transient firmware-PM transitions. Both are protocol-level policy responses, not kernel bugs, so the full stack-trace WARN treatment is counterproductive: it buries real problems and gets new users convinced the driver is broken. Three-part fix: 1. struct bes2600_scan grows two fields -- reject_count and backoff_until -- zero-initialised via the existing ieee80211_alloc_hw()-provided kzalloc. 2. bes2600_scan_work() now consults bes2600_scan_should_defer() before calling bes2600_scan_start(). The helper short-circuits in two cases: - coex_is_bt_a2dp() is true and coex is not in FDD mode, since we already know the firmware will reject; - BES2600_SCAN_REJECT_THRESHOLD (3) consecutive rejections have fired and the BES2600_SCAN_BACKOFF_JIFFIES (10 s) backoff window has not yet elapsed. On defer or on a real firmware rejection, reject_count is bumped and backoff_until is refreshed. A successful scan clears reject_count. 3. 
The WARN_ON(hw_priv->scan.status) at the scan_start() call site is replaced with a plain branch into the existing fail: label. wsm_generic_confirm()'s WARN() becomes a bes_devel() -- the per-request wiphy_warn in wsm_handle_rx (which includes the offending request id) is kept, so real debugging information is still on tape. Net behaviour: - Expected rejections no longer produce stack traces. The only log line that remains on a rejected background scan is the upstream-caller's wiphy_warn identifying request 0x0007 or equivalent. - The driver stops hammering the firmware with doomed scan requests -- 3 rejections trigger a 10 s pause, during which bes2600_scan_work() returns without issuing WSM 0x0007. - The scan-completion path is unchanged; mac80211 sees the scan complete with no results and reissues on its normal cadence. - Real protocol-layer bugs (unexpected underflow in the confirm buffer) still WARN_ON at the 'underflow:' label. Verified on ohm (PineTab2, linux-pinetab2 6.19.10-danctnix1-1): WARN splat count dropped from 32 to 0 per 10 min uptime. WiFi stays associated. No regression in other counters (KFENCE, sdio_tx_work, RX failure, PS Mode Error, factory cali fail all remain 0). Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-20 20:16:59 +02:00
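The two defer conditions introduced in commit 4fec8b2ecc reduce to a single predicate. The sketch below writes it as a pure function so each case can be checked in isolation. It is an illustration only: the function name, its parameters and the use of plain seconds are hypothetical, and the coex inputs are passed in as booleans rather than queried from the driver.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_REJECT_THRESHOLD 3 /* matches the threshold quoted in the commit */

static_assert(MODEL_REJECT_THRESHOLD > 0, "threshold must be positive");

/* Wrap-safe "a is later than b", in the style of the kernel's time_after(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
    return (long)(b - a) < 0;
}

/*
 * Returns true when issuing a scan is known to be pointless:
 *  - BT A2DP is streaming and coex is not in FDD mode, or
 *  - the reject threshold was reached and the backoff window is still open.
 */
bool model_should_defer(bool bt_a2dp, bool coex_fdd,
                        unsigned int reject_count,
                        unsigned long now, unsigned long backoff_until)
{
    if (bt_a2dp && !coex_fdd)
        return true;

    return reject_count >= MODEL_REJECT_THRESHOLD &&
           !model_time_after(now, backoff_until);
}
```

With a window ending at 10, three prior rejects defer a scan at time 5 but not at time 12, and the A2DP case defers regardless of the counter.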
Markus Fritsche	e0d752aae9	sync bes2600/ to v7.0-danctnix1 baseline (rebasing reference)	2026-05-19 09:04:33 +02:00
Manuel Traut	fe73571183	d/control: Fix packagename of fw dependency Signed-off-by: Manuel Traut <manut@mecka.net>	2025-12-09 13:42:27 +00:00
Julian	624fa34bf8	Depend on firmware	2025-11-27 09:02:49 +01:00
Julian	70f1551c94	WIP: Fix autopkgtest	2025-09-18 11:44:54 +02:00
Julian	ba20341e70	Upload Source: https://github.com/cringeops/bes2600 Source: https://github.com/cringeops/bes2600/pull/14 Source: https://github.com/cringeops/bes2600/pull/17 Source: https://github.com/cringeops/bes2600/pull/20	2025-09-17 16:35:45 +02:00