bes2600-dkms

Author	SHA1	Message	Date
Markus Fritsche	dd01be0162	bes2600: Patch E — skip ps_state_lock when PSM-known-disabled Per the Opus structural critique (PR #8 §2.4) and Sonnet review item 5. The per-RX-frame early-data path takes ps_state_lock to double-check whether a link entry transitioned to BES2600_LINK_SOFT (AP-side power-save state machine, soft-link transition). When c7 has latched pm_unsupported = true (firmware does not honor PSM, see feedback_bes2600_firmware_no_psm memory), the AP power-save state machine is dead and link entries never transition to LINK_SOFT. The per-frame spin_lock_bh + double-check is wasted work. This patch gates the lock acquisition on !pm_unsupported. When the latch is on (the steady state on the production-shipped bes2600 firmware), early_data RX frames bypass the spin_lock_bh and go directly to ieee80211_rx_irqsafe. If a future firmware drop fixes PSM, c7 self-clears pm_unsupported on the first real PM_INDICATION and the locked path resumes. Scope is narrower than Sonnet originally framed: only the per-RX-frame hot path (txrx.c:1945-1951 in cleanups+G+D) is touched. Other ps_state_lock sites in txrx.c (lines 657, 1256, 1420, 1528) are TX submission / multicast-start / link-id paths, not per-frame RX, and not on the Bug #5 hot path. Leave those alone. Build verified: srcversion B5922B4933590F33207EE97 on ohm sandbox.	2026-05-20 20:17:58 +02:00
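The gating described in Patch E can be sketched in userspace C. This is a minimal illustration, not the driver code: `struct rx_ctx`, `rx_early_data()` and all counters are hypothetical stand-ins, the spinlock is modelled as a counter so that skipped acquisitions are observable, and whatever the real driver does when it sees LINK_SOFT is elided.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state touched on the early-data RX path. */
struct rx_ctx {
	bool pm_unsupported;      /* latch: firmware does not honor PSM */
	bool link_soft;           /* stand-in for a BES2600_LINK_SOFT entry */
	unsigned int lock_taken;  /* simulated ps_state_lock acquisitions */
	unsigned int soft_seen;   /* locked double-checks that saw LINK_SOFT */
	unsigned int delivered;   /* frames handed on to the stack */
};

/* Models one early-data RX frame. */
static void rx_early_data(struct rx_ctx *c)
{
	if (!c->pm_unsupported) {
		c->lock_taken++;          /* where spin_lock_bh() would sit */
		if (c->link_soft)
			c->soft_seen++;   /* real LINK_SOFT handling elided */
		/* where spin_unlock_bh() would sit */
	}
	c->delivered++;                   /* ieee80211_rx_irqsafe() in the driver */
}
```

With the latch set, every frame skips the locked double-check; once the latch clears, the locked path resumes without any other change.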
Markus Fritsche	93f2aab656	bes2600: Patch D — atomicize ba_lock counters, drop the spinlock The block-ack policy uses 4 int counters (ba_acc, ba_cnt, ba_acc_rx, ba_cnt_rx) bumped per data frame in the TX and RX hot paths under spin_lock_bh(&hw_priv->ba_lock). The lock was the heaviest per-frame synchronization cost remaining after Patch C v3 (which fixed the sdio_rx_work relay). Per the Opus structural critique (PR #8), this pattern matches mac80211 driver convention for per-frame statistics: atomic_t suffices, no lock needed. Field-by-field changes in struct bes2600_common: ba_acc, ba_cnt, ba_acc_rx, ba_cnt_rx: int -> atomic_t ba_armed: new atomic_t (timer-arm flag) ba_ena: bool -> atomic_t ba_lock: removed (spinlock_t deleted) ba_hist: int (single-writer = ba_timer) Producer hot path (txrx.c TX submit + RX receive): - atomic_add for the byte accumulator - atomic_inc for the frame counter - atomic_cmpxchg(&ba_armed, 0, 1) to claim the once-per-window mod_timer arm — at most ONE producer succeeds; race-free - no spin_lock_bh Consumer paths (sta.c bes2600_ba_timer, sta.c disconnect-reset, sta.c bes2600_ba_work, debug.c debugfs reader): - atomic_read snapshots all 4 counters into locals; the threshold predicate (acc/cnt >= THLD) tolerates approximate snapshots — the timer fires periodically, a single misclassification just delays the policy update by one tick - atomic_set zeroes the counters at end of timer-callback window; racing producer increments after the snapshot are lost (acceptable for stats; same approximation the original lock allowed under contention) - atomic_set(&ba_armed, 0) re-enables the next window's arm Followup-amenable simplification: ba_hist remains int because only the single ba_timer callback writes it; multiple writers would need to upgrade it too. This patch follows the cw1200-mainline-idiom established by Patch C v3 (structural fix, not bandaid). 
The cw1200 reference doesn't have a similar lock to compare; bes2600 inherited this from a later Bestechnic addition rather than the upstream tree.	2026-05-20 20:17:58 +02:00
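The producer/consumer split from Patch D can be approximated in userspace with C11 atomics. This is a sketch under assumptions: `struct ba_stats`, `ba_account()` and `ba_window_close()` are hypothetical names, only two of the four counters are shown, and `atomic_compare_exchange_strong()` stands in for the kernel's `atomic_cmpxchg()`.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace mirror of the block-ack counters. */
struct ba_stats {
	atomic_int ba_acc;    /* bytes accumulated in this window */
	atomic_int ba_cnt;    /* frames counted in this window */
	atomic_int ba_armed;  /* 1 once a producer has armed the timer */
};

/* Producer hot path: returns true if this caller must arm the timer. */
static bool ba_account(struct ba_stats *s, int bytes)
{
	int expected = 0;

	atomic_fetch_add(&s->ba_acc, bytes);
	atomic_fetch_add(&s->ba_cnt, 1);
	/* Only one producer per window wins the 0 -> 1 transition. */
	return atomic_compare_exchange_strong(&s->ba_armed, &expected, 1);
}

/*
 * Timer callback: snapshot, zero, re-enable arming.
 * Returns mean bytes per frame for the window, or 0 if no frames were seen.
 * Increments that race in between the loads and the stores are lost,
 * which is the approximation the commit message accepts.
 */
static int ba_window_close(struct ba_stats *s)
{
	int acc = atomic_load(&s->ba_acc);
	int cnt = atomic_load(&s->ba_cnt);

	atomic_store(&s->ba_acc, 0);
	atomic_store(&s->ba_cnt, 0);
	atomic_store(&s->ba_armed, 0);
	return cnt ? acc / cnt : 0;
}
```

The compare-and-exchange is what replaces the lock for the "arm once per window" decision; the counters themselves only need atomic read-modify-write.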
Markus Fritsche	a02f8b7629	bes2600: Patch G — restore SPDX identifiers + ST-Ericsson attribution The bes2600 driver is a fork of the upstream cw1200 driver (drivers/net/wireless/st/cw1200/, ST-Ericsson, Dmitry Tarnyagin 2010-2011). The fork's file headers have three GPL-compliance issues: 1. NO SPDX-License-Identifier on any of 48 source files (cw1200 mainline has them on all 25). kernel.org-mandated since 2017. 2. Original "Copyright (c) 2010, ST-Ericsson" lines stripped from all files inherited from cw1200, replaced with "Copyright (c) 2010, Bestechnic" — factually impossible (Bestechnic did not author the 2010 work) and a GPL-2.0 §1 attribution-preservation violation. 3. The "GPL version 2 as published by the Free Software Foundation" boilerplate paragraph is redundant alongside SPDX and is the legacy form modern kernel sources have replaced. This patch corrects all three for the 48 .c/.h files in bes2600/: - Adds `// SPDX-License-Identifier: GPL-2.0-only` (or `/* ... */` for headers) as line 1 of every file. - Restores `Copyright (c) 2010, ST-Ericsson` + `Author: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no>` as the FIRST copyright chain entry on all 22 files derived from cw1200 (bh.{c,h}, debug.{c,h}, fwio.{c,h}, hwio.{c,h}, main.c, pm.{c,h}, queue.{c,h}, scan.{c,h}, sta.{c,h}, txrx.{c,h}, wsm.{c,h}). - Keeps `Copyright (c) 2022, Bestechnic (Beijing) Co., Ltd.` as the SECOND chain entry where Bestechnic genuinely contributed. - Notes "Derived from cw1200_sdio.c" + ST-Ericsson copyright on bes2600_sdio.c (heavy derivation, not a literal rename). - Notes "Replaces hwbus.h from cw1200/" + ST-Ericsson copyright on sbus.h. - Preserves the prism54/islsm authorship chain on main.c and bes2600.h (Michael Wu 2006 + Jean-Baptiste Note 2004-2006). - Drops the GPL-2.0 boilerplate paragraph in favour of SPDX. No code changes — only file-header comment blocks. Module build is unaffected (verified by header-only diff scope). 
This is a prerequisite for any kernel.org submission attempt. The existing MODULE_LICENSE("GPL") + MODULE_AUTHOR(Tarnyagin@stericsson.com) declarations were already present and are unchanged here; the mismatch between MODULE_AUTHOR and the (since-corrected) per-file copyrights is now resolved.	2026-05-20 20:17:58 +02:00
Markus Fritsche	73191b7bc1	bes2600: drop sdio_rx_work relay, IRQ→bh-direct (no-relay architecture) Patch C v3 — match cw1200 mainline architecture (drivers/net/wireless/st/cw1200/). Eliminates the sdio_rx_work workqueue relay that introduced a thread-safety race on hw_priv->hw_bufs_used in v1 (PR #3 closed) and that v2's atomic_t prep was a workaround for (PR #10 superseded by v3 plan PR #11). Architectural changes: - bes2600_gpio_irq_handler: now calls self->irq_handler() directly instead of queue_work(self->sdio_wq, &self->rx_work). Bumps bh_rx atomic + wakes bh_wq. - bes2600_bh_rx_helper (BES_SDIO_RX_MULTIPLE_ENABLE branch): now calls priv->sbus_ops->bus_rx_batch() to do the SDIO read inline. No pipe_read, no skb_dequeue. - bes2600_sdio_read_rx_batch (new): the SDIO read sequence extracted from sdio_rx_work, registered as sbus_ops->bus_rx_batch. Runs in bh thread context. - bes2600_sdio_extract_packets: calls bes2600_bh_handle_rx_skb() directly per parsed SKB. No skb_queue_tail, no rx_queue. - bes2600_bh_handle_rx_skb (new in bh.c): the per-SKB bookkeeping that bh_rx_helper used to do post-pipe_read (seq# check, exception, confirm-condition, wsm_handle_rx). Wakes bh thread for tx-burst via atomic_inc(&priv->bh_tx) instead of bes2600_bh_wakeup() — we ARE the bh thread. - Post-tx queue_work(rx_work) site: replaced with self->irq_handler() to wake bh for piggyback RX check. 
Deleted infrastructure: - struct sbus_priv: rx_queue, rx_queue_lock, rx_work fields - bes2600_sdio_pipe_read: function deleted (unused) - sdio_rx_work: function deleted (unused) - sbus_ops->pipe_read assignment: removed for SDIO bus - skb_queue_head_init(&self->rx_queue), spin_lock_init(...), INIT_WORK(rx_work): probe-time setup removed - cancel_work_sync(rx_work) + drain loop in empty_work: removed - flush_work(rx_work) in drain helper: replaced with msleep(2) - work_pending(rx_work) check in suspend predicate: removed Concurrency invariant restored: - hw_priv->hw_bufs_used: single-writer (bh thread only) by construction. No atomic_t needed. - hw_priv->hw_bufs_used_vif[]: ditto. - hw_priv->wsm_tx_pending[]: ditto. - All other shared state: unchanged or already protected. Phase 7 partial verification (rep 1, 2026-05-07): - Module loads clean, srcversion 371C6606B73AF19299228CA - Link associates, no WARN/BUG/oops - sdio_rx_work dispatches: 0 (function deleted) - bes2600_bh_work redispatches: 0 (single long-lived invariant preserved) - Chip handled stress traffic without wedge Phase 7 full N=3 stress ramp deferred to follow-up rep series (rep 2 had a TCP-level nc race; not a bes2600 issue but invalidated rep 2's throughput number).	2026-05-20 20:17:58 +02:00
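A single-threaded model may help picture the no-relay flow of Patch C v3. Everything here is hypothetical (`struct bh_model`, the function names, the batch representation), and the rule that a TX confirm releases one hardware buffer is an assumption made for illustration. The point is only that the IRQ side touches one atomic flag, while all bookkeeping stays with the bh thread.

```c
#include <stdatomic.h>

#define BATCH_MAX 8

/* Hypothetical model: one IRQ-visible counter, bh-owned bookkeeping. */
struct bh_model {
	atomic_int bh_rx;      /* bumped from the IRQ handler */
	int hw_bufs_used;      /* plain int: only the bh thread writes it */
	int frames_handled;
};

/* IRQ side: no queue_work(), just flag pending RX (wake-up elided). */
static void irq_handler(struct bh_model *m)
{
	atomic_fetch_add(&m->bh_rx, 1);
}

/* Per-frame bookkeeping, called directly by the batch reader. */
static void bh_handle_rx_frame(struct bh_model *m, int is_tx_confirm)
{
	/* Assumption: a TX confirm releases one hardware buffer. */
	if (is_tx_confirm && m->hw_bufs_used > 0)
		m->hw_bufs_used--;
	m->frames_handled++;
}

/* One bh loop iteration: consume the flag, then read and parse inline. */
static int bh_iteration(struct bh_model *m, const int *batch, int n)
{
	int i;

	if (atomic_exchange(&m->bh_rx, 0) == 0)
		return 0;                       /* nothing pending */
	if (n > BATCH_MAX)
		n = BATCH_MAX;
	for (i = 0; i < n; i++)
		bh_handle_rx_frame(m, batch[i]);
	return n;
}
```

Because `hw_bufs_used` is only written inside `bh_iteration()`, it can stay a plain `int`, which is the single-writer invariant the commit message describes.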
Markus Fritsche	9e38ac5523	bes2600: fix concurrency UAF in bes2600_hw_scan and sched_scan bes2600_bss_info_changed() and bes2600_hw_scan() can run concurrently. The probe-request SKB allocated by ieee80211_probereq_get() before scan.lock + conf_lock are taken can be touched by a concurrent bss_info_changed (via wsm_set_template_frame's path) while we hold no lock. Reorder to acquire both locks BEFORE the SKB allocation. Also reorder cleanup paths so dev_kfree_skb() runs BEFORE up() — otherwise a small window exists where the SKB has been touched but the lock has been released, allowing concurrent code to also touch it. Three sites fixed: - bes2600_hw_scan: lock-take + ENOMEM cleanup + wsm_set_template_frame error cleanup + success-path SKB free + lock release order - bes2600_sched_scan_start (#ifdef ROAM_OFFLOAD): same three sub-fixes (compiled-out at default build, fixed for consistency) - All success/error paths: dev_kfree_skb before up() Backport of cw1200 mainline commit 86760e0dfe36 ("cw1200: Fix concurrency use-after-free bugs in cw1200_hw_scan()", 2018-12-14), which fixed the identical bug in the same code shape we inherited. That commit was merged from upstream 4f68ef64cd7f. Cherry-picked from upstream Linux: 86760e0dfe36 cw1200: Fix concurrency use-after-free bugs in cw1200_hw_scan() Author: Jia-Ju Bai <baijiaju1990@gmail.com> Link: https://lore.kernel.org/r/20181214035521.7575-1-baijiaju1990@gmail.com	2026-05-20 20:17:58 +02:00
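The ordering rule in this fix (take the locks before allocating, free before releasing) can be checked with a small trace model. This is a userspace sketch only: `struct trace`, `note()` and `hw_scan_model()` are hypothetical, `malloc()` stands in for `ieee80211_probereq_get()`, and the two locks are collapsed into one logged event.

```c
#include <stdlib.h>

enum ev { EV_LOCK, EV_ALLOC, EV_FREE, EV_UNLOCK };

/* Hypothetical event trace so the ordering can be asserted. */
struct trace {
	enum ev log[8];
	int n;
};

static void note(struct trace *t, enum ev e)
{
	if (t->n < 8)
		t->log[t->n++] = e;
}

/* fail_template simulates a wsm_set_template_frame() error. */
static int hw_scan_model(struct trace *t, int fail_template)
{
	int ret = 0;
	char *frame;

	note(t, EV_LOCK);            /* scan.lock + conf_lock taken first */
	frame = malloc(64);          /* probe request built under the locks */
	if (!frame) {
		note(t, EV_UNLOCK);
		return -1;
	}
	note(t, EV_ALLOC);

	if (fail_template)
		ret = -2;

	free(frame);                 /* free while still holding the locks */
	note(t, EV_FREE);
	note(t, EV_UNLOCK);          /* release last, on every path */
	return ret;
}
```

Both the success path and the error path end with FREE followed by UNLOCK, so no window exists in which the buffer is reachable without the lock.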
Markus Fritsche	77f966df25	bes2600: fix missing destroy_workqueue() on error in init_common Two error paths between create_singlethread_workqueue() (~main.c:489) and the success-path destroy_workqueue() in unregister_common (~609) return without cleaning up the workqueue, leaking it on probe failure: 1. bes2600_queue_stats_init() failure 2. bes2600_queue_init() failure (any of the 4 TID queues) Both call ieee80211_free_hw(hw); return NULL — without first destroy_workqueue(hw_priv->workqueue). Add it. Backport of cw1200 mainline commit 7ec8a926188e ("cw1200: fix missing destroy_workqueue() on error in cw1200_init_common", 2020-11-19), which fixed the identical bug in the same code shape we inherited. Reported on cw1200 by Hulk Robot. Cherry-picked from upstream Linux: 7ec8a926188e cw1200: fix missing destroy_workqueue() on error Author: Qinglang Miao <miaoqinglang@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20201119070842.1011-1-miaoqinglang@huawei.com Fixes: a910e4a94f69 ("cw1200: add driver for the ST-E CW1100 & CW1200 WLAN chipsets")	2026-05-20 20:17:58 +02:00
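The leak and its fix follow the usual probe-unwind pattern. The sketch below is one way to express it, using a shared error label rather than the inline cleanup the commit message describes; `struct probe_model` and `init_common_model()` are hypothetical, and kernel objects are reduced to live counters.

```c
/* Hypothetical resource counters standing in for kernel objects. */
struct probe_model {
	int wq_live;     /* workqueues created minus destroyed */
	int hw_live;     /* ieee80211_hw allocations minus frees */
};

/* fail_step: 0 = succeed, 1 = queue_stats_init fails, 2 = queue_init fails */
static int init_common_model(struct probe_model *p, int fail_step)
{
	p->hw_live++;                 /* ieee80211_alloc_hw() */
	p->wq_live++;                 /* create_singlethread_workqueue() */

	if (fail_step == 1)           /* bes2600_queue_stats_init() failed */
		goto err_wq;
	if (fail_step == 2)           /* bes2600_queue_init() failed */
		goto err_wq;
	return 0;

err_wq:
	p->wq_live--;                 /* the destroy_workqueue() the fix adds */
	p->hw_live--;                 /* ieee80211_free_hw() */
	return -1;
}
```

Without the `wq_live--` line, both failure steps would return with a workqueue still counted as live, which is the leak being fixed.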
Markus Fritsche	d9268b433a	bes2600: replace a set of atomic_add() Backport of cw1200 mainline commit 07f995ca1951 ("cw1200: replace a set of atomic_add()", 2020-11-10). atomic_inc() reads more naturally than atomic_add(1, &x). Mechanical change, no functional impact. 7 sites: 6 in bh.c (bh_term, bh_rx x2, bh_tx x3) and 1 in itp.c (awaiting_confirm). Two of the bh_rx and three of the bh_tx sites are inside the cw1200-ancestor #if 0 block; replaced anyway to keep the file consistent with cw1200 mainline source style. Cherry-picked from upstream Linux: 07f995ca1951 cw1200: replace a set of atomic_add() Author: Yejune Deng <yejune.deng@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/1604991491-27908-1-git-send-email-yejune.deng@gmail.com	2026-05-20 20:17:58 +02:00
claude-noether	a7e232738d	bes2600: bus_reset on connection-loss storm to dodge assoc-comeback blackhole When mac80211 declares connection loss against this AP (typically driven by inactivity-deauth or beacon-loss), the userspace reauth that follows sometimes enters a long blackhole: the AP responds to auth with success but defers assoc with the 802.11v "assoc comeback" timer; ohm retries faster than the comeback grants permission; the AP eventually fires an unprotected deauth-reason-6 ("Class 2 frame received from non- authenticated station"), and recovery only completes via cross-SSID or cross-channel fallback. Receipts: ~86 s blackhole observed in the phase-7 rep on 2026-05-07 02:42, with three subsequent BSSIDs returning assoc comeback timeouts before reason-9 (STA_REQ_ASSOC_WITHOUT_AUTH) fired. Documented in marfrit/besser:notes/phase4-2026-05-07.md. When N=3 driver-side connection_loss decisions fire within a 60 s window on the same vif, skip the ieee80211_connection_loss() path and trigger the c5.2-introduced bes2600_chrdev_do_bus_reset() instead. The bus reset removes and re-probes the chip; userspace re-associates with a fresh chip state, dodging the AP's comeback-timer rejection cycle. Predicted Phase 7 delta vs current baseline: - api_connection_loss rate: unchanged (we don't address the trigger) - conditional probability of >5 s blackhole given event: <= 30 % - worst-case recovery: 86 s -> < 10 s Contract pin: bes2600_chrdev_do_bus_reset(sbus_ops, sbus_priv) at bes2600/bes_chardev.c:455, introduced by c5.2. The function is async- returning: sbus_ops->bus_reset() schedules an SDIO rescan; the helper waits up to 3 s for the remove() callback to clear sbus_priv, then returns. Per-vif state is gone after this point, so the recover work lives on bes2600_common (hw_priv) and uses the global bes2600_cdev for the bus_reset call rather than dereferencing per-vif state. 
Threshold (3 / 60 s) is well above the steady-state per-vif connection_loss rate observed in the patch-A phase-7 rep (0.86/h under sustained load), so a true storm is required to trip it. Files touched: - bes2600/bes2600.h: 3 counter fields on struct bes2600_vif, 1 work_struct on struct bes2600_common, 3 prototypes - bes2600/sta.c: 3 helpers + storm-account hook in bes2600_connection_loss_work + storm-init in bes2600_vif_setup + cancel_work_sync in the hw_priv shutdown path; #include bes_chardev.h was already pulled in by an earlier c-stack patch - bes2600/main.c: INIT_WORK alongside other hw_priv work_structs - bes2600/debug.c: ConnectionLossStormRecoveries seq_printf in the per-vif status seq_file output The cw1200/cw1260 ancestor has no equivalent; this is a clean addition. checkpatch.pl --no-tree --strict: clean (0/0/0). Signed-off-by: Claude (noether) <claude@reauktion.de>	2026-05-20 20:17:58 +02:00
claude-noether	3b4239ad2b	bes2600: pre-empt AP-deauth-6 with mac80211 reassoc on decrypt-fail storm When the BES2600 firmware reports WSM_STATUS_DECRYPTFAILURE for a burst of received frames (typically because the host's PTK or GTK has fallen out of sync with the AP), the AP eventually concludes that the STA is not authenticated and emits an unprotected deauth-reason-6 ("Class 2 frame received from non-authenticated station"). On the deployed pinetab2 + bes2600 stack this AP-initiated deauth has been observed to leave the link blackholed for up to 109 s before userspace finds a different SSID/channel to recover on. (Receipts at https://git.reauktion.de/marfrit/besser, notes/phase5-2026-05-06.md.) Add a sliding-window counter on each bes2600_vif: when 5 decrypt failures fire within 5 s, schedule a worker that calls ieee80211_connection_loss(vif). mac80211 then performs immediate disassociation; userspace (NetworkManager / wpa_supplicant) reconnects with fresh keys before the AP gets a chance to fire its unprotected deauth. Predicted Phase 7 delta vs the unpatched baseline: - decrypt-burst rate: unchanged (this does not address root cause) - AP-deauth-6 rate: <= 0.2 of baseline - conditional probability of >5s blackhole given a burst: 100% -> <= 10% - worst-case recovery time: 109s -> <5s Contract pin: ieee80211_connection_loss() per include/net/mac80211.h: "may also be called if the connection needs to be terminated for some other reason... will cause immediate change to disassociated state, without connection recovery attempts." Userspace recovery is the existing NM/wpa_supplicant path. The worker context satisfies the implicit process-context expectation. 
Files touched: - bes2600/bes2600.h: 4 new fields on struct bes2600_vif + 2 prototypes - bes2600/txrx.c: new helpers + the call site at the existing WSM_STATUS_DECRYPTFAILURE log point (the unconditional "goto drop" branch in bes2600_rx_cb) - bes2600/sta.c: bes2600_decrypt_storm_init() in bes2600_vif_setup; cancel_work_sync() in bes2600_remove_interface, alongside the existing per-vif cancel_*_work_sync block. Safe under the kernel cancel_work_sync contract: the work_struct is INIT_WORK'd in setup, so the call is valid; it blocks until any in-flight handler returns, ensuring no use-after-free of priv when mac80211 frees the vif; and it is idempotent (subsequent calls just return false). - bes2600/debug.c: DecryptStormRecoveries seq_printf in the per-vif status seq_file output Threshold (5/5s) is set well above the steady-state per-vif decrypt- fail rate observed in measurement (~1/min even under sustained 1 MB/s load), so a true storm is required to trip it. The cw1200/cw1260 ancestor has no equivalent storm-recovery; this is a clean addition. checkpatch.pl --no-tree --strict: clean (0/0/0). Signed-off-by: Claude (noether) <claude@reauktion.de> Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-20 20:17:58 +02:00
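The commit message does not spell out how the sliding window is kept, so the following is only one plausible shape: a ring of the last N timestamps, tripping when the oldest of them is still inside the window. `struct storm_window`, `storm_account()` and the millisecond clock are hypothetical; the driver would use jiffies and schedule its worker where this sketch returns true.

```c
#include <stdbool.h>
#include <stdint.h>

#define STORM_THRESHOLD 5        /* events */
#define STORM_WINDOW_MS 5000     /* window length */

/* Hypothetical per-vif state: timestamps of the last THRESHOLD events. */
struct storm_window {
	uint64_t stamp_ms[STORM_THRESHOLD];
	unsigned int head;       /* next slot to overwrite */
	unsigned int filled;     /* valid entries, capped at THRESHOLD */
};

/* Record one event at now_ms; return true when the threshold is met. */
static bool storm_account(struct storm_window *w, uint64_t now_ms)
{
	w->stamp_ms[w->head] = now_ms;
	w->head = (w->head + 1) % STORM_THRESHOLD;
	if (w->filled < STORM_THRESHOLD)
		w->filled++;
	if (w->filled < STORM_THRESHOLD)
		return false;

	/* Once full, head points at the oldest of the last THRESHOLD events. */
	return now_ms - w->stamp_ms[w->head] <= STORM_WINDOW_MS;
}
```

The same structure would fit other thresholds (for example 3 events in 60 s) by changing the two constants; resetting the ring after a trip is left out here.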
Markus Fritsche	d48f2ae73c	bes2600: handle multi-function SDIO cards in mmc_hw_reset bus_reset c5.2 (recover-wedged-firmware-via-mmc-hw-reset) wraps mmc_hw_reset() and treats any non-zero return as a recovery failure. On single-function SDIO cards mmc_hw_reset returns 0 after doing the remove + rescan inline. On multi-function cards (BES2600 has WLAN func 1 + BT companion func 2) the kernel's mmc_sdio_hw_reset() does NOT do the rescan: it tears the card down and returns 1 to signal "caller must trigger rescan". Field observation on PineTab2 (linux-pinetab2 6.19.10-danctnix1): when a real LMAC wedge fired bes2600_chrdev_wifi_force_close -> bes2600_chrdev_do_bus_reset, mmc_hw_reset returned 1, c5.2's wrapper treated that as "bus_reset failed: 1", logged the error, and gave up. The card was already removed (mmc2: card 0001 removed) but nothing scheduled a rescan; wifi (and the BT companion which shares the same SDIO host) stayed silent until the user rebooted four minutes later. Fix: - Capture the mmc_host pointer before calling mmc_hw_reset (the card pointer is invalid after the remove). - On positive return (multi-function path), log informationally and call mmc_detect_change(host, 0) to schedule a rescan. Return 0 so callers see the recovery as successful. - Negative return is still treated as failure as before. The mmc_detect_change side effect is asynchronous; the chrdev's wait_event_timeout(probe_done_wq, !sbus_priv) still observes the remove half synchronously, and the rescan + re-probe runs out of the host detect work afterwards. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-20 20:17:58 +02:00
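The three-way return handling can be reduced to a small pure function. In this sketch `struct host_model` and `bus_reset_result()` are hypothetical, the value returned by `mmc_hw_reset()` is passed in as a parameter rather than obtained from the MMC core, and the rescan is modelled as a counter where the driver calls `mmc_detect_change(host, 0)`.

```c
/* Hypothetical host model; rescans counts scheduled rescans. */
struct host_model {
	int rescans;
};

/*
 * reset_ret is what the reset call returned, as described above:
 *    0 -> reset and rescan already done inline
 *   >0 -> card torn down, caller must trigger the rescan
 *   <0 -> error
 */
static int bus_reset_result(struct host_model *host, int reset_ret)
{
	if (reset_ret < 0)
		return reset_ret;       /* genuine failure, propagate */
	if (reset_ret > 0)
		host->rescans++;        /* mmc_detect_change(host, 0) */
	return 0;                       /* callers see a successful recovery */
}
```

The pre-fix wrapper corresponds to treating every non-zero `reset_ret` like the first branch, which is why the multi-function case was logged as a failure and never rescanned.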
Markus Fritsche	9a0a4c0a46	bes2600: self-detect when firmware does not honor PSM and skip the cycle The c6 series fixed several host-side bookkeeping bugs around PSM transitions, but didn't address the underlying contract: this chip's firmware (BES2600 with the Bestechnic Dec 2023 build that ships on PineTab2 and most danctnix images) silently drops every WSM_set_pm request without emitting the corresponding PM_INDICATION. The driver's own power_down_work delayed work calls bes2600_pwr_enter_lp_mode every ~10s; without firmware acknowledgment each call burns 5s on wait_for_completion_timeout(pm_enter_cmpl, 5HZ) and produces a recurring three-line cascade in dmesg: bes2600_pwr_enter_lp_mode, wait pm ind timeout bes2600_sdio_active failed, subsys:0 bes2600_pwr_device_exit_lp_mode, active mcu fail Confirmed by tripwire instrumentation on PineTab2 (linux-pinetab2 6.19.10-danctnix1, ohm) running the c5+c6 stack: zero wsm_set_pm_indication() invocations across an entire boot, while bes2600_pwr_enter_lp_mode timed out repeatedly, and bes2600_sdio_active() consistently saw BES_SLAVE_STATUS_REG_ID return 0x2f (every "ready" bit set except MCU_WAKEUP_READY (bit 4) - the firmware reports "I'm awake, there's nothing to wake from"). This patch makes the driver self-heal: struct bes2600_pwr_t gains pm_unsupported (bool) and pm_consecutive_timeouts (unsigned int). Both initialised to 0/false. * bes2600_pwr_enter_lp_mode early-returns -EOPNOTSUPP when pm_unsupported is set. Skips the per-VIF set_pm round-trip and the wait_for_completion entirely. * On the cmpxchg-success branch of the timeout path, we increment pm_consecutive_timeouts. When it crosses BES2600_PM_UNSUPPORTED_THRESHOLD (3, ~15s of trying), we latch pm_unsupported = true and force chip_pm_state = ACTIVE so that bes2600_pwr_device_exit_lp_mode's c6.2 skip branch covers the wake side (no gpio_wake / sbus_active / WSM_set_operational_mode reissue past the first one). 
* bes2600_pwr_notify_ps_changed resets pm_consecutive_timeouts to 0 on any incoming PM indication, and clears pm_unsupported if it was previously latched. So a firmware update that fixes PM_IND delivery automatically re-enables PSM transitions without a driver rebuild. mac80211's PSM requests via bes2600_set_pm() still flow to the firmware unchanged; they just don't have host-side timeouts so they remain silent regardless of firmware acknowledgment. Power consumption goes up if the firmware actually CAN do PSM (we'd be keeping the chip awake unnecessarily), but on a chip where the counter trips this trade-off is forced anyway: the chip stayed awake under the broken cascade as well, just with constant SDIO churn. Net effect on dmesg: after ~15s of boot, the three-line cascade stops firing entirely. The firmware-side wedge is observed once per boot (captured by the pm_unsupported latch) instead of per-cycle. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-20 20:17:58 +02:00
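The latch and its self-clearing behaviour amount to a small state machine. The model below is a sketch under assumptions: `struct pwr_model` and both functions are hypothetical simplifications, the firmware's answer is passed in as a flag, and the 5 s wait is represented by a counter instead of a real timeout.

```c
#include <errno.h>
#include <stdbool.h>

#define PM_UNSUPPORTED_THRESHOLD 3

struct pwr_model {
	bool pm_unsupported;
	unsigned int pm_consecutive_timeouts;
	unsigned int waits;   /* simulated 5 s waits actually performed */
};

/* Any PM indication from the firmware resets the counter and the latch. */
static void notify_ps_changed(struct pwr_model *p)
{
	p->pm_consecutive_timeouts = 0;
	p->pm_unsupported = false;
}

/* fw_acks: whether the firmware sends a PM indication for this request. */
static int enter_lp_mode(struct pwr_model *p, bool fw_acks)
{
	if (p->pm_unsupported)
		return -EOPNOTSUPP;     /* skip the request and the wait */

	p->waits++;
	if (fw_acks) {
		notify_ps_changed(p);
		return 0;
	}
	if (++p->pm_consecutive_timeouts >= PM_UNSUPPORTED_THRESHOLD)
		p->pm_unsupported = true;
	return -ETIMEDOUT;
}
```

After three timeouts the model stops waiting at all; a single later indication re-enables the normal path, matching the "no driver rebuild" claim.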
Markus Fritsche	51d46a2e25	bes2600: short-circuit wake handshake when chip is confirmed ACTIVE The previous patch ("bes2600: gate PM indication completion on pending request and track chip state") added enum bes2600_chip_pm_state and the chip_pm_state field tracking what the host has seen the firmware confirm. This patch makes the wake side use it. Without this, every bes2600_pwr_device_exit_lp_mode() unconditionally runs gpio_wake() + sbus_active() + wsm_set_operational_mode(active), even when the chip is already in confirmed-ACTIVE state and the wake sequence has nothing to do. The visible failure mode on PineTab2: bes2600_pwr_enter_lp_mode, wait pm ind timeout repeat set gpio_wake_flag, sub_sys:0 bes2600_sdio_active failed, subsys:0 bes2600_pwr_device_exit_lp_mode, active mcu fail cycling every ~9 s, ~22 cycles in 10 minutes. Five pieces: 1. enter_lp_mode timed out (firmware indication lost). With c6.1, chip_pm_state is now UNKNOWN. 2. lock_device fires exit_lp_mode. 3. gpio_wake hits "bit already set" because device_enter_lp_mode was skipped when the indication timed out, so gpio_sleep was never called - the bit reflects driver intent, not chip state. gpio_wake silently no-ops (no GPIO edge), bit stays set. 4. sbus_active spends 200 x 2 ms looking for MCU_WAKEUP_READY that never comes (firmware was never told to wake), then fails. 5. Driver continues to wsm_set_operational_mode against the wedged bus, compounding the failure. This patch's three moves: * bes2600_pwr_device_exit_lp_mode() reads chip_pm_state at entry. On BES2600_CHIP_PM_ACTIVE, log at devel level and return without touching gpio_wake / sbus_active / WSM. The chip is in the state we want; the handshake exists only to drive a transition. * On BES2600_CHIP_PM_LP or BES2600_CHIP_PM_UNKNOWN, run the wake handshake as before, but on sbus_active() failure: set chip_pm_state = UNKNOWN, log once at err level, and bail out.
Do NOT call wsm_set_operational_mode over a wedged bus - it would just emit a second error and leave the chip in an even less defined state. * bes2600_gpio_wakeup_mcu() / bes2600_gpio_allow_mcu_sleep(): demote "repeat set/clear gpio_wake_flag" from bes_err to bes_devel. Multi-subsystem wake-hold (e.g. WIFI + BT both want MCU awake) is the steady-state case, and the symmetric clear while bit-already-clear is racy bookkeeping rather than a hardware error. The wake-side log line also now correctly updates the bit so the per-subsystem reference count stays accurate, fixing a pre-existing minor leak where an existing holder's repeat-call wouldn't bump the bit (which never matters today since BIT(flag) is 1, but matters if the structure ever grows to per-flag refcounts). Net effect on the cycle: * If chip is genuinely ACTIVE (chip_pm_state == ACTIVE), wake skips cleanly. Storm goes silent. * If chip is genuinely LP, behaviour is unchanged. * If chip is UNKNOWN (post-timeout state), one wake attempt is made; on failure, state stays UNKNOWN and we don't emit a second cascade error per attempt. Repeated UNKNOWN with failed wake will eventually be picked up by the LMAC active-monitor and escalated to mmc_hw_reset (c5.2). No new locks, no new state. Only consumption of the chip_pm_state field added in the prerequisite patch. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-20 20:17:58 +02:00
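The wake-side decision can be modelled as follows. This is a sketch, not the driver function: `struct wake_model` and `exit_lp_mode()` are hypothetical, the bus result is injected as a flag, and marking the chip ACTIVE after a successful handshake is an assumption of the model (the commit message only specifies the skip and the failure branch).

```c
#include <stdbool.h>

enum chip_pm_state { CHIP_PM_ACTIVE, CHIP_PM_LP, CHIP_PM_UNKNOWN };

struct wake_model {
	enum chip_pm_state state;
	bool bus_ok;              /* what the simulated sbus_active() returns */
	unsigned int handshakes;  /* gpio_wake + sbus_active attempts */
	unsigned int wsm_calls;   /* wsm_set_operational_mode() calls */
};

static int exit_lp_mode(struct wake_model *w)
{
	if (w->state == CHIP_PM_ACTIVE)
		return 0;                 /* nothing to transition, skip */

	w->handshakes++;
	if (!w->bus_ok) {
		w->state = CHIP_PM_UNKNOWN;
		return -1;                /* no WSM traffic on a wedged bus */
	}
	w->wsm_calls++;
	w->state = CHIP_PM_ACTIVE;        /* assumption of this model */
	return 0;
}
```

In the model a failed wake costs exactly one handshake and zero WSM calls, which is the "one error per attempt instead of a cascade" behaviour described above.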
Markus Fritsche	7c4ad3b1d6	bes2600: gate PM indication completion on pending request and track chip state When mac80211 toggles PSM on the BES2600, the host sends WSM set_pm and waits up to 5 s on bes_power.pm_enter_cmpl for a firmware-side PM-changed indication confirming the transition. Three sequenced flaws make the wait-and-confirm racy and leave host/chip bookkeeping desynced when anything misfires: 1) bes2600_pwr_notify_ps_changed() unconditionally fires complete(pm_enter_cmpl) for any non-active psmode. It does not check whether a host-initiated set_pm is actually pending. A spontaneous indication (firmware-internal coex move, idle-driven aging) primes the completion, and the next host- driven enter_lp_mode sees a false success on its first wait_for_completion_timeout. 2) The wait/reinit ordering in bes2600_pwr_enter_lp_mode is status = wait_for_completion_timeout(...); atomic_set(pm_set_in_process, 0); reinit_completion(...); If an indication arrives between wait_for_completion_timeout returning with status==1 and reinit_completion, the next enter_lp_mode iteration's wait can also see false success. The reinit must happen before we start the new request, not after handling the previous one. 3) On wait_pm_ind timeout, the driver returns -ETIMEDOUT and walks away. It does not record that the firmware's actual PM state is no longer known to the host. Subsequent wake paths (gpio_wake / sbus_active) assume the chip is still active and hit deterministic SDIO failures when the firmware has transitioned anyway. This patch is the safe-prerequisite half of a wider fix: * bes_pwr.h gains enum bes2600_chip_pm_state {ACTIVE, LP, UNKNOWN} and bes_power.chip_pm_state. Its job is to track what the host has seen the firmware confirm, not what the host has requested. Initialised to ACTIVE in bes2600_pwr_init(). 
* bes2600_pwr_notify_ps_changed() unconditionally updates chip_pm_state on every indication, but only fires complete(pm_enter_cmpl) when atomic_cmpxchg(pm_set_in_process, 1, 0) succeeds. A spontaneous indication can no longer prime a waiter that will only set up its request afterwards. * bes2600_pwr_enter_lp_mode() now reinit_completion()s before setting pm_set_in_process and sending wsm_set_pm. After a timeout, it cmpxchgs pm_set_in_process back to 0 (so a late indication cannot prime the next iteration) and on the win- cmpxchg branch records chip_pm_state=UNKNOWN. A follow-up patch consumes chip_pm_state on the wake side (bes2600_pwr_device_exit_lp_mode + bes2600_gpio_wakeup_mcu) to fix the deterministic "active mcu fail" cycle this state-record enables a fix for. Splitting the work this way keeps the lock-free race fix small and reviewable on its own. No new locks, no behaviour change on the success path. Only the recovery path (timeout + spontaneous indication) gains correctness. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-20 20:17:47 +02:00
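The race fix hinges on a single compare-and-exchange deciding who "owns" a pending request: the indication handler or the timeout path. A userspace approximation with C11 atomics is shown below; `struct pm_model` and the three functions are hypothetical, and the completion is reduced to a counter that `pm_request_begin()` resets where the driver calls `reinit_completion()`.

```c
#include <stdatomic.h>
#include <stdbool.h>

enum fw_pm_state { FW_PM_ACTIVE, FW_PM_LP, FW_PM_UNKNOWN };

struct pm_model {
	atomic_int pm_set_in_process;  /* 1 while a host request is pending */
	int completions;               /* models complete(pm_enter_cmpl) */
	enum fw_pm_state chip_pm_state;
};

/* Host side, before sending the request: reinit first, then mark pending. */
static void pm_request_begin(struct pm_model *m)
{
	m->completions = 0;
	atomic_store(&m->pm_set_in_process, 1);
}

/* Firmware indication handler. */
static void pm_indication(struct pm_model *m, enum fw_pm_state s)
{
	int pending = 1;

	m->chip_pm_state = s;          /* always record what was confirmed */
	if (atomic_compare_exchange_strong(&m->pm_set_in_process, &pending, 0))
		m->completions++;      /* only wake a real waiter */
}

/* Host side after a timeout; true if the timeout path won the race. */
static bool pm_request_timeout(struct pm_model *m)
{
	int pending = 1;

	if (!atomic_compare_exchange_strong(&m->pm_set_in_process, &pending, 0))
		return false;          /* an indication got there first */
	m->chip_pm_state = FW_PM_UNKNOWN;
	return true;
}
```

A spontaneous indication finds `pm_set_in_process` at 0 and so cannot prime a later wait, and a late indication after a timeout is likewise ignored by the completion while still updating the recorded state.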
Markus Fritsche	e0f664cbc9	bes2600: recover wedged firmware via mmc_hw_reset on link break When the LMAC active monitor detects 'link break between lmac and host' (the hw_buf_used==pending watchdog in bes2600_bh_lmac_active_monitor), bes2600_chrdev_wifi_force_close(hw_priv, true) is invoked to tear the device down and prepare for a fresh probe. On the wifi_force_close_work side this calls bes2600_chrdev_do_system_close() which dispatches sbus_ops->power_switch(0). On PineTab2 (RK3566 + BES2600WM over SDIO) this recovery path is a no-op: * bes2600_sdio_power_down() writes a SYSTEM_CLOSE host-int message, clears MMC_CAP_NONREMOVABLE, and schedules sdio_scan_work, which is the literal one-line stub bes_warn("...this function does nothing\n"). * bes2600_sdio_on() (the eventual power_switch(1) counterpart) toggles pdata->powerup, which is NULL on PineTab2 because the wifi-reset GPIO is owned by sdio_pwrseq, not the bes2600 device tree node (see arch/arm64/boot/dts/rockchip/rk3566-pinetab2.dtsi: 'The reset pin is claimed by sdio_mmcseq, It is better to move it to U-Boot so the OS can use it.'). Net result: the chip is never reset. The function drivers are not removed (the SDIO core has no signal that the card is gone), the firmware stays wedged, and a subsequent rmmod bes2600 leaves the SDIO function in a half-torn-down state. modprobe bes2600 then fails with 'probe with driver bes2600_wlan failed with error -123' (-ENOMEDIUM) on both functions (:1 wifi, :2 BT-companion) until a full system reboot. Observed on PineTab2 (linux-pinetab2 6.19.10-danctnix1-1) after ~150 minutes of background-scan rejects (wsm_generic_confirm 0x0007, [SCAN] Scan failed (-22)) accumulating until the LMAC stopped acknowledging TX buffers (hw_buf_used:24 pending:24). Reproducible under sustained scan pressure. Add a sbus operation bus_reset() that the recovery path can call when power_switch() has no effective chip-reset signal of its own. 
Provide an SDIO implementation that calls mmc_hw_reset(self->func->card), which on a multi-function SDIO card (PineTab2 binds func 1 for WLAN and func 2 for the BT-companion path) takes the remove-and-rescan path: mmc_sdio_hw_reset() marks the card removed and schedules mmc_rescan, which tears down the bound function drivers and re-detects the card on the next sweep, in turn reinvoking bes2600_sdio_probe(). With a single function probed it instead invokes mmc_power_cycle() directly, which on PineTab2 toggles the wifi-reset GPIO via sdio_pwrseq. Add bes2600_chrdev_do_bus_reset() as the chrdev-side helper. It invokes the bus op and then waits on probe_done_wq for the SDIO remove() callback to clear sbus_priv, mirroring the wait pattern already used by bes2600_chrdev_do_system_close() so that a subsequent bes2600_switch_wifi(true) sees a clean state and can wait on the fresh probe. Wire it into bes2600_chrdev_wifi_force_close_work(): when halt_dev is set (the hard-exception path used by both bes2600_bh_lmac_active_monitor and bes2600_bh_mcu_active_monitor) and the underlying bus implements bus_reset, take the new recovery path; otherwise fall back to the legacy power_switch(0) sequence so this patch is a no-op on USB or any other future bus that does not provide bus_reset. mmc_hw_reset() is exported by the MMC core and is the canonical recovery primitive; calling it without holding the SDIO host claim is correct because the multi-func remove-and-rescan path acquires the host claim via the mmc workqueue, and the single-func mmc_power_cycle path does not require the host claim. No DT change is required: this works against the existing PineTab2 DTS, where the wifi-reset GPIO and the optional sdio_pwrkey GPIO (on v2.0 boards) are both already configured as MMC pwrseq resets. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-20 20:16:59 +02:00
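The "use bus_reset when available, otherwise fall back" wiring is an optional-callback pattern. The sketch below illustrates it with hypothetical types (`struct sbus_ops_model`, `struct bus_model`) and stub callbacks; it does not reproduce the real `sbus_ops` layout or the chrdev wait logic.

```c
#include <stdbool.h>
#include <stddef.h>

struct bus_model;

/* Hypothetical subset of a bus ops table. */
struct sbus_ops_model {
	int (*power_switch)(struct bus_model *bus, int on);
	int (*bus_reset)(struct bus_model *bus);    /* optional, may be NULL */
};

struct bus_model {
	const struct sbus_ops_model *ops;
	int resets;
	int power_offs;
};

static int model_power_switch(struct bus_model *bus, int on)
{
	if (!on)
		bus->power_offs++;
	return 0;
}

static int model_bus_reset(struct bus_model *bus)
{
	bus->resets++;
	return 0;
}

/* An SDIO-like bus provides bus_reset; a USB-like bus does not. */
static const struct sbus_ops_model sdio_ops = {
	.power_switch = model_power_switch,
	.bus_reset = model_bus_reset,
};
static const struct sbus_ops_model usb_ops = {
	.power_switch = model_power_switch,
	.bus_reset = NULL,
};

/* Recovery entry point: prefer bus_reset on the hard-exception path. */
static int force_close(struct bus_model *bus, bool halt_dev)
{
	if (halt_dev && bus->ops->bus_reset)
		return bus->ops->bus_reset(bus);
	return bus->ops->power_switch(bus, 0);      /* legacy sequence */
}
```

Checking the function pointer at the call site is what keeps the change a no-op for buses that never set `bus_reset`.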
Markus Fritsche	bdb0450bdf	bes2600: widen scan-defer backoff to 30s and decay count on quiet The scan-defer logic added in the previous patch ("bes2600: defer scan and soften WARN on firmware reject") used a 10-second backoff window and never cleared reject_count outside of a successful scan. Field testing on a PineTab2 (linux-pinetab2 6.19.10-danctnix1) shows two distinct mac80211 scan-retry cadences in practice: * Idle background scans every ~5 minutes when associated -- well outside any plausible backoff, the defer guard correctly falls through to a real WSM scan attempt. * Roam-evaluation bursts triggered when mac80211 wants to find a candidate AP for handover (signal degradation, beacon loss, locally-generated DEAUTH_LEAVING reason=3). Cadence is ~12 s, and one boot reproduced 14 such rejected scans in 3 minutes during a single burst, none of which engaged the defer guard because every retry landed just outside the 10 s window. Two-line behaviour change to fix that: 1. BES2600_SCAN_BACKOFF_JIFFIES grows from 10HZ to 30HZ, so a 12 s-cadence burst stays inside the window across consecutive rejects and the third reject in the burst trips the threshold guard. The 5 min idle case is still naturally past the window and is unaffected. 2. bes2600_scan_should_defer() resets reject_count to 0 when time_after(jiffies, backoff_until). Without this, reject_count accumulated indefinitely across the slow-cadence rejects, so an isolated reject after long quiet would have tripped the threshold the moment it arrived. After the change, count is latched only inside an active burst and decays cleanly when the burst ends. Net effect on a roam burst: * t=0 reject #1 (count 1, backoff_until = t0 + 30s) * t=12 reject #2 (count 2, backoff_until = t1 + 30s) * t=24 reject #3 (count 3, threshold met, next scan deferred) * t=36 defer fires, no WSM round-trip, reject not sent * ... 
defers continue until the firmware-policy state clears * scan succeeds -> reject_count = 0, normal cadence resumes WSM 0x0007 confirm rejections in a burst drop from ~14 to ~3 (just the scans needed to reach the threshold). wpa_supplicant's reason=3 locally-generated disconnects driven by exhausted roam candidates during the same burst window also drop. No new state, no new symbols, no change to mac80211-facing semantics: the deferred scan still completes via the existing fail: path with status=-EBUSY, the same response a real firmware-busy would produce. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-20 20:16:59 +02:00
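The burst timeline in this commit can be replayed with a small userspace model. This is a sketch under stated assumptions rather than the driver's code: all sketch_* names are hypothetical, time is counted in whole seconds (as if HZ were 1), the firmware is assumed to reject every scan it actually receives, and a defer is assumed to bump reject_count and refresh backoff_until the same way a real rejection does, as the earlier scan-defer commit describes.

```c
#include <stdbool.h>

#define SKETCH_HZ               1UL  /* assumption: one tick per second */
#define SKETCH_REJECT_THRESHOLD 3
#define SKETCH_BACKOFF          (30 * SKETCH_HZ)

struct sketch_scan {
	unsigned int reject_count;
	unsigned long backoff_until;
};

/* Wrap-safe "a is later than b", in the spirit of the kernel's time_after(). */
static bool sketch_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Shared bookkeeping for a real rejection and for a defer. */
static void sketch_note_reject(struct sketch_scan *s, unsigned long now)
{
	s->reject_count++;
	s->backoff_until = now + SKETCH_BACKOFF;
}

static bool sketch_should_defer(struct sketch_scan *s, unsigned long now)
{
	/* Decay: once the window has elapsed, the burst is over. */
	if (sketch_time_after(now, s->backoff_until))
		s->reject_count = 0;

	/* Still inside an active burst and at the threshold: defer. */
	if (s->reject_count >= SKETCH_REJECT_THRESHOLD) {
		sketch_note_reject(s, now);
		return true;
	}
	return false;
}

/*
 * Replay n_scans scan requests spaced cadence seconds apart and return how
 * many of them reach the (always rejecting) firmware.
 */
static unsigned int sketch_wsm_attempts(unsigned long cadence,
					unsigned int n_scans)
{
	struct sketch_scan s = { 0, 0 };
	unsigned int attempts = 0;
	unsigned int i;

	for (i = 0; i < n_scans; i++) {
		unsigned long now = i * cadence;

		if (sketch_should_defer(&s, now))
			continue;               /* no WSM round-trip */
		attempts++;                     /* scan sent to firmware */
		sketch_note_reject(&s, now);    /* firmware rejects it */
	}
	return attempts;
}
```

Under these assumptions, sketch_wsm_attempts(12, 14) returns 3: the scans at t=0, 12 and 24 reach the firmware and every later one is deferred. sketch_wsm_attempts(300, 5) returns 5, since each idle scan arrives after the window has elapsed and the count has decayed to zero.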
Markus Fritsche	4fec8b2ecc	bes2600: defer scan and soften WARN on firmware reject On a BES2600-based PineTab2, mac80211's background-scan cadence (about every 30 s when associated) triggers a two-step WARN splat pattern, visible in dmesg roughly 30 times per 10 min of regular WiFi use: wsm_generic_confirm ret 2 WARNING: at wsm_handle_rx+0x8a4/0xf30 [bes2600] ... full stack trace ... ieee80211 phy0: wsm_generic_confirm failed for request 0x0007. WARNING: at bes2600_scan_work+0x5d4/0x810 [bes2600] ... full stack trace ... ieee80211 phy0: [SCAN] Scan failed (-22). 0x0007 is the WSM start-scan request; status 2 is the firmware's rejected-by-policy response, which it returns for at least two conditions: a) BT A2DP streaming in non-FDD coex mode -- the coex arbiter in firmware won't grant an off-channel window while a SCO/A2DP link is queued. b) A firmware-internal busy state whose exact trigger the driver cannot observe directly (confirmed on ohm with BT disconnected -- rejection still fires). Likely transient firmware-PM transitions. Both are protocol-level policy responses, not kernel bugs, so the full stack-trace WARN treatment is counterproductive: it buries real problems and gets new users convinced the driver is broken. Three-part fix: 1. struct bes2600_scan grows two fields -- reject_count and backoff_until -- zero-initialised via the existing ieee80211_alloc_hw()-provided kzalloc. 2. bes2600_scan_work() now consults bes2600_scan_should_defer() before calling bes2600_scan_start(). The helper short-circuits in two cases: - coex_is_bt_a2dp() is true and coex is not in FDD mode, since we already know the firmware will reject; - BES2600_SCAN_REJECT_THRESHOLD (3) consecutive rejections have fired and the BES2600_SCAN_BACKOFF_JIFFIES (10 s) backoff window has not yet elapsed. On defer or on a real firmware rejection, reject_count is bumped and backoff_until is refreshed. A successful scan clears reject_count. 3. 
The WARN_ON(hw_priv->scan.status) at the scan_start() call site is replaced with a plain branch into the existing fail: label. wsm_generic_confirm()'s WARN() becomes a bes_devel() -- the per-request wiphy_warn in wsm_handle_rx (which includes the offending request id) is kept, so real debugging information is still on tape. Net behaviour: - Expected rejections no longer produce stack traces. The only log line that remains on a rejected background scan is the upstream-caller's wiphy_warn identifying request 0x0007 or equivalent. - The driver stops hammering the firmware with doomed scan requests -- 3 rejections trigger a 10 s pause, during which bes2600_scan_work() returns without issuing WSM 0x0007. - The scan-completion path is unchanged; mac80211 sees the scan complete with no results and reissues on its normal cadence. - Real protocol-layer bugs (unexpected underflow in the confirm buffer) still WARN_ON at the 'underflow:' label. Verified on ohm (PineTab2, linux-pinetab2 6.19.10-danctnix1-1): WARN splat count dropped from 32 to 0 per 10 min uptime. WiFi stays associated. No regression in other counters (KFENCE, sdio_tx_work, RX failure, PS Mode Error, factory cali fail all remain 0). Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-05-20 20:16:59 +02:00
Markus Fritsche	e0d752aae9	sync bes2600/ to v7.0-danctnix1 baseline (rebasing reference)	2026-05-19 09:04:33 +02:00
Manuel Traut	fe73571183	d/control: Fix packagename of fw dependency Signed-off-by: Manuel Traut <manut@mecka.net>	2025-12-09 13:42:27 +00:00
Julian	624fa34bf8	Depend on firmware	2025-11-27 09:02:49 +01:00
Julian	70f1551c94	WIP: Fix autopkgtest	2025-09-18 11:44:54 +02:00
Julian	ba20341e70	Upload Source: https://github.com/cringeops/bes2600 Source: https://github.com/cringeops/bes2600/pull/14 Source: https://github.com/cringeops/bes2600/pull/17 Source: https://github.com/cringeops/bes2600/pull/20	2025-09-17 16:35:45 +02:00

21 Commits