447240cbe8dee9d865683508f7d814e7ffe1d970
22 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
447240cbe8 |
bes2600: Patch C2 — replace ieee80211_rx_irqsafe with ieee80211_rx_ni
Per Phase 4 plan PR #14 + kerneldoc audit (Task #19). Six call sites
deferred per-RX-frame mac80211 dispatch via tasklet; replace with the
synchronous-from-process-context API ieee80211_rx_ni(), which does its
own local_bh_disable wrap.
Why _ni and not _list: the Phase 4 plan originally targeted
ieee80211_rx_list for batch delivery. Mining mt76 mainline (the only
driver using _list) showed the canonical pattern requires threading a
struct list_head through the per-frame call chain. bes2600's WSM
dispatcher (wsm_handle_rx -> bes2600_rx_cb / wsm.c beacon path) sits
between the bh thread's SDIO read and the mac80211 hand-off; threading
a list_head through the dispatcher is a non-trivial refactor.
ieee80211_rx_ni() is the simpler drop-in: no list management, still
removes the tasklet hop. Per-call local_bh_disable cost is trivial vs
the saved tasklet schedule. Future refactor can revisit _list if
measurements warrant.
Sites converted:
- ap.c:96 (bes2600_sta_add link-id rx_queue drain on AP-mode STA add).
  Was inside spin_lock_bh(&ps_state_lock); refactored to splice the
  queue under the lock, then deliver after unlock. _ni runs the
  synchronous mac80211 RX path inline and would otherwise hold the
  lock across mac80211 dispatch. Splice via skb_queue_splice_init into
  a local sk_buff_head.
- sta.c:1487 (deauth-frame inject in inactivity-event handler). Not
  under any lock; direct conversion.
- txrx.c:1960 (early-data + pm_unsupported branch from Patch E).
- txrx.c:1967 (early-data + LINK_SOFT-not-set branch).
- txrx.c:1971 (normal RX path in bes2600_rx_cb).
- wsm.c:2415 (beacon delivery in scan-complete WSM handler). Beacon
  SKB ownership is preserved by the existing
  skb_copy(beacon, GFP_ATOMIC) -> beacon_bkp pattern; no lifecycle
  change needed.
Mixing constraint (kerneldoc include/net/mac80211.h:5399-5430):
ieee80211_rx_ni() cannot mix with ieee80211_rx_irqsafe() for a single
hardware. All 6 sites convert atomically; no mixed state.
Build verified clean on ohm sandbox: srcversion 619A51E61BF5479AAC146E6.
Predicted Phase 7 delta: +5-15% over v3+D+E baseline (2.35 MB/s mean on
v3 alone; D+E single-rep was 3.22 MB/s). Modest improvement expected
from removing the tasklet schedule per RX frame. Smaller deltas would
still be a net win for upstream-cleanliness: the kernel.org submission
story benefits from not using _irqsafe from process context.
|
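The ap.c:96 conversion described above (splice under the lock, deliver after unlock) can be sketched in userspace C. This is only an illustration under stated assumptions: `struct frame`, `struct frame_queue`, `struct flag_lock` and `deliver_frame()` are hypothetical stand-ins for sk_buff, sk_buff_head, ps_state_lock and the mac80211 hand-off, and the lock is modelled as a plain flag so the "never deliver under the lock" rule can be checked with an assertion.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for sk_buff, sk_buff_head and ps_state_lock. */
struct frame { int id; struct frame *next; };
struct frame_queue { struct frame *head, *tail; };
struct flag_lock { int held; };

static struct flag_lock ps_lock;  /* models ps_state_lock */
static int delivered;             /* frames handed to the RX stub */

static void flag_lock_take(struct flag_lock *l) { assert(!l->held); l->held = 1; }
static void flag_lock_drop(struct flag_lock *l) { assert(l->held); l->held = 0; }

/* Append one frame to the tail of a queue. */
static void queue_push(struct frame_queue *q, struct frame *f)
{
    f->next = NULL;
    if (q->tail)
        q->tail->next = f;
    else
        q->head = f;
    q->tail = f;
}

/* Move every frame from src onto dst and leave src empty. */
static void queue_splice_init(struct frame_queue *src, struct frame_queue *dst)
{
    if (!src->head)
        return;
    if (dst->tail)
        dst->tail->next = src->head;
    else
        dst->head = src->head;
    dst->tail = src->tail;
    src->head = src->tail = NULL;
}

/* Stub for the synchronous RX hand-off; it must not run under ps_lock. */
static void deliver_frame(struct frame *f)
{
    assert(!ps_lock.held);
    (void)f;
    delivered++;
}

/* Splice under the lock, then deliver with the lock dropped. */
static int drain_pending(struct frame_queue *pending)
{
    struct frame_queue local = { NULL, NULL };
    struct frame *f, *next;
    int n = 0;

    flag_lock_take(&ps_lock);
    queue_splice_init(pending, &local);
    flag_lock_drop(&ps_lock);

    for (f = local.head; f; f = next) {
        next = f->next;
        deliver_frame(f);
        n++;
    }
    return n;
}
```

The lock is held only for the pointer swap, so the cost of the delivery loop no longer counts against lock hold time.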
||
|
|
dd01be0162 |
bes2600: Patch E — skip ps_state_lock when PSM-known-disabled
Per the Opus structural critique (PR #8 §2.4) and Sonnet review item 5.
The per-RX-frame early-data path takes ps_state_lock to double-check
whether a link entry transitioned to BES2600_LINK_SOFT (AP-side
power-save state machine, soft-link transition). When c7 has latched
pm_unsupported = true (firmware does not honor PSM, see
feedback_bes2600_firmware_no_psm memory), the AP power-save state
machine is dead and link entries never transition to LINK_SOFT. The
per-frame spin_lock_bh + double-check is wasted work.
This patch gates the lock acquisition on !pm_unsupported. When the
latch is on (the steady state on the production-shipped bes2600
firmware), early_data RX frames bypass the spin_lock_bh and go directly
to ieee80211_rx_irqsafe. If a future firmware drop fixes PSM, c7
self-clears pm_unsupported on the first real PM_INDICATION and the
locked path resumes.
Scope is narrower than Sonnet originally framed: only the per-RX-frame
hot path (txrx.c:1945-1951 in cleanups+G+D) is touched. Other
ps_state_lock sites in txrx.c (lines 657, 1256, 1420, 1528) are TX
submission / multicast-start / link-id paths, not per-frame RX, and not
on the Bug #5 hot path. Leave those alone.
Build verified: srcversion B5922B4933590F33207EE97 on ohm sandbox.
|
||
|
|
93f2aab656 |
bes2600: Patch D — atomicize ba_lock counters, drop the spinlock
The block-ack policy uses 4 int counters (ba_acc, ba_cnt, ba_acc_rx,
ba_cnt_rx) bumped per data frame in the TX and RX hot paths under
spin_lock_bh(&hw_priv->ba_lock). The lock was the heaviest per-frame
synchronization cost remaining after Patch C v3 (which fixed the
sdio_rx_work relay). Per the Opus structural critique (PR #8), this
pattern matches mac80211 driver convention for per-frame statistics:
atomic_t suffices, no lock needed.
Field-by-field changes in struct bes2600_common:
  ba_acc, ba_cnt, ba_acc_rx, ba_cnt_rx: int -> atomic_t
  ba_armed: new atomic_t (timer-arm flag)
  ba_ena: bool -> atomic_t
  ba_lock: removed (spinlock_t deleted)
  ba_hist: int (single-writer = ba_timer)
Producer hot path (txrx.c TX submit + RX receive):
- atomic_add for the byte accumulator
- atomic_inc for the frame counter
- atomic_cmpxchg(&ba_armed, 0, 1) to claim the once-per-window
  mod_timer arm; at most ONE producer succeeds; race-free
- no spin_lock_bh
Consumer paths (sta.c bes2600_ba_timer, sta.c disconnect-reset, sta.c
bes2600_ba_work, debug.c debugfs reader):
- atomic_read snapshots all 4 counters into locals; the threshold
  predicate (acc/cnt >= THLD) tolerates approximate snapshots: the
  timer fires periodically, so a single misclassification just delays
  the policy update by one tick
- atomic_set zeroes the counters at end of timer-callback window;
  racing producer increments after the snapshot are lost (acceptable
  for stats; same approximation the original lock allowed under
  contention)
- atomic_set(&ba_armed, 0) re-enables the next window's arm
Followup-amenable simplification: ba_hist remains int because only the
single ba_timer callback writes it; multiple writers would need to
upgrade it too.
This patch follows the cw1200-mainline-idiom established by Patch C v3
(structural fix, not bandaid). The cw1200 reference doesn't have a
similar lock to compare; bes2600 inherited this from a later Bestechnic
addition rather than the upstream tree.
|
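The producer/consumer split in Patch D can be illustrated with C11 atomics standing in for the kernel's atomic_t. This is a minimal sketch, not the driver code: `struct ba_stats`, `ba_account()` and `ba_window_close()` are hypothetical names, only one direction (two of the four counters) is modelled, and arming the timer is reduced to a boolean return value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical mirror of one counter pair plus the timer-arm flag. */
struct ba_stats {
    atomic_int acc;    /* bytes accumulated in this window */
    atomic_int cnt;    /* frames counted in this window */
    atomic_int armed;  /* 1 once some producer has claimed the arm */
};

/* Producer hot path: returns true if this caller must arm the timer. */
static bool ba_account(struct ba_stats *s, int bytes)
{
    int expected = 0;

    assert(bytes >= 0);
    atomic_fetch_add(&s->acc, bytes);
    atomic_fetch_add(&s->cnt, 1);
    /* Only the caller that flips armed from 0 to 1 wins the arm. */
    return atomic_compare_exchange_strong(&s->armed, &expected, 1);
}

/*
 * Consumer (timer callback): snapshot, zero the window, re-enable arming.
 * Returns the average bytes per frame, or 0 for an empty window.
 * Increments racing with the reset may be lost, as the commit accepts.
 */
static int ba_window_close(struct ba_stats *s)
{
    int acc = atomic_load(&s->acc);
    int cnt = atomic_load(&s->cnt);

    atomic_store(&s->acc, 0);
    atomic_store(&s->cnt, 0);
    atomic_store(&s->armed, 0);
    return cnt ? acc / cnt : 0;
}
```

The compare-exchange is what replaces the lock for the "arm once per window" decision; the counters themselves only need independent atomic updates because the threshold check tolerates slightly stale values.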
||
|
|
a02f8b7629 |
bes2600: Patch G — restore SPDX identifiers + ST-Ericsson attribution
The bes2600 driver is a fork of the upstream cw1200 driver
(drivers/net/wireless/st/cw1200/, ST-Ericsson, Dmitry Tarnyagin
2010-2011). The fork's file headers have three GPL-compliance issues:
1. NO SPDX-License-Identifier on any of 48 source files (cw1200
mainline has them on all 25). kernel.org-mandated since 2017.
2. Original "Copyright (c) 2010, ST-Ericsson" lines stripped from
all files inherited from cw1200, replaced with
"Copyright (c) 2010, Bestechnic" — factually impossible
(Bestechnic did not author the 2010 work) and a GPL-2.0 §1
attribution-preservation violation.
3. The "GPL version 2 as published by the Free Software Foundation"
boilerplate paragraph is redundant alongside SPDX and is the
legacy form modern kernel sources have replaced.
This patch corrects all three for the 48 .c/.h files in bes2600/:
- Adds `// SPDX-License-Identifier: GPL-2.0-only` (or `/* ... */`
for headers) as line 1 of every file.
- Restores `Copyright (c) 2010, ST-Ericsson` + `Author: Dmitry
Tarnyagin <dmitry.tarnyagin@lockless.no>` as the FIRST copyright
chain entry on all 22 files derived from cw1200 (bh.{c,h},
debug.{c,h}, fwio.{c,h}, hwio.{c,h}, main.c, pm.{c,h},
queue.{c,h}, scan.{c,h}, sta.{c,h}, txrx.{c,h}, wsm.{c,h}).
- Keeps `Copyright (c) 2022, Bestechnic (Beijing) Co., Ltd.` as
the SECOND chain entry where Bestechnic genuinely contributed.
- Notes "Derived from cw1200_sdio.c" + ST-Ericsson copyright on
bes2600_sdio.c (heavy derivation, not a literal rename).
- Notes "Replaces hwbus.h from cw1200/" + ST-Ericsson copyright
on sbus.h.
- Preserves the prism54/islsm authorship chain on main.c and
bes2600.h (Michael Wu 2006 + Jean-Baptiste Note 2004-2006).
- Drops the GPL-2.0 boilerplate paragraph in favour of SPDX.
No code changes — only file-header comment blocks. Module build is
unaffected (verified by header-only diff scope).
This is a prerequisite for any kernel.org submission attempt. The
existing MODULE_LICENSE("GPL") + MODULE_AUTHOR(Tarnyagin@stericsson.com)
declarations were already present and are unchanged here; the
mismatch between MODULE_AUTHOR and the (since-corrected) per-file
copyrights is now resolved.
|
||
|
|
73191b7bc1 |
bes2600: drop sdio_rx_work relay, IRQ→bh-direct (no-relay architecture)
Patch C v3: match cw1200 mainline architecture
(drivers/net/wireless/st/cw1200/). Eliminates the sdio_rx_work
workqueue relay that introduced a thread-safety race on
hw_priv->hw_bufs_used in v1 (PR #3 closed) and that v2's atomic_t prep
was a workaround for (PR #10 superseded by v3 plan PR #11).
Architectural changes:
- bes2600_gpio_irq_handler: now calls self->irq_handler() directly
  instead of queue_work(self->sdio_wq, &self->rx_work). Bumps bh_rx
  atomic + wakes bh_wq.
- bes2600_bh_rx_helper (BES_SDIO_RX_MULTIPLE_ENABLE branch): now calls
  priv->sbus_ops->bus_rx_batch() to do the SDIO read inline. No
  pipe_read, no skb_dequeue.
- bes2600_sdio_read_rx_batch (new): the SDIO read sequence extracted
  from sdio_rx_work, registered as sbus_ops->bus_rx_batch. Runs in bh
  thread context.
- bes2600_sdio_extract_packets: calls bes2600_bh_handle_rx_skb()
  directly per parsed SKB. No skb_queue_tail, no rx_queue.
- bes2600_bh_handle_rx_skb (new in bh.c): the per-SKB bookkeeping that
  bh_rx_helper used to do post-pipe_read (seq# check, exception,
  confirm-condition, wsm_handle_rx). Wakes bh thread for tx-burst via
  atomic_inc(&priv->bh_tx) instead of bes2600_bh_wakeup(), because we
  ARE the bh thread.
- Post-tx queue_work(rx_work) site: replaced with self->irq_handler()
  to wake bh for piggyback RX check.
Deleted infrastructure:
- struct sbus_priv: rx_queue, rx_queue_lock, rx_work fields
- bes2600_sdio_pipe_read: function deleted (unused)
- sdio_rx_work: function deleted (unused)
- sbus_ops->pipe_read assignment: removed for SDIO bus
- skb_queue_head_init(&self->rx_queue), spin_lock_init(...),
  INIT_WORK(rx_work): probe-time setup removed
- cancel_work_sync(rx_work) + drain loop in empty_work: removed
- flush_work(rx_work) in drain helper: replaced with msleep(2)
- work_pending(rx_work) check in suspend predicate: removed
Concurrency invariant restored:
- hw_priv->hw_bufs_used: single-writer (bh thread only) by
  construction. No atomic_t needed.
- hw_priv->hw_bufs_used_vif[]: ditto.
- hw_priv->wsm_tx_pending[]: ditto.
- All other shared state: unchanged or already protected.
Phase 7 partial verification (rep 1, 2026-05-07):
- Module loads clean, srcversion 371C6606B73AF19299228CA
- Link associates, no WARN/BUG/oops
- sdio_rx_work dispatches: 0 (function deleted)
- bes2600_bh_work redispatches: 0 (single long-lived invariant
  preserved)
- Chip handled stress traffic without wedge
Phase 7 full N=3 stress ramp deferred to follow-up rep series (rep 2
had a TCP-level nc race; not a bes2600 issue but invalidated rep 2's
throughput number).
|
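The no-relay shape described in Patch C v3 (the IRQ side only bumps an atomic, the bh thread does the read inline and owns all bookkeeping) can be sketched roughly as follows. This is an assumption-laden userspace model: `irq_notify_rx()`, `bh_service_rx()` and `frames_handled` are hypothetical names, the wake-up of bh_wq is omitted, and the exchange-to-zero step follows the cw1200-style bh loop rather than anything quoted in the commit.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int bh_rx;    /* pending RX notifications from the IRQ side */
static int frames_handled;  /* plain int: only the bh thread writes it */

/* IRQ side: record that RX work exists (the real handler also wakes bh). */
static void irq_notify_rx(void)
{
    atomic_fetch_add(&bh_rx, 1);
}

/*
 * bh thread side: claim every pending notification in one step, then do
 * the batch read inline. frames_in_batch stands in for whatever the bus
 * read returns. Returns the number of frames processed.
 */
static int bh_service_rx(int frames_in_batch)
{
    int pending = atomic_exchange(&bh_rx, 0);

    assert(frames_in_batch >= 0);
    if (!pending)
        return 0;
    /* Single-writer bookkeeping: no lock and no atomic needed here. */
    frames_handled += frames_in_batch;
    return frames_in_batch;
}
```

Because nothing except the bh thread touches the bookkeeping, several coalesced interrupts cost one batch read and no extra synchronization.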
||
|
|
9e38ac5523 |
bes2600: fix concurrency UAF in bes2600_hw_scan and sched_scan
bes2600_bss_info_changed() and bes2600_hw_scan() can run concurrently.
The probe-request SKB allocated by ieee80211_probereq_get() before
scan.lock + conf_lock are taken can be touched by a concurrent
bss_info_changed (via wsm_set_template_frame's path) while we hold no
lock. Reorder to acquire both locks BEFORE the SKB allocation.
Also reorder cleanup paths so dev_kfree_skb() runs BEFORE up() —
otherwise a small window exists where the SKB has been touched but the
lock has been released, allowing concurrent code to also touch it.
Three sites fixed:
- bes2600_hw_scan: lock-take + ENOMEM cleanup + wsm_set_template_frame
error cleanup + success-path SKB free + lock release order
- bes2600_sched_scan_start (#ifdef ROAM_OFFLOAD): same three sub-fixes
(compiled-out at default build, fixed for consistency)
- All success/error paths: dev_kfree_skb before up()
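The ordering rule behind these three fixes (take both locks before allocating, free before releasing) can be shown in a small userspace model. Everything here is a hypothetical stand-in: `struct order_lock` models scan.lock and conf_lock as flags, `malloc()` plays the role of ieee80211_probereq_get(), and `scan_start()` is not the driver function.

```c
#include <assert.h>
#include <stdlib.h>

/* Locks modelled as flags so the ordering can be asserted. */
struct order_lock { int held; };

static struct order_lock scan_lock, conf_lock;
static int freed_under_lock;  /* frees that happened with both locks held */

static void order_lock_take(struct order_lock *l) { assert(!l->held); l->held = 1; }
static void order_lock_drop(struct order_lock *l) { assert(l->held); l->held = 0; }

/* Returns 0 on success, -1 if allocation or the template step fails. */
static int scan_start(int template_fails)
{
    char *probe;
    int ret = 0;

    /* Locks first, so no concurrent path can see the buffer unlocked. */
    order_lock_take(&scan_lock);
    order_lock_take(&conf_lock);

    probe = malloc(64);  /* stands in for the probe-request SKB */
    if (!probe) {
        ret = -1;
        goto unlock;
    }
    if (template_fails)
        ret = -1;

    /* Free on every path while both locks are still held. */
    if (scan_lock.held && conf_lock.held)
        freed_under_lock++;
    free(probe);
unlock:
    order_lock_drop(&conf_lock);
    order_lock_drop(&scan_lock);
    return ret;
}
```

The success and failure paths share one exit sequence, which is what keeps the free-then-unlock order from drifting apart between them.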
Backport of cw1200 mainline commit 86760e0dfe36 ("cw1200: Fix
concurrency use-after-free bugs in cw1200_hw_scan()", 2018-12-14),
which fixed the identical bug in the same code shape we inherited.
That commit was merged from upstream 4f68ef64cd7f.
Cherry-picked from upstream Linux:
86760e0dfe36 cw1200: Fix concurrency use-after-free bugs in cw1200_hw_scan()
Author: Jia-Ju Bai <baijiaju1990@gmail.com>
Link: https://lore.kernel.org/r/20181214035521.7575-1-baijiaju1990@gmail.com
|
||
|
|
77f966df25 |
bes2600: fix missing destroy_workqueue() on error in init_common
Two error paths between create_singlethread_workqueue() (~main.c:489)
and the success-path destroy_workqueue() in unregister_common (~609)
return without cleaning up the workqueue, leaking it on probe failure:
1. bes2600_queue_stats_init() failure
2. bes2600_queue_init() failure (any of the 4 TID queues)
Both call ieee80211_free_hw(hw); return NULL — without first
destroy_workqueue(hw_priv->workqueue). Add it.
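A minimal sketch of the leak and its fix, assuming a single shared error label. The names `struct demo_hw`, `wq_create()`, `wq_destroy()` and the `fail_step` parameter are hypothetical; `live_workqueues` exists only so the cleanup can be checked.

```c
#include <assert.h>
#include <stdlib.h>

static int live_workqueues;  /* created but not yet destroyed */

struct demo_hw { void *workqueue; };

static void *wq_create(void)
{
    void *wq = malloc(1);

    if (wq)
        live_workqueues++;
    return wq;
}

static void wq_destroy(void *wq)
{
    assert(wq != NULL);
    live_workqueues--;
    free(wq);
}

/* fail_step: 0 = success, 1 = stats init fails, 2 = queue init fails. */
static struct demo_hw *init_common(int fail_step)
{
    struct demo_hw *hw = calloc(1, sizeof(*hw));

    if (!hw)
        return NULL;
    hw->workqueue = wq_create();
    if (!hw->workqueue) {
        free(hw);
        return NULL;
    }
    if (fail_step == 1)  /* models the queue-stats init failure */
        goto err;
    if (fail_step == 2)  /* models a per-queue init failure */
        goto err;
    return hw;
err:
    wq_destroy(hw->workqueue);  /* the step the error paths were missing */
    free(hw);
    return NULL;
}

static void unregister_common(struct demo_hw *hw)
{
    wq_destroy(hw->workqueue);
    free(hw);
}
```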
Backport of cw1200 mainline commit 7ec8a926188e ("cw1200: fix missing
destroy_workqueue() on error in cw1200_init_common", 2020-11-19),
which fixed the identical bug in the same code shape we inherited.
Reported on cw1200 by Hulk Robot.
Cherry-picked from upstream Linux:
7ec8a926188e cw1200: fix missing destroy_workqueue() on error
Author: Qinglang Miao <miaoqinglang@huawei.com>
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/20201119070842.1011-1-miaoqinglang@huawei.com
Fixes: a910e4a94f69 ("cw1200: add driver for the ST-E CW1100 & CW1200 WLAN chipsets")
|
||
|
|
d9268b433a |
bes2600: replace a set of atomic_add()
Backport of cw1200 mainline commit 07f995ca1951 ("cw1200: replace a set
of atomic_add()", 2020-11-10). atomic_inc() reads more naturally than
atomic_add(1, &x). Mechanical change, no functional impact.
7 sites: 6 in bh.c (bh_term, bh_rx x2, bh_tx x3) and 1 in itp.c
(awaiting_confirm). Two of the bh_rx and three of the bh_tx sites are
inside the cw1200-ancestor #if 0 block; replaced anyway to keep the
file consistent with cw1200 mainline source style.
Cherry-picked from upstream Linux:
07f995ca1951 cw1200: replace a set of atomic_add()
Author: Yejune Deng <yejune.deng@gmail.com>
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Link: https://lore.kernel.org/r/1604991491-27908-1-git-send-email-yejune.deng@gmail.com
|
||
|
|
a7e232738d |
bes2600: bus_reset on connection-loss storm to dodge assoc-comeback blackhole
When mac80211 declares connection loss against this AP (typically driven
by inactivity-deauth or beacon-loss), the userspace reauth that follows
sometimes enters a long blackhole: the AP responds to auth with success
but defers assoc with the 802.11v "assoc comeback" timer; ohm retries
faster than the comeback grants permission; the AP eventually fires an
unprotected deauth-reason-6 ("Class 2 frame received from non-
authenticated station"), and recovery only completes via cross-SSID or
cross-channel fallback. Receipts: ~86 s blackhole observed in the
phase-7 rep on 2026-05-07 02:42, with three subsequent BSSIDs returning
assoc comeback timeouts before reason-9 (STA_REQ_ASSOC_WITHOUT_AUTH)
fired. Documented in marfrit/besser:notes/phase4-2026-05-07.md.
When N=3 driver-side connection_loss decisions fire within a 60 s window
on the same vif, skip the ieee80211_connection_loss() path and trigger
the c5.2-introduced bes2600_chrdev_do_bus_reset() instead. The bus
reset removes and re-probes the chip; userspace re-associates with a
fresh chip state, dodging the AP's comeback-timer rejection cycle.
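One way to express "N=3 events within 60 s" is a small ring of timestamps. The sketch below is an assumption, not the patch's code (the commit only says three counter fields were added per vif): `struct loss_storm` and `loss_storm_account()` are hypothetical, and time is passed in as whole seconds instead of jiffies.

```c
#include <assert.h>
#include <stdbool.h>

#define STORM_EVENTS   3   /* N from the commit message */
#define STORM_WINDOW_S 60  /* window length in seconds */

struct loss_storm {
    long stamp[STORM_EVENTS];  /* times of the most recent events */
    int count;                 /* events stored, capped at STORM_EVENTS */
    int head;                  /* index of the oldest stored event */
};

/*
 * Record one connection-loss decision at time now. Returns true when the
 * last STORM_EVENTS decisions all fall inside the window, meaning the
 * caller would take the bus-reset path instead of reporting the loss.
 */
static bool loss_storm_account(struct loss_storm *s, long now)
{
    assert(s->count >= 0 && s->count <= STORM_EVENTS);
    if (s->count < STORM_EVENTS) {
        s->stamp[(s->head + s->count) % STORM_EVENTS] = now;
        s->count++;
    } else {
        s->stamp[s->head] = now;  /* overwrite the oldest entry */
        s->head = (s->head + 1) % STORM_EVENTS;
    }
    if (s->count < STORM_EVENTS)
        return false;
    if (now - s->stamp[s->head] > STORM_WINDOW_S)
        return false;
    s->count = 0;  /* start over after tripping */
    s->head = 0;
    return true;
}
```

Isolated losses age out of the ring without ever tripping, which matches the intent that only a true storm triggers the reset.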
Predicted Phase 7 delta vs current baseline:
- api_connection_loss rate: unchanged (we don't address the trigger)
- conditional probability of >5 s blackhole given event: <= 30 %
- worst-case recovery: 86 s -> < 10 s
Contract pin: bes2600_chrdev_do_bus_reset(sbus_ops, sbus_priv) at
bes2600/bes_chardev.c:455, introduced by c5.2. The function is async-
returning: sbus_ops->bus_reset() schedules an SDIO rescan; the helper
waits up to 3 s for the remove() callback to clear sbus_priv, then
returns. Per-vif state is gone after this point, so the recover work
lives on bes2600_common (hw_priv) and uses the global bes2600_cdev for
the bus_reset call rather than dereferencing per-vif state.
Threshold (3 / 60 s) is well above the steady-state per-vif
connection_loss rate observed in the patch-A phase-7 rep (0.86/h under
sustained load), so a true storm is required to trip it.
Files touched:
- bes2600/bes2600.h: 3 counter fields on struct bes2600_vif, 1
work_struct on struct bes2600_common, 3 prototypes
- bes2600/sta.c: 3 helpers + storm-account hook in
bes2600_connection_loss_work + storm-init in bes2600_vif_setup +
cancel_work_sync in the hw_priv shutdown path; #include bes_chardev.h
was already pulled in by an earlier c-stack patch
- bes2600/main.c: INIT_WORK alongside other hw_priv work_structs
- bes2600/debug.c: ConnectionLossStormRecoveries seq_printf in the
per-vif status seq_file output
The cw1200/cw1260 ancestor has no equivalent; this is a clean
addition. checkpatch.pl --no-tree --strict: clean (0/0/0).
Signed-off-by: Claude (noether) <claude@reauktion.de>
|
||
|
|
3b4239ad2b |
bes2600: pre-empt AP-deauth-6 with mac80211 reassoc on decrypt-fail storm
When the BES2600 firmware reports WSM_STATUS_DECRYPTFAILURE for a burst
of received frames (typically because the host's PTK or GTK has fallen
out of sync with the AP), the AP eventually concludes that the STA is
not authenticated and emits an unprotected deauth-reason-6 ("Class 2
frame received from non-authenticated station"). On the deployed
pinetab2 + bes2600 stack this AP-initiated deauth has been observed to
leave the link blackholed for up to 109 s before userspace finds a
different SSID/channel to recover on. (Receipts at
https://git.reauktion.de/marfrit/besser, notes/phase5-2026-05-06.md.)
Add a sliding-window counter on each bes2600_vif: when 5 decrypt
failures fire within 5 s, schedule a worker that calls
ieee80211_connection_loss(vif). mac80211 then performs immediate
disassociation; userspace (NetworkManager / wpa_supplicant) reconnects
with fresh keys before the AP gets a chance to fire its unprotected
deauth.
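The 5-failures-in-5-s trigger can be approximated with a window start time and a count. This is a hedged sketch: it restarts the window on expiry, which is simpler than a true sliding window, and whether the driver does the same is an assumption. `struct decrypt_storm` and `decrypt_storm_account()` are hypothetical names; `recoveries` only loosely models the DecryptStormRecoveries debug counter.

```c
#include <assert.h>
#include <stdbool.h>

#define DECRYPT_STORM_THRESHOLD 5     /* failures */
#define DECRYPT_STORM_WINDOW_MS 5000  /* 5 s */

struct decrypt_storm {
    long window_start_ms;  /* time of the first failure in this window */
    int count;             /* failures seen in this window */
    int recoveries;        /* times the storm threshold was reached */
};

/*
 * Account one decrypt failure at time now_ms. Returns true when the
 * caller should schedule the worker that reports connection loss.
 */
static bool decrypt_storm_account(struct decrypt_storm *s, long now_ms)
{
    assert(now_ms >= 0);
    if (s->count == 0 ||
        now_ms - s->window_start_ms > DECRYPT_STORM_WINDOW_MS) {
        s->window_start_ms = now_ms;  /* open a fresh window */
        s->count = 0;
    }
    if (++s->count < DECRYPT_STORM_THRESHOLD)
        return false;
    s->count = 0;
    s->recoveries++;
    return true;
}
```

A steady trickle of about one failure per minute, as reported in the measurements, keeps reopening the window and never reaches the threshold.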
Predicted Phase 7 delta vs the unpatched baseline:
- decrypt-burst rate: unchanged (this does not address root cause)
- AP-deauth-6 rate: <= 0.2 of baseline
- conditional probability of >5s blackhole given a burst:
100% -> <= 10%
- worst-case recovery time: 109s -> <5s
Contract pin: ieee80211_connection_loss() per
include/net/mac80211.h: "may also be called if the connection needs to
be terminated for some other reason... will cause immediate change to
disassociated state, without connection recovery attempts." Userspace
recovery is the existing NM/wpa_supplicant path. The worker context
satisfies the implicit process-context expectation.
Files touched:
- bes2600/bes2600.h: 4 new fields on struct bes2600_vif + 2 prototypes
- bes2600/txrx.c: new helpers + the call site at the existing
WSM_STATUS_DECRYPTFAILURE log point (the unconditional "goto drop"
branch in bes2600_rx_cb)
- bes2600/sta.c: bes2600_decrypt_storm_init() in bes2600_vif_setup;
cancel_work_sync() in bes2600_remove_interface, alongside the
existing per-vif cancel_*_work_sync block. Safe under the kernel
cancel_work_sync contract: the work_struct is INIT_WORK'd in setup,
so the call is valid; it blocks until any in-flight handler returns,
ensuring no use-after-free of priv when mac80211 frees the vif; and
it is idempotent (subsequent calls just return false).
- bes2600/debug.c: DecryptStormRecoveries seq_printf in the per-vif
status seq_file output
Threshold (5/5s) is set well above the steady-state per-vif decrypt-
fail rate observed in measurement (~1/min even under sustained 1 MB/s
load), so a true storm is required to trip it. The cw1200/cw1260
ancestor has no equivalent storm-recovery; this is a clean addition.
checkpatch.pl --no-tree --strict: clean (0/0/0).
Signed-off-by: Claude (noether) <claude@reauktion.de>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
d48f2ae73c |
bes2600: handle multi-function SDIO cards in mmc_hw_reset bus_reset
c5.2 (recover-wedged-firmware-via-mmc-hw-reset) wraps mmc_hw_reset()
and treats any non-zero return as a recovery failure. On
single-function SDIO cards mmc_hw_reset returns 0 after doing the
remove + rescan inline. On multi-function cards (BES2600 has WLAN
func 1 + BT companion func 2) the kernel's mmc_sdio_hw_reset() does
NOT do the rescan: it tears the card down and returns 1 to signal
"caller must trigger rescan".
Field observation on PineTab2 (linux-pinetab2 6.19.10-danctnix1):
when a real LMAC wedge fired bes2600_chrdev_wifi_force_close ->
bes2600_chrdev_do_bus_reset, mmc_hw_reset returned 1, c5.2's wrapper
treated that as "bus_reset failed: 1", logged the error, and gave
up. The card was already removed (mmc2: card 0001 removed) but
nothing scheduled a rescan; wifi (and the BT companion which shares
the same SDIO host) stayed silent until the user rebooted four
minutes later.
Fix:
- Capture the mmc_host pointer before calling mmc_hw_reset (the
card pointer is invalid after the remove).
- On positive return (multi-function path), log informationally
and call mmc_detect_change(host, 0) to schedule a rescan.
Return 0 so callers see the recovery as successful.
- Negative return is still treated as failure as before.
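The three-way return handling in the fix can be sketched as a small helper. This is only a model: `struct demo_host`, `demo_detect_change()` and `bus_reset_finish()` are hypothetical, and `reset_ret` stands in for the value the reset call returned. The host handle is passed in because, as the commit notes, it must be captured before the reset invalidates the card pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical host handle that records scheduled rescans. */
struct demo_host { int rescans_scheduled; };

/* Stand-in for asking the host to run card detection again. */
static void demo_detect_change(struct demo_host *host)
{
    host->rescans_scheduled++;
}

/*
 * Map the reset result to what callers should see:
 *   negative -> genuine failure, passed through unchanged
 *   zero     -> reset and rescan already done inline
 *   positive -> card torn down, caller must trigger the rescan
 */
static int bus_reset_finish(struct demo_host *host, int reset_ret)
{
    assert(host != NULL);
    if (reset_ret < 0)
        return reset_ret;
    if (reset_ret > 0)
        demo_detect_change(host);
    return 0;
}
```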
The mmc_detect_change side effect is asynchronous; the chrdev's
wait_event_timeout(probe_done_wq, !sbus_priv) still observes the
remove half synchronously, and the rescan + re-probe runs out of
the host detect work afterwards.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||
|
|
9a0a4c0a46 |
bes2600: self-detect when firmware does not honor PSM and skip the cycle
The c6 series fixed several host-side bookkeeping bugs around PSM
transitions, but didn't address the underlying contract: this chip's
firmware (BES2600 with the Bestechnic Dec 2023 build that ships on
PineTab2 and most danctnix images) silently drops every WSM_set_pm
request without emitting the corresponding PM_INDICATION. The driver's
own power_down_work delayed work calls bes2600_pwr_enter_lp_mode every
~10s; without firmware acknowledgment each call burns 5s on
wait_for_completion_timeout(pm_enter_cmpl, 5*HZ) and produces a
recurring three-line cascade in dmesg:
bes2600_pwr_enter_lp_mode, wait pm ind timeout
bes2600_sdio_active failed, subsys:0
bes2600_pwr_device_exit_lp_mode, active mcu fail
Confirmed by tripwire instrumentation on PineTab2 (linux-pinetab2
6.19.10-danctnix1, ohm) running the c5+c6 stack: zero
wsm_set_pm_indication() invocations across an entire boot, while
bes2600_pwr_enter_lp_mode timed out repeatedly, and
bes2600_sdio_active() consistently saw BES_SLAVE_STATUS_REG_ID return
0x2f (every "ready" bit set except MCU_WAKEUP_READY (bit 4) - the
firmware reports "I'm awake, there's nothing to wake from").
This patch makes the driver self-heal:
* struct bes2600_pwr_t gains pm_unsupported (bool) and
pm_consecutive_timeouts (unsigned int). Both initialised to
0/false.
* bes2600_pwr_enter_lp_mode early-returns -EOPNOTSUPP when
pm_unsupported is set. Skips the per-VIF set_pm round-trip and
the wait_for_completion entirely.
* On the cmpxchg-success branch of the timeout path, we increment
pm_consecutive_timeouts. When it crosses
BES2600_PM_UNSUPPORTED_THRESHOLD (3, ~15s of trying), we latch
pm_unsupported = true and force chip_pm_state = ACTIVE so that
bes2600_pwr_device_exit_lp_mode's c6.2 skip branch covers the
wake side (no gpio_wake / sbus_active / WSM_set_operational_mode
reissue past the first one).
* bes2600_pwr_notify_ps_changed resets pm_consecutive_timeouts to 0
on any incoming PM indication, and clears pm_unsupported if it
was previously latched. So a firmware update that fixes PM_IND
delivery automatically re-enables PSM transitions without a
driver rebuild.
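The latch-and-self-clear behaviour listed above can be modelled in a few lines. This is a sketch under assumptions: `struct demo_pwr` and both functions are hypothetical, the 5 s wait is reduced to a `got_indication` flag, and the latch is assumed to trip when the counter reaches the threshold of 3 (the commit says "crosses").

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define PM_UNSUPPORTED_THRESHOLD 3  /* about 15 s of 5 s timeouts */

struct demo_pwr {
    bool pm_unsupported;                   /* latched: firmware ignores PSM */
    unsigned int pm_consecutive_timeouts;  /* timeouts since last indication */
};

/* got_indication models whether the firmware answered before the timeout. */
static int enter_lp_mode(struct demo_pwr *p, bool got_indication)
{
    assert(p != NULL);
    if (p->pm_unsupported)
        return -EOPNOTSUPP;  /* skip the request and the wait entirely */
    if (got_indication)
        return 0;
    if (++p->pm_consecutive_timeouts >= PM_UNSUPPORTED_THRESHOLD)
        p->pm_unsupported = true;
    return -ETIMEDOUT;
}

/* Any real PM indication resets the counter and clears the latch. */
static void notify_ps_changed(struct demo_pwr *p)
{
    p->pm_consecutive_timeouts = 0;
    p->pm_unsupported = false;
}
```

Clearing the latch in the indication path is what lets a fixed firmware re-enable the normal flow without a rebuild.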
mac80211's PSM requests via bes2600_set_pm() still flow to the
firmware unchanged; they just don't have host-side timeouts so they
remain silent regardless of firmware acknowledgment. Power
consumption goes up if the firmware actually CAN do PSM (we'd be
keeping the chip awake unnecessarily), but on a chip where the
counter trips this trade-off is forced anyway: the chip stayed awake
under the broken cascade as well, just with constant SDIO churn.
Net effect on dmesg: after ~15s of boot, the three-line cascade stops
firing entirely. The firmware-side wedge is observed once per boot
(captured by the pm_unsupported latch) instead of per-cycle.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||
|
|
51d46a2e25 |
bes2600: short-circuit wake handshake when chip is confirmed ACTIVE
The previous patch ("bes2600: gate PM indication completion on pending
request and track chip state") added enum bes2600_chip_pm_state and the
chip_pm_state field tracking what the host has *seen the firmware
confirm*. This patch makes the wake side use it.
Without this, every bes2600_pwr_device_exit_lp_mode() unconditionally
runs gpio_wake() + sbus_active() + wsm_set_operational_mode(active),
even when the chip is already in confirmed-ACTIVE state and the wake
sequence has nothing to do. The visible failure mode on PineTab2:
bes2600_pwr_enter_lp_mode, wait pm ind timeout
repeat set gpio_wake_flag, sub_sys:0
bes2600_sdio_active failed, subsys:0
bes2600_pwr_device_exit_lp_mode, active mcu fail
cycling every ~9 s, ~22 cycles in 10 minutes. Five pieces:
1. enter_lp_mode timed out (firmware indication lost). With c6.1,
chip_pm_state is now UNKNOWN.
2. lock_device fires exit_lp_mode.
3. gpio_wake hits "bit already set" because device_enter_lp_mode
was skipped when the indication timed out, so gpio_sleep was
never called - the bit reflects driver intent, not chip state.
gpio_wake silently no-ops (no GPIO edge), bit stays set.
4. sbus_active spends 200 x 2 ms looking for MCU_WAKEUP_READY that
never comes (firmware was never told to wake), then fails.
5. Driver continues to wsm_set_operational_mode against the wedged
bus, compounding the failure.
This patch's three moves:
* bes2600_pwr_device_exit_lp_mode() reads chip_pm_state at entry.
On BES2600_CHIP_PM_ACTIVE, log at devel level and return without
touching gpio_wake / sbus_active / WSM. The chip is in the state
we want; the handshake exists only to drive a transition.
* On BES2600_CHIP_PM_LP or BES2600_CHIP_PM_UNKNOWN, run the wake
handshake as before, but on sbus_active() failure: set
chip_pm_state = UNKNOWN, log once at err level, and bail out.
Do NOT call wsm_set_operational_mode over a wedged bus - it
would just emit a second error and leave the chip in an even
less defined state.
* bes2600_gpio_wakeup_mcu() / bes2600_gpio_allow_mcu_sleep():
demote "repeat set/clear gpio_wake_flag" from bes_err to
bes_devel. Multi-subsystem wake-hold (e.g. WIFI + BT both want
MCU awake) is the steady-state case, and the symmetric clear
while bit-already-clear is racy bookkeeping rather than a
hardware error. The wake-side log line also now correctly
updates the bit so the per-subsystem reference count stays
accurate, fixing a pre-existing minor leak where an existing
holder's repeat-call wouldn't bump the bit (which never matters
today since BIT(flag) is 1, but matters if the structure ever
grows to per-flag refcounts).
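The first two moves (skip when confirmed ACTIVE, stop after a failed bus wake) reduce to a short state check. The model below is illustrative only: `struct demo_chip` and `exit_lp_mode()` are hypothetical, `bus_ok` stands in for the sbus_active() result, and marking the chip ACTIVE after a successful handshake is an assumption made to keep the example closed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum chip_pm_state { CHIP_PM_ACTIVE, CHIP_PM_LP, CHIP_PM_UNKNOWN };

struct demo_chip {
    enum chip_pm_state state;  /* what the host has seen confirmed */
    int handshakes;            /* wake handshakes attempted */
    int wsm_calls;             /* operational-mode commands sent */
};

/* Returns 0 on success or skip, -1 when the bus wake fails. */
static int exit_lp_mode(struct demo_chip *c, bool bus_ok)
{
    assert(c != NULL);
    if (c->state == CHIP_PM_ACTIVE)
        return 0;  /* already in the wanted state: nothing to drive */

    c->handshakes++;
    if (!bus_ok) {
        /* Do not send the WSM command over a wedged bus. */
        c->state = CHIP_PM_UNKNOWN;
        return -1;
    }
    c->wsm_calls++;
    c->state = CHIP_PM_ACTIVE;  /* assumption, see the note above */
    return 0;
}
```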
Net effect on the cycle:
* If chip is genuinely ACTIVE (chip_pm_state == ACTIVE), wake skips
cleanly. Storm goes silent.
* If chip is genuinely LP, behaviour is unchanged.
* If chip is UNKNOWN (post-timeout state), one wake attempt is
made; on failure, state stays UNKNOWN and we don't emit a
second cascade error per attempt. Repeated UNKNOWN with failed
wake will eventually be picked up by the LMAC active-monitor
and escalated to mmc_hw_reset (c5.2).
No new locks, no new state. Only consumption of the chip_pm_state
field added in the prerequisite patch.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||
|
|
7c4ad3b1d6 |
bes2600: gate PM indication completion on pending request and track chip state
When mac80211 toggles PSM on the BES2600, the host sends WSM set_pm
and waits up to 5 s on bes_power.pm_enter_cmpl for a firmware-side
PM-changed indication confirming the transition. Three sequenced
flaws make the wait-and-confirm racy and leave host/chip bookkeeping
desynced when anything misfires:
1) bes2600_pwr_notify_ps_changed() unconditionally fires
complete(pm_enter_cmpl) for any non-active psmode. It does not
check whether a host-initiated set_pm is actually pending. A
spontaneous indication (firmware-internal coex move,
idle-driven aging) primes the completion, and the next host-
driven enter_lp_mode sees a false success on its first
wait_for_completion_timeout.
2) The wait/reinit ordering in bes2600_pwr_enter_lp_mode is
status = wait_for_completion_timeout(...);
atomic_set(pm_set_in_process, 0);
reinit_completion(...);
If an indication arrives between wait_for_completion_timeout
returning with status==1 and reinit_completion, the next
enter_lp_mode iteration's wait can also see false success. The
reinit must happen *before* we start the new request, not
after handling the previous one.
3) On wait_pm_ind timeout, the driver returns -ETIMEDOUT and walks
away. It does not record that the firmware's actual PM state
is no longer known to the host. Subsequent wake paths
(gpio_wake / sbus_active) assume the chip is still active and
hit deterministic SDIO failures when the firmware has
transitioned anyway.
This patch is the safe-prerequisite half of a wider fix:
* bes_pwr.h gains enum bes2600_chip_pm_state {ACTIVE, LP, UNKNOWN}
and bes_power.chip_pm_state. Its job is to track what the host
has *seen the firmware confirm*, not what the host has
requested. Initialised to ACTIVE in bes2600_pwr_init().
* bes2600_pwr_notify_ps_changed() unconditionally updates
chip_pm_state on every indication, but only fires
complete(pm_enter_cmpl) when atomic_cmpxchg(pm_set_in_process,
1, 0) succeeds. A spontaneous indication can no longer prime a
waiter that will only set up its request afterwards.
* bes2600_pwr_enter_lp_mode() now reinit_completion()s before
setting pm_set_in_process and sending wsm_set_pm. After a
timeout, it cmpxchgs pm_set_in_process back to 0 (so a late
indication cannot prime the next iteration) and on the win-
cmpxchg branch records chip_pm_state=UNKNOWN.
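The gating described in these three points can be sketched with a C11 compare-exchange in place of atomic_cmpxchg. This is a single-threaded model under assumptions: `struct demo_pm` and the three functions are hypothetical, and the completion object is reduced to a counter that the request path resets before it sets the pending flag.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum seen_pm_state { SEEN_ACTIVE, SEEN_LP, SEEN_UNKNOWN };

struct demo_pm {
    atomic_int set_in_process;  /* 1 while a host request awaits its answer */
    int completions;            /* stands in for the completion object */
    enum seen_pm_state state;   /* last state the firmware confirmed */
};

/* Host starts a request: reinit first, then mark the request pending. */
static void pm_request_begin(struct demo_pm *pm)
{
    assert(pm != NULL);
    pm->completions = 0;
    atomic_store(&pm->set_in_process, 1);
}

/* Firmware indication: always record state, complete only if pending. */
static void pm_indication(struct demo_pm *pm, enum seen_pm_state confirmed)
{
    int expected = 1;

    pm->state = confirmed;
    if (atomic_compare_exchange_strong(&pm->set_in_process, &expected, 0))
        pm->completions++;
}

/* Host timed out: withdraw the request and record that state is unknown. */
static void pm_request_timeout(struct demo_pm *pm)
{
    int expected = 1;

    if (atomic_compare_exchange_strong(&pm->set_in_process, &expected, 0))
        pm->state = SEEN_UNKNOWN;
}
```

Whichever side wins the compare-exchange owns the outcome, so a spontaneous or late indication can update the recorded state but cannot satisfy a waiter it does not belong to.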
A follow-up patch consumes chip_pm_state on the wake side
(bes2600_pwr_device_exit_lp_mode + bes2600_gpio_wakeup_mcu) to fix
the deterministic "active mcu fail" cycle; the state record added
here is what makes that fix possible. Splitting the work this way
keeps the lock-free race fix small and reviewable on its own.
No new locks, no behaviour change on the success path. Only the
recovery path (timeout + spontaneous indication) gains correctness.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||
|
|
e0f664cbc9 |
bes2600: recover wedged firmware via mmc_hw_reset on link break
When the LMAC active monitor detects 'link break between lmac and host'
(the hw_buf_used==pending watchdog in bes2600_bh_lmac_active_monitor),
bes2600_chrdev_wifi_force_close(hw_priv, true) is invoked to tear the
device down and prepare for a fresh probe. On the wifi_force_close_work
side this calls bes2600_chrdev_do_system_close() which dispatches
sbus_ops->power_switch(0).
On PineTab2 (RK3566 + BES2600WM over SDIO) this recovery path is a
no-op:
* bes2600_sdio_power_down() writes a SYSTEM_CLOSE host-int message,
clears MMC_CAP_NONREMOVABLE, and schedules sdio_scan_work, which is
the literal one-line stub bes_warn("...this function does
nothing\n").
* bes2600_sdio_on() (the eventual power_switch(1) counterpart)
toggles pdata->powerup, which is NULL on PineTab2 because the
wifi-reset GPIO is owned by sdio_pwrseq, not the bes2600 device
tree node (see arch/arm64/boot/dts/rockchip/rk3566-pinetab2.dtsi:
'The reset pin is claimed by sdio_mmcseq, It is better to move it
to U-Boot so the OS can use it.').
Net result: the chip is never reset. The function drivers are not
removed (the SDIO core has no signal that the card is gone), the
firmware stays wedged, and a subsequent rmmod bes2600 leaves the SDIO
function in a half-torn-down state. modprobe bes2600 then fails with
'probe with driver bes2600_wlan failed with error -123' (-ENOMEDIUM)
on both functions (:1 wifi, :2 BT-companion) until a full system
reboot.
Observed on PineTab2 (linux-pinetab2 6.19.10-danctnix1-1) after ~150
minutes of background-scan rejects (wsm_generic_confirm 0x0007,
[SCAN] Scan failed (-22)) accumulating until the LMAC stopped
acknowledging TX buffers (hw_buf_used:24 pending:24). Reproducible
under sustained scan pressure.
Add an sbus operation bus_reset() that the recovery path can call when
power_switch() has no effective chip-reset signal of its own. Provide
an SDIO implementation that calls mmc_hw_reset(self->func->card),
which on a multi-function SDIO card (PineTab2 binds func 1 for WLAN
and func 2 for the BT-companion path) takes the remove-and-rescan
path: mmc_sdio_hw_reset() marks the card removed and schedules
mmc_rescan, which tears down the bound function drivers and re-detects
the card on the next sweep, in turn reinvoking bes2600_sdio_probe().
With a single function probed it instead invokes mmc_power_cycle()
directly, which on PineTab2 toggles the wifi-reset GPIO via
sdio_pwrseq.
Add bes2600_chrdev_do_bus_reset() as the chrdev-side helper. It
invokes the bus op and then waits on probe_done_wq for the SDIO
remove() callback to clear sbus_priv, mirroring the wait pattern
already used by bes2600_chrdev_do_system_close() so that a subsequent
bes2600_switch_wifi(true) sees a clean state and can wait on the
fresh probe.
Wire it into bes2600_chrdev_wifi_force_close_work(): when halt_dev is
set (the hard-exception path used by both
bes2600_bh_lmac_active_monitor and bes2600_bh_mcu_active_monitor) and
the underlying bus implements bus_reset, take the new recovery path;
otherwise fall back to the legacy power_switch(0) sequence so this
patch is a no-op on USB or any other future bus that does not provide
bus_reset.
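The dispatch rule described above can be sketched as a small userspace model. This is not the driver code: struct sbus_ops_model, force_close_recover() and the fake_* stubs are hypothetical names, and the real path additionally waits on probe_done_wq after the reset. It only illustrates "use bus_reset when halt_dev is set and the bus provides it, otherwise fall back to power_switch(0)".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the bus private data; it only counts calls. */
struct sbus_priv {
	int resets;
	int power_offs;
};

/* Hypothetical model of the sbus ops table; bus_reset is optional. */
struct sbus_ops_model {
	int (*power_switch)(struct sbus_priv *self, int on);
	int (*bus_reset)(struct sbus_priv *self); /* NULL if the bus has none */
};

enum recovery_path { RECOVERY_BUS_RESET, RECOVERY_POWER_SWITCH };

/* Stub that records a power-off request (on == 0). */
static int fake_power_switch(struct sbus_priv *self, int on)
{
	if (!on)
		self->power_offs++;
	return 0;
}

/* Stub standing in for an SDIO bus_reset implementation. */
static int fake_bus_reset(struct sbus_priv *self)
{
	self->resets++;
	return 0;
}

/*
 * Pick the recovery path: bus_reset only on the hard-exception path
 * (halt_dev) and only when the bus implements it; otherwise keep the
 * legacy power_switch(0) sequence.
 */
static enum recovery_path force_close_recover(const struct sbus_ops_model *ops,
					      struct sbus_priv *self,
					      bool halt_dev)
{
	assert(ops != NULL && ops->power_switch != NULL);

	if (halt_dev && ops->bus_reset != NULL) {
		ops->bus_reset(self);
		return RECOVERY_BUS_RESET;
	}

	ops->power_switch(self, 0);
	return RECOVERY_POWER_SWITCH;
}
```

Keeping bus_reset as an optional function pointer is what lets a bus without a reset primitive keep its existing behaviour unchanged: a NULL entry simply selects the legacy branch.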
mmc_hw_reset() is exported by the MMC core and is the canonical
recovery primitive; calling it without holding the SDIO host claim is
correct because the multi-func remove-and-rescan path acquires the
host claim via the mmc workqueue, and the single-func mmc_power_cycle
path does not require the host claim.
No DT change is required: this works against the existing PineTab2
DTS, where the wifi-reset GPIO and the optional sdio_pwrkey GPIO (on
v2.0 boards) are both already configured as MMC pwrseq resets.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||
|
|
bdb0450bdf |
bes2600: widen scan-defer backoff to 30s and decay count on quiet
The scan-defer logic added in the previous patch ("bes2600: defer
scan and soften WARN on firmware reject") used a 10-second backoff
window and never cleared reject_count outside of a successful scan.
Field testing on a PineTab2 (linux-pinetab2 6.19.10-danctnix1) shows
two distinct mac80211 scan-retry cadences in practice:
* Idle background scans every ~5 minutes when associated -- well
outside any plausible backoff, the defer guard correctly falls
through to a real WSM scan attempt.
* Roam-evaluation bursts triggered when mac80211 wants to find a
candidate AP for handover (signal degradation, beacon loss,
locally-generated DEAUTH_LEAVING reason=3). Cadence is ~12 s, and
one boot reproduced 14 such rejected scans in 3 minutes during a
single burst, none of which engaged the defer guard because every
retry landed just outside the 10 s window.
Two-line behaviour change to fix that:
1. BES2600_SCAN_BACKOFF_JIFFIES grows from 10*HZ to 30*HZ, so a
12 s-cadence burst stays inside the window across consecutive
rejects and the third reject in the burst trips the threshold
guard. The 5 min idle case is still naturally past the window
and is unaffected.
2. bes2600_scan_should_defer() resets reject_count to 0 when
time_after(jiffies, backoff_until). Without this, reject_count
accumulated indefinitely across the slow-cadence rejects, so an
isolated reject after long quiet would have tripped the
threshold the moment it arrived. After the change, count is
latched only inside an active burst and decays cleanly when the
burst ends.
Net effect on a roam burst:
* t=0 reject #1 (count 1, backoff_until = t0 + 30s)
* t=12 reject #2 (count 2, backoff_until = t1 + 30s)
* t=24 reject #3 (count 3, threshold met, next scan deferred)
* t=36 defer fires, no WSM round-trip, reject not sent
* ... defers continue until the firmware-policy state clears
* scan succeeds -> reject_count = 0, normal cadence resumes
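The burst timeline above can be reproduced with a minimal userspace model. It is a sketch under assumptions, not the driver code: time is counted in whole seconds (MODEL_HZ is 1), struct scan_model and scan_attempt() are hypothetical names, and model_time_after() mimics the wrap-safe comparison of the kernel's time_after(). The bump-and-refresh on defer follows the description in the earlier scan-defer commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_HZ 1UL                      /* assumption: 1 tick == 1 s */
#define SCAN_REJECT_THRESHOLD 3U          /* as in the commit text */
#define SCAN_BACKOFF (30UL * MODEL_HZ)    /* 30 s window from this patch */

enum scan_outcome { SCAN_OK, SCAN_REJECTED, SCAN_DEFERRED };

/* Hypothetical model of the two fields added to struct bes2600_scan. */
struct scan_model {
	unsigned int reject_count;
	unsigned long backoff_until;
};

/* Wrap-safe "a is later than b", same idea as the kernel's time_after(). */
static bool model_time_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Decay the count once the window has elapsed, else check the threshold. */
static bool scan_should_defer(struct scan_model *s, unsigned long now)
{
	if (model_time_after(now, s->backoff_until)) {
		s->reject_count = 0;
		return false;
	}
	return s->reject_count >= SCAN_REJECT_THRESHOLD;
}

/* One scan request at time 'now'; fw_rejects models the firmware answer. */
static enum scan_outcome scan_attempt(struct scan_model *s, unsigned long now,
				      bool fw_rejects)
{
	assert(s != NULL);

	if (scan_should_defer(s, now)) {
		/* Deferred: no WSM round-trip, but the window is refreshed. */
		s->reject_count++;
		s->backoff_until = now + SCAN_BACKOFF;
		return SCAN_DEFERRED;
	}
	if (fw_rejects) {
		s->reject_count++;
		s->backoff_until = now + SCAN_BACKOFF;
		return SCAN_REJECTED;
	}
	s->reject_count = 0; /* a successful scan clears the count */
	return SCAN_OK;
}
```

Feeding the model rejects at t=0, 12 and 24 yields three SCAN_REJECTED results, and the request at t=36 returns SCAN_DEFERRED. A request after a long quiet period starts again from a count of zero.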
WSM 0x0007 confirm rejections in a burst drop from ~14 to ~3 (just
the scans needed to reach the threshold). wpa_supplicant's reason=3
locally-generated disconnects driven by exhausted roam candidates
during the same burst window also drop.
No new state, no new symbols, no change to mac80211-facing semantics:
the deferred scan still completes via the existing fail: path with
status=-EBUSY, the same response a real firmware-busy would produce.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||
|
|
4fec8b2ecc |
bes2600: defer scan and soften WARN on firmware reject
On a BES2600-based PineTab2, mac80211's background-scan cadence
(about every 30 s when associated) triggers a two-step WARN splat
pattern, visible in dmesg roughly 30 times per 10 min of regular
WiFi use:
wsm_generic_confirm ret 2
WARNING: at wsm_handle_rx+0x8a4/0xf30 [bes2600]
... full stack trace ...
ieee80211 phy0: wsm_generic_confirm failed for request 0x0007.
WARNING: at bes2600_scan_work+0x5d4/0x810 [bes2600]
... full stack trace ...
ieee80211 phy0: [SCAN] Scan failed (-22).
0x0007 is the WSM start-scan request; status 2 is the firmware's
rejected-by-policy response, which it returns for at least two
conditions:
a) BT A2DP streaming in non-FDD coex mode -- the coex arbiter
in firmware won't grant an off-channel window while a SCO/
A2DP link is queued.
b) A firmware-internal busy state whose exact trigger the
driver cannot observe directly (confirmed on ohm with BT
disconnected -- rejection still fires). Likely transient
firmware-PM transitions.
Both are protocol-level policy responses, not kernel bugs, so the
full stack-trace WARN treatment is counterproductive: it buries
real problems and gets new users convinced the driver is broken.
Three-part fix:
1. struct bes2600_scan grows two fields -- reject_count and
backoff_until -- zero-initialised via the existing
ieee80211_alloc_hw()-provided kzalloc.
2. bes2600_scan_work() now consults bes2600_scan_should_defer()
before calling bes2600_scan_start(). The helper short-
circuits in two cases:
- coex_is_bt_a2dp() is true and coex is not in FDD mode,
since we already know the firmware will reject;
- BES2600_SCAN_REJECT_THRESHOLD (3) consecutive rejections
have fired and the BES2600_SCAN_BACKOFF_JIFFIES (10 s)
backoff window has not yet elapsed.
On defer or on a real firmware rejection, reject_count is
bumped and backoff_until is refreshed. A successful scan
clears reject_count.
3. The WARN_ON(hw_priv->scan.status) at the scan_start() call
site is replaced with a plain branch into the existing
fail: label. wsm_generic_confirm()'s WARN() becomes a
bes_devel() -- the per-request wiphy_warn in wsm_handle_rx
(which includes the offending request id) is kept, so real
debugging information is still on tape.
Net behaviour:
- Expected rejections no longer produce stack traces. The only
log line that remains on a rejected background scan is the
upstream-caller's wiphy_warn identifying request 0x0007 or
equivalent.
- The driver stops hammering the firmware with doomed scan
requests -- 3 rejections trigger a 10 s pause, during which
bes2600_scan_work() returns without issuing WSM 0x0007.
- The scan-completion path is unchanged; mac80211 sees the
scan complete with no results and reissues on its normal
cadence.
- Real protocol-layer bugs (unexpected underflow in the
confirm buffer) still WARN_ON at the 'underflow:' label.
Verified on ohm (PineTab2, linux-pinetab2 6.19.10-danctnix1-1):
WARN splat count dropped from 32 to 0 per 10 min uptime. WiFi
stays associated. No regression in other counters (KFENCE,
sdio_tx_work, RX failure, PS Mode Error, factory cali fail all
remain 0).
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||
|
|
e0d752aae9 | sync bes2600/ to v7.0-danctnix1 baseline (rebasing reference) | ||
|
|
fe73571183 |
d/control: Fix packagename of fw dependency
Signed-off-by: Manuel Traut <manut@mecka.net> |
||
|
|
624fa34bf8 | Depend on firmware | ||
|
|
70f1551c94 | WIP: Fix autopkgtest | ||
|
|
ba20341e70 |
Upload
Source: https://github.com/cringeops/bes2600
Source: https://github.com/cringeops/bes2600/pull/14
Source: https://github.com/cringeops/bes2600/pull/17
Source: https://github.com/cringeops/bes2600/pull/20 |