From 093a5038b8b68f316d976b7cb69609ca7f24f322 Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Mon, 18 May 2026 11:27:40 +0200
Subject: [PATCH 1/3] bes2600: filter 5 GHz scans at the driver boundary
 (besser#1)

The BES2600 firmware refuses WSM start-scan for 5 GHz with status 2
("rejected by policy"). This shows up in dmesg as the recurring

    wsm_generic_confirm failed for request 0x0007.
    [SCAN] Scan failed (-22).

pattern (besser issue #1, ~14-16/h on ohm/PineTab2 baseline).

Trace shows every reject is the second of a back-to-back pair: mac80211
splits multi-band hw_scan requests per band when the driver does not
set IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS (we don't), then re-invokes
drv_hw_scan from __ieee80211_scan_completed for each subsequent band.
The 2.4 GHz iteration succeeds; the 5 GHz iteration is what the
firmware rejects. See ieee80211_prep_hw_scan in net/mac80211/scan.c for
the loop, and the existing memory reference_bes2600_5ghz_scan_reject
for the firmware behaviour.

The 056a71a defer-on-reject patch already in this tree handles the
BT-A2DP-coex branch and the consecutive-reject backoff, but it cannot
prevent the per-band-loop reject: by the time defer_should_scan is
consulted, the per-band call is already in flight, and the reject_count
gets reset on every successful 2.4 GHz scan in between (which is ~36%
of attempts), so the threshold never trips.

The fix: refuse the 5 GHz iteration upfront in bes2600_hw_scan. The
2.4 GHz scan still runs normally. The 5 GHz portion is reported as
aborted to userspace -- same outcome as today, minus the dmesg storm
and the wsm_generic_confirm WARN cascade.

5 GHz band registration is intentionally left in place: direct-BSSID
association to a known 5 GHz AP still works (no scan is needed for that
path), and a future firmware update that fixes the scan behaviour
should not be foreclosed by changing band advertisement.

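Illustration only: the reason the consecutive-reject backoff never
engages can be sketched with a userspace toy model of a streak counter
that resets on every success. This is not the driver code. The function
name is hypothetical, and the real counter and its threshold live in
056a71a, which is not part of this series.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of a consecutive-reject counter (hypothetical helper, not
 * taken from the driver). Returns the longest run of rejects seen in
 * the sequence; a backoff with threshold T can only trigger if this
 * value reaches T.
 */
unsigned int max_consecutive_rejects(const bool *rejected, size_t n)
{
	unsigned int count = 0, peak = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		if (rejected[i]) {
			if (++count > peak)
				peak = count;
		} else {
			count = 0;	/* a successful scan resets the streak */
		}
	}
	return peak;
}
```

If successful 2.4 GHz scans and rejected 5 GHz scans strictly alternate,
the longest streak in this model is 1, so any threshold of 2 or more is
never reached.
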
Contract: per include/net/mac80211.h ieee80211_ops.hw_scan, a negative
return aborts the scan without requiring ieee80211_scan_completed().
-EOPNOTSUPP is the semantically accurate code (operation is legal,
driver can't service it on this band today).

Phase 3 evidence:
  - baseline N=3: rate ~14.3-23.6/h converged at 14.3/h (matches OP)
  - back-to-back scan gap: 6/6 rejected pairs <200us, 1/1 successful
    pair was 114ms (single-band-only, no 5 GHz leg)
  - defer log fires: 0/9 in 30-min window (056a71a structurally
    bypassed)

Predicted Phase 7 delta: Pattern A 14/h -> 0/h.
---
 bes2600/scan.c | 22 ++++++++++++++++++++++
 1 file changed, 22 insertions(+)

diff --git a/bes2600/scan.c b/bes2600/scan.c
index fb1d298..a81afb6 100644
--- a/bes2600/scan.c
+++ b/bes2600/scan.c
@@ -238,6 +238,28 @@ int bes2600_hw_scan(struct ieee80211_hw *hw,
 	/* Scan when P2P_GO corrupt firmware MiniAP mode */
 	if (priv->join_status == BES2600_JOIN_STATUS_AP)
 		return -EOPNOTSUPP;
+
+	/*
+	 * Firmware refuses WSM start-scan for 5 GHz with status 2 ("rejected
+	 * by policy"); see besser issue #1. mac80211 splits multi-band
+	 * hw_scan requests per-band when the driver does not set
+	 * IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS (we don't -- see
+	 * ieee80211_hw_set() calls in bes2600_main.c), so each per-band call
+	 * has req->channels[] from one band only (see ieee80211_prep_hw_scan
+	 * in net/mac80211/scan.c). Refuse the 5 GHz iteration at the driver
+	 * boundary so userspace gets a clean aborted-scan for that portion
+	 * rather than waiting for the firmware reject to cascade up. 5 GHz
+	 * band registration stays intact so direct-BSSID association to a
+	 * known 5 GHz AP still works (no scan needed for that path).
+	 *
+	 * Contract: per include/net/mac80211.h struct ieee80211_ops.hw_scan
+	 * documentation, a negative return aborts the scan without requiring
+	 * ieee80211_scan_completed().
+	 */
+	if (req->n_channels > 0 &&
+	    req->channels[0]->band == NL80211_BAND_5GHZ)
+		return -EOPNOTSUPP;
+
 #if 0
 	if (work_pending(&priv->offchannel_work) ||
 			(hw_priv->roc_if_id != -1)) {
-- 
2.47.3


From 8cd10f487c8144d462a510812ba0fa717b3e24df Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Mon, 18 May 2026 15:56:34 +0200
Subject: [PATCH 2/3] bes2600: scan-filter-5ghz: allow targeted single-channel
 scans (besser#1 follow-up)

The original Patch I refused EVERY 5 GHz scan request unconditionally
(req->n_channels > 0 && band == NL80211_BAND_5GHZ). This eliminated the
Pattern A storm but also broke 5 GHz association entirely: NM /
wpa_supplicant iterates a freq_list when a connection profile specifies
802-11-wireless.band=a, issuing per-frequency single-channel scans to
find the BSS before associating. Those single-channel scans were also
refused by our guard, so the BSS was never seen and 'Wi-Fi network
could not be found' was the only outcome.

Tighten the guard: refuse only multi-channel 5 GHz scans
(n_channels > 1), which is the per-band-sweep pattern mac80211 issues
internally and the only one that triggers the firmware storm at the
per-band loop boundary. Single-channel 5 GHz scans pass through to
firmware, which generally accepts them -- and when they happen to be
rejected, the failure is isolated and doesn't cascade.

Verified on ohm with pkgrel=3 (srcversion BEB625FA7443171EA8D55F7):
  - Pattern A count since boot: 0 (Phase 7 prediction still holds)
  - iw dev wlan0 scan freq 5180 -> allowed
  - iw dev wlan0 scan freq 5180 5200 ... -> refused -EOPNOTSUPP
  - NM 'nmcli connection up' with band=a -> associated to BSSID
    c0:25:06:e6:5b:33 on 5240 MHz / ch.48 in ~1 second
  - TX bitrate 150 Mbit/s MCS 7 40MHz short-GI (vs 72.2 Mbit/s HT20 on
    2.4 GHz) -- ~2x throughput recovered

The change is a single byte (> 0 -> > 1) plus comment update; the test
confirmation above is what motivates it.

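Illustration only: the tightened guard can be modelled in userspace as
below. The model_* types and names are hypothetical stand-ins that carry
only the fields the guard reads; they are not the mac80211 structures,
and the sketch assumes, as the commit message does, that each request
holds channels from a single band.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types used by the guard. */
enum model_band { MODEL_BAND_2GHZ, MODEL_BAND_5GHZ };

struct model_channel {
	enum model_band band;
	int center_freq;	/* MHz */
};

struct model_scan_request {
	size_t n_channels;
	const struct model_channel *const *channels;
};

/*
 * Mirrors the guard after this patch: only multi-channel 5 GHz requests
 * are refused. Returns -EOPNOTSUPP when refused, 0 when the request
 * would be passed on to firmware.
 */
int model_scan_guard(const struct model_scan_request *req)
{
	if (req->n_channels > 1 &&
	    req->channels[0]->band == MODEL_BAND_5GHZ)
		return -EOPNOTSUPP;
	return 0;
}
```

In this model a one-channel request for 5180 MHz returns 0 and a
two-channel 5180/5200 MHz request returns -EOPNOTSUPP, which is the same
split the iw checks above show. A multi-channel 2.4 GHz request is
unaffected.
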
Refs: besser#1 (closed but tracked for follow-up like this), original
 Patch I sha 093a503.
---
 bes2600/scan.c | 16 ++++++++++++----
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/bes2600/scan.c b/bes2600/scan.c
index a81afb6..497523b 100644
--- a/bes2600/scan.c
+++ b/bes2600/scan.c
@@ -248,15 +248,23 @@ int bes2600_hw_scan(struct ieee80211_hw *hw,
 	 * has req->channels[] from one band only (see ieee80211_prep_hw_scan
 	 * in net/mac80211/scan.c). Refuse the 5 GHz iteration at the driver
 	 * boundary so userspace gets a clean aborted-scan for that portion
-	 * rather than waiting for the firmware reject to cascade up. 5 GHz
-	 * band registration stays intact so direct-BSSID association to a
-	 * known 5 GHz AP still works (no scan needed for that path).
+	 * rather than waiting for the firmware reject to cascade up.
+	 *
+	 * Only the multi-channel case is refused (n_channels > 1): that's
+	 * the per-band-sweep pattern mac80211 issues internally and the
+	 * one that triggers the firmware storm at the per-band loop
+	 * boundary. Single-channel 5 GHz scans (BSS verification, NM's
+	 * per-freq iteration when 802-11-wireless.band=a is set) pass
+	 * through to firmware, which generally accepts them since the
+	 * storm is the back-to-back per-band issue, not a blanket 5 GHz
+	 * reject. This preserves 5 GHz association via the
+	 * "wpa_supplicant iterates freq_list per channel" path.
 	 *
 	 * Contract: per include/net/mac80211.h struct ieee80211_ops.hw_scan
 	 * documentation, a negative return aborts the scan without requiring
 	 * ieee80211_scan_completed().
 	 */
-	if (req->n_channels > 0 &&
+	if (req->n_channels > 1 &&
 	    req->channels[0]->band == NL80211_BAND_5GHZ)
 		return -EOPNOTSUPP;
 
-- 
2.47.3


From d95453c98e31d7a47bc227aef5d0b426ac9e334b Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Mon, 18 May 2026 16:58:49 +0200
Subject: [PATCH 3/3] =?UTF-8?q?bes2600:=20take=20pending=5Frecord=5Flock?=
 =?UTF-8?q?=20with=20=5Fbh()=20to=20fix=20SOFTIRQ-safe=20=E2=86=92=20-unsa?=
 =?UTF-8?q?fe=20inversion=20(besser#18)?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

PROVE_LOCKING reports:

  WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
  kworker/u16:1 is trying to acquire:
    &hw_priv->tx_loop.pending_record_lock at bes2600_queue_clear+0x80
  and this task is already holding:
    &queue->lock at bes2600_queue_clear+0x60
  which would create a new lock dependency:
    (&queue->lock){+.-.} -> (&hw_priv->tx_loop.pending_record_lock){+.+.}
  but this new dependency connects a SOFTIRQ-irq-safe lock:
    (&queue->lock){+.-.}
  ... which became SOFTIRQ-irq-safe at:
    bes2600_tx -> ieee80211_handle_wake_tx_queue -> tasklet_action
  to a SOFTIRQ-irq-unsafe lock:
    (&hw_priv->tx_loop.pending_record_lock){+.+.}
  ... which became SOFTIRQ-irq-unsafe at:
    bes2600_queue_get_skb -> bes2600_join_work -> process_one_work

queue->lock is taken consistently with spin_lock_bh() at 22 sites; the
nested acquisition of pending_record_lock at queue.c:289 (inside the
outer queue->lock_bh held at line 285) had it implicitly BH-safe via
the outer scope.

But pending_record_lock is ALSO taken from non-BH-disabled contexts:

  bes2600_queue_get_skb (queue.c:832) -- process context via
    bes2600_join_work (workqueue), no outer queue->lock held
  bes2600_tx_loop_item_pending_check (tx_loop.c:112) -- TX-loop
    context, no outer queue->lock held

The deadlock scenario: CPU0 holds pending_record_lock from one of those
non-BH paths and is interrupted by a softirq that wants queue->lock,
while CPU1, in softirq, holds queue->lock and is about to acquire
pending_record_lock. Neither can make progress: a classic AB-BA SOFTIRQ
deadlock.

The fix is the conservative one: take pending_record_lock with _bh() at
every site that's not already inside a queue->lock_bh-held scope. That
makes the lock consistently SOFTIRQ-safe, eliminating the inversion.
queue.c:289/295 stays as plain spin_lock because BH is already disabled
by the outer queue->lock_bh acquired at queue.c:285.

Five sites converted:
  bes2600/queue.c:832   -- spin_lock   -> spin_lock_bh
  bes2600/queue.c:839   -- spin_unlock -> spin_unlock_bh
  bes2600/queue.c:844   -- spin_unlock -> spin_unlock_bh
  bes2600/tx_loop.c:112 -- spin_lock   -> spin_lock_bh
  bes2600/tx_loop.c:114 -- spin_unlock -> spin_unlock_bh

Contract:
  - Documentation/locking/locktypes.rst spelling: spin_lock_bh() is the
    canonical way to make a non-IRQ spinlock safe against softirq
    preemption that might re-enter the same lock.
  - Same shape as queue->lock in this driver and as is_drv->lock in the
    cw1200 ancestor.

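Illustration only: the safe -> unsafe rule behind the report can be
sketched as a userspace toy model. This is a deliberately simplified
assumption about how the check works, not lockdep's implementation, and
every model_* name is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical, much-simplified model of per-lock softirq usage state. */
struct model_lock_usage {
	bool taken_in_softirq;		/* ever acquired from softirq context */
	bool taken_with_bh_enabled;	/* ever acquired in process context with BH on */
};

/* Record one acquisition of a lock in the given context. */
void model_acquire(struct model_lock_usage *lock, bool in_softirq,
		   bool bh_disabled)
{
	if (in_softirq)
		lock->taken_in_softirq = true;
	else if (!bh_disabled)
		lock->taken_with_bh_enabled = true;
}

/*
 * Nesting "inner" inside "outer" is flagged when outer is softirq-safe
 * (used from softirq) and inner is softirq-unsafe (taken with BH on).
 */
bool model_safe_to_unsafe(const struct model_lock_usage *outer,
			  const struct model_lock_usage *inner)
{
	return outer->taken_in_softirq && inner->taken_with_bh_enabled;
}
```

In this model, queue->lock acquired from softirq plus pending_record_lock
acquired with plain spin_lock() in process context is flagged. Once every
process-context acquisition disables BH first, as spin_lock_bh() does,
the same nesting is no longer flagged.
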
Closes: besser#18
Fixes:
Signed-off-by: Markus Fritsche
---
 bes2600/queue.c   | 6 +++---
 bes2600/tx_loop.c | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/bes2600/queue.c b/bes2600/queue.c
index cc606c1..4016b76 100644
--- a/bes2600/queue.c
+++ b/bes2600/queue.c
@@ -829,19 +829,19 @@ int bes2600_queue_get_skb(struct bes2600_queue *queue, u32 packetID,
 	bes2600_queue_parse_id(packetID, &queue_generation, &queue_id,
 				&item_generation, &item_id, &if_id, &link_id);
 
-	spin_lock(&queue->stats->hw_priv->tx_loop.pending_record_lock);
+	spin_lock_bh(&queue->stats->hw_priv->tx_loop.pending_record_lock);
 	if (!list_empty(&queue->stats->hw_priv->tx_loop.pending_record_list)) {
 		list_for_each_entry_safe(record_item, temp_record_item, &queue->stats->hw_priv->tx_loop.pending_record_list, head) {
 			if (record_item->packetID == packetID) {
 				list_del(&record_item->head);
 				dev_kfree_skb(record_item->skb);
 				kfree(record_item);
-				spin_unlock(&queue->stats->hw_priv->tx_loop.pending_record_lock);
+				spin_unlock_bh(&queue->stats->hw_priv->tx_loop.pending_record_lock);
 				return -EINVAL;
 			}
 		}
 	}
-	spin_unlock(&queue->stats->hw_priv->tx_loop.pending_record_lock);
+	spin_unlock_bh(&queue->stats->hw_priv->tx_loop.pending_record_lock);
 
 	item = &queue->pool[item_id];
 
diff --git a/bes2600/tx_loop.c b/bes2600/tx_loop.c
index e6cf072..0cf7ce1 100644
--- a/bes2600/tx_loop.c
+++ b/bes2600/tx_loop.c
@@ -109,9 +109,9 @@ void bes2600_tx_loop_set_enable(struct bes2600_common *hw_priv, bool need_warn)
 		bes2600_queue_iterate_pending_packet(&hw_priv->tx_queue[i],
 			bes2600_tx_loop_item_pending_item);
 	}
-	spin_lock(&hw_priv->tx_loop.pending_record_lock);
+	spin_lock_bh(&hw_priv->tx_loop.pending_record_lock);
 	bes2600_queue_iterate_record_pending_packet(hw_priv, bes2600_tx_loop_item_pending_item);
-	spin_unlock(&hw_priv->tx_loop.pending_record_lock);
+	spin_unlock_bh(&hw_priv->tx_loop.pending_record_lock);
 
 	if (atomic_read(&hw_priv->bh_rx) > 0)
 		wake_up(&hw_priv->bh_wq);
-- 
2.47.3