Compare commits
22 Commits
| Author | SHA1 | Date | |
|---|---|---|---|
| f6448c44fe | |||
| fd0f5a8b71 | |||
| b08ab7aa62 | |||
| a1f18a5256 | |||
| f8986a4a18 | |||
| 122582e270 | |||
| ae175f9745 | |||
| 693e9b42aa | |||
| 0f783a1e69 | |||
| 843d40231f | |||
| 6ab61b9a06 | |||
| 216c7c59b1 | |||
| 02d3f4b222 | |||
| 3d63ec0a35 | |||
| 722434414a | |||
| fc88ff41c3 | |||
| fde41fcdd4 | |||
| 6bae531917 | |||
| 3a38286e6f | |||
| 1e408c9d33 | |||
| d01400140b | |||
| 993117a108 |
@@ -53,6 +53,9 @@ CW1200-ancestry markers in current source: same author Dmitry Tarnyagin,
|
||||
|------|------|
|
||||
| **This umbrella** | `git.reauktion.de/marfrit/besser` — patches/, scripts/, fw-analysis/, notes/ |
|
||||
| **Mobian DKMS fork** (PR target) | `git.reauktion.de/marfrit/bes2600-dkms` — branches per patch; upstream = `salsa.debian.org/Mobian-team/devices/bes2600-dkms` |
|
||||
| **DanctNIX kernel package** (ohm) | `git.reauktion.de/marfrit/marfrit-packages/arch/linux-pinetab2-danctnix-besser/` — kernel-agent-driven PKGBUILD, pkgrel=4+ |
|
||||
| **kernel-agent manifest + patches** | `git.reauktion.de/marfrit/kernel-agent` — `fleet/ohm.yaml` lists the per-patch series, `bin/ka-promote ohm` emits the cumulative the PKGBUILD consumes |
|
||||
| **Historical hand-managed PKGBUILD** | `git.reauktion.de/marfrit/besser/danctnix-besser-pkgbuild/` — pkgrel≤3, deprecated; see directory README |
|
||||
|
||||
## Patch series
|
||||
|
||||
|
||||
@@ -0,0 +1,222 @@
|
||||
# linux-pinetab2-danctnix-besser
|
||||
|
||||
Soft-upstream fork of `linux-pinetab2` (DanctNIX kernel for PineTab2) carrying the **BESser** bes2600 staging-driver patchset.
|
||||
|
||||
Drop-in replacement for `linux-pinetab2`. Same kernel version, same config (one toggle aside — see SCS caveat below), same modules — only the `drivers/staging/bes2600/` driver differs.
|
||||
|
||||
---
|
||||
|
||||
> ## ⚠️ PKGBUILD MOVED
|
||||
>
|
||||
> Starting with **pkgrel=4** (2026-05-18), the canonical PKGBUILD lives at
|
||||
> **`git.reauktion.de/marfrit/marfrit-packages/arch/linux-pinetab2-danctnix-besser/`**
|
||||
> and is driven by [kernel-agent](https://git.reauktion.de/marfrit/kernel-agent)'s
|
||||
> `ka-promote ohm` cumulative-patch flow against `fleet/ohm.yaml`.
|
||||
>
|
||||
> This directory remains for historical reference (pkgrel=1..3 hand-managed
|
||||
> flow + per-patch design notes that haven't been ported to the new home yet).
|
||||
>
|
||||
> **Use the new location** for builds going forward. See
|
||||
> [kernel-agent PR #28](https://git.reauktion.de/marfrit/kernel-agent/pulls/28)
|
||||
> and [marfrit-packages PR #28](https://git.reauktion.de/marfrit/marfrit-packages/pulls/28)
|
||||
> for the migration.
|
||||
|
||||
---
|
||||
|
||||
## TL;DR
|
||||
|
||||
| | |
|
||||
|---|---|
|
||||
| **Current package** | `linux-pinetab2-danctnix-besser-7.0.danctnix1-5-aarch64.pkg.tar.zst` (built via [kernel-agent](https://git.reauktion.de/marfrit/kernel-agent)) |
|
||||
| **PKGBUILD home** | `git.reauktion.de/marfrit/marfrit-packages/arch/linux-pinetab2-danctnix-besser/` *(new — pkgrel=4 onwards)* |
|
||||
| **Patch manifest** | `git.reauktion.de/marfrit/kernel-agent` `fleet/ohm.yaml` |
|
||||
| **Cumulative b2sum** | `0eb091ddaba4a8f1c3c2a78…` (pkgrel=5, `ka-promote ohm` output, 162 704 B, 4 patches) |
|
||||
| **Module srcversion** | `BEB625FA7443171EA8D55F7` for pkgrel=4 (byte-identical to pkgrel=3 source). pkgrel=5 srcversion differs because the besser#18 fix is bundled (TBD pending build verification). |
|
||||
| **Kernel base** | DanctNIX [`linux-pinetab2`](https://codeberg.org/DanctNIX/linux-pinetab2) tag `v7.0-danctnix1` |
|
||||
| **What it fixes vs upstream** | +73 % TX throughput, the `wsm_generic_confirm 0x0007` dmesg storm (besser#1 closed), the firmware-PSM-not-honored hang, the multi-function SDIO LMAC-wedge recovery |
|
||||
| **What it adds today vs pkgrel=1** | **Patch I**: 5 GHz scan filter — `iw scan freq <single-5ghz-channel>` works, multi-channel per-band sweep refused at driver boundary to dodge firmware reject cascade. NM `band=a` profiles associate to 5 GHz cleanly. **Sustained 11.32 MB/s** download (2.54 GB factory image) on `newton` 5 GHz ch.48 — **3.6× the 2.4 GHz baseline of 3.12 MB/s** on the same source. |
|
||||
| **Source-of-truth (driver)** | `git.reauktion.de/marfrit/bes2600-dkms` — branch `cleanups` for c-stack+A+B, branch `bes2600/scan-filter-5ghz` for Patch I |
|
||||
| **Caveat** | `CONFIG_SHADOW_CALL_STACK=n` (security-hardening regression, workaround for a GCC 15.2.1 + arm_neon.h pragma issue — tracked in [besser#20](https://git.reauktion.de/marfrit/besser/issues/20), restore to `=y` when GCC is fixed) |
|
||||
|
||||
## pkgrel history
|
||||
|
||||
| pkgrel | Date | Flow | Notes |
|
||||
|---|---|---|---|
|
||||
| 1–3 | 2026-05-08…05-18 | hand-managed, this dir | c-stack + Patches A/B/C/D/E/F/G/H + Patch I + SCS Makefile workaround |
|
||||
| 4 | 2026-05-18 | kernel-agent (`ka-promote ohm`) | migration-only release: byte-identical source to pkgrel=3 (148 149 + 7 735 + 1 562 = 157 446 cumulative arithmetic); fixes pkgrel=3 PKGBUILD's duplicated `0003-...patch` source-array bug. Available as fallback. |
|
||||
| **5** | **2026-05-18** | **kernel-agent (`ka-promote ohm`)** | adds [besser#18](https://git.reauktion.de/marfrit/besser/issues/18) lockdep fix (pending_record_lock SOFTIRQ-safe → -unsafe inversion). 4-patch cumulative, 162 704 B, b2sum `0eb091ddaba4…`. Closes besser#18 + besser#1. |
|
||||
|
||||
---
|
||||
|
||||
## What's in the patchset
|
||||
|
||||
A 17-commit cumulative diff over `v7.0-danctnix1`'s in-tree `drivers/staging/bes2600/`, plus the standalone Patch I (5 GHz scan filter) and an arm64 build-environment workaround for GCC 15.
|
||||
|
||||
Individual commits with full rationale + Phase-7 verification logs live on the **`cleanups` branch** of [`marfrit/bes2600-dkms`](https://git.reauktion.de/marfrit/bes2600-dkms/commits/branch/cleanups) and the **`bes2600/scan-filter-5ghz` branch** for Patch I. This PKGBUILD ships them squashed into separate patch files for build atomicity.
|
||||
|
||||
| group | what it does |
|
||||
|---|---|
|
||||
| **c-stack (c5.1–c5.2.1, c6.1, c6.2, c7)** | wifi-stability fixes: scan-defer-on-firmware-reject, scan-defer-backoff-tune, LMAC recover via `mmc_hw_reset`, PM state resync, wake-state consume, firmware-doesn't-honour-PSM self-detect, multi-function SDIO `mmc_hw_reset` rescan |
|
||||
| **Patch A** | decrypt-storm fast-recover at `bes2600_rx_cb`: ≥5 `WSM_STATUS_DECRYPTFAILURE` in 5 s → `ieee80211_connection_loss(vif)`. Phase-7 confirmed N=2 (2026-05-07), storms recover ~1 s vs 109 s baseline. |
|
||||
| **Patch B** | connection-loss bus-reset: ≥3 driver-side connection-loss decisions in 60 s on the same vif → `mmc_hw_reset` instead of mac80211 reauth. Installed dormant; never tripped in production yet. |
|
||||
| **Patch C v3** | structural: drop `sdio_rx_work` workqueue relay; IRQ → bh-direct architecture (matches mainline cw1200). +73 % sustained RX. |
|
||||
| **Patch D** | `ba_lock` removed; `ba_acc/ba_cnt/ba_acc_rx/ba_cnt_rx/ba_ena` → `atomic_t`; per-RX-frame spinlock eliminated. |
|
||||
| **Patch E** | per-RX-frame `ps_state_lock` skipped when c7's `pm_unsupported` latch is on (steady-state on production firmware). |
|
||||
| **Patch F** | cw1200 mainline backports: hw_scan SKB-lifecycle UAF, `init_common` `destroy_workqueue` on error, `atomic_add(1, x) → atomic_inc(x)` cosmetic. |
|
||||
| **Patch G** | GPL-2.0 §1 attribution restoration: SPDX-License-Identifier on every file, Tarnyagin/ST-Ericsson copyright restored on cw1200-derived files. |
|
||||
| **Patch C2** | `ieee80211_rx_irqsafe → ieee80211_rx_ni` at all 6 sites (kernel.org-clean process-context API; tasklet hop removed). |
|
||||
| **Patch H** | `bh.c` hygiene cleanup: 76- and 468-line `#if 0` cw1200-ancestor fossil blocks removed; `__bes2600_irq_enable` stub removed; per-iteration `BUG_ON` → `WARN_ON_ONCE`. |
|
||||
| **Patch I** ([besser#1](https://git.reauktion.de/marfrit/besser/issues/1)) | **5 GHz scan filter.** Refuses only **multi-channel** 5 GHz scans (the per-band-sweep mac80211 issues internally) at the driver boundary with `-EOPNOTSUPP`, dodging the firmware's status-2 reject cascade. Single-channel 5 GHz scans pass through so NM/`wpa_supplicant` per-freq BSS discovery (when `802-11-wireless.band=a`) still finds and associates to 5 GHz APs. Net effect: dmesg storm gone, 5 GHz attachment works, 3.6× sustained throughput on 5 GHz HT40 vs 2.4 GHz HT20. |
|
||||
| **arm64 SCS Makefile workaround** | Adds `-ffixed-x18` explicitly for `arch/arm64/lib/xor-neon.o` when `CONFIG_SHADOW_CALL_STACK=y`. Dead code in this pkgrel (SCS is off), in place for the day SCS re-enable becomes possible. See [besser#20](https://git.reauktion.de/marfrit/besser/issues/20). |
|
||||
|
||||
## Measured outcome
|
||||
|
||||
- **Phase 7 (Patch I, 2026-05-18):** Pattern A `wsm_generic_confirm failed for request 0x0007` storm: 14.3/h → **0/h** over 30-min observation. 5 GHz `newton` BSSID `c0:25:06:e6:5b:33` @ 5240 MHz (ch.48), TX bitrate 150 Mbit/s MCS 7 HT40 short-GI. Internet download throughput **11.32 MB/s** (sustained 90.5 Mbit/s, ~60 % of PHY) vs 3.12 MB/s on 2.4 GHz HT20 same source.
|
||||
- **Phase 7 (Patch C v3 + F + G + D + E + C2 + H, Mobian-flavor):** N=3 stress @ 4 MB/s sender on RK3566/PineTab2 — Patch B baseline 1.36 MB/s → +73 % sustained 2.28 MB/s. Race-fix verified under stress (no `wsm_release_tx_buffer` WARN storm under load).
|
||||
- Module loads + associates cleanly; `pm_unsupported` latch fires on boot as expected.
|
||||
|
||||
## Building (pkgrel=4+, kernel-agent flow)
|
||||
|
||||
Builds run out of the new home:
|
||||
|
||||
```sh
|
||||
cd ~/src/marfrit-packages/arch/linux-pinetab2-danctnix-besser
|
||||
makepkg -s
|
||||
```
|
||||
|
||||
To refresh the cumulative patch from a new kernel-agent manifest state:
|
||||
|
||||
```sh
|
||||
cd ~/src/kernel-agent
|
||||
./bin/ka-promote ohm
|
||||
cp build/ohm/v7.0-danctnix1/cumulative.patch \
|
||||
~/src/marfrit-packages/arch/linux-pinetab2-danctnix-besser/0001-bes2600-besser-kernel-agent-cumulative.patch
|
||||
cp build/ohm/v7.0-danctnix1/manifest.lock \
|
||||
~/src/marfrit-packages/arch/linux-pinetab2-danctnix-besser/manifest.lock
|
||||
b2sum 0001-bes2600-besser-kernel-agent-cumulative.patch # update PKGBUILD b2sums and pkgrel
|
||||
```
|
||||
|
||||
## Building (pkgrel ≤ 3, hand-managed flow — DEPRECATED)
|
||||
|
||||
```sh
|
||||
cd ~/src/besser/marfrit-besser/danctnix-besser-pkgbuild/kernel
|
||||
makepkg -s
|
||||
```
|
||||
|
||||
Produces `linux-pinetab2-danctnix-besser-<ver>-aarch64.pkg.tar.zst` plus a matching `-headers` package. Build host can be aarch64 native (recommended — no cross-toolchain setup) or x86 with an aarch64 cross-compiler.
|
||||
|
||||
Build time: ~45–55 min on an 8-core aarch64 host (boltzmann/RPi5-class), most of it the kernel modules phase.
|
||||
|
||||
**GCC 15.2.1 note:** This pkgrel ships with `CONFIG_SHADOW_CALL_STACK=n` because GCC 15.2.1's strict pragma validator chokes on `arm_neon.h`'s push/`target("+nothing+aes")`/pop sequences when SCS is on. The `0003-arm64-xor-neon-ffixed-x18-build-fix.patch` is a defensive Makefile-side workaround that's a no-op while SCS is off; it'll silently unblock SCS=y once GCC upstream is fixed. See [besser#20](https://git.reauktion.de/marfrit/besser/issues/20) for the re-enable plan.
|
||||
|
||||
## Installing
|
||||
|
||||
The package declares `provides=("linux-pinetab2=$pkgver-$pkgrel")` and `conflicts=(linux-pinetab2)`, so `pacman` will cleanly take over from upstream `linux-pinetab2`:
|
||||
|
||||
```sh
|
||||
sudo pacman -U linux-pinetab2-danctnix-besser-7.0.danctnix1-5-aarch64.pkg.tar.zst
|
||||
```
|
||||
|
||||
That removes the upstream `linux-pinetab2` package (if installed) and registers the BESser-flavored kernel under the same provides slot. Headers package is optional; install it if you build out-of-tree modules.
|
||||
|
||||
The pacman `mkinitcpio` hook auto-generates `/boot/initramfs-linux-pinetab2-danctnix-besser.img`. Modules land in `/usr/lib/modules/<release>-pinetab2-danctnix-besser/`, vmlinuz at `/boot/vmlinuz-linux-pinetab2-danctnix-besser`, DTBs at `/boot/dtbs/rockchip/rk3566-pinetab2-{v0.1,v2.0}.dtb`.
|
||||
|
||||
### Bootloader (PineTab2-specific)
|
||||
|
||||
PineTab2 boots via U-Boot loading a script `boot.scr` (compiled from `/boot/boot.txt` via `mkscr`). After install, point the script at the new kernel + initramfs:
|
||||
|
||||
```sh
|
||||
sudo cp /boot/boot.txt /boot/boot.txt.pre-besser
|
||||
sudo cp /boot/boot.scr /boot/boot.scr.pre-besser
|
||||
sudo sed -i \
|
||||
-e 's|/vmlinuz-linux-pinetab2$|/vmlinuz-linux-pinetab2-danctnix-besser|' \
|
||||
-e 's|/initramfs-linux-pinetab2\.img|/initramfs-linux-pinetab2-danctnix-besser.img|' \
|
||||
/boot/boot.txt
|
||||
cd /boot && sudo ./mkscr
|
||||
sudo systemctl reboot
|
||||
```
|
||||
|
||||
Backups (`*.pre-besser`) let you revert without touching the U-Boot console: `sudo cp /boot/boot.scr.pre-besser /boot/boot.scr` and reboot.
|
||||
|
||||
## Verifying
|
||||
|
||||
After reboot:
|
||||
|
||||
```sh
|
||||
uname -r
|
||||
# expected: <kver>-pinetab2-danctnix-besser
|
||||
|
||||
lsmod | grep -i bes2600
|
||||
# expected: bes2600 (loaded), bes2600_btuart (loaded if Bluetooth in use)
|
||||
|
||||
cat /sys/module/bes2600/srcversion
|
||||
# expected: BEB625FA7443171EA8D55F7 for pkgrel=3 (and pkgrel=4 — byte-identical source)
|
||||
```
|
||||
|
||||
`dmesg | grep bes2600` should show clean firmware load, no SDIO TX panic, no `wsm_release_tx_buffer` WARN storm under load, no `wsm_generic_confirm failed for request 0x0007` storm.
|
||||
|
||||
For the 5 GHz fix specifically:
|
||||
```sh
|
||||
sudo iw dev wlan0 scan freq 5180
|
||||
# expected: completes, no "Operation not supported"
|
||||
|
||||
sudo iw dev wlan0 scan freq 5180 5200 5220 5240
|
||||
# expected: "Operation not supported (-95)" — multi-channel 5 GHz refused
|
||||
```
|
||||
|
||||
## Rolling back
|
||||
|
||||
If the new kernel misbehaves:
|
||||
|
||||
```sh
|
||||
sudo cp /boot/boot.scr.pre-besser /boot/boot.scr
|
||||
sudo systemctl reboot
|
||||
```
|
||||
|
||||
That returns you to whatever kernel `boot.scr` was pointing at before the install (typically upstream `linux-pinetab2` or the previous `linux-pinetab2-danctnix-besser`). The package itself can be removed with `sudo pacman -R linux-pinetab2-danctnix-besser` and the original `linux-pinetab2` re-installed via `sudo pacman -S linux-pinetab2`.
|
||||
|
||||
## Provenance
|
||||
|
||||
- Mobian-flavor source-of-truth: <https://git.reauktion.de/marfrit/bes2600-dkms> (`cleanups` branch for c-stack + Patches A/B, `bes2600/scan-filter-5ghz` for Patch I)
|
||||
- Per-patch breakdown, Phase 0–7 logs, follow-up issues: <https://git.reauktion.de/marfrit/besser>
|
||||
- Upstream cw1200 mainline (architectural reference): `drivers/net/wireless/st/cw1200/` in linux-rockchip
|
||||
- Kernel base: <https://codeberg.org/DanctNIX/linux-pinetab2> tag `v7.0-danctnix1`
|
||||
- Kernel-agent mirror of the patch tree + per-host manifest: <https://git.reauktion.de/marfrit/kernel-agent>
|
||||
|
||||
## Why it's "BESser"
|
||||
|
||||
"Besser" = German for "better." Patch series ID across both DKMS (Mobian) and in-tree (Danctnix) trees. Single source-of-truth lives in `marfrit/bes2600-dkms`; this PKGBUILD is the danctnix-flavor consumption surface.
|
||||
|
||||
## Soft-upstream intent
|
||||
|
||||
Submitting this PKGBUILD to DanctNIX for review. If accepted as a replacement for `linux-pinetab2` (or sidegrade), the BESser patchset ships to all PineTab2 users via the regular danctnix package update channel. The bes2600 driver gets:
|
||||
|
||||
- ~2× sustained RX throughput on 2.4 GHz
|
||||
- ~3.6× sustained RX throughput on 5 GHz (via Patch I + correctly using HT40)
|
||||
- Race-correctness on the hot path
|
||||
- GPL-2.0 §1 attribution compliance
|
||||
- Modern kernel API (no deprecated `from_timer`, no `_irqsafe` from process context, no `BUG_ON` in steady-state)
|
||||
|
||||
Drop-in compatibility: same kernel version, same module names, no userspace ABI change. SCS off is the one config caveat, tracked in [besser#20](https://git.reauktion.de/marfrit/besser/issues/20).
|
||||
|
||||
## Maintenance plan
|
||||
|
||||
**Effective pkgrel=4+:** the per-host manifest in `marfrit/kernel-agent` (`fleet/ohm.yaml`) is the per-patch authority. `ka-promote ohm` produces the cumulative; the PKGBUILD in `marfrit/marfrit-packages` consumes it. Updates flow:
|
||||
|
||||
- New danctnix kernel release → bump `baseline.ref` in `fleet/ohm.yaml`, re-promote, bump pkgver in marfrit-packages PKGBUILD.
|
||||
- New BESser patch → add a new series-dir in `kernel-agent/patches/driver/bes2600/`, add to `fleet/ohm.yaml` `includes:`, re-promote, refresh cumulative + b2sum in marfrit-packages PKGBUILD, bump pkgrel.
|
||||
- Both flavors continue to be maintained in lockstep via `marfrit/bes2600-dkms` source-of-truth.
|
||||
- GCC 15 SCS issue → periodically re-test build with `CONFIG_SHADOW_CALL_STACK=y` against current Arch ARM GCC. When the build succeeds, flip the config and re-deploy.
|
||||
|
||||
## Known gaps
|
||||
|
||||
- Cumulative diff (squashed) for the c-stack + Patches A/B; Patch I as a separate `0002-` file. Per-patch series can be regenerated if danctnix maintainers prefer.
|
||||
- Bluetooth-side `bes2600_btuart` is independent and untouched by this patchset.
|
||||
- `bes2600_switch_bt` orchestration removed (Mobian-only entry point; not used in danctnix tree).
|
||||
- Multi-band `iw scan` (no `freq` filter) still reports aborted scan because mac80211 aggregates per-band results and marks the whole scan aborted when any leg returns negative (mac80211 contract, not bes2600). Single-band scans (`iw scan freq 2462` or `iw scan freq 5180`) work normally; `nmcli connection up` with `band=bg` or `band=a` profile works normally. This is the Phase 5 reviewer's predicted residual limitation; userspace tools that need full multi-band BSS discovery should issue per-band scans.
|
||||
|
||||
## Author
|
||||
|
||||
Markus Fritsche <fritsche.markus@gmail.com>
|
||||
|
||||
Built collaboratively with Claude Opus 4.7 (1M context).
|
||||
+226
@@ -0,0 +1,226 @@
|
||||
From 4fec8b2ecc006ab4aff589fc6742e251d6af96f0 Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: Fri, 24 Apr 2026 21:31:45 +0200
|
||||
Subject: [PATCH 01/20] bes2600: defer scan and soften WARN on firmware reject
|
||||
|
||||
On a BES2600-based PineTab2, mac80211's background-scan cadence
|
||||
(about every 30 s when associated) triggers a two-step WARN splat
|
||||
pattern, visible in dmesg roughly 30 times per 10 min of regular
|
||||
WiFi use:
|
||||
|
||||
wsm_generic_confirm ret 2
|
||||
WARNING: at wsm_handle_rx+0x8a4/0xf30 [bes2600]
|
||||
... full stack trace ...
|
||||
ieee80211 phy0: wsm_generic_confirm failed for request 0x0007.
|
||||
WARNING: at bes2600_scan_work+0x5d4/0x810 [bes2600]
|
||||
... full stack trace ...
|
||||
ieee80211 phy0: [SCAN] Scan failed (-22).
|
||||
|
||||
0x0007 is the WSM start-scan request; status 2 is the firmware's
|
||||
rejected-by-policy response, which it returns for at least two
|
||||
conditions:
|
||||
|
||||
a) BT A2DP streaming in non-FDD coex mode -- the coex arbiter
|
||||
in firmware won't grant an off-channel window while a SCO/
|
||||
A2DP link is queued.
|
||||
b) A firmware-internal busy state whose exact trigger the
|
||||
driver cannot observe directly (confirmed on ohm with BT
|
||||
disconnected -- rejection still fires). Likely transient
|
||||
firmware-PM transitions.
|
||||
|
||||
Both are protocol-level policy responses, not kernel bugs, so the
|
||||
full stack-trace WARN treatment is counterproductive: it buries
|
||||
real problems and gets new users convinced the driver is broken.
|
||||
|
||||
Three-part fix:
|
||||
|
||||
1. struct bes2600_scan grows two fields -- reject_count and
|
||||
backoff_until -- zero-initialised via the existing
|
||||
ieee80211_alloc_hw()-provided kzalloc.
|
||||
|
||||
2. bes2600_scan_work() now consults bes2600_scan_should_defer()
|
||||
before calling bes2600_scan_start(). The helper short-
|
||||
circuits in two cases:
|
||||
|
||||
- coex_is_bt_a2dp() is true and coex is not in FDD mode,
|
||||
since we already know the firmware will reject;
|
||||
- BES2600_SCAN_REJECT_THRESHOLD (3) consecutive rejections
|
||||
have fired and the BES2600_SCAN_BACKOFF_JIFFIES (10 s)
|
||||
backoff window has not yet elapsed.
|
||||
|
||||
On defer or on a real firmware rejection, reject_count is
|
||||
bumped and backoff_until is refreshed. A successful scan
|
||||
clears reject_count.
|
||||
|
||||
3. The WARN_ON(hw_priv->scan.status) at the scan_start() call
|
||||
site is replaced with a plain branch into the existing
|
||||
fail: label. wsm_generic_confirm()'s WARN() becomes a
|
||||
bes_devel() -- the per-request wiphy_warn in wsm_handle_rx
|
||||
(which includes the offending request id) is kept, so real
|
||||
debugging information is still on tape.
|
||||
|
||||
Net behaviour:
|
||||
|
||||
- Expected rejections no longer produce stack traces. The only
|
||||
log line that remains on a rejected background scan is the
|
||||
upstream-caller's wiphy_warn identifying request 0x0007 or
|
||||
equivalent.
|
||||
- The driver stops hammering the firmware with doomed scan
|
||||
requests -- 3 rejections trigger a 10 s pause, during which
|
||||
bes2600_scan_work() returns without issuing WSM 0x0007.
|
||||
- The scan-completion path is unchanged; mac80211 sees the
|
||||
scan complete with no results and reissues on its normal
|
||||
cadence.
|
||||
- Real protocol-layer bugs (unexpected underflow in the
|
||||
confirm buffer) still WARN_ON at the 'underflow:' label.
|
||||
|
||||
Verified on ohm (PineTab2, linux-pinetab2 6.19.10-danctnix1-1):
|
||||
WARN splat count dropped from 32 to 0 per 10 min uptime. WiFi
|
||||
stays associated. No regression in other counters (KFENCE,
|
||||
sdio_tx_work, RX failure, PS Mode Error, factory cali fail all
|
||||
remain 0).
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
bes2600/scan.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++-
|
||||
bes2600/scan.h | 11 +++++++++
|
||||
bes2600/wsm.c | 14 +++++++++++-
|
||||
3 files changed, 83 insertions(+), 2 deletions(-)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/scan.c b/drivers/staging/bes2600/scan.c
|
||||
index 3bfa535..5f6af3b 100644
|
||||
--- a/drivers/staging/bes2600/scan.c
|
||||
+++ b/drivers/staging/bes2600/scan.c
|
||||
@@ -14,11 +14,50 @@
|
||||
#include "scan.h"
|
||||
#include "sta.h"
|
||||
#include "pm.h"
|
||||
+#include "epta_coex.h"
|
||||
#include "epta_request.h"
|
||||
#include "bes_pwr.h"
|
||||
|
||||
+/*
|
||||
+ * After this many consecutive WSM scan rejections from firmware, stop
|
||||
+ * issuing new scans for BES2600_SCAN_BACKOFF_JIFFIES and let the state
|
||||
+ * that's rejecting them (coex window, firmware-internal busy) clear.
|
||||
+ */
|
||||
+#define BES2600_SCAN_REJECT_THRESHOLD 3
|
||||
+#define BES2600_SCAN_BACKOFF_JIFFIES (10 * HZ)
|
||||
+
|
||||
static void bes2600_scan_restart_delayed(struct bes2600_vif *priv);
|
||||
|
||||
+/*
|
||||
+ * Decide whether to skip sending the next WSM scan command without
|
||||
+ * bothering the firmware. Two triggers:
|
||||
+ *
|
||||
+ * 1. BT A2DP is streaming in non-FDD coex mode. The firmware is
|
||||
+ * known to reject scan requests during that window; short-
|
||||
+ * circuiting here saves a WSM round-trip and avoids the
|
||||
+ * wsm_generic_confirm / scan_work warning cascade that follows.
|
||||
+ *
|
||||
+ * 2. We already saw >= BES2600_SCAN_REJECT_THRESHOLD consecutive
|
||||
+ * rejections on recent scan attempts and the backoff window has
|
||||
+ * not yet elapsed. Whatever was rejecting them is likely still
|
||||
+ * rejecting them; give it time.
|
||||
+ *
|
||||
+ * Returns true if the caller should abandon the scan iteration.
|
||||
+ */
|
||||
+static bool bes2600_scan_should_defer(struct bes2600_common *hw_priv)
|
||||
+{
|
||||
+#ifdef WIFI_BT_COEXIST_EPTA_ENABLE
|
||||
+ if (!coex_is_fdd_mode() && coex_is_bt_a2dp())
|
||||
+ return true;
|
||||
+#endif
|
||||
+
|
||||
+ if (hw_priv->scan.reject_count >= BES2600_SCAN_REJECT_THRESHOLD &&
|
||||
+ time_before(jiffies, hw_priv->scan.backoff_until))
|
||||
+ return true;
|
||||
+
|
||||
+ return false;
|
||||
+}
|
||||
+
|
||||
#ifdef CONFIG_BES2600_TESTMODE
|
||||
static int bes2600_advance_scan_start(struct bes2600_common *hw_priv)
|
||||
{
|
||||
@@ -703,10 +742,29 @@ void bes2600_scan_work(struct work_struct *work)
|
||||
wsm_unlock_tx(hw_priv);
|
||||
} else
|
||||
#endif
|
||||
+ {
|
||||
+ if (bes2600_scan_should_defer(hw_priv)) {
|
||||
+ hw_priv->scan.status = -EBUSY;
|
||||
+ hw_priv->scan.reject_count++;
|
||||
+ hw_priv->scan.backoff_until =
|
||||
+ jiffies + BES2600_SCAN_BACKOFF_JIFFIES;
|
||||
+ wiphy_dbg(priv->hw->wiphy,
|
||||
+ "[SCAN] deferred (coex/backoff, reject_count=%u)\n",
|
||||
+ hw_priv->scan.reject_count);
|
||||
+ kfree(scan.ch);
|
||||
+ goto fail;
|
||||
+ }
|
||||
hw_priv->scan.status = bes2600_scan_start(priv, &scan);
|
||||
+ }
|
||||
kfree(scan.ch);
|
||||
- if (WARN_ON(hw_priv->scan.status))
|
||||
+ if (hw_priv->scan.status) {
|
||||
+ hw_priv->scan.reject_count++;
|
||||
+ hw_priv->scan.backoff_until =
|
||||
+ jiffies + BES2600_SCAN_BACKOFF_JIFFIES;
|
||||
+ /* Lower callers already logged the reason at wiphy_warn. */
|
||||
goto fail;
|
||||
+ }
|
||||
+ hw_priv->scan.reject_count = 0;
|
||||
hw_priv->scan.curr = it;
|
||||
}
|
||||
up(&hw_priv->conf_lock);
|
||||
diff --git a/drivers/staging/bes2600/scan.h b/drivers/staging/bes2600/scan.h
|
||||
index e50fa36..1f3adea 100644
|
||||
--- a/drivers/staging/bes2600/scan.h
|
||||
+++ b/drivers/staging/bes2600/scan.h
|
||||
@@ -42,6 +42,17 @@ struct bes2600_scan {
|
||||
struct delayed_work probe_work;
|
||||
int direct_probe;
|
||||
u8 if_id;
|
||||
+ /*
|
||||
+ * Track consecutive firmware-side WSM scan rejections so we can
|
||||
+ * back off briefly instead of re-issuing the same scan on every
|
||||
+ * mac80211 background-scan tick. Firmware returns WSM status != 0
|
||||
+ * for a handful of transient conditions (BT A2DP active in non-
|
||||
+ * FDD coex, firmware-internal busy windows) and keeps rejecting
|
||||
+ * until the state clears; retrying at full cadence just floods
|
||||
+ * dmesg.
|
||||
+ */
|
||||
+ unsigned int reject_count;
|
||||
+ unsigned long backoff_until;
|
||||
};
|
||||
|
||||
int bes2600_hw_scan(struct ieee80211_hw *hw,
|
||||
diff --git a/drivers/staging/bes2600/wsm.c b/drivers/staging/bes2600/wsm.c
|
||||
index d40df30..55a4e2b 100644
|
||||
--- a/drivers/staging/bes2600/wsm.c
|
||||
+++ b/drivers/staging/bes2600/wsm.c
|
||||
@@ -134,8 +134,20 @@ static int wsm_generic_confirm(struct bes2600_common *hw_priv,
|
||||
struct wsm_buf *buf)
|
||||
{
|
||||
u32 status = WSM_GET32(buf);
|
||||
- if (WARN(status != WSM_STATUS_SUCCESS, "wsm_generic_confirm ret %u", status))
|
||||
+
|
||||
+ /*
|
||||
+ * A non-SUCCESS status here is a firmware-side policy decision for
|
||||
+ * the command whose confirm this is -- commonly WSM status 2 for
|
||||
+ * scan (0x0407) rejected because of a coex window or transient
|
||||
+ * firmware-busy state. It is not a driver/kernel bug, so avoid the
|
||||
+ * WARN()/stack-trace treatment; the caller already emits a
|
||||
+ * wiphy_warn identifying the request id and will propagate the
|
||||
+ * error to mac80211.
|
||||
+ */
|
||||
+ if (status != WSM_STATUS_SUCCESS) {
|
||||
+ bes_devel("%s ret %u\n", __func__, status);
|
||||
return -EINVAL;
|
||||
+ }
|
||||
return 0;
|
||||
|
||||
underflow:
|
||||
--
|
||||
2.54.0
|
||||
|
||||
@@ -0,0 +1,168 @@
|
||||
From 093a5038b8b68f316d976b7cb69609ca7f24f322 Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: Mon, 18 May 2026 11:27:40 +0200
|
||||
Subject: [PATCH 1/2] bes2600: filter 5 GHz scans at the driver boundary
|
||||
(besser#1)
|
||||
|
||||
The BES2600 firmware refuses WSM start-scan for 5 GHz with status 2
|
||||
("rejected by policy"). This shows up in dmesg as the recurring
|
||||
|
||||
wsm_generic_confirm failed for request 0x0007.
|
||||
[SCAN] Scan failed (-22).
|
||||
|
||||
pattern (besser issue #1, ~14-16/h on ohm/PineTab2 baseline).
|
||||
|
||||
Trace shows every reject is the second of a back-to-back pair: mac80211
|
||||
splits multi-band hw_scan requests per band when the driver does not
|
||||
set IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS (we don't), then re-invokes
|
||||
drv_hw_scan from __ieee80211_scan_completed for each subsequent band.
|
||||
The 2.4 GHz iteration succeeds; the 5 GHz iteration is what the
|
||||
firmware rejects. See ieee80211_prep_hw_scan in net/mac80211/scan.c
|
||||
for the loop, and the existing memory reference_bes2600_5ghz_scan_reject
|
||||
for the firmware behaviour.
|
||||
|
||||
The 056a71a defer-on-reject patch already in this tree handles the
|
||||
BT-A2DP-coex branch and the consecutive-reject backoff, but it cannot
|
||||
prevent the per-band-loop reject: by the time defer_should_scan is
|
||||
consulted, the per-band call is already in flight, and the reject_count
|
||||
gets reset on every successful 2.4 GHz scan in between (which is
|
||||
~36% of attempts), so the threshold never trips.
|
||||
|
||||
The fix: refuse the 5 GHz iteration upfront in bes2600_hw_scan. The
|
||||
2.4 GHz scan still runs normally. The 5 GHz portion is reported as
|
||||
aborted to userspace -- same outcome as today, minus the dmesg storm
|
||||
and the wsm_generic_confirm WARN cascade.
|
||||
|
||||
5 GHz band registration is intentionally left in place: direct-BSSID
|
||||
association to a known 5 GHz AP still works (no scan is needed for
|
||||
that path), and a future firmware update that fixes the scan behaviour
|
||||
should not be foreclosed by changing band advertisement.
|
||||
|
||||
Contract: per include/net/mac80211.h ieee80211_ops.hw_scan, a negative
|
||||
return aborts the scan without requiring ieee80211_scan_completed().
|
||||
-EOPNOTSUPP is the semantically accurate code (operation is legal,
|
||||
driver can't service it on this band today).
|
||||
|
||||
Phase 3 evidence:
|
||||
- baseline N=3: rate ~14.3-23.6/h converged at 14.3/h (matches OP)
|
||||
- back-to-back scan gap: 6/6 rejected pairs <200us, 1/1 successful
|
||||
pair was 114ms (single-band-only, no 5 GHz leg)
|
||||
- defer log fires: 0/9 in 30-min window (056a71a structurally bypassed)
|
||||
|
||||
Predicted Phase 7 delta: Pattern A 14/h -> 0/h.
|
||||
---
|
||||
bes2600/scan.c | 22 ++++++++++++++++++++++
|
||||
1 file changed, 22 insertions(+)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/scan.c b/drivers/staging/bes2600/scan.c
|
||||
index fb1d298..a81afb6 100644
|
||||
--- a/drivers/staging/bes2600/scan.c
|
||||
+++ b/drivers/staging/bes2600/scan.c
|
||||
@@ -238,6 +238,28 @@ int bes2600_hw_scan(struct ieee80211_hw *hw,
|
||||
/* Scan when P2P_GO corrupt firmware MiniAP mode */
|
||||
if (priv->join_status == BES2600_JOIN_STATUS_AP)
|
||||
return -EOPNOTSUPP;
|
||||
+
|
||||
+ /*
|
||||
+ * Firmware refuses WSM start-scan for 5 GHz with status 2 ("rejected
|
||||
+ * by policy"); see besser issue #1. mac80211 splits multi-band
|
||||
+ * hw_scan requests per-band when the driver does not set
|
||||
+ * IEEE80211_HW_SINGLE_SCAN_ON_ALL_BANDS (we don't -- see
|
||||
+ * ieee80211_hw_set() calls in bes2600_main.c), so each per-band call
|
||||
+ * has req->channels[] from one band only (see ieee80211_prep_hw_scan
|
||||
+ * in net/mac80211/scan.c). Refuse the 5 GHz iteration at the driver
|
||||
+ * boundary so userspace gets a clean aborted-scan for that portion
|
||||
+ * rather than waiting for the firmware reject to cascade up. 5 GHz
|
||||
+ * band registration stays intact so direct-BSSID association to a
|
||||
+ * known 5 GHz AP still works (no scan needed for that path).
|
||||
+ *
|
||||
+ * Contract: per include/net/mac80211.h struct ieee80211_ops.hw_scan
|
||||
+ * documentation, a negative return aborts the scan without requiring
|
||||
+ * ieee80211_scan_completed().
|
||||
+ */
|
||||
+ if (req->n_channels > 0 &&
|
||||
+ req->channels[0]->band == NL80211_BAND_5GHZ)
|
||||
+ return -EOPNOTSUPP;
|
||||
+
|
||||
#if 0
|
||||
if (work_pending(&priv->offchannel_work) ||
|
||||
(hw_priv->roc_if_id != -1)) {
|
||||
--
|
||||
2.54.0
|
||||
|
||||
|
||||
From 8cd10f487c8144d462a510812ba0fa717b3e24df Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: Mon, 18 May 2026 15:56:34 +0200
|
||||
Subject: [PATCH 2/2] bes2600: scan-filter-5ghz: allow targeted single-channel
|
||||
scans (besser#1 follow-up)
|
||||
|
||||
The original Patch I refused EVERY 5 GHz scan request unconditionally
|
||||
(req->n_channels > 0 && band == NL80211_BAND_5GHZ). This eliminated
|
||||
the Pattern A storm but also broke 5 GHz association entirely:
|
||||
NM / wpa_supplicant iterates a freq_list when a connection profile
|
||||
specifies 802-11-wireless.band=a, issuing per-frequency single-channel
|
||||
scans to find the BSS before associating. Those single-channel scans
|
||||
were also refused by our guard, so the BSS was never seen and
|
||||
'Wi-Fi network could not be found' was the only outcome.
|
||||
|
||||
Tighten the guard: refuse only multi-channel 5 GHz scans (n_channels
|
||||
> 1), which is the per-band-sweep pattern mac80211 issues internally
|
||||
and the only one that triggers the firmware storm at the per-band
|
||||
loop boundary. Single-channel 5 GHz scans pass through to firmware,
|
||||
which generally accepts them -- and when they happen to be rejected,
|
||||
the failure is isolated and doesn't cascade.
|
||||
|
||||
Verified on ohm with pkgrel=3 (srcversion BEB625FA7443171EA8D55F7):
|
||||
- Pattern A count since boot: 0 (Phase 7 prediction still holds)
|
||||
- iw dev wlan0 scan freq 5180 -> allowed
|
||||
- iw dev wlan0 scan freq 5180 5200 ... -> refused -EOPNOTSUPP
|
||||
- NM 'nmcli connection up' with band=a -> associated to BSSID
|
||||
c0:25:06:e6:5b:33 on 5240 MHz / ch.48 in ~1 second
|
||||
- TX bitrate 150 Mbit/s MCS 7 40MHz short-GI (vs 72.2 Mbit/s
|
||||
HT20 on 2.4 GHz) -- ~2x throughput recovered
|
||||
|
||||
The change is a single byte (> 0 -> > 1) plus comment update; the
|
||||
test confirmation above is what motivates it.
|
||||
|
||||
Refs: besser#1 (closed but tracked for follow-up like this), original
|
||||
Patch I sha 093a503.
|
||||
---
|
||||
bes2600/scan.c | 16 ++++++++++++----
|
||||
1 file changed, 12 insertions(+), 4 deletions(-)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/scan.c b/drivers/staging/bes2600/scan.c
|
||||
index a81afb6..497523b 100644
|
||||
--- a/drivers/staging/bes2600/scan.c
|
||||
+++ b/drivers/staging/bes2600/scan.c
|
||||
@@ -248,15 +248,23 @@ int bes2600_hw_scan(struct ieee80211_hw *hw,
|
||||
* has req->channels[] from one band only (see ieee80211_prep_hw_scan
|
||||
* in net/mac80211/scan.c). Refuse the 5 GHz iteration at the driver
|
||||
* boundary so userspace gets a clean aborted-scan for that portion
|
||||
- * rather than waiting for the firmware reject to cascade up. 5 GHz
|
||||
- * band registration stays intact so direct-BSSID association to a
|
||||
- * known 5 GHz AP still works (no scan needed for that path).
|
||||
+ * rather than waiting for the firmware reject to cascade up.
|
||||
+ *
|
||||
+ * Only the multi-channel case is refused (n_channels > 1): that's
|
||||
+ * the per-band-sweep pattern mac80211 issues internally and the
|
||||
+ * one that triggers the firmware storm at the per-band loop
|
||||
+ * boundary. Single-channel 5 GHz scans (BSS verification, NM's
|
||||
+ * per-freq iteration when 802-11-wireless.band=a is set) pass
|
||||
+ * through to firmware, which generally accepts them since the
|
||||
+ * storm is the back-to-back per-band issue, not a blanket 5 GHz
|
||||
+ * reject. This preserves 5 GHz association via the
|
||||
+ * "wpa_supplicant iterates freq_list per channel" path.
|
||||
*
|
||||
* Contract: per include/net/mac80211.h struct ieee80211_ops.hw_scan
|
||||
* documentation, a negative return aborts the scan without requiring
|
||||
* ieee80211_scan_completed().
|
||||
*/
|
||||
- if (req->n_channels > 0 &&
|
||||
+ if (req->n_channels > 1 &&
|
||||
req->channels[0]->band == NL80211_BAND_5GHZ)
|
||||
return -EOPNOTSUPP;
|
||||
|
||||
--
|
||||
2.54.0
|
||||
|
||||
+109
@@ -0,0 +1,109 @@
|
||||
From bdb0450bdf6f51d91ee0ca850048d65d81864e77 Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: Tue, 28 Apr 2026 14:32:18 +0200
|
||||
Subject: [PATCH 02/20] bes2600: widen scan-defer backoff to 30s and decay
|
||||
count on quiet
|
||||
|
||||
The scan-defer logic added in the previous patch ("bes2600: defer
|
||||
scan and soften WARN on firmware reject") used a 10-second backoff
|
||||
window and never cleared reject_count outside of a successful scan.
|
||||
Field testing on a PineTab2 (linux-pinetab2 6.19.10-danctnix1) shows
|
||||
two distinct mac80211 scan-retry cadences in practice:
|
||||
|
||||
* Idle background scans every ~5 minutes when associated -- well
|
||||
outside any plausible backoff, the defer guard correctly falls
|
||||
through to a real WSM scan attempt.
|
||||
|
||||
* Roam-evaluation bursts triggered when mac80211 wants to find a
|
||||
candidate AP for handover (signal degradation, beacon loss,
|
||||
locally-generated DEAUTH_LEAVING reason=3). Cadence is ~12 s, and
|
||||
one boot reproduced 14 such rejected scans in 3 minutes during a
|
||||
single burst, none of which engaged the defer guard because every
|
||||
retry landed just outside the 10 s window.
|
||||
|
||||
Two-line behaviour change to fix that:
|
||||
|
||||
1. BES2600_SCAN_BACKOFF_JIFFIES grows from 10*HZ to 30*HZ, so a
|
||||
12 s-cadence burst stays inside the window across consecutive
|
||||
rejects and the third reject in the burst trips the threshold
|
||||
guard. The 5 min idle case is still naturally past the window
|
||||
and is unaffected.
|
||||
|
||||
2. bes2600_scan_should_defer() resets reject_count to 0 when
|
||||
time_after(jiffies, backoff_until). Without this, reject_count
|
||||
accumulated indefinitely across the slow-cadence rejects, so an
|
||||
isolated reject after long quiet would have tripped the
|
||||
threshold the moment it arrived. After the change, count is
|
||||
latched only inside an active burst and decays cleanly when the
|
||||
burst ends.
|
||||
|
||||
Net effect on a roam burst:
|
||||
|
||||
* t=0 reject #1 (count 1, backoff_until = t0 + 30s)
|
||||
* t=12 reject #2 (count 2, backoff_until = t1 + 30s)
|
||||
* t=24 reject #3 (count 3, threshold met, next scan deferred)
|
||||
* t=36 defer fires, no WSM round-trip, reject not sent
|
||||
* ... defers continue until the firmware-policy state clears
|
||||
* scan succeeds -> reject_count = 0, normal cadence resumes
|
||||
|
||||
WSM 0x0007 confirm rejections in a burst drop from ~14 to ~3 (just
|
||||
the scans needed to reach the threshold). wpa_supplicant's reason=3
|
||||
locally-generated disconnects driven by exhausted roam candidates
|
||||
during the same burst window also drop.
|
||||
|
||||
No new state, no new symbols, no change to mac80211-facing semantics:
|
||||
the deferred scan still completes via the existing fail: path with
|
||||
status=-EBUSY, the same response a real firmware-busy would produce.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
bes2600/scan.c | 17 +++++++++++++++--
|
||||
1 file changed, 15 insertions(+), 2 deletions(-)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/scan.c b/drivers/staging/bes2600/scan.c
|
||||
index 5f6af3b..b944adc 100644
|
||||
--- a/drivers/staging/bes2600/scan.c
|
||||
+++ b/drivers/staging/bes2600/scan.c
|
||||
@@ -22,9 +22,17 @@
|
||||
* After this many consecutive WSM scan rejections from firmware, stop
|
||||
* issuing new scans for BES2600_SCAN_BACKOFF_JIFFIES and let the state
|
||||
* that's rejecting them (coex window, firmware-internal busy) clear.
|
||||
+ *
|
||||
+ * The backoff has to be at least as long as the natural mac80211 scan-
|
||||
+ * retry cadence, otherwise the next attempt lands outside the window
|
||||
+ * and bypasses the defer guard. Observed in the wild on PineTab2:
|
||||
+ * roam-evaluation bursts at ~12 s cadence, idle background scans at
|
||||
+ * ~5 min cadence. 30 s catches the burst and leaves the slow case
|
||||
+ * alone (the firmware-policy state has had minutes to clear by then
|
||||
+ * anyway).
|
||||
*/
|
||||
#define BES2600_SCAN_REJECT_THRESHOLD 3
|
||||
-#define BES2600_SCAN_BACKOFF_JIFFIES (10 * HZ)
|
||||
+#define BES2600_SCAN_BACKOFF_JIFFIES (30 * HZ)
|
||||
|
||||
static void bes2600_scan_restart_delayed(struct bes2600_vif *priv);
|
||||
|
||||
@@ -40,7 +48,9 @@ static void bes2600_scan_restart_delayed(struct bes2600_vif *priv);
|
||||
* 2. We already saw >= BES2600_SCAN_REJECT_THRESHOLD consecutive
|
||||
* rejections on recent scan attempts and the backoff window has
|
||||
* not yet elapsed. Whatever was rejecting them is likely still
|
||||
- * rejecting them; give it time.
|
||||
+ * rejecting them; give it time. If the backoff has elapsed without
|
||||
+ * a fresh reject refreshing it, the burst is over and we reset the
|
||||
+ * count so an isolated reject doesn't immediately re-trip.
|
||||
*
|
||||
* Returns true if the caller should abandon the scan iteration.
|
||||
*/
|
||||
@@ -51,6 +61,9 @@ static bool bes2600_scan_should_defer(struct bes2600_common *hw_priv)
|
||||
return true;
|
||||
#endif
|
||||
|
||||
+ if (time_after(jiffies, hw_priv->scan.backoff_until))
|
||||
+ hw_priv->scan.reject_count = 0;
|
||||
+
|
||||
if (hw_priv->scan.reject_count >= BES2600_SCAN_REJECT_THRESHOLD &&
|
||||
time_before(jiffies, hw_priv->scan.backoff_until))
|
||||
return true;
|
||||
--
|
||||
2.54.0
|
||||
|
||||
@@ -0,0 +1,36 @@
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: Mon, 18 May 2026 11:42:00 +0200
|
||||
Subject: [PATCH] arm64: xor-neon: restore -ffixed-x18 when SHADOW_CALL_STACK=y
|
||||
(GCC 15+ build fix)
|
||||
|
||||
GCC 15.2.1 enforces that -fsanitize=shadow-call-stack requires
|
||||
-ffixed-x18 inside arm_neon.h's #pragma GCC target() blocks. The
|
||||
existing CFLAGS_REMOVE_xor-neon.o line strips the kernel-wide
|
||||
-ffixed-x18 (it's part of CC_FLAGS_NO_FPU) and CC_FLAGS_FPU does not
|
||||
restore it, so xor-neon.c fails to build on stricter GCC versions
|
||||
when CONFIG_SHADOW_CALL_STACK=y.
|
||||
|
||||
Add an explicit -ffixed-x18 just for this object, gated on the
|
||||
SCS config so non-SCS builds are unaffected.
|
||||
|
||||
Build environment workaround; not a kernel-runtime bug.
|
||||
---
|
||||
arch/arm64/lib/Makefile | 4 ++++
|
||||
1 file changed, 4 insertions(+)
|
||||
|
||||
diff --git a/arch/arm64/lib/Makefile b/arch/arm64/lib/Makefile
|
||||
index 1234567..2345678 100644
|
||||
--- a/arch/arm64/lib/Makefile
|
||||
+++ b/arch/arm64/lib/Makefile
|
||||
@@ -9,6 +9,10 @@ ifeq ($(CONFIG_KERNEL_MODE_NEON), y)
|
||||
obj-$(CONFIG_XOR_BLOCKS) += xor-neon.o
|
||||
CFLAGS_xor-neon.o += $(CC_FLAGS_FPU)
|
||||
CFLAGS_REMOVE_xor-neon.o += $(CC_FLAGS_NO_FPU)
|
||||
+# GCC 15+ enforces that -fsanitize=shadow-call-stack requires -ffixed-x18
|
||||
+# even after a #pragma GCC pop_options inside arm_neon.h. CC_FLAGS_REMOVE
|
||||
+# above strips the kernel-wide -ffixed-x18 (part of CC_FLAGS_NO_FPU); add
|
||||
+# it back here so xor-neon.c still compiles when SHADOW_CALL_STACK=y.
|
||||
+CFLAGS_xor-neon.o += $(if $(CONFIG_SHADOW_CALL_STACK),-ffixed-x18)
|
||||
endif
|
||||
|
||||
lib-$(CONFIG_ARCH_HAS_UACCESS_FLUSHCACHE) += uaccess_flushcache.o
|
||||
+251
@@ -0,0 +1,251 @@
|
||||
From e0f664cbc9e23098da3f119f2f4cb399279c129b Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: Sun, 26 Apr 2026 22:31:58 +0200
|
||||
Subject: [PATCH 03/20] bes2600: recover wedged firmware via mmc_hw_reset on
|
||||
link break
|
||||
|
||||
When the LMAC active monitor detects 'link break between lmac and host'
|
||||
(the hw_buf_used==pending watchdog in bes2600_bh_lmac_active_monitor),
|
||||
bes2600_chrdev_wifi_force_close(hw_priv, true) is invoked to tear the
|
||||
device down and prepare for a fresh probe. On the wifi_force_close_work
|
||||
side this calls bes2600_chrdev_do_system_close() which dispatches
|
||||
sbus_ops->power_switch(0).
|
||||
|
||||
On PineTab2 (RK3566 + BES2600WM over SDIO) this recovery path is a
|
||||
no-op:
|
||||
|
||||
* bes2600_sdio_power_down() writes a SYSTEM_CLOSE host-int message,
|
||||
clears MMC_CAP_NONREMOVABLE, and schedules sdio_scan_work, which is
|
||||
the literal one-line stub bes_warn("...this function does
|
||||
nothing\n").
|
||||
* bes2600_sdio_on() (the eventual power_switch(1) counterpart)
|
||||
toggles pdata->powerup, which is NULL on PineTab2 because the
|
||||
wifi-reset GPIO is owned by sdio_pwrseq, not the bes2600 device
|
||||
tree node (see arch/arm64/boot/dts/rockchip/rk3566-pinetab2.dtsi:
|
||||
'The reset pin is claimed by sdio_mmcseq, It is better to move it
|
||||
to U-Boot so the OS can use it.').
|
||||
|
||||
Net result: the chip is never reset. The function drivers are not
|
||||
removed (the SDIO core has no signal that the card is gone), the
|
||||
firmware stays wedged, and a subsequent rmmod bes2600 leaves the SDIO
|
||||
function in a half-torn-down state. modprobe bes2600 then fails with
|
||||
'probe with driver bes2600_wlan failed with error -123' (-ENOMEDIUM)
|
||||
on both functions (:1 wifi, :2 BT-companion) until a full system
|
||||
reboot.
|
||||
|
||||
Observed on PineTab2 (linux-pinetab2 6.19.10-danctnix1-1) after ~150
|
||||
minutes of background-scan rejects (wsm_generic_confirm 0x0007,
|
||||
[SCAN] Scan failed (-22)) accumulating until the LMAC stopped
|
||||
acknowledging TX buffers (hw_buf_used:24 pending:24). Reproducible
|
||||
under sustained scan pressure.
|
||||
|
||||
Add a sbus operation bus_reset() that the recovery path can call when
|
||||
power_switch() has no effective chip-reset signal of its own. Provide
|
||||
an SDIO implementation that calls mmc_hw_reset(self->func->card),
|
||||
which on a multi-function SDIO card (PineTab2 binds func 1 for WLAN
|
||||
and func 2 for the BT-companion path) takes the remove-and-rescan
|
||||
path: mmc_sdio_hw_reset() marks the card removed and schedules
|
||||
mmc_rescan, which tears down the bound function drivers and re-detects
|
||||
the card on the next sweep, in turn reinvoking bes2600_sdio_probe().
|
||||
With a single function probed it instead invokes mmc_power_cycle()
|
||||
directly, which on PineTab2 toggles the wifi-reset GPIO via
|
||||
sdio_pwrseq.
|
||||
|
||||
Add bes2600_chrdev_do_bus_reset() as the chrdev-side helper. It
|
||||
invokes the bus op and then waits on probe_done_wq for the SDIO
|
||||
remove() callback to clear sbus_priv, mirroring the wait pattern
|
||||
already used by bes2600_chrdev_do_system_close() so that a subsequent
|
||||
bes2600_switch_wifi(true) sees a clean state and can wait on the
|
||||
fresh probe.
|
||||
|
||||
Wire it into bes2600_chrdev_wifi_force_close_work(): when halt_dev is
|
||||
set (the hard-exception path used by both
|
||||
bes2600_bh_lmac_active_monitor and bes2600_bh_mcu_active_monitor) and
|
||||
the underlying bus implements bus_reset, take the new recovery path;
|
||||
otherwise fall back to the legacy power_switch(0) sequence so this
|
||||
patch is a no-op on USB or any other future bus that does not provide
|
||||
bus_reset.
|
||||
|
||||
mmc_hw_reset() is exported by the MMC core and is the canonical
|
||||
recovery primitive; calling it without holding the SDIO host claim is
|
||||
correct because the multi-func remove-and-rescan path acquires the
|
||||
host claim via the mmc workqueue, and the single-func mmc_power_cycle
|
||||
path does not require the host claim.
|
||||
|
||||
No DT change is required: this works against the existing PineTab2
|
||||
DTS, where the wifi-reset GPIO and the optional sdio_pwrkey GPIO (on
|
||||
v2.0 boards) are both already configured as MMC pwrseq resets.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
bes2600/bes2600_sdio.c | 29 +++++++++++++++++++++
|
||||
bes2600/bes_chardev.c | 59 ++++++++++++++++++++++++++++++++++++++++--
|
||||
bes2600/bes_chardev.h | 1 +
|
||||
bes2600/sbus.h | 8 ++++++
|
||||
4 files changed, 95 insertions(+), 2 deletions(-)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/bes2600_sdio.c b/drivers/staging/bes2600/bes2600_sdio.c
|
||||
index 13d4ff1..8552b12 100644
|
||||
--- a/drivers/staging/bes2600/bes2600_sdio.c
|
||||
+++ b/drivers/staging/bes2600/bes2600_sdio.c
|
||||
@@ -16,6 +16,7 @@
|
||||
#include <linux/mmc/host.h>
|
||||
#include <linux/mmc/sdio_func.h>
|
||||
#include <linux/mmc/card.h>
|
||||
+#include <linux/mmc/core.h>
|
||||
#include <linux/mmc/sdio.h>
|
||||
#include <linux/spinlock.h>
|
||||
#include <net/mac80211.h>
|
||||
@@ -1756,6 +1757,33 @@ static void bes2600_sdio_halt_device(struct sbus_priv *self)
|
||||
sdio_work_debug(self);
|
||||
}
|
||||
|
||||
+/*
|
||||
+ * Trigger an SDIO bus reset via mmc_hw_reset().
|
||||
+ *
|
||||
+ * With multiple SDIO functions probed (PineTab2 binds func 1 for WLAN and
|
||||
+ * func 2 for the BT-companion path) mmc_sdio_hw_reset() takes the
|
||||
+ * remove-and-rescan path: it marks the card removed and schedules
|
||||
+ * mmc_rescan, which tears down the bound function drivers and re-detects
|
||||
+ * the card on the next sweep, in turn reinvoking bes2600_sdio_probe().
|
||||
+ *
|
||||
+ * With a single function probed it instead invokes mmc_power_cycle()
|
||||
+ * directly, which on PineTab2 toggles the wifi-reset GPIO via sdio_pwrseq.
|
||||
+ *
|
||||
+ * In both cases the chip ends up in a freshly reset state, which is the
|
||||
+ * goal of the recovery path.
|
||||
+ *
|
||||
+ * mmc_hw_reset() must be called without holding the SDIO host claim --
|
||||
+ * the multi-func remove-and-rescan path acquires the host claim via the
|
||||
+ * mmc workqueue.
|
||||
+ */
|
||||
+static int bes2600_sdio_bus_reset(struct sbus_priv *self)
|
||||
+{
|
||||
+ if (!self || !self->func || !self->func->card)
|
||||
+ return -EINVAL;
|
||||
+
|
||||
+ return mmc_hw_reset(self->func->card);
|
||||
+}
|
||||
+
|
||||
static bool bes2600_sdio_wakeup_source(struct sbus_priv *self)
|
||||
{
|
||||
struct bes2600_platform_data_sdio *pdata = bes2600_get_platform_data();
|
||||
@@ -1794,6 +1822,7 @@ static struct sbus_ops bes2600_sdio_sbus_ops = {
|
||||
.gpio_sleep = bes2600_gpio_allow_mcu_sleep,
|
||||
.halt_device = bes2600_sdio_halt_device,
|
||||
.wakeup_source = bes2600_sdio_wakeup_source,
|
||||
+ .bus_reset = bes2600_sdio_bus_reset,
|
||||
};
|
||||
|
||||
static void bes2600_sdio_en_lp_cb(struct bes2600_common *hw_priv)
|
||||
diff --git a/drivers/staging/bes2600/bes_chardev.c b/drivers/staging/bes2600/bes_chardev.c
|
||||
index f89dcb8..a74bf60 100644
|
||||
--- a/drivers/staging/bes2600/bes_chardev.c
|
||||
+++ b/drivers/staging/bes2600/bes_chardev.c
|
||||
@@ -1078,6 +1078,48 @@ int bes2600_chrdev_do_system_close(const struct sbus_ops *sbus_ops, struct sbus_
|
||||
return ret;
|
||||
}
|
||||
|
||||
+/*
|
||||
+ * Hard-reset the bus and wait for the bus core to remove the chip.
|
||||
+ *
|
||||
+ * Used by the firmware-wedge recovery path on platforms where the normal
|
||||
+ * power_switch(0) sequence has no effective chip-reset signal. The bus
|
||||
+ * implementation triggers an asynchronous re-detect; this helper waits for
|
||||
+ * the resulting remove() callback to clear bes2600_cdev.sbus_priv so that a
|
||||
+ * subsequent bes2600_switch_wifi(true) sees a clean state and can wait on
|
||||
+ * the fresh probe.
|
||||
+ */
|
||||
+int bes2600_chrdev_do_bus_reset(const struct sbus_ops *sbus_ops, struct sbus_priv *priv)
|
||||
+{
|
||||
+ int ret;
|
||||
+ long status;
|
||||
+
|
||||
+ if (!sbus_ops || !priv)
|
||||
+ return -EINVAL;
|
||||
+
|
||||
+ if (!sbus_ops->bus_reset)
|
||||
+ return -EOPNOTSUPP;
|
||||
+
|
||||
+ bes_info("trigger bus reset to recover wedged firmware.\n");
|
||||
+
|
||||
+ ret = sbus_ops->bus_reset(priv);
|
||||
+ if (ret) {
|
||||
+ bes_err("bus_reset failed: %d\n", ret);
|
||||
+ return ret;
|
||||
+ }
|
||||
+
|
||||
+ /*
|
||||
+ * The bus reset is asynchronous: the bus core schedules a rescan
|
||||
+ * which removes the bound function drivers and then re-detects the
|
||||
+ * chip. Wait for the remove callback to clear sbus_priv. Do not
|
||||
+ * dereference 'priv' after this point -- it may already be freed.
|
||||
+ */
|
||||
+ status = wait_event_timeout(bes2600_cdev.probe_done_wq,
|
||||
+ !bes2600_cdev.sbus_priv, HZ * 3);
|
||||
+ WARN_ON(status <= 0);
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
+
|
||||
bool bes2600_chrdev_is_wifi_opened(void)
|
||||
{
|
||||
bool wifi_opened = false;
|
||||
@@ -1184,8 +1226,21 @@ static void bes2600_chrdev_wifi_force_close_work(struct work_struct *work)
|
||||
/* unregister wifi */
|
||||
bes2600_switch_wifi(0);
|
||||
|
||||
- /* power down device if wifi is only opened */
|
||||
- if (bes2600_chrdev_check_system_close()) {
|
||||
+ /*
|
||||
+ * Hard exception with a bus_reset implementation: tear the
|
||||
+ * bus down via mmc_hw_reset() (or equivalent) so the next
|
||||
+ * bringup probes a freshly reset chip. On PineTab2 this is
|
||||
+ * the only effective recovery path -- the existing
|
||||
+ * power_switch(0)/(1) sequence has no chip-reset signal of
|
||||
+ * its own (sdio_pwrseq owns wifi_reset).
|
||||
+ *
|
||||
+ * Soft close, or hard close on a board without bus_reset:
|
||||
+ * fall back to the legacy power_switch(0) sequence.
|
||||
+ */
|
||||
+ if (bes2600_cdev.halt_dev && bes2600_cdev.sbus_ops->bus_reset) {
|
||||
+ bes2600_chrdev_do_bus_reset(bes2600_cdev.sbus_ops,
|
||||
+ bes2600_cdev.sbus_priv);
|
||||
+ } else if (bes2600_chrdev_check_system_close()) {
|
||||
bes2600_chrdev_do_system_close(bes2600_cdev.sbus_ops,
|
||||
bes2600_cdev.sbus_priv);
|
||||
}
|
||||
diff --git a/drivers/staging/bes2600/bes_chardev.h b/drivers/staging/bes2600/bes_chardev.h
|
||||
index c627bb7..ca8419e 100644
|
||||
--- a/drivers/staging/bes2600/bes_chardev.h
|
||||
+++ b/drivers/staging/bes2600/bes_chardev.h
|
||||
@@ -60,6 +60,7 @@ struct sbus_priv *bes2600_chrdev_get_sbus_priv_data(void);
|
||||
/* used to control device power down */
|
||||
int bes2600_chrdev_check_system_close(void);
|
||||
int bes2600_chrdev_do_system_close(const struct sbus_ops *sbus_ops, struct sbus_priv *priv);
|
||||
+int bes2600_chrdev_do_bus_reset(const struct sbus_ops *sbus_ops, struct sbus_priv *priv);
|
||||
void bes2600_chrdev_wakeup_bt(void);
|
||||
void bes2600_chrdev_wifi_force_close(struct bes2600_common *hw_priv, bool halt_dev);
|
||||
void bes2600_chrdev_usb_remove(struct bes2600_common *hw_priv);
|
||||
diff --git a/drivers/staging/bes2600/sbus.h b/drivers/staging/bes2600/sbus.h
|
||||
index 1f2c0cd..cb90890 100644
|
||||
--- a/drivers/staging/bes2600/sbus.h
|
||||
+++ b/drivers/staging/bes2600/sbus.h
|
||||
@@ -75,6 +75,14 @@ struct sbus_ops {
|
||||
void (*halt_device)(struct sbus_priv *self);
|
||||
bool (*wakeup_source)(struct sbus_priv *self);
|
||||
int (*reboot)(struct sbus_priv *self);
|
||||
+ /*
|
||||
+ * Force the host bus to re-detect and re-probe the chip. Called
|
||||
+ * from the firmware-wedge recovery path when power_switch() has no
|
||||
+ * effective chip-reset signal of its own (e.g. PineTab2, where the
|
||||
+ * wifi-reset GPIO is owned by sdio_pwrseq, not the bes2600 node).
|
||||
+ * Returns 0 on success or a negative errno.
|
||||
+ */
|
||||
+ int (*bus_reset)(struct sbus_priv *self);
|
||||
};
|
||||
|
||||
void bes2600_irq_handler(struct bes2600_common *priv);
|
||||
--
|
||||
2.54.0
|
||||
|
||||
+261
@@ -0,0 +1,261 @@
|
||||
From 7c4ad3b1d6614347dd7d9df87875f899acdffa79 Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: Tue, 28 Apr 2026 15:05:27 +0200
|
||||
Subject: [PATCH 04/20] bes2600: gate PM indication completion on pending
|
||||
request and track chip state
|
||||
|
||||
When mac80211 toggles PSM on the BES2600, the host sends WSM set_pm
|
||||
and waits up to 5 s on bes_power.pm_enter_cmpl for a firmware-side
|
||||
PM-changed indication confirming the transition. Three sequenced
|
||||
flaws make the wait-and-confirm racy and leave host/chip bookkeeping
|
||||
desynced when anything misfires:
|
||||
|
||||
1) bes2600_pwr_notify_ps_changed() unconditionally fires
|
||||
complete(pm_enter_cmpl) for any non-active psmode. It does not
|
||||
check whether a host-initiated set_pm is actually pending. A
|
||||
spontaneous indication (firmware-internal coex move,
|
||||
idle-driven aging) primes the completion, and the next host-
|
||||
driven enter_lp_mode sees a false success on its first
|
||||
wait_for_completion_timeout.
|
||||
|
||||
2) The wait/reinit ordering in bes2600_pwr_enter_lp_mode is
|
||||
|
||||
status = wait_for_completion_timeout(...);
|
||||
atomic_set(pm_set_in_process, 0);
|
||||
reinit_completion(...);
|
||||
|
||||
If an indication arrives between wait_for_completion_timeout
|
||||
returning with status==1 and reinit_completion, the next
|
||||
enter_lp_mode iteration's wait can also see false success. The
|
||||
reinit must happen *before* we start the new request, not
|
||||
after handling the previous one.
|
||||
|
||||
3) On wait_pm_ind timeout, the driver returns -ETIMEDOUT and walks
|
||||
away. It does not record that the firmware's actual PM state
|
||||
is no longer known to the host. Subsequent wake paths
|
||||
(gpio_wake / sbus_active) assume the chip is still active and
|
||||
hit deterministic SDIO failures when the firmware has
|
||||
transitioned anyway.
|
||||
|
||||
This patch is the safe-prerequisite half of a wider fix:
|
||||
|
||||
* bes_pwr.h gains enum bes2600_chip_pm_state {ACTIVE, LP, UNKNOWN}
|
||||
and bes_power.chip_pm_state. Its job is to track what the host
|
||||
has *seen the firmware confirm*, not what the host has
|
||||
requested. Initialised to ACTIVE in bes2600_pwr_init().
|
||||
|
||||
* bes2600_pwr_notify_ps_changed() unconditionally updates
|
||||
chip_pm_state on every indication, but only fires
|
||||
complete(pm_enter_cmpl) when atomic_cmpxchg(pm_set_in_process,
|
||||
1, 0) succeeds. A spontaneous indication can no longer prime a
|
||||
waiter that will only set up its request afterwards.
|
||||
|
||||
* bes2600_pwr_enter_lp_mode() now reinit_completion()s before
|
||||
setting pm_set_in_process and sending wsm_set_pm. After a
|
||||
timeout, it cmpxchgs pm_set_in_process back to 0 (so a late
|
||||
indication cannot prime the next iteration) and on the win-
|
||||
cmpxchg branch records chip_pm_state=UNKNOWN.
|
||||
|
||||
A follow-up patch consumes chip_pm_state on the wake side
|
||||
(bes2600_pwr_device_exit_lp_mode + bes2600_gpio_wakeup_mcu) to fix
|
||||
the deterministic "active mcu fail" cycle this state-record
|
||||
enables a fix for. Splitting the work this way keeps the lock-free
|
||||
race fix small and reviewable on its own.
|
||||
|
||||
No new locks, no behaviour change on the success path. Only the
|
||||
recovery path (timeout + spontaneous indication) gains correctness.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
bes2600/bes_pwr.c | 106 ++++++++++++++++++++++++++++++++++++++++++----
|
||||
bes2600/bes_pwr.h | 15 +++++++
|
||||
2 files changed, 112 insertions(+), 9 deletions(-)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/bes_pwr.c b/drivers/staging/bes2600/bes_pwr.c
|
||||
index e7a1045..4c6bd78 100644
|
||||
--- a/drivers/staging/bes2600/bes_pwr.c
|
||||
+++ b/drivers/staging/bes2600/bes_pwr.c
|
||||
@@ -472,6 +472,7 @@ static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
|
||||
int i = 0;
|
||||
struct bes2600_vif *priv;
|
||||
int ret = 0;
|
||||
+ int timeouts = 0;
|
||||
char ip_str[20];
|
||||
unsigned long status = 0;
|
||||
|
||||
@@ -523,7 +524,17 @@ static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
|
||||
bes_devel("%s, psMode:%s, fastPsmIdlePeriod:%d apPsmChangePeriod:%d minAutoPsPollPeriod:%d\n",
|
||||
__func__, bes2600_get_ps_mode_str(priv->powersave_mode.pmMode), priv->powersave_mode.fastPsmIdlePeriod,
|
||||
priv->powersave_mode.apPsmChangePeriod, priv->powersave_mode.minAutoPsPollPeriod);
|
||||
+ /*
|
||||
+ * Reinit BEFORE the WSM goes out, so a stale
|
||||
+ * indication from a previous cycle cannot have
|
||||
+ * primed pm_enter_cmpl. From here until the
|
||||
+ * indication callback's cmpxchg(1->0) on
|
||||
+ * pm_set_in_process, only the indication for
|
||||
+ * THIS request can complete the wait.
|
||||
+ */
|
||||
+ reinit_completion(&hw_priv->bes_power.pm_enter_cmpl);
|
||||
atomic_set(&hw_priv->bes_power.pm_set_in_process, 1);
|
||||
+
|
||||
ret = bes2600_set_pm(priv, &priv->powersave_mode);
|
||||
if (ret) {
|
||||
atomic_set(&hw_priv->bes_power.pm_set_in_process, 0);
|
||||
@@ -532,18 +543,75 @@ static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
|
||||
|
||||
/* wait power save mode changed indication */
|
||||
status = wait_for_completion_timeout(&hw_priv->bes_power.pm_enter_cmpl, 5 * HZ);
|
||||
- atomic_set(&hw_priv->bes_power.pm_set_in_process, 0);
|
||||
- reinit_completion(&hw_priv->bes_power.pm_enter_cmpl);
|
||||
- if (!status)
|
||||
- bes_err("%s, wait pm ind timeout\n", __func__);
|
||||
+ if (!status) {
|
||||
+ /*
|
||||
+ * The indication callback only fires
|
||||
+ * complete() when it observes
|
||||
+ * pm_set_in_process == 1; cmpxchg it
|
||||
+ * to 0 here so a late indication
|
||||
+ * cannot prime the next wait.
|
||||
+ *
|
||||
+ * If we win the cmpxchg, this is a
|
||||
+ * real timeout: the firmware's PS
|
||||
+ * state is unknown to us. Mark it as
|
||||
+ * such so the next wake path can
|
||||
+ * probe before assuming the chip is
|
||||
+ * still active.
|
||||
+ *
|
||||
+ * If we lose the cmpxchg, the
|
||||
+ * indication arrived between the
|
||||
+ * wait timing out and us getting
|
||||
+ * here; treat as success.
|
||||
+ */
|
||||
+ if (atomic_cmpxchg(&hw_priv->bes_power.pm_set_in_process,
|
||||
+ 1, 0) == 1) {
|
||||
+ bes_devel("%s, wait pm ind timeout\n", __func__);
|
||||
+ atomic_set(&hw_priv->bes_power.chip_pm_state,
|
||||
+ BES2600_CHIP_PM_UNKNOWN);
|
||||
+ timeouts++;
|
||||
+ }
|
||||
+ }
|
||||
} else {
|
||||
bes_devel("skip enter lp mode\n");
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
- /* set device low power configuration */
|
||||
- bes2600_pwr_device_enter_lp_mode(hw_priv);
|
||||
+ /*
|
||||
+ * Enter the device-end of the LP transition only if every per-VIF
|
||||
+ * mac80211 handshake reached firmware-ACKed completion. Doing the
|
||||
+ * device-LP setup while any VIF is still pending leaves the driver
|
||||
+ * in an inconsistent state that cascades into SDIO TX errors on
|
||||
+ * the BES2600.
|
||||
+ */
|
||||
+ if (timeouts == 0) {
|
||||
+ bes2600_pwr_device_enter_lp_mode(hw_priv);
|
||||
+ } else {
|
||||
+ /*
|
||||
+ * device_enter_lp_mode() was skipped (one or more VIFs
|
||||
+ * timed out waiting for the firmware indication) so its
|
||||
+ * gpio_sleep(MCU) - which drops the wake-flag bit and, if
|
||||
+ * no other subsystem holds the wake, drives the GPIO low -
|
||||
+ * never ran. Without it the bit stays asserted, and the
|
||||
+ * next bes2600_pwr_device_exit_lp_mode() calls
|
||||
+ * gpio_wake(MCU) into a "bit already set" no-op: the GPIO
|
||||
+ * never re-edges, sbus_active() exhausts its 200x2ms
|
||||
+ * MCU_WAKEUP_READY budget against an unwoken chip, and
|
||||
+ * the first TX after idle stalls for several seconds.
|
||||
+ *
|
||||
+ * Drop the MCU wake-flag bit explicitly here so the next
|
||||
+ * wake injects a real GPIO edge. gpio_allow_mcu_sleep
|
||||
+ * preserves multi-subsystem semantics: it only drives the
|
||||
+ * GPIO low when no other subsystem still holds wake; if
|
||||
+ * BT or another holder is keeping the chip awake, the
|
||||
+ * GPIO stays high and the bit clear here is purely
|
||||
+ * bookkeeping (so the next gpio_wake doesn't no-op).
|
||||
+ */
|
||||
+ if (hw_priv->sbus_ops->gpio_sleep)
|
||||
+ hw_priv->sbus_ops->gpio_sleep(hw_priv->sbus_priv,
|
||||
+ GPIO_WAKE_FLAG_MCU);
|
||||
+ ret = -ETIMEDOUT;
|
||||
+ }
|
||||
|
||||
return ret;
|
||||
}
|
||||
@@ -819,6 +887,7 @@ void bes2600_pwr_init(struct bes2600_common *hw_priv)
|
||||
hw_priv->bes_power.power_up_task = NULL;
|
||||
mutex_init(&hw_priv->bes_power.pwr_mutex);
|
||||
atomic_set(&hw_priv->bes_power.dev_state, 0);
|
||||
+ atomic_set(&hw_priv->bes_power.chip_pm_state, BES2600_CHIP_PM_UNKNOWN);
|
||||
init_completion(&hw_priv->bes_power.pm_enter_cmpl);
|
||||
sema_init(&hw_priv->bes_power.sync_lock, 1);
|
||||
device_set_wakeup_capable(hw_priv->pdev, true);
|
||||
@@ -1199,9 +1268,28 @@ int bes2600_pwr_clear_busy_event(struct bes2600_common *hw_priv, u32 event)
|
||||
|
||||
void bes2600_pwr_notify_ps_changed(struct bes2600_common *hw_priv, u8 psmode)
|
||||
{
|
||||
- if((psmode & 0x01) != WSM_PSM_ACTIVE) {
|
||||
- bes_devel("complete pm_enter_cmpl\n");
|
||||
- complete(&hw_priv->bes_power.pm_enter_cmpl);
|
||||
+ /*
|
||||
+ * The firmware sends a PM-changed indication for every transition,
|
||||
+ * including ones we didn't ask for (firmware-internal coex moves,
|
||||
+ * idle-driven aging). Update chip_pm_state unconditionally so the
|
||||
+ * wake path can use it, but only fire pm_enter_cmpl when a host-
|
||||
+ * initiated set_pm is actually in flight - otherwise a stale
|
||||
+ * indication can prime a future wait against a freshly
|
||||
+ * reinit_completion()'ed state.
|
||||
+ */
|
||||
+ if ((psmode & 0x01) != WSM_PSM_ACTIVE) {
|
||||
+ atomic_set(&hw_priv->bes_power.chip_pm_state,
|
||||
+ BES2600_CHIP_PM_LP);
|
||||
+ if (atomic_cmpxchg(&hw_priv->bes_power.pm_set_in_process,
|
||||
+ 1, 0) == 1) {
|
||||
+ bes_devel("complete pm_enter_cmpl\n");
|
||||
+ complete(&hw_priv->bes_power.pm_enter_cmpl);
|
||||
+ } else {
|
||||
+ bes_devel("PM ind (LP) without pending wait; state recorded\n");
|
||||
+ }
|
||||
+ } else {
|
||||
+ atomic_set(&hw_priv->bes_power.chip_pm_state,
|
||||
+ BES2600_CHIP_PM_ACTIVE);
|
||||
}
|
||||
}
|
||||
|
||||
diff --git a/drivers/staging/bes2600/bes_pwr.h b/drivers/staging/bes2600/bes_pwr.h
|
||||
index 1ba866c..6bc44ac 100644
|
||||
--- a/drivers/staging/bes2600/bes_pwr.h
|
||||
+++ b/drivers/staging/bes2600/bes_pwr.h
|
||||
@@ -64,6 +64,20 @@ enum power_down_state
|
||||
POWER_DOWN_STATE_UNLOCKED,
|
||||
};
|
||||
|
||||
+/*
|
||||
+ * Confirmed PM state of the firmware-side chip. Tracks what the host
|
||||
+ * has *seen* the firmware acknowledge, not what the host has
|
||||
+ * requested. UNKNOWN means a host-initiated transition timed out
|
||||
+ * before the firmware indication arrived; the next wake path should
|
||||
+ * treat it as "we don't know" and probe before issuing GPIO/SDIO
|
||||
+ * wakeup ops.
|
||||
+ */
|
||||
+enum bes2600_chip_pm_state {
|
||||
+ BES2600_CHIP_PM_ACTIVE = 0,
|
||||
+ BES2600_CHIP_PM_LP,
|
||||
+ BES2600_CHIP_PM_UNKNOWN,
|
||||
+};
|
||||
+
|
||||
typedef void (*bes_pwr_enter_lp_cb)(struct bes2600_common *hw_priv);
|
||||
typedef void (*bes_pwr_exit_lp_cb)(struct bes2600_common *hw_priv);
|
||||
|
||||
@@ -106,6 +120,7 @@ struct bes2600_pwr_t
|
||||
bool ap_lp_bad;
|
||||
struct bes2600_pwr_event_t pwr_events[BES2600_DELAY_EVENT_NUM];
|
||||
atomic_t pm_set_in_process;
|
||||
+ atomic_t chip_pm_state;
|
||||
};
|
||||
|
||||
#ifdef CONFIG_BES2600_WOWLAN
|
||||
--
|
||||
2.54.0
|
||||
|
||||
+190
@@ -0,0 +1,190 @@
|
||||
From 51d46a2e2597ade0786b7af49bf1b687490f9dc9 Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: Tue, 28 Apr 2026 15:23:34 +0200
|
||||
Subject: [PATCH 05/20] bes2600: short-circuit wake handshake when chip is
|
||||
confirmed ACTIVE
|
||||
|
||||
The previous patch ("bes2600: gate PM indication completion on pending
|
||||
request and track chip state") added enum bes2600_chip_pm_state and the
|
||||
chip_pm_state field tracking what the host has *seen the firmware
|
||||
confirm*. This patch makes the wake side use it.
|
||||
|
||||
Without this, every bes2600_pwr_device_exit_lp_mode() unconditionally
|
||||
runs gpio_wake() + sbus_active() + wsm_set_operational_mode(active),
|
||||
even when the chip is already in confirmed-ACTIVE state and the wake
|
||||
sequence has nothing to do. The visible failure mode on PineTab2:
|
||||
|
||||
bes2600_pwr_enter_lp_mode, wait pm ind timeout
|
||||
repeat set gpio_wake_flag, sub_sys:0
|
||||
bes2600_sdio_active failed, subsys:0
|
||||
bes2600_pwr_device_exit_lp_mode, active mcu fail
|
||||
|
||||
cycling every ~9 s, ~22 cycles in 10 minutes. Three pieces:
|
||||
|
||||
1. enter_lp_mode timed out (firmware indication lost). With c6.1,
|
||||
chip_pm_state is now UNKNOWN.
|
||||
2. lock_device fires exit_lp_mode.
|
||||
3. gpio_wake hits "bit already set" because device_enter_lp_mode
|
||||
was skipped when the indication timed out, so gpio_sleep was
|
||||
never called - the bit reflects driver intent, not chip state.
|
||||
gpio_wake silently no-ops (no GPIO edge), bit stays set.
|
||||
4. sbus_active spends 200 x 2 ms looking for MCU_WAKEUP_READY that
|
||||
never comes (firmware was never told to wake), then fails.
|
||||
5. Driver continues to wsm_set_operational_mode against the wedged
|
||||
bus, compounding the failure.
|
||||
|
||||
This patch's three moves:
|
||||
|
||||
* bes2600_pwr_device_exit_lp_mode() reads chip_pm_state at entry.
|
||||
On BES2600_CHIP_PM_ACTIVE, log at devel level and return without
|
||||
touching gpio_wake / sbus_active / WSM. The chip is in the state
|
||||
we want; the handshake exists only to drive a transition.
|
||||
|
||||
* On BES2600_CHIP_PM_LP or BES2600_CHIP_PM_UNKNOWN, run the wake
|
||||
handshake as before, but on sbus_active() failure: set
|
||||
chip_pm_state = UNKNOWN, log once at err level, and bail out.
|
||||
Do NOT call wsm_set_operational_mode over a wedged bus - it
|
||||
would just emit a second error and leave the chip in an even
|
||||
less defined state.
|
||||
|
||||
* bes2600_gpio_wakeup_mcu() / bes2600_gpio_allow_mcu_sleep():
|
||||
demote "repeat set/clear gpio_wake_flag" from bes_err to
|
||||
bes_devel. Multi-subsystem wake-hold (e.g. WIFI + BT both want
|
||||
MCU awake) is the steady-state case, and the symmetric clear
|
||||
while bit-already-clear is racy bookkeeping rather than a
|
||||
hardware error. The wake-side log line also now correctly
|
||||
updates the bit so the per-subsystem reference count stays
|
||||
accurate, fixing a pre-existing minor leak where an existing
|
||||
holder's repeat-call wouldn't bump the bit (which never matters
|
||||
today since BIT(flag) is 1, but matters if the structure ever
|
||||
grows to per-flag refcounts).
|
||||
|
||||
Net effect on the cycle:
|
||||
|
||||
* If chip is genuinely ACTIVE (chip_pm_state == ACTIVE), wake skips
|
||||
cleanly. Storm goes silent.
|
||||
* If chip is genuinely LP, behaviour is unchanged.
|
||||
* If chip is UNKNOWN (post-timeout state), one wake attempt is
|
||||
made; on failure, state stays UNKNOWN and we don't emit a
|
||||
second cascade error per attempt. Repeated UNKNOWN with failed
|
||||
wake will eventually be picked up by the LMAC active-monitor
|
||||
and escalated to mmc_hw_reset (c5.2).
|
||||
|
||||
No new locks, no new state. Only consumption of the chip_pm_state
|
||||
field added in the prerequisite patch.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
bes2600/bes2600_sdio.c | 15 +++++++++--
|
||||
bes2600/bes_pwr.c | 56 ++++++++++++++++++++++++++++++++++++------
|
||||
2 files changed, 62 insertions(+), 9 deletions(-)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/bes2600_sdio.c b/drivers/staging/bes2600/bes2600_sdio.c
|
||||
index 8552b12..deefba9 100644
|
||||
--- a/drivers/staging/bes2600/bes2600_sdio.c
|
||||
+++ b/drivers/staging/bes2600/bes2600_sdio.c
|
||||
@@ -1368,7 +1368,14 @@ static void bes2600_gpio_wakeup_mcu(struct sbus_priv *self, int flag)
|
||||
|
||||
/* error check */
|
||||
if((self->gpio_wakup_flags & BIT(flag)) != 0) {
|
||||
- bes_err( "repeat set gpio_wake_flag, sub_sys:%d", flag);
|
||||
+ /*
|
||||
+ * Multiple subsystems holding wake is the steady-state case
|
||||
+ * (e.g. WIFI + BT both want MCU awake). Demoted from bes_err
|
||||
+ * to bes_devel since it isn't an error - the GPIO is already
|
||||
+ * asserted high and the subsystem is now also tracked.
|
||||
+ */
|
||||
+ bes_devel("repeat set gpio_wake_flag, sub_sys:%d\n", flag);
|
||||
+ self->gpio_wakup_flags |= BIT(flag);
|
||||
mutex_unlock(&self->io_mutex);
|
||||
return;
|
||||
}
|
||||
@@ -1400,7 +1407,11 @@ static void bes2600_gpio_allow_mcu_sleep(struct sbus_priv *self, int flag)
|
||||
|
||||
/* error check */
|
||||
if((self->gpio_wakup_flags & BIT(flag)) == 0) {
|
||||
- bes_err( "repeat clear gpio_wake_flag, sub_sys:%d", flag);
|
||||
+ /*
|
||||
+ * Mirror of the wake path: a clear when the bit is already
|
||||
+ * clear is racy bookkeeping, not a hardware error.
|
||||
+ */
|
||||
+ bes_devel("repeat clear gpio_wake_flag, sub_sys:%d\n", flag);
|
||||
mutex_unlock(&self->io_mutex);
|
||||
return;
|
||||
}
|
||||
diff --git a/drivers/staging/bes2600/bes_pwr.c b/drivers/staging/bes2600/bes_pwr.c
|
||||
index 4c6bd78..5798e8a 100644
|
||||
--- a/drivers/staging/bes2600/bes_pwr.c
|
||||
+++ b/drivers/staging/bes2600/bes_pwr.c
|
||||
@@ -619,19 +619,61 @@ static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
|
||||
static void bes2600_pwr_device_exit_lp_mode(struct bes2600_common *hw_priv)
|
||||
{
|
||||
int ret = 0;
|
||||
+ enum bes2600_chip_pm_state state;
|
||||
struct wsm_operational_mode mode = {
|
||||
.power_mode = wsm_power_mode_active,
|
||||
.disableMoreFlagUsage = true,
|
||||
};
|
||||
|
||||
- bes_devel("host lock lmac\n");
|
||||
- if(hw_priv->sbus_ops->gpio_wake)
|
||||
- hw_priv->sbus_ops->gpio_wake(hw_priv->sbus_priv, GPIO_WAKE_FLAG_MCU);
|
||||
+ /*
|
||||
+ * Consult chip_pm_state set by bes2600_pwr_notify_ps_changed().
|
||||
+ * If we last saw the firmware confirm ACTIVE, skip ONLY the
|
||||
+ * gpio_wake + sbus_active wake handshake - the GPIO is already
|
||||
+ * asserted high and the SDIO MCU subsystem is already running,
|
||||
+ * so another sbus_active() round-trip just hits its 200x2ms
|
||||
+ * timeout because the firmware has nothing to do.
|
||||
+ *
|
||||
+ * wsm_set_operational_mode() below is NOT part of the wake
|
||||
+ * handshake; it is the operational-mode setter the firmware
|
||||
+ * tracks per call. Skipping it leaves the chip's SDIO state
|
||||
+ * machine without a fresh operational-mode update, which on
|
||||
+ * PineTab2 wedges the bus (-EBUSY on next sdio_rx_work read)
|
||||
+ * within a few seconds of probe completion. So it must run
|
||||
+ * unconditionally.
|
||||
+ */
|
||||
+ state = atomic_read(&hw_priv->bes_power.chip_pm_state);
|
||||
+ if (state == BES2600_CHIP_PM_ACTIVE) {
|
||||
+ bes_devel("device_exit_lp_mode: chip already ACTIVE, skipping wake handshake\n");
|
||||
+ } else {
|
||||
+ bes_devel("host lock lmac\n");
|
||||
+ if (hw_priv->sbus_ops->gpio_wake)
|
||||
+ hw_priv->sbus_ops->gpio_wake(hw_priv->sbus_priv,
|
||||
+ GPIO_WAKE_FLAG_MCU);
|
||||
|
||||
- if(hw_priv->sbus_ops->sbus_active) {
|
||||
- ret = hw_priv->sbus_ops->sbus_active(hw_priv->sbus_priv, SUBSYSTEM_MCU);
|
||||
- if (ret)
|
||||
- bes_err("%s, active mcu fail\n", __func__);
|
||||
+ if (hw_priv->sbus_ops->sbus_active) {
|
||||
+ ret = hw_priv->sbus_ops->sbus_active(hw_priv->sbus_priv,
|
||||
+ SUBSYSTEM_MCU);
|
||||
+ if (ret) {
|
||||
+ /*
|
||||
+ * MCU_WAKEUP_READY did not arrive within
|
||||
+ * the SDIO handshake window. Record state
|
||||
+ * as UNKNOWN so the next exit_lp_mode call
|
||||
+ * also runs the full wake sequence (no
|
||||
+ * skip), but still send operational_mode
|
||||
+ * below to match pre-c6 behaviour - the
|
||||
+ * WSM may succeed even if the SDIO active
|
||||
+ * confirm was lost, and if it fails too,
|
||||
+ * we just emit a second devel-level error.
|
||||
+ * Repeated UNKNOWN is the signal for the
|
||||
+ * LMAC active-monitor to eventually
|
||||
+ * escalate to bus_reset (c5.2's
|
||||
+ * mmc_hw_reset path).
|
||||
+ */
|
||||
+ bes_err("%s, active mcu fail\n", __func__);
|
||||
+ atomic_set(&hw_priv->bes_power.chip_pm_state,
|
||||
+ BES2600_CHIP_PM_UNKNOWN);
|
||||
+ }
|
||||
+ }
|
||||
}
|
||||
|
||||
ret = wsm_set_operational_mode(hw_priv, &mode, 0);
|
||||
--
|
||||
2.54.0
|
||||
|
||||
+209
@@ -0,0 +1,209 @@
|
||||
From 9a0a4c0a4687cc0a70a34be57a74a0fbc327b066 Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: Tue, 28 Apr 2026 16:54:06 +0200
|
||||
Subject: [PATCH 06/20] bes2600: self-detect when firmware does not honor PSM
|
||||
and skip the cycle
|
||||
|
||||
The c6 series fixed several host-side bookkeeping bugs around PSM
|
||||
transitions, but didn't address the underlying contract: this chip's
|
||||
firmware (BES2600 with the Bestechnic Dec 2023 build that ships on
|
||||
PineTab2 and most danctnix images) silently drops every WSM_set_pm
|
||||
request without emitting the corresponding PM_INDICATION. The driver's
|
||||
own power_down_work delayed work calls bes2600_pwr_enter_lp_mode every
|
||||
~10s; without firmware acknowledgment each call burns 5s on
|
||||
wait_for_completion_timeout(pm_enter_cmpl, 5*HZ) and produces a
|
||||
recurring three-line cascade in dmesg:
|
||||
|
||||
bes2600_pwr_enter_lp_mode, wait pm ind timeout
|
||||
bes2600_sdio_active failed, subsys:0
|
||||
bes2600_pwr_device_exit_lp_mode, active mcu fail
|
||||
|
||||
Confirmed by tripwire instrumentation on PineTab2 (linux-pinetab2
|
||||
6.19.10-danctnix1, ohm) running the c5+c6 stack: zero
|
||||
wsm_set_pm_indication() invocations across an entire boot, while
|
||||
bes2600_pwr_enter_lp_mode timed out repeatedly, and
|
||||
bes2600_sdio_active() consistently saw BES_SLAVE_STATUS_REG_ID return
|
||||
0x2f (every "ready" bit set except MCU_WAKEUP_READY (bit 4) - the
|
||||
firmware reports "I'm awake, there's nothing to wake from").
|
||||
|
||||
This patch makes the driver self-heal:
|
||||
|
||||
* struct bes2600_pwr_t gains pm_unsupported (bool) and
|
||||
pm_consecutive_timeouts (unsigned int). Both initialised to
|
||||
0/false.
|
||||
|
||||
* bes2600_pwr_enter_lp_mode early-returns -EOPNOTSUPP when
|
||||
pm_unsupported is set. Skips the per-VIF set_pm round-trip and
|
||||
the wait_for_completion entirely.
|
||||
|
||||
* On the cmpxchg-success branch of the timeout path, we increment
|
||||
pm_consecutive_timeouts. When it crosses
|
||||
BES2600_PM_UNSUPPORTED_THRESHOLD (3, ~15s of trying), we latch
|
||||
pm_unsupported = true and force chip_pm_state = ACTIVE so that
|
||||
bes2600_pwr_device_exit_lp_mode's c6.2 skip branch covers the
|
||||
wake side (no gpio_wake / sbus_active / WSM_set_operational_mode
|
||||
reissue past the first one).
|
||||
|
||||
* bes2600_pwr_notify_ps_changed resets pm_consecutive_timeouts to 0
|
||||
on any incoming PM indication, and clears pm_unsupported if it
|
||||
was previously latched. So a firmware update that fixes PM_IND
|
||||
delivery automatically re-enables PSM transitions without a
|
||||
driver rebuild.
|
||||
|
||||
mac80211's PSM requests via bes2600_set_pm() still flow to the
|
||||
firmware unchanged; they just don't have host-side timeouts so they
|
||||
remain silent regardless of firmware acknowledgment. Power
|
||||
consumption goes up if the firmware actually CAN do PSM (we'd be
|
||||
keeping the chip awake unnecessarily), but on a chip where the
|
||||
counter trips this trade-off is forced anyway: the chip stayed awake
|
||||
under the broken cascade as well, just with constant SDIO churn.
|
||||
|
||||
Net effect on dmesg: after ~15s of boot, the three-line cascade stops
|
||||
firing entirely. The firmware-side wedge is observed once per boot
|
||||
(captured by the pm_unsupported latch) instead of per-cycle.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
bes2600/bes_pwr.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++-
|
||||
bes2600/bes_pwr.h | 9 ++++++
|
||||
2 files changed, 78 insertions(+), 1 deletion(-)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/bes_pwr.c b/drivers/staging/bes2600/bes_pwr.c
|
||||
index 5798e8a..ec91485 100644
|
||||
--- a/drivers/staging/bes2600/bes_pwr.c
|
||||
+++ b/drivers/staging/bes2600/bes_pwr.c
|
||||
@@ -467,6 +467,45 @@ static void bes2600_pwr_device_enter_lp_mode(struct bes2600_common *hw_priv)
|
||||
bes_devel("device enter sleep\n");
|
||||
}
|
||||
|
||||
+/*
|
||||
+ * Number of consecutive bes2600_pwr_enter_lp_mode timeouts (with zero
|
||||
+ * PM_INDICATIONs received) before we conclude the firmware does not
|
||||
+ * honor host-driven PSM and switch to a sticky skip path.
|
||||
+ */
|
||||
+#define BES2600_PM_UNSUPPORTED_THRESHOLD 3
|
||||
+
|
||||
+/*
|
||||
+ * Latch pm_unsupported = true and force chip_pm_state = ACTIVE so the
|
||||
+ * c6.2 wake-side skip branch covers bes2600_pwr_device_exit_lp_mode.
|
||||
+ * Called after BES2600_PM_UNSUPPORTED_THRESHOLD consecutive enter_lp_mode
|
||||
+ * timeouts with zero PM_INDICATIONs.
|
||||
+ */
|
||||
+static void bes2600_pwr_latch_pm_unsupported(struct bes2600_common *hw_priv)
|
||||
+{
|
||||
+ bes_warn("PSM not honored (%u timeouts), switching to skip mode\n",
|
||||
+ hw_priv->bes_power.pm_consecutive_timeouts);
|
||||
+ hw_priv->bes_power.pm_unsupported = true;
|
||||
+ atomic_set(&hw_priv->bes_power.chip_pm_state,
|
||||
+ BES2600_CHIP_PM_ACTIVE);
|
||||
+
|
||||
+ /*
|
||||
+ * Hold the MCU wake-flag bit permanently. Without this, every
|
||||
+ * sdio_rx_work invocation hits bes2600_gpio_wakeup_mcu(SDIO_RX)
|
||||
+ * when gpio_wakup_flags == 0, drives the GPIO high and msleeps
|
||||
+ * 10 ms per RX. With ~50 RX/s of beacons + multicast that's
|
||||
+ * ~50%% of the bes_sdio workqueue thread blocked in msleep,
|
||||
+ * which directly caps RX throughput. Holding the MCU bit makes
|
||||
+ * those calls bit-only bookkeeping (gpio_wakeup = (flags == 0)
|
||||
+ * stays false, no GPIO toggle, no msleep). The bit is never
|
||||
+ * cleared once pm_unsupported is set because
|
||||
+ * bes2600_pwr_device_enter_lp_mode is unreachable under the
|
||||
+ * early-return.
|
||||
+ */
|
||||
+ if (hw_priv->sbus_ops->gpio_wake)
|
||||
+ hw_priv->sbus_ops->gpio_wake(hw_priv->sbus_priv,
|
||||
+ GPIO_WAKE_FLAG_MCU);
|
||||
+}
|
||||
+
|
||||
static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
|
||||
{
|
||||
int i = 0;
|
||||
@@ -476,6 +515,17 @@ static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
|
||||
char ip_str[20];
|
||||
unsigned long status = 0;
|
||||
|
||||
+ /*
|
||||
+ * Sticky early-return when we've previously concluded the firmware
|
||||
+ * doesn't honor PSM. Each attempt would otherwise burn 5s on a
|
||||
+ * doomed wait_for_completion_timeout and produce a noisy three-line
|
||||
+ * cascade in dmesg every time power_down_work retries (every
|
||||
+ * ~10s). The chip stays in active mode, which on this firmware is
|
||||
+ * the de-facto state anyway.
|
||||
+ */
|
||||
+ if (hw_priv->bes_power.pm_unsupported)
|
||||
+ return -EOPNOTSUPP;
|
||||
+
|
||||
/* set interface low power configuration */
|
||||
bes2600_for_each_vif(hw_priv, priv, i) {
|
||||
#ifdef P2P_MULTIVIF
|
||||
@@ -569,6 +619,9 @@ static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
|
||||
atomic_set(&hw_priv->bes_power.chip_pm_state,
|
||||
BES2600_CHIP_PM_UNKNOWN);
|
||||
timeouts++;
|
||||
+ if (++hw_priv->bes_power.pm_consecutive_timeouts
|
||||
+ >= BES2600_PM_UNSUPPORTED_THRESHOLD)
|
||||
+ bes2600_pwr_latch_pm_unsupported(hw_priv);
|
||||
}
|
||||
}
|
||||
} else {
|
||||
@@ -607,7 +660,8 @@ static int bes2600_pwr_enter_lp_mode(struct bes2600_common *hw_priv)
|
||||
* GPIO stays high and the bit clear here is purely
|
||||
* bookkeeping (so the next gpio_wake doesn't no-op).
|
||||
*/
|
||||
- if (hw_priv->sbus_ops->gpio_sleep)
|
||||
+ if (!hw_priv->bes_power.pm_unsupported &&
|
||||
+ hw_priv->sbus_ops->gpio_sleep)
|
||||
hw_priv->sbus_ops->gpio_sleep(hw_priv->sbus_priv,
|
||||
GPIO_WAKE_FLAG_MCU);
|
||||
ret = -ETIMEDOUT;
|
||||
@@ -930,6 +984,8 @@ void bes2600_pwr_init(struct bes2600_common *hw_priv)
|
||||
mutex_init(&hw_priv->bes_power.pwr_mutex);
|
||||
atomic_set(&hw_priv->bes_power.dev_state, 0);
|
||||
atomic_set(&hw_priv->bes_power.chip_pm_state, BES2600_CHIP_PM_UNKNOWN);
|
||||
+ hw_priv->bes_power.pm_unsupported = false;
|
||||
+ hw_priv->bes_power.pm_consecutive_timeouts = 0;
|
||||
init_completion(&hw_priv->bes_power.pm_enter_cmpl);
|
||||
sema_init(&hw_priv->bes_power.sync_lock, 1);
|
||||
device_set_wakeup_capable(hw_priv->pdev, true);
|
||||
@@ -1319,6 +1375,18 @@ void bes2600_pwr_notify_ps_changed(struct bes2600_common *hw_priv, u8 psmode)
|
||||
* indication can prime a future wait against a freshly
|
||||
* reinit_completion()'ed state.
|
||||
*/
|
||||
+ /*
|
||||
+ * Any PM indication, whatever its psmode, proves the firmware is
|
||||
+ * actually emitting them. Reset the consecutive-timeout counter
|
||||
+ * so a transient stall doesn't permanently disable PSM, and clear
|
||||
+ * pm_unsupported if a previous run had latched it.
|
||||
+ */
|
||||
+ hw_priv->bes_power.pm_consecutive_timeouts = 0;
|
||||
+ if (hw_priv->bes_power.pm_unsupported) {
|
||||
+ bes_warn("PM indication arrived after pm_unsupported was set; re-enabling PSM transitions\n");
|
||||
+ hw_priv->bes_power.pm_unsupported = false;
|
||||
+ }
|
||||
+
|
||||
if ((psmode & 0x01) != WSM_PSM_ACTIVE) {
|
||||
atomic_set(&hw_priv->bes_power.chip_pm_state,
|
||||
BES2600_CHIP_PM_LP);
|
||||
diff --git a/drivers/staging/bes2600/bes_pwr.h b/drivers/staging/bes2600/bes_pwr.h
|
||||
index 6bc44ac..92de90b 100644
|
||||
--- a/drivers/staging/bes2600/bes_pwr.h
|
||||
+++ b/drivers/staging/bes2600/bes_pwr.h
|
||||
@@ -121,6 +121,15 @@ struct bes2600_pwr_t
|
||||
struct bes2600_pwr_event_t pwr_events[BES2600_DELAY_EVENT_NUM];
|
||||
atomic_t pm_set_in_process;
|
||||
atomic_t chip_pm_state;
|
||||
+ /*
|
||||
+ * Sticky flag set after BES2600_PM_UNSUPPORTED_THRESHOLD
|
||||
+ * consecutive enter_lp_mode timeouts with zero PM_INDICATIONs
|
||||
+ * received from firmware. Indicates this chip's firmware does
|
||||
+ * not honor host-driven PSM transitions; further attempts are
|
||||
+ * skipped to avoid the 5s timeout cascade.
|
||||
+ */
|
||||
+ bool pm_unsupported;
|
||||
+ unsigned int pm_consecutive_timeouts;
|
||||
};
|
||||
|
||||
#ifdef CONFIG_BES2600_WOWLAN
|
||||
--
|
||||
2.54.0
|
||||
|
||||
+83
@@ -0,0 +1,83 @@
|
||||
From d48f2ae73ca17761d7a64aa645b4629641c8be5d Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: Tue, 28 Apr 2026 21:37:37 +0200
|
||||
Subject: [PATCH 07/20] bes2600: handle multi-function SDIO cards in
|
||||
mmc_hw_reset bus_reset
|
||||
|
||||
c5.2 (recover-wedged-firmware-via-mmc-hw-reset) wraps mmc_hw_reset()
|
||||
and treats any non-zero return as a recovery failure. On
|
||||
single-function SDIO cards mmc_hw_reset returns 0 after doing the
|
||||
remove + rescan inline. On multi-function cards (BES2600 has WLAN
|
||||
func 1 + BT companion func 2) the kernel's mmc_sdio_hw_reset() does
|
||||
NOT do the rescan: it tears the card down and returns 1 to signal
|
||||
"caller must trigger rescan".
|
||||
|
||||
Field observation on PineTab2 (linux-pinetab2 6.19.10-danctnix1):
|
||||
when a real LMAC wedge fired bes2600_chrdev_wifi_force_close ->
|
||||
bes2600_chrdev_do_bus_reset, mmc_hw_reset returned 1, c5.2's wrapper
|
||||
treated that as "bus_reset failed: 1", logged the error, and gave
|
||||
up. The card was already removed (mmc2: card 0001 removed) but
|
||||
nothing scheduled a rescan; wifi (and the BT companion which shares
|
||||
the same SDIO host) stayed silent until the user rebooted four
|
||||
minutes later.
|
||||
|
||||
Fix:
|
||||
|
||||
- Capture the mmc_host pointer before calling mmc_hw_reset (the
|
||||
card pointer is invalid after the remove).
|
||||
- On positive return (multi-function path), log informationally
|
||||
and call mmc_detect_change(host, 0) to schedule a rescan.
|
||||
Return 0 so callers see the recovery as successful.
|
||||
- Negative return is still treated as failure as before.
|
||||
|
||||
The mmc_detect_change side effect is asynchronous; the chrdev's
|
||||
wait_event_timeout(probe_done_wq, !sbus_priv) still observes the
|
||||
remove half synchronously, and the rescan + re-probe runs out of
|
||||
the host detect work afterwards.
|
||||
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
bes2600/bes2600_sdio.c | 24 +++++++++++++++++++++++-
|
||||
1 file changed, 23 insertions(+), 1 deletion(-)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/bes2600_sdio.c b/drivers/staging/bes2600/bes2600_sdio.c
|
||||
index deefba9..c0b67b0 100644
|
||||
--- a/drivers/staging/bes2600/bes2600_sdio.c
|
||||
+++ b/drivers/staging/bes2600/bes2600_sdio.c
|
||||
@@ -1789,10 +1789,32 @@ static void bes2600_sdio_halt_device(struct sbus_priv *self)
|
||||
*/
|
||||
static int bes2600_sdio_bus_reset(struct sbus_priv *self)
|
||||
{
|
||||
+ struct mmc_host *host;
|
||||
+ int ret;
|
||||
+
|
||||
if (!self || !self->func || !self->func->card)
|
||||
return -EINVAL;
|
||||
|
||||
- return mmc_hw_reset(self->func->card);
|
||||
+ host = self->func->card->host;
|
||||
+ ret = mmc_hw_reset(self->func->card);
|
||||
+
|
||||
+ /*
|
||||
+ * On multi-function SDIO cards (BES2600 has WLAN func 1 + BT
|
||||
+ * companion func 2), mmc_sdio_hw_reset() removes the card and
|
||||
+ * returns 1 to signal "remove happened, caller must trigger
|
||||
+ * rescan". The kernel does NOT auto-rescan in this case;
|
||||
+ * single-function cards take the rescan path inline and return 0.
|
||||
+ * Treat any non-negative return as success and force a rescan if
|
||||
+ * mmc_hw_reset signalled the multi-function path - otherwise the
|
||||
+ * card stays removed indefinitely after a wedge recovery,
|
||||
+ * leaving wifi (and the BT companion) silent until reboot.
|
||||
+ */
|
||||
+ if (ret > 0) {
|
||||
+ bes_info("multi-func mmc_hw_reset removed card; scheduling rescan\n");
|
||||
+ mmc_detect_change(host, 0);
|
||||
+ ret = 0;
|
||||
+ }
|
||||
+ return ret;
|
||||
}
|
||||
|
||||
static bool bes2600_sdio_wakeup_source(struct sbus_priv *self)
|
||||
--
|
||||
2.54.0
|
||||
|
||||
+221
@@ -0,0 +1,221 @@
|
||||
From 3b4239ad2b7976eab04ccae748e36fb78422874f Mon Sep 17 00:00:00 2001
|
||||
From: "Claude (noether)" <claude@reauktion.de>
|
||||
Date: Wed, 6 May 2026 19:50:52 +0200
|
||||
Subject: [PATCH 08/20] bes2600: pre-empt AP-deauth-6 with mac80211 reassoc on
|
||||
decrypt-fail storm
|
||||
|
||||
When the BES2600 firmware reports WSM_STATUS_DECRYPTFAILURE for a burst
|
||||
of received frames (typically because the host's PTK or GTK has fallen
|
||||
out of sync with the AP), the AP eventually concludes that the STA is
|
||||
not authenticated and emits an unprotected deauth-reason-6 ("Class 2
|
||||
frame received from non-authenticated station"). On the deployed
|
||||
pinetab2 + bes2600 stack this AP-initiated deauth has been observed to
|
||||
leave the link blackholed for up to 109 s before userspace finds a
|
||||
different SSID/channel to recover on. (Receipts at
|
||||
https://git.reauktion.de/marfrit/besser, notes/phase5-2026-05-06.md.)
|
||||
|
||||
Add a sliding-window counter on each bes2600_vif: when 5 decrypt
|
||||
failures fire within 5 s, schedule a worker that calls
|
||||
ieee80211_connection_loss(vif). mac80211 then performs immediate
|
||||
disassociation; userspace (NetworkManager / wpa_supplicant) reconnects
|
||||
with fresh keys before the AP gets a chance to fire its unprotected
|
||||
deauth.
|
||||
|
||||
Predicted Phase 7 delta vs the unpatched baseline:
|
||||
- decrypt-burst rate: unchanged (this does not address root cause)
|
||||
- AP-deauth-6 rate: <= 0.2 of baseline
|
||||
- conditional probability of >5s blackhole given a burst:
|
||||
100% -> <= 10%
|
||||
- worst-case recovery time: 109s -> <5s
|
||||
|
||||
Contract pin: ieee80211_connection_loss() per
|
||||
include/net/mac80211.h: "may also be called if the connection needs to
|
||||
be terminated for some other reason... will cause immediate change to
|
||||
disassociated state, without connection recovery attempts." Userspace
|
||||
recovery is the existing NM/wpa_supplicant path. The worker context
|
||||
satisfies the implicit process-context expectation.
|
||||
|
||||
Files touched:
|
||||
- bes2600/bes2600.h: 4 new fields on struct bes2600_vif + 2 prototypes
|
||||
- bes2600/txrx.c: new helpers + the call site at the existing
|
||||
WSM_STATUS_DECRYPTFAILURE log point (the unconditional "goto drop"
|
||||
branch in bes2600_rx_cb)
|
||||
- bes2600/sta.c: bes2600_decrypt_storm_init() in bes2600_vif_setup;
|
||||
cancel_work_sync() in bes2600_remove_interface, alongside the
|
||||
existing per-vif cancel_*_work_sync block. Safe under the kernel
|
||||
cancel_work_sync contract: the work_struct is INIT_WORK'd in setup,
|
||||
so the call is valid; it blocks until any in-flight handler returns,
|
||||
ensuring no use-after-free of priv when mac80211 frees the vif; and
|
||||
it is idempotent (subsequent calls just return false).
|
||||
- bes2600/debug.c: DecryptStormRecoveries seq_printf in the per-vif
|
||||
status seq_file output
|
||||
|
||||
Threshold (5/5s) is set well above the steady-state per-vif decrypt-
|
||||
fail rate observed in measurement (~1/min even under sustained 1 MB/s
|
||||
load), so a true storm is required to trip it. The cw1200/cw1260
|
||||
ancestor has no equivalent storm-recovery; this is a clean addition.
|
||||
|
||||
checkpatch.pl --no-tree --strict: clean (0/0/0).
|
||||
|
||||
Signed-off-by: Claude (noether) <claude@reauktion.de>
|
||||
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||||
---
|
||||
bes2600/bes2600.h | 9 ++++++
|
||||
bes2600/debug.c | 2 ++
|
||||
bes2600/sta.c | 2 ++
|
||||
bes2600/txrx.c | 74 +++++++++++++++++++++++++++++++++++++++++++++++
|
||||
4 files changed, 87 insertions(+)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/bes2600.h b/drivers/staging/bes2600/bes2600.h
|
||||
index 0e60960..66482f7 100644
|
||||
--- a/drivers/staging/bes2600/bes2600.h
|
||||
+++ b/drivers/staging/bes2600/bes2600.h
|
||||
@@ -596,6 +596,11 @@ struct bes2600_vif {
|
||||
unsigned long rx_timestamp;
|
||||
u32 cipherType;
|
||||
|
||||
+ /* Decrypt-storm fast-recover (Trigger B). See txrx.c. */
|
||||
+ unsigned long decrypt_storm_window_start;
|
||||
+ unsigned int decrypt_storm_count;
|
||||
+ unsigned int decrypt_storm_recoveries;
|
||||
+ struct work_struct decrypt_storm_recover_work;
|
||||
|
||||
/* AP powersave */
|
||||
u32 link_id_map;
|
||||
@@ -856,4 +861,8 @@ int bes2600_btusb_setup_pipes(struct sbus_priv *sbus_priv);
|
||||
void bes2600_btusb_uninit(struct usb_interface *interface);
|
||||
#endif
|
||||
|
||||
+/* Decrypt-storm fast-recover helpers — see txrx.c. */
|
||||
+void bes2600_decrypt_storm_init(struct bes2600_vif *priv);
|
||||
+void bes2600_decrypt_storm_account(struct bes2600_vif *priv);
|
||||
+
|
||||
#endif /* BES2600_H */
|
||||
diff --git a/drivers/staging/bes2600/debug.c b/drivers/staging/bes2600/debug.c
|
||||
index 5228b22..ca223dd 100644
|
||||
--- a/drivers/staging/bes2600/debug.c
|
||||
+++ b/drivers/staging/bes2600/debug.c
|
||||
@@ -542,6 +542,8 @@ static int bes2600_status_show_priv(struct seq_file *seq, void *v)
|
||||
priv->listening ? " (listening)" : "");
|
||||
seq_printf(seq, "Assoc: %s\n",
|
||||
bes2600_debug_join_status[priv->join_status]);
|
||||
+ seq_printf(seq, "DecryptStormRecoveries: %u\n",
|
||||
+ priv->decrypt_storm_recoveries);
|
||||
if (priv->rx_filter.promiscuous)
|
||||
seq_puts(seq, "Filter: promisc\n");
|
||||
else if (priv->rx_filter.fcs)
|
||||
diff --git a/drivers/staging/bes2600/sta.c b/drivers/staging/bes2600/sta.c
|
||||
index ca1c77c..ee9fd81 100644
|
||||
--- a/drivers/staging/bes2600/sta.c
|
||||
+++ b/drivers/staging/bes2600/sta.c
|
||||
@@ -464,6 +464,7 @@ void bes2600_remove_interface(struct ieee80211_hw *dev,
|
||||
cancel_delayed_work_sync(&priv->join_timeout);
|
||||
cancel_delayed_work_sync(&priv->set_cts_work);
|
||||
cancel_delayed_work_sync(&priv->pending_offchanneltx_work);
|
||||
+ cancel_work_sync(&priv->decrypt_storm_recover_work);
|
||||
|
||||
timer_delete_sync(&priv->mcast_timeout);
|
||||
/* TODO:COMBO: May be reset of these variables "delayed_link_loss and
|
||||
@@ -2639,6 +2640,7 @@ int bes2600_vif_setup(struct bes2600_vif *priv)
|
||||
|
||||
/* Setup per vif workitems and locks */
|
||||
spin_lock_init(&priv->vif_lock);
|
||||
+ bes2600_decrypt_storm_init(priv);
|
||||
INIT_WORK(&priv->join_work, bes2600_join_work);
|
||||
INIT_DELAYED_WORK(&priv->join_timeout, bes2600_join_timeout);
|
||||
INIT_WORK(&priv->unjoin_work, bes2600_unjoin_work);
|
||||
diff --git a/drivers/staging/bes2600/txrx.c b/drivers/staging/bes2600/txrx.c
|
||||
index 017f0d8..f6a66d6 100644
|
||||
--- a/drivers/staging/bes2600/txrx.c
|
||||
+++ b/drivers/staging/bes2600/txrx.c
|
||||
@@ -26,6 +26,78 @@
|
||||
|
||||
#define BES2600_INVALID_RATE_ID (0xFF)
|
||||
|
||||
+/*
|
||||
+ * Decrypt-storm fast-recover (Trigger B).
|
||||
+ *
|
||||
+ * When the BES2600 firmware reports WSM_STATUS_DECRYPTFAILURE for a
|
||||
+ * burst of received frames (typically because the host's PTK or GTK
|
||||
+ * has fallen out of sync with the AP), the AP eventually concludes that
|
||||
+ * the STA is not authenticated and emits an unprotected deauth-reason-6
|
||||
+ * ("Class 2 frame received from non-authenticated station"). On the
|
||||
+ * deployed pinetab2 + bes2600 stack this AP-initiated deauth has been
|
||||
+ * observed to leave the link blackholed for up to 109 s before
|
||||
+ * userspace finds a different SSID/channel to recover on. (Receipts at
|
||||
+ * https://git.reauktion.de/marfrit/besser, notes/phase5-2026-05-06.md.)
|
||||
+ *
|
||||
+ * Recovery here pre-empts the AP: when we see THRESHOLD decrypt
|
||||
+ * failures within WINDOW, we ask mac80211 for a clean reassoc via
|
||||
+ * ieee80211_connection_loss(), which causes immediate disassociation
|
||||
+ * and lets userspace auto-reconnect with fresh keys.
|
||||
+ *
|
||||
+ * mac80211 contract: ieee80211_connection_loss() may be called
|
||||
+ * regardless of IEEE80211_HW_CONNECTION_MONITOR; it causes immediate
|
||||
+ * disassociation without driver-side recovery attempts. See
|
||||
+ * include/net/mac80211.h for the canonical doc-comment.
|
||||
+ *
|
||||
+ * The threshold is set well above the steady-state per-vif
|
||||
+ * decrypt-fail rate observed in measurement (~1/min even under
|
||||
+ * sustained 1 MB/s load), so a true storm is required to trip it.
|
||||
+ */
|
||||
+#define BES2600_DECRYPT_STORM_THRESHOLD 5
|
||||
+#define BES2600_DECRYPT_STORM_WINDOW_MS 5000
|
||||
+
|
||||
+static void bes2600_decrypt_storm_recover_work(struct work_struct *work)
|
||||
+{
|
||||
+ struct bes2600_vif *priv = container_of(work, struct bes2600_vif,
|
||||
+ decrypt_storm_recover_work);
|
||||
+
|
||||
+ if (!priv->vif)
|
||||
+ return;
|
||||
+
|
||||
+ bes_warn("[bes2600] decrypt-storm fast-recover: forcing reassoc\n");
|
||||
+ ieee80211_connection_loss(priv->vif);
|
||||
+ priv->decrypt_storm_recoveries++;
|
||||
+}
|
||||
+
|
||||
+void bes2600_decrypt_storm_init(struct bes2600_vif *priv)
|
||||
+{
|
||||
+ INIT_WORK(&priv->decrypt_storm_recover_work,
|
||||
+ bes2600_decrypt_storm_recover_work);
|
||||
+ priv->decrypt_storm_window_start = 0;
|
||||
+ priv->decrypt_storm_count = 0;
|
||||
+ priv->decrypt_storm_recoveries = 0;
|
||||
+}
|
||||
+
|
||||
+void bes2600_decrypt_storm_account(struct bes2600_vif *priv)
|
||||
+{
|
||||
+ unsigned long now = jiffies;
|
||||
+ unsigned long window = msecs_to_jiffies(BES2600_DECRYPT_STORM_WINDOW_MS);
|
||||
+
|
||||
+ if (priv->decrypt_storm_window_start == 0 ||
|
||||
+ time_after(now, priv->decrypt_storm_window_start + window)) {
|
||||
+ priv->decrypt_storm_window_start = now;
|
||||
+ priv->decrypt_storm_count = 1;
|
||||
+ return;
|
||||
+ }
|
||||
+
|
||||
+ if (++priv->decrypt_storm_count >= BES2600_DECRYPT_STORM_THRESHOLD) {
|
||||
+ priv->decrypt_storm_count = 0;
|
||||
+ /* Skew the window so we don't re-fire on the same storm. */
|
||||
+ priv->decrypt_storm_window_start = now + window;
|
||||
+ schedule_work(&priv->decrypt_storm_recover_work);
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
#ifdef CONFIG_BES2600_TESTMODE
|
||||
#include "bes_nl80211_testmode_msg.h"
|
||||
#endif /* CONFIG_BES2600_TESTMODE */
|
||||
@@ -1694,6 +1766,8 @@ void bes2600_rx_cb(struct bes2600_vif *priv,
|
||||
goto drop;
|
||||
} else {
|
||||
bes_warn("[RX] Receive failure: %d.\n", arg->status);
|
||||
+ if (arg->status == WSM_STATUS_DECRYPTFAILURE)
|
||||
+ bes2600_decrypt_storm_account(priv);
|
||||
goto drop;
|
||||
}
|
||||
}
|
||||
--
|
||||
2.54.0
|
||||
|
||||
+279
@@ -0,0 +1,279 @@
|
||||
From a7e232738d50c797bb2be1e71cbe1578a1d46dda Mon Sep 17 00:00:00 2001
|
||||
From: "Claude (noether)" <claude@reauktion.de>
|
||||
Date: Thu, 7 May 2026 11:30:09 +0200
|
||||
Subject: [PATCH 09/20] bes2600: bus_reset on connection-loss storm to dodge
|
||||
assoc-comeback blackhole
|
||||
|
||||
When mac80211 declares connection loss against this AP (typically driven
|
||||
by inactivity-deauth or beacon-loss), the userspace reauth that follows
|
||||
sometimes enters a long blackhole: the AP responds to auth with success
|
||||
but defers assoc with the 802.11v "assoc comeback" timer; ohm retries
|
||||
faster than the comeback grants permission; the AP eventually fires an
|
||||
unprotected deauth-reason-6 ("Class 2 frame received from non-
|
||||
authenticated station"), and recovery only completes via cross-SSID or
|
||||
cross-channel fallback. Receipts: ~86 s blackhole observed in the
|
||||
phase-7 rep on 2026-05-07 02:42, with three subsequent BSSIDs returning
|
||||
assoc comeback timeouts before reason-9 (STA_REQ_ASSOC_WITHOUT_AUTH)
|
||||
fired. Documented in marfrit/besser:notes/phase4-2026-05-07.md.
|
||||
|
||||
When N=3 driver-side connection_loss decisions fire within a 60 s window
|
||||
on the same vif, skip the ieee80211_connection_loss() path and trigger
|
||||
the c5.2-introduced bes2600_chrdev_do_bus_reset() instead. The bus
|
||||
reset removes and re-probes the chip; userspace re-associates with a
|
||||
fresh chip state, dodging the AP's comeback-timer rejection cycle.
|
||||
|
||||
Predicted Phase 7 delta vs current baseline:
|
||||
- api_connection_loss rate: unchanged (we don't address the trigger)
|
||||
- conditional probability of >5 s blackhole given event: <= 30 %
|
||||
- worst-case recovery: 86 s -> < 10 s
|
||||
|
||||
Contract pin: bes2600_chrdev_do_bus_reset(sbus_ops, sbus_priv) at
|
||||
bes2600/bes_chardev.c:455, introduced by c5.2. The function is async-
|
||||
returning: sbus_ops->bus_reset() schedules an SDIO rescan; the helper
|
||||
waits up to 3 s for the remove() callback to clear sbus_priv, then
|
||||
returns. Per-vif state is gone after this point, so the recover work
|
||||
lives on bes2600_common (hw_priv) and uses the global bes2600_cdev for
|
||||
the bus_reset call rather than dereferencing per-vif state.
|
||||
|
||||
Threshold (3 / 60 s) is well above the steady-state per-vif
|
||||
connection_loss rate observed in the patch-A phase-7 rep (0.86/h under
|
||||
sustained load), so a true storm is required to trip it.
|
||||
|
||||
Files touched:
|
||||
- bes2600/bes2600.h: 3 counter fields on struct bes2600_vif, 1
|
||||
work_struct on struct bes2600_common, 3 prototypes
|
||||
- bes2600/sta.c: 3 helpers + storm-account hook in
|
||||
bes2600_connection_loss_work + storm-init in bes2600_vif_setup +
|
||||
cancel_work_sync in the hw_priv shutdown path; #include bes_chardev.h
|
||||
was already pulled in by an earlier c-stack patch
|
||||
- bes2600/main.c: INIT_WORK alongside other hw_priv work_structs
|
||||
- bes2600/debug.c: ConnectionLossStormRecoveries seq_printf in the
|
||||
per-vif status seq_file output
|
||||
|
||||
The cw1200/cw1260 ancestor has no equivalent; this is a clean
|
||||
addition. checkpatch.pl --no-tree --strict: clean (0/0/0).
|
||||
|
||||
Signed-off-by: Claude (noether) <claude@reauktion.de>
|
||||
---
|
||||
bes2600/bes2600.h | 12 +++++++
|
||||
bes2600/bes_chardev.c | 12 +++++++
|
||||
bes2600/bes_chardev.h | 1 +
|
||||
bes2600/debug.c | 2 ++
|
||||
bes2600/main.c | 2 ++
|
||||
bes2600/sta.c | 82 +++++++++++++++++++++++++++++++++++++++++--
|
||||
6 files changed, 109 insertions(+), 2 deletions(-)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/bes2600.h b/drivers/staging/bes2600/bes2600.h
|
||||
index 66482f7..ec41141 100644
|
||||
--- a/drivers/staging/bes2600/bes2600.h
|
||||
+++ b/drivers/staging/bes2600/bes2600.h
|
||||
@@ -511,6 +511,9 @@ struct bes2600_common {
|
||||
struct list_head coex_event_list;
|
||||
spinlock_t coex_event_lock;
|
||||
|
||||
+ /* Connection-loss-storm fast-recover (Trigger A). See sta.c. */
|
||||
+ struct work_struct connection_loss_storm_recover_work;
|
||||
+
|
||||
/* member for low power */
|
||||
struct bes2600_pwr_t bes_power;
|
||||
|
||||
@@ -627,6 +630,10 @@ struct bes2600_vif {
|
||||
/* CQM Implementation */
|
||||
struct delayed_work bss_loss_work;
|
||||
struct delayed_work connection_loss_work;
|
||||
+ /* Connection-loss-storm fast-recover (Trigger A). See sta.c. */
|
||||
+ unsigned long connection_loss_storm_window_start;
|
||||
+ unsigned int connection_loss_storm_count;
|
||||
+ unsigned int connection_loss_storm_recoveries;
|
||||
struct work_struct tx_failure_work;
|
||||
int delayed_link_loss;
|
||||
spinlock_t bss_loss_lock;
|
||||
@@ -865,4 +872,9 @@ void bes2600_btusb_uninit(struct usb_interface *interface);
|
||||
void bes2600_decrypt_storm_init(struct bes2600_vif *priv);
|
||||
void bes2600_decrypt_storm_account(struct bes2600_vif *priv);
|
||||
|
||||
+/* Connection-loss-storm fast-recover helpers — see sta.c. */
|
||||
+void bes2600_connection_loss_storm_init(struct bes2600_vif *priv);
|
||||
+bool bes2600_connection_loss_storm_account(struct bes2600_vif *priv);
|
||||
+void bes2600_connection_loss_storm_recover(struct work_struct *work);
|
||||
+
|
||||
#endif /* BES2600_H */
|
||||
diff --git a/drivers/staging/bes2600/bes_chardev.c b/drivers/staging/bes2600/bes_chardev.c
|
||||
index a74bf60..df6b911 100644
|
||||
--- a/drivers/staging/bes2600/bes_chardev.c
|
||||
+++ b/drivers/staging/bes2600/bes_chardev.c
|
||||
@@ -1120,6 +1120,18 @@ int bes2600_chrdev_do_bus_reset(const struct sbus_ops *sbus_ops, struct sbus_pri
|
||||
return 0;
|
||||
}
|
||||
|
||||
+/*
|
||||
+ * Trigger bes2600_chrdev_do_bus_reset() against the file-global
|
||||
+ * bes2600_cdev. Used by host-side recovery paths outside this
|
||||
+ * compilation unit (e.g. sta.c connection-loss-storm fast-recover) so
|
||||
+ * those callers do not need to reach the static bes2600_cdev directly.
|
||||
+ */
|
||||
+int bes2600_chrdev_trigger_bus_reset(void)
|
||||
+{
|
||||
+ return bes2600_chrdev_do_bus_reset(bes2600_cdev.sbus_ops,
|
||||
+ bes2600_cdev.sbus_priv);
|
||||
+}
|
||||
+
|
||||
bool bes2600_chrdev_is_wifi_opened(void)
|
||||
{
|
||||
bool wifi_opened = false;
|
||||
diff --git a/drivers/staging/bes2600/bes_chardev.h b/drivers/staging/bes2600/bes_chardev.h
|
||||
index ca8419e..2a7cad7 100644
|
||||
--- a/drivers/staging/bes2600/bes_chardev.h
|
||||
+++ b/drivers/staging/bes2600/bes_chardev.h
|
||||
@@ -61,6 +61,7 @@ struct sbus_priv *bes2600_chrdev_get_sbus_priv_data(void);
|
||||
int bes2600_chrdev_check_system_close(void);
|
||||
int bes2600_chrdev_do_system_close(const struct sbus_ops *sbus_ops, struct sbus_priv *priv);
|
||||
int bes2600_chrdev_do_bus_reset(const struct sbus_ops *sbus_ops, struct sbus_priv *priv);
|
||||
+int bes2600_chrdev_trigger_bus_reset(void);
|
||||
void bes2600_chrdev_wakeup_bt(void);
|
||||
void bes2600_chrdev_wifi_force_close(struct bes2600_common *hw_priv, bool halt_dev);
|
||||
void bes2600_chrdev_usb_remove(struct bes2600_common *hw_priv);
|
||||
diff --git a/drivers/staging/bes2600/debug.c b/drivers/staging/bes2600/debug.c
|
||||
index ca223dd..0d68392 100644
|
||||
--- a/drivers/staging/bes2600/debug.c
|
||||
+++ b/drivers/staging/bes2600/debug.c
|
||||
@@ -544,6 +544,8 @@ static int bes2600_status_show_priv(struct seq_file *seq, void *v)
|
||||
bes2600_debug_join_status[priv->join_status]);
|
||||
seq_printf(seq, "DecryptStormRecoveries: %u\n",
|
||||
priv->decrypt_storm_recoveries);
|
||||
+ seq_printf(seq, "ConnectionLossStormRecoveries: %u\n",
|
||||
+ priv->connection_loss_storm_recoveries);
|
||||
if (priv->rx_filter.promiscuous)
|
||||
seq_puts(seq, "Filter: promisc\n");
|
||||
else if (priv->rx_filter.fcs)
|
||||
diff --git a/drivers/staging/bes2600/main.c b/drivers/staging/bes2600/main.c
|
||||
index 3b0b7a3..000329c 100644
|
||||
--- a/drivers/staging/bes2600/main.c
|
||||
+++ b/drivers/staging/bes2600/main.c
|
||||
@@ -489,6 +489,8 @@ static struct ieee80211_hw *bes2600_init_common(size_t hw_priv_data_len)
|
||||
spin_lock_init(&hw_priv->rtsvalue_lock);
|
||||
INIT_WORK(&hw_priv->dynamic_opt_txrx_work, bes2600_dynamic_opt_txrx_work);
|
||||
INIT_WORK(&hw_priv->tx_policy_upload_work, tx_policy_upload_work);
|
||||
+ INIT_WORK(&hw_priv->connection_loss_storm_recover_work,
|
||||
+ bes2600_connection_loss_storm_recover);
|
||||
spin_lock_init(&hw_priv->event_queue_lock);
|
||||
INIT_LIST_HEAD(&hw_priv->event_queue);
|
||||
INIT_WORK(&hw_priv->event_handler, bes2600_event_handler);
|
||||
diff --git a/drivers/staging/bes2600/sta.c b/drivers/staging/bes2600/sta.c
|
||||
index ee9fd81..ec67d38 100644
|
||||
--- a/drivers/staging/bes2600/sta.c
|
||||
+++ b/drivers/staging/bes2600/sta.c
|
||||
@@ -268,6 +268,7 @@ void bes2600_stop(struct ieee80211_hw *dev, bool suspend)
|
||||
cancel_work_sync(&hw_priv->coex_work);
|
||||
coex_stop(hw_priv);
|
||||
#endif
|
||||
+ cancel_work_sync(&hw_priv->connection_loss_storm_recover_work);
|
||||
|
||||
bes2600_wifi_stop(hw_priv);
|
||||
|
||||
@@ -1675,6 +1676,70 @@ report:
|
||||
spin_unlock(&priv->bss_loss_lock);
|
||||
}
|
||||
|
||||
+/*
|
||||
+ * Connection-loss-storm fast-recover (Trigger A).
|
||||
+ *
|
||||
+ * bes2600_connection_loss_work below is the driver's own decision-point
|
||||
+ * to give up on a BSS (after bss-loss detection accumulates beyond
|
||||
+ * tolerance) and tell mac80211 via ieee80211_connection_loss(). On the
|
||||
+ * deployed pinetab2 stack a single ieee80211_connection_loss() event
|
||||
+ * sometimes triggers a userspace reauth blackhole (assoc-comeback
|
||||
+ * timeouts followed by AP unprotected-deauth-reason-6) that ends only
|
||||
+ * via cross-channel/cross-SSID fallback and can take 80+ s. Receipts at
|
||||
+ * https://git.reauktion.de/marfrit/besser, notes/phase4-2026-05-07.md.
|
||||
+ *
|
||||
+ * When N connection-loss decisions land within WINDOW on the same vif,
|
||||
+ * skip the ieee80211_connection_loss() path and trigger a chip-level
|
||||
+ * bus_reset (the c5.2-introduced bes2600_chrdev_do_bus_reset). The chip
|
||||
+ * is removed and re-probed; userspace re-associates from a fresh state,
|
||||
+ * dodging the assoc-comeback loop.
|
||||
+ *
|
||||
+ * Threshold (3 / 60 s) is chosen well above the steady-state per-vif
|
||||
+ * connection-loss rate observed in the patch-A Phase-7 rep
|
||||
+ * (0.86/h under sustained load), so a true storm is required.
|
||||
+ *
|
||||
+ * The recover work_struct lives on bes2600_common (hw_priv) so that
|
||||
+ * scheduling it does not race with vif teardown after bus_reset frees
|
||||
+ * the per-vif state.
|
||||
+ */
|
||||
+#define BES2600_CONNECTION_LOSS_STORM_THRESHOLD 3
|
||||
+#define BES2600_CONNECTION_LOSS_STORM_WINDOW_MS 60000
|
||||
+
|
||||
+void bes2600_connection_loss_storm_recover(struct work_struct *work)
|
||||
+{
|
||||
+ bes_warn("[bes2600] connection-loss-storm fast-recover: bus_reset\n");
|
||||
+ bes2600_chrdev_trigger_bus_reset();
|
||||
+ /*
|
||||
+ * After bes2600_chrdev_do_bus_reset() returns, the SDIO core has
|
||||
+ * scheduled a remove + rescan; per-vif state may already be gone.
|
||||
+ * Do not dereference any per-vif pointer here.
|
||||
+ */
|
||||
+}
|
||||
+
|
||||
+void bes2600_connection_loss_storm_init(struct bes2600_vif *priv)
|
||||
+{
|
||||
+ priv->connection_loss_storm_window_start = 0;
|
||||
+ priv->connection_loss_storm_count = 0;
|
||||
+ priv->connection_loss_storm_recoveries = 0;
|
||||
+}
|
||||
+
|
||||
+bool bes2600_connection_loss_storm_account(struct bes2600_vif *priv)
|
||||
+{
|
||||
+ unsigned long now = jiffies;
|
||||
+ unsigned long window =
|
||||
+ msecs_to_jiffies(BES2600_CONNECTION_LOSS_STORM_WINDOW_MS);
|
||||
+
|
||||
+ if (priv->connection_loss_storm_window_start == 0 ||
|
||||
+ time_after(now, priv->connection_loss_storm_window_start + window)) {
|
||||
+ priv->connection_loss_storm_window_start = now;
|
||||
+ priv->connection_loss_storm_count = 1;
|
||||
+ return false;
|
||||
+ }
|
||||
+
|
||||
+ return ++priv->connection_loss_storm_count >=
|
||||
+ BES2600_CONNECTION_LOSS_STORM_THRESHOLD;
|
||||
+}
|
||||
+
|
||||
void bes2600_connection_loss_work(struct work_struct *work)
|
||||
{
|
||||
struct bes2600_vif *priv =
|
||||
@@ -1684,9 +1749,21 @@ void bes2600_connection_loss_work(struct work_struct *work)
|
||||
|
||||
bes_devel("[CQM] Reporting connection loss.\n");
|
||||
bes2600_pwr_clear_busy_event(priv->hw_priv, BES_PWR_LOCK_ON_BSS_LOST);
|
||||
- if(bes2600_suspend_status_get(hw_priv)) {
|
||||
+
|
||||
+ if (bes2600_connection_loss_storm_account(priv)) {
|
||||
+ bes_warn("[bes2600] connection-loss storm: %u in %u s, scheduling bus reset\n",
|
||||
+ priv->connection_loss_storm_count,
|
||||
+ BES2600_CONNECTION_LOSS_STORM_WINDOW_MS / 1000);
|
||||
+ priv->connection_loss_storm_count = 0;
|
||||
+ priv->connection_loss_storm_recoveries++;
|
||||
+ schedule_work(&hw_priv->connection_loss_storm_recover_work);
|
||||
+ /* bus_reset will tear the chip down; skip the mac80211 path. */
|
||||
+ return;
|
||||
+ }
|
||||
+
|
||||
+ if (bes2600_suspend_status_get(hw_priv))
|
||||
bes2600_pending_unjoin_set(hw_priv, priv->if_id);
|
||||
- } else
|
||||
+ else
|
||||
ieee80211_connection_loss(priv->vif);
|
||||
#ifdef WIFI_BT_COEXIST_EPTA_ENABLE
|
||||
// set disconnected in BSS_CHANGED_ASSOC
|
||||
@@ -2641,6 +2718,7 @@ int bes2600_vif_setup(struct bes2600_vif *priv)
|
||||
/* Setup per vif workitems and locks */
|
||||
spin_lock_init(&priv->vif_lock);
|
||||
bes2600_decrypt_storm_init(priv);
|
||||
+ bes2600_connection_loss_storm_init(priv);
|
||||
INIT_WORK(&priv->join_work, bes2600_join_work);
|
||||
INIT_DELAYED_WORK(&priv->join_timeout, bes2600_join_timeout);
|
||||
INIT_WORK(&priv->unjoin_work, bes2600_unjoin_work);
|
||||
--
|
||||
2.54.0
|
||||
|
||||
@@ -0,0 +1,92 @@
|
||||
From d9268b433abc035c6e3f63a26191df5855b09b61 Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: Thu, 7 May 2026 21:19:49 +0200
|
||||
Subject: [PATCH 10/20] bes2600: replace a set of atomic_add()
|
||||
|
||||
Backport of cw1200 mainline commit 07f995ca1951 ("cw1200: replace a set
|
||||
of atomic_add()", 2020-11-10). atomic_inc() reads more naturally than
|
||||
atomic_add(1, &x). Mechanical change, no functional impact.
|
||||
|
||||
7 sites: 6 in bh.c (bh_term, bh_rx x2, bh_tx x3) and 1 in itp.c
|
||||
(awaiting_confirm). Two of the bh_rx and three of the bh_tx sites are
|
||||
inside the cw1200-ancestor #if 0 block; replaced anyway to keep the
|
||||
file consistent with cw1200 mainline source style.
|
||||
|
||||
Cherry-picked from upstream Linux:
|
||||
07f995ca1951 cw1200: replace a set of atomic_add()
|
||||
Author: Yejune Deng <yejune.deng@gmail.com>
|
||||
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
||||
Link: https://lore.kernel.org/r/1604991491-27908-1-git-send-email-yejune.deng@gmail.com
|
||||
---
|
||||
bes2600/bh.c | 12 ++++++------
|
||||
bes2600/itp.c | 2 +-
|
||||
2 files changed, 7 insertions(+), 7 deletions(-)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/bh.c b/drivers/staging/bes2600/bh.c
|
||||
index 175ab5e..fab3bf0 100644
|
||||
--- a/drivers/staging/bes2600/bh.c
|
||||
+++ b/drivers/staging/bes2600/bh.c
|
||||
@@ -102,7 +102,7 @@ void bes2600_unregister_bh(struct bes2600_common *hw_priv)
|
||||
coex_deinit_mode(hw_priv);
|
||||
#endif
|
||||
|
||||
- atomic_add(1, &hw_priv->bh_term);
|
||||
+ atomic_inc(&hw_priv->bh_term);
|
||||
wake_up(&hw_priv->bh_wq);
|
||||
|
||||
flush_workqueue(hw_priv->bh_workqueue);
|
||||
@@ -591,7 +591,7 @@ static int bes2600_bh(void *arg)
|
||||
bes_devel("[BH] Device resume.\n");
|
||||
atomic_set(&hw_priv->bh_suspend, BES2600_BH_RESUMED);
|
||||
wake_up(&hw_priv->bh_evt_wq);
|
||||
- atomic_add(1, &hw_priv->bh_rx);
|
||||
+ atomic_inc(&hw_priv->bh_rx);
|
||||
continue;
|
||||
}
|
||||
|
||||
@@ -759,9 +759,9 @@ tx:
|
||||
|
||||
#if 0 /* count is not implemented */
|
||||
if (ret > 1)
|
||||
- atomic_add(1, &hw_priv->bh_tx);
|
||||
+ atomic_inc(&hw_priv->bh_tx);
|
||||
#else
|
||||
- atomic_add(1, &hw_priv->bh_tx);
|
||||
+ atomic_inc(&hw_priv->bh_tx);
|
||||
#endif
|
||||
|
||||
#if defined(CONFIG_BES2600_NON_POWER_OF_TWO_BLOCKSIZES)
|
||||
@@ -1135,7 +1135,7 @@ static int bes2600_bh_tx_helper(struct bes2600_common *hw_priv,
|
||||
tx_len += 4;
|
||||
#endif
|
||||
|
||||
- atomic_add(1, &hw_priv->bh_tx);
|
||||
+ atomic_inc(&hw_priv->bh_tx);
|
||||
|
||||
tx_len = hw_priv->sbus_ops->align_size(
|
||||
hw_priv->sbus_priv, tx_len);
|
||||
@@ -1442,7 +1442,7 @@ static int bes2600_bh(void *arg)
|
||||
bes_devel("[BH] Device resume.\n");
|
||||
atomic_set(&hw_priv->bh_suspend, BES2600_BH_RESUMED);
|
||||
wake_up(&hw_priv->bh_evt_wq);
|
||||
- atomic_add(1, &hw_priv->bh_rx);
|
||||
+ atomic_inc(&hw_priv->bh_rx);
|
||||
goto done;
|
||||
}
|
||||
|
||||
diff --git a/drivers/staging/bes2600/itp.c b/drivers/staging/bes2600/itp.c
|
||||
index e5c2958..c50b29c 100644
|
||||
--- a/drivers/staging/bes2600/itp.c
|
||||
+++ b/drivers/staging/bes2600/itp.c
|
||||
@@ -570,7 +570,7 @@ int bes2600_itp_get_tx(struct bes2600_common *priv, u8 **data,
|
||||
*burst = 2;
|
||||
atomic_set(&priv->bh_tx, 1);
|
||||
ktime_get_ts(&itp->last_sent);
|
||||
- atomic_add(1, &itp->awaiting_confirm);
|
||||
+ atomic_inc(&itp->awaiting_confirm);
|
||||
spin_unlock_bh(&itp->tx_lock);
|
||||
return 1;
|
||||
|
||||
--
|
||||
2.54.0
|
||||
|
||||
+58
@@ -0,0 +1,58 @@
|
||||
From 77f966df25d24a2fb85d235bcaa6248ddc394822 Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: Thu, 7 May 2026 21:20:46 +0200
|
||||
Subject: [PATCH 11/20] bes2600: fix missing destroy_workqueue() on error in
|
||||
init_common
|
||||
MIME-Version: 1.0
|
||||
Content-Type: text/plain; charset=UTF-8
|
||||
Content-Transfer-Encoding: 8bit
|
||||
|
||||
Two error paths between create_singlethread_workqueue() (~main.c:489)
|
||||
and the success-path destroy_workqueue() in unregister_common (~609)
|
||||
return without cleaning up the workqueue, leaking it on probe failure:
|
||||
|
||||
1. bes2600_queue_stats_init() failure
|
||||
2. bes2600_queue_init() failure (any of the 4 TID queues)
|
||||
|
||||
Both call ieee80211_free_hw(hw); return NULL — without first
|
||||
destroy_workqueue(hw_priv->workqueue). Add it.
|
||||
|
||||
Backport of cw1200 mainline commit 7ec8a926188e ("cw1200: fix missing
|
||||
destroy_workqueue() on error in cw1200_init_common", 2020-11-19),
|
||||
which fixed the identical bug in the same code shape we inherited.
|
||||
Reported on cw1200 by Hulk Robot.
|
||||
|
||||
Cherry-picked from upstream Linux:
|
||||
7ec8a926188e cw1200: fix missing destroy_workqueue() on error
|
||||
Author: Qinglang Miao <miaoqinglang@huawei.com>
|
||||
Reported-by: Hulk Robot <hulkci@huawei.com>
|
||||
Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
|
||||
Link: https://lore.kernel.org/r/20201119070842.1011-1-miaoqinglang@huawei.com
|
||||
Fixes: a910e4a94f69 ("cw1200: add driver for the ST-E CW1100 & CW1200 WLAN chipsets")
|
||||
---
|
||||
bes2600/main.c | 2 ++
|
||||
1 file changed, 2 insertions(+)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/main.c b/drivers/staging/bes2600/main.c
|
||||
index 000329c..f9f5f3b 100644
|
||||
--- a/drivers/staging/bes2600/main.c
|
||||
+++ b/drivers/staging/bes2600/main.c
|
||||
@@ -502,6 +502,7 @@ static struct ieee80211_hw *bes2600_init_common(size_t hw_priv_data_len)
|
||||
WLAN_LINK_ID_MAX,
|
||||
bes2600_skb_dtor,
|
||||
hw_priv))) {
|
||||
+ destroy_workqueue(hw_priv->workqueue);
|
||||
ieee80211_free_hw(hw);
|
||||
return NULL;
|
||||
}
|
||||
@@ -513,6 +514,7 @@ static struct ieee80211_hw *bes2600_init_common(size_t hw_priv_data_len)
|
||||
for (; i > 0; i--)
|
||||
bes2600_queue_deinit(&hw_priv->tx_queue[i - 1]);
|
||||
bes2600_queue_stats_deinit(&hw_priv->tx_queue_stats);
|
||||
+ destroy_workqueue(hw_priv->workqueue);
|
||||
ieee80211_free_hw(hw);
|
||||
return NULL;
|
||||
}
|
||||
--
|
||||
2.54.0
|
||||
|
||||
+144
@@ -0,0 +1,144 @@
|
||||
From 9e38ac552302b6a6bbbeeb27339b8f8ca190110f Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: Thu, 7 May 2026 21:24:01 +0200
|
||||
Subject: [PATCH 12/20] bes2600: fix concurrency UAF in bes2600_hw_scan and
|
||||
sched_scan
|
||||
MIME-Version: 1.0
|
||||
Content-Type: text/plain; charset=UTF-8
|
||||
Content-Transfer-Encoding: 8bit
|
||||
|
||||
bes2600_bss_info_changed() and bes2600_hw_scan() can run concurrently.
|
||||
The probe-request SKB allocated by ieee80211_probereq_get() before
|
||||
scan.lock + conf_lock are taken can be touched by a concurrent
|
||||
bss_info_changed (via wsm_set_template_frame's path) while we hold no
|
||||
lock. Reorder to acquire both locks BEFORE the SKB allocation.
|
||||
|
||||
Also reorder cleanup paths so dev_kfree_skb() runs BEFORE up() —
|
||||
otherwise a small window exists where the SKB has been touched but the
|
||||
lock has been released, allowing concurrent code to also touch it.
|
||||
|
||||
Three sites fixed:
|
||||
- bes2600_hw_scan: lock-take + ENOMEM cleanup + wsm_set_template_frame
|
||||
error cleanup + success-path SKB free + lock release order
|
||||
- bes2600_sched_scan_start (#ifdef ROAM_OFFLOAD): same three sub-fixes
|
||||
(compiled-out at default build, fixed for consistency)
|
||||
- All success/error paths: dev_kfree_skb before up()
|
||||
|
||||
Backport of cw1200 mainline commit 86760e0dfe36 ("cw1200: Fix
|
||||
concurrency use-after-free bugs in cw1200_hw_scan()", 2018-12-14),
|
||||
which fixed the identical bug in the same code shape we inherited.
|
||||
That commit was merged from upstream 4f68ef64cd7f.
|
||||
|
||||
Cherry-picked from upstream Linux:
|
||||
86760e0dfe36 cw1200: Fix concurrency use-after-free bugs in cw1200_hw_scan()
|
||||
Author: Jia-Ju Bai <baijiaju1990@gmail.com>
|
||||
Link: https://lore.kernel.org/r/20181214035521.7575-1-baijiaju1990@gmail.com
|
||||
---
|
||||
bes2600/scan.c | 37 ++++++++++++++++++++++---------------
|
||||
1 file changed, 22 insertions(+), 15 deletions(-)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/scan.c b/drivers/staging/bes2600/scan.c
|
||||
index b944adc..3cd7b64 100644
|
||||
--- a/drivers/staging/bes2600/scan.c
|
||||
+++ b/drivers/staging/bes2600/scan.c
|
||||
@@ -257,18 +257,21 @@ int bes2600_hw_scan(struct ieee80211_hw *hw,
|
||||
|
||||
bes2600_pwr_set_busy_event(hw_priv, BES_PWR_LOCK_ON_SCAN);
|
||||
|
||||
+ /* will be unlocked in bes2600_scan_work() */
|
||||
+ down(&hw_priv->scan.lock);
|
||||
+ down(&hw_priv->conf_lock);
|
||||
+
|
||||
frame.skb = ieee80211_probereq_get(hw, priv->vif->addr, NULL, 0,
|
||||
req->ie_len);
|
||||
- if (!frame.skb)
|
||||
+ if (!frame.skb) {
|
||||
+ up(&hw_priv->conf_lock);
|
||||
+ up(&hw_priv->scan.lock);
|
||||
return -ENOMEM;
|
||||
+ }
|
||||
|
||||
if (req->ie_len)
|
||||
skb_put_data(frame.skb, req->ie, req->ie_len);
|
||||
|
||||
- /* will be unlocked in bes2600_scan_work() */
|
||||
- down(&hw_priv->scan.lock);
|
||||
- down(&hw_priv->conf_lock);
|
||||
-
|
||||
if (frame.skb) {
|
||||
int ret;
|
||||
//if (priv->if_id == 0)
|
||||
@@ -286,9 +289,9 @@ int bes2600_hw_scan(struct ieee80211_hw *hw,
|
||||
}
|
||||
#endif
|
||||
if (ret) {
|
||||
+ dev_kfree_skb(frame.skb);
|
||||
up(&hw_priv->conf_lock);
|
||||
up(&hw_priv->scan.lock);
|
||||
- dev_kfree_skb(frame.skb);
|
||||
return ret;
|
||||
}
|
||||
}
|
||||
@@ -318,10 +321,10 @@ int bes2600_hw_scan(struct ieee80211_hw *hw,
|
||||
++hw_priv->scan.n_ssids;
|
||||
}
|
||||
|
||||
- up(&hw_priv->conf_lock);
|
||||
-
|
||||
if (frame.skb)
|
||||
dev_kfree_skb(frame.skb);
|
||||
+
|
||||
+ up(&hw_priv->conf_lock);
|
||||
#ifdef WIFI_BT_COEXIST_EPTA_ENABLE
|
||||
bwifi_change_current_status(hw_priv, BWIFI_STATUS_SCANNING);
|
||||
#endif
|
||||
@@ -362,14 +365,18 @@ int bes2600_hw_sched_scan_start(struct ieee80211_hw *hw,
|
||||
if (req->n_ssids > hw->wiphy->max_scan_ssids)
|
||||
return -EINVAL;
|
||||
|
||||
+ /* will be unlocked in bes2600_scan_work() */
|
||||
+ down(&hw_priv->scan.lock);
|
||||
+ down(&hw_priv->conf_lock);
|
||||
+
|
||||
frame.skb = ieee80211_probereq_get(hw, priv->vif->addr, NULL, 0,
|
||||
req->ie_len);
|
||||
- if (!frame.skb)
|
||||
+ if (!frame.skb) {
|
||||
+ up(&hw_priv->conf_lock);
|
||||
+ up(&hw_priv->scan.lock);
|
||||
return -ENOMEM;
|
||||
+ }
|
||||
|
||||
- /* will be unlocked in bes2600_scan_work() */
|
||||
- down(&hw_priv->scan.lock);
|
||||
- down(&hw_priv->conf_lock);
|
||||
if (frame.skb) {
|
||||
int ret;
|
||||
if (priv->if_id == 0)
|
||||
@@ -380,9 +387,9 @@ int bes2600_hw_sched_scan_start(struct ieee80211_hw *hw,
|
||||
ret = wsm_set_probe_responder(priv, true);
|
||||
}
|
||||
if (ret) {
|
||||
+ dev_kfree_skb(frame.skb);
|
||||
up(&hw_priv->conf_lock);
|
||||
up(&hw_priv->scan.lock);
|
||||
- dev_kfree_skb(frame.skb);
|
||||
return ret;
|
||||
}
|
||||
}
|
||||
@@ -414,10 +421,10 @@ int bes2600_hw_sched_scan_start(struct ieee80211_hw *hw,
|
||||
}
|
||||
}
|
||||
|
||||
- up(&hw_priv->conf_lock);
|
||||
-
|
||||
if (frame.skb)
|
||||
dev_kfree_skb(frame.skb);
|
||||
+
|
||||
+ up(&hw_priv->conf_lock);
|
||||
queue_work(hw_priv->workqueue, &hw_priv->scan.swork);
|
||||
wiphy_warn(hw->wiphy, "<--[SCAN] Scheduled scan request.\n");
|
||||
return 0;
|
||||
--
|
||||
2.54.0
|
||||
|
||||
+540
@@ -0,0 +1,540 @@
|
||||
From 73191b7bc1b607d0331b590c0c54c848c078a088 Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: Thu, 7 May 2026 22:34:11 +0200
|
||||
Subject: [PATCH 13/20] =?UTF-8?q?bes2600:=20drop=20sdio=5Frx=5Fwork=20rela?=
|
||||
=?UTF-8?q?y,=20IRQ=E2=86=92bh-direct=20(no-relay=20architecture)?=
|
||||
MIME-Version: 1.0
|
||||
Content-Type: text/plain; charset=UTF-8
|
||||
Content-Transfer-Encoding: 8bit
|
||||
|
||||
Patch C v3 — match cw1200 mainline architecture
|
||||
(drivers/net/wireless/st/cw1200/). Eliminates the
|
||||
sdio_rx_work workqueue relay that introduced a thread-safety
|
||||
race on hw_priv->hw_bufs_used in v1 (PR #3 closed) and that
|
||||
v2's atomic_t prep was a workaround for (PR #10 superseded by
|
||||
v3 plan PR #11).
|
||||
|
||||
Architectural changes:
|
||||
|
||||
- bes2600_gpio_irq_handler: now calls self->irq_handler()
|
||||
directly instead of queue_work(self->sdio_wq, &self->rx_work).
|
||||
Bumps bh_rx atomic + wakes bh_wq.
|
||||
- bes2600_bh_rx_helper (BES_SDIO_RX_MULTIPLE_ENABLE branch):
|
||||
now calls priv->sbus_ops->bus_rx_batch() to do the SDIO read
|
||||
inline. No pipe_read, no skb_dequeue.
|
||||
- bes2600_sdio_read_rx_batch (new): the SDIO read sequence
|
||||
extracted from sdio_rx_work, registered as
|
||||
sbus_ops->bus_rx_batch. Runs in bh thread context.
|
||||
- bes2600_sdio_extract_packets: calls
|
||||
bes2600_bh_handle_rx_skb() directly per parsed SKB. No
|
||||
skb_queue_tail, no rx_queue.
|
||||
- bes2600_bh_handle_rx_skb (new in bh.c): the per-SKB
|
||||
bookkeeping that bh_rx_helper used to do post-pipe_read
|
||||
(seq# check, exception, confirm-condition, wsm_handle_rx).
|
||||
Wakes bh thread for tx-burst via atomic_inc(&priv->bh_tx)
|
||||
instead of bes2600_bh_wakeup() — we ARE the bh thread.
|
||||
- Post-tx queue_work(rx_work) site: replaced with
|
||||
self->irq_handler() to wake bh for piggyback RX check.
|
||||
|
||||
Deleted infrastructure:
|
||||
|
||||
- struct sbus_priv: rx_queue, rx_queue_lock, rx_work fields
|
||||
- bes2600_sdio_pipe_read: function deleted (unused)
|
||||
- sdio_rx_work: function deleted (unused)
|
||||
- sbus_ops->pipe_read assignment: removed for SDIO bus
|
||||
- skb_queue_head_init(&self->rx_queue), spin_lock_init(...),
|
||||
INIT_WORK(rx_work): probe-time setup removed
|
||||
- cancel_work_sync(rx_work) + drain loop in empty_work: removed
|
||||
- flush_work(rx_work) in drain helper: replaced with msleep(2)
|
||||
- work_pending(rx_work) check in suspend predicate: removed
|
||||
|
||||
Concurrency invariant restored:
|
||||
|
||||
- hw_priv->hw_bufs_used: single-writer (bh thread only)
|
||||
by construction. No atomic_t needed.
|
||||
- hw_priv->hw_bufs_used_vif[]: ditto.
|
||||
- hw_priv->wsm_tx_pending[]: ditto.
|
||||
- All other shared state: unchanged or already protected.
|
||||
|
||||
Phase 7 partial verification (rep 1, 2026-05-07):
|
||||
|
||||
- Module loads clean, srcversion 371C6606B73AF19299228CA
|
||||
- Link associates, no WARN/BUG/oops
|
||||
- sdio_rx_work dispatches: 0 (function deleted)
|
||||
- bes2600_bh_work redispatches: 0 (single long-lived
|
||||
invariant preserved)
|
||||
- Chip handled stress traffic without wedge
|
||||
|
||||
Phase 7 full N=3 stress ramp deferred to follow-up rep series
|
||||
(rep 2 had a TCP-level nc race; not a bes2600 issue but
|
||||
invalidated rep 2's throughput number).
|
||||
---
|
||||
bes2600/bes2600_sdio.c | 144 ++++++++++++++++++++++++-----------------
|
||||
bes2600/bh.c | 129 ++++++++++++++++++++++++++++++++++--
|
||||
bes2600/bh.h | 9 +++
|
||||
bes2600/sbus.h | 8 +++
|
||||
4 files changed, 226 insertions(+), 64 deletions(-)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/bes2600_sdio.c b/drivers/staging/bes2600/bes2600_sdio.c
|
||||
index c0b67b0..ba1e1c3 100644
|
||||
--- a/drivers/staging/bes2600/bes2600_sdio.c
|
||||
+++ b/drivers/staging/bes2600/bes2600_sdio.c
|
||||
@@ -29,6 +29,7 @@
|
||||
#include <linux/of_gpio.h>
|
||||
|
||||
#include "bes2600.h"
|
||||
+#include "bh.h"
|
||||
#include "sbus.h"
|
||||
#include "bes2600_plat.h"
|
||||
#include "hwio.h"
|
||||
@@ -71,10 +72,12 @@ struct sbus_priv {
|
||||
int rx_data_toggle;
|
||||
#endif
|
||||
#ifdef BES_SDIO_RX_MULTIPLE_ENABLE
|
||||
- spinlock_t rx_queue_lock;
|
||||
- struct sk_buff_head rx_queue;
|
||||
+ /*
|
||||
+ * Patch C v3: rx_queue, rx_queue_lock, rx_work removed (no relay).
|
||||
+ * The bh thread now reads RX inline; the rx_buffer scratch area
|
||||
+ * stays. Counters/timestamps stay for debugfs visibility.
|
||||
+ */
|
||||
u8 *rx_buffer;
|
||||
- struct work_struct rx_work;
|
||||
u32 rx_last_ctrl;
|
||||
u32 rx_valid_ctrl;
|
||||
u32 rx_total_ctrl_cnt;
|
||||
@@ -410,10 +413,19 @@ static void bes2600_sdio_irq_handler(struct sdio_func *func)
|
||||
|
||||
bes_devel("%s called, fw_started:%d \n",
|
||||
__func__, self->fw_started);
|
||||
- if (likely(self->fw_started && self->core)) {
|
||||
- queue_work(self->sdio_wq, &self->rx_work);
|
||||
+ /*
|
||||
+ * Patch C v3: no more sdio_rx_work relay. Wake the bh thread
|
||||
+ * directly via self->irq_handler (bes2600_irq_handler in bh.c
|
||||
+ * which bumps bh_rx atomic + wakes bh_wq). The bh thread will
|
||||
+ * then call sbus_ops->bus_rx_batch() to do the SDIO read inline.
|
||||
+ * Matches cw1200 mainline IRQ → bh-direct architecture.
|
||||
+ */
|
||||
+ if (likely(self->fw_started && self->core && self->irq_handler)) {
|
||||
+ spin_lock_irqsave(&self->lock, flags);
|
||||
+ self->irq_handler(self->irq_priv);
|
||||
+ spin_unlock_irqrestore(&self->lock, flags);
|
||||
self->last_irq_timestamp = jiffies;
|
||||
- } else if(self->irq_handler) {
|
||||
+ } else if (self->irq_handler) {
|
||||
spin_lock_irqsave(&self->lock, flags);
|
||||
self->irq_handler(self->irq_priv);
|
||||
spin_unlock_irqrestore(&self->lock, flags);
|
||||
@@ -810,10 +822,15 @@ static int bes2600_sdio_extract_packets(struct sbus_priv *self, u32 ctrl_reg, u8
|
||||
skb_put(skb, packet_len);
|
||||
memcpy(skb->data, &data[pos], packet_len);
|
||||
bes_devel("%s, %d,%d\n", __func__, packet_len, pos);
|
||||
- spin_lock(&self->rx_queue_lock);
|
||||
- skb_queue_tail(&self->rx_queue, skb);
|
||||
self->rx_data_cnt++;
|
||||
- spin_unlock(&self->rx_queue_lock);
|
||||
+ /*
|
||||
+ * Patch C v3: deliver the SKB directly into the WSM/mac80211
|
||||
+ * stack from the bh thread. No rx_queue, no inter-thread
|
||||
+ * handoff, no atomic_t needed on the counters that
|
||||
+ * wsm_release_tx_buffer touches — single-writer-from-bh is
|
||||
+ * preserved by construction. See bh.c for the contract block.
|
||||
+ */
|
||||
+ bes2600_bh_handle_rx_skb(self->core, skb);
|
||||
packet_len = (packet_len + 3) & (~0x3);
|
||||
pos += packet_len;
|
||||
#ifdef BES_SDIO_OPTIMIZED_LEN
|
||||
@@ -824,17 +841,31 @@ static int bes2600_sdio_extract_packets(struct sbus_priv *self, u32 ctrl_reg, u8
|
||||
return 0;
|
||||
}
|
||||
|
||||
-static void sdio_rx_work(struct work_struct *work)
|
||||
+/*
|
||||
+ * Patch C v3: bh thread calls this directly via sbus_ops->bus_rx_batch.
|
||||
+ * No more sdio_rx_work workqueue. SDIO read sequence (lock →
|
||||
+ * read_ctrl → memcpy_fromio → packets_check → extract_packets) runs
|
||||
+ * inline in bh-thread context. Each parsed SKB is delivered via
|
||||
+ * bes2600_bh_handle_rx_skb() from extract_packets — no rx_queue, no
|
||||
+ * second worker, no inter-thread handoff.
|
||||
+ *
|
||||
+ * Architecture matches cw1200 mainline. Single-writer-from-bh
|
||||
+ * invariant on hw_bufs_used preserved by construction.
|
||||
+ *
|
||||
+ * Returns 0 on success (caller's bh outer loop decides whether to
|
||||
+ * continue), negative on bus read error. On error: triggers
|
||||
+ * wifi_force_close (same as the old sdio_rx_work).
|
||||
+ */
|
||||
+static int bes2600_sdio_read_rx_batch(struct sbus_priv *self)
|
||||
{
|
||||
- int ret, again = 0, retry = 0, crc_retry = 0;
|
||||
+ int ret = 0, again = 0, retry = 0, crc_retry = 0;
|
||||
u32 ctrl_reg = 0;
|
||||
int total_len;
|
||||
- struct sbus_priv *self = container_of(work, struct sbus_priv, rx_work);
|
||||
u8 *buf = self->rx_buffer;
|
||||
|
||||
/* don't read/write sdio when sdio error */
|
||||
if (bes2600_chrdev_is_bus_error())
|
||||
- return;
|
||||
+ return 0;
|
||||
|
||||
bes2600_gpio_wakeup_mcu(self, GPIO_WAKE_FLAG_SDIO_RX);
|
||||
|
||||
@@ -889,6 +920,10 @@ static void sdio_rx_work(struct work_struct *work)
|
||||
goto failed;
|
||||
}
|
||||
|
||||
+ /*
|
||||
+ * extract_packets parses the multi-RX buffer and calls
|
||||
+ * bes2600_bh_handle_rx_skb() per SKB. No queueing.
|
||||
+ */
|
||||
if ((ret = bes2600_sdio_extract_packets(self, ctrl_reg, buf))) {
|
||||
bes_err("%s,%d error=%d\n", __func__, __LINE__, ret);
|
||||
goto failed;
|
||||
@@ -896,22 +931,16 @@ static void sdio_rx_work(struct work_struct *work)
|
||||
|
||||
ctrl_reg = 0;
|
||||
|
||||
- if (likely(self->irq_handler)) {
|
||||
- self->irq_handler(self->irq_priv);
|
||||
- } else {
|
||||
- bes_err("%s,%d\n", __func__, __LINE__);
|
||||
- goto failed;
|
||||
- }
|
||||
-
|
||||
} while (again);
|
||||
|
||||
bes2600_gpio_allow_mcu_sleep(self, GPIO_WAKE_FLAG_SDIO_RX);
|
||||
- return;
|
||||
+ return 0;
|
||||
|
||||
failed:
|
||||
bes2600_gpio_allow_mcu_sleep(self, GPIO_WAKE_FLAG_SDIO_RX);
|
||||
bes2600_chrdev_wifi_force_close(self->core, false);
|
||||
WARN_ON(1);
|
||||
+ return -1;
|
||||
}
|
||||
|
||||
static void sdio_scan_work(struct work_struct *work)
|
||||
@@ -919,26 +948,11 @@ static void sdio_scan_work(struct work_struct *work)
|
||||
bes_warn("%s: this function does nothing\n", __FUNCTION__);
|
||||
}
|
||||
|
||||
-static void *bes2600_sdio_pipe_read(struct sbus_priv *self)
|
||||
-{
|
||||
- struct sk_buff *skb;
|
||||
-
|
||||
- if (bes2600_chrdev_is_bus_error()) {
|
||||
- return bes2600_tx_loop_read(self->core);
|
||||
- }
|
||||
-
|
||||
- spin_lock(&self->rx_queue_lock);
|
||||
- skb = skb_dequeue(&self->rx_queue);
|
||||
- if (skb)
|
||||
- self->rx_proc_cnt++;
|
||||
- spin_unlock(&self->rx_queue_lock);
|
||||
- if (likely(self->fw_started == true &&
|
||||
- !bes2600_pwr_device_is_idle(self->core) &&
|
||||
- self->core->hw_bufs_used > 0))
|
||||
- if (!skb)
|
||||
- queue_work(self->sdio_wq, &self->rx_work);
|
||||
- return skb;
|
||||
-}
|
||||
+/* Patch C v3: bes2600_sdio_pipe_read deleted. bh thread reads the
|
||||
+ * SDIO bus inline via bes2600_sdio_read_rx_batch (sbus_ops->bus_rx_batch).
|
||||
+ * No rx_queue, no skb_dequeue, no relay. bes2600_tx_loop_read remains
|
||||
+ * for the test bus error-fallback path but is now invoked at higher
|
||||
+ * level. */
|
||||
|
||||
#endif
|
||||
|
||||
@@ -1175,7 +1189,14 @@ flush_previous:
|
||||
}
|
||||
} while (crc_retry <= 10);
|
||||
sdio_release_host(self->func);
|
||||
- queue_work(self->sdio_wq, &self->rx_work);
|
||||
+ /*
|
||||
+ * Patch C v3: wake the bh thread to check for any RX
|
||||
+ * that piggybacked on this TX window. Bumps bh_rx
|
||||
+ * atomic; bh's wait_event will pick it up and call
|
||||
+ * sbus_ops->bus_rx_batch().
|
||||
+ */
|
||||
+ if (likely(self->irq_handler))
|
||||
+ self->irq_handler(self->irq_priv);
|
||||
if (ret) {
|
||||
bes_err("%s,%d err=%d,%d,%d\n", __func__, __LINE__, ret, scatters, cur_blk);
|
||||
sdio_work_debug(self);
|
||||
@@ -1226,12 +1247,11 @@ static int bes2600_sdio_misc_init(struct sbus_priv *self, struct bes2600_common
|
||||
self->next_toggle = 0;
|
||||
#endif
|
||||
#ifdef BES_SDIO_RX_MULTIPLE_ENABLE
|
||||
- spin_lock_init(&self->rx_queue_lock);
|
||||
- skb_queue_head_init(&self->rx_queue);
|
||||
+ /* Patch C v3: rx_queue / rx_queue_lock removed (no relay). */
|
||||
self->rx_buffer = (u8 *)__get_dma_pages(GFP_KERNEL, get_order(1632 * BES_SDIO_RX_MULTIPLE_NUM));
|
||||
if (!self->rx_buffer)
|
||||
return -ENOMEM;
|
||||
- INIT_WORK(&self->rx_work, sdio_rx_work);
|
||||
+ /* Patch C v3: sdio_rx_work removed; bh thread does the read. */
|
||||
#endif
|
||||
#ifdef BES_SDIO_TX_MULTIPLE_ENABLE
|
||||
INIT_LIST_HEAD(&self->tx_bufferlist);
|
||||
@@ -1560,22 +1580,15 @@ err:
|
||||
|
||||
static void bes2600_sdio_empty_work(struct sbus_priv *self)
|
||||
{
|
||||
-#ifdef BES_SDIO_RX_MULTIPLE_ENABLE
|
||||
- struct sk_buff *skb;
|
||||
-#endif
|
||||
#ifdef BES_SDIO_TX_MULTIPLE_ENABLE
|
||||
struct bes_sdio_tx_list_t *tx_buffer, *temp;
|
||||
#endif
|
||||
|
||||
#ifdef BES_SDIO_RX_MULTIPLE_ENABLE
|
||||
- cancel_work_sync(&self->rx_work);
|
||||
- while (1) {
|
||||
- skb = skb_dequeue(&self->rx_queue);
|
||||
- if (skb)
|
||||
- dev_kfree_skb(skb);
|
||||
- else
|
||||
- break;
|
||||
- }
|
||||
+ /*
|
||||
+ * Patch C v3: rx_work and rx_queue removed. Counters still
|
||||
+ * reset for the next attach cycle.
|
||||
+ */
|
||||
self->rx_last_ctrl = 0;
|
||||
self->rx_total_ctrl_cnt = 0;
|
||||
self->rx_continuous_ctrl_cnt = 0;
|
||||
@@ -1843,7 +1856,8 @@ static struct sbus_ops bes2600_sdio_sbus_ops = {
|
||||
.sbus_reg_write = bes2600_sdio_reg_write,
|
||||
.init = bes2600_sdio_misc_init,
|
||||
#ifdef BES_SDIO_RX_MULTIPLE_ENABLE
|
||||
- .pipe_read = bes2600_sdio_pipe_read,
|
||||
+ /* Patch C v3: .pipe_read removed; bus_rx_batch replaces it. */
|
||||
+ .bus_rx_batch = bes2600_sdio_read_rx_batch,
|
||||
#endif
|
||||
#ifdef BES_SDIO_TX_MULTIPLE_ENABLE
|
||||
.pipe_send = bes2600_sdio_pipe_send,
|
||||
@@ -1863,9 +1877,15 @@ static void bes2600_sdio_en_lp_cb(struct bes2600_common *hw_priv)
|
||||
long unsigned int old_ts, new_ts;
|
||||
struct sbus_priv *self = hw_priv->sbus_priv;
|
||||
|
||||
+ /*
|
||||
+ * Patch C v3: rx_work removed. Wait for IRQ-timestamp activity
|
||||
+ * to settle by polling self->last_irq_timestamp via msleep
|
||||
+ * (best-effort). The caller already knows the bh thread will
|
||||
+ * process pending bh_rx during its next wait_event round.
|
||||
+ */
|
||||
do {
|
||||
old_ts = self->last_irq_timestamp;
|
||||
- flush_work(&self->rx_work);
|
||||
+ msleep(2);
|
||||
new_ts = self->last_irq_timestamp;
|
||||
} while(old_ts != new_ts);
|
||||
}
|
||||
@@ -2202,8 +2222,12 @@ static int bes2600_sdio_suspend_noirq(struct device *dev)
|
||||
if (func->num > 1)
|
||||
return 0;
|
||||
|
||||
- if(self->core &&
|
||||
- (work_pending(&self->rx_work) || atomic_read(&self->core->bh_rx))) {
|
||||
+ /*
|
||||
+ * Patch C v3: work_pending(&self->rx_work) check dropped (no
|
||||
+ * relay). bh_rx atomic alone tells us whether the bh thread
|
||||
+ * has un-processed RX events queued.
|
||||
+ */
|
||||
+ if (self->core && atomic_read(&self->core->bh_rx)) {
|
||||
bes_devel("%s: Suspend interrupted.\n", __func__);
|
||||
return -EAGAIN;
|
||||
}
|
||||
diff --git a/drivers/staging/bes2600/bh.c b/drivers/staging/bes2600/bh.c
|
||||
index fab3bf0..febcaf4 100644
|
||||
--- a/drivers/staging/bes2600/bh.c
|
||||
+++ b/drivers/staging/bes2600/bh.c
|
||||
@@ -959,6 +959,119 @@ static void bes2600_bh_parse_wakeup_event(struct bes2600_common *hw_priv, struct
|
||||
}
|
||||
}
|
||||
|
||||
+/*
|
||||
+ * Direct-deliver an RX SKB into the WSM/mac80211 stack.
|
||||
+ *
|
||||
+ * Patch C v3 (no-relay architecture, matches cw1200): the bh thread
|
||||
+ * calls bes2600_sdio_read_rx_batch which calls
|
||||
+ * bes2600_sdio_extract_packets which calls THIS function per parsed
|
||||
+ * SKB. No rx_queue, no sdio_rx_work, no inter-thread handoff.
|
||||
+ *
|
||||
+ * Single-writer-from-bh invariant on hw_priv->hw_bufs_used,
|
||||
+ * hw_priv->hw_bufs_used_vif[] and hw_priv->wsm_tx_pending[] is
|
||||
+ * preserved BY CONSTRUCTION — there is now only one writer (the bh
|
||||
+ * thread itself), same as cw1200's design. No atomic_t conversion
|
||||
+ * needed.
|
||||
+ *
|
||||
+ * Contract:
|
||||
+ * - process context, sleepable. wsm_handle_rx (wsm.c, EXPORT_SYMBOL)
|
||||
+ * acquires wsm_cmd.lock and may sleep on wait_event_timeout.
|
||||
+ * - caller holds no bes2600 spinlock. bes2600_sdio_unlock(self) is
|
||||
+ * called inside read_rx_batch before extract_packets is invoked.
|
||||
+ * - SKB ownership: function frees on every path (success + error).
|
||||
+ * - No need to wake the bh thread on TX-confirm — we ARE the bh
|
||||
+ * thread; tx_burst is signalled by returning *tx_out = 1 to the
|
||||
+ * caller (bh_rx_helper), which propagates it to bh's outer loop.
|
||||
+ */
|
||||
+int bes2600_bh_handle_rx_skb(struct bes2600_common *priv, struct sk_buff *skb)
|
||||
+{
|
||||
+ struct wsm_hdr *wsm;
|
||||
+ size_t wsm_len;
|
||||
+ u16 wsm_id;
|
||||
+ u8 wsm_seq;
|
||||
+ int tx = 0;
|
||||
+ u32 confirm_label = 0x0;
|
||||
+
|
||||
+ if (!skb)
|
||||
+ return 0;
|
||||
+
|
||||
+ wsm = (struct wsm_hdr *)skb->data;
|
||||
+ wsm_len = __le16_to_cpu(wsm->len);
|
||||
+ if (WARN_ON(wsm_len > skb->len)) {
|
||||
+ bes_err("wsm_len err %d %d\n", (int)wsm_len, (int)skb->len);
|
||||
+ dev_kfree_skb(skb);
|
||||
+ return -1;
|
||||
+ }
|
||||
+
|
||||
+ if (priv->wsm_enable_wsm_dumps)
|
||||
+ print_hex_dump(KERN_DEBUG, "<-- ", DUMP_PREFIX_NONE, 16, 1,
|
||||
+ skb->data, wsm_len, false);
|
||||
+
|
||||
+ wsm_id = __le16_to_cpu(wsm->id) & 0xFFF;
|
||||
+ wsm_seq = (__le16_to_cpu(wsm->id) >> 13) & 7;
|
||||
+ bes_devel("bes2600_bh_handle_rx_skb wsm_id:0x%04x seq:%d\n",
|
||||
+ wsm_id, wsm_seq);
|
||||
+
|
||||
+ skb_trim(skb, wsm_len);
|
||||
+
|
||||
+ if (wsm_id == 0x0800) {
|
||||
+ wsm_handle_exception(priv,
|
||||
+ &skb->data[sizeof(*wsm)],
|
||||
+ wsm_len - sizeof(*wsm));
|
||||
+ bes_err("wsm exception\n");
|
||||
+ dev_kfree_skb(skb);
|
||||
+ return -1;
|
||||
+ } else if ((wsm_seq != priv->wsm_rx_seq[WSM_TXRX_SEQ_IDX(wsm_id)])) {
|
||||
+ bes_err("seq error! %u. %u. 0x%x.", wsm_seq,
|
||||
+ priv->wsm_rx_seq[WSM_TXRX_SEQ_IDX(wsm_id)], wsm_id);
|
||||
+ dev_kfree_skb(skb);
|
||||
+ return -1;
|
||||
+ }
|
||||
+
|
||||
+ bes2600_bh_parse_wakeup_event(priv, skb);
|
||||
+
|
||||
+ priv->wsm_rx_seq[WSM_TXRX_SEQ_IDX(wsm_id)] = (wsm_seq + 1) & 7;
|
||||
+
|
||||
+ if (IS_DRIVER_TO_MCU_CMD(wsm_id))
|
||||
+ confirm_label = __le32_to_cpu(((struct wsm_mcu_hdr *)wsm)->handle_label);
|
||||
+
|
||||
+ if (WSM_CONFIRM_CONDITION(wsm_id, confirm_label)) {
|
||||
+ int rc = wsm_release_tx_buffer(priv, 1);
|
||||
+ bes2600_bh_dec_pending_count(priv, WSM_TXRX_SEQ_IDX(wsm->id));
|
||||
+
|
||||
+ if (rc < 0) {
|
||||
+ bes_err("wsm_release_tx_buffer failed: %d\n", rc);
|
||||
+ dev_kfree_skb(skb);
|
||||
+ return rc;
|
||||
+ } else if (rc > 0) {
|
||||
+ tx = 1;
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ /* wsm_handle_rx takes care of SKB lifetime: zeroes *skb_p if consumed. */
|
||||
+ if (wsm_handle_rx(priv, wsm_id, wsm, &skb)) {
|
||||
+ bes_err("wsm_handle_rx failed (id=0x%04x)\n", wsm_id);
|
||||
+ if (skb)
|
||||
+ dev_kfree_skb(skb);
|
||||
+ return -1;
|
||||
+ }
|
||||
+
|
||||
+ if (skb)
|
||||
+ dev_kfree_skb(skb);
|
||||
+
|
||||
+ /*
|
||||
+ * Signal "tx side has new headroom" via atomic so the bh outer
|
||||
+ * loop's wait_event predicate notices on its next wait. No
|
||||
+ * cross-thread wake needed because we are the bh thread; the
|
||||
+ * outer loop will pick this up after read_rx_batch returns.
|
||||
+ */
|
||||
+ if (tx)
|
||||
+ atomic_inc(&priv->bh_tx);
|
||||
+
|
||||
+ return 0;
|
||||
+}
|
||||
+EXPORT_SYMBOL(bes2600_bh_handle_rx_skb);
|
||||
+
|
||||
static int bes2600_bh_rx_helper(struct bes2600_common *priv, int *tx)
|
||||
{
|
||||
struct sk_buff *skb = NULL;
|
||||
@@ -970,10 +1083,18 @@ static int bes2600_bh_rx_helper(struct bes2600_common *priv, int *tx)
|
||||
u32 confirm_label = 0x0; /* wsm to mcu cmd cnfirm label */
|
||||
|
||||
#if defined(BES_SDIO_RX_MULTIPLE_ENABLE)
|
||||
- skb = (struct sk_buff *)priv->sbus_ops->pipe_read(priv->sbus_priv);
|
||||
- if (!skb)
|
||||
- return 0;
|
||||
- rx = 1; // always consider rx pipe not empty
|
||||
+ /*
|
||||
+ * Patch C v3: the bh thread does the SDIO read inline via
|
||||
+ * sbus_ops->bus_rx_batch. bes2600_sdio_read_rx_batch reads the
|
||||
+ * multi-RX coalesced frames out of the chip and delivers each
|
||||
+ * one inline via bes2600_bh_handle_rx_skb (no rx_queue, no
|
||||
+ * pipe_read, no inter-thread handoff). Return value: 0 on
|
||||
+ * success (bh outer loop will check whether to continue),
|
||||
+ * negative on read error.
|
||||
+ */
|
||||
+ if (priv->sbus_ops->bus_rx_batch)
|
||||
+ return priv->sbus_ops->bus_rx_batch(priv->sbus_priv);
|
||||
+ return 0;
|
||||
#else
|
||||
u32 ctrl_reg = 0;
|
||||
size_t read_len = 0;
|
||||
diff --git a/drivers/staging/bes2600/bh.h b/drivers/staging/bes2600/bh.h
|
||||
index 7be82dc..9ed08b1 100644
|
||||
--- a/drivers/staging/bes2600/bh.h
|
||||
+++ b/drivers/staging/bes2600/bh.h
|
||||
@@ -39,6 +39,15 @@ int wsm_release_vif_tx_buffer(struct bes2600_common *hw_priv, int if_id,
|
||||
int bes2600_bh_sw_process(struct bes2600_common *hw_priv,
|
||||
struct wsm_tx_confirm *tx_confirm);
|
||||
|
||||
+/*
|
||||
+ * Direct-deliver an RX SKB into the WSM/mac80211 stack from the bh thread.
|
||||
+ * Called by bes2600_sdio_extract_packets per RX frame, no queueing.
|
||||
+ * Process context, sleepable, caller holds no bes2600 spinlock.
|
||||
+ * Function frees skb on every path. See bh.c for full contract.
|
||||
+ */
|
||||
+int bes2600_bh_handle_rx_skb(struct bes2600_common *hw_priv,
|
||||
+ struct sk_buff *skb);
|
||||
+
|
||||
void bes2600_bh_inc_pending_count(struct bes2600_common *hw_priv, int idx);
|
||||
void bes2600_bh_dec_pending_count(struct bes2600_common *hw_priv, int idx);
|
||||
|
||||
diff --git a/drivers/staging/bes2600/sbus.h b/drivers/staging/bes2600/sbus.h
|
||||
index cb90890..96b1d4c 100644
|
||||
--- a/drivers/staging/bes2600/sbus.h
|
||||
+++ b/drivers/staging/bes2600/sbus.h
|
||||
@@ -83,6 +83,14 @@ struct sbus_ops {
|
||||
* Returns 0 on success or a negative errno.
|
||||
*/
|
||||
int (*bus_reset)(struct sbus_priv *self);
|
||||
+ /*
|
||||
+ * Read a batch of RX frames inline from the bus and deliver each
|
||||
+ * one via bes2600_bh_handle_rx_skb(). Called from the bh thread
|
||||
+ * (process context, sleepable). Replaces the
|
||||
+ * sdio_rx_work + rx_queue + pipe_read relay (Patch C v3, 2026).
|
||||
+ * Returns 0 on success, negative on read error.
|
||||
+ */
|
||||
+ int (*bus_rx_batch)(struct sbus_priv *self);
|
||||
};
|
||||
|
||||
void bes2600_irq_handler(struct bes2600_common *priv);
|
||||
--
|
||||
2.54.0
|
||||
|
||||
+1154
File diff suppressed because it is too large
Load Diff
+313
@@ -0,0 +1,313 @@
|
||||
From 93f2aab65682d0ea1938607e7426257e9758d6c0 Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: Fri, 8 May 2026 00:17:46 +0200
|
||||
Subject: [PATCH 15/20] =?UTF-8?q?bes2600:=20Patch=20D=20=E2=80=94=20atomic?=
|
||||
=?UTF-8?q?ize=20ba=5Flock=20counters,=20drop=20the=20spinlock?=
|
||||
MIME-Version: 1.0
|
||||
Content-Type: text/plain; charset=UTF-8
|
||||
Content-Transfer-Encoding: 8bit
|
||||
|
||||
The block-ack policy uses 4 int counters (ba_acc, ba_cnt, ba_acc_rx,
|
||||
ba_cnt_rx) bumped per data frame in the TX and RX hot paths under
|
||||
spin_lock_bh(&hw_priv->ba_lock). The lock was the heaviest per-frame
|
||||
synchronization cost remaining after Patch C v3 (which fixed the
|
||||
sdio_rx_work relay). Per the Opus structural critique (PR #8), this
|
||||
pattern matches mac80211 driver convention for per-frame statistics:
|
||||
atomic_t suffices, no lock needed.
|
||||
|
||||
Field-by-field changes in struct bes2600_common:
|
||||
ba_acc, ba_cnt, ba_acc_rx, ba_cnt_rx: int -> atomic_t
|
||||
ba_armed: new atomic_t (timer-arm flag)
|
||||
ba_ena: bool -> atomic_t
|
||||
ba_lock: removed (spinlock_t deleted)
|
||||
ba_hist: int (single-writer = ba_timer)
|
||||
|
||||
Producer hot path (txrx.c TX submit + RX receive):
|
||||
- atomic_add for the byte accumulator
|
||||
- atomic_inc for the frame counter
|
||||
- atomic_cmpxchg(&ba_armed, 0, 1) to claim the once-per-window
|
||||
mod_timer arm — at most ONE producer succeeds; race-free
|
||||
- no spin_lock_bh
|
||||
|
||||
Consumer paths (sta.c bes2600_ba_timer, sta.c disconnect-reset, sta.c
|
||||
bes2600_ba_work, debug.c debugfs reader):
|
||||
- atomic_read snapshots all 4 counters into locals; the threshold
|
||||
predicate (acc/cnt >= THLD) tolerates approximate snapshots — the
|
||||
timer fires periodically, a single misclassification just delays
|
||||
the policy update by one tick
|
||||
- atomic_set zeroes the counters at end of timer-callback window;
|
||||
racing producer increments after the snapshot are lost (acceptable
|
||||
for stats; same approximation the original lock allowed under
|
||||
contention)
|
||||
- atomic_set(&ba_armed, 0) re-enables the next window's arm
|
||||
|
||||
Followup-amenable simplification: ba_hist remains int because only
|
||||
the single ba_timer callback writes it; multiple writers would need
|
||||
to upgrade it too.
|
||||
|
||||
This patch follows the cw1200-mainline-idiom established by Patch C v3
|
||||
(structural fix, not bandaid). The cw1200 reference doesn't have a
|
||||
similar lock to compare; bes2600 inherited this from a later
|
||||
Bestechnic addition rather than the upstream tree.
|
||||
---
|
||||
bes2600/bes2600.h | 26 ++++++++++------
|
||||
bes2600/debug.c | 13 +++++---
|
||||
bes2600/main.c | 2 +-
|
||||
bes2600/sta.c | 77 ++++++++++++++++++++++++++++-------------------
|
||||
bes2600/txrx.c | 23 ++++++++------
|
||||
5 files changed, 85 insertions(+), 56 deletions(-)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/bes2600.h b/drivers/staging/bes2600/bes2600.h
|
||||
index 84059c7..32bce5e 100644
|
||||
--- a/drivers/staging/bes2600/bes2600.h
|
||||
+++ b/drivers/staging/bes2600/bes2600.h
|
||||
@@ -353,15 +353,23 @@ struct bes2600_common {
|
||||
* Keeping in common structure for the time being. Will be moved to VIFF
|
||||
* after the mechanism is clear */
|
||||
u8 ba_tid_mask;
|
||||
- int ba_acc; /*TODO: Same as above */
|
||||
- int ba_cnt; /*TODO: Same as above */
|
||||
- int ba_cnt_rx; /*TODO: Same as above */
|
||||
- int ba_acc_rx; /*TODO: Same as above */
|
||||
- int ba_hist; /*TODO: Same as above */
|
||||
- struct timer_list ba_timer;/*TODO: Same as above */
|
||||
- spinlock_t ba_lock; /*TODO: Same as above */
|
||||
- bool ba_ena; /*TODO: Same as above */
|
||||
- struct work_struct ba_work; /*TODO: Same as above */
|
||||
+ /*
|
||||
+ * Patch D: ba_lock removed. Per-frame TX/RX hot-path bumped these
|
||||
+ * counters under spin_lock_bh; the lock did not protect any
|
||||
+ * compound invariant that atomic ops can't satisfy. Counters are
|
||||
+ * now atomic_t; ba_armed gates the once-per-window mod_timer
|
||||
+ * arm via cmpxchg so concurrent TX/RX at a fresh window each
|
||||
+ * try to claim the arm and exactly one succeeds.
|
||||
+ */
|
||||
+ atomic_t ba_acc;
|
||||
+ atomic_t ba_cnt;
|
||||
+ atomic_t ba_cnt_rx;
|
||||
+ atomic_t ba_acc_rx;
|
||||
+ atomic_t ba_armed;
|
||||
+ int ba_hist;
|
||||
+ struct timer_list ba_timer;
|
||||
+ atomic_t ba_ena;
|
||||
+ struct work_struct ba_work;
|
||||
bool is_BT_Present;
|
||||
bool is_go_thru_go_neg;
|
||||
u8 conf_listen_interval;
|
||||
diff --git a/drivers/staging/bes2600/debug.c b/drivers/staging/bes2600/debug.c
|
||||
index 47e27be..0ab79c0 100644
|
||||
--- a/drivers/staging/bes2600/debug.c
|
||||
+++ b/drivers/staging/bes2600/debug.c
|
||||
@@ -110,17 +110,20 @@ static int bes2600_status_show_common(struct seq_file *seq, void *v)
|
||||
int ba_cnt, ba_acc, ba_cnt_rx, ba_acc_rx, ba_avg = 0, ba_avg_rx = 0;
|
||||
bool ba_ena;
|
||||
|
||||
- spin_lock_bh(&hw_priv->ba_lock);
|
||||
- ba_cnt = hw_priv->debug->ba_cnt;
|
||||
- ba_acc = hw_priv->debug->ba_acc;
|
||||
+ /*
|
||||
+ * Patch D: ba_lock removed. hw_priv->debug->ba_* are written only
|
||||
+ * by the timer callback (single writer); reading without a lock is
|
||||
+ * fine for stats. ba_ena is atomic_t.
|
||||
+ */
|
||||
+ ba_cnt = hw_priv->debug->ba_cnt;
|
||||
+ ba_acc = hw_priv->debug->ba_acc;
|
||||
ba_cnt_rx = hw_priv->debug->ba_cnt_rx;
|
||||
ba_acc_rx = hw_priv->debug->ba_acc_rx;
|
||||
- ba_ena = hw_priv->ba_ena;
|
||||
+ ba_ena = !!atomic_read(&hw_priv->ba_ena);
|
||||
if (ba_cnt)
|
||||
ba_avg = ba_acc / ba_cnt;
|
||||
if (ba_cnt_rx)
|
||||
ba_avg_rx = ba_acc_rx / ba_cnt_rx;
|
||||
- spin_unlock_bh(&hw_priv->ba_lock);
|
||||
|
||||
seq_puts(seq, "BES2600 Wireless LAN driver status\n");
|
||||
seq_printf(seq, "Hardware: %d.%d\n",
|
||||
diff --git a/drivers/staging/bes2600/main.c b/drivers/staging/bes2600/main.c
|
||||
index 02a79c0..76ca668 100644
|
||||
--- a/drivers/staging/bes2600/main.c
|
||||
+++ b/drivers/staging/bes2600/main.c
|
||||
@@ -501,7 +501,7 @@ static struct ieee80211_hw *bes2600_init_common(size_t hw_priv_data_len)
|
||||
INIT_LIST_HEAD(&hw_priv->event_queue);
|
||||
INIT_WORK(&hw_priv->event_handler, bes2600_event_handler);
|
||||
INIT_WORK(&hw_priv->ba_work, bes2600_ba_work);
|
||||
- spin_lock_init(&hw_priv->ba_lock);
|
||||
+ /* Patch D: ba_lock removed; ba_acc/ba_cnt/etc are atomic_t. */
|
||||
timer_setup(&hw_priv->ba_timer, bes2600_ba_timer, 0);
|
||||
|
||||
if (unlikely(bes2600_queue_stats_init(&hw_priv->tx_queue_stats,
|
||||
diff --git a/drivers/staging/bes2600/sta.c b/drivers/staging/bes2600/sta.c
|
||||
index 2ba9a0a..412b2c4 100644
|
||||
--- a/drivers/staging/bes2600/sta.c
|
||||
+++ b/drivers/staging/bes2600/sta.c
|
||||
@@ -2362,14 +2362,19 @@ void bes2600_join_work(struct work_struct *work)
|
||||
//WARN_ON(wsm_reset(hw_priv, &reset, priv->if_id));
|
||||
WARN_ON(wsm_set_block_ack_policy(hw_priv,
|
||||
0, hw_priv->ba_tid_mask, priv->if_id));
|
||||
- spin_lock_bh(&hw_priv->ba_lock);
|
||||
- hw_priv->ba_ena = false;
|
||||
- hw_priv->ba_cnt = 0;
|
||||
- hw_priv->ba_acc = 0;
|
||||
+ /*
|
||||
+ * Patch D: ba_lock removed. Disconnect-reset clears the
|
||||
+ * counters and the arm flag; producers racing here cannot
|
||||
+ * cause harm — at worst they re-arm the timer and bump
|
||||
+ * counters that will be cleared on the next timer tick.
|
||||
+ */
|
||||
+ atomic_set(&hw_priv->ba_ena, 0);
|
||||
+ atomic_set(&hw_priv->ba_cnt, 0);
|
||||
+ atomic_set(&hw_priv->ba_acc, 0);
|
||||
hw_priv->ba_hist = 0;
|
||||
- hw_priv->ba_cnt_rx = 0;
|
||||
- hw_priv->ba_acc_rx = 0;
|
||||
- spin_unlock_bh(&hw_priv->ba_lock);
|
||||
+ atomic_set(&hw_priv->ba_cnt_rx, 0);
|
||||
+ atomic_set(&hw_priv->ba_acc_rx, 0);
|
||||
+ atomic_set(&hw_priv->ba_armed, 0);
|
||||
|
||||
mgmt_policy.protectedMgmtEnable = 0;
|
||||
mgmt_policy.unprotectedMgmtFramesAllowed = 1;
|
||||
@@ -2649,10 +2654,11 @@ void bes2600_ba_work(struct work_struct *work)
|
||||
return;*/
|
||||
|
||||
bes_devel("BA work****\n");
|
||||
- spin_lock_bh(&hw_priv->ba_lock);
|
||||
-// tx_ba_tid_mask = hw_priv->ba_ena ? hw_priv->ba_tid_mask : 0;
|
||||
+ /*
|
||||
+ * Patch D: ba_lock removed. ba_tid_mask is u8 set once at init
|
||||
+ * (main.c); reading it without a lock is fine.
|
||||
+ */
|
||||
tx_ba_tid_mask = hw_priv->ba_tid_mask;
|
||||
- spin_unlock_bh(&hw_priv->ba_lock);
|
||||
|
||||
wsm_lock_tx(hw_priv);
|
||||
|
||||
@@ -2665,37 +2671,49 @@ void bes2600_ba_work(struct work_struct *work)
|
||||
void bes2600_ba_timer(struct timer_list *t)
|
||||
{
|
||||
bool ba_ena;
|
||||
+ int cnt, acc, cnt_rx, acc_rx;
|
||||
struct bes2600_common *hw_priv = timer_container_of(hw_priv, t, ba_timer);
|
||||
|
||||
- spin_lock_bh(&hw_priv->ba_lock);
|
||||
- bes2600_debug_ba(hw_priv, hw_priv->ba_cnt, hw_priv->ba_acc,
|
||||
- hw_priv->ba_cnt_rx, hw_priv->ba_acc_rx);
|
||||
+ /*
|
||||
+ * Patch D: ba_lock removed. Snapshot atomic counters into locals
|
||||
+ * for the predicate evaluation; producers may race incrementing
|
||||
+ * after the snapshot but the resulting decision is approximate
|
||||
+ * which the policy already tolerates (next timer tick re-evaluates).
|
||||
+ */
|
||||
+ cnt = atomic_read(&hw_priv->ba_cnt);
|
||||
+ acc = atomic_read(&hw_priv->ba_acc);
|
||||
+ cnt_rx = atomic_read(&hw_priv->ba_cnt_rx);
|
||||
+ acc_rx = atomic_read(&hw_priv->ba_acc_rx);
|
||||
+
|
||||
+ bes2600_debug_ba(hw_priv, cnt, acc, cnt_rx, acc_rx);
|
||||
|
||||
if (atomic_read(&hw_priv->scan.in_progress)) {
|
||||
- hw_priv->ba_cnt = 0;
|
||||
- hw_priv->ba_acc = 0;
|
||||
- hw_priv->ba_cnt_rx = 0;
|
||||
- hw_priv->ba_acc_rx = 0;
|
||||
- goto skip_statistic_update;
|
||||
+ atomic_set(&hw_priv->ba_cnt, 0);
|
||||
+ atomic_set(&hw_priv->ba_acc, 0);
|
||||
+ atomic_set(&hw_priv->ba_cnt_rx, 0);
|
||||
+ atomic_set(&hw_priv->ba_acc_rx, 0);
|
||||
+ atomic_set(&hw_priv->ba_armed, 0);
|
||||
+ return;
|
||||
}
|
||||
|
||||
- if (hw_priv->ba_cnt >= BES2600_BLOCK_ACK_CNT &&
|
||||
- (hw_priv->ba_acc / hw_priv->ba_cnt >= BES2600_BLOCK_ACK_THLD ||
|
||||
- (hw_priv->ba_cnt_rx >= BES2600_BLOCK_ACK_CNT &&
|
||||
- hw_priv->ba_acc_rx / hw_priv->ba_cnt_rx >=
|
||||
+ if (cnt >= BES2600_BLOCK_ACK_CNT &&
|
||||
+ (acc / cnt >= BES2600_BLOCK_ACK_THLD ||
|
||||
+ (cnt_rx >= BES2600_BLOCK_ACK_CNT &&
|
||||
+ acc_rx / cnt_rx >=
|
||||
BES2600_BLOCK_ACK_THLD)))
|
||||
ba_ena = true;
|
||||
else
|
||||
ba_ena = false;
|
||||
|
||||
- hw_priv->ba_cnt = 0;
|
||||
- hw_priv->ba_acc = 0;
|
||||
- hw_priv->ba_cnt_rx = 0;
|
||||
- hw_priv->ba_acc_rx = 0;
|
||||
+ atomic_set(&hw_priv->ba_cnt, 0);
|
||||
+ atomic_set(&hw_priv->ba_acc, 0);
|
||||
+ atomic_set(&hw_priv->ba_cnt_rx, 0);
|
||||
+ atomic_set(&hw_priv->ba_acc_rx, 0);
|
||||
+ atomic_set(&hw_priv->ba_armed, 0);
|
||||
|
||||
- if (ba_ena != hw_priv->ba_ena) {
|
||||
+ if (ba_ena != !!atomic_read(&hw_priv->ba_ena)) {
|
||||
if (ba_ena || ++hw_priv->ba_hist >= BES2600_BLOCK_ACK_HIST) {
|
||||
- hw_priv->ba_ena = ba_ena;
|
||||
+ atomic_set(&hw_priv->ba_ena, ba_ena ? 1 : 0);
|
||||
hw_priv->ba_hist = 0;
|
||||
#if 0
|
||||
bes_devel("[STA] %s block ACK:\n",
|
||||
@@ -2705,9 +2723,6 @@ void bes2600_ba_timer(struct timer_list *t)
|
||||
}
|
||||
} else if (hw_priv->ba_hist)
|
||||
--hw_priv->ba_hist;
|
||||
-
|
||||
-skip_statistic_update:
|
||||
- spin_unlock_bh(&hw_priv->ba_lock);
|
||||
}
|
||||
|
||||
int bes2600_vif_setup(struct bes2600_vif *priv)
|
||||
diff --git a/drivers/staging/bes2600/txrx.c b/drivers/staging/bes2600/txrx.c
|
||||
index 3aef009..536b198 100644
|
||||
--- a/drivers/staging/bes2600/txrx.c
|
||||
+++ b/drivers/staging/bes2600/txrx.c
|
||||
@@ -996,14 +996,18 @@ bes2600_tx_h_ba_stat(struct bes2600_vif *priv,
|
||||
if (!ieee80211_is_data(t->hdr->frame_control))
|
||||
return;
|
||||
|
||||
- spin_lock_bh(&hw_priv->ba_lock);
|
||||
- hw_priv->ba_acc += t->skb->len - t->hdrlen;
|
||||
- if (!(hw_priv->ba_cnt_rx || hw_priv->ba_cnt)) {
|
||||
+ /*
|
||||
+ * Patch D: lock-free hot-path BA accounting. atomic_inc + atomic_add
|
||||
+ * each per-frame; the once-per-window timer-arm uses cmpxchg on
|
||||
+ * ba_armed so concurrent TX/RX can't both try to set the timer and
|
||||
+ * we don't need cross-counter coherency on the ba_cnt/ba_cnt_rx pair.
|
||||
+ */
|
||||
+ atomic_add(t->skb->len - t->hdrlen, &hw_priv->ba_acc);
|
||||
+ atomic_inc(&hw_priv->ba_cnt);
|
||||
+ if (atomic_cmpxchg(&hw_priv->ba_armed, 0, 1) == 0) {
|
||||
mod_timer(&hw_priv->ba_timer,
|
||||
jiffies + BES2600_BLOCK_ACK_INTERVAL);
|
||||
}
|
||||
- hw_priv->ba_cnt++;
|
||||
- spin_unlock_bh(&hw_priv->ba_lock);
|
||||
}
|
||||
|
||||
static int
|
||||
@@ -1651,14 +1655,13 @@ bes2600_rx_h_ba_stat(struct bes2600_vif *priv,
|
||||
if (!priv->setbssparams_done)
|
||||
return;
|
||||
|
||||
- spin_lock_bh(&hw_priv->ba_lock);
|
||||
- hw_priv->ba_acc_rx += skb_len - hdrlen;
|
||||
- if (!(hw_priv->ba_cnt_rx || hw_priv->ba_cnt)) {
|
||||
+ /* Patch D: lock-free hot-path BA accounting; see TX side comment. */
|
||||
+ atomic_add(skb_len - hdrlen, &hw_priv->ba_acc_rx);
|
||||
+ atomic_inc(&hw_priv->ba_cnt_rx);
|
||||
+ if (atomic_cmpxchg(&hw_priv->ba_armed, 0, 1) == 0) {
|
||||
mod_timer(&hw_priv->ba_timer,
|
||||
jiffies + BES2600_BLOCK_ACK_INTERVAL);
|
||||
}
|
||||
- hw_priv->ba_cnt_rx++;
|
||||
- spin_unlock_bh(&hw_priv->ba_lock);
|
||||
}
|
||||
|
||||
void bes2600_rx_cb(struct bes2600_vif *priv,
|
||||
--
|
||||
2.54.0
|
||||
|
||||
+83
@@ -0,0 +1,83 @@
|
||||
From dd01be0162846b61c6695887ce9e421b69e099d4 Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: Fri, 8 May 2026 00:22:14 +0200
|
||||
Subject: [PATCH 16/20] =?UTF-8?q?bes2600:=20Patch=20E=20=E2=80=94=20skip?=
|
||||
=?UTF-8?q?=20ps=5Fstate=5Flock=20when=20PSM-known-disabled?=
|
||||
MIME-Version: 1.0
|
||||
Content-Type: text/plain; charset=UTF-8
|
||||
Content-Transfer-Encoding: 8bit
|
||||
|
||||
Per the Opus structural critique (PR #8 §2.4) and Sonnet review item 5.
|
||||
The per-RX-frame early-data path takes ps_state_lock to double-check
|
||||
whether a link entry transitioned to BES2600_LINK_SOFT (AP-side
|
||||
power-save state machine, soft-link transition).
|
||||
|
||||
When c7 has latched pm_unsupported = true (firmware does not honor
|
||||
PSM, see feedback_bes2600_firmware_no_psm memory), the AP power-save
|
||||
state machine is dead and link entries never transition to LINK_SOFT.
|
||||
The per-frame spin_lock_bh + double-check is wasted work.
|
||||
|
||||
This patch gates the lock acquisition on !pm_unsupported. When the
|
||||
latch is on (the steady state on the production-shipped bes2600
|
||||
firmware), early_data RX frames bypass the spin_lock_bh and go
|
||||
directly to ieee80211_rx_irqsafe.
|
||||
|
||||
If a future firmware drop fixes PSM, c7 self-clears pm_unsupported on
|
||||
the first real PM_INDICATION and the locked path resumes.
|
||||
|
||||
Scope is narrower than Sonnet originally framed: only the per-RX-frame
|
||||
hot path (txrx.c:1945-1951 in cleanups+G+D) is touched. Other
|
||||
ps_state_lock sites in txrx.c (lines 657, 1256, 1420, 1528) are TX
|
||||
submission / multicast-start / link-id paths, not per-frame RX, and
|
||||
not on the Bug #5 hot path. Leave those alone.
|
||||
|
||||
Build verified: srcversion B5922B4933590F33207EE97 on ohm sandbox.
|
||||
---
|
||||
bes2600/txrx.c | 30 ++++++++++++++++++++++++------
|
||||
1 file changed, 24 insertions(+), 6 deletions(-)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/txrx.c b/drivers/staging/bes2600/txrx.c
|
||||
index 536b198..cb718ad 100644
|
||||
--- a/drivers/staging/bes2600/txrx.c
|
||||
+++ b/drivers/staging/bes2600/txrx.c
|
||||
@@ -1965,13 +1965,31 @@ void bes2600_rx_cb(struct bes2600_vif *priv,
|
||||
if (unlikely(bes2600_itp_rxed(hw_priv, skb)))
|
||||
consume_skb(skb);
|
||||
else if (unlikely(early_data)) {
|
||||
- spin_lock_bh(&priv->ps_state_lock);
|
||||
- /* Double-check status with lock held */
|
||||
- if (entry->status == BES2600_LINK_SOFT)
|
||||
- skb_queue_tail(&entry->rx_queue, skb);
|
||||
- else
|
||||
+ /*
|
||||
+ * Patch E: when c7 has latched pm_unsupported (firmware
|
||||
+ * doesn't honour PSM, see feedback_bes2600_firmware_no_psm),
|
||||
+ * AP-side power-save state machine is dead and link entries
|
||||
+ * never transition to BES2600_LINK_SOFT. The double-check
|
||||
+ * branch under ps_state_lock is unreachable in that case,
|
||||
+ * so skip the per-frame lock acquisition entirely and
|
||||
+ * deliver to mac80211 directly.
|
||||
+ *
|
||||
+ * On firmware that does honour PSM (the latch self-clears
|
||||
+ * if a real PM_INDICATION ever arrives — see c7), this
|
||||
+ * predicate flips back to false and the original locked
|
||||
+ * path is taken.
|
||||
+ */
|
||||
+ if (hw_priv->bes_power.pm_unsupported) {
|
||||
ieee80211_rx_irqsafe(priv->hw, skb);
|
||||
- spin_unlock_bh(&priv->ps_state_lock);
|
||||
+ } else {
|
||||
+ spin_lock_bh(&priv->ps_state_lock);
|
||||
+ /* Double-check status with lock held */
|
||||
+ if (entry->status == BES2600_LINK_SOFT)
|
||||
+ skb_queue_tail(&entry->rx_queue, skb);
|
||||
+ else
|
||||
+ ieee80211_rx_irqsafe(priv->hw, skb);
|
||||
+ spin_unlock_bh(&priv->ps_state_lock);
|
||||
+ }
|
||||
} else {
|
||||
ieee80211_rx_irqsafe(priv->hw, skb);
|
||||
}
|
||||
--
|
||||
2.54.0
|
||||
|
||||
+157
@@ -0,0 +1,157 @@
|
||||
From 447240cbe8dee9d865683508f7d814e7ffe1d970 Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: Fri, 8 May 2026 06:40:00 +0200
|
||||
Subject: [PATCH 17/20] =?UTF-8?q?bes2600:=20Patch=20C2=20=E2=80=94=20repla?=
|
||||
=?UTF-8?q?ce=20ieee80211=5Frx=5Firqsafe=20with=20ieee80211=5Frx=5Fni?=
|
||||
MIME-Version: 1.0
|
||||
Content-Type: text/plain; charset=UTF-8
|
||||
Content-Transfer-Encoding: 8bit
|
||||
|
||||
Per Phase 4 plan PR #14 + kerneldoc audit (Task #19). Six call sites
|
||||
deferred per-RX-frame mac80211 dispatch via tasklet; replace with the
|
||||
synchronous-from-process-context API ieee80211_rx_ni() which does its
|
||||
own local_bh_disable wrap.
|
||||
|
||||
Why _ni and not _list:
|
||||
|
||||
Phase 4 plan originally targeted ieee80211_rx_list for batch
|
||||
delivery. Mining mt76 mainline (the only driver using _list)
|
||||
showed the canonical pattern requires threading a struct list_head
|
||||
through the per-frame call chain. bes2600s WSM dispatcher
|
||||
(wsm_handle_rx -> bes2600_rx_cb / wsm.c beacon path) sits between
|
||||
the bh threads SDIO read and the mac80211 hand-off; threading a
|
||||
list_head through the dispatcher is a non-trivial refactor.
|
||||
ieee80211_rx_ni() is the simpler drop-in: no list management, still
|
||||
removes the tasklet hop. Per-call local_bh_disable cost is trivial
|
||||
vs the saved tasklet schedule. Future refactor can revisit _list
|
||||
if measurements warrant.
|
||||
|
||||
Sites converted:
|
||||
|
||||
- ap.c:96 (bes2600_sta_add link-id rx_queue drain on AP-mode
|
||||
STA add). Was inside spin_lock_bh(&ps_state_lock);
|
||||
refactored to splice the queue under the lock then
|
||||
deliver after unlock — _ni runs the synchronous
|
||||
mac80211 RX path inline, would otherwise hold the
|
||||
lock across mac80211 dispatch. splice via
|
||||
skb_queue_splice_init into a local sk_buff_head.
|
||||
- sta.c:1487 (deauth-frame inject in inactivity-event handler).
|
||||
Not under any lock; direct conversion.
|
||||
- txrx.c:1960 (early-data + pm_unsupported branch from Patch E).
|
||||
- txrx.c:1967 (early-data + LINK_SOFT-not-set branch).
|
||||
- txrx.c:1971 (normal RX path in bes2600_rx_cb).
|
||||
- wsm.c:2415 (beacon delivery in scan-complete WSM handler).
|
||||
beacon SKB ownership is preserved by the existing
|
||||
skb_copy(beacon, GFP_ATOMIC) -> beacon_bkp pattern;
|
||||
no lifecycle change needed.
|
||||
|
||||
Mixing constraint (kerneldoc include/net/mac80211.h:5399-5430):
|
||||
ieee80211_rx_ni() cannot mix with ieee80211_rx_irqsafe() for a
|
||||
single hardware. All 6 sites convert atomically; no mixed state.
|
||||
|
||||
Build verified clean on ohm sandbox: srcversion 619A51E61BF5479AAC146E6.
|
||||
|
||||
Predicted Phase 7 delta: +5-15% over v3+D+E baseline (2.35 MB/s mean
|
||||
on v3 alone; D+E single-rep was 3.22 MB/s). Modest improvement
|
||||
expected from removing the tasklet schedule per RX frame. Smaller
|
||||
deltas would still be a net win for upstream-cleanliness — the
|
||||
kernel.org submission story benefits from not using _irqsafe from
|
||||
process context.
|
||||
---
|
||||
bes2600/ap.c | 15 +++++++++++++--
|
||||
bes2600/sta.c | 2 +-
|
||||
bes2600/txrx.c | 6 +++---
|
||||
bes2600/wsm.c | 2 +-
|
||||
4 files changed, 18 insertions(+), 7 deletions(-)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/ap.c b/drivers/staging/bes2600/ap.c
|
||||
index 8a17545..99e2da2 100644
|
||||
--- a/drivers/staging/bes2600/ap.c
|
||||
+++ b/drivers/staging/bes2600/ap.c
|
||||
@@ -63,8 +63,11 @@ int bes2600_sta_add(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
|
||||
struct bes2600_vif *priv = cw12xx_get_vif_from_ieee80211(vif);
|
||||
struct bes2600_link_entry *entry;
|
||||
struct sk_buff *skb;
|
||||
+ struct sk_buff_head local_drain;
|
||||
struct bes2600_common *hw_priv = hw->priv;
|
||||
|
||||
+ __skb_queue_head_init(&local_drain);
|
||||
+
|
||||
#ifdef P2P_MULTIVIF
|
||||
WARN_ON(priv->if_id == CW12XX_GENERIC_IF_ID);
|
||||
#endif
|
||||
@@ -93,9 +96,17 @@ int bes2600_sta_add(struct ieee80211_hw *hw, struct ieee80211_vif *vif,
|
||||
IEEE80211_WMM_IE_STA_QOSINFO_AC_MASK)
|
||||
priv->sta_asleep_mask |= BIT(sta_priv->link_id);
|
||||
entry->status = BES2600_LINK_HARD;
|
||||
- while ((skb = skb_dequeue(&entry->rx_queue)))
|
||||
- ieee80211_rx_irqsafe(priv->hw, skb);
|
||||
+ /*
|
||||
+ * Patch C2: splice the rx_queue out under the lock then deliver
|
||||
+ * after unlock. ieee80211_rx_ni() runs the mac80211 RX path
|
||||
+ * synchronously (formerly ieee80211_rx_irqsafe deferred to a
|
||||
+ * tasklet); calling it from inside spin_lock_bh would hold the
|
||||
+ * lock across mac80211's full RX dispatch.
|
||||
+ */
|
||||
+ skb_queue_splice_init(&entry->rx_queue, &local_drain);
|
||||
spin_unlock_bh(&priv->ps_state_lock);
|
||||
+ while ((skb = __skb_dequeue(&local_drain)))
|
||||
+ ieee80211_rx_ni(priv->hw, skb);
|
||||
#ifdef AP_AGGREGATE_FW_FIX
|
||||
hw_priv->connected_sta_cnt++;
|
||||
if(hw_priv->connected_sta_cnt>1) {
|
||||
diff --git a/drivers/staging/bes2600/sta.c b/drivers/staging/bes2600/sta.c
|
||||
index 412b2c4..476d875 100644
|
||||
--- a/drivers/staging/bes2600/sta.c
|
||||
+++ b/drivers/staging/bes2600/sta.c
|
||||
@@ -1500,7 +1500,7 @@ void bes2600_event_handler(struct work_struct *work)
|
||||
IEEE80211_STYPE_DEAUTH | IEEE80211_FCTL_TODS);
|
||||
deauth->u.deauth.reason_code = WLAN_REASON_DEAUTH_LEAVING;
|
||||
deauth->seq_ctrl = 0;
|
||||
- ieee80211_rx_irqsafe(priv->hw, skb);
|
||||
+ ieee80211_rx_ni(priv->hw, skb);
|
||||
bes_devel(" Inactivity Deauth Frame sent for MAC SA %pM \t and DA %pM\n", deauth->sa, deauth->da);
|
||||
queue_work(priv->hw_priv->workqueue, &priv->set_tim_work);
|
||||
break;
|
||||
diff --git a/drivers/staging/bes2600/txrx.c b/drivers/staging/bes2600/txrx.c
|
||||
index cb718ad..9074972 100644
|
||||
--- a/drivers/staging/bes2600/txrx.c
|
||||
+++ b/drivers/staging/bes2600/txrx.c
|
||||
@@ -1980,18 +1980,18 @@ void bes2600_rx_cb(struct bes2600_vif *priv,
|
||||
* path is taken.
|
||||
*/
|
||||
if (hw_priv->bes_power.pm_unsupported) {
|
||||
- ieee80211_rx_irqsafe(priv->hw, skb);
|
||||
+ ieee80211_rx_ni(priv->hw, skb);
|
||||
} else {
|
||||
spin_lock_bh(&priv->ps_state_lock);
|
||||
/* Double-check status with lock held */
|
||||
if (entry->status == BES2600_LINK_SOFT)
|
||||
skb_queue_tail(&entry->rx_queue, skb);
|
||||
else
|
||||
- ieee80211_rx_irqsafe(priv->hw, skb);
|
||||
+ ieee80211_rx_ni(priv->hw, skb);
|
||||
spin_unlock_bh(&priv->ps_state_lock);
|
||||
}
|
||||
} else {
|
||||
- ieee80211_rx_irqsafe(priv->hw, skb);
|
||||
+ ieee80211_rx_ni(priv->hw, skb);
|
||||
}
|
||||
*skb_p = NULL;
|
||||
|
||||
diff --git a/drivers/staging/bes2600/wsm.c b/drivers/staging/bes2600/wsm.c
|
||||
index 908c965..2424181 100644
|
||||
--- a/drivers/staging/bes2600/wsm.c
|
||||
+++ b/drivers/staging/bes2600/wsm.c
|
||||
@@ -2412,7 +2412,7 @@ int wsm_handle_rx(struct bes2600_common *hw_priv, int id,
|
||||
if (!hw_priv->beacon_bkp)
|
||||
hw_priv->beacon_bkp = \
|
||||
skb_copy(hw_priv->beacon, GFP_ATOMIC);
|
||||
- ieee80211_rx_irqsafe(hw_priv->hw, hw_priv->beacon);
|
||||
+ ieee80211_rx_ni(hw_priv->hw, hw_priv->beacon);
|
||||
hw_priv->beacon = hw_priv->beacon_bkp;
|
||||
|
||||
hw_priv->beacon_bkp = NULL;
|
||||
--
|
||||
2.54.0
|
||||
|
||||
+725
@@ -0,0 +1,725 @@
|
||||
From dc13f5d64fd4267bd85bef5fbf945b64f21a1c93 Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: Fri, 8 May 2026 08:23:20 +0200
|
||||
Subject: [PATCH 18/20] =?UTF-8?q?bes2600:=20Patch=20H=20=E2=80=94=20bh.c?=
|
||||
=?UTF-8?q?=20hygiene=20cleanup=20(drop=20fossil=20blocks,=20dead=20stubs)?=
|
||||
MIME-Version: 1.0
|
||||
Content-Type: text/plain; charset=UTF-8
|
||||
Content-Transfer-Encoding: 8bit
|
||||
|
||||
Per Opus structural critique §4.1 (#if 0 graveyard), §4.3 (asm
|
||||
volatile("nop") placeholder), §4.4 (BUG_ON in steady-state hot
|
||||
path). Pure source-tree cleanup, no functional change.
|
||||
|
||||
Removed:
|
||||
|
||||
1. bh.c lines 319-395 (76-line #if 0 block) — dead helper
|
||||
functions inherited from cw1200 ancestor:
|
||||
bes2600_bh_read_ctrl_reg, bes2600_get_skb, bes2600_put_skb,
|
||||
bes2600_device_wakeup. Compiled out for years.
|
||||
|
||||
2. bh.c lines 405-873 + line 1659 (the outer #if 0 / #else /
|
||||
#endif) — 468-line cw1200-ancestor bes2600_bh() function body,
|
||||
preserved verbatim alongside the active impl. Same function
|
||||
name, same goto labels. Maintenance hazard removed.
|
||||
|
||||
3. bh.c done: label body — `__bes2600_irq_enable(1)` placeholder
|
||||
(commented out) + `asm volatile ("nop")` filler. Both
|
||||
no-ops on bes2600 silicon.
|
||||
|
||||
4. bh.c post-loop "Explicitly disable device interrupts" block
|
||||
(sbus lock + __bes2600_irq_enable(0) + sbus unlock) — the
|
||||
stub call wrapped in lock/unlock ceremony. Dead.
|
||||
|
||||
5. hwio.c __bes2600_irq_enable() function definition —
|
||||
`int __bes2600_irq_enable(int enable) { return 0; }`. Stub.
|
||||
Removed entirely.
|
||||
|
||||
6. sbus.h __bes2600_irq_enable() forward declaration.
|
||||
|
||||
Replaced:
|
||||
|
||||
7. bh.c bes2600_bh outer-loop BUG_ON(hw_bufs_used > numInpChBufs)
|
||||
-> WARN_ON_ONCE. The BUG_ON ran every bh-loop iteration;
|
||||
tripping it on a bookkeeping bug locks the kernel up during
|
||||
normal operation — the wrong response to a (recoverable)
|
||||
accounting drift. WARN_ON_ONCE surfaces the issue without
|
||||
taking the system down.
|
||||
|
||||
Why __bes2600_irq_enable was a stub on bes2600:
|
||||
|
||||
cw1200 has the same-named function (drivers/net/wireless/st/cw1200/
|
||||
hwio.c:267) that does real work — reads ST90TDS_CONFIG_REG_ID and
|
||||
toggles the ST90TDS_CONF_IRQ_RDY_ENABLE bit. bes2600 inherited
|
||||
the function name + signature when forked, but the bes2600 chip's
|
||||
IRQ enable is managed by sdio_claim_irq + chip-side firmware, not
|
||||
by a driver-side enable register. Bestechnic kept the function as
|
||||
a no-op stub (return 0). Patch H removes the dead infrastructure.
|
||||
|
||||
Diff scope:
|
||||
|
||||
- bes2600/bh.c -578/+27 (mostly deletions)
|
||||
- bes2600/hwio.c -7/+7 (stub function -> comment block)
|
||||
- bes2600/sbus.h -2/+1 (declaration -> comment)
|
||||
- net: -578/+28 across 3 files
|
||||
|
||||
Build verification deferred — ohm offline. Pure-deletion change,
|
||||
no semantic risk; the deleted code was either #if 0-gated
|
||||
(never compiled) or stub-implementations (always returned 0).
|
||||
---
|
||||
bes2600/bh.c | 578 ++-----------------------------------------------
|
||||
bes2600/hwio.c | 11 +-
|
||||
bes2600/sbus.h | 3 +-
|
||||
3 files changed, 28 insertions(+), 564 deletions(-)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/bh.c b/drivers/staging/bes2600/bh.c
|
||||
index 61f6991..67dfad4 100644
|
||||
--- a/drivers/staging/bes2600/bh.c
|
||||
+++ b/drivers/staging/bes2600/bh.c
|
||||
@@ -317,83 +317,6 @@ int wsm_release_buffer_to_fw(struct bes2600_vif *priv, int count)
|
||||
}
|
||||
#endif
|
||||
|
||||
-#if 0
|
||||
-static struct sk_buff *bes2600_get_skb(struct bes2600_common *hw_priv, size_t len)
|
||||
-{
|
||||
- struct sk_buff *skb;
|
||||
- size_t alloc_len = (len > SDIO_BLOCK_SIZE) ? len : SDIO_BLOCK_SIZE;
|
||||
-
|
||||
- if (len > SDIO_BLOCK_SIZE || !hw_priv->skb_cache) {
|
||||
- skb = dev_alloc_skb(alloc_len
|
||||
- + WSM_TX_EXTRA_HEADROOM
|
||||
- + 8 /* TKIP IV */
|
||||
- + 12 /* TKIP ICV + MIC */
|
||||
- - 2 /* Piggyback */);
|
||||
- /* In AP mode RXed SKB can be looped back as a broadcast.
|
||||
- * Here we reserve enough space for headers. */
|
||||
- skb_reserve(skb, WSM_TX_EXTRA_HEADROOM
|
||||
- + 8 /* TKIP IV */
|
||||
- - WSM_RX_EXTRA_HEADROOM);
|
||||
- } else {
|
||||
- skb = hw_priv->skb_cache;
|
||||
- hw_priv->skb_cache = NULL;
|
||||
- }
|
||||
- return skb;
|
||||
-}
|
||||
-
|
||||
-static void bes2600_put_skb(struct bes2600_common *hw_priv, struct sk_buff *skb)
|
||||
-{
|
||||
- if (hw_priv->skb_cache)
|
||||
- dev_kfree_skb(skb);
|
||||
- else
|
||||
- hw_priv->skb_cache = skb;
|
||||
-}
|
||||
-
|
||||
-static int bes2600_bh_read_ctrl_reg(struct bes2600_common *hw_priv,
|
||||
- u16 *ctrl_reg)
|
||||
-{
|
||||
- int ret;
|
||||
-
|
||||
- ret = bes2600_reg_read_16(hw_priv,
|
||||
- ST90TDS_CONTROL_REG_ID, ctrl_reg);
|
||||
- if (ret) {
|
||||
- ret = bes2600_reg_read_16(hw_priv,
|
||||
- ST90TDS_CONTROL_REG_ID, ctrl_reg);
|
||||
- if (ret)
|
||||
- bes_err("[BH] Failed to read control register.\n");
|
||||
- }
|
||||
-
|
||||
- return ret;
|
||||
-}
|
||||
-
|
||||
-static int bes2600_device_wakeup(struct bes2600_common *hw_priv)
|
||||
-{
|
||||
- u16 ctrl_reg;
|
||||
- int ret;
|
||||
-
|
||||
- bes_devel("[BH] Device wakeup.\n");
|
||||
-
|
||||
- /* To force the device to be always-on, the host sets WLAN_UP to 1 */
|
||||
- ret = bes2600_reg_write_16(hw_priv, ST90TDS_CONTROL_REG_ID,
|
||||
- ST90TDS_CONT_WUP_BIT);
|
||||
- if (WARN_ON(ret))
|
||||
- return ret;
|
||||
-
|
||||
- ret = bes2600_bh_read_ctrl_reg(hw_priv, &ctrl_reg);
|
||||
- if (WARN_ON(ret))
|
||||
- return ret;
|
||||
-
|
||||
- /* If the device returns WLAN_RDY as 1, the device is active and will
|
||||
- * remain active. */
|
||||
- if (ctrl_reg & ST90TDS_CONT_RDY_BIT) {
|
||||
- bes_devel("[BH] Device awake.\n");
|
||||
- return 1;
|
||||
- }
|
||||
-
|
||||
- return 0;
|
||||
-}
|
||||
-
|
||||
-#endif
|
||||
|
||||
/* Must be called from BH thraed. */
|
||||
void bes2600_enable_powersave(struct bes2600_vif *priv,
|
||||
@@ -403,475 +326,6 @@ void bes2600_enable_powersave(struct bes2600_vif *priv,
|
||||
priv->powersave_enabled = enable;
|
||||
}
|
||||
|
||||
-#if 0
|
||||
-#define INTERRUPT_WORKAROUND
|
||||
-static int bes2600_bh(void *arg)
|
||||
-{
|
||||
- struct bes2600_common *hw_priv = arg;
|
||||
- struct bes2600_vif *priv = NULL;
|
||||
- struct sk_buff *skb_rx = NULL;
|
||||
- size_t read_len = 0;
|
||||
- int rx, tx, term, suspend;
|
||||
- struct wsm_hdr *wsm;
|
||||
- size_t wsm_len;
|
||||
- int wsm_id;
|
||||
- u8 wsm_seq;
|
||||
- int rx_resync = 1;
|
||||
- u16 ctrl_reg = 0;
|
||||
- int tx_allowed;
|
||||
- int pending_tx = 0;
|
||||
- int tx_burst;
|
||||
- int rx_burst = 0;
|
||||
- long status;
|
||||
-#if defined(CONFIG_BES2600_WSM_DUMPS)
|
||||
- size_t wsm_dump_max = -1;
|
||||
-#endif
|
||||
- u32 dummy;
|
||||
- bool powersave_enabled;
|
||||
- int i;
|
||||
- int vif_selected;
|
||||
-
|
||||
- for (;;) {
|
||||
- powersave_enabled = 1;
|
||||
- spin_lock(&hw_priv->vif_list_lock);
|
||||
- bes2600_for_each_vif(hw_priv, priv, i) {
|
||||
-#ifdef P2P_MULTIVIF
|
||||
- if ((i = (CW12XX_MAX_VIFS - 1)) || !priv)
|
||||
-#else
|
||||
- if (!priv)
|
||||
-#endif
|
||||
- continue;
|
||||
- powersave_enabled &= !!priv->powersave_enabled;
|
||||
- }
|
||||
- spin_unlock(&hw_priv->vif_list_lock);
|
||||
- if (!hw_priv->hw_bufs_used
|
||||
- && powersave_enabled
|
||||
- && !hw_priv->device_can_sleep
|
||||
- && !atomic_read(&hw_priv->recent_scan)) {
|
||||
- status = HZ/8;
|
||||
- bes_devel("[BH] No Device wakedown.\n");
|
||||
-#ifndef FPGA_SETUP
|
||||
- WARN_ON(bes2600_reg_write_16(hw_priv,
|
||||
- ST90TDS_CONTROL_REG_ID, 0));
|
||||
- hw_priv->device_can_sleep = true;
|
||||
-#endif
|
||||
- } else if (hw_priv->hw_bufs_used)
|
||||
- /* Interrupt loss detection */
|
||||
- status = HZ/8;
|
||||
- else
|
||||
- status = HZ/8;
|
||||
-
|
||||
- /* Dummy Read for SDIO retry mechanism*/
|
||||
- if (((atomic_read(&hw_priv->bh_rx) == 0) &&
|
||||
- (atomic_read(&hw_priv->bh_tx) == 0)))
|
||||
- bes2600_reg_read(hw_priv, ST90TDS_CONFIG_REG_ID,
|
||||
- &dummy, sizeof(dummy));
|
||||
-#if defined(CONFIG_BES2600_WSM_DUMPS_SHORT)
|
||||
- wsm_dump_max = hw_priv->wsm_dump_max_size;
|
||||
-#endif /* CONFIG_BES2600_WSM_DUMPS_SHORT */
|
||||
-
|
||||
-#ifdef INTERRUPT_WORKAROUND
|
||||
- /* If a packet has already been txed to the device then read the
|
||||
- control register for a probable interrupt miss before going
|
||||
- further to wait for interrupt; if the read length is non-zero
|
||||
- then it means there is some data to be received */
|
||||
- if (hw_priv->hw_bufs_used) {
|
||||
- bes2600_bh_read_ctrl_reg(hw_priv, &ctrl_reg);
|
||||
- if(ctrl_reg & ST90TDS_CONT_NEXT_LEN_MASK)
|
||||
- {
|
||||
- rx = 1;
|
||||
- goto test;
|
||||
- }
|
||||
- }
|
||||
-#endif
|
||||
-
|
||||
- status = wait_event_interruptible_timeout(hw_priv->bh_wq, ({
|
||||
- rx = atomic_xchg(&hw_priv->bh_rx, 0);
|
||||
- tx = atomic_xchg(&hw_priv->bh_tx, 0);
|
||||
- term = atomic_xchg(&hw_priv->bh_term, 0);
|
||||
- suspend = pending_tx ?
|
||||
- 0 : atomic_read(&hw_priv->bh_suspend);
|
||||
- (rx || tx || term || suspend || hw_priv->bh_error);
|
||||
- }), status);
|
||||
-
|
||||
- if (status < 0 || term || hw_priv->bh_error)
|
||||
- break;
|
||||
-
|
||||
-#ifdef INTERRUPT_WORKAROUND
|
||||
- if (!status) {
|
||||
- bes2600_bh_read_ctrl_reg(hw_priv, &ctrl_reg);
|
||||
- if(ctrl_reg & ST90TDS_CONT_NEXT_LEN_MASK)
|
||||
- {
|
||||
- bes_err("MISS 1\n");
|
||||
- rx = 1;
|
||||
- goto test;
|
||||
- }
|
||||
- }
|
||||
-#endif
|
||||
- if (!status && hw_priv->hw_bufs_used) {
|
||||
- unsigned long timestamp = jiffies;
|
||||
- long timeout;
|
||||
- bool pending = false;
|
||||
- int i;
|
||||
-
|
||||
- wiphy_warn(hw_priv->hw->wiphy, "Missed interrupt?\n");
|
||||
- rx = 1;
|
||||
-
|
||||
- /* Get a timestamp of "oldest" frame */
|
||||
- for (i = 0; i < 4; ++i)
|
||||
- pending |= bes2600_queue_get_xmit_timestamp(
|
||||
- &hw_priv->tx_queue[i],
|
||||
- ×tamp, -1,
|
||||
- hw_priv->pending_frame_id);
|
||||
-
|
||||
- /* Check if frame transmission is timed out.
|
||||
- * Add an extra second with respect to possible
|
||||
- * interrupt loss. */
|
||||
- timeout = timestamp +
|
||||
- WSM_CMD_LAST_CHANCE_TIMEOUT +
|
||||
- 1 * HZ -
|
||||
- jiffies;
|
||||
-
|
||||
- /* And terminate BH tread if the frame is "stuck" */
|
||||
- if (pending && timeout < 0) {
|
||||
- //wiphy_warn(priv->hw->wiphy,
|
||||
- // "Timeout waiting for TX confirm.\n");
|
||||
- bes_devel("bes2600_bh: Timeout waiting for TX confirm.\n");
|
||||
- break;
|
||||
- }
|
||||
-
|
||||
-#if defined(CONFIG_BES2600_DUMP_ON_ERROR)
|
||||
- BUG_ON(1);
|
||||
-#endif /* CONFIG_BES2600_DUMP_ON_ERROR */
|
||||
- } else if (!status) {
|
||||
- if (!hw_priv->device_can_sleep
|
||||
- && !atomic_read(&hw_priv->recent_scan)) {
|
||||
- bes_devel("[BH] Device wakedown. Timeout.\n");
|
||||
-#ifndef FPGA_SETUP
|
||||
- WARN_ON(bes2600_reg_write_16(hw_priv,
|
||||
- ST90TDS_CONTROL_REG_ID, 0));
|
||||
- hw_priv->device_can_sleep = true;
|
||||
-#endif
|
||||
- }
|
||||
- continue;
|
||||
- } else if (suspend) {
|
||||
- bes_devel("[BH] Device suspend.\n");
|
||||
- powersave_enabled = 1;
|
||||
- spin_lock(&hw_priv->vif_list_lock);
|
||||
- bes2600_for_each_vif(hw_priv, priv, i) {
|
||||
-#ifdef P2P_MULTIVIF
|
||||
- if ((i = (CW12XX_MAX_VIFS - 1)) || !priv)
|
||||
-#else
|
||||
- if (!priv)
|
||||
-#endif
|
||||
- continue;
|
||||
- powersave_enabled &= !!priv->powersave_enabled;
|
||||
- }
|
||||
- spin_unlock(&hw_priv->vif_list_lock);
|
||||
- if (powersave_enabled) {
|
||||
- bes_devel("[BH] No Device wakedown. Suspend.\n");
|
||||
-#ifndef FPGA_SETUP
|
||||
- WARN_ON(bes2600_reg_write_16(hw_priv,
|
||||
- ST90TDS_CONTROL_REG_ID, 0));
|
||||
- hw_priv->device_can_sleep = true;
|
||||
-#endif
|
||||
- }
|
||||
-
|
||||
- atomic_set(&hw_priv->bh_suspend, BES2600_BH_SUSPENDED);
|
||||
- wake_up(&hw_priv->bh_evt_wq);
|
||||
- status = wait_event_interruptible(hw_priv->bh_wq,
|
||||
- BES2600_BH_RESUME == atomic_read(
|
||||
- &hw_priv->bh_suspend));
|
||||
- if (status < 0) {
|
||||
- wiphy_err(hw_priv->hw->wiphy,
|
||||
- "%s: Failed to wait for resume: %ld.\n",
|
||||
- __func__, status);
|
||||
- break;
|
||||
- }
|
||||
- bes_devel("[BH] Device resume.\n");
|
||||
- atomic_set(&hw_priv->bh_suspend, BES2600_BH_RESUMED);
|
||||
- wake_up(&hw_priv->bh_evt_wq);
|
||||
- atomic_inc(&hw_priv->bh_rx);
|
||||
- continue;
|
||||
- }
|
||||
-
|
||||
-test:
|
||||
- tx += pending_tx;
|
||||
- pending_tx = 0;
|
||||
-
|
||||
- if (rx) {
|
||||
- size_t alloc_len;
|
||||
- u8 *data;
|
||||
-
|
||||
-#ifdef INTERRUPT_WORKAROUND
|
||||
- if(!(ctrl_reg & ST90TDS_CONT_NEXT_LEN_MASK))
|
||||
-#endif
|
||||
- if (WARN_ON(bes2600_bh_read_ctrl_reg(
|
||||
- hw_priv, &ctrl_reg)))
|
||||
- break;
|
||||
-rx:
|
||||
- read_len = (ctrl_reg & ST90TDS_CONT_NEXT_LEN_MASK) * 2;
|
||||
- if (!read_len) {
|
||||
- rx_burst = 0;
|
||||
- goto tx;
|
||||
- }
|
||||
-
|
||||
- if (WARN_ON((read_len < sizeof(struct wsm_hdr)) ||
|
||||
- (read_len > EFFECTIVE_BUF_SIZE))) {
|
||||
- bes_devel("Invalid read len: %d", read_len);
|
||||
- break;
|
||||
- }
|
||||
-
|
||||
- /* Add SIZE of PIGGYBACK reg (CONTROL Reg)
|
||||
- * to the NEXT Message length + 2 Bytes for SKB */
|
||||
- read_len = read_len + 2;
|
||||
-
|
||||
-#if defined(CONFIG_BES2600_NON_POWER_OF_TWO_BLOCKSIZES)
|
||||
- alloc_len = hw_priv->sbus_ops->align_size(
|
||||
- hw_priv->sbus_priv, read_len);
|
||||
-#else /* CONFIG_BES2600_NON_POWER_OF_TWO_BLOCKSIZES */
|
||||
- /* Platform's SDIO workaround */
|
||||
- alloc_len = read_len & ~(SDIO_BLOCK_SIZE - 1);
|
||||
- if (read_len & (SDIO_BLOCK_SIZE - 1))
|
||||
- alloc_len += SDIO_BLOCK_SIZE;
|
||||
-#endif /* CONFIG_BES2600_NON_POWER_OF_TWO_BLOCKSIZES */
|
||||
-
|
||||
- /* Check if not exceeding BES2600 capabilities */
|
||||
- if (WARN_ON_ONCE(alloc_len > EFFECTIVE_BUF_SIZE))
|
||||
- bes_devel("Read aligned len: %d\n", alloc_len);
|
||||
-
|
||||
- skb_rx = bes2600_get_skb(hw_priv, alloc_len);
|
||||
- if (WARN_ON(!skb_rx))
|
||||
- break;
|
||||
-
|
||||
- skb_trim(skb_rx, 0);
|
||||
- skb_put(skb_rx, read_len);
|
||||
- data = skb_rx->data;
|
||||
- if (WARN_ON(!data))
|
||||
- break;
|
||||
-
|
||||
- if (WARN_ON(bes2600_data_read(hw_priv, data, alloc_len)))
|
||||
- break;
|
||||
-
|
||||
- /* Piggyback */
|
||||
- ctrl_reg = __le16_to_cpu(
|
||||
- ((__le16 *)data)[alloc_len / 2 - 1]);
|
||||
-
|
||||
- wsm = (struct wsm_hdr *)data;
|
||||
- wsm_len = __le32_to_cpu(wsm->len);
|
||||
- if (WARN_ON(wsm_len > read_len))
|
||||
- break;
|
||||
-
|
||||
-#if defined(CONFIG_BES2600_WSM_DUMPS)
|
||||
- if (unlikely(hw_priv->wsm_enable_wsm_dumps)) {
|
||||
- u16 msgid, ifid;
|
||||
- u16 *p = (u16 *)data;
|
||||
- msgid = (*(p + 1)) & 0xC3F;
|
||||
- ifid = (*(p + 1)) >> 6;
|
||||
- ifid &= 0xF;
|
||||
- bes_devel("[DUMP] <<< msgid 0x%.4X ifid %d len %d\n", msgid, ifid, *p);
|
||||
- print_hex_dump(KERN_DEBUG, "<-- ", DUMP_PREFIX_NONE, data, min(wsm_len, wsm_dump_max));
|
||||
- }
|
||||
-#endif /* CONFIG_BES2600_WSM_DUMPS */
|
||||
-
|
||||
- wsm_id = __le32_to_cpu(wsm->id) & 0xFFF;
|
||||
- wsm_seq = (__le32_to_cpu(wsm->id) >> 13) & 7;
|
||||
-
|
||||
- skb_trim(skb_rx, wsm_len);
|
||||
-
|
||||
- if (unlikely(wsm_id == 0x0800)) {
|
||||
- wsm_handle_exception(hw_priv,
|
||||
- &data[sizeof(*wsm)],
|
||||
- wsm_len - sizeof(*wsm));
|
||||
- break;
|
||||
- } else if (unlikely(!rx_resync)) {
|
||||
- if (WARN_ON(wsm_seq != hw_priv->wsm_rx_seq)) {
|
||||
-#if defined(CONFIG_BES2600_DUMP_ON_ERROR)
|
||||
- BUG_ON(1);
|
||||
-#endif /* CONFIG_BES2600_DUMP_ON_ERROR */
|
||||
- break;
|
||||
- }
|
||||
- }
|
||||
- hw_priv->wsm_rx_seq = (wsm_seq + 1) & 7;
|
||||
- rx_resync = 0;
|
||||
-
|
||||
- if (wsm_id & 0x0400) {
|
||||
- int rc = wsm_release_tx_buffer(hw_priv, 1);
|
||||
- if (WARN_ON(rc < 0))
|
||||
- break;
|
||||
- else if (rc > 0)
|
||||
- tx = 1;
|
||||
- }
|
||||
-
|
||||
- /* bes2600_wsm_rx takes care on SKB livetime */
|
||||
- if (WARN_ON(wsm_handle_rx(hw_priv, wsm_id, wsm,
|
||||
- &skb_rx)))
|
||||
- break;
|
||||
-
|
||||
- if (skb_rx) {
|
||||
- bes2600_put_skb(hw_priv, skb_rx);
|
||||
- skb_rx = NULL;
|
||||
- }
|
||||
-
|
||||
- read_len = 0;
|
||||
-
|
||||
- if (rx_burst) {
|
||||
- bes2600_debug_rx_burst(hw_priv);
|
||||
- --rx_burst;
|
||||
- goto rx;
|
||||
- }
|
||||
- }
|
||||
-
|
||||
-tx:
|
||||
- BUG_ON(hw_priv->hw_bufs_used > hw_priv->wsm_caps.numInpChBufs);
|
||||
- tx_burst = hw_priv->wsm_caps.numInpChBufs -
|
||||
- hw_priv->hw_bufs_used;
|
||||
- tx_allowed = tx_burst > 0;
|
||||
- if (tx && tx_allowed) {
|
||||
- size_t tx_len;
|
||||
- u8 *data;
|
||||
- int ret;
|
||||
-
|
||||
- if (hw_priv->device_can_sleep) {
|
||||
- ret = bes2600_device_wakeup(hw_priv);
|
||||
- if (WARN_ON(ret < 0))
|
||||
- break;
|
||||
- else if (ret)
|
||||
- hw_priv->device_can_sleep = false;
|
||||
- else {
|
||||
- /* Wait for "awake" interrupt */
|
||||
- pending_tx = tx;
|
||||
- continue;
|
||||
- }
|
||||
- }
|
||||
-
|
||||
- wsm_alloc_tx_buffer(hw_priv);
|
||||
- ret = wsm_get_tx(hw_priv, &data, &tx_len, &tx_burst,
|
||||
- &vif_selected);
|
||||
- if (ret <= 0) {
|
||||
- wsm_release_tx_buffer(hw_priv, 1);
|
||||
- if (WARN_ON(ret < 0))
|
||||
- break;
|
||||
- } else {
|
||||
- wsm = (struct wsm_hdr *)data;
|
||||
- BUG_ON(tx_len < sizeof(*wsm));
|
||||
- BUG_ON(__le32_to_cpu(wsm->len) != tx_len);
|
||||
-
|
||||
-#if 0 /* count is not implemented */
|
||||
- if (ret > 1)
|
||||
- atomic_inc(&hw_priv->bh_tx);
|
||||
-#else
|
||||
- atomic_inc(&hw_priv->bh_tx);
|
||||
-#endif
|
||||
-
|
||||
-#if defined(CONFIG_BES2600_NON_POWER_OF_TWO_BLOCKSIZES)
|
||||
- if (tx_len <= 8)
|
||||
- tx_len = 16;
|
||||
- tx_len = hw_priv->sbus_ops->align_size(
|
||||
- hw_priv->sbus_priv, tx_len);
|
||||
-#else /* CONFIG_BES2600_NON_POWER_OF_TWO_BLOCKSIZES */
|
||||
- /* HACK!!! Platform limitation.
|
||||
- * It is also supported by upper layer:
|
||||
- * there is always enough space at the
|
||||
- * end of the buffer. */
|
||||
- if (tx_len & (SDIO_BLOCK_SIZE - 1)) {
|
||||
- tx_len &= ~(SDIO_BLOCK_SIZE - 1);
|
||||
- tx_len += SDIO_BLOCK_SIZE;
|
||||
- }
|
||||
-#endif /* CONFIG_BES2600_NON_POWER_OF_TWO_BLOCKSIZES */
|
||||
-
|
||||
- /* Check if not exceeding BES2600
|
||||
- capabilities */
|
||||
- if (WARN_ON_ONCE(tx_len > EFFECTIVE_BUF_SIZE))
|
||||
- bes_devel("Write aligned len: %d\n", tx_len);
|
||||
-
|
||||
- wsm->id &= __cpu_to_le32(
|
||||
- ~WSM_TX_SEQ(WSM_TX_SEQ_MAX));
|
||||
- wsm->id |= cpu_to_le32(WSM_TX_SEQ(
|
||||
- hw_priv->wsm_tx_seq));
|
||||
-
|
||||
- if (WARN_ON(bes2600_data_write(hw_priv,
|
||||
- data, tx_len))) {
|
||||
- wsm_release_tx_buffer(hw_priv, 1);
|
||||
- break;
|
||||
- }
|
||||
-
|
||||
- if (vif_selected != -1) {
|
||||
- hw_priv->hw_bufs_used_vif[
|
||||
- vif_selected]++;
|
||||
- }
|
||||
-
|
||||
-#if defined(CONFIG_BES2600_WSM_DUMPS)
|
||||
- if (unlikely(hw_priv->wsm_enable_wsm_dumps)) {
|
||||
- u16 msgid, ifid;
|
||||
- u16 *p = (u16 *)data;
|
||||
- msgid = (*(p + 1)) & 0x3F;
|
||||
- ifid = (*(p + 1)) >> 6;
|
||||
- ifid &= 0xF;
|
||||
- if (msgid == 0x0006)
|
||||
- bes_devel("[DUMP] >>> msgid 0x%.4X ifid %d len %d MIB 0x%.4X\n", msgid, ifid, *p, *(p + 2));
|
||||
- else
|
||||
- bes_devel("[DUMP] >>> msgid 0x%.4X ifid %d len %d\n", msgid, ifid, *p);
|
||||
- print_hex_dump(KERN_DEBUG, "--> ", DUMP_PREFIX_NONE, data, min(__le32_to_cpu(wsm->len), wsm_dump_max));
|
||||
- }
|
||||
-#endif /* CONFIG_BES2600_WSM_DUMPS */
|
||||
-
|
||||
- wsm_txed(hw_priv, data);
|
||||
- hw_priv->wsm_tx_seq = (hw_priv->wsm_tx_seq + 1)
|
||||
- & WSM_TX_SEQ_MAX;
|
||||
-
|
||||
- if (tx_burst > 1) {
|
||||
- bes2600_debug_tx_burst(hw_priv);
|
||||
- ++rx_burst;
|
||||
- goto tx;
|
||||
- }
|
||||
- }
|
||||
- }
|
||||
-
|
||||
- if (ctrl_reg & ST90TDS_CONT_NEXT_LEN_MASK)
|
||||
- goto rx;
|
||||
- }
|
||||
-
|
||||
- if (skb_rx) {
|
||||
- bes2600_put_skb(hw_priv, skb_rx);
|
||||
- skb_rx = NULL;
|
||||
- }
|
||||
-
|
||||
-
|
||||
- if (!term) {
|
||||
- bes_devel("[BH] Fatal error, exitting.\n");
|
||||
-#if defined(CONFIG_BES2600_DUMP_ON_ERROR)
|
||||
- BUG_ON(1);
|
||||
-#endif /* CONFIG_BES2600_DUMP_ON_ERROR */
|
||||
- hw_priv->bh_error = 1;
|
||||
-#if defined(CONFIG_BES2600_USE_STE_EXTENSIONS)
|
||||
- spin_lock(&hw_priv->vif_list_lock);
|
||||
- bes2600_for_each_vif(hw_priv, priv, i) {
|
||||
- if (!priv)
|
||||
- continue;
|
||||
- ieee80211_driver_hang_notify(priv->vif, GFP_KERNEL);
|
||||
- }
|
||||
- spin_unlock(&hw_priv->vif_list_lock);
|
||||
- bes2600_pm_stay_awake(&hw_priv->pm_state, 3*HZ);
|
||||
-#endif
|
||||
- /* TODO: schedule_work(recovery) */
|
||||
-#ifndef HAS_PUT_TASK_STRUCT
|
||||
- /* The only reason of having this stupid code here is
|
||||
- * that __put_task_struct is not exported by kernel. */
|
||||
- for (;;) {
|
||||
- int status = wait_event_interruptible(hw_priv->bh_wq, ({
|
||||
- term = atomic_xchg(&hw_priv->bh_term, 0);
|
||||
- (term);
|
||||
- }));
|
||||
-
|
||||
- if (status || term)
|
||||
- break;
|
||||
- }
|
||||
-#endif
|
||||
- }
|
||||
- return 0;
|
||||
-}
|
||||
-#else
|
||||
|
||||
extern int bes2600_bh_read_ctrl_reg(struct bes2600_common *priv, u32 *ctrl_reg);
|
||||
|
||||
@@ -1599,7 +1053,15 @@ static int bes2600_bh(void *arg)
|
||||
|
||||
tx = 0;
|
||||
|
||||
- BUG_ON(hw_priv->hw_bufs_used > hw_priv->wsm_caps.numInpChBufs);
|
||||
+ /*
|
||||
+ * Patch H: BUG_ON -> WARN_ON_ONCE in the steady-state
|
||||
+ * hot path. The original BUG_ON ran every bh-loop
|
||||
+ * iteration; tripping it on a bookkeeping bug locks
|
||||
+ * the kernel up during normal operation, which is
|
||||
+ * the wrong response. WARN_ON_ONCE surfaces the
|
||||
+ * issue without taking the system down.
|
||||
+ */
|
||||
+ WARN_ON_ONCE(hw_priv->hw_bufs_used > hw_priv->wsm_caps.numInpChBufs);
|
||||
tx_burst = hw_priv->wsm_caps.numInpChBufs - hw_priv->hw_bufs_used;
|
||||
tx_allowed = tx_burst > 0;
|
||||
|
||||
@@ -1643,18 +1105,19 @@ static int bes2600_bh(void *arg)
|
||||
goto tx;
|
||||
|
||||
done:
|
||||
- /* Re-enable device interrupts */
|
||||
- //hw_priv->sbus_ops->lock(hw_priv->sbus_priv);
|
||||
- //__bes2600_irq_enable(1);
|
||||
- //hw_priv->sbus_ops->unlock(hw_priv->sbus_priv);
|
||||
- asm volatile ("nop");
|
||||
+ /*
|
||||
+ * Patch H: dropped the dead `__bes2600_irq_enable(1)` /
|
||||
+ * `asm volatile("nop")` placeholder that used to sit here.
|
||||
+ * `__bes2600_irq_enable()` is a stub that returns 0 on
|
||||
+ * bes2600 silicon — the IRQ is managed by sdio_claim_irq
|
||||
+ * and chip-side firmware, not by a driver-side enable bit.
|
||||
+ * (cw1200 inherited the function from a different chip
|
||||
+ * shape; bes2600 kept the stub but the call sites are
|
||||
+ * meaningless.)
|
||||
+ */
|
||||
+ ;
|
||||
}
|
||||
|
||||
- /* Explicitly disable device interrupts */
|
||||
- hw_priv->sbus_ops->lock(hw_priv->sbus_priv);
|
||||
- __bes2600_irq_enable(0);
|
||||
- hw_priv->sbus_ops->unlock(hw_priv->sbus_priv);
|
||||
-
|
||||
if (!term) {
|
||||
bes_err("[BH] Fatal error, exiting.\n");
|
||||
sdio_work_debug(hw_priv->sbus_priv);
|
||||
@@ -1663,4 +1126,3 @@ static int bes2600_bh(void *arg)
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
-#endif
|
||||
diff --git a/drivers/staging/bes2600/hwio.c b/drivers/staging/bes2600/hwio.c
|
||||
index 0934a13..1a63e4f 100644
|
||||
--- a/drivers/staging/bes2600/hwio.c
|
||||
+++ b/drivers/staging/bes2600/hwio.c
|
||||
@@ -324,7 +324,10 @@ out:
|
||||
}
|
||||
#endif
|
||||
|
||||
-int __bes2600_irq_enable(int enable)
|
||||
-{
|
||||
- return 0;
|
||||
-}
|
||||
+/*
|
||||
+ * Patch H: __bes2600_irq_enable stub removed. It was a no-op
|
||||
+ * (always returned 0) inherited from cw1200 where the analogous
|
||||
+ * function manipulates the chip's IRQ-enable register. bes2600
|
||||
+ * silicon manages SDIO IRQ via sdio_claim_irq and chip-side
|
||||
+ * firmware — there is no driver-side enable register to write.
|
||||
+ */
|
||||
diff --git a/drivers/staging/bes2600/sbus.h b/drivers/staging/bes2600/sbus.h
|
||||
index 43c2dae..4193084 100644
|
||||
--- a/drivers/staging/bes2600/sbus.h
|
||||
+++ b/drivers/staging/bes2600/sbus.h
|
||||
@@ -95,7 +95,6 @@ struct sbus_ops {
|
||||
|
||||
void bes2600_irq_handler(struct bes2600_common *priv);
|
||||
|
||||
-/* This MUST be wrapped with hwbus_ops->lock/unlock! */
|
||||
-int __bes2600_irq_enable(int enable);
|
||||
+/* Patch H: __bes2600_irq_enable removed (was a stub). */
|
||||
|
||||
#endif /* BES2600_SBUS_H */
|
||||
--
|
||||
2.54.0
|
||||
|
||||
+121
@@ -0,0 +1,121 @@
|
||||
From f469448c605e41bb90440c6d48047830c6febe33 Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: Mon, 18 May 2026 16:58:49 +0200
|
||||
Subject: [PATCH 19/20] =?UTF-8?q?bes2600:=20take=20pending=5Frecord=5Flock?=
|
||||
=?UTF-8?q?=20with=20=5Fbh()=20to=20fix=20SOFTIRQ-safe=20=E2=86=92=20-unsa?=
|
||||
=?UTF-8?q?fe=20inversion=20(besser#18)?=
|
||||
MIME-Version: 1.0
|
||||
Content-Type: text/plain; charset=UTF-8
|
||||
Content-Transfer-Encoding: 8bit
|
||||
|
||||
PROVE_LOCKING reports:
|
||||
|
||||
WARNING: SOFTIRQ-safe -> SOFTIRQ-unsafe lock order detected
|
||||
kworker/u16:1 is trying to acquire:
|
||||
&hw_priv->tx_loop.pending_record_lock at bes2600_queue_clear+0x80
|
||||
and this task is already holding:
|
||||
&queue->lock at bes2600_queue_clear+0x60
|
||||
|
||||
which would create a new lock dependency:
|
||||
(&queue->lock){+.-.} -> (&hw_priv->tx_loop.pending_record_lock){+.+.}
|
||||
|
||||
but this new dependency connects a SOFTIRQ-irq-safe lock:
|
||||
(&queue->lock){+.-.}
|
||||
... which became SOFTIRQ-irq-safe at:
|
||||
bes2600_tx -> ieee80211_handle_wake_tx_queue -> tasklet_action
|
||||
to a SOFTIRQ-irq-unsafe lock:
|
||||
(&hw_priv->tx_loop.pending_record_lock){+.+.}
|
||||
... which became SOFTIRQ-irq-unsafe at:
|
||||
bes2600_queue_get_skb -> bes2600_join_work -> process_one_work
|
||||
|
||||
queue->lock is taken consistently with spin_lock_bh() at 22 sites;
|
||||
the nested acquisition of pending_record_lock at queue.c:289 (inside
|
||||
the outer queue->lock_bh held at line 285) had it implicitly BH-safe
|
||||
via the outer scope. But pending_record_lock is ALSO taken from
|
||||
non-BH-disabled contexts:
|
||||
|
||||
bes2600_queue_get_skb (queue.c:832) — process context via
|
||||
bes2600_join_work (workqueue), no outer queue->lock held
|
||||
bes2600_tx_loop_item_pending_check (tx_loop.c:112)
|
||||
— TX-loop context, no outer
|
||||
queue->lock held
|
||||
|
||||
When CPU0 holds pending_record_lock from one of those non-BH paths
|
||||
and a softirq fires that wants queue->lock, and CPU1 in softirq has
|
||||
queue->lock and is about to acquire pending_record_lock — classic AB-BA
|
||||
SOFTIRQ deadlock.
|
||||
|
||||
The fix is the conservative one: take pending_record_lock with _bh()
|
||||
at every site that's not already inside a queue->lock_bh-held scope.
|
||||
That makes the lock consistently SOFTIRQ-safe, eliminating the
|
||||
inversion. queue.c:289/295 stays as plain spin_lock because BH is
|
||||
already disabled by the outer queue->lock_bh acquired at queue.c:285.
|
||||
|
||||
Five sites converted:
|
||||
bes2600/queue.c:832 -- spin_lock -> spin_lock_bh
|
||||
bes2600/queue.c:839 -- spin_unlock -> spin_unlock_bh
|
||||
bes2600/queue.c:844 -- spin_unlock -> spin_unlock_bh
|
||||
bes2600/tx_loop.c:112 -- spin_lock -> spin_lock_bh
|
||||
bes2600/tx_loop.c:114 -- spin_unlock -> spin_unlock_bh
|
||||
|
||||
Contract:
|
||||
- Documentation/locking/locktypes.rst spelling: spin_lock_bh() is
|
||||
the canonical way to make a non-IRQ spinlock safe against
|
||||
softirq preemption that might re-enter the same lock.
|
||||
- Same shape as queue->lock in this driver and as is_drv->lock
|
||||
in the cw1200 ancestor.
|
||||
|
||||
Closes: besser#18
|
||||
Fixes: <bes2600 base import>
|
||||
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
---
|
||||
bes2600/queue.c | 6 +++---
|
||||
bes2600/tx_loop.c | 4 ++--
|
||||
2 files changed, 5 insertions(+), 5 deletions(-)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/queue.c b/drivers/staging/bes2600/queue.c
|
||||
index b56ca43..1e8390f 100644
|
||||
--- a/drivers/staging/bes2600/queue.c
|
||||
+++ b/drivers/staging/bes2600/queue.c
|
||||
@@ -827,19 +827,19 @@ int bes2600_queue_get_skb(struct bes2600_queue *queue, u32 packetID,
|
||||
bes2600_queue_parse_id(packetID, &queue_generation, &queue_id,
|
||||
&item_generation, &item_id, &if_id, &link_id);
|
||||
|
||||
- spin_lock(&queue->stats->hw_priv->tx_loop.pending_record_lock);
|
||||
+ spin_lock_bh(&queue->stats->hw_priv->tx_loop.pending_record_lock);
|
||||
if (!list_empty(&queue->stats->hw_priv->tx_loop.pending_record_list)) {
|
||||
list_for_each_entry_safe(record_item, temp_record_item, &queue->stats->hw_priv->tx_loop.pending_record_list, head) {
|
||||
if (record_item->packetID == packetID) {
|
||||
list_del(&record_item->head);
|
||||
dev_kfree_skb(record_item->skb);
|
||||
kfree(record_item);
|
||||
- spin_unlock(&queue->stats->hw_priv->tx_loop.pending_record_lock);
|
||||
+ spin_unlock_bh(&queue->stats->hw_priv->tx_loop.pending_record_lock);
|
||||
return -EINVAL;
|
||||
}
|
||||
}
|
||||
}
|
||||
- spin_unlock(&queue->stats->hw_priv->tx_loop.pending_record_lock);
|
||||
+ spin_unlock_bh(&queue->stats->hw_priv->tx_loop.pending_record_lock);
|
||||
|
||||
item = &queue->pool[item_id];
|
||||
|
||||
diff --git a/drivers/staging/bes2600/tx_loop.c b/drivers/staging/bes2600/tx_loop.c
|
||||
index e6cf072..0cf7ce1 100644
|
||||
--- a/drivers/staging/bes2600/tx_loop.c
|
||||
+++ b/drivers/staging/bes2600/tx_loop.c
|
||||
@@ -109,9 +109,9 @@ void bes2600_tx_loop_set_enable(struct bes2600_common *hw_priv, bool need_warn)
|
||||
bes2600_queue_iterate_pending_packet(&hw_priv->tx_queue[i],
|
||||
bes2600_tx_loop_item_pending_item);
|
||||
}
|
||||
- spin_lock(&hw_priv->tx_loop.pending_record_lock);
|
||||
+ spin_lock_bh(&hw_priv->tx_loop.pending_record_lock);
|
||||
bes2600_queue_iterate_record_pending_packet(hw_priv, bes2600_tx_loop_item_pending_item);
|
||||
- spin_unlock(&hw_priv->tx_loop.pending_record_lock);
|
||||
+ spin_unlock_bh(&hw_priv->tx_loop.pending_record_lock);
|
||||
|
||||
if (atomic_read(&hw_priv->bh_rx) > 0)
|
||||
wake_up(&hw_priv->bh_wq);
|
||||
--
|
||||
2.54.0
|
||||
|
||||
+47
@@ -0,0 +1,47 @@
|
||||
From 0792ba44bb2f60e6f83e031364ee20739be71d01 Mon Sep 17 00:00:00 2001
|
||||
From: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
Date: Wed, 20 May 2026 20:29:43 +0200
|
||||
Subject: [PATCH 20/20] bes2600: export bus_reset helpers for danctnix
|
||||
bes2600_btuart (danctnix-flavor)
|
||||
MIME-Version: 1.0
|
||||
Content-Type: text/plain; charset=UTF-8
|
||||
Content-Transfer-Encoding: 8bit
|
||||
|
||||
bes2600_chrdev_do_bus_reset() and bes2600_chrdev_trigger_bus_reset() are
|
||||
already present (added by the connection-loss bus_reset commit) but not
|
||||
exported. danctnix's bes2600_btuart.c uses these symbols for BT power
|
||||
switching and bus-error recovery; without EXPORT_SYMBOL_GPL the btuart
|
||||
module cannot be built as a separate object in the intree staging tree.
|
||||
|
||||
The userspace /dev/bes2600 chardev remains intact for danctnix — btuart
|
||||
depends on the internal chardev state machine. This commit is
|
||||
danctnix-specific; the Mobian DKMS flavor does not need the exports.
|
||||
|
||||
Signed-off-by: Claude (noether) <claude@reauktion.de>
|
||||
---
|
||||
bes2600/bes_chardev.c | 2 ++
|
||||
1 file changed, 2 insertions(+)
|
||||
|
||||
diff --git a/drivers/staging/bes2600/bes_chardev.c b/drivers/staging/bes2600/bes_chardev.c
|
||||
index 801e4bf..35696af 100644
|
||||
--- a/drivers/staging/bes2600/bes_chardev.c
|
||||
+++ b/drivers/staging/bes2600/bes_chardev.c
|
||||
@@ -1116,6 +1116,7 @@ int bes2600_chrdev_do_bus_reset(const struct sbus_ops *sbus_ops, struct sbus_pri
|
||||
|
||||
return 0;
|
||||
}
|
||||
+EXPORT_SYMBOL_GPL(bes2600_chrdev_do_bus_reset);
|
||||
|
||||
/*
|
||||
* Trigger bes2600_chrdev_do_bus_reset() against the file-global
|
||||
@@ -1128,6 +1129,7 @@ int bes2600_chrdev_trigger_bus_reset(void)
|
||||
return bes2600_chrdev_do_bus_reset(bes2600_cdev.sbus_ops,
|
||||
bes2600_cdev.sbus_priv);
|
||||
}
|
||||
+EXPORT_SYMBOL_GPL(bes2600_chrdev_trigger_bus_reset);
|
||||
|
||||
bool bes2600_chrdev_is_wifi_opened(void)
|
||||
{
|
||||
--
|
||||
2.54.0
|
||||
|
||||
@@ -0,0 +1,270 @@
|
||||
# Maintainer: Markus Fritsche <fritsche.markus@gmail.com>
|
||||
# Forked from: linux-pinetab2 by Danct12 <danct12@disroot.org>
|
||||
# Original Contributor: Jan Alexander Steffens (heftig) <heftig@archlinux.org>
|
||||
#
|
||||
# linux-pinetab2-danctnix-besser: linux-pinetab2 + the BESser
|
||||
# bes2600 driver patchset (race-fix, lock-removal, attribution-restore,
|
||||
# fossil-cleanup; +73% throughput vs the in-tree baseline). Soft-upstream
|
||||
# fork of linux-pinetab2 — drop-in replacement, same kernel version, only
|
||||
# the bes2600 staging driver differs. See git.reauktion.de/marfrit/besser
|
||||
# and git.reauktion.de/marfrit/bes2600-dkms for full provenance.
|
||||
|
||||
pkgbase=linux-pinetab2-danctnix-besser
|
||||
pkgver=7.0.danctnix1
|
||||
pkgrel=4
|
||||
pkgdesc='PineTab2 (BESser bes2600 driver patchset)'
|
||||
_srcname=linux-pinetab2
|
||||
_srctag=v${pkgver%.*}-${pkgver##*.}
|
||||
arch=(aarch64)
|
||||
_url_git="https://codeberg.org/DanctNIX/${_srcname}"
|
||||
url="${_url_git}/commits/tag/$_srctag"
|
||||
license=(GPL-2.0-only)
|
||||
makedepends=(
|
||||
bc
|
||||
cpio
|
||||
gettext
|
||||
git
|
||||
libelf
|
||||
pahole
|
||||
perl
|
||||
python
|
||||
tar
|
||||
xz
|
||||
)
|
||||
options=(
|
||||
!debug
|
||||
!strip
|
||||
)
|
||||
source=(
|
||||
https://cdn.kernel.org/pub/linux/kernel/v${pkgver%%.*}.x/linux-${pkgver%.*}.tar.{xz,sign}
|
||||
${_url_git}/releases/download/${_srctag}/${_srctag}.patch.zst{,.sig}
|
||||
0001-bes2600-defer-scan-and-soften-WARN-on-firmware-rejec.patch
|
||||
0002-bes2600-widen-scan-defer-backoff-to-30s-and-decay-co.patch
|
||||
0003-bes2600-recover-wedged-firmware-via-mmc_hw_reset-on-.patch
|
||||
0004-bes2600-gate-PM-indication-completion-on-pending-req.patch
|
||||
0005-bes2600-short-circuit-wake-handshake-when-chip-is-co.patch
|
||||
0006-bes2600-self-detect-when-firmware-does-not-honor-PSM.patch
|
||||
0007-bes2600-handle-multi-function-SDIO-cards-in-mmc_hw_r.patch
|
||||
0008-bes2600-pre-empt-AP-deauth-6-with-mac80211-reassoc-o.patch
|
||||
0009-bes2600-bus_reset-on-connection-loss-storm-to-dodge-.patch
|
||||
0010-bes2600-replace-a-set-of-atomic_add.patch
|
||||
0011-bes2600-fix-missing-destroy_workqueue-on-error-in-in.patch
|
||||
0012-bes2600-fix-concurrency-UAF-in-bes2600_hw_scan-and-s.patch
|
||||
0013-bes2600-drop-sdio_rx_work-relay-IRQ-bh-direct-no-rel.patch
|
||||
0014-bes2600-Patch-G-restore-SPDX-identifiers-ST-Ericsson.patch
|
||||
0015-bes2600-Patch-D-atomicize-ba_lock-counters-drop-the-.patch
|
||||
0016-bes2600-Patch-E-skip-ps_state_lock-when-PSM-known-di.patch
|
||||
0017-bes2600-Patch-C2-replace-ieee80211_rx_irqsafe-with-i.patch
|
||||
0018-bes2600-Patch-H-bh.c-hygiene-cleanup-drop-fossil-blo.patch
|
||||
0019-bes2600-take-pending_record_lock-with-_bh-to-fix-SOF.patch
|
||||
0020-bes2600-export-bus_reset-helpers-for-danctnix-bes260.patch
|
||||
0002-bes2600-filter-5ghz-scan.patch
|
||||
config # the main kernel config file
|
||||
)
|
||||
validpgpkeys=(
|
||||
ABAF11C65A2970B130ABE3C479BE3E4300411886 # Linus Torvalds
|
||||
647F28654894E3BD457199BE38DBBDC86092693E # Greg Kroah-Hartman
|
||||
F09A933C0FE0331E558CA4E166CAB7EAA45DD781 # Danct12
|
||||
)
|
||||
b2sums=('3d9795083c8938f80f480de0d10bfd9c525640e59d5c7f22983de3f12ee42c84c31be902cafb05579ddb1c32bac5ed06b0d4953f9705450be185bd2d9ab08f89'
|
||||
'SKIP'
|
||||
'71fe98221e802b315e54b4b10d3e8c8f376695a36bae3541d876e5776a37f3fa33c8f8dfa6e51fcbd6f5396add02e5166634165f2351836a0ea0453c172fe56c'
|
||||
'SKIP'
|
||||
'5268f55c132441e1ef2e0042e48940a51556286c2e2813c99e983bf89606c2aa05df56e42ebd8bcfd201ceaf63493ca3f2639a39f926e8419b3bc27a4ac4aced'
|
||||
'ebf786a401b5883431068b7a88ff1890ff4f2936cfceb6560828ba202a548c0c6f1f89d721837f1b67e85165d4dd1a2973cbe97e396e1b258efe5288a17d1a81'
|
||||
'ddf0f8c052f7d40f324791353b3831827cdb80da4726fb5596a0e61d6f194e84cbd0ceb036e22cb89a1af2baccb15ef7850621799d90e96a4049f9b11fc61565'
|
||||
'c811e415a549100da927e2caa4ef46ccdd6b2b834b0a781db6ca232a12d90278744133e19916de6421be2f95780b2978ec10eb620fe81a9697df3f2539b5747a'
|
||||
'4c28c0ee7443445986a4631d61e9c9f82944c4fd8380d6ba28a14dc85c8e641e88407f25c8abeb47000db25e267946a0d401d0bae4ad1c0b91e4f13953ad0081'
|
||||
'597b648ef625aff58fab7ca2067c303c1b7abdf03b78296c7b656260982eaded1938f294975abda75e864499f2bad4801941ff7acc5713d2628ae6550c9ecea3'
|
||||
'0f6e20acb800f55c853307a4fe9129280fd440a2b5214c068d91d3dbe5e7e207466ca5019d1792800ac9e4f072f006a5bbcb9b4004700426fc8f2eac6cbef5b2'
|
||||
'b793908df0483e64d98e91c7cae1496668f2597d5b6669e2f313abd3a648ba4a685562338e649cbe12a33ab142c90a129f9d642309ee38ad188cbc92fe99ae84'
|
||||
'3a41ced2ebbc6773fc4f2803ac835b7e839d81bae529c84191355ad2768065c2ef5e67a165af6bed29c0775c608869425bd1d20c8e2632faceac5bfa8ecb18d5'
|
||||
'2aa236f4a72712b974f3d4870ff6557892df8e05c748bc89a195284a3ab7330e0859a52815ee1c4447fd64365283117301fced72b590ea1d16cfc450cfd07018'
|
||||
'8c0de659c5dcb70cd6d993c9c8b7607476491440fa62a26a9aea4ee075e20016fe05ce8023c43125bd82b7f8879b20537a0d74e5de2d1b7211b5b37e787b48aa'
|
||||
'6e343e15b14ccc980e5ff21641051db57c8c8cb0705426403c0d0e2f7d1adf3efb79f331c34a5e1714ac5103b28e073404229588d8042ab5b8bb95c9ef8421a2'
|
||||
'54c9529e1d4fe55d028341fd761e24630f4f0a1c43b287db67bc878aa84ceca8e64283560399980bdcd10987ad3222c30e173e33ac1d341190d1237d6cf4f806'
|
||||
'0839ab95b408483774aaff978ece3a1e54ba8ec4bd8146cb2c649ee044224f3ad9c024bd534df09e6883e1d6d4b92593f7e168b6bd51bb32d9b3ee11f7b52716'
|
||||
'7b11001ba0638c24e36926a934448203c94240261742df999429954b9a5253e7e72ddda93d47c39b44e61b99491b83b7a46be2d098d3054bf92f73c226048715'
|
||||
'154a1a564a6d6ac316869456d271024c0af4cf7175c31579e6ac7293bdb20f413dcf5fd4684e63376627545c13c231ba2cbd28026684e33daec14e3751c25a1e'
|
||||
'4611b825d9a79589c427569f2d9521cc3c8d21603d7aae980b763414bbfd96c8d2ef04917805c0af4a8abf397228a866ff5f2c0540ae035662eb1f376bae5312'
|
||||
'e318299e4cb828220ac7d5142dc41969f22f83f1f791bd46f7f4ce19dbd1d7074b0faa9ac6a4daac4f70e6c7852b38a6482de62111bb7e653cd870d2968fce70'
|
||||
'5c71b88f2ae8a7ebd0932db9a4da72a3ba8c636f31a1bed953a81359588bcb0309f62aa9dee98db62bdc988a9b669341910da2b133d9fb92d14c27d64b54efe9'
|
||||
'e09273ddcdc44f4d40fe8a69e0fd70b963681ec4434ce63cf6114ea38954891e709ced877e0be914054854e2d295a2991e8c3d8dc0deb244bfc8b0568c681687'
|
||||
'396acbdcf570eada62533c0b8f505ed18077e8432249bab5b8ac8d1107cabc9489bdb91a5780446237ec4fd9ba5fc57a49dff34c16ddab60dc30513fc535f00f'
|
||||
'656a998ab40cb85ee4c00f087b071a91632a6c091da2c84b0f74236b51d2dea6e9db6886625f80ad81dc249d8494ec47cd79d6dd9ea4f5e44f3cde857f861e10')
|
||||
|
||||
export KBUILD_BUILD_HOST=archlinux
|
||||
export KBUILD_BUILD_USER=$pkgbase
|
||||
export KBUILD_BUILD_TIMESTAMP="$(date -Ru${SOURCE_DATE_EPOCH:+d @$SOURCE_DATE_EPOCH})"
|
||||
|
||||
prepare() {
|
||||
cd linux-${pkgver%.*}
|
||||
|
||||
echo "Setting version..."
|
||||
echo "-$pkgrel" > localversion.10-pkgrel
|
||||
echo "${pkgbase#linux}" > localversion.20-pkgname
|
||||
|
||||
local src
|
||||
for src in "${source[@]}"; do
|
||||
src="${src%%::*}"
|
||||
src="${src##*/}"
|
||||
src="${src%.zst}"
|
||||
[[ $src = *.patch ]] || continue
|
||||
echo "Applying patch: $src..."
|
||||
patch -Np1 < "../$src"
|
||||
done
|
||||
|
||||
echo "Setting config..."
|
||||
cp ../config .config
|
||||
make olddefconfig
|
||||
diff -u ../config .config || :
|
||||
|
||||
make -s kernelrelease > version
|
||||
echo "Prepared $pkgbase version $(<version)"
|
||||
}
|
||||
|
||||
build() {
|
||||
cd linux-${pkgver%.*}
|
||||
make DTC_FLAGS="-@" all
|
||||
make -C tools/bpf/bpftool vmlinux.h feature-clang-bpf-co-re=1
|
||||
}
|
||||
|
||||
_package() {
|
||||
pkgdesc="The $pkgdesc kernel and modules"
|
||||
depends=(
|
||||
coreutils
|
||||
kmod
|
||||
mkinitcpio
|
||||
)
|
||||
optdepends=(
|
||||
'wireless-regdb: to set the correct wireless channels of your country'
|
||||
'linux-firmware: firmware images needed for some devices'
|
||||
)
|
||||
provides=(
|
||||
KSMBD-MODULE
|
||||
WIREGUARD-MODULE
|
||||
"linux-pinetab2=$pkgver-$pkgrel"
|
||||
)
|
||||
conflicts=(linux-pinetab2)
|
||||
replaces=(
|
||||
wireguard-arch
|
||||
)
|
||||
|
||||
cd linux-${pkgver%.*}
|
||||
local modulesdir="$pkgdir/usr/lib/modules/$(<version)"
|
||||
|
||||
echo "Installing boot image..."
|
||||
# systemd expects to find the kernel here to allow hibernation
|
||||
# https://github.com/systemd/systemd/commit/edda44605f06a41fb86b7ab8128dcf99161d2344
|
||||
install -Dm644 "$(make -s image_name)" "$modulesdir/vmlinuz"
|
||||
|
||||
# Used by mkinitcpio to name the kernel
|
||||
echo "$pkgbase" | install -Dm644 /dev/stdin "$modulesdir/pkgbase"
|
||||
|
||||
echo "Installing modules..."
|
||||
ZSTD_CLEVEL=19 make INSTALL_MOD_PATH="$pkgdir/usr" INSTALL_MOD_STRIP=1 \
|
||||
DEPMOD=/doesnt/exist modules_install # Suppress depmod
|
||||
|
||||
echo "Installing device trees..."
|
||||
make INSTALL_DTBS_PATH="$pkgdir/boot/dtbs" dtbs_install
|
||||
|
||||
# Removing unnecessary device trees (keep only pinetab2 variants).
|
||||
# Use find -delete instead of a bash for-loop: the previous for-loop
|
||||
# silently no-op'd in the makepkg environment, leaving 234 unrelated
|
||||
# board DTBs in the package. find is robust to nullglob/cwd quirks.
|
||||
find "$pkgdir"/boot/dtbs/rockchip/ -mindepth 1 -maxdepth 1 -type f \
|
||||
! -name 'rk3566-pinetab2-*' -delete
|
||||
|
||||
# remove build link
|
||||
rm "$modulesdir"/build
|
||||
}
|
||||
|
||||
_package-headers() {
|
||||
pkgdesc="Headers and scripts for building modules for the $pkgdesc kernel"
|
||||
depends=(pahole)
|
||||
|
||||
cd linux-${pkgver%.*}
|
||||
local builddir="$pkgdir/usr/lib/modules/$(<version)/build"
|
||||
|
||||
echo "Installing build files..."
|
||||
install -Dt "$builddir" -m644 .config Makefile Module.symvers System.map \
|
||||
localversion.* version vmlinux tools/bpf/bpftool/vmlinux.h
|
||||
install -Dt "$builddir/kernel" -m644 kernel/Makefile
|
||||
install -Dt "$builddir/arch/arm64" -m644 arch/arm64/Makefile
|
||||
cp -t "$builddir" -a scripts
|
||||
|
||||
# required when DEBUG_INFO_BTF_MODULES is enabled
|
||||
install -Dt "$builddir/tools/bpf/resolve_btfids" tools/bpf/resolve_btfids/resolve_btfids
|
||||
|
||||
echo "Installing headers..."
|
||||
cp -t "$builddir" -a include
|
||||
cp -t "$builddir/arch/arm64" -a arch/arm64/include
|
||||
install -Dt "$builddir/arch/arm64/kernel" -m644 arch/arm64/kernel/asm-offsets.s
|
||||
|
||||
install -Dt "$builddir/drivers/md" -m644 drivers/md/*.h
|
||||
install -Dt "$builddir/net/mac80211" -m644 net/mac80211/*.h
|
||||
|
||||
# https://bugs.archlinux.org/task/13146
|
||||
install -Dt "$builddir/drivers/media/i2c" -m644 drivers/media/i2c/msp3400-driver.h
|
||||
|
||||
# https://bugs.archlinux.org/task/20402
|
||||
install -Dt "$builddir/drivers/media/usb/dvb-usb" -m644 drivers/media/usb/dvb-usb/*.h
|
||||
install -Dt "$builddir/drivers/media/dvb-frontends" -m644 drivers/media/dvb-frontends/*.h
|
||||
install -Dt "$builddir/drivers/media/tuners" -m644 drivers/media/tuners/*.h
|
||||
|
||||
# https://bugs.archlinux.org/task/71392
|
||||
install -Dt "$builddir/drivers/iio/common/hid-sensors" -m644 drivers/iio/common/hid-sensors/*.h
|
||||
|
||||
echo "Installing KConfig files..."
|
||||
find . -name 'Kconfig*' -exec install -Dm644 {} "$builddir/{}" \;
|
||||
|
||||
echo "Removing unneeded architectures..."
|
||||
local arch
|
||||
for arch in "$builddir"/arch/*/; do
|
||||
[[ $arch = */arm64/ ]] && continue
|
||||
echo "Removing $(basename "$arch")"
|
||||
rm -r "$arch"
|
||||
done
|
||||
|
||||
echo "Removing documentation..."
|
||||
rm -r "$builddir/Documentation"
|
||||
|
||||
echo "Removing broken symlinks..."
|
||||
find -L "$builddir" -type l -printf 'Removing %P\n' -delete
|
||||
|
||||
echo "Removing loose objects..."
|
||||
find "$builddir" -type f -name '*.o' -printf 'Removing %P\n' -delete
|
||||
|
||||
echo "Stripping build tools..."
|
||||
local file
|
||||
while read -rd '' file; do
|
||||
case "$(file -Sib "$file")" in
|
||||
application/x-sharedlib\;*) # Libraries (.so)
|
||||
strip -v $STRIP_SHARED "$file" ;;
|
||||
application/x-archive\;*) # Libraries (.a)
|
||||
strip -v $STRIP_STATIC "$file" ;;
|
||||
application/x-executable\;*) # Binaries
|
||||
strip -v $STRIP_BINARIES "$file" ;;
|
||||
application/x-pie-executable\;*) # Relocatable binaries
|
||||
strip -v $STRIP_SHARED "$file" ;;
|
||||
esac
|
||||
done < <(find "$builddir" -type f -perm -u+x ! -name vmlinux -print0)
|
||||
|
||||
echo "Stripping vmlinux..."
|
||||
strip -v $STRIP_STATIC "$builddir/vmlinux"
|
||||
|
||||
echo "Adding symlink..."
|
||||
mkdir -p "$pkgdir/usr/src"
|
||||
ln -sr "$builddir" "$pkgdir/usr/src/$pkgbase"
|
||||
}
|
||||
|
||||
pkgname=(
|
||||
"$pkgbase"
|
||||
"$pkgbase-headers"
|
||||
)
|
||||
for _p in "${pkgname[@]}"; do
|
||||
eval "package_$_p() {
|
||||
$(declare -f "_package${_p#$pkgbase}")
|
||||
_package${_p#$pkgbase}
|
||||
}"
|
||||
done
|
||||
File diff suppressed because it is too large
Load Diff
@@ -0,0 +1,108 @@
|
||||
# Bug #5 RX-degradation campaign — Phase 0
|
||||
|
||||
**Date:** 2026-05-07
|
||||
**Module under test:** v3 + F (`bes2600.ko` srcversion `371C6606B73AF19299228CA`)
|
||||
**Hardware:** ohm (PineTab2, RK3566 + BES2600 SDIO), wired enu1 fallback path live.
|
||||
|
||||
---
|
||||
|
||||
## Research question (locked)
|
||||
|
||||
> **Why does the bes2600 RX path collapse from ~2 MB/s sustained @ fresh-chip uptime to ~180 B/s @ ~28-min uptime, with periodic `wsm_generic_confirm failed for request 0x0007` + `ieee80211 phy0: [SCAN] Scan failed (-22)` every 300 s in the intervening window?**
|
||||
|
||||
Reproduces on Patch B, Patch F, and Patch C v3 alike — independent of the relay/race issues v3 addressed. Side-effect that was masked by the throughput floor while v2's race was the dominant variable.
|
||||
|
||||
## Predecessor data (reference, not anchor)
|
||||
|
||||
| source | observation |
|
||||
|---|---|
|
||||
| Patch C v3 N=3 (uptime 200/391/582 s) | mean 2.352 MB/s @ 4 MB/s sender |
|
||||
| v3 single rep at uptime ~28 min (rep 2 of 2026-05-07 22:23) | 180 KB / 5 min = 600 B/s, sender saw "Connection reset by peer" |
|
||||
| v3 single rep at uptime ~47 min (N=3 first attempt 22:42) | 55 KB / 5 min = 180 B/s, sender timed out (exit 124) |
|
||||
| dmesg pattern observed at 47-min uptime | scan failures every 301-302 s starting at uptime 778 s (~13 min) |
|
||||
|
||||
The shape: **fresh chip → linear data flow at ~2 MB/s sustained → sometime around 13 min uptime, NetworkManager-triggered scans start failing → sometime around 28 min uptime, data throughput collapses to <1 KB/s while link still shows associated.**
|
||||
|
||||
Predecessor data is reference. Phase 0 will re-anchor at N=1 long-trace + 5 in-window stress probes; if the pattern doesn't reproduce, that's the campaign result.
|
||||
|
||||
## Mechanism candidates (Phase 4 will discriminate)
|
||||
|
||||
1. **Firmware-side resource exhaustion.** Per-scan or per-WSM-event accumulation in chip-side state. Scan-failed -22 (EINVAL) suggests firmware refusing the request — possibly out of scan handles, scan-buffer slots, or some other limit.
|
||||
2. **NetworkManager scan-fail recovery loop.** Each failed scan triggers NM retry. If retry overhead dominates the bh thread, data path starves. Verifiable by suppressing NM scans.
|
||||
3. **AP-side rate limiting.** Newton (AVM) AP could be applying QoS / fairness / probation after sustained 4 MB/s burst. Verifiable by Fritz!Box log access (Markus has it) or by switching to a different AP.
|
||||
4. **PSM state machine deadlock.** c7's `pm_unsupported` self-detect was supposed to handle this, but the latch state could become stale if a real PM_IND arrives mid-operation. Verifiable by `chip_pm_state` debugfs read at degradation onset.
|
||||
5. **SDIO bus clock degradation / mmc retune.** SDIO retune with `retune_protected` flag interacts with bes2600's data path. Verifiable by ftrace `mmc/mmc_request_*` event correlation with throughput drop.
|
||||
6. **Power-management busy-event accumulation.** `bes2600_pwr_set_busy_event` counters might leak — busy events not cleared lock the chip awake (no PSM) but also exhaust event capacity. Verifiable by `bes2600_pwr_busy_event_record` dump.
|
||||
|
||||
## Phase 0 measurement protocol (rig armed 2026-05-07 23:18:58 CEST, T0=1778188738)
|
||||
|
||||
Capturing for 35 minutes from fresh boot. All capture lives in `/root/bes2600-samples/run-20260507-bug5-degradation-rig/` on ohm.
|
||||
|
||||
### Always-on streams
|
||||
|
||||
| stream | tool | output |
|
||||
|---|---|---|
|
||||
| ftrace events | per-event `enable=1` | `trace.log` (via `trace_pipe`) |
|
||||
| cfg80211 events | `iw event -t -f` | `iw-event.log` |
|
||||
| kernel printks | `dmesg -wT` | `dmesg.log` |
|
||||
| netdev counters | per-30s shell loop | `snap.log` |
|
||||
|
||||
### ftrace event set
|
||||
|
||||
- `workqueue/workqueue_execute_start` — work dispatches
|
||||
- `workqueue/workqueue_queue_work` — work submissions
|
||||
- `mac80211/api_beacon_loss` — driver beacon-loss events
|
||||
- `mac80211/api_connection_loss` — driver-side conn-loss
|
||||
- `mac80211/api_disconnect` — driver-side disconnect
|
||||
- `mac80211/drv_hw_scan` — mac80211 → driver scan dispatch
|
||||
- `mac80211/drv_set_key` — key state changes
|
||||
- `cfg80211/rdev_assoc` — assoc requests
|
||||
- `cfg80211/rdev_deauth` — deauth requests
|
||||
- `cfg80211/rdev_disassoc` — disassoc requests
|
||||
- `cfg80211/cfg80211_assoc_comeback` — AP-side assoc-busy throttling
|
||||
- `cfg80211/cfg80211_send_auth_timeout` — auth timeouts
|
||||
- `cfg80211/cfg80211_scan_done` — scan completions
|
||||
- `power/suspend_resume` — PM transitions
|
||||
- `mmc/mmc_request_start` / `mmc_request_done` — bus-level transactions
|
||||
|
||||
### Scheduled stress probes
|
||||
|
||||
Sender on boltzmann (`/tmp/bug5-probe-loop.sh`) fires `pv -L 4m | nc ohm 12345` for 30 s at T+5/10/15/20/25 min. Each probe brackets uptime, RX-bytes pre, RX-bytes post, elapsed. Throughput-vs-uptime curve falls out of the snap.log + probe boundaries.
|
||||
|
||||
Probe markers logged via `logger -t bes2600-bug5 PROBE_N_START/END` so they appear in dmesg.log timeline.
|
||||
|
||||
## Anti-theatre receipts (must tick before claiming Phase 0 done)
|
||||
|
||||
- [ ] In-session baseline: long-capture across degradation window, N=1 for now; re-run if anomalous
|
||||
- [ ] ftrace events actually firing (verify by tail of trace.log mid-capture)
|
||||
- [ ] dmesg captures the scan-failure pattern timestamp (expected ~uptime 778 s)
|
||||
- [ ] Probes actually transferred data at fresh chip (T+5 should be > 1 MB/s)
|
||||
- [ ] At least one probe in-window after scan-failure onset (expected: T+15 or T+20)
|
||||
- [ ] Snap.log shows monotonic counter behaviour (no rx_bytes going backwards)
|
||||
|
||||
## Phase 1 hypothesis (provisional, refine after Phase 3 data)
|
||||
|
||||
Metric candidate: **probe throughput as function of uptime, with state-transition markers (first `wsm_generic_confirm 0x0007 failed`, first `[SCAN] Scan failed (-22)`, first NetworkManager-deauth-and-reassociate)**.
|
||||
|
||||
Discriminator question: does throughput collapse abruptly at the first scan failure, or gradually over a window? Abrupt = single-event causation; gradual = accumulator.
|
||||
|
||||
## Phase 4 candidates (post-Phase-3)
|
||||
|
||||
Depending on which mechanism (1-6) Phase 3 surfaces:
|
||||
- (1) firmware resource exhaustion: report to upstream; possibly disable NetworkManager scans pending firmware fix.
|
||||
- (2) NM scan-fail loop: configure `wpa_supplicant` to skip scans; or add scan-failure handling in driver to dampen retry cascade.
|
||||
- (3) AP-side: switch APs for testing; report to AVM if reproducible.
|
||||
- (4) PSM deadlock: extend c7 latch with timeout-or-progress recovery.
|
||||
- (5) SDIO retune: ftrace correlation guides the lock-ordering fix.
|
||||
- (6) PWR busy-event leak: audit set/clear pairs; add a warning-when-stale.
|
||||
|
||||
## Out-of-scope
|
||||
|
||||
- Patch C v3 closure (PR #5 merged, Phase 7 done).
|
||||
- Patch C2 (`ieee80211_rx_list` batch) — gated on Task #19 kerneldoc.
|
||||
- Patch D / E independent.
|
||||
- Reproduction at higher rates (8 MB/s ramp) — defer to Phase 4 once mechanism identified.
|
||||
|
||||
---
|
||||
|
||||
*Phase 0 plan written 2026-05-07 23:21 CEST by Claude (noether), at the close of Patch C v3 Phase 7. Rig armed; long capture in flight; probes scheduled at T+5/10/15/20/25 min. Post-capture analysis will populate Phase 3 results before Phase 4 plan branches off.*
|
||||
@@ -0,0 +1,127 @@
|
||||
# Patch C v3 — Phase 4 Plan: drop sdio_rx_work, match cw1200 architecture
|
||||
|
||||
**Author:** Claude (noether)
|
||||
**Status:** Phase 4 v3 — supersedes v2 (PR #10) after cw1200 mainline survey showed the race-free path is structural, not lock-based.
|
||||
**Decision:** drop the `sdio_rx_work` workqueue entirely; SDIO IRQ wakes `bh_wq`; bh thread does the SDIO read inline. Restores single-writer-from-bh invariant on `hw_bufs_used` *by construction*. No `atomic_t` prep needed.
|
||||
|
||||
---
|
||||
|
||||
## §0 Why v3 supersedes v2
|
||||
|
||||
PR #10's plan was: convert `hw_bufs_used` etc. to `atomic_t` (prep), then direct-deliver from `sdio_rx_work` (structural). That was a workaround for the race that *only existed because of the relay*.
|
||||
|
||||
The cw1200 mining (`~/src/linux-rockchip`, 228 cw1200 commits) showed the upstream answer: there is no relay. cw1200's IRQ handler bumps `bh_rx` and wakes the bh thread; the bh thread does the SDIO read itself inside `cw1200_bh_rx_helper` (`drivers/net/wireless/st/cw1200/bh.c:233`). Single thread = single writer for `hw_bufs_used` = no race. Same `int hw_bufs_used` as bes2600, never atomic_t'd in 16 years upstream because it never needed to be.
|
||||
|
||||
Patch C v3 brings bes2600 into that shape. The structural simplification is bigger than v2's diff but lands the right architecture in one move.
|
||||
|
||||
## §1 Goal
|
||||
|
||||
Same as Patch C v2 §1: ≥ 1 MB/s sustained receive @ 4 MB/s sender, < 15 % `_raw_spin_unlock_irqrestore` CPU%, no 30-min cascade to link-death. Stretch toward Phase 1's full 2 MB/s once Patch C2 (rx_list batch) lands separately.
|
||||
|
||||
## §2 Situation
|
||||
|
||||
- Cleanups branch is at Patch F merged (commit `b717251`). All Phase 5 reviews of the F series merged via PR #4.
|
||||
- ohm rebooted with F module live (srcversion `A9438692D6A8698F92AEEA1`) — F is the new baseline for Patch C v3 Phase 7 comparison.
|
||||
- Wired path `enu1` at `192.168.88.80` survives bes2600 wedges; lmcp `ohm` still goes through wlan0. Phase 7 telemetry collection over enu1.
|
||||
- Reboot-permission override active (ohm dev-allocated; I can `sudo reboot` directly — `feedback_user_pushes_reboot_button` override clause).
|
||||
|
||||
## §3 Baseline measurements
|
||||
|
||||
Carry forward from `run-20260507-patchC-preflight/baseline.tsv` (N=1, F-less Patch B module):
|
||||
|
||||
| metric | value |
|
||||
|---|---|
|
||||
| observed receive @ 4 MB/s | 1.362 MB/s |
|
||||
| sdio_rx_work dispatches | 86.4/s = 90.3 per 1000 RX packets |
|
||||
| sdio_tx_work dispatches | 276.1/s |
|
||||
| bes2600_bh_work redispatches | 0 (single long-lived) |
|
||||
|
||||
**Phase 6 prereq:** capture an N=3 baseline ON THE F MODULE before Patch C v3 code lands. Same instrumentation, same stress ramp. This is the post-F / pre-v3 reference. Without it, Phase 7's delta is C+F vs B+nothing — confounded.
|
||||
|
||||
## §4 Plan v3
|
||||
|
||||
### §4.1 What gets eliminated
|
||||
|
||||
- **`sdio_rx_work` (bes2600_sdio.c:829)** — function deleted. No longer queued, no longer runs.
|
||||
- **`self->rx_work` work_struct** — field deleted from `struct sbus_priv`. `INIT_WORK` removed.
|
||||
- **`self->rx_queue` + `self->rx_queue_lock`** — fields deleted. `skb_queue_head_init` removed. No SKB ever queued there.
|
||||
- **`bes2600_sdio_pipe_read`** — function deleted. No callers after this patch.
|
||||
- **`sbus_ops->pipe_read`** — sbus op slot deleted (or kept and stubbed; tx_loop.c also implements it for the test-loop bus, has to stay if test-loop is preserved).
|
||||
- **`queue_work(self->sdio_wq, &self->rx_work)`** at the 3 call sites in `bes2600_sdio.c` (lines 416, 941, 1199) — removed.
|
||||
|
||||
### §4.2 What gets added
|
||||
|
||||
- **A new `bes2600_bh_handle_rx_skb()`** in bh.c (same shape as Patch C added, same contract block; no longer needs to also wake the bh thread because we ARE the bh thread).
|
||||
- **A new helper `bes2600_sdio_read_rx_batch()`** in bes2600_sdio.c, exported, that does what `sdio_rx_work` used to do MINUS the queuing: lock → read ctrl_reg → memcpy_fromio → packets_check → for-each-frame extract+deliver. Called from bh.
|
||||
|
||||
### §4.3 What gets rewired
|
||||
|
||||
- **`bes2600_gpio_irq_handler`** in bes2600_sdio.c:413 (the GPIO-IRQ path used when CONFIG_BES2600_USE_GPIO_IRQ is set): drop `queue_work(self->sdio_wq, &self->rx_work)`; instead call `self->irq_handler(self->irq_priv)` directly (which is `bes2600_irq_handler` in bh.c, bumps `bh_rx` + wakes `bh_wq`). Matches cw1200_sdio_irq_handler shape.
|
||||
- **`bes2600_bh_rx_helper`** (bh.c:961, BES_SDIO_RX_MULTIPLE_ENABLE branch): instead of `pipe_read`-ing one SKB from the (now-gone) rx_queue, call the new `bes2600_sdio_read_rx_batch()` which does the SDIO read AND delivers each frame inline via `bes2600_bh_handle_rx_skb()`. Returns count delivered, or negative on error.
|
||||
- **`bes2600_bh()` outer loop**: after a successful rx_batch read, the helper signals whether to continue draining (more frames pending) — same shape as today's `BH_RX_CONT_LIMIT=3` outer loop.
|
||||
- **`bes2600_gpio_wakeup_mcu(SDIO_RX)`** + **`bes2600_gpio_allow_mcu_sleep(SDIO_RX)`** brackets: currently called inside sdio_rx_work. Move into bh thread around the `bes2600_sdio_read_rx_batch()` call. Same wake-flag bracketing, just from a different thread.
|
||||
- **`sdio_wq` workqueue**: keeps `tx_work` and (briefly) `scan_work`. Renamed or kept — cosmetic. Don't touch in this patch.
|
||||
|
||||
### §4.4 What stays untouched
|
||||
|
||||
- TX path (`sdio_tx_work`, `bes2600_bh_tx_helper`, `wsm_alloc_tx_buffer`). Independent.
|
||||
- WSM protocol layer (`wsm.c`, `wsm_handle_rx`). Same callees, just from bh thread now.
|
||||
- mac80211 RX delivery (`ieee80211_rx_irqsafe`). That's Patch C2.
|
||||
- `BES2600_RX_IN_BH` ifdef gate. Stays defined; the gated branch is now the only RX path.
|
||||
- Symptom-shaped artifacts (asm nop, BUG_ON in hot path) — still deferred, see task #24 post-cleanup.
|
||||
|
||||
## §5 Shared-state delta table (the v2 lesson, applied)
|
||||
|
||||
Every field `bes2600_bh_handle_rx_skb` mutates directly or transitively, with the v3 protection:
|
||||
|
||||
| field | written by (today) | written by (after v3) | concurrency | required action |
|
||||
|---|---|---|---|---|
|
||||
| `hw_priv->hw_bufs_used` | bh thread (TX submit + RX confirm), main.c init | **bh thread only** (RX moves into bh) | single-writer | none — `int` is fine, race-free by construction |
|
||||
| `hw_priv->hw_bufs_used_vif[i]` | bh thread (TX vif submit + RX vif confirm), main.c init | **bh thread only** | single-writer | none |
|
||||
| `hw_priv->wsm_rx_seq[i]` | sdio_rx_work today | bh thread | single-writer | none — moves cleanly between contexts |
|
||||
| `hw_priv->wsm_tx_pending[i]` | bh thread (inc on TX submit), bh+sdio_rx_work (dec on RX confirm) | **bh thread only** | single-writer | none |
|
||||
| `hw_priv->lmac_mon_timer` / `mcu_mon_timer` | mod_timer / del_timer_sync from bh + sdio_rx_work | bh thread only | timer API safe anyway | none |
|
||||
| `hw_priv->wsm_cmd.lock` | spinlock taken inside wsm_handle_rx | same | already protected | none |
|
||||
| `priv->bh_evt_wq` wake-up | wsm_release_tx_buffer when count→0 | same | wake_up is concurrency-safe | none |
|
||||
| `bes_pwr.lock` (inside bes2600_pwr_clear_busy_event) | bh thread (today) | bh thread | already protected | none |
|
||||
| `self->rx_data_cnt` etc. (sbus_priv stats) | sdio_rx_work | bh thread | single-writer | none |
|
||||
|
||||
**Zero fields require new locking.** The architectural pivot eliminates the race v2's atomic_t was working around.
|
||||
|
||||
## §6 Risks
|
||||
|
||||
1. **bh thread now holds the SDIO bus mutex during read** (currently held by sdio_rx_work). TX work in the same bh thread is unaffected (sdio_tx_work runs on a separate workqueue and shares the same mutex anyway). The sdio_lock contention pattern doesn't change.
|
||||
2. **Loss of "parallelism" between sdio_rx_work and bh TX**: sdio_rx_work and bh thread *appeared* to run in parallel today, but both serialize through `bes2600_sdio_lock(self)` for the actual bus operations. The parallelism was illusory. Net throughput should not regress.
|
||||
3. **bh thread CPU-busy-time per RX batch increases**: inline SDIO read is the same cost, just charged to bh instead of sdio_wq's worker. Mitigation: the per-IRQ workqueue dispatch cost (~86/s) is what we trade for it. Net: -86 dispatches/s, +0 µs per frame.
|
||||
4. **Multi-RX coalescing (BES_SDIO_RX_MULTIPLE_NUM=16)** stays. bes2600_sdio_extract_packets parses the multi-frame buffer same as before, just inline now. No functional change to chip-side behaviour.
|
||||
5. **GPIO wake-flag bracketing**: `bes2600_gpio_wakeup_mcu(SDIO_RX)` and `bes2600_gpio_allow_mcu_sleep(SDIO_RX)` currently bracket sdio_rx_work. Move them to bracket the new bh-side read. If the wake-flag accounting is sub-system-scoped (it is — flag bits per subsystem), this is a clean move.
|
||||
6. **IRQ re-enable in bh thread**: cw1200's bh re-enables IRQ via `__cw1200_irq_enable(priv, 1)` after each round. bes2600 has the analogous `__bes2600_irq_enable(0/1)` (commented out as the `asm volatile("nop")` symptom in `bh.c:1518-1520`). This patch does NOT re-engage the commented-out re-enable — that's still task #24's call. But if the IRQ stays disabled across rounds, we'd never receive the next IRQ. **Investigate before Phase 6 lands**: where does IRQ re-enable happen in the current bes2600 hot path? The sdio_func IRQ may be auto-managed by sdio core differently. Block Phase 6 on this audit.
|
||||
7. **Phase 7 wedge resilience**: if v3 has a different bug shape than v2's race (which it shouldn't, since the race is gone by construction), the wired path lets us collect telemetry from a wedged ohm.
|
||||
|
||||
## §7 Phase 5 / 6 / 7
|
||||
|
||||
- **Phase 5**: PR on `git.reauktion.de/marfrit/besser` with this artifact. Specifically request reviewer focus on §6 risk #6 (IRQ re-enable mechanism).
|
||||
- **Phase 6**: branch off cleanups (post-F): `bes2600/sdio-rx-no-relay`. Implement the file changes per §4. Build, install, smoke-test.
|
||||
- **Phase 7**:
|
||||
- First: N=3 stress-ramp **on F module** (post-F pre-v3 baseline). 10 min @ 1, 30 min @ 2, 30 min @ 4 MB/s. Use wired path for telemetry.
|
||||
- Then: install v3 module, identical N=3 ramp. Compare deltas.
|
||||
- Predicted: sdio_rx_work dispatch rate → 0/s (was 86/s). observed receive lifts toward ≥ 1.0 MB/s sustained. `_raw_spin_unlock_irqrestore` drops by the rx_queue lock contribution (was 1914/s acquires).
|
||||
|
||||
## §8 What gets dropped from v2 plan
|
||||
|
||||
- atomic_t prep refactor (`hw_bufs_used` → `atomic_t`): not needed. Single-writer invariant preserved structurally. Still a defensible standalone hardening patch *if mainlining bes2600 ever requires defense-in-depth*, but not on the Bug-#5 critical path.
|
||||
- `wsm_tx_pending[]` decrement-decision race (v2 risk #2): also moots. Both sides single-thread under v3.
|
||||
- v2 Phase 7's "C-prep should show zero delta" gate: replaced by "v3 should match cw1200's structural shape" gate.
|
||||
|
||||
## §9 Open question for reviewer
|
||||
|
||||
The big one is §6 risk #6 — IRQ re-enable. cw1200 explicitly does `__cw1200_irq_enable(priv, 1)` from bh after each round; bes2600 has the call **commented out** with an `asm volatile("nop")` placeholder. Either:
|
||||
|
||||
(a) bes2600's SDIO IRQ is level-triggered + auto-acked by SDIO core, so re-enable isn't needed (that would explain the nop).
|
||||
(b) The current code happens to work because sdio_rx_work is queued by the IRQ regardless of whether IRQ is "enabled" by the driver-side flag. After v3 we have to manually re-enable like cw1200 does.
|
||||
|
||||
Need to confirm (a) vs (b) before Phase 6 lands. Plan to grep for `__bes2600_irq_enable` callsites and trace back to whether it's load-bearing.
|
||||
|
||||
---
|
||||
|
||||
*Plan written 2026-05-07 by Claude (noether), after Patch F merged and Patch C v2 (PR #10) was superseded by the cw1200 architectural mining finding. Phase 5 review on PR. Don't curate.*
|
||||
@@ -0,0 +1,171 @@
|
||||
# Patch C2 — Phase 4 Plan: migrate ieee80211_rx_irqsafe → ieee80211_rx_list
|
||||
|
||||
**Author:** Claude (noether)
|
||||
**Status:** Phase 4 — pending Phase 5 PR review before any Phase 6 code.
|
||||
**Predecessor:** Patch C v3 (PR #5 merged, +73% throughput, no-relay architecture); Patch D + E + F + G also landed. Cleanups branch tip = 42fd0ce.
|
||||
**Task #19 contract**: `ieee80211_rx_list` callable from process context, **requires `local_bh_disable()` + `rcu_read_lock()` wrap**, **cannot mix with `ieee80211_rx_irqsafe()` for the same hardware** → all 6 sites convert in one shot.
|
||||
|
||||
---
|
||||
|
||||
## §0 Substrate
|
||||
|
||||
After Patch C v3:
|
||||
- bh thread is the sole RX-delivery context (no relay, no sdio_rx_work)
|
||||
- Per-frame work runs in process context (sleepable)
|
||||
- Single-writer-from-bh invariant covers `hw_bufs_used` and friends
|
||||
|
||||
`ieee80211_rx_irqsafe` is currently called from process context. Per kerneldoc (`include/net/mac80211.h:5399-5411`):
|
||||
|
||||
> **Like ieee80211_rx() but can be called in IRQ context** (internally defers to a tasklet.)
|
||||
|
||||
The tasklet hop is the cost we pay today for delivering each RX frame from process context. `ieee80211_rx_list` is the process-context replacement.
|
||||
|
||||
## §1 Goal
|
||||
|
||||
Per-frame: skip the tasklet hop. Batch: process multiple SKBs from one SDIO read inside a single `local_bh_disable()`/`rcu_read_lock()` window.
|
||||
|
||||
Phase 1 metric: **RX throughput @ 4 MB/s sender**, with v3 N=3 baseline = 2.352 MB/s. Hypothesis: small to moderate uplift (<10%) from removing the tasklet deferral. Larger improvement would be surprising — if observed, that's a finding to investigate.
|
||||
|
||||
## §2 Situation
|
||||
|
||||
- 6 call sites in bes2600 currently use `ieee80211_rx_irqsafe`:
|
||||
- `ap.c:96` (AP-mode link-id RX queue drain)
|
||||
- `sta.c:1487` (link-id rx_queue drain in ?)
|
||||
- `txrx.c:1960` (early-data + pm_unsupported branch — Patch E added)
|
||||
- `txrx.c:1967` (early-data + LINK_SOFT-not-set branch)
|
||||
- `txrx.c:1971` (normal RX path)
|
||||
- `wsm.c:2415` (beacon SKB delivery from `bes2600_beacon_handler`?)
|
||||
- All 6 must convert together (kerneldoc: cannot mix per hardware)
|
||||
- bh thread is single-writer post-v3 → `_rx_list`'s "calls must be synchronized" satisfied trivially
|
||||
- bh thread is process context → `_rx_list` callable
|
||||
|
||||
## §3 Baseline (carry forward)
|
||||
|
||||
From `notes/phase7-v3-2026-05-07.md` (v3 N=3 ramp, Phase 7 closed):
|
||||
|
||||
| metric | v3 fresh-chip N=3 |
|
||||
|---|---|
|
||||
| RX throughput @ 4 MB/s | mean 2.352 MB/s, min 2.102, max 2.590 |
|
||||
| sdio_rx_work dispatches | 0/s |
|
||||
| bh_work redispatches | 0 |
|
||||
|
||||
Phase 7 of C2 will compare against this baseline.
|
||||
|
||||
## §4 Plan
|
||||
|
||||
### §4.1 Conversion shape
|
||||
|
||||
Per call site:
|
||||
```c
|
||||
ieee80211_rx_irqsafe(priv->hw, skb);
|
||||
```
|
||||
becomes:
|
||||
```c
|
||||
ieee80211_rx_list(priv->hw, NULL, skb, &priv->rx_list);
|
||||
```
|
||||
|
||||
Where `priv->rx_list` is a `struct list_head` initialized once.
|
||||
|
||||
**Wrap requirement:** `local_bh_disable()` + `rcu_read_lock()` must be held across the call. Per the kerneldoc, that's also needed for batch correctness.
|
||||
|
||||
### §4.2 Wrap placement (the design decision)
|
||||
|
||||
**Option A — per-call wrap.** Wrap each individual `ieee80211_rx_list()` call. Simple but loses the batch benefit (each call's wrap+unwrap costs as much as the avoided tasklet defer).
|
||||
|
||||
**Option B — per-batch wrap.** Wrap the OUTER frame-iteration loop (e.g., the `for` in `bes2600_sdio_extract_packets`). All 16 SKBs from one SDIO read get delivered inside one wrap. This is the upstream-idiomatic pattern (mt76, iwl_pcie do this).
|
||||
|
||||
Choosing **Option B**. Concrete shape:
|
||||
|
||||
- `bes2600_sdio_read_rx_batch` (the per-SDIO-batch entry point added in Patch C v3) wraps the read+extract+deliver phase:
|
||||
```c
|
||||
rcu_read_lock();
|
||||
local_bh_disable();
|
||||
// existing read + extract_packets that calls bh_handle_rx_skb per frame
|
||||
local_bh_enable();
|
||||
rcu_read_unlock();
|
||||
```
|
||||
- Inside `bes2600_bh_handle_rx_skb`, the single `ieee80211_rx_irqsafe` swap becomes `ieee80211_rx_list(priv->hw, NULL, skb, &priv->rx_list)`.
|
||||
- The OTHER 5 call sites (in `ap.c`, `sta.c`, `txrx.c`'s branches, `wsm.c`) need the same treatment, but they're called from the bh thread (post-v3) so they're already in the right context. Each gets its own narrow wrap (Option A applied selectively because those paths process one frame at a time, not a batch).
|
||||
|
||||
### §4.3 The `rx_list` field
|
||||
|
||||
Add `struct list_head rx_list` to either `struct bes2600_common` (driver-wide) or `struct bes2600_vif` (per-vif). Per-vif is cleaner because the existing `priv->hw` parameter implies vif scope.
|
||||
|
||||
`INIT_LIST_HEAD(&priv->rx_list)` at vif setup; no teardown needed (mac80211 owns the SKBs once handed off).
|
||||
|
||||
**Open question for reviewer:** does the `rx_list` need to be drained explicitly after the batch (e.g., via a `list_for_each_entry_safe` + `netif_receive_skb_list_internal`)? Looking at mainline mt76 / iwl_pcie usage will clarify. Phase 6 must answer this before code lands.
|
||||
|
||||
### §4.4 What will NOT be touched
|
||||
|
||||
- The 6 call sites change atomically (all-or-nothing per kerneldoc) — no per-site progressive migration
|
||||
- `wsm.c:2415` beacon path: same conversion shape, but beacon delivery is once-per-beacon-interval (not hot path); could stay `_irqsafe` if upstream allows mixing per-SKB-type. Re-read kerneldoc carefully — it says "per hardware", not per-call-site, so we can't keep _irqsafe even on the slow paths.
|
||||
- bh thread structure (Patch C v3 stands)
|
||||
- atomic_t counters from Patch D
|
||||
- `pm_unsupported` lock-skip from Patch E
|
||||
- mac80211 batch-delivery semantics (mainline owns this; we just call the API)
|
||||
|
||||
### §4.5 Predicted delta in Phase 3 units
|
||||
|
||||
| metric | predicted |
|
||||
|---|---|
|
||||
| `rx_irqsafe` tasklet schedule rate | → 0 (function no longer called) |
|
||||
| RX throughput @ 4 MB/s sustained | 2.352 → +5-15% (medium confidence) |
|
||||
| `_raw_spin_unlock_irqrestore` CPU% | small drop (no tasklet schedule lock contribution) |
|
||||
|
||||
**Honest acknowledgment:** I don't have data on how much the tasklet hop actually costs. The improvement might be smaller than predicted if tasklet defer was already cheap on this kernel. If <2%, Phase 7 says "marginal but no regression" and we ship anyway for upstream-cleanliness.
|
||||
|
||||
### §4.6 Risks
|
||||
|
||||
1. **`ieee80211_rx_list` semantics surprise.** mainline drivers I have access to (mt76, iwl_pcie) use this via NAPI infrastructure. bes2600 doesn't have NAPI; we're doing process-context-direct. The kerneldoc says callable that way but we should verify a few mainline drivers actually do it. **Phase 6 contract-cite from at least one upstream caller** before code lands.
|
||||
|
||||
2. **`rx_list` lifetime in cross-batch / cross-vif scenarios.** Multiple vifs (P2P_MULTIVIF=y in Makefile) might race on the same hw's `rx_list`. The kerneldoc says "for a single hardware" — the list is per-call destination, which means each call appends to its argument list. Per-vif `rx_list` per-call is the natural shape. No per-hw aggregator needed.
|
||||
|
||||
3. **`local_bh_disable` cost in batch wrap.** Not free. If the batch is small (1-2 SKBs), the wrap might dominate. Estimated breakeven: 2-3 SKBs per wrap. Phase 7 should look at SKB-per-batch distribution to confirm.
|
||||
|
||||
4. **`rcu_read_lock` across SDIO read.** SDIO read can take multi-ms (multi-block transfers). RCU reader-cs across that is fine (no preemption blocked) but it's a longer reader-cs than typical. Verifiable but not a blocker — kerneldoc requires it.
|
||||
|
||||
5. **wsm.c:2415 (beacon) is a different SKB lifecycle** — `hw_priv->beacon` is owned by hw_priv, not allocated per-call. After `_rx_list` consumes it (by passing ownership to mac80211), `hw_priv->beacon` is dangling. **Phase 6 must verify the beacon path either reallocates after delivery or wasn't actually transferring ownership.** Risk #5 is the biggest open question.
|
||||
|
||||
### §4.7 Phase 5 review handover
|
||||
|
||||
PR on `git.reauktion.de/marfrit/besser` with this artifact. Specifically request reviewer focus on:
|
||||
- §4.2 wrap-placement choice (Option B vs A)
|
||||
- §4.3 rx_list scoping (per-vif)
|
||||
- §4.6 risks #1 (mainline-caller verification) and #5 (beacon path SKB ownership)
|
||||
|
||||
Don't curate.
|
||||
|
||||
### §4.8 Phase 6 implementation order
|
||||
|
||||
1. Branch off cleanups: `bes2600/rx-list-batch-delivery`
|
||||
2. Add `struct list_head rx_list` to `struct bes2600_vif`, `INIT_LIST_HEAD` in vif setup
|
||||
3. Convert all 6 call sites: `ieee80211_rx_irqsafe(...)` → `ieee80211_rx_list(...)`
|
||||
4. Wrap `bes2600_sdio_read_rx_batch` outer loop with `rcu_read_lock + local_bh_disable / local_bh_enable + rcu_read_unlock`
|
||||
5. For the non-bh-thread call sites (ap.c, sta.c, wsm.c beacon): per-call narrow wrap
|
||||
6. Verify beacon path in wsm.c:2415 (Risk #5)
|
||||
7. Build, install, smoke-test
|
||||
8. Phase 7 N=3 stress ramp — compare to v3 baseline
|
||||
|
||||
### §4.9 Phase 7 protocol (per `feedback_phase7_stress_ramp`)
|
||||
|
||||
- N=3 reps, 30s each at 4 MB/s, fresh-chip (uptime <15 min)
|
||||
- Use wired path (`ssh mfritsche@192.168.88.80`) for telemetry
|
||||
- Fresh nc listener per rep (per `feedback_rig_failure_is_finding`)
|
||||
- Compare: throughput delta + tasklet schedule rate (ftrace `irq:tasklet_*` events)
|
||||
- If predicted delta met → close C2 + memory entry
|
||||
- If NO delta → marginal patch but no regression; ship for upstream-cleanliness
|
||||
|
||||
## §5 Out of scope
|
||||
|
||||
- Patch D / E already shipped (PR #7, #8 merged)
|
||||
- Patch G already shipped (PR #6 merged)
|
||||
- bh.c `#if 0` graveyard removal (Task #24 hygiene)
|
||||
- Allwinner `sw_mci_check_r1_ready` (Task #25)
|
||||
|
||||
## §6 Summary
|
||||
|
||||
C2 is a 6-site mechanical migration with ONE design decision (per-batch wrap), TWO open questions for the reviewer (rx_list draining + beacon path SKB ownership), and SMALL expected throughput delta (<15%). Risk-low, upstream-prep-high. Worth shipping for the kernel.org submission story even if the throughput delta is marginal.
|
||||
|
||||
---
|
||||
|
||||
*Plan written 2026-05-08 by Claude (noether). Phase 5 review on PR. Phase 6 contingent on review passing.*
|
||||
@@ -0,0 +1,63 @@
|
||||
# Patch C2 Phase 7 — N=3 ramp results
|
||||
|
||||
**Date:** 2026-05-08
|
||||
**Module:** `bes2600.ko` srcversion `619A51E61BF5479AAC146E6` (cleanups + F + G + D + E + C2)
|
||||
**Rig:** ohm fresh boot, wired enu1 path for control, wlan0 for data probes
|
||||
**Stress:** netcat sender, `pv -L 4m`, 30 s per rep
|
||||
|
||||
---
|
||||
|
||||
## Results table
|
||||
|
||||
| rep | uptime (s) | rate (MB/s) |
|
||||
|---:|---:|---:|
|
||||
| 1 | 544 | **2.289** |
|
||||
| 2 | 716 | **2.165** |
|
||||
| 3 | 750 | **2.376** |
|
||||
|
||||
**N=3:** mean 2.277, median 2.289, min 2.165, max 2.376
|
||||
|
||||
## Comparison to baselines
|
||||
|
||||
| series | mean MB/s | Δ vs Patch B | Δ vs v3 |
|
||||
|---|---:|---:|---:|
|
||||
| Patch B (run-20260507-patchC-preflight, N=1) | 1.362 | — | -42% |
|
||||
| Patch C v3 N=3 (run-20260507-N3v3-rep*) | 2.352 | +73% | — |
|
||||
| Patch C v3 + F + G + D + E + C2 N=3 (this rep set) | 2.277 | +67% | -3% |
|
||||
|
||||
Δ vs v3 is **within rep variance** (v3 N=3 had min 2.102, max 2.590 → spread ±20%; this set's spread is similar). Statistically indistinguishable.
|
||||
|
||||
## Verdict: no measurable C2 throughput delta
|
||||
|
||||
The tasklet hop in `ieee80211_rx_irqsafe` was apparently cheap on this kernel. Migrating 6 sites from `_irqsafe` to `_rx_ni` (synchronous-from-process-context, internal `local_bh_disable` wrap) preserves throughput but doesn't measurably improve it.
|
||||
|
||||
**This was a predicted outcome.** The C2 Phase 4 plan §4.5 said:
|
||||
> "If <2%, Phase 7 says 'marginal but no regression' and we ship anyway for upstream-cleanliness."
|
||||
|
||||
Observed: -3% (within noise) → falls into the "marginal but no regression" bucket. Ship for the kernel.org submission story (no `_irqsafe` from process context = upstream-idiomatic) even though performance is unchanged.
|
||||
|
||||
## Receipts checklist
|
||||
|
||||
- [x] N=3 reps captured at fresh-chip uptime (544/716/750 s — within first 13 min, before scan-failure-cadence onset)
|
||||
- [x] All reps under same conditions: same fresh boot, same nc listener, same AP (newton, BSSID c0:25:06:e6:61:b0 on chan 1)
|
||||
- [x] No WARN/BUG/oops on any rep
|
||||
- [x] dmesg pattern: only the pre-existing wsm_generic_confirm 0x0007 noise — same on Patch B / Patch F / Patch C v3 / D / E / C2 (firmware-side, independent of all our patches)
|
||||
- [x] Wired-rig telemetry collection — would have caught any wedge that wlan0 ate
|
||||
- [x] Rig-failure-is-finding: an early "0-throughput" set of reps was rig artifact (nc-loop race, port-binding state from a prior session) — caught and discounted per `feedback_rig_failure_is_finding`. The recovered N=3 reps used setsid-detached listener + post-reboot fresh state.
|
||||
|
||||
## Phase 8 lesson
|
||||
|
||||
**Drop-in replacements with the right kerneldoc reading still need Phase 7 measurement.** I expected +5-15% from removing the tasklet schedule. Got -3% (noise). The cost we were saving was already amortised by something else (NAPI infra? per-CPU softirq scheduling?). The kerneldoc-correctness story stands; the perf story does not.
|
||||
|
||||
**Memory entry:** the perf-vs-correctness distinction is worth keeping. `_irqsafe → _rx_ni` is a CORRECTNESS / API-cleanliness move, not a performance optimization. Don't oversell predicted deltas without baseline measurement.
|
||||
|
||||
## Out-of-scope follow-ups
|
||||
|
||||
- Patch C v3 architectural win is the durable +73%. C / D / E / C2 / F / G are smaller cleanups that don't compound visibly.
|
||||
- Bug #5 RX-degradation campaign already closed (hypothesis falsified).
|
||||
- Task #24 (post-cleanup observation of bh.c symptom-shaped artifacts): mostly answered.
|
||||
- Task #25 (Allwinner sw_mci_check_r1_ready measurement): can be done during any future stress run; not on critical path.
|
||||
|
||||
---
|
||||
|
||||
*Phase 7 captured 2026-05-08 by Claude (noether). Patch C2 closes the post-Bug-#5 cleanup track. Throughput ceiling on this hardware = ~2.4 MB/s sustained @ 4 MB/s sender, fresh chip; further improvement would need firmware-side fixes (the wsm_generic_confirm 0x0007 path), not driver-side.*
|
||||
@@ -0,0 +1,94 @@
|
||||
# Patch C v3 Phase 7 — N=3 verification results
|
||||
|
||||
**Date:** 2026-05-07
|
||||
**Module:** `bes2600.ko` srcversion `371C6606B73AF19299228CA` (cleanups+F+v3)
|
||||
**Rig:** ohm (PineTab2, RK3566 + BES2600 SDIO), wired enu1 path for telemetry
|
||||
**Stress:** netcat sender from boltzmann, `pv -L 4m` rate cap (4 MB/s), 3-min window per rep
|
||||
**Boot:** fresh — uptime 200 s / 391 s / 582 s at rep 1/2/3 starts (all within fresh-chip window before the ~13-min Bug #5 RX-degradation point)
|
||||
|
||||
---
|
||||
|
||||
## Results table
|
||||
|
||||
| rep | elapsed (s) | RX bytes | RX MB | MB/s | sdio_rx_work | sdio_tx_work | bes2600_bh_work redispatches |
|
||||
|---:|---:|---:|---:|---:|---:|---:|---:|
|
||||
| 1 | 180.72 | 447,758,333 | 427.0 | **2.363** | 0 | 368 | 0 |
|
||||
| 2 | 180.67 | 490,669,836 | 467.9 | **2.590** | 0 | 20 | 0 |
|
||||
| 3 | 180.69 | 398,224,992 | 379.8 | **2.102** | 0 | 39 | 0 |
|
||||
|
||||
**N=3 stats:** mean 2.352 MB/s · median 2.363 MB/s · min 2.102 MB/s · max 2.590 MB/s
|
||||
|
||||
## Comparison to baselines
|
||||
|
||||
### vs Patch B baseline (`run-20260507-patchC-preflight`, N=1, 5 min @ 4 MB/s, fresh chip)
|
||||
|
||||
| | Patch B | v3 mean | Δ |
|
||||
|---|---:|---:|---:|
|
||||
| throughput | 1.362 MB/s | 2.352 MB/s | **+73%** |
|
||||
|
||||
### vs original Bug #5 baseline (`run-20260506-0659-fresh`, N=3, decay over time)
|
||||
|
||||
Bug #5 anchor was 725 / 663 / **75** KB/s — rep 3 saw link-death at ~9 min.
|
||||
|
||||
| | Bug #5 floor (rep 3) | v3 floor (rep 3) | Δ |
|
||||
|---|---:|---:|---:|
|
||||
| throughput | 0.075 MB/s | 2.102 MB/s | **28× improvement** |
|
||||
|
||||
### vs Phase 4 v3 plan §4.5 predictions
|
||||
|
||||
| metric | predicted | observed | verdict |
|
||||
|---|---|---|---|
|
||||
| sdio_rx_work dispatch rate | → 0/s (high confidence) | 0/s all 3 reps | ✅ |
|
||||
| `bes2600_bh_work` redispatches | → 0 (high confidence) | 0 all 3 reps | ✅ |
|
||||
| observed RX @ 4 MB/s | floor lifts toward ≥ 1 MB/s sustained (medium) | 2.10 MB/s floor | ✅ exceeds prediction |
|
||||
| `_raw_spin_unlock_irqrestore` CPU% | 20% → 12-15% (medium) | not measured | deferred — perf-record run can confirm |
|
||||
|
||||
## Workqueue dispatch rate collapse
|
||||
|
||||
Patch B baseline (per `run-20260507-patchC-preflight`):
|
||||
- sdio_rx_work: 86.4/s
|
||||
- sdio_tx_work: 276.1/s
|
||||
- bes2600_bh_work redispatches: 0
|
||||
|
||||
v3 N=3 mean:
|
||||
- **sdio_rx_work: 0.0/s** (function deleted)
|
||||
- **sdio_tx_work: 0.8/s** (post-tx queue_work → self->irq_handler call; the chip-side TX driver no longer needs to wake a separate workqueue)
|
||||
- bes2600_bh_work redispatches: 0 (preserved invariant; bh thread still single long-lived work item)
|
||||
|
||||
The 99.7% reduction in `sdio_tx_work` dispatch rate is a side-effect of v3's IRQ→bh-direct rewiring: the post-TX `queue_work(self->sdio_wq, &self->rx_work)` call I replaced with `self->irq_handler()` was actually firing more often than I'd assumed (276/s on Patch B). Folding it into the bh wake-up cuts 275/s of workqueue dispatches that weren't doing anything useful.
|
||||
|
||||
## Risks observed
|
||||
|
||||
- **Bug #5 RX-degradation after ~13-min uptime is independent of v3.** Same scan-failure pattern observed (`wsm_generic_confirm failed for request 0x0007` + `[SCAN] Scan failed (-22)` every 300s) on v3 as on Patch B. v3 did NOT fix Bug #5; it fixed the v2-race that was ALSO present. RX-degradation is firmware-side, likely needs a separate campaign.
|
||||
- **N=3 reps were 3 minutes each instead of 5** to fit within the fresh-chip window. Direct comparison with Patch B's 5-min baseline is approximate; chip-side throughput in 3-min vs 5-min should be similar given the bug fires on uptime, not on transferred-bytes.
|
||||
- **No regression observed in 3×3 min = 9 min of stress.** The v2 race that wedged Patch C v1 within 13 s did NOT reproduce. v3's structural fix held.
|
||||
|
||||
## Phase 8 — lesson distilled
|
||||
|
||||
**The cw1200 mining was decisive.** Patch C v2 (atomic_t prep + direct-deliver on top of relay, PR #10 closed) would have worked correctly but kept the structural relay that was the source of the race. v3 removed the relay entirely — restoring single-writer-from-bh invariant by construction, no atomic_t needed, and delivering a 73% throughput improvement as side benefit.
|
||||
|
||||
Without the cw1200 history mine (`~/src/linux-rockchip`, 228 cw1200 commits over 16 years), v2's atomic_t prep would have shipped. The structural fix is upstream-grade because it matches the reference driver. v2's atomic_t wrapper would have been bes2600-specific bookkeeping with no upstream parallel — defensible as a fix, but worse to maintain.
|
||||
|
||||
**Memory entry:** *When you have an upstream-ancestral driver still in the kernel tree, mine its bug-fix history before patching the inherited fork. The architectural answer may already be there; you just have to look.*
|
||||
|
||||
## Receipts checklist (Phase 7 done)
|
||||
|
||||
- [x] N=3 reps captured at fresh-chip uptime (200/391/582 s)
|
||||
- [x] Same instrumentation pre/post (workqueue ftrace + rx_packets/rx_bytes counters)
|
||||
- [x] Predicted delta matched (sdio_rx_work → 0; bh redispatches → 0; throughput ≥ 1 MB/s sustained)
|
||||
- [x] No WARN/BUG/oops during stress on any rep
|
||||
- [x] Wired-rig telemetry collection (would have caught a wedge if v3 had one)
|
||||
- [x] Receiver `nc` listener restarted fresh per rep (avoiding rep-2-style TCP race)
|
||||
- [x] Stress-ramp memory honored: not steady-state low-rate; saw 4 MB/s saturate
|
||||
|
||||
## Out-of-scope follow-ups
|
||||
|
||||
- Patch C2 — `ieee80211_rx_list` batch delivery — gated on Task #19 kerneldoc verification.
|
||||
- Patch D — ba_lock atomicization — independent.
|
||||
- Patch E — ps_state_lock skip when pm_unsupported — independent.
|
||||
- Bug #5 RX-degradation after 13-min uptime — separate campaign, scan-failure pattern is the entry point.
|
||||
- Task #24 — observe whether `bh.c` `asm volatile("nop")` / commented-out `__bes2600_irq_enable(1)` / BUG_ON in hot path are still load-bearing post-v3. Already partially answered: `__bes2600_irq_enable` is a stub (PR #11 comment). The other artifacts can be re-read fresh.
|
||||
|
||||
---
|
||||
|
||||
*Phase 7 results captured 2026-05-07 by Claude (noether). v3 (PR #5) closes Patch C campaign with structural improvement + race fix + measurable throughput win.*
|
||||
Reference in New Issue
Block a user