bes2600: pre-empt AP-deauth-6 with mac80211 reassoc on decrypt-fail storm #1

Merged
marfrit merged 1 commit from bes2600/decrypt-storm-fast-recover into mobian 2026-05-06 18:48:29 +00:00
Owner

Patch A — Phase 6 implementation

Follows the Phase 4 plan merged in marfrit/besser PR #4 (commit 4acba3e7, notes/phase4-2026-05-06.md).

What it does

When bes2600_rx_cb receives a frame that the firmware marks with WSM_STATUS_DECRYPTFAILURE, the driver records the failure in a per-vif sliding window. If 5 or more decrypt failures occur within 5 s on the same vif, a worker calls ieee80211_connection_loss(vif) to force mac80211 into a clean disassociation. Userspace (NetworkManager / wpa_supplicant) then reconnects with fresh keys before the AP has a chance to send its unprotected deauth-reason-6.
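The 5-in-5-s rule can be illustrated with a small userspace model. This is a sketch, not the patch's code: the struct, the helper names, the millisecond timestamps and the "reset after a trip so one storm yields one recovery" policy are all assumptions. A kernel implementation would more likely store jiffies and compare with time_after()/msecs_to_jiffies().

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical constants mirroring the 5-in-5-s rule from the PR text. */
#define STORM_THRESHOLD 5
#define STORM_WINDOW_MS 5000u

/* Hypothetical per-vif state: ring buffer holding the timestamps (ms,
 * monotonic) of the last STORM_THRESHOLD decrypt failures. */
struct storm_window {
    uint64_t ts[STORM_THRESHOLD];
    unsigned int count; /* valid entries, saturates at STORM_THRESHOLD */
    unsigned int head;  /* next slot to overwrite */
};

static void storm_window_init(struct storm_window *w)
{
    memset(w, 0, sizeof(*w));
}

/* Record one decrypt failure at now_ms. Returns true when this failure is
 * the STORM_THRESHOLD-th one within STORM_WINDOW_MS (inclusive). */
static bool storm_window_record(struct storm_window *w, uint64_t now_ms)
{
    w->ts[w->head] = now_ms;
    w->head = (w->head + 1) % STORM_THRESHOLD;
    if (w->count < STORM_THRESHOLD)
        w->count++;
    if (w->count < STORM_THRESHOLD)
        return false;

    /* Buffer is full, so head now points at the oldest of the last
     * STORM_THRESHOLD timestamps. */
    if (now_ms - w->ts[w->head] <= STORM_WINDOW_MS) {
        storm_window_init(w); /* assumed policy: one storm, one recovery */
        return true;
    }
    return false;
}
```

With this model, a steady trickle of one failure every 2 s never trips, because any five consecutive failures span 8 s, while five failures packed into 5 s trip on the fifth.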

Why

  • Trigger B receipts from 2026-05-06: the 07:13 storm (77 events / 24 s) recovered within 1 s, while the 11:03 storm (8 events / 9 s) caused a 109 s reauth blackhole. Both ended in an unprotected AP deauth-6.
  • Under sustained 1 MB/s load, the decrypt-burst rate rises ~35× over idle and conditional escalation falls from 100 % to 0 %, but the underlying decrypt mismatches still happen.
  • This patch is host-side recovery only; it does not address the chip/firmware root cause of the decrypt mismatches.

Reviewer feedback (from marfrit/besser PR #4) folded in

  • Threshold ≥ 5 / 5 s: kept as proposed ("Suggestion accepted").
  • API choice: ieee80211_connection_loss(vif) is the kernel-majority pattern for STA drivers signalling "link is gone, reassoc please", and its doc-comment in include/net/mac80211.h explicitly allows this use. cfg80211_disconnected would bypass the mac80211 state machine.
  • Debugfs counter: added a DecryptStormRecoveries: %u line to the per-vif status seq_file under /sys/kernel/debug/ieee80211/phyN/bes2600/.
  • Patch B (Trigger A / beacon-loss path): parked, not part of this PR.
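For Phase 7 measurement, the new debugfs counter has to be read back from userspace. The helper below is a hypothetical sketch: it assumes the seq_file is named status (so the path would be /sys/kernel/debug/ieee80211/phyN/bes2600/status) and that its text has already been read into a buffer. It returns the first DecryptStormRecoveries value it finds; a file listing several vifs would need per-section parsing.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Find the "DecryptStormRecoveries: %u" line in status_text and store its
 * value in *out. Returns false if the line is absent or unparsable. */
static bool parse_storm_recoveries(const char *status_text, unsigned int *out)
{
    const char *key = "DecryptStormRecoveries:";
    const char *p = strstr(status_text, key);

    if (!p)
        return false;
    /* %u skips the whitespace that follows the colon. */
    return sscanf(p + strlen(key), "%u", out) == 1;
}
```

Sampling this counter alongside the AP's deauth log would show directly how many storms the host-side recovery pre-empted.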

Predicted Phase 7 delta vs unpatched baseline

| metric | before | after |
|---|---|---|
| decrypt-burst rate (per hr) | baseline | unchanged |
| AP-deauth-6 rate (per hr) | baseline | ≤ 0.2 × baseline |
| P(>5 s blackhole given a burst) | 100 % | ≤ 10 % |
| worst-case recovery | 109 s | < 5 s |
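One way to compute the last two rows from Phase 7 data is sketched below. It assumes the measurement produces one recovery time (seconds) per decrypt burst; that input format and the function names are hypothetical. The conditional probability is then the fraction of bursts whose recovery exceeded 5 s, and the worst case is the maximum.

```c
#include <stddef.h>

/* Empirical P(blackhole > threshold_s | burst): the fraction of bursts
 * whose recovery time exceeded threshold_s. Returns -1.0 if n == 0. */
static double p_blackhole_given_burst(const double *recovery_s, size_t n,
                                      double threshold_s)
{
    size_t long_outages = 0;

    if (n == 0)
        return -1.0;
    for (size_t i = 0; i < n; i++)
        if (recovery_s[i] > threshold_s)
            long_outages++;
    return (double)long_outages / (double)n;
}

/* Worst-case recovery: the largest observed per-burst recovery time. */
static double worst_case_recovery(const double *recovery_s, size_t n)
{
    double worst = 0.0;

    for (size_t i = 0; i < n; i++)
        if (recovery_s[i] > worst)
            worst = recovery_s[i];
    return worst;
}
```

Because the baseline P figure is conditioned on a burst, the estimate is only meaningful once enough bursts have been observed; a handful of bursts gives a very coarse fraction.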

Files touched

  • bes2600/bes2600.h: 4 new fields on struct bes2600_vif + 2 prototypes
  • bes2600/txrx.c: new helpers + call at the existing decrypt-fail log site (bes2600_rx_cb)
  • bes2600/sta.c: bes2600_decrypt_storm_init() in bes2600_vif_setup
  • bes2600/debug.c: DecryptStormRecoveries seq_printf

Verification status

  • checkpatch.pl --no-tree --strict: clean (0 / 0 / 0).
  • Building against the kernel headers at /usr/src/linux-rockchip-rkr3/ is not yet confirmed in this PR; that is the next step before merge.
  • The danctnix-flavor (linux-pinetab2 in-tree) port is pending a separate PR; this patch touches no timer APIs, so the danctnix port should be near-identical.

Asks

  1. Threshold OK? (5 / 5 s — looser than typical decrypt-fail bursts)
  2. Should cancel_work_sync(&priv->decrypt_storm_recover_work) be added to the vif teardown path? Existing per-vif workers don't all have explicit cancel — happy to follow whichever the maintainer prefers.
  3. Per-VIF or per-hw_priv counter scope? I went with per-VIF (storms come through wsm_handle_rx with a vif-resolved priv); per-hw_priv would aggregate across multiple vifs.
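To illustrate question 3, here is a hypothetical model (not the patch's code) that replays a time-sorted list of decrypt failures and asks whether the 5-in-5-s rule trips, either per vif or aggregated across vifs as a per-hw_priv counter would. It shows the practical difference: two vifs with three failures each inside the same 5 s trip an aggregated counter but neither per-vif counter.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define STORM_THRESHOLD 5
#define STORM_WINDOW_MS 5000u

struct decrypt_fail_event {
    int vif_id;    /* virtual interface the failed frame arrived on */
    uint64_t t_ms; /* monotonic timestamp; events sorted ascending */
};

/* Returns true if any failure is the STORM_THRESHOLD-th one within
 * STORM_WINDOW_MS. per_vif == true counts only failures on the same vif;
 * false aggregates across all vifs (the per-hw_priv alternative). */
static bool storm_trips(const struct decrypt_fail_event *ev, size_t n,
                        bool per_vif)
{
    for (size_t i = 0; i < n; i++) {
        unsigned int in_window = 0;

        for (size_t j = 0; j <= i; j++) {
            if (per_vif && ev[j].vif_id != ev[i].vif_id)
                continue;
            /* j <= i and events are sorted, so this cannot underflow. */
            if (ev[i].t_ms - ev[j].t_ms <= STORM_WINDOW_MS)
                in_window++;
        }
        if (in_window >= STORM_THRESHOLD)
            return true;
    }
    return false;
}
```

Per-vif scope also means a recovery only disconnects the vif whose keys are actually out of sync.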

🤖 Generated with Claude Code

marfrit added 1 commit 2026-05-06 17:51:29 +00:00
When the BES2600 firmware reports WSM_STATUS_DECRYPTFAILURE for a burst
of received frames (typically because the host's PTK or GTK has fallen
out of sync with the AP), the AP eventually concludes that the STA is
not authenticated and emits an unprotected deauth-reason-6 ("Class 2
frame received from non-authenticated station"). On the deployed
pinetab2 + bes2600 stack this AP-initiated deauth has been observed to
leave the link blackholed for up to 109 s before userspace finds a
different SSID/channel to recover on. (Receipts at
https://git.reauktion.de/marfrit/besser, notes/phase5-2026-05-06.md.)

Add a sliding-window counter on each bes2600_vif: when 5 decrypt
failures fire within 5 s, schedule a worker that calls
ieee80211_connection_loss(vif). mac80211 then performs immediate
disassociation; userspace (NetworkManager / wpa_supplicant) reconnects
with fresh keys before the AP gets a chance to fire its unprotected
deauth.
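The rx path can keep reporting decrypt failures after the recovery has been scheduled but before the worker has run. The kernel's queue_work() returns false and does nothing if the item is already pending, so repeated triggers collapse into one run. The toy model below illustrates that convention in single-threaded userspace C; struct toy_work and the helpers are hypothetical stand-ins, and the patch's exact queueing call is not shown in this PR.

```c
#include <stdbool.h>

/* Toy stand-in for a kernel work_struct: only the pending bit is modelled. */
struct toy_work {
    bool pending;
    void (*fn)(struct toy_work *w);
};

/* Mirrors queue_work(): false if already pending, true if newly queued. */
static bool toy_queue_work(struct toy_work *w)
{
    if (w->pending)
        return false;
    w->pending = true;
    return true;
}

/* Stand-in for a worker thread picking the item up: like the kernel,
 * clear the pending bit before running the handler. */
static void toy_run_pending(struct toy_work *w)
{
    if (!w->pending)
        return;
    w->pending = false;
    w->fn(w);
}

/* Counts simulated recoveries; the real handler would call
 * ieee80211_connection_loss(vif). */
static unsigned int connection_loss_calls;

static void recover_fn(struct toy_work *w)
{
    (void)w;
    connection_loss_calls++;
}
```

Ten triggers before the worker runs produce a single simulated ieee80211_connection_loss() call.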

Predicted Phase 7 delta vs the unpatched baseline:
- decrypt-burst rate: unchanged (this does not address root cause)
- AP-deauth-6 rate: <= 0.2 of baseline
- conditional probability of >5s blackhole given a burst:
  100% -> <= 10%
- worst-case recovery time: 109s -> <5s

Contract pin: ieee80211_connection_loss() per
include/net/mac80211.h: "may also be called if the connection needs to
be terminated for some other reason... will cause immediate change to
disassociated state, without connection recovery attempts." Userspace
recovery is the existing NM/wpa_supplicant path. The worker context
satisfies the implicit process-context expectation.

Files touched:
- bes2600/bes2600.h: 4 new fields on struct bes2600_vif + 2 prototypes
- bes2600/txrx.c: new helpers + the call site at the existing
  WSM_STATUS_DECRYPTFAILURE log point (the unconditional "goto drop"
  branch in bes2600_rx_cb)
- bes2600/sta.c: bes2600_decrypt_storm_init() in bes2600_vif_setup
- bes2600/debug.c: DecryptStormRecoveries seq_printf in the per-vif
  status seq_file output

Threshold (5/5s) is set well above the steady-state per-vif decrypt-
fail rate observed in measurement (~1/min even under sustained 1 MB/s
load), so a true storm is required to trip it. The cw1200/cw1260
ancestor has no equivalent storm-recovery; this is a clean addition.
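To quantify "a true storm is required", the sketch below estimates how often the 5/5s rule would trip on background noise alone. It ASSUMES the steady-state failures (~1/min, from the measurement above) form a Poisson process; real storms are bursty, which is exactly what the trigger is meant to catch. Under that assumption, the number of earlier failures in the 5 s before any given failure is Poisson with mean λ = (1/60) × 5 = 1/12. The trigger trips when at least 4 such failures exist, so the expected number of false trips per hour is (events per hour) × P(N ≥ 4). The function names are hypothetical.

```c
/* e^-x by Taylor series; adequate for the small x used here. */
static double exp_neg_small(double x)
{
    double term = 1.0, sum = 1.0;

    for (int k = 1; k < 30; k++) {
        term *= -x / k;
        sum += term;
    }
    return sum;
}

/* P(N >= k) for N ~ Poisson(lambda), summed over the tail directly to
 * avoid cancellation in 1 - P(N < k). */
static double poisson_tail(double lambda, int k)
{
    double pk = exp_neg_small(lambda); /* P(N = 0) */
    double tail = 0.0;

    for (int i = 1; i <= k; i++)
        pk *= lambda / i;              /* now pk = P(N = i) */
    for (int i = k; i < k + 40; i++) {
        tail += pk;
        pk *= lambda / (i + 1);        /* advance to P(N = i + 1) */
    }
    return tail;
}

/* Expected false trips per hour for a "threshold failures within window_s"
 * rule, evaluated at each failure, given a background rate per_min. */
static double false_trips_per_hour(double per_min, int threshold,
                                   double window_s)
{
    double lambda = per_min / 60.0 * window_s; /* mean failures per window */
    double events_per_hour = per_min * 60.0;

    return events_per_hour * poisson_tail(lambda, threshold - 1);
}
```

For 1 failure/min with a 5-in-5-s rule this gives roughly 1.1e-4 false trips per hour, on the order of one spurious recovery per ~9000 hours, under the Poisson assumption.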

checkpatch.pl --no-tree --strict: clean (0/0/0).

Signed-off-by: Claude (noether) <claude@reauktion.de>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Author
Owner

Threshold OK? (5 / 5 s — looser than typical decrypt-fail bursts)

  • Yes.

Should cancel_work_sync(&priv->decrypt_storm_recover_work) be added to the vif teardown path? Existing per-vif workers don't all have explicit cancel — happy to follow whichever the maintainer prefers.
  • Question: so some do, some don't. If there is no harm in calling cancel_work_sync twice (no double free? no dereference of a previously invalidated pointer?), then yes.

Per-VIF or per-hw_priv counter scope? I went with per-VIF (storms come through wsm_handle_rx with a vif-resolved priv); per-hw_priv would aggregate across multiple vifs.
  • Per-VIF.
marfrit force-pushed bes2600/decrypt-storm-fast-recover from b8f57a8443 to af4b52affc 2026-05-06 18:43:45 +00:00
Collaborator

Folded in the three resolutions, force-pushed af4b52a:

  • Threshold (5 / 5 s): kept as proposed.
  • cancel_work_sync on teardown: added in bes2600_remove_interface alongside the existing per-vif cancel_*_work_sync block. Safety check (per your concern), from the kernel cancel_work_sync contract:
    • it operates on any INIT_WORK'd work_struct (which we did in bes2600_decrypt_storm_init);
    • it blocks until any in-flight handler returns, so there is no use-after-free of priv when mac80211 frees the vif;
    • it is idempotent: a second call (or a call when nothing was scheduled) simply returns false, so nothing is freed or released twice;
    • it does not free the work_struct; the caller owns that memory and must keep it valid across the call (we do: it is embedded in priv, which is still valid inside bes2600_remove_interface).
    Net: a single call here is sufficient and safe; calling it twice would also be harmless.
  • Per-VIF scope: kept (storms come through wsm_handle_rx with a vif-resolved priv).
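The idempotency point can be illustrated with a single-threaded toy model (hypothetical names, not the kernel workqueue). toy_cancel_sync() reports whether the item was pending and clears it, matching cancel_work_sync()'s documented return value; the real function additionally waits for a handler that is already running, which a single-threaded model has nothing to wait for.

```c
#include <stdbool.h>

struct toy_work {
    bool pending;
    unsigned int runs;
};

/* Queue the item; false if it was already pending. */
static bool toy_queue(struct toy_work *w)
{
    if (w->pending)
        return false;
    w->pending = true;
    return true;
}

/* Single-threaded stand-in for cancel_work_sync(): drop a pending item
 * and report whether one was pending. Calling it again is harmless and
 * simply returns false. */
static bool toy_cancel_sync(struct toy_work *w)
{
    bool was_pending = w->pending;

    w->pending = false;
    return was_pending;
}

/* Run the item if it is still pending. */
static void toy_run(struct toy_work *w)
{
    if (!w->pending)
        return;
    w->pending = false;
    w->runs++;
}
```

After a cancel, a later run finds nothing pending, which is the property teardown relies on before the memory holding the work item goes away.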

Diff bumped from 86 → 87 lines (+ the one-line cancel). checkpatch --no-tree --strict still clean (0/0/0).

Not yet done: build verification against the running kernel + Phase 7 measurement on ohm. Holding for your re-review.

Author
Owner

Thank you for your PR, please go ahead.

marfrit merged commit 789a9a4700 into mobian 2026-05-06 18:48:29 +00:00
Reference: marfrit/bes2600-dkms#1