Phase 4 plan: decrypt-storm fast-recover (Trigger B), with revised Phase 1 #4

Merged
marfrit merged 1 commit from claude-noether-2 into main 2026-05-06 17:30:49 +00:00
Owner

Phase 4 plan hand-off for the BES2600 WiFi-stability investigation, drafted after the Phase 5 review of #3 was merged.

What's in this PR

  • notes/phase4-2026-05-06.md — the verbatim Phase 4 plan.
  • A revised Phase 1 metric folding reviewer feedback from #3 (also count AP-deauth) plus today's new Trigger-A finding (mac80211 api_connection_loss is the post-resume P1 mechanism — receipts in two ftrace-instrumented reps at 17:23 and 18:03).

Plan summary

  • Patch A — Decrypt-storm fast-recover (Trigger B): at bes2600/txrx.c:1696, add a sliding-window counter; on threshold (≥5 decrypt-fails in 5 s), schedule ieee80211_connection_loss(vif) to pre-empt the AP's unprotected-deauth-6.
  • Patch B — Beacon-loss / Trigger A: PARKED behind one more diagnostic rep (10 s cadence on the beacon loss counter to confirm chip-side vs RF-side beacon drop).
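
The sliding-window counter in Patch A could look roughly like the userspace sketch below. Everything in it is an assumption for illustration: the names `storm_detector`, `storm_init` and `storm_record` are hypothetical and not taken from `bes2600/txrx.c`, timestamps are plain milliseconds rather than jiffies, and the window is treated as inclusive (exactly 5 s still counts). In the driver, a `true` return is the point where the plan would schedule `ieee80211_connection_loss(vif)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; nothing here is copied from the driver. */
#define STORM_MAX_EVENTS 16 /* ring capacity; threshold must not exceed it */

struct storm_detector {
    uint64_t stamps_ms[STORM_MAX_EVENTS]; /* ring of recent failure times */
    unsigned int head;      /* next slot to overwrite */
    unsigned int count;     /* valid entries, capped at threshold */
    unsigned int threshold; /* e.g. 5 failures ... */
    uint64_t window_ms;     /* ... within e.g. 5000 ms */
};

static void storm_init(struct storm_detector *d, unsigned int threshold,
                       uint64_t window_ms)
{
    assert(threshold >= 1 && threshold <= STORM_MAX_EVENTS);
    memset(d, 0, sizeof(*d));
    d->threshold = threshold;
    d->window_ms = window_ms;
}

/*
 * Record one decrypt failure at now_ms (monotonic clock assumed).
 * Returns true when the last `threshold` failures all fall within
 * `window_ms`. The detector then resets, so one storm yields one trigger.
 */
static bool storm_record(struct storm_detector *d, uint64_t now_ms)
{
    d->stamps_ms[d->head] = now_ms;
    d->head = (d->head + 1) % d->threshold;
    if (d->count < d->threshold)
        d->count++;
    if (d->count < d->threshold)
        return false; /* not enough failures seen yet */

    /* With a full ring, head points at the oldest retained timestamp. */
    if (now_ms - d->stamps_ms[d->head] > d->window_ms)
        return false; /* failures are too spread out */

    d->count = 0;
    d->head = 0;
    return true;
}
```

Because the threshold and window are parameters, the same detector can be replayed over captured decrypt-fail timestamps with `storm_init(&d, 5, 5000)`, `storm_init(&d, 10, 10000)` or `storm_init(&d, 3, 3000)` to compare the threshold shapes raised in the asks.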

Predicted delta (Phase 7 units = Phase 3 units)

  • decrypt-burst rate: unchanged
  • AP-deauth-6 rate: ≤ 0.2 × current (we pre-empt the AP)
  • conditional escalation to >5 s blackhole: 100 % → ≤ 10 %
  • worst-case recovery time: 109 s → < 5 s
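
For Phase 7, the predicted delta can be turned into a mechanical pass/fail check. The sketch below is only one way to encode it: the struct `phase_metrics` and the function `phase7_meets_prediction` are hypothetical names, the rate unit is whatever Phase 3 used, and the "decrypt-burst rate: unchanged" line is left out because the plan does not state a tolerance for it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the metrics named in the predicted delta. */
struct phase_metrics {
    double deauth6_rate;        /* AP-deauth-6 rate, in Phase 3 units */
    double escalation_fraction; /* share of bursts escalating to a >5 s blackhole, 0..1 */
    double worst_recovery_s;    /* worst-case recovery time in seconds */
};

/*
 * Returns true when the post-patch metrics satisfy every numeric bound
 * in the predicted delta:
 *   - AP-deauth-6 rate <= 0.2 x the pre-patch rate
 *   - escalation to a >5 s blackhole <= 10 %
 *   - worst-case recovery time < 5 s
 */
static bool phase7_meets_prediction(const struct phase_metrics *before,
                                    const struct phase_metrics *after)
{
    assert(before->deauth6_rate >= 0.0);
    return after->deauth6_rate <= 0.2 * before->deauth6_rate
        && after->escalation_fraction <= 0.10
        && after->worst_recovery_s < 5.0;
}
```

Feeding it the Phase 3 baseline as `before` and the Phase 7 measurements as `after` gives a single boolean that can sit next to the journal-grep evidence.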

Asks

  1. Is ≥5 / 5 s the right threshold shape, or should it be tighter / looser?
  2. ieee80211_connection_loss(vif) vs alternative kernel API (cfg80211_disconnected)?
  3. Should the patch include a debugfs counter for Phase 7 verification?
  4. Is parking Patch B correct, or should the plan cover both?

Where the receipts live

  • Phase 5 artifact (this campaign's prior PR): notes/phase5-2026-05-06.md
  • Run dir on ohm: /root/bes2600-samples/run-20260506-0659-fresh/
  • Source pin: bes2600/txrx.c:1696, bes2600/wsm.h:620, bes2600/wsm.c:1484
  • ftrace receipts for Trigger A: today's rep at uptime 38427 (api_connection_loss) → 38667 (kernel deauth-2)

🤖 Generated with Claude Code

marfrit added 1 commit 2026-05-06 17:10:38 +00:00
Drafts Patch A (decrypt-storm fast-recover, Trigger B) at txrx.c:1696
with sliding-window threshold + ieee80211_connection_loss reassoc.
Patch B (beacon-loss / Trigger A) parked behind one more diagnostic
rep with 10s snap-loop cadence on the beacon-loss counter.

Folds reviewer feedback from PR #3 + the new Trigger-A finding
(post-resume P1 = api_connection_loss-driven, two reps captured today
at 17:23 and 18:03) into a revised Phase 1 metric counting three
event classes.

Pending Phase 5 second-model review of the plan before Phase 6
implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
marfrit reviewed 2026-05-06 17:28:26 +00:00
marfrit left a comment
Author
Owner

Commented
@@ -0,0 +98,4 @@
## Asks of the reviewer
1. Is the threshold (≥5 decrypt-fails in 5 s) the right shape? Should it be more conservative (≥10 in 10 s)? More aggressive (≥3 in 3 s)? The 12 observed bursts ranged from 4 to 9 events per 60 s window (the Phase 1 looser definition). The patch threshold will fire on the same bursts under any of those choices; pick the one most defensible against false positives.
Author
Owner

Suggestion accepted
@@ -0,0 +99,4 @@
## Asks of the reviewer
1. Is the threshold (≥5 decrypt-fails in 5 s) the right shape? Should it be more conservative (≥10 in 10 s)? More aggressive (≥3 in 3 s)? The 12 observed bursts ranged from 4 to 9 events per 60 s window (the Phase 1 looser definition). The patch threshold will fire on the same bursts under any of those choices; pick the one most defensible against false positives.
2. Is `ieee80211_connection_loss(vif)` the right kernel API? Alternative: `cfg80211_disconnected` with a reason code. Which is cleaner per mac80211 contract for a host-driven preemptive reassoc?
Author
Owner

What do proper in-kernel drivers do? Opting for the majority vote there.
@@ -0,0 +100,4 @@
1. Is the threshold (≥5 decrypt-fails in 5 s) the right shape? Should it be more conservative (≥10 in 10 s)? More aggressive (≥3 in 3 s)? The 12 observed bursts ranged from 4 to 9 events per 60 s window (the Phase 1 looser definition). The patch threshold will fire on the same bursts under any of those choices; pick the one most defensible against false positives.
2. Is `ieee80211_connection_loss(vif)` the right kernel API? Alternative: `cfg80211_disconnected` with a reason code. Which is cleaner per mac80211 contract for a host-driven preemptive reassoc?
3. Should Patch A include a debugfs counter exposing how many storms it has caught, so Phase 7 verification has a host-side counter rather than relying on journal grep alone?
Author
Owner

Yes
@@ -0,0 +101,4 @@
1. Is the threshold (≥5 decrypt-fails in 5 s) the right shape? Should it be more conservative (≥10 in 10 s)? More aggressive (≥3 in 3 s)? The 12 observed bursts ranged from 4 to 9 events per 60 s window (the Phase 1 looser definition). The patch threshold will fire on the same bursts under any of those choices; pick the one most defensible against false positives.
2. Is `ieee80211_connection_loss(vif)` the right kernel API? Alternative: `cfg80211_disconnected` with a reason code. Which is cleaner per mac80211 contract for a host-driven preemptive reassoc?
3. Should Patch A include a debugfs counter exposing how many storms it has caught, so Phase 7 verification has a host-side counter rather than relying on journal grep alone?
4. Patch B parked correctly, or fold it into this same Phase 4?
Author
Owner

Parked. One issue at a time, to avoid getting caught up in weird side effects.
marfrit merged commit 4acba3e707 into main 2026-05-06 17:30:49 +00:00
Reference: marfrit/besser#4