Phase 4 plan: Patch B (Trigger A / api_connection_loss) #5

Merged
marfrit merged 1 commit from claude-noether-3 into main 2026-05-07 10:45:29 +00:00
Owner

Phase 4 plan for Patch B drafted after Phase 7 verification of Patch A.

Status of the loop

  • Patch A (PR marfrit/bes2600-dkms#1) merged, deployed (srcversion 21BD07B3), and Phase-7-tested over 10h30m of sustained load on 2.4GHz.
  • DecryptStormRecoveries: 0 — Patch A dormant, no harm done, but no decrypt-storm fired during Phase 7 so Patch A's predicted delta is unobserved (not invalidated).
  • Trigger A (mac80211 api_connection_loss chain) WAS observed: 9 events / 10h30m, with one catastrophic blackhole at 02:42 (~86 s of assoc comeback timeouts, ending in AP unprotected-deauth-6 cluster).

Plan summary

  • Candidate B-1 (locked): extend the existing bes2600_chrdev_do_bus_reset() infrastructure (from c5.2) to fire on N consecutive api_connection_loss events on the same vif. Bus-reset gives the chip a fresh state; userspace re-associates.
  • Candidate B-2 (deferred): assoc-comeback-timer respect — likely a mac80211 / wpa_supplicant concern, not a bes2600 patch.
  • Candidate B-3 (speculative): vif-state scrub on disconnect — needs more instrumentation first.
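
The B-1 trigger condition can be modelled in userspace before any driver change. The sketch below is a Python model, not bes2600 code: it counts api_connection_loss events per vif and reports when a bus reset would be requested. The class name, method name, and vif identifiers are hypothetical. The defaults of 3 events within 60 s are taken from the threshold proposed in the open asks, and reading "N consecutive events" as "N events inside a sliding window, with the count cleared after a trigger" is an assumption.

```python
from collections import defaultdict, deque


class ConnLossWindow:
    """Userspace model of the proposed B-1 trigger (hypothetical names)."""

    def __init__(self, threshold=3, window_s=60.0):
        self.threshold = threshold
        self.window_s = window_s
        self._events = defaultdict(deque)  # vif id -> event timestamps (s)

    def record(self, vif, now):
        """Record one api_connection_loss; return True if a reset is due."""
        q = self._events[vif]
        q.append(now)
        # Drop events that have aged out of the window.
        while q and now - q[0] > self.window_s:
            q.popleft()
        if len(q) >= self.threshold:
            q.clear()  # assumption: start counting afresh after a reset
            return True
        return False


det = ConnLossWindow()
print(det.record("vif0", 0.0))   # False: first event
print(det.record("vif0", 20.0))  # False: second event
print(det.record("vif0", 45.0))  # True: third event inside 60 s
```

Keeping one queue per vif means events on different interfaces never add up to a trigger, which matches the "on the same vif" wording of B-1.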

Predicted delta (Phase 7 units)

| metric | observed | predicted under B-1 |
|---|---|---|
| api_connection_loss rate | 0.86/h | unchanged (we don't address trigger) |
| P(>5s blackhole \| event) | 11 % (1/9) | ≤ 30 % |
| worst-case recovery | 86 s | < 10 s |
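
The observed column follows from the Phase 7 counts quoted in the status section: 9 events over 10h30m, one of them followed by a blackhole longer than 5 s. A short Python check of that arithmetic, with variable names chosen only for illustration:

```python
events = 9                 # api_connection_loss events observed in Phase 7
duration_h = 10 + 30 / 60  # 10h30m of sustained load
blackholes = 1             # events followed by a >5 s blackhole

rate_per_h = events / duration_h
p_blackhole = blackholes / events

print(f"rate: {rate_per_h:.2f}/h")                 # rate: 0.86/h
print(f"P(blackhole | event): {p_blackhole:.0%}")  # P(blackhole | event): 11%
```

With only nine events, the 11 % figure rests on a single occurrence and should be read with that sample size in mind.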

Open asks (4 in the artifact)

  1. Is B-1 the right scope or should we instrument deeper before committing?
  2. Threshold (3 events / 60 s): tune up or down?
  3. Should the bus_reset be conditional on observing assoc-comeback timeouts in the same window?
  4. Is the assoc-comeback disrespect a mac80211 bug rather than a bes2600 bug?

Where the receipts live

  • Phase 5 (priors): notes/phase5-2026-05-06.md (PR #3, merged)
  • Phase 4 Patch A: notes/phase4-2026-05-06.md (PR #4, merged)
  • Patch A code: marfrit/bes2600-dkms PR #1 (merged)
  • Phase 7 receipts: this run dir on ohm — /root/bes2600-samples/run-20260506-2113-patchA/
  • Source pin for B-1's reuse target: bes2600/bes_chardev.c (c5.2 introduced bes2600_chrdev_do_bus_reset)

🤖 Generated with Claude Code

marfrit added 1 commit 2026-05-07 08:34:14 +00:00
Drafted after Phase 7 verification of Patch A (PR #1, srcversion
21BD07B3). 10h30m sustained load on 2.4GHz produced:
- 0 DecryptStormRecoveries (Patch A dormant; no decrypt-storm fired)
- 9 mac80211 api_connection_loss events
- 1 catastrophic blackhole at 02:42 (reason 4 inactivity → reauth
  with assoc-comeback timeouts → AP unprotected-deauth-6 cluster)

Phase 4 pivots to Trigger A (Patch B). Candidate B-1 lock proposal:
extend c5.2 bus_reset infrastructure to fire on N consecutive
api_connection_loss events; reuses existing recovery path.

Pending Phase 5 review before Phase 6 implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
marfrit reviewed 2026-05-07 08:54:05 +00:00
marfrit left a comment
Author
Owner

Done review.
@@ -0,0 +147,4 @@
## Asks of the reviewer
1. Candidate B-1 (bus_reset on api_connection_loss flood) the right scope, or should we instrument deeper before committing?
Author
Owner

Right scope as of now.
@@ -0,0 +148,4 @@
## Asks of the reviewer
1. Candidate B-1 (bus_reset on api_connection_loss flood) the right scope, or should we instrument deeper before committing?
2. Threshold (3 events / 60 s): too aggressive (false-positive bus_resets on transient RF issues) or about right?
Author
Owner

About right.
@@ -0,0 +149,4 @@
1. Candidate B-1 (bus_reset on api_connection_loss flood) the right scope, or should we instrument deeper before committing?
2. Threshold (3 events / 60 s): too aggressive (false-positive bus_resets on transient RF issues) or about right?
3. Should bus_reset be conditional on ALSO seeing post-deauth assoc-comeback timeouts, to avoid resetting on benign connection_loss events?
Author
Owner

The device is in one position for hours - honestly, there should be no benign connection_loss events.
@@ -0,0 +150,4 @@
1. Candidate B-1 (bus_reset on api_connection_loss flood) the right scope, or should we instrument deeper before committing?
2. Threshold (3 events / 60 s): too aggressive (false-positive bus_resets on transient RF issues) or about right?
3. Should bus_reset be conditional on ALSO seeing post-deauth assoc-comeback timeouts, to avoid resetting on benign connection_loss events?
4. Hypothesis 1 (assoc-comeback disrespected) — is this a mac80211/wpa_supplicant bug rather than a bes2600 bug? If yes, we file it elsewhere.
Author
Owner

Document, do not file, as per principle.
marfrit merged commit ea509e810f into main 2026-05-07 10:45:29 +00:00

Reference: marfrit/besser#5