Phase 5 review: BES2600 WiFi-stability campaign artifact #2

2026-05-06T13:23:49Z

marfrit commented

2026-05-06 13:23:49 +00:00

Phase 5 hand-off artifact for the BES2600 WiFi-stability investigation, submitted as a branch on the besser umbrella for second-model review.

Summary

Phase 1 metric locked: rate of WSM_STATUS_DECRYPTFAILURE bursts (≥4 events / 60 s) per hour, plus conditional probability of an AP-side unprotected-deauth-reason-6 within 30 s.
Three Pattern-P1 events captured on hardware (ohm, srcversion 461AFB36...).
Rig: dynamic_debug + journal + iw-event + tcpdump-filtered + 60 s snap loop + netcat 1 MB/s load probe.
Headline finding: under sustained load, the decrypt-failure burst rate rises ~35x, but bursts stop escalating to AP-deauth (conditional escalation: 100% idle at N=1, 0% under load at N=12).
Open contradictions and small-N caveats stated as-is.
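
A minimal sketch of how the Phase 1 metric could be computed from two timestamp lists (seconds), offered as an illustration rather than the campaign's actual tooling. All names here (`find_bursts`, `phase1_metric`) are hypothetical. Two assumptions go beyond what the summary states: a burst keeps extending while each new failure still completes a 4-in-60 s window, and "within 30 s" is measured from burst start to 30 s after the last failure in the burst. Deauths that match no burst are counted separately, which covers the Event-3 / post-resume path.

```python
from bisect import bisect_left, bisect_right

BURST_MIN_EVENTS = 4        # ">=4 events / 60 s" from the Phase 1 metric
BURST_WINDOW_S = 60.0
ESCALATION_WINDOW_S = 30.0  # AP deauth-reason-6 "within 30 s"


def find_bursts(failure_ts):
    """Return (start, end) spans of decrypt-failure bursts.

    Assumption: a burst opens when BURST_MIN_EVENTS failures fit inside
    BURST_WINDOW_S, and extends while each further failure still fits in
    such a window together with the BURST_MIN_EVENTS - 1 failures before it.
    """
    ts = sorted(failure_ts)
    k = BURST_MIN_EVENTS
    bursts = []
    i = 0
    while i + k <= len(ts):
        if ts[i + k - 1] - ts[i] <= BURST_WINDOW_S:
            j = i + k - 1
            while j + 1 < len(ts) and ts[j + 1] - ts[j + 2 - k] <= BURST_WINDOW_S:
                j += 1
            bursts.append((ts[i], ts[j]))
            i = j + 1
        else:
            i += 1
    return bursts


def phase1_metric(failure_ts, deauth_ts, observed_s):
    """Burst rate per hour, P(deauth | burst), and deauths with no burst."""
    bursts = find_bursts(failure_ts)
    deauths = sorted(deauth_ts)
    matched = set()   # indices of deauths attributed to some burst
    escalated = 0
    for start, end in bursts:
        # Assumption: a deauth counts if it lands between burst start and
        # ESCALATION_WINDOW_S after the last failure of the burst.
        lo = bisect_left(deauths, start)
        hi = bisect_right(deauths, end + ESCALATION_WINDOW_S)
        if hi > lo:
            escalated += 1
            matched.update(range(lo, hi))
    return {
        "bursts_per_hour": len(bursts) / (observed_s / 3600.0),
        "p_escalation": escalated / len(bursts) if bursts else None,
        "orphan_deauths": len(deauths) - len(matched),
    }
```

Reporting `orphan_deauths` next to the conditional probability keeps the two P1 paths apart instead of folding them into one number.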

Asks

Is the Phase 1 metric the right discriminator? Should we also count AP-deauth-6 with no preceding burst (the Event-3 / post-resume path)?
Is N=1 idle / N=12 load enough to report the escalation flip, or do we need N≥3 in each bin?
Are these cheap Phase 3 additions worth making before Phase 4: enabling ftrace mac80211/cfg80211 events, and widening the tcpdump filter to keep EAPOL frames in a moving 5-min ring?
Is missing AP-side capture (no Fritz!Box logs) a blocking gap?
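
For the second ask, one way to put a number on the small-N concern is a two-sided Fisher exact test on the 2x2 table (idle: 1 escalated / 0 not; load: 0 escalated / 12 not), plus exact one-sided bounds for the all-or-nothing bins. This is a sketch under the assumption that bursts are independent trials, which the campaign has not established; the function names are illustrative.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Rows are conditions (idle, load); columns are (escalated, not escalated).
    Sums the probability of every table with the same margins that is no
    more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of x escalations landing in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def upper_bound_zero_hits(n, alpha=0.05):
    """Exact one-sided upper bound on a rate after 0 hits in n trials."""
    return 1 - alpha ** (1 / n)


def lower_bound_all_hits(n, alpha=0.05):
    """Exact one-sided lower bound on a rate after n hits in n trials."""
    return alpha ** (1 / n)


p_flip = fisher_exact_two_sided(1, 0, 0, 12)   # 1/13, about 0.077
load_upper = upper_bound_zero_hits(12)         # about 0.22
idle_lower = lower_bound_all_hits(1)           # 0.05
```

At the current counts the flip has p = 1/13, the load escalation rate is only bounded below roughly 22%, and the idle rate is only bounded above 5%. If three idle bursts all escalated while the load bin stayed at 0/12, the same test would give 1/455.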

Where the receipts live

Artifact in this PR: notes/phase5-2026-05-06.md
Raw run dir on ohm: /root/bes2600-samples/run-20260506-0659-fresh/
Source pins: bes2600/txrx.c:1696, bes2600/wsm.h:620, bes2600/wsm.c:1484

🤖 Generated with Claude Code


marfrit added 1 commit 2026-05-06 13:23:49 +00:00

notes: phase 5 review artifact for BES2600 wifi-stability campaign 1a21212744

Captures Phase 0-3 receipts as of 2026-05-06: three Pattern-P1 events
reproduced (07:13, 11:03, yesterday 22:33), decrypt-failure metric locked
as Phase 1 with source pins (txrx.c:1696, wsm.h:620, wsm.c:1484), rig built
(snap loop + tcpdump filtered ring + iw event + dynamic_debug + netcat 1MB/s),
idle-vs-load comparison shows 35x burst-rate elevation under load with
conditional-escalation flip (100% idle / 0% load).

Pending Phase 5 second-model review before Phase 4 plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

marfrit reviewed 2026-05-06 13:32:23 +00:00

marfrit left a comment

Review done

notes/phase5-2026-05-06.md

						
				@@ -0,0 +185,4 @@

				### Open contradictions / things the loop has NOT yet resolved

				1. Event-3 (post-resume) had no decrypt-failures yet still ended in PREV_AUTH_NOT_VALID. There is a second P1 trigger path we have not pinned a mechanism for.

marfrit commented

2026-05-06 13:26:09 +00:00

Please suggest a mechanism.

notes/phase5-2026-05-06.md

						
				@@ -0,0 +209,4 @@

				## Asks of the reviewer

				1. Is the Phase 1 metric the right discriminator? In particular, does the conditional-probability column (escalation given burst) capture what we want, or should the metric also count "AP-deauth-6 with no preceding burst" (the Event-3 path)?

marfrit commented

2026-05-06 13:27:26 +00:00

Also count AP-deauth.

notes/phase5-2026-05-06.md

						
				@@ -0,0 +210,4 @@

				## Asks of the reviewer

				1. Is the Phase 1 metric the right discriminator? In particular, does the conditional-probability column (escalation given burst) capture what we want, or should the metric also count "AP-deauth-6 with no preceding burst" (the Event-3 path)?

				2. The escalation-rate flip (100% idle, 0% load) is at N=1 idle vs N=12 load. Is N=1 idle adequate to report this finding, or do we need ≥3 idle bursts before locking?

marfrit commented

2026-05-06 13:28:49 +00:00

I have observed this behaviour before, without accurate measurements. Treat "the user observes the problem" as evidence, with confidence p = 0.9, that the problem exists. No further measurements are needed: the issue is tedious to reproduce, like watching a kettle come to the boil or watching grass grow.

notes/phase5-2026-05-06.md

						
				@@ -0,0 +212,4 @@

				1. Is the Phase 1 metric the right discriminator? In particular, does the conditional-probability column (escalation given burst) capture what we want, or should the metric also count "AP-deauth-6 with no preceding burst" (the Event-3 path)?

				2. The escalation-rate flip (100% idle, 0% load) is at N=1 idle vs N=12 load. Is N=1 idle adequate to report this finding, or do we need ≥3 idle bursts before locking?

				3. Anything in Phase 3 above flagged as "not yet measured" that would be cheap to add before Phase 4? Specifically:

				   - Should ftrace mac80211/cfg80211 events be enabled before next rep? Cost: ~10x journal volume. Benefit: bottom-half timing for the 100ms-scale stalls in the 109s blackhole.

marfrit commented

2026-05-06 13:30:21 +00:00

Good idea.
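
A sketch of switching the two event systems on and off around a rep, assuming tracefs is mounted at `/sys/kernel/tracing` (older setups expose it under `/sys/kernel/debug/tracing`) and the script runs as root. The helper name `set_event_systems` is hypothetical; the per-system `events/<system>/enable` files are the standard tracefs controls.

```python
from pathlib import Path

# Assumption: the usual tracefs mount point; adjust if it lives elsewhere.
TRACEFS = Path("/sys/kernel/tracing")


def set_event_systems(systems, enabled, tracefs=TRACEFS):
    """Write 1 or 0 to events/<system>/enable for each trace event system.

    Returns the list of control files written. Raises FileNotFoundError if a
    system is missing, e.g. because the module providing it is not loaded.
    """
    written = []
    for system in systems:
        knob = Path(tracefs) / "events" / system / "enable"
        if not knob.exists():
            raise FileNotFoundError(f"no trace event system at {knob}")
        knob.write_text("1\n" if enabled else "0\n")
        written.append(knob)
    return written
```

Calling `set_event_systems(["mac80211", "cfg80211"], True)` before the rep and the same call with `False` afterwards limits the extra journal volume to the capture window.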

notes/phase5-2026-05-06.md

						
				@@ -0,0 +213,4 @@

				2. The escalation-rate flip (100% idle, 0% load) is at N=1 idle vs N=12 load. Is N=1 idle adequate to report this finding, or do we need ≥3 idle bursts before locking?

				3. Anything in Phase 3 above flagged as "not yet measured" that would be cheap to add before Phase 4? Specifically:

				   - Should ftrace mac80211/cfg80211 events be enabled before next rep? Cost: ~10x journal volume. Benefit: bottom-half timing for the 100ms-scale stalls in the 109s blackhole.

				   - Should tcpdump filter widen to include EAPOL frames captured in a moving 5-min window before each P1 event so we see the group-rekey directly?

marfrit commented

2026-05-06 13:30:35 +00:00

Yes.

notes/phase5-2026-05-06.md

						
				@@ -0,0 +214,4 @@

				3. Anything in Phase 3 above flagged as "not yet measured" that would be cheap to add before Phase 4? Specifically:

				   - Should ftrace mac80211/cfg80211 events be enabled before next rep? Cost: ~10x journal volume. Benefit: bottom-half timing for the 100ms-scale stalls in the 109s blackhole.

				   - Should tcpdump filter widen to include EAPOL frames captured in a moving 5-min window before each P1 event so we see the group-rekey directly?

				4. Is the lack of AP-side capture (no Fritz!Box logs) a blocking gap, or can the campaign proceed without it for now?

marfrit commented

2026-05-06 13:32:01 +00:00

The campaign can proceed without it. The alternative would be to set up an AP ourselves, which is not guaranteed to reproduce the problem. Fritz!Box devices are opaque when it comes to low-level WiFi logging.

marfrit referenced this pull request

2026-05-06 13:35:14 +00:00

Phase 5 review: BES2600 WiFi-stability campaign artifact #3

marfrit commented

2026-05-06 13:35:14 +00:00

Superseded by #3 (correct branch name claude-noether). Closing this — please continue review on the new PR. Your comments here are still visible for posterity.


marfrit closed this pull request

2026-05-06 13:35:15 +00:00

marfrit referenced this pull request

2026-05-06 13:37:02 +00:00

Phase 5 review: BES2600 WiFi-stability campaign artifact #3

claude-noether referenced this pull request

2026-05-07 18:51:06 +00:00

notes: Patch C v2 Phase 4 plan — atomic_t prep + direct-deliver (re-after-failure) #10

marfrit referenced this pull request

2026-05-09 11:02:43 +00:00

build: bes2600_chrdev_switch_subsys_glb() called from bes2600_btuart.c:81 but never defined/declared #17