69a1d0f8b1
Phase 7 verification of cleanups + Patch A + Patch B (srcversion 1B3B3ED0) on ohm, 2026-05-07 12:48 → 15:13 CEST, under netcat load ramped 1 MB/s → 4 MB/s on 2.4 GHz newton.

Patch A: predicted delta CONFIRMED at N=2 reproductions.
- 13:47:56 storm → 1 s reassoc, no AP-deauth-6 escalation
- 13:49:26 storm → 1 s reassoc, no AP-deauth-6 escalation

Patch B: installed, untriggered. 2 api_connection_loss events spaced 91 s apart, never tripping the 3-in-60s threshold. No false positives, no spurious bus_resets. Recovery delta unobserved (no harm done).

Trigger C: 17-frame AP-deauth-6 cluster at 12:53 with no patch hooks firing; bes2600 TX-side glitch suspect. Recovery via mac80211 reauth in ~4 s. New backlog item.

Bug #5 documented separately (RX path degrades under throughput pressure; possible root of the original Phase-0 YouTube frame drops).

# BES2600 WiFi-stability campaign: Phase 7 verdict (Patches A + B)

Date assembled: 2026-05-07
Module under test: bes2600.ko srcversion `1B3B3ED096AAD7217FEDE11`
(cleanups + Patch A + Patch B)
Run dir: `/root/bes2600-samples/run-20260507-1248-patchB/` on ohm

Phase 7 verification window: 2026-05-07 12:48 → ~15:13 CEST (≈ 2 h 25 m)
of which: ~50 min @ 1 MB/s pv-cap, ~1 h 30 m @ 4 MB/s pv-cap on 2.4 GHz
newton (5b:32, signal -57 to -67 dBm).

---

## Result table (vs the Phase 4 predicted delta)

### Patch A: decrypt-storm fast-recover (Trigger B)

| metric | Phase 3 baseline | Phase 4 prediction | Phase 7-of-B observed |
|---|---|---|---|
| decrypt-burst rate | 8.18/h | unchanged | 2 bursts in ~22 min once 4 MB/s pressure was on |
| AP-deauth-6 rate following burst | 100 % escalation | ≤ 0.2 × baseline | **0/2 = 0 % escalation** |
| recovery time given burst | up to 109 s | < 5 s | **~1 s** (×2) |

**Verdict: predicted delta CONFIRMED at N=2.** CLAUDE.md ideal is N=3; we're directionally locked at 2 reproductions, both behaving as predicted (threshold trip → `[bes2600] decrypt-storm fast-recover: forcing reassoc` log line → mac80211 disassoc → userspace reauth in ≈1 s).

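
The pass/fail reasoning behind the Patch A table can be written out as a small check. This is only an illustrative sketch: the function and constant names are hypothetical, the figures are copied from the table, and "≤ 0.2 × baseline" is read here as an escalation rate of at most 0.2 × 100 %.

```python
# Figures from the Patch A table; all names below are illustrative, not from the campaign tooling.
BASELINE_ESCALATION = 1.00       # Phase 3: 100 % of bursts escalated to AP-deauth-6
PREDICTED_MAX_FACTOR = 0.2       # Phase 4: at most 0.2 x baseline
PREDICTED_MAX_RECOVERY_S = 5.0   # Phase 4: recovery in under 5 s


def patch_a_verdict(bursts, escalations, recovery_times_s, window_min):
    """Compare observed burst outcomes against the Phase 4 prediction."""
    escalation_rate = escalations / bursts
    return {
        "bursts_per_hour": bursts / (window_min / 60.0),
        "escalation_rate": escalation_rate,
        "escalation_ok": escalation_rate <= PREDICTED_MAX_FACTOR * BASELINE_ESCALATION,
        "recovery_ok": max(recovery_times_s) < PREDICTED_MAX_RECOVERY_S,
    }


# Observed: 2 bursts in ~22 min, 0 escalations, ~1 s recovery each.
verdict = patch_a_verdict(bursts=2, escalations=0,
                          recovery_times_s=[1.0, 1.0], window_min=22)
print(verdict)
```

With the observed values both checks pass. The burst rate over the ~22 min window works out to roughly 5.5/h; the window is short, so this is only a rough comparison with the 8.18/h baseline.
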
#### Receipts (verbatim)

```
13:47:56 bes2600_wlan: [bes2600] decrypt-storm fast-recover: forcing reassoc
13:47:57 wlan0: associated to cc:ce:1e:2b:74:17 (cross-BSSID, 1 s)
13:49:26 bes2600_wlan: [bes2600] decrypt-storm fast-recover: forcing reassoc
13:49:27 wlan0: associated to c0:25:06:e6:5b:32 (back home, 1 s)
```

`DecryptStormRecoveries: 2` exposed via debugfs at `/sys/kernel/debug/ieee80211/phy0/bes2600/vif_0/status`.

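
As a cross-check, the recovery times can be recomputed from the receipts and the number of fast-recover lines compared with the debugfs counter. The sketch below assumes the simplified `HH:MM:SS message` layout shown in the receipts; real kernel or journal output is formatted differently, and the helper name is hypothetical.

```python
from datetime import datetime

# The four receipt lines quoted above, used as sample input.
RECEIPTS = """\
13:47:56 bes2600_wlan: [bes2600] decrypt-storm fast-recover: forcing reassoc
13:47:57 wlan0: associated to cc:ce:1e:2b:74:17 (cross-BSSID, 1 s)
13:49:26 bes2600_wlan: [bes2600] decrypt-storm fast-recover: forcing reassoc
13:49:27 wlan0: associated to c0:25:06:e6:5b:32 (back home, 1 s)
"""


def recovery_times(log_text):
    """Pair each fast-recover line with the next 'associated to' line."""
    pending = None   # timestamp of a fast-recover still waiting for its reassociation
    results = []
    for line in log_text.splitlines():
        stamp, _, rest = line.partition(" ")
        when = datetime.strptime(stamp, "%H:%M:%S")
        if "decrypt-storm fast-recover" in rest:
            pending = when
        elif "associated to" in rest and pending is not None:
            bssid = rest.split("associated to ")[1].split()[0]
            results.append((bssid, (when - pending).total_seconds()))
            pending = None
    return results


recoveries = recovery_times(RECEIPTS)
for bssid, seconds in recoveries:
    print(f"reassociated to {bssid} after {seconds:.0f} s")
```

Two pairs come out of the four lines, each with a 1 s gap, which agrees with `DecryptStormRecoveries: 2`.
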
### Patch B: connection-loss-storm bus_reset (Trigger A)

| metric | Phase 7-of-A observed | Phase 4 prediction | Phase 7-of-B observed |
|---|---|---|---|
| api_connection_loss rate | 0.86/h | unchanged | 2 events in ~2 h (≈ 1/h) |
| ConnectionLossStormRecoveries | n/a | trips on 3-in-60s bursts | **0** |
| Threshold trip events | n/a | (when burst occurs) | **0** (events spaced 91 s apart) |

**Verdict: installed but UNTRIGGERED.** The 3-in-60s threshold was never reached (max-cluster observed: 2-in-91s). No false positives, no spurious bus_resets. Predicted delta unobserved, the same shape as Patch A's first Phase 7 run.

The threshold may be too conservative for typical event rates (we'd need a true api_connection_loss flood to trip it). Tuning is a future Phase-1 question if more reproductions accumulate.

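
To make the "3-in-60s" rule concrete, here is a minimal Python model of a sliding-window counter. It is not the driver's implementation (that lives in the kernel module and is not shown in this document); the class name, the inclusive window boundary, and the reset-after-trip behaviour are all assumptions made for illustration.

```python
from collections import deque


class StormDetector:
    """Trip when `threshold` events fall within `window_s` seconds (illustrative model)."""

    def __init__(self, threshold=3, window_s=60.0):
        self.threshold = threshold
        self.window_s = window_s
        self.events = deque()

    def record(self, t):
        """Record an event at time t (seconds); return True if the threshold trips."""
        self.events.append(t)
        # Drop events older than the window, measured from the newest event.
        while t - self.events[0] > self.window_s:
            self.events.popleft()
        if len(self.events) >= self.threshold:
            self.events.clear()  # assumption: the counter starts over after a trip
            return True
        return False


observed = StormDetector()
print([observed.record(t) for t in (0.0, 91.0)])        # the two events seen in this run

flood = StormDetector()
print([flood.record(t) for t in (0.0, 10.0, 20.0)])     # a hypothetical flood
```

Under this model the two observed events, 91 s apart, never share a window, so the detector stays quiet; only a tighter cluster such as the hypothetical flood would trip it.
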
### Trigger C: AP unprotected-deauth-6 cluster without preceding storm

```
12:53:10.475 → 12:53:11.756 AP fires 17 unprotected-deauth-6 from 5b:32 over 1.3 s
(2 mgmt-TX no-ack from our chip in the middle)
12:53:12.309 kernel: deauthenticating ... reason 2 = PREV_AUTH_NOT_VALID
12:53:14–15 reauth via 61:b0 → 5b:32, recovery in ~4 s
```

Neither patch fired: no decrypt failures preceded the cluster (Patch A), and there were no api_connection_loss events (Patch B). Background: AVM Fritz!Boxes such as newton are reliable, so we take it that the AP correctly classified ohm's frames as Class 2 frames from a non-authenticated station (deauth reason 6), meaning **bes2600 sent something the AP couldn't authenticate**. This is a new backlog entry; the leading hypothesis is Bug #5 in `notes/observed-bugs.md` (RX path under throughput pressure).

Recovery was fast (~4 s), so this isn't a P0, but a Patch C investigation is warranted when prioritized.

---

## Bug #5: RX path degradation under attempted-throughput pressure (NEW)

```
sender 1 MB/s → ohm receives 1015 KB/s, -57 dBm, RX MCS 4
sender 4 MB/s → ohm receives 563 KB/s, -67 dBm, RX MCS 3
```

Higher attempted throughput on the sender side → LOWER observed throughput at ohm. Signal dropped 10 dB (-57 → -67 dBm) and RX MCS fell one step (4 → 3). Link-physical max is ~8 MB/s; we're getting ~7 % of that under load.

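
The figures in that paragraph follow directly from the two measurements. The short calculation below reproduces them; it assumes 1 MB = 1000 KB for the "~8 MB/s" link maximum, and the variable names are illustrative.

```python
# Assumption: "~8 MB/s" link-physical max, taking 1 MB = 1000 KB.
LINK_MAX_KBPS = 8000.0

# (sender cap in MB/s, received KB/s, signal dBm, RX MCS), copied from the samples above.
low_load = (1, 1015, -57, 4)
high_load = (4, 563, -67, 3)

signal_drop_db = low_load[2] - high_load[2]        # -57 - (-67)
mcs_drop = low_load[3] - high_load[3]              # 4 - 3
utilisation = high_load[1] / LINK_MAX_KBPS         # share of link max under load
throughput_ratio = high_load[1] / low_load[1]      # received at 4 MB/s vs at 1 MB/s

print(f"signal drop: {signal_drop_db} dB, MCS drop: {mcs_drop}")
print(f"utilisation under load: {utilisation:.1%}")
print(f"received throughput vs 1 MB/s run: {throughput_ratio:.0%}")
```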

**Hypothesis (Markus): driver/firmware locks itself to death under busy reads.** Plausibly the same root cause as the Phase 0 YouTube DASH chunk-fetch drops (~10 frames per chunk fetch on hardware-decoder playback). Documented as Bug #5 in `notes/observed-bugs.md`.

---

## Lessons captured for memory (Phase 8 anchor)

1. **Stress-rate matters for verification.** Patch A's predicted delta only became observable when the netcat cap went 1 → 4 MB/s. The previous Phase 7 (10h30m @ 1 MB/s) saw zero decrypt-storms. Future Phase 7 protocols should plan a stress ramp from steady to near-saturation, not just the steady setting.
2. **"Untriggered, no harm" is a valid Phase 7 verdict** for installed patches. Patch B fits this exactly. The patch is ready; the trigger pattern just doesn't fire often enough in this RF / load regime to verify the recovery delta. Don't let unobserved verifications block the loop.
3. **Build infrastructure on `cleanups`, not `mobian`.** The Phase 6 attempt to base Patch B on mobian forced a refactor mid-flight; the c-stack lives on cleanups, and re-using c5.2's `bes2600_chrdev_do_bus_reset` requires that. The cleanups branch is the campaign's working trunk.
4. **AP-side bug is unlikely on AVM hardware.** AVM Fritz!Boxes don't fire spurious deauth-6 storms. When ohm sees AP-deauth-6 unprovoked, the suspect chain is bes2600 sending something the AP can't authenticate. The bias toward "bes2600 is the broken thing" is empirically validated.
5. **AP-deauth-6 can fire without our local triggers.** Trigger C is a real failure mode neither Patch A nor B addresses. Adding a Phase-1-style metric for "AP-deauth-6 rate without preceding decrypt-storm or api_connection_loss" would surface Trigger C cleanly.
6. **`pv -L` cap interacts with TCP retransmit recovery.** When the link can't sustain the cap, TCP backs off and pv blocks. Observed throughput is then a **floor on chip RX capacity at that signal level**, not the sender's intent. Useful for chip-load characterization, but the cap should be set based on observed pull-rate, not on the link's nominal MCS rate.

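
The metric proposed in lesson 5 could be prototyped along the following lines. This is a sketch under stated assumptions, not existing campaign tooling: the event tuples, the function name, the 120 s lookback, and the 2 s cluster gap are all hypothetical choices that would need tuning against real logs.

```python
def unprovoked_deauth_clusters(events, lookback_s=120.0, cluster_gap_s=2.0):
    """Return start times of AP-deauth-6 clusters with no local trigger before them.

    events: iterable of (time_s, kind) with kind in
            {"deauth6", "decrypt_storm", "api_connection_loss"}.
    """
    triggers = []       # times of decrypt-storm / api_connection_loss events seen so far
    unprovoked = []
    last_deauth = None
    # Sort by time; on ties, process local triggers before deauth frames.
    for t, kind in sorted(events, key=lambda e: (e[0], e[1] == "deauth6")):
        if kind in ("decrypt_storm", "api_connection_loss"):
            triggers.append(t)
        elif kind == "deauth6":
            # Frames closer together than cluster_gap_s belong to the same cluster.
            new_cluster = last_deauth is None or t - last_deauth > cluster_gap_s
            provoked = any(t - lookback_s <= x <= t for x in triggers)
            if new_cluster and not provoked:
                unprovoked.append(t)
            last_deauth = t
    return unprovoked


# A Trigger-C-like shape: 17 deauth frames within ~1.3 s and no local trigger.
trigger_c_like = [(i * 0.08, "deauth6") for i in range(17)]
print(unprovoked_deauth_clusters(trigger_c_like))

# A Trigger-B-like shape: a deauth frame 30 s after a decrypt storm is not counted.
trigger_b_like = [(100.0, "decrypt_storm"), (130.0, "deauth6")]
print(unprovoked_deauth_clusters(trigger_b_like))
```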
---

## Loop status

- Phase 7: closed.
- Patch A: confirmed (N=2). Stays in.
- Patch B: installed, dormant in this regime, no harm. Stays in.
- Bug #5: backlog, no patch yet. Documented.
- Trigger C: backlog candidate, no patch yet. Documented.

The next campaign cycle would re-anchor Phase 0 around Bug #5 or Trigger C.