besser/notes/phase7-2026-05-07.md
claude-noether 69a1d0f8b1 notes: phase 7 verdict — Patch A confirmed, Patch B dormant
Phase 7 verification of cleanups + Patch A + Patch B (srcversion
1B3B3ED0) on ohm 2026-05-07 12:48 → 15:13 CEST under netcat load
ramped 1 MB/s → 4 MB/s on 2.4GHz newton.

Patch A: predicted delta CONFIRMED at N=2 reproductions.
  - 13:47:56 storm → 1 s reassoc, no AP-deauth-6 escalation
  - 13:49:26 storm → 1 s reassoc, no AP-deauth-6 escalation

Patch B: installed, untriggered. 2 api_connection_loss events spaced
91 s apart, never tripping the 3-in-60s threshold. No false positives,
no spurious bus_resets. Recovery delta unobserved (no harm done).

Trigger C: 17-frame AP-deauth-6 cluster at 12:53 with no patch hooks
firing — bes2600 TX-side glitch suspect. Recovery via mac80211 reauth
in ~4 s. New backlog item.

Bug #5 documented separately (RX path degrades under throughput
pressure; possible root of the original Phase-0 YouTube frame drops).
2026-05-07 15:18:36 +02:00

BES2600 WiFi-stability campaign — Phase 7 verdict (Patches A + B)

Date assembled: 2026-05-07
Module under test: bes2600.ko srcversion 1B3B3ED096AAD7217FEDE11 (cleanups + Patch A + Patch B)
Run dir: /root/bes2600-samples/run-20260507-1248-patchB/ on ohm

Phase 7 verification window: 2026-05-07 12:48 → ~15:13 CEST (≈ 2 h 25 m). Of that, ~50 min ran at a 1 MB/s pv cap and ~1 h 30 m at a 4 MB/s pv cap, on 2.4 GHz newton (5b:32, signal -57 to -67 dBm).


Result table (vs the Phase 4 predicted delta)

Patch A — decrypt-storm fast-recover (Trigger B)

| metric | Phase 3 baseline | Phase 4 prediction | Phase 7-of-B observed |
|---|---|---|---|
| decrypt-burst rate | 8.18/h | unchanged | 2 bursts in ~22 min once 4 MB/s pressure was on |
| AP-deauth-6 rate following burst | 100 % escalation | ≤ 0.2 × baseline | 0/2 = 0 % escalation |
| recovery time given burst | up to 109 s | < 5 s | ~1 s (×2) |

Verdict: predicted delta CONFIRMED at N=2. The CLAUDE.md ideal is N=3; we're directionally locked at 2 reproductions, both behaving as predicted: threshold trip → "[bes2600] decrypt-storm fast-recover: forcing reassoc" log line → mac80211 disassoc → userspace reauth in ≈ 1 s.

Receipts (verbatim)

13:47:56  bes2600_wlan: [bes2600] decrypt-storm fast-recover: forcing reassoc
13:47:57  wlan0: associated to cc:ce:1e:2b:74:17     (cross-BSSID, 1 s)
13:49:26  bes2600_wlan: [bes2600] decrypt-storm fast-recover: forcing reassoc
13:49:27  wlan0: associated to c0:25:06:e6:5b:32     (back home, 1 s)

DecryptStormRecoveries: 2 (exposed via debugfs at /sys/kernel/debug/ieee80211/phy0/bes2600/vif_0/status).
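
A minimal sketch for snapshotting that counter between runs, assuming the status file holds one "Key: value" pair per line. That format is inferred from the DecryptStormRecoveries line above, not checked against the driver source. Reading the file needs root and a mounted debugfs, and parse_status and read_counter are hypothetical helper names.

```python
from pathlib import Path

# Path as quoted in these notes; reading it needs root and a mounted debugfs.
STATUS = Path("/sys/kernel/debug/ieee80211/phy0/bes2600/vif_0/status")


def parse_status(text: str) -> dict:
    """Split 'Key: value' lines into a dict (assumed file format).

    Only the first colon separates key from value, so values that
    contain colons (e.g. MAC addresses) survive intact.
    """
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore blank lines and lines without a colon
            fields[key.strip()] = value.strip()
    return fields


def read_counter(key: str, path: Path = STATUS) -> int:
    """Return one integer counter, e.g. read_counter("DecryptStormRecoveries")."""
    return int(parse_status(path.read_text())[key])
```

Reading the counter before and after a verification window gives the per-run recovery count without grepping the kernel log.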

Patch B — connection-loss-storm bus_reset (Trigger A)

| metric | Phase 7-of-A observed | Phase 4 prediction | Phase 7-of-B observed |
|---|---|---|---|
| api_connection_loss rate | 0.86/h | unchanged | 2 events in ~2 h (≈ 1/h) |
| ConnectionLossStormRecoveries | n/a | trips on 3-in-60s bursts | 0 |
| Threshold trip events | n/a | (when burst occurs) | 0 (events spaced 91 s apart) |

Verdict: installed but UNTRIGGERED. The 3-in-60s threshold was never reached (max-cluster observed: 2-in-91s). No false positives, no spurious bus_resets. Predicted delta unobserved — same shape as Patch A's first Phase 7 run.

The threshold may be too conservative for typical event rates (we'd need a true api_connection_loss flood to trip it). Tuning is a future Phase-1 question if more reproductions accumulate.
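
The following userspace model of a 3-in-60s sliding-window trip rule shows why 91 s spacing never trips it. It is a sketch under stated assumptions, not the Patch B code. StormDetector is a hypothetical name. The model keeps an event that is exactly 60 s old inside the window, and it re-arms by clearing its history after each trip. The in-kernel check may handle either edge differently.

```python
from collections import deque


class StormDetector:
    """Fire when `threshold` events land within `window_s` seconds.

    A userspace model only; the real Patch B check lives in bes2600.ko
    and may treat window edges and re-arming differently.
    """

    def __init__(self, threshold: int = 3, window_s: float = 60.0) -> None:
        self.threshold = threshold
        self.window_s = window_s
        self._times: deque = deque()

    def record(self, t: float) -> bool:
        """Record one api_connection_loss at time t (seconds).

        Returns True when this event completes a burst (a trip).
        """
        # Drop events older than the window (exactly window_s is kept).
        while self._times and t - self._times[0] > self.window_s:
            self._times.popleft()
        self._times.append(t)
        if len(self._times) >= self.threshold:
            self._times.clear()  # assumed: re-arm from scratch after a trip
            return True
        return False


detector = StormDetector()
print([detector.record(t) for t in (0.0, 91.0)])  # [False, False]
```

With the observed spacing, the first event has already aged out of the window when the second arrives, so the window never holds more than one event. Three events within 60 s, for example at t = 0, 20 and 45 s, would trip it on the third.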

Trigger C — AP unprotected-deauth-6 cluster without preceding storm

12:53:10.475 → 12:53:11.756  AP fires 17 unprotected-deauth-6 from 5b:32 over 1.3 s
                              (2 mgmt-TX no-ack from our chip in the middle)
12:53:12.309  kernel: deauthenticating ... reason 2 = PREV_AUTH_NOT_VALID
12:53:1415  reauth via 61:b0 → 5b:32, recovery in ~4 s

Neither Patch A (no decrypt failures preceded the cluster) nor Patch B (no api_connection_loss) fired. Background: AVM Fritz!Boxes (newton) are reliable. The AP correctly classified ohm's frames as Class 2 from a non-authenticated station, meaning bes2600 sent something the AP couldn't authenticate. This is a new backlog entry; Bug #5 in notes/observed-bugs.md (RX path under throughput pressure) is the leading hypothesis surface.

Recovery was fast (~4 s), so this isn't a P0, but a Patch C investigation is warranted when prioritized.


Bug #5 — RX path degradation under attempted-throughput pressure (NEW)

sender 1 MB/s  →  ohm receives 1015 KB/s,  -57 dBm,  RX MCS 4
sender 4 MB/s  →  ohm receives  563 KB/s,  -67 dBm,  RX MCS 3

Higher attempted throughput on the sender side → LOWER observed throughput at ohm. Signal degraded ~10 dB and RX MCS dropped one step (4 → 3). Link-physical max is ~8 MB/s; under 4 MB/s load we're getting ~7 % of that.
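
The ~7 % figure can be rechecked from the two receipt lines. This sketch assumes decimal units throughout (1 MB = 1000 KB). The receipts don't say whether KB means 1000 or 1024 bytes, so treat the output as approximate. throughput_ratio is a hypothetical helper name.

```python
KB_PER_MB = 1000     # assumption: decimal units throughout
LINK_MAX_MB_S = 8.0  # "link-physical max is ~8 MB/s" from the notes

# (sender pv cap in MB/s, throughput ohm actually received in KB/s)
RECEIPTS = [(1, 1015), (4, 563)]


def share_of_link_max(received_kb_s: float) -> float:
    """Fraction of the ~8 MB/s link maximum actually delivered."""
    return received_kb_s / (LINK_MAX_MB_S * KB_PER_MB)


def throughput_ratio(low_cap_kb_s: float, high_cap_kb_s: float) -> float:
    """Received throughput at the higher cap divided by that at the lower cap."""
    return high_cap_kb_s / low_cap_kb_s


for cap_mb_s, got_kb_s in RECEIPTS:
    print(f"{cap_mb_s} MB/s cap: {share_of_link_max(got_kb_s):.1%} of link max")
print(f"4x the cap bought {throughput_ratio(1015, 563):.2f}x the throughput")
```

Under these assumptions the shares come out at 12.7 % and 7.0 % of link max. Quadrupling the cap delivered about 0.55× the throughput.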

Hypothesis (Markus): driver/firmware locks itself to death under busy reads. Plausibly the same root cause as the Phase 0 YouTube DASH chunk-fetch drops (~10 dropped frames per chunk fetch on hardware-decoder playback). Documented as Bug #5 in notes/observed-bugs.md.


Lessons captured for memory (Phase 8 anchor)

  1. Stress-rate matters for verification. Patch A's predicted delta only became observable when the netcat cap went 1 → 4 MB/s. The previous Phase 7 (10h30m @ 1 MB/s) saw zero decrypt-storms. Future Phase 7 protocols should plan a stress ramp from steady to near-saturation, not just the steady setting.
  2. "Untriggered, no harm" is a valid Phase 7 verdict for installed patches. Patch B fits this exactly. The patch is ready; the trigger pattern just doesn't fire often enough in this RF / load regime to verify the recovery delta. Don't let unobserved verifications block the loop.
  3. Build infrastructure on cleanups, not mobian. The Phase 6 attempt to base Patch B on mobian forced a refactor mid-flight; the c-stack lives on cleanups, and re-using c5.2's bes2600_chrdev_do_bus_reset requires it. The cleanups branch is the campaign's working trunk.
  4. AP-side bug is unlikely on AVM hardware. AVM Fritz!Boxes don't fire spurious deauth-6 storms. When ohm sees AP-deauth-6 unprovoked, the suspect chain is bes2600 sending something the AP can't authenticate. The bias toward "bes2600 is the broken thing" is empirically validated.
  5. AP-deauth-6 can fire without our local triggers. Trigger C is a real failure mode neither Patch A nor B addresses. Adding a Phase-1-style metric for "AP-deauth-6 rate without preceding decrypt-storm or api_connection_loss" would surface Trigger C cleanly.
  6. pv -L cap interacts with TCP retransmit recovery. When the link can't sustain the cap, TCP backs off and pv blocks. Observed throughput is then a floor on chip RX capacity at that signal level, not a measure of the sender's intent. That is useful for chip-load characterization, but the cap should be set from the observed pull rate, not from the link's nominal MCS rate.
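
The metric proposed in lesson 5 could be computed offline from a merged event log. This is a minimal sketch that assumes the kernel log lines have already been mapped onto the hypothetical labels below. The 10 s lookback and the 2 s gap that folds frames into one cluster are illustrative choices, not calibrated values. The gap is there so that the 17-frame burst at 12:53 counts once.

```python
from dataclasses import dataclass

# Hypothetical labels; mapping dmesg/journal lines onto them is left out.
DEAUTH6 = "ap_deauth_6"
LOCAL_TRIGGERS = {"decrypt_storm", "api_connection_loss"}


@dataclass(frozen=True)
class Event:
    t: float   # seconds since the run started
    kind: str


def trigger_c_clusters(events, lookback_s=10.0, cluster_gap_s=2.0):
    """Start times of AP-deauth-6 clusters with no local trigger in the
    preceding lookback_s seconds. Deauth frames closer together than
    cluster_gap_s are folded into one cluster."""
    events = sorted(events, key=lambda e: e.t)
    trigger_times = [e.t for e in events if e.kind in LOCAL_TRIGGERS]
    starts = []
    last_deauth = None
    for e in events:
        if e.kind != DEAUTH6:
            continue
        opens_cluster = last_deauth is None or e.t - last_deauth > cluster_gap_s
        last_deauth = e.t
        if not opens_cluster:
            continue  # later frame of a cluster already classified
        preceded = any(e.t - lookback_s <= t <= e.t for t in trigger_times)
        if not preceded:
            starts.append(e.t)
    return starts


def trigger_c_rate_per_hour(events, run_s, **kwargs):
    """Trigger C clusters per hour over a run lasting run_s seconds."""
    return len(trigger_c_clusters(events, **kwargs)) / (run_s / 3600.0)
```

With these defaults, a 17-frame cluster like the one at 12:53 counts as a single Trigger C event. A deauth-6 burst that follows a decrypt storm or api_connection_loss within 10 s is excluded.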

Loop status

  • Phase 7: closed.
  • Patch A: confirmed (N=2). Stays in.
  • Patch B: installed, dormant in this regime, no harm. Stays in.
  • Bug #5: backlog, no patch yet. Documented.
  • Trigger C: backlog candidate, no patch yet. Documented.

The next campaign cycle would re-anchor Phase 0 around Bug #5 or Trigger C.