# BES2600 WiFi-stability campaign — Phase 5 review artifact

Date assembled: 2026-05-06 (rig started 06:59 CEST)
Run dir: /root/bes2600-samples/run-20260506-0659-fresh/ on ohm
Module under test: bes2600.ko srcversion 461AFB369355AE598D79BDF (c-final + c5.2.1)

This is the Phase 5 hand-off artifact. Per project CLAUDE.md: paste verbatim, do not curate. Anomalies, contradictions and small-N caveats are stated as-is.

---

## Phase 0 — Substrate / Motivation / Inventory

### Triggering observation (user-reported, 2026-05-06)

1. After hours of operation, near-100% WiFi quality eventually drops association.
2. YouTube with hardware decoder shows ~10 dropped frames per DASH chunk fetch.

### Hardware-in-the-loop

- ohm = PineTab2 (RK3566 SoC, BES2600 over SDIO)
- Reachable as ohm.fritz.box (192.168.88.168 on home AP, 10.141.179.63 on fallback)
- Kernel: 6.19.10-danctnix1-1-pinetab2
- bes2600 module: srcversion 461AFB369355AE598D79BDF (c5.1+c5.1.1+c5.2+c6.1+c6.2+c7+c5.2.1)
- bes2600_btuart: srcversion 8FF920B9C068EA2E7DB9BA8 (unchanged)

### What is measurable

- journald (kernel + NetworkManager + wpa_supplicant)
- bes2600 dynamic_debug: 13 callsites enabled with +pmf flag
- iw event -t -f (cfg80211 events with reason codes)
- tcpdump on wlan0 (managed mode — sees data + EAPOL + ARP/ICMP, NOT raw 802.11 mgmt)
- per-60s snapshot loop: iw link, station dump, /proc/net/wireless, /sys/class/net/wlan0/statistics

### Predecessor anchor

None — campaign re-anchored from in-session reps. User pre-rig claims partially replicated:

- A5 (c7 latch trip count rises over session) — does NOT replicate. 0 hits in 12h boot for "PSM not honored" / "pm_unsupported" / "switching to skip".
- A2 (~4h to drop) — does NOT cleanly replicate as a periodicity; intervals vary minutes to hours.
### Receipt checklist (Phase 0)

- [x] Predecessor data treated as reference, not anchor — A5 explicitly falsified above
- [ ] In-session baseline rep N=3: NOT yet — N=3 events observed across multiple boots, only N=1 idle bin and N=12 load bursts in current boot

---

## Phase 1 — Goal formulation (locked 2026-05-06 11:56 CEST)

### Measurable target

> Quantify the rate of WSM_STATUS_DECRYPTFAILURE bursts (≥4 events within 60s) per hour of operation, AND the conditional probability of an AP-side unprotected-deauth-reason-6 within 30s of such a burst.

Locked artifact: /root/bes2600-samples/run-20260506-0659-fresh/PHASE1.md
Journal marker: 2026-05-06T11:56:14+02:00 — bes2600-test PHASE1_LOCKED

### Why this metric (not the original "assoc up/down per hour")

Source pin: bes2600/txrx.c:1696 — bes_warn for [RX] Receive failure. The status field comes from the firmware's WSM RX-indication, parsed in wsm.c:1484 wsm_receive_indication via WSM_GET32(buf). Status code 4 = WSM_STATUS_DECRYPTFAILURE per wsm.h:620.

Two observed Pattern-P1 events on 2026-05-06 chained:

1. decrypt-failure storm
2. AP unprotected-deauth-6 ("Class 2 frame received from non-authenticated station")
3. kernel local PREV_AUTH_NOT_VALID
4. reauth-stall on same channel; recovery via different SSID/channel

A third P1 event (yesterday 2026-05-05 22:33) had ZERO decrypt-failures preceding — different trigger (post-resume). The metric will discriminate.
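The locked metric can be sketched in a few lines of Python. This is a minimal illustration, not the campaign's tooling: the function names (`find_bursts`, `burst_metrics`, `at`) are hypothetical, and two readings of the target are assumptions. First, bursts are taken as non-overlapping: a burst opens at the first event that has at least 4 events (itself included) within 60 s and absorbs every event in that window. Second, "within 30s of such a burst" is read as a deauth-6 landing anywhere between the burst's first event and 30 s after its last event. The sample input is Event 2 as logged in Phase 3, over the 4h24m idle/varied period.

```python
from datetime import datetime, timedelta

BURST_MIN_EVENTS = 4                        # ">=4 events"
BURST_WINDOW = timedelta(seconds=60)        # "within 60s"
ESCALATION_WINDOW = timedelta(seconds=30)   # "within 30s of such a burst"


def find_bursts(failures):
    """Group decrypt-failure timestamps into non-overlapping bursts.

    Assumption: a burst opens at the first event that has at least
    BURST_MIN_EVENTS events (itself included) inside BURST_WINDOW and
    absorbs every event in that window. Returns (first, last, count).
    """
    failures = sorted(failures)
    bursts = []
    i = 0
    while i < len(failures):
        j = i
        while j < len(failures) and failures[j] - failures[i] <= BURST_WINDOW:
            j += 1
        if j - i >= BURST_MIN_EVENTS:
            bursts.append((failures[i], failures[j - 1], j - i))
            i = j       # resume after the burst
        else:
            i += 1      # not a burst start; try the next event
    return bursts


def burst_metrics(failures, deauths, hours):
    """Return burst count, bursts per hour, and P(deauth-6 | burst)."""
    bursts = find_bursts(failures)
    # Assumption: a burst escalates if a deauth-6 falls between its first
    # event and ESCALATION_WINDOW after its last event.
    escalated = sum(
        1
        for first, last, _ in bursts
        if any(first <= d <= last + ESCALATION_WINDOW for d in deauths)
    )
    return {
        "bursts": len(bursts),
        "bursts_per_hour": len(bursts) / hours,
        "escalated": escalated,
        "p_escalation": escalated / len(bursts) if bursts else None,
    }


def at(hms):
    """Timestamp on 2026-05-06 from an HH:MM:SS string."""
    return datetime.fromisoformat("2026-05-06T" + hms)


# Event 2 as logged: x6 at 11:03:10, x1 at 11:03:11, x2 at 11:03:19,
# then AP deauth reason 6 at 11:03:21.
failures = [at("11:03:10")] * 6 + [at("11:03:11")] + [at("11:03:19")] * 2
deauths = [at("11:03:21")]
result = burst_metrics(failures, deauths, hours=4.4)
print(result)
```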

---

## Phase 2 — Situation Analysis

### Rig built (live as of artifact assembly)

- snap loop PID 5712, ~6h25m elapsed to snapshots/snap.log (60s cadence)
- tcpdump filtered ring PID 10174, ~4h20m elapsed to tcpdump/cap.pcap files
- iw event PID 9852, ~4h20m elapsed to iw-event.log
- dynamic_debug bes2600 13 callsites enabled
- nc listener loop PID 17037 wrapper, 17039 active listener, port 12345

### tcpdump filter applied

`arp || icmp || icmp6 || ether proto 0x888e || port 67 || port 68 || port 53 || port 5353 || port 546 || port 547 || (tcp[tcpflags] & (tcp-syn|tcp-fin|tcp-rst) != 0)`

(So we capture ARP, ICMP, EAPOL, DHCP4/6, DNS, mDNS, and TCP control flags. Bulk data dropped.)

### Anti-theatre receipts verified

- dynamic_debug honored: 13 bes2600 callsites flipped to +p
- journald persistent: /var/log/journal exists, ~376 MB
- snap loop ticks accumulating
- iw event capturing
- tcpdump rotating
- loopback self-test of nc listener: 2 MB through, OK

### Known limits of the rig

- Monitor mode NOT available concurrent with managed (per iw phy phy0 info valid interface combinations). Raw 802.11 mgmt frames are invisible to tcpdump.
- iw event partially compensates: deauth/auth frame headers are visible there with reason codes.
- ftrace not enabled — would add bottom-half scheduling latency data; deferred.

### Receipt checklist (Phase 2)

- [x] Re-read CLAUDE.md
- [x] Re-read relevant memory entries
- [x] Verified ohm reachable
- [x] Verified dynamic_debug honored for bes2600
- [x] N/A: hardware UART not used

---

## Phase 3 — Baseline measurements (the wall)

### Three Pattern-P1 events captured

#### Event 1 — 2026-05-06 07:13 (varied use, no suspend, on 2.4 GHz)

```
07:13:16 bes2600_wlan: [RX] Receive failure: 4 (×6 in 1s)
... 77 events total over 24 seconds ...
07:13:41 iw-event: AP→ohm unprotected deauth reason 6 ("Class 2 frame received from non-authenticated station")
07:13:41 kernel: wlan0: deauthenticating from 5b:32 reason 2 = PREV_AUTH_NOT_VALID
07:13:42 kernel: wlan0: associated → 5b:33 (newton 5 GHz)
```

Recovery: 1 second, cross-band.

#### Event 2 — 2026-05-06 11:03 (idle on newton 2.4 GHz)

```
11:03:10 bes2600_wlan: [RX] Receive failure: 4 (×6 in 1s)
11:03:11 bes2600_wlan: [RX] Receive failure: 4
11:03:19 bes2600_wlan: [RX] Receive failure: 4 (×2 in 1s)
11:03:21 iw-event: AP→ohm unprotected deauth reason 6
11:03:22 kernel: wlan0: deauthenticating from 5b:32 reason 2 = PREV_AUTH_NOT_VALID
11:03:22 → 11:05:11 9 auth attempts on 3 newton BSSIDs ALL TIMED OUT
         (iw-event shows AP returned auth status 0 Successful twice
          but assoc step never completed)
11:05:11 kernel: wlan0: associated → 4e:64:5c:d8:11:62 (dingdongkingkong, ch 13)
         full handshake auth+assoc+connect in 110ms
```

Recovery: 109 seconds, cross-SSID + cross-channel only.

EAPOL frames in entire 11:03:00 → 11:05:30 window: 0 (4WHS never attempted).
Inbound packets to .168 in that window: 0.

#### Event 3 — 2026-05-05 22:33 (post-resume from lid-close)

```
22:31:22 NM: state activated → deactivating (reason 'sleeping')
22:31:22 kernel: wlan0: deauthenticating from 5b:32 reason 3 = DEAUTH_LEAVING
22:31:27 kernel: PM: suspend entry (deep)
22:31:31 kernel: PM: suspend exit (4s suspend)
22:31:35 kernel: wlan0: associated → 5b:32
[97 seconds of normal operation]
22:33:12 kernel: wlan0: deauthenticating from 5b:32 reason 2 = PREV_AUTH_NOT_VALID
22:33:13 kernel: wlan0: associated → 5b:33 (newton 5 GHz)
```

ZERO [RX] Receive failure events in the 22:31:35 → 22:33:12 window. Different trigger path.

Recovery: 1s, cross-band.
### Comparison table (idle vs load)

| Period | Duration | Decrypt-fails | Rate/hr | Bursts (≥4 in 60s) | Burst rate/hr | Escalations to AP-deauth-6 |
|---|---|---|---|---|---|---|
| 06:39–11:03 idle/varied | 4h24m | 8 | 1.8 | 1 | 0.23 | 1/1 = 100% |
| 12:00–13:28 sustained 1MB/s | 1h28m | 104 | 70.9 | 12 | 8.18 | 0/12 = 0% |

Burst-rate elevation under load: ~36x.

### List of 12 load-period bursts (start time, count within 60s)

```
12:06:01 9 events
12:17:57 6 events
12:19:37 4 events
12:20:55 5 events
12:22:17 5 events
12:25:58 4 events
12:30:02 7 events
12:32:09 9 events
13:14:41 8 events
13:17:22 4 events
13:22:53 5 events
13:28:16 8 events
```

### Open contradictions / things the loop has NOT yet resolved

1. Event-3 (post-resume) had no decrypt-failures yet still ended in PREV_AUTH_NOT_VALID. There is a second P1 trigger path we have not pinned a mechanism for.
2. Conditional probability flips between idle (100% of 1 burst escalates) and load (0% of 12 bursts escalate). Hypothesis: AP "Class 2 from unauth STA" heuristic is silenced by sustained host TX. NOT yet verified by AP-side capture (no AP logs available).
3. Cause of decrypt failures themselves remains hypothetical: PTK or GTK drift, or replay-counter mismatch. EAPOL group-rekey frames were NOT captured before either P1 event; needed wider tcpdump window.
4. Newton AP is 802.11v BSS-load capable (saw "comeback duration 1000 TU" at 12:03:23). Reason 9 (STA_REQ_ASSOC_WITHOUT_AUTH) at 12:03:24 shows AP-side state churn even within 1s of auth-success. May or may not be relevant to the P1 blackhole.
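The arithmetic behind the comparison table, and the small-N caveat in contradiction 2, can be made concrete with a short sketch. Everything here is illustrative: the helper names are hypothetical, and the exact (Clopper-Pearson) binomial interval is an assumed choice for quantifying the uncertainty, not a method the campaign has adopted. Dividing the unrounded burst rates gives 36, against about 35.6 from the rounded table values. The 95% intervals come out at roughly [0.025, 1.0] for 1/1 and [0, 0.265] for 0/12; they overlap, so on these counts alone the idle/load flip is not yet distinguishable from chance.

```python
from math import comb


def rate_per_hour(count, hours, minutes):
    """Events per hour over a period given as hours + minutes."""
    return count / (hours + minutes / 60)


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _bisect(increasing_fn, target):
    """Solve increasing_fn(p) == target for p in [0, 1] by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if increasing_fn(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for x successes in n trials."""
    # Lower bound: p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else _bisect(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2)
    # Upper bound: p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else _bisect(
        lambda p: 1 - binom_cdf(x, n, p), 1 - alpha / 2)
    return lower, upper


# Burst counts and durations taken from the comparison table.
idle_burst_rate = rate_per_hour(1, 4, 24)     # 06:39-11:03, 1 burst
load_burst_rate = rate_per_hour(12, 1, 28)    # 12:00-13:28, 12 bursts
ratio = load_burst_rate / idle_burst_rate
print(f"idle {idle_burst_rate:.2f}/hr, load {load_burst_rate:.2f}/hr, "
      f"ratio {ratio:.1f}x")

# Escalation ratios from the last column of the table.
idle_ci = clopper_pearson(1, 1)
load_ci = clopper_pearson(0, 12)
print(f"idle 1/1:  95% CI [{idle_ci[0]:.3f}, {idle_ci[1]:.3f}]")
print(f"load 0/12: 95% CI [{load_ci[0]:.3f}, {load_ci[1]:.3f}]")
```

The same helper can be rerun as idle bursts accumulate, to see at what N the two intervals stop overlapping.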
### Source citations (Phase 6 contract pins, recorded for review reference)

- bes2600/txrx.c:1696 — bes_warn for [RX] Receive failure
- bes2600/wsm.h:620 — WSM_STATUS_DECRYPTFAILURE = 4
- bes2600/wsm.c:1484 — wsm_receive_indication (parses status from firmware)

### Receipt checklist (Phase 3)

- [x] Trace files / dmesg / regdump pasted verbatim above
- [x] Raw before derived (event listings precede rates)
- [x] Rig failure findings honestly recorded (monitor mode unavailable, ftrace deferred, no AP logs)

---

## What we are explicitly NOT yet at

- Phase 4 plan: not started. Pending review.
- Phase 6 implementation: not started. Pending plan + review.

## Asks of the reviewer

1. Is the Phase 1 metric the right discriminator? In particular, does the conditional-probability column (escalation given burst) capture what we want, or should the metric also count "AP-deauth-6 with no preceding burst" (the Event-3 path)?
2. The escalation-rate flip (100% idle, 0% load) is at N=1 idle vs N=12 load. Is N=1 idle adequate to report this finding, or do we need ≥3 idle bursts before locking?
3. Anything in Phase 3 above flagged as "not yet measured" that would be cheap to add before Phase 4? Specifically:
   - Should ftrace mac80211/cfg80211 events be enabled before next rep? Cost: ~10x journal volume. Benefit: bottom-half timing for the 100ms-scale stalls in the 109s blackhole.
   - Should tcpdump filter widen to include EAPOL frames captured in a moving 5-min window before each P1 event so we see the group-rekey directly?
4. Is the lack of AP-side capture (no Fritz!Box logs) a blocking gap, or can the campaign proceed without it for now?