notes: Bug #5 RX-degradation campaign — Phase 0 plan #13

Merged

marfrit merged 1 commit from claude-noether-11 into main

2026-05-07 21:52:14 +00:00


Author

SHA1

Message

Date

claude-noether

6bae531917

notes: Bug #5 RX-degradation campaign — Phase 0 plan + research question

After Patch C v3 closed (PR #5 merged, Phase 7 N=3 verified at +73%
throughput vs Patch B baseline), the post-13-min RX-degradation
pattern remains.  It reproduces on Patch B, F, and v3 alike, so it is
independent of the relay/race issues v3 addressed.  It is a side
effect that was masked by the throughput floor while v2's race was
the dominant variable.

Research question (locked):

  Why does the bes2600 RX path collapse from ~2 MB/s sustained at
  fresh-chip uptime to ~180 B/s at ~28-min uptime, with periodic
  "wsm_generic_confirm failed for request 0x0007" and "ieee80211 phy0:
  [SCAN] Scan failed (-22)" messages every 300 s in the intervening
  window?
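
For a sense of scale, the figures in the research question can be put into a few lines of arithmetic. This is only a sketch: it reads ~2 MB/s as 2,000,000 B/s (decimal megabytes are an assumption), and it assumes the 300 s scan-fail cadence could run from boot, which the notes do not state.

```python
# Back-of-envelope figures taken from the research question.
FRESH_RX_BPS = 2_000_000   # ~2 MB/s, assuming decimal megabytes
DEGRADED_RX_BPS = 180      # ~180 B/s at ~28-min uptime
SCAN_PERIOD_S = 300        # scan-fail cadence from the log pattern
UPTIME_S = 28 * 60         # ~28 min, in seconds

# How far RX throughput falls between the two observations.
collapse_factor = FRESH_RX_BPS / DEGRADED_RX_BPS

# Upper bound on scan failures if one fired every period since boot
# (assumption; the notes only say "in the intervening window").
max_scan_fails = UPTIME_S // SCAN_PERIOD_S

print(f"throughput collapse: ~{collapse_factor:,.0f}x")
print(f"scan-fail periods elapsed by ~28 min: at most {max_scan_fails}")
```

Under these assumptions the collapse is roughly four orders of magnitude, with only a handful of scan-fail events available to correlate against.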

Phase 0 protocol:

  - long-capture rig armed on ohm at uptime 0 (fresh boot 23:13 CEST)
  - ftrace events: workqueue, mac80211, cfg80211, mmc, sdhci, power
  - iw event (cfg80211 reason codes), dmesg follow, per-30s netdev
    counter snap, 5 stress probes at T+5/10/15/20/25 min
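
The counter-snapshot and probe-schedule items above could be sketched roughly as follows. This is not the rig's actual tooling: the function names are hypothetical, the interface name passed to `read_rx_bytes` is up to the caller, and counter wrap-around is not handled. The only external detail relied on is the standard sysfs location of the per-interface `rx_bytes` counter.

```python
from pathlib import Path

SNAP_INTERVAL_S = 30                                     # per-30s snapshot cadence
PROBE_OFFSETS_S = [m * 60 for m in (5, 10, 15, 20, 25)]  # T+5..25 min probes


def read_rx_bytes(iface, root="/sys/class/net"):
    """Read the cumulative RX byte counter for one interface from sysfs."""
    return int(Path(root, iface, "statistics", "rx_bytes").read_text())


def rx_rates(snapshots):
    """Turn (uptime_s, rx_bytes) snapshots into (uptime_s, bytes_per_s) rates."""
    rates = []
    for (t0, b0), (t1, b1) in zip(snapshots, snapshots[1:]):
        if t1 > t0:  # skip duplicate or out-of-order timestamps
            rates.append((t1, (b1 - b0) / (t1 - t0)))
    return rates


def probes_due(prev_t, now_t):
    """Probe offsets that fall in the half-open interval (prev_t, now_t]."""
    return [p for p in PROBE_OFFSETS_S if prev_t < p <= now_t]
```

Keeping the rate computation separate from the sysfs read means recorded snapshots can be replayed offline, for example `rx_rates([(0, 0), (30, 60_000_000)])` for a synthetic first interval.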

Phase 0 will:

  - re-anchor the predecessor data via the long capture (in-session
    N=1; re-run if anomalous)
  - characterize state transitions (first scan-fail, first throughput
    drop) via cfg80211/mac80211 ftrace + iw event correlation
  - feed Phase 1 metric formulation
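
One way the two state transitions named above (first scan-fail, first throughput drop) might be extracted from the captured data is sketched below. The dmesg line shape with a leading `[seconds.micros]` timestamp, the 50% drop threshold, and both function names are assumptions for illustration, not something the plan specifies.

```python
import re

# Assumed dmesg line shape:
#   "[  812.345678] ieee80211 phy0: [SCAN] Scan failed (-22)"
SCAN_FAIL_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\]\s.*\[SCAN\] Scan failed \((-?\d+)\)"
)


def first_scan_fail(dmesg_lines):
    """Return (uptime_s, error_code) of the first scan failure, or None."""
    for line in dmesg_lines:
        m = SCAN_FAIL_RE.search(line)
        if m:
            return float(m.group(1)), int(m.group(2))
    return None


def first_throughput_drop(rates, baseline_bps, fraction=0.5):
    """Return the uptime of the first sample below fraction * baseline, or None.

    `rates` is a sequence of (uptime_s, bytes_per_s) pairs.
    """
    threshold = baseline_bps * fraction
    for t, bps in rates:
        if bps < threshold:
            return t
    return None
```

Comparing the two returned uptimes gives a first ordering of the events (scan-fail before or after the throughput drop), which is the kind of input a Phase 1 metric would need.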

Mechanism candidates (Phase 4 will discriminate):

  1. Firmware-side resource exhaustion (per-scan accumulator)
  2. NetworkManager scan-fail recovery loop competing with data
  3. AP-side rate limiting / fairness probation
  4. PSM state machine deadlock (c7 latch stale)
  5. SDIO bus retune interaction
  6. Power-management busy-event accumulator leak

Out of scope: Patch C2/D/E, higher-rate ramp, reproducing on different
APs.  This campaign is independent of the Patch C closure.

2026-05-07 23:23:31 +02:00
