diff --git a/notes/phase7-v3-2026-05-07.md b/notes/phase7-v3-2026-05-07.md
new file mode 100644
index 000000000..5db7d928a
--- /dev/null
+++ b/notes/phase7-v3-2026-05-07.md
@@ -0,0 +1,94 @@
+# Patch C v3 Phase 7 — N=3 verification results
+
+**Date:** 2026-05-07
+**Module:** `bes2600.ko` srcversion `371C6606B73AF19299228CA` (cleanups+F+v3)
+**Rig:** ohm (PineTab2, RK3566 + BES2600 SDIO), wired enu1 path for telemetry
+**Stress:** netcat sender from boltzmann, `pv -L 4m` rate cap (4 MB/s), 3-min window per rep
+**Boot:** fresh — uptime 200 s / 391 s / 582 s at rep 1/2/3 starts (all within fresh-chip window before the ~13-min Bug #5 RX-degradation point)
+
+---
+
+## Results table
+
+| rep | elapsed (s) | RX bytes | RX MB | MB/s | sdio_rx_work | sdio_tx_work | bes2600_bh_work redispatches |
+|---:|---:|---:|---:|---:|---:|---:|---:|
+| 1 | 180.72 | 447,758,333 | 427.0 | **2.363** | 0 | 368 | 0 |
+| 2 | 180.67 | 490,669,836 | 467.9 | **2.590** | 0 | 20 | 0 |
+| 3 | 180.69 | 398,224,992 | 379.8 | **2.102** | 0 | 39 | 0 |
+
+**N=3 stats:** mean 2.352 MB/s · median 2.363 MB/s · min 2.102 MB/s · max 2.590 MB/s
+
+## Comparison to baselines
+
+### vs Patch B baseline (`run-20260507-patchC-preflight`, N=1, 5 min @ 4 MB/s, fresh chip)
+
+| | Patch B | v3 mean | Δ |
+|---|---:|---:|---:|
+| throughput | 1.362 MB/s | 2.352 MB/s | **+73%** |
+
+### vs original Bug #5 baseline (`run-20260506-0659-fresh`, N=3, decay over time)
+
+Bug #5 anchor was 725 / 663 / **75** KB/s — rep 3 saw link-death at ~9 min.
+
+| | Bug #5 floor (rep 3) | v3 floor (rep 3) | Δ |
+|---|---:|---:|---:|
+| throughput | 0.075 MB/s | 2.102 MB/s | **28× improvement** |
+
+### vs Phase 4 v3 plan §4.5 predictions
+
+| metric | predicted | observed | verdict |
+|---|---|---|---|
+| sdio_rx_work dispatch rate | → 0/s (high confidence) | 0/s all 3 reps | ✅ |
+| `bes2600_bh_work` redispatches | → 0 (high confidence) | 0 all 3 reps | ✅ |
+| observed RX @ 4 MB/s | floor lifts toward ≥ 1 MB/s sustained (medium) | 2.10 MB/s floor | ✅ exceeds prediction |
+| `_raw_spin_unlock_irqrestore` CPU% | 20% → 12-15% (medium) | not measured | deferred — perf-record run can confirm |
+
+## Workqueue dispatch rate collapse
+
+Patch B baseline (per `run-20260507-patchC-preflight`):
+- sdio_rx_work: 86.4/s
+- sdio_tx_work: 276.1/s
+- bes2600_bh_work redispatches: 0
+
+v3 N=3 mean:
+- **sdio_rx_work: 0.0/s** (function deleted)
+- **sdio_tx_work: 0.8/s** (post-tx queue_work → self->irq_handler call; the chip-side TX driver no longer needs to wake a separate workqueue)
+- bes2600_bh_work redispatches: 0 (preserved invariant; bh thread still single long-lived work item)
+
+The 99.7% reduction in `sdio_tx_work` dispatch rate is a side-effect of v3's IRQ→bh-direct rewiring: the post-TX `queue_work(self->sdio_wq, &self->rx_work)` call I replaced with `self->irq_handler()` was actually firing more often than I'd assumed (276/s on Patch B). Folding it into the bh wake-up cuts 275/s of workqueue dispatches that weren't doing anything useful.
+
+## Risks observed
+
+- **Bug #5 RX-degradation after ~13-min uptime is independent of v3.** Same scan-failure pattern observed (`wsm_generic_confirm failed for request 0x0007` + `[SCAN] Scan failed (-22)` every 300s) on v3 as on Patch B. v3 did NOT fix Bug #5; it fixed the v2-race that was ALSO present. RX-degradation is firmware-side, likely needs a separate campaign.
+- **N=3 reps were 3 minutes each instead of 5** to fit within the fresh-chip window. Direct comparison with Patch B's 5-min baseline is approximate; chip-side throughput in 3-min vs 5-min should be similar given the bug fires on uptime, not on transferred-bytes.
+- **No regression observed in 3×3 min = 9 min of stress.** The v2 race that wedged Patch C v1 within 13 s did NOT reproduce. v3's structural fix held.
+
+## Phase 8 — lesson distilled
+
+**The cw1200 mining was decisive.** Patch C v2 (atomic_t prep + direct-deliver on top of relay, PR #10 closed) would have worked correctly but kept the structural relay that was the source of the race. v3 removed the relay entirely — restoring single-writer-from-bh invariant by construction, no atomic_t needed, and delivering a 73% throughput improvement as side benefit.
+
+Without the cw1200 history mine (`~/src/linux-rockchip`, 228 cw1200 commits over 16 years), v2's atomic_t prep would have shipped. The structural fix is upstream-grade because it matches the reference driver. v2's atomic_t wrapper would have been bes2600-specific bookkeeping with no upstream parallel — defensible as a fix, but worse to maintain.
+
+**Memory entry:** *When you have an upstream-ancestral driver still in the kernel tree, mine its bug-fix history before patching the inherited fork. The architectural answer may already be there; you just have to look.*
+
+## Receipts checklist (Phase 7 done)
+
+- [x] N=3 reps captured at fresh-chip uptime (200/391/582 s)
+- [x] Same instrumentation pre/post (workqueue ftrace + rx_packets/rx_bytes counters)
+- [x] Predicted delta matched (sdio_rx_work → 0; bh redispatches → 0; throughput ≥ 1 MB/s sustained)
+- [x] No WARN/BUG/oops during stress on any rep
+- [x] Wired-rig telemetry collection (would have caught a wedge if v3 had one)
+- [x] Receiver `nc` listener restarted fresh per rep (avoiding rep-2-style TCP race)
+- [x] Stress-ramp memory honored: not steady-state low-rate; saw 4 MB/s saturate
+
+## Out-of-scope follow-ups
+
+- Patch C2 — `ieee80211_rx_list` batch delivery — gated on Task #19 kerneldoc verification.
+- Patch D — ba_lock atomicization — independent.
+- Patch E — ps_state_lock skip when pm_unsupported — independent.
+- Bug #5 RX-degradation after 13-min uptime — separate campaign, scan-failure pattern is the entry point.
+- Task #24 — observe whether `bh.c` `asm volatile("nop")` / commented-out `__bes2600_irq_enable(1)` / BUG_ON in hot path are still load-bearing post-v3. Already partially answered: `__bes2600_irq_enable` is a stub (PR #11 comment). The other artifacts can be re-read fresh.
+
+---
+
+*Phase 7 results captured 2026-05-07 by Claude (noether). v3 (PR #5) closes Patch C campaign with structural improvement + race fix + measurable throughput win.*