besser/notes/phase7-v3-2026-05-07.md
claude-noether 3a38286e6f notes: Patch C v3 Phase 7 N=3 results — +73% throughput, race fix verified
N=3 stress reps on ohm with v3 module (srcversion 371C6606B73AF19299228CA),
3 min @ 4 MB/s each, all started within the fresh-chip uptime window (rep starts at 200/391/582 s uptime).

| rep | MB/s | sdio_rx_work | bh_work redispatches |
|----:|----:|-:|-:|
|  1  | 2.363 | 0 | 0 |
|  2  | 2.590 | 0 | 0 |
|  3  | 2.102 | 0 | 0 |

N=3 mean: 2.352 MB/s · median 2.363 MB/s · min 2.102 MB/s.

vs Patch B baseline (1.362 MB/s, run-20260507-patchC-preflight): +73%.
vs original Bug #5 floor (75 KB/s on rep 3, which ended in link death): 28× improvement.

Plan §4.5 predictions verified:
  - sdio_rx_work dispatch rate: 86.4/s -> 0/s (function deleted)
  - bes2600_bh_work redispatches: 0 (preserved invariant)
  - RX floor at 4 MB/s offered load: predicted to lift toward >= 1 MB/s;
    exceeded (floor is 2.10 MB/s)

Bonus finding: sdio_tx_work dispatch rate dropped from 276.1/s to
0.8/s.  The post-tx queue_work(rx_work) call I rewired to
self->irq_handler() was firing far more often than predicted;
folding it into the bh wake-up removes ~99.7% of the sdio_tx_work
dispatches.

No WARN/BUG/oops on any rep. The race that wedged Patch C v1 within
13 s under stress (the race v2 was written to fix) did NOT reproduce on v3.

Phase 8 lesson, distilled into the feedback_mine_upstream_ancestor memory:
when patching a fork-from-upstream driver, mine the ancestor's
fix history BEFORE writing fixes from scratch.  The cw1200 mining
drove the structural pivot from v2's atomic_t wrapper to v3's
no-relay architecture.  Without that mining, we'd have shipped v2.

Phase 7 receipts checklist met (N=3, fresh-chip, identical
instrumentation, predicted delta verified, no-WARN under stress).
2026-05-07 23:08:51 +02:00


Patch C v3 Phase 7 — N=3 verification results

  • Date: 2026-05-07
  • Module: bes2600.ko srcversion 371C6606B73AF19299228CA (cleanups+F+v3)
  • Rig: ohm (PineTab2, RK3566 + BES2600 SDIO), wired enu1 path for telemetry
  • Stress: netcat sender from boltzmann, pv -L 4m rate cap (4 MB/s), 3-min window per rep
  • Boot: fresh; uptime 200 s / 391 s / 582 s at rep 1/2/3 starts (all within the fresh-chip window, before the ~13-min Bug #5 RX-degradation point)
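A quick arithmetic check of the Boot line. It assumes each rep ran for its elapsed time from the results table and takes ~13 min (780 s) as a rough Bug #5 onset. Rep 3 then finishes at about 762.7 s of uptime, only ~17 s before that point, so the fresh-chip margin on the last rep is thin.

```python
# Rep start uptimes (s) from the Boot line; durations from the results table.
# The ~13-min Bug #5 onset is approximate, so 780 s is a rough threshold.
BUG5_ONSET_S = 13 * 60

rep_starts_s = [200, 391, 582]
rep_elapsed_s = [180.72, 180.67, 180.69]


def rep_end_times(starts, durations):
    """Return the uptime (s) at which each rep finishes."""
    return [s + d for s, d in zip(starts, durations)]


ends = rep_end_times(rep_starts_s, rep_elapsed_s)
for rep, end in enumerate(ends, start=1):
    margin = BUG5_ONSET_S - end
    print(f"rep {rep}: ends at {end:.1f} s uptime, {margin:.1f} s before ~13 min")
```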


Results table

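| rep | elapsed (s) | RX bytes | RX MB | MB/s | sdio_rx_work | sdio_tx_work | bes2600_bh_work redispatches |
|----:|----:|----:|----:|----:|----:|----:|----:|
| 1 | 180.72 | 447,758,333 | 427.0 | 2.363 | 0 | 368 | 0 |
| 2 | 180.67 | 490,669,836 | 467.9 | 2.590 | 0 | 20 | 0 |
| 3 | 180.69 | 398,224,992 | 379.8 | 2.102 | 0 | 39 | 0 |

RX MB is bytes / 2^20 (e.g. 447,758,333 / 1,048,576 ≈ 427.0). The three work-item columns are dispatch counts over the whole rep.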

N=3 stats: mean 2.352 MB/s · median 2.363 MB/s · min 2.102 MB/s · max 2.590 MB/s
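A minimal sketch that recomputes the per-rep MB/s and the N=3 statistics from the raw byte counts and elapsed times in the results table. It assumes "MB" means 2^20 bytes, which is the only reading that reproduces the table's RX MB column.

```python
import statistics

MIB = 1024 * 1024  # the table's "MB" column matches bytes / 2**20

# (elapsed s, RX bytes) per rep, copied from the results table
reps = [
    (180.72, 447_758_333),
    (180.67, 490_669_836),
    (180.69, 398_224_992),
]


def throughput_mb_s(elapsed_s, rx_bytes):
    """Received megabytes (2**20 bytes) per second over one rep."""
    return rx_bytes / MIB / elapsed_s


rates = [throughput_mb_s(t, b) for t, b in reps]
print("per-rep MB/s:", [round(r, 3) for r in rates])
print(
    f"mean {statistics.mean(rates):.3f} · median {statistics.median(rates):.3f} "
    f"· min {min(rates):.3f} · max {max(rates):.3f}"
)
```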

Comparison to baselines

vs Patch B baseline (run-20260507-patchC-preflight, N=1, 5 min @ 4 MB/s, fresh chip)

| | Patch B | v3 mean | Δ |
|---|---:|---:|---:|
| throughput | 1.362 MB/s | 2.352 MB/s | +73% |

vs original Bug #5 baseline (run-20260506-0659-fresh, N=3, decay over time)

Bug #5 anchor: 725 / 663 / 75 KB/s for reps 1/2/3; rep 3 saw link death at ~9 min.

| | Bug #5 floor (rep 3) | v3 floor (rep 3) | Δ |
|---|---:|---:|---:|
| throughput | 0.075 MB/s | 2.102 MB/s | 28× improvement |
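The two deltas above are plain ratios. A small sketch reproduces them, taking 75 KB/s as 0.075 MB/s (÷1000) as the comparison table does. Note that the Patch B side is a single 5-min run while the v3 side is an N=3 mean of 3-min reps, so the +73% is approximate.

```python
def relative_change_pct(before, after):
    """Percent change going from `before` to `after`."""
    return (after - before) / before * 100


def improvement_factor(before, after):
    """How many times larger `after` is than `before`."""
    return after / before


patch_b_mb_s = 1.362     # Patch B baseline, N=1, 5 min
v3_mean_mb_s = 2.352     # v3 N=3 mean, 3-min reps
bug5_floor_mb_s = 0.075  # Bug #5 rep 3: 75 KB/s, converted as in the table
v3_floor_mb_s = 2.102    # v3 rep 3

print(f"vs Patch B: {relative_change_pct(patch_b_mb_s, v3_mean_mb_s):+.0f}%")
print(f"vs Bug #5 floor: {improvement_factor(bug5_floor_mb_s, v3_floor_mb_s):.0f}x")
```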

vs Phase 4 v3 plan §4.5 predictions

| metric | predicted | observed | verdict |
|---|---|---|---|
| sdio_rx_work dispatch rate | → 0/s (high confidence) | 0/s | matched, all 3 reps |
| bes2600_bh_work redispatches | → 0 (high confidence) | 0 | matched, all 3 reps |
| observed RX @ 4 MB/s | floor lifts toward ≥ 1 MB/s sustained (medium) | 2.10 MB/s floor | exceeds prediction |
| _raw_spin_unlock_irqrestore CPU% | 20% → 12-15% (medium) | not measured | deferred; a perf-record run can confirm |

Workqueue dispatch rate collapse

Patch B baseline (per run-20260507-patchC-preflight):

  • sdio_rx_work: 86.4/s
  • sdio_tx_work: 276.1/s
  • bes2600_bh_work redispatches: 0

v3 N=3 mean:

  • sdio_rx_work: 0.0/s (function deleted)
  • sdio_tx_work: 0.8/s (the post-TX queue_work is now a direct self->irq_handler() call, so the TX path no longer wakes a separate workqueue)
  • bes2600_bh_work redispatches: 0 (preserved invariant; bh thread still single long-lived work item)

The 99.7% reduction in sdio_tx_work dispatch rate is a side-effect of v3's IRQ→bh-direct rewiring: the post-TX queue_work(self->sdio_wq, &self->rx_work) call I replaced with self->irq_handler() was actually firing more often than I'd assumed (276/s on Patch B). Folding it into the bh wake-up cuts 275/s of workqueue dispatches that weren't doing anything useful.
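The 0.8/s and 99.7% figures can be checked against the sdio_tx_work counts in the results table. This sketch pools the counts over the total elapsed time. Averaging the three per-rep rates instead gives essentially the same value, and both round to 0.8/s.

```python
# sdio_tx_work dispatch counts and rep durations from the results table
tx_counts = [368, 20, 39]
elapsed_s = [180.72, 180.67, 180.69]
patch_b_tx_rate = 276.1  # dispatches/s, Patch B preflight


def pooled_rate_per_s(counts, durations):
    """Dispatches per second pooled over all reps."""
    return sum(counts) / sum(durations)


v3_tx_rate = pooled_rate_per_s(tx_counts, elapsed_s)
reduction_pct = (1 - v3_tx_rate / patch_b_tx_rate) * 100
print(f"v3 sdio_tx_work: {v3_tx_rate:.1f}/s, reduction {reduction_pct:.1f}%")
print(f"dispatches removed: {patch_b_tx_rate - v3_tx_rate:.0f}/s")
```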

Risks observed

  • Bug #5 RX-degradation after ~13-min uptime is independent of v3. The same scan-failure pattern (wsm_generic_confirm failed for request 0x0007 + [SCAN] Scan failed (-22) every 300 s) appears on v3 as on Patch B. v3 did NOT fix Bug #5; it fixed the race (the one v2 targeted) that was ALSO present. The RX-degradation is firmware-side and likely needs a separate campaign.
  • N=3 reps were 3 minutes each instead of 5 to fit within the fresh-chip window. Direct comparison with Patch B's 5-min baseline is therefore approximate; chip-side throughput over 3 min vs 5 min should be similar, since the bug fires on uptime, not on transferred bytes.
  • No regression observed in 3×3 min = 9 min of stress. The v2 race that wedged Patch C v1 within 13 s did NOT reproduce. v3's structural fix held.

Phase 8 — lesson distilled

The cw1200 mining was decisive. Patch C v2 (atomic_t prep + direct-deliver on top of the relay, PR #10 closed) would have worked correctly but kept the structural relay that was the source of the race. v3 removed the relay entirely, restoring the single-writer-from-bh invariant by construction (no atomic_t needed) and delivering a 73% throughput improvement as a side benefit.
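The single-writer-from-bh idea can be illustrated with a userspace Python analogy. This is not the driver code. `ToyBh`, its `irq_handler()` and the `delivered` list are hypothetical stand-ins for the bh thread, `self->irq_handler()` and the RX state. The real driver is kernel C with its own wake-up primitives, which these notes do not show. The IRQ side only enqueues and wakes the long-lived worker directly, and the worker is the only context that writes RX state.

```python
import threading
from collections import deque


class ToyBh:
    """Hypothetical analogy: one long-lived 'bh' thread is the sole writer of RX state."""

    def __init__(self):
        self._wake = threading.Event()  # stands in for the bh wake-up
        self._stop = False
        self._pending = deque()         # hand-off queue; append/popleft are thread-safe
        self.delivered = []             # "RX state": written only by the bh thread
        self.writers = set()            # idents of threads that wrote RX state
        self._thread = threading.Thread(target=self._run, name="toy-bh")
        self._thread.start()

    def irq_handler(self, frame):
        """IRQ/post-TX side: enqueue and wake bh directly, with no relay work item."""
        self._pending.append(frame)
        self._wake.set()

    def _run(self):
        while True:
            self._wake.wait()
            self._wake.clear()
            stopping = self._stop  # read before draining so no queued frame is left behind
            while self._pending:
                self.delivered.append(self._pending.popleft())
                self.writers.add(threading.get_ident())
            if stopping:
                return

    def stop(self):
        self._stop = True
        self._wake.set()
        self._thread.join()


if __name__ == "__main__":
    bh = ToyBh()
    for i in range(100):
        bh.irq_handler(i)
    bh.stop()
    print(len(bh.delivered), "frames delivered by", len(bh.writers), "writer thread(s)")
```

Because the signalling path never touches `delivered`, that state needs no atomic wrapper. A relay that let a second work item write the same state would reintroduce a second writer, which is the property the atomic_t approach had to paper over.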

Without mining the cw1200 history (~/src/linux-rockchip, 228 cw1200 commits over 16 years), v2's atomic_t prep would have shipped. The structural fix is upstream-grade because it matches the reference driver. v2's atomic_t wrapper would have been bes2600-specific bookkeeping with no upstream parallel: defensible as a fix, but worse to maintain.

Memory entry: When you have an upstream-ancestral driver still in the kernel tree, mine its bug-fix history before patching the inherited fork. The architectural answer may already be there; you just have to look.

Receipts checklist (Phase 7 done)

  • N=3 reps captured at fresh-chip uptime (200/391/582 s)
  • Same instrumentation pre/post (workqueue ftrace + rx_packets/rx_bytes counters)
  • Predicted delta matched (sdio_rx_work → 0; bh redispatches → 0; throughput ≥ 1 MB/s sustained)
  • No WARN/BUG/oops during stress on any rep
  • Wired-rig telemetry collection (would have caught a wedge if v3 had one)
  • Receiver nc listener restarted fresh per rep (avoiding rep-2-style TCP race)
  • Stress-ramp memory honored: not steady-state low-rate; the 4 MB/s offered load saturated the link (2.10 to 2.59 MB/s delivered)
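The receipts name workqueue ftrace and rx_bytes counters as the instrumentation. This minimal Python sketch shows how such data could be reduced. It is not the campaign's actual tooling. It assumes the text format of ftrace's workqueue:workqueue_execute_start event (`work struct <ptr>: function <name>`, as read from the tracefs `trace` file) and the standard `/sys/class/net/<iface>/statistics/rx_bytes` counter. The wireless interface name is not given in these notes, so it is left as a parameter.

```python
import re
from collections import Counter
from pathlib import Path

# workqueue_execute_start lines end in "work struct <ptr>: function <name>[ [module]]"
_EXEC_START = re.compile(r"workqueue_execute_start: work struct \S+: function (\S+)")


def count_dispatches(trace_lines):
    """Count workqueue_execute_start events per work function name."""
    counts = Counter()
    for line in trace_lines:
        m = _EXEC_START.search(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def read_rx_bytes(iface):
    """Read an interface's cumulative RX byte counter from sysfs."""
    return int(Path(f"/sys/class/net/{iface}/statistics/rx_bytes").read_text())


def rx_mb_per_s(bytes_before, bytes_after, elapsed_s):
    """Throughput between two rx_bytes samples, in MB (2**20 bytes) per second."""
    return (bytes_after - bytes_before) / (1024 * 1024) / elapsed_s
```

Dividing each function's count by the rep's elapsed time gives dispatch rates comparable to the ones quoted for sdio_rx_work and sdio_tx_work.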

Out-of-scope follow-ups

  • Patch C2 — ieee80211_rx_list batch delivery — gated on Task #19 kerneldoc verification.
  • Patch D — ba_lock atomicization — independent.
  • Patch E — ps_state_lock skip when pm_unsupported — independent.
  • Bug #5 RX-degradation after 13-min uptime — separate campaign, scan-failure pattern is the entry point.
  • Task #24 — observe whether bh.c asm volatile("nop") / commented-out __bes2600_irq_enable(1) / BUG_ON in hot path are still load-bearing post-v3. Already partially answered: __bes2600_irq_enable is a stub (PR #11 comment). The other artifacts can be re-read fresh.

Phase 7 results captured 2026-05-07 by Claude (noether). v3 (PR #5) closes the Patch C campaign with a structural improvement, a race fix, and a measurable throughput win.