From f232476240869e592b842c3c2e8925845ffc5da6 Mon Sep 17 00:00:00 2001 From: "Claude (noether)" Date: Thu, 7 May 2026 19:04:53 +0200 Subject: [PATCH] =?UTF-8?q?notes:=20Patch=20C=20Phase=204=20plan=20?= =?UTF-8?q?=E2=80=94=20collapse=20sdio=5Frx=5Fwork=20into=20BH=20(item=201?= =?UTF-8?q?=20only)?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Per merged PR #8 inline review: items 1 and 2 split, sequential. Patch C is item-1-only (collapse the sdio_rx_work → rx_queue → bh_work indirection). Patch C2 (ieee80211_rx_list batch delivery) is split out and gated on Task #19 kerneldoc contract verification. Approach choice: Option A (sdio_rx_work delivers directly into wsm_handle_rx, removing rx_queue and its two synchronization points per frame) over Option B (subsume into bh thread). Option A has a smaller diff and clearer bisection story; the residual per-IRQ workqueue dispatch is preserved as a measurable Phase 7 data point that motivates or doesn't motivate a follow-on Option-B patch. Predicted delta in Phase 3 units, with confidence levels stated explicitly. §4.6 lists 6 risks, of which 2 require Phase 6 contract citations (wsm_handle_rx callability from sdio_wq context; wsm_release_tx_buffer's bh-wake invariant). §4.8 mandates a stress ramp in Phase 7, not a steady cap, per feedback_phase7_stress_ramp. Symptom-shaped findings (asm nop, commented-out IRQ re-enable, BUG_ON in hot path) explicitly deferred to Task #24 per feedback_dont_patch_downstream_artifacts. Awaiting Phase 5 second-model review on DokuWiki. --- notes/patch-c-phase4-plan-2026-05-07.md | 184 ++++++++++++++++++++++++ 1 file changed, 184 insertions(+) create mode 100644 notes/patch-c-phase4-plan-2026-05-07.md diff --git a/notes/patch-c-phase4-plan-2026-05-07.md b/notes/patch-c-phase4-plan-2026-05-07.md new file mode 100644 index 000000000..a929baaa3 --- /dev/null +++ b/notes/patch-c-phase4-plan-2026-05-07.md @@ -0,0 +1,184 @@ +# Patch C — Phase 4 Plan: collapse sdio_rx_work into BH + +**Author:** Claude (noether) +**Status:** Phase 4 — pending Phase 5 second-model review before any Phase 6 code. +**Scope:** **item 1 only** (per merged PR #8 inline review: "do it sequentially; we're not on the clock"). +**Item 2** (batch deliver via `ieee80211_rx_list`) splits to **Patch C2**, gated on Task #19 kerneldoc verification. + +--- + +## §0 Substrate — anchored + +Bug #5 anchor (recorded 2026-05-07, see `notes/phase1-bug5-2026-05-07.md`): + +- Sender: netcat-over-WiFi, 4 MB/s cap, 2.4 GHz, AVM AP, single-STA +- Receiver: ohm (PineTab2, RK3566 + BES2600WM-SDIO) +- N=3 baseline reps: 725 / 663 / 75 KB/s (rep 3 saw link-death at ~9 min) +- `perf record -g` during 4MB/s window: `_raw_spin_unlock_irqrestore` ≈ 20% CPU +- ftrace lock-instrumentation, system-wide: `workqueue_execute_start` ≈ 5,643/sec +- Driver-side count: `wsm_cmd_send` 13/sec — wsm command path is *not* the dispatch source; the contributor is the per-SDIO-transaction relay. + +Root cause traced in PR #7 (Sonnet review) and concurred in PR #8 (Opus review): RX path adds two synchronization points per frame and one wait-queue wake-up per IRQ batch via `sdio_rx_work` → `rx_queue` → `bh_work` indirection. + +## §1 Goal (locked) + +Reduce per-RX-frame overhead enough that observed receive ≥ 1.0 MB/s sustained @ 4 MB/s sender, with `_raw_spin_unlock_irqrestore` < 15 % CPU during the 4 MB/s window. No 30-min cascade to link-death. + +(This is a partial step toward Phase 1's full target of ≥ 2 MB/s sustained @ 4 MB/s with < 10 % lock CPU. The full target is jointly addressed by Patch C + Patch C2; Patch C alone should *cross half the gap*.) + +## §2 Situation + +- `bes2600.ko` srcversion `1B3B3ED0…` deployed on ohm (c-stack + Patch A + Patch B). +- `cleanups` branch on `marfrit/bes2600-dkms` is the current source-of-truth. +- Build sandbox `/var/tmp/c6-sandbox/` on ohm, native `make -j4`. +- `BES2600_RX_IN_BH` is **defined** in Makefile — `bes2600_bh_rx_helper` is the active RX consumer. +- ohm reachable. Markus pushes the reboot button; never me. +- Test rig under `/root/bes2600-samples/` — `rep-trace.sh` per-rep capture script. + +## §3 Baseline measurements + +Reused from Bug #5 Phase 0 (above). No re-anchor needed for Patch C — same regime. + +**Specific Phase-3-units that this plan's predictions reference:** + +| metric | tool | current value (4MB/s window) | +|---|---|---| +| observed receive throughput | netcat receiver byte-count | 75–725 KB/s, rep-variance high | +| `_raw_spin_unlock_irqrestore` CPU% | perf record / report | ~20% | +| `workqueue_execute_start`/sec | ftrace `workqueue:workqueue_execute_start` | ~5,643/sec system-wide | +| `bes_sdio` workqueue dispatches | `cat /sys/kernel/tracing/events/workqueue/.../filter` filtered by `bes_sdio` | not measured pre-patch — **TODO before Phase 6** | +| RX SKB rate at mac80211 boundary | trace `mac80211:drv_rx_irqsafe` count | not measured pre-patch — **TODO before Phase 6** | + +Phase 6 must not start until the two TODOs above are filled in — otherwise Phase 7 has no reference point for the predicted-delta comparison. + +## §4 Plan + +### §4.1 What will be touched + +- `bes2600_sdio.c::sdio_rx_work` — the relay loop. After this patch, it still drains the SDIO bus into SKBs but **delivers SKBs directly into `wsm_handle_rx`** instead of `skb_queue_tail`-ing them onto `self->rx_queue`. +- `bes2600_sdio.c::bes2600_sdio_extract_packets` — the inner per-SKB extractor. Changes the in-loop action from `skb_queue_tail(&self->rx_queue, skb)` to a direct call (or callback) into the wsm dispatcher. +- `bes2600_sdio.c::bes2600_sdio_pipe_read` — becomes unused, removed. +- `bh.c::bes2600_bh_rx_helper` — its `BES_SDIO_RX_MULTIPLE_ENABLE` branch is no longer reachable for RX (RX path no longer feeds bh). Either gate the helper, or remove the helper outright if `bh_rx` atomic is no longer raised on RX. + +### §4.2 What will NOT be touched + +- `ieee80211_rx_irqsafe()` call sites — that's Patch C2 (item 2). +- TX path — `sdio_tx_work`, `bes2600_bh_tx_helper`, etc. Untouched. +- `sdio_wq` workqueue alloc — stays. After patch it hosts only `tx_work` + `scan_work` + (briefly during patch) `rx_work`. Renaming is cosmetic and out of scope. +- The bh thread itself — still runs, still handles TX, still watches the timeouts. +- `bh.c` `#if 0` graveyard — separate hygiene patch, not bundled. +- `__bes2600_irq_enable(1)` commented-out / `asm volatile("nop")` placeholder — **deferred** per `feedback_dont_patch_downstream_artifacts`. These are symptom-shaped; Patch C may dissolve them. Re-evaluate at Task #24 (post-Patch-E observation). +- `bh_rx` / `bh_tx` atomic split — out of scope. + +### §4.3 Approach choice — Option A (sdio_rx_work direct delivery) + +Two structural options surveyed in PR #8 §2.1; recap: + +| | Option A: direct delivery from sdio_rx_work | Option B: subsume sdio_rx_work into bh thread | +|---|---|---| +| diff size | small | medium | +| eliminates `rx_queue->lock` × 2 per frame | yes | yes | +| eliminates `sdio_wq.rx_work` workqueue dispatch per IRQ | no | yes | +| changes who calls `wsm_handle_rx` | sdio_wq context (already process context) | bh thread | +| TX/RX SDIO bus contention | unchanged (sdio_rx_work and sdio_tx_work already share `bes2600_sdio_lock`) | adds bh ↔ sdio_tx_work contention on the SDIO mutex | +| bisection isolation | clean: only the rx_queue handoff is removed | mixes "remove handoff" with "subsume thread" | + +**Choosing Option A.** Reasons: +1. Smaller diff = clearer Phase-7 attribution. If RX KB/s rises, we know it was the rx_queue handoff, not the workqueue topology. +2. Per Markus's PR #8 review: split was for bisection clarity. Option A is narrower than Option B. +3. The remaining cost (per-IRQ `sdio_wq.rx_work` dispatch) is ≤ 1 dispatch per IRQ batch; multi-RX coalescing means several frames per dispatch. If Phase 7 of Patch C shows that dispatch IS the residual cost, that becomes a concrete data point and motivates a *measured* Option-B follow-up, not a speculative one. + +### §4.4 Implementation sketch (preview — actual code in Phase 6) + +**Today** (`bes2600_sdio.c:783–831`): +```c +static int bes2600_sdio_extract_packets(...) { + for each packet: + skb = dev_alloc_skb(...); + memcpy(skb->data, &data[pos], packet_len); + spin_lock(&self->rx_queue_lock); + skb_queue_tail(&self->rx_queue, skb); // ← handoff + spin_unlock(&self->rx_queue_lock); +} +static void sdio_rx_work(...) { + bes2600_sdio_extract_packets(...); + self->irq_handler(self->irq_priv); // ← wakes bh_wq +} +// bh thread later: pipe_read = skb_dequeue(rx_queue) → wsm_handle_rx(skb) +``` + +**After patch** (sketch): +```c +static int bes2600_sdio_extract_packets(struct sbus_priv *self, u32 ctrl_reg, u8 *data) { + for each packet: + skb = dev_alloc_skb(...); + memcpy(skb->data, &data[pos], packet_len); + ret = wsm_handle_rx(self->core, wsm_id_from(skb), wsm_hdr_of(skb), &skb); + if (skb) dev_kfree_skb(skb); + // no rx_queue, no spinlock, no wake-up +} +static void sdio_rx_work(...) { + bes2600_sdio_extract_packets(...); + // self->irq_handler(...) is no longer called for RX-only wakes + // (it remains called for TX-confirm-completion paths, if any) +} +``` + +Caveats discovered during sketch: +- `wsm_handle_rx`'s signature wants `(hw_priv, id, wsm_hdr*, **skb)`. `extract_packets` doesn't currently parse the wsm header — we either parse it inline (cheap; the cost is one `__le16_to_cpu`) or defer parsing into a new `bes2600_sdio_deliver_rx(skb)` helper that wraps it. +- `hw_priv` is reachable as `self->core`. +- Need to verify `wsm_handle_rx` is callable from sdio_wq context. **Hypothesis:** yes, because today's bh thread is also process-context-via-workqueue and that's where wsm_handle_rx already runs. Phase 6 contract-cite from `wsm.h` / call-graph confirms. +- The `irq_handler(self->irq_priv)` wakeup at sdio_rx_work:902 — keep it, but confirm whether bh actually has remaining work after RX is gone. Possibilities: TX-confirm completions (`wsm_release_tx_buffer`) still need a bh wake. Verify in Phase 6. + +### §4.5 Predicted delta (Phase 3 units) + +Conservative because Patch C is item 1 only, not items 1+2. + +| metric | predicted change | confidence | +|---|---|---| +| `rx_queue->lock` acquire/release rate | → 0 (lock is removed entirely; struct field deleted) | high | +| RX-path wait-queue wakes (`bh_wq` from sdio_rx_work for RX) | → 0 (TX-confirm wakes remain) | high | +| `_raw_spin_unlock_irqrestore` CPU% | 20 % → 12–15 % | **medium** — the rx_queue lock is one of several contributors; I don't have per-lock breakdown pre-patch | +| `workqueue_execute_start`/sec | marginal change (≤ 5 %) | high — sdio_wq dispatch still happens per IRQ | +| observed receive @ 4 MB/s | floor lifts from 75 KB/s → ≥ 1.0 MB/s; rep-variance shrinks | **medium** — rep 3's link death has multiple causes (decrypt-storm path is Patch A's territory; AP-side `aid 30` rejection is also possible) | +| Phase 7 N=3 outcome | all reps ≥ 1 MB/s sustained for 30 min @ 4 MB/s | **medium** | + +**Honest acknowledgement:** the medium-confidence predictions are the ones where Phase 7 either confirms the model or surfaces a new bug. If `_raw_spin_unlock_irqrestore` only drops to 18 %, the next-largest contributor was something else — `pool->lock` (workqueue infrastructure) or `ba_lock` — and Patch D/E/C2 become the answer. + +### §4.6 Risks + +1. **`wsm_handle_rx` not callable from sdio_wq**: low probability (process context, same shape as today's bh), but a cite-failure here means revert to Option B. **Phase 6 must produce a `wsm.h` contract citation** before code lands. +2. **TX-confirm wake-ups stop firing**: if `wsm_handle_rx` was the only thing that ultimately bumped `bh_tx`, removing it from bh's input causes TX-confirm starvation. Mitigation: keep `irq_handler(irq_priv)` call in sdio_rx_work for now; let the bh's wait_event re-evaluate `bh_tx` on every wake. **Verify in Phase 6 that `wsm_release_tx_buffer` still wakes bh.** +3. **SKB allocation under memory pressure**: `dev_alloc_skb` in extract_packets currently `msleep(100)` retries up to 10×. Calling `wsm_handle_rx` directly from extract_packets keeps us in sdio_wq context during sleep; that's the same as today, so no new risk. +4. **rcu / locking invariants in `wsm_handle_rx`**: it traverses `priv->vif_list`, may grab `priv->vif_lock`. Currently called from bh thread. After patch: called from sdio_wq context. Both are process context, both can sleep. No new risk *unless* there's a held lock at sdio_wq level that wsm_handle_rx tries to re-acquire. **Phase 6 lock-graph audit required.** +5. **`bes2600_chrdev_is_bus_error()` early-return**: currently checked in `pipe_read`. After patch, must move into `extract_packets` or `sdio_rx_work` so RX during a bus-error window still gets dropped, not passed to mac80211. +6. **Multi-vif RX serialization**: the `rx_queue` is per-sbus_priv, not per-vif. After patch, multi-vif demux happens inside `wsm_handle_rx` (same as today). No new risk; same ceiling. + +### §4.7 Phase 5 review handover + +Goal/Situation/Measurements/Plan paste verbatim into DokuWiki when Markus initiates handover. **Do not curate** the plan for the reviewer — including the "medium-confidence" predictions and the §4.6 risk list verbatim. Reviewer should see the same uncertainty I have. + +### §4.8 Phase 7 protocol (after Phase 6 lands) + +Per `feedback_phase7_stress_ramp.md` — **stress ramp, not steady cap**: + +1. Pre-patch baseline (re-anchor): 5 min @ 1 MB/s, 10 min @ 2 MB/s, 30 min @ 4 MB/s. Capture ftrace `workqueue/`, `lock/`, `mac80211/`, `mmc/`. perf record during the 4 MB/s window. +2. Apply Patch C, install, reboot (Markus pushes). +3. Post-patch: identical ramp, identical instrumentation. +4. Compute deltas in **the same units** as §3 baseline. Compare to §4.5 predictions. Any unexplained delta is a finding, not a footnote — log it and loop back to Phase 4 if the model is wrong. +5. **N=3 reps** post-patch. The user's stress-ramp memory and the receipts checklist both require this. +6. Capture `sdio_work_debug` output and `dmesg` if any storm fires (Patch A's counter should hold steady). +7. If Phase 7 numbers match prediction → Phase 8 memory update + proceed to Patch C2. +8. If they don't match → loop back to Phase 4. Don't paper-fix. + +## §5 Out-of-scope items recorded for follow-on patches + +- **Patch C2**: items 2 — `ieee80211_rx_list` batch delivery. Gated on Task #19 kerneldoc verification. +- **Patch D**: ba_lock atomicization at `txrx.c:998-1005, 1632`. Independent. +- **Patch E**: ps_state_lock skip when `pm_unsupported = true` at `txrx.c:1942-1948`. Independent, gated on c7 latch. +- **Task #24**: post-Patch-E observation of bh.c `asm volatile("nop")`, commented-out `__bes2600_irq_enable(1)`, BUG_ON in steady-state hot path. Symptom-shaped; observe before patching. +- **Task #25**: measure `sw_mci_check_r1_ready` on RK3566 during testing. + +--- + +*Plan written 2026-05-07 by Claude (noether). Awaiting Phase 5 second-model review on DokuWiki, initiated by Markus.* -- 2.47.3