Files
besser/notes/patch-c-phase4-plan-2026-05-07.md
claude-noether f232476240 notes: Patch C Phase 4 plan — collapse sdio_rx_work into BH (item 1 only)
Per merged PR #8 inline review: items 1 and 2 split, sequential. Patch C
is item-1-only (collapse the sdio_rx_work → rx_queue → bh_work
indirection). Patch C2 (ieee80211_rx_list batch delivery) is split out
and gated on Task #19 kerneldoc contract verification.

Approach choice: Option A (sdio_rx_work delivers directly into
wsm_handle_rx, removing rx_queue and its two synchronization points per
frame) over Option B (subsume into bh thread). Option A has a smaller
diff and clearer bisection story; the residual per-IRQ workqueue
dispatch is preserved as a measurable Phase 7 data point that motivates
or doesn't motivate a follow-on Option-B patch.

Predicted delta in Phase 3 units, with confidence levels stated
explicitly. §4.6 lists 6 risks, of which 2 require Phase 6 contract
citations (wsm_handle_rx callability from sdio_wq context;
wsm_release_tx_buffer's bh-wake invariant). §4.8 mandates a stress
ramp in Phase 7, not a steady cap, per feedback_phase7_stress_ramp.

Symptom-shaped findings (asm nop, commented-out IRQ re-enable, BUG_ON
in hot path) explicitly deferred to Task #24 per
feedback_dont_patch_downstream_artifacts.

Awaiting Phase 5 second-model review on DokuWiki.
2026-05-07 19:04:53 +02:00

185 lines
13 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Patch C — Phase 4 Plan: collapse sdio_rx_work into BH
**Author:** Claude (noether)
**Status:** Phase 4 — pending Phase 5 second-model review before any Phase 6 code.
**Scope:** **item 1 only** (per merged PR #8 inline review: "do it sequentially; we're not on the clock").
**Item 2** (batch deliver via `ieee80211_rx_list`) splits to **Patch C2**, gated on Task #19 kerneldoc verification.
---
## §0 Substrate — anchored
Bug #5 anchor (recorded 2026-05-07, see `notes/phase1-bug5-2026-05-07.md`):
- Sender: netcat-over-WiFi, 4 MB/s cap, 2.4 GHz, AVM AP, single-STA
- Receiver: ohm (PineTab2, RK3566 + BES2600WM-SDIO)
- N=3 baseline reps: 725 / 663 / 75 KB/s (rep 3 saw link-death at ~9 min)
- `perf record -g` during 4MB/s window: `_raw_spin_unlock_irqrestore` ≈ 20% CPU
- ftrace lock-instrumentation, system-wide: `workqueue_execute_start` ≈ 5,643/sec
- Driver-side count: `wsm_cmd_send` 13/sec — wsm command path is *not* the dispatch source; the contributor is the per-SDIO-transaction relay.
Root cause traced in PR #7 (Sonnet review) and concurred in PR #8 (Opus review): RX path adds two synchronization points per frame and one wait-queue wake-up per IRQ batch via `sdio_rx_work``rx_queue``bh_work` indirection.
## §1 Goal (locked)
Reduce per-RX-frame overhead enough that observed receive ≥ 1.0 MB/s sustained @ 4 MB/s sender, with `_raw_spin_unlock_irqrestore` < 15 % CPU during the 4 MB/s window. No 30-min cascade to link-death.
(This is a partial step toward Phase 1's full target of ≥ 2 MB/s sustained @ 4 MB/s with < 10 % lock CPU. The full target is jointly addressed by Patch C + Patch C2; Patch C alone should *cross half the gap*.)
## §2 Situation
- `bes2600.ko` srcversion `1B3B3ED0…` deployed on ohm (c-stack + Patch A + Patch B).
- `cleanups` branch on `marfrit/bes2600-dkms` is the current source-of-truth.
- Build sandbox `/var/tmp/c6-sandbox/` on ohm, native `make -j4`.
- `BES2600_RX_IN_BH` is **defined** in Makefile — `bes2600_bh_rx_helper` is the active RX consumer.
- ohm reachable. Markus pushes the reboot button; never me.
- Test rig under `/root/bes2600-samples/``rep-trace.sh` per-rep capture script.
## §3 Baseline measurements
Reused from Bug #5 Phase 0 (above). No re-anchor needed for Patch C — same regime.
**Specific Phase-3-units that this plan's predictions reference:**
| metric | tool | current value (4MB/s window) |
|---|---|---|
| observed receive throughput | netcat receiver byte-count | 75725 KB/s, rep-variance high |
| `_raw_spin_unlock_irqrestore` CPU% | perf record / report | ~20% |
| `workqueue_execute_start`/sec | ftrace `workqueue:workqueue_execute_start` | ~5,643/sec system-wide |
| `bes_sdio` workqueue dispatches | `cat /sys/kernel/tracing/events/workqueue/.../filter` filtered by `bes_sdio` | not measured pre-patch — **TODO before Phase 6** |
| RX SKB rate at mac80211 boundary | trace `mac80211:drv_rx_irqsafe` count | not measured pre-patch — **TODO before Phase 6** |
Phase 6 must not start until the two TODOs above are filled in — otherwise Phase 7 has no reference point for the predicted-delta comparison.
## §4 Plan
### §4.1 What will be touched
- `bes2600_sdio.c::sdio_rx_work` — the relay loop. After this patch, it still drains the SDIO bus into SKBs but **delivers SKBs directly into `wsm_handle_rx`** instead of `skb_queue_tail`-ing them onto `self->rx_queue`.
- `bes2600_sdio.c::bes2600_sdio_extract_packets` — the inner per-SKB extractor. Changes the in-loop action from `skb_queue_tail(&self->rx_queue, skb)` to a direct call (or callback) into the wsm dispatcher.
- `bes2600_sdio.c::bes2600_sdio_pipe_read` — becomes unused, removed.
- `bh.c::bes2600_bh_rx_helper` — its `BES_SDIO_RX_MULTIPLE_ENABLE` branch is no longer reachable for RX (RX path no longer feeds bh). Either gate the helper, or remove the helper outright if `bh_rx` atomic is no longer raised on RX.
### §4.2 What will NOT be touched
- `ieee80211_rx_irqsafe()` call sites — that's Patch C2 (item 2).
- TX path — `sdio_tx_work`, `bes2600_bh_tx_helper`, etc. Untouched.
- `sdio_wq` workqueue alloc — stays. After patch it hosts only `tx_work` + `scan_work` + (briefly during patch) `rx_work`. Renaming is cosmetic and out of scope.
- The bh thread itself — still runs, still handles TX, still watches the timeouts.
- `bh.c` `#if 0` graveyard — separate hygiene patch, not bundled.
- `__bes2600_irq_enable(1)` commented-out / `asm volatile("nop")` placeholder — **deferred** per `feedback_dont_patch_downstream_artifacts`. These are symptom-shaped; Patch C may dissolve them. Re-evaluate at Task #24 (post-Patch-E observation).
- `bh_rx` / `bh_tx` atomic split — out of scope.
### §4.3 Approach choice — Option A (sdio_rx_work direct delivery)
Two structural options surveyed in PR #8 §2.1; recap:
| | Option A: direct delivery from sdio_rx_work | Option B: subsume sdio_rx_work into bh thread |
|---|---|---|
| diff size | small | medium |
| eliminates `rx_queue->lock` × 2 per frame | yes | yes |
| eliminates `sdio_wq.rx_work` workqueue dispatch per IRQ | no | yes |
| changes who calls `wsm_handle_rx` | sdio_wq context (already process context) | bh thread |
| TX/RX SDIO bus contention | unchanged (sdio_rx_work and sdio_tx_work already share `bes2600_sdio_lock`) | adds bh ↔ sdio_tx_work contention on the SDIO mutex |
| bisection isolation | clean: only the rx_queue handoff is removed | mixes "remove handoff" with "subsume thread" |
**Choosing Option A.** Reasons:
1. Smaller diff = clearer Phase-7 attribution. If RX KB/s rises, we know it was the rx_queue handoff, not the workqueue topology.
2. Per Markus's PR #8 review: split was for bisection clarity. Option A is narrower than Option B.
3. The remaining cost (per-IRQ `sdio_wq.rx_work` dispatch) is ≤ 1 dispatch per IRQ batch; multi-RX coalescing means several frames per dispatch. If Phase 7 of Patch C shows that dispatch IS the residual cost, that becomes a concrete data point and motivates a *measured* Option-B follow-up, not a speculative one.
### §4.4 Implementation sketch (preview — actual code in Phase 6)
**Today** (`bes2600_sdio.c:783831`):
```c
static int bes2600_sdio_extract_packets(...) {
for each packet:
skb = dev_alloc_skb(...);
memcpy(skb->data, &data[pos], packet_len);
spin_lock(&self->rx_queue_lock);
skb_queue_tail(&self->rx_queue, skb); // ← handoff
spin_unlock(&self->rx_queue_lock);
}
static void sdio_rx_work(...) {
bes2600_sdio_extract_packets(...);
self->irq_handler(self->irq_priv); // ← wakes bh_wq
}
// bh thread later: pipe_read = skb_dequeue(rx_queue) → wsm_handle_rx(skb)
```
**After patch** (sketch):
```c
static int bes2600_sdio_extract_packets(struct sbus_priv *self, u32 ctrl_reg, u8 *data) {
for each packet:
skb = dev_alloc_skb(...);
memcpy(skb->data, &data[pos], packet_len);
ret = wsm_handle_rx(self->core, wsm_id_from(skb), wsm_hdr_of(skb), &skb);
if (skb) dev_kfree_skb(skb);
// no rx_queue, no spinlock, no wake-up
}
static void sdio_rx_work(...) {
bes2600_sdio_extract_packets(...);
// self->irq_handler(...) is no longer called for RX-only wakes
// (it remains called for TX-confirm-completion paths, if any)
}
```
Caveats discovered during sketch:
- `wsm_handle_rx`'s signature wants `(hw_priv, id, wsm_hdr*, **skb)`. `extract_packets` doesn't currently parse the wsm header — we either parse it inline (cheap; the cost is one `__le16_to_cpu`) or defer parsing into a new `bes2600_sdio_deliver_rx(skb)` helper that wraps it.
- `hw_priv` is reachable as `self->core`.
- Need to verify `wsm_handle_rx` is callable from sdio_wq context. **Hypothesis:** yes, because today's bh thread is also process-context-via-workqueue and that's where wsm_handle_rx already runs. Phase 6 contract-cite from `wsm.h` / call-graph confirms.
- The `irq_handler(self->irq_priv)` wakeup at sdio_rx_work:902 — keep it, but confirm whether bh actually has remaining work after RX is gone. Possibilities: TX-confirm completions (`wsm_release_tx_buffer`) still need a bh wake. Verify in Phase 6.
### §4.5 Predicted delta (Phase 3 units)
Conservative because Patch C is item 1 only, not items 1+2.
| metric | predicted change | confidence |
|---|---|---|
| `rx_queue->lock` acquire/release rate | → 0 (lock is removed entirely; struct field deleted) | high |
| RX-path wait-queue wakes (`bh_wq` from sdio_rx_work for RX) | → 0 (TX-confirm wakes remain) | high |
| `_raw_spin_unlock_irqrestore` CPU% | 20 % → 1215 % | **medium** — the rx_queue lock is one of several contributors; I don't have per-lock breakdown pre-patch |
| `workqueue_execute_start`/sec | marginal change (≤ 5 %) | high — sdio_wq dispatch still happens per IRQ |
| observed receive @ 4 MB/s | floor lifts from 75 KB/s → ≥ 1.0 MB/s; rep-variance shrinks | **medium** — rep 3's link death has multiple causes (decrypt-storm path is Patch A's territory; AP-side `aid 30` rejection is also possible) |
| Phase 7 N=3 outcome | all reps ≥ 1 MB/s sustained for 30 min @ 4 MB/s | **medium** |
**Honest acknowledgement:** the medium-confidence predictions are the ones where Phase 7 either confirms the model or surfaces a new bug. If `_raw_spin_unlock_irqrestore` only drops to 18 %, the next-largest contributor was something else — `pool->lock` (workqueue infrastructure) or `ba_lock` — and Patch D/E/C2 become the answer.
### §4.6 Risks
1. **`wsm_handle_rx` not callable from sdio_wq**: low probability (process context, same shape as today's bh), but a cite-failure here means revert to Option B. **Phase 6 must produce a `wsm.h` contract citation** before code lands.
2. **TX-confirm wake-ups stop firing**: if `wsm_handle_rx` was the only thing that ultimately bumped `bh_tx`, removing it from bh's input causes TX-confirm starvation. Mitigation: keep `irq_handler(irq_priv)` call in sdio_rx_work for now; let the bh's wait_event re-evaluate `bh_tx` on every wake. **Verify in Phase 6 that `wsm_release_tx_buffer` still wakes bh.**
3. **SKB allocation under memory pressure**: `dev_alloc_skb` in extract_packets currently `msleep(100)` retries up to 10×. Calling `wsm_handle_rx` directly from extract_packets keeps us in sdio_wq context during sleep; that's the same as today, so no new risk.
4. **rcu / locking invariants in `wsm_handle_rx`**: it traverses `priv->vif_list`, may grab `priv->vif_lock`. Currently called from bh thread. After patch: called from sdio_wq context. Both are process context, both can sleep. No new risk *unless* there's a held lock at sdio_wq level that wsm_handle_rx tries to re-acquire. **Phase 6 lock-graph audit required.**
5. **`bes2600_chrdev_is_bus_error()` early-return**: currently checked in `pipe_read`. After patch, must move into `extract_packets` or `sdio_rx_work` so RX during a bus-error window still gets dropped, not passed to mac80211.
6. **Multi-vif RX serialization**: the `rx_queue` is per-sbus_priv, not per-vif. After patch, multi-vif demux happens inside `wsm_handle_rx` (same as today). No new risk; same ceiling.
### §4.7 Phase 5 review handover
Goal/Situation/Measurements/Plan paste verbatim into DokuWiki when Markus initiates handover. **Do not curate** the plan for the reviewer — including the "medium-confidence" predictions and the §4.6 risk list verbatim. Reviewer should see the same uncertainty I have.
### §4.8 Phase 7 protocol (after Phase 6 lands)
Per `feedback_phase7_stress_ramp.md`**stress ramp, not steady cap**:
1. Pre-patch baseline (re-anchor): 5 min @ 1 MB/s, 10 min @ 2 MB/s, 30 min @ 4 MB/s. Capture ftrace `workqueue/`, `lock/`, `mac80211/`, `mmc/`. perf record during the 4 MB/s window.
2. Apply Patch C, install, reboot (Markus pushes).
3. Post-patch: identical ramp, identical instrumentation.
4. Compute deltas in **the same units** as §3 baseline. Compare to §4.5 predictions. Any unexplained delta is a finding, not a footnote — log it and loop back to Phase 4 if the model is wrong.
5. **N=3 reps** post-patch. The user's stress-ramp memory and the receipts checklist both require this.
6. Capture `sdio_work_debug` output and `dmesg` if any storm fires (Patch A's counter should hold steady).
7. If Phase 7 numbers match prediction → Phase 8 memory update + proceed to Patch C2.
8. If they don't match → loop back to Phase 4. Don't paper-fix.
## §5 Out-of-scope items recorded for follow-on patches
- **Patch C2**: items 2 — `ieee80211_rx_list` batch delivery. Gated on Task #19 kerneldoc verification.
- **Patch D**: ba_lock atomicization at `txrx.c:998-1005, 1632`. Independent.
- **Patch E**: ps_state_lock skip when `pm_unsupported = true` at `txrx.c:1942-1948`. Independent, gated on c7 latch.
- **Task #24**: post-Patch-E observation of bh.c `asm volatile("nop")`, commented-out `__bes2600_irq_enable(1)`, BUG_ON in steady-state hot path. Symptom-shaped; observe before patching.
- **Task #25**: measure `sw_mci_check_r1_ready` on RK3566 during testing.
---
*Plan written 2026-05-07 by Claude (noether). Awaiting Phase 5 second-model review on DokuWiki, initiated by Markus.*