Files

T

claude-noether f232476240 notes: Patch C Phase 4 plan — collapse sdio_rx_work into BH (item 1 only)

Per merged PR #8 inline review: items 1 and 2 split, sequential. Patch C
is item-1-only (collapse the sdio_rx_work → rx_queue → bh_work
indirection). Patch C2 (ieee80211_rx_list batch delivery) is split out
and gated on Task #19 kerneldoc contract verification.

Approach choice: Option A (sdio_rx_work delivers directly into
wsm_handle_rx, removing rx_queue and its two synchronization points per
frame) over Option B (subsume into bh thread). Option A has a smaller
diff and clearer bisection story; the residual per-IRQ workqueue
dispatch is preserved as a measurable Phase 7 data point that motivates
or doesn't motivate a follow-on Option-B patch.

Predicted delta in Phase 3 units, with confidence levels stated
explicitly. §4.6 lists 6 risks, of which 2 require Phase 6 contract
citations (wsm_handle_rx callability from sdio_wq context;
wsm_release_tx_buffer's bh-wake invariant). §4.8 mandates a stress
ramp in Phase 7, not a steady cap, per feedback_phase7_stress_ramp.

Symptom-shaped findings (asm nop, commented-out IRQ re-enable, BUG_ON
in hot path) explicitly deferred to Task #24 per
feedback_dont_patch_downstream_artifacts.

Awaiting Phase 5 second-model review on DokuWiki.

2026-05-07 19:04:53 +02:00

13 KiB

Raw Blame History

Patch C — Phase 4 Plan: collapse sdio_rx_work into BH

Author: Claude (noether) Status: Phase 4 — pending Phase 5 second-model review before any Phase 6 code. Scope: item 1 only (per merged PR #8 inline review: "do it sequentially; we're not on the clock"). Item 2 (batch deliver via ieee80211_rx_list) splits to Patch C2, gated on Task #19 kerneldoc verification.

§0 Substrate — anchored

Bug #5 anchor (recorded 2026-05-07, see notes/phase1-bug5-2026-05-07.md):

Sender: netcat-over-WiFi, 4 MB/s cap, 2.4 GHz, AVM AP, single-STA
Receiver: ohm (PineTab2, RK3566 + BES2600WM-SDIO)
N=3 baseline reps: 725 / 663 / 75 KB/s (rep 3 saw link-death at ~9 min)
perf record -g during 4MB/s window: _raw_spin_unlock_irqrestore ≈ 20% CPU
ftrace lock-instrumentation, system-wide: workqueue_execute_start ≈ 5,643/sec
Driver-side count: wsm_cmd_send 13/sec — wsm command path is not the dispatch source; the contributor is the per-SDIO-transaction relay.

Root cause traced in PR #7 (Sonnet review) and concurred in PR #8 (Opus review): RX path adds two synchronization points per frame and one wait-queue wake-up per IRQ batch via sdio_rx_work → rx_queue → bh_work indirection.

§1 Goal (locked)

Reduce per-RX-frame overhead enough that observed receive ≥ 1.0 MB/s sustained @ 4 MB/s sender, with _raw_spin_unlock_irqrestore < 15 % CPU during the 4 MB/s window. No 30-min cascade to link-death.

(This is a partial step toward Phase 1's full target of ≥ 2 MB/s sustained @ 4 MB/s with < 10 % lock CPU. The full target is jointly addressed by Patch C + Patch C2; Patch C alone should cross half the gap.)

§2 Situation

bes2600.ko srcversion 1B3B3ED0… deployed on ohm (c-stack + Patch A + Patch B).
cleanups branch on marfrit/bes2600-dkms is the current source-of-truth.
Build sandbox /var/tmp/c6-sandbox/ on ohm, native make -j4.
BES2600_RX_IN_BH is defined in Makefile — bes2600_bh_rx_helper is the active RX consumer.
ohm reachable. Markus pushes the reboot button; never me.
Test rig under /root/bes2600-samples/ — rep-trace.sh per-rep capture script.

§3 Baseline measurements

Reused from Bug #5 Phase 0 (above). No re-anchor needed for Patch C — same regime.

Specific Phase-3-units that this plan's predictions reference:

metric	tool	current value (4MB/s window)
observed receive throughput	netcat receiver byte-count	75–725 KB/s, rep-variance high
`_raw_spin_unlock_irqrestore` CPU%	perf record / report	~20%
`workqueue_execute_start`/sec	ftrace `workqueue:workqueue_execute_start`	~5,643/sec system-wide
`bes_sdio` workqueue dispatches	`cat /sys/kernel/tracing/events/workqueue/.../filter` filtered by `bes_sdio`	not measured pre-patch — TODO before Phase 6
RX SKB rate at mac80211 boundary	trace `mac80211:drv_rx_irqsafe` count	not measured pre-patch — TODO before Phase 6

Phase 6 must not start until the two TODOs above are filled in — otherwise Phase 7 has no reference point for the predicted-delta comparison.

§4 Plan

§4.1 What will be touched

bes2600_sdio.c::sdio_rx_work — the relay loop. After this patch, it still drains the SDIO bus into SKBs but delivers SKBs directly into wsm_handle_rx instead of skb_queue_tail-ing them onto self->rx_queue.
bes2600_sdio.c::bes2600_sdio_extract_packets — the inner per-SKB extractor. Changes the in-loop action from skb_queue_tail(&self->rx_queue, skb) to a direct call (or callback) into the wsm dispatcher.
bes2600_sdio.c::bes2600_sdio_pipe_read — becomes unused, removed.
bh.c::bes2600_bh_rx_helper — its BES_SDIO_RX_MULTIPLE_ENABLE branch is no longer reachable for RX (RX path no longer feeds bh). Either gate the helper, or remove the helper outright if bh_rx atomic is no longer raised on RX.

§4.2 What will NOT be touched

ieee80211_rx_irqsafe() call sites — that's Patch C2 (item 2).
TX path — sdio_tx_work, bes2600_bh_tx_helper, etc. Untouched.
sdio_wq workqueue alloc — stays. After patch it hosts only tx_work + scan_work + (briefly during patch) rx_work. Renaming is cosmetic and out of scope.
The bh thread itself — still runs, still handles TX, still watches the timeouts.
bh.c #if 0 graveyard — separate hygiene patch, not bundled.
__bes2600_irq_enable(1) commented-out / asm volatile("nop") placeholder — deferred per feedback_dont_patch_downstream_artifacts. These are symptom-shaped; Patch C may dissolve them. Re-evaluate at Task #24 (post-Patch-E observation).
bh_rx / bh_tx atomic split — out of scope.

§4.3 Approach choice — Option A (sdio_rx_work direct delivery)

Two structural options surveyed in PR #8 §2.1; recap:

	Option A: direct delivery from sdio_rx_work	Option B: subsume sdio_rx_work into bh thread
diff size	small	medium
eliminates `rx_queue->lock` × 2 per frame	yes	yes
eliminates `sdio_wq.rx_work` workqueue dispatch per IRQ	no	yes
changes who calls `wsm_handle_rx`	sdio_wq context (already process context)	bh thread
TX/RX SDIO bus contention	unchanged (sdio_rx_work and sdio_tx_work already share `bes2600_sdio_lock`)	adds bh ↔ sdio_tx_work contention on the SDIO mutex
bisection isolation	clean: only the rx_queue handoff is removed	mixes "remove handoff" with "subsume thread"

Choosing Option A. Reasons:

Smaller diff = clearer Phase-7 attribution. If RX KB/s rises, we know it was the rx_queue handoff, not the workqueue topology.
Per Markus's PR #8 review: split was for bisection clarity. Option A is narrower than Option B.
The remaining cost (per-IRQ sdio_wq.rx_work dispatch) is ≤ 1 dispatch per IRQ batch; multi-RX coalescing means several frames per dispatch. If Phase 7 of Patch C shows that dispatch IS the residual cost, that becomes a concrete data point and motivates a measured Option-B follow-up, not a speculative one.

§4.4 Implementation sketch (preview — actual code in Phase 6)

Today (bes2600_sdio.c:783–831):

static int bes2600_sdio_extract_packets(...) {
    for each packet:
        skb = dev_alloc_skb(...);
        memcpy(skb->data, &data[pos], packet_len);
        spin_lock(&self->rx_queue_lock);
        skb_queue_tail(&self->rx_queue, skb);    // ← handoff
        spin_unlock(&self->rx_queue_lock);
}
static void sdio_rx_work(...) {
    bes2600_sdio_extract_packets(...);
    self->irq_handler(self->irq_priv);           // ← wakes bh_wq
}
// bh thread later: pipe_read = skb_dequeue(rx_queue) → wsm_handle_rx(skb)

After patch (sketch):

static int bes2600_sdio_extract_packets(struct sbus_priv *self, u32 ctrl_reg, u8 *data) {
    for each packet:
        skb = dev_alloc_skb(...);
        memcpy(skb->data, &data[pos], packet_len);
        ret = wsm_handle_rx(self->core, wsm_id_from(skb), wsm_hdr_of(skb), &skb);
        if (skb) dev_kfree_skb(skb);
        // no rx_queue, no spinlock, no wake-up
}
static void sdio_rx_work(...) {
    bes2600_sdio_extract_packets(...);
    // self->irq_handler(...) is no longer called for RX-only wakes
    // (it remains called for TX-confirm-completion paths, if any)
}

Caveats discovered during sketch:

wsm_handle_rx's signature wants (hw_priv, id, wsm_hdr*, **skb). extract_packets doesn't currently parse the wsm header — we either parse it inline (cheap; the cost is one __le16_to_cpu) or defer parsing into a new bes2600_sdio_deliver_rx(skb) helper that wraps it.
hw_priv is reachable as self->core.
Need to verify wsm_handle_rx is callable from sdio_wq context. Hypothesis: yes, because today's bh thread is also process-context-via-workqueue and that's where wsm_handle_rx already runs. Phase 6 contract-cite from wsm.h / call-graph confirms.
The irq_handler(self->irq_priv) wakeup at sdio_rx_work:902 — keep it, but confirm whether bh actually has remaining work after RX is gone. Possibilities: TX-confirm completions (wsm_release_tx_buffer) still need a bh wake. Verify in Phase 6.

§4.5 Predicted delta (Phase 3 units)

Conservative because Patch C is item 1 only, not items 1+2.

metric	predicted change	confidence
`rx_queue->lock` acquire/release rate	→ 0 (lock is removed entirely; struct field deleted)	high
RX-path wait-queue wakes (`bh_wq` from sdio_rx_work for RX)	→ 0 (TX-confirm wakes remain)	high
`_raw_spin_unlock_irqrestore` CPU%	20 % → 12–15 %	medium — the rx_queue lock is one of several contributors; I don't have per-lock breakdown pre-patch
`workqueue_execute_start`/sec	marginal change (≤ 5 %)	high — sdio_wq dispatch still happens per IRQ
observed receive @ 4 MB/s	floor lifts from 75 KB/s → ≥ 1.0 MB/s; rep-variance shrinks	medium — rep 3's link death has multiple causes (decrypt-storm path is Patch A's territory; AP-side `aid 30` rejection is also possible)
Phase 7 N=3 outcome	all reps ≥ 1 MB/s sustained for 30 min @ 4 MB/s	medium

Honest acknowledgement: the medium-confidence predictions are the ones where Phase 7 either confirms the model or surfaces a new bug. If _raw_spin_unlock_irqrestore only drops to 18 %, the next-largest contributor was something else — pool->lock (workqueue infrastructure) or ba_lock — and Patch D/E/C2 become the answer.

§4.6 Risks

wsm_handle_rx not callable from sdio_wq: low probability (process context, same shape as today's bh), but a cite-failure here means revert to Option B. Phase 6 must produce a wsm.h contract citation before code lands.
TX-confirm wake-ups stop firing: if wsm_handle_rx was the only thing that ultimately bumped bh_tx, removing it from bh's input causes TX-confirm starvation. Mitigation: keep irq_handler(irq_priv) call in sdio_rx_work for now; let the bh's wait_event re-evaluate bh_tx on every wake. Verify in Phase 6 that wsm_release_tx_buffer still wakes bh.
SKB allocation under memory pressure: dev_alloc_skb in extract_packets currently msleep(100) retries up to 10×. Calling wsm_handle_rx directly from extract_packets keeps us in sdio_wq context during sleep; that's the same as today, so no new risk.
rcu / locking invariants in wsm_handle_rx: it traverses priv->vif_list, may grab priv->vif_lock. Currently called from bh thread. After patch: called from sdio_wq context. Both are process context, both can sleep. No new risk unless there's a held lock at sdio_wq level that wsm_handle_rx tries to re-acquire. Phase 6 lock-graph audit required.
bes2600_chrdev_is_bus_error() early-return: currently checked in pipe_read. After patch, must move into extract_packets or sdio_rx_work so RX during a bus-error window still gets dropped, not passed to mac80211.
Multi-vif RX serialization: the rx_queue is per-sbus_priv, not per-vif. After patch, multi-vif demux happens inside wsm_handle_rx (same as today). No new risk; same ceiling.

§4.7 Phase 5 review handover

Goal/Situation/Measurements/Plan paste verbatim into DokuWiki when Markus initiates handover. Do not curate the plan for the reviewer — including the "medium-confidence" predictions and the §4.6 risk list verbatim. Reviewer should see the same uncertainty I have.

§4.8 Phase 7 protocol (after Phase 6 lands)

Per feedback_phase7_stress_ramp.md — stress ramp, not steady cap:

Pre-patch baseline (re-anchor): 5 min @ 1 MB/s, 10 min @ 2 MB/s, 30 min @ 4 MB/s. Capture ftrace workqueue/, lock/, mac80211/, mmc/. perf record during the 4 MB/s window.
Apply Patch C, install, reboot (Markus pushes).
Post-patch: identical ramp, identical instrumentation.
Compute deltas in the same units as §3 baseline. Compare to §4.5 predictions. Any unexplained delta is a finding, not a footnote — log it and loop back to Phase 4 if the model is wrong.
N=3 reps post-patch. The user's stress-ramp memory and the receipts checklist both require this.
Capture sdio_work_debug output and dmesg if any storm fires (Patch A's counter should hold steady).
If Phase 7 numbers match prediction → Phase 8 memory update + proceed to Patch C2.
If they don't match → loop back to Phase 4. Don't paper-fix.

§5 Out-of-scope items recorded for follow-on patches

Patch C2: items 2 — ieee80211_rx_list batch delivery. Gated on Task #19 kerneldoc verification.
Patch D: ba_lock atomicization at txrx.c:998-1005, 1632. Independent.
Patch E: ps_state_lock skip when pm_unsupported = true at txrx.c:1942-1948. Independent, gated on c7 latch.
Task #24: post-Patch-E observation of bh.c asm volatile("nop"), commented-out __bes2600_irq_enable(1), BUG_ON in steady-state hot path. Symptom-shaped; observe before patching.
Task #25: measure sw_mci_check_r1_ready on RK3566 during testing.

Plan written 2026-05-07 by Claude (noether). Awaiting Phase 5 second-model review on DokuWiki, initiated by Markus.

13 KiB Raw Blame History Unescape Escape