After Patch C v3 / D / E / F / G all merged, the remaining cleanup target is the per-RX-frame tasklet defer that ieee80211_rx_irqsafe introduces. Patch C2 migrates all 6 call sites in bes2600 to ieee80211_rx_list, the process-context API verified per the kerneldoc audit (Task #19, mainline include/net/mac80211.h:5324-5345). Key constraints from kerneldoc: - cannot mix _list and _irqsafe for the same hardware (=> all 6 sites convert atomically) - requires local_bh_disable + rcu_read_lock wrap - calls must be synchronized for a single hardware (=> bh-thread-as-sole-RX-context post-v3 satisfies trivially) Plan §4.2 design decision: per-batch wrap (Option B), wrapping bes2600_sdio_read_rx_batch outer loop, rather than per-call wrap. Captures the actual batch benefit. Open questions for the Phase 5 reviewer: 1. rx_list draining semantics — does mainline expect explicit netif_receive_skb_list at end-of-batch, or does mac80211 internal-deliver? Need to verify by reading mt76 / iwl_pcie usage before Phase 6 lands. 2. beacon path (wsm.c:2415) SKB ownership — hw_priv->beacon is long-lived; after _rx_list consumes it, the field would be dangling. Audit before Phase 6. Predicted throughput delta: +5-15% over v3 N=3 baseline (2.352 MB/s), medium confidence. Smaller-than-expected delta = "marginal but no regression, ship for upstream-cleanliness". Phase 7 N=3 ramp uses wired enu1 path + per-rep fresh nc listener per the rig-failure-is-finding lesson.
9.5 KiB
Patch C2 — Phase 4 Plan: migrate ieee80211_rx_irqsafe → ieee80211_rx_list
Author: Claude (noether)
Status: Phase 4 — pending Phase 5 PR review before any Phase 6 code.
Predecessor: Patch C v3 (PR #5 merged, +73% throughput, no-relay architecture); Patch D + E + F + G also landed. Cleanups branch tip = 42fd0ce.
Task #19 contract: ieee80211_rx_list callable from process context, requires local_bh_disable() + rcu_read_lock() wrap, cannot mix with ieee80211_rx_irqsafe() for the same hardware → all 6 sites convert in one shot.
§0 Substrate
After Patch C v3:
- bh thread is the sole RX-delivery context (no relay, no sdio_rx_work)
- Per-frame work runs in process context (sleepable)
- Single-writer-from-bh invariant covers
hw_bufs_usedand friends
ieee80211_rx_irqsafe is currently called from process context. Per kerneldoc (include/net/mac80211.h:5399-5411):
Like ieee80211_rx() but can be called in IRQ context (internally defers to a tasklet.)
The tasklet hop is the cost we pay today for delivering each RX frame from process context. ieee80211_rx_list is the process-context replacement.
§1 Goal
Per-frame: skip the tasklet hop. Batch: process multiple SKBs from one SDIO read inside a single local_bh_disable()/rcu_read_lock() window.
Phase 1 metric: RX throughput @ 4 MB/s sender, with v3 N=3 baseline = 2.352 MB/s. Hypothesis: small to moderate uplift (<10%) from removing the tasklet deferral. Larger improvement would be surprising — if observed, that's a finding to investigate.
§2 Situation
- 6 call sites in bes2600 currently use
ieee80211_rx_irqsafe:ap.c:96(AP-mode link-id RX queue drain)sta.c:1487(link-id rx_queue drain in ?)txrx.c:1960(early-data + pm_unsupported branch — Patch E added)txrx.c:1967(early-data + LINK_SOFT-not-set branch)txrx.c:1971(normal RX path)wsm.c:2415(beacon SKB delivery frombes2600_beacon_handler?)
- All 6 must convert together (kerneldoc: cannot mix per hardware)
- bh thread is single-writer post-v3 →
_rx_list's "calls must be synchronized" satisfied trivially - bh thread is process context →
_rx_listcallable
§3 Baseline (carry forward)
From notes/phase7-v3-2026-05-07.md (v3 N=3 ramp, Phase 7 closed):
| metric | v3 fresh-chip N=3 |
|---|---|
| RX throughput @ 4 MB/s | mean 2.352 MB/s, min 2.102, max 2.590 |
| sdio_rx_work dispatches | 0/s |
| bh_work redispatches | 0 |
Phase 7 of C2 will compare against this baseline.
§4 Plan
§4.1 Conversion shape
Per call site:
ieee80211_rx_irqsafe(priv->hw, skb);
becomes:
ieee80211_rx_list(priv->hw, NULL, skb, &priv->rx_list);
Where priv->rx_list is a struct list_head initialized once.
Wrap requirement: local_bh_disable() + rcu_read_lock() must be held across the call. Per the kerneldoc, that's also needed for batch correctness.
§4.2 Wrap placement (the design decision)
Option A — per-call wrap. Wrap each individual ieee80211_rx_list() call. Simple but loses the batch benefit (each call's wrap+unwrap costs as much as the avoided tasklet defer).
Option B — per-batch wrap. Wrap the OUTER frame-iteration loop (e.g., the for in bes2600_sdio_extract_packets). All 16 SKBs from one SDIO read get delivered inside one wrap. This is the upstream-idiomatic pattern (mt76, iwl_pcie do this).
Choosing Option B. Concrete shape:
bes2600_sdio_read_rx_batch(the per-SDIO-batch entry point added in Patch C v3) wraps the read+extract+deliver phase:rcu_read_lock(); local_bh_disable(); // existing read + extract_packets that calls bh_handle_rx_skb per frame local_bh_enable(); rcu_read_unlock();- Inside
bes2600_bh_handle_rx_skb, the singleieee80211_rx_irqsafeswap becomesieee80211_rx_list(priv->hw, NULL, skb, &priv->rx_list). - The OTHER 5 call sites (in
ap.c,sta.c,txrx.c's branches,wsm.c) need the same treatment, but they're called from the bh thread (post-v3) so they're already in the right context. Each gets its own narrow wrap (Option A applied selectively because those paths process one frame at a time, not a batch).
§4.3 The rx_list field
Add struct list_head rx_list to either struct bes2600_common (driver-wide) or struct bes2600_vif (per-vif). Per-vif is cleaner because the existing priv->hw parameter implies vif scope.
INIT_LIST_HEAD(&priv->rx_list) at vif setup; no teardown needed (mac80211 owns the SKBs once handed off).
Open question for reviewer: does the rx_list need to be drained explicitly after the batch (e.g., via a list_for_each_entry_safe + netif_receive_skb_list_internal)? Looking at mainline mt76 / iwl_pcie usage will clarify. Phase 6 must answer this before code lands.
§4.4 What will NOT be touched
- The 6 call sites change atomically (all-or-nothing per kerneldoc) — no per-site progressive migration
wsm.c:2415beacon path: same conversion shape, but beacon delivery is once-per-beacon-interval (not hot path); could stay_irqsafeif upstream allows mixing per-SKB-type. Re-read kerneldoc carefully — it says "per hardware", not per-call-site, so we can't keep _irqsafe even on the slow paths.- bh thread structure (Patch C v3 stands)
- atomic_t counters from Patch D
pm_unsupportedlock-skip from Patch E- mac80211 batch-delivery semantics (mainline owns this; we just call the API)
§4.5 Predicted delta in Phase 3 units
| metric | predicted |
|---|---|
rx_irqsafe tasklet schedule rate |
→ 0 (function no longer called) |
| RX throughput @ 4 MB/s sustained | 2.352 → +5-15% (medium confidence) |
_raw_spin_unlock_irqrestore CPU% |
small drop (no tasklet schedule lock contribution) |
Honest acknowledgment: I don't have data on how much the tasklet hop actually costs. The improvement might be smaller than predicted if tasklet defer was already cheap on this kernel. If <2%, Phase 7 says "marginal but no regression" and we ship anyway for upstream-cleanliness.
§4.6 Risks
-
ieee80211_rx_listsemantics surprise. mainline drivers I have access to (mt76, iwl_pcie) use this via NAPI infrastructure. bes2600 doesn't have NAPI; we're doing process-context-direct. The kerneldoc says callable that way but we should verify a few mainline drivers actually do it. Phase 6 contract-cite from at least one upstream caller before code lands. -
rx_listlifetime in cross-batch / cross-vif scenarios. Multiple vifs (P2P_MULTIVIF=y in Makefile) might race on the same hw'srx_list. The kerneldoc says "for a single hardware" — the list is per-call destination, which means each call appends to its argument list. Per-vifrx_listper-call is the natural shape. No per-hw aggregator needed. -
local_bh_disablecost in batch wrap. Not free. If the batch is small (1-2 SKBs), the wrap might dominate. Estimated breakeven: 2-3 SKBs per wrap. Phase 7 should look at SKB-per-batch distribution to confirm. -
rcu_read_lockacross SDIO read. SDIO read can take multi-ms (multi-block transfers). RCU reader-cs across that is fine (no preemption blocked) but it's a longer reader-cs than typical. Verifiable but not a blocker — kerneldoc requires it. -
wsm.c:2415 (beacon) is a different SKB lifecycle —
hw_priv->beaconis owned by hw_priv, not allocated per-call. After_rx_listconsumes it (by passing ownership to mac80211),hw_priv->beaconis dangling. Phase 6 must verify the beacon path either reallocates after delivery or wasn't actually transferring ownership. Risk #5 is the biggest open question.
§4.7 Phase 5 review handover
PR on git.reauktion.de/marfrit/besser with this artifact. Specifically request reviewer focus on:
- §4.2 wrap-placement choice (Option B vs A)
- §4.3 rx_list scoping (per-vif)
- §4.6 risks #1 (mainline-caller verification) and #5 (beacon path SKB ownership)
Don't curate.
§4.8 Phase 6 implementation order
- Branch off cleanups:
bes2600/rx-list-batch-delivery - Add
struct list_head rx_listtostruct bes2600_vif,INIT_LIST_HEADin vif setup - Convert all 6 call sites:
ieee80211_rx_irqsafe(...)→ieee80211_rx_list(...) - Wrap
bes2600_sdio_read_rx_batchouter loop withrcu_read_lock + local_bh_disable / local_bh_enable + rcu_read_unlock - For the non-bh-thread call sites (ap.c, sta.c, wsm.c beacon): per-call narrow wrap
- Verify beacon path in wsm.c:2415 (Risk #5)
- Build, install, smoke-test
- Phase 7 N=3 stress ramp — compare to v3 baseline
§4.9 Phase 7 protocol (per feedback_phase7_stress_ramp)
- N=3 reps, 30s each at 4 MB/s, fresh-chip (uptime <15 min)
- Use wired path (
ssh mfritsche@192.168.88.80) for telemetry - Fresh nc listener per rep (per
feedback_rig_failure_is_finding) - Compare: throughput delta + tasklet schedule rate (ftrace
irq:tasklet_*events) - If predicted delta met → close C2 + memory entry
- If NO delta → marginal patch but no regression; ship for upstream-cleanliness
§5 Out of scope
- Patch D / E already shipped (PR #7, #8 merged)
- Patch G already shipped (PR #6 merged)
- bh.c
#if 0graveyard removal (Task #24 hygiene) - Allwinner
sw_mci_check_r1_ready(Task #25)
§6 Summary
C2 is a 6-site mechanical migration with ONE design decision (per-batch wrap), TWO open questions for the reviewer (rx_list draining + beacon path SKB ownership), and SMALL expected throughput delta (<15%). Risk-low, upstream-prep-high. Worth shipping for the kernel.org submission story even if the throughput delta is marginal.
Plan written 2026-05-08 by Claude (noether). Phase 5 review on PR. Phase 6 contingent on review passing.