Compare commits
6 Commits
| SHA1 |
|---|
| 809e3cce84 |
| 4344873f2d |
| 679083d1aa |
| 594f73c6b4 |
| 928268f477 |
| 425eb92456 |
@@ -0,0 +1,180 @@
|
|||||||
|
# BES2600 architecture review — Bug #5
|
||||||
|
|
||||||
|
Date assembled: 2026-05-07
|
||||||
|
Reviewer: Claude Sonnet (general-purpose subagent, model=sonnet)
|
||||||
|
Driver source: `~/src/besser/bes2600-dkms-mobian/bes2600/` on boltzmann
|
||||||
|
|
||||||
|
This is the architect-review pass requested in `notes/observed-bugs.md` after the Phase 0 measurement showed the throughput floor is set by per-SDIO-transaction workqueue dispatch overhead. The reviewer was given the measurement summary, source location, and a focused brief; output is a ranked restructuring map with file:line citations for every concrete claim.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Measurement context (input to the reviewer)
|
||||||
|
|
||||||
|
```
Reproduction:        pv -L 4M < /dev/zero | nc ohm 12345
Module under test:   bes2600.ko srcversion 1B3B3ED0... (cleanups + Patch A + Patch B)
Hardware:            PineTab2, RK3566 Cortex-A55 (ARMv8.2-A), kernel 6.19.10-danctnix1
Link rate:           65 Mb/s ≈ 8 MB/s theoretical
Observed throughput: 725 KB/s (Phase 0 anchor at N=3)
                     rep 3 cascaded into beacon-loss disconnect at ~9 min in

Per-second event rates (3-min capture under 4 MB/s pv-cap):
  workqueue_execute_start:  5,643/sec   ← architectural floor
  bes2600_rx_cb:              611/sec
  bes2600_bh_wakeup:          267/sec
  wsm_cmd_send:                13/sec   (host-to-chip command rate, surprisingly low)
  lock contention_begin:       50/sec   (modest)
  mmc_request_start:       ~5,800/sec   (matches workqueue rate; every SDIO transaction is its own work item)

perf record top symbol: _raw_spin_unlock_irqrestore (~20 % CPU samples)
Dominant callstack:     process_one_work → wsm_configuration → wsm_cmd_send → bes2600_bh.isra.0
```
|
||||||
|
|
||||||
|
The implication: ~9 workqueue dispatches fire per frame delivered to mac80211. Items below address that ratio in descending order of predicted leverage.
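
The ratios quoted in this review follow directly from the per-second rates in the capture. A minimal sketch of that arithmetic, assuming the rates are attributable to the driver (the helper name `events_per` is illustrative, not part of the driver):

```c
#include <assert.h>

/* Per-second rates copied from the capture above. */
#define WQ_EXEC_PER_SEC    5643.0  /* workqueue_execute_start */
#define RX_CB_PER_SEC       611.0  /* bes2600_rx_cb */
#define BH_WAKEUP_PER_SEC   267.0  /* bes2600_bh_wakeup */

/* Hypothetical helper: how many events of one kind occur per event of another. */
double events_per(double numerator_rate, double denominator_rate)
{
    assert(denominator_rate > 0.0);
    return numerator_rate / denominator_rate;
}

/* Workqueue dispatches per frame delivered to mac80211 (the "~9" figure). */
double dispatches_per_frame(void)
{
    return events_per(WQ_EXEC_PER_SEC, RX_CB_PER_SEC);
}

/* Frames handled per BH wakeup (the "~2.3" figure used in items 6 and 8). */
double frames_per_wakeup(void)
{
    return events_per(RX_CB_PER_SEC, BH_WAKEUP_PER_SEC);
}
```

The ratio is only as meaningful as the attribution of the numerator: if `workqueue_execute_start` was counted system-wide, the per-frame figure is an upper bound rather than a measurement.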
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Item 1 — Two-hop workqueue relay: SDIO IRQ → `sdio_rx_work` → BH loop → mac80211
|
||||||
|
|
||||||
|
**File:line:** `bes2600_sdio.c:416` (IRQ handler dispatches `rx_work`); `bes2600_sdio.c:829` (`sdio_rx_work` body); `bh.c:1330–1538` (BH main loop, `BES2600_RX_IN_BH` path); `bes2600_sdio.c:1267` (`bes_sdio` workqueue, `max_active=2`).
|
||||||
|
|
||||||
|
**Current shape:** Every SDIO interrupt fires `queue_work(sdio_wq, &rx_work)`. `sdio_rx_work` reads up to `BES_SDIO_RX_MULTIPLE_NUM=16` frames (`hwio.h:294`) into per-frame SKBs, enqueues each onto `sbus_priv.rx_queue` under `rx_queue_lock`, then returns. Meanwhile the BH kthread (one work item queued at boot in `bh.c:93`, running an infinite loop inside `bes2600_bh()`) calls `pipe_read()` → `spin_lock(rx_queue_lock)` → `skb_dequeue()` → `wsm_handle_rx()` → `ieee80211_rx_irqsafe()` one frame at a time. When `pipe_read()` returns NULL and pending TX exists, `bes2600_sdio_pipe_read()` at `bes2600_sdio.c:941` re-dispatches `rx_work` — so a sustained RX stream fires **one `queue_work` per BH wakeup, not per burst**. That explains why `bh_wakeup` events are only 267/sec while `workqueue_execute_start` is 5,643/sec: the SDIO layer is firing a new `rx_work` item for every frame the BH loop drains.
|
||||||
|
|
||||||
|
**Proposed shape:** Collapse `sdio_rx_work` and `pipe_read()` into the BH loop directly. The BH already runs in a dedicated `WQ_HIGHPRI | WQ_CPU_INTENSIVE` workqueue (`bh.c:66`) and (with `BES2600_RX_IN_BH` defined per `Makefile:159`) `bes2600_bh_rx_helper()` already dequeues from `rx_queue`. Merge `sdio_rx_work` into a function called synchronously from `bes2600_bh_rx_helper()` before the dequeue, guarded by a trylock so re-entry is safe. This eliminates O(N) `queue_work` calls per burst while keeping the BH as the single SDIO-access context.
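
The trylock-guarded synchronous drain can be modelled in userspace. This is a sketch of the control flow only, assuming a C11 `atomic_flag` as a stand-in for the kernel trylock; `struct rx_model`, `rx_drain_burst` and the demo functions are hypothetical names, and no SDIO access is performed:

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace stand-in for the RX state; every field name here is hypothetical. */
struct rx_model {
    atomic_flag draining;  /* plays the role of the trylock */
    int on_bus;            /* frames the chip still has buffered */
    int delivered;         /* frames handed up the stack */
};

/* Drain up to max_frames in the caller's own context. Returns the number
 * drained, or -1 when another context already holds the guard. */
int rx_drain_burst(struct rx_model *m, int max_frames)
{
    assert(max_frames >= 0);
    if (atomic_flag_test_and_set_explicit(&m->draining, memory_order_acquire))
        return -1;                      /* re-entry: back off, do not block */

    int n = 0;
    while (n < max_frames && m->on_bus > 0) {
        m->on_bus--;
        m->delivered++;
        n++;
    }
    atomic_flag_clear_explicit(&m->draining, memory_order_release);
    return n;
}

/* Demo: a 20-frame backlog drains as 16 + 4 with no intermediate work item. */
int demo_two_bursts(void)
{
    struct rx_model m = { .draining = ATOMIC_FLAG_INIT, .on_bus = 20, .delivered = 0 };
    int first = rx_drain_burst(&m, 16);
    int second = rx_drain_burst(&m, 16);
    return first * 100 + second;        /* encodes 16 and 4 */
}

/* Demo: a caller that arrives while the guard is held is turned away. */
int demo_reentry(void)
{
    struct rx_model m = { .draining = ATOMIC_FLAG_INIT, .on_bus = 5, .delivered = 0 };
    atomic_flag_test_and_set(&m.draining);   /* simulate a drain in progress */
    return rx_drain_burst(&m, 16);
}
```

The point of the non-blocking guard is that a second caller skips the drain instead of queueing behind it, so the number of drain passes tracks bursts rather than frames.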
|
||||||
|
|
||||||
|
**Predicted delta vs Phase 1 metric:** Eliminates ~5 of the ~9 redundant workqueue dispatches per frame. 2–4× throughput improvement and a proportional drop in `_raw_spin_unlock_irqrestore` CPU cost.
|
||||||
|
|
||||||
|
**Effort:** Medium. SDIO host-lock protocol (`sdio_claim_host`/`sdio_release_host`) is already managed inside `sdio_rx_work`; moving the body is mechanical but requires care around the `sdio_wq` `max_active=2` concurrency assumption.
|
||||||
|
|
||||||
|
**Risks:** `sdio_rx_work` runs with `sdio_claim_host` held for the entire burst. Inside the BH it serialises all SDIO access fine. Watch `bes2600_sdio.c:1889` — flushes `rx_work` during teardown; that path must remain.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Item 2 — `ieee80211_rx_irqsafe` instead of `ieee80211_rx` (pre-NAPI cw1200 ancestor pattern)
|
||||||
|
|
||||||
|
**File:line:** `txrx.c:1947`, `txrx.c:1950`, `ap.c:99`, `sta.c:1487`, `wsm.c:2416`.
|
||||||
|
|
||||||
|
**Current shape:** Every RX frame is delivered via `ieee80211_rx_irqsafe()`. This function queues the SKB on mac80211's per-device `skb_queue` and schedules the mac80211 RX tasklet, which runs in softirq context. Under sustained load that is up to one tasklet run per frame: as many as 611 softirq wakeups/sec on top of the workqueue overhead.
|
||||||
|
|
||||||
|
**Proposed shape:** Switch to `ieee80211_rx_ni()` (process context, which `wsm_handle_rx` is already in) or, better, batch-deliver frames using `ieee80211_rx_list()` (introduced in kernel 5.12, available in 6.19). Accumulate frames from a single `sdio_rx_work` burst into a `list_head`, then call `ieee80211_rx_list()` once per burst.
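
The accumulate-then-deliver shape can be sketched without any mac80211 dependency. This is a userspace model, assuming a plain singly linked list in place of the kernel `list_head`; `frame_list_add`, `deliver_list` and the `upcalls` counter are hypothetical stand-ins, and the real `ieee80211_rx_list()` calling rules still need to be checked against the kerneldoc:

```c
#include <assert.h>
#include <stddef.h>

struct frame {
    struct frame *next;
    int len;
};

struct frame_list {
    struct frame *head;
    struct frame *tail;
    int count;
};

static int upcalls;   /* counts hand-offs to the (modelled) network stack */

/* Append one frame to the burst list; no hand-off happens here. */
void frame_list_add(struct frame_list *l, struct frame *f)
{
    f->next = NULL;
    if (l->tail)
        l->tail->next = f;
    else
        l->head = f;
    l->tail = f;
    l->count++;
}

/* One hand-off for the whole burst; returns the number of frames consumed. */
int deliver_list(struct frame_list *l)
{
    int n = 0;
    upcalls++;
    for (struct frame *f = l->head; f; f = f->next)
        n++;
    l->head = l->tail = NULL;
    l->count = 0;
    return n;
}

/* Demo: a burst of up to 16 frames costs exactly one hand-off. */
int demo_upcalls_for_burst(int burst)
{
    struct frame frames[16];
    struct frame_list l = { NULL, NULL, 0 };

    assert(burst >= 0 && burst <= 16);
    upcalls = 0;
    for (int i = 0; i < burst; i++) {
        frames[i].len = 1500;
        frame_list_add(&l, &frames[i]);
    }
    int consumed = deliver_list(&l);
    assert(consumed == burst);
    return upcalls;
}
```

With per-frame delivery the hand-off count equals the frame count; with the list it is one per burst regardless of burst size.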
|
||||||
|
|
||||||
|
**mac80211 contract:** `ieee80211_rx_list()` is safe from process context with the same `ieee80211_rx_status` rules as `ieee80211_rx_ni()`. Per `include/net/mac80211.h` — kerneldoc states it takes the RX path atomically only when called from softirq context; from process context it uses the same path as `ieee80211_rx_ni()`.
|
||||||
|
|
||||||
|
**Predicted delta:** Reduces per-frame softirq overhead. Hard to isolate independently of item 1, but combined the two deliver the < 10 % CPU-in-lock target.
|
||||||
|
|
||||||
|
**Effort:** Small (once item 1 is done — the batch list naturally exists at the burst boundary).
|
||||||
|
|
||||||
|
**Risks:** Must hold `rcu_read_lock()` at call site; `skb->cb` (`IEEE80211_SKB_RXCB`) must be filled before the call, as today. The `early_data` path at `txrx.c:1942` uses `skb_queue_tail` into a per-link queue before calling `ieee80211_rx_irqsafe` — that path must be excluded from batch collection.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Item 3 — Per-frame `queue_work(sdio_wq, &tx_work)` in the TX send path
|
||||||
|
|
||||||
|
**File:line:** `bes2600_sdio.c:1236` (inside `bes2600_sdio_pipe_send()`).
|
||||||
|
|
||||||
|
**Current shape:** Every call to `bes2600_sdio_pipe_send()` appends one descriptor to `tx_bufferlist` and immediately calls `queue_work(sdio_wq, &tx_work)`. `sdio_tx_work` then drains the list with scatterlist batching (up to `BES_SDIO_TX_MULTIPLE_NUM=16` frames per SDIO transfer). At low rates the workqueue's pending-but-not-started dedup means only one dispatch fires; at high TX rates — especially after `atomic_add(1, &hw_priv->bh_tx)` in `bh.c` reschedules TX — successive `pipe_send` calls each hit `queue_work` before the previous fires, multiplying dispatches.
|
||||||
|
|
||||||
|
**Proposed shape:** Stage all frames into `tx_bufferlist` in the BH TX loop, then flush `sdio_tx_work` synchronously (call the work function body directly) before returning to the wait-event. The TX mirror of item 1.
|
||||||
|
|
||||||
|
**Predicted delta:** Removes redundant TX-side `queue_work` calls. Lower priority than RX side given current TX rate (13 `wsm_cmd_send`/sec is host→chip control plane; data-plane TX is also limited by firmware buffer count `numInpChBufs`), but it does remove one source of the 5,643/sec workqueue count.
|
||||||
|
|
||||||
|
**Effort:** Small.
|
||||||
|
|
||||||
|
**Risks:** `sdio_tx_work` calls `sdio_claim_host`/`sdio_release_host` internally. Running directly from BH context requires confirming no deadlock with the SDIO bus claim that `sdio_rx_work` (now merged per item 1) holds. The TX flush must happen after the RX burst, matching the existing BH loop structure (`rx:` → `tx:` ordering in `bh.c:1444`).
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Item 4 — `ba_lock` per-frame acquisition in the RX path
|
||||||
|
|
||||||
|
**File:line:** `txrx.c:998–1005` (`bes2600_rx_h_ba_stat()`); `txrx.c:1159–1164` and `txrx.c:1682–1698` (`tsm_lock`, `CONFIG_BES2600_TESTMODE`-gated).
|
||||||
|
|
||||||
|
**Current shape:** `bes2600_rx_cb()` calls `bes2600_rx_h_ba_stat()` for every non-multicast data frame. That function acquires `ba_lock` under BH (`spin_lock_bh`) to increment `ba_acc_rx` and `ba_cnt_rx`, then sets a timer. At 611 frames/sec that's 611 lock acquisitions/sec on `ba_lock` alone.
|
||||||
|
|
||||||
|
**Proposed shape:** Replace per-frame `ba_lock` with a per-cpu counter (or `atomic64_t`) for `ba_acc_rx` and `ba_cnt_rx`. The timer arm (`mod_timer`) is the actual reason for the lock — a `READ_ONCE`/`cmpxchg` on a flag to detect first-frame-in-interval is sufficient.
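
A userspace sketch of the lock-free accounting, assuming C11 atomics as stand-ins for the kernel's `atomic_t` and `cmpxchg`. The struct layout, the `timer_armed` flag and the `timer_arms` counter are hypothetical; the real patch would call `mod_timer()` where the counter is incremented:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct ba_model {
    atomic_int ba_acc_rx;     /* bytes accumulated this interval */
    atomic_int ba_cnt_rx;     /* frames counted this interval */
    atomic_bool timer_armed;  /* hypothetical flag replacing the locked check */
    int timer_arms;           /* how often the modelled timer was armed */
};

/* Per-frame accounting without a spinlock.
 * Returns true when this call was the one that armed the timer. */
bool ba_account_rx(struct ba_model *b, int frame_len)
{
    atomic_fetch_add_explicit(&b->ba_acc_rx, frame_len, memory_order_relaxed);
    atomic_fetch_add_explicit(&b->ba_cnt_rx, 1, memory_order_relaxed);

    bool expected = false;
    if (atomic_compare_exchange_strong(&b->timer_armed, &expected, true)) {
        b->timer_arms++;      /* only the first frame of an interval gets here */
        return true;
    }
    return false;
}

/* Timer-callback side: fold and reset the counters, then allow re-arming. */
int ba_timer_fold(struct ba_model *b)
{
    int frames = atomic_exchange(&b->ba_cnt_rx, 0);
    atomic_exchange(&b->ba_acc_rx, 0);
    atomic_store(&b->timer_armed, false);
    return frames;
}

/* Demo: any number of frames in one interval arms the timer once. */
int demo_arms_per_interval(int frames)
{
    struct ba_model b = {0};
    for (int i = 0; i < frames; i++)
        ba_account_rx(&b, 1500);
    int folded = ba_timer_fold(&b);
    assert(folded == frames);
    return b.timer_arms;
}
```

The compare-exchange carries the arm-once invariant that the spinlock previously provided; the counters themselves need no ordering beyond atomicity.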
|
||||||
|
|
||||||
|
**Predicted delta:** Removes 611 lock acquisitions/sec from the RX hot path. Not the dominant cost but the next bottleneck after items 1–2 land.
|
||||||
|
|
||||||
|
**Effort:** Small.
|
||||||
|
|
||||||
|
**Risks:** `ba_lock` also serialises TX-side block-ack accounting (`txrx.c:1632`). The per-cpu approach requires a fold step in the timer callback — cheap.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Item 5 — Skip `ps_state_lock` acquisitions when PSM is known-disabled
|
||||||
|
|
||||||
|
**File:line:** `bes2600.h:320` (decl); `txrx.c:1340–1365` (`vif_lock`); `txrx.c:1415–1426` (`ps_state_lock`); `txrx.c:1942–1948` (RX-side `ps_state_lock`).
|
||||||
|
|
||||||
|
**Current shape:** `ps_state_lock` is taken on every TX frame if powersave is active. Per memory `reference_bes2600_firmware_no_psm.md`, **PSM is non-functional on this firmware**; c7 already self-detects this and latches `pm_unsupported = true`. The `ps_state_lock` acquisitions in the RX callback and TX path are therefore pure overhead on this hardware.
|
||||||
|
|
||||||
|
**Proposed shape:** Add a `READ_ONCE()` check on `powersave_enabled` before taking `ps_state_lock`; if false, skip the lock and the PSM state update entirely. Since c7's `pm_unsupported` latches, this is safe.
|
||||||
|
|
||||||
|
**Predicted delta:** Small absolute gain at current TX rate, but prevents fast path from regressing as throughput improves.
|
||||||
|
|
||||||
|
**Effort:** Small.
|
||||||
|
|
||||||
|
**Risks:** `powersave_enabled` is written from process context (`bh.c:403`). `READ_ONCE` without lock is safe — at worst one spurious PSM notification, not a state corruption.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Item 6 — Firmware block-read size cap (`EFFECTIVE_BUF_SIZE = 8182 bytes`)

**File:line:** `bh.c:33–36` (defines); `bes2600_sdio.c:721–783` (`bes2600_sdio_extract_packets()`); `hwio.h:294` (`BES_SDIO_RX_MULTIPLE_NUM=16`).

**Current shape:** `BES_SDIO_RX_MULTIPLE_NUM=16` and `BES_SDIO_OPTIMIZED_LEN` are both defined (`Makefile:90,92`). The RX burst reads `PACKET_TOTAL_LEN(ctrl_reg)` bytes in a single CMD53; each sub-packet is bounded by `EFFECTIVE_BUF_SIZE = (0x1000-4)*2 - 2 = 8182` bytes. At 611 frames/sec ÷ 267 BH wakeups/sec ≈ **2.3 frames per wakeup**, the burst is well under the 16-frame limit. **Not the bottleneck today.**
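
The size arithmetic in this item can be checked mechanically. A small sketch, assuming the expression is exactly as quoted; the intermediate macro names are illustrative and not taken from `bh.c`. Evaluated as written, `(0x1000-4)*2 - 2` comes to 8182:

```c
#include <assert.h>

/* Expression as quoted in this item; intermediate names are illustrative. */
#define BLOCK_SIZE_WR       (0x1000 - 4)          /* 4096 - 4 = 4092 */
#define MAX_RD_WR_BUF       (BLOCK_SIZE_WR * 2)   /* 8184 */
#define EFFECTIVE_BUF_SIZE  (MAX_RD_WR_BUF - 2)   /* 8182 */

_Static_assert(EFFECTIVE_BUF_SIZE == 8182, "sub-packet bound as quoted");

int effective_buf_size(void)
{
    return EFFECTIVE_BUF_SIZE;
}

/* Frames per BH wakeup, scaled by 100 to stay in integer arithmetic. */
int frames_per_wakeup_x100(int frames_per_sec, int wakeups_per_sec)
{
    assert(wakeups_per_sec > 0);
    return frames_per_sec * 100 / wakeups_per_sec;
}

/* DMA bytes needed for a full burst of per-frame buffers. */
int burst_dma_bytes(int bytes_per_frame, int frames_per_burst)
{
    return bytes_per_frame * frames_per_burst;
}
```

With the measured rates, `frames_per_wakeup_x100(611, 267)` gives 228 (about 2.3 frames), and `burst_dma_bytes(1632, 16)` gives 26112 bytes, which is the roughly 26 KB allocation mentioned under Risks.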
|
||||||
|
|
||||||
|
**Proposed shape:** No change needed now. Re-evaluate after items 1–3 land if throughput rises past ~3 MB/s. Verify ctrl_reg `PACKET_TOTAL_LEN` field values during high load — requires firmware-trace observation we don't currently have.
|
||||||
|
|
||||||
|
**Effort:** N/A.
|
||||||
|
|
||||||
|
**Risks:** Increasing beyond 16 requires a larger DMA allocation (currently `1632 × 16 = 26 KB`). Cortex-M4F firmware side is opaque.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Item 7 — Duplicate workqueues (`hw_priv->workqueue` vs `hw_priv->bh_workqueue` vs `sbus_priv->sdio_wq`)
|
||||||
|
|
||||||
|
**File:line:** `bes2600.h:323` (`workqueue`); `bes2600.h:385` (`bh_workqueue`); `bes2600_sdio.c:63` (`sdio_wq`). `txrx.c` has 10 `queue_work(hw_priv->workqueue, ...)` calls for control-plane work.
|
||||||
|
|
||||||
|
**Current shape:** Three distinct workqueues. The 5,643 `workqueue_execute_start`/sec are dominated by `sdio_wq` items, not `workqueue`. `workqueue` items are control-plane events at rates well below the data-plane.
|
||||||
|
|
||||||
|
**Proposed shape:** After item 1 (merging `sdio_rx_work` into BH), `sdio_wq` only carries `tx_work`. After item 3 (synchronous TX flush from BH), `sdio_wq` is idle during normal data-plane and could be replaced with `system_highpri_wq`.
|
||||||
|
|
||||||
|
**Effort:** Small (follow-on to items 1 and 3).
|
||||||
|
|
||||||
|
**Risks:** None if items 1 and 3 land first.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Item 8 — `BH_RX_CONT_LIMIT=3` cap on RX burst per BH wakeup
|
||||||
|
|
||||||
|
**File:line:** `bh.c:1380–1405` (timeout detection); `bh.c:1330` (`BH_RX_CONT_LIMIT=3`); `bh.c:1331` (`BH_TX_CONT_LIMIT=20`).
|
||||||
|
|
||||||
|
**Current shape:** BH loop limits RX burst to 3 consecutive iterations before breaking back to wait-event. At 611 frames/sec ÷ 267 wakeups/sec ≈ 2.3 frames per wakeup → not the bottleneck today. **After items 1–3 land**, per-burst frame rate will rise and `BH_RX_CONT_LIMIT=3` becomes the ceiling.
|
||||||
|
|
||||||
|
**Proposed shape:** Raise to 16 (matching `BES_SDIO_RX_MULTIPLE_NUM`) after items 1–3 are deployed and re-measured.
|
||||||
|
|
||||||
|
**Effort:** Trivial (constant change), but must wait for Phase 7 measurements post 1–3.
|
||||||
|
|
||||||
|
**Risks:** Too high a limit under firmware anomaly (corrupted ctrl_reg) can spin BH long enough to miss beacon ACK deadline. Bound to `BES_SDIO_RX_MULTIPLE_NUM` as safe ceiling.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Ranking summary
|
||||||
|
|
||||||
|
| Rank | Item | Predicted gain | Effort |
|------|------|----------------|--------|
| 1 | Collapse `sdio_rx_work` relay into BH loop | ~5x workqueue dispatch reduction | Medium |
| 2 | Batch deliver via `ieee80211_rx_list()` | Removes per-frame softirq | Small |
| 3 | Synchronous TX flush from BH | Removes TX-side dispatch noise | Small |
| 4 | Replace `ba_lock` per-frame with atomic/per-cpu | Removes 611 lock/sec from RX hot path | Small |
| 5 | Skip `ps_state_lock` when PSM-known-disabled | Removes dead overhead | Small |
| 6 | Raise `BH_RX_CONT_LIMIT` after 1–3 land | Unlocks residual throughput | Trivial |
| 7 | Consolidate workqueues post-items 1&3 | Cleanup | Small |
| 8 | Firmware block-read size | Not bottleneck at current rates | N/A |
|
||||||
|
|
||||||
|
**Items 1 + 2 together are the structural answer to the measurement**: ~9 workqueue events per delivered frame collapse to ~1, and the per-frame softirq cost disappears. Items 3–5 clean up the next layer. The beacon-loss cascade at 9 minutes is almost certainly starvation of the BH wait-event under the per-frame workqueue storm — item 1 removes the mechanism that makes the cascade possible.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Next campaign step
|
||||||
|
|
||||||
|
A Phase 4 plan locking item 1 (and possibly item 2) follows in a separate PR. The remaining items go on the campaign backlog as follow-on patches once the Phase 7 verification of item-1-or-1-plus-2 confirms the predicted delta.
|
||||||
@@ -82,6 +82,48 @@ without board power-cycle").
|
|||||||
**Status**: task c3 (indirectly, via bes_chardev removal which currently
gates the signal/nosignal mode switch path).
|
||||||
|
|
||||||
|
## Architect review — now BUG-#5-blocking (was backlog)
|
||||||
|
|
||||||
|
The Phase 0 perf trace for Bug #5 first exposed a "when in doubt, add a
lock" pattern (~20 % CPU in `_raw_spin_unlock_irqrestore`). The
follow-up ftrace measurement (2026-05-07 17:00) refined the root cause
to an architectural problem: **the bes2600 driver dispatches every
SDIO transaction through the kernel workqueue**. Numbers from a 3-min
4 MB/s ohm capture (post-reboot, srcversion `1B3B3ED0`):
|
||||||
|
|
||||||
|
```
wsm_cmd_send:               13/sec   (host-to-chip command rate, surprisingly low)
bes2600_rx_cb:             611/sec
bes2600_bh_wakeup:         267/sec
lock contention_begin:      50/sec
workqueue_execute_start: 5,643/sec   ← DOMINATES; matches the mmc
                                       transaction rate from earlier perf
```
|
||||||
|
|
||||||
|
5.6 k workqueue dispatches per second is the throughput floor, not a
specific lock, not WSM-command rate, not decrypt-state. A surgical fix
to any single function won't move the floor; the architecture needs
to be restructured to amortise SDIO transactions across fewer work
items (or move SDIO RX out of the workqueue entirely).
|
||||||
|
|
||||||
|
This is where the **Claude Sonnet architect review** belongs: a
top-to-bottom assessment of `~/src/besser/bes2600-dkms-mobian/bes2600/`
focused on:

- the workqueue dispatch shape (most actionable)
- needless lock proliferation (the original signal)
- BH / RX scheduling boundaries
- error-handling coverage and dead-code from the cw1200 ancestor
- API contract violations relative to mainline mac80211

Output: ranked list of restructuring targets, with predicted-delta
estimates against the Phase 1 metric (≥ 2 MB/s sustained @ 4 MB/s cap,
< 10 % CPU in lock-cycling, no link cascade in 30 min).
|
||||||
|
|
||||||
|
**Status**: now blocking on Bug #5 (was independent track). Surgical
patches B5-1, B5-2, B5-3 from the original Phase 4 candidate list are
all DEFERRED until the architect review's restructuring map is in.
|
||||||
|
|
||||||
## Bug #5 — RX path degrades under attempted-throughput pressure

**Suspect file**: bes2600 RX path (`txrx.c bes2600_rx_cb`, `bh.c bes2600_bh_work`,
|
||||||
|
|||||||
@@ -0,0 +1,190 @@
|
|||||||
|
# BES2600 WiFi structural analysis and code critique
|
||||||
|
|
||||||
|
**Author:** Claude (noether) — second-opinion as Opus 4.7 against Sonnet 4.6's review of 2026-05-07
|
||||||
|
**Scope:** the WiFi half of the BES2600 driver as it lives in `bes2600-dkms-mobian/bes2600/` on top of the `cleanups` branch (srcversion `1B3B3ED0…`, c-stack + Patch A + Patch B deployed).
|
||||||
|
**Reading frame:** Bug #5 prompted Sonnet's review; this writeup is independent — same source tree, different model, different priors. Where I concur I tighten; where I disagree I say so.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 1. Top-line
|
||||||
|
|
||||||
|
The BES2600 WiFi driver is **not a BES2600 driver**. It is a CW12xx driver wearing a BES2600 nameplate. That sentence is not rhetoric — it is the design fact that explains every other smell I will list below.
|
||||||
|
|
||||||
|
- 30+ live references to `CW12XX_MAX_VIFS` across 9 files.
|
||||||
|
- `cw12xx_hwpriv_to_vifpriv()` / `cw12xx_get_vif_from_ieee80211()` are the active vif accessors.
|
||||||
|
- `is_hardware_cw1250(hw_priv) || is_hardware_cw1260(hw_priv)` is a runtime branch in `ap.c:1892` — the chip is BES2600, neither check ever matches, the branch is dead on this hardware but compiled in.
|
||||||
|
- `CW1200_MAX_SW_RETRY_CNT` gates the active retry-decision logic in `bh.c:1269` (inside `KEY_FRAME_SW_RETRY`).
|
||||||
|
- The header opens with "Based on the mac80211 Prism54 code, which is Copyright (c) 2006, Michael Wu" → **prism54 → islsm → ST-E CW1200 → CW1260 → CW12xx → BES2600**: at least five generations of vendor-SDK descent, with each generation preserving its predecessor as #if-0 blocks rather than removing it.
|
||||||
|
|
||||||
|
This is the Phase 6 "transcription trap" from `CLAUDE.md`, frozen into the codebase: every generation copied behaviour instead of re-deriving it against the API contract. The result is a driver that *works*, but whose structural choices were made for a 2010 ST-Ericsson chip, not a 2022 Bestechnic one.
|
||||||
|
|
||||||
|
The downstream consequence — and the thing that actually pinches us in Bug #5 — is that the **hot path was designed for cw1200's IRQ-driven SPI bus, not for SDIO with multi-block coalescing**. Items 1 + 2 of Sonnet's review are the right surgical fix. The deep fix is bigger than the budget of any one campaign.
|
||||||
|
|
||||||
|
## 2. Concurrence with Sonnet — refined
|
||||||
|
|
||||||
|
### 2.1 RX relay (Sonnet item 1) — concur, refine
|
||||||
|
|
||||||
|
The flow on this build (`-DBES2600_RX_IN_BH` in Makefile, so this is the *real* path):
|
||||||
|
|
||||||
|
```
SDIO IRQ
  → bes2600_gpio_irq_handler                        (bes2600_sdio.c:413)
    → queue_work(self->sdio_wq, &self->rx_work)     (bes2600_sdio.c:416)
      → sdio_rx_work runs                           (bes2600_sdio.c:829)
        → bes2600_sdio_lock + memcpy_fromio
        → bes2600_sdio_extract_packets              (skb_queue_tail to self->rx_queue)
        → self->irq_handler(self->irq_priv)         (function call, not workqueue)
          → atomic_add_return(1, &hw_priv->bh_rx)   (bh.c:130)
          → wake_up(&hw_priv->bh_wq)
→ bh_work (already running, never re-queued):
    wait_event_interruptible_timeout returns
      → bes2600_bh_rx_helper                        (bh.c:961)
        → priv->sbus_ops->pipe_read                 (skb_dequeue from self->rx_queue)
        → wsm_handle_rx                             (wsm.c)
          → bes2600_rx_cb                           (txrx.c:1642)
            → ieee80211_rx_irqsafe(skb)             (txrx.c:1947 / 1950)
```
|
||||||
|
|
||||||
|
**Where I refine Sonnet:** the "9 workqueue events per delivered RX frame" claim doesn't survive source reading. Per IRQ *batch* there is **one** workqueue dispatch (sdio_wq.rx_work). `bh_work` is registered once, runs as a long-lived work item using `wait_event_interruptible_timeout` to sleep — the wake-up path is a wait-queue, not a workqueue dispatch. `ieee80211_rx_irqsafe` schedules a mac80211 tasklet, not a workqueue. The 5,643 `workqueue_execute_start/sec` ftrace count from Bug #5 is **system-wide**, not bes2600-only — it should not be quoted as "per frame" without per-pid filtering.
|
||||||
|
|
||||||
|
**What is real:** the indirection adds two synchronization points per frame (`skb_queue_tail` + `skb_dequeue`, each `&rx_queue->lock`) plus a cross-CPU wake-up plus a tasklet schedule. That's enough to dominate at 4 MB/s. The collapse is justified — just not by the 9× number.
|
||||||
|
|
||||||
|
### 2.2 ieee80211_rx_irqsafe from process context (Sonnet item 2) — concur, gated on contract verification
|
||||||
|
|
||||||
|
Confirmed: `ieee80211_rx_irqsafe` is the right primitive only when called from hard-IRQ context — it defers to a tasklet. From process context (which is where `bh_work` and `sdio_rx_work` both live), it adds a tasklet hop for nothing.
|
||||||
|
|
||||||
|
`ieee80211_rx_list(hw, sta, &skbs)` is the correct call shape if, and only if, two contract claims hold:
|
||||||
|
1. callable from process context with `local_bh_disable()` wrap (or callable bare),
|
||||||
|
2. SKB list invariants don't impose NAPI-poll semantics we can't honour.
|
||||||
|
|
||||||
|
Sonnet asserted both; I have **not** verified them against `include/net/mac80211.h` kerneldoc on a 6.19-class kernel. **Task #19 blocks Patch C on that verification.** Until it lands, treat the API claim as unconfirmed — this is exactly the Phase 6 contract-citation rule, and skipping it would be the same trap the older driver fell into.
|
||||||
|
|
||||||
|
### 2.3 ba_lock per-frame (Sonnet item 4) — concur
|
||||||
|
|
||||||
|
`txrx.c:998-1005` (TX path) and `txrx.c:1632-1640` (RX path): `spin_lock_bh(&hw_priv->ba_lock)` to bump 4 ints (`ba_acc`, `ba_cnt`, `ba_acc_rx`, `ba_cnt_rx`) and conditionally `mod_timer(&hw_priv->ba_timer, …)`. The TODO comment in `bes2600.h:359-365` literally says *"TODO: Same as above"* on every field — the original author flagged it as deferred work, then shipped.
|
||||||
|
|
||||||
|
Replace with `atomic_t` for the four counters and `cmpxchg`-guarded `mod_timer` for the arm-once invariant. Patch D.
|
||||||
|
|
||||||
|
### 2.4 ps_state_lock when pm_unsupported (Sonnet item 5) — concur
|
||||||
|
|
||||||
|
`txrx.c:1942-1948`: per-RX-frame `spin_lock_bh(&priv->ps_state_lock)` on the early-data path, protecting a check on `entry->status == BES2600_LINK_SOFT`. The lock exists to coordinate with the AP-side power-save state machine.
|
||||||
|
|
||||||
|
c7's contribution (`pm_unsupported = true`) means we already know this firmware doesn't honour PSM; the LINK_SOFT branch is an AP-mode soft-link state that won't transition under us when PSM is dead. Gate the lock acquisition on `!hw_priv->pm_unsupported`. Patch E.
|
||||||
|
|
||||||
|
(This patch is *narrower* than Sonnet framed it: it only applies when `pm_unsupported` latches on, which is at boot for our firmware. Production reality on this hardware = always; but the patch must remain conditional in case a future firmware fixes PSM and c7 self-clears the flag.)
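
The gating can be sketched as a cheap flag read ahead of the lock. This is a userspace model, assuming a relaxed C11 atomic load as the analogue of the kernel's `READ_ONCE()`; `struct ps_model`, `rx_check_link_state` and the counters are hypothetical and only count how often the lock would be taken:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct ps_model {
    atomic_bool pm_unsupported;  /* latched once firmware is known to ignore PSM */
    int lock_acquisitions;       /* counts modelled ps_state_lock round-trips */
    int soft_link_checks;        /* counts evaluations of the LINK_SOFT test */
};

/* RX-side check: pay for the lock only while PSM can actually change state. */
void rx_check_link_state(struct ps_model *p)
{
    if (atomic_load_explicit(&p->pm_unsupported, memory_order_relaxed))
        return;                  /* PSM dead: skip lock and state check */

    p->lock_acquisitions++;      /* stands in for lock ... unlock */
    p->soft_link_checks++;
}

/* Demo: number of lock round-trips for a given frame count. */
int demo_lock_count(bool pm_unsupported, int frames)
{
    struct ps_model p = {0};
    atomic_store(&p.pm_unsupported, pm_unsupported);
    for (int i = 0; i < frames; i++)
        rx_check_link_state(&p);
    assert(p.lock_acquisitions == p.soft_link_checks);
    return p.lock_acquisitions;
}
```

Because the check stays conditional, a firmware that clears the flag gets the original locked behaviour back without a code change.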
|
||||||
|
|
||||||
|
## 3. Push-back against Sonnet
|
||||||
|
|
||||||
|
### 3.1 "BES_SDIO_OPTIMIZED_LEN config flag"
|
||||||
|
|
||||||
|
Not a runtime/Kconfig knob on this build. `Makefile:18` hard-codes `ccflags-y += -DBES_SDIO_OPTIMIZED_LEN`. Whether to keep it is a separate question, but Sonnet's recommendation should not have framed it as toggleable.
|
||||||
|
|
||||||
|
### 3.2 "Multiple workqueues are unconditionally bad"
|
||||||
|
|
||||||
|
There are three driver-side workqueues:
|
||||||
|
|
||||||
|
| name | purpose | dispatch shape |
|---|---|---|
| `bh_workqueue` | hosts the single long-running `bh_work` | one-shot at register, wait-queue driven thereafter |
| `sdio_wq` | sdio_rx_work + sdio_tx_work + sdio_scan_work | per-IRQ-batch dispatch |
| `hw_priv->workqueue` | scan, AP, PM, multicast-start, link-id, set-tim, … | per-event dispatch (~20 producers) |
|
||||||
|
|
||||||
|
**`bh_workqueue` is fine** — it runs a single work item forever, which is just a kthread-shaped-as-workqueue. The cost is one alloc_workqueue at register and zero ongoing dispatch overhead. Don't kill it.
|
||||||
|
|
||||||
|
**`sdio_wq` is the actual surgical target** — collapsing item 1 means subsuming `sdio_rx_work` into the bh-loop, after which `sdio_wq` only hosts tx_work and scan_work and could be merged with `hw_priv->workqueue` for cleanup. But that merge is cosmetic; do it later or never.
|
||||||
|
|
||||||
|
**`hw_priv->workqueue` shouldn't be touched.** It hosts ~20 unrelated producers; merging it into sdio_wq is the wrong direction (priority inversion risk under coex pressure).
|
||||||
|
|
||||||
|
### 3.3 "BH_RX_CONT_LIMIT=3 is the bottleneck"
|
||||||
|
|
||||||
|
Half-true. The limit caps the burst-RX pass to 3 frames before yielding to TX work. Raising it past 3 only helps if RX has steady backlog, which under our 4 MB/s ramp it does. But there's also `BH_TX_CONT_LIMIT=20` paired with it — TX gets 20-frame bursts, RX gets 3. The asymmetry is from a previous campaign that found TX-starvation, and **flipping it without re-running that campaign is a regression risk**. Treat the constant as a phase-7-knob, not a one-liner.
|
||||||
|
|
||||||
|
## 4. New findings Sonnet did not surface
|
||||||
|
|
||||||
|
### 4.1 `bh.c` carries ~700 lines of `#if 0` dead code
|
||||||
|
|
||||||
|
`bh.c:196-877` is the cw1200 ancestor `bes2600_bh()` preserved verbatim alongside the active impl at `bh.c:1332+`. Same function name, same `goto rx:` / `goto tx:` labels, same loop variables. The fossil block contains a typo (`if ((i = (CW12XX_MAX_VIFS - 1)) || !priv)` at lines 438 and 562 — single `=` is assignment-not-compare; live code at `ap.c:696` uses `==` correctly) which would be a real bug if compiled. **It is not compiled** — `#if 0` saves us — but this is the maintenance hazard you discover *first* when reading the file in a hurry.
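
The effect of the single `=` is easy to reproduce in isolation. A minimal sketch, assuming a placeholder `MAX_VIFS` of 3 (the real value of `CW12XX_MAX_VIFS` is not quoted here) and a toy table in place of the vif array; both function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define MAX_VIFS 3   /* placeholder for CW12XX_MAX_VIFS */

/* Fossil form: `=` assigns MAX_VIFS - 1 to i, so the condition is true
 * on every iteration whenever MAX_VIFS > 1, and i is clobbered. */
int skips_with_assignment(const void **priv_table)
{
    int skipped = 0;
    for (int n = 0; n < MAX_VIFS; n++) {
        int i = n;
        if ((i = (MAX_VIFS - 1)) || !priv_table[n])
            skipped++;
    }
    return skipped;
}

/* Live form: `==` skips only the last slot and any empty slot. */
int skips_with_comparison(const void **priv_table)
{
    int skipped = 0;
    for (int n = 0; n < MAX_VIFS; n++) {
        int i = n;
        if ((i == (MAX_VIFS - 1)) || !priv_table[n])
            skipped++;
    }
    return skipped;
}

/* Demo: with every slot populated, compare how many slots each form skips. */
int demo_skipped(int use_fossil)
{
    int a, b;
    const void *table[MAX_VIFS] = { &a, &b, &a };
    return use_fossil ? skips_with_assignment(table)
                      : skips_with_comparison(table);
}
```

The extra parentheses around the assignment are what silence the usual compiler warning, which is presumably how the typo survived in the ancestor code.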
|
||||||
|
|
||||||
|
Action: kill the `#if 0` block. Standalone hygiene patch, not on the Bug-#5 critical path.
|
||||||
|
|
||||||
|
### 4.2 Allwinner-specific code in the SDIO bus path
|
||||||
|
|
||||||
|
`bes2600_sdio.c:475` calls `sw_mci_check_r1_ready(self->func->card->host, 1000)` from inside the IRQ-setup error path. This is the Allwinner mmc driver's R1-ready helper — not portable to RK3566's `dw_mmc-rockchip` host driver.
|
||||||
|
|
||||||
|
The call is reachable only on `set_func` cleanup (a comparatively rare error path), but it is a build-time portability hazard. Most likely a stub macro on non-Allwinner builds; verify on ohm or wrap behind `#ifdef CONFIG_MMC_SUNXI`.
|
||||||
|
|
||||||
|
### 4.3 `asm volatile ("nop")` placeholder in the live BH loop
|
||||||
|
|
||||||
|
`bh.c:1518` is where IRQ re-enable used to be (`__bes2600_irq_enable(1)` is commented out two lines above). The author left a literal `asm volatile ("nop")` in its place instead of removing the dead block. Either re-enable IRQs (if the call was removed prematurely) or remove the nop (if IRQs are intentionally always-on). This is not cosmetic: it indicates an unresolved IRQ-handling decision.
|
||||||
|
|
||||||
|
### 4.4 `BUG_ON` in the steady-state hot path
|
||||||
|
|
||||||
|
`bh.c:1488`: `BUG_ON(hw_priv->hw_bufs_used > hw_priv->wsm_caps.numInpChBufs)` runs *every* BH iteration. Tripping it locks up the kernel during normal operation — by definition the wrong response to a bookkeeping bug. Should be `WARN_ON_ONCE` + bail-out. (Same critique applies to several other `BUG_ON`s in `bh.c` — search the active `#else` block.)
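
The warn-once-and-bail shape looks roughly like this in a userspace model. It is a sketch only: `warn_once` is a hypothetical stand-in for the kernel's `WARN_ON_ONCE()`, `bh_check_bufs` is an illustrative name, and the choice of `-EINVAL` as the bail-out code is an assumption:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Userspace stand-in for WARN_ON_ONCE(): report the first violation only,
 * and always return the condition so the caller can bail out. */
bool warn_once(bool cond, bool *already_warned, const char *what)
{
    if (cond && !*already_warned) {
        *already_warned = true;
        fprintf(stderr, "WARNING: %s\n", what);
    }
    return cond;
}

/* Bookkeeping check for one BH iteration. Returns 0 when consistent,
 * -EINVAL when the counter has run past the firmware buffer count. */
int bh_check_bufs(int hw_bufs_used, int num_inp_ch_bufs)
{
    static bool warned;

    if (warn_once(hw_bufs_used > num_inp_ch_bufs, &warned,
                  "hw_bufs_used exceeds numInpChBufs"))
        return -EINVAL;   /* abandon this iteration instead of halting */
    return 0;
}
```

A caller would treat the negative return like any other bus error: log once, skip the iteration, and let the existing recovery path run.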
|
||||||
|
|
||||||
|
### 4.5 Build-system is a vendor SDK, not a kernel-style driver
|
||||||
|
|
||||||
|
`Makefile:1-50` defaults: `CONFIG_BES2600_TESTMODE ?= y`, `WIFI_BT_COEXIST_EPTA_ENABLE ?= y`, `BES2600_INTEGRATED_MODULE_V1/V2/V3` for *xiaomi R329 wifi module*, *sicun QM215 wifi module*, *bes evb*. 86 `#ifdef CONFIG_BES2600_TESTMODE` sites — testmode is essentially compiled-in dead code in non-test builds.
|
||||||
|
|
||||||
|
The driver was built by Bestechnic to ship per-customer board variants from one source tree. Upstreaming will require ripping that whole apparatus out, replacing with `Kconfig` toggles and platform-data lookups. This is **not** a Bug-#5 dependency, but it is a debt that pollutes every other patch — diff hunks land in `#ifdef`-walled territory and conflict on rebases for unrelated reasons.
|
||||||
|
|
||||||
|
### 4.6 8 `EXPORT_SYMBOL` declarations from a single-binary module
|
||||||
|
|
||||||
|
The driver exports `bes2600_irq_handler`, `bes2600_bh_wakeup`, `bes2600_bh_suspend`, `bes2600_bh_resume`, etc. — for whom? The only known consumer is `bes2600_btuart`, the BT sibling module. Either the BT module needs a coherent shared-driver API surface (refactor target), or these exports should become `static`. Random sibling-module coupling via global symbols is a known kernel anti-pattern.
|
||||||
|
|
||||||
|
### 4.7 No `__must_check` on functions that obviously return errors
|
||||||
|
|
||||||
|
Almost every `bes2600_data_read` / `bes2600_data_write` / `bes2600_reg_read*` call site is wrapped in `WARN_ON()`. That is defensive, but nothing enforces it. A single missed return check (the compiler will not warn) is a silent SDIO-path bug. The annotation costs one keyword per declaration; the benefit is a class of bugs caught at compile time.
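A minimal userspace sketch of what the annotation buys, assuming GCC or Clang. `demo_reg_read()` and `demo_probe()` are hypothetical names for illustration only; the real `bes2600_reg_read*` prototypes differ. In the kernel, `__must_check` expands to the `warn_unused_result` attribute, which is what the fallback definition below mirrors.

```c
#include <assert.h>
#include <stddef.h>

/* The kernel provides __must_check itself; define it here only so the
 * sketch builds in userspace with GCC or Clang. */
#ifndef __must_check
#define __must_check __attribute__((__warn_unused_result__))
#endif

/* Hypothetical register read, not the driver's actual prototype. */
static __must_check int demo_reg_read(unsigned int addr, unsigned int *val)
{
	if (val == NULL)
		return -22; /* -EINVAL */
	*val = addr + 1; /* fake register contents */
	return 0;
}

/*
 * Caller that propagates the error. A bare `demo_reg_read(0x10, id);`
 * statement here would trigger -Wunused-result at compile time.
 */
static int demo_probe(unsigned int *id)
{
	int ret = demo_reg_read(0x10, id);

	if (ret)
		return ret;
	return 0;
}

/* Usage: success fills in the value, failure surfaces the error code. */
static void demo_probe_usage(void)
{
	unsigned int id = 0;

	assert(demo_probe(&id) == 0);
	assert(id == 0x11);
	assert(demo_probe(NULL) == -22);
}
```

The existing `WARN_ON()` wrappers keep working unchanged, since passing the return value to `WARN_ON()` counts as using it.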
### 4.8 `rx_queue` is per-sbus_priv, not per-vif
Multi-vif RX serializes through one `skb_queue` on the sbus side (`bes2600_sdio.c:867` queues to `self->rx_queue`, only dequeued by the single bh thread). For STA-only operation this doesn't matter; for STA+AP concurrent or P2P-multivif it's a structural ceiling on aggregate RX throughput. Out of scope for Bug #5 but worth recording — Markus's "P2P_MULTIVIF=y" Makefile default makes this potentially observable.
## 5. Ordering recommendation for the cleanup roadmap
Given (a) the current Bug-#5 budget, (b) Phase-7 stress-ramp cost per patch, (c) the constraint that the cleanups branch must rebase cleanly on Mobian's `mobian` for re-MR:

| order | patch | scope | phase-7 cost | risk |
|---|---|---|---|---|
| 1 | **Patch C (items 1+2 wrapped)** | hot path: collapse `sdio_rx_work` into bh, batch deliver via `ieee80211_rx_list` | full ramp 1→4→8 MB/s | high: touches RX hot path |
| 2 | **Patch D (item 4)** | `ba_lock` → atomics + cmpxchg-guarded `mod_timer` | minimal: lock-stat delta + 5 min @ 4 MB/s smoke | low |
| 3 | **Patch E (item 5)** | `ps_state_lock` skip when `pm_unsupported` | minimal: same as D | low (gated on c7's existing latch) |
| ∞ | bh.c `#if 0` graveyard removal | pure delete | none: recompile + smoke | zero |
| ∞ | CW12XX → BES2600 rename | mass rename | none, but every open patch conflicts | high churn cost, zero behaviour change |
| **NOT** | Allwinner abstraction layer | wrap `sw_mci_check_r1_ready` | n/a | scope-creep; do only if RK3566 fails on it |
| **NOT** | Vendor-SDK Makefile rewrite | Kconfigify | n/a | upstream-prep work, not Bug-#5 |
| **NOT** | bh_workqueue / sdio_wq merge | structural | n/a | speculation, no measured win |

Patch C is high-risk; merging items 1 and 2 into one patch is the user's call (made: "wrap them together") but should **be reviewed in Phase 5 before the Phase 6 implementation lands**, which is exactly the receipts checklist that CLAUDE.md exists to enforce. Splitting Patch C into item 1 then item 2 is *also* defensible: a Phase 7 regression in the merged patch is harder to bisect than one caught in two sequential runs.
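For the Patch D row in the table, the "atomics + cmpxchg-guarded `mod_timer`" idea can be modelled in userspace with C11 atomics. This is a sketch under assumptions, not the planned patch: `struct ba_stats`, `ba_account_frame()` and `ba_timer_fire()` are hypothetical names, `arm_calls` merely counts where the driver would call `mod_timer()`, and the kernel version would use `atomic_t` and `cmpxchg()` rather than `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the block-ack accounting guarded by ba_lock. */
struct ba_stats {
	atomic_int  ba_cnt;       /* frames counted since the last timer run */
	atomic_bool timer_armed;  /* true while a timer run is pending */
	int         arm_calls;    /* demo counter for mod_timer(); not thread-safe */
};

static void ba_init(struct ba_stats *s)
{
	atomic_init(&s->ba_cnt, 0);
	atomic_init(&s->timer_armed, false);
	s->arm_calls = 0;
}

/* Per-frame path: lock-free count, and only the caller that flips the
 * guard from false to true arms the timer. */
static void ba_account_frame(struct ba_stats *s)
{
	bool expected = false;

	atomic_fetch_add(&s->ba_cnt, 1);
	if (atomic_compare_exchange_strong(&s->timer_armed, &expected, true))
		s->arm_calls++; /* the driver would call mod_timer() here */
}

/* Timer path: clear the guard first so a racing frame re-arms the timer,
 * then drain the counter and return how many frames were seen. */
static int ba_timer_fire(struct ba_stats *s)
{
	atomic_store(&s->timer_armed, false);
	return atomic_exchange(&s->ba_cnt, 0);
}

/* Usage: three frames arm the timer once; the next frame re-arms it. */
static void ba_demo(void)
{
	struct ba_stats s;

	ba_init(&s);
	ba_account_frame(&s);
	ba_account_frame(&s);
	ba_account_frame(&s);
	assert(s.arm_calls == 1);
	assert(ba_timer_fire(&s) == 3);
	ba_account_frame(&s);
	assert(s.arm_calls == 2);
}
```

Clearing the guard before draining the counter means a frame that races with the timer can cause one extra, harmless timer run, but it can never be counted without a timer pending.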
## 6. Things I would explicitly NOT do
- **Don't paint the bikeshed on naming.** The CW12XX → BES2600 rename is a mass substitution across 30+ files that conflict-spams every open topic branch. It is the right fix *for upstreaming*, not for the cleanups branch.
- **Don't refactor the workqueue topology.** Three workqueues is fine. Collapsing to two for cosmetic reasons risks priority inversion under coex pressure.
- **Don't replace the BH thread architecture.** It works, the wait-queue model is well-suited to the IRQ → drain pattern, and replacing it with NAPI or threaded-IRQ would re-do six years of debugging in a single patch.
- **Don't strip the `#ifdef CONFIG_BES2600_TESTMODE` blocks** until upstream-prep. They are vendor-SDK debt but harmless dead code.
- **Don't wrap the Allwinner helper** unless RK3566 actually trips it. That path is only reached on rare errors.
## 7. What I would tell a fresh reviewer in one paragraph
> *This driver is genealogically a CW1200 driver (ST-Ericsson, ~2010) with chip-name search-and-replace done halfway. The hot path was designed for SPI with one-frame-per-IRQ; SDIO multi-block coalescing was bolted on with a worker-queue handoff that adds two synchronization points per frame. Bug #5's RX-throughput regression at 4 MB/s is a direct consequence: at low rate the handoff overhead is invisible; at high rate it dominates. Three small patches (Patches C, D, E) reclaim most of the floor without touching the genealogy. The genealogy itself is technical debt for upstreaming, not a Bug-#5 dependency. Don't conflate the two.*
---
## 8. Disagreements summary
| Sonnet claim | My finding |
|---|---|
| "9 workqueue events per delivered RX frame" | overstated; per IRQ batch is 1 workqueue dispatch on this build. The 5,643/sec ftrace count is system-wide, needs per-pid filtering before claiming as bes2600 dispatch rate. |
| "BES_SDIO_OPTIMIZED_LEN config flag" | hard-baked in Makefile as `-D…` ccflags, not toggleable |
| Item 4 / Item 5 sized as one patch each | concur — separate small patches as Markus directed |
| Item 1 + 2 mergeable | concur — directionally; predicated on `ieee80211_rx_list()` contract (Task #19) |
## 9. Open questions for Markus
1. **Patch C split-or-merge:** user directive is "wrap together". I'd note that a Phase-7 regression in the merged patch is harder to bisect than two sequential Phase-7 runs. Keeping the directive but recording the bisect-cost as known.
2. **`__bes2600_irq_enable(1)` commented out:** is IRQ re-enable intentionally always-on now, or is the `nop` a deletion-in-progress bug? Reading the c-stack history doesn't tell me. Worth a "what was this for" pass before any RX-architecture patch lands.
3. **`sw_mci_check_r1_ready` on RK3566:** should we test or just trust the path is rare-error? My read is: trust + `WARN_ON` if it's ever called, then react.
---
*Written 2026-05-07. Reviewing as Opus 4.7 against Sonnet 4.6's review of the same source tree. Independent reads of: bh.c, bes2600_sdio.c (sdio_rx_work + pipe_read + IRQ handler), txrx.c (RX delivery sites + ba_lock + ps_state_lock sites), bes2600.h (struct lock topology), Makefile (build-system shape). No simulator runs; this is a static-analysis writeup, the dynamic verification of any claim above belongs in Phase 7 of the corresponding patch.*