Sonnet architect review for Bug #5 — ranked restructuring map #7

Merged
marfrit merged 1 commit from claude-noether-5 into main 2026-05-07 16:01:55 +00:00
Collaborator

Architect review delegated to a Sonnet sub-agent, given the Phase 0 measurement context: 5,643 workqueue dispatches/sec against 13/sec wsm_cmd_send, and a ratio of ~9 workqueue events per delivered RX frame.
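A quick cross-check of the ~9 figure. Dividing by the wsm_cmd_send rate gives about 434, so the ratio does not come from that pair. It is consistent with the ba_lock rate in item 4 (611/sec) if one assumes that lock is taken exactly once per delivered RX frame. That one-lock-per-frame assumption is ours, not verified against the source, and the helper names below are hypothetical.

```c
#include <assert.h>

/* Phase 0 rates quoted in this PR. */
#define WQ_DISPATCH_PER_SEC   5643.0  /* workqueue dispatches/sec            */
#define WSM_CMD_SEND_PER_SEC    13.0  /* wsm_cmd_send calls/sec              */
#define BA_LOCK_PER_SEC        611.0  /* ba_lock acquisitions/sec (item 4)   */

/* ASSUMPTION (unverified): ba_lock is taken once per delivered RX frame,
 * so its rate doubles as the delivered-frame rate. */
static double rx_frames_per_sec(void)
{
    return BA_LOCK_PER_SEC;
}

/* Workqueue events per delivered frame under that assumption. */
static double wq_events_per_frame(void)
{
    return WQ_DISPATCH_PER_SEC / rx_frames_per_sec();
}

/* Dispatch rate left after an improvement by `factor` (e.g. item 1's ~5x). */
static double dispatch_after(double factor)
{
    assert(factor > 0.0);
    return WQ_DISPATCH_PER_SEC / factor;
}
```

Under the same assumption, item 1's ~5x would leave roughly 1,130 dispatches/sec (about 1.8 per frame), and the items 1+2 target of ~1 event per frame corresponds to roughly 611/sec.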

Headline

| Rank | Item | Predicted gain | Effort |
|---|---|---|---|
| 1 | Collapse `sdio_rx_work` relay into BH loop | ~5x dispatch reduction | Medium |
| 2 | Batch deliver via `ieee80211_rx_list()` | per-frame softirq removed | Small |
| 3 | Synchronous TX flush from BH | TX-side dispatch noise | Small |
| 4 | Replace `ba_lock` per-frame with atomic / per-cpu | 611 lock/sec removed | Small |
| 5 | Skip `ps_state_lock` when PSM-known-disabled | dead overhead removed | Small |
| 6 | Raise `BH_RX_CONT_LIMIT` after 1-3 land | unlocks residual throughput | Trivial |
| 7 | Consolidate workqueues post 1+3 | cleanup | Small |
| 8 | Firmware block-read size | not bottleneck today | N/A |

Items 1 + 2 together are the structural answer: ~9 workqueue events per delivered frame collapse to ~1, and the per-frame softirq cost disappears. The 9-minute beacon-loss cascade we saw in rep 3 of the Phase 0 anchor is almost certainly starvation of the BH wait-event under the per-frame workqueue storm — item 1 removes the mechanism.
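A toy userspace model of the shape change items 1 and 2 describe. It counts handoffs rather than timing them. It assumes one workqueue hop and one delivery call per frame on the relay path, which understates the ~9 events per frame measured. The struct, the function names, and the `limit` parameter (standing in for a per-pass bound such as `BH_RX_CONT_LIMIT`, assuming that is what it bounds) are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical counters standing in for kernel events. */
struct rx_stats {
    unsigned long work_dispatches; /* queue_work()-style handoffs          */
    unsigned long deliver_calls;   /* calls into the upper-layer deliver   */
    unsigned long frames;          /* frames handed up                     */
};

/* Relay shape: every frame read in the BH is handed to a work item,
 * and that work item delivers the single frame. */
static void relay_path(struct rx_stats *s, size_t nframes)
{
    for (size_t i = 0; i < nframes; i++) {
        s->work_dispatches++; /* BH -> sdio_rx_work handoff      */
        s->deliver_calls++;   /* worker delivers one frame        */
        s->frames++;
    }
}

/* Collapsed shape: the BH loop drains up to `limit` frames per pass into a
 * local list and delivers that list once. No workqueue hop at all. */
static void bh_batch_path(struct rx_stats *s, size_t nframes, size_t limit)
{
    assert(limit > 0);
    while (nframes > 0) {
        size_t batch = nframes < limit ? nframes : limit;
        s->frames += batch;   /* appended to the local list       */
        s->deliver_calls++;   /* one list handoff per pass        */
        nframes -= batch;
    }
}
```

Relay cost scales with frames; batched cost scales with BH passes, which is why the per-pass bound only becomes interesting once the relay is gone.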

Asks

  1. Lock items 1+2 together as the next Phase 4 patch (call it Patch C), or split — item 1 first, then item 2 in a follow-up?
  2. Ship items 4 + 5 (the small lock-removal cleanups) as separate small patches even though they're not the hot path? They're cheap and individually verifiable.
  3. Anything in the source citations that smells like Sonnet got a contract wrong (mac80211 API, SDIO host-lock, BES_SDIO_OPTIMIZED_LEN semantics)? I haven't independently verified the kerneldoc claim about ieee80211_rx_list() from process context.
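For ask 3, the sequence worth checking against the pinned kernel is what mac80211's own wrappers do. `ieee80211_rx_ni()` wraps `ieee80211_rx()` in `local_bh_disable()`/`local_bh_enable()`. In recent kernels, `ieee80211_rx_napi()` calls `ieee80211_rx_list()` under `rcu_read_lock()` and, with no NAPI context, hands the result to `netif_receive_skb_list()`. A process-context batch caller would plausibly mirror that. The model below only encodes that assumed ordering as userspace assertions. Every `model_*` function is a hypothetical stand-in, not the kernel API, and the real preconditions still need confirming from the kerneldoc and `net/mac80211/rx.c`.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-ins: nesting depth of "BH disabled" and "RCU read-side". */
static int bh_disable_depth;
static int rcu_read_depth;
static size_t list_len;  /* frames queued on the local delivery list */
static size_t delivered; /* frames handed to the (modelled) stack    */

static void model_bh_disable(void) { bh_disable_depth++; }
static void model_bh_enable(void)  { assert(bh_disable_depth > 0); bh_disable_depth--; }
static void model_rcu_lock(void)   { rcu_read_depth++; }
static void model_rcu_unlock(void) { assert(rcu_read_depth > 0); rcu_read_depth--; }

/* Stand-in for ieee80211_rx_list(): the asserted preconditions are the
 * ASSUMED contract to verify, not a statement of the real one. */
static void model_rx_list(void)
{
    assert(bh_disable_depth > 0);
    assert(rcu_read_depth > 0);
    list_len++;
}

/* Stand-in for netif_receive_skb_list(): flush the whole list at once. */
static void model_receive_list(void)
{
    delivered += list_len;
    list_len = 0;
}

/* Process-context batch delivery, mirroring rx_ni() + rx_napi(napi == NULL). */
static void deliver_batch(size_t nframes)
{
    model_bh_disable();
    model_rcu_lock();
    for (size_t i = 0; i < nframes; i++)
        model_rx_list();
    model_rcu_unlock();
    model_receive_list();
    model_bh_enable();
}
```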

Where the receipts live

  • Phase 0 anchor + lock-instrumentation: /root/bes2600-samples/run-20260507-1248-patchB/bug5/ on ohm
  • Source pinned in the report
  • Deployed module srcversion: 1B3B3ED096AAD7217FEDE11

🤖 Generated with Claude Code

claude-noether added 1 commit 2026-05-07 15:57:34 +00:00
Sonnet (general-purpose subagent, model=sonnet) reviewed
~/src/besser/bes2600-dkms-mobian/bes2600/ given the Phase 0 measurement
context. Output: 8-item ranked restructuring map, file:line cited.

Headline:
- Item 1: collapse sdio_rx_work relay into BH loop (~5x workqueue
  dispatch reduction, medium effort)
- Item 2: batch deliver via ieee80211_rx_list (small effort, removes
  per-frame softirq)
- Items 1 + 2 together collapse "9 workqueue events per delivered
  frame" to ~1.

Items 3-5 clean up next-layer overhead (TX-side queue_work,
per-frame ba_lock, ps_state_lock under known-dead PSM). Items 6-8
are follow-ons to be re-measured after 1-3 land.
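A sketch of item 4 that assumes `ba_lock` does nothing per frame except guard a pair of RX counters (a frame count and a byte accumulator). That assumption and all names below are ours. If the lock also protects state that must change together, two independent atomics are not a drop-in replacement, and a seqcount or per-CPU counters summed on read (the other option item 4 lists) would be needed. C11 atomics and pthreads stand in for the kernel's `atomic64_t` and concurrent RX paths.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-frame block-ack RX statistics, updated without a lock. */
struct ba_rx_stats {
    atomic_uint_fast64_t frames;
    atomic_uint_fast64_t bytes;
};

/* Hot path: one relaxed add per counter. Relaxed ordering suffices because
 * nothing else is published through these counters. */
static void ba_rx_account(struct ba_rx_stats *s, uint64_t len)
{
    atomic_fetch_add_explicit(&s->frames, 1, memory_order_relaxed);
    atomic_fetch_add_explicit(&s->bytes, len, memory_order_relaxed);
}

struct worker_arg {
    struct ba_rx_stats *s;
    int nframes;
    uint64_t len;
};

static void *rx_worker(void *p)
{
    struct worker_arg *a = p;
    for (int i = 0; i < a->nframes; i++)
        ba_rx_account(a->s, a->len);
    return NULL;
}

/* Run `nthreads` concurrent accounters (max 8); returns 0 on success.
 * Threads that did start are always joined, even if a later create fails. */
static int run_concurrent(struct ba_rx_stats *s, int nthreads, int nframes,
                          uint64_t len)
{
    pthread_t tid[8];
    struct worker_arg arg = { s, nframes, len };
    int created = 0, rc = 0;

    assert(nthreads > 0 && nthreads <= 8);
    for (; created < nthreads; created++) {
        if (pthread_create(&tid[created], NULL, rx_worker, &arg) != 0) {
            rc = -1;
            break;
        }
    }
    for (int i = 0; i < created; i++)
        pthread_join(tid[i], NULL);
    return rc;
}
```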

Phase 4 plan locking the lead candidate(s) follows in a separate PR.
Owner

Lock items 1+2 together as the next Phase 4 patch (call it Patch C), or split — item 1 first, then item 2 in a follow-up? Let us try to wrap them together (1 and 2).
Ship items 4 + 5 (the small lock-removal cleanups) as separate small patches even though they're not the hot path? They're cheap and individually verifiable. Yes.
Anything in the source citations that smells like Sonnet got a contract wrong (mac80211 API, SDIO host-lock, BES_SDIO_OPTIMIZED_LEN semantics)? I haven't independently verified the kerneldoc claim about ieee80211_rx_list() from process context. Please add this as a task. Also, a second opinion as Opus: what do you think about the WiFi part? Create a write-up: BES2600 WiFi structural analysis and code critique.
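For item 5, one common shape is a lock-free check of a published "PSM enabled" flag that falls back to the existing lock only when the flag is set. In this sketch a pthread mutex stands in for `ps_state_lock`. `psm_enabled`, `locked_updates`, and the function names are hypothetical. The writer flips the flag while holding the lock, and the slow path re-checks under the lock. A frame racing with an enable may therefore still take the fast path once, and whether that is acceptable for bes2600's PS bookkeeping needs confirming in the driver.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

struct ps_state {
    pthread_mutex_t lock;         /* stand-in for ps_state_lock           */
    atomic_bool psm_enabled;      /* written under lock, read lock-free   */
    unsigned long locked_updates; /* only touched with the lock held      */
};

/* Writer: change PSM mode with the lock held and publish the flag. */
static void ps_set_enabled(struct ps_state *ps, bool on)
{
    pthread_mutex_lock(&ps->lock);
    atomic_store_explicit(&ps->psm_enabled, on, memory_order_release);
    pthread_mutex_unlock(&ps->lock);
}

/* Per-frame path: skip the lock entirely while PSM is known disabled.
 * Returns true if the locked slow path did its bookkeeping. */
static bool ps_rx_frame(struct ps_state *ps)
{
    if (!atomic_load_explicit(&ps->psm_enabled, memory_order_acquire))
        return false; /* fast path: nothing to track */

    pthread_mutex_lock(&ps->lock);
    /* Re-check under the lock in case PSM was disabled meanwhile. */
    bool ran = atomic_load_explicit(&ps->psm_enabled, memory_order_relaxed);
    if (ran)
        ps->locked_updates++; /* placeholder for the real PS bookkeeping */
    pthread_mutex_unlock(&ps->lock);
    return ran;
}
```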

marfrit merged commit 4344873f2d into main 2026-05-07 16:01:55 +00:00
Reference: marfrit/besser#7