Sonnet (general-purpose subagent, model=sonnet) reviewed
~/src/besser/bes2600-dkms-mobian/bes2600/ given the Phase 0 measurement
context. Output: 8-item ranked restructuring map, file:line cited.
Headline:
- Item 1: collapse sdio_rx_work relay into BH loop (~5x workqueue
  dispatch reduction, medium effort)
- Item 2: batch deliver via ieee80211_rx_list (small effort, removes
  per-frame softirq)
- Items 1 + 2 together collapse "9 workqueue events per delivered
  frame" to ~1.
Items 3-5 clean up next-layer overhead (TX-side queue_work,
per-frame ba_lock, ps_state_lock under known-dead PSM). Items 6-8
are follow-ons to be re-measured after 1-3 land.
Phase 4 plan locking the lead candidate(s) follows in a separate PR.
Follow-up ftrace measurement (post-reboot, 3-min 4MB/s capture):
- workqueue_execute_start: 5,643/sec ← dominates
- wsm_cmd_send: only 13/sec (host-to-chip command path NOT the hotspot)
- lock contention: 50/sec (modest)
The throughput floor is set by per-SDIO-transaction workqueue dispatch
overhead. Surgical patches B5-1/B5-2/B5-3 from the prior Phase 4 plan
all targeted the wrong layer; deferring those until an architectural
restructuring map is produced.
Promoting the Sonnet architect review from "backlog" to
"blocking on Bug #5" — the next step is a restructuring assessment,
not another patch.
Phase 0 anchored at N=3 reps (10min @ 4MB/s pv-cap on 2.4GHz):
- rep1+2: ~700 KB/s sustained (10% of link capacity)
- rep3: link death at ~9 min in (passive mode, beacon-loss cascade)
Hot symbol identified: _raw_spin_unlock_irqrestore at ~20% CPU in both
healthy and failed reps, callstack process_one_work → wsm_configuration
→ wsm_cmd_send → bes2600_bh.isra.0 → spin-unlock.
Phase 1 metric locked: ≥2 MB/s sustained throughput, <10% CPU in lock-
cycling, no link death under 30 min continuous load.
Three Phase 4 candidates drafted (B5-1: shrink wsm_cmd_send lock scope;
B5-2: coalesce vif_list_lock in BH dispatcher; B5-3: SPSC ringbuffer for
WSM commands). Locking pending review.
Phase 7 verification of cleanups + Patch A + Patch B (srcversion
1B3B3ED0) on ohm 2026-05-07 12:48 → 15:13 CEST under netcat load
ramped 1 MB/s → 4 MB/s on 2.4GHz newton.
Patch A: predicted delta CONFIRMED at N=2 reproductions.
- 13:47:56 storm → 1 s reassoc, no AP-deauth-6 escalation
- 13:49:26 storm → 1 s reassoc, no AP-deauth-6 escalation
Patch B: installed, untriggered. 2 api_connection_loss events spaced
91 s apart, never tripping the 3-in-60s threshold. No false positives,
no spurious bus_resets. Recovery delta unobserved (no harm done).
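The 3-in-60s threshold can be pictured with the minimal userspace sketch
below. It is an illustration under stated assumptions, not the Patch B
code: struct loss_window and loss_window_hit() are hypothetical names, and
timestamps are plain seconds rather than jiffies.

```c
#include <stdbool.h>

#define WINDOW_SEC 60   /* window length from the "3-in-60s" threshold */
#define THRESHOLD  3    /* events needed inside the window to trip */

/* Hypothetical tracker: timestamps of recent events, oldest first. */
struct loss_window {
    long stamps[THRESHOLD];
    int count;          /* number of valid entries in stamps[] */
};

/* Record an event at time `now` (seconds); true if the threshold trips. */
static bool loss_window_hit(struct loss_window *w, long now)
{
    int i, kept = 0;

    /* Drop events that have aged out of the window. */
    for (i = 0; i < w->count; i++)
        if (now - w->stamps[i] < WINDOW_SEC)
            w->stamps[kept++] = w->stamps[i];

    /* If the buffer is still full, discard the oldest entry. */
    if (kept == THRESHOLD) {
        for (i = 1; i < kept; i++)
            w->stamps[i - 1] = w->stamps[i];
        kept--;
    }

    w->stamps[kept++] = now;
    w->count = kept;
    return kept >= THRESHOLD;
}
```

With this model, two events 91 s apart never trip: the first has aged out
before the second arrives, which matches the untriggered result above.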
Trigger C: 17-frame AP-deauth-6 cluster at 12:53 with no patch hooks
firing; suspected bes2600 TX-side glitch. Recovery via mac80211 reauth
in ~4 s. New backlog item.
Bug #5 documented separately (RX path degrades under throughput
pressure; possible root of the original Phase-0 YouTube frame drops).
Observed 2026-05-07: bumping the netcat sender from 1 MB/s to 4 MB/s
DECREASED ohm's observed RX rate (1015 KB/s → 563 KB/s) and degraded
the link (signal -57 → -67 dBm, MCS 4 → 3). Chip can't sustain near-
link-rate RX even though theoretical capacity is ~8 MB/s.
Hypothesis: driver/firmware lock contention or busy-wait on the RX
SDIO path. This plausibly explains the original Phase-0 observation that
YouTube DASH chunks drop ~10 frames per chunk fetch: a chunk fetch is
a brief near-line-rate burst, which is exactly the load that would
trigger this bug.
Drafts Patch A (decrypt-storm fast-recover, Trigger B) at txrx.c:1696
with sliding-window threshold + ieee80211_connection_loss reassoc.
Patch B (beacon-loss / Trigger A) parked behind one more diagnostic
rep with 10s snap-loop cadence on the beacon-loss counter.
Folds reviewer feedback from PR #3 + the new Trigger-A finding
(post-resume P1 = api_connection_loss-driven, two reps captured today
at 17:23 and 18:03) into a revised Phase 1 metric counting three
event classes.
Pending Phase 5 second-model review of the plan before Phase 6
implementation.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Suppresses the WSM-scan WARN cascade in bes2600_scan_work +
wsm_handle_rx by (a) pre-checking BT A2DP coex state and
backing off after N consecutive firmware rejections, and
(b) demoting the WARN() in wsm_generic_confirm to bes_devel
(the upstream-caller's wiphy_warn with request-id is kept).
Deployed + verified on ohm (srcversion A5C8146A…): WARN splat
count 0 in 1 min after boot (pre-patch: 32 in 25 min). WiFi
immediately roamed from ch1 (signal -47) to ch48 (5240 MHz) on newton
because scan completions now land cleanly with mac80211.
Other counters still 0. Net: +83/-2 lines across 3 files.
Standalone single-patch series in both Mobian-paths and
drivers/staging/bes2600/ paths variants, checkpatch --strict
clean.
Salsa-CI's lintian stage flagged the pre-existing boilerplate
paragraph in debian/copyright as 'old-fsf-address-in-copyright-
file' when cleanups first hit CI. Replace the '51 Franklin
Street, Fifth Floor, Boston, MA 02110-1301 USA' literal with a
'https://www.gnu.org/licenses/' reference; the
/usr/share/common-licenses/LGPL-2.1 reference a few lines later
is unchanged, so license-text location is still covered.
Pushed to salsa and gitea as commit f31c57a on branch cleanups
and as standalone topic debian/copyright-fsf-address on gitea.
Demote 'wait pm ind timeout' from bes_err() to bes_devel() in
bes2600_pwr_enter_lp_mode(). The cascade this used to warn about
is already suppressed by c2 (pm-gate-on-handshake); the remaining
log line is benign steady-state noise (3-9 events per 10-min
uptime on PineTab2). Deployed + verified on ohm (srcversion
ED89A26…): err-priority count 0, WiFi associated, no
regression. 1-line patch.
Completes the filp_open/kernel_read/kernel_write removal pass
across the driver. Deletes bes_fw.c DATA_DUMP_OBSERVE blocks
(4 #ifdefs gated on a commented-out #define, dead by default;
would fail to build on modern kernels due to removed
get_fs/set_fs) and main.c's orphan access_file() helper
(no callers in-tree, also relies on get_fs/set_fs).
With c1.2 + c1.3 + c1.4 combined: zero filp_open /
kernel_read / kernel_write / vfs_read / vfs_write references
anywhere in the driver. This meets the precondition for a
linux-wireless RFC for drivers/staging/bes2600/.
Deployed + verified on ohm (srcversion 12BAFB9C…): WiFi
associated, no KFENCE / sdio_tx_work / RX failure / PS Mode
Error / factory cali data get failed. Net: -69 lines.
Follow-up to staging-prep-series + factory-drop-kernel-write:
remove the BES2600_WRITE_DPD_TO_FILE-gated filp_open / kernel_read /
kernel_write DPD file paths from bes_chardev.c, including the three
functions, the no_dpd module_param, and the associated Makefile
knob + three PATH macros (BES2600_DPD_PATH, BES2600_DEFAULT_DPD_PATH,
BES2600_DPD_GOLDEN_PATH).
Deployed + verified on ohm (srcversion C1283A24…): WiFi associated,
KFENCE / sdio_tx_work splat / RX failure / PS Mode Error / factory
cali data get failed = 0, no DPD regression. Net: -155 lines.
Standalone single-patch series in both Mobian-paths and
drivers/staging/bes2600/ paths variants, checkpatch --strict clean.
Follow-up to staging-prep-series: remove kernel_write() +
filp_open(O_CREAT) from factory_section_write_file() in
bes2600_factory.c. Serialised blob now lives only in-memory +
file_buffer for the session; no longer persists to
/lib/firmware/bes2600/bes2600_factory.txt across reboots.
Deployed + verified on ohm (srcversion BD0D1AEC...): WiFi
associated, KFENCE / sdio_tx_work splat / RX failure / PS Mode
Error / factory cali data get failed all 0. Standalone single-
patch series in both Mobian-paths and drivers/staging/bes2600/
paths variants, checkpatch --strict clean.
Fold bes2600/tx-sdio-dma-oob into the linear series as 7/7.
Regenerate the cover letter and update the testing matrix. Update the
UPSTREAM.md table and submission route to list the 7th branch. The PS
Mode Error residual note is removed from the known-limitations section --
it stopped recurring after 7/7 deployed.
Both staging-prep-series/ (Mobian paths) and staging-prep-series-
danctnix/ (drivers/staging/bes2600/ paths) variants regenerated;
all 14 patch files checkpatch --strict clean (0/0/0).
Net: +117 / -550 lines across 9 files.
Diagnosed via KFENCE splat on ohm (6.19.10-danctnix1, PineTab2):
sdio_tx_work passes block-size-aligned length to dma_map_sg while
the underlying skb linear head is only tx_buffer->len long. Fix:
driver-owned DMA bounce page, memcpy + zero-pad per TX buffer.
Both Mobian paths and drivers/staging/bes2600/ paths variants,
checkpatch-clean.
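The memcpy + zero-pad step can be sketched in userspace as follows. This
is a rough illustration only: bounce_fill(), round_up_block() and
BOUNCE_SIZE are hypothetical names, the one-page size is an assumption,
and the real fix additionally has to DMA-map the bounce page, which is
not modelled here.

```c
#include <stddef.h>
#include <string.h>

#define BOUNCE_SIZE 4096   /* assumption: bounce buffer is one 4 KiB page */

/* Round len up to the next multiple of block_size (block_size > 0). */
static size_t round_up_block(size_t len, size_t block_size)
{
    return (len + block_size - 1) / block_size * block_size;
}

/*
 * Copy `len` payload bytes into the bounce buffer and zero-fill the tail
 * up to the block-aligned length, so the transfer never reads past the
 * end of the original payload. Returns the aligned length to hand to the
 * transfer, or 0 if the aligned length does not fit the bounce buffer.
 */
static size_t bounce_fill(unsigned char *bounce, const unsigned char *payload,
                          size_t len, size_t block_size)
{
    size_t aligned = round_up_block(len, block_size);

    if (aligned > BOUNCE_SIZE)
        return 0;

    memcpy(bounce, payload, len);
    memset(bounce + len, 0, aligned - len);
    return aligned;
}
```

The point of the design is that the aligned length is only ever applied
to memory the driver owns, never to the shorter skb linear head.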
The danctnix-series cover letter had an invalid 41-hex-char
placeholder SHA on its 'From' header line (40 zeros with an
f1d22ab0 prefix) -- a relic of manual generation. Replaced with
the valid 40-char SHA already used by the Mobian-series cover
letter. The value is cosmetic (cover letters aren't real commits)
but an invalid SHA will confuse 'git am'.
Em-dashes (U+2014), en-dashes (U+2013), and arrow glyphs (U+2192,
U+2194) leaked into commit messages and the cover letter. Linux
kernel patches are expected to be plain ASCII. Substitute:
  --   for em-dash
  -    for en-dash
  ->   for rightward arrow
  <->  for left-right arrow
Applied to both staging-prep-series/ (Mobian paths) and
staging-prep-series-danctnix/ (drivers/staging/bes2600/ paths).
checkpatch.pl --no-tree --strict: 0/0/0 on all 12 patches.
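The substitution table above can be expressed as a small byte-level
filter. The sketch below is illustrative and not the tool actually used
for the series: asciify() and the table layout are hypothetical, and the
input is assumed to be UTF-8.

```c
#include <stddef.h>
#include <string.h>

struct subst {
    const char *utf8;    /* 3-byte UTF-8 encoding of the glyph */
    const char *ascii;   /* plain-ASCII replacement */
};

static const struct subst subst_table[] = {
    { "\xE2\x80\x94", "--"  },   /* U+2014 em-dash */
    { "\xE2\x80\x93", "-"   },   /* U+2013 en-dash */
    { "\xE2\x86\x92", "->"  },   /* U+2192 rightward arrow */
    { "\xE2\x86\x94", "<->" },   /* U+2194 left-right arrow */
};

/*
 * Copy `in` to `out` (capacity `cap`, including the NUL terminator),
 * replacing the four glyphs above. Returns 0 on success, -1 if `out`
 * is too small. Other bytes are copied through unchanged.
 */
static int asciify(const char *in, char *out, size_t cap)
{
    size_t o = 0;

    if (cap == 0)
        return -1;

    while (*in) {
        const char *rep = NULL;
        size_t i, n, adv = 1;

        for (i = 0; i < sizeof(subst_table) / sizeof(subst_table[0]); i++) {
            if (strncmp(in, subst_table[i].utf8, 3) == 0) {
                rep = subst_table[i].ascii;
                adv = 3;
                break;
            }
        }

        n = rep ? strlen(rep) : 1;
        if (o + n >= cap)
            return -1;
        memcpy(out + o, rep ? rep : in, n);
        o += n;
        in += adv;
    }

    out[o] = '\0';
    return 0;
}
```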
Danctnix is the de-facto upstream for PineTab2 Arch Linux ARM images and
carries the bes2600 driver in-tree at drivers/staging/bes2600/ in
codeberg.org/DanctNIX/linux-pinetab2 (tag v6.19.10-danctnix1).
Same 6 commits as the Mobian series (staging-prep-series/), regenerated
with paths rooted at drivers/staging/bes2600/ so 'git am' applies
cleanly onto a fresh v6.19.10-danctnix1 clone with no path mangling.
Per-patch content is byte-identical to the Mobian series; the
commit-message bodies are preserved. checkpatch.pl --no-tree --strict
passes for all six.
UPSTREAM.md extended with a 'Near-term alt: danctnix linux-pinetab2
(codeberg)' section covering the submission route (codeberg fork +
PR, or Danct12-direct).
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
UPSTREAM.md captures the submission-ready state:
- patch-by-patch intent and testing status
- CW1200 lineage narrative
- branch-to-patch mapping on marfrit/bes2600-dkms
- submission routes (near-term Mobian MR, longer-term linux-wireless
  RFC for drivers/staging/)
- known limitations left for follow-up
- recommended CC list for a future linux-wireless RFC
patches/staging-prep-series/ contains the linear 6-patch series with
cover letter, generated from the bes2600/staging-prep-series branch on
marfrit/bes2600-dkms (cherry-picked off mobian in dependency order).
All patches checkpatch.pl --no-tree --strict clean.
Branch mapping:
  1/6  bes2600/factory-request-firmware       (c1)
  2/6  bes2600/factory-no-efuse-flag          (c5, stacked on c1)
  3/6  bes2600/factory-thread-dev             (c1.1, stacked on c1+c5)
  4/6  bes2600/pm-gate-on-handshake           (c2, standalone)
  5/6  bes2600/remove-chardev-user-interface  (c3, standalone)
  6/6  bes2600/enable-testmode                (c4, standalone)
Total: 79 insertions, 549 deletions. Net -470 lines.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
Fifth and (for now) final patch in the BESser staging-prep series.
  enable-testmode/
    0001-bes2600-enable-CONFIG_BES2600_TESTMODE-by-default-fi.patch
Flips the CONFIG_BES2600_TESTMODE Makefile default from n to y, which
exposes the mac80211 testmode_cmd surface (-> firmware
patch_wifi_testMode) through the standard nl80211 testmode interface.
This is the replacement path for the /dev/bes2600 signal/no-signal
switching that the preceding c3 patch removed.
Enabling the flag exposes accumulated bit-rot in the testmode code:
~41 calls to bes2600_info/err/warn/dbg/err_with_cond which have no
corresponding #define anywhere in-tree, plus 3 functions
(bes2600_start_stop_tsm, bes2600_get_tsm_params, bes2600_get_roam_delay)
with external linkage but no prototypes. Both classes of error are
fixed in the same commit:
- Add shim macros to bes_log.h rewiring bes2600_info() etc to the
  existing bes_info() / bes_err() / bes_warn() / bes_devel() family,
  ignoring the legacy BES2600_DBG_* subsystem-id first argument.
- Define BES2600_DBG_SBUS / _DOWNLOAD / _ITP / _TEST_MODE as 0
  constants for documentation and grep-ability.
- Mark the 3 TSM / roam-delay helpers static (they are only called
  from bes2600_testmode_cmd in the same file).
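For readers unfamiliar with the pattern, the shim approach can be sketched
roughly as below. This is not the bes_log.h content: the bes_info() and
bes_err() stand-ins here just format into a buffer so the sketch builds in
userspace, the exact macro shapes are assumptions, and `##__VA_ARGS__` is
the GNU extension commonly used in kernel code.

```c
#include <stdio.h>

/* Userspace stand-in for the log sink; the driver logs to the kernel log. */
static char last_log[128];

/* Stand-ins for the existing logging family (assumed printf-style). */
#define bes_info(fmt, ...) \
    snprintf(last_log, sizeof(last_log), "info: " fmt, ##__VA_ARGS__)
#define bes_err(fmt, ...) \
    snprintf(last_log, sizeof(last_log), "err: " fmt, ##__VA_ARGS__)

/* Legacy subsystem ids kept as 0 constants so call sites still read well. */
#define BES2600_DBG_SBUS      0
#define BES2600_DBG_TEST_MODE 0

/* Shims: accept the legacy subsystem-id first argument and discard it. */
#define bes2600_info(module, fmt, ...) bes_info(fmt, ##__VA_ARGS__)
#define bes2600_err(module, fmt, ...)  bes_err(fmt, ##__VA_ARGS__)
```

A legacy call such as bes2600_info(BES2600_DBG_SBUS, "block %d", 7) then
expands to bes_info("block %d", 7), so call sites need no edits.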
Verified on PineTab2 (BES2600WM + RK3566) running linux-pinetab2
6.19.10-danctnix1-1 + CONFIG_NL80211_TESTMODE=y:
- Module builds cleanly
- 'iw phy0' lists 'testmode' under Supported commands
- wifi stays associated post-reboot; bug #2 (PM handshake timeout)
  and bug #3 (SDIO TX splat) counts remain 0 across the c1+c5+c2+c3+c4
  stack
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
patches/ now mirrors the topic-branch structure on
marfrit/bes2600-dkms:
  factory-series/ = bes2600/factory-no-efuse-flag (c1 + c5 stacked)
    0001-*-request_firmware-*.patch = c1 (request_firmware() read path)
    0002-*-STANDARD_FACTORY_EFUSE_FLAG-*.patch = c5 (default flag off)
  pm-gate-on-handshake/ = bes2600/pm-gate-on-handshake (c2 standalone off mobian)
    0001-*-gate-device-LP-mode-entry-*.patch = c2 (gate + ETIMEDOUT)
c1 + c5 are stacked because c5's fix depends on c1's rewrite of
factory_section_read_file() having been applied first (otherwise the
parse-fail error is masked).
c2 is standalone because bes_pwr.c is orthogonal to the factory/
request_firmware work; it can be submitted independently.
All three patches verified on PineTab2 (BES2600WM + RK3566) running
linux-pinetab2 6.19.10-danctnix1-1. Bug #1 (factory.txt read path),
bug #1.5 (parse fail), bug #2 (PM handshake timeout spam) all resolved
by this series. Bug #3 (SDIO TX WARN) is reduced to a single boot-time
event that does not cascade.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>
Regenerates patches/ as a proper series off marfrit/bes2600-dkms:mobian:
  0001-bes2600-use-request_firmware-for-factory.txt-read.patch (c1)
  0002-bes2600-default-STANDARD_FACTORY_EFUSE_FLAG-off-for-.patch (c5)
c5 covers two changes in a single commit:
- Makefile: STANDARD_FACTORY_EFUSE_FLAG default flip from y to n
  (the PineTab2 shipped factory.txt has no ##select_efuse_flag
  section, so the driver was expecting 31 sscanf fields and failing
  on the 30-field file).
- wsm.h: drop the #if defined(STANDARD_FACTORY_EFUSE_FLAG) guard
  around the wsm_save_factory_txt_to_mcu() prototype. The function
  definition in wsm.c and the call site in sta.c were always ungated,
  so with the new flag default gcc -Werror=missing-prototypes would
  otherwise break the build.
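The field-count failure mode can be reproduced in miniature. The toy
parser below uses 3 vs 2 fields instead of the driver's 31 vs 30; the
format strings and parse_fields() are hypothetical and do not mirror the
real factory.txt layout.

```c
#include <stdio.h>

/*
 * Toy parser: when expect_flag is set it demands one extra trailing
 * field, mirroring a build with the efuse-flag option enabled.
 * Returns 0 if every expected field converted, -1 otherwise.
 */
static int parse_fields(const char *txt, int expect_flag,
                        int *a, int *b, int *flag)
{
    int want = expect_flag ? 3 : 2;
    int got;

    if (expect_flag)
        got = sscanf(txt, "a=%d b=%d flag=%d", a, b, flag);
    else
        got = sscanf(txt, "a=%d b=%d", a, b);

    /* sscanf returns the number of fields it converted. */
    return got == want ? 0 : -1;
}
```

Parsing "a=1 b=2" with expect_flag set converts only two fields and is
rejected; the same text parses cleanly with expect_flag cleared, which is
the effect of flipping the Makefile default.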
Both patches verified on PineTab2 (BES2600WM + RK3566) running
linux-pinetab2 6.19.10-danctnix1-1: post-reboot dmesg no longer shows
the factory-read or factory-parse error sequence.
Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>