Commit Graph

65 Commits

Author SHA1 Message Date
claude-noether 122582e270 danctnix-besser: pkgrel=3 — refine Patch I, add SCS-off + GCC15 workaround
Three things bundled because they were verified together in the same
deploy cycle on ohm (kernel built fresh on boltzmann 2026-05-18):

1. 0002 (Patch I) refined: refuse only multi-channel 5 GHz scans
   (n_channels > 1).  Original Patch I refused everything, which
   blocked NM's per-frequency BSS discovery and made 5 GHz association
   impossible.  Tighter guard preserves the storm fix and unblocks
   5 GHz attachment via NM 802-11-wireless.band=a profiles.

   Verified on ohm with pkgrel=3: associated to BSSID
   c0:25:06:e6:5b:33 on 5240 MHz (ch.48), TX 150 Mbit/s MCS 7
   HT40 short-GI vs 72.2 Mbit/s on 2.4 GHz.  Pattern A still 0.

   Source-of-truth: marfrit/bes2600-dkms branch bes2600/scan-filter-5ghz
   commits 093a503 + 8cd10f4 (squashed into this single 0002 file).

2. 0003 (new): arm64 xor-neon Makefile workaround for GCC 15.2.1
   strict pragma validator vs arm_neon.h target() blocks losing
   -ffixed-x18 under SCS=y.  This is a defensive workaround;
   currently dead-coded (SCS=n below) but in place for the day SCS
   re-enable becomes possible (tracked in besser#20).

3. config: CONFIG_SHADOW_CALL_STACK=n override for the current GCC
   15.2.1 toolchain issue.  Restore to =y once GCC upstream fixes
   the arm_neon.h pragma interaction (besser#20).

pkgrel bumped 2 -> 3.

Refs: besser#1 (closed), besser#20, kernel-agent#25 (PR mirroring
this into the kernel-agent patch tree — needs follow-up to pick
up the refinement).
2026-05-18 15:57:05 +02:00
claude-noether ae175f9745 danctnix-besser: ship patch 0002 — filter 5 GHz scans at driver boundary
Adds 0002-bes2600-filter-5ghz-scan.patch on top of the existing
cumulative series, addressing besser issue #1 (recurring
wsm_generic_confirm 0x0007 / [SCAN] Scan failed (-22) pattern).

The fix refuses 5 GHz hw_scan iterations in bes2600_hw_scan; the
firmware-reject cascade for the 5 GHz leg of mac80211's per-band
hw_scan loop is short-circuited.  Source-of-truth commit lives on
marfrit/bes2600-dkms branch bes2600/scan-filter-5ghz (sha 093a503).

Predicted Phase 7 delta: Pattern A rate 14/h -> 0/h. See besser#1
comment 1171 for the full Phase 0-4 analysis and Phase 5 review.

pkgrel bumped to 2.
2026-05-18 11:28:33 +02:00
claude-noether 693e9b42aa danctnix-besser README: install/verify/rollback + per-patch source link
Two readiness gaps surfaced after the end-to-end install verification on
ohm 2026-05-08:

(1) The "Building" section was a one-liner ("makepkg -s ... pacman -U
    ... reboot") with no actual install commands.  Replaced with proper
    Building / Installing / Verifying / Rolling back sections, using
    the exact commands that worked end-to-end on ohm:

    - sudo pacman -U <pkg.tar.zst>
    - The new conflicts/provides metadata means no --overwrite needed
    - PineTab2 U-Boot script update via /boot/boot.txt + mkscr
    - Off-device backup (boot.scr.pre-besser) for trivial rollback
    - Post-reboot checks: uname -r, lsmod, /sys/module/bes2600/srcversion

(2) The "What's in the patchset" table listed Patch G / Patch B / etc.
    without linking to the actual commits.  Added a preamble pointer to
    the cleanups branch on marfrit/bes2600-dkms gitea, which is the
    source-of-truth for individual commits + Phase-7 verification logs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 17:31:35 +02:00
claude-noether 0f783a1e69 danctnix-besser PKGBUILD polish: drop-in replacement metadata + DTB strip fix
(1) Add `provides=("linux-pinetab2=$pkgver-$pkgrel")` and
    `conflicts=(linux-pinetab2)` so pacman -U cleanly replaces the
    upstream linux-pinetab2 package without needing --overwrite for the
    shared rk3566-pinetab2-*.dtb files.

    Verified end-to-end on ohm 2026-05-08: with these declarations
    pacman would refuse coexistence (matching the actual filesystem
    reality - both packages own the same DTB paths) and accept upgrade
    when removing the old package.

    Keeping `replaces=(wireguard-arch)` from upstream linux-pinetab2.
    Not adding linux-pinetab2 to replaces= since the soft-upstream
    intent is opt-in sidegrade, not auto-install on -Syu.

(2) Replace the bash for-loop DTB strip with find -delete.

    The original loop silently no-op'd during the makepkg-fakeroot
    package() phase: build verification of the published .pkg.tar.zst
    showed 236 DTBs, 234 of them unrelated boards (px30-*, rk3308-*,
    rk3328-*, rk3399-*, etc).  Root cause not pinned down (suspected
    nullglob or cwd interaction), but find -mindepth 1 -maxdepth 1
    ! -name 'rk3566-pinetab2-*' -delete is robust to that environment
    and correctly identifies 2 to keep / 234 to remove on the existing
    pkgdir.

    Net pkg size impact: ~5 MB reduction (most non-pinetab2 DTBs are
    20-40 KB).

No kernel rebuild required - PKGBUILD-only metadata + package() logic
change.  Will take effect on the next makepkg run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 15:08:16 +02:00
claude-noether 843d40231f danctnix-besser: regen cumulative patch with bes_chardev.{c,h} merge fix
Build (PID 558898 on boltzmann) failed at bes2600_btuart.c:81:
  error: implicit declaration of function 'bes2600_chrdev_switch_subsys_glb'

Root cause: the original danctnix-flavor adaptation overlaid Mobian's
heavily-trimmed bes_chardev.{c,h} on top of pristine danctnix.  Mobian's
flavor (694 lines) had stripped out the BT/WiFi subsystem-switch
orchestration that pristine danctnix (1387 lines) carries and that
danctnix-only bes2600_btuart.c calls.

Fix: restore pristine danctnix bes_chardev.{c,h} as the baseline for
those two files in the danctnix flavor, then reapply Mobian's
campaign-relevant changes:
  - Patch G: SPDX-License-Identifier header + corrected attribution
  - Patch B: bes2600_chrdev_do_bus_reset + _trigger_bus_reset
    (definitions in bes_chardev.c, declarations in bes_chardev.h,
    EXPORT_SYMBOL_GPL on _trigger_bus_reset since it is called from
    sta.c connection-loss-storm fast-recover path)

Phase 6 thread-safety contract: bus_reset functions read
bes2600_cdev.{sbus_ops,sbus_priv} without locking, identical to the
Mobian-flavor source-of-truth - acceptable given the bus_reset is
invoked from already-serialized higher-level error paths in sta.c.

48 files unchanged in count, +1412/-1243 (was +1426/-2003).  The
delta vs the previous patch is concentrated in bes_chardev.{c,h}:
+776/-16 in .c (restoring the BT/WiFi switching infrastructure plus
appending Patch B), +2/-2 in .h (declarations + SPDX).

Patch verified to apply cleanly to v7.0-danctnix1 baseline.
b2sum updated in PKGBUILD.

Build retrigger pending on his.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 13:59:04 +02:00
claude-noether 6ab61b9a06 danctnix-besser-pkgbuild: linux-pinetab2-danctnix-besser PKGBUILD + cumulative bes2600 patch
Soft-upstream candidate for DanctNIX.  Drop-in replacement for
linux-pinetab2 carrying the BESser bes2600 staging-driver patchset
(16 squashed commits from marfrit/bes2600-dkms cleanups branch,
adapted to danctnix-flavor).

Layout:
  README.md                                                — overview
  kernel/PKGBUILD                                           — patched fork of pine64/linux-pinetab2/PKGBUILD
  kernel/config                                             — danctnix kernel config (unchanged)
  kernel/0001-bes2600-besser-cumulative-series.patch        — 172 KB cumulative diff

Net diff vs danctnix v7.0-danctnix1: 48 files, +1426 / -2003 in
drivers/staging/bes2600/.

Squashed series:
  c5.1, c5.1.1, c5.2, c6.1, c6.2, c7, c5.2.1   (c-stack: scan-defer,
                                                PM-state-resync,
                                                firmware-PSM-skip,
                                                multi-func SDIO rescan)
  Patch A (decrypt-storm fast-recover)
  Patch B (connection-loss bus_reset)
  Patch F (cw1200 mainline backports)
  Patch C v3 (drop sdio_rx_work relay)
  Patch G (SPDX + ST-Ericsson attribution)
  Patch D (ba_lock atomicization)
  Patch E (ps_state_lock skip)
  Patch C2 (ieee80211_rx_irqsafe -> ieee80211_rx_ni)
  Patch H (bh.c hygiene cleanup)

Phase 7 on Mobian DKMS: +67% throughput vs Patch B baseline; race-fix
verified under stress.  Danctnix-flavor build verification deferred
to PKGBUILD CI.

See danctnix-besser-pkgbuild/README.md for full provenance.
2026-05-08 10:11:32 +02:00
marfrit 216c7c59b1 notes: Patch C2 Phase 7 — N=3 ramp, no measurable throughput delta (#15) 2026-05-08 05:46:06 +00:00
claude-noether 02d3f4b222 notes: Patch C2 Phase 7 — N=3 ramp, no measurable throughput delta
| rep | uptime | MB/s |
|----:|-------:|-----:|
|   1 |   544s | 2.289|
|   2 |   716s | 2.165|
|   3 |   750s | 2.376|

N=3 mean: 2.277 MB/s.  vs Patch C v3 N=3 (2.352 MB/s): -3% (within
rep variance).  vs Patch B baseline (1.362 MB/s): +67%.

C2 was predicted in §4.5 of the Phase 4 plan as a possible
"<2% delta" outcome -> "ship for upstream-cleanliness anyway".
Observed -3% -> within noise -> ship.  The tasklet hop in
ieee80211_rx_irqsafe was apparently cheap on this kernel.

Phase 8 lesson: _irqsafe -> _rx_ni is a CORRECTNESS / kernel.org-
submission move, not a performance optimization.  Don't oversell
predicted throughput deltas without prior measurement.

Patch C v3 architectural win remains the durable +73%; D / E / C2 /
F / G are smaller cleanups that don't compound visibly above noise.

Throughput ceiling on this hardware: ~2.4 MB/s sustained @ 4 MB/s
sender, fresh chip.  Further improvement needs firmware-side fixes
(wsm_generic_confirm 0x0007 path), not driver-side.
2026-05-08 07:43:33 +02:00
marfrit 3d63ec0a35 notes: Patch C2 Phase 4 plan — ieee80211_rx_irqsafe → ieee80211_rx_list (#14) 2026-05-07 22:46:56 +00:00
claude-noether 722434414a notes: Patch C2 Phase 4 plan — ieee80211_rx_irqsafe to ieee80211_rx_list
After Patch C v3 / D / E / F / G all merged, the remaining cleanup
target is the per-RX-frame tasklet defer that ieee80211_rx_irqsafe
introduces.  Patch C2 migrates all 6 call sites in bes2600 to
ieee80211_rx_list, the process-context API verified per the
kerneldoc audit (Task #19, mainline include/net/mac80211.h:5324-5345).

Key constraints from kerneldoc:
  - cannot mix _list and _irqsafe for the same hardware
    (=> all 6 sites convert atomically)
  - requires local_bh_disable + rcu_read_lock wrap
  - calls must be synchronized for a single hardware
    (=> bh-thread-as-sole-RX-context post-v3 satisfies trivially)

Plan §4.2 design decision: per-batch wrap (Option B), wrapping
bes2600_sdio_read_rx_batch outer loop, rather than per-call wrap.
Captures the actual batch benefit.

Open questions for the Phase 5 reviewer:

  1. rx_list draining semantics — does mainline expect explicit
     netif_receive_skb_list at end-of-batch, or does mac80211
     internal-deliver?  Need to verify by reading mt76 / iwl_pcie
     usage before Phase 6 lands.
  2. beacon path (wsm.c:2415) SKB ownership — hw_priv->beacon is
     long-lived; after _rx_list consumes it, the field would be
     dangling.  Audit before Phase 6.

Predicted throughput delta: +5-15% over v3 N=3 baseline (2.352 MB/s),
medium confidence.  Smaller-than-expected delta = "marginal but no
regression, ship for upstream-cleanliness".

Phase 7 N=3 ramp uses wired enu1 path + per-rep fresh nc listener
per the rig-failure-is-finding lesson.
2026-05-08 00:42:50 +02:00
marfrit fc88ff41c3 notes: Bug #5 RX-degradation campaign — Phase 0 plan (#13) 2026-05-07 21:52:14 +00:00
marfrit fde41fcdd4 notes: Patch C v3 Phase 7 N=3 — +73% throughput, race fix verified (#12) 2026-05-07 21:51:28 +00:00
claude-noether 6bae531917 notes: Bug #5 RX-degradation campaign — Phase 0 plan + research question
After Patch C v3 closed (PR #5 merged, Phase 7 N=3 verified at +73%
throughput vs Patch B baseline), the post-13-min RX-degradation
pattern remains.  Reproduces on Patch B, F, and v3 alike — independent
of the relay/race issues v3 addressed.  Side-effect that was masked
by the throughput floor while v2's race was the dominant variable.

Research question (locked):

  Why does the bes2600 RX path collapse from ~2 MB/s sustained @
  fresh-chip uptime to ~180 B/s @ ~28-min uptime, with periodic
  wsm_generic_confirm failed for request 0x0007 + ieee80211 phy0:
  [SCAN] Scan failed (-22) every 300 s in the intervening window?

Phase 0 protocol:

  - long-capture rig armed on ohm at uptime 0 (fresh boot 23:13 CEST)
  - ftrace events: workqueue, mac80211, cfg80211, mmc, sdhci, power
  - iw event (cfg80211 reason codes), dmesg follow, per-30s netdev
    counter snap, 5 stress probes at T+5/10/15/20/25 min

Phase 0 will:

  - re-anchor the predecessor data via the long capture (in-session
    N=1; re-run if anomalous)
  - characterize state transitions (first scan-fail, first throughput
    drop) via cfg80211/mac80211 ftrace + iw event correlation
  - feed Phase 1 metric formulation

Mechanism candidates (Phase 4 will discriminate):

  1. Firmware-side resource exhaustion (per-scan accumulator)
  2. NetworkManager scan-fail recovery loop competing with data
  3. AP-side rate limiting / fairness probation
  4. PSM state machine deadlock (c7 latch stale)
  5. SDIO bus retune interaction
  6. Power-management busy-event accumulator leak

Out of scope: Patch C2/D/E, higher-rate ramp, reproducing on different
APs.  Independent campaign from Patch C closure.
2026-05-07 23:23:31 +02:00
claude-noether 3a38286e6f notes: Patch C v3 Phase 7 N=3 results — +73% throughput, race fix verified
N=3 stress reps on ohm with v3 module (srcversion 371C6606B73AF19299228CA),
3 min @ 4 MB/s each, all within fresh-chip uptime window (200/391/582 s).

| rep | MB/s | sdio_rx_work | bh_work redispatches |
|----:|----:|-:|-:|
|  1  | 2.363 | 0 | 0 |
|  2  | 2.590 | 0 | 0 |
|  3  | 2.102 | 0 | 0 |

N=3 mean: 2.352 MB/s · median 2.363 MB/s · min 2.102 MB/s.

vs Patch B baseline (1.362 MB/s, run-20260507-patchC-preflight): +73%.
vs original Bug #5 floor (75 KB/s rep 3 death): 28× improvement.

Plan §4.5 prediction verified:
  - sdio_rx_work dispatch rate: 86.4/s -> 0/s (function deleted)
  - bes2600_bh_work redispatches: 0 (preserved invariant)
  - observed receive @ 4 MB/s: floor lifts toward >= 1 MB/s (exceeded —
    floor is 2.10 MB/s)

Bonus finding: sdio_tx_work dispatch rate dropped from 276.1/s to
0.8/s.  The post-tx queue_work(rx_work) call I rewired to
self->irq_handler() was actually firing more often than predicted;
folding it into bh-wake-up cuts ~99.7% of the workqueue dispatches.

No WARN/BUG/oops on any rep — the v2 race that wedged Patch C v1
within 13 s under stress did NOT reproduce on v3.

Phase 8 lesson distilled as feedback_mine_upstream_ancestor memory:
when patching a fork-from-upstream driver, mine the ancestor's
fix history BEFORE writing fixes from scratch.  cw1200 mining
drove the structural pivot from v2's atomic_t wrapper to v3's
no-relay architecture.  Without the mine, we'd have shipped v2.

Phase 7 receipts checklist met (N=3, fresh-chip, identical
instrumentation, predicted delta verified, no-WARN under stress).
2026-05-07 23:08:51 +02:00
marfrit 1e408c9d33 Merge pull request 'notes: Patch C v3 Phase 4 plan — drop sdio_rx_work, match cw1200' (#11) from claude-noether-9 into main
Reviewed-on: #11
2026-05-07 19:41:44 +00:00
claude-noether d01400140b notes: Patch C v3 Phase 4 plan — drop sdio_rx_work, match cw1200
Supersedes v2 (PR #10).  cw1200 mining (~/src/linux-rockchip, 228
cw1200 commits) confirmed: upstream cw1200 has no sdio_rx_work
workqueue at all.  IRQ handler bumps bh_rx + wakes bh_wq; bh thread
does the SDIO read inline via cw1200_bh_rx_helper.  Single thread =
single writer for hw_bufs_used = no race by construction.  Same int
hw_bufs_used as bes2600, never atomic_t'd in 16 years upstream.

v3 brings bes2600 into that shape:

  - delete sdio_rx_work, self->rx_work, self->rx_queue,
    self->rx_queue_lock, bes2600_sdio_pipe_read
  - GPIO IRQ handler calls self->irq_handler directly (matches
    cw1200_sdio_irq_handler shape)
  - bes2600_bh_rx_helper's BES_SDIO_RX_MULTIPLE_ENABLE branch
    replaced with inline SDIO read + extract_packets + per-skb
    delivery via new bes2600_bh_handle_rx_skb()
  - GPIO wake-flag bracketing moves into bh thread

§5 shared-state delta table (the v2 lesson, applied):  zero fields
require new locking.  hw_bufs_used / hw_bufs_used_vif / wsm_tx_pending
all stay single-writer-from-bh.  v2's atomic_t prep is mooted.

§6 risk #6 is the open question for reviewer:  bes2600's
__bes2600_irq_enable(1) call is commented out in the BH-loop done:
label with an asm volatile("nop") in its place.  Either SDIO IRQ
is auto-managed (so commenting out is fine) or the current code
relies on sdio_rx_work being queued regardless of driver-side IRQ
flag.  Block Phase 6 on this audit.

Patch F (PR #4 merged) is the new baseline.  v3 will branch off
F-merged cleanups.  Phase 7 N=3 stress ramp uses wired enu1 path
(192.168.88.80) for wedge-resilient telemetry collection.
2026-05-07 21:36:15 +02:00
marfrit 993117a108 Merge pull request 'notes: Patch C v2 Phase 4 plan — atomic_t prep + direct-deliver (re-after-failure)' (#10) from claude-noether-8 into main
Question - you said earlier, the driver is a search-and-replace CW12xx driver. Did the CW12xx evolve since this "fork"? If so, are there lessons that can be learned from the CW12xx driver in it's nowadays state?

Reviewed-on: #10
2026-05-07 18:56:12 +00:00
claude-noether 0b63ca3c24 notes: Patch C v2 Phase 4 plan — atomic_t prep + direct-deliver
Phase 7 of Patch C (PR #9 → bes2600-dkms PR #3 → boot -1 of ohm
20:18:10) failed with a thread-safety race: wsm_release_tx_buffer's
unlocked R-M-W on hw_bufs_used races against wsm_alloc_tx_buffer in
the bh thread when Patch C moved the RX-confirm decrement into
sdio_rx_work.  WARN storm at +13s under stress, chip wedges, host
off-network.

Phase 6 contract analysis cited wsm_handle_rx's sleepability and
held-lock invariants but stopped at the function signature.  Did not
enumerate hw_bufs_used as shared state mutated by the callee.  Lesson
saved as feedback_phase6_contract_threadsafety memory.

Phase 4 v2 designs around that gap.  Two-step:

1. Patch C-prep: NFC refactor — convert hw_bufs_used,
   hw_bufs_used_vif[], wsm_tx_pending[] from int / int[] to atomic_t /
   atomic_t[].  Use atomic_fetch_sub_release in wsm_release_tx_buffer
   (returns prior value for the >= numInpChBufs - 1 predicate).
   Mechanical atomic_read swap at ~58 read sites.  Lands first;
   Phase 7 should show zero delta from baseline.

2. Patch C v2: re-apply the sdio_rx_work direct-deliver on top of
   C-prep.  Identical structural change to the closed PR #3, but now
   the racing counter is safe.  Contract block in
   bes2600_bh_handle_rx_skb expanded to include the shared-state
   delta table.

Plan §2 is the shared-state delta table — every field
bes2600_bh_handle_rx_skb mutates directly or transitively, with
current protection and required action.  3 fields need atomic_t,
the rest are already concurrency-safe or stay single-writer.

Plan §6 lists 6 risks including memory-ordering choices, the
inc/dec_pending_count timer-decision race, and the new wired-rig
fallback (enu1 192.168.88.80) that survives bes2600 wedges so Phase 7
can capture dmesg / ftrace from a wedged ohm without reboot.

PR superseded #3 closed with full verdict comment.  Phase B rolled
back on ohm at /lib/modules/.../extra/bes2600.ko.  Markus's reboot
button to land Patch B again before C-prep work begins.
2026-05-07 20:50:39 +02:00
marfrit 4666e03254 Merge pull request 'notes: Patch C Phase 4 plan (item 1 only — collapse sdio_rx_work into BH)' (#9) from claude-noether-7 into main
Reviewed-on: #9
2026-05-07 17:21:37 +00:00
claude-noether f232476240 notes: Patch C Phase 4 plan — collapse sdio_rx_work into BH (item 1 only)
Per merged PR #8 inline review: items 1 and 2 split, sequential. Patch C
is item-1-only (collapse the sdio_rx_work → rx_queue → bh_work
indirection). Patch C2 (ieee80211_rx_list batch delivery) is split out
and gated on Task #19 kerneldoc contract verification.

Approach choice: Option A (sdio_rx_work delivers directly into
wsm_handle_rx, removing rx_queue and its two synchronization points per
frame) over Option B (subsume into bh thread). Option A has a smaller
diff and clearer bisection story; the residual per-IRQ workqueue
dispatch is preserved as a measurable Phase 7 data point that motivates
or doesn't motivate a follow-on Option-B patch.

Predicted delta in Phase 3 units, with confidence levels stated
explicitly. §4.6 lists 6 risks, of which 2 require Phase 6 contract
citations (wsm_handle_rx callability from sdio_wq context;
wsm_release_tx_buffer's bh-wake invariant). §4.8 mandates a stress
ramp in Phase 7, not a steady cap, per feedback_phase7_stress_ramp.

Symptom-shaped findings (asm nop, commented-out IRQ re-enable, BUG_ON
in hot path) explicitly deferred to Task #24 per
feedback_dont_patch_downstream_artifacts.

Awaiting Phase 5 second-model review on DokuWiki.
2026-05-07 19:04:53 +02:00
marfrit 08c7aafb48 Merge pull request 'notes: Opus second-opinion BES2600 WiFi structural critique' (#8) from claude-noether-6 into main
Reviewed-on: #8
Reviewed-by: Markus Fritsche <mfritsche@reauktion.de>
2026-05-07 16:58:55 +00:00
claude-noether 809e3cce84 notes: opus second-opinion BES2600 WiFi structural critique
Independent code-review writeup (Opus 4.7) against Sonnet's review of the
same tree. Concurs with Sonnet on items 1+2 (RX relay, batch delivery)
and items 4+5 (ba_lock atomics, ps_state_lock skip-when-pm_unsupported);
pushes back on the "9 workqueue events per frame" quantification and
records BES_SDIO_OPTIMIZED_LEN as hard-baked rather than togglable.

New findings: cw12xx-not-bes2600 genealogy still active in source, ~700
lines of #if 0 fossil in bh.c, Allwinner-specific sw_mci_check_r1_ready
in the SDIO bus path, asm volatile("nop") placeholder where IRQ re-enable
used to live, BUG_ON in steady-state hot path, vendor-SDK Makefile shape
that pollutes every diff, 8 EXPORT_SYMBOLs from a nominally-single-binary
module.

Recommends ordering: Patch C (1+2 wrapped) high-risk-first, Patches D+E
as small individually-verifiable cleanups, explicit don't-touch list.
Notes ieee80211_rx_list contract verification (task #19) blocks Patch C.
2026-05-07 18:12:54 +02:00
marfrit 4344873f2d Merge pull request 'Sonnet architect review for Bug #5 — ranked restructuring map' (#7) from claude-noether-5 into main
Reviewed-on: #7
2026-05-07 16:01:55 +00:00
claude-noether 679083d1aa notes: Sonnet architect review for Bug #5 — ranked restructuring map
Sonnet (general-purpose subagent, model=sonnet) reviewed
~/src/besser/bes2600-dkms-mobian/bes2600/ given the Phase 0 measurement
context. Output: 8-item ranked restructuring map, file:line cited.

Headline:
- Item 1: collapse sdio_rx_work relay into BH loop (~5x workqueue
  dispatch reduction, medium effort)
- Item 2: batch deliver via ieee80211_rx_list (small effort, removes
  per-frame softirq)
- Items 1 + 2 together collapse "9 workqueue events per delivered
  frame" to ~1.

Items 3-5 clean up next-layer overhead (TX-side queue_work,
per-frame ba_lock, ps_state_lock under known-dead PSM). Items 6-8
are follow-ons to be re-measured after 1-3 land.

Phase 4 plan locking the lead candidate(s) follows in a separate PR.
2026-05-07 17:38:16 +02:00
claude-noether 594f73c6b4 notes: Bug #5 root cause refined — workqueue-per-SDIO-transaction is the floor
Follow-up ftrace measurement (post-reboot, 3-min 4MB/s capture):
- workqueue_execute_start: 5,643/sec  ← dominates
- wsm_cmd_send: only 13/sec (host-to-chip command path NOT the hotspot)
- lock contention: 50/sec (modest)

The throughput floor is set by per-SDIO-transaction workqueue dispatch
overhead. Surgical patches B5-1/B5-2/B5-3 from the prior Phase 4 plan
all targeted the wrong layer; deferring those until an architectural
restructuring map is produced.

Promoting the Sonnet architect review from "backlog" to
"blocking on Bug #5" — the next step is a restructuring assessment,
not another patch.
2026-05-07 17:31:31 +02:00
claude-noether 928268f477 notes: backlog Sonnet architect review of bes2600 driver
Per PR #6 review feedback. Independent track from Bug #5; scheduled
once the Bug #5 measurement pass finishes.
2026-05-07 16:38:58 +02:00
marfrit 425eb92456 Merge pull request 'Bug #5 Phase 1 metric + Phase 0 anchor receipts' (#6) from claude-noether-4 into main
Reviewed-on: #6
2026-05-07 14:37:29 +00:00
claude-noether 1830c17891 notes: Bug #5 Phase 1 metric + Phase 0 anchor receipts
Phase 0 anchored at N=3 reps (10min @ 4MB/s pv-cap on 2.4GHz):
- rep1+2: ~700 KB/s sustained (10% of link capacity)
- rep3: link death at ~9 min in (passive mode, beacon-loss cascade)

Hot symbol identified: _raw_spin_unlock_irqrestore at ~20% CPU in both
healthy and failed reps, callstack process_one_work → wsm_configuration
→ wsm_cmd_send → bes2600_bh.isra.0 → spin-unlock.

Phase 1 metric locked: ≥2 MB/s sustained throughput, <10% CPU in lock-
cycling, no link death under 30 min continuous load.

Three Phase 4 candidates drafted (B5-1: shrink wsm_cmd_send lock scope;
B5-2: coalesce vif_list_lock in BH dispatcher; B5-3: SPSC ringbuffer for
WSM commands). Locking pending review.
2026-05-07 16:32:45 +02:00
claude-noether 69a1d0f8b1 notes: phase 7 verdict — Patch A confirmed, Patch B dormant
Phase 7 verification of cleanups + Patch A + Patch B (srcversion
1B3B3ED0) on ohm 2026-05-07 12:48 → 15:13 CEST under netcat load
ramped 1 MB/s → 4 MB/s on 2.4GHz newton.

Patch A: predicted delta CONFIRMED at N=2 reproductions.
  - 13:47:56 storm → 1 s reassoc, no AP-deauth-6 escalation
  - 13:49:26 storm → 1 s reassoc, no AP-deauth-6 escalation

Patch B: installed, untriggered. 2 api_connection_loss events spaced
91 s apart, never tripping the 3-in-60s threshold. No false positives,
no spurious bus_resets. Recovery delta unobserved (no harm done).

Trigger C: 17-frame AP-deauth-6 cluster at 12:53 with no patch hooks
firing — bes2600 TX-side glitch suspect. Recovery via mac80211 reauth
in ~4 s. New backlog item.

Bug #5 documented separately (RX path degrades under throughput
pressure; possible root of the original Phase-0 YouTube frame drops).
2026-05-07 15:18:36 +02:00
claude-noether 458ad36f8b notes: backlog Bug #5 — RX path degrades under throughput pressure
Observed 2026-05-07: bumping the netcat sender from 1 MB/s to 4 MB/s
DECREASED ohm's observed RX rate (1015 KB/s → 563 KB/s) and degraded
the link (signal -57 → -67 dBm, MCS 4 → 3). Chip can't sustain near-
link-rate RX even though theoretical capacity is ~8 MB/s.

Hypothesis: driver/firmware lock contention or busy-wait on the RX
SDIO path. Plausibly explains the original Phase-0 observation that
YouTube DASH chunks drop ~10 frames per chunk fetch — chunk fetch is
a brief near-line-rate burst that this bug would be triggered by.
2026-05-07 13:56:36 +02:00
marfrit ea509e810f Merge pull request 'Phase 4 plan: Patch B (Trigger A / api_connection_loss)' (#5) from claude-noether-3 into main
Reviewed-on: #5
2026-05-07 10:45:28 +00:00
claude-noether e53aad5013 notes: phase 4 plan for Patch B (Trigger A / api_connection_loss)
Drafted after Phase 7 verification of Patch A (PR #1, srcversion
21BD07B3). 10h30m sustained load on 2.4GHz produced:
- 0 DecryptStormRecoveries (Patch A dormant; no decrypt-storm fired)
- 9 mac80211 api_connection_loss events
- 1 catastrophic blackhole at 02:42 (reason 4 inactivity → reauth
  with assoc-comeback timeouts → AP unprotected-deauth-6 cluster)

Phase 4 pivots to Trigger A (Patch B). Candidate B-1 lock proposal:
extend c5.2 bus_reset infrastructure to fire on N consecutive
api_connection_loss events; reuses existing recovery path.

Pending Phase 5 review before Phase 6 implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-07 10:22:34 +02:00
marfrit 4acba3e707 Merge pull request #4: Phase 4 plan: decrypt-storm fast-recover (Trigger B), with revised Phase 1 2026-05-06 17:30:48 +00:00
test0r f6a25d811f notes: phase 4 plan artifact for BES2600 wifi-stability campaign
Drafts Patch A (decrypt-storm fast-recover, Trigger B) at txrx.c:1696
with sliding-window threshold + ieee80211_connection_loss reassoc.
Patch B (beacon-loss / Trigger A) parked behind one more diagnostic
rep with 10s snap-loop cadence on the beacon-loss counter.

Folds reviewer feedback from PR #3 + the new Trigger-A finding
(post-resume P1 = api_connection_loss-driven, two reps captured today
at 17:23 and 18:03) into a revised Phase 1 metric counting three
event classes.

Pending Phase 5 second-model review of the plan before Phase 6
implementation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 19:10:12 +02:00
marfrit 07a7d4b3af Merge pull request 'Phase 5 review: BES2600 WiFi-stability campaign artifact' (#3) from claude-noether into main
Reviewed-on: #3
2026-05-06 13:37:16 +00:00
test0r 1a21212744 notes: phase 5 review artifact for BES2600 wifi-stability campaign
Captures Phase 0-3 receipts as of 2026-05-06: three Pattern-P1 events
reproduced (07:13, 11:03, yesterday 22:33), decrypt-failure metric locked
as Phase 1 with source pins (txrx.c:1696, wsm.h:620, wsm.c:1484), rig built
(snap loop + tcpdump filtered ring + iw event + dynamic_debug + netcat 1MB/s),
idle-vs-load comparison shows 35x burst-rate elevation under load with
conditional-escalation flip (100% idle / 0% load).

Pending Phase 5 second-model review before Phase 4 plan.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-06 15:23:24 +02:00
test0r a6cd2a80fd patches: refresh c7.2 (gate gpio_sleep recovery on !pm_unsupported) 2026-04-28 17:56:10 +02:00
test0r 8301bd24e9 patches: refresh c7.2 (don't undo latched MCU bit in timeout recovery) 2026-04-28 17:54:51 +02:00
test0r 649831a26e patches: refresh c7.2 (don't undo latched MCU bit in timeout recovery) 2026-04-28 17:54:47 +02:00
test0r 0b904093c0 patches: refresh c7 (hold MCU wake-flag bit on latch to spare SDIO_RX msleep) 2026-04-28 17:47:47 +02:00
test0r 670ee57432 patches: add c7 self-detect firmware doesn't honor PSM 2026-04-28 16:56:15 +02:00
test0r 71d9d5f8c4 patches: refresh c6.1 v3 (drop wake GPIO on enter_lp_mode timeout) 2026-04-28 16:11:29 +02:00
test0r c9057cac2c patches: refresh c6.1 + c6.2 (init UNKNOWN; always run wsm_set_operational_mode) 2026-04-28 15:51:58 +02:00
test0r e6a942a5df patches: add c6.2 wake-path consumer of chip_pm_state (Mobian + danctnix) 2026-04-28 15:27:05 +02:00
test0r fc95cb790e patches: add c6.1 PM-indication race + chip_pm_state (Mobian + danctnix) 2026-04-28 15:06:55 +02:00
test0r 42ae953a0c patches: add c5.1.1 scan-defer backoff tune (Mobian + danctnix) 2026-04-28 14:35:53 +02:00
test0r e65a6acd09 patches: add scan-defer-on-reject (c5.1)
Suppresses the WSM-scan WARN cascade in bes2600_scan_work +
wsm_handle_rx by (a) pre-checking BT A2DP coex state and
backing off after N consecutive firmware rejections, and
(b) demoting the WARN() in wsm_generic_confirm to bes_devel
(the upstream-caller's wiphy_warn with request-id is kept).

Deployed + verified on ohm (srcversion A5C8146A…): WARN splat
count 0 per 1 min boot (pre-patch: 32 per 25 min). WiFi
immediately roamed from ch1 signal-47 to ch48 5240 MHz newton
because scan completions now land cleanly with mac80211.
Other counters still 0. Net: +83/-2 lines across 3 files.

Standalone single-patch series in both Mobian-paths and
drivers/staging/bes2600/ paths variants, checkpatch --strict
clean.
2026-04-24 23:51:47 +02:00
test0r 2b7fe4e1de patches: add debian-copyright-fsf-address
Salsa-CI's lintian stage flagged the pre-existing boilerplate
paragraph in debian/copyright as 'old-fsf-address-in-copyright-
file' when cleanups first hit CI. Replace the '51 Franklin
Street, Fifth Floor, Boston, MA 02110-1301 USA' literal with a
'https://www.gnu.org/licenses/' reference; the
/usr/share/common-licenses/LGPL-2.1 reference a few lines later
is unchanged, so license-text location is still covered.

Pushed to salsa and gitea as commit f31c57a on branch cleanups
and as standalone topic debian/copyright-fsf-address on gitea.
2026-04-24 16:26:30 +02:00
test0r d850a8f0fe patches: add pm-timeout-silence (c2.1)
Demote 'wait pm ind timeout' from bes_err() to bes_devel() in
bes2600_pwr_enter_lp_mode(). The cascade this used to warn about
is already suppressed by c2 (pm-gate-on-handshake); the remaining
log line is benign steady-state noise (3-9 events per 10-min
uptime on PineTab2). Deployed + verified on ohm (srcversion
ED89A26…): err-priority count 0, WiFi associated, no
regression. 1-line patch.
2026-04-23 20:42:22 +02:00
test0r e7a021d901 patches: add drop-orphan-file-io (c1.4)
Completes the filp_open/kernel_read/kernel_write removal pass
across the driver. Deletes bes_fw.c DATA_DUMP_OBSERVE blocks
(4 #ifdefs gated on a commented-out #define, dead by default;
would fail to build on modern kernels due to removed
get_fs/set_fs) and main.c's orphan access_file() helper
(no callers in-tree, also relies on get_fs/set_fs).

With c1.2 + c1.3 + c1.4 combined: zero filp_open /
kernel_read / kernel_write / vfs_read / vfs_write references
anywhere in the driver -- precondition for a linux-wireless
RFC for drivers/staging/bes2600/ unlocked.

Deployed + verified on ohm (srcversion 12BAFB9C…): WiFi
associated, no KFENCE / sdio_tx_work / RX failure / PS Mode
Error / factory cali data get failed. Net: -69 lines.
2026-04-23 20:34:07 +02:00