besser

Author	SHA1	Message	Date
claude-noether	3a38286e6f	notes: Patch C v3 Phase 7 N=3 results — +73% throughput, race fix verified N=3 stress reps on ohm with v3 module (srcversion 371C6606B73AF19299228CA), 3 min @ 4 MB/s each, all within fresh-chip uptime window (200/391/582 s). rep / MB/s / sdio_rx_work / bh_work redispatches: 1 / 2.363 / 0 / 0; 2 / 2.590 / 0 / 0; 3 / 2.102 / 0 / 0. N=3 mean: 2.352 MB/s · median 2.363 MB/s · min 2.102 MB/s. vs Patch B baseline (1.362 MB/s, run-20260507-patchC-preflight): +73%. vs original Bug #5 floor (75 KB/s rep 3 death): 28× improvement. Plan §4.5 prediction verified: - sdio_rx_work dispatch rate: 86.4/s -> 0/s (function deleted) - bes2600_bh_work redispatches: 0 (preserved invariant) - observed receive @ 4 MB/s: floor lifts toward >= 1 MB/s (exceeded — floor is 2.10 MB/s) Bonus finding: sdio_tx_work dispatch rate dropped from 276.1/s to 0.8/s. The post-tx queue_work(rx_work) call I rewired to self->irq_handler() was actually firing more often than predicted; folding it into bh-wake-up cuts ~99.7% of the workqueue dispatches. No WARN/BUG/oops on any rep — the v2 race that wedged Patch C v1 within 13 s under stress did NOT reproduce on v3. Phase 8 lesson distilled as feedback_mine_upstream_ancestor memory: when patching a fork-from-upstream driver, mine the ancestor's fix history BEFORE writing fixes from scratch. cw1200 mining drove the structural pivot from v2's atomic_t wrapper to v3's no-relay architecture. Without the mine, we'd have shipped v2. Phase 7 receipts checklist met (N=3, fresh-chip, identical instrumentation, predicted delta verified, no-WARN under stress).	2026-05-07 23:08:51 +02:00
claude-noether	d01400140b	notes: Patch C v3 Phase 4 plan — drop sdio_rx_work, match cw1200 Supersedes v2 (PR #10). cw1200 mining (~/src/linux-rockchip, 228 cw1200 commits) confirmed: upstream cw1200 has no sdio_rx_work workqueue at all. IRQ handler bumps bh_rx + wakes bh_wq; bh thread does the SDIO read inline via cw1200_bh_rx_helper. Single thread = single writer for hw_bufs_used = no race by construction. Same int hw_bufs_used as bes2600, never atomic_t'd in 16 years upstream. v3 brings bes2600 into that shape: - delete sdio_rx_work, self->rx_work, self->rx_queue, self->rx_queue_lock, bes2600_sdio_pipe_read - GPIO IRQ handler calls self->irq_handler directly (matches cw1200_sdio_irq_handler shape) - bes2600_bh_rx_helper's BES_SDIO_RX_MULTIPLE_ENABLE branch replaced with inline SDIO read + extract_packets + per-skb delivery via new bes2600_bh_handle_rx_skb() - GPIO wake-flag bracketing moves into bh thread §5 shared-state delta table (the v2 lesson, applied): zero fields require new locking. hw_bufs_used / hw_bufs_used_vif / wsm_tx_pending all stay single-writer-from-bh. v2's atomic_t prep is mooted. §6 risk #6 is the open question for reviewer: bes2600's __bes2600_irq_enable(1) call is commented out in the BH-loop done: label with an asm volatile("nop") in its place. Either SDIO IRQ is auto-managed (so commenting out is fine) or the current code relies on sdio_rx_work being queued regardless of driver-side IRQ flag. Block Phase 6 on this audit. Patch F (PR #4 merged) is the new baseline. v3 will branch off F-merged cleanups. Phase 7 N=3 stress ramp uses wired enu1 path (192.168.88.80) for wedge-resilient telemetry collection.	2026-05-07 21:36:15 +02:00
claude-noether	0b63ca3c24	notes: Patch C v2 Phase 4 plan — atomic_t prep + direct-deliver Phase 7 of Patch C (PR #9 → bes2600-dkms PR #3 → boot -1 of ohm 20:18:10) failed with a thread-safety race: wsm_release_tx_buffer's unlocked R-M-W on hw_bufs_used races against wsm_alloc_tx_buffer in the bh thread when Patch C moved the RX-confirm decrement into sdio_rx_work. WARN storm at +13s under stress, chip wedges, host off-network. Phase 6 contract analysis cited wsm_handle_rx's sleepability and held-lock invariants but stopped at the function signature. Did not enumerate hw_bufs_used as shared state mutated by the callee. Lesson saved as feedback_phase6_contract_threadsafety memory. Phase 4 v2 designs around that gap. Two-step: 1. Patch C-prep: NFC refactor — convert hw_bufs_used, hw_bufs_used_vif[], wsm_tx_pending[] from int / int[] to atomic_t / atomic_t[]. Use atomic_fetch_sub_release in wsm_release_tx_buffer (returns prior value for the >= numInpChBufs - 1 predicate). Mechanical atomic_read swap at ~58 read sites. Lands first; Phase 7 should show zero delta from baseline. 2. Patch C v2: re-apply the sdio_rx_work direct-deliver on top of C-prep. Identical structural change to the closed PR #3, but now the racing counter is safe. Contract block in bes2600_bh_handle_rx_skb expanded to include the shared-state delta table. Plan §2 is the shared-state delta table — every field bes2600_bh_handle_rx_skb mutates directly or transitively, with current protection and required action. 3 fields need atomic_t, the rest are already concurrency-safe or stay single-writer. Plan §6 lists 6 risks including memory-ordering choices, the inc/dec_pending_count timer-decision race, and the new wired-rig fallback (enu1 192.168.88.80) that survives bes2600 wedges so Phase 7 can capture dmesg / ftrace from a wedged ohm without reboot. PR superseded #3 closed with full verdict comment. Phase B rolled back on ohm at /lib/modules/.../extra/bes2600.ko. Markus's reboot button to land Patch B again before C-prep work begins.	2026-05-07 20:50:39 +02:00
claude-noether	f232476240	notes: Patch C Phase 4 plan — collapse sdio_rx_work into BH (item 1 only) Per merged PR #8 inline review: items 1 and 2 split, sequential. Patch C is item-1-only (collapse the sdio_rx_work → rx_queue → bh_work indirection). Patch C2 (ieee80211_rx_list batch delivery) is split out and gated on Task #19 kerneldoc contract verification. Approach choice: Option A (sdio_rx_work delivers directly into wsm_handle_rx, removing rx_queue and its two synchronization points per frame) over Option B (subsume into bh thread). Option A has a smaller diff and clearer bisection story; the residual per-IRQ workqueue dispatch is preserved as a measurable Phase 7 data point that motivates or doesn't motivate a follow-on Option-B patch. Predicted delta in Phase 3 units, with confidence levels stated explicitly. §4.6 lists 6 risks, of which 2 require Phase 6 contract citations (wsm_handle_rx callability from sdio_wq context; wsm_release_tx_buffer's bh-wake invariant). §4.8 mandates a stress ramp in Phase 7, not a steady cap, per feedback_phase7_stress_ramp. Symptom-shaped findings (asm nop, commented-out IRQ re-enable, BUG_ON in hot path) explicitly deferred to Task #24 per feedback_dont_patch_downstream_artifacts. Awaiting Phase 5 second-model review on DokuWiki.	2026-05-07 19:04:53 +02:00
claude-noether	809e3cce84	notes: opus second-opinion BES2600 WiFi structural critique Independent code-review writeup (Opus 4.7) against Sonnet's review of the same tree. Concurs with Sonnet on items 1+2 (RX relay, batch delivery) and items 4+5 (ba_lock atomics, ps_state_lock skip-when-pm_unsupported); pushes back on the "9 workqueue events per frame" quantification and records BES_SDIO_OPTIMIZED_LEN as hard-baked rather than togglable. New findings: cw12xx-not-bes2600 genealogy still active in source, ~700 lines of #if 0 fossil in bh.c, Allwinner-specific sw_mci_check_r1_ready in the SDIO bus path, asm volatile("nop") placeholder where IRQ re-enable used to live, BUG_ON in steady-state hot path, vendor-SDK Makefile shape that pollutes every diff, 8 EXPORT_SYMBOLs from a nominally-single-binary module. Recommends ordering: Patch C (1+2 wrapped) high-risk-first, Patches D+E as small individually-verifiable cleanups, explicit don't-touch list. Notes ieee80211_rx_list contract verification (task #19) blocks Patch C.	2026-05-07 18:12:54 +02:00
claude-noether	679083d1aa	notes: Sonnet architect review for Bug #5 — ranked restructuring map Sonnet (general-purpose subagent, model=sonnet) reviewed ~/src/besser/bes2600-dkms-mobian/bes2600/ given the Phase 0 measurement context. Output: 8-item ranked restructuring map, file:line cited. Headline: - Item 1: collapse sdio_rx_work relay into BH loop (~5x workqueue dispatch reduction, medium effort) - Item 2: batch deliver via ieee80211_rx_list (small effort, removes per-frame softirq) - Items 1 + 2 together collapse "9 workqueue events per delivered frame" to ~1. Items 3-5 clean up next-layer overhead (TX-side queue_work, per-frame ba_lock, ps_state_lock under known-dead PSM). Items 6-8 are follow-ons to be re-measured after 1-3 land. Phase 4 plan locking the lead candidate(s) follows in a separate PR.	2026-05-07 17:38:16 +02:00
claude-noether	594f73c6b4	notes: Bug #5 root cause refined — workqueue-per-SDIO-transaction is the floor Follow-up ftrace measurement (post-reboot, 3-min 4MB/s capture): - workqueue_execute_start: 5,643/sec ← dominates - wsm_cmd_send: only 13/sec (host-to-chip command path NOT the hotspot) - lock contention: 50/sec (modest) The throughput floor is set by per-SDIO-transaction workqueue dispatch overhead. Surgical patches B5-1/B5-2/B5-3 from the prior Phase 4 plan all targeted the wrong layer; deferring those until an architectural restructuring map is produced. Promoting the Sonnet architect review from "backlog" to "blocking on Bug #5" — the next step is a restructuring assessment, not another patch.	2026-05-07 17:31:31 +02:00
claude-noether	928268f477	notes: backlog Sonnet architect review of bes2600 driver Per PR #6 review feedback. Independent track from Bug #5; scheduled once the Bug #5 measurement pass finishes.	2026-05-07 16:38:58 +02:00
claude-noether	1830c17891	notes: Bug #5 Phase 1 metric + Phase 0 anchor receipts Phase 0 anchored at N=3 reps (10min @ 4MB/s pv-cap on 2.4GHz): - rep1+2: ~700 KB/s sustained (10% of link capacity) - rep3: link death at ~9 min in (passive mode, beacon-loss cascade) Hot symbol identified: _raw_spin_unlock_irqrestore at ~20% CPU in both healthy and failed reps, callstack process_one_work → wsm_configuration → wsm_cmd_send → bes2600_bh.isra.0 → spin-unlock. Phase 1 metric locked: ≥2 MB/s sustained throughput, <10% CPU in lock- cycling, no link death under 30 min continuous load. Three Phase 4 candidates drafted (B5-1: shrink wsm_cmd_send lock scope; B5-2: coalesce vif_list_lock in BH dispatcher; B5-3: SPSC ringbuffer for WSM commands). Locking pending review.	2026-05-07 16:32:45 +02:00
claude-noether	69a1d0f8b1	notes: phase 7 verdict — Patch A confirmed, Patch B dormant Phase 7 verification of cleanups + Patch A + Patch B (srcversion 1B3B3ED0) on ohm 2026-05-07 12:48 → 15:13 CEST under netcat load ramped 1 MB/s → 4 MB/s on 2.4GHz newton. Patch A: predicted delta CONFIRMED at N=2 reproductions. - 13:47:56 storm → 1 s reassoc, no AP-deauth-6 escalation - 13:49:26 storm → 1 s reassoc, no AP-deauth-6 escalation Patch B: installed, untriggered. 2 api_connection_loss events spaced 91 s apart, never tripping the 3-in-60s threshold. No false positives, no spurious bus_resets. Recovery delta unobserved (no harm done). Trigger C: 17-frame AP-deauth-6 cluster at 12:53 with no patch hooks firing — bes2600 TX-side glitch suspect. Recovery via mac80211 reauth in ~4 s. New backlog item. Bug #5 documented separately (RX path degrades under throughput pressure; possible root of the original Phase-0 YouTube frame drops).	2026-05-07 15:18:36 +02:00
claude-noether	458ad36f8b	notes: backlog Bug #5 — RX path degrades under throughput pressure Observed 2026-05-07: bumping the netcat sender from 1 MB/s to 4 MB/s DECREASED ohm's observed RX rate (1015 KB/s → 563 KB/s) and degraded the link (signal -57 → -67 dBm, MCS 4 → 3). Chip can't sustain near- link-rate RX even though theoretical capacity is ~8 MB/s. Hypothesis: driver/firmware lock contention or busy-wait on the RX SDIO path. Plausibly explains the original Phase-0 observation that YouTube DASH chunks drop ~10 frames per chunk fetch — chunk fetch is a brief near-line-rate burst that this bug would be triggered by.	2026-05-07 13:56:36 +02:00
claude-noether	e53aad5013	notes: phase 4 plan for Patch B (Trigger A / api_connection_loss) Drafted after Phase 7 verification of Patch A (PR #1, srcversion 21BD07B3). 10h30m sustained load on 2.4GHz produced: - 0 DecryptStormRecoveries (Patch A dormant; no decrypt-storm fired) - 9 mac80211 api_connection_loss events - 1 catastrophic blackhole at 02:42 (reason 4 inactivity → reauth with assoc-comeback timeouts → AP unprotected-deauth-6 cluster) Phase 4 pivots to Trigger A (Patch B). Candidate B-1 lock proposal: extend c5.2 bus_reset infrastructure to fire on N consecutive api_connection_loss events; reuses existing recovery path. Pending Phase 5 review before Phase 6 implementation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-07 10:22:34 +02:00
Markus Fritsche	f6a25d811f	notes: phase 4 plan artifact for BES2600 wifi-stability campaign Drafts Patch A (decrypt-storm fast-recover, Trigger B) at txrx.c:1696 with sliding-window threshold + ieee80211_connection_loss reassoc. Patch B (beacon-loss / Trigger A) parked behind one more diagnostic rep with 10s snap-loop cadence on the beacon-loss counter. Folds reviewer feedback from PR #3 + the new Trigger-A finding (post-resume P1 = api_connection_loss-driven, two reps captured today at 17:23 and 18:03) into a revised Phase 1 metric counting three event classes. Pending Phase 5 second-model review of the plan before Phase 6 implementation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 19:10:12 +02:00
Markus Fritsche	1a21212744	notes: phase 5 review artifact for BES2600 wifi-stability campaign Captures Phase 0-3 receipts as of 2026-05-06: three Pattern-P1 events reproduced (07:13, 11:03, yesterday 22:33), decrypt-failure metric locked as Phase 1 with source pins (txrx.c:1696, wsm.h:620, wsm.c:1484), rig built (snap loop + tcpdump filtered ring + iw event + dynamic_debug + netcat 1MB/s), idle-vs-load comparison shows 35x burst-rate elevation under load with conditional-escalation flip (100% idle / 0% load). Pending Phase 5 second-model review before Phase 4 plan. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-05-06 15:23:24 +02:00
Markus Fritsche	e580126d84	Initial: BESser umbrella for BES2600 driver mainlining Sets up the BES2600 mainlining work tree with: - README: project overview, hardware target, driver lineage (CW1200 -> Bestechnic -> arjan-vlek -> Mobian/danctnix), patch series status, repo map, build/deploy workflow. - patches/: c1 patch generated by git format-patch from marfrit/bes2600-dkms branch bes2600/factory-request-firmware (checkpatch.pl --no-tree --strict: 0 errors / 0 warnings / 0 checks). - scripts/: build-bes2600-on-ohm.sh, deploy-c1-to-ohm.sh, backup-ohm-kernel.sh - reproducible build + deploy + backup. - fw-analysis/: per-blob strings.txt + fnnames.txt extracted from the 4 firmware blobs pulled from ohm 2026-04-21. Source binaries NOT committed (Bestechnic-proprietary). - notes/: observed-bugs.md (4 known bug surfaces with file:line + patch-series cross-reference), source-map.md (every public driver source variant + their canonical role). Companion work tree: marfrit/bes2600-dkms (Mobian DKMS fork) at git.reauktion.de. Signed-off-by: Markus Fritsche <fritsche.markus@gmail.com>	2026-04-22 10:13:23 +02:00

15 Commits
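
The summary figures quoted in commit 3a38286e6f (N=3 mean and median, the +73% gain over the Patch B baseline, the 28× improvement over the Bug #5 floor, and the ~99.7% cut in sdio_tx_work dispatches) can be rechecked from the raw numbers in that message. The sketch below is only an arithmetic cross-check, not part of the repository; the variable names are made up for illustration, and it assumes 1 MB = 1000 KB when comparing the 2.102 MB/s minimum against the 75 KB/s floor.

```python
from statistics import mean, median

# Raw figures copied from commit 3a38286e6f (variable names are illustrative).
reps_mb_s = [2.363, 2.590, 2.102]        # per-rep throughput, MB/s
patch_b_baseline_mb_s = 1.362            # Patch B baseline, MB/s
bug5_floor_kb_s = 75.0                   # original Bug #5 floor, KB/s
tx_dispatch_before_per_s = 276.1         # sdio_tx_work dispatches/s before v3
tx_dispatch_after_per_s = 0.8            # sdio_tx_work dispatches/s after v3

n3_mean = mean(reps_mb_s)
n3_median = median(reps_mb_s)
n3_min = min(reps_mb_s)

# Relative gain of the N=3 mean over the Patch B baseline, in percent.
gain_pct = (n3_mean / patch_b_baseline_mb_s - 1.0) * 100.0

# Ratio of the new worst-case rep to the old floor.
# Assumption: 1 MB = 1000 KB (the commit does not state the convention).
floor_ratio = (n3_min * 1000.0) / bug5_floor_kb_s

# Share of sdio_tx_work dispatches removed, in percent.
dispatch_cut_pct = (1.0 - tx_dispatch_after_per_s / tx_dispatch_before_per_s) * 100.0

print(f"mean {n3_mean:.3f} MB/s, median {n3_median:.3f} MB/s, min {n3_min:.3f} MB/s")
print(f"gain vs Patch B: {gain_pct:+.0f}%")
print(f"vs Bug #5 floor: {floor_ratio:.0f}x")
print(f"sdio_tx_work dispatch cut: {dispatch_cut_pct:.1f}%")
```

With the 1000 KB convention the ratio rounds to 28, matching the commit; a 1024 KB convention would give a slightly higher ratio, so the stated 28× is the conservative reading.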