From 3d15c5367de1894c69e7152543370f71ca229883 Mon Sep 17 00:00:00 2001
From: Markus Fritsche
Date: Thu, 21 May 2026 12:23:47 +0200
Subject: [PATCH] =?UTF-8?q?fleet/ohm:=20pkgrel=3D6=20=E2=80=94=20per-serie?=
 =?UTF-8?q?s=20converged=20with=20tx-sdio-dma-oob=20+=20join-confirm-reset?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Two additions to fleet/ohm.yaml's includes for the bes2600 driver scope:

1. driver/bes2600/tx-sdio-dma-oob-danctnix/ — already on disk from ka#17
   but not previously included. The cumulative-c5x-danctnix shipped in
   pkgrel=3 did NOT have this fix; pkgrel=4 per-series regressed because
   the staging-prep series was excluded. KFENCE caught the OOB during
   pkgrel=4 soak; pkgrel=5 included it.

2. driver/bes2600/join-confirm-reset-danctnix/ — NEW scope. cw1200
   ancestor port (sta.c:1339-1344) with bes2600-specific PASSIVE-gate
   compensation in bes2600_unjoin_work. Closes besser#25. Verified
   pkgrel=6 srcversion 0E16463F: cascade gone, periodic ~600ms latency
   jitter also gone (same root cause).

Status note: per-series reconstruction is now converged. The
cumulative-c5x-danctnix entry is left as historical fallback; ka#29's
blocker (per-series mirrors not applying cleanly) was resolved by
manually reconstructing the per-series in marfrit/bes2600-dkms
bes2600/join-confirm-failure-reset (top commit 3d833f8).

Build still hand-managed via boltzmann:~/src/besser/marfrit-besser/
danctnix-besser-pkgbuild/kernel/PKGBUILD; ka-promote / ka-build template
rendering still pending per the original TODOs.

Signed-off-by: Claude (noether)
---
 fleet/ohm.yaml                                |  18 ++-
 ...rmware-state-on-wsm_join_confirm-fai.patch | 131 ++++++++++++++++++
 .../join-confirm-reset-danctnix/README.md     |  46 ++++++
 3 files changed, 194 insertions(+), 1 deletion(-)
 create mode 100644 patches/driver/bes2600/join-confirm-reset-danctnix/0001-bes2600-reset-firmware-state-on-wsm_join_confirm-fai.patch
 create mode 100644 patches/driver/bes2600/join-confirm-reset-danctnix/README.md

diff --git a/fleet/ohm.yaml b/fleet/ohm.yaml
index 5456d01..7f0e914 100644
--- a/fleet/ohm.yaml
+++ b/fleet/ohm.yaml
@@ -1,6 +1,6 @@
 # kernel-agent manifest for ohm (PineTab2 / Rockchip RK3566 + BES2600 SDIO WiFi/BT)
 #
-# Status: scaffolding from 2026-05-16. Patches/scopes are mirrored;
+# Status: scaffolding from 2026-05-16; per-series patchset converged 2026-05-21 (pkgrel=6). Patches/scopes are mirrored;
 # the build pipeline (cumulative-patch generation, makepkg invocation,
 # sign+publish) still relies on the hand-managed flow in
 # boltzmann:~/src/besser/marfrit-besser/danctnix-besser-pkgbuild/kernel/.
@@ -32,6 +32,13 @@ baseline:
 # mixed-prefix headers (a/drivers/staging/bes2600/... b/bes2600/...).
 # They do NOT apply cleanly against the linux-pinetab2 baseline.
 #
+
+# 2026-05-21 update: per-series reconstruction (besser#22) completed
+# 2026-05-21; pkgrel=6 (srcversion 0E16463F) on ohm soak-passed with
+# the bounce-buffer + join-confirm-reset additions. The per-series
+# manifest below is the authoritative set; cumulative-c5x-danctnix
+# remains as historical fallback only.
+#
 # Until the per-series mirrors are reconstructed (kernel-agent followup
 # issue), the bes2600 driver scope is satisfied by a single-file
 # cumulative captured from the working hand-managed
@@ -52,6 +59,15 @@ includes:
   # close besser#18 — pending_record_lock SOFTIRQ-safe -> -unsafe inversion.
   # Mirror of marfrit/bes2600-dkms#11 (d95453c). 5-site spin_lock -> _bh.
   - driver/bes2600/queue-pending-record-lock-bh-danctnix/
+  # bounce-buffer fix for SDIO TX DMA OOB (KFENCE-detected on pkgrel=4 soak);
+  # the per-series mirror of marfrit/bes2600-dkms bes2600/tx-sdio-dma-oob.
+  # cumulative-c5x-danctnix did NOT include this — it was the regression
+  # surfaced during the per-series reconstruction.
+  - driver/bes2600/tx-sdio-dma-oob-danctnix/
+  # close besser#25 — wsm_reset + serialised unjoin on JOIN reject.
+  # cw1200 ancestor port (drivers/net/wireless/st/cw1200/sta.c:1339-1344)
+  # with bes2600-specific PASSIVE-gate compensation. pkgrel=6 verified.
+  - driver/bes2600/join-confirm-reset-danctnix/
 
   # Explicitly NOT included (decision logged):
   # - debian-copyright-fsf-address: Debian packaging metadata, not kernel
diff --git a/patches/driver/bes2600/join-confirm-reset-danctnix/0001-bes2600-reset-firmware-state-on-wsm_join_confirm-fai.patch b/patches/driver/bes2600/join-confirm-reset-danctnix/0001-bes2600-reset-firmware-state-on-wsm_join_confirm-fai.patch
new file mode 100644
index 0000000..bc8a1e8
--- /dev/null
+++ b/patches/driver/bes2600/join-confirm-reset-danctnix/0001-bes2600-reset-firmware-state-on-wsm_join_confirm-fai.patch
@@ -0,0 +1,131 @@
+From 3d833f8ccf31895a2ce7bf4fd4ef839e653b29bb Mon Sep 17 00:00:00 2001
+From: Markus Fritsche
+Date: Thu, 21 May 2026 09:25:12 +0200
+Subject: [PATCH 22/22] bes2600: reset firmware state on wsm_join_confirm
+ failure
+MIME-Version: 1.0
+Content-Type: text/plain; charset=UTF-8
+Content-Transfer-Encoding: 8bit
+
+When wsm_join_confirm() returns status != WSM_STATUS_SUCCESS (ret 1),
+the driver cleared its bookkeeping but did not reset the firmware
+interface, leaving it in an intermediate post-rejection state. A rapid
+second JOIN attempt (e.g. wpa_supplicant retrying after the
+PREV_AUTH_NOT_VALID deauth that mac80211 emits to clean up) hits an
+inconsistent firmware context, causing bes2600_sdio_read_rx_batch to
+return SDIO error which cascades into wifi_force_close:
+
+  wsm_join_confirm ret 1
+  deauthenticating from by local choice (Reason: 2=PREV_AUTH_NOT_VALID)
+  [~10 min later]
+  bes2600_sdio_read_rx_batch sdio read error
+  WARNING: at bes2600_tx_loop_set_enable / bes2600_chrdev_wifi_force_close
+
+Two additions to the failure path in bes2600_join_work():
+
+1. wsm_reset (WSM_REQ_ID_RESET, 0x000A) with reset_statistics=false.
+   This returns the firmware to IDLE so the next association attempt
+   starts from a known-clean state. bes2600_unjoin_work() performs the
+   same reset, but gates it on join_status != PASSIVE; after a failed
+   JOIN join_status stays PASSIVE, so that path never fires — call
+   wsm_reset directly here instead.
+
+   Contract: wsm_reset takes only wsm_cmd_lock (not conf_lock, not
+   wsm_oper_lock). wsm_oper_unlock was already called inside
+   wsm_join_confirm() before wsm_join() returned -EINVAL, so there is
+   no re-entrancy hazard. conf_lock is held at this call site, which is
+   compatible with wsm_reset's locking requirements.
+
+2. queue_work(workqueue, &priv->unjoin_work) instead of direct
+   wsm_unlock_tx(). Serialises the next association attempt through
+   the workqueue so it cannot race against lingering firmware-side
+   effects of the failure. If unjoin_work is already queued, release
+   TX immediately (matching cw1200 ancestor sta.c:1344 comment "Tx lock
+   still held, unjoin will clear it.").
+
+Ancestor reference: drivers/net/wireless/st/cw1200/sta.c, function
+cw1200_join_work(), lines 1339-1344. cw1200 queues unjoin_work on join
+failure for the same reason. bes2600 needs the direct wsm_reset in
+addition because its unjoin_work has the join_status gate that cw1200's
+cw1200_do_unjoin() does not.
+
+Signed-off-by: Claude (noether)
+---
+ bes2600/sta.c | 47 +++++++++++++++++++++++++++++++++++++++++++----
+ 1 file changed, 43 insertions(+), 4 deletions(-)
+
+diff --git a/drivers/staging/bes2600/sta.c b/drivers/staging/bes2600/sta.c
+index 476d875..bf86835 100644
+--- a/drivers/staging/bes2600/sta.c
++++ b/drivers/staging/bes2600/sta.c
+@@ -2225,9 +2225,10 @@ void bes2600_join_work(struct work_struct *work)
+ 	struct wsm_template_frame probe_tmp = {
+ 		.frame_type = WSM_FRAME_TYPE_PROBE_REQUEST,
+ 	};
+-	/*struct wsm_reset reset = {
+-		.reset_statistics = true,
+-	};*/
++	struct wsm_reset join_fail_reset = {
++		.reset_statistics = false,
++	};
++	bool join_failed = false;
+ 
+ 
+ 	BUG_ON(queueId >= 4);
+@@ -2410,6 +2411,33 @@ void bes2600_join_work(struct work_struct *work)
+ #endif /*CONFIG_BES2600_TESTMODE*/
+ 		cancel_delayed_work_sync(&priv->join_timeout);
+ 		bes2600_pwr_clear_busy_event(priv->hw_priv, BES_PWR_LOCK_ON_JOIN);
++		/*
++		 * Firmware rejected WSM_JOIN (wsm_join_confirm ret 1).
++		 * Issue wsm_reset so the firmware returns to a clean
++		 * IDLE state before the next association attempt.
++		 *
++		 * Without this reset the firmware sits in an
++		 * intermediate post-reject state. A rapid second
++		 * JOIN (e.g. wpa_supplicant retrying after the
++		 * PREV_AUTH_NOT_VALID deauth that follows) hits an
++		 * inconsistent firmware context, causing
++		 * bes2600_sdio_read_rx_batch to return SDIO error
++		 * which cascades into wifi_force_close.
++		 *
++		 * cw1200 ancestor (drivers/net/wireless/st/cw1200/
++		 * sta.c:1339) queues unjoin_work on join failure for
++		 * the same reason; bes2600_unjoin_work gates its
++		 * wsm_reset on join_status != PASSIVE, so after a
++		 * failed JOIN (join_status stays PASSIVE) that path
++		 * never fires — call wsm_reset directly here instead.
++		 *
++		 * Contract: wsm_reset takes only wsm_cmd_lock; safe
++		 * to call while conf_lock is held. wsm_oper_unlock
++		 * was already called in wsm_join_confirm() before
++		 * wsm_join() returned the error.
++		 */
++		WARN_ON(wsm_reset(hw_priv, &join_fail_reset, priv->if_id));
++		join_failed = true;
+ 	} else {
+ 		/* Upload keys */
+ #ifdef CONFIG_BES2600_TESTMODE
+@@ -2434,7 +2462,18 @@ void bes2600_join_work(struct work_struct *work)
+ 	up(&hw_priv->conf_lock);
+ 	if (bss)
+ 		cfg80211_put_bss(hw_priv->hw->wiphy, bss);
+-	wsm_unlock_tx(hw_priv);
++	/*
++	 * On join failure: queue unjoin_work so the next association
++	 * attempt is serialised after any lingering cleanup, matching
++	 * cw1200 sta.c:1344 "Tx lock still held, unjoin will clear it."
++	 * If unjoin_work is already queued, release TX immediately.
++	 */
++	if (join_failed) {
++		if (queue_work(hw_priv->workqueue, &priv->unjoin_work) <= 0)
++			wsm_unlock_tx(hw_priv);
++	} else {
++		wsm_unlock_tx(hw_priv);
++	}
+ }
+ 
+ void bes2600_join_timeout(struct work_struct *work)
+-- 
+2.54.0
+
diff --git a/patches/driver/bes2600/join-confirm-reset-danctnix/README.md b/patches/driver/bes2600/join-confirm-reset-danctnix/README.md
new file mode 100644
index 0000000..399873c
--- /dev/null
+++ b/patches/driver/bes2600/join-confirm-reset-danctnix/README.md
@@ -0,0 +1,46 @@
+# bes2600/join-confirm-reset-danctnix
+
+Danctnix-flavor patch closing besser#25 (wsm_join_confirm failure cascade).
+
+## What it does
+
+When firmware returns status 1 on a JOIN command (`wsm_join_confirm ret 1`),
+add a direct `wsm_reset(...)` call so the firmware returns to a clean IDLE
+state, plus `queue_work(workqueue, &priv->unjoin_work)` for serialisation of
+the next association attempt.
+
+## Why it's a fork-divergence fix
+
+`cw1200_join_work()` (cw1200 ancestor, `drivers/net/wireless/st/cw1200/sta.c:1339-1344`)
+queues `unjoin_work` on join failure: `cw1200_do_unjoin()` calls `wsm_reset`
+when `join_status == STA`.
+
+bes2600's `bes2600_unjoin_work()` gates the same `wsm_reset` on
+`join_status != PASSIVE`. After a failed JOIN, `join_status` stays PASSIVE
+(only set to STA on success) — queuing `unjoin_work` alone is insufficient
+on bes2600. The danctnix variant carries a direct `wsm_reset` in the
+failure path *and* the queue_work serialisation.
+
+## Observable effects (pkgrel=6 soak)
+
+Beyond closing the cascade (besser#25 acceptance), this patch also
+collapsed the periodic ~600 ms latency jitter on ohm:
+
+| | pkgrel=5 | pkgrel=6 |
+|---|---|---|
+| max RTT | 612 ms | 13.9 ms |
+| mdev | 103.5 ms | 1.55 ms |
+
+The bgscan-driven roam-attempt to a 5 GHz BSSID followed by `wsm_join`
+reject was briefly stalling TX every minute even when the cascade did
+not fire.
+
+## Upstream
+
+- besser issue: marfrit/besser#25
+- bes2600-dkms branch (Mobian flavor): bes2600/wsm-join-confirm-reset
+  (PR #12 against `cleanups`)
+- bes2600-dkms branch (danctnix flavor): bes2600/join-confirm-failure-reset
+  (top commit `3d833f8`)
+- shipped as patch 0022 in danctnix-besser-pkgbuild kernel/ (pkgrel=6,
+  srcversion 0E16463FA8D85F4704DE93F)