pkgrel=6 (per-series reconstruction): SDIO timeout cascade after 1-6h uptime — wifi_force_close path WARN_ONs #22
Symptom
Under linux-pinetab2-danctnix-besser pkgrel=6 (the per-series reconstruction, landed via kernel-agent#33 and since reverted), the bes2600 chip wedges after 1–6h of uptime in a consistent failure cascade.

Reproductions

- 21d89a7e (Tue 2026-05-19 23:39:54 → Wed 2026-05-20 05:54:57, 6h15m uptime): wedged at 05:53:56.
- a197770b (Wed 2026-05-20 05:55:19 → 08:54:34, 2h59m uptime): wedged at 07:41:42.

In both cases the chain is identical. First, decrypt-storm fast-recover (Patch A) fires after a burst of RX-status-4 failures. Minutes to hours later, the SDIO bus stops responding (err=-110, ETIMEDOUT). The BH worker then reaches bes2600_chrdev_wifi_force_close from the recovery path, which trips a WARN_ON in bes2600_tx_loop_set_enable.

pkgrel=5 (c5x interim cumulative) does NOT exhibit this: multi-day uptimes are clean.
Likely cause
The per-series reconstruction (kernel-agent#33, then redone in #36 via a rebase onto the v7.0-danctnix1 baseline) had to resolve a conflict on the remove-chardev-user-interface commit, because danctnix's bes2600_btuart.c depends on chardev utility symbols. My conflict resolution re-added bes2600_chrdev_switch_subsys_glb, bes2600_chrdev_is_bus_error, and bes2600_switch_bt so that the build would link. However, the hand-curated c5x-interim cumulative kept a slightly different set of internal helpers in bes_chardev.c, including the specific state paths that bes2600_chrdev_wifi_force_close relies on for emergency close. My re-added helpers don't match c5x-interim's recovery-path invariants. As a result, when Patch A's recovery flow eventually calls _wifi_force_close, the function hits inconsistent state and trips a WARN_ON.

Actions taken (immediate)
- … (2299d7a02 in marfrit-packages; reverts pkgrel=6 commit 31da35a54).
- … (588350c).
- fleet/ohm.yaml is back to using cumulative-c5x-danctnix/.

Future redo acceptance criteria
For a future per-series reconstruction to land, it MUST:
- … bes2600_chrdev_wifi_force_close must be redone with a tested replacement path. Either way, the failure mode in this issue must be reproduced and fixed before merge, not deferred.

Related
- 31da35a54 reverted as 2299d7a02 (no separate issue)

Phase 7 verified; closing. pkgrel=5 (srcversion 91E5C5F1BFAF70BDE3A1970) passed a 6h53m soak: zero wifi_force_close cascades, zero err=-110, zero KFENCE OOB reports. The bounce-buffer fix (2f9b4c7) was missing from the pkgrel=4 per-series build because the staging-prep filter excluded it. It was added as patch 0021 for pkgrel=5. Full details are in memory file project_besser22_closed.md.
Build: linux-pinetab2-danctnix-besser 7.0.danctnix1-5, srcversion 91E5C5F1BFAF70BDE3A1970
Soak: 6h53m clean graceful shutdown (user-initiated reboot at end).
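The soak criteria (zero wifi_force_close, zero err=-110, zero KFENCE OOB) reduce to counting three signals in the kernel log. The following is a minimal Python sketch. It assumes the log has been exported as text, for example with journalctl -k -b -1 for the previous boot. The regex patterns are assumptions about the message wording and should be checked against real driver output. The KFENCE pattern follows the kernel's "BUG: KFENCE: out-of-bounds read/write" report header. The names SIGNALS, count_signals, and soak_passed are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Patterns for the three failure signals the soak must show zero of.
# Assumed wording: verify against real dmesg/journal output before relying on them.
SIGNALS = {
    "sdio_timeout": re.compile(r"err=-110\b"),
    "wifi_force_close": re.compile(r"wifi_force_close"),
    "kfence_oob": re.compile(r"KFENCE: out-of-bounds"),
}


def count_signals(lines: Iterable[str]) -> Counter:
    """Count log lines matching each signal (one line may match several)."""
    counts = Counter({name: 0 for name in SIGNALS})
    for line in lines:
        for name, pattern in SIGNALS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def soak_passed(counts: Counter) -> bool:
    """True only if every signal was seen zero times."""
    return all(n == 0 for n in counts.values())
```

To use it, pass an open log file to count_signals (iterating a file yields lines) and check soak_passed on the result. Any nonzero count fails the soak.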
Results against acceptance criteria:
Regression found and fixed during soak: the initial per-series build (pkgrel=4, 20 patches) was missing commit 2f9b4c7 (bounce SDIO TX buffers to avoid DMA OOB read). That commit was present in the cumulative single patch but was excluded from the per-series reconstruction because it had been classified as staging-prep. Without it, KFENCE caught OOB reads in bes_sdio_memcpy_to_io_helper roughly every 10 minutes, causing TX workqueue stalls and latency scatter (at times ~50% packet loss). It was added as patch 0021 in pkgrel=5, and KFENCE hits dropped to zero.
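Commit 2f9b4c7 is described here only as bouncing SDIO TX buffers to avoid a DMA OOB read. The toy Python model below illustrates the general bounce-buffer technique. It rests on an assumption this issue does not confirm: that the transfer length is rounded up to an SDIO block size (512 bytes assumed), so reading that length straight from a payload-sized buffer runs past its end. The names round_up, bounce, and dma_read are hypothetical. dma_read raises an exception in the situations where KFENCE would report an OOB read.

```python
SDIO_BLOCK_SIZE = 512  # assumed block size, for illustration only


def round_up(n: int, align: int) -> int:
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align


def bounce(payload: bytes, block_size: int = SDIO_BLOCK_SIZE) -> bytearray:
    """Copy payload into a zero-padded, driver-owned buffer whose length is
    the full block-rounded transfer size."""
    xfer_len = round_up(len(payload), block_size)
    buf = bytearray(xfer_len)  # zero-filled
    buf[: len(payload)] = payload
    return buf


def dma_read(buf, xfer_len: int) -> bytes:
    """Stand-in for the DMA engine reading xfer_len bytes from buf.
    Fails loudly if the read would extend past the end of buf."""
    if xfer_len > len(buf):
        raise IndexError(f"OOB read: {xfer_len} bytes from a {len(buf)}-byte buffer")
    return bytes(buf[:xfer_len])
```

For a 1500-byte payload, the rounded transfer is 1536 bytes. Reading that from the original buffer overruns it by 36 bytes, while reading from bounce(payload) stays in bounds. The cost is one extra copy per TX frame.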
Final per-series: 21 patches, on branch bes2600/besser-danctnix-v3 in marfrit/bes2600-dkms. PKGBUILD at noether/readme-pkgrel4-kernel-agent-flow in marfrit/besser, commit 818d7b8.

Lesson logged: when reconstructing a per-series from a cumulative, diff the cumulative's final state against the per-series final state, file by file. A srcversion mismatch is the signal to audit. If the cumulative and the per-series produce different source trees, the diff tells you what was missed.
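Below is a minimal Python sketch of that per-file audit. It assumes both trees are already-patched source checkouts on disk, and any directory names you pass are your own. It hashes every file (skipping .git) and reports three things: files present on only one side, files whose contents differ, and, for changed .c files, functions defined on only one side. That last check targets the kind of helper-set mismatch described under Likely cause. The function matcher is a heuristic for kernel-style formatting (definitions start in column 0), not a C parser. Helpers present on both sides with different bodies show up only in "changed". All function names are hypothetical.

```python
import hashlib
import re
from pathlib import Path

# Rough matcher for a kernel-style C function definition header: starts in
# column 0, names the function, opens its parameter list. Not a C parser.
_DEF_RE = re.compile(r"^[A-Za-z_][\w \t*]*?\b([A-Za-z_]\w*)\s*\(")


def file_digests(root: Path) -> dict[str, str]:
    """Map each file's path relative to root to its SHA-256, skipping .git."""
    digests = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        if ".git" in rel.parts or not path.is_file():
            continue
        digests[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def defined_functions(source: str) -> set[str]:
    """Names of functions that appear to be defined in C source text."""
    names = set()
    for line in source.splitlines():
        if line.rstrip().endswith(";"):
            continue  # prototypes, declarations, macro invocations
        match = _DEF_RE.match(line)
        if match:
            names.add(match.group(1))
    return names


def diff_trees(cumulative: Path, per_series: Path) -> dict:
    """Compare two source trees file by file; for changed .c files, also
    report function definitions that exist on only one side."""
    a, b = file_digests(cumulative), file_digests(per_series)
    changed = sorted(k for k in a.keys() & b.keys() if a[k] != b[k])
    functions = {}
    for rel in changed:
        if rel.endswith(".c"):
            fa = defined_functions((cumulative / rel).read_text(errors="replace"))
            fb = defined_functions((per_series / rel).read_text(errors="replace"))
            functions[rel] = {
                "only_in_cumulative": sorted(fa - fb),
                "only_in_per_series": sorted(fb - fa),
            }
    return {
        "only_in_cumulative": sorted(a.keys() - b.keys()),
        "only_in_per_series": sorted(b.keys() - a.keys()),
        "changed": changed,
        "functions": functions,
    }
```

In this issue, a non-empty "only_in_cumulative" entry for a TX-path file would have pointed at the missing 2f9b4c7 change. A function listed under bes_chardev.c would have pointed at the helper mismatch.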