patches/driver/bes2600/*-danctnix/: reconstruct broken per-series mirrors (PR #17 followup) #29

Closed
opened 2026-05-18 14:56:13 +00:00 by marfrit · 1 comment
Owner

The per-series -danctnix mirrors merged in #17 do NOT apply against the linux-pinetab2 baseline. Discovered during the ohm pkgrel=4 migration audit on 2026-05-18.

Symptom

Running ka-promote ohm with the original 17 per-series includes produced a 172 644-byte cumulative touching 27 file paths, 11 of which are bogus:

  • 10 patches target DKMS-style root paths (bes2600/foo.c):
    • bes2600/bes2600_factory.c, bes2600/bes2600_factory.h
    • bes2600/bes2600_sdio.c
    • bes2600/bes_chardev.c
    • bes2600/bes_log.h
    • bes2600/bes_pwr.c
    • bes2600/Makefile
    • bes2600/sta.c
    • bes2600/wsm.h
    • (other fragments scattered across other series)
  • 1 patch has a corrupted mixed-prefix header: in diff --git a/drivers/staging/bes2600/bes2600_sdio.c b/bes2600/bes2600_sdio.c, a/ says staging path while b/ says DKMS root.

Both patch and git apply reject these against a linux-pinetab2 baseline because there is no bes2600/ directory at the kernel root. In-tree, bes2600 lives at drivers/staging/bes2600/.
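
You can catch both failure shapes before a patch reaches ka-promote by classifying every diff --git header. The sketch below is a minimal, hypothetical Python helper; classify_headers is not part of ka-promote. It assumes unquoted paths without spaces, which holds for the bes2600 file names listed above.

```python
import re

# Where bes2600 lives in a linux-pinetab2 tree (per this issue).
IN_TREE = "drivers/staging/bes2600/"
# The bare root seen in the broken DKMS-style patches.
DKMS_ROOT = "bes2600/"

# First line of each per-file section in a git-style diff.
DIFF_GIT = re.compile(r"^diff --git a/(\S+) b/(\S+)$")


def classify_headers(patch_text: str) -> list[tuple[str, str, str]]:
    """Return (verdict, a_path, b_path) for every 'diff --git' header.

    verdict is one of:
      "ok"           both sides under drivers/staging/bes2600/
      "dkms-root"    both sides use the bare bes2600/ root
      "mixed-prefix" one side in-tree, the other DKMS root
      "other"        anything else (needs a human look)
    """
    results = []
    for line in patch_text.splitlines():
        m = DIFF_GIT.match(line)
        if not m:
            continue
        a_path, b_path = m.groups()
        a_ok, b_ok = a_path.startswith(IN_TREE), b_path.startswith(IN_TREE)
        a_dkms, b_dkms = a_path.startswith(DKMS_ROOT), b_path.startswith(DKMS_ROOT)
        if a_ok and b_ok:
            verdict = "ok"
        elif a_dkms and b_dkms:
            verdict = "dkms-root"
        elif (a_ok and b_dkms) or (a_dkms and b_ok):
            verdict = "mixed-prefix"
        else:
            verdict = "other"
        results.append((verdict, a_path, b_path))
    return results
```

An "other" verdict is not necessarily wrong, since a series could legitimately touch files outside the driver directory. It only means the header needs a human look.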

Workaround landed in #28

patches/driver/bes2600/cumulative-c5x-danctnix/ is a single-file interim that stages the hand-curated cumulative from boltzmann:~/src/besser/marfrit-besser/danctnix-besser-pkgbuild/kernel/0001-bes2600-besser-cumulative-series.patch. It is 148 149 bytes and touches the correct 48 drivers/staging/bes2600/* files. This is what pkgrel=3 builds with on ohm.

The 17 broken per-series dirs remain in patches/driver/bes2600/ but are dropped from the fleet/ohm.yaml includes. They should be reconstructed (this issue), not deleted, so that authorship and commit messages survive the rebuild.

Reconstruction plan

Source of truth: the marfrit/bes2600-dkms-mobian repo (the canonical c5x driver state). Each -danctnix series should be regenerated as follows:

  1. Identify the commit range in bes2600-dkms-mobian that corresponds to each series, using the series legend in the danctnix-besser-pkgbuild changelog. The legend order is A, B, C v3, F, G, D, E, C2, c5.x, c6.x, c7, H (NOT alphabetical).
  2. Regenerate with git format-patch <range> --src-prefix=a/drivers/staging/bes2600/ --dst-prefix=b/drivers/staging/bes2600/, or post-rewrite the prefixes with sed.
  3. Verify that each series-dir's patches apply cleanly on the linux-pinetab2 baseline (use ka-promote ohm --validate-against <clean-checkout>).
  4. Once all 17 are reconstructed, drop the interim cumulative-c5x-danctnix from fleet/ohm.yaml and re-include the per-series dirs in the original legend order, from A to H.
  5. Re-promote, then byte-diff the resulting cumulative against the c5x-interim cumulative to confirm equivalence.
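
Step 2 allows post-rewriting the prefixes with sed. A whole-file sed can also hit hunk bodies, because a removed line whose content begins with "-- " shows up as "--- ...". As a hedged alternative, this Python sketch rewrites paths only inside each file header, between diff --git and the first @@, and it also repairs the mixed-prefix case. rewrite_prefixes is hypothetical. The sketch assumes the broken patches use the bare bes2600/ root listed under Symptom, and it does not handle rename from/rename to or copy headers.

```python
import re

DKMS_ROOT = "bes2600/"
IN_TREE = "drivers/staging/bes2600/"

DIFF_GIT = re.compile(r"^diff --git a/(\S+) b/(\S+)$")
# '--- a/path' / '+++ b/path' header lines; '/dev/null' deliberately does not match.
OLD_NEW = re.compile(r"^(---|\+\+\+) ([ab])/(\S+)(.*)$")


def _fix(path: str) -> str:
    """Map a DKMS-root path onto the in-tree location; leave others alone."""
    if path.startswith(DKMS_ROOT):
        return IN_TREE + path[len(DKMS_ROOT):]
    return path


def rewrite_prefixes(patch_text: str) -> str:
    """Rewrite DKMS-root paths in file headers only; hunk bodies pass through."""
    out = []
    in_header = False  # True between 'diff --git' and the first '@@' hunk line
    for line in patch_text.splitlines(keepends=True):
        body = line.rstrip("\n")
        eol = line[len(body):]
        m = DIFF_GIT.match(body)
        if m:
            in_header = True
            a, b = (_fix(p) for p in m.groups())
            out.append(f"diff --git a/{a} b/{b}{eol}")
            continue
        if body.startswith("@@"):
            in_header = False
        elif in_header:
            m = OLD_NEW.match(body)
            if m:
                marker, side, path, rest = m.groups()
                out.append(f"{marker} {side}/{_fix(path)}{rest}{eol}")
                continue
        out.append(line)
    return "".join(out)
```

Commit metadata (From:, Date:, Subject:, and the message body) comes before the first diff --git and passes through unchanged, so authorship survives the rewrite.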

Why this matters

Per-series traceability is the whole point of ka-promote's manifest model. A single-file cumulative is the interim that ships, not the long-term shape. Once reconstructed:

  • Each fix is independently revertable from the manifest.
  • Bisecting on the kernel-agent side becomes practical.
  • New patches can land as new series-dirs without needing a full hand-cumulative regeneration on boltzmann.
  • The apply_order field described in the manifest comments can be honored properly.
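
The legend order in step 1 of the plan is not alphabetical, so anything that derives apply order from directory names needs an explicit rank. The sketch below is minimal and assumes each series-dir is tagged with its legend label. Both the dir-to-label mapping and order_series are hypothetical, and this issue does not show how ka-promote's apply_order field actually consumes an order.

```python
# Series legend order from the danctnix-besser-pkgbuild changelog (NOT alphabetical).
LEGEND = ["A", "B", "C v3", "F", "G", "D", "E", "C2", "c5.x", "c6.x", "c7", "H"]
RANK = {label: i for i, label in enumerate(LEGEND)}


def order_series(dirs: dict[str, str]) -> list[str]:
    """Sort series-dir names by the legend label each one belongs to.

    `dirs` maps a series-dir name to its legend label (hypothetical input:
    the real mapping of the 17 dirs onto these 12 labels is not given here).
    Several dirs under one label fall back to sorting by dir name.
    """
    unknown = sorted(set(dirs.values()) - set(LEGEND))
    if unknown:
        raise ValueError(f"labels not in legend: {unknown}")
    return sorted(dirs, key=lambda d: (RANK[dirs[d]], d))
```

Because ties fall back to the directory name, several series-dirs under the same legend entry need names that sort in the intended order.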

Acceptance

  • ka-promote ohm with the reconstructed per-series dirs produces a cumulative that is byte-equivalent to patches/driver/bes2600/cumulative-c5x-danctnix/0001-...patch, or that applies to an identical source-tree state (b2sum verified at the time of reconstruction).
  • All 17 series-dirs have proper in-tree paths, no DKMS root, no mixed-prefix headers.
  • fleet/ohm.yaml swaps the interim include for the per-series includes; ka-promote remains clean.
  • This issue closes when all of the above hold.
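
The b2sum check in the first acceptance bullet can be reproduced without shelling out. Python's hashlib.blake2b with its default 64-byte digest gives the same hex string that coreutils b2sum prints by default. This is a minimal sketch. It covers only the byte-equivalent branch, not the "identical source-tree state" alternative, and it leaves the file paths to the caller.

```python
import hashlib
from pathlib import Path


def b2sum(path: Path) -> str:
    """Hex BLAKE2b-512 digest, the value coreutils b2sum prints by default."""
    h = hashlib.blake2b()  # digest_size defaults to 64 bytes = 512 bits
    with path.open("rb") as f:
        # Stream in 1 MiB chunks so large cumulatives do not load at once.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def byte_equivalent(candidate: Path, reference: Path) -> bool:
    """True when the reconstructed cumulative matches the interim byte for byte."""
    return b2sum(candidate) == b2sum(reference)
```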

Provenance / blame

#17 (the migrate-besser PR) did the mass-rewrite. The DKMS-style paths likely came from copying patches verbatim from a bes2600-dkms-mobian working tree without rewriting the a/ and b/ prefixes to the in-tree path. The mixed-prefix corruption suggests one patch was partially renamed.

Refs: #28 (interim workaround), #5 (PKGBUILD migration, partial), besser#1 (closed).

Author
Owner

Planning notes for a redo (from the failed pkgrel=6 attempt, 2026-05-19/20)

Closing PR #36 and reverting PR #33 (commits 2299d7a02 in marfrit-packages, 588350c in kernel-agent) was driven by a real regression — full repro in besser#22. Capturing what a future redo needs to do differently, while the context is fresh.

Acceptance criteria

  1. Soak gate: any reconstructed per-series cumulative must pass ≥6h continuous uptime on ohm with normal wifi load before merge. The pkgrel=6 attempt wedged the chip after 1h45m (lockdep) or 5h54m (production). The lockdep flavor is the strictest test rig: the same source bytes as production, plus active runtime checking. See the feedback_phase7_real_load memory entry and besser#22.
  2. No wifi_force_close → tx_loop_set_enable WARN_ON cascade in the soak window. That's the specific symptom that surfaced.
  3. No bes_sdio_memcpy_io_helper err=-110 or "startup timeout!!!" messages. Those are the chip-wedge endpoint of the cascade.
  4. srcversion bit-equivalence to the c5x-interim cumulative pkgrel=5 srcversion (26B0003FE9F2B05DCE838C4) is preferred but not mandatory. Functional equivalence under load is mandatory. PR #33/#36 produced a different srcversion (1A919EED0E6DC2478559B17) AND functionally diverged under sustained load.
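
Criteria 1 to 3 can be checked mechanically once the soak log is captured. The Python sketch below is minimal and assumes the caller supplies uptime in seconds and the kernel log as a list of lines, for example from dmesg. evaluate_soak is hypothetical. The regex for the WARN_ON lines guesses at the kernel's "WARNING:" line format and does not quote the real besser#22 logs.

```python
import re
from dataclasses import dataclass

SOAK_MIN_SECONDS = 6 * 3600  # ≥6h continuous uptime on ohm

# Failure signatures named in criteria 2 and 3. The exact log wording,
# especially for the WARN_ON lines, is an assumption; adjust to real dmesg output.
SIGNATURES = {
    "warn-cascade": re.compile(r"WARNING:.*tx_loop_set_enable"),
    "sdio-timeout": re.compile(r"bes_sdio_memcpy_io_helper.*err=-110"),
    "startup-timeout": re.compile(r"startup timeout!!!"),
}


@dataclass
class SoakResult:
    passed: bool
    uptime_ok: bool
    hits: dict[str, int]  # signature name -> number of matching log lines


def evaluate_soak(uptime_seconds: int, log_lines: list[str]) -> SoakResult:
    """Pass only if the uptime gate is met and no failure signature appears."""
    hits = {name: 0 for name in SIGNATURES}
    for line in log_lines:
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name] += 1
    uptime_ok = uptime_seconds >= SOAK_MIN_SECONDS
    return SoakResult(uptime_ok and not any(hits.values()), uptime_ok, hits)
```

With the numbers above, the production run fails the uptime gate by itself: 5h54m is 21 240 s, short of the 21 600 s that 6h requires.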

Likely root cause of the previous failure

During the rebase of bes2600/cleanups onto a v7.0-danctnix1-aligned baseline (danctnix-sync branch), the merge conflict on the remove-chardev-user-interface commit was resolved by accepting theirs (the patch's removal intent), then surgically re-adding three helpers that bes2600_btuart.c needs to link:

  • bes2600_chrdev_switch_subsys_glb
  • bes2600_chrdev_is_bus_error
  • bes2600_switch_bt (as static)

Both non-static helpers got EXPORT_SYMBOL_GPL so btuart can use them. The link succeeded, the module loaded, and short-window functional checks looked fine.

But a fourth helper, bes2600_chrdev_wifi_force_close, was kept by the theirs-merge. The recovery state it touches relies on bes_chardev.c internals that I did not carefully preserve in the re-add: the bes2600_cdev static and its status_lock, bus_error, halt_dev, and bus_probe fields, plus the chardev wait-queues. Patch A's decrypt-storm fast-recover → forcing reassoc sequence eventually fires, typically after 1–6h. Recovery then enters _wifi_force_close, hits inconsistent state, and the chip wedges.

The c5x-interim hand-curated cumulative kept the chardev helpers AND the supporting state in a consistent shape. The naive rebase conflict resolution didn't.

What a redo must do differently

  1. Extract the post-cumulative state of bes_chardev.c from a pkgrel=5 build (/tmp/...-extract/src/linux-7.0/drivers/staging/bes2600/bes_chardev.c after prepare()). This is the authoritative shape.
  2. Side-by-side compare that file against the bes_chardev.c that the per-series rebase produces. Port the missing or wrong bits. These likely include the bes2600_chrdev_wifi_force_close function body itself, the static bes2600_cdev struct definition, and the chardev wait_queue initializations.
  3. Build a debug variant first (lockdep+ flavored) and soak ≥6h with real wifi load before publishing. The pkgrel=5-lockdep build pattern documented in besser#18 Phase 7 close is the template.
  4. Don't accept theirs blindly on hand-curated patches. When upstream c5x-interim has surgically kept a function that other danctnix-only files depend on, the rebase must preserve the same intent — not just enough symbols to satisfy the linker.
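
Step 2's side-by-side compare can be scripted as a first pass. This hedged Python sketch uses a hypothetical compare_chardev that takes the pkgrel=5 extract and the rebased bes_chardev.c as text. It flags names this comment ties to the recovery path that vanish in the rebase, lists EXPORT_SYMBOL/EXPORT_SYMBOL_GPL differences, and returns a unified diff. Name presence is only a tripwire: the pkgrel=6 build linked and loaded with the helpers present, so the diff is what actually needs review.

```python
import difflib
import re

# Names this comment ties to the chardev recovery path.
WATCHED = [
    "bes2600_chrdev_switch_subsys_glb",
    "bes2600_chrdev_is_bus_error",
    "bes2600_switch_bt",
    "bes2600_chrdev_wifi_force_close",
    "bes2600_cdev",
    "status_lock",
    "bus_error",
    "halt_dev",
    "bus_probe",
]
EXPORT = re.compile(r"EXPORT_SYMBOL(?:_GPL)?\(\s*(\w+)\s*\)")


def compare_chardev(authoritative: str, rebased: str) -> dict:
    """Report watched names and exports that differ between two bes_chardev.c texts."""
    def has(name: str, text: str) -> bool:
        # Whole-identifier match, so 'bus_error' does not hit '..._is_bus_error'.
        return re.search(rf"\b{name}\b", text) is not None

    missing = [n for n in WATCHED if has(n, authoritative) and not has(n, rebased)]
    exports_a = set(EXPORT.findall(authoritative))
    exports_r = set(EXPORT.findall(rebased))
    diff = difflib.unified_diff(
        authoritative.splitlines(keepends=True),
        rebased.splitlines(keepends=True),
        fromfile="pkgrel5/bes_chardev.c",
        tofile="rebased/bes_chardev.c",
    )
    return {
        "missing_names": missing,
        "exports_only_in_authoritative": sorted(exports_a - exports_r),
        "exports_only_in_rebased": sorted(exports_r - exports_a),
        "diff": "".join(diff),
    }
```

Both inputs can come from Path(...).read_text() on the extracted pkgrel=5 file and on the rebased checkout.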

Branches kept around for the redo

  • marfrit/bes2600-dkms: danctnix-sync, cleanups-rebased-on-danctnix, bh-c-fossil-cleanup-rebased (the rebased commit tree — most conflict resolutions ARE correct, only the chardev re-add needs to be redone).
  • claude-noether/kernel-agent:noether/kernel-agent-29-rebased-on-danctnix-clean (the per-series patch files generated from the rebased branches).

These stay until a redo branch supersedes them. Reverting #33 brought the cumulative-c5x-danctnix/ interim back into fleet/ohm.yaml as the working state.

Leaving this issue open so the per-series reconstruction work can resume when there is time and an explicit owner.
