18da673ccc
Four changes that together flip kernel-agent from spec'd to operational
in the manual-orchestrated form. The remaining ka-* CLI verbs come in
later phases; this commit gets a first iteration through the pipeline
and proves the flow at the artifact level.

1. Promote vb2_dma_resv RFC v2 series into the scope-tagged tree

   Markus iterated v2 locally on boltzmann (kernel-agent-bootstrap dir,
   reaching linux-fresnel-fourier pkgrel=14). v2 attaches the producer
   fence at device_run in slept-OK context per Dufresne's v1 review on
   linux-media. The three patches land under
   patches/subsystem/media/videobuf2/dma-resv-release-fence/:

   - 0004 (helper) — opt-in vb2 dma_resv producer-fence helper
   - 0005 (driver opt-in) — hantro device_run attach
   - 0006 (driver opt-in) — rockchip-rga device_run attach

   Numbered 4/5/6 because the fresnel build PKGBUILD applies them after
   the three 0001/0002/0003 PBP DTS patches; this directory's numbering
   follows that apply-order, not the upstream lore series numbering.
   README at the scope dir documents fleet eligibility, decision history,
   and the v1 → v2 design pivot.

2. Update fleet/fresnel.yaml to include the v2 series

   Pre-v2 manifest had a comment block 'Explicitly NOT included … vb2
   dma-resv-release-fence … defer until v2 lands'. v2 has landed. Move
   those three lines from 'excluded' to 'includes', annotate the decision
   inline.

3. README updates

   - Build hosts table: add ampere (CoolPi GenBook, RK3588 32GB) as
     secondary aarch64 host. Same uarch as boltzmann, on-demand wake via
     His. Gives the fleet a second native build target for when boltzmann
     is busy (e.g. carrying a firefox-fourier 4h build).
   - 'Out of scope this round' bootstrap section: mark vb2_dma_resv as
     resolved 2026-05-15, keep panfrost IOMMU_CACHE deferred.

4. First ka-* CLI verb implemented: bin/ka-status

   bash, ~120 lines. Reads fleet/*.yaml manifests, queries Gitea for
   open [ka:*] issues, probes each reachable host for the installed
   kernel-package version. Read-only — no sudo, no host writes. Picks
   GITEA_TOKEN from /opt/herding/etc/claude-identities/<host>.creds or
   env override.

   Proves the agent's Gitea-API + manifest-parsing skeleton works
   end-to-end without committing to a full ka-promote/build/install
   implementation. Smoke-tested locally:

       $ bin/ka-status
       kernel-agent status (repo: marfrit/kernel-agent)
       open [ka:*] issues total: 1
       ══ fresnel ══
         manifest: arch=arm64 soc=rockchip/rk3399 board=pinebook-pro
         package: linux-fresnel-fourier
         installed: host-down # (fresnel is currently powered off)
         open ka-issues: (none for this host)

No PKGBUILD update in this PR — that lives in marfrit-packages and
ships as a sibling PR (the actual linux-fresnel-fourier-7.0-14 publish).
322 lines
18 KiB
Markdown
# kernel-agent

Owns the kernel side of the home fleet: source/branch/patch curation, per-host
build orchestration, promote-to-fleet pipeline. Peer to His (home infra). Uses
His for ops it doesn't own (waking data, host provisioning); files Gitea
issues for coordination it can't decide alone.

Targets: dev/work hosts only. Infra hosts (noether, hertz, dcw2/3, turing,
nuccies as compile-only) are NOT in the promote list — explicit opt-in via
`fleet/<host>.yaml` manifest.

Customized today: ampere · boltzmann · fresnel · ohm
Anticipated Debian targets: higgs · clevo · pi-fleet (when they ask for it)

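The opt-in rule above amounts to "a host is a promote target only if it has a manifest". A minimal sketch of that check, assuming the manifests live in a local `fleet/` directory; the function name `ka_is_promote_target` is hypothetical and not part of the shipped CLI:

```bash
# Sketch only: ka_is_promote_target is a hypothetical helper, not a real ka-* verb.
# $1 = host name, $2 = fleet directory (assumed default: ./fleet)
ka_is_promote_target() {
    [ -f "${2:-fleet}/$1.yaml" ]
}

# Usage: hosts without a manifest are never promoted to.
for h in fresnel noether; do
    if ka_is_promote_target "$h"; then
        echo "$h: in promote list"
    else
        echo "$h: not opted in"
    fi
done
```

Keying the decision on file existence keeps the opt-in explicit: adding an infra host to the promote list requires committing a manifest for it.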
## Lifecycle

```
┌────────────────────────────────────────────────────────────────┐
│ INPUT — campaign session                                       │
│   patches in marfrit/<campaign>/ or marfrit/misc-kernel-patches│
│   triggers: ka-promote, ka-close, ka-abandon                   │
└─────────────────────────┬──────────────────────────────────────┘
                          │
┌─────────────────────────▼──────────────────────────────────────┐
│ ORCHESTRATION — kernel-agent                                   │
│   resolve manifest by scope tag                                │
│   pre-flight target build host (minimal; thorough nightly)     │
│   on miss → [ka:host-changed] block to His                     │
└─────────────────────────┬──────────────────────────────────────┘
                          │
┌─────────────────────────▼──────────────────────────────────────┐
│ BUILD                                                          │
│   aarch64: kbuild-aarch64 on boltzmann (primary)               │
│            fermi on hertz (fallback)                           │
│            distcc pool: tesla + dcc1 + dcc2 (zeroconf)         │
│   x86_64:  kbuild-x86 on data (wakes via wake-host lmcp)       │
│   ccache + 5-min watcher (hertz cron) for stalls/errors        │
│   wall-clock cap (absolute), warn on degraded distcc pool      │
└─────────────────────────┬──────────────────────────────────────┘
                          │
┌─────────────────────────▼──────────────────────────────────────┐
│ SIGN                                                           │
│   build host submits unsigned .pkg.tar.zst / .deb to hertz     │
│   hertz signs with existing marfrit-packages key (one key,     │
│   pkg + repo db)                                               │
│   hertz pushes to packages.reauktion.de                        │
└─────────────────────────┬──────────────────────────────────────┘
                          │
┌─────────────────────────▼──────────────────────────────────────┐
│ INSTALL — consent-via-action                                   │
│   kernel-agent files [ka:installable]                          │
│   session-hook reminders (escalating: now, +1h, +6h, daily)    │
│   YOU run ka-install <host>                                    │
│     → backup current → pacman/apt -U → reboot                  │
└─────────────────────────┬──────────────────────────────────────┘
                          │
┌─────────────────────────▼──────────────────────────────────────┐
│ VERIFY — post-install (auto, by hertz cron)                    │
│   Bar 1: SSH heartbeat (10 min)                                │
│   Bar 2: package version installed                             │
│   Bar 3: DTB/sysfs matches manifest (custom-DTB hosts)         │
│   Bar 4: per-patch probe (manifest opt-in, simple lang)        │
│   Bar 5: burn-in N hours (host opt-in)                         │
│   failure → [ka:regression] block, host marked drifted         │
└────────────────────────────────────────────────────────────────┘

 Loopback (7→4): yank patches from manifest; host drifted; next
                 install converges. No automatic rollback;
                 backup at /sparfuxdata/kernel-agent-backups/
                 on hertz, 7-day retention, you fetch + reinstall.
```

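The VERIFY stage runs its bars in order and raises `[ka:regression]` on the first failure. The loop below is a rough sketch of that control flow only. The function `ka_verify` and the stub probes are hypothetical; real probes would query the host over SSH, which is not shown here.

```bash
# Sketch only: ka_verify and the bar_* stubs are illustrative names.
# $1 = host; remaining args = probe commands, each called as: <probe> <host>
ka_verify() {
    host=$1; shift
    n=0
    for bar in "$@"; do
        n=$((n + 1))
        if ! "$bar" "$host"; then
            # First failing bar ends the run; the caller would file the issue.
            echo "[ka:regression] $host failed bar $n ($bar)"
            return 1
        fi
    done
    echo "$host verified ($n bars)"
}

# Stub probes for illustration; both always pass.
bar_heartbeat() { true; }
bar_pkg_version() { true; }

ka_verify fresnel bar_heartbeat bar_pkg_version   # prints: fresnel verified (2 bars)
```

Passing the probes as arguments keeps the opt-in bars (4 and 5) out of the loop unless the manifest asks for them.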
## Agent boundaries

```
                   peer agents
    ┌───────────────────────────────────────┐
    │                                       │
   His ←──── lmcp tools (ops) ────→ kernel-agent
    │        wake-host, host-status,        │
    │        prepare-build-host, ...        │
    │                                       │
    └─── Gitea issues (coordination) ───────┘

    ▲                                       ▲
    │                                       │
                campaign sessions
         (Bin · MegabitChip · RockHard ·
          Neutron · fresnel-fourier ·
          ohm_gl_fix · besser · ...)
                        ▲
                        │
            subagents inside session
         (Janet · avr-specialist · Plan)
       no independent identity, contribute
      to whatever the calling session ships
```

Routine ops between peer agents go through lmcp tools (sync, idempotent,
no per-call audit trail). Coordination goes through Gitea issues (async,
persistent, audit trail per item).

## Verbs (explicit, parameterized, audit-issue auto-filed)

```
ka-promote <campaign> <patch-or-glob> --to <scope>
ka-close <campaign> --status success
ka-abandon <campaign> --keep-as-archive | --purge-from-fleet
ka-install <host>
ka-keep <job-id> [--for <duration>]
ka-pause-prune / ka-resume-prune
ka-restore-archive <job-id>
ka-snooze <issue-id> [--for <duration>]
ka-debug <job-id>        # shells into the same container that ran the build
ka-status                # per-host one-liner with drift/pending state [bin/ka-status — implemented Phase 1]
ka-migrate-tree --from <p> --to <p>
ka-wake-data             # wraps wake-host data through His
```

Conversational invocation triggers a y/n confirmation enumerating what will
happen. Direct CLI invocation executes immediately.

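The two invocation modes can be sketched as a confirmation gate in front of the verb. Everything here is an assumption for illustration: the `KA_CONVERSATIONAL` variable, the function names, and the idea that the caller supplies the list of planned steps.

```bash
# Sketch only: ka_confirm, ka_run and KA_CONVERSATIONAL are hypothetical.
# $1 = verb; remaining args = one line per step that will happen.
ka_confirm() {
    verb=$1; shift
    echo "About to run $verb:" >&2
    for step in "$@"; do
        echo "  - $step" >&2
    done
    printf 'Proceed? [y/N] ' >&2
    read -r answer || answer=n      # EOF counts as "no"
    case $answer in
        y|Y) return 0 ;;
        *)   return 1 ;;
    esac
}

ka_run() {
    # Conversational calls must pass the y/n gate; direct CLI calls skip it.
    if [ "${KA_CONVERSATIONAL:-0}" = 1 ] && ! ka_confirm "$@"; then
        echo "aborted: $1"
        return 1
    fi
    echo "executing: $1"
}
```

Defaulting to "no" on empty input or EOF means a conversational call with no one answering never proceeds.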
## Block-severity issues — what halts what

```
[ka:patch-fail]          only that patch's promotes
[ka:campaign-conflict]   those patches across the involved campaigns
[ka:host-drifted]        installs to that host (builds OK)
[ka:build-fail]          builds routing to that build host
[ka:bootstrap-missing]   builds for that build host
[ka:host-changed]        builds to that host until pre-flight re-passes
[ka:signing-fail]        global (all builds need signing)
[ka:regression]          installs to that host until triaged
```

Scoped per issue. No implicit cross-domain propagation. Dependency cascades
detected at promote-time, not propagated globally.

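One way to read the table is as a lookup from block tag to the kind of action it halts. The sketch below encodes that reading. The function names are hypothetical, and treating `[ka:campaign-conflict]` as a promote-level block is an assumption, since the table only says which patches are affected.

```bash
# Sketch only: hypothetical helpers encoding the table above.
# Prints the action kind (promote | install | build) that a block tag halts.
ka_blocked_action() {
    case $1 in
        '[ka:patch-fail]'|'[ka:campaign-conflict]')
            echo promote ;;   # campaign-conflict -> promote is an assumption
        '[ka:host-drifted]'|'[ka:regression]')
            echo install ;;
        '[ka:build-fail]'|'[ka:bootstrap-missing]'|'[ka:host-changed]'|'[ka:signing-fail]')
            echo build ;;
        *)
            echo "unknown block tag: $1" >&2
            return 1 ;;
    esac
}

# Only signing failures are global; every other tag is scoped to one
# patch, campaign, or host.
ka_is_global_block() {
    [ "$1" = '[ka:signing-fail]' ]
}
```

Keeping scope in a separate predicate mirrors the "no implicit cross-domain propagation" rule: a tag never widens beyond what the table states.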
## Patch tree (in marfrit/kernel-agent)

```
patches/
├── arch/{arm64,x86_64}/
├── soc/{rockchip/{rk3399,rk3566,rk3588},...}/
├── module/<som-name>/
├── board/<board-name>/
├── driver/<driver-name>/
└── subsystem/<subsystem-name>/
```

Each patch lives at the narrowest scope that's correct (a board patch goes
under `board/`, an SoC-wide fix under `soc/`). Per-host manifest resolves
tags + explicit includes. Reorgs via `ka-migrate-tree` (atomic tree +
manifest rewrite); paths stable otherwise.

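A promote needs a scope that maps onto this tree. As a sketch of what a scope check could look like, assuming scopes are written as paths relative to `patches/`; the function name `ka_valid_scope` is hypothetical:

```bash
# Sketch only: ka_valid_scope is a hypothetical helper.
# Accepts a scope path relative to patches/ and checks its top-level kind.
ka_valid_scope() {
    case $1 in
        arch/arm64|arch/x86_64)
            return 0 ;;
        soc/?*|module/?*|board/?*|driver/?*|subsystem/?*)
            return 0 ;;   # nested paths such as soc/rockchip/rk3399 also match
        *)
            return 1 ;;
    esac
}

# Usage
for s in board/pinebook-pro subsystem/media/videobuf2/dma-resv-release-fence misc/foo; do
    if ka_valid_scope "$s"; then
        echo "ok:      $s"
    else
        echo "refused: $s"
    fi
done
```

This only validates the shape of the scope. Whether a patch sits at the narrowest correct scope still needs a human decision.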
## Build hosts

```
Host        Where           Role               Wake?      Notes
──────────────────────────────────────────────────────────────────────────
boltzmann   Rock 5 ITX+     aarch64 primary    always     container kbuild-aarch64
ampere      CoolPi GenBook  aarch64 secondary  on-demand  RK3588 32GB; same uarch as boltzmann,
                                                          wakes via His; idle 30 min → release
fermi       hertz LXD       aarch64 fallback   always     matches kbuild-aarch64 profile
kbuild-x86  data CT         x86_64             on-demand  wakes via His; idle 30 min → release
```

Native make on the assigned build host. **No distcc** for kernel-agent
builds (`feedback_kernel_agent_no_distcc.md`, locked 2026-05-09). ccache
stays per-host. distcc remains in scope for userspace package builds.

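The table implies a preference order per architecture. A minimal sketch of picking the first usable host in that order, assuming a caller-supplied probe command that returns 0 when a host is reachable. The probe, the function name, and skipping the wake step for on-demand hosts are all simplifications for illustration.

```bash
# Sketch only: ka_pick_build_host and the probe argument are hypothetical.
# $1 = arch, $2 = probe command called as: <probe> <host>
ka_pick_build_host() {
    arch=$1; probe=$2
    case $arch in
        aarch64) candidates="boltzmann ampere fermi" ;;   # primary, secondary, fallback
        x86_64)  candidates="kbuild-x86" ;;
        *) echo "unsupported arch: $arch" >&2; return 2 ;;
    esac
    for h in $candidates; do
        if "$probe" "$h"; then
            echo "$h"
            return 0
        fi
    done
    return 1   # nothing reachable; a real agent would ask His to wake a host
}

# Usage with a stub probe that pretends boltzmann is busy or down.
probe_stub() { [ "$1" != boltzmann ]; }
ka_pick_build_host aarch64 probe_stub   # prints: ampere
```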
## Files / paths

```
/srv/kernel-agent/source/<job-id>/    live build dir (kbuild UID owns)
/srv/kernel-agent/ccache/             persistent across builds
/srv/kernel-agent/output/<job-id>/    built packages, pre-sign
/srv/kernel-agent/manifest/           per-host manifests (yaml)
/srv/kernel-agent/keep/               failed builds tagged ka-keep

hertz:/sparfuxdata/kernel-agent-backups/<host>/<version>/   7-day
hertz:/sparfuxdata/kernel-agent-archive/<job-id>/           1-year (cron)

https://logs.reauktion.de/<host>/<job-id>/                  1-year (cron on lagrange)
```

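The 7-day retention on the backup path could be enforced with a small `find`-based prune. This is a sketch under assumptions: it uses directory mtime as the age signal, relies on the `-mindepth`/`-maxdepth` extensions of GNU and BSD `find`, and the function name `ka_prune_backups` is hypothetical. It does not model `ka-pause-prune`.

```bash
# Sketch only: ka_prune_backups is a hypothetical helper.
# $1 = backup root laid out as <root>/<host>/<version>/
# $2 = retention in days (assumed default: 7)
ka_prune_backups() {
    root=$1; days=${2:-7}
    # Depth 2 selects the <version> directories; -mtime +N matches
    # directories last modified more than N full days ago.
    find "$root" -mindepth 2 -maxdepth 2 -type d -mtime +"$days" \
        -print -exec rm -rf {} +
}

# Example (commented out because it deletes data):
# ka_prune_backups /sparfuxdata/kernel-agent-backups 7
```

Printing each directory before removal leaves a record in the cron log of what was pruned.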
Repos:

- `marfrit/kernel-agent` — agent source, manifests, scope-tagged patch tree
- `marfrit/<campaign>` — each campaign owns its repo
- `marfrit/misc-kernel-patches` — landing pad for one-off non-campaign fixes
- `marfrit-packages` — kernel package PKGBUILDs / .debs

## Identity

Issues filed as the host the agent runs on (claude-noether by default, per
`reference_claude_noether_gitea.md`). Title prefix `[ka:*]` carries the role.
No new Gitea identity; per-host bootstrap one-liner already covers this.

## Reminder channel

Active Claude session top-of-conversation hook only — no email, no HA, no
DokuWiki. Cadence: escalating ladder (initial → +1h → +6h → daily). Snooze
via `ka-snooze <issue-id> [--for <duration>]`.

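The escalating ladder can be written as a lookup from "reminders already sent" to "wait before the next one". This sketch assumes each step is measured from the previous reminder rather than from when the issue was filed; the text does not say which. The function name is hypothetical.

```bash
# Sketch only: ka_next_reminder_delay is a hypothetical helper.
# $1 = number of reminders already sent; prints seconds to wait before the next.
ka_next_reminder_delay() {
    case $1 in
        0) echo 0 ;;       # initial reminder fires immediately
        1) echo 3600 ;;    # +1h
        2) echo 21600 ;;   # +6h
        *) echo 86400 ;;   # daily from then on
    esac
}
```

A snooze would then simply postpone the next due time without resetting the count, so the ladder does not restart from the top.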
## Hard rules — won't change without re-litigation

- Never auto-promote. Closure is your explicit verb.
- Never auto-install. Reboot only happens inside `ka-install`.
- Never reach into `$HOME` on any host.
- Never target infra hosts (noether, hertz, dcw*, turing) without explicit
  `fleet/` manifest opt-in.
- Never sudo-mutate host setup. His provisions; agent consumes.
- Refuse abandon without `--keep-as-archive` | `--purge-from-fleet` flag.
- Refuse promote of patches lacking scope tag.

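The abandon rule is easy to enforce at argument-parsing time. A minimal sketch, assuming the flags arrive as plain arguments after the campaign name; `ka_abandon_mode` is a hypothetical helper, and refusing both flags together is an assumption drawn from the `|` in the verb synopsis:

```bash
# Sketch only: ka_abandon_mode is a hypothetical helper.
# Prints "archive" or "purge"; refuses (exit 2) unless exactly one flag is given.
ka_abandon_mode() {
    mode=
    for arg in "$@"; do
        case $arg in
            --keep-as-archive)  new=archive ;;
            --purge-from-fleet) new=purge ;;
            *) continue ;;      # campaign name and other arguments
        esac
        if [ -n "$mode" ]; then
            echo "refusing: pick exactly one of --keep-as-archive | --purge-from-fleet" >&2
            return 2
        fi
        mode=$new
    done
    if [ -z "$mode" ]; then
        echo "refusing: ka-abandon needs --keep-as-archive or --purge-from-fleet" >&2
        return 2
    fi
    echo "$mode"
}
```

Failing before any work starts means an abandon can never default silently to either outcome.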
## Bootstrap reference build (2026-05-09 — fresnel)

First end-to-end run, before any `ka-*` CLI exists. Documented here as the
canonical worked example so future ka-* implementations have a concrete
substrate to replay. Issue #3 (fresnel DTS persistence) closed by this
build.

### Inputs

- **Baseline:** mmind/linux-rockchip @ `v7.0` (Heiko Stübner / Collabora,
  via kernel.org).
- **Patches** (scope `board/pinebook-pro`):
  - `0001-arm64-dts-rk3399-pinebook-pro-add-OC-OPP-tables-1704-2184.patch`
  - `0002-arm64-dts-rk3399-pinebook-pro-enable-hdmi-sound.patch`
  - `0003-arm64-dts-rk3399-pinebook-pro-spi1-max-freq-10MHz.patch`
- **Manifest:** `fleet/fresnel.yaml` (tree=mmind v7.0, 3 patches above,
  alongside-install vs `linux-eos-arm`).
- **.config source:** snapshot from fresnel `/usr/lib/modules/6.19.10-1-eos-arm/build/.config`,
  recovered from the data backintime backup (May 7 snapshot) since the
  laptop was off when the build started; `make olddefconfig` to fold in
  v7.0 new symbols (one harmless `BOOTPARAM_SOFTLOCKUP_PANIC` warning,
  ignored).

### Manual substitute for each ka-* verb

| Designed verb | What we did manually |
|---|---|
| `ka-promote fresnel-fourier <patches> --to board/pinebook-pro` | Authored 3 patches with proper headers/scope tags, pushed to `marfrit/kernel-agent/patches/board/pinebook-pro/` via Gitea contents API as `claude-noether`. |
| `ka-build fresnel` | On boltzmann: cloned linux v7.0 from kernel.org, ran `makepkg -s --skipchecksums --skippgpcheck` against `marfrit-packages/arch/linux-fresnel-fourier/PKGBUILD`. Native aarch64 (boltzmann is RK3588). One headers-pkg bug discovered (`ln -sr` on missing parent dir) and fixed mid-flight. Repackaged. |
| `ka-sign + push` | scp pkgs to hertz, then `sudo /opt/herding/bin/marfrit-publish-arch aarch64 <pkg>` per pkg. Script signs with key `92D5E96D8F63C75E4116AA1FF5C8C4603D0D250C`, runs repo-add, rsyncs to nc. |
| `ka-install fresnel` (consent-via-action) | `sudo pacman -U /tmp/<pkg>` after a LAN scp (HTTPS to nc was throttled by fresnel's wifi). pacman post-transaction hook updated extlinux. mkinitcpio run manually because the standard hook trigger watches `vmlinuz` not `Image`. |
| Bar 1..3 verification | SSH heartbeat OK, `pacman -Q linux-fresnel-fourier` = `7.0-1`, post-reboot cluster0 1.704 GHz / cluster1 2.184 GHz confirmed. |

### Files / locations involved

- `git.reauktion.de/marfrit/kernel-agent/patches/board/pinebook-pro/` — patches
- `git.reauktion.de/marfrit/kernel-agent/fleet/fresnel.yaml` — manifest
- `git.reauktion.de/marfrit/marfrit-packages/arch/linux-fresnel-fourier/` — PKGBUILD + 3 patches + config + extlinux hook+script + mkinitcpio preset
- `boltzmann:~/src/kernel-agent-bootstrap/` — local build root (baseline clone, patches, build dir, artifacts)
- `hertz:/tmp/ka-publish/` — staging for sign+push (transient)
- `hertz:/sparfuxdata/kernel-agent-backups/fresnel/6.19.9-99-eos-arm/fresnel-boot-pre-install.tgz` — pre-install /boot snapshot (71MB, 7-day retention per design)
- `https://packages.reauktion.de/arch/aarch64/linux-fresnel-fourier-7.0-1-aarch64.pkg.tar.zst` — published artifact
- `fresnel:/boot/{Image,initramfs,dtbs}-fresnel-fourier{,/...}` — installed artifacts
- `fresnel:/boot/extlinux/extlinux.conf` — managed block tagged `>>> linux-fresnel-fourier (managed) >>>` … `<<<`

### What was learned that ka-* should bake in

- mkinitcpio's stock hook watches `vmlinuz`, not `Image`. ARM kernel installs
  must explicitly run `mkinitcpio -p <preset>` from the install hook, OR
  ship a custom alpm hook with `Target = boot/Image-<suffix>`.
- Headers PKGBUILD: `ln -sr "${_builddir}" "${pkgdir}/usr/src/${pkgbase}"`
  needs a preceding `install -d "${pkgdir}/usr/src"`. The line was
  cargo-culted from Arch's `linux` package without checking whether pacman
  pre-creates `/usr/src` for kernels.
- HTTPS download from nc.reauktion.de can stall on slow wifi (fresnel @ 181 ms
  ping). Same-LAN scp from hertz (which already has the published pkgs in
  `/tmp/ka-publish/`) is the workaround. ka-install should detect and prefer
  LAN-fanout.
- Manifest must carry the kernel suffix (`-fresnel-fourier`) explicitly so
  alongside-install paths (`/boot/Image-<suffix>`, `/boot/dtbs-<suffix>/`,
  `/boot/initramfs-<suffix>.img`) don't collide with the EOS-stock paths.
- Backup target needs `install -d -o $USER -g $USER` first time per host —
  `/sparfuxdata/kernel-agent-backups/<host>/<version>/` is created lazily.

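For the first lesson, the custom alpm hook option could be generated per kernel suffix. The sketch below writes a hook file in the standard alpm-hooks layout (`[Trigger]` and `[Action]` sections). The function name, the choice of `Install` and `Upgrade` operations, and the assumption that the preset name is passed in are illustrative, not taken from the actual package.

```bash
# Sketch only: ka_write_initramfs_hook is a hypothetical helper.
# $1 = kernel suffix (e.g. fresnel-fourier)
# $2 = mkinitcpio preset name
# $3 = output file (a packaged hook would normally live under
#      /usr/share/libalpm/hooks/)
ka_write_initramfs_hook() {
    suffix=$1; preset=$2; out=$3
    cat > "$out" <<EOF
[Trigger]
Type = Path
Operation = Install
Operation = Upgrade
Target = boot/Image-$suffix

[Action]
Description = Regenerating initramfs for $preset
When = PostTransaction
Exec = /usr/bin/mkinitcpio -p $preset
EOF
}
```

The trigger path has no leading slash, as alpm expects, and it watches `Image-<suffix>` so the hook fires for the ARM image name that the stock `vmlinuz` trigger misses.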
### Out of scope this round (explicit defer)

- **vb2 dma_resv RFC v2** — *resolved 2026-05-15.* Markus iterated v2 locally
  on boltzmann reaching pkgrel=14; the v2 series attaches the fence at
  `device_run` (slept-OK context per Dufresne's v1 review). Now carried in
  `patches/subsystem/media/videobuf2/dma-resv-release-fence/` and included
  in `fleet/fresnel.yaml`. Still in scope for upstream targeting; default
  remains "build-tree only, no PR until explicitly asked"
  (`feedback_no_upstream.md`).
- **panfrost IOMMU_CACHE for RK3399** — sibling kernel work that targets the
  readback transitive-proof gap that vb2_dma_resv alone doesn't close.
  Still deferred until that lands; ship together when ready.
- **Replacing** `linux-eos-arm` rather than coexisting alongside it.
  Coexistence preserves easy rollback at u-boot. Can flip to
  `provides=(linux-eos-arm) conflicts=(...)` later once burn-in proves the
  OC kernel reliable.

## Open follow-ups (post-rollout)

- Migrate `github.com/marfrit/misc_patches/genbook/kernel/` (9 patches against
  linux-6.19.9) into proper Coulomb/RockHard campaign repo with scope tags
  applied. Some patches will need splitting (e.g., 0010 suspend/resume is
  multi-scope and should split into soc:rk3588 + board:coolpi-cm5-genbook
  pieces). — Issue #1.
- Migrate `besser/patches/` (~30 BES2600 staging series) into the scope-tagged
  tree at `driver/bes2600/` with promote eligibility per series. — Issue #2.
- Decide whether boltzmann (BredOS-stock today) becomes a Neutron-managed
  custom kernel target or stays stock. Decision deferred per memory
  `project_neutron.md`. — Issue #4.
- ~~fresnel DTS persistence~~ — **closed** by the bootstrap reference build
  above. Issue #3 closed.