# kernel-agent

Owns the kernel side of the home fleet: source/branch/patch curation, per-host build orchestration, promote-to-fleet pipeline. Peer to His (home infra). Uses His for ops it doesn't own (waking data, host provisioning); files Gitea issues for coordination it can't decide alone.

Targets: dev/work hosts only. Infra hosts (noether, hertz, dcw2/3, turing, nuccies as compile-only) are NOT in the promote list — explicit opt-in via `fleet/.yaml` manifest.

Customized today: ampere · boltzmann · fresnel · ohm
Anticipated Debian targets: higgs · clevo · pi-fleet (when they ask for it)

## Lifecycle

```
┌────────────────────────────────────────────────────────────────┐
│  INPUT — campaign session                                      │
│    patches in marfrit// or marfrit/misc-kernel-patches         │
│    triggers: ka-promote, ka-close, ka-abandon                  │
└─────────────────────────┬──────────────────────────────────────┘
                          │
┌─────────────────────────▼──────────────────────────────────────┐
│  ORCHESTRATION — kernel-agent                                  │
│    resolve manifest by scope tag                               │
│    pre-flight target build host (minimal; thorough nightly)    │
│    on miss → [ka:host-changed] block to His                    │
└─────────────────────────┬──────────────────────────────────────┘
                          │
┌─────────────────────────▼──────────────────────────────────────┐
│  BUILD                                                         │
│    aarch64: kbuild-aarch64 on boltzmann (primary)              │
│             fermi on hertz (fallback)                          │
│             distcc pool: tesla + dcc1 + dcc2 (zeroconf)        │
│    x86_64:  kbuild-x86 on data (wakes via wake-host lmcp)      │
│    ccache + 5-min watcher (hertz cron) for stalls/errors       │
│    wall-clock cap (absolute), warn on degraded distcc pool     │
└─────────────────────────┬──────────────────────────────────────┘
                          │
┌─────────────────────────▼──────────────────────────────────────┐
│  SIGN                                                          │
│    build host submits unsigned .pkg.tar.zst / .deb to hertz    │
│    hertz signs with existing marfrit-packages key (one key,    │
│      pkg + repo db)                                            │
│    hertz pushes to packages.reauktion.de                       │
└─────────────────────────┬──────────────────────────────────────┘
                          │
┌─────────────────────────▼──────────────────────────────────────┐
│  INSTALL — consent-via-action                                  │
│    kernel-agent files [ka:installable]                         │
│    session-hook reminders (escalating: now, +1h, +6h, daily)   │
│    YOU run ka-install                                          │
│      → backup current → pacman/apt -U → reboot                 │
└─────────────────────────┬──────────────────────────────────────┘
                          │
┌─────────────────────────▼──────────────────────────────────────┐
│  VERIFY — post-install (auto, by hertz cron)                   │
│    Bar 1: SSH heartbeat (10 min)                               │
│    Bar 2: package version installed                            │
│    Bar 3: DTB/sysfs matches manifest (custom-DTB hosts)        │
│    Bar 4: per-patch probe (manifest opt-in, simple lang)       │
│    Bar 5: burn-in N hours (host opt-in)                        │
│    failure → [ka:regression] block, host marked drifted        │
└────────────────────────────────────────────────────────────────┘

Loopback (7→4): yank patches from manifest; host drifted; next install
converges. No automatic rollback; backup at
/sparfuxdata/kernel-agent-backups/ on hertz, 7-day retention, you fetch +
reinstall.
```

## Agent boundaries

```
                 peer agents
 ┌───────────────────────────────────────┐
 │                                       │
His ←──── lmcp tools (ops) ────→ kernel-agent
 │        wake-host, host-status,        │
 │        prepare-build-host, ...        │
 │                                       │
 └─── Gitea issues (coordination) ───────┘
        ▲                        ▲
        │                        │
             campaign sessions
  (Bin · MegabitChip · RockHard · Neutron ·
 fresnel-fourier · ohm_gl_fix · besser · ...)
                     ▲
                     │
          subagents inside session
       (Janet · avr-specialist · Plan)
   no independent identity, contribute to
     whatever the calling session ships
```

Routine ops between peer agents go through lmcp tools (sync, idempotent, no per-call audit trail). Coordination goes through Gitea issues (async, persistent, audit trail per item).

## Verbs (explicit, parameterized, audit-issue auto-filed)

```
ka-promote --to
ka-close --status success
ka-abandon --keep-as-archive | --purge-from-fleet
ka-install
ka-keep [--for ]
ka-pause-prune / ka-resume-prune
ka-restore-archive
ka-snooze [--for ]
ka-debug             # shells into the same container that ran the build
ka-status            # per-host one-liner with drift/pending state [bin/ka-status — implemented Phase 1]
ka-migrate-tree --from --to
ka-wake-data         # wraps wake-host data through His
```

Conversational invocation triggers a y/n confirmation enumerating what will happen. Direct CLI invocation executes immediately.

## Block-severity issues — what halts what

```
[ka:patch-fail]          only that patch's promotes
[ka:campaign-conflict]   those patches across the involved campaigns
[ka:host-drifted]        installs to that host (builds OK)
[ka:build-fail]          builds routing to that build host
[ka:bootstrap-missing]   builds for that build host
[ka:host-changed]        builds to that host until pre-flight re-passes
[ka:signing-fail]        global (all builds need signing)
[ka:regression]          installs to that host until triaged
```

Scoped per issue. No implicit cross-domain propagation. Dependency cascades detected at promote-time, not propagated globally.

## Patch tree (in marfrit/kernel-agent)

```
patches/
├── arch/{arm64,x86_64}/
├── soc/{rockchip/{rk3399,rk3566,rk3588},...}/
├── module//
├── board//
├── driver//
└── subsystem//
```

Each patch lives at the narrowest scope that's correct (a board patch goes under `board/`, an SoC-wide fix under `soc/`). Per-host manifest resolves tags + explicit includes. Reorgs via `ka-migrate-tree` (atomic tree + manifest rewrite); paths stable otherwise.

## Build hosts

```
Host        Where           Role               Wake?      Notes
──────────────────────────────────────────────────────────────────────────
boltzmann   Rock 5 ITX+     aarch64 primary    always     container kbuild-aarch64
ampere      CoolPi GenBook  aarch64 secondary  on-demand  RK3588 32GB; same uarch as boltzmann,
                                                          wakes via His; idle 30 min → release
fermi       hertz LXD       aarch64 fallback   always     matches kbuild-aarch64 profile
kbuild-x86  data CT         x86_64             on-demand  wakes via His; idle 30 min → release
```

Native make on the assigned build host. **No distcc** for kernel-agent builds (`feedback_kernel_agent_no_distcc.md`, locked 2026-05-09). ccache stays per-host. distcc remains in scope for userspace package builds.

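The routing implied by the build-host table and the block-severity list can be sketched as below. This is a minimal illustration, not the agent's actual implementation. Several parts are assumptions: the `BuildHost` type and `pick_build_host` function are hypothetical names, list order is taken as preference order, and falling through to the next host when one is blocked is a guess at the intended behaviour.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass(frozen=True)
class BuildHost:
    name: str         # host/container name as listed in the Build hosts table
    arch: str         # "aarch64" or "x86_64"
    needs_wake: bool  # True for on-demand hosts that must be woken via His


# Assumption: list order encodes preference (primary, secondary, fallback).
BUILD_HOSTS = [
    BuildHost("boltzmann", "aarch64", needs_wake=False),
    BuildHost("ampere", "aarch64", needs_wake=True),
    BuildHost("fermi", "aarch64", needs_wake=False),
    BuildHost("kbuild-x86", "x86_64", needs_wake=True),
]

# Block tags that halt builds on one specific build host.
HOST_SCOPED_BUILD_BLOCKS = {"ka:build-fail", "ka:bootstrap-missing", "ka:host-changed"}
# Block tags that halt every build, regardless of host.
GLOBAL_BUILD_BLOCKS = {"ka:signing-fail"}


def pick_build_host(
    arch: str,
    open_blocks: Iterable[Tuple[str, Optional[str]]],
) -> Optional[BuildHost]:
    """Return the first unblocked build host for `arch`, or None.

    `open_blocks` is a list of (tag, host) pairs for open block issues;
    host is None for issues that are not tied to a host.
    """
    blocks = list(open_blocks)
    if any(tag in GLOBAL_BUILD_BLOCKS for tag, _ in blocks):
        return None  # signing is down, so no build can complete
    blocked = {host for tag, host in blocks if tag in HOST_SCOPED_BUILD_BLOCKS}
    for host in BUILD_HOSTS:
        if host.arch == arch and host.name not in blocked:
            return host
    return None


if __name__ == "__main__":
    choice = pick_build_host("aarch64", [("ka:build-fail", "boltzmann")])
    print(choice.name if choice else "no build host available")
```

Note that install-scoped tags such as `[ka:host-drifted]` and `[ka:regression]` are deliberately ignored here, since the list above scopes them to installs and leaves builds unaffected.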
## Files / paths

```
/srv/kernel-agent/source//                   live build dir (kbuild UID owns)
/srv/kernel-agent/ccache/                    persistent across builds
/srv/kernel-agent/output//                   built packages, pre-sign
/srv/kernel-agent/manifest/                  per-host manifests (yaml)
/srv/kernel-agent/keep/                      failed builds tagged ka-keep
hertz:/sparfuxdata/kernel-agent-backups///   7-day
hertz:/sparfuxdata/kernel-agent-archive//    1-year (cron)
https://logs.reauktion.de///                 1-year (cron on lagrange)
```

Repos:
- `marfrit/kernel-agent` — agent source, manifests, scope-tagged patch tree
- `marfrit/` — each campaign owns its repo
- `marfrit/misc-kernel-patches` — landing pad for one-off non-campaign fixes
- `marfrit-packages` — kernel package PKGBUILDs / .debs

## Identity

Issues filed as the host the agent runs on (claude-noether by default, per `reference_claude_noether_gitea.md`). Title prefix `[ka:*]` carries the role. No new Gitea identity; per-host bootstrap one-liner already covers this.

## Reminder channel

Active Claude session top-of-conversation hook only — no email, no HA, no DokuWiki. Cadence: escalating ladder (initial → +1h → +6h → daily). Snooze via `ka-snooze [--for ]`.

## Hard rules — won't change without re-litigation

- Never auto-promote. Closure is your explicit verb.
- Never auto-install. Reboot only happens inside `ka-install`.
- Never reach into `$HOME` on any host.
- Never target infra hosts (noether, hertz, dcw*, turing) without explicit `fleet/` manifest opt-in.
- Never sudo-mutate host setup. His provisions; agent consumes.
- Refuse abandon without `--keep-as-archive` | `--purge-from-fleet` flag.
- Refuse promote of patches lacking scope tag.

## Bootstrap reference build (2026-05-09 — fresnel)

First end-to-end run, before any `ka-*` CLI exists. Documented here as the canonical worked example so future ka-* implementations have a concrete substrate to replay. Issue #3 (fresnel DTS persistence) closed by this build.

### Inputs

- **Baseline:** mmind/linux-rockchip @ `v7.0` (Heiko Stübner / Collabora, via kernel.org).
- **Patches** (scope `board/pinebook-pro`):
  - `0001-arm64-dts-rk3399-pinebook-pro-add-OC-OPP-tables-1704-2184.patch`
  - `0002-arm64-dts-rk3399-pinebook-pro-enable-hdmi-sound.patch`
  - `0003-arm64-dts-rk3399-pinebook-pro-spi1-max-freq-10MHz.patch`
- **Manifest:** `fleet/fresnel.yaml` (tree=mmind v7.0, 3 patches above, alongside-install vs `linux-eos-arm`).
- **.config source:** snapshot from fresnel `/usr/lib/modules/6.19.10-1-eos-arm/build/.config`, recovered from the data backintime backup (May 7 snapshot) since the laptop was off when the build started; `make olddefconfig` to fold in v7.0 new symbols (one harmless `BOOTPARAM_SOFTLOCKUP_PANIC` warning, ignored).

### Manual substitute for each ka-* verb

| Designed verb | What we did manually |
|---|---|
| `ka-promote fresnel-fourier --to board/pinebook-pro` | Authored 3 patches with proper headers/scope tags, pushed to `marfrit/kernel-agent/patches/board/pinebook-pro/` via Gitea contents API as `claude-noether`. |
| `ka-build fresnel` | On boltzmann: cloned linux v7.0 from kernel.org, ran `makepkg -s --skipchecksums --skippgpcheck` against `marfrit-packages/arch/linux-fresnel-fourier/PKGBUILD`. Native aarch64 (boltzmann is RK3588). One headers-pkg bug discovered (`ln -sr` on missing parent dir) and fixed mid-flight. Repackaged. |
| `ka-sign + push` | scp pkgs hertz → `sudo /opt/herding/bin/marfrit-publish-arch aarch64 ` per pkg. Script signs with key `92D5E96D8F63C75E4116AA1FF5C8C4603D0D250C`, runs repo-add, rsyncs to nc. |
| `ka-install fresnel` (consent-via-action) | `sudo pacman -U /tmp/` over LAN scp (HTTPS to nc was throttled by fresnel's wifi). pacman post-transaction hook updated extlinux. mkinitcpio run manually because the standard hook trigger watches `vmlinuz` not `Image`. |
| Bar 1..3 verification | SSH heartbeat OK, `pacman -Q linux-fresnel-fourier` = `7.0-1`, post-reboot cluster0 1.704 GHz / cluster1 2.184 GHz confirmed. |

### Files / locations involved

- `git.reauktion.de/marfrit/kernel-agent/patches/board/pinebook-pro/` — patches
- `git.reauktion.de/marfrit/kernel-agent/fleet/fresnel.yaml` — manifest
- `git.reauktion.de/marfrit/marfrit-packages/arch/linux-fresnel-fourier/` — PKGBUILD + 3 patches + config + extlinux hook+script + mkinitcpio preset
- `boltzmann:~/src/kernel-agent-bootstrap/` — local build root (baseline clone, patches, build dir, artifacts)
- `hertz:/tmp/ka-publish/` — staging for sign+push (transient)
- `hertz:/sparfuxdata/kernel-agent-backups/fresnel/6.19.9-99-eos-arm/fresnel-boot-pre-install.tgz` — pre-install /boot snapshot (71MB, 7-day retention per design)
- `https://packages.reauktion.de/arch/aarch64/linux-fresnel-fourier-7.0-1-aarch64.pkg.tar.zst` — published artifact
- `fresnel:/boot/{Image,initramfs,dtbs}-fresnel-fourier{,/...}` — installed artifacts
- `fresnel:/boot/extlinux/extlinux.conf` — managed block tagged `>>> linux-fresnel-fourier (managed) >>>` … `<<<`

### What was learned that ka-* should bake in

- mkinitcpio's stock hook watches `vmlinuz`, not `Image`. ARM kernel installs must explicitly run `mkinitcpio -p ` from the install hook, OR ship a custom alpm hook with `Target = boot/Image-`.
- Headers PKGBUILD: `ln -sr "${_builddir}" "${pkgdir}/usr/src/${pkgbase}"` needs a preceding `install -d "${pkgdir}/usr/src"`. Cargo-cult from arch's `linux` package without checking that pacman pre-creates `/usr/src` for kernels.
- HTTPS download from nc.reauktion.de can stall on slow wifi (fresnel @ 181 ms ping). Same-LAN scp from hertz (which already has the published pkgs in `/tmp/ka-publish/`) is the workaround. ka-install should detect and prefer LAN-fanout.
- Manifest must carry the kernel suffix (`-fresnel-fourier`) explicitly so alongside-install paths (`/boot/Image-`, `/boot/dtbs-/`, `/boot/initramfs-.img`) don't collide with the EOS-stock paths.
- Backup target needs `install -d -o $USER -g $USER` first time per host — `/sparfuxdata/kernel-agent-backups///` is created lazily.

### Out of scope this round (explicit defer)

- **vb2 dma_resv RFC v2** — *resolved 2026-05-15.* Markus iterated v2 locally on boltzmann reaching pkgrel=14; the v2 series attaches the fence at `device_run` (slept-OK context per Dufresne's v1 review). Now carried in `patches/subsystem/media/videobuf2/dma-resv-release-fence/` and included in `fleet/fresnel.yaml`. Still in scope for upstream targeting; default remains "build-tree only, no PR until explicitly asked" (`feedback_no_upstream.md`).
- **panfrost IOMMU_CACHE for RK3399** — sibling kernel work that targets the readback transitive-proof gap that vb2_dma_resv alone doesn't close. Still deferred until that lands; ship together when ready.
- **Replace** `linux-eos-arm` rather than coexist alongside — deferred because coexisting preserves easy rollback at u-boot. Can flip to `provides=(linux-eos-arm) conflicts=(...)` later once burn-in proves the OC kernel reliable.

## Open follow-ups (post-rollout)

- Migrate `github.com/marfrit/misc_patches/genbook/kernel/` (9 patches against linux-6.19.9) into proper Coulomb/RockHard campaign repo with scope tags applied. Some patches will need splitting (e.g., 0010 suspend/resume is multi-scope and should split into soc:rk3588 + board:coolpi-cm5-genbook pieces). — Issue #1.
- Migrate `besser/patches/` (~30 BES2600 staging series) into the scope-tagged tree at `driver/bes2600/` with promote eligibility per series. — Issue #2.
- Decide whether boltzmann (BredOS-stock today) becomes a Neutron-managed custom kernel target or stays stock. Decision deferred per memory `project_neutron.md`. — Issue #4.
- ~~fresnel DTS persistence~~ — **closed** by the bootstrap reference build above. Issue #3 closed.