Initial design — agent spec, lifecycle, verbs, hard rules
@@ -1,3 +1,239 @@
# kernel-agent

Kernel source/branch/patch curation, per-host build orchestration, promote-to-fleet pipeline for the home fleet. Peer agent to His (home infra).

Owns the kernel side of the home fleet: source/branch/patch curation, per-host
build orchestration, promote-to-fleet pipeline. Peer to His (home infra). Uses
His for ops it doesn't own (waking the `data` host, host provisioning); files
Gitea issues for coordination it can't decide alone.

Targets: dev/work hosts only. Infra hosts (noether, hertz, dcw2/3, turing,
nuccies as compile-only) are NOT in the promote list; adding one requires
explicit opt-in via a `fleet/<host>.yaml` manifest.

Customized today: ampere · boltzmann · fresnel · ohm
Anticipated Debian targets: higgs · clevo · pi-fleet (when they ask for it)

## Lifecycle

```
┌────────────────────────────────────────────────────────────────┐
│ INPUT — campaign session                                       │
│   patches in marfrit/<campaign>/ or marfrit/misc-kernel-patches│
│   triggers: ka-promote, ka-close, ka-abandon                   │
└─────────────────────────┬──────────────────────────────────────┘
                          │
┌─────────────────────────▼──────────────────────────────────────┐
│ ORCHESTRATION — kernel-agent                                   │
│   resolve manifest by scope tag                                │
│   pre-flight target build host (minimal; thorough nightly)     │
│   on miss → [ka:host-changed] block to His                     │
└─────────────────────────┬──────────────────────────────────────┘
                          │
┌─────────────────────────▼──────────────────────────────────────┐
│ BUILD                                                          │
│   aarch64: kbuild-aarch64 on boltzmann (primary)               │
│            fermi on hertz (fallback)                           │
│            distcc pool: tesla + dcc1 + dcc2 (zeroconf)         │
│   x86_64:  kbuild-x86 on data (wakes via wake-host lmcp)       │
│   ccache + 5-min watcher (hertz cron) for stalls/errors        │
│   wall-clock cap (absolute), warn on degraded distcc pool      │
└─────────────────────────┬──────────────────────────────────────┘
                          │
┌─────────────────────────▼──────────────────────────────────────┐
│ SIGN                                                           │
│   build host submits unsigned .pkg.tar.zst / .deb to hertz     │
│   hertz signs with existing marfrit-packages key (one key,     │
│   pkg + repo db)                                               │
│   hertz pushes to packages.reauktion.de                        │
└─────────────────────────┬──────────────────────────────────────┘
                          │
┌─────────────────────────▼──────────────────────────────────────┐
│ INSTALL — consent-via-action                                   │
│   kernel-agent files [ka:installable]                          │
│   session-hook reminders (escalating: now, +1h, +6h, daily)    │
│   YOU run ka-install <host>                                    │
│     → backup current → pacman/apt -U → reboot                  │
└─────────────────────────┬──────────────────────────────────────┘
                          │
┌─────────────────────────▼──────────────────────────────────────┐
│ VERIFY — post-install (auto, by hertz cron)                    │
│   Bar 1: SSH heartbeat (10 min)                                │
│   Bar 2: package version installed                             │
│   Bar 3: DTB/sysfs matches manifest (custom-DTB hosts)         │
│   Bar 4: per-patch probe (manifest opt-in, simple lang)        │
│   Bar 5: burn-in N hours (host opt-in)                         │
│   failure → [ka:regression] block, host marked drifted         │
└────────────────────────────────────────────────────────────────┘

Loopback (7→4): yank patches from manifest; host drifted; next
                install converges. No automatic rollback;
                backup at /sparfuxdata/kernel-agent-backups/
                on hertz, 7-day retention, you fetch + reinstall.
```

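The VERIFY stage can be read as an ordered list of checks where the first failing bar produces a `[ka:regression]` block. The sketch below illustrates that reading only. The `Bar` type, the `run_verify` function, the issue-title format, and the choice to stop at the first failure are all assumptions, not part of the design above.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Bar:
    number: int                # 1..5, as in the VERIFY box
    name: str
    check: Callable[[], bool]  # hypothetical probe; True means the bar passed
    enabled: bool = True       # Bars 3-5 are opt-in per manifest/host


def run_verify(host: str, bars: list[Bar]) -> Optional[str]:
    """Return None if every enabled bar passes, else a regression issue title.

    Assumption: bars run in numeric order and stop at the first failure.
    """
    for bar in sorted(bars, key=lambda b: b.number):
        if bar.enabled and not bar.check():
            return f"[ka:regression] {host}: Bar {bar.number} failed ({bar.name})"
    return None


if __name__ == "__main__":
    bars = [
        Bar(1, "SSH heartbeat", lambda: True),
        Bar(2, "package version installed", lambda: False),
        Bar(5, "burn-in", lambda: False, enabled=False),  # host did not opt in
    ]
    print(run_verify("fresnel", bars))
```

A disabled bar is skipped rather than treated as a pass or a failure, which matches the opt-in wording for Bars 3 to 5.
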
## Agent boundaries

```
                peer agents
 ┌───────────────────────────────────────┐
 │                                       │
His ←──── lmcp tools (ops) ────→ kernel-agent
 │        wake-host, host-status,        │
 │        prepare-build-host, ...        │
 │                                       │
 └─── Gitea issues (coordination) ───────┘

 ▲                                       ▲
 │                                       │
             campaign sessions
      (Bin · MegabitChip · RockHard ·
       Neutron · fresnel-fourier ·
       ohm_gl_fix · besser · ...)
                     ▲
                     │
         subagents inside session
      (Janet · avr-specialist · Plan)
    no independent identity, contribute
   to whatever the calling session ships
```

Routine ops between peer agents go through lmcp tools (sync, idempotent,
no per-call audit trail). Coordination goes through Gitea issues (async,
persistent, audit trail per item).

## Verbs (explicit, parameterized, audit-issue auto-filed)

```
ka-promote <campaign> <patch-or-glob> --to <scope>
ka-close <campaign> --status success
ka-abandon <campaign> --keep-as-archive | --purge-from-fleet
ka-install <host>
ka-keep <job-id> [--for <duration>]
ka-pause-prune / ka-resume-prune
ka-restore-archive <job-id>
ka-snooze <issue-id> [--for <duration>]
ka-debug <job-id>      # shells into the same container that ran the build
ka-status              # per-host one-liner with drift/pending state
ka-migrate-tree --from <p> --to <p>
ka-wake-data           # wraps wake-host data through His
```

Conversational invocation triggers a y/n confirmation enumerating what will
happen. Direct CLI invocation executes immediately.

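The difference between conversational and direct invocation could be sketched as a single gate in front of the verb. This is an illustration only: the `invoke` function and its `ask` callback are hypothetical names, and the plan shown to the user is just the command line, whereas the real confirmation would enumerate the concrete effects.

```python
from typing import Callable, Sequence


def invoke(verb: str, args: Sequence[str], conversational: bool,
           ask: Callable[[str], str] = input) -> str:
    """Run a ka-* verb, asking for y/n first when invoked conversationally."""
    plan = " ".join([verb, *args])
    if conversational:
        # Anything other than an explicit "y" cancels.
        answer = ask(f"This will run: {plan}\nProceed? [y/n] ")
        if answer.strip().lower() != "y":
            return f"cancelled: {plan}"
    # Direct CLI invocation reaches this point without any prompt.
    return f"executed: {plan}"


if __name__ == "__main__":
    print(invoke("ka-install", ["fresnel"], conversational=False))
```

Passing `ask` as a parameter keeps the gate testable without a terminal.
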
## Block-severity issues — what halts what

```
[ka:patch-fail]         only that patch's promotes
[ka:campaign-conflict]  those patches across the involved campaigns
[ka:host-drifted]       installs to that host (builds OK)
[ka:build-fail]         builds routing to that build host
[ka:bootstrap-missing]  builds for that build host
[ka:host-changed]       builds to that host until pre-flight re-passes
[ka:signing-fail]       global (all builds need signing)
[ka:regression]         installs to that host until triaged
```

Scoped per issue. No implicit cross-domain propagation. Dependency cascades
detected at promote-time, not propagated globally.

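One way to picture the per-issue scoping is a lookup from issue tag to the action it halts and the kind of subject it is pinned to. The sketch below makes several assumptions: the `BLOCKS` table layout, the `is_blocked` helper, and the simplification that each open issue carries exactly one subject (a patch, a host, or a build host) are all hypothetical. A `[ka:campaign-conflict]` covering several patches would need one entry per patch in this model.

```python
from typing import Iterable, Optional, Tuple

# tag -> (halted action, subject kind); None means the block is global.
BLOCKS: dict[str, Tuple[str, Optional[str]]] = {
    "ka:patch-fail":        ("promote", "patch"),
    "ka:campaign-conflict": ("promote", "patch"),
    "ka:host-drifted":      ("install", "host"),
    "ka:build-fail":        ("build", "build_host"),
    "ka:bootstrap-missing": ("build", "build_host"),
    "ka:host-changed":      ("build", "build_host"),
    "ka:signing-fail":      ("build", None),
    "ka:regression":        ("install", "host"),
}


def is_blocked(action: str, subject: str,
               open_issues: Iterable[Tuple[str, str]]) -> bool:
    """True if any open (tag, subject) issue halts `action` on `subject`."""
    for tag, issue_subject in open_issues:
        halted_action, kind = BLOCKS[tag]
        if halted_action != action:
            continue  # no cross-domain propagation
        if kind is None or issue_subject == subject:
            return True
    return False


if __name__ == "__main__":
    issues = [("ka:host-drifted", "fresnel")]
    print(is_blocked("install", "fresnel", issues))  # install is halted
    print(is_blocked("build", "boltzmann", issues))  # builds are unaffected
```
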
## Patch tree (in marfrit/kernel-agent)

```
patches/
├── arch/{arm64,x86_64}/
├── soc/{rockchip/{rk3399,rk3566,rk3588},...}/
├── module/<som-name>/
├── board/<board-name>/
├── driver/<driver-name>/
└── subsystem/<subsystem-name>/
```

Each patch lives at the narrowest scope that's correct (a board patch goes
under `board/`, an SoC-wide fix under `soc/`). Per-host manifest resolves
tags + explicit includes. Reorgs via `ka-migrate-tree` (atomic tree +
manifest rewrite); paths stable otherwise.

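Resolving a per-host manifest against this tree might look like the sketch below. The manifest shape (`tags` and `include` keys), the `kind:name` tag syntax matched against directory components, and the example patch file names are assumptions for illustration. The actual manifest schema is not specified here.

```python
from pathlib import PurePosixPath


def matches(tag: str, dirs: tuple) -> bool:
    """A tag like 'soc:rk3588' matches patches/soc/<...>/rk3588/<file>."""
    kind, _, name = tag.partition(":")
    return bool(dirs) and dirs[0] == kind and name in dirs[1:]


def resolve(manifest: dict, patch_paths: list) -> list:
    """Return the patches selected by scope tags plus explicit includes."""
    tags = manifest.get("tags", [])
    includes = set(manifest.get("include", []))
    selected = []
    for path in sorted(patch_paths):
        # Drop the leading 'patches' component and the file name.
        dirs = PurePosixPath(path).parts[1:-1]
        if path in includes or any(matches(tag, dirs) for tag in tags):
            selected.append(path)
    return selected


if __name__ == "__main__":
    tree = [
        "patches/soc/rockchip/rk3399/0001-a.patch",
        "patches/soc/rockchip/rk3588/0001-b.patch",
        "patches/board/pinebook-pro/0001-c.patch",
        "patches/driver/bes2600/0001-d.patch",
    ]
    manifest = {
        "tags": ["soc:rk3399", "board:pinebook-pro"],
        "include": ["patches/driver/bes2600/0001-d.patch"],
    }
    for p in resolve(manifest, tree):
        print(p)
```

Matching on directory components rather than string prefixes lets a tag such as `soc:rk3399` ignore the intermediate vendor directory.
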
## Build hosts

```
Host        Where        Role              Wake?      Notes
──────────────────────────────────────────────────────────────────────────
boltzmann   Rock 5 ITX+  aarch64 primary   always     container kbuild-aarch64
fermi       hertz LXD    aarch64 fallback  always     matches kbuild-aarch64 profile
kbuild-x86  data CT      x86_64            on-demand  wakes via His; idle 30 min → release
tesla       hertz LXD    distcc helper     always     manual hosts list
dcc1        dcw3 Pi 4    distcc helper     always     zeroconf (bridge-detach risk)
dcc2        dcw2 Pi 4    distcc helper     always     zeroconf
```

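The primary/fallback and on-demand roles in the table suggest a simple routing rule, sketched below. The `ROUTES` and `ON_DEMAND` names and the idea of passing in a set of currently unavailable hosts (for example, hosts under an open build block) are assumptions. The table itself does not say what triggers the fallback.

```python
# Ordered candidates per architecture, taken from the build-host table.
ROUTES = {
    "aarch64": ["boltzmann", "fermi"],  # primary, then fallback
    "x86_64": ["kbuild-x86"],
}
# Hosts that must be woken (through His) before use.
ON_DEMAND = {"kbuild-x86"}


def pick_build_host(arch: str, unavailable: frozenset = frozenset()):
    """Return (host, needs_wake) for the first available candidate."""
    for host in ROUTES[arch]:
        if host not in unavailable:
            return host, host in ON_DEMAND
    raise LookupError(f"no build host available for {arch}")


if __name__ == "__main__":
    print(pick_build_host("aarch64"))
    print(pick_build_host("aarch64", frozenset({"boltzmann"})))
    print(pick_build_host("x86_64"))
```
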
## Files / paths

```
/srv/kernel-agent/source/<job-id>/                         live build dir (kbuild UID owns)
/srv/kernel-agent/ccache/                                  persistent across builds
/srv/kernel-agent/output/<job-id>/                         built packages, pre-sign
/srv/kernel-agent/manifest/                                per-host manifests (yaml)
/srv/kernel-agent/keep/                                    failed builds tagged ka-keep

hertz:/sparfuxdata/kernel-agent-backups/<host>/<version>/  7-day
hertz:/sparfuxdata/kernel-agent-archive/<job-id>/          1-year (cron)

https://logs.reauktion.de/<host>/<job-id>/                 1-year (cron on lagrange)
```

Repos:
- `marfrit/kernel-agent` — agent source, manifests, scope-tagged patch tree
- `marfrit/<campaign>` — each campaign owns its repo
- `marfrit/misc-kernel-patches` — landing pad for one-off non-campaign fixes
- `marfrit-packages` — kernel package PKGBUILDs / .debs

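The retention windows in the path listing (7-day backups, 1-year archive) could be applied by a prune job along the lines of the sketch below. It is a pure function over entry names and creation times. The `RETENTION` table, the `expired` helper, the example version strings, and the reading of "1-year" as 365 days are all assumptions.

```python
from datetime import datetime, timedelta

RETENTION = {
    "backups": timedelta(days=7),
    "archive": timedelta(days=365),  # "1-year" approximated as 365 days
}


def expired(entries: dict, kind: str, now: datetime) -> list:
    """Return the names of entries older than the retention window for `kind`."""
    cutoff = now - RETENTION[kind]
    return sorted(name for name, created in entries.items() if created < cutoff)


if __name__ == "__main__":
    now = datetime(2026, 5, 10)
    backups = {
        "fresnel/6.19.9-1": datetime(2026, 5, 1),  # hypothetical versions
        "fresnel/6.19.9-2": datetime(2026, 5, 8),
    }
    print(expired(backups, "backups", now))
```

Keeping the decision separate from the deletion makes it straightforward to dry-run a prune, or to skip it entirely while `ka-pause-prune` is in effect.
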
## Identity

Issues filed as the host the agent runs on (claude-noether by default, per
`reference_claude_noether_gitea.md`). Title prefix `[ka:*]` carries the role.
No new Gitea identity; per-host bootstrap one-liner already covers this.

## Reminder channel

Active Claude session top-of-conversation hook only — no email, no HA, no
DokuWiki. Cadence: escalating ladder (initial → +1h → +6h → daily). Snooze
via `ka-snooze <issue-id> [--for <duration>]`.

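The escalating ladder can be written out as a schedule. The sketch below assumes that +1h and +6h are offsets from the moment the issue is filed, and that "daily" means every 24 hours counted from filing. Neither detail is stated above, and `reminder_times` is a hypothetical helper name.

```python
from datetime import datetime, timedelta

# Fixed rungs of the ladder, as offsets from filing time.
LADDER = [timedelta(0), timedelta(hours=1), timedelta(hours=6)]


def reminder_times(filed_at: datetime, count: int) -> list:
    """Return the first `count` reminder times: initial, +1h, +6h, then daily."""
    times = []
    for i in range(count):
        if i < len(LADDER):
            times.append(filed_at + LADDER[i])
        else:
            # Daily rungs: +1 day, +2 days, ... after filing (assumption).
            times.append(filed_at + timedelta(days=i - len(LADDER) + 1))
    return times


if __name__ == "__main__":
    for t in reminder_times(datetime(2026, 4, 1, 9, 0), 5):
        print(t.isoformat())
```
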
## Hard rules — won't change without re-litigation

- Never auto-promote. Closure is your explicit verb.
- Never auto-install. Reboot only happens inside `ka-install`.
- Never reach into `$HOME` on any host.
- Never target infra hosts (noether, hertz, dcw*, turing) without explicit
  `fleet/` manifest opt-in.
- Never sudo-mutate host setup. His provisions; agent consumes.
- Refuse abandon without `--keep-as-archive` | `--purge-from-fleet` flag.
- Refuse promote of patches lacking scope tag.

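The "refuse abandon without a flag" rule maps naturally onto a required, mutually exclusive option group. The sketch below uses Python's standard `argparse` to show the idea. It is not the agent's actual CLI, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser for ka-abandon that insists on exactly one disposition flag."""
    parser = argparse.ArgumentParser(prog="ka-abandon")
    parser.add_argument("campaign")
    # required=True: omitting both flags is an error.
    # Mutually exclusive: passing both flags is also an error.
    group = parser.add_mutually_exclusive_group(required=True)
    group.add_argument("--keep-as-archive", action="store_true")
    group.add_argument("--purge-from-fleet", action="store_true")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["RockHard", "--keep-as-archive"])
    print(args.campaign, args.keep_as_archive, args.purge_from_fleet)
```

With this shape, `ka-abandon RockHard` on its own exits with a usage error instead of guessing a default.
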
## Open follow-ups (post-rollout)

- Migrate `github.com/marfrit/misc_patches/genbook/kernel/` (9 patches against
  linux-6.19.9) into a proper Coulomb/RockHard campaign repo with scope tags
  applied. Some patches will need splitting (e.g., 0010 suspend/resume is
  multi-scope and should split into soc:rk3588 + board:coolpi-cm5-genbook
  pieces).
- Migrate `besser/patches/` (~30 BES2600 staging series) into the scope-tagged
  tree at `driver/bes2600/` with promote eligibility per series.
- fresnel: replace the loose `~/rk3399-pinebook-pro.dts` workflow with either
  a pacman hook (cheap, restores OC after each kernel update) or a proper
  `linux-eos-arm-fresnel` PKGBUILD that owns the DTB conflict (real fix).
  The April 2026 silent-revert is the canonical reason kernel-agent exists.
- Decide whether boltzmann (BredOS-stock today) becomes a Neutron-managed
  custom kernel target or stays stock. Decision deferred per memory
  `project_neutron.md`.