busy-lock recipe should echo human-readable busy_until for off-by-N self-check #1

Closed
opened 2026-04-28 03:53:15 +00:00 by marfrit · 2 comments
Owner

What happened

On 2026-04-27, his agent filed /opt/herding/var/data-busy.lock with busy_until=1777291200 to suppress data's nightly auto-shutdown during a long chromium build. The shutdown fired anyway at 03:00 CEST (01:00 UTC) on 2026-04-28, killing the build.

Root cause

1777291200 = 2026-04-27 12:00 UTC, i.e. about 13 hours BEFORE the next shutdown firing (and about 3.5 hours before the lock was even written), not after it. The user/agent's intent was "tomorrow noon", so the value is off by 24h. The cron at /etc/cron.d/shutdown-data did exactly the right thing: read the lock, saw busy_until was in the past (by 13h), logged "is stale (busy_until=1777291200), ignoring", removed it, and proceeded to shut down. /root/shutdown-data.sh is correct.
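The arithmetic can be checked directly with GNU date. This is only a verification sketch: the helper names epoch_to_utc and hours_between are made up for illustration, and the cron time is taken from the evidence list in this issue (03:00 CEST, i.e. UTC+2).

```shell
# Decode an epoch into an unambiguous UTC timestamp (requires GNU date).
epoch_to_utc() {
    date -u -d "@$1" +%Y-%m-%dT%H:%M:%SZ
}

# Whole hours between two epochs (second minus first, truncated).
hours_between() {
    echo $(( ($2 - $1) / 3600 ))
}

busy_until=1777291200
epoch_to_utc "$busy_until"                  # 2026-04-27T12:00:00Z

# Cron firing from the evidence list: 03:00 CEST on 2026-04-28.
cron_fired=$(date -u -d '2026-04-28 03:00:00 +0200' +%s)
hours_between "$busy_until" "$cron_fired"   # 13
```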

The gap is in the recipe (reference_data_shutdown_override.md in the auto-memory, mirrored in this agent's runbook): writing a future-dated epoch with no human-readable echo means a wrong-day error slips through silently. The operator walks away thinking they're protected for 24h, only to find the build dead next morning.

Proposed fix

When writing the lock, the recipe should echo a human-readable confirmation:

busy_until=$(date -d 'tomorrow 12:00 UTC' +%s)
echo "busy_until=$busy_until"   > /opt/herding/var/data-busy.lock
echo "owner=..."                >> /opt/herding/var/data-busy.lock
echo "lock valid until: $(date -u -d "@$busy_until")"  # <-- self-check

The his agent should also include that line in its post-action report so the user can sanity-check the date in the same response.
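Going one step further than an echo, the recipe could refuse to write a lock that expires before the next shutdown. The following is a minimal sketch under assumptions, not the runbook's actual wording: write_busy_lock and its four arguments are hypothetical, the caller is assumed to know the next shutdown epoch, and GNU date is required. The example uses a throwaway file rather than /opt/herding/var/data-busy.lock.

```shell
# Hypothetical helper: write the lock only if it outlives the next shutdown.
# Arguments: lock path, busy_until epoch, owner string, next-shutdown epoch.
write_busy_lock() {
    lock=$1
    busy_until=$2
    owner=$3
    next_shutdown=$4
    if [ "$busy_until" -le "$next_shutdown" ]; then
        echo "refusing: busy_until $(date -u -d "@$busy_until" -Iseconds)" \
             "is not after next shutdown $(date -u -d "@$next_shutdown" -Iseconds)" >&2
        return 1
    fi
    {
        echo "busy_until=$busy_until"
        echo "owner=$owner"
    } > "$lock"
    # Human-readable readback, meant to be copied into the post-action report.
    echo "lock valid until: $(date -u -d "@$busy_until" -Iseconds)"
}

# Replaying this incident against a temporary file:
tmp_lock=$(mktemp)
# The original value is rejected (shutdown was at 2026-04-28 01:00 UTC).
write_busy_lock "$tmp_lock" 1777291200 example 1777338000 || echo "rejected"
# The intended value, 24h later, is accepted.
write_busy_lock "$tmp_lock" 1777377600 example 1777338000
rm -f "$tmp_lock"
```

With this shape the off-by-24h value would have failed loudly at write time instead of being discovered the next morning.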

Optional follow-up

/root/shutdown-data.sh could log the human-readable interpretation of any lock it finds (busy_until=1777291200 (2026-04-27 12:00 UTC, 13h stale)) for cleaner post-mortem reading. Currently the log only has the epoch.
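A rough sketch of what such a log line could look like is below. The function name describe_lock is hypothetical and nothing here is taken from the real /root/shutdown-data.sh; it only shows one way to produce the format suggested above with GNU date.

```shell
# Hypothetical formatter: render a lock's busy_until next to its
# human-readable UTC time and how stale (or how fresh) it is.
# Arguments: busy_until epoch, current epoch.
describe_lock() {
    busy_until=$1
    now=$2
    human=$(date -u -d "@$busy_until" '+%Y-%m-%d %H:%M UTC')
    if [ "$busy_until" -le "$now" ]; then
        echo "busy_until=$busy_until ($human, $(( ($now - $busy_until) / 3600 ))h stale)"
    else
        echo "busy_until=$busy_until ($human, $(( ($busy_until - $now) / 3600 ))h remaining)"
    fi
}

# The lock from this incident, seen at the 01:00 UTC cron firing:
describe_lock 1777291200 1777338000
# busy_until=1777291200 (2026-04-27 12:00 UTC, 13h stale)
```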

Repro / evidence

  • Lock file written: 2026-04-27 ~17:30 CEST
  • Cron firing: 03:00:01 CEST 2026-04-28
  • Cron log entry: is stale (busy_until=1777291200), ignoring followed by rm -f + poweroff
  • Build killed: chromium-rebuild on CT 220 mid-link of blink renderer
  • Recovery: his agent woke data again at 05:52 CEST and refiled the lock, this time with busy_until=1777464000 (2026-04-29 12:00 UTC) — verified by reading back via date -d @1777464000 -u.

Flagged by his agent itself in the post-action report ("Suggest tightening the recipe in reference_data_shutdown_override.md to always echo date -d @<epoch> after writing the lock, as a self-check.").

Author
Owner

Recipe confirmed in active use 2026-04-29 (noether session). Wrote a 24h lock to skip tonight's 3 AM shutdown, echoed back via date -d @$BUSY -Iseconds plus an explicit tonight 3 AM < busy_until predicate before walking away. Output:

now    : 2026-04-29T00:08:43+02:00
expires: 2026-04-30T00:07:41+02:00
delta  : 23 hours
tonight 3 AM: 2026-04-29T03:00:00+02:00  -> 1 (1=lock-protected)

The 12-hour date default (Thu Apr 30 12:07:41 AM CEST) is itself a minor footgun — 12:07 AM reads as noon to a tired eye. Recommending the recipe pin -Iseconds or -u so the readback is unambiguous. Leaving issue open for the runbook update.
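The "tonight 3 AM < busy_until" predicate could be scripted roughly as follows. This is a sketch with assumed names (next_shutdown_epoch, lock_covers_next_shutdown), not the commands used in the session above; it assumes the shutdown fires at 03:00 in the machine's local time zone and that GNU date is available.

```shell
# Hypothetical helper: epoch of the next 03:00 local time at or after "now".
# Argument: current epoch.
next_shutdown_epoch() {
    now=$1
    today=$(date -d "@$now" +%Y-%m-%d)
    candidate=$(date -d "$today 03:00:00" +%s)
    if [ "$candidate" -le "$now" ]; then
        # Today's 03:00 has already passed, so the next firing is tomorrow.
        candidate=$(date -d "$today 03:00:00 tomorrow" +%s)
    fi
    echo "$candidate"
}

# Succeeds (exit 0) only if the lock is still valid at the next shutdown.
# Arguments: busy_until epoch, current epoch.
lock_covers_next_shutdown() {
    busy_until=$1
    shutdown=$(next_shutdown_epoch "$2")
    [ "$shutdown" -lt "$busy_until" ]
}

# Typical use right after writing the lock:
#   lock_covers_next_shutdown "$busy_until" "$(date +%s)" && echo "lock-protected"
```

Computing the next firing explicitly avoids the "tomorrow" ambiguity when the lock is written shortly after midnight, as in the session above.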

Author
Owner

Fixed in v0.1.9 (commit 2dd7ad9). The runbook now includes a "Data nightly-shutdown override" subsection in both agents/his.md and skills/his/SKILL.md with the busy-lock recipe and the mandatory date -d "@$busy_until" -Iseconds self-check, plus the rule that the human-readable expiry must appear in the post-action report. Live on packages.reauktion.de in claude-his-agent_0.1.9-1.

Reference: marfrit/claude-his-agent#1