Commit Graph

15 Commits

Author SHA1 Message Date
marfrit 1f34b6dce8 config + docs/PHASE7: example block + status -> Implement (Phase 7 commit #6)
R9-resolved single-owner of the status bump (commit #5 didn't touch
PHASE7.md). N5: PHASE0 §11 amendment landed in commit 3bad07b
(formulate); not re-applied here.

config.lua:
  - Commented-out `cost = { warn_at_dollars, warn_at_tokens }` block
    with parity to the Phase 1-6 example blocks.
  - Notes warn flags are independent (R4) and per-turn usage flows
    to session/*.jsonl for after-the-fact analysis.

docs/PHASE7.md:
  - Status header bumped: "Plan + review fold-in" -> "Implement"
  - Lists the 6 implement commits inline for traceability:
      7364963  broker: usage capture + opts widening
      7b4a9be  context: accumulator helpers
      8adebd5  repl: _record_usage + opts.category at 5 sites
      b30212a  safety + repl: opts.category for Norris + probe
      0d6ff93  repl: :cost meta surface
      this     config example + status bump

Phase 7 implementation is complete. Next inner-loop step is verify
(7) — user-driven smoke tests, then memory-update (8).

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 23:02:55 +00:00
marfrit ac58b19da2 config + docs/PHASE6: example block + status -> Implement (Phase 6 commit #6)
R9-resolved single-owner of the status bump (commit #5 didn't touch
PHASE6.md per the review fold-in).

config.lua:
  - Commented-out `project = { auto_tree, tree_depth, tree_max_chars }`
    block with the same shape as the Phase 1-5 example blocks.
  - Note that :diff / :tree / :highlight all work without config; the
    `project` block ONLY controls the startup auto-inject.
  - Note about :highlight v1 having no config flag (runtime-only),
    cross-references the in-REPL install hint.

docs/PHASE6.md:
  - Status header bumped: "Plan + review fold-in" -> "Implement"
  - Lists the 6 implement commits in the header for traceability:
      c4fc7fd  context: compose_project plumbing
      d1dce83  _scan_project_tree + :tree + auto_tree hook
      4d5f93a  :diff + _git_clean_cmd (B1 helper)
      0d63f01  expand_mentions @<r1>..<r2> tiered resolution
      11d0e59  tree-sitter highlighter (renderer fence filter +
               highlighted dispatch + :highlight meta)
      this    config example + status bump

Phase 6 implementation is complete. Next inner-loop step is verify
(7) — user-driven smoke tests against the live broker on each pillar
plus filing of issues for any defects, then memory-update (8).

Regression: test_safety 87/87, test_router_model 31/31, repl loads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:27:58 +00:00
marfrit d852acadc2 repl: wire #13 secrets — scrub outbound, rehydrate stream + tool args
Plumbs the secrets.lua module (commit e4b818b) into the conversation
pipeline. Hook points:

  ask_ai          — scrub_messages(ctx:to_messages(), mode) before
                    call_broker; rehydrate streamed deltas via
                    streaming_rehydrator so the user sees real values
                    while text_parts accumulates rehydrated chunks
                    (final_resp is plain — CMD: / DELEGATE: extractors
                    see plain values)

  MCP dispatch    — dispatch_tool_call rehydrates the args table before
                    sess:call_tool so the trusted MCP server receives
                    real values (the model emitted placeholders because
                    it saw a scrubbed context)

  DELEGATE: & :delegate
                  — scrub sub_msgs before broker.chat; rehydrate sub_text
                    before appending to context, so future turns see
                    real values restored

  Phase 5 summarize-on-evict
                  — scrub sum_msgs before broker.chat; rehydrate the
                    reply that becomes ctx.summary

  :memory summarize
                  — same scrub + rehydrate pair

Mode resolution per call: model_cfg.redact → config.secrets.default →
"vault+autodetect" if vault loaded, else "off".

ctx storage convention: PLAIN values throughout. The scrub happens at
the egress (broker call) per the active redact mode; ctx.turns never
holds placeholders for content the user typed or executor produced.
The model's own emissions (assistant tool_call arguments) may carry
placeholders because the model saw the scrubbed context — rehydrated
at MCP dispatch and otherwise harmless on re-serialization (idempotent
re-scrubbing).

New meta:
  :secrets [status]         vault entries, placeholders allocated this
                            session, active broker mode. Never prints
                            actual values (vault file is itself a
                            secret per gotcha 7).
  :secrets check <text>     dry-run scrub against the active broker's
                            mode — shows the output transformation.

Documented in config.lua with a commented-out block + per-broker
redact field example.

Deferred to a follow-up issue (clearly scoped):
  - safety.lua broker call sites (Norris main loop, is_destructive
    LLM second-opinion probe) — same wiring pattern, but they don't
    currently see secrets_session; needs threading through helpers.
  - @-mention file content is appended PLAIN to ctx and scrubbed at
    egress alongside the rest of the user turn (covered by the
    ask_ai scrub).
  - exec output streamed live to terminal is pre-scrub (user sees
    real values in their own shell — by design); the captured-for-
    context copy is scrubbed at egress alongside the rest.

This is the "full scope" implementation chosen via AskUserQuestion.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:38:23 +00:00
marfrit f94d16fc89 repl: background CMD&: with handle/poll (closes #8)
Builds, long-running network calls, and file watches no longer block
the turn. A new "CMD&: <cmd>" marker (analogue of CMD:) tells the REPL
to spawn the command in the background, return immediately, and poll
for completion between user inputs.

Process model: shell-wrapped to avoid needing fork()/execv() FFI.

  nohup sh -c '(<cmd>) > <log> 2>&1; echo $? > <status>' </dev/null
       >/dev/null 2>&1 & echo $!

The child is reparented to init; we hold only the PID and the path to
the .status sidecar. Completion is detected by the .status file
existing (the wrapper writes it as its last act). No waitpid needed —
the child isn't ours after the popen subshell exits.

Storage: <history.dir>/bg/<id>.log + <id>.status. The directory is
created lazily at startup (mkdir -p). Requires history.dir to be
configured; without it CMD&: emits an error status and the model
sees an "[bg failed to start]" exec-output note.

check_bg_done() runs at the top of each main-loop iteration alongside
check_every_due(). When a job is detected as exited, the REPL:
  - emits a status line "[bg:<id> exited <code>, <bytes>, <secs>s wall] <cmd>"
  - appends the same string to ctx as exec output, so the model sees
    the completion on its next turn (natural follow-up: "ok the build
    finished; let me check the log")

Meta surface:
  :bg-spawn <cmd>       start a bg job directly (no AI needed; also
                        useful for testing without depending on the
                        model emitting CMD&:)
  :bg-list              show running/done jobs (id, pid, state, runtime, cmd)
  :bg-output <id>       dump the log file to stdout
  :bg-kill <id>         SIGTERM (note: only delivers if the PID is
                        still the actual command — long-lived shells
                        may need pkill by name)

Scope (deliberately limited for v1):
  - No callback-mode readline: bg completion detection is pre-prompt,
    not mid-readline. If a build finishes while the user is typing,
    notification comes when they hit Enter.
  - Permission policy DSL (#9) does NOT apply to CMD&: — the
    asynchronous gating model wasn't designed for the y/N flow.
    Filed as follow-up if needed.
  - Norris not extended: helpers.exec_cmd is still synchronous; the
    planner doesn't dispatch bg jobs.
  - Plan mode interaction: CMD&: in plan mode emits "PLAN: & <cmd>"
    and a "[plan] would bg-run: <cmd>" exec-output note, no spawn.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:25:55 +00:00
marfrit 17e62c0326 safety: permission policy DSL — allow/confirm/deny rule lists (closes #9)
The confirm_cmd boolean was too coarse: true interrupts every harmless
ls; false ungates everything. Most workflows want trust for read-only
ops while still gating writes/network/sudo.

New config:

    permissions = {
        allow   = { "^ls%s", "^cat%s", "^git status" },
        confirm = { "^rm%s", "^git push", "^docker%s", "^sudo%s" },
        deny    = { "^ssh%s+root@", "^curl%s+http[^s]" },
    }

Verdict order: deny > confirm > allow. First match in the chosen
category wins. Unmatched defaults to "confirm". Patterns are Lua
patterns (not regex) per PHASE0.md §3 — no compiled extensions.

Verdict behavior in the interactive CMD: loop:
  - allow   → run without prompt
  - deny    → status line, skip
  - confirm → [y/N] prompt (same UX as legacy confirm_cmd=true)

Backward compat:
  - permissions unset + confirm_cmd=true  → always confirm
  - permissions unset + confirm_cmd=false → always allow
  - permissions set                        → policy table is authoritative

Scope deliberately limited to the interactive AI-suggested CMD: gate.
Norris autonomous mode keeps its own safety.is_destructive machinery
(combining the two would double-gate or replace the LLM probe — both
non-obvious behavioral changes that belong in their own issues).
User-typed shell-routed lines (`router.classify → "shell"`) and
:exec also bypass the policy by design — those are direct user intent.

New introspection:
  :perms list           — show the configured rule lists
  :perms check <cmd>    — report verdict + matching rule (debug)

safety.classify_command is exported and unit-tested with 12 cases
covering each category, priority order (deny > allow on overlap),
and both fallback paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:20:56 +00:00
marfrit fb15f7a690 repl: pre/post CMD hooks via config.hooks (closes #3)
Optional shell scripts trigger around every CMD: execution. Use cases:
audit logging, auto-format-after-edit, custom safety gates beyond the
existing confirm_cmd boolean.

Config shape:

    hooks = {
        pre_cmd  = "/path/to/pre-script",
        post_cmd = "/path/to/post-script",
    }

Contract per hook invocation:
  - The command line is piped to the hook on stdin.
  - Env vars: AISH_CMD (the command), AISH_TURN (#ctx.turns at the
    moment of dispatch), AISH_CWD (libc.getcwd() result).
  - Hook stdout is streamed live to the terminal via executor.exec
    (so the user sees its output regardless of exit status).

Pre-hook: non-zero exit aborts the command and emits a status line
including the exit code. last_exec_code is set to the hook's exit
so the {last_status} prompt template variable reflects the abort.

Post-hook: exit code is ignored (the spec says so); only the visible
stdout matters. Runs after the command's exec_end frame.

Tested with success, abort, and stdin-matches-env paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:16:11 +00:00
marfrit d738f339cb repl: configurable prompt template via config.shell.prompt (closes #10)
At-a-glance situational awareness: see the active model, context fill,
mode flags, and cwd in the prompt itself — prevents "wait, am I still
in plan mode?" surprises.

Example config:

    shell = {
        prompt = "[{model} {ctx_used}/{ctx_max}t T{turn} {mode}] {cwd_short} > ",
    }

Variables (substituted via {name}):
  {model}        active preset name
  {ctx_used}     char/4 token heuristic (Phase 0 §8; accurate is Q1)
  {ctx_max}      config.context.token_budget
  {turn}         #ctx.turns
  {cwd}          libc.getcwd() (chdir-aware; PWD env may drift)
  {cwd_short}    cwd with $HOME -> ~
  {last_status}  last exec exit code, "" if none yet
  {mode}         "norris" | "plan" | "normal"

Default behavior unchanged when shell.prompt is unset — keeps the
"[aish:<model>]>" form with norris  and plan markers.

Side wiring:
  - ffi/libc.lua gains getcwd() (chdir() doesn't update PWD).
  - run_shell records exit code into last_exec_code for {last_status}.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 21:14:43 +00:00
marfrit d72689f709 config: deep model → deepseek-coder-v2-lite (temporary)
qwen3-30b-a3b-instruct isn't loaded on hossenfelder right now (per
/v1/models). deepseek-coder-v2-lite IS loaded — 16B MoE with ~2.4B
active params; fast enough that the 30-min timeout from the
qwen3-30b config was wildly over-budget.

Switched to deepseek-coder-v2-lite for the time being. Restore
qwen3-30b when the slot is back up.

Live-probed: YES/NO destructive probe via the deep model preset
returns "YES." in ~4.8s — well within the new 5-min timeout, and
fast enough that the Phase 3 LLM second-opinion path is now
functional again without falling back to "fail-safe YES" on every
ambiguous command.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:42:23 +00:00
marfrit a9b39cd435 config: Phase 5 routing + summarize-on-evict example (commit #5)
Phase 5 commit #5 (final) per docs/PHASE5.md §11. Documentation-only;
commented-out example showing:
  - routing.auto            (per-request auto-routing toggle)
  - routing.classes         (class → model mapping; reasoning = nil
                             by default per R-N2 cost-safety)
  - routing.fallback        (single-hop retry to cloud on transport fail)
  - routing.fallback_model  (default "cloud" if uncommented)
  - context.summarize_on_evict + summarizer_model + max_summary_chars
    (shown INSIDE the context = {...} block above)

All defaults OFF — Phase 5 is opt-in across the board. Existing configs
without `routing` or `context.summarize_on_evict` behave identically to
Phase 4.

Phase 5 implementation complete:
  #1 3e57824  router.classify_model + 31-case corpus
  #2 03497b5  context summarize_fn callback + summary block in to_messages
  #3 40ea0b4  repl routing + fallback + summarize_fn wiring + :route/:fallback
  #4 -        (bundled into #3 since meta cmds are trivial additions)
  #5 (this)   config example block

Phase 5 verify-partial:
  - router.classify_model: 31/31 case corpus passes
  - context summarize-on-evict: mock callback fires correctly (additive
    + compress paths), summary suppressed under Norris, :reset clears it
  - repl meta cmds: :route on/off/classes/check + :fallback on/off all
    work; :route check reports class + "routing currently disabled"
    suffix when auto is off (N1)

Verify-pending: end-to-end with real broker (route a code question, see
it land on deep; kill local backend, see fallback fire to cloud).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 11:32:20 +00:00
marfrit 27784f9b68 config: Phase 4 memory example block (commit #5)
Phase 4 commit #5 (final) per docs/PHASE4.md §12. Documentation-only;
commented-out example showing:
  - inject_max_chars (cap on startup injection; default 2000)
  - summarizer_model (which configured model :memory summarize uses)

The block is OFF by default. The :memory meta surface (:remember,
:memory list/forget/clear/inject/summarize) works without the block —
items persist to <history.dir>/memory.jsonl regardless. The block only
configures the injection-into-system-prompt behavior + summarizer model
choice.

Phase 4 implementation complete:
  #1 199dd87  history.lua memory store + ffi/libc.lua flock
  #2 c1a5c73  context.lua [background] block (suppressed in Norris)
  #3 3b074af  repl.lua memory handle + :remember + :memory meta
  #4 f22d21d  :memory summarize — LLM candidate extraction
  #5 (this)   config.lua memory example block

Phase 4 verify-partial:
  - history memory round-trip tests: add/forget/load all green
  - flock single-writer enforcement verified
  - context composition order (DEFAULT → [background] → NORRIS) +
    Norris suppression all green
  - End-to-end persistence across boots: :remember on boot 1 visible
    on boot 2 as injected memory items
  - :memory forget id-not-active surfaces clean status (N1)
  - :memory clear with [y/N] confirm gate works
  - :memory summarize wire-correct against fast model (candidate
    parsing tolerates bullets; per-candidate y/N/edit prompts fire)

Verify-pending: real-model summarizer quality test (deep/cloud);
multi-process flock contention test; long-running :memory inject
race with running broker stream.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 07:53:58 +00:00
marfrit 50666d092f config: Phase 3 safety example block (commit #6)
Phase 3 commit #6 (final) per docs/PHASE3.md §12. Documentation-only;
commented-out example showing the safety schema:
  - llm_second_opinion  (bool, default true)
  - llm_model           (string, default deep→default_model fallback)
  - max_norris_steps    (int,  default 8)

The block notes the model-selection trade-off (R-B2): cloud is the
independent-class fast option (costs money), deep is the local-but-slow
option, fast is self-policing and NOT recommended.

No behavior change to existing configs — safety defaults kick in when
the block is absent.

Phase 3 implementation complete:
  #1 bd59ce7  safety static patterns (34 rules) + 87-case test corpus
  #2 2abd5da  LLM second-opinion + session cache + opts.max_tokens
  #3 d2a53d2  renderer Norris frames
  #4 11b1f56  safety.norris_step planner (single iteration)
  #5 a404b2a  repl driver + \C-n real binding + :norris/:safety meta
              + readline rl_insert_text/rl_redisplay
  #6 (this)   config.lua safety example block

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 23:42:46 +00:00
marfrit f26cbd9a3a phase2 amend: __ separator (Bedrock-safe) + post_sse error diagnostics
Phase 7 verify finding from TC #26 against :model cloud:
  HTTP 400 from openrouter→Amazon Bedrock:
  "tools.0.custom.name: String should match pattern
   '^[a-zA-Z0-9_-]{1,128}$'"

Anthropic via Bedrock validates tool names against that regex and
rejects dots. PHASE2 originally chose "." as the namespace separator
("boltzmann.list_dir"); OpenAI tolerated it, Bedrock does not.

Separator switched to "__" (two underscores) everywhere — internal
API matches on-wire shape, no transformation layer:

  - repl.lua:
    - tools_schema builds "alias__name"
    - dispatch_tool_call splits via "^(.-)__(.+)$" (non-greedy → leftmost __)
    - :mcp tool parser uses same split
    - :mcp tools formatter prints "alias__name"
    - HELP block shows <alias__name>
  - safety.lua confirm_tool_call: alias.* glob → alias__* glob
  - config.lua example block: keys rewritten
  - docs/PHASE2.md: amendment header added; §1, §2 row, §3 config.lua
    row, §5 wire-shape JSON examples, §6 auto_approve schema, §7
    meta-cmd table, §12 plan all updated. Original "." references
    preserved in commit history.

Constraint: aliases must not themselves contain "__" so the parse
stays unambiguous. Tool names from MCP servers may have underscores
freely.

Second fix bundled — uninformative broker error:
  Previously "broker error: transport: HTTP response code said error"
  Now      "broker error: transport: HTTP 400: {full body snippet}"

ffi/curl.lua M.post_sse changes:
  - FAILONERROR no longer set (was hiding the response body).
  - raw_body accumulator added alongside the SSE buffer; captures
    every byte regardless of SSE shape.
  - After perform, check status_code via curl_easy_getinfo. On >=400,
    return (nil, "HTTP <code>: <body[:400]>"). 2xx unchanged.
  - End-of-stream SSE flush only runs on 2xx (no false event on
    error bodies that aren't SSE-shaped).
  - Phase 1 callers reading just first return slot stay correct.

End-to-end verified:
  - :model cloud + tools=[boltzmann__read_file ...] +
    "Use boltzmann__read_file with path=/etc/hostname" →
    Claude emits tool_call with name="boltzmann__read_file",
    args='{"path": "/etc/hostname"}'. ok=true, transport clean.
  - Force-bad tool name "bad.name.with.dots" → err string carries
    the full bedrock 400 with the regex-pattern message visible.

TC #26 (sub-loop end-to-end) is now testable against cloud — the
error that blocked it is resolved.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 20:04:57 +00:00
marfrit 09800d192a config: Phase 2 mcp example block + deep model switch
Phase 2 commit #7 (final) per docs/PHASE2.md §12. Two changes bundled:

(1) commented-out mcp = {...} example block (~40 lines) at the end of
    config.lua showing the Phase 2 schema:
      - mcp.servers — alias → {url, auth_token | auth_env}
      - mcp.auto_approve — "<alias>.<tool>" or "<alias>.*" globs
      - mcp.max_tool_depth — sub-loop budget per ask_ai turn
    The block is OFF by default; uncomment + adjust per fleet to
    activate. Documentation-only; no behavior change to existing
    configs (mcp_sessions stays empty, tools_schema() returns [],
    broker omits the field — full Phase 1 compatibility).

(2) User-authored: deep model preset switched from
    mistral-nemo-12b-instruct to qwen3-30b-a3b-instruct, with a 10-min
    timeout_ms accommodating the larger model's RK3588 inference time.
    Reason: nemo backend is dormant per the proxy /v1/models discovery
    (aish#23 now returns 404 cleanly for unknown models instead of
    silent fallback); qwen3-30b is the practical "deep" alternative.

Phase 2 implementation is now complete — 7 of 7 commits landed:
  #1 6c194de  mcp.lua + ffi/curl status_code + PHASE0 §4 amendment
  #2 0fde77f  safety.lua confirm_tool_call
  #3 7c221a8  context.lua tool turns + use_tool_role fallback
  #4 c736d0e  renderer.lua tool-call frames
  #5 efdc728  broker.lua opts.tools + tool_call accumulator
  #6 7e9cfff  repl.lua sub-loop + :mcp meta + system-prompt block
  #7 (this)   config.lua example + deep model switch

Next phase-loop step: verify (Phase 7). Files written are wired and
isolated-tested; end-to-end model-driven verification waits on either
a more compliant model or explicit forcing of tool_calls from the
prompt — known to be marginal with the loaded qwen-1.5b but proven
correct against direct probes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-12 15:40:21 +00:00
marfrit 8870eb0451 config: route all presets through hossenfelder per issue #12
Resolves issue #12 by partial-accept of the recommendation.

What landed:
  - Single broker URL: http://hossenfelder.fritz.box:8082 for all three
    presets (fast / deep / cloud). Server-side model-aware routing; no
    client-side cloud auth (proxy holds the OpenRouter bearer).
  - Models from hossenfelder's /v1/models inventory:
      fast  -> qwen2.5-coder-1.5b-q4_k_m.gguf  (boltzmann local)
      deep  -> mistral-nemo-12b-instruct        (boltzmann local)
      cloud -> anthropic/claude-haiku-4.5       (OpenRouter route)
  - `cloud` was already pointing at hossenfelder but with https://; flipped
    to http:// so it matches the proxy's actual scheme.

What deferred:
  - Schema rename `models` -> `brokers` (and the 5-cloud-preset shape
    suggested in #12) — would touch repl.lua + broker.lua. Not blocking
    Phase 7. If multi-preset becomes useful in practice, file a separate
    issue for the rename then.

Phase 7 verification (live broker test):
  - broker.chat(fast, [user="say pong"]) -> "CMD: echo pong" in ~3s
  - multi-turn arithmetic (7*8=56, *2=112) preserved across turns

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 18:06:08 +00:00
claude-noether 4310207738 Phase 0: scaffold tree + manifest
- README, .gitignore, CLAUDE.md (project conventions)
- docs/PHASE0.md — full Phase 0 manifest (locked substrate)
- 10 root .lua modules + 4 ffi/ bindings, all stubs raising NotImplemented
  with module-scoped responsibilities matching the manifest
- config.lua wired to current dirac/hossenfelder endpoints (qwen-coder-7b
  snappy/32k + cloud via OpenRouter through hossenfelder)

File names match docs/PHASE0.md §4 exactly. Module bodies fill in across
later phases; the tree shape is locked.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-09 23:16:07 +00:00