fahrenheit/openrc: supervise-daemon orphans pihole-FTL child on restart, holding ports #3

Closed
opened 2026-04-29 04:42:25 +00:00 by marfrit (owner) · 1 comment

What happened

2026-04-29 04:29 UTC, while diagnosing a stuck pihole-FTL dashboard outage on fahrenheit (Alpine 3.23):

  1. rc-service pihole-FTL stop returned [ ok ].
  2. PID 496 (the actual FTL daemon, child of supervise-daemon) was still running — reparented to PID 1 (init).
  3. The next rc-service pihole-FTL start spawned a fresh FTL (PID 24450).
  4. Result: two pihole-FTL processes coexisted. PID 496 held port 53. PID 24450 held only 80/443 (it had failed to bind 53 because the orphan still owned it, and civetweb's o modifier on the web ports meant it didn't respawn-loop on the partial bind).
  5. kill -TERM 496 had no effect. The orphan ignored SIGTERM.
  6. kill -9 496 finally cleared it.
  7. A second rc-service pihole-FTL restart was required so the new FTL would retry binding port 53.
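Steps 2 and 4 above reduce to two checks: is there more than one pihole-FTL, and is any of them parented to PID 1 instead of supervise-daemon? Below is a minimal sketch of that check in POSIX sh. It assumes a Linux /proc filesystem. list_procs_by_name is a hypothetical helper, not something shipped by Pi-hole or OpenRC.

```shell
#!/bin/sh
# Hypothetical helper: print "PID PPID" for every process whose command
# name (/proc/PID/comm) exactly matches $1. Assumes Linux /proc.
# Processes that exit while the loop is scanning are silently skipped.
list_procs_by_name() {
    for d in /proc/[0-9]*; do
        comm=$(cat "$d/comm" 2>/dev/null) || continue
        [ "$comm" = "$1" ] || continue
        # Parent PID comes from the "PPid:" line of /proc/PID/status.
        ppid=$(awk '/^PPid:/ { print $2 }' "$d/status" 2>/dev/null) || continue
        [ -n "$ppid" ] || continue
        echo "${d#/proc/} $ppid"
    done
}
```

As an example, list_procs_by_name pihole-FTL | awk '$2 == 1 { print "orphan:", $1 }' prints any pihole-FTL whose parent is PID 1. In the incident above, that is how PID 496 presented after the stop. More than one output line from list_procs_by_name pihole-FTL alone already indicates the duplicate-daemon state from step 4.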

Net effect: a single rc-service restart did the opposite of what was intended — created two daemons instead of cycling one.

Why it matters

This is the upstream cause of issue #2 — every restart that orphans the previous child guarantees a port-bind conflict on the next start, which civetweb silently shrugs at.

Repro hypothesis

  • Container: fahrenheit (Alpine 3.23, OpenRC)
  • Service file: /etc/init.d/pihole-FTL
    • supervisor=supervise-daemon
    • command_args_foreground="-f"
    • command_background=true
    • No explicit stop_pre/stop_post other than stop_post=sh ${PI_HOLE_SCRIPT_DIR}/pihole-FTL-poststop.sh
  • The supervise-daemon process gets terminated, but its child FTL is left behind (no kill propagation).

Possible fixes

  1. Service-level: add an explicit stop() that kills the actual child (read PID from /run/pihole-FTL_openrc.pid, send TERM, reap, fall back to KILL). This is what most modern OpenRC services do for supervise-daemon supervisors.
  2. OpenRC-level: check the supervise-daemon flags. Verify whether --stop is sent at all on rc-service stop, and whether it translates into a signal to the child. This may be an Alpine packaging quirk in the pihole-FTL aports script.
  3. Workaround: a his-side runbook for fahrenheit restarts that always does pgrep pihole-FTL | xargs -r kill -9 after rc-service stop and before rc-service start. Ugly but reliable.
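Fix 1 and the workaround share the same core step: send TERM, allow a grace period, then fall back to KILL for a process that ignores TERM (as PID 496 did). Below is a minimal POSIX sh sketch of that step for a single PID. It assumes Linux /proc and falls back to kill -0 where /proc is missing. The names terminate_pid and is_alive and the 5-second default grace period are illustrative; they are not taken from OpenRC or the aports script. Where the PID comes from is left to the caller. Fix 1 proposes /run/pihole-FTL_openrc.pid, but it is worth confirming whether that file records supervise-daemon's PID or FTL's before relying on it.

```shell
#!/bin/sh
# Hypothetical helpers; not part of OpenRC or the pihole-FTL aports script.

# Succeeds if PID $1 is running. A zombie (state Z) has already exited and
# only awaits reaping by its parent, so it counts as "not running".
is_alive() {
    if [ -r "/proc/$1/stat" ]; then
        st=$(cat "/proc/$1/stat" 2>/dev/null) || return 1
        st=${st##*") "}          # drop "pid (comm) "; comm may contain spaces
        [ "${st%% *}" != "Z" ]   # first remaining field is the process state
    else
        kill -0 "$1" 2>/dev/null
    fi
}

# Send SIGTERM to PID $1, wait up to $2 seconds (default 5) for it to exit,
# then send SIGKILL. Prints "gone", "TERM" or "KILL" depending on what was needed.
terminate_pid() {
    pid=$1
    grace=${2:-5}
    if ! kill -TERM "$pid" 2>/dev/null; then
        # No such process, or no permission to signal it.
        is_alive "$pid" && return 1
        echo gone
        return 0
    fi
    i=0
    while [ "$i" -lt "$grace" ]; do
        if ! is_alive "$pid"; then
            echo TERM
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    kill -KILL "$pid" 2>/dev/null || true
    echo KILL
}
```

Treating a zombie as gone matters when the caller is the dead process's parent. kill -0 still succeeds on an unreaped zombie, so a plain kill -0 loop would wait out the full grace period and then send a pointless KILL.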

Investigation pointers

  • apk info --who-owns /etc/init.d/pihole-FTL (or apk info -L pihole-FTL, which lists that package's files) to find which package owns the init script and whether it's a Pi-hole-shipped or Alpine-packaged init.
  • supervise-daemon --help | grep -A2 -- --stop: does it propagate the stop signal to the child?
  • Check upstream Alpine bug tracker for supervise-daemon orphan-on-stop reports (would be surprised if this is novel).

Severity

Medium. Workaround is kill -9 + restart. The real cost is that an unattended restart (e.g., from a pihole -up upgrade) silently leaves a half-broken daemon, which is exactly what bit fahrenheit for ~5 days.

Related: #2

Comment by marfrit (author, owner):

Workaround documented in the v0.1.9 runbook ("fahrenheit / pihole-FTL gotchas", subsection #1): an explicit rc-service stop + pgrep | kill -TERM + sleep 2 + pgrep | kill -KILL + rc-service start cycle. The proper fix (a service-level stop() that propagates to the supervised child) remains parked; it needs an upstream patch to the pihole-FTL aports script or a downstream init.d override. Closing as worked around. Cross-ref #2, which depended on this orphaning for its silent failure to manifest.
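The runbook cycle described in this comment can be written out as a POSIX sh function. This sketch assumes rc-service and pgrep (with -x for exact name match) are on PATH, and that the service name matches the process name, as it does for pihole-FTL. restart_reaping_orphans is a hypothetical name, not copied from the v0.1.9 runbook.

```shell
#!/bin/sh
# Hypothetical wrapper for the documented restart cycle:
# stop, TERM any leftover process, wait 2 s, KILL survivors, start.
# Assumes the process name equals the service name (true for pihole-FTL).
restart_reaping_orphans() {
    svc=${1:-pihole-FTL}
    rc-service "$svc" stop
    # After a clean stop nothing should match; anything left is an orphan.
    # pgrep exits 1 when nothing matches, hence "|| true".
    leftovers=$(pgrep -x "$svc" 2>/dev/null || true)
    if [ -n "$leftovers" ]; then
        kill -TERM $leftovers 2>/dev/null || true   # unquoted: one PID per word
        sleep 2
        leftovers=$(pgrep -x "$svc" 2>/dev/null || true)
        if [ -n "$leftovers" ]; then
            kill -KILL $leftovers 2>/dev/null || true
        fi
    fi
    rc-service "$svc" start
}
```

Because the exact-name match runs between stop and start, no legitimately supervised instance should exist at that point, so any survivor is left over from the previous instance.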

Reference: marfrit/claude-his-agent#3