Docker: Blackout Boot Bug Busting Bonanza
Last night we had a power cut. This morning my wife couldn't log in to our self-hosted chat. A 500 error. The kind of thing that makes you question every life choice that led to running your own infrastructure.
The error in the logs was clear enough:
error: ErrorController => error Unknown authentication strategy "openid"
The app (LibreChat) had booted before the auth provider (Authentik) was ready, so it never registered its OpenID strategy. A restart fixed it immediately. But why did it happen? I'd specifically set up dependency ordering to prevent exactly this.
The setup
My homelab runs from a single meta docker-compose.yml, brought up by a systemd oneshot unit:
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/srv
ExecStart=/usr/bin/docker compose up -d --wait
ExecStop=/usr/bin/docker compose down
And the compose file has proper dependency ordering:
services:
  authentik-server:
    restart: on-failure
    healthcheck:
      test: ["CMD", "ak", "healthcheck"]
      interval: 30s
      timeout: 30s
      retries: 5
      start_period: 20s
  api: # LibreChat
    restart: on-failure
    depends_on:
      authentik-server:
        condition: service_healthy
This should be bulletproof. Compose won't start api until authentik-server is healthy. So what went wrong?
The clue
The systemd journal told the story:
Jun 09 20:06:17 borg systemd[1]: Starting homelab.service...
Jun 09 20:06:18 borg docker[1432]: Container authentik-server Running
Jun 09 20:06:18 borg docker[1432]: Container LibreChat Running
Both containers were already running when compose started. It didn't start them — it just found them alive and waited for healthchecks. The depends_on ordering never fired because there was nothing to order.
Docker itself had already restarted them — simultaneously — during daemon initialisation.
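Here's a toy model of what that means. It is not how Compose is implemented: compose_up and its two actions ("Running", "Started") are hypothetical simplifications that ignore healthchecks and timing. It captures one point only. depends_on decides the order in which Compose starts services, so a container that is already running just gets reported as running, and the ordering has nothing to act on.

```python
from graphlib import TopologicalSorter


def compose_up(depends_on: dict[str, set[str]], running: set[str]) -> list[tuple[str, str]]:
    """Toy model of `docker compose up`: return (service, action) pairs in order.

    depends_on maps each service to the services it waits for.
    running is the set of services whose containers are already up.
    """
    actions = []
    # Visit services so that every dependency comes before its dependents.
    for service in TopologicalSorter(depends_on).static_order():
        if service in running:
            # Already alive (e.g. revived by the daemon's restart policy),
            # so there is nothing to start and nothing to order.
            actions.append((service, "Running"))
        else:
            # Real Compose would wait for a service_healthy dependency here.
            actions.append((service, "Started"))
    return actions


deps = {"api": {"authentik-server"}, "authentik-server": set()}

# After the power cut: the daemon has already restarted both containers.
print(compose_up(deps, running={"api", "authentik-server"}))
# With nothing running, Compose walks the dependency order.
print(compose_up(deps, running=set()))
```

The empty running set models a clean start, where authentik-server is started before api. The first call is the power-cut case: both services come back as "Running" and the dependency graph is never consulted for startup.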
But I'm using restart: on-failure!
The Docker docs are quite clear: on-failure does not restart containers when the daemon restarts. Only always and unless-stopped do that.
So what gives?
Here's the thing: after a power cut, Docker doesn't see a "daemon restart." It sees containers that were running and are now dead with a non-zero exit. The process was killed uncleanly. That's a failure. And on-failure says: restart on failure.
You can see it in the daemon debug logs:
"loaded container" container=94157... paused=false running=true
"syncing container on disk state with real state"
"setting stopped state"
"set stopped state" container=94157... running=false
Docker loads the container metadata (which says "I was running"), discovers the process is actually dead, marks it stopped — and then the restart policy kicks in. Both containers. At the same time. No ordering. No compose. No depends_on.
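If you'd rather pull the affected containers out of a long debug log than eyeball it, a small filter is enough. This is a minimal sketch that assumes logfmt-style lines like the excerpt above (key=value fields, with the message text in quotes). containers_found_dead is a hypothetical helper, and the "set stopped state" string it matches is taken from the excerpt, not from any documented log format.

```python
import re

# Matches the container=<id> field in logfmt-style dockerd debug lines.
CONTAINER_FIELD = re.compile(r"\bcontainer=(\S+)")


def containers_found_dead(log_lines):
    """Return container IDs, in first-seen order, that the daemon marked as
    stopped while syncing on-disk state with reality ("set stopped state")."""
    seen = []
    for line in log_lines:
        # "setting stopped state" does not contain this substring, so only
        # the completion lines that carry a container= field are considered.
        if "set stopped state" not in line:
            continue
        match = CONTAINER_FIELD.search(line)
        if match and match.group(1) not in seen:
            seen.append(match.group(1))
    return seen
```

On a real host you could feed it the daemon's log for the current boot, for example by piping journalctl -u docker.service -b into a script that calls containers_found_dead(sys.stdin).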
Proving it
I wasn't about to change my restart policy based on a theory. I built a test VM with Vagrant + VirtualBox that mirrors the production setup exactly:
- Two nginx containers, test-auth and test-app, with test-app depending on test-auth via depends_on with condition: service_healthy
- restart: on-failure on both
- A systemd oneshot unit with docker compose up -d --wait
- Docker daemon configured with "log-level": "debug"
The test script confirms everything's healthy, then simulates a power cut with VBoxManage controlvm <vm> poweroff — instant, ungraceful death. Then it boots the VM back up and captures everything.
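The test script itself isn't shown here, but the power-cut loop is short enough to sketch. This minimal version assumes Vagrant and VBoxManage on the host and is meant to run from the Vagrantfile directory. power_cut_steps and run_power_cut are hypothetical; vm must be the VirtualBox name shown by VBoxManage list vms (not the Vagrant machine name); and homelab.service is the production unit name from the journal above, assumed to be the same in the test VM.

```python
import subprocess


def power_cut_steps(vm: str, containers: list[str]) -> list[list[str]]:
    """Commands for one simulated power cut, in order (hypothetical harness)."""
    inspect = ("sudo docker inspect --format '{{.Name}} {{.State.StartedAt}}' "
               + " ".join(containers))
    return [
        # Instant, ungraceful power-off: no shutdown sequence runs in the guest.
        ["VBoxManage", "controlvm", vm, "poweroff"],
        # Boot the box again; `vagrant up` returns once it can reach SSH.
        ["vagrant", "up"],
        # Block until the oneshot stack unit has finished starting (unit name assumed).
        ["vagrant", "ssh", "-c", "sudo systemctl start homelab.service"],
        # Capture each container's start time as left by the daemon and Compose.
        ["vagrant", "ssh", "-c", inspect],
    ]


def run_power_cut(vm: str, containers: list[str]) -> str:
    """Run every step, raising on the first failure; return the inspect output."""
    output = ""
    for cmd in power_cut_steps(vm, containers):
        output = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return output
```

The systemctl start step matters: while a start job is still running, systemctl start waits for it, so the timestamps are read only after compose up --wait has returned.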
Before the power cut (ordering works):
test-auth: 2026-06-10T15:50:53.557Z
test-app: 2026-06-10T15:50:59.330Z ← 6 seconds later ✓
After the power cut (ordering broken):
test-auth: 2026-06-10T15:54:19.406779Z
test-app: 2026-06-10T15:54:19.406124Z ← same millisecond!
And the systemd service log confirms compose was useless:
Container test-auth Running
Container test-app Running
Already running. Nothing to do. Dependency ordering bypassed.
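Comparing sub-millisecond timestamps by eye gets old fast, so here's a minimal sketch that turns a pair of start times into a signed gap. It assumes RFC 3339 UTC strings like the ones above; Docker reports a container's start time through docker inspect --format '{{.State.StartedAt}}'. started_at_ns and startup_gap_seconds are hypothetical helpers. The parser keeps up to nine fractional digits because Python's datetime stops at microseconds, while Docker's timestamps can carry nanoseconds.

```python
from datetime import datetime, timezone


def started_at_ns(ts: str) -> int:
    """Parse a UTC timestamp such as "2026-06-10T15:54:19.406779Z" into
    integer nanoseconds since the epoch, keeping sub-microsecond digits."""
    if not ts.endswith("Z"):
        raise ValueError(f"expected a UTC timestamp ending in Z: {ts!r}")
    whole, _, frac = ts[:-1].partition(".")
    base = datetime.strptime(whole, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    # Right-pad (or truncate) the fraction to exactly 9 digits: nanoseconds.
    nanos = int((frac + "000000000")[:9])
    return int(base.timestamp()) * 1_000_000_000 + nanos


def startup_gap_seconds(dependency: str, dependent: str) -> float:
    """Seconds from the dependency starting to the dependent starting.
    Positive means the ordering held; zero or negative means they raced."""
    return (started_at_ns(dependent) - started_at_ns(dependency)) / 1e9
```

With the numbers above, the pre-power-cut pair gives a gap of about 5.77 s. The post-power-cut pair gives -0.000655 s: test-app actually started 655 µs before the container it supposedly depends on.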
The fix
One line in the systemd unit:
ExecStartPre=/usr/bin/docker compose down
This stops and removes any containers that Docker's restart policy may have already zombie-started; ExecStart then brings them back up properly through compose, with full dependency ordering.
After the fix, same power cut test:
Jun 10 16:06:07 docker Container test-app Stopping ← kills the zombies
Jun 10 16:06:08 docker Container test-auth Stopping
Jun 10 16:06:09 docker Container test-auth Starting ← proper startup
Jun 10 16:06:15 docker Container test-auth Healthy ← waits...
Jun 10 16:06:15 docker Container test-app Starting ← NOW starts app
Jun 10 16:06:20 docker Container test-app Healthy ← done
Timestamps confirm ordering is restored:
test-auth: 2026-06-10T16:06:09.299Z
test-app: 2026-06-10T16:06:15.088Z ← 6 seconds later ✓
The full systemd unit
[Unit]
Description=Homelab Docker Stack
After=docker.service
Requires=docker.service
[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/srv
ExecStartPre=/usr/bin/docker compose down
ExecStart=/usr/bin/docker compose up -d --wait
ExecStop=/usr/bin/docker compose down
[Install]
WantedBy=multi-user.target
The down is idempotent — on a clean boot it does nothing. After a power cut it cleans up Docker's mess. Either way, compose handles the startup and your depends_on ordering actually works.
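The down/up pair only covers services in this compose file. To see which containers on a host carry a restart policy that the daemon may act on after an unclean shutdown, here's a minimal sketch built on docker ps -aq and docker inspect. It assumes the docker CLI is on PATH. restart_policies, parse_policies and revived_after_power_cut are hypothetical helpers, and counting on-failure alongside always and unless-stopped reflects the test above, not the Docker docs.

```python
import subprocess

# Policies that, per the power-cut test, can bring a container back on daemon
# start after an unclean shutdown, before Compose gets a say.
REVIVING_POLICIES = {"always", "unless-stopped", "on-failure"}


def restart_policies() -> dict[str, str]:
    """Map container name -> restart policy name, via the docker CLI."""
    ids = subprocess.run(
        ["docker", "ps", "-aq"], capture_output=True, text=True, check=True
    ).stdout.split()
    if not ids:
        return {}  # `docker inspect` with no IDs would be an error
    out = subprocess.run(
        ["docker", "inspect", "--format",
         "{{.Name}} {{.HostConfig.RestartPolicy.Name}}", *ids],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_policies(out)


def parse_policies(inspect_output: str) -> dict[str, str]:
    """Parse "<name> <policy>" lines; an empty policy is treated as "no"."""
    policies = {}
    for line in inspect_output.splitlines():
        if not line.strip():
            continue
        name, _, policy = line.partition(" ")
        # docker inspect reports names with a leading slash, e.g. "/test-app".
        policies[name.lstrip("/")] = policy.strip() or "no"
    return policies


def revived_after_power_cut(policies: dict[str, str]) -> list[str]:
    """Sorted names of containers whose policy is in REVIVING_POLICIES."""
    return sorted(n for n, p in policies.items() if p in REVIVING_POLICIES)
```

On the host, revived_after_power_cut(restart_policies()) lists the candidates. It only reads policies, so a container that was cleanly stopped before the outage will still appear even though the daemon would leave it alone.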
TL;DR
- restart: on-failure does restart containers after a power cut (unclean shutdown = failure)
- The Docker daemon restarts them simultaneously, ignoring depends_on ordering
- Your systemd unit's docker compose up finds them already running and skips startup
- Fix: add ExecStartPre=/usr/bin/docker compose down so compose always controls ordering
- Don't trust restart policies to do what the docs say when the lights go out