Docker: Blackout Boot Bug Busting Bonanza

Last night we had a power cut. This morning my wife couldn't log in to our self-hosted chat. A 500 error. The kind of thing that makes you question every life choice that led to running your own infrastructure.

The error in the logs was clear enough:

error: ErrorController => error Unknown authentication strategy "openid"

The app (LibreChat) had booted before the auth provider (Authentik) was ready, so it never registered its OpenID strategy. A restart fixed it immediately. But why did it happen? I'd specifically set up dependency ordering to prevent exactly this.

The setup

My homelab runs from a single meta docker-compose.yml with a systemd oneshot unit:

[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/srv
ExecStart=/usr/bin/docker compose up -d --wait
ExecStop=/usr/bin/docker compose down

And the compose file has proper dependency ordering:

services:
  authentik-server:
    restart: on-failure
    healthcheck:
      test: ["CMD", "ak", "healthcheck"]
      interval: 30s
      timeout: 30s
      retries: 5
      start_period: 20s

  api:  # LibreChat
    restart: on-failure
    depends_on:
      authentik-server:
        condition: service_healthy

This should be bulletproof. Compose won't start api until authentik-server is healthy. So what went wrong?

The clue

The systemd journal told the story:

Jun 09 20:06:17 borg systemd[1]: Starting homelab.service...
Jun 09 20:06:18 borg docker[1432]: Container authentik-server Running
Jun 09 20:06:18 borg docker[1432]: Container LibreChat Running

Both containers were already running when compose started. It didn't start them — it just found them alive and waited for healthchecks. The depends_on ordering never fired because there was nothing to order.

Docker itself had already restarted them — simultaneously — during daemon initialisation.
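To catch this without reading the journal by eye, you can scan the unit's output for containers that compose reported as already Running and never Starting. The minimal Python sketch below assumes compose's progress lines end in "Container <name> <state>", as in the journal above. The helper name containers_already_running is hypothetical and not part of any tool.

```python
import re

# Compose progress lines end in "Container <name> <state>"; the journal
# prefix (timestamp, host, PID) before them is ignored.
EVENT = re.compile(r"Container\s+(?P<name>\S+)\s+(?P<state>\w+)\s*$")


def containers_already_running(journal_lines):
    """Return containers compose found alive instead of starting itself.

    Assumption: a container compose starts gets a "Starting" event,
    while one that was already up only ever shows "Running".
    """
    started = set()
    running = []
    for line in journal_lines:
        match = EVENT.search(line)
        if match is None:
            continue
        name, state = match.group("name"), match.group("state")
        if state == "Starting":
            started.add(name)
        elif state == "Running" and name not in running:
            running.append(name)
    # Anything seen only as "Running" bypassed compose's startup (and depends_on).
    return [name for name in running if name not in started]
```

You could feed it the lines from journalctl -b -u homelab.service. A non-empty result means compose had nothing to order on that boot.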

But I'm using `restart: on-failure`!

The Docker docs are quite clear: on-failure does not restart containers when the daemon restarts. Only always and unless-stopped do that.

So what gives?

Here's the thing: after a power cut, Docker doesn't see a "daemon restart." It sees containers that were running and are now dead with a non-zero exit. The process was killed uncleanly. That's a failure. And on-failure says: restart on failure.

You can see it in the daemon debug logs:

"loaded container" container=94157... paused=false running=true
"syncing container on disk state with real state"
"setting stopped state"
"set stopped state" container=94157... running=false

Docker loads the container metadata (which says "I was running"), discovers the process is actually dead, marks it stopped — and then the restart policy kicks in. Both containers. At the same time. No ordering. No compose. No depends_on.
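Here is that boot-time pass as a toy Python model. It is not Docker's code. The class, the function names, and the exit code given to a killed process are assumptions for illustration; the experiment only shows that an unclean death is treated as a failure. The model covers the crash path only, not a clean daemon restart. It does capture the key point: the daemon decides one container at a time, and nothing it consults knows about depends_on.

```python
from dataclasses import dataclass

# Illustrative placeholder, not taken from Docker's source: any non-zero
# value leads to the same decision below.
ASSUMED_CRASH_EXIT_CODE = 255


@dataclass
class RecordedContainer:
    name: str
    restart_policy: str        # "no", "on-failure", "always" or "unless-stopped"
    recorded_running: bool     # what the on-disk metadata claims
    process_alive: bool        # what the daemon actually finds at boot
    manually_stopped: bool = False
    exit_code: int = 0


def should_restart(c: RecordedContainer) -> bool:
    """Simplified per-container restart-policy decision."""
    if c.restart_policy == "always":
        return True
    if c.restart_policy == "unless-stopped":
        return not c.manually_stopped
    if c.restart_policy == "on-failure":
        return c.exit_code != 0
    return False


def restore_after_crash(containers):
    """Toy model of the daemon's boot pass after an unclean shutdown.

    Returns every container the daemon would restart. They all come back
    in the same pass: there is no dependency graph here to consult.
    """
    to_restart = []
    for c in containers:
        if c.recorded_running and not c.process_alive:
            # Metadata says "running" but the process is gone: mark it
            # stopped and record the (assumed) failure exit code.
            c.exit_code = ASSUMED_CRASH_EXIT_CODE
            if should_restart(c):
                to_restart.append(c.name)
    return to_restart
```

If both containers use on-failure and the metadata lists them as running with no live process, restore_after_crash returns both names in a single pass. That matches the simultaneous start in the logs.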

Proving it

I wasn't about to change my restart policy based on a theory, so I built a test VM with Vagrant + VirtualBox that mirrors the relevant parts of the production setup:

Two nginx containers: test-auth → test-app
depends_on with condition: service_healthy
restart: on-failure on both
A systemd oneshot unit with docker compose up -d --wait
Docker daemon configured with "log-level": "debug"

The test script confirms everything's healthy, then simulates a power cut with VBoxManage controlvm <vm> poweroff — instant, ungraceful death. Then it boots the VM back up and captures everything.
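The test script isn't included in this post, so what follows is a hypothetical Python sketch of the same steps, not the original. All of these are placeholders: the VirtualBox name of the Vagrant VM, the containers test-auth and test-app (both assumed to define a healthcheck, which the inspect template needs), and a unit called homelab.service inside the VM. The sketch builds the command list separately from running it, so you can check the plan before pulling the plug.

```python
import shlex
import subprocess

# Placeholders: adjust container and unit names to your own test VM.
# The template reads .State.Health, so both containers need a healthcheck.
CONTAINERS = "test-auth test-app"
INSPECT = (
    "sudo docker inspect --format "
    "'{{.Name}} {{.State.Health.Status}} {{.State.StartedAt}}' " + CONTAINERS
)


def power_cut_plan(vbox_vm, unit="homelab.service"):
    """Commands for one simulated power cut, in order.

    vbox_vm is the VirtualBox VM name (see `VBoxManage list vms`), which is
    usually not the same as the Vagrant machine name.
    """
    return [
        # 1. Confirm both containers are healthy and note their start times.
        ["vagrant", "ssh", "-c", INSPECT],
        # 2. Pull the plug: no ExecStop, no clean Docker shutdown.
        ["VBoxManage", "controlvm", vbox_vm, "poweroff"],
        # 3. Boot the powered-off VM; vagrant up waits until SSH answers.
        ["vagrant", "up"],
        # 4. Wait for the unit: systemctl start should block until a pending
        #    start job finishes, and is a no-op if the unit is already active.
        ["vagrant", "ssh", "-c", f"sudo systemctl start {unit}"],
        # 5. Capture the new start times and what the unit saw on this boot.
        ["vagrant", "ssh", "-c", INSPECT],
        ["vagrant", "ssh", "-c", f"sudo journalctl -b -u {unit} --no-pager"],
    ]


def run_plan(plan, vagrant_dir):
    """Run each command from the Vagrant project directory, stopping on error."""
    for argv in plan:
        print("+", shlex.join(argv))
        subprocess.run(argv, cwd=vagrant_dir, check=True)
```

You might call it as run_plan(power_cut_plan("<your VirtualBox VM name>"), "/path/to/vagrant/project"). The systemctl start step is there so you don't inspect containers while compose up --wait is still waiting on healthchecks.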

Before the power cut (ordering works):

test-auth: 2026-06-10T15:50:53.557Z
test-app:  2026-06-10T15:50:59.330Z   ← 6 seconds later ✓

After the power cut (ordering broken):

test-auth: 2026-06-10T15:54:19.406779Z
test-app:  2026-06-10T15:54:19.406124Z  ← same millisecond!

And the systemd service log confirms compose was useless:

Container test-auth  Running
Container test-app   Running

Already running. Nothing to do. Dependency ordering bypassed.
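Comparing timestamps by eye works for two containers, but a small helper makes the check repeatable. This sketch assumes each value came from docker inspect --format '{{.State.StartedAt}}' <container> and is a UTC timestamp ending in Z. The function names are hypothetical. The sketch parses the value by hand rather than with datetime.fromisoformat, because Python versions before 3.11 reject the trailing Z.

```python
from datetime import datetime, timezone


def parse_started_at(ts: str) -> datetime:
    """Parse a Docker StartedAt value such as 2026-06-10T15:54:19.406779Z.

    Docker can print up to nine fractional digits; datetime only holds six,
    so any extra digits are truncated.
    """
    base, _, frac = ts.rstrip("Z").partition(".")
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    micros = int((frac + "000000")[:6]) if frac else 0
    return parsed.replace(microsecond=micros)


def start_gap_seconds(dependency: str, dependent: str) -> float:
    """Seconds from the dependency starting to the dependent starting.

    A gap of several seconds means the dependent waited for its dependency;
    a gap near zero (or negative) means both were started together.
    """
    return (parse_started_at(dependent) - parse_started_at(dependency)).total_seconds()
```

For the pre-crash pair above, this returns 5.773. For the post-crash pair it returns -0.000655: test-app was actually started about two thirds of a millisecond before test-auth.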

The fix

One line in the systemd unit:

ExecStartPre=/usr/bin/docker compose down

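This stops and removes any containers that Docker's restart policy may have already zombie-started during daemon initialisation. ExecStart then brings them back up properly through compose, with full dependency ordering. Named volumes and bind-mounted data are untouched; down only deletes named volumes if you pass -v.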

After the fix, same power cut test:

Jun 10 16:06:07 docker  Container test-app  Stopping     ← kills the zombies
Jun 10 16:06:08 docker  Container test-auth Stopping
Jun 10 16:06:09 docker  Container test-auth Starting     ← proper startup
Jun 10 16:06:15 docker  Container test-auth Healthy      ← waits...
Jun 10 16:06:15 docker  Container test-app  Starting     ← NOW starts app
Jun 10 16:06:20 docker  Container test-app  Healthy      ← done

Timestamps confirm ordering is restored:

test-auth: 2026-06-10T16:06:09.299Z
test-app:  2026-06-10T16:06:15.088Z   ← 6 seconds later ✓
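Ordering can also be checked directly from the unit's journal: the dependent should only reach Starting after its dependency has logged Healthy. The minimal sketch below assumes compose's "Container <name> <state>" lines as shown above, given in chronological order. dependency_respected is a hypothetical name.

```python
import re

# Matches "Container <name> <state>" anywhere in a journal line.
EVENT = re.compile(r"Container\s+(?P<name>\S+)\s+(?P<state>\w+)")


def dependency_respected(journal_lines, dependency, dependent):
    """True if `dependent` was started only after `dependency` was Healthy.

    A dependent that first shows up as "Running" was never started by
    compose, so that counts as ordering not respected.
    """
    healthy_seen = False
    for line in journal_lines:
        match = EVENT.search(line)
        if match is None:
            continue
        name, state = match.group("name"), match.group("state")
        if name == dependency and state == "Healthy":
            healthy_seen = True
        elif name == dependent and state in ("Starting", "Running"):
            # First sign of the dependent coming up: was the dependency ready?
            return healthy_seen
    return False
```

Against the post-fix journal above, this returns True. Against the pre-fix "Running / Running" output, it returns False.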

The full systemd unit

[Unit]
Description=Homelab Docker Stack
After=docker.service
Requires=docker.service

[Service]
Type=oneshot
RemainAfterExit=yes
WorkingDirectory=/srv
ExecStartPre=/usr/bin/docker compose down
ExecStart=/usr/bin/docker compose up -d --wait
ExecStop=/usr/bin/docker compose down

[Install]
WantedBy=multi-user.target

The down is idempotent: on a clean boot it does nothing, because ExecStop already removed the containers at shutdown. After a power cut it cleans up Docker's mess. Either way, compose handles the startup and your depends_on ordering actually works.

TL;DR

restart: on-failure does restart containers after a power cut (unclean shutdown = failure)
The Docker daemon restarts them simultaneously; depends_on is a compose feature the daemon never sees
Your systemd unit's docker compose up finds them already running and skips startup
Fix: add ExecStartPre=/usr/bin/docker compose down so compose always controls ordering
Don't trust restart policies to do what the docs say when the lights go out