
ADR 0011: Authentik behind the Traefik gateway

Status: Accepted
Date: 2026-05-01

Context

Until now Authentik ran on localhost:9000 directly, with the Docker host's port 9000 exposed publicly in production. Apps used http://localhost:9000/application/o/<slug>/ as both the OIDC issuer and the JWKS URL (the latter via host.docker.internal).

This had three real problems:

  1. OIDC tokens crossed the public internet in plain HTTP. Port 9000 wasn't behind TLS, so any prod sign-in exposed the authorization code and access token to anyone on the network path. This is a straightforward auth-bypass pre-condition and was already flagged in DEPLOYMENT.md as deferred work.
  2. Inconsistent URL pattern. Every other component on the platform was reachable via <slug>.${SCALR_PUBLIC_DOMAIN} through the gateway. Authentik was the lone exception — a special case in every doc, env file, and operator's mental model.
  3. No automated TLS for the auth host. Adding TLS to a hand-managed port-9000 binding meant a separate certbot setup, separate renewal logic, and a separate config surface. Going through Traefik means automatic Let's Encrypt with the same le resolver as every app.

Decision

Move the Authentik server container behind Traefik using the same 3-router-per-service pattern every app uses (ADR 0007):

  • Dev: http://auth.localhost
  • Prod redirect: http://auth.${SCALR_PUBLIC_DOMAIN} → HTTPS via the https-redirect@docker middleware
  • Prod TLS: https://auth.${SCALR_PUBLIC_DOMAIN} with a Let's Encrypt cert via the le resolver
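The three routers above can be sketched as Traefik labels on the Authentik server service. This is an illustrative fragment, not the actual compose file: the router names, entrypoint names (`web`/`websecure`), and service name are assumptions; only the hostnames, the `https-redirect@docker` middleware, and the `le` resolver come from this ADR.

```yaml
# Illustrative 3-router-per-service pattern (ADR 0007); names are assumptions.
services:
  scalr-auth-server:
    networks:
      - scalr-edge
    labels:
      - "traefik.enable=true"
      # Dev: plain HTTP on auth.localhost
      - "traefik.http.routers.auth-dev.rule=Host(`auth.localhost`)"
      - "traefik.http.routers.auth-dev.entrypoints=web"
      # Prod: HTTP -> HTTPS redirect
      - "traefik.http.routers.auth-redirect.rule=Host(`auth.${SCALR_PUBLIC_DOMAIN}`)"
      - "traefik.http.routers.auth-redirect.entrypoints=web"
      - "traefik.http.routers.auth-redirect.middlewares=https-redirect@docker"
      # Prod: TLS via the shared Let's Encrypt resolver
      - "traefik.http.routers.auth-tls.rule=Host(`auth.${SCALR_PUBLIC_DOMAIN}`)"
      - "traefik.http.routers.auth-tls.entrypoints=websecure"
      - "traefik.http.routers.auth-tls.tls.certresolver=le"
      # Traefik forwards to the container's internal port 9000
      - "traefik.http.services.auth.loadbalancer.server.port=9000"

networks:
  scalr-edge:
    external: true
</imports>
```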

The server container joins the existing scalr-edge external network so Traefik can reach it. Backends also use this network to fetch JWKS directly via the container hostname (http://scalr-auth-server:9000) — internal traffic stays inside Docker rather than bouncing through the gateway.

Port 9000 is still bound on the host, but only on 127.0.0.1, so it is not externally reachable. It remains useful for SSH-tunnelled admin work and for the worker (which still consults the API directly, not via the gateway).

AUTHENTIK_HOST becomes the canonical "where browsers find Authentik" env var. It's set in .env.shared (defaulting to http://auth.localhost for dev) so prod deployments can override it via make set-public-domain DOMAIN=.... The script has been extended to also rewrite the per-app .env.app auth URLs and the AUTHENTIK_HOST in .env.shared so the dev → prod transition is one command.
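A minimal sketch of what the rewrite amounts to. The real mechanism is the `make set-public-domain` shell target, which also touches the per-app `.env.app` files; the Python helper below is hypothetical and only illustrates the `.env.shared` half.

```python
import re

def set_public_domain(env_text: str, domain: str) -> str:
    """Hypothetical sketch of the AUTHENTIK_HOST rewrite done by
    `make set-public-domain DOMAIN=...` on a .env-style blob."""
    return re.sub(
        r"^AUTHENTIK_HOST=.*$",
        f"AUTHENTIK_HOST=https://auth.{domain}",
        env_text,
        flags=re.MULTILINE,
    )

# Dev default from .env.shared -> prod value, in one step
shared = "AUTHENTIK_HOST=http://auth.localhost\nOTHER=1\n"
print(set_public_domain(shared, "example.com"))
# AUTHENTIK_HOST=https://auth.example.com
# OTHER=1
```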

Why scalr-edge for backend JWKS, not the gateway

A backend fetching JWKS could either go:

  1. Through the gateway: https://auth.${SCALR_PUBLIC_DOMAIN}/application/o/<slug>/jwks/
  2. Direct via container DNS: http://scalr-auth-server:9000/application/o/<slug>/jwks/

Direct (option 2) wins because:

  • No TLS termination overhead. JWKS fetches are cheap, but they happen on every fresh JWT verify before the cache warms.
  • No DNS dependency. Backends inside Docker can always resolve container names; they can't always resolve the public domain (especially on first boot before Let's Encrypt has finished issuing).
  • No certificate trust dance. Internal traffic over HTTP doesn't need the backend to trust the platform's CA.
  • Security parity. The traffic never leaves the Docker network. The threat model that justified TLS for the browser-facing URL (network eavesdropping) doesn't apply to container-to-container traffic on a single host.

Browser-facing URLs (the issuer URL embedded in tokens, the OIDC authorization endpoint) still go through the gateway. They have to — the browser is talking to a public hostname, not Docker DNS.
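The split boils down to picking the URL base by audience. A sketch, assuming the hostnames and env var from this ADR; the helper function itself is illustrative, not part of the codebase:

```python
import os

# Option 2 from above: Docker DNS, HTTP, never leaves the host
INTERNAL_BASE = "http://scalr-auth-server:9000"

def jwks_url(slug: str, *, internal: bool = True) -> str:
    """Backends fetch JWKS over the Docker network (internal=True);
    the gateway URL (internal=False) is what a browser-facing issuer uses."""
    base = INTERNAL_BASE if internal else os.environ.get(
        "AUTHENTIK_HOST", "http://auth.localhost")  # option 1: via Traefik
    return f"{base}/application/o/{slug}/jwks/"

print(jwks_url("myapp"))
# http://scalr-auth-server:9000/application/o/myapp/jwks/
```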

Why bind port 9000 to 127.0.0.1 instead of removing it

Three reasons we keep the host-side port available:

  1. Admin tunnel. SSH tunnel directly to the host on localhost:9000 for emergency admin work without depending on the gateway being healthy. If Traefik is down, this is how you get in.
  2. Backwards compatibility for the worker. The Authentik worker uses internal API access via AUTHENTIK_INTERNAL_HOST (defaulting to the same scheme as AUTHENTIK_HOST), but some legacy bits of the Authentik codebase still expect localhost:9000 to work. Keeps the surprise surface low.
  3. No external exposure. Binding to 127.0.0.1 means external scans never see port 9000. A firewall rule blocking 9000 is still recommended belt-and-braces, but the bind alone is sufficient.
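In compose terms, the loopback bind is just an address-qualified port mapping (service name illustrative):

```yaml
services:
  scalr-auth-server:
    ports:
      # Loopback only: reachable via SSH tunnel, invisible to external scans
      - "127.0.0.1:9000:9000"
```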

Migration

Existing OIDC tokens become invalid the moment AUTHENTIK_HOST changes — the iss claim baked into them no longer matches what apps expect. Every signed-in user has to re-authenticate once. Single-user deployments (where this lands) absorb that cost trivially.
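The invalidation mechanism is just a string comparison: verifiers compare the token's `iss` claim verbatim against the configured issuer. A simplified sketch (no real JWT library; the issuer URLs are examples, not actual deployment values):

```python
# A token minted before the cutover carries the old host in `iss`,
# so verification against the new issuer fails and forces re-auth.
OLD_ISS = "http://localhost:9000/application/o/myapp/"     # pre-ADR issuer
NEW_ISS = "https://auth.example.com/application/o/myapp/"  # post-cutover issuer

def verify_issuer(claims: dict, expected_iss: str) -> bool:
    """Simplified stand-in for the issuer check a JWT library performs."""
    return claims.get("iss") == expected_iss

stale_claims = {"iss": OLD_ISS, "sub": "user-1"}
print(verify_issuer(stale_claims, NEW_ISS))  # False: stale token rejected
```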

Per-app .env.app files were bulk-rewritten as part of this change to use the new auth URLs. The scaffold script's source (template-app/) was rewritten too, so future apps inherit the new pattern.

Google OAuth's authorized redirect URIs need one new entry per deployment:

  • Dev: http://auth.localhost/source/oauth/callback/google/
  • Prod: https://auth.${SCALR_PUBLIC_DOMAIN}/source/oauth/callback/google/

Both must be added in Google Cloud Console (manual).

Consequences

Positive:

  • Auth tokens travel over TLS in prod. Closes a deferred security gap.
  • One URL pattern for every component. No more "Authentik is special."
  • Automatic cert renewal via Let's Encrypt — same machinery as every app.
  • make set-public-domain is now a complete prod-cutover command.

Negative:

  • Existing tokens invalidate on transition. Documented in DEPLOYMENT.md.
  • Slightly more compose surface (3 routers instead of zero) on the auth service — now consistent with every other component.
  • Scaling Authentik horizontally would require a sticky-session middleware on the Traefik router. Not relevant at single-instance scale; flagged for future work.
  • One more thing the gateway has to be alive for. Before this change the gateway being down only broke the apps; now it also breaks sign-in. Mitigation: the 127.0.0.1:9000 bind gives operators a fallback admin path, and uptime monitoring (services/uptime) keeps an eye on both endpoints.