# ADR 0011: Authentik behind the Traefik gateway

Status: Accepted · Date: 2026-05-01

## Context
Until now Authentik ran on `localhost:9000` directly, with the Docker
host's port 9000 exposed publicly in production. Apps used
`http://localhost:9000/application/o/<slug>/` as both the OIDC issuer
and the JWKS URL (the latter via `host.docker.internal`).
This had three real problems:
- **Plaintext token traffic.** OIDC tokens crossed the public internet in
  plain HTTP. Port 9000 wasn't behind TLS, so any prod sign-in exposed the
  authorization code and access token to anyone on the network path. This is
  a straightforward auth-bypass pre-condition and was already flagged in
  `DEPLOYMENT.md` as deferred work.
- **Inconsistent URL pattern.** Every other component on the platform was
  reachable via `<slug>.${SCALR_PUBLIC_DOMAIN}` through the gateway.
  Authentik was the lone exception: a special case in every doc, env file,
  and operator's mental model.
- **No automated TLS for the auth host.** Adding TLS to a hand-managed
  port-9000 binding meant a separate certbot setup, separate renewal logic,
  and a separate config surface. Going through Traefik means automatic
  Let's Encrypt with the same `leresolver` as every app.
## Decision
Move the Authentik server container behind Traefik using the same 3-router-per-service pattern every app uses (ADR 0007):
- Dev: `http://auth.localhost`
- Prod redirect: `http://auth.${SCALR_PUBLIC_DOMAIN}` → HTTPS via the
  `https-redirect@docker` middleware
- Prod TLS: `https://auth.${SCALR_PUBLIC_DOMAIN}` with a Let's Encrypt cert
  via the `leresolver`
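As a sketch, the 3-router pattern maps onto compose labels like the ones
below. The label syntax follows Traefik v2 Docker-provider conventions; the
router names, the entrypoint names `web`/`websecure`, and the service name
are assumptions for illustration — only `https-redirect@docker` and
`leresolver` come from this ADR:

```yaml
services:
  scalr-auth-server:
    labels:
      - "traefik.enable=true"
      # Dev router: plain HTTP on the local hostname
      - "traefik.http.routers.auth-dev.rule=Host(`auth.localhost`)"
      - "traefik.http.routers.auth-dev.entrypoints=web"
      # Prod redirect router: HTTP -> HTTPS
      - "traefik.http.routers.auth-redirect.rule=Host(`auth.${SCALR_PUBLIC_DOMAIN}`)"
      - "traefik.http.routers.auth-redirect.entrypoints=web"
      - "traefik.http.routers.auth-redirect.middlewares=https-redirect@docker"
      # Prod TLS router: Let's Encrypt via the shared resolver
      - "traefik.http.routers.auth-tls.rule=Host(`auth.${SCALR_PUBLIC_DOMAIN}`)"
      - "traefik.http.routers.auth-tls.entrypoints=websecure"
      - "traefik.http.routers.auth-tls.tls.certresolver=leresolver"
      # All routers forward to Authentik's internal port
      - "traefik.http.services.auth.loadbalancer.server.port=9000"
```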
The server container joins the existing `scalr-edge` external network
so Traefik can reach it. Backends also use this network to fetch JWKS
directly via the container hostname (`http://scalr-auth-server:9000`);
internal traffic stays inside Docker rather than bouncing through
the gateway.
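A hedged sketch of that network wiring in compose terms — the backend
service name and its env var are illustrative placeholders; only
`scalr-edge` and `scalr-auth-server` come from this ADR:

```yaml
networks:
  scalr-edge:
    external: true   # shared with Traefik and the app backends

services:
  scalr-auth-server:
    networks: [scalr-edge]
  some-backend:        # hypothetical app backend
    networks: [scalr-edge]
    environment:
      # JWKS fetched over Docker DNS, never via the public gateway
      JWKS_URL: http://scalr-auth-server:9000/application/o/myapp/jwks/
```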
Port 9000 is still bound on the host, but only on `127.0.0.1`, so it's
not externally reachable. This is useful for SSH-tunnelled admin work and
for the worker (which still consults the API directly, not via the gateway).
`AUTHENTIK_HOST` becomes the canonical "where browsers find Authentik"
env var. It's set in `.env.shared` (defaulting to `http://auth.localhost`
for dev) so prod deployments can override it via
`make set-public-domain DOMAIN=...`. The script has been extended to
also rewrite the per-app `.env.app` auth URLs and the `AUTHENTIK_HOST`
in `.env.shared`, so the dev → prod transition is one command.
## Why `scalr-edge` for backend JWKS, not the gateway
A backend fetching JWKS could either go:

- Through the gateway: `https://auth.${SCALR_PUBLIC_DOMAIN}/application/o/<slug>/jwks/`
- Direct via container DNS: `http://scalr-auth-server:9000/application/o/<slug>/jwks/`
Direct (option 2) wins because:
- No TLS termination overhead. JWKS fetches are cheap, but they happen on every fresh JWT verify before the cache warms.
- No DNS dependency. Backends inside Docker can always resolve container names; they can't always resolve the public domain (especially on first boot before Let's Encrypt has finished issuing).
- No certificate trust dance. Internal traffic over HTTP doesn't need the backend to trust the platform's CA.
- Security parity. The traffic never leaves the Docker network. The threat model that justified TLS for the browser-facing URL (network eavesdropping) doesn't apply to container-to-container traffic on a single host.
Browser-facing URLs (the issuer URL embedded in tokens, the OIDC authorization endpoint) still go through the gateway. They have to — the browser is talking to a public hostname, not Docker DNS.
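The public/internal split can be captured in a small config helper. A
minimal sketch, assuming the env vars named above; the function name,
the fallback defaults, and the per-app slug are hypothetical:

```python
import os

def oidc_urls(slug: str) -> dict:
    """Build the browser-facing vs. container-internal OIDC URLs for one app.

    The issuer goes through the gateway (public hostname, TLS in prod);
    JWKS is fetched over Docker DNS, so it never leaves the host.
    """
    public = os.environ.get("AUTHENTIK_HOST", "http://auth.localhost")
    internal = os.environ.get("AUTHENTIK_INTERNAL_HOST", "http://scalr-auth-server:9000")
    base = f"/application/o/{slug}/"
    return {
        "issuer": f"{public}{base}",       # embedded in tokens, seen by browsers
        "jwks_uri": f"{internal}{base}jwks/",  # container-to-container only
    }

urls = oidc_urls("myapp")
print(urls["issuer"])    # with the dev default: http://auth.localhost/application/o/myapp/
print(urls["jwks_uri"])
```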
## Why bind port 9000 to `127.0.0.1` instead of removing it
Three reasons we keep the host-side port available:
- **Admin tunnel.** SSH tunnel directly to the host on `localhost:9000`
  for emergency admin work without depending on the gateway being healthy.
  If Traefik is down, this is how you get in.
- **Backwards compatibility for the worker.** The Authentik worker uses
  internal API access via `AUTHENTIK_INTERNAL_HOST` (defaulting to the same
  scheme as `AUTHENTIK_HOST`), but some legacy bits of the Authentik
  codebase still expect `localhost:9000` to work. Keeps the surprise
  surface low.
- **No external exposure.** Binding to `127.0.0.1` means external scans
  never see port 9000. A firewall rule blocking 9000 is still recommended
  belt-and-braces, but the bind alone is sufficient.
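For the admin-tunnel case, the emergency path looks something like the
following; the user and host (`ops@prod-host`) are placeholders for your
deployment:

```
# Forward local port 9000 to the prod host's loopback-only Authentik bind;
# works even when Traefik is down.
ssh -N -L 9000:127.0.0.1:9000 ops@prod-host
# then open http://localhost:9000/ in a local browser for admin work
```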
## Migration
Existing OIDC tokens become invalid the moment `AUTHENTIK_HOST`
changes: the `iss` claim baked into them no longer matches what apps
expect. Every signed-in user has to re-authenticate once. Single-user
deployments (where this lands) absorb that cost trivially.
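The invalidation mechanism is just a strict string comparison on the `iss`
claim. A minimal sketch — the claim values and domain are illustrative,
not taken from the repo:

```python
# An OIDC client rejects a token whose iss claim does not exactly match
# the issuer it was configured with (RFC 7519 "iss" semantics).
def issuer_ok(claims: dict, expected_issuer: str) -> bool:
    return claims.get("iss") == expected_issuer

# A token minted before the cutover carries the old issuer...
old_claims = {"iss": "http://localhost:9000/application/o/myapp/", "sub": "u1"}
# ...while apps now expect the gateway URL, so verification fails:
new_issuer = "https://auth.example.com/application/o/myapp/"
print(issuer_ok(old_claims, new_issuer))  # False -> user must re-authenticate
```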
Per-app `.env.app` files were bulk-rewritten as part of this change to
use the new auth URLs. The scaffold script's source (`template-app/`)
was rewritten too, so future apps inherit the new pattern.
Google OAuth's authorized redirect URIs need one new entry per
deployment:
- Dev: `http://auth.localhost/source/oauth/callback/google/`
- Prod: `https://auth.${SCALR_PUBLIC_DOMAIN}/source/oauth/callback/google/`
Both must be added in Google Cloud Console (manual).
## Consequences
Positive:
- Auth tokens travel over TLS in prod. Closes a deferred security gap.
- One URL pattern for every component. No more "Authentik is special."
- Automatic cert renewal via Let's Encrypt — same machinery as every app.
- `make set-public-domain` is now a complete prod-cutover command.
Negative:
- Existing tokens are invalidated at the transition. Documented in `DEPLOYMENT.md`.
- Slightly more compose surface (3 routers instead of zero) on the
  auth service, now consistent with every other component.
- Scaling Authentik horizontally would require a sticky-session
middleware on the Traefik router. Not relevant at single-instance
scale; flagged for future work.
- One more thing the gateway has to be alive for. Before this change
  the gateway being down only broke the apps; now it also breaks
  sign-in. Mitigation: the `127.0.0.1:9000` bind gives operators a
  fallback admin path, and uptime monitoring (`services/uptime`) keeps
  an eye on both endpoints.