Skip to content

ADR 0006: Hybrid Smoke Tests (Bash + Playwright)

Status: Accepted Date: 2026-04-29

Context

Every app needs a "smoke test" that verifies the app actually works after scaffolding or any change. Two competing forces:

  • Agent feedback loop must be fast. Agents iterate by running the test after each change. A 60-second test means the agent skips it or thrashes waiting.
  • Real failures must be caught. A test that only checks "container booted" misses the most important class of bug: "app boots but doesn't actually work."

Considered formats: - Bash + curl + jq. - Python pytest with httpx. - Playwright end-to-end. - Hybrid (some combination).

Decision

Use a hybrid strategy:

Test Runs Speed Catches
smoke-test.sh After every scaffold; after every agent change; CI on every push ~5s Wiring, config, auth, container boot, routing, manifest validity
smoke.spec.ts (E2E) CI nightly + manually on demand ~30-60s Full browser flow: login → token → frontend renders, real UI behaviour

The agent only ever runs the fast one. Humans and CI run both.

What smoke-test.sh checks

Bash + curl + jq. Per app:

  1. Backend health: GET /health → 200 with {"status": "ok"}.
  2. Backend auth enforcement: GET /api/me without token → 401.
  3. Backend auth success: get a test token from auth, then GET /api/me → 200.
  4. Frontend serves: GET / → 200, HTML contains the app's slug (proves the right app, not the template, is being served).
  5. Manifest is valid: parses, has all required fields.

It explicitly does not test: - Whether React actually renders. - Visual correctness of the UI. - Multi-step user flows.

What smoke.spec.ts checks

Playwright. Runs once against the full ecosystem (root compose up). For each registered app in the portal's app hub:

  1. Open portal, log in as test user.
  2. Navigate to the app via the hub.
  3. Assert app renders something visible (heading containing app's name).
  4. Assert at least one authenticated API call from the frontend succeeds.
  5. Take a screenshot for visual review.

Why hybrid beats either extreme

  • Bash-only would miss frontend-rendering breaks. Worth catching, even if rarely.
  • Playwright-only would slow the agent loop from 5s to 60s per attempt. The agent would either skip the test or thrash waiting.
  • Hybrid gives the agent fast confidence on the 90% case and gives the team thorough confidence on the 100% case, without paying both costs every time.

Consequences

Positive: - Agent iteration is fast. - Frontend rendering bugs are still caught nightly before they ship. - Bash + curl needs zero per-app dependencies. - Plain-text output is easy for agents to parse and react to.

Negative: - A frontend that compiles but renders blank could ship to dev for a day before nightly E2E catches it. Mitigated by humans running the new app and noticing immediately. - Two test scripts to maintain instead of one.