ADR 0006: Hybrid Smoke Tests (Bash + Playwright)
Status: Accepted
Date: 2026-04-29
Context
Every app needs a "smoke test" that verifies the app actually works after scaffolding or any change. Two competing forces:
- Agent feedback loop must be fast. Agents iterate by running the test after each change. A 60-second test means the agent skips it or thrashes waiting.
- Real failures must be caught. A test that only checks "container booted" misses the most important class of bug: "app boots but doesn't actually work."
Considered formats:
- Bash + curl + jq.
- Python pytest with httpx.
- Playwright end-to-end.
- Hybrid (some combination).
Decision
Use a hybrid strategy:
| Test | Runs | Speed | Catches |
|---|---|---|---|
| `smoke-test.sh` | After every scaffold; after every agent change; CI on every push | ~5s | Wiring, config, auth, container boot, routing, manifest validity |
| `smoke.spec.ts` (E2E) | CI nightly + manually on demand | ~30-60s | Full browser flow: login → token → frontend renders, real UI behaviour |
The agent only ever runs the fast one. Humans and CI run both.
What smoke-test.sh checks
Bash + curl + jq. Per app:
- Backend health: `GET /health` → 200 with `{"status": "ok"}`.
- Backend auth enforcement: `GET /api/me` without token → 401.
- Backend auth success: get a test token from auth, then `GET /api/me` → 200.
- Frontend serves: `GET /` → 200, HTML contains the app's slug (proves the right app, not the template, is being served).
- Manifest is valid: parses, has all required fields.
It explicitly does not test:
- Whether React actually renders.
- Visual correctness of the UI.
- Multi-step user flows.
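The five checks listed for smoke-test.sh could be sketched in Bash roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual script: the environment variable names, default ports, the `/test-token` endpoint and its `access_token` field, the `manifest.json` path, and the required manifest fields (`slug`, `name`) are all hypothetical. Only the routes and expected status codes come from the list of checks.

```shell
#!/usr/bin/env bash
# Sketch of a per-app smoke test using bash + curl + jq.
# All URLs, ports, the token endpoint, and the manifest fields are
# assumptions made for illustration.
set -u

BACKEND_URL="${BACKEND_URL:-http://localhost:8000}"    # assumed default
FRONTEND_URL="${FRONTEND_URL:-http://localhost:3000}"  # assumed default
AUTH_URL="${AUTH_URL:-http://localhost:9000}"          # assumed default
MANIFEST="${MANIFEST:-manifest.json}"                  # assumed path

failures=0

# report <label> <expected> <actual>: print one PASS/FAIL line and count failures.
report() {
  if [ "$2" = "$3" ]; then
    echo "PASS $1"
  else
    echo "FAIL $1 (expected $2, got $3)"
    failures=$((failures + 1))
  fi
}

# http_status <curl args...>: print only the HTTP status code (000 if unreachable).
http_status() {
  curl -s -o /dev/null -w '%{http_code}' "$@"
}

# run_smoke <app-slug>: run all five checks; return non-zero if any failed.
run_smoke() {
  local slug="$1" token manifest_ok

  report "backend health status" 200 "$(http_status "$BACKEND_URL/health")"
  report "backend health body" ok \
    "$(curl -s "$BACKEND_URL/health" | jq -r '.status' 2>/dev/null)"

  report "auth enforced without token" 401 "$(http_status "$BACKEND_URL/api/me")"

  # Hypothetical token endpoint and response field.
  token="$(curl -s -X POST "$AUTH_URL/test-token" | jq -r '.access_token' 2>/dev/null)"
  report "auth success with token" 200 \
    "$(http_status -H "Authorization: Bearer $token" "$BACKEND_URL/api/me")"

  report "frontend status" 200 "$(http_status "$FRONTEND_URL/")"
  report "frontend contains slug" yes \
    "$(curl -s "$FRONTEND_URL/" | grep -qF "$slug" && echo yes || echo no)"

  # Hypothetical required fields; jq -e exits non-zero on false or on a parse error.
  manifest_ok="$(jq -e 'has("slug") and has("name")' "$MANIFEST" >/dev/null 2>&1 \
    && echo yes || echo no)"
  report "manifest valid" yes "$manifest_ok"

  [ "$failures" -eq 0 ]
}

# Run the checks only when an app slug is passed, so the helpers can also be sourced.
if [ "$#" -gt 0 ]; then
  run_smoke "$1"
fi
```

Invoked as `./smoke-test.sh my-app` (a hypothetical slug), the sketch prints one PASS or FAIL line per check and exits non-zero if any check failed. It keeps running after a failure so that a single run reports every broken check.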
What smoke.spec.ts checks
Playwright. Runs once against the full ecosystem (root compose up). For each registered app in the portal's app hub:
- Open portal, log in as test user.
- Navigate to the app via the hub.
- Assert app renders something visible (heading containing app's name).
- Assert at least one authenticated API call from the frontend succeeds.
- Take a screenshot for visual review.
Why hybrid beats either extreme
- Bash-only would miss frontend-rendering breaks, which are worth catching even if they are rare.
- Playwright-only would slow the agent loop from 5s to 60s per attempt. The agent would either skip the test or thrash waiting.
- Hybrid gives the agent fast confidence on the 90% case and gives the team thorough confidence on the 100% case, without paying both costs every time.
Consequences
Positive:
- Agent iteration is fast.
- Frontend rendering bugs are still caught nightly before they ship.
- Bash + curl needs zero per-app dependencies.
- Plain-text output is easy for agents to parse and react to.
Negative:
- A frontend that compiles but renders blank could ship to dev for a day before nightly E2E catches it. Mitigated by humans running the new app and noticing immediately.
- Two test scripts to maintain instead of one.