
Commit 7e27221

refactor(deploy): drop the migrate service and image
The backend migrates itself at boot, so the migrate Dockerfile target, compose service, and the Swarm playbook's post-deploy migration task (plus its attachable-network requirement) all go away. Two images remain: runtime and web. The worker now waits for the backend healthcheck so it never touches a pre-migration schema. Verified on a wiped stack: virgin database boots, backend logs 'database migrations applied' before listening, Sales Inquiry Pipeline runs to execution_completed over live SSE, rate limiter returns 429 past the budget.
1 parent 3d10ff4 commit 7e27221

9 files changed

Lines changed: 61 additions & 81 deletions


CLAUDE.md

Lines changed: 3 additions & 3 deletions

```diff
@@ -11,7 +11,7 @@ Three onboarding paths (A, B local-run; C docs-only). README "Get started" is th
 | `pnpm preflight` | both | Verify Node / pnpm / Docker / ports / `.env` files. Add `--json` for agents |
 | `pnpm dev` / `pnpm dev:demo` | A | Demo (UI only, port 4200). No backend, no Docker |
 | `pnpm infra:up` | B | Start Postgres + Temporal in Docker. Required before backend/worker |
-| `pnpm -F backend db:migrate` | B | Apply Drizzle migrations. First run, or after schema changes |
+| `pnpm -F backend db:migrate` | B | Apply Drizzle migrations out-of-band (backend also auto-migrates on boot) |
 | `pnpm dev:ai-studio` | B | Full stack: infra + backend (3001) + worker + AI Studio frontend (4201) |
 | `pnpm dev:backend` | B | Backend only (debug). Needs infra up |
 | `pnpm dev:worker` | B | Execution worker only (debug). Needs infra up |
@@ -22,7 +22,7 @@ Three onboarding paths (A, B local-run; C docs-only). README "Get started" is th
 | `pnpm test` | - | Run tests in `packages/sdk` and `packages/execution-core` |
 | `pnpm check` | - | Lint + typecheck + format + knip |
 
-Path A is UI-only and does not need Docker. Path B requires `pnpm infra:up` before backend/worker can start, and `db:migrate` on the first run.
+Path A is UI-only and does not need Docker. Path B requires `pnpm infra:up` before backend/worker can start; the backend applies pending migrations automatically at boot.
 
 ### Agent signals
 
@@ -44,7 +44,7 @@ Long-running processes already emit stable log lines that scripts and agents can
 tools/ - Root dev scripts: preflight, setup:env, infra wait
 deployment/ - Swarm/Ansible deploy path mirroring the workflow-builder repo (ACR, Traefik)
 deploy/
-ai-studio/ - Production deployment: Dockerfile (runtime/migrate/web), compose, nginx, README
+ai-studio/ - Production deployment: Dockerfile (runtime/web), compose, nginx, README
 apps/
 demo/ - Reference app consuming the SDK (React + Vite, port 4200)
 ai-studio/ - Reference AI workflow product (React + Vite, port 4201)
```

deploy/ai-studio/Dockerfile

Lines changed: 4 additions & 8 deletions

```diff
@@ -3,9 +3,12 @@
 # AI Studio execution stack — single Dockerfile, multiple targets:
 #
 # runtime -> backend + execution-worker (command chosen per compose service)
-# migrate -> one-shot Drizzle migration runner (needs backend devDependencies)
 # web -> nginx serving the AI Studio SPA + reverse proxy to the backend
 #
+# Database migrations run inside the backend at boot (drizzle-orm's
+# programmatic migrator over apps/backend/drizzle/), so there is no separate
+# migration image or deploy step.
+#
 # Build context must be the repo root (workspace packages are linked via
 # pnpm `workspace:*`), e.g.:
 #
@@ -51,13 +54,6 @@ RUN --mount=type=cache,id=pnpm-store,target=/pnpm/store \
 # backend: pnpm --filter backend start:prod
 # worker: pnpm --filter execution-worker start:prod
 
-# Migrations need drizzle-kit, a backend devDependency — hence a separate
-# target with a dev install. Runs as a one-shot service before the backend.
-FROM source AS migrate
-RUN --mount=type=cache,id=pnpm-store,target=/pnpm/store \
-pnpm install --frozen-lockfile --prefer-offline --filter backend...
-CMD ["pnpm", "--filter", "backend", "db:migrate"]
-
 # The SPA build imports the SDK from source (vite alias), so this needs the
 # full frontend dependency tree. VITE_BACKEND_URL is baked in at build time;
 # the default (empty) makes the app call /api on its own origin, which the
```
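The Dockerfile comment places migrations inside the backend, run by drizzle-orm's programmatic migrator over `apps/backend/drizzle/`. In drizzle-orm, that migrator is a `migrate(db, { migrationsFolder })` function exported from a driver-specific module such as `drizzle-orm/node-postgres/migrator`. This commit does not show which driver the backend uses. The sketch below is a simplified, hypothetical model of what "apply pending migrations" means: compare the folder against a journal of applied migrations, run the rest in order, and record them. It is not drizzle-orm's actual bookkeeping, and any migration names used with it are invented.

```typescript
// Hypothetical model of "apply pending migrations". This is NOT drizzle-orm's
// real bookkeeping; migration names stand in for the SQL files that would
// live under apps/backend/drizzle/.
interface Journal {
  applied: string[]; // names already recorded as applied, oldest first
}

// Migrations present on disk that the journal has not recorded, in folder order.
function pendingMigrations(folder: string[], journal: Journal): string[] {
  return folder.filter((name) => journal.applied.indexOf(name) === -1);
}

// Run each pending migration in order and record it in the journal.
// Returns the names applied in this run; an up-to-date schema yields [].
function applyPending(
  folder: string[],
  journal: Journal,
  run: (name: string) => void, // executes one migration's SQL (stubbed in tests)
): string[] {
  const pending = pendingMigrations(folder, journal);
  for (const name of pending) {
    run(name);
    journal.applied.push(name);
  }
  return pending;
}
```

In this model a second run finds nothing pending. That is consistent with `pnpm -F backend db:migrate` run out-of-band coexisting with the boot-time migration. The model also shows the race: two backends that read the journal at the same moment compute the same pending list. That is the concern behind the decision log's single-replica assumption.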

deploy/ai-studio/README.md

Lines changed: 8 additions & 7 deletions

````diff
@@ -14,15 +14,16 @@ any Docker host — an Azure VM, AWS, on-prem — with no cloud-specific glue.
 | `web` | `ai-studio-web` (nginx) | Serves the SPA, proxies `/api` to the backend | `${WEB_PORT}` (only one) |
 | `backend` | `ai-studio-runtime` | Hono REST + SSE event stream | internal |
 | `worker` | `ai-studio-runtime` | Temporal worker, makes the OpenRouter LLM calls | internal |
-| `migrate` | `ai-studio-migrate` | One-shot Drizzle migrations, then exits | internal |
 | `temporal` | `temporalio/auto-setup` pinned | Workflow engine | internal |
 | `app-db` | `postgres:16` | Workflow snapshots + execution events | internal |
 | `temporal-db` | `postgres:16` | Temporal's own state store | internal |
 | `temporal-ui` | `temporalio/ui` pinned | Debug only (`--profile debug`) | `127.0.0.1:8233` |
 
-All images build from one Dockerfile (`deploy/ai-studio/Dockerfile`) with the
+Both images build from one Dockerfile (`deploy/ai-studio/Dockerfile`) with the
 repo root as context. Backend and worker share a single image and differ only
-in the compose `command`.
+in the compose `command`. Database migrations are applied by the backend at
+boot (drizzle-orm's programmatic migrator) — there is no separate migration
+service or step.
 
 ## Quick start
 
@@ -32,9 +33,9 @@ cp .env.example .env # set OPENROUTER_API_KEY
 docker compose up -d --build
 ```
 
-First boot: migrations run automatically (`migrate` exits 0, then the backend
-starts). The worker crash-loops for ~30s until Temporal finishes auto-setup —
-that's expected, `restart: unless-stopped` converges it.
+First boot: the backend applies migrations and only then starts serving (its
+healthcheck gates the worker). The worker crash-loops for ~30s until Temporal
+finishes auto-setup — that's expected, `restart: unless-stopped` converges it.
 
 Verify:
 
@@ -89,7 +90,7 @@ Swapping the LLM is a one-liner: change `AI_MODEL` to any
 ```bash
 docker compose logs -f backend worker # tail the apps
 docker compose --profile debug up -d # Temporal UI on 127.0.0.1:8233
-docker compose up -d --build # deploy a new version (re-runs migrations)
+docker compose up -d --build # deploy a new version (backend re-applies migrations at boot)
 docker compose down # stop (volumes survive)
 docker exec ai-studio-app-db-1 pg_dump -U wb workflow_builder > backup.sql
 ```
````

deploy/ai-studio/ai-studio-deployment.decision-log.md

Lines changed: 14 additions & 6 deletions

```diff
@@ -32,7 +32,7 @@ Everything lives in `deploy/ai-studio/`: one multi-target Dockerfile, a
 production `docker-compose.yml`, the nginx config, `.env.example`, and a
 DevOps-facing README.
 
-1. **One Dockerfile, three targets** (`runtime`, `migrate`, `web`), built
+1. **One Dockerfile, two targets** (`runtime`, `web`), built
 with the repo root as context (pnpm `workspace:*` links require it). A
 shared `source` stage does `pnpm fetch` against a BuildKit cache mount, so
 per-target installs are store-hits.
@@ -44,11 +44,13 @@ DevOps-facing README.
 entirely — there is no bundling step to get wrong.
 3. **One shared `runtime` image for backend and worker**; the compose
 `command` picks the entrypoint. One image to build, push, and version.
-4. **Migrations as a one-shot compose service** (`migrate` target, carries
-drizzle-kit as a backend devDependency). `depends_on:
-service_completed_successfully` gates the backend, so `docker compose up`
-is a complete first boot. Same answer works as a k8s Job / ACA job if a
-customer reshapes the topology.
+4. **Migrations on backend boot** (revised 11.06.2026 — originally a
+one-shot `migrate` compose service). The backend applies pending Drizzle
+migrations via drizzle-orm's programmatic migrator before accepting
+traffic; on failure it exits and the restart policy retries until
+Postgres answers. One less image, no orchestrator-specific ordering —
+the same behavior on compose, Swarm, or anything else. Single-replica
+assumption: concurrent backends would race the migrator.
 5. **nginx is the only public surface.** It serves the SPA and proxies
 `/api` to the backend on the internal network; the SSE stream route gets
 `proxy_buffering off` + long read timeout. The backend container is
@@ -135,6 +137,12 @@ Found and fixed during end-to-end verification: the worker ignored
 - pnpm version is pinned in two places (root `packageManager` +
 Dockerfile).
 
+## Revisions
+
+- **11.06.2026**`migrate` target and service removed; the backend now
+migrates itself at boot (Jan's simplification request during WB-229
+review). Dockerfile is down to two targets (`runtime`, `web`).
+
 ## Status
 
 Accepted
```
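Decision 4 orders the backend's boot as follows: apply pending migrations, then accept traffic, and exit on failure so the restart policy retries. Below is a minimal sketch of that ordering. The migrator, HTTP server, logger, and process exit are injected as hypothetical stand-ins; in a Node backend, `exit` would typically be `process.exit`. Only the `database migrations applied` log line comes from the commit message. The failure message is invented for illustration.

```typescript
// Hypothetical sketch of the backend boot order: migrate, log, then listen.
// The four dependencies are injected stand-ins, not the backend's real code.
interface BootDeps {
  migrate: () => void; // throws if Postgres is unreachable or a migration fails
  listen: () => void; // starts the HTTP server (Hono in the real backend)
  log: (line: string) => void;
  exit: (code: number) => void; // e.g. process.exit in a Node process
}

function bootBackend(deps: BootDeps): void {
  try {
    deps.migrate();
  } catch (err) {
    // Invented message; only the success line below comes from the commit.
    deps.log(`migration failed: ${String(err)}`);
    deps.exit(1); // non-zero exit: the restart policy starts the backend again
    return;
  }
  deps.log("database migrations applied");
  deps.listen(); // only from here on can a healthcheck succeed
}
```

`listen` runs only after `migrate` returns, so a healthcheck served by the backend cannot pass against a pre-migration schema. That property is what lets the compose file gate the worker on `backend: service_healthy`.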

deploy/ai-studio/docker-compose.yml

Lines changed: 5 additions & 17 deletions

```diff
@@ -73,19 +73,8 @@ services:
       - '127.0.0.1:8233:8080'
     restart: unless-stopped
 
-  migrate:
-    image: ai-studio-migrate
-    build:
-      context: ../..
-      dockerfile: deploy/ai-studio/Dockerfile
-      target: migrate
-    environment:
-      DATABASE_URL: postgresql://wb:${APP_DB_PASSWORD:-wb}@app-db:5432/workflow_builder
-    depends_on:
-      app-db:
-        condition: service_healthy
-    restart: 'no'
-
+  # Applies Drizzle migrations at boot, before accepting traffic. A failure
+  # (e.g. Postgres still starting) exits the process and `restart` retries.
   backend:
     image: ai-studio-runtime
     build: *runtime-build
@@ -106,8 +95,6 @@ services:
     depends_on:
       app-db:
         condition: service_healthy
-      migrate:
-        condition: service_completed_successfully
       temporal:
         condition: service_started
     healthcheck:
@@ -138,8 +125,9 @@ services:
     depends_on:
       app-db:
         condition: service_healthy
-      migrate:
-        condition: service_completed_successfully
+      # healthy = migrations applied — the worker writes to the same schema
+      backend:
+        condition: service_healthy
       temporal:
         condition: service_started
     restart: unless-stopped
```
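The worker's new `depends_on` swaps `migrate: service_completed_successfully` for `backend: service_healthy`. The sketch below is a hypothetical, simplified model of what each of the three compose conditions used in this file requires of a dependency. It is not how docker compose is implemented, and the `ServiceState` shape is an assumption made for illustration.

```typescript
// Hypothetical, simplified model of the compose `depends_on` conditions used
// in this file. Not docker compose's implementation.
type Condition =
  | "service_started"
  | "service_healthy"
  | "service_completed_successfully";

interface ServiceState {
  started: boolean; // container has been started at least once
  healthy: boolean; // most recent healthcheck passed
  exitCode: number | null; // null while running or if it never exited
}

function conditionMet(cond: Condition, s: ServiceState): boolean {
  if (cond === "service_started") return s.started;
  if (cond === "service_healthy") return s.started && s.healthy;
  return s.exitCode === 0; // service_completed_successfully
}

// A service may start once every dependency satisfies its condition.
function mayStart(
  dependsOn: { [service: string]: Condition },
  states: { [service: string]: ServiceState },
): boolean {
  return Object.keys(dependsOn).every((name) => {
    const state = states[name];
    return state !== undefined && conditionMet(dependsOn[name], state);
  });
}

// The worker's gate after this commit (from deploy/ai-studio/docker-compose.yml).
const workerDependsOn: { [service: string]: Condition } = {
  "app-db": "service_healthy",
  backend: "service_healthy",
  temporal: "service_started",
};
```

While the backend is still migrating, it is started but not healthy, so the worker stays blocked. The old gate instead needed the one-shot `migrate` service to have exited with code 0.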

tools/deployment/README.md

Lines changed: 8 additions & 9 deletions

````diff
@@ -5,7 +5,7 @@ the same layout, scripts, and Ansible flow as the `workflow-builder` repo's
 `tools/deployment/` — so DevOps operates one familiar shape.
 
 This is an **orchestration overlay, not a second deployment**: it consumes
-the exact same three images (`runtime`, `migrate`, `web`) built from
+the exact same two images (`runtime`, `web`) built from
 [`deploy/ai-studio/Dockerfile`](../../deploy/ai-studio/Dockerfile). The
 compose file in `deploy/ai-studio/` remains the portable, customer-facing
 artifact and the local full-stack runner; this directory adds the
@@ -17,7 +17,7 @@ tools/deployment/
 │ ├── build-docker.sh # build all 3 targets, tag for ACR, push (CI-gated)
 │ └── deploy.sh # run the Ansible playbook (CI image or workstation)
 └── ansible/deploy-application/
-└── main.yml # writes the Swarm stack file on the master + deploys + migrates
+└── main.yml # writes the Swarm stack file on the master + deploys
 ```
 
 ## Usage
@@ -52,13 +52,12 @@ make them runnable from a workstation or GitHub Actions.
 
 ## What differs from the workflow-builder playbook (and why)
 
-| Deviation                                                                                         | Reason                                                                                                                               |
-| ------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------ |
-| Postgres ×2 + Temporal services with named volumes, pinned via `node.labels.ai-studio-data==true` | AI Studio is stateful; Swarm volumes are node-local. **One-time setup:** `docker node update --label-add ai-studio-data=true <node>` |
-| Migrations run post-deploy as a one-shot `docker run` on the stack network (with retries)         | Swarm ignores compose `depends_on` conditions                                                                                        |
-| `internal` network is `attachable: true`                                                          | Lets the migrate container join the overlay                                                                                          |
-| Services carry short DNS aliases (`backend`, `app-db`, `temporal`, …)                             | The web image's nginx proxies to `http://backend:3001`; aliases keep the images and env defaults identical between compose and Swarm |
-| Gatekeeper is conditional (`AUTH_ENABLED`)                                                        | The WB-229 public demo is deliberately login-free; internal instances can keep SSO                                                   |
+| Deviation                                                                                              | Reason                                                                                                                               |
+| ------------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------ |
+| Postgres ×2 + Temporal services with named volumes, pinned via `node.labels.ai-studio-data==true`      | AI Studio is stateful; Swarm volumes are node-local. **One-time setup:** `docker node update --label-add ai-studio-data=true <node>` |
+| No migration step — the backend applies Drizzle migrations at boot and restarts until Postgres answers | Swarm ignores compose `depends_on` conditions, so ordering must not rely on them                                                     |
+| Services carry short DNS aliases (`backend`, `app-db`, `temporal`, …)                                  | The web image's nginx proxies to `http://backend:3001`; aliases keep the images and env defaults identical between compose and Swarm |
+| Gatekeeper is conditional (`AUTH_ENABLED`)                                                             | The WB-229 public demo is deliberately login-free; internal instances can keep SSO                                                   |
 
 SSE note: Traefik streams responses by default, so the live execution stream
 works without special ingress config; the 15 s backend heartbeat keeps the
````

tools/deployment/ansible/deploy-application/main.yml

Lines changed: 4 additions & 22 deletions

```diff
@@ -1,16 +1,15 @@
 ---
 # Deploys the AI Studio execution stack to the Docker Swarm cluster,
 # following the workflow-builder repo's deploy-application playbook. The
-# images are the same three targets the local compose builds
-# (deploy/ai-studio/Dockerfile); only the orchestration differs.
+# images are the same two targets the local compose builds
+# (deploy/ai-studio/Dockerfile); only the orchestration differs. Database
+# migrations run inside the backend at boot, so there is no migration step
+# here — the backend restarts until Postgres answers, then migrates itself.
 #
 # Differences from the workflow-builder playbook, all forced by AI Studio
 # being stateful:
 # - Postgres x2 + Temporal services with named volumes, pinned to the node
 # labeled `ai-studio-data=true` (Swarm volumes are node-local).
-# - Migrations run as a one-shot container after stack deploy — Swarm
-# ignores compose depends_on conditions, so ordering lives here.
-# - The `internal` network is attachable so the migrate container can join.
 # - Services get short DNS aliases (backend, app-db, temporal, ...) so the
 # same images and env defaults work under compose and Swarm.
 # - Gatekeeper is optional (AUTH_ENABLED=true): the WB-229 public demo is
@@ -203,8 +202,6 @@
 
 networks:
 internal:
-# attachable so the one-shot migrate container below can join
-attachable: true
 traefik-host-external:
 external: true
 
@@ -224,18 +221,3 @@
 with_registry_auth: yes
 compose:
 - '/mnt/docker-swarm-storage/stacks/{{ stack_name }}/{{ app_name }}.stack.yml'
-
-# Swarm has no depends_on / one-shot service semantics: run Drizzle
-# migrations as a plain container on the stack's attachable overlay
-# network. Retries cover app-db still starting up on first deploy.
-- name: Run database migrations
-command: >
-docker run --rm
---network {{ stack_name }}_internal
--e DATABASE_URL={{ database_url }}
-{{ registry }}/{{ app_name }}:migrate-{{ image_tag }}
-pnpm --filter backend db:migrate
-register: migrate_result
-retries: 10
-delay: 6
-until: migrate_result.rc == 0
```
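With the post-deploy task gone, the retry loop that the playbook expressed with `retries: 10`, `delay: 6`, and `until: migrate_result.rc == 0` moves to the orchestrator. The backend exits while Postgres is unreachable, and Swarm starts it again. The sketch below is a hypothetical model of that convergence that counts start attempts only. Real restart policies also wait between restarts, and `maxAttempts` here is only a safety cap for the simulation.

```typescript
// Hypothetical model of an orchestrator restarting a task until it stays up.
// `start` returns an exit code if the process dies during boot (for the
// backend: migrations could not reach Postgres), or null once it is serving.
type StartOutcome = number | null;

interface RestartTrace {
  attempts: number; // how many times the task was started
  converged: boolean; // true once a start stayed up
}

function runUnderRestartPolicy(
  start: (attempt: number) => StartOutcome,
  maxAttempts: number, // safety cap for the simulation only
): RestartTrace {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    if (start(attempt) === null) {
      return { attempts: attempt, converged: true };
    }
    // The process exited: a real restart policy waits, then starts it again.
  }
  return { attempts: maxAttempts, converged: false };
}
```

For example, if Postgres only answers from the fourth start onward, the backend converges on attempt 4. The playbook no longer needs to know anything about migrations.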

tools/deployment/scripts/build-docker.sh

Lines changed: 3 additions & 3 deletions

```diff
@@ -1,6 +1,6 @@
 #!/bin/sh
 # Build + push the AI Studio images to ACR, mirroring the workflow-builder
-# repo's tools/deployment/scripts/build-docker.sh. All three images come from
+# repo's tools/deployment/scripts/build-docker.sh. Both images come from
 # the same multi-target Dockerfile in deploy/ai-studio/ — this script only
 # adds registry tagging; the images are identical to the local-compose ones.
 #
@@ -15,7 +15,7 @@ COMMIT="${BITBUCKET_COMMIT:-$(git rev-parse HEAD)}"
 ENVIRONMENT="${BITBUCKET_DEPLOYMENT_ENVIRONMENT:-${DEPLOY_ENV:-}}"
 export IMAGE_TAG="${TAG_PREFIX:-}$COMMIT"
 
-for TARGET in runtime migrate web; do
+for TARGET in runtime web; do
 TAG="$REGISTRY/$APP_NAME:$TARGET-$IMAGE_TAG"
 docker build \
 -f ./deploy/ai-studio/Dockerfile \
@@ -30,7 +30,7 @@ if echo "$ALLOWED_ENVIRONMENTS" | grep -w "$ENVIRONMENT" > /dev/null; then
 # setup-az.sh exists in the deployment CI image; logging in by other means
 # (az acr login / docker login) is fine when running elsewhere
 [ -f /var/setup-az.sh ] && . /var/setup-az.sh
-for TARGET in runtime migrate web; do
+for TARGET in runtime web; do
 docker push "$REGISTRY/$APP_NAME:$TARGET-$IMAGE_TAG"
 done
 else
```
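The build script derives every tag from one pattern, `$REGISTRY/$APP_NAME:$TARGET-$IMAGE_TAG`, where `IMAGE_TAG` is `${TAG_PREFIX:-}$COMMIT`. The removed playbook task referenced the migrate image through the matching `{{ registry }}/{{ app_name }}:migrate-{{ image_tag }}` form. The small sketch below reproduces the scheme for the two remaining targets. The `imageTags` helper is hypothetical, and the registry, app name, and commit values used with it are placeholders.

```typescript
// Reproduces the tag scheme from build-docker.sh:
//   IMAGE_TAG="${TAG_PREFIX:-}$COMMIT"
//   TAG="$REGISTRY/$APP_NAME:$TARGET-$IMAGE_TAG"
const TARGETS = ["runtime", "web"]; // the two targets left after this commit

function imageTags(
  registry: string,
  appName: string,
  commit: string,
  tagPrefix = "", // ${TAG_PREFIX:-} expands to "" when unset
): string[] {
  const imageTag = `${tagPrefix}${commit}`;
  return TARGETS.map((target) => `${registry}/${appName}:${target}-${imageTag}`);
}
```

For example, `imageTags("registry.example.com", "ai-studio", "7e27221")` yields one tag per target: `registry.example.com/ai-studio:runtime-7e27221` and `registry.example.com/ai-studio:web-7e27221`.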

tools/deployment/swarm-alignment.decision-log.md

Lines changed: 12 additions & 6 deletions

```diff
@@ -36,12 +36,12 @@ a static frontend:
 
 1. Database/Temporal services with named volumes pinned to a labeled node
 (`node.labels.ai-studio-data==true`) — Swarm volumes are node-local.
-2. Migrations as a post-deploy one-shot `docker run` with retries — Swarm
-ignores compose `depends_on` conditions, so the ordering that compose
-expressed declaratively lives in the playbook.
-3. An `attachable` internal network plus short DNS aliases (`backend`,
-`app-db`, `temporal`) so the unmodified web image's nginx upstream and
-the compose env defaults resolve identically under Swarm.
+2. No migration step (revised 11.06.2026) — the backend applies Drizzle
+migrations at boot and restarts until Postgres answers, which sidesteps
+Swarm's lack of `depends_on` ordering entirely.
+3. Short DNS aliases (`backend`, `app-db`, `temporal`) so the unmodified
+web image's nginx upstream and the compose env defaults resolve
+identically under Swarm.
 4. Gatekeeper made conditional (`AUTH_ENABLED`, default off) — the public
 demo is login-free by design; internal stage/dev instances can keep SSO.
 
@@ -79,6 +79,12 @@ a static frontend:
 - Secrets land in a stack file on the Swarm master's disk (inherited
 trade-off from the existing flow).
 
+## Revisions
+
+- **11.06.2026** — playbook migration task and the `attachable` network
+removed; the backend migrates itself at boot. Image set is down to
+`runtime` + `web`.
+
 ## Status
 
 Proposed — pending the DevOps conversation
```
