Add an asynchronous invocation mode to the pipeline deployment HTTP server. In async mode, POST /invoke would return a run_id (HTTP 202) immediately after enqueueing the run, instead of blocking until the pipeline finishes. Clients would then poll a status endpoint to retrieve progress and outputs.
Today, /invoke is sync-only. The timeout field already accepted by BaseDeploymentInvocationRequest is silently dropped on the server: src/zenml/deployers/server/service.py:355 →
# Unused parameters for future implementation
_ = request.run_name, request.timeout
The parent feature #3928 — Pipeline Serving (Deploy Pipelines as Always-Warm HTTP Endpoints) listed POST /invoke (sync/async) in its MVP description, but only the sync path was delivered before the issue was closed. This is a follow-up scoped to the missing async path.
Problem or Use Case
Pipeline deployments can take several minutes to execute. With sync-only /invoke, the caller must hold an HTTP connection open for the entire pipeline duration, which:
Hits idle/read timeouts on every LB / reverse proxy / ingress in the path; multi-minute waits are brittle and operationally painful.
Forces callers to pick between long client-side timeouts (risk of mid-flight termination by any intermediary) and short timeouts (premature failure of in-flight runs).
Provides no run_id until completion, so callers cannot expose progress to end users, deduplicate concurrent invocations, implement "run already in progress, here is its ID" semantics, or correlate logs and metrics with a specific execution before it ends.
Couples client lifecycle to server lifecycle: a client crash, network blip, or scale-down kills observability of an otherwise healthy run.
Cold-start alternatives (snapshot-based runs) impose a 1–3 min image-pull + ZenML bootstrap penalty per request, which is the trade-off that always-warm deployments were introduced to avoid in the first place.
Proposed Solution
Follow standard practice for long-running HTTP APIs (RFC 7231 §6.3.3, RFC 7240 Prefer: respond-async):
Async opt-in on POST /invoke, via either a request body flag (async: true) or an HTTP header (Prefer: respond-async). Sync remains the default — non-breaking.
202 Accepted response carrying the run_id and a Location header pointing to the status resource (e.g., Location: /runs/{run_id}), with a response body such as {"run_id": ..., "status": "queued"}.
GET /runs/{run_id} returns the current state of the run with a small, stable status state machine — e.g. queued | running | succeeded | failed | cancelled — alongside timing fields (created_at, started_at, finished_at) and the final outputs once succeeded.
Run lifecycle is server-owned: cancelling or terminating the HTTP client after a 202 must not affect the pipeline. The placeholder run is created before returning so the run_id is always queryable, including for runs that fail to start.
Backpressure / queue limits surfaced via standard status codes (429 Too Many Requests with Retry-After) when the deployment cannot accept new runs.
Optional (out of MVP, listed for completeness): a DELETE /runs/{run_id} cancellation endpoint, and webhook delivery as a complement to polling (callback_url in the request body, server POSTs the terminal state).
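The proposed contract can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not ZenML's implementation: `AsyncInvoker`, `RunRecord`, `max_pending`, and the `(status, headers, body)` response tuple are hypothetical names, runs are held in memory, and `pipeline_fn` stands in for the real pipeline execution. It covers the 202 response with a `Location` header, the status resource, and 429 backpressure; cancellation and webhooks are left out.

```python
import threading
import time
import uuid
from concurrent.futures import ThreadPoolExecutor
from dataclasses import asdict, dataclass, field
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical response shape: (HTTP status code, headers, JSON body).
Response = Tuple[int, Dict[str, str], Dict[str, Any]]


@dataclass
class RunRecord:
    run_id: str
    status: str = "queued"  # queued | running | succeeded | failed
    created_at: float = field(default_factory=time.time)
    started_at: Optional[float] = None
    finished_at: Optional[float] = None
    outputs: Optional[Dict[str, Any]] = None
    error: Optional[str] = None


class AsyncInvoker:
    """Hypothetical in-memory run registry; not part of ZenML."""

    def __init__(self, pipeline_fn: Callable[..., Dict[str, Any]],
                 max_pending: int = 2, workers: int = 1) -> None:
        self._fn = pipeline_fn
        self._max_pending = max_pending
        self._runs: Dict[str, RunRecord] = {}
        self._lock = threading.Lock()
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def invoke(self, parameters: Dict[str, Any]) -> Response:
        """POST /invoke in async mode: register the run, then return at once."""
        with self._lock:
            pending = sum(r.status in ("queued", "running")
                          for r in self._runs.values())
            if pending >= self._max_pending:
                # Backpressure: refuse new work and tell the client when to retry.
                return 429, {"Retry-After": "5"}, {"detail": "too many pending runs"}
            # The record exists before we respond, so run_id is always queryable.
            record = RunRecord(run_id=str(uuid.uuid4()))
            self._runs[record.run_id] = record
        self._pool.submit(self._execute, record, parameters)
        return (202, {"Location": f"/runs/{record.run_id}"},
                {"run_id": record.run_id, "status": "queued"})

    def _execute(self, record: RunRecord, parameters: Dict[str, Any]) -> None:
        """Runs on a worker thread, independent of any client connection."""
        with self._lock:
            record.status, record.started_at = "running", time.time()
        try:
            outputs, error, final = self._fn(**parameters), None, "succeeded"
        except Exception as exc:  # any failure becomes a terminal "failed" state
            outputs, error, final = None, str(exc), "failed"
        with self._lock:
            # Publish the terminal state and its timestamp atomically.
            record.outputs, record.error = outputs, error
            record.status, record.finished_at = final, time.time()

    def get_run(self, run_id: str) -> Response:
        """GET /runs/{run_id}: return a snapshot of the run's current state."""
        with self._lock:
            record = self._runs.get(run_id)
            if record is None:
                return 404, {}, {"detail": "unknown run_id"}
            return 200, {}, asdict(record)
```

Because the run is executed on the server's own worker pool, a client that disconnects after receiving the 202 has no effect on the run. A real implementation would presumably persist runs through the ZenML control plane rather than a process-local dictionary, so that status survives a restart of the deployment.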
Alternatives Considered
Snapshot runs (zenml pipeline run): rejected — full cold-start per request, defeats the purpose of an always-warm deployment.
Long sync HTTP wrapped in an external orchestrator (Temporal/Argo/etc.) with heartbeats: keeps the orchestrator's worker visible but does not keep the HTTP connection itself alive, so it still relies on every infra hop tolerating multi-minute idle reads. Moves the problem rather than solving it.
Direct polling of the control plane via Client().get_pipeline_run(): viable for status retrieval, but only once a run_id is known — which today requires the sync call to complete. Async /invoke is the precondition that makes this pattern usable.
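Once a run_id is available from a 202 response, the client side of the polling pattern could look like the sketch below. It is illustrative only: `wait_for_run` and its parameters are hypothetical, and `fetch_status` is an injected callable standing in for whatever retrieves the run state (a GET on the proposed status endpoint, or a control-plane lookup), assumed to return a dict with a `status` key using the state names proposed above.

```python
import time
from typing import Any, Callable, Dict

# Terminal states from the proposed status state machine.
TERMINAL = {"succeeded", "failed", "cancelled"}


def wait_for_run(
    fetch_status: Callable[[str], Dict[str, Any]],
    run_id: str,
    timeout_s: float = 600.0,
    initial_delay_s: float = 0.5,
    max_delay_s: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Dict[str, Any]:
    """Poll until the run reaches a terminal state or the deadline passes.

    The delay doubles after each poll, capped at max_delay_s, so short runs
    are detected quickly and long runs do not flood the server.
    """
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        run = fetch_status(run_id)
        if run["status"] in TERMINAL:
            return run
        remaining = deadline - clock()
        if remaining <= 0:
            # Only the wait is abandoned; the run itself keeps going server-side.
            raise TimeoutError(f"run {run_id} still {run['status']} after {timeout_s}s")
        sleep(min(delay, remaining))
        delay = min(delay * 2, max_delay_s)
```

Each poll is a short request, so no single connection has to outlive an intermediary's idle timeout, and a client that times out or crashes can resume polling later with the same run_id.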
Additional Context
ZenML server / client: 0.94.2 (confirmed unchanged on develop for the relevant code path).
Source pointer: src/zenml/deployers/server/service.py::execute_pipeline — the timeout field is read off the request but explicitly unused.
Contact Details [Optional]
eliott.iticsohn@brevo.com
An additional alternative, considered alongside those listed above:
startup_hook background thread looping back to /invoke: mentioned as a workaround in Feature Request: Native Event-Driven & Transport-Agnostic Pipeline Deployment Ingest #4723. Error-prone and not first-class (no health, no lifecycle, no observability).
Priority
High - Critical for my use case