Titan replaces the live Groovy program with data. A pipeline is a declarative description; a build is a set of rows in PostgreSQL; a unit of work is a task in a queue. Stateless workers pull tasks, run them, and report back. Nothing in a build depends on a process staying alive or a socket staying open.
- State lives in PostgreSQL. Pipelines, builds, stages, steps, the task queue, timers, and logs are all rows. There is no program.dat, no XStream object graph, no controller-disk-as-database.
- The pipeline is declarative. A pipeline is a titan-pipeline.yml file, parsed and validated into a DAG of stages and steps before it runs. It is data you can read, diff, and check. It is not arbitrary code.
- Work is a queue of tasks. The controller decomposes a build into discrete tasks and writes them to a queue table. It does not push work down a channel.
- Workers pull; channels are not persistent. Workers poll for tasks, claim one under a lease, execute it, and stream results back. The controller never dials a worker. A dropped connection loses nothing.
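
The "parsed and validated into a DAG" step can be illustrated with a small dependency check. This is a minimal sketch, not Titan's parser: the class name `PipelineDag`, the `needs` map shape (stage name to the stages it depends on), and the error messages are all assumptions made for the example. It uses Kahn's algorithm to reject unknown dependencies and cycles before anything runs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: checks that declared stage dependencies form a DAG. */
final class PipelineDag {

    /**
     * Returns the stages in an order where every stage follows its dependencies.
     * Keys are stage names; values are the stages each one depends on.
     * Throws IllegalArgumentException for unknown dependencies or cycles.
     */
    static List<String> topologicalOrder(Map<String, List<String>> needs) {
        Map<String, Integer> unmet = new HashMap<>();
        Map<String, List<String>> dependents = new HashMap<>();
        for (Map.Entry<String, List<String>> e : needs.entrySet()) {
            unmet.put(e.getKey(), e.getValue().size());
            for (String dep : e.getValue()) {
                if (!needs.containsKey(dep)) {
                    throw new IllegalArgumentException(
                            "stage '" + e.getKey() + "' needs unknown stage '" + dep + "'");
                }
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(e.getKey());
            }
        }
        // Kahn's algorithm: repeatedly release stages with no unmet dependencies.
        Deque<String> ready = new ArrayDeque<>();
        for (String stage : needs.keySet()) {
            if (unmet.get(stage) == 0) {
                ready.add(stage);
            }
        }
        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String stage = ready.remove();
            order.add(stage);
            for (String next : dependents.getOrDefault(stage, List.of())) {
                if (unmet.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        // Any stage never released is part of (or blocked behind) a cycle.
        if (order.size() != needs.size()) {
            throw new IllegalArgumentException("pipeline contains a dependency cycle");
        }
        return order;
    }
}
```

Because the pipeline is data, a check like this can run before any task is enqueued. A cyclic or dangling definition fails at validation time instead of partway through a build.
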

```mermaid
flowchart LR
    subgraph controller["Jenkins controller: titan-plugin"]
        UI["UI<br/>jobs · builds · pipeline editor"]
        PARSE["Parser<br/>titan-pipeline.yml → DAG"]
        ORCH["Orchestrator<br/>build → tasks, advance on completion"]
        SCHED["Scheduler<br/>triggers · timers"]
        API["Worker API (gRPC)<br/>claim · log · heartbeat · report"]
    end
    DB[("PostgreSQL<br/>pipelines · builds · stages<br/>task queue · logs · timers")]
    subgraph workers["Workers: titan-worker (anywhere)"]
        W1["Worker"]
        W2["Worker"]
        W3["Worker"]
    end
    UI --- DB
    PARSE --- DB
    ORCH --- DB
    SCHED --- DB
    API --- DB
    W1 -- "poll / claim / report" --> API
    W2 -- "poll / claim / report" --> API
    W3 -- "poll / claim / report" --> API
```

- titan-plugin: runs inside the Jenkins controller. Parses pipelines, orchestrates builds, schedules triggers and timers, owns the database, and serves the worker API. It reuses Jenkins for what Jenkins is good at: the UI shell, authentication, credentials, and JCasC configuration.
- titan-worker: a standalone process. It holds no durable state. It can run as a Kubernetes pod, a VM, a container, or a local process.
- PostgreSQL: the single source of truth. (An embedded H2 database is supported for single-node and local use.)

- titan-pipeline-model: the pipeline grammar, parser, and model.
- titan-step-api: the SPI for implementing step handlers.
- titan-db-core: the shared JDBC pool and Flyway schema migrations.

```mermaid
sequenceDiagram
    participant C as Controller (titan-plugin)
    participant DB as PostgreSQL
    participant W as Worker (titan-worker)
    C->>DB: parse titan-pipeline.yml → create build + stage rows
    C->>DB: enqueue tasks with no unmet dependencies
    loop until the build is terminal
        W->>C: poll for a task
        C->>DB: claim next ready task under a lease
        C-->>W: task (step + arguments + context)
        W->>W: execute step (local process or container)
        W->>C: stream logs → DB
        W->>C: heartbeat (extend the lease)
        W->>C: report success / failure → DB
        C->>DB: mark task terminal and enqueue newly-ready tasks
    end
```

The controller's job is to keep the queue moving: when a task finishes, it marks the row and enqueues whatever became ready. The worker's job is to drain the queue. Neither holds the other open.
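
The worker's half of that exchange is a plain client loop. The sketch below is illustrative only: the `WorkerApi` interface mirrors the four calls named in this document (claim, log, heartbeat, report), but its Java method signatures, the `FakeController` stand-in, and the rule that task ids starting with "fail" fail are assumptions made so the example can run without gRPC or a database.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.Queue;

/** Hypothetical shape of the worker calls; the real gRPC messages are not shown here. */
interface WorkerApi {
    Optional<String> claim();                      // next ready task id, if any
    void log(String taskId, String line);          // stream one log line
    void heartbeat(String taskId);                 // extend the lease
    void report(String taskId, boolean success);   // terminal result
}

/** The worker side: a client loop that only ever makes outbound calls. */
final class WorkerLoop {
    /** Claims and runs tasks until a poll comes back empty; returns how many ran. */
    static int drain(WorkerApi api) {
        int ran = 0;
        Optional<String> claimed;
        while ((claimed = api.claim()).isPresent()) {
            String id = claimed.get();
            api.heartbeat(id);                     // a real worker would repeat this on a timer
            api.log(id, "running " + id);
            boolean ok = !id.startsWith("fail");   // stand-in for executing the step
            api.report(id, ok);
            ran++;
        }
        return ran;
    }
}

/** In-memory fake of the controller, only so the sketch runs without a database. */
final class FakeController implements WorkerApi {
    final Queue<String> ready = new ArrayDeque<>();
    final List<String> events = new ArrayList<>();

    public Optional<String> claim() { return Optional.ofNullable(ready.poll()); }
    public void log(String taskId, String line) { events.add("log:" + taskId); }
    public void heartbeat(String taskId) { events.add("heartbeat:" + taskId); }
    public void report(String taskId, boolean success) {
        events.add((success ? "done:" : "failed:") + taskId);
    }
}
```

Every call in the loop is initiated by the worker. If the process dies between `claim` and `report`, the controller side sees no further calls for that task, which is the case the lease is meant to cover.
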
A task is a row. Its life is a small state machine, and the lease is what makes it safe without a persistent connection:

```mermaid
stateDiagram-v2
    [*] --> Ready: dependencies satisfied
    Ready --> Leased: a worker claims it
    Leased --> Done: worker reports success
    Leased --> Failed: worker reports failure
    Leased --> Ready: lease expires (worker died or stalled)
    Done --> [*]
    Failed --> [*]
```

When a worker claims a task it takes a time-bounded lease and must
heartbeat to keep it. If the worker crashes, is evicted, or its network drops,
it simply stops heartbeating — the lease expires and the task returns to
Ready for another worker. No connection needs to be detected as broken, and
no state is lost, because the task row in PostgreSQL was always the truth.
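
The lease rules above can be written down as a small state machine over one task row. This is a sketch under stated assumptions, not Titan's schema: the `TaskRow` class, its field names, and the use of caller-supplied epoch milliseconds are hypothetical. It also assumes that a worker whose lease has lapsed may no longer heartbeat or report, which this document does not specify.

```java
/** Hypothetical sketch of one task row and its lease; times are epoch milliseconds. */
final class TaskRow {
    enum State { READY, LEASED, DONE, FAILED }

    State state = State.READY;
    String owner;          // worker holding the lease, or null
    long leaseExpiresAt;   // meaningful only while LEASED

    /** Ready -> Leased. Returns false if the task is not claimable right now. */
    boolean claim(String worker, long now, long leaseMillis) {
        expireIfStale(now);
        if (state != State.READY) {
            return false;
        }
        state = State.LEASED;
        owner = worker;
        leaseExpiresAt = now + leaseMillis;
        return true;
    }

    /** Extends the lease, but only for the current owner and only before expiry. */
    boolean heartbeat(String worker, long now, long leaseMillis) {
        expireIfStale(now);
        if (state != State.LEASED || !worker.equals(owner)) {
            return false;
        }
        leaseExpiresAt = now + leaseMillis;
        return true;
    }

    /** Leased -> Done or Failed. Assumption: a worker with a lapsed lease is rejected. */
    boolean report(String worker, long now, boolean success) {
        expireIfStale(now);
        if (state != State.LEASED || !worker.equals(owner)) {
            return false;
        }
        state = success ? State.DONE : State.FAILED;
        owner = null;
        return true;
    }

    /** Leased -> Ready once the deadline has passed without a heartbeat. */
    void expireIfStale(long now) {
        if (state == State.LEASED && now >= leaseExpiresAt) {
            state = State.READY;
            owner = null;
        }
    }
}
```

Nothing in the sketch observes a connection. Expiry is decided purely by comparing the current time with the stored deadline, so a crashed worker and a partitioned worker look identical and are handled the same way.
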
There is nothing to resume. A build is its rows in PostgreSQL. If the
controller restarts, it reads the database and continues advancing the queue.
There is no live Groovy heap, so there is no heap to serialize, no
program.dat, no XStream, and no CPS transform. "Survives a restart" stops
being a feature that can fail and becomes a property of the storage model.
The classic model wants stable agents and gets transient pods. Titan wants transient workers:
- A worker is a pure client. It is never dialed, never holds inbound state, and can be killed at any moment — the lease covers it.
- Scaling is horizontal and trivial: run more workers. They contend for the same queue with no coordination beyond the claim.
- The controller is thin — an orchestrator over a database, not a monolith holding live builds in its heap.
Transient pods stop being a reliability problem and become the intended deployment.
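
"No coordination beyond the claim" depends on the claim being an atomic conditional update. The sketch below shows that property in memory, with a compare-and-set standing in for the database operation. The class name `ContendingWorkers` and the worker names are hypothetical, and this document does not say how Titan implements the claim. In PostgreSQL a comparable guarantee is commonly obtained with a conditional `UPDATE` or with `SELECT ... FOR UPDATE SKIP LOCKED`.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Hypothetical sketch: many workers race for the same tasks, each task is won once. */
final class ContendingWorkers {

    /** Starts `workers` threads that all try to claim every task; returns total wins. */
    static int claimAll(int workers, int tasks) {
        AtomicReferenceArray<String> owner = new AtomicReferenceArray<>(tasks);
        AtomicInteger claims = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            String name = "worker-" + w;
            pool.execute(() -> {
                for (int i = 0; i < tasks; i++) {
                    // Conditional update: succeeds only if the task is still unclaimed.
                    if (owner.compareAndSet(i, null, name)) {
                        claims.incrementAndGet();
                    }
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(30, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return claims.get();
    }
}
```

However many workers race, the number of successful claims equals the number of tasks. The workers never talk to each other, and adding more of them changes only how quickly the queue drains.
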