02 — Architecture

Titan replaces the live Groovy program with data. A pipeline is a declarative description; a build is a set of rows in PostgreSQL; a unit of work is a task in a queue. Stateless workers pull tasks, run them, and report back. Nothing in a build depends on a process staying alive or a socket staying open.

Four principles

1. State lives in PostgreSQL. Pipelines, builds, stages, steps, the task queue, timers, and logs are all rows. There is no program.dat, no XStream object graph, no controller-disk-as-database.
2. The pipeline is declarative. A pipeline is a titan-pipeline.yml file, parsed and validated into a DAG of stages and steps before it runs. It is data you can read, diff, and check. It is not arbitrary code.
3. Work is a queue of tasks. The controller decomposes a build into discrete tasks and writes them to a queue table. It does not push work down a channel.
4. Workers pull; channels are not persistent. Workers poll for tasks, claim one under a lease, execute it, and stream results back. The controller never dials a worker. A dropped connection loses nothing.
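
To make "validated into a DAG before it runs" concrete, here is a minimal Python sketch. It is not Titan's parser: the dictionary layout and the `needs` key are assumptions standing in for whatever titan-pipeline.yml actually contains, and the step names are invented. It shows only the idea that a pipeline held as data can be checked for unknown dependencies and cycles before any step executes.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical in-memory form of a parsed pipeline file.
# The real grammar is not shown in this document; these keys are assumptions.
pipeline = {
    "steps": {
        "checkout": {"needs": []},
        "compile": {"needs": ["checkout"]},
        "unit-test": {"needs": ["compile"]},
        "lint": {"needs": ["checkout"]},
        "package": {"needs": ["unit-test", "lint"]},
    }
}


def validate(pipeline):
    """Return the steps in a valid execution order, or raise ValueError."""
    steps = pipeline["steps"]
    # Every dependency must name a step that exists.
    for name, spec in steps.items():
        for dep in spec["needs"]:
            if dep not in steps:
                raise ValueError(f"step {name!r} needs unknown step {dep!r}")
    # Map each step to its predecessors and let graphlib detect cycles.
    graph = {name: set(spec["needs"]) for name, spec in steps.items()}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"pipeline has a dependency cycle: {exc.args[1]}") from exc


order = validate(pipeline)
```

Because the check runs over plain data, a bad pipeline is rejected at parse time rather than failing halfway through a build.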

Components

flowchart LR
  subgraph controller["Jenkins controller — titan-plugin"]
    UI["UI<br/>jobs · builds · pipeline editor"]
    PARSE["Parser<br/>titan-pipeline.yml → DAG"]
    ORCH["Orchestrator<br/>build → tasks, advance on completion"]
    SCHED["Scheduler<br/>triggers · timers"]
    API["Worker API (gRPC)<br/>claim · log · heartbeat · report"]
  end

  DB[("PostgreSQL<br/>pipelines · builds · stages<br/>task queue · logs · timers")]

  subgraph workers["Workers — titan-worker (anywhere)"]
    W1["Worker"]
    W2["Worker"]
    W3["Worker"]
  end

  UI --- DB
  PARSE --- DB
  ORCH --- DB
  SCHED --- DB
  API --- DB

  W1 -- "poll / claim / report" --> API
  W2 -- "poll / claim / report" --> API
  W3 -- "poll / claim / report" --> API

titan-plugin: runs inside the Jenkins controller. Parses pipelines, orchestrates builds, schedules triggers and timers, owns the database, and serves the worker API. It reuses Jenkins for what Jenkins is good at: the UI shell, authentication, credentials, and JCasC configuration.
titan-worker: a standalone process. It holds no durable state. It can run as a Kubernetes pod, a VM, a container, or a local process.
PostgreSQL: the single source of truth. (An embedded H2 database is supported for single-node and local use.)
titan-pipeline-model: the pipeline grammar, parser, and model.
titan-step-api: the SPI for implementing step handlers.
titan-db-core: the shared JDBC pool and Flyway schema migrations.

The life of a build

sequenceDiagram
  participant C as Controller (titan-plugin)
  participant DB as PostgreSQL
  participant W as Worker (titan-worker)

  C->>DB: parse titan-pipeline.yml → create build + stage rows
  C->>DB: enqueue tasks with no unmet dependencies
  loop until the build is terminal
    W->>C: poll for a task
    C->>DB: claim next ready task under a lease
    C-->>W: task (step + arguments + context)
    W->>W: execute step — local process or container
    W->>C: stream logs  →  DB
    W->>C: heartbeat (extend the lease)
    W->>C: report success / failure  →  DB
    C->>DB: mark task terminal and enqueue newly-ready tasks
  end

The controller's job is to keep the queue moving: when a task finishes, it marks the row and enqueues whatever became ready. The worker's job is to drain the queue. Neither holds the other open.
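
The "mark the row and enqueue whatever became ready" step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the orchestrator's code: the `Build` class, its fields, and the step names are hypothetical, a plain list stands in for the queue table, and failure handling is left out.

```python
from dataclasses import dataclass, field


@dataclass
class Build:
    deps: dict                                   # step -> set of steps it needs
    done: set = field(default_factory=set)       # steps reported successful
    enqueued: set = field(default_factory=set)   # steps already written to the queue
    queue: list = field(default_factory=list)    # stands in for the queue table

    def enqueue_ready(self):
        """Enqueue every step whose dependencies are all done; return the new ones."""
        newly_ready = sorted(
            step for step, needs in self.deps.items()
            if step not in self.enqueued and needs <= self.done
        )
        for step in newly_ready:
            self.enqueued.add(step)
            self.queue.append(step)
        return newly_ready

    def report_success(self, step):
        """Mark a step terminal, then advance the queue."""
        self.done.add(step)
        return self.enqueue_ready()


build = Build(deps={
    "checkout": set(),
    "compile": {"checkout"},
    "lint": {"checkout"},
    "package": {"compile", "lint"},
})
first = build.enqueue_ready()                      # ['checkout']
after_checkout = build.report_success("checkout")  # ['compile', 'lint']
after_compile = build.report_success("compile")    # []
after_lint = build.report_success("lint")          # ['package']
```

Note that `package` is enqueued only once both of its dependencies are done, and that nothing here requires a worker to stay connected between reports.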

The task queue and the lease

A task is a row. Its life is a small state machine, and the lease is what makes it safe without a persistent connection:

stateDiagram-v2
  [*] --> Ready: dependencies satisfied
  Ready --> Leased: a worker claims it
  Leased --> Done: worker reports success
  Leased --> Failed: worker reports failure
  Leased --> Ready: lease expires (worker died or stalled)
  Done --> [*]
  Failed --> [*]

When a worker claims a task it takes a time-bounded lease and must heartbeat to keep it. If the worker crashes, is evicted, or its network drops, it simply stops heartbeating — the lease expires and the task returns to Ready for another worker. No connection needs to be detected as broken, and no state is lost, because the task row in PostgreSQL was always the truth.
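
The state machine above is small enough to model directly. The following Python sketch is an in-memory illustration, not Titan's implementation: the 30-second lease, the `TaskQueue` class, and the rule that a report from a worker that has lost its lease is rejected are all assumptions made for the example. Time is passed in explicitly so the behaviour is easy to follow.

```python
from dataclasses import dataclass
from typing import Optional

LEASE_SECONDS = 30.0  # assumed lease length; the document does not specify one


@dataclass
class Task:
    name: str
    state: str = "Ready"            # Ready | Leased | Done | Failed
    owner: Optional[str] = None
    lease_until: float = 0.0


class TaskQueue:
    def __init__(self, tasks):
        self.tasks = {t.name: t for t in tasks}

    def expire(self, now):
        """Return every task whose lease has run out to Ready."""
        for t in self.tasks.values():
            if t.state == "Leased" and t.lease_until <= now:
                t.state, t.owner = "Ready", None

    def claim(self, worker, now):
        """Lease the first Ready task to this worker, or return None."""
        self.expire(now)
        for t in self.tasks.values():
            if t.state == "Ready":
                t.state, t.owner = "Leased", worker
                t.lease_until = now + LEASE_SECONDS
                return t
        return None

    def heartbeat(self, name, worker, now):
        """Extend the lease if this worker still holds it."""
        self.expire(now)
        t = self.tasks[name]
        if t.state == "Leased" and t.owner == worker:
            t.lease_until = now + LEASE_SECONDS
            return True
        return False

    def report(self, name, worker, ok, now):
        """Record the outcome; ignore a worker that no longer holds the lease."""
        self.expire(now)
        t = self.tasks[name]
        if t.state != "Leased" or t.owner != worker:
            return False
        t.state = "Done" if ok else "Failed"
        return True


q = TaskQueue([Task("compile")])
q.claim("w1", now=0)                                  # w1 holds the lease until t=30
q.heartbeat("compile", "w1", now=20)                  # extended until t=50
blocked = q.claim("w2", now=40)                       # None: the lease is still live
# w1 crashes here and never heartbeats again.
reclaimed = q.claim("w2", now=60)                     # lease expired at t=50, w2 gets it
stale = q.report("compile", "w1", ok=True, now=61)    # False: w1 lost the lease
final = q.report("compile", "w2", ok=True, now=70)    # True: task is Done
```

No code path detects a broken connection. The only signal is the absence of a heartbeat, observed the next time anyone touches the queue.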

Why resumption is free

There is nothing to resume. A build is its rows in PostgreSQL. If the controller restarts, it reads the database and continues advancing the queue. There is no live Groovy heap, so there is no heap to serialize, no program.dat, no XStream, and no CPS transform. "Survives a restart" stops being a feature that can fail and becomes a property of the storage model.
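
A small experiment illustrates the point. The Python sketch below uses SQLite from the standard library as a stand-in for PostgreSQL, and its two-table schema, the `waiting` state, and the step names are assumptions made for the example rather than Titan's schema. The first connection plays the controller before a restart, the second plays it afterwards, and the set of runnable tasks is derived from the rows alone.

```python
import os
import sqlite3
import tempfile

# Hypothetical schema: one row per task, one row per dependency edge.
SCHEMA = """
CREATE TABLE task (name TEXT PRIMARY KEY, state TEXT NOT NULL);
CREATE TABLE dep  (task TEXT NOT NULL, needs TEXT NOT NULL);
"""


def ready_tasks(conn):
    """Waiting tasks whose dependencies are all done, computed from rows only."""
    rows = conn.execute(
        """
        SELECT t.name FROM task t
        WHERE t.state = 'waiting'
          AND NOT EXISTS (
              SELECT 1 FROM dep d JOIN task p ON p.name = d.needs
              WHERE d.task = t.name AND p.state <> 'done')
        ORDER BY t.name
        """
    ).fetchall()
    return [row[0] for row in rows]


path = os.path.join(tempfile.mkdtemp(), "titan-demo.db")

# The controller before the restart: it records progress as rows.
conn = sqlite3.connect(path)
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO task VALUES (?, ?)",
    [("checkout", "done"), ("compile", "waiting"), ("package", "waiting")],
)
conn.executemany(
    "INSERT INTO dep VALUES (?, ?)",
    [("compile", "checkout"), ("package", "compile")],
)
conn.commit()
conn.close()  # the process goes away; nothing about the build lived in memory

# The controller after the restart: a fresh connection, no recovered heap.
conn = sqlite3.connect(path)
after_restart = ready_tasks(conn)  # ['compile']
conn.close()
```

The second connection needs no handoff from the first. Whatever the next step is, it is a query away.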

Why this fits Kubernetes

The classic model wants stable agents and gets transient pods. Titan wants transient workers:

A worker is a pure client. It is never dialed, never holds inbound state, and can be killed at any moment — the lease covers it.
Scaling is horizontal and trivial: run more workers. They contend for the same queue with no coordination beyond the claim.
The controller is thin — an orchestrator over a database, not a monolith holding live builds in its heap.

Transient pods stop being a reliability problem and become the intended deployment.
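
"No coordination beyond the claim" can be seen in miniature with threads. In this Python sketch an in-process `queue.Queue` stands in for the queue table and four threads stand in for worker pods. The worker names and the task count are arbitrary, and leases are omitted. The workers never talk to each other, yet each task is handed out exactly once.

```python
import queue
import threading

# An in-process queue standing in for the task queue table.
tasks = queue.Queue()
for i in range(20):
    tasks.put(f"task-{i}")

results = []
results_lock = threading.Lock()


def worker(worker_id):
    """Pull tasks until the queue is empty, then exit."""
    while True:
        try:
            task = tasks.get_nowait()  # the claim: only one worker receives each task
        except queue.Empty:
            return
        with results_lock:
            results.append((task, worker_id))


threads = [threading.Thread(target=worker, args=(f"w{n}",)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

claimed = sorted(task for task, _ in results)
```

Adding capacity means raising the thread count, which in a deployment corresponds to running more worker pods against the same queue.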

← 01 — Motivation · Next: 03 — Tradeoffs →
