Commit 6bc4b1c

fix: use blue_green for rolling deployments
1 parent 026724a commit 6bc4b1c

8 files changed

Lines changed: 581 additions & 15 deletions

README.md

Lines changed: 19 additions & 6 deletions
@@ -155,7 +155,7 @@ If `database` fails to come up, `web` will remain stopped and log the dependency
 
 #### Rolling Deployments
 
-Services can opt into rolling restarts so existing instances keep serving traffic until replacements are healthy. Add a `deployment` block to configure the behavior:
+Services can opt into rolling restarts. For single-host web services that bind a fixed port, use the `blue_green` block so the replacement starts on an alternate slot and traffic is switched after health checks.
 
 ```yaml
 version: "1"
@@ -167,18 +167,31 @@ services:
       strategy: "rolling" # default is "immediate"
       pre_start: "cargo build --release"
       health_check:
-        url: "http://localhost:8080/health"
+        url: "http://localhost:8000/health"
         timeout: "60s"
         retries: 5
       grace_period: "5s"
+      blue_green:
+        env_var: "PORT"
+        slots: ["8000", "8001"]
+        candidate_health_check_url: "http://127.0.0.1:{slot}/health"
+        switch_command: "/usr/local/bin/switch-upstream {candidate_slot}"
+        switch_verify_url: "http://localhost:8000/health"
+        state_path: ".state/api-slot.json"
 ```
 
-- `strategy` — set to `rolling` to enable the zero-downtime workflow, or omit to keep the traditional stop/start cycle.
+- `strategy` — set to `rolling` to enable restart-time rollout workflow, or omit to keep the traditional stop/start cycle.
 - `pre_start` — optional shell command executed before the new instance launches (perfect for build or migrate steps).
-- `health_check` — optional HTTP probe the replacement must pass before traffic flips; configure timeout and retry budget per service.
+- `health_check` — optional HTTP probe settings reused by rolling checks.
 - `grace_period` — optional delay to keep the old instance alive after the new one passes health checks, giving load balancers time to rebalance.
-
-If any rolling step fails, systemg restores the original instance and surfaces the error so unhealthy builds never replace running services.
+- `blue_green.env_var` — env var injected into the candidate instance (`PORT` by default).
+- `blue_green.slots` — exactly two alternating slot values (typically port numbers).
+- `blue_green.switch_command` — command run after candidate is healthy; supports `{candidate_slot}`, `{active_slot}`, `{service_name}` placeholders.
+- `blue_green.candidate_health_check_url` — optional candidate probe URL template with `{slot}`.
+- `blue_green.switch_verify_url` — optional post-switch verify URL.
+- `blue_green.state_path` — optional path for persisted active-slot state.
+
+If any rolling step fails, systemg restores the original instance and surfaces the error so unhealthy builds never replace running services. For deployment scripts, use `sysg restart --daemonize` so supervisor state is restored even if detection fails.
 
 #### Cron Scheduling
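Reviewer note: the README text in this hunk describes an ordering (start the candidate, probe it within a retry budget, optionally switch traffic, then retire the old instance) and a rollback rule (a failed step leaves the original running). The sketch below only models that ordering so it can be reasoned about. It is not systemg's implementation; `Rollout`, `wait_until_healthy`, and `rolling_restart` are hypothetical names, and the probe and switch steps are plain closures rather than real HTTP or shell calls.

```rust
/// Outcome of the simulated rollout (hypothetical type, not part of systemg).
#[derive(Debug, PartialEq)]
enum Rollout {
    Promoted,
    RolledBack(String),
}

/// Calls `probe` up to `retries` times and stops at the first success.
fn wait_until_healthy(mut probe: impl FnMut() -> bool, retries: u32) -> bool {
    (0..retries).any(|_| probe())
}

/// Walks the documented step order and records each step in `log`.
/// The old instance is only touched after every earlier step has succeeded,
/// so any failure leaves it running (the "restore" case in the README text).
fn rolling_restart(
    probe: impl FnMut() -> bool,
    retries: u32,
    switch: Option<&dyn Fn() -> Result<(), String>>,
    log: &mut Vec<&'static str>,
) -> Rollout {
    log.push("start candidate");
    if !wait_until_healthy(probe, retries) {
        log.push("stop candidate");
        return Rollout::RolledBack("health check failed".to_string());
    }
    // Optional blue/green step: only runs when a switch command is configured.
    if let Some(switch) = switch {
        if let Err(err) = switch() {
            log.push("stop candidate");
            return Rollout::RolledBack(err);
        }
        log.push("switch traffic");
    }
    log.push("SIGTERM old instance");
    log.push("wait grace period");
    log.push("old instance stopped");
    Rollout::Promoted
}
```

Passing `None` for `switch` models a plain rolling restart without a `blue_green` block. In both failure branches the log never reaches the `SIGTERM` step, which is the property the README paragraph promises.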

docs/docs/examples/crud.md

Lines changed: 8 additions & 2 deletions
@@ -25,7 +25,13 @@ env:
 services:
   node__web_server:
     command: "node server.js"
-    deployment_strategy: "rolling_start"
+    deployment:
+      strategy: "rolling"
+      blue_green:
+        env_var: "PORT"
+        slots: ["3000", "3001"]
+        candidate_health_check_url: "http://127.0.0.1:{slot}/health"
+        switch_command: "sudo /usr/local/bin/switch-crud-upstream {candidate_slot}"
     env:
       vars:
         NODE_ENV: "${NODE_ENV}"
@@ -148,7 +154,7 @@ $ git pull
 $ npm install
 
 # Rolling restart - zero downtime
-$ sysg restart node__web_server
+$ sysg restart --service node__web_server --daemonize
 ```
 
 ### View logs

docs/docs/how-it-works/commands/restart.md

Lines changed: 12 additions & 3 deletions
@@ -90,12 +90,21 @@ Services configured with `deployment.strategy: rolling` get zero-downtime restar
 
 1. New instance starts
 2. Health checks pass
-3. Old instance receives `SIGTERM`
-4. Grace period allows requests to complete
-5. Old instance stops
+3. (Optional) blue/green `switch_command` shifts traffic to candidate slot
+4. Old instance receives `SIGTERM`
+5. Grace period allows requests to complete
+6. Old instance stops
 
 Services without rolling deployment stop then start.
 
+For deployment scripts, prefer:
+
+```sh
+sysg restart --config <config> --daemonize
+```
+
+This keeps restart behavior explicit when supervisor detection is unavailable.
+
 ## See also
 
 - [`start`](start) - Launch services

docs/docs/how-it-works/configuration.md

Lines changed: 32 additions & 1 deletion
@@ -214,14 +214,23 @@ services:
         url: "http://localhost:8000/health"
         timeout: "30s"
       grace_period: "5s"
+      blue_green:
+        env_var: "PORT"
+        slots: ["8000", "8001"]
+        candidate_health_check_url: "http://127.0.0.1:{slot}/health"
+        switch_command: "/usr/local/bin/switch-upstream {candidate_slot}"
+        switch_verify_url: "http://localhost:8000/health"
+        state_path: ".state/api-slot.json"
 ```
 
-Rolling deployments start the new instance, wait for health checks, then stop the old instance. The `grace_period` allows in-flight requests to complete.
+Rolling deployments start the new instance, wait for health checks, then stop the old instance. For single-host zero-downtime with fixed ports, use `blue_green` so traffic can be switched between two slots.
 
 ## Field reference
 
 ### Service fields
 
+Primary keys available on each service definition.
+
 | Field | Type | Description |
 |-------|------|-------------|
 | `command` | string | Command to execute (required) |
@@ -237,13 +246,17 @@ Rolling deployments start the new instance, wait for health checks, then stop th
 
 ### Environment object
 
+Environment sources and inline overrides merged into the service process environment.
+
 | Field | Type | Description |
 |-------|------|-------------|
 | `vars` | object | Key-value environment variables |
 | `file` | string | Path to env file |
 
 ### Hooks object
 
+Lifecycle callbacks you can trigger on service start/stop/restart outcomes.
+
 | Field | Type | Description |
 |-------|------|-------------|
 | `on_start` | object | Commands for start events |
@@ -256,6 +269,8 @@ Each hook has `success` and `error` handlers with:
 
 ### Health check object
 
+Probe configuration used to determine readiness/health during deployment workflows.
+
 | Field | Type | Description |
 |-------|------|-------------|
 | `command` | string | Check command |
@@ -266,9 +281,25 @@ Each hook has `success` and `error` handlers with:
 
 ### Deployment object
 
+Controls how restarts are performed and what validation happens before cutover.
+
 | Field | Type | Description |
 |-------|------|-------------|
 | `strategy` | string | `rolling` or `immediate` |
 | `pre_start` | string | Command to run before starting |
 | `health_check` | object | Health check configuration |
 | `grace_period` | string | Time before stopping old instance |
+| `blue_green` | object | Single-host blue/green rollout settings |
+
+### Blue/green deployment object
+
+Single-host zero-downtime options for alternating between two rollout slots (typically ports).
+
+| Field | Type | Description |
+|-------|------|-------------|
+| `env_var` | string | Env var injected with slot value (`PORT` default) |
+| `slots` | array | Exactly two slot values to alternate between |
+| `switch_command` | string | Command to switch traffic to candidate slot |
+| `candidate_health_check_url` | string | Candidate health probe URL template (`{slot}` supported) |
+| `switch_verify_url` | string | Optional post-switch verify URL |
+| `state_path` | string | Optional persisted active-slot state file path |
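
Reviewer note: the blue/green table above implies two small pieces of logic, namely choosing the slot that is not currently active and filling the `{slot}`, `{candidate_slot}`, `{active_slot}`, and `{service_name}` placeholders. The following is a minimal sketch of how that could work, not the code in this commit. The function names are hypothetical, and treating the first slot as the candidate when no active slot is recorded is an assumption made here.

```rust
/// Picks the slot for the candidate instance.
/// Returns `None` unless exactly two slots are configured.
/// Assumption: with no recorded (or an unrecognized) active slot, use the first one.
fn candidate_slot<'a>(slots: &'a [String], active: Option<&str>) -> Option<&'a str> {
    match slots {
        [first, second] => {
            if active == Some(first.as_str()) {
                Some(second.as_str())
            } else {
                Some(first.as_str())
            }
        }
        _ => None,
    }
}

/// Fills `{slot}` in a `candidate_health_check_url` template.
fn expand_candidate_url(template: &str, slot: &str) -> String {
    template.replace("{slot}", slot)
}

/// Fills the three placeholders documented for `switch_command`.
fn expand_switch_command(
    template: &str,
    candidate_slot: &str,
    active_slot: &str,
    service_name: &str,
) -> String {
    template
        .replace("{candidate_slot}", candidate_slot)
        .replace("{active_slot}", active_slot)
        .replace("{service_name}", service_name)
}
```

With slots `["8000", "8001"]` and `8000` active, the candidate is `8001`, so the example template `http://127.0.0.1:{slot}/health` would be probed at port 8001 before the switch command runs.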

src/bin/main.rs

Lines changed: 11 additions & 0 deletions
@@ -2447,6 +2447,7 @@ struct InspectProcessContext<'a> {
     total_memory: f64,
 }
 
+/// Renders an htop-style process table for the inspected unit and all discovered descendants.
 fn render_inspect_process_table(unit: &UnitStatus, no_color: bool) {
     let Some(root_runtime) = unit.process.as_ref() else {
         println!("Process Table: unit is not currently running.");
@@ -2625,6 +2626,7 @@ fn render_inspect_process_table(unit: &UnitStatus, no_color: bool) {
     println!("{}", make_bottom_border(&columns));
 }
 
+/// Walks the process tree rooted at `pid` and appends formatted table rows in tree order.
 fn append_inspect_process_rows(
     context: &InspectProcessContext<'_>,
     pid: u32,
@@ -2703,6 +2705,7 @@ fn append_inspect_process_rows(
     }
 }
 
+/// Formats a generic table row using dynamic columns with control-character sanitization.
 fn format_row_cells(values: &[String], columns: &[Column], _no_color: bool) -> String {
     let mut row = String::from('│');
     for (value, column) in values.iter().zip(columns.iter()) {
@@ -2715,6 +2718,7 @@ fn format_row_cells(values: &[String], columns: &[Column], _no_color: bool) -> S
     row
 }
 
+/// Normalizes cell text to a single printable line while preserving ANSI color escape sequences.
 fn sanitize_table_cell(value: &str) -> String {
     let mut collapsed = String::new();
     let mut chars = value.chars().peekable();
@@ -2746,10 +2750,12 @@ fn sanitize_table_cell(value: &str) -> String {
     collapsed.split_whitespace().collect::<Vec<_>>().join(" ")
 }
 
+/// Returns a display-friendly process name from sysinfo process metadata.
 fn process_display_name(process: &sysinfo::Process) -> String {
     process.name().to_string_lossy().into_owned()
 }
 
+/// Returns the full command line when available, otherwise falls back to process display name.
 fn process_command_line(process: &sysinfo::Process) -> String {
     if process.cmd().is_empty() {
         process_display_name(process)
@@ -2763,6 +2769,7 @@ fn process_command_line(process: &sysinfo::Process) -> String {
     }
 }
 
+/// Converts sysinfo's status enum into a compact single-letter process state marker.
 fn process_status_code(status: sysinfo::ProcessStatus) -> String {
     format!("{status:?}")
         .chars()
@@ -2772,6 +2779,7 @@ fn process_status_code(status: sysinfo::ProcessStatus) -> String {
 }
 
 #[cfg(target_os = "linux")]
+/// Reads Linux `/proc` process stats used by inspect table columns (PPID, PRI/NI, CPU ticks, SHR).
 fn read_linux_proc_stats(pid: u32) -> LinuxProcStats {
     let stat_path = format!("/proc/{pid}/stat");
     let statm_path = format!("/proc/{pid}/statm");
@@ -2807,11 +2815,13 @@ fn read_linux_proc_stats(pid: u32) -> LinuxProcStats {
 }
 
 #[cfg(not(target_os = "linux"))]
+/// Non-Linux stub returning empty Linux-specific process stats.
 fn read_linux_proc_stats(_pid: u32) -> LinuxProcStats {
     LinuxProcStats::default()
 }
 
 #[cfg(target_os = "linux")]
+/// Parses a `/proc/<pid>/stat` line into selected fields needed for inspect table rendering.
 fn parse_proc_stat_line(contents: &str) -> Option<LinuxProcStats> {
     let closing_paren = contents.rfind(')')?;
     let remainder = contents.get((closing_paren + 1)..)?.trim();
@@ -2838,6 +2848,7 @@ fn parse_proc_stat_line(contents: &str) -> Option<LinuxProcStats> {
     })
 }
 
+/// Formats CPU clock ticks as `MM:SS.CC`, matching htop-style time display.
 fn format_cpu_time_from_ticks(ticks: u64) -> String {
     #[cfg(target_os = "linux")]
     let hz = {

src/config.rs

Lines changed: 68 additions & 0 deletions
@@ -414,6 +414,25 @@ pub struct DeploymentConfig {
     pub health_check: Option<HealthCheckConfig>,
     /// Grace period before stopping the old service instance.
     pub grace_period: Option<String>,
+    /// Optional blue/green rollout settings for single-host zero-downtime deployments.
+    pub blue_green: Option<BlueGreenDeploymentConfig>,
+}
+
+/// Blue/green rollout configuration used by rolling deployments on a single host.
+#[derive(Debug, Deserialize, Clone, serde::Serialize)]
+pub struct BlueGreenDeploymentConfig {
+    /// Environment variable used to inject the selected slot value (defaults to "PORT").
+    pub env_var: Option<String>,
+    /// Two slot values (commonly two port numbers) used as alternating targets.
+    pub slots: Vec<String>,
+    /// Command executed to switch traffic to the candidate slot once healthy.
+    pub switch_command: Option<String>,
+    /// Optional URL template for candidate health checks (supports `{slot}` substitution).
+    pub candidate_health_check_url: Option<String>,
+    /// Optional URL to verify after switch command completes.
+    pub switch_verify_url: Option<String>,
+    /// Optional path for persisting the active slot state.
+    pub state_path: Option<String>,
 }
 
 /// Health check configuration used during rolling deployments.
@@ -1144,6 +1163,55 @@ services:
         assert!(vars2.get("SERVICE_ONLY").is_none());
     }
 
+    #[test]
+    fn load_config_parses_blue_green_deployment_block() {
+        let dir = tempdir().expect("tempdir");
+        let yaml_path = dir.path().join("systemg.yaml");
+        let mut yaml_file = File::create(&yaml_path).expect("create yaml");
+        writeln!(
+            yaml_file,
+            r#"
+version: "1"
+services:
+  web:
+    command: "python app.py"
+    deployment:
+      strategy: "rolling"
+      blue_green:
+        env_var: "PORT"
+        slots: ["8000", "8001"]
+        switch_command: "echo switch"
+        candidate_health_check_url: "http://127.0.0.1:{{slot}}/health"
+        switch_verify_url: "http://127.0.0.1/health"
+        state_path: ".state/web-slot.json"
+"#
+        )
+        .expect("write yaml");
+
+        let config = load_config(Some(yaml_path.to_str().expect("yaml path")))
+            .expect("load config");
+        let deployment = config
+            .services
+            .get("web")
+            .expect("web service")
+            .deployment
+            .as_ref()
+            .expect("deployment");
+        let blue_green = deployment.blue_green.as_ref().expect("blue_green");
+
+        assert_eq!(deployment.strategy.as_deref(), Some("rolling"));
+        assert_eq!(blue_green.env_var.as_deref(), Some("PORT"));
+        assert_eq!(blue_green.slots, vec!["8000", "8001"]);
+        assert_eq!(
+            blue_green.candidate_health_check_url.as_deref(),
+            Some("http://127.0.0.1:{slot}/health")
+        );
+        assert_eq!(
+            blue_green.switch_verify_url.as_deref(),
+            Some("http://127.0.0.1/health")
+        );
+    }
+
     #[test]
     fn hash_computation_is_stable() {
         // Same config should always produce the same hash
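
Reviewer note: the docs in this commit say `slots` holds exactly two values and `env_var` defaults to `PORT`, while the struct models them as `Vec<String>` and `Option<String>`. The hunks shown here do not include a length check, so one may live elsewhere in the commit. The sketch below shows what such a check might look like. `BlueGreenSketch` is a stand-in with only the two relevant fields, and both function names and the "slots must differ" rule are assumptions, not taken from systemg.

```rust
/// Stand-in for the two fields this sketch inspects (not the real config struct).
struct BlueGreenSketch {
    env_var: Option<String>,
    slots: Vec<String>,
}

/// Checks the "exactly two slots" rule stated in the docs.
/// Assumption: identical or blank slot values are also rejected, since
/// alternating between them would be meaningless.
fn validate_blue_green(cfg: &BlueGreenSketch) -> Result<(), String> {
    if cfg.slots.len() != 2 {
        return Err(format!(
            "blue_green.slots needs exactly two values, got {}",
            cfg.slots.len()
        ));
    }
    if cfg.slots.iter().any(|slot| slot.trim().is_empty()) {
        return Err("blue_green.slots values must not be blank".to_string());
    }
    if cfg.slots[0] == cfg.slots[1] {
        return Err("blue_green.slots values must differ".to_string());
    }
    Ok(())
}

/// Applies the documented default of `PORT` when `env_var` is omitted.
fn effective_env_var(cfg: &BlueGreenSketch) -> &str {
    cfg.env_var.as_deref().unwrap_or("PORT")
}
```

Running a check like this at config load time would surface a malformed `slots` list before any restart is attempted, rather than partway through a rollout.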
