Commit 5d91680

enhancement: explain unit health and color inspect cron status
1 parent 6f19125 commit 5d91680

5 files changed

Lines changed: 892 additions & 35 deletions


docs/how-it-works/commands/inspect.mdx

Lines changed: 12 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -94,6 +94,18 @@ runtime command lines and process/spawn descendants.
9494
- **Average duration** - For completed processes
9595
- **Success rate** - For cron jobs
9696

97+
## Cron run history
98+
99+
For cron units, inspect prints a `Cron Run History` table. The `STATUS` column is
100+
color-coded so outcomes are obvious at a glance:
101+
102+
- `success` - light green
103+
- `running` - light blue (the run is still in progress)
104+
- `failed` / `failed: <reason>` - red
105+
- `overlap` - yellow (a run was skipped because the previous one was still going)
106+
107+
Pass `--no-color` to disable the coloring.
108+
97109
## Snapshot detail
98110

99111
The default `status.snapshot_mode: summary` is intended for production-scale
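
Reviewer sketch of the status-to-color mapping documented in this hunk. This is a minimal illustration, not the code from the commit: `RunOutcome`, `format_status`, and the ANSI escape values below are hypothetical stand-ins, and the actual `format_inspect_cron_status` in `src/bin/main.rs` may be structured differently. It assumes that a missing outcome means the run is still in progress, which matches how the tests in this commit pass `None` for `running`.

```rust
// Hypothetical ANSI escape sequences; the real constants in systemg may differ.
const BRIGHT_GREEN: &str = "\x1b[92m";
const LIGHT_BLUE: &str = "\x1b[94m";
const RED_BOLD: &str = "\x1b[1;31m";
const YELLOW: &str = "\x1b[33m";
const RESET: &str = "\x1b[0m";

/// Illustrative stand-in for the outcome of one cron run.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum RunOutcome {
    Success,
    Failed(String),
    Overlap,
}

/// Render one STATUS cell. `None` is assumed to mean the run has not finished.
/// With `no_color` set, the plain label is returned without escape codes.
#[allow(dead_code)]
fn format_status(outcome: Option<&RunOutcome>, no_color: bool) -> String {
    let (label, color) = match outcome {
        Some(RunOutcome::Success) => ("success".to_string(), BRIGHT_GREEN),
        // A failure without a reason renders as plain `failed`.
        Some(RunOutcome::Failed(reason)) if reason.is_empty() => {
            ("failed".to_string(), RED_BOLD)
        }
        Some(RunOutcome::Failed(reason)) => (format!("failed: {}", reason), RED_BOLD),
        Some(RunOutcome::Overlap) => ("overlap".to_string(), YELLOW),
        None => ("running".to_string(), LIGHT_BLUE),
    };
    if no_color {
        label
    } else {
        format!("{}{}{}", color, label, RESET)
    }
}
```

Returning the bare label when `no_color` is set keeps the `--no-color` path easy to assert on with exact string comparison, which is the shape the `respects_no_color` test in this commit relies on.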

docs/how-it-works/commands/status.mdx

Lines changed: 43 additions & 5 deletions
Original file line number | Diff line number | Diff line change
@@ -24,14 +24,50 @@ when you explicitly want the no-supervisor disk fallback for that config.
2424
## Interactive Mode
2525

2626
By default, `sysg status` displays an interactive table where you can:
27-
- **Tab** or **↓** (Down arrow) - Move to the next service
28-
- **Shift+Tab** or **↑** (Up arrow) - Move to the previous service
29-
- **Enter** or **I** - Open the inspect view for the selected service
27+
- **↓** / **Tab** (Down) - Move to the next service
28+
- **↑** / **Shift+Tab** (Up) - Move to the previous service
29+
- **→** / **←** (Right / Left) - Move the focused cell across columns in the
30+
selected row
31+
- **I** or **Enter** - Open the inspect view for the selected service
32+
- **H** - Open the health report explaining why the selected unit is `Warn` or
33+
`Failing` (see below)
3034
- **L** - Open a live log snapshot stream for the selected service
3135
- **R** - Restart the selected service
3236
- **q**, **ESC**, or **Ctrl+C** - Exit the status view
3337

34-
The selected row is highlighted with a white background for easy visibility.
38+
The focused cell in the selected row is drawn in reverse video, so it stands
39+
out as lighter on a dark terminal and darker on a light terminal. This makes it
40+
obvious which cell you are on as you tab across the row.
41+
42+
## Why is it `Warn` or `Failing`?
43+
44+
The `HEALTH` column is a verdict, not an explanation. To see *why* a unit holds
45+
its health, select it and press **H**. systemg opens a full-screen, README-style
46+
health report in the same window:
47+
48+
```
49+
# 'api' is stopped but should be serving
50+
51+
Severity: 5/10 (Warn)
52+
53+
TLDR: A long-running service is intentionally stopped.
54+
55+
## Description
56+
57+
'api' has intent 'Serve', meaning it is expected to stay available, but it is
58+
currently stopped. Nothing is serving requests for this unit right now.
59+
60+
## Recommended Fix
61+
62+
Start it again if it should be up:
63+
64+
sysg restart -s api --log-level debug
65+
66+
If it is meant to stay down, this is expected and can be ignored.
67+
```
68+
69+
The report wraps to at most 80 columns, narrowing to 80% of the terminal width
70+
on smaller terminals. Press any key to return to the status table.
3571

3672
> **Info:** Press **L** while a UNIT row is selected to open that unit's logs
3773
> without leaving the status workflow. The shortcut runs
@@ -170,7 +206,9 @@ process and spawn-tree detail in status/inspect views.
170206
data is available.
171207
- **LAST_EXIT** - Last recorded exit code or signal when the unit has exited.
172208
- **HEALTH** - Operator action signal. `HEALTH` answers "does this unit need
173-
attention?" Values are `Healthy`, `Idle`, `Warn`, and `Failing`.
209+
attention?" Values are `Healthy`, `Idle`, `Warn`, and `Failing`. Press **H**
210+
on a selected unit to open a health report explaining the verdict and how to
211+
fix it.
174212

175213
### State values
176214

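Reviewer sketch of the report wrapping described in the `status.mdx` hunk above ("at most 80 columns, narrowing to 80% of the terminal width on smaller terminals"). This is one possible reading, not the commit's code: `report_width` and `wrap_words` are hypothetical names, the width rule is assumed to be `min(80, 80% of terminal columns)`, and the wrapping is a plain greedy word wrap that counts characters rather than the visible (escape-stripped) length the real `wrap_paragraph` test checks.

```rust
/// Assumed width rule: 80% of the terminal, capped at 80 columns, never zero.
#[allow(dead_code)]
fn report_width(terminal_cols: usize) -> usize {
    (terminal_cols * 80 / 100).min(80).max(1)
}

/// Greedy word wrap: words are never split, so a single word longer than
/// `width` ends up alone on a line that exceeds the limit.
#[allow(dead_code)]
fn wrap_words(text: &str, width: usize) -> Vec<String> {
    let mut lines = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        let word_len = word.chars().count();
        let current_len = current.chars().count();
        // Start a new line when appending " word" would overflow the width.
        if !current.is_empty() && current_len + 1 + word_len > width {
            lines.push(std::mem::take(&mut current));
        }
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
    }
    if !current.is_empty() {
        lines.push(current);
    }
    lines
}
```

Because whitespace is only ever replaced by a single space or a line break, joining the wrapped lines with spaces reproduces any input that was already single-spaced, which is the property the `wrap_paragraph_respects_width_and_keeps_words_whole` test asserts.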
src/bin/main.rs

Lines changed: 67 additions & 2 deletions
Original file line number | Diff line number | Diff line change
@@ -45,7 +45,7 @@ use systemg::{
4545
CronUnitStatus, ExitMetadata, OverallHealth, ProcessState, ProjectRunMode,
4646
SpawnedProcessNode, StatusSnapshot, UnitHealth, UnitIntent, UnitKind,
4747
UnitMetricsSummary, UnitState, UnitStatus, UptimeInfo, collect_disk_snapshot,
48-
compute_overall_health, format_elapsed,
48+
compute_overall_health, explain_unit_health, format_elapsed,
4949
},
5050
supervisor::Supervisor,
5151
};
@@ -1190,7 +1190,7 @@ mod tests {
11901190
runtime_command: None,
11911191
spawned_children: vec![],
11921192
};
1193-
let unit_row = format_unit_row(&unit, &columns, true);
1193+
let unit_row = format_unit_row_focus(&unit, &columns, true, None);
11941194
assert!(unit_row.contains("srvc"));
11951195
assert!(unit_row.contains("rashad"));
11961196

@@ -2248,6 +2248,71 @@ mod tests {
22482248
assert!(widths[3] >= INSPECT_CRON_SOFT_MIN_WIDTHS[3]);
22492249
}
22502250

2251+
#[test]
2252+
fn format_inspect_cron_status_colors_by_outcome() {
2253+
let success =
2254+
format_inspect_cron_status(Some(&CronExecutionStatus::Success), false);
2255+
assert!(success.contains("success"));
2256+
assert!(success.contains(BRIGHT_GREEN));
2257+
2258+
let running = format_inspect_cron_status(None, false);
2259+
assert!(running.contains("running"));
2260+
assert!(running.contains(LIGHT_BLUE));
2261+
2262+
let failed = format_inspect_cron_status(
2263+
Some(&CronExecutionStatus::Failed("boom".into())),
2264+
false,
2265+
);
2266+
assert!(failed.contains("failed: boom"));
2267+
assert!(failed.contains(RED_BOLD));
2268+
}
2269+
2270+
#[test]
2271+
fn format_inspect_cron_status_respects_no_color() {
2272+
let success =
2273+
format_inspect_cron_status(Some(&CronExecutionStatus::Success), true);
2274+
assert_eq!(success, "success");
2275+
}
2276+
2277+
#[test]
2278+
fn wrap_paragraph_respects_width_and_keeps_words_whole() {
2279+
let text = "the quick brown fox jumps over the lazy dog";
2280+
let lines = wrap_paragraph(text, 20);
2281+
assert!(lines.iter().all(|line| visible_length(line) <= 20));
2282+
let rejoined = lines.join(" ");
2283+
assert_eq!(rejoined, text);
2284+
}
2285+
2286+
#[test]
2287+
fn render_health_report_includes_required_sections() {
2288+
let mut unit = UnitStatus {
2289+
name: "api".into(),
2290+
hash: "hash".into(),
2291+
project: None,
2292+
kind: UnitKind::Service,
2293+
lifecycle: Some(ServiceLifecycleStatus::Stopped),
2294+
state: UnitState::Stopped,
2295+
intent: UnitIntent::Serve,
2296+
health: UnitHealth::Warn,
2297+
process: None,
2298+
uptime: None,
2299+
last_exit: None,
2300+
cron: None,
2301+
metrics: None,
2302+
command: None,
2303+
runtime_command: None,
2304+
spawned_children: Vec::new(),
2305+
};
2306+
unit.intent = UnitIntent::Serve;
2307+
2308+
let rendered = render_health_report(&unit, true).join("\n");
2309+
assert!(rendered.contains("# "));
2310+
assert!(rendered.contains("Severity:"));
2311+
assert!(rendered.contains("TLDR:"));
2312+
assert!(rendered.contains("## Description"));
2313+
assert!(rendered.contains("## Recommended Fix"));
2314+
}
2315+
22512316
#[test]
22522317
fn test_format_uptime_short() {
22532318
assert_eq!(format_uptime_short("30 secs ago"), "< 1m");
