Documentation

Top-level docs for the sre-on-call project.

Operations

  • Deployment — build images → push to ECR → terraform apply → secret hydration.
  • Slack app setup — create the Slack app (manifest or manual), scopes, events, triggers.
  • Testing — synthetic webhook + real Slack alert procedures.
  • Architecture diagram — generated from architecture.d2.
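
The deployment order listed above can be sketched as a dry-run plan. This is only an illustration of the sequencing, not the project's actual tooling: the registry URL, image name, and tag are hypothetical, and the secret hydration step is left as a placeholder because its mechanism is described in the Deployment doc rather than here.

```python
from typing import List, Optional, Tuple

# A step is (name, argv); argv is None when the command is not known here.
Step = Tuple[str, Optional[List[str]]]


def deployment_plan(registry: str, image: str, tag: str) -> List[Step]:
    """Return the deployment steps in the order the docs index lists them.

    All arguments are hypothetical placeholders. Pushing to ECR also
    requires a prior `docker login` against the registry, omitted here.
    """
    ref = f"{registry}/{image}:{tag}"
    return [
        ("build image", ["docker", "build", "-t", ref, "."]),
        ("push to ECR", ["docker", "push", ref]),
        ("terraform apply", ["terraform", "apply"]),
        # Assumption: the hydration command is project-specific, so it is
        # deliberately not guessed at in this sketch.
        ("secret hydration", None),
    ]


def describe(plan: List[Step]) -> List[str]:
    """Render the plan as numbered lines without executing anything."""
    lines = []
    for i, (name, argv) in enumerate(plan, 1):
        cmd = " ".join(argv) if argv else "(see the Deployment doc)"
        lines.append(f"{i}. {name}: {cmd}")
    return lines
```

Keeping the plan as data rather than running commands directly makes the ordering easy to review before anything touches the registry or Terraform state.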

Agents

| Agent | Purpose |
| --- | --- |
| Master | Orchestrates investigations across specialized agents: routes work, synthesizes results, enforces deadlines, and posts the Incident Report. |
| Slack Scanner | Scans Slack channel history for correlated alerts within an investigation window. |
| CloudWatch Logs | Discovers real log groups, then queries AWS CloudWatch Logs Insights around the incident. |
| EKS | Gathers Kubernetes cluster state (pods, events, logs, node conditions). |
| Incident History | Finds similar past incidents via embedding similarity search. |
| Discord Scanner | Scans Discord channel history. Checked in, but not enabled in config.yaml for this deployment. |
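
As a rough illustration of how a Master agent might fan out to specialized agents and enforce a deadline, here is a minimal sketch using `asyncio`. It is not the project's implementation: `run_with_deadline`, the agent callables, and the string results are all hypothetical, and the real agents presumably return the structured types defined in CONTEXT.md.

```python
import asyncio
from typing import Awaitable, Callable, Dict

# Hypothetical agent signature: an async callable returning a text summary.
AgentFn = Callable[[], Awaitable[str]]


async def run_with_deadline(agents: Dict[str, AgentFn], deadline_s: float) -> Dict[str, str]:
    """Run all agents concurrently and keep whatever finishes before the deadline."""
    tasks = {asyncio.create_task(fn()): name for name, fn in agents.items()}
    done, pending = await asyncio.wait(list(tasks), timeout=deadline_s)

    results: Dict[str, str] = {}
    for task in done:
        exc = task.exception()
        results[tasks[task]] = f"error: {exc}" if exc else task.result()

    # Agents that missed the deadline are cancelled and reported as such,
    # so the final report can still be synthesized from partial results.
    for task in pending:
        task.cancel()
        results[tasks[task]] = "timed out"
    await asyncio.gather(*pending, return_exceptions=True)
    return results


async def fast_agent() -> str:
    await asyncio.sleep(0.01)
    return "3 correlated alerts"


async def slow_agent() -> str:
    await asyncio.sleep(5)
    return "never reached"
```

Calling `asyncio.run(run_with_deadline({"slack_scanner": fast_agent, "eks": slow_agent}, 0.2))` returns the fast agent's result and marks the slow one as timed out, which is the behavior a deadline-enforcing orchestrator needs in order to post a report on time.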

Domain vocabulary

See CONTEXT.md at the repo root for the canonical term definitions (AlertContext, Finding, AgentResult, ToolResult, WebhookAdapter, ChatPoster, ReportRenderer, ChannelMessageSource).