Skip to content

Add AI-powered cost anomaly detection: Bedrock insights, ML forecasting, bot detection & FinOps dashboard#1

Open
sunilp303 wants to merge 2 commits into
mainfrom
claude/goofy-elbakyan-392edb
Open

Add AI-powered cost anomaly detection: Bedrock insights, ML forecasting, bot detection & FinOps dashboard#1
sunilp303 wants to merge 2 commits into
mainfrom
claude/goofy-elbakyan-392edb

Conversation

@sunilp303

Copy link
Copy Markdown
Owner

Summary

  • Amazon Bedrock AI analysis - on every alert, calls Nova Lite with the full anomaly context and returns a structured report: severity rating, root-cause narrative, predicted next-week spend risk, immediate actions, and FinOps recommendations
  • ML spend forecasting - pure-Python Holt double-exponential smoothing forecasts next 7-day spend with confidence bounds; Z-score detection (|Z| > 2.0) flags statistical anomalies against 30-day rolling history - no extra Lambda layers or packages needed
  • Bot/DDoS attack detection - reads CloudWatch traffic metrics (Lambda invocations, API Gateway 4xx errors, CloudFront requests) and correlates spikes with cost-anomaly services to surface probable bot attacks
  • DynamoDB cost history - daily per-service snapshots stored with 90-day TTL, feeding the ML engine with real account-specific baselines
  • CloudWatch custom metrics - 7 metrics published to CostMonitor namespace every run (actual/forecast spend, rule-based and statistical anomaly counts, bot signals, per-service scores)
  • FinOps dashboard - rebuilt from 2 widgets to 11: actual-vs-forecast spend chart, cost change %, anomaly counts, bot-signal timeline, 4 KPI single-value tiles, Lambda health, filtered log view
  • Two CloudWatch Alarms - SNS notification if bot signals >= 1 or anomaly service count >= 3

Infrastructure changes (terraform/)

Resource Change
aws_dynamodb_table.cost_history New - PAY_PER_REQUEST, TTL on ttl field
aws_iam_role_policy.lambda_policy Added DynamoDB, bedrock:InvokeModel, cloudwatch:Put/Get, lambda:ListFunctions
aws_lambda_function.billing_monitor Memory 256 ? 512 MB; 6 new env vars
aws_cloudwatch_metric_alarm x2 New - bot attack + high anomaly count alarms
aws_cloudwatch_dashboard.billing_monitor Replaced 2-widget Lambda health view with 11-widget FinOps dashboard
terraform.tfvars Fixed threshold_percentage from -50 ? 15; added AI/ML config block

New variables

Variable Default Purpose
bedrock_model_id amazon.nova-lite-v1:0 Swap to amazon.nova-pro-v1:0 or Claude Haiku for richer analysis
bedrock_region us-east-1 Region serving the Bedrock model
enable_bedrock true Toggle AI analysis off to reduce cost
enable_bot_detection true Toggle traffic spike analysis
cost_history_retention_days 90 DynamoDB TTL window
cloudwatch_namespace CostMonitor Custom metrics namespace

Reviewer notes

  • All five new components wrap their AWS calls in try/except - a Bedrock outage or DynamoDB throttle never blocks the existing rule-based SNS alert
  • ML forecasting is stdlib-only (statistics module) so no Lambda layer is required
  • Bot detection calls lambda:ListFunctions (limited to 20 functions) and GetMetricStatistics; disable with enable_bot_detection = false if IAM is constrained
  • Bedrock costs ~.001 per daily run with Nova Lite; DynamoDB is ~.25/month for this data volume
  • The threshold_percentage = -50 bug in terraform.tfvars (which caused alerts only on 50%+ cost decreases) is fixed to 15

Test plan

  • terraform plan shows expected new resources (DynamoDB table, 2 alarms, updated dashboard, updated IAM policy)
  • Deploy and invoke Lambda manually - verify DynamoDB row written for yesterday
  • After 7+ days of history, verify forecast appears in CloudWatch CostMonitor namespace
  • Verify Bedrock section appears in SNS email when an alert fires
  • Check FinOps dashboard renders all 11 widgets in CloudWatch console

?? Generated with Claude Code

sunilp303 and others added 2 commits April 20, 2026 18:40
…tection, and FinOps dashboard

- HistoricalDataManager: persists daily per-service costs to DynamoDB (90-day TTL)
  for time-series ML analysis
- MLForecastingEngine: pure-Python Holt double-exponential smoothing for spend
  forecasting + Z-score statistical anomaly detection (no external deps)
- BotAttackDetector: correlates CloudWatch traffic spikes (Lambda/API GW/CloudFront)
  with cost anomaly services to flag potential bot/DDoS attacks
- BedrockAnalyzer: calls Amazon Nova Lite via Bedrock with full anomaly context;
  returns severity, root-cause narrative, immediate actions, and FinOps recommendations
- CloudWatchMetricsPublisher: pushes 7 custom metrics to CostMonitor namespace
  (actual spend, forecast, anomaly counts, bot signals, per-service scores)
- Terraform: adds DynamoDB table, Bedrock/CW/Lambda IAM permissions, two CloudWatch
  alarms, and a 4-row FinOps dashboard (actual vs forecast, anomaly counts, KPI tiles)
- All new features fail silently so existing rule-based alerts are never blocked
- Lambda memory bumped 256→512 MB; threshold_percentage fixed from -50 to 15

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Documents Bedrock AI analysis, ML forecasting, Z-score detection, bot
attack detection, DynamoDB cost history, CloudWatch custom metrics,
and the new FinOps dashboard. Adds configuration tables for all new
variables, cost estimates, troubleshooting guides for each new
component, and updated architecture diagram.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant