| name | unified-token-telemetry |
|---|---|
| description | Use this when: track AI token usage across providers, build unified cost dashboard, how much am I spending on AI, compare model costs, cache hit ratio monitoring, LiteLLM token tracking, GitHub Copilot usage, OpenAI usage API, Anthropic usage API, ChatGPT Plus cost tracking, Claude Pro cost tracking, Gemini Advanced cost, Perplexity Pro cost, Cursor usage, Windsurf usage, multi-provider token telemetry, subscription cost tracking, idempotent token collection, hourly token rollups |
| type | skill |
Single Postgres fact table for all AI token usage. Covers LiteLLM (via Prometheus, routing to Ollama/vLLM/cloud APIs), GitHub Copilot (daily API), OpenAI Organization Usage API, Anthropic Admin API (with cache breakdown), fixed subscriptions (ChatGPT Plus/Pro, Claude Pro/Max, Gemini Advanced, Perplexity Pro, Cursor Pro/Ultra, Windsurf Pro/Ultra), and manual imports. Direct Postgres upserts — no Redis, no per-request events.
- Intake: Copy `templates/config.template.yaml` → `instances/{name}/config.yaml`. Fill in all fields for your environment.
- Schema: Run `references/schema.sql` against your Postgres. Replace `{{schema}}` with your schema name (e.g. `telemetry`). Or use `templates/migration.template.sql` for a versioned migration. Schema includes `token_usage` (facts), `watchdog_status` (Phase 1 freshness), and `rollup_watermark` (Phase 2 backfill cursor).
- Adapters: Copy relevant adapters from `templates/adapters/` into your project. Configure each source section in `config.yaml`.
- Worker: Copy `templates/worker.template.js`. Point it at your `config.yaml`. Run on cron or as a long-running process. The worker writes `watchdog_status` rows on every cycle and supports watermark-driven backfill after downtime.
- External freshness watchdog: Copy `templates/consumers/watchdog-reader.template.sh` to your ops box, fill in DB connection placeholders, install as cron `*/5 * * * *`. This is the second layer that detects when the worker itself dies.
- Consumers: Import `templates/consumers/grafana-dashboard.json` into Grafana. Set the `DS_POSTGRES` datasource variable. Optionally deploy `nextjs-api-route.template.ts`.
- Verify: Run `templates/consumers/verification-runner.sh` or execute `references/verification.sql` manually to confirm data is flowing and invariants hold.
Read `references/invariants.md` before writing any adapter code. Non-negotiable rules (13 total; only the headliners are listed here):
- §1: Upserts are replacement, not additive: running twice must produce identical row counts.
- §2: `cached_read_tokens` / `cached_write_tokens` are `NULL` when the provider reports no cache data, never `0`.
- §3: Counter resets (container restarts) are detected by: if `current_counter < last_recorded`, treat current as delta.
- §4: `measurement_basis` must accurately reflect how tokens were counted, because it affects cost math downstream.
- §7: `UNIQUE NULLS NOT DISTINCT` is required on the conflict key for PG 15+ (or paired partial-unique-indexes for PG <15).
- §8: Every HTTP call has a 15-second socket deadline.
- §9: Exit code 2 on partial source failure (vs 0 for all-ok, 1 for fatal).
- §10: Freshness liveness uses `MAX(updated_at)`, NEVER `MAX(created_at)` (idempotent replay sources falsely look stale otherwise).
- §11: `SOURCE_STATE_OVERRIDE` for sources that are configured-but-not-operating ('paused' status).
- §12: Watermarks must be wall-clock aligned; snap forward on read.
- §13: Use Prometheus `query_range` for multi-window backfill (1 round-trip vs N).
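
Three of the rules above (§3, §9, §12) are mechanical enough to sketch in code. The following is a minimal illustration, not the worker's actual implementation: the function names, the `fatal` flag, the one-hour window, and the reading of "snap forward" as rounding up to the next window boundary are all assumptions made for this example.

```javascript
// Hypothetical helpers illustrating invariants §3, §9 and §12.
// None of these names come from the templates.

const HOUR_MS = 60 * 60 * 1000; // assumed rollup window

// §3: a monotonic counter that drops below the last recorded value has been
// reset (e.g. container restart), so the current value is itself the delta.
function counterDelta(lastRecorded, current) {
  return current < lastRecorded ? current : current - lastRecorded;
}

// §9: 0 when every source succeeded, 2 when at least one source failed,
// 1 when the run as a whole was fatal (assumed to be signalled by the caller).
function exitCodeFor(results, fatal = false) {
  if (fatal) return 1;
  return results.some((r) => !r.ok) ? 2 : 0;
}

// §12: snap a watermark forward to the next wall-clock window boundary (UTC).
// A value already on a boundary is returned unchanged.
function snapForward(date, windowMs = HOUR_MS) {
  return new Date(Math.ceil(date.getTime() / windowMs) * windowMs);
}
```

For instance, `counterDelta(100, 30)` yields `30` because the counter restarted, while `counterDelta(100, 150)` yields `50`.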
| Adapter | Granularity | Cache Data | measurement_basis | Cost |
|---|---|---|---|---|
| litellm-prometheus | hour | read + write | exact | computed from pricing config |
| copilot-daily | day | none (NULL) | provider_aggregate | none |
| openai-usage | day | none (NULL) | exact | from API |
| anthropic-usage | day | read + write | exact | from API |
| subscription | month | none (NULL) | derived_estimate | fixed monthly |
| manual-import | any | optional | varies | if provided |
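
The "none (NULL)" entries in the Cache Data column are where adapters most easily violate §2, typically through a default such as `value || 0`. As a rough sketch of how an adapter might map raw provider fields to row columns, assuming a hypothetical `raw` object with `cacheRead` / `cacheWrite` properties (these input names are invented for illustration):

```javascript
// Hypothetical normalizer: maps raw provider cache counts to row columns.
// Only the output column names come from the schema; the input shape is assumed.
function cacheFields(raw) {
  // Keep a real number (including a genuinely reported 0); anything missing
  // or non-numeric becomes null so "no cache data" is never stored as 0.
  const pick = (v) => (typeof v === "number" && Number.isFinite(v) ? v : null);
  return {
    cached_read_tokens: pick(raw.cacheRead),
    cached_write_tokens: pick(raw.cacheWrite),
  };
}
```

With this shape, an adapter such as `copilot-daily` that never receives cache counts would pass `{}` and get `null` for both columns, while a provider that explicitly reports zero cache reads keeps the `0`.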
Full-stack gateways with their own observability (Bifrost, Helicone, Portkey, WSO2, Kong) — they already export to Prometheus/Grafana natively.
Local inference backends (Ollama, vLLM) — covered implicitly when routed through LiteLLM.
- `AGENT_DEPLOY.md`: read first. Deployment walkthrough, failure modes, and decision points for agents
- `references/schema.sql`: canonical DDL (token_usage + watchdog_status + rollup_watermark)
- `references/invariants.md`: non-negotiable data rules (13 numbered invariants)
- `references/operational-lessons.md`: real-world failures and durable fixes (9 lessons covering encoding bugs, npm half-installs, schema migrations, watermark alignment, etc.)
- `references/adapter-contract.md`: adapter interface specification
- `references/grafana-queries.sql`: panel SQL queries
- `references/verification.sql`: smoke test queries
- `references/architecture.md`: data flow, freshness detection, backfill design
- `templates/consumers/watchdog-reader.template.sh`: external freshness watchdog (Phase 1 D2)
- `.env.example`: all required environment variables with placeholder values
- `references/verification.sql` queries return rows for each enabled source.
- No row has `cached_read_tokens = 0` where the provider reports no cache (must be `NULL`).
- Running the worker twice for the same window produces identical row count (idempotent).
- Watermark advances correctly: `SELECT * FROM rollup_watermark` shows aligned timestamps and recent `updated_at`.
- Worker writes `watchdog_status` rows on every cycle (check `SELECT status, COUNT(*) FROM watchdog_status GROUP BY status`).
- External watchdog reader (cron) exits 0: all enabled sources are `ok` or `paused`.
- Grafana dashboard loads with data in all three panels.
- `grep -r "192.168" ./ | grep -v "instances/"` returns zero matches.
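
The watchdog item in this checklist reduces to a small decision rule. The shipped reader is a shell script; the sketch below only restates its pass condition in JavaScript for illustration. The `stale` status, the `maxAgeMs` threshold, the input shape, and the non-zero exit value of `1` are assumptions; only `ok` and `paused` are named in the checklist.

```javascript
// Hypothetical freshness logic; not taken from watchdog-reader.template.sh.

// Classify one source. lastUpdatedAt is assumed to hold MAX(updated_at) for
// that source as a Date (or null if no rows exist); now is epoch milliseconds.
function sourceStatus({ lastUpdatedAt, paused }, now, maxAgeMs) {
  if (paused) return "paused"; // configured but intentionally not operating
  if (!lastUpdatedAt) return "stale"; // no rows yet
  return now - lastUpdatedAt.getTime() <= maxAgeMs ? "ok" : "stale";
}

// Exit 0 only when every enabled source is ok or paused.
function watchdogExitCode(statuses) {
  return statuses.every((s) => s === "ok" || s === "paused") ? 0 : 1;
}
```

Basing the check on `updated_at` rather than `created_at` means a source whose rows are replaced in place on every cycle still counts as fresh.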