Skip to content

alialshehriar/ai-radar

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

AI Radar

Predictive AI radar. Surfaces tools, models, papers and posts before they trend — by tracking growth-rate inflection rather than absolute popularity.

Trending lists are noise. By the time claude-code hits 100K stars or vllm shows up on every newsletter, it's already too late to be early. AI Radar inverts the ranking: items that are already huge get penalised, and items showing acceleration in the early window get boosted. An LLM judge then re-ranks the candidates by genuine novelty + expected impact + signal-vs-noise.

It's a small system you can run on a $5 VPS. Live example: https://alkinani.live/#radar.


What it does

sources ──▶  crawler  ──▶  discovery filter  ──▶  LLM judge  ──▶  translator  ──▶  API/UI
 (5)         every 4m       reject already-viral    novelty +       AR titles +     ranked
                            + age-cap + slow-growth  impact + signal Najdi summary   feed
                                                     scores 1-10

Five sources, all free:

Source What it watches How
GitHub Repos created in last 45d with growth ≥1.5 stars/day Search API + watched-orgs (NEW repos only — no megastar pollution)
Hacker News Stories <36h old, AI-keyword filtered Firebase API
Reddit r/LocalLLaMA, r/MachineLearning, r/OpenAI, etc — /rising + /new Public .json (or OAuth if datacenter IP is blocked)
Hugging Face Spaces sorted by trendingScore + likes Public /api/spaces
Product Hunt Daily AI launches Public Atom feed

One agent for X (Twitter): The X scraper runs on a machine with a logged-in browser (Chrome with CDP on :9223), tracks tracked-handle timelines + search queries, and POSTs to /api/radar/ingest. It runs on a separate machine because datacenter IPs get blocked by X.


What makes it predictive (not popularity-tracking)

Three mechanisms, in order of impact:

1. Discovery filter — reject already-viral items before the DB

// scripts/radar-config.json
"discovery": {
  "maxAbsoluteEngagement": {
    "github": 3000,        // > 3000 stars = already discovered
    "hackernews": 400,     // > 400 points = already on the front page
    "huggingface": 400,
    "reddit": 1500
  },
  "maxAgeDays": {
    "github": 45,          // > 45 days old = past the inflection window
    "hackernews": 1.5,
    "huggingface": 90,
    "reddit": 0.75
  },
  "minGrowthRate": {
    "github": 1.5,         // < 1.5 stars/day = no traction
    "huggingface": 0.3
  },
  "blockedRepos": [        // hard-blocked household names (extend as needed)
    "anthropics/claude-code",
    "openai/codex",
    "ggml-org/llama.cpp",
    "vllm-project/vllm",
    "huggingface/transformers"
    // ...etc
  ]
}

2. Score formula — growth-rate beats absolute count

// scripts/radar-crawl.mjs (simplified)
discoveryScore = (
    weights.velocity      * log10(1 + velocity_per_min * 60) * 4   // /hour
  + weights.growthRate    * log10(1 + stars_per_day)
  + weights.novelty       * exp(-age_hours / 12) * 5               // 12h half-life
  + weights.authority     * min(authority, 2.5)                    // capped
  + weights.crossSource   * (1 + (sources_count - 1) * 0.6)        // boost when 2+ sources agree
  + weights.absoluteEngagement * log10(1 + engagement)             // ← weight is NEGATIVE
);

The trick is the negative absoluteEngagement weight: a 100K-star repo gets a penalty relative to a 50-star repo, so the system actively works against established giants.

Items get velocity from successive snapshots — every crawl writes a row to radar_snapshots(signal_id, engagement, captured_at), so the second crawl of the same item already has a Δengagement / Δminutes. Cross-source detection (same URL appearing on HN + Reddit + GitHub within 3h) further boosts the rank.

3. LLM judge — final filter on novelty/impact/signal

After the heuristic ranking, every new candidate is sent to Kimi K2 (or Anthropic if you have a key) with a strict rubric:

novelty (1-10): genuinely new vs already-known/recycled
impact  (1-10): will it actually matter / trend
signal  (1-10): AI-relevance + signal vs noise

Plus a one-line Najdi (Saudi Arabic) verdict like:

  • "إطلاق Google لعامل ترميز جديد يدمج Gemini"8.7/10
  • "شكوى Reddit على Codex بدون جديد"2.7/10
  • "meta-discussion عن نقص العتاد بسبب AI"3.3/10

The composite judge_score = (novelty + impact + signal) / 3 folds into the API ranking via score + judge_score * 0.8, so a high-judge / low-popularity item beats a low-judge / high-popularity item every time.

Items with signal < 3 get pruned from the DB on the next judge pass.


Architecture

┌────────────── VPS ────────────────────────┐
│ cron */4 min                              │
│   ├── radar-crawl.mjs   (collect + filter) │
│   ├── radar-judge.mjs   (LLM rubric)       │
│   └── radar-translate.mjs (Arabic + Najdi) │
│                                            │
│ Express server                             │
│   ├── GET  /api/radar      (ranked feed)  │
│   ├── GET  /api/radar/breaking            │
│   ├── POST /api/radar/refresh  (admin)    │
│   └── POST /api/radar/ingest  (admin) ←───┼─── X agent on local machine
└────────────────────────────────────────────┘

Storage: SQLite via better-sqlite3 (single file, radar.db). No Postgres, no Redis, no queue. The radar_signals table holds current state; radar_snapshots holds engagement-over-time so velocity can be computed.


Quickstart

1. Clone + install

git clone https://github.com/alialshehriar/ai-radar.git
cd ai-radar
cp .env.example .env
# Edit .env — at minimum set RADAR_REFRESH_SECRET and one of {ANTHROPIC_API_KEY, KIMI_API_KEY}
cp scripts/radar-config.example.json scripts/radar-config.json
# Edit radar-config.json — pick subreddits, watched orgs, etc.
cd server && npm install && cd ..

2. Smoke-test crawler (no DB write)

RADAR_DRY=1 node scripts/radar-crawl.mjs

This collects from all enabled sources and prints the top items by engagement. Verify you're seeing real things, not 80K-star megastars.

3. Run the server + a real crawl

# Terminal A
node server/server.mjs

# Terminal B
node scripts/radar-crawl.mjs   # populate
node scripts/radar-judge.mjs   # rate them
node scripts/radar-translate.mjs   # Arabic translation (optional)

# Read the result
curl 'http://127.0.0.1:3002/api/radar?limit=10' | jq .

4. Wire the cron on a real server

bash scripts/install-cron.sh

This registers three cron entries (*/4, 1-57/4, 2-58/4 — staggered by one minute each) under the current user so crawl → judge → translate run in sequence every four minutes.

5. Run the X agent (optional)

On a machine that has Chrome running with CDP enabled and a logged-in X session:

# Start Chrome with CDP — see docs/x-agent.md for details
chrome --remote-debugging-port=9223 --user-data-dir=/tmp/x-agent-profile

# Then run the agent (it'll POST collected tweets to your radar instance)
ALK_RADAR_INGEST_URL=https://your-host.example/api/radar/ingest \
ALK_RADAR_SECRET=your-secret \
node scripts/x-agent.mjs

The agent uses agent-browser under the hood for resilient CDP automation. Schedule via launchd (macOS) or systemd timer (Linux).


API

GET /api/radar

Returns the ranked feed.

Query param Default Notes
limit 30 1–100
source * One of github, hackernews, reddit, huggingface, producthunt, x
category * One of tool, paper, news, tweet, discussion
sinceHours 24 1–168

Response shape:

{
  "items": [
    {
      "id": 142,
      "source": "hackernews",
      "url": "https://...",
      "title": "AlphaEvolve: Gemini-powered coding agent...",
      "titleAr": "AlphaEvolve: وكيل برمجة مدعوم من Gemini...",
      "summaryAr": "إطلاق Google لعامل ترميز جديد يدمج Gemini",
      "score": 20.64,
      "velocity": 0.14,        // engagement units per minute
      "crossSource": 2,        // appeared on 2 sources within 3h
      "breaking": true,
      "judge": {
        "novelty": 8, "impact": 9, "signal": 9,
        "score": 8.7,
        "verdict": "إطلاق Google لعامل ترميز جديد يدمج Gemini"
      },
      "meta": { "stars": 47, "starsPerDay": 23.5, "language": "Python" }
    }
  ],
  "stats": { "github": { "count": 21, "lastAt": ... }, ... },
  "fetchedAt": 1778116100000
}

GET /api/radar/breaking

Items with breaking=1 (velocity >= per-source p90), within last 6h. Useful for a "🔥 Breaking now" pill.

POST /api/radar/refresh (admin)

Triggers a crawl out of band. Headers: X-Radar-Secret: <RADAR_REFRESH_SECRET>. Returns immediately; crawl runs detached.

POST /api/radar/ingest (admin)

Accepts pre-collected items from external scrapers (the X agent uses this). Same auth header. Body:

{
  "items": [
    {
      "source": "x",
      "externalId": "1842394...",
      "url": "https://x.com/...",
      "title": "...",
      "author": "@simonw",
      "engagement": 412,
      "authority": 2.5,
      "postedAt": 1778116100000,
      "meta": { "likes": 320, "retweets": 41 }
    }
  ]
}

Costs

The reference deployment (alkinani.live) runs on a $5/mo Hostinger VPS + Kimi K2 API for translation/judge. At cron */4:

  • ~360 cycles/day × 40 items/cycle = ~14k items judged + ~14k items translated
  • Kimi K2 turbo: ~$2/day total LLM cost
  • VPS: $5/mo
  • GitHub anonymous (60 req/hr) covers small footprints; with GITHUB_TOKEN (5000 req/hr) you can scale 80×

If you swap to Anthropic Haiku for translate + judge, expect ~3-4× the cost but noticeably better Arabic and tighter rubric scoring.


Frontend (optional)

client/Radar.tsx is a React 19 + Tailwind 4 + Motion component that consumes /api/radar. It includes:

  • Source filter pills (with live counts)
  • Category filter (All / 🔥 Breaking / Tools / News / Discussion)
  • Pagination (6 per page)
  • Hover-revealed verdict tooltip (Najdi summary + judge breakdown)
  • Auto-refresh every 60s
  • RTL-aware (mirrors layout in Arabic)

It's wired to a specific design token system (ember-500, tide-500, ink-*) — adapt to your own palette by find/replacing those classes.


Tuning

The whole system is configured by scripts/radar-config.json. Three knobs you'll touch most:

  1. discovery.blockedRepos — household names to ignore. Add anything that keeps appearing in the top despite filters.
  2. scoring.weights — if too much noise sneaks through, increase velocity and novelty, decrease authority. If real news is missed, the opposite.
  3. reddit.subreddits, github.watchedOrgs, x.accounts — your taste, your radar. Fewer high-signal sources beat many noisy ones.

For finer X tuning see the x block — freshnessWindowMinutes, minLikesForFreshTweet, and per-handle weight (1.0–3.0 authority multiplier).


Project structure

ai-radar/
├── server/                  # Express API
│   ├── server.mjs
│   └── package.json
├── scripts/
│   ├── radar-crawl.mjs      # multi-source collector + scorer
│   ├── radar-judge.mjs      # LLM novelty/impact/signal rubric
│   ├── radar-translate.mjs  # Kimi/Anthropic Arabic translation
│   ├── x-agent.mjs          # X (Twitter) scraper via agent-browser CDP
│   ├── radar-config.example.json
│   └── install-cron.sh
├── client/                  # Optional React component
│   ├── Radar.tsx
│   └── radar-client.ts
├── docs/
│   ├── architecture.md
│   ├── deployment.md
│   └── x-agent.md
├── .env.example
├── LICENSE                  # MIT
├── package.json
└── README.md

License

MIT — see LICENSE. Built by Ali Alkinani (offshore, between work shifts).

If you ship something useful with it, ping @o0a98.

About

AI Radar — predictive trend-detection radar by Ali Alkinani (alkinani.live). Surfaces tools, models, papers and posts BEFORE they trend, by tracking growth-rate inflection. Multi-source crawler + LLM judge + Arabic translation.

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors