itallstartedwithaidea/reddit

Reddit-Sourced PPC Blog Engine

English | Français | Español | 中文 | Nederlands | Русский | 한국어

Automated AI blog pipeline that discovers Google Ads & PPC questions from Reddit, generates expert-grade articles with Claude, and deploys to Cloudflare Pages — twice daily, fully unattended.


What This Does

This engine runs on a Cloudflare Worker, triggered twice daily by cron. Each run:

  1. Discovers trending PPC questions from Reddit via SearchAPI (SERP-based, no Reddit API required)
  2. Filters for quality — minimum engagement, blocklist, NSFW skip, duplicate detection
  3. Deduplicates using Vectorize semantic similarity to avoid publishing the same topic twice
  4. Generates a full 1,500-3,000 word expert article using Claude (Anthropic)
  5. Validates the HTML — JSON-LD schemas, GTM, meta tags, AI disclosure, Reddit attribution
  6. Publishes via GitHub API commit, triggering Cloudflare Pages deploy
  7. Verifies the live URL returns 200 before marking as published
  8. Notifies you via email + SMS (AT&T gateway, no Twilio needed)
  9. Pings IndexNow to get the post indexed within hours
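
Step 2 of the list above can be pictured as a pure predicate over each discovered thread. This is a minimal sketch, not the contents of gates.ts: the `Candidate` shape, the field names, the thresholds, and the blocklist terms are all assumptions made for illustration.

```typescript
// Hypothetical shape of a discovered thread; field names are assumptions.
interface Candidate {
  redditId: string;
  title: string;
  snippet: string;
  upvotes: number;
  comments: number;
  nsfw: boolean;
}

// Hypothetical filter settings; the real thresholds are not documented here.
interface FilterConfig {
  minUpvotes: number;
  minComments: number;
  blocklist: string[];
}

type FilterResult = { keep: true } | { keep: false; reason: string };

// Applies the four gates named in step 2: NSFW skip, duplicate detection,
// minimum engagement, and blocklist.
function passesQualityGate(
  c: Candidate,
  cfg: FilterConfig,
  seenIds: Set<string>
): FilterResult {
  if (c.nsfw) return { keep: false, reason: "nsfw" };
  if (seenIds.has(c.redditId)) return { keep: false, reason: "duplicate" };
  if (c.upvotes < cfg.minUpvotes || c.comments < cfg.minComments) {
    return { keep: false, reason: "low_engagement" };
  }
  const text = (c.title + " " + c.snippet).toLowerCase();
  for (const term of cfg.blocklist) {
    if (text.indexOf(term.toLowerCase()) !== -1) {
      return { keep: false, reason: "blocklist:" + term };
    }
  }
  return { keep: true };
}
```

Returning a reason alongside the verdict makes it easy to record why a thread ended up in the skipped status.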

Every generated article includes full SEO structured data (BlogPosting, QAPage, BreadcrumbList), E-E-A-T author signals, AI disclosure, and a link back to the source Reddit discussion.


Credits & Community Sources

This project exists because of the incredible PPC practitioner communities on Reddit, LinkedIn, Quora, and beyond. Every article generated by this engine credits its source discussion and links back to the original thread.

Reddit Communities

| Subreddit | Focus | Link |
| --- | --- | --- |
| r/PPC | Pay-per-click advertising, Google Ads, Meta Ads | reddit.com/r/PPC |
| r/googleads | Google Ads-specific questions and strategies | reddit.com/r/googleads |
| r/digital_marketing | Broad digital marketing including paid media | reddit.com/r/digital_marketing |
| r/adwords | Legacy Google AdWords community (still active) | reddit.com/r/adwords |
| r/marketing | General marketing strategy and industry discussion | reddit.com/r/marketing |
| r/SEO | Search engine optimization (cross-channel context) | reddit.com/r/SEO |
| r/DigitalMarketing | Digital strategy, analytics, paid media | reddit.com/r/DigitalMarketing |
| r/FacebookAds | Meta/Facebook advertising | reddit.com/r/FacebookAds |
| r/AmazonSeller | Amazon PPC and marketplace advertising | reddit.com/r/AmazonSeller |

Popular r/PPC Threads That Inspired This Project

LinkedIn Communities & Thought Leaders

Quora Topics

Open Source Tools Referenced

| Project | Description | Link |
| --- | --- | --- |
| Google Ads MCP | Python MCP server with 29 tools for Google Ads API | github.com/itallstartedwithaidea/google-ads-mcp |
| Google Ads Gemini Extension | Gemini CLI extension with 22 tools | github.com/itallstartedwithaidea/google-ads-gemini-extension |
| Google Ads Skills | Anthropic Claude Agent Skills for Google Ads | github.com/itallstartedwithaidea/google-ads-skills |
| Google Ads API Agent | Full Python agent with 28 actions | github.com/itallstartedwithaidea/google-ads-api-agent |
| GoogleAdsAgent.ai | The complete website and tools | github.com/itallstartedwithaidea/googleadsagent-site |

Architecture

                                    ┌─────────────────┐
                                    │  Cron Scheduler  │
                                    │  7am + 7pm ET    │
                                    └────────┬────────┘
                                             │
                                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        Blog Engine Worker                           │
│                                                                     │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────────────┐│
│  │ Discover  │──▶│  Filter  │──▶│ Generate │──▶│    Validate      ││
│  │ (SERP)   │   │ & Dedupe │   │ (Claude) │   │ (schema, links)  ││
│  └──────────┘   └──────────┘   └──────────┘   └────────┬─────────┘│
│       │              │              │                    │          │
│       ▼              ▼              ▼                    ▼          │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────────────┐│
│  │SearchAPI │   │Vectorize │   │  Claude   │   │   GitHub API     ││
│  │          │   │ D1 (DB)  │   │  API      │   │   Commit + Push  ││
│  └──────────┘   └──────────┘   └──────────┘   └────────┬─────────┘│
│                                                         │          │
└─────────────────────────────────────────────────────────┼──────────┘
                                                          │
                          ┌───────────────────────────────┼──────────┐
                          │                               ▼          │
                          │  ┌──────────┐   ┌──────────────────────┐ │
                          │  │ Verify   │──▶│ Notify (Email + SMS) │ │
                          │  │ (200 OK) │   │ IndexNow Ping        │ │
                          │  └──────────┘   └──────────────────────┘ │
                          │          Cloudflare Pages Deploy         │
                          └──────────────────────────────────────────┘

State Machine

Every post flows through these statuses:

discovered → queued → generating → draft_ready → published
                │                       │
                ▼                       ▼
             skipped                  failed
         (filtered out)          (retryable error)
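
The diagram above can be encoded as a transition table so that an illegal status change is rejected before it reaches the database. This is a sketch under assumptions: the type and function names are hypothetical, and only the arrows drawn in the diagram are included. The diagram labels `failed` as retryable but does not show which status a retry returns to, so no outgoing edge is modelled for it here.

```typescript
type PostStatus =
  | "discovered"
  | "queued"
  | "generating"
  | "draft_ready"
  | "published"
  | "skipped"
  | "failed";

// Only the edges drawn in the diagram; terminal statuses map to an empty list.
const TRANSITIONS: Record<PostStatus, readonly PostStatus[]> = {
  discovered: ["queued"],
  queued: ["generating", "skipped"],
  generating: ["draft_ready"],
  draft_ready: ["published", "failed"],
  published: [],
  skipped: [],
  failed: [], // retry target is not specified in the diagram
};

function canTransition(from: PostStatus, to: PostStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// Throws instead of silently writing an impossible status.
function assertTransition(from: PostStatus, to: PostStatus): void {
  if (!canTransition(from, to)) {
    throw new Error("illegal status transition: " + from + " -> " + to);
  }
}
```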

Cloudflare Services Used

| Service | Purpose |
| --- | --- |
| Workers | Blog engine runtime (cron-triggered) |
| D1 | Post tracker database (SQLite): status, budgets, run logs |
| KV | Blog manifest for dynamic index page |
| R2 | Hero image storage |
| Workers AI | Embeddings (bge-base-en-v1.5) + image generation (Stable Diffusion XL) |
| Vectorize | Semantic dedup index (cosine similarity on blog-topics) |
| Pages | Static site hosting + Functions |
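
In production the similarity search runs inside Vectorize, but the decision it feeds is simple enough to show locally. The sketch below computes cosine similarity between two embedding vectors and applies the 0.85 threshold listed under Safeguards. The function names are hypothetical, and the tiny two-dimensional vectors in the usage note stand in for the 768-dimension embeddings.

```typescript
// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length === 0 || a.length !== b.length) {
    throw new Error("vectors must be non-empty and the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// A candidate is a near-duplicate if any published topic meets the threshold.
function isNearDuplicate(
  candidate: number[],
  published: number[][],
  threshold: number = 0.85
): boolean {
  return published.some((v) => cosineSimilarity(candidate, v) >= threshold);
}
```

For example, `isNearDuplicate([1, 0], [[0.9, 0.1]])` is true because the two vectors point in almost the same direction, while `isNearDuplicate([1, 0], [[0, 1]])` is false.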

Quick Start

Prerequisites

Step 1: Clone the Repo

git clone https://github.com/itallstartedwithaidea/reddit.git
cd reddit
npm install

Step 2: Create Cloudflare Resources

# Create D1 database
wrangler d1 create blog-tracker
# Copy the database_id into wrangler.toml

# Create KV namespace
wrangler kv namespace create BLOG_DATA
# Copy the id into wrangler.toml

# Create R2 bucket
wrangler r2 bucket create blog-assets

# Create Vectorize index
wrangler vectorize create blog-topics --dimensions=768 --metric=cosine

Step 3: Configure wrangler.toml

Update the binding IDs in wrangler.toml with the values from Step 2:

[[d1_databases]]
binding = "DB"
database_name = "blog-tracker"
database_id = "YOUR_D1_ID_HERE"

[[kv_namespaces]]
binding = "BLOG_DATA"
id = "YOUR_KV_ID_HERE"

Step 4: Set Secrets

wrangler secret put ANTHROPIC_API_KEY
# Paste your Anthropic API key

wrangler secret put SEARCHAPI_KEY
# Paste your SearchAPI key

wrangler secret put GITHUB_TOKEN
# Paste your GitHub personal access token

# Notifications use Gmail API via the admin's Google OAuth session
# No additional API keys needed for email/SMS

wrangler secret put EMAIL_TO
# Enter: your@email.com

wrangler secret put SMS_GATEWAY_TO
# Enter: your 10-digit phone number (e.g. 5551234567)
# SMS is sent via AT&T gateway: number@txt.att.net
# For other carriers:
#   T-Mobile: number@tmomail.net
#   Verizon: number@vtext.com
#   Sprint: number@messaging.sprintpcs.com

wrangler secret put CRON_SECRET
# Enter a random secret string for auth

Step 5: Initialize the Database

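# --remote applies the schema to the deployed D1 database;
# recent Wrangler versions run against a local copy by default
wrangler d1 execute blog-tracker --remote --file=./schema.sql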

Step 6: Deploy

wrangler deploy

Step 7: Test with Dry Run

curl -X POST "https://your-worker.workers.dev/run?key=YOUR_CRON_SECRET&dry_run=true"

Step 8: Trigger a Real Run

curl -X POST "https://your-worker.workers.dev/run?key=YOUR_CRON_SECRET"

Project Structure

reddit/
├── src/
│   ├── index.ts           # Worker entry: cron handler, HTTP routes, pipeline orchestration
│   ├── types.ts           # TypeScript interfaces for all data types
│   ├── db.ts              # D1 database operations: posts, budget, run logs
│   ├── ingest.ts          # Reddit thread discovery via SearchAPI SERP
│   ├── gates.ts           # Quality filters, blocklist, Vectorize semantic dedup
│   ├── topics.ts          # Topic classification into 12 PPC clusters
│   ├── generate.ts        # Claude article generation + related post linking
│   ├── template.ts        # Full HTML template (matches googleadsagent.ai design)
│   ├── images.ts          # Workers AI hero image generation (Stable Diffusion XL)
│   ├── validate.ts        # Post-generation validation (10+ checks)
│   ├── publish.ts         # GitHub API commit + deploy verification
│   ├── notify.ts          # Email + SMS notifications + IndexNow
│   └── budget.ts          # Cost tracking, daily/monthly caps
├── schema.sql             # D1 database schema
├── wrangler.toml          # Cloudflare Worker configuration
├── package.json
├── tsconfig.json
└── README.md

Configuration

Environment Variables (wrangler.toml)

| Variable | Default | Description |
| --- | --- | --- |
| MAX_POSTS_PER_RUN | 2 | Maximum posts to publish per cron run |
| MAX_DAILY_GENERATIONS | 4 | Maximum posts per day (2 runs × 2 posts) |
| MAX_MONTHLY_USD | 50 | Monthly budget cap (Claude + SearchAPI costs) |
| SITE_URL | https://googleadsagent.ai | Your site's base URL |
| GITHUB_REPO | itallstartedwithaidea/googleadsagent-site | GitHub repo for git commits |
| GITHUB_BRANCH | main | Branch to commit to |
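
Values set in wrangler.toml typically reach the Worker as strings, so the numeric limits need parsing before use. The sketch below is one way to do that, using the defaults from the table. The function names and the choice to map unparseable input to 0 (so a typo blocks generation rather than removing the cap) are assumptions, not the engine's documented behavior.

```typescript
interface EngineLimits {
  maxPostsPerRun: number;
  maxDailyGenerations: number;
  maxMonthlyUsd: number;
}

type LimitKey = "MAX_POSTS_PER_RUN" | "MAX_DAILY_GENERATIONS" | "MAX_MONTHLY_USD";
type RawEnv = Partial<Record<LimitKey, string>>;

// Unset or blank -> documented default; garbage or negative -> 0 (fail closed).
function parseLimit(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const n = Number(raw);
  return isFinite(n) && n >= 0 ? n : 0;
}

function parseLimits(env: RawEnv): EngineLimits {
  return {
    maxPostsPerRun: parseLimit(env.MAX_POSTS_PER_RUN, 2),
    maxDailyGenerations: parseLimit(env.MAX_DAILY_GENERATIONS, 4),
    maxMonthlyUsd: parseLimit(env.MAX_MONTHLY_USD, 50),
  };
}
```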

Topic Clusters

Every article is classified into one of 12 PPC topic clusters:

| Cluster | Keywords |
| --- | --- |
| Bidding | bid, tCPA, tROAS, smart bidding, manual CPC |
| Creative | RSA, headlines, ad copy, extensions, sitelinks |
| Audiences | targeting, remarketing, custom segments, broad match |
| Measurement | conversion tracking, GA4, attribution, enhanced conversions |
| Automation | scripts, rules, API, AI Max, optimization score |
| Policy | disapproved, suspended, trademark, appeal |
| Shopping | shopping, merchant center, PMax, product feed |
| Video | YouTube, demand gen, bumper, TrueView |
| Local | local campaigns, GMB, location extensions |
| Budget | CPC, CPA, ROAS, daily budget, pacing |
| Account Structure | ad groups, naming conventions, SKAG, Hagakure |
| General | Catch-all for broad strategy questions |
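
One plausible way to map a thread onto these clusters is keyword counting with General as the fallback. The sketch below uses four of the clusters from the table. The scoring rule (most keyword hits wins, first cluster wins ties) and the function name are assumptions and may differ from what topics.ts does.

```typescript
const CLUSTERS: Array<{ name: string; keywords: string[] }> = [
  { name: "Bidding", keywords: ["bid", "tcpa", "troas", "smart bidding", "manual cpc"] },
  { name: "Creative", keywords: ["rsa", "headlines", "ad copy", "extensions", "sitelinks"] },
  { name: "Measurement", keywords: ["conversion tracking", "ga4", "attribution", "enhanced conversions"] },
  { name: "Shopping", keywords: ["shopping", "merchant center", "pmax", "product feed"] },
];

function classifyTopic(text: string): string {
  const haystack = text.toLowerCase();
  let best = "General";
  let bestHits = 0;
  for (const cluster of CLUSTERS) {
    let hits = 0;
    for (const keyword of cluster.keywords) {
      // Leading word boundary only: "bid" matches "bidding" and "bids",
      // but "rsa" does not match the middle of "conversation".
      if (new RegExp("\\b" + keyword).test(haystack)) hits++;
    }
    if (hits > bestHits) {
      best = cluster.name;
      bestHits = hits;
    }
  }
  return best;
}
```

A title such as "Should I switch from tCPA to tROAS with smart bidding?" lands in Bidding, while a question with no matching keywords falls through to General.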

How Articles Are Generated

Prompt Engineering

Each article is generated by Claude with a system prompt that:

  • Sets the voice as John Williams, Senior Paid Media Specialist ($350M+ managed)
  • Provides the Reddit thread context (title, subreddit, snippet)
  • Requires 1,500-3,000 words of substantive, actionable content
  • Enforces callout boxes, comparison tables, stat cards
  • Requires specific benchmarks and real campaign data ranges
  • Prohibits fabricated Redditor quotes
  • Mandates a "Bottom Line" section with numbered action items

HTML Template

Every article includes:

  • 3 JSON-LD blocks: BlogPosting (with full author entity + sameAs), BreadcrumbList, QAPage
  • Full meta tags: og:, twitter:, canonical, robots
  • GTM tracking: GTM-NR7F6P92
  • AI disclosure: Visible block crediting Reddit source + AI assistance
  • Author E-E-A-T signals: Person schema with name, jobTitle, sameAs links
  • Responsive design: Dark theme matching googleadsagent.ai
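
As an illustration of the first JSON-LD block, the sketch below builds a BlogPosting script tag with a Person author and sameAs links. It is not the code in template.ts: the `ArticleMeta` interface and the function name are hypothetical, and only a handful of schema.org properties are shown.

```typescript
interface ArticleMeta {
  title: string;
  description: string;
  url: string;
  datePublished: string; // ISO 8601 date, e.g. "2026-01-15"
  authorName: string;
  authorJobTitle: string;
  authorSameAs: string[];
}

function buildBlogPostingJsonLd(meta: ArticleMeta): string {
  const data = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline: meta.title,
    description: meta.description,
    datePublished: meta.datePublished,
    mainEntityOfPage: meta.url,
    author: {
      "@type": "Person",
      name: meta.authorName,
      jobTitle: meta.authorJobTitle,
      sameAs: meta.authorSameAs,
    },
  };
  // Escape "<" so a title containing "</script>" cannot close the tag early.
  // "\u003c" is a valid JSON string escape, so the payload still parses.
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return '<script type="application/ld+json">' + json + "</script>";
}
```

Serializing with `JSON.stringify` rather than string concatenation is what keeps the "present and valid JSON" validation check easy to satisfy.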

Validation Checks (10+)

Before publishing, every article is validated for:

  1. Word count within bounds (800-5,000)
  2. GTM snippet present
  3. BlogPosting JSON-LD present and valid JSON
  4. BreadcrumbList JSON-LD present
  5. QAPage JSON-LD present
  6. Reddit source URL in article body
  7. AI disclosure block present
  8. Canonical URL has no .html suffix
  9. og:image and twitter:image tags present
  10. Author name present
  11. Shared scripts (site-search, cookie-consent, chat-widget)
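
Several of the checks above reduce to string and regex tests on the generated HTML. The sketch below covers seven of them and returns one result per check. It is illustrative rather than a copy of validate.ts: the names are hypothetical, and the regexes assume double-quoted attributes with `rel` before `href`, which a real template may not guarantee.

```typescript
interface CheckResult {
  name: string;
  ok: boolean;
}

// Counts visible words: script blocks and tags are stripped first.
function countWords(html: string): number {
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<[^>]+>/g, " ");
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

// Collects "@type" from every JSON-LD block that parses as JSON.
function jsonLdTypes(html: string): string[] {
  const re = /<script type="application\/ld\+json">([\s\S]*?)<\/script>/gi;
  const types: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(html)) !== null) {
    try {
      const parsed = JSON.parse(m[1]);
      if (parsed && typeof parsed["@type"] === "string") types.push(parsed["@type"]);
    } catch (e) {
      // Invalid JSON contributes no type, so the matching check fails.
    }
  }
  return types;
}

function validateArticle(html: string, gtmId: string): CheckResult[] {
  const words = countWords(html);
  const types = jsonLdTypes(html);
  const canonical = /<link[^>]+rel="canonical"[^>]+href="([^"]+)"/i.exec(html);
  return [
    { name: "word_count", ok: words >= 800 && words <= 5000 },
    { name: "gtm", ok: html.indexOf(gtmId) !== -1 },
    { name: "blogposting_jsonld", ok: types.indexOf("BlogPosting") !== -1 },
    { name: "breadcrumb_jsonld", ok: types.indexOf("BreadcrumbList") !== -1 },
    { name: "qapage_jsonld", ok: types.indexOf("QAPage") !== -1 },
    { name: "canonical_no_html", ok: canonical !== null && !/\.html$/i.test(canonical[1]) },
    { name: "og_image", ok: /<meta[^>]+property="og:image"/i.test(html) },
  ];
}
```

A caller would publish only when every entry has `ok` set to true, and log the names of the failing checks otherwise.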

Notifications

Email

From: blog-engine@googleadsagent.ai
To: your@email.com
Subject: New post live: {title}
Body: {title}\n{live URL}

SMS (via carrier email gateway — no Twilio needed)

To: 5551234567@txt.att.net
Body: New post: {title} {url}
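
The gateway address is just the 10-digit number joined to the carrier's domain. A small sketch, using the gateway domains listed in Step 4; the function names and the `Carrier` type are hypothetical:

```typescript
// Carrier email-to-SMS domains as listed in the Set Secrets step.
const SMS_GATEWAYS = {
  att: "txt.att.net",
  tmobile: "tmomail.net",
  verizon: "vtext.com",
  sprint: "messaging.sprintpcs.com",
};

type Carrier = keyof typeof SMS_GATEWAYS;

// Strips formatting characters and requires exactly 10 digits.
function smsGatewayAddress(phone: string, carrier: Carrier = "att"): string {
  const digits = phone.replace(/\D/g, "");
  if (digits.length !== 10) {
    throw new Error("expected a 10-digit phone number");
  }
  return digits + "@" + SMS_GATEWAYS[carrier];
}

function smsBody(title: string, url: string): string {
  return "New post: " + title + " " + url;
}
```

`smsGatewayAddress("(555) 123-4567")` yields `5551234567@txt.att.net`, matching the example above.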

Budget Alerts

  • 80% monthly budget: Email + SMS warning
  • 100% monthly budget: Email + SMS, pipeline pauses until next month
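
The two thresholds translate into a three-way state. The sketch below is a minimal version with hypothetical names; treating a missing or invalid cap as "paused" is an assumption chosen to match the fail-closed behavior described under Safeguards.

```typescript
type BudgetState = "ok" | "warn" | "paused";

function budgetState(spentUsd: number, capUsd: number): BudgetState {
  // Fail closed: NaN, a negative spend, or a non-positive cap pauses the pipeline.
  if (!(capUsd > 0) || !(spentUsd >= 0)) return "paused";
  const ratio = spentUsd / capUsd;
  if (ratio >= 1) return "paused"; // 100%: alert and stop until next month
  if (ratio >= 0.8) return "warn"; // 80%: alert, keep running
  return "ok";
}
```

With the default cap of 50, a month-to-date spend of 40 returns "warn" and a spend of 50 returns "paused".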

Error Alerts

  • Pipeline failures: Email only (no SMS to avoid waking you up)

Safeguards

| Risk | Protection |
| --- | --- |
| Duplicate posts | D1 unique constraint on reddit_id + Vectorize cosine similarity (0.85 threshold) |
| 404 after publish | 5-retry verification (GET with cache bypass, checks 200 + content > 1KB) |
| Runaway costs | MAX_POSTS_PER_RUN, MAX_DAILY_GENERATIONS, MAX_MONTHLY_USD; all fail closed |
| Stale generating state | Auto-reset posts stuck in "generating" for > 2 hours on next run |
| Toxic content | Blocklist + minimum quality thresholds |
| Reddit TOS | Read-only SERP discovery (no Reddit API calls), attribution in every post |
| Bad HTML | 10+ validation checks before any publish attempt |
| API outages | Exponential backoff, bounded retries, clean failure logging |
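
The publish verification and the backoff policy from the table combine naturally into one bounded retry loop. The sketch below assumes that "5-retry" means five attempts in total, and the base delay, the delay cap, and all names are hypothetical. The page fetch and the sleep are injected so the loop stays independent of the runtime.

```typescript
// Delay doubles each attempt (attempt 0 -> base), capped at capMs.
function backoffDelayMs(attempt: number, baseMs: number = 1000, capMs: number = 30000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// "Live" means HTTP 200 and more than 1 KB of content.
function looksLive(status: number, bodyBytes: number): boolean {
  return status === 200 && bodyBytes > 1024;
}

type PageProbe = () => Promise<{ status: number; bodyBytes: number }>;

async function verifyWithRetries(
  probe: PageProbe,
  sleep: (ms: number) => Promise<void>,
  maxAttempts: number = 5
): Promise<boolean> {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      const res = await probe();
      if (looksLive(res.status, res.bodyBytes)) return true;
    } catch (e) {
      // A network error counts as a failed attempt and falls through to the wait.
    }
    // No sleep after the final attempt.
    if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
  }
  return false;
}
```

Only a `true` result should move a post to published; a `false` result leaves it in a failed state for a later run.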

Live Examples

These posts were generated by this engine and are live on googleadsagent.ai:

  1. Should I Use Broad Match or Exact Match in Google Ads in 2026?
  2. How Much Should I Spend on Google Ads? The Small Business Reality Check
  3. Performance Max vs Search Campaigns: When to Use Which

Legal & Compliance

  • Reddit: This engine uses SearchAPI (Google SERP) for thread discovery — it does NOT call the Reddit API directly. All articles link back to the source thread and attribute content to the community discussion. No Reddit user content is copied verbatim.
  • AI Disclosure: Every article contains a visible AI disclosure block. No content is presented as human-only authored.
  • Copyright: Articles are original AI-generated analysis inspired by community questions. They do not reproduce Reddit posts or comments.
  • Privacy: No Reddit usernames, IPs, or personal data are stored. Only thread IDs and URLs are tracked.

Author

John Williams — Senior Paid Media Specialist, $350M+ Managed


License

MIT License — see LICENSE for details.
