Reddit-Sourced PPC Blog Engine

Automated AI blog pipeline that discovers Google Ads & PPC questions from Reddit, generates expert-grade articles with Claude, and deploys to Cloudflare Pages — twice daily, fully unattended.

What This Does

This engine runs on a Cloudflare Worker, triggered twice daily by cron. Each run:

Discovers trending PPC questions from Reddit via SearchAPI (SERP-based, no Reddit API required)
Filters for quality: minimum engagement, blocklist, NSFW skip, exact-duplicate detection by Reddit thread ID
Deduplicates using Vectorize semantic similarity to avoid publishing the same topic twice
Generates a full 1,500-3,000 word expert article using Claude (Anthropic)
Validates the HTML — JSON-LD schemas, GTM, meta tags, AI disclosure, Reddit attribution
Publishes via GitHub API commit, triggering Cloudflare Pages deploy
Verifies the live URL returns 200 before marking as published
Notifies you via email + SMS (AT&T gateway, no Twilio needed)
Pings IndexNow to get the post indexed within hours

Every generated article includes full SEO structured data (BlogPosting, QAPage, BreadcrumbList), E-E-A-T author signals, AI disclosure, and a link back to the source Reddit discussion.

Credits & Community Sources

This project exists because of the incredible PPC practitioner communities on Reddit, LinkedIn, Quora, and beyond. Every article generated by this engine credits its source discussion and links back to the original thread.

Reddit Communities

Subreddit	Focus	Link
r/PPC	Pay-per-click advertising, Google Ads, Meta Ads	reddit.com/r/PPC
r/googleads	Google Ads-specific questions and strategies	reddit.com/r/googleads
r/digital_marketing	Broad digital marketing including paid media	reddit.com/r/digital_marketing
r/adwords	Legacy Google AdWords community (still active)	reddit.com/r/adwords
r/marketing	General marketing strategy and industry discussion	reddit.com/r/marketing
r/SEO	Search engine optimization (cross-channel context)	reddit.com/r/SEO
r/DigitalMarketing	Digital strategy, analytics, paid media	reddit.com/r/DigitalMarketing
r/FacebookAds	Meta/Facebook advertising	reddit.com/r/FacebookAds
r/AmazonSeller	Amazon PPC and marketplace advertising	reddit.com/r/AmazonSeller

Popular r/PPC Threads That Inspired This Project

Should I use broad match or exact match? — The perennial match type debate
How much should I spend on Google Ads for a small business? — Budget sizing questions
Performance Max vs Search campaigns — Campaign type selection
My CPC keeps going up, what do I do? — Cost optimization
Is Google Ads worth it for a new business? — ROI justification
Smart Bidding isn't working for me — Bidding strategy troubleshooting
How to structure a Google Ads account? — Account organization

LinkedIn Communities & Thought Leaders

Google Ads Community on LinkedIn — Active professionals group
PPC Chat Community — Weekly Twitter/X chat community
Digital Marketing Professionals — Broad digital marketing group
Paid Search Association — Industry organization
Google Ads Help Community — Official Google forum

Quora Topics

Google Ads (Quora) — Q&A on Google Ads strategy
PPC Advertising (Quora) — Pay-per-click discussions
Digital Marketing (Quora) — Broad marketing Q&A
SEM (Quora) — Search engine marketing

Open Source Tools Referenced

Project	Description	Link
Google Ads MCP	Python MCP server with 29 tools for Google Ads API	github.com/itallstartedwithaidea/google-ads-mcp
Google Ads Gemini Extension	Gemini CLI extension with 22 tools	github.com/itallstartedwithaidea/google-ads-gemini-extension
Google Ads Skills	Anthropic Claude Agent Skills for Google Ads	github.com/itallstartedwithaidea/google-ads-skills
Google Ads API Agent	Full Python agent with 28 actions	github.com/itallstartedwithaidea/google-ads-api-agent
GoogleAdsAgent.ai	The complete website and tools	github.com/itallstartedwithaidea/googleadsagent-site

Architecture

                                    ┌─────────────────┐
                                    │  Cron Scheduler  │
                                    │  7am + 7pm ET    │
                                    └────────┬────────┘
                                             │
                                             ▼
┌─────────────────────────────────────────────────────────────────────┐
│                        Blog Engine Worker                           │
│                                                                     │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────────────┐│
│  │ Discover  │──▶│  Filter  │──▶│ Generate │──▶│    Validate      ││
│  │ (SERP)   │   │ & Dedupe │   │ (Claude) │   │ (schema, links)  ││
│  └──────────┘   └──────────┘   └──────────┘   └────────┬─────────┘│
│       │              │              │                    │          │
│       ▼              ▼              ▼                    ▼          │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────────────┐│
│  │SearchAPI │   │Vectorize │   │  Claude   │   │   GitHub API     ││
│  │          │   │ D1 (DB)  │   │  API      │   │   Commit + Push  ││
│  └──────────┘   └──────────┘   └──────────┘   └────────┬─────────┘│
│                                                         │          │
└─────────────────────────────────────────────────────────┼──────────┘
                                                          │
                          ┌───────────────────────────────┼──────────┐
                          │                               ▼          │
                          │  ┌──────────┐   ┌──────────────────────┐ │
                          │  │ Verify   │──▶│ Notify (Email + SMS) │ │
                          │  │ (200 OK) │   │ IndexNow Ping        │ │
                          │  └──────────┘   └──────────────────────┘ │
                          │          Cloudflare Pages Deploy         │
                          └──────────────────────────────────────────┘

State Machine

Every post flows through these statuses:

discovered → queued → generating → draft_ready → published
                │                       │
                ▼                       ▼
             skipped                  failed
         (filtered out)          (retryable error)
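
The status flow above can be sketched as a transition table. This is a minimal illustration, not the engine's actual code: the names `PostStatus`, `TRANSITIONS`, and `canTransition` are hypothetical, and the `failed → queued` edge is an assumption based on the diagram labelling failures as retryable.

```typescript
// Statuses taken from the diagram above.
type PostStatus =
  | "discovered"
  | "queued"
  | "generating"
  | "draft_ready"
  | "published"
  | "skipped"
  | "failed";

// Allowed transitions as drawn in the diagram.
// ASSUMPTION: "failed" can be re-queued because the diagram calls it a
// retryable error; the real engine may handle retries differently.
const TRANSITIONS: Record<PostStatus, PostStatus[]> = {
  discovered: ["queued"],
  queued: ["generating", "skipped"],
  generating: ["draft_ready"],
  draft_ready: ["published", "failed"],
  published: [],
  skipped: [],
  failed: ["queued"],
};

// Returns true when moving from `from` to `to` is listed in the table.
function canTransition(from: PostStatus, to: PostStatus): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}
```

Keeping the transitions in one table makes it easy to reject an illegal status update before it is written to D1.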

Cloudflare Services Used

Service	Purpose
Workers	Blog engine runtime (cron-triggered)
D1	Post tracker database (SQLite) — status, budgets, run logs
KV	Blog manifest for dynamic index page
R2	Hero image storage
Workers AI	Embeddings (bge-base-en-v1.5) + image generation (Stable Diffusion XL)
Vectorize	Semantic dedup index (cosine similarity on blog-topics)
Pages	Static site hosting + Functions

Quick Start

Prerequisites

Node.js 20+
Wrangler CLI (npm install -g wrangler)
Cloudflare account with Workers paid plan
Anthropic API key
SearchAPI key
GitHub personal access token (repo scope)
Google OAuth via Buddy (for email/SMS notifications — uses Gmail API, no extra service)

Step 1: Clone the Repo

git clone https://github.com/itallstartedwithaidea/reddit.git
cd reddit
npm install

Step 2: Create Cloudflare Resources

# Create D1 database
wrangler d1 create blog-tracker
# Copy the database_id into wrangler.toml

# Create KV namespace
wrangler kv namespace create BLOG_DATA
# Copy the id into wrangler.toml

# Create R2 bucket
wrangler r2 bucket create blog-assets

# Create Vectorize index
wrangler vectorize create blog-topics --dimensions=768 --metric=cosine

Step 3: Configure wrangler.toml

Update the binding IDs in wrangler.toml with the values from Step 2:

[[d1_databases]]
binding = "DB"
database_name = "blog-tracker"
database_id = "YOUR_D1_ID_HERE"

[[kv_namespaces]]
binding = "BLOG_DATA"
id = "YOUR_KV_ID_HERE"

Step 4: Set Secrets

wrangler secret put ANTHROPIC_API_KEY
# Paste your Anthropic API key

wrangler secret put SEARCHAPI_KEY
# Paste your SearchAPI key

wrangler secret put GITHUB_TOKEN
# Paste your GitHub personal access token

# Notifications use Gmail API via the admin's Google OAuth session
# No additional API keys needed for email/SMS

wrangler secret put EMAIL_TO
# Enter: your@email.com

wrangler secret put SMS_GATEWAY_TO
# Enter: your 10-digit phone number (e.g. 5551234567)
# SMS is sent via AT&T gateway: number@txt.att.net
# For other carriers:
#   T-Mobile: number@tmomail.net
#   Verizon: number@vtext.com
#   Sprint: number@messaging.sprintpcs.com

wrangler secret put CRON_SECRET
# Enter a random secret string for auth

Step 5: Initialize the Database

wrangler d1 execute blog-tracker --file=./schema.sql

Step 6: Deploy

wrangler deploy

Step 7: Test with Dry Run

curl -X POST "https://your-worker.workers.dev/run?key=YOUR_CRON_SECRET&dry_run=true"

Step 8: Trigger a Real Run

curl -X POST "https://your-worker.workers.dev/run?key=YOUR_CRON_SECRET"
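
For orientation, here is one way the Worker might interpret the `key` and `dry_run` query parameters used in Steps 7 and 8. This is a sketch under assumptions: `parseRunRequest` and `RunRequest` are hypothetical names, and the real handler in `src/index.ts` may check the secret differently.

```typescript
// Hypothetical result of inspecting a /run request.
interface RunRequest {
  authorized: boolean; // true when ?key= matches the CRON_SECRET value
  dryRun: boolean;     // true when ?dry_run=true is present
}

// Parse the query string of a /run request URL.
// ASSUMPTION: an empty secret never authorizes (fail closed).
function parseRunRequest(requestUrl: string, cronSecret: string): RunRequest {
  const params = new URL(requestUrl).searchParams;
  const key = params.get("key");
  return {
    authorized: cronSecret.length > 0 && key === cronSecret,
    dryRun: params.get("dry_run") === "true",
  };
}
```

A dry run would then go through discovery and filtering but skip the publish step when `dryRun` is true.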

Project Structure

reddit/
├── src/
│   ├── index.ts          # Worker entry — cron handler, HTTP routes, pipeline orchestration
│   ├── types.ts           # TypeScript interfaces for all data types
│   ├── db.ts              # D1 database operations — posts, budget, run logs
│   ├── ingest.ts          # Reddit thread discovery via SearchAPI SERP
│   ├── gates.ts           # Quality filters, blocklist, Vectorize semantic dedup
│   ├── topics.ts          # Topic classification into 12 PPC clusters
│   ├── generate.ts        # Claude article generation + related post linking
│   ├── template.ts        # Full HTML template (matches googleadsagent.ai design)
│   ├── images.ts          # Workers AI hero image generation (Stable Diffusion XL)
│   ├── validate.ts        # Post-generation validation (10+ checks)
│   ├── publish.ts         # GitHub API commit + deploy verification
│   ├── notify.ts          # Email + SMS notifications + IndexNow
│   └── budget.ts          # Cost tracking, daily/monthly caps
├── schema.sql             # D1 database schema
├── wrangler.toml          # Cloudflare Worker configuration
├── package.json
├── tsconfig.json
└── README.md

Configuration

Environment Variables (wrangler.toml)

Variable	Default	Description
`MAX_POSTS_PER_RUN`	`2`	Maximum posts to publish per cron run
`MAX_DAILY_GENERATIONS`	`4`	Maximum posts per day (2 runs × 2 posts)
`MAX_MONTHLY_USD`	`50`	Monthly budget cap (Claude + SearchAPI costs)
`SITE_URL`	`https://googleadsagent.ai`	Your site's base URL
`GITHUB_REPO`	`itallstartedwithaidea/googleadsagent-site`	GitHub repo for git commits
`GITHUB_BRANCH`	`main`	Branch to commit to
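
Worker variables arrive as strings, so the numeric limits in the table need parsing. The sketch below shows one possible approach using the defaults listed above. `readLimits` and `EngineLimits` are hypothetical names, and treating an unparseable value as `0` is an assumption chosen to match the fail-closed behaviour described under Safeguards.

```typescript
interface EngineLimits {
  maxPostsPerRun: number;
  maxDailyGenerations: number;
  maxMonthlyUsd: number;
}

// Unset or blank values fall back to the documented default.
// ASSUMPTION: invalid or negative values become 0, so nothing is published.
function readLimit(raw: string | undefined, fallback: number): number {
  if (raw === undefined || raw.trim() === "") return fallback;
  const parsed = Number(raw);
  return isFinite(parsed) && parsed >= 0 ? parsed : 0;
}

function readLimits(vars: Record<string, string | undefined>): EngineLimits {
  return {
    maxPostsPerRun: readLimit(vars["MAX_POSTS_PER_RUN"], 2),
    maxDailyGenerations: readLimit(vars["MAX_DAILY_GENERATIONS"], 4),
    maxMonthlyUsd: readLimit(vars["MAX_MONTHLY_USD"], 50),
  };
}
```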

Topic Clusters

Every article is classified into one of 12 PPC topic clusters:

Cluster	Keywords
Bidding	bid, tCPA, tROAS, smart bidding, manual CPC
Creative	RSA, headlines, ad copy, extensions, sitelinks
Audiences	targeting, remarketing, custom segments, broad match
Measurement	conversion tracking, GA4, attribution, enhanced conversions
Automation	scripts, rules, API, AI Max, optimization score
Policy	disapproved, suspended, trademark, appeal
Shopping	shopping, merchant center, PMax, product feed
Video	YouTube, demand gen, bumper, TrueView
Local	local campaigns, GMB, location extensions
Budget	CPC, CPA, ROAS, daily budget, pacing
Account Structure	ad groups, naming conventions, SKAG, Hagakure
General	Catch-all for broad strategy questions
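
A keyword-count classifier is one simple way to map a thread title onto the clusters above. The sketch below is illustrative only: `classifyTopic` is a hypothetical name, the scoring rule (most whole-word keyword hits wins, ties go to the earlier row) is an assumption, and `src/topics.ts` may use a different method entirely.

```typescript
// Cluster names and keywords copied from the table above.
const CLUSTERS: Array<[string, string[]]> = [
  ["Bidding", ["bid", "tCPA", "tROAS", "smart bidding", "manual CPC"]],
  ["Creative", ["RSA", "headlines", "ad copy", "extensions", "sitelinks"]],
  ["Audiences", ["targeting", "remarketing", "custom segments", "broad match"]],
  ["Measurement", ["conversion tracking", "GA4", "attribution", "enhanced conversions"]],
  ["Automation", ["scripts", "rules", "API", "AI Max", "optimization score"]],
  ["Policy", ["disapproved", "suspended", "trademark", "appeal"]],
  ["Shopping", ["shopping", "merchant center", "PMax", "product feed"]],
  ["Video", ["YouTube", "demand gen", "bumper", "TrueView"]],
  ["Local", ["local campaigns", "GMB", "location extensions"]],
  ["Budget", ["CPC", "CPA", "ROAS", "daily budget", "pacing"]],
  ["Account Structure", ["ad groups", "naming conventions", "SKAG", "Hagakure"]],
];

// Lowercase, collapse punctuation to spaces, and pad with spaces so that
// substring search only matches whole words or whole phrases.
function normalize(text: string): string {
  return " " + text.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim() + " ";
}

// Returns the cluster with the most keyword hits, or "General" if none match.
function classifyTopic(title: string): string {
  const haystack = normalize(title);
  let best = "General";
  let bestScore = 0;
  for (const [cluster, keywords] of CLUSTERS) {
    let score = 0;
    for (const keyword of keywords) {
      if (haystack.indexOf(normalize(keyword)) !== -1) score++;
    }
    if (score > bestScore) {
      best = cluster;
      bestScore = score;
    }
  }
  return best;
}
```

Whole-word matching avoids false hits such as "api" inside "rapid", at the cost of missing plurals like "bids".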

How Articles Are Generated

Prompt Engineering

Each article is generated by Claude with a system prompt that:

Sets the voice as John Williams, Senior Paid Media Specialist ($350M+ managed)
Provides the Reddit thread context (title, subreddit, snippet)
Requires 1,500-3,000 words of substantive, actionable content
Enforces callout boxes, comparison tables, stat cards
Requires specific benchmarks and real campaign data ranges
Prohibits fabricated Redditor quotes
Mandates a "Bottom Line" section with numbered action items

HTML Template

Every article includes:

3 JSON-LD blocks: BlogPosting (with full author entity + sameAs), BreadcrumbList, QAPage
Full meta tags: og:, twitter:, canonical, robots
GTM tracking: GTM-NR7F6P92
AI disclosure: Visible block crediting Reddit source + AI assistance
Author E-E-A-T signals: Person schema with name, jobTitle, sameAs links
Responsive design: Dark theme matching googleadsagent.ai
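
As an illustration of the BlogPosting block with author signals, the sketch below builds one JSON-LD script tag. The function and interface names are hypothetical, and the selection of properties is an assumption; the actual output of `src/template.ts` may contain more fields.

```typescript
interface AuthorInfo {
  name: string;
  jobTitle: string;
  sameAs: string[]; // profile URLs that identify the author
}

interface PostInfo {
  title: string;
  url: string;
  datePublished: string; // ISO 8601 date
  author: AuthorInfo;
}

// Build a BlogPosting JSON-LD <script> tag with a Person author entity.
function buildBlogPostingJsonLd(post: PostInfo): string {
  const data = {
    "@context": "https://schema.org",
    "@type": "BlogPosting",
    headline: post.title,
    datePublished: post.datePublished,
    mainEntityOfPage: post.url,
    author: {
      "@type": "Person",
      name: post.author.name,
      jobTitle: post.author.jobTitle,
      sameAs: post.author.sameAs,
    },
  };
  // Escape "<" so a title containing "</script>" cannot close the tag early.
  const json = JSON.stringify(data).replace(/</g, "\\u003c");
  return '<script type="application/ld+json">' + json + "</script>";
}
```

The escaped output is still valid JSON, so the "present and valid JSON" validation check continues to pass.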

Validation Checks (10+)

Before publishing, every article is validated for:

Word count within validation bounds (800-5,000 words, wider than the 1,500-3,000 word prompt target)
GTM snippet present
BlogPosting JSON-LD present and valid JSON
BreadcrumbList JSON-LD present
QAPage JSON-LD present
Reddit source URL in article body
AI disclosure block present
Canonical URL has no .html suffix
og:image and twitter:image tags present
Author name present
Shared scripts (site-search, cookie-consent, chat-widget)
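
A few of the checks above can be sketched with plain string and regex operations. This is a simplified illustration: `validateArticle` is a hypothetical name, it covers only five of the listed checks, and the canonical regex assumes `rel` appears before `href` in the tag.

```typescript
// Count words in the visible text, ignoring script contents and tags.
function countWords(html: string): number {
  const text = html
    .replace(/<script[\s\S]*?<\/script>/gi, " ")
    .replace(/<[^>]+>/g, " ");
  return text.split(/\s+/).filter((w) => w.length > 0).length;
}

// Collect the @type of every JSON-LD block that parses as valid JSON.
function jsonLdTypes(html: string): string[] {
  const types: string[] = [];
  const re = /<script[^>]*type="application\/ld\+json"[^>]*>([\s\S]*?)<\/script>/gi;
  let match: RegExpExecArray | null;
  while ((match = re.exec(html)) !== null) {
    try {
      const parsed = JSON.parse(match[1]);
      if (parsed && typeof parsed["@type"] === "string") types.push(parsed["@type"]);
    } catch {
      // Invalid JSON is treated the same as a missing block.
    }
  }
  return types;
}

// Run a subset of the checks and report pass/fail per check name.
function validateArticle(html: string): { [check: string]: boolean } {
  const types = jsonLdTypes(html);
  const words = countWords(html);
  const canonical = /<link[^>]*rel="canonical"[^>]*href="([^"]*)"/i.exec(html);
  return {
    word_count: words >= 800 && words <= 5000,
    blogposting_jsonld: types.indexOf("BlogPosting") !== -1,
    breadcrumb_jsonld: types.indexOf("BreadcrumbList") !== -1,
    qapage_jsonld: types.indexOf("QAPage") !== -1,
    canonical_no_html: canonical !== null && !/\.html$/.test(canonical[1]),
  };
}
```

An article would be published only if every entry in the returned map is true.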

Notifications

Email

From: blog-engine@googleadsagent.ai
To: your@email.com
Subject: New post live: {title}
Body: {title}\n{live URL}

SMS (via carrier email gateway — no Twilio needed)

To: 5551234567@txt.att.net
Body: New post: {title} {url}
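
The gateway address is just the 10-digit number joined to the carrier's domain. A minimal sketch, using the gateway domains listed in Step 4; the carrier keys (`att`, `tmobile`, and so on) and the function name are hypothetical labels, not engine configuration.

```typescript
// Gateway domains as listed in the Step 4 comments.
const SMS_GATEWAYS: Record<string, string> = {
  att: "txt.att.net",
  tmobile: "tmomail.net",
  verizon: "vtext.com",
  sprint: "messaging.sprintpcs.com",
};

// Build the email address that delivers to a phone as SMS.
// Non-digit characters are stripped so "(555) 123-4567" is accepted.
function smsGatewayAddress(phone: string, carrier: string): string {
  const digits = phone.replace(/\D/g, "");
  if (digits.length !== 10) {
    throw new Error("Expected a 10-digit phone number");
  }
  const domain = SMS_GATEWAYS[carrier];
  if (!domain) {
    throw new Error("Unknown carrier: " + carrier);
  }
  return digits + "@" + domain;
}
```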

Budget Alerts

80% monthly budget: Email + SMS warning
100% monthly budget: Email + SMS, pipeline pauses until next month
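
The two thresholds above reduce to a small pure function. This is a sketch with hypothetical names (`budgetState`, `BudgetState`); treating a missing or non-positive cap as "paused" is an assumption that mirrors the fail-closed behaviour described under Safeguards.

```typescript
type BudgetState = "ok" | "warning" | "paused";

// Map month-to-date spend against the monthly cap.
// ASSUMPTION: a zero, negative, or NaN cap pauses the pipeline (fail closed).
function budgetState(spentUsd: number, capUsd: number): BudgetState {
  if (!(capUsd > 0) || !isFinite(spentUsd)) return "paused";
  const ratio = spentUsd / capUsd;
  if (ratio >= 1) return "paused";    // 100%: alert and stop until next month
  if (ratio >= 0.8) return "warning"; // 80%: alert but keep running
  return "ok";
}
```

With the default `MAX_MONTHLY_USD` of 50, the warning would fire at $40 of tracked spend.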

Error Alerts

Pipeline failures: Email only (no SMS to avoid waking you up)

Safeguards

Risk	Protection
Duplicate posts	D1 unique constraint on reddit_id + Vectorize cosine similarity (0.85 threshold)
404 after publish	5-retry verification (GET with cache bypass, checks 200 + content > 1KB)
Runaway costs	`MAX_POSTS_PER_RUN`, `MAX_DAILY_GENERATIONS`, `MAX_MONTHLY_USD` — all fail closed
Stale generating state	Auto-reset posts stuck in "generating" for > 2 hours on next run
Toxic content	Blocklist + minimum quality thresholds
Reddit TOS	Read-only SERP discovery (no Reddit API calls), attribution in every post
Bad HTML	10+ validation checks before any publish attempt
API outages	Exponential backoff, bounded retries, clean failure logging
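
To make the 0.85 threshold in the duplicate-posts row concrete, here is what the cosine comparison computes. In the deployed engine Vectorize returns similarity scores itself, so this is only an illustration; the function names are hypothetical, and treating a score exactly equal to 0.85 as a duplicate is an assumption.

```typescript
// Cosine similarity of two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // a zero vector matches nothing
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const DUPLICATE_THRESHOLD = 0.85;

// A candidate topic is a duplicate if any stored topic is similar enough.
function isDuplicate(candidate: number[], existing: number[][]): boolean {
  return existing.some((v) => cosineSimilarity(candidate, v) >= DUPLICATE_THRESHOLD);
}
```

Real embeddings from `bge-base-en-v1.5` have 768 dimensions, matching the `--dimensions=768` flag used when the index was created.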

Live Examples

These posts were generated by this engine and are live on googleadsagent.ai:

Legal & Compliance

Reddit: This engine uses SearchAPI (Google SERP) for thread discovery — it does NOT call the Reddit API directly. All articles link back to the source thread and attribute content to the community discussion. No Reddit user content is copied verbatim.
AI Disclosure: Every article contains a visible AI disclosure block. No content is presented as human-only authored.
Copyright: Articles are original AI-generated analysis inspired by community questions. They do not reproduce Reddit posts or comments.
Privacy: No Reddit usernames, IPs, or personal data are stored. Only thread IDs and URLs are tracked.

Author

John Williams — Senior Paid Media Specialist, $350M+ Managed

License

MIT License — see LICENSE for details.
