A serverless pipeline that reads your newsletter subscriptions every morning and sends you one AI-generated digest email with only the stuff worth reading. Runs on Google Cloud, costs about $0.50/month.
What you get: A single email at 7 AM with weather, a curated summary of everything your newsletters said today, and a cost tracker in the footer showing exactly what you spent.
You subscribe to newsletters
↓
Gmail filter auto-labels incoming emails
↓
Cloud Scheduler triggers at 7 AM
↓
Cloud Function wakes up and:
1. Fetches all labeled emails via Gmail API
2. Converts HTML → clean markdown
3. Sends to Gemini for opinionated summarization
4. Emails you the digest (with weather + cost)
5. Moves processed emails to archive label
6. Auto-creates Gmail filters for new senders
7. Cleans up emails older than 30 days
8. Logs analytics to BigQuery
When you manually label a newsletter for the first time, the pipeline automatically creates a Gmail filter for that sender on the next run. The filter:
- Skips the inbox (archives it)
- Applies the same label you used (`To-Process-Newsletters` or `To-Process-Saved`)
So you label one email → every future email from that sender is auto-labeled. No need to manually create Gmail filters.
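As a rough illustration of what such an auto-created filter could look like, here is a minimal sketch of a request body in the shape the Gmail API's `users.settings.filters.create` method accepts. The function name `build_filter_body` and the label ID `Label_123` are hypothetical; the pipeline's own code in `email_client.py` may build this differently.

```python
def build_filter_body(sender: str, label_id: str) -> dict:
    """Build a Gmail filter resource that labels one sender and skips the inbox."""
    return {
        # Match every message from this sender
        "criteria": {"from": sender},
        "action": {
            # Apply the pipeline label (Gmail expects a label ID, not its name)
            "addLabelIds": [label_id],
            # Removing INBOX is how a filter "skips the inbox"
            "removeLabelIds": ["INBOX"],
        },
    }


# Hypothetical sender and label ID, for illustration only
body = build_filter_body("newsletter@example.com", "Label_123")
print(body)
```

Note that the filter refers to the label by ID, so the label name would first have to be resolved to its ID.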
Known senders are tracked in Firestore. To see or edit them, visit the Firebase Console.
Existing users: If you set up the pipeline before this feature, re-run the OAuth flow to add the `gmail.settings.basic` scope:

python3 oauth_flow.py credentials.json
# Then update Secret Manager with the new token
Two pipelines, two retention policies:
| Gmail Label | After Processing | Retention |
|---|---|---|
| To-Process-Newsletters | → Archived-Newsletters | 30 days, then auto-deleted |
| To-Process-Saved | → Saved-Newsletters | Kept forever |
Use To-Process-Saved for newsletters you want digested but never deleted (e.g. James Clear's 3-2-1 Thursday).
You need five things before starting. Here's exactly where to get each:
| What | Where | Notes |
|---|---|---|
| Google Cloud account | console.cloud.google.com | New accounts get $300 free credits for 90 days |
| Gmail account | You probably have one | This is where newsletters arrive and where digests get sent |
| gcloud CLI | cloud.google.com/sdk/docs/install | Command-line tool to manage GCP |
| Python 3.12+ | python.org/downloads | Check with python3 --version |
| Gemini API key | aistudio.google.com/app/apikey | Free tier is generous; enable pay-as-you-go for reliable access |
git clone https://github.com/shipitsteven/newsletter-digest-pipeline.git
cd newsletter-digest-pipeline
cp .env.example .env
cp config.example.py config.py

Edit .env with your values:
GCP_PROJECT_ID=my-newsletter-digest # pick any unique name
DIGEST_RECIPIENT=you@gmail.com
WEATHER_LATITUDE=47.48 # your city's latitude
WEATHER_LONGITUDE=-122.21 # your city's longitude
GEMINI_API_KEY=your-key-from-ai-studio # only for local testing; production uses Secret Manager

Edit config.py if you want to change the schedule, model, or other defaults. The example file has comments explaining each option.
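To show how `WEATHER_LATITUDE` and `WEATHER_LONGITUDE` might be used, here is a sketch that builds an Open-Meteo forecast URL from them. The function name `weather_url` and the choice of requested variables (`temperature_2m`, `weather_code`) are assumptions; the pipeline may request different fields.

```python
from urllib.parse import urlencode


def weather_url(latitude: float, longitude: float) -> str:
    """Build an Open-Meteo forecast URL for the given coordinates."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        # Assumed variables; Open-Meteo accepts a comma-separated list here
        "current": "temperature_2m,weather_code",
        # Let Open-Meteo pick the timezone from the coordinates
        "timezone": "auto",
    }
    return "https://api.open-meteo.com/v1/forecast?" + urlencode(params)


# Coordinates taken from the .env example above
url = weather_url(47.48, -122.21)
print(url)
```

Open-Meteo needs no API key for this kind of request, which is why there is no weather credential in `.env`.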
If you just installed gcloud, authenticate:
gcloud auth login

Tip: If your browser is logged into multiple Google accounts, open the auth URL in an incognito window to avoid signing in as the wrong account.
Go to console.cloud.google.com/projectcreate and create a project. Use the same ID you put in .env.
Then link a billing account:
- Go to console.cloud.google.com/billing
- Click "Link a billing account" on your project
- If you're new, you'll be prompted to start a free trial — do it, you won't be charged
This is the most confusing part of GCP setup. Follow these steps exactly:
- Go to the OAuth consent screen: https://console.cloud.google.com/apis/credentials/consent?project=YOUR_PROJECT_ID (replace `YOUR_PROJECT_ID` with your actual project ID)
- Configure the consent screen:
  - Click "Get started" or "Configure consent screen"
  - User type: External (only option for personal Gmail)
  - App name: `Newsletter Digest` (or anything you want)
  - User support email: your email
  - Developer contact: your email
  - Click through the remaining steps (you can skip optional fields)
- Add yourself as a test user (CRITICAL):
  - In the OAuth consent screen settings, find "Test users" or "Audience"
  - Click "Add users"
  - Enter your Gmail address
  - Save

  Why this matters: Your app starts in "Testing" mode. Only test users can authorize it. If you skip this, you'll get `Error 403: access_denied` and waste 20 minutes debugging.
- Publish the app (optional but recommended):
  - Still on the consent screen page, click "Publish app"
  - This moves it from "Testing" to "Production"
  - For personal use, this just removes the test-user restriction
  - You do NOT need Google to verify it for personal use
- Go to the credentials page: https://console.cloud.google.com/apis/credentials?project=YOUR_PROJECT_ID
- Create credentials:
  - Click "+ CREATE CREDENTIALS" at the top
  - Choose "OAuth client ID"
  - Application type: Desktop app
  - Name: anything (e.g. "Newsletter Digest CLI")
  - Click "Create"
- Download the JSON:
  - In the popup that appears, click "DOWNLOAD JSON"
  - Save it as `credentials.json` in the project root (same folder as `main.py`)
./bootstrap.sh YOUR_PROJECT_ID you@gmail.com

Here's what bootstrap does at each step:
- Pre-flight checks: verifies Python 3.12+ and gcloud are installed and authenticated
- Installs Python deps: runs `pip install -r requirements.txt`
- Creates GCP project: sets it as active (skips if it exists)
- Pauses for billing: gives you a link to confirm billing is linked
- Enables APIs: Cloud Functions, Scheduler, Secret Manager, Gmail, Firestore, BigQuery, etc.
- Sets IAM roles: gives the Cloud Function permission to read secrets
- OAuth flow: opens your browser to authorize Gmail access. Accept all scopes.
- Stores token: saves the OAuth refresh token in Secret Manager
- Stores Gemini key: prompts for your API key and saves it in Secret Manager as `gemini-api-key`
- Creates Gmail labels: all 5 labels (`To-Process-Newsletters`, `To-Process-Saved`, `Archived-Newsletters`, `Saved-Newsletters`, `AI-Digest`)
- Seeds Firestore: writes initial config for hot-reloading (sender hints, pipelines, etc.)
Re-running is safe. If something fails halfway through, just run it again — existing resources are skipped.
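The "existing resources are skipped" behavior follows a simple create-if-missing pattern. Here is a sketch of that pattern for the Gmail labels; `labels_to_create` is a hypothetical helper, and the real bootstrap script may implement the check differently.

```python
# The five labels bootstrap creates
REQUIRED_LABELS = [
    "To-Process-Newsletters",
    "To-Process-Saved",
    "Archived-Newsletters",
    "Saved-Newsletters",
    "AI-Digest",
]


def labels_to_create(existing_labels: list[str]) -> list[str]:
    """Return only the required labels that do not exist yet."""
    existing = set(existing_labels)
    return [name for name in REQUIRED_LABELS if name not in existing]


# First run: nothing exists yet, so all five labels are created
first_run = labels_to_create(["INBOX", "SENT"])
# Re-run after success: nothing left to do
rerun = labels_to_create(["INBOX", "SENT", *REQUIRED_LABELS])
# Re-run after a partial failure: only the missing labels are created
partial = labels_to_create(["To-Process-Newsletters", "AI-Digest"])
print(first_run, rerun, partial, sep="\n")
```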
./deploy.sh YOUR_PROJECT_ID you@gmail.com

This deploys the Cloud Function and creates a Cloud Scheduler job that triggers it daily. The Gemini API key is read from Secret Manager (stored during bootstrap), not passed as an env var.
You need to tell Gmail which emails to route to the pipeline. Two options:
Option A: CLI helper (recommended)
# Standard newsletters (30-day retention):
./add-filter.sh "newsletter@therundownai.com"
./add-filter.sh "newsletter@bytebytego.com"
./add-filter.sh "news@tldrnewsletter.com"
# Newsletters to keep forever:
./add-filter.sh "newsletter@jamesclear.com" --saved

Option B: Gmail UI
- Go to Gmail → Settings (gear icon) → See all settings → Filters and Blocked Addresses
- Click "Create a new filter"
- In "From": enter `newsletter@example.com`
- Click "Create filter"
- Check: "Skip the Inbox" + "Apply the label" → choose `To-Process-Newsletters`
- Click "Create filter"
Trigger the pipeline manually to make sure everything works:
gcloud functions call digest-pipeline --gen2 --region=us-central1

What to expect:
- If you have newsletters already labeled, you'll get a digest email within 1-2 minutes
- If no emails are labeled yet, you'll get a "nothing today" email (unless you set `SEND_EMPTY_DIGEST=false`)
- Check the Cloud Functions logs if something seems off:

gcloud functions logs read digest-pipeline --gen2 --region=us-central1 --limit=50
You're done! Tomorrow at 7 AM you'll get your first real digest.
The pipeline supports multiple Gmail accounts. Each account gets its own OAuth token, digest emails, and analytics — completely isolated.
./onboard.sh partner@gmail.com

The account ID is derived from the email (partner@gmail.com → partner). Override with `--id custom-name` if needed.
The script handles everything: OAuth flow, Secret Manager, Gmail label creation, newsletter discovery, and adding the account to Firestore. One manual step: the new email must be added as a test user in GCP Console first (the script will prompt you).
After onboarding, the account is live immediately — no deploy needed.
# Run digest for one account only
gcloud functions call digest-pipeline --gen2 --region=us-central1 --data '{"account": "mypartner"}'
# Debug mode for one account
gcloud functions call digest-pipeline --gen2 --region=us-central1 --data '{"debug": "true", "account": "mypartner"}'

Without `account`, the pipeline runs all enabled accounts sequentially with configurable stagger delays between them.
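A sketch of how a request payload like the ones above could be turned into a list of accounts to run. The function name `select_accounts`, the `enabled` flag on each account, and the error on unknown IDs are assumptions for illustration, not the pipeline's confirmed behavior.

```python
import json


def select_accounts(payload: str, accounts: dict[str, dict]) -> tuple[list[str], bool]:
    """Return (account IDs to run, debug flag) for a JSON request payload."""
    data = json.loads(payload) if payload else {}
    # The examples pass debug as the string "true", so compare as text
    debug = str(data.get("debug", "false")).lower() == "true"
    requested = data.get("account")
    if requested is not None:
        if requested not in accounts:
            raise KeyError(f"unknown account: {requested}")
        return [requested], debug
    # No account given: run every enabled account, in stored order
    enabled = [aid for aid, cfg in accounts.items() if cfg.get("enabled", True)]
    return enabled, debug


# Hypothetical account registry
accounts = {"mypartner": {"enabled": True}, "me": {"enabled": True}, "old": {"enabled": False}}
single = select_accounts('{"account": "mypartner"}', accounts)
single_debug = select_accounts('{"debug": "true", "account": "mypartner"}', accounts)
everyone = select_accounts("", accounts)
print(single, single_debug, everyone, sep="\n")
```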
Instead of sending one digest per account, you can merge all accounts into a single unified digest delivered to one inbox.
Enable via Firestore remote config (no redeploy needed):
- Set `CONSOLIDATED_DIGEST` → `true` (boolean)
- Set `CONSOLIDATED_OUTPUT` → the account ID that should receive the merged email (e.g. `swlilblack`)
Or via environment variables + redeploy:
CONSOLIDATED_DIGEST=true
CONSOLIDATED_OUTPUT=swlilblack

How it works: The pipeline fetches emails from all accounts independently (each using its own OAuth token), merges them into a single collection, deduplicates by Message-ID, makes one Gemini call to summarize everything, and sends one digest to the output account's recipient. Each newsletter gets a `[via: account_id]` tag so "Read full →" links open in the correct Gmail inbox.
Failure handling: If one account's fetch fails (expired OAuth, timeout), the pipeline continues with the others and adds a warning banner to the digest. Relabeling happens per-account after sending — if it fails, emails reappear next run (self-healing).
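The merge, dedupe, and failure-handling behavior described above can be sketched as follows. This is an illustration under assumptions: `merge_accounts`, the `message_id` and `via` keys, and the warning text are hypothetical, and the real logic lives in `pipeline.py`.

```python
from typing import Callable


def merge_accounts(
    fetchers: dict[str, Callable[[], list[dict]]],
) -> tuple[list[dict], list[str]]:
    """Fetch per account, tag each email with its account, dedupe by Message-ID."""
    merged: list[dict] = []
    seen: set[str] = set()
    warnings: list[str] = []
    for account_id, fetch in fetchers.items():
        try:
            emails = fetch()
        except Exception as exc:
            # One failing account must not block the others
            warnings.append(f"{account_id}: fetch failed ({exc})")
            continue
        for email in emails:
            if email["message_id"] in seen:
                continue  # same newsletter already seen via another account
            seen.add(email["message_id"])
            merged.append({**email, "via": account_id})
    return merged, warnings


def fetch_a():
    return [{"message_id": "<1@x>"}, {"message_id": "<2@x>"}]


def fetch_b():
    return [{"message_id": "<2@x>"}, {"message_id": "<3@x>"}]


def fetch_c():
    raise TimeoutError("timed out")


merged, warnings = merge_accounts({"a": fetch_a, "b": fetch_b, "c": fetch_c})
print(merged)
print(warnings)
```

In this sketch the first account to deliver a given Message-ID "wins" the `via` tag; the warnings list is what would feed the banner in the digest.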
Cost benefit: N-to-1 is cheaper than N-to-N since there's only one set of Gemini calls regardless of account count.
Rollback: Flip CONSOLIDATED_DIGEST back to false in Firestore. Instant, no redeploy.
Full spec: docs/specs/consolidated-digest.md
./offboard.sh --id mypartner

Soft-deletes the account (revokes OAuth, deletes secret). BigQuery analytics data is retained for 90 days, then auto-purged. Use `--purge-now` to skip the retention period.
All settings live in config.py. Edit and redeploy to apply changes.
| Setting | Default | Description |
|---|---|---|
| `SCHEDULE_CRON` | `0 7 * * *` | When the digest runs (cron syntax, e.g. `0 7 * * 1-5` for weekdays only) |
| `SCHEDULE_TIMEZONE` | `America/Los_Angeles` | Timezone for the schedule |
| `GEMINI_MODEL` | `gemini-flash-latest` | Model for summarization (auto-updates to latest stable Flash) |
| `CONTENT_EXTRACTION_MODE` | `markdown` | How to extract content: `markdown` (recommended), `plain_text`, or `raw_html` |
| `MAX_EMAILS_PER_RUN` | `20` | Max newsletters per digest; the remainder waits for the next run |
| `SEND_EMPTY_DIGEST` | `true` | Send a "nothing today" email when no newsletters are found |
| `ARCHIVE_RETENTION_DAYS` | `30` | Days before archived newsletters are auto-trashed |
| `WEATHER_LATITUDE` / `WEATHER_LONGITUDE` | Seattle, WA | Your location for the weather in the digest header |
| `CONSOLIDATED_DIGEST` | `false` | Merge all accounts into one digest (N-to-1) |
| `CONSOLIDATED_OUTPUT` | `""` | Account ID that receives the merged digest (required when consolidated is true) |
For settings you want to change without redeploying, use Firestore:
https://console.firebase.google.com/project/YOUR_PROJECT_ID/firestore/databases/-default-/data/~2Fconfig~2Fdigest-pipeline
The pipeline checks Firestore on every run and falls back to config.py if Firestore is unavailable. This means you can:
- Tweak `sender_hints` to tell the AI how to handle specific newsletters
- Change `MAX_EMAILS_PER_RUN` during high-volume periods
- Swap models without redeploying
- Toggle `CONSOLIDATED_DIGEST` and `CONSOLIDATED_OUTPUT` on the fly
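The "Firestore first, config.py as fallback" behavior amounts to overlaying remote values on code defaults. Below is a minimal sketch; `load_config`, `DEFAULTS`, and the injected `fetch_remote` callable are hypothetical stand-ins for whatever `remote_config.py` actually does.

```python
# A few defaults mirroring the settings table (stand-in for config.py)
DEFAULTS = {
    "MAX_EMAILS_PER_RUN": 20,
    "GEMINI_MODEL": "gemini-flash-latest",
    "CONSOLIDATED_DIGEST": False,
}


def load_config(fetch_remote, defaults=DEFAULTS) -> dict:
    """Overlay remote overrides on code defaults; use defaults alone on failure."""
    try:
        remote = fetch_remote() or {}
    except Exception:
        # Remote store unreachable: keep running on code defaults
        remote = {}
    return {**defaults, **remote}


def firestore_unavailable():
    raise ConnectionError("Firestore unavailable")


overridden = load_config(lambda: {"MAX_EMAILS_PER_RUN": 40})
fallback = load_config(firestore_unavailable)
print(overridden["MAX_EMAILS_PER_RUN"], fallback["MAX_EMAILS_PER_RUN"])
```

Passing the fetcher in as a callable keeps the merge logic testable without a Firestore client.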
Accounts are stored in Firestore under config/digest-pipeline.accounts. The onboard.sh script writes here directly — no file editing or redeployment needed. You can also view and edit accounts in the Firestore console.
Sender hints are especially useful. They tell Gemini how to treat specific senders:
{
"sender_hints": {
"rundown": "Rapid-fire format with many items — extract top 3 only",
"bytebytego": "Usually one deep technical topic — focus on the core insight",
"trip.com": "Almost always promotional spam — skip unless genuinely useful"
}
}

Edit these directly in the Firestore console. Changes take effect on the next run.
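One plausible way to apply sender hints is a case-insensitive substring match of each hint key against the sender's address. That matching rule is an assumption, as is the helper name `hints_for_sender`; the hint texts below are abbreviated from the example above.

```python
def hints_for_sender(from_address: str, sender_hints: dict[str, str]) -> list[str]:
    """Return every hint whose key appears in the sender address (case-insensitive)."""
    haystack = from_address.lower()
    return [hint for key, hint in sender_hints.items() if key.lower() in haystack]


# Abbreviated versions of the hints shown above
sender_hints = {
    "rundown": "extract top 3 only",
    "bytebytego": "focus on the core insight",
    "trip.com": "skip unless genuinely useful",
}

print(hints_for_sender("newsletter@therundownai.com", sender_hints))
print(hints_for_sender("crew@morningbrew.com", sender_hints))
```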
- `To-Process-Newsletters`: for daily newsletters you read and discard. Processed emails move to `Archived-Newsletters` and get auto-deleted after 30 days.
- `To-Process-Saved`: for newsletters you want to keep forever. Processed emails move to `Saved-Newsletters` and stay there permanently.
Here are some popular free newsletters to get you started:
| Newsletter | From address | Suggested pipeline |
|---|---|---|
| The Rundown AI | newsletter@therundownai.com | Newsletters (30-day) |
| TLDR | news@tldrnewsletter.com | Newsletters (30-day) |
| ByteByteGo | newsletter@bytebytego.com | Newsletters (30-day) |
| Morning Brew | crew@morningbrew.com | Newsletters (30-day) |
| Tim Ferriss 5-Bullet Friday | tim@fourhourbody.com | Saved (forever) |
| James Clear 3-2-1 Thursday | newsletter@jamesclear.com | Saved (forever) |
| Lenny's Newsletter | lenny@substack.com | Saved (forever) |
| The Pragmatic Engineer | gergely@pragmaticengineer.com | Saved (forever) |
Finding the "from" address: Open a newsletter email → click the three dots → "Show original" → look for the `From:` header. Or just check what shows up in the "From" column.
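If you copy a raw `From:` header, it usually includes a display name. Python's standard library can pull out just the address to pass to `add-filter.sh`; the wrapper name `sender_address` is made up for this sketch.

```python
from email.utils import parseaddr


def sender_address(from_header: str) -> str:
    """Extract the bare email address from a From: header value."""
    _display_name, address = parseaddr(from_header)
    return address.lower()


print(sender_address("James Clear <newsletter@jamesclear.com>"))
print(sender_address("news@tldrnewsletter.com"))
```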
| Component | Monthly Cost |
|---|---|
| Gemini API (markdown mode) | ~$0.20-0.50 |
| Cloud Functions | ~$0.00 (free tier) |
| Cloud Scheduler | ~$0.10 |
| Secret Manager | ~$0.00 (free tier) |
| BigQuery | ~$0.00 (free tier) |
| Firestore | ~$0.00 (free tier) |
| Gmail API | Free |
| Open-Meteo (weather) | Free |
| Total | ~$0.30-0.60/month |
The digest footer includes a per-run cost so you can track spending. With ~10 newsletters/day in markdown mode, expect $0.01-0.03 per digest.
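A per-run cost figure like the one in the footer is typically token counts multiplied by per-token prices. The sketch below shows only the arithmetic: the function name is hypothetical, and the token counts and prices are made-up placeholders, not actual Gemini pricing or measurements from the pipeline.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Cost of one run given token counts and prices per million tokens."""
    total = input_tokens * usd_per_million_input + output_tokens * usd_per_million_output
    return total / 1_000_000


# Placeholder numbers for illustration only (NOT real prices or usage)
cost = estimate_cost_usd(
    input_tokens=40_000,
    output_tokens=2_000,
    usd_per_million_input=0.25,
    usd_per_million_output=1.00,
)
print(f"${cost:.4f} per digest")
```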
Run data is logged to BigQuery (newsletter_digest dataset). Some useful queries:
-- Monthly cost and volume
SELECT FORMAT_TIMESTAMP('%Y-%m', timestamp) as month,
SUM(cost_usd) as cost,
SUM(emails_processed) as emails
FROM newsletter_digest.digest_runs
GROUP BY 1 ORDER BY 1
-- Noisiest senders (who floods your inbox most)
SELECT sender, COUNT(*) as emails
FROM newsletter_digest.digest_items
GROUP BY 1 ORDER BY 2 DESC LIMIT 20
-- Average processing time
SELECT FORMAT_TIMESTAMP('%Y-%m', timestamp) as month,
AVG(duration_seconds) as avg_seconds
FROM newsletter_digest.digest_runs
GROUP BY 1 ORDER BY 1

When new features are released:
git pull origin main
./deploy.sh YOUR_PROJECT_ID you@gmail.com

If config.example.py has new settings, compare it with your config.py and add anything new:

diff config.py config.example.py

If you get `Error 403: access_denied` during the OAuth flow, you didn't add yourself as a test user. Go to:
https://console.cloud.google.com/apis/credentials/consent?project=YOUR_PROJECT_ID
Find "Test users" → Add your Gmail address → try again.
If your browser is logged into multiple accounts, the OAuth URL might redirect to the wrong one. Fix:
- Open the URL in an incognito window, or
- Append `&authuser=you@gmail.com` to the URL
Cloud Functions won't deploy without billing. Even with the $300 free trial, you need to explicitly link it:
https://console.cloud.google.com/billing/linkedaccount?project=YOUR_PROJECT_ID
The default deploy uses 512MB which is plenty. If you somehow hit OOM:
- Check that `requirements.txt` hasn't accidentally pulled in `google-cloud-aiplatform` (it's huge)
- The pipeline uses `google-genai` which is much lighter
Gemini model names change. Common mistakes:
- `gemini-1.5-flash`: retired, doesn't work anymore
- `gemini-3-flash`: doesn't exist; the correct name is `gemini-3-flash-preview`
- `gemini-flash-latest`: recommended, auto-updates to the latest stable Flash model
- Check that your Gmail filters are actually labeling emails: search `label:To-Process-Newsletters` in Gmail
- Make sure bootstrap created the labels (check Gmail sidebar)
- Re-run bootstrap if labels are missing — it's safe to re-run
If you deployed before the Secret Manager migration, your Gemini key is still set as a Cloud Function env var but the code now looks in Secret Manager first. Fix it by storing the key:
echo -n "YOUR_GEMINI_API_KEY" | gcloud secrets create gemini-api-key \
  --data-file=- --replication-policy=automatic --project=YOUR_PROJECT_ID

Then redeploy (`./deploy.sh`) to remove the old env var. New installations handle this automatically during bootstrap.
Just run it again. It's idempotent — existing resources are skipped. If it fails on the OAuth step specifically, make sure credentials.json is in the project root.
To completely reset — delete all GCP resources and remove labels from Gmail (emails are preserved):
# Preview what will be deleted
./teardown.sh YOUR_PROJECT_ID --dry-run
# Run teardown (keeps GCP project)
./teardown.sh YOUR_PROJECT_ID
# Full nuke — also deletes the GCP project and local token.json
./teardown.sh YOUR_PROJECT_ID --nuke

What teardown deletes:

| Resource | Details |
|---|---|
| Cloud Function | digest-pipeline |
| Cloud Scheduler | trigger-newsletter-digest, trigger-sunday-edition |
| Secret Manager | OAuth tokens, Gemini API key, cost tracker |
| BigQuery | newsletter_digest dataset + all tables |
| Firestore | config/digest-pipeline document |
| Gmail labels | All 6 pipeline labels: removed from emails, then deleted |
| Gmail filters | Any filter routing to pipeline labels |
| GCP project | Only with --nuke (30-day recovery window) |
| Local token.json | Only with --nuke |
What teardown keeps:
- All emails (labels removed, messages untouched)
- `credentials.json` (reusable across projects)
- `.env`, `config.py`, git repo
To remove one account without touching shared infra:
./teardown.sh YOUR_PROJECT_ID --account mypartner

This removes that account's Gmail labels/filters and OAuth secret only.
config.example.py — Template config (copy to config.py)
config.py — Code defaults (schedule, model, weather, labels, pricing). No personal data.
.env.example — Template env vars (copy to .env)
.env — Your env vars (API keys, project ID)
main.py — Cloud Function entry point
pipeline.py — Pipeline orchestration (per-account + consolidated flows)
email_client.py — Gmail fetch, MIME parsing, relabel, cleanup, filter management
llm_client.py — Gemini prompt construction, API calls, response parsing
digest_builder.py — HTML assembly, weather fetch, cost footer, error banners
analytics.py — BigQuery logging (per-account + consolidated tracking)
remote_config.py — Firestore-backed config with fallback to config.py (accounts, sender hints, overrides)
seed_config.py — Seeds Firestore with initial config + sender hints
oauth_flow.py — One-time OAuth helper (run by bootstrap/onboard)
bootstrap.sh — Full GCP setup (APIs, IAM, OAuth, labels, Firestore)
deploy.sh — Deploy function + create/update scheduler
add-filter.sh — CLI helper for creating Gmail filters
onboard.sh — Add a new Gmail account to the pipeline
offboard.sh — Remove an account (soft-delete with 90-day BQ TTL)
teardown.sh — Full infrastructure teardown (reset to pre-onboard state)
requirements.txt — Python dependencies
credentials.json — OAuth client secret (you download this from GCP)
docs/
multi-account-spec.md — Multi-account design spec
onboarding-guide.md — Step-by-step second account onboarding
specs/
consolidated-digest.md — Consolidated N-to-1 digest spec
sunday-edition.md — Weekly recap spec
internal/ — Specs, plans, troubleshooting (dev reference)
TODO.md — Planned features
MIT