shipitsteven/gmail-newsletter-digest

Newsletter Digest

A serverless pipeline that reads your newsletter subscriptions every morning and sends you one AI-generated digest email with only the stuff worth reading. Runs on Google Cloud, costs about $0.50/month.

What you get: A single email at 7 AM with weather, a curated summary of everything your newsletters said today, and a cost tracker in the footer showing exactly what you spent.


How It Works

You subscribe to newsletters
        ↓
Gmail filter auto-labels incoming emails
        ↓
Cloud Scheduler triggers at 7 AM
        ↓
Cloud Function wakes up and:
  1. Fetches all labeled emails via Gmail API
  2. Converts HTML → clean markdown
  3. Sends to Gemini for opinionated summarization
  4. Emails you the digest (with weather + cost)
  5. Moves processed emails to archive label
  6. Auto-creates Gmail filters for new senders
  7. Cleans up emails older than 30 days
  8. Logs analytics to BigQuery
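Step 2 (HTML → clean markdown) can be sketched with the standard library alone. This is a hypothetical converter for illustration, not the repo's actual implementation: the class name NewsletterExtractor, the function html_to_markdown, and the choice of which tags to keep are all assumptions.

```python
import re
from html.parser import HTMLParser


class NewsletterExtractor(HTMLParser):
    """Hypothetical HTML-to-markdown pass; not the repo's actual converter."""

    SKIPPED = {"script", "style", "head"}          # content that is never shown
    BLOCKS = {"p", "div", "br", "tr", "ul", "ol"}  # tags that start a new block

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag in ("h1", "h2", "h3"):
            self.parts.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag in self.BLOCKS:
            self.parts.append("\n\n")
        elif tag == "li":
            self.parts.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.parts.append("[")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "a":
            self.parts.append(f"]({self.href})" if self.href else "]")
            self.href = None

    def handle_data(self, data):
        if self.skip_depth == 0:
            # Whitespace in HTML source is insignificant, so collapse it.
            self.parts.append(re.sub(r"\s+", " ", data))


def html_to_markdown(html: str) -> str:
    """Convert newsletter HTML to a compact markdown-like string."""
    parser = NewsletterExtractor()
    parser.feed(html)
    parser.close()
    lines = [line.strip() for line in "".join(parser.parts).split("\n")]
    out = []
    for line in lines:
        # Keep at most one blank line in a row and none at the start.
        if line or (out and out[-1]):
            out.append(line)
    return "\n".join(out).strip()
```

Dropping styles, scripts, and layout markup before the Gemini call is what keeps the token count, and therefore the per-run cost, low.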

Auto-Filter: One Label, Forever Filtered

When you manually label a newsletter for the first time, the pipeline automatically creates a Gmail filter for that sender on the next run. The filter:

  • Skips the inbox (archives it)
  • Applies the same label you used (To-Process-Newsletters or To-Process-Saved)

So you label one email → every future email from that sender is auto-labeled. No need to manually create Gmail filters.
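As a rough sketch of what such a filter looks like, the function below builds a request body in the shape the Gmail API's users.settings.filters.create method accepts (a criteria object plus an action object). The function name build_sender_filter is hypothetical, and label_id is assumed to be a Gmail label ID that was already looked up, not the label's display name.

```python
def build_sender_filter(sender: str, label_id: str) -> dict:
    """Build a Gmail filter body for one sender (illustrative sketch)."""
    sender = sender.strip().lower()
    if "@" not in sender:
        raise ValueError(f"not an email address: {sender!r}")
    return {
        "criteria": {"from": sender},
        "action": {
            "addLabelIds": [label_id],    # apply the pipeline label
            "removeLabelIds": ["INBOX"],  # skip the inbox (archive)
        },
    }
```

Creating filters requires the gmail.settings.basic scope, which is why existing users need to re-run the OAuth flow.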

Known senders are tracked in Firestore. To see or edit them, visit the Firebase Console.

Existing users: If you set up the pipeline before this feature, re-run the OAuth flow to add the gmail.settings.basic scope:

python3 oauth_flow.py credentials.json
# Then update Secret Manager with the new token

Two pipelines, two retention policies:

| Gmail Label | After Processing | Retention |
|---|---|---|
| To-Process-Newsletters | Archived-Newsletters | 30 days, then auto-deleted |
| To-Process-Saved | Saved-Newsletters | Kept forever |

Use To-Process-Saved for newsletters you want digested but never deleted (e.g. James Clear's 3-2-1 Thursday).
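The two retention policies reduce to a small lookup. The sketch below is illustrative only: RETENTION_DAYS and is_expired are hypothetical names, and treating an unknown label as "never expires" is an assumption made to keep the cleanup conservative.

```python
from datetime import datetime, timedelta, timezone

# Days to keep processed mail per archive label; None means keep forever.
RETENTION_DAYS = {
    "Archived-Newsletters": 30,
    "Saved-Newsletters": None,
}


def is_expired(label, received_at, now=None):
    """Return True if a message under `label` is past its retention period."""
    days = RETENTION_DAYS.get(label)  # unknown labels are never deleted
    if days is None:
        return False
    now = now or datetime.now(timezone.utc)
    return now - received_at > timedelta(days=days)
```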


Prerequisites

You need five things before starting. Here's exactly where to get each:

| What | Where | Notes |
|---|---|---|
| Google Cloud account | console.cloud.google.com | New accounts get $300 free credits for 90 days |
| Gmail account | You probably have one | This is where newsletters arrive and where digests get sent |
| gcloud CLI | cloud.google.com/sdk/docs/install | Command-line tool to manage GCP |
| Python 3.12+ | python.org/downloads | Check with python3 --version |
| Gemini API key | aistudio.google.com/app/apikey | Free tier is generous; enable pay-as-you-go for reliable access |

Quick Start

Step 1: Clone the repo

git clone https://github.com/shipitsteven/newsletter-digest-pipeline.git
cd newsletter-digest-pipeline

Step 2: Configure environment

cp .env.example .env
cp config.example.py config.py

Edit .env with your values:

GCP_PROJECT_ID=my-newsletter-digest    # pick any unique name
DIGEST_RECIPIENT=you@gmail.com
WEATHER_LATITUDE=47.48                  # your city's latitude
WEATHER_LONGITUDE=-122.21              # your city's longitude
GEMINI_API_KEY=your-key-from-ai-studio # only for local testing; production uses Secret Manager

Edit config.py if you want to change the schedule, model, or other defaults. The example file has comments explaining each option.
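A typo in .env tends to surface only when the function runs, so it can help to validate the values up front. The sketch below is not part of the repo: load_settings and the returned dictionary keys are hypothetical, and it assumes the variables have already been loaded into the environment (or are passed in as a dict).

```python
import os

REQUIRED = (
    "GCP_PROJECT_ID",
    "DIGEST_RECIPIENT",
    "WEATHER_LATITUDE",
    "WEATHER_LONGITUDE",
)


def load_settings(environ=None):
    """Read and validate the .env values (illustrative sketch)."""
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError("missing settings: " + ", ".join(missing))

    latitude = float(env["WEATHER_LATITUDE"])
    longitude = float(env["WEATHER_LONGITUDE"])
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude out of range: {longitude}")
    if "@" not in env["DIGEST_RECIPIENT"]:
        raise ValueError("DIGEST_RECIPIENT must be an email address")

    return {
        "project_id": env["GCP_PROJECT_ID"],
        "recipient": env["DIGEST_RECIPIENT"],
        "latitude": latitude,
        "longitude": longitude,
    }
```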

Step 3: Set up gcloud CLI

If you just installed gcloud, authenticate:

gcloud auth login

Tip: If your browser is logged into multiple Google accounts, open the auth URL in an incognito window to avoid signing in as the wrong account.

Step 4: Create a GCP project

Go to console.cloud.google.com/projectcreate and create a project. Use the same ID you put in .env.

Then link a billing account:

  1. Go to console.cloud.google.com/billing
  2. Click "Link a billing account" on your project
  3. If you're new, you'll be prompted to start a free trial — do it, you won't be charged

Step 5: Set up OAuth consent screen

This is the most confusing part of GCP setup. Follow these steps exactly:

  1. Go to the OAuth consent screen:

    https://console.cloud.google.com/apis/credentials/consent?project=YOUR_PROJECT_ID
    

    (Replace YOUR_PROJECT_ID with your actual project ID)

  2. Configure the consent screen:

    • Click "Get started" or "Configure consent screen"
    • User type: External (only option for personal Gmail)
    • App name: Newsletter Digest (or anything you want)
    • User support email: your email
    • Developer contact: your email
    • Click through the remaining steps (you can skip optional fields)
  3. Add yourself as a test user (CRITICAL):

    • In the OAuth consent screen settings, find "Test users" or "Audience"
    • Click "Add users"
    • Enter your Gmail address
    • Save

    Why this matters: Your app starts in "Testing" mode. Only test users can authorize it. If you skip this, you'll get Error 403: access_denied and waste 20 minutes debugging.

  4. Publish the app (optional but recommended):

    • Still on the consent screen page, click "Publish app"
    • This moves it from "Testing" to "Production"
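    • For personal use, this removes the test-user restriction. It also avoids the 7-day refresh-token expiry that Google applies to external apps left in "Testing" mode, which would otherwise force you to re-authorize every week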
    • You do NOT need Google to verify it for personal use

Step 6: Create OAuth credentials

  1. Go to the credentials page:

    https://console.cloud.google.com/apis/credentials?project=YOUR_PROJECT_ID
    
  2. Create credentials:

    • Click "+ CREATE CREDENTIALS" at the top
    • Choose "OAuth client ID"
    • Application type: Desktop app
    • Name: anything (e.g. "Newsletter Digest CLI")
    • Click "Create"
  3. Download the JSON:

    • In the popup that appears, click "DOWNLOAD JSON"
    • Save it as credentials.json in the project root (same folder as main.py)

Step 7: Run bootstrap

./bootstrap.sh YOUR_PROJECT_ID you@gmail.com

Here's what bootstrap does at each step:

  1. Pre-flight checks — Verifies Python 3.12+ and gcloud are installed and authenticated
  2. Installs Python deps — Runs pip install -r requirements.txt
  3. Creates GCP project — Sets it as active (skips if exists)
  4. Pauses for billing — Gives you a link to confirm billing is linked
  5. Enables APIs — Cloud Functions, Scheduler, Secret Manager, Gmail, Firestore, BigQuery, etc.
  6. Sets IAM roles — Gives the Cloud Function permission to read secrets
  7. OAuth flow — Opens your browser to authorize Gmail access. Accept all scopes.
  8. Stores token — Saves the OAuth refresh token in Secret Manager
  9. Stores Gemini key — Prompts for your API key and saves it in Secret Manager as gemini-api-key
  10. Creates Gmail labels — All 5 labels: To-Process-Newsletters, To-Process-Saved, Archived-Newsletters, Saved-Newsletters, AI-Digest
  11. Seeds Firestore — Writes initial config for hot-reloading (sender hints, pipelines, etc.)

Re-running is safe. If something fails halfway through, just run it again — existing resources are skipped.

Step 8: Deploy

./deploy.sh YOUR_PROJECT_ID you@gmail.com

This deploys the Cloud Function and creates a Cloud Scheduler job that triggers it daily. The Gemini API key is read from Secret Manager (stored during bootstrap), not passed as an env var.

Step 9: Add Gmail filters

You need to tell Gmail which emails to route to the pipeline. Two options:

Option A: CLI helper (recommended)

# Standard newsletters (30-day retention):
./add-filter.sh "newsletter@therundownai.com"
./add-filter.sh "newsletter@bytebytego.com"
./add-filter.sh "news@tldrnewsletter.com"

# Newsletters to keep forever:
./add-filter.sh "newsletter@jamesclear.com" --saved

Option B: Gmail UI

  1. Go to Gmail → Settings (gear icon) → See all settings → Filters and Blocked Addresses
  2. Click "Create a new filter"
  3. In "From": enter newsletter@example.com
  4. Click "Create filter"
  5. Check: "Skip the Inbox" + "Apply the label" → choose To-Process-Newsletters
  6. Click "Create filter"

Step 10: Smoke test

Trigger the pipeline manually to make sure everything works:

gcloud functions call digest-pipeline --gen2 --region=us-central1

What to expect:

  • If you have newsletters already labeled, you'll get a digest email within 1-2 minutes
  • If no emails are labeled yet, you'll get a "nothing today" email (unless you set SEND_EMPTY_DIGEST=false)
  • Check the Cloud Functions logs if something seems off:
    gcloud functions logs read digest-pipeline --gen2 --region=us-central1 --limit=50

You're done! Tomorrow at 7 AM you'll get your first real digest.


Multi-Account Support

The pipeline supports multiple Gmail accounts. Each account gets its own OAuth token, digest emails, and analytics — completely isolated.

Adding a new account

./onboard.sh partner@gmail.com

The account ID is derived from the email (partner@gmail.com → partner). Override with --id custom-name if needed.
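One plausible reading of this rule is "take the part before the @". The sketch below assumes exactly that and nothing more; derive_account_id is a hypothetical name, and onboard.sh may normalize IDs differently.

```python
def derive_account_id(email: str) -> str:
    """Derive a default account ID from an email address (assumed rule)."""
    local, at, domain = email.strip().lower().partition("@")
    if not at or not local or not domain:
        raise ValueError(f"not an email address: {email!r}")
    return local
```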

The script handles everything: OAuth flow, Secret Manager, Gmail label creation, newsletter discovery, and adding the account to Firestore. One manual step: the new email must be added as a test user in GCP Console first (the script will prompt you).

After onboarding, the account is live immediately — no deploy needed.

Targeting a specific account

# Run digest for one account only
gcloud functions call digest-pipeline --gen2 --region=us-central1 --data '{"account": "mypartner"}'

# Debug mode for one account
gcloud functions call digest-pipeline --gen2 --region=us-central1 --data '{"debug": "true", "account": "mypartner"}'

Without an account field in the payload, the pipeline runs all enabled accounts sequentially, with a configurable stagger delay between them.
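The payload handling and the sequential run can be sketched as below. Everything here is an assumption for illustration: the function names, the account record shape ({"id": ..., "enabled": ...}), and the injected run_one and sleep callables are not taken from the repo.

```python
import time


def parse_trigger(payload):
    """Normalize the JSON payload sent via `gcloud functions call --data`."""
    payload = payload or {}
    debug = str(payload.get("debug", "false")).lower() == "true"
    return {"debug": debug, "account": payload.get("account") or None}


def select_accounts(accounts, requested=None):
    """Return the enabled accounts, or only the requested one."""
    enabled = [a for a in accounts if a.get("enabled", True)]
    if requested is None:
        return enabled
    matches = [a for a in enabled if a["id"] == requested]
    if not matches:
        raise KeyError(f"unknown or disabled account: {requested}")
    return matches


def run_accounts(accounts, run_one, stagger_seconds=0.0, sleep=time.sleep):
    """Run accounts one after another, pausing between consecutive runs."""
    results = {}
    for index, account in enumerate(accounts):
        if index and stagger_seconds:
            sleep(stagger_seconds)  # no pause before the first account
        results[account["id"]] = run_one(account)
    return results
```

Passing sleep as a parameter keeps the loop testable without real delays.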

Consolidated Digest (N-to-1)

Instead of sending one digest per account, you can merge all accounts into a single unified digest delivered to one inbox.

Enable via Firestore remote config (no redeploy needed):

  • Set CONSOLIDATED_DIGEST → true (boolean)
  • Set CONSOLIDATED_OUTPUT → the account ID that should receive the merged email (e.g. swlilblack)

Or via environment variables + redeploy:

CONSOLIDATED_DIGEST=true
CONSOLIDATED_OUTPUT=swlilblack

How it works: The pipeline fetches emails from all accounts independently (each using its own OAuth token), merges them into a single collection, deduplicates by Message-ID, makes one Gemini call to summarize everything, and sends one digest to the output account's recipient. Each newsletter gets a [via: account_id] tag so "Read full →" links open in the correct Gmail inbox.

Failure handling: If one account's fetch fails (expired OAuth, timeout), the pipeline continues with the others and adds a warning banner to the digest. Relabeling happens per-account after sending — if it fails, emails reappear next run (self-healing).
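The fetch, merge, and dedupe steps, together with the per-account failure isolation, can be sketched as follows. The email dictionaries, the message_id and via keys, and the injected fetch callable are assumptions for illustration; the real pipeline's data model may differ.

```python
def collect_emails(account_ids, fetch):
    """Fetch from every account, merge, and deduplicate by Message-ID.

    `fetch(account_id)` is a hypothetical callable that returns a list of
    dicts with at least a "message_id" key, or raises on failure.
    """
    merged, seen, warnings = [], set(), []
    for account_id in account_ids:
        try:
            emails = fetch(account_id)
        except Exception as exc:
            # One broken account must not block the others.
            warnings.append(f"{account_id}: fetch failed ({exc})")
            continue
        for email in emails:
            if email["message_id"] in seen:
                continue  # same newsletter delivered to several accounts
            seen.add(email["message_id"])
            merged.append({**email, "via": account_id})
    return merged, warnings
```

The first account that delivers a message keeps the via tag, and the warnings list is what would feed the banner in the digest.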

Cost benefit: N-to-1 is cheaper than N-to-N since there's only one set of Gemini calls regardless of account count.

Rollback: Flip CONSOLIDATED_DIGEST back to false in Firestore. Instant, no redeploy.

Full spec: docs/specs/consolidated-digest.md

Removing an account

./offboard.sh --id mypartner

Soft-deletes the account (revokes OAuth, deletes secret). BQ analytics data is retained for 90 days, then auto-purged. Use --purge-now to skip the retention period.


Configuration

Local config (config.py)

All settings live in config.py. Edit and redeploy to apply changes.

| Setting | Default | Description |
|---|---|---|
| SCHEDULE_CRON | 0 7 * * * | When the digest runs (cron syntax, e.g. 0 7 * * 1-5 for weekdays only) |
| SCHEDULE_TIMEZONE | America/Los_Angeles | Timezone for the schedule |
| GEMINI_MODEL | gemini-flash-latest | Model for summarization (auto-updates to latest stable Flash) |
| CONTENT_EXTRACTION_MODE | markdown | How to extract content: markdown (recommended), plain_text, or raw_html |
| MAX_EMAILS_PER_RUN | 20 | Max newsletters per digest; the remainder waits for the next run |
| SEND_EMPTY_DIGEST | true | Send a "nothing today" email when no newsletters are found |
| ARCHIVE_RETENTION_DAYS | 30 | Days before archived newsletters are auto-trashed |
| WEATHER_LATITUDE / WEATHER_LONGITUDE | Seattle, WA | Your location for the weather in the digest header |
| CONSOLIDATED_DIGEST | false | Merge all accounts into one digest (N-to-1) |
| CONSOLIDATED_OUTPUT | "" | Account ID that receives the merged digest (required when consolidated is true) |

Remote config (Firestore)

For settings you want to change without redeploying, use Firestore:

https://console.firebase.google.com/project/YOUR_PROJECT_ID/firestore/databases/-default-/data/~2Fconfig~2Fdigest-pipeline

The pipeline checks Firestore on every run and falls back to config.py if Firestore is unavailable. This means you can:

  • Tweak sender_hints to tell the AI how to handle specific newsletters
  • Change MAX_EMAILS_PER_RUN during high-volume periods
  • Swap models without redeploying
  • Toggle CONSOLIDATED_DIGEST and CONSOLIDATED_OUTPUT on the fly
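The fallback behavior amounts to "remote values override local defaults, and a failed read means defaults only". A minimal sketch, assuming a hypothetical fetch_remote callable that returns the Firestore document as a dict; the three defaults shown are taken from the settings table, the function name is not from the repo.

```python
DEFAULTS = {
    "MAX_EMAILS_PER_RUN": 20,
    "GEMINI_MODEL": "gemini-flash-latest",
    "CONSOLIDATED_DIGEST": False,
}


def load_effective_config(fetch_remote, defaults=None):
    """Overlay remote overrides on local defaults (illustrative sketch)."""
    base = dict(DEFAULTS if defaults is None else defaults)
    try:
        remote = fetch_remote() or {}
    except Exception:
        remote = {}  # Firestore unavailable: run on config.py values
    base.update(remote)
    return base
```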

Accounts are stored in Firestore under config/digest-pipeline.accounts. The onboard.sh script writes here directly — no file editing or redeployment needed. You can also view and edit accounts in the Firestore console.

Sender hints are especially useful. They tell Gemini how to treat specific senders:

{
  "sender_hints": {
    "rundown": "Rapid-fire format with many items — extract top 3 only",
    "bytebytego": "Usually one deep technical topic — focus on the core insight",
    "trip.com": "Almost always promotional spam — skip unless genuinely useful"
  }
}

Edit these directly in the Firestore console — changes take effect on the next run.
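The hint keys look like fragments of the sender address ("rundown", "trip.com"), so one way to apply them is a case-insensitive substring match. That matching rule is an assumption, as is the function name hint_for_sender; the pipeline may match on display name or domain instead.

```python
def hint_for_sender(sender: str, hints: dict):
    """Return the first hint whose key appears in the sender, else None."""
    sender = sender.lower()
    for key, hint in hints.items():
        if key.lower() in sender:
            return hint
    return None
```

For example, with the hints above, newsletter@therundownai.com would pick up the "rundown" hint, while an address with no matching key gets no hint at all.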


Gmail Filters

Dual pipeline explained

  • To-Process-Newsletters — For daily newsletters you read and discard. Processed emails move to Archived-Newsletters and get auto-deleted after 30 days.
  • To-Process-Saved — For newsletters you want to keep forever. Processed emails move to Saved-Newsletters and stay there permanently.

Starter list

Here are some popular free newsletters to get you started:

| Newsletter | From address | Suggested pipeline |
|---|---|---|
| The Rundown AI | newsletter@therundownai.com | Newsletters (30-day) |
| TLDR | news@tldrnewsletter.com | Newsletters (30-day) |
| ByteByteGo | newsletter@bytebytego.com | Newsletters (30-day) |
| Morning Brew | crew@morningbrew.com | Newsletters (30-day) |
| Tim Ferriss 5-Bullet Friday | tim@fourhourbody.com | Saved (forever) |
| James Clear 3-2-1 Thursday | newsletter@jamesclear.com | Saved (forever) |
| Lenny's Newsletter | lenny@substack.com | Saved (forever) |
| The Pragmatic Engineer | gergely@pragmaticengineer.com | Saved (forever) |

Finding the "from" address: Open a newsletter email → click the three dots → "Show original" → look for the From: header. Or just check what shows up in the "From" column.
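If you copy a whole From: header out of "Show original", the address can be pulled out with Python's standard library. The helper name sender_address is hypothetical; email.utils.parseaddr is the stdlib function doing the work.

```python
from email.utils import parseaddr


def sender_address(from_header: str) -> str:
    """Extract the bare, lowercased address from a From: header value."""
    _display_name, address = parseaddr(from_header)
    return address.lower()
```

The result is the value to pass to add-filter.sh or to enter in the Gmail filter's "From" field.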


Cost

| Component | Monthly Cost |
|---|---|
| Gemini API (markdown mode) | ~$0.20-0.50 |
| Cloud Functions | ~$0.00 (free tier) |
| Cloud Scheduler | ~$0.10 |
| Secret Manager | ~$0.00 (free tier) |
| BigQuery | ~$0.00 (free tier) |
| Firestore | ~$0.00 (free tier) |
| Gmail API | Free |
| Open-Meteo (weather) | Free |
| Total | ~$0.30-0.60/month |

The digest footer includes a per-run cost so you can track spending. With ~10 newsletters/day in markdown mode, expect $0.01-0.03 per digest.
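A per-run cost like the one in the footer is typically token counts multiplied by per-million-token prices. The sketch below shows only that arithmetic; the function names are hypothetical and the prices are parameters you supply, since this README does not state the rates the pipeline uses.

```python
def run_cost_usd(input_tokens, output_tokens,
                 input_price_per_million, output_price_per_million):
    """Cost of one run, given token counts and prices per million tokens."""
    return (input_tokens * input_price_per_million
            + output_tokens * output_price_per_million) / 1_000_000


def monthly_cost_usd(per_run_usd, runs_per_month=30):
    """Scale a per-run cost to a month of daily runs."""
    return per_run_usd * runs_per_month
```

Look up the current Gemini pricing before plugging in numbers; any value used here would be a placeholder.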


Analytics

Run data is logged to BigQuery (newsletter_digest dataset). Some useful queries:

-- Monthly cost and volume
SELECT FORMAT_TIMESTAMP('%Y-%m', timestamp) as month,
       SUM(cost_usd) as cost,
       SUM(emails_processed) as emails
FROM newsletter_digest.digest_runs
GROUP BY 1 ORDER BY 1

-- Noisiest senders (who floods your inbox most)
SELECT sender, COUNT(*) as emails
FROM newsletter_digest.digest_items
GROUP BY 1 ORDER BY 2 DESC LIMIT 20

-- Average processing time
SELECT FORMAT_TIMESTAMP('%Y-%m', timestamp) as month,
       AVG(duration_seconds) as avg_seconds
FROM newsletter_digest.digest_runs
GROUP BY 1 ORDER BY 1

Updating

When new features are released:

git pull origin main
./deploy.sh YOUR_PROJECT_ID you@gmail.com

If config.example.py has new settings, compare it with your config.py and add anything new:

diff config.py config.example.py

Troubleshooting

"Error 403: access_denied" during OAuth

You didn't add yourself as a test user. Go to:

https://console.cloud.google.com/apis/credentials/consent?project=YOUR_PROJECT_ID

Find "Test users" → Add your Gmail address → try again.

OAuth opens in the wrong Google account

If your browser is logged into multiple accounts, the OAuth URL might redirect to the wrong one. Fix:

  • Open the URL in an incognito window, or
  • Append &authuser=you@gmail.com to the URL

"Billing account not linked" or API errors

Cloud Functions won't deploy without billing. Even with the $300 free trial, you need to explicitly link it:

https://console.cloud.google.com/billing/linkedaccount?project=YOUR_PROJECT_ID

Function runs out of memory

The default deploy uses 512 MB, which is normally plenty. If you still hit an out-of-memory error:

  • Check that requirements.txt hasn't accidentally pulled in google-cloud-aiplatform (it's huge)
  • The pipeline uses google-genai which is much lighter

"Model not found" errors

Gemini model names change. Common mistakes:

  • gemini-1.5-flash: retired, no longer works
  • gemini-3-flash: doesn't exist; the correct name is gemini-3-flash-preview
  • gemini-flash-latest: recommended; auto-updates to the latest stable Flash model

No emails being processed

  1. Check that your Gmail filters are actually labeling emails: search label:To-Process-Newsletters in Gmail
  2. Make sure bootstrap created the labels (check Gmail sidebar)
  3. Re-run bootstrap if labels are missing — it's safe to re-run

"Gemini API key not found in Secret Manager or environment"

If you deployed before the Secret Manager migration, your Gemini key is still set as a Cloud Function env var but the code now looks in Secret Manager first. Fix it by storing the key:

echo -n "YOUR_GEMINI_API_KEY" | gcloud secrets create gemini-api-key \
  --data-file=- --replication-policy=automatic --project=YOUR_PROJECT_ID

Then redeploy (./deploy.sh YOUR_PROJECT_ID you@gmail.com) to remove the old env var. New installations handle this automatically during bootstrap.
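The lookup order described here (Secret Manager first, environment second) can be sketched as below. The function name resolve_gemini_key and the injected fetch_secret callable are hypothetical stand-ins for whatever Secret Manager client code the pipeline uses; the secret name, env var name, and error text come from this README.

```python
import os


def resolve_gemini_key(fetch_secret, environ=None):
    """Return the Gemini key from Secret Manager, else from the environment.

    `fetch_secret(name)` is a hypothetical callable that returns the secret
    value or raises if it is missing or unreadable.
    """
    env = os.environ if environ is None else environ
    try:
        key = fetch_secret("gemini-api-key")
    except Exception:
        key = None  # fall through to the environment variable
    key = key or env.get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError(
            "Gemini API key not found in Secret Manager or environment"
        )
    return key
```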

Bootstrap fails halfway through

Just run it again. It's idempotent — existing resources are skipped. If it fails on the OAuth step specifically, make sure credentials.json is in the project root.


Teardown

To completely reset — delete all GCP resources and remove labels from Gmail (emails are preserved):

# Preview what will be deleted
./teardown.sh YOUR_PROJECT_ID --dry-run

# Run teardown (keeps GCP project)
./teardown.sh YOUR_PROJECT_ID

# Full nuke — also deletes the GCP project and local token.json
./teardown.sh YOUR_PROJECT_ID --nuke

What gets deleted

| Resource | Details |
|---|---|
| Cloud Function | digest-pipeline |
| Cloud Scheduler | trigger-newsletter-digest, trigger-sunday-edition |
| Secret Manager | OAuth tokens, Gemini API key, cost tracker |
| BigQuery | newsletter_digest dataset + all tables |
| Firestore | config/digest-pipeline document |
| Gmail labels | All 6 pipeline labels, removed from emails and then deleted |
| Gmail filters | Any filter routing to pipeline labels |
| GCP project | Only with --nuke (30-day recovery window) |
| Local token.json | Only with --nuke |

What's preserved

  • All emails (labels removed, messages untouched)
  • credentials.json (reusable across projects)
  • .env, config.py, git repo

Single-account teardown

To remove one account without touching shared infra:

./teardown.sh YOUR_PROJECT_ID --account mypartner

This removes that account's Gmail labels/filters and OAuth secret only.


File Structure

config.example.py   — Template config (copy to config.py)
config.py           — Code defaults (schedule, model, weather, labels, pricing). No personal data.
.env.example        — Template env vars (copy to .env)
.env                — Your env vars (API keys, project ID)
main.py             — Cloud Function entry point
pipeline.py         — Pipeline orchestration (per-account + consolidated flows)
email_client.py     — Gmail fetch, MIME parsing, relabel, cleanup, filter management
llm_client.py       — Gemini prompt construction, API calls, response parsing
digest_builder.py   — HTML assembly, weather fetch, cost footer, error banners
analytics.py        — BigQuery logging (per-account + consolidated tracking)
remote_config.py    — Firestore-backed config with fallback to config.py (accounts, sender hints, overrides)
seed_config.py      — Seeds Firestore with initial config + sender hints
oauth_flow.py       — One-time OAuth helper (run by bootstrap/onboard)
bootstrap.sh        — Full GCP setup (APIs, IAM, OAuth, labels, Firestore)
deploy.sh           — Deploy function + create/update scheduler
add-filter.sh       — CLI helper for creating Gmail filters
onboard.sh          — Add a new Gmail account to the pipeline
offboard.sh         — Remove an account (soft-delete with 90-day BQ TTL)
teardown.sh         — Full infrastructure teardown (reset to pre-onboard state)
requirements.txt    — Python dependencies
credentials.json    — OAuth client secret (you download this from GCP)
docs/
  multi-account-spec.md — Multi-account design spec
  onboarding-guide.md  — Step-by-step second account onboarding
  specs/
    consolidated-digest.md — Consolidated N-to-1 digest spec
    sunday-edition.md — Weekly recap spec
  internal/         — Specs, plans, troubleshooting (dev reference)
  TODO.md           — Planned features

License

MIT
