Commit e0239c1

docs: add AP News/Al Jazeera to supported sources table, update CLI reference, fix scraper count
1 parent 80b6e34 commit e0239c1

2 files changed

Lines changed: 13 additions & 2 deletions

docs/getting-started.md

Lines changed: 9 additions & 2 deletions
@@ -63,7 +63,7 @@ Test that everything works:
 # Check available scrapers
 newswatch --list_scrapers
 
-# Should show 60 stable scrapers
+# Should show 63 stable scrapers
 ```
 
 ## Your First Scraping Session
@@ -216,10 +216,17 @@ print(f"\nRecent articles (>= Jan 15): {len(recent)}")
 | `-k, --keywords` | Comma-separated search terms | `"bank,kredit,fintech"` |
 | `-sd, --start_date` | Start date (YYYY-MM-DD) | `"2025-01-01"` |
 | `-s, --scrapers` | Specific scrapers or "auto"/"all" | `"kompas,tempo"` |
-| `-of, --output_format` | Output format: csv, xlsx, or json | `"csv"` |
+| `-of, --output_format` | Output format: csv, xlsx, json, or jsonl | `"csv"` |
 | `-o, --output_path` | Custom output file path | `"news-watch-output.csv"` |
 | `-v, --verbose` | Show detailed progress | (flag only) |
 | `--list_scrapers` | Show available scrapers | (flag only) |
+| `--max-pages` | Max pages per scraper (latest mode) | `2` |
+| `--scraper-timeout` | Timeout per scraper in seconds | `30` |
+| `--time-range` | Filter by ISO8601 time range | `"2025-01-01T00:00:00/2025-01-31T23:59:59"` |
+| `--dedup-file` | Skip articles already in this file | `"previous-output.csv"` |
+| `--proxy` | Proxy URL for all requests | `"http://proxy:8080"` |
+| `--progress` | Show progress bar (requires tqdm) | (flag only) |
+| `--health-report` | Run health checks instead of scraping | (flag only) |
 
 ## Next Steps
 
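Note on `--time-range`: the example value in the CLI table is an ISO 8601 interval written as `start/end`. This commit does not show how newswatch parses that value, so the following is only a minimal sketch of one way to split and validate such a string. `parse_time_range` is a hypothetical helper name used for illustration, not a newswatch function.

```python
from datetime import datetime


def parse_time_range(value: str) -> tuple[datetime, datetime]:
    """Split an ISO 8601 'start/end' interval into two datetimes.

    Hypothetical helper for illustration; not newswatch's own parser.
    """
    # partition() returns an empty separator when no "/" is present.
    start_text, sep, end_text = value.partition("/")
    if not sep or not start_text or not end_text:
        raise ValueError(f"expected 'start/end', got {value!r}")

    start = datetime.fromisoformat(start_text)
    end = datetime.fromisoformat(end_text)
    if end < start:
        raise ValueError("end of range precedes start")
    return start, end


# Example value copied from the CLI reference table.
start, end = parse_time_range("2025-01-01T00:00:00/2025-01-31T23:59:59")
print(start.isoformat(), end.isoformat())
```

Rejecting a value without a `/` separator or with a reversed range up front is a design choice of this sketch; the actual CLI may behave differently.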
docs/index.md

Lines changed: 4 additions & 0 deletions
@@ -51,6 +51,10 @@ print(len(latest))
 
 | Source | Slug | Search | Latest | Notes |
 |--------|------|--------|--------|-------|
+| **Global Sources** | | | | |
+| AP News | `apnews` | Yes | Yes | Topic hub search; keyword filtering |
+| Al Jazeera | `aljazeera` | No | Yes | Latest-only via RSS; search JS-rendered |
+| **Indonesian Sources** | | | | |
 | Antara News | `antaranews` | Yes | Yes | Strict search |
 | BBC News | `bbc` | Yes | Yes | Strict search |
 | Bali Post | `balipost` | No | Yes | Latest-only |
