System design, data flow, module structure, and design decisions for rag-params-finder.
rag-params-finder is a two-process system (CLI and server) for RAG parameter sweep experimentation, with a React dashboard for monitoring and control:
- Python CLI (thin client) — submits experiment configs to the server
- FastAPI Server (engine) — orchestrates the full pipeline end-to-end
- React Dashboard — visualization, sweep controls (pause/resume/cancel/delete), and results exploration
The CLI submits configs; the Dashboard observes progress and controls active sweeps. All pipeline business logic lives in the server.

    CLI (submit YAML)
        │
        │  POST /experiments
        ▼
    FastAPI Server
        │
        │  BackgroundTask per experiment
        ▼
    ┌───────────────────────────────────────────┐
    │ Pipeline (one run per config combination) │
    │                                           │
    │ PDF/TXT/MD/CSV → Chunk → Embed            │
    │   → Atlas write → Query → Rerank          │
    │   → Store results                         │
    └──────────────┬────────────────────────────┘
                   │
                   ▼
    MongoDB (Atlas cloud or Atlas Local)
    ┌─────────────┐
    │ chunks      │ ← embeddings + vector index
    │ experiments │
    │ run_status  │ ← phase tracking
    │ results     │
    └─────────────┘
        │
        │  polling (every 2s)
        ▼
    React Dashboard

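
The diagram's "one run per config combination" can be illustrated with a small grid expansion. This is only a sketch: `expand_sweep` and the parameter names in `grid` are hypothetical and are not taken from the project's config schema.

```python
from itertools import product
from typing import Any


def expand_sweep(grid: dict[str, list[Any]]) -> list[dict[str, Any]]:
    """Return one run config per combination of the swept parameter values."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


# Hypothetical sweep: 2 chunk sizes x 2 chunking methods x 1 retriever = 4 runs.
grid = {
    "chunk_size": [256, 512],
    "chunking_method": ["recursive", "sentence"],
    "retriever": ["dense"],
}
runs = expand_sweep(grid)
for run in runs:
    print(run)
```

Each resulting dict would correspond to one pass through the pipeline box above.
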
| Library | Purpose |
|---|---|
| FastAPI | REST API server |
| Python 3.12 | Language runtime |
| Voyage AI SDK | Embeddings + reranking (hosted) |
| sentence-transformers | Local embeddings + reranking (offline) |
| SIE (Superlinked Inference Engine) | Open-source embeddings via remote gateway or optional Docker (sie_embedder.py) |
| MongoDB Atlas / PyMongo | Vector storage + search (cloud or Atlas Local Docker) |
| LangChain text splitters | Recursive, fixed, token chunking |
| NLTK | Sentence chunking |
| tiktoken | Token-based chunking |
| pypdf | PDF text extraction |
| Typer | CLI framework |
| Rich | CLI output formatting |
| pydantic-settings | Centralized settings from .env |
| Library | Purpose |
|---|---|
| React 19 | UI framework |
| TypeScript 5.8 | Type safety |
| Vite 6 | Build tool |
| Tailwind CSS | Styling (locally installed, not CDN) |

    rag-params-finder/
    ├── server/
    │   ├── main.py  # FastAPI app entry; lifespan: indexes + orphan reconciliation
    │   ├── settings.py  # Centralized pydantic-settings config
    │   ├── api/
    │   │   ├── experiments.py  # CRUD, explore, db-stats, pause, resume, cancel, delete
    │   │   ├── experiments_shared.py  # Mongo helpers incl. db-stats aggregation
    │   │   ├── sweep.py  # POST /api/v1/sweep, GET /api/v1/best-config (Tier 1 ranked sweep)
    │   │   └── runs.py  # GET /runs/{id}/status
    │   ├── core/
    │   │   ├── orchestrator.py  # run_sweep(), resume_sweep(), run_single() pipeline; index preflight
    │   │   ├── embedder_factory.py  # get_embedder(provider): voyage | local | sie dispatch
    │   │   ├── sie_embedder.py  # SIE embeddings (BGE-M3, Stella-v5, SPLADE-v3)
    │   │   ├── sie_guard.py  # SIE preflight: SIE_ENABLED + gateway reachability
    │   │   ├── aim_logger.py  # Aim experiment run logging (no-op on init failure)
    │   │   ├── executors.py  # SWEEP_EXECUTOR + HEAVY_READ_EXECUTOR (isolate long work from API pool)
    │   │   ├── search_index_plan.py  # required indexes from config; capacity assessment (pure)
    │   │   ├── search_index_guard.py  # cluster snapshot + ensure retry; SearchIndexMismatchError
    │   │   ├── startup_reconciliation.py  # fix stale running experiments on boot
    │   │   ├── atlas_storage.py  # Atlas Admin API quota + dbStats footprint
    │   │   ├── pdf_parser.py  # pypdf text extraction
    │   │   ├── query_loader.py  # persona JSON → Query dataclass list
    │   │   ├── model_registry.py  # embedding + reranking model catalog
    │   │   ├── embedder.py  # Voyage embed(); voyage-context-3 → contextualized_embed + segment split
    │   │   ├── local_embedder.py  # sentence-transformers embedding (lazy-load, cached)
    │   │   ├── reranker.py  # Voyage reranking client
    │   │   ├── local_reranker.py  # CrossEncoder reranking (lazy-load, cached)
    │   │   ├── retriever.py  # Atlas Vector Search (dense/sparse/hybrid)
    │   │   ├── results_analyzer.py  # aggregates scores, min-max normalization
    │   │   └── chunkers/
    │   │       ├── recursive.py  # LangChain RecursiveCharacterTextSplitter
    │   │       ├── fixed.py  # fixed-size character windows
    │   │       ├── token.py  # tiktoken-based
    │   │       ├── sentence.py  # NLTK sentence tokenizer
    │   │       └── semantic.py  # embedding-similarity sentence grouping
    │   ├── models/
    │   │   ├── enums.py  # ChunkingMethod, RetrievalMethod, Phase
    │   │   ├── config.py  # Pydantic experiment config + provider validators
    │   │   ├── status.py  # RunStatus model
    │   │   └── results.py  # QueryResult, SearchResult, Chunk
    │   └── db/
    │       ├── atlas.py  # MongoDB connection singleton (TLS for cloud URIs only)
    │       ├── mongodb_uri.py  # is_atlas_uri(), parse_atlas_cluster_name(): cloud vs local detection
    │       └── indexes.py  # collection + search index creation; bootstrap_indexes() on local URI
    ├── cli/
    │   ├── main.py  # Typer app (run, cancel, pause, resume, delete, indexes, version)
    │   ├── indexes_cmd.py  # indexes list | reset subcommands
    │   ├── config_loader.py  # YAML parser + model registry validation
    │   └── api_client.py  # HTTP client to server
    ├── tests/
    │   ├── test_search_index_plan.py  # index requirement + capacity scenarios
    │   └── test_search_index_guard.py  # preflight guard (mocked I/O)
    └── frontend/src/
        ├── App.tsx  # root component (screen routing)
        ├── components/
        │   ├── DashboardShell.tsx  # shared dashboard shell (header, nav)
        │   ├── AppPageChrome.tsx  # shared page chrome wrapper
        │   ├── LoadingFeedbackPanel.tsx  # network loading progress (byte-level, activity feed)
        │   ├── ExperimentProgressCard.tsx  # experiment progress card (circular indicator, reusable)
        │   ├── PollingIndicator.tsx  # subtle "Syncing..." indicator during polls
        │   ├── ConfirmDeleteModal.tsx  # delete confirmation modal with experiment details
        │   ├── ExperimentControlButtons.tsx  # pause / resume / cancel on detail screen
        │   ├── CollapsibleCard.tsx  # reusable collapsible section (localStorage state)
        │   ├── VectorDbStatsPanel.tsx  # cluster-grouped storage stats (experiments list)
        │   ├── ExperimentVectorDbStatsCard.tsx  # per-experiment db-stats on detail screen
        │   ├── ExperimentsScreen.tsx  # list view (collapsible rows, vector DB stats, delete)
        │   ├── ExperimentDetailScreen.tsx  # overview metrics, outcome banners, runs table
        │   └── SearchExplorerScreen.tsx  # results analysis (ranked configs, per-query, paginated)
        ├── services/
        │   ├── apiClient.ts  # fetch wrapper (all server API calls)
        │   └── fetchWithProgress.ts  # streamed fetch with byte-level progress tracking
        ├── utils/
        │   ├── experimentStatus.ts  # terminal/running helpers + summarizeExperimentRuns()
        │   └── experimentDbStats.ts  # db-stats response normalizers
        └── types/index.ts  # hand-mirrored TypeScript types from Python models

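
The tree notes that results_analyzer.py applies min-max normalization to scores. A minimal sketch of that technique follows; the function name and the handling of a constant score list are assumptions, not the module's actual behavior.

```python
def min_max_normalize(scores: list[float]) -> list[float]:
    """Rescale scores linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    if not scores:
        return []
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Assumption: when every score is identical there is no spread to
        # rescale, so this sketch returns 0.0 for each entry.
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


print(min_max_normalize([2.0, 4.0, 6.0]))
```

Normalizing this way makes scores from different retrievers or rerankers comparable on a common 0 to 1 scale before they are aggregated.
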
Each run progresses through phases tracked in the run_status collection:
| Phase | What happens |
|---|---|
| QUEUED | Run created, waiting to start |
| PARSING | Source files (PDF/TXT/MD/CSV) → plain text |
| CHUNKING | Text → chunks (per the configured method and params) |
| EMBEDDING | Chunks → embedding vectors (Voyage API, local model, or SIE gateway) |
| STORING | Write chunks + embeddings to MongoDB |
| QUERYING | Execute all test queries against the vector index |
| RERANKING | Cross-encoder reranks top-K initial results to top-K final |
| COMPLETE / FAILED / INTERRUPTED | Terminal state |
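
The phase table can be modeled as an enum with a fixed forward order and a set of terminal states. The sketch below is illustrative only: the string values, `PIPELINE_ORDER`, `is_terminal`, and `next_phase` are assumptions and may not match the `Phase` enum in server/models/enums.py.

```python
from enum import Enum


class Phase(str, Enum):
    # Member names follow the table above; the string values are assumed.
    QUEUED = "queued"
    PARSING = "parsing"
    CHUNKING = "chunking"
    EMBEDDING = "embedding"
    STORING = "storing"
    QUERYING = "querying"
    RERANKING = "reranking"
    COMPLETE = "complete"
    FAILED = "failed"
    INTERRUPTED = "interrupted"


# Happy-path order of a run; FAILED and INTERRUPTED can be reached from any phase.
PIPELINE_ORDER = [
    Phase.QUEUED, Phase.PARSING, Phase.CHUNKING, Phase.EMBEDDING,
    Phase.STORING, Phase.QUERYING, Phase.RERANKING, Phase.COMPLETE,
]
TERMINAL = {Phase.COMPLETE, Phase.FAILED, Phase.INTERRUPTED}


def is_terminal(phase: Phase) -> bool:
    """True once a run can no longer change phase."""
    return phase in TERMINAL


def next_phase(phase: Phase) -> Phase:
    """Return the phase that follows `phase` on the happy path."""
    if is_terminal(phase):
        raise ValueError(f"{phase.name} is terminal")
    return PIPELINE_ORDER[PIPELINE_ORDER.index(phase) + 1]


print(next_phase(Phase.QUEUED).name)
```
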
Three embedding providers are routed through embedder_factory.get_embedder(provider), so the orchestrator never branches on provider directly.
Embedding provider (embedding.provider):
- local → server/core/local_embedder.py → sentence-transformers all-MiniLM-L6-v2 (384-dim)
- voyage → server/core/embedder.py → Voyage AI API (1024-dim); voyage-context-3 uses contextualized_embed() with per-document segment splitting (32K-token window)
- sie → server/core/sie_embedder.py → BGE-M3, Stella-v5 (1024-dim dense), SPLADE-v3 (30522-dim sparse); preflight via sie_guard.py; see SIE setup
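
The provider dispatch described above can be sketched as a registry lookup. Only the name `get_embedder` and the provider keys come from this document; the `Embedder` protocol, `_StubEmbedder`, and `_REGISTRY` are hypothetical stand-ins for the real embedder modules.

```python
from typing import Callable, Protocol


class Embedder(Protocol):
    def embed(self, texts: list[str]) -> list[list[float]]: ...


class _StubEmbedder:
    """Stand-in for a real embedder: returns zero vectors of a fixed dimension."""

    def __init__(self, dim: int) -> None:
        self.dim = dim

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[0.0] * self.dim for _ in texts]


# Provider name -> zero-argument constructor. Dimensions follow the list above.
_REGISTRY: dict[str, Callable[[], Embedder]] = {
    "local": lambda: _StubEmbedder(384),
    "voyage": lambda: _StubEmbedder(1024),
    "sie": lambda: _StubEmbedder(1024),
}


def get_embedder(provider: str) -> Embedder:
    """Resolve a provider name to an embedder; callers never branch on provider."""
    try:
        factory = _REGISTRY[provider]
    except KeyError:
        raise ValueError(f"unknown embedding provider: {provider!r}") from None
    return factory()


vectors = get_embedder("local").embed(["hello", "world"])
print(len(vectors), len(vectors[0]))
```

With this shape, adding a provider would mean registering one more entry rather than touching the orchestrator.
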
Retrieval configuration (retrieval.retrievers):
- List of retriever types to sweep; each entry becomes one run (never combined)
- Traditional: {type: dense|sparse|hybrid} (no provider/model needed)
- Rerankers: {type: reranker|cross_encoder, provider: local|voyage, model: ...}
  - provider: local → server/core/local_reranker.py → CrossEncoder cross-encoder/ms-marco-MiniLM-L-6-v2
  - provider: voyage → server/core/reranker.py → Voyage AI rerank API
  - Reranker runs fetch dense candidates internally before reranking
- Old format (methods + retrieval_provider/retrieval_model) auto-migrates to retrievers via Pydantic validator
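
The legacy-format migration can be sketched as a plain dictionary transform. The project does this in a Pydantic validator; the function below is a dependency-free stand-in, and the assumed legacy shape (a `methods` list of type names plus a single shared provider and model) is a guess rather than the documented schema.

```python
RERANKER_TYPES = {"reranker", "cross_encoder"}
LEGACY_KEYS = {"methods", "retrieval_provider", "retrieval_model"}


def migrate_retrieval(cfg: dict) -> dict:
    """Convert a legacy retrieval config into the `retrievers` list form."""
    if "retrievers" in cfg or "methods" not in cfg:
        return cfg  # already in the new format, or nothing to migrate

    migrated = {k: v for k, v in cfg.items() if k not in LEGACY_KEYS}
    retrievers = []
    for method in cfg["methods"]:
        entry = {"type": method}
        if method in RERANKER_TYPES:
            # Only reranker entries carry a provider and model.
            entry["provider"] = cfg.get("retrieval_provider")
            entry["model"] = cfg.get("retrieval_model")
        retrievers.append(entry)
    migrated["retrievers"] = retrievers
    return migrated


legacy = {
    "methods": ["dense", "reranker"],
    "retrieval_provider": "local",
    "retrieval_model": "cross-encoder/ms-marco-MiniLM-L-6-v2",
    "top_k": 5,  # hypothetical unrelated field, preserved as-is
}
new_cfg = migrate_retrieval(legacy)
print(new_cfg["retrievers"])
```
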
Provider flows explicitly through RunParams → orchestrator → embedder_factory → embedder/reranker. model_registry.py validates at config load time that model names match the declared provider.
Tier 1 sweep API: POST /api/v1/sweep accepts a ranked sweep request (caller supplies corpus list); GET /health includes sie and version fields when SIE is configured.
Two deployment modes share identical query syntax ($vectorSearch, $search):
| Mode | URI pattern | Index provisioning |
|---|---|---|
| Atlas cloud | mongodb+srv://... | Manual in Atlas UI on M0/M2/M5; server preflights on submit |
| Atlas Local (Docker) | mongodb://localhost:27017/...?directConnection=true | bootstrap_indexes() on server boot; no UI steps |
Detection: server/db/mongodb_uri.py (is_atlas_uri). TLS enabled only for cloud URIs (server/db/atlas.py). Docker: ./start-services.sh --local or RAG_LOCAL_ATLAS=1. See MongoDB Setup.
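
Cloud-versus-local detection could look roughly like the following. The function name matches the one mentioned above, but the rule used here (an SRV scheme or a `.mongodb.net` host) and the `client_options` helper are assumptions; the real server/db/mongodb_uri.py may apply different checks.

```python
from urllib.parse import urlparse


def is_atlas_uri(uri: str) -> bool:
    """Heuristic: treat SRV URIs and *.mongodb.net hosts as Atlas cloud."""
    parsed = urlparse(uri)
    host = (parsed.hostname or "").lower()
    return parsed.scheme == "mongodb+srv" or host.endswith(".mongodb.net")


def client_options(uri: str) -> dict:
    """Hypothetical helper: enable TLS only for cloud URIs."""
    return {"tls": True} if is_atlas_uri(uri) else {}


cloud = "mongodb+srv://user:secret@cluster0.example.mongodb.net/rag"
local = "mongodb://localhost:27017/rag?directConnection=true"
print(is_atlas_uri(cloud), is_atlas_uri(local))
```
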
| Collection | Purpose | Key Indexes |
|---|---|---|
| chunks | Text chunks + embeddings | Vector index on embedding (384 or 1024-dim cosine) + filter fields |
| experiments | Experiment metadata + sweep config | created_at, status |
| run_status | Per-run phase tracking | experiment_id, phase |
| results | Per-query top-K results | experiment_id, query_id |
Critical: always filter vector search by embedding_model — vectors from different models have incompatible geometry and must never be mixed in the same search.
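
As an illustration of the filter rule, the sketch below builds a `$vectorSearch` aggregation pipeline that pins the search to one embedding model. The index names follow the per-dimension naming used in this document; the helper name, the `text` field, and the default limits are assumptions.

```python
def build_vector_search(
    query_vector: list[float],
    embedding_model: str,
    limit: int = 10,
    num_candidates: int = 100,
) -> list[dict]:
    """Build an aggregation pipeline that only searches vectors from one model."""
    index_by_dim = {1024: "vector_index_1024", 384: "vector_index_384"}
    dim = len(query_vector)
    if dim not in index_by_dim:
        raise ValueError(f"no vector index for {dim}-dim queries")
    return [
        {
            "$vectorSearch": {
                "index": index_by_dim[dim],
                "path": "embedding",
                "queryVector": query_vector,
                "numCandidates": num_candidates,
                "limit": limit,
                # Never mix models: restrict candidates to the same embedding model.
                "filter": {"embedding_model": {"$eq": embedding_model}},
            }
        },
        # "text" is an assumed field name for the chunk body.
        {"$project": {"text": 1, "score": {"$meta": "vectorSearchScore"}}},
    ]


pipeline = build_vector_search([0.0] * 384, "all-MiniLM-L6-v2")
print(pipeline[0]["$vectorSearch"]["index"])
```

The pipeline would be passed to `collection.aggregate(...)`; the filter only works if `embedding_model` is declared as a filter field in the index definition.
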
See docs/adr/ for Architecture Decision Records:
- ADR-001: Why CLI + Server (two-process architecture)
- ADR-002: Why dual embedding/reranking providers
- ADR-003: Why MongoDB Atlas over Pinecone/Weaviate
Key design choices not covered by ADRs:
| Decision | Rationale |
|---|---|
| FastAPI BackgroundTasks (not Celery) | No queue infrastructure needed while sweep runs execute sequentially (see SLICE-16 for honoring parallelism > 1 and optional Celery path) |
| Hand-mirrored TypeScript types | No codegen tooling (typeshare/quicktype); 5 types + 3 enums is manageable manually |
| Separate vector indexes per dimension | Atlas requires exact numDimensions — vector_index_1024 (Voyage) and vector_index_384 (local) coexist on the same collection |
| Lazy-load + cache for local models | First run downloads from HuggingFace; subsequent runs instant — avoids blocking server startup |
| numpy<2 pinned | torch compiled against NumPy 1.x ABI; NumPy 2.x causes _ARRAY_API not found crashes |
| Shared DashboardShell + AppPageChrome components | Unified header, navigation, and page layout across all screens for consistent UX and easier maintenance |
| fetchWithProgress for streamed downloads | ReadableStream byte-level progress → visible loading bars; better UX than spinner for large payloads |
| Pagination on all screens | Prevents DOM overload and cognitive fatigue; default 10 items per page (experiments/runs), 5 per page (configs) |
| Dual loading indicators (panel + polling badge) | Initial load → full progress panel; background polls → subtle "Syncing..." badge; clear state transitions |
| Two progress patterns (network vs experiment) | LoadingFeedbackPanel for network/API loads (byte-level); ExperimentProgressCard for experiment execution (run completion); distinct concerns, reusable components |
| Cascade delete with confirmation | DELETE endpoint scrubs all collections (experiments, run_status, chunks, results); ConfirmDeleteModal shows experiment details + deletion statistics; prevents deletion of running experiments |
| Boot orphan reconciliation | BackgroundTasks sweeps die on process exit; startup marks in-flight runs interrupted and sets terminal experiment status — separate from Slice 10 retry |
| Pause / resume sweeps | Cooperative halt via _SweepControl threading events; resume_sweep() skips completed parameter signatures; status paused is non-terminal |
| Vector DB stats API + dashboard | GET /experiments/vector-db-stats and /{id}/db-stats; estimated storage from chunk counts + model dimensions; optional Atlas quota bar with tier/provider/region via resolve_tier_specs() |
| Timezone-aware UTC timestamps | PyMongo tz_aware=True; all writes use datetime.now(timezone.utc) so JSON includes Z and browser elapsed/duration math is correct |
| started_at on first run | Duration and ETA exclude queue time between submission and first pipeline phase |
| Search index preflight | required_search_indexes(config) + cluster snapshot; fail before runs if missing/quota exhausted; HTTP 422 on submit |
| Atlas index CLI | indexes list / indexes reset for M0 3-index cluster-wide quota troubleshooting |
| Option A scoped logging | [rag-params-finder] [Scope] operation — details in server (scope_log.py) and dashboard dev console (devLog.ts) |
| Dedicated thread pools (executors.py) | Sweeps and heavy Mongo aggregations no longer compete with lightweight GET /experiments on the default executor |
| Batched vector-db-stats queries | Three aggregation pipelines replace per-experiment N+1 round-trips on the experiments list |
| Decoupled dashboard polling | List 2 s / vector DB stats 60 s / Search Explorer 15 s while running — each with appropriate fetch timeouts in frontend/src/constants.ts |
| Search Explorer poll indicator timing | PollingIndicator showDelay + minVisibleMs reduce badge flicker on 15 s explore polls |
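
The pause/resume row above combines two ideas: a cooperative halt checked between runs, and skipping parameter signatures that already completed. A minimal sketch follows; `SweepControl`, `signature`, and this `run_sweep` are simplified, hypothetical stand-ins for the orchestrator's `_SweepControl`, `run_sweep()`, and `resume_sweep()`, and the returned status strings are assumed.

```python
import threading
from typing import Callable


class SweepControl:
    """Flags set by the API thread and polled by the sweep thread between runs."""

    def __init__(self) -> None:
        self.pause_requested = threading.Event()
        self.cancel_requested = threading.Event()


def signature(params: dict) -> tuple:
    """Order-independent, hashable identity for one parameter combination."""
    return tuple(sorted(params.items()))


def run_sweep(
    runs: list[dict],
    control: SweepControl,
    completed: set[tuple],
    execute: Callable[[dict], None],
) -> str:
    """Execute pending runs; stop cooperatively when a halt is requested."""
    for params in runs:
        if control.cancel_requested.is_set():
            return "cancelled"
        if control.pause_requested.is_set():
            return "paused"
        sig = signature(params)
        if sig in completed:
            continue  # finished before the pause; do not repeat it
        execute(params)
        completed.add(sig)
    return "complete"


control, completed, executed = SweepControl(), set(), []


def execute(params: dict) -> None:
    executed.append(params["chunk_size"])
    if params["chunk_size"] == 256:
        control.pause_requested.set()  # simulate a pause arriving mid-sweep


runs = [{"chunk_size": 256}, {"chunk_size": 512}, {"chunk_size": 1024}]
first = run_sweep(runs, control, completed, execute)
control.pause_requested.clear()  # resume
second = run_sweep(runs, control, completed, execute)
print(first, second, executed)
```

Because the halt is only checked between runs, a run that is already executing finishes before the sweep stops.
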
| Mode | Command | Notes |
|---|---|---|
| Manual (default dev) | uvicorn + npm run dev | Two terminals; hot reload |
| Docker (prod profile) | ./start-services.sh | Server + dashboard containers; Atlas cloud from .env |
| Docker + Atlas Local | ./start-services.sh --local | Adds mongodb-atlas-local container; auto-provisions search indexes |
| Docker (dev profile) | docker compose --profile dev up | Bind mounts + HMR |
Atlas connection string and API keys live in .env on the host (mounted into the server container). See SLICE-14-DOCKER-COMPOSE.md and MongoDB Setup.
| Enhancement | Notes |
|---|---|
| Run recovery (retry failed / interrupted runs) | Reconciliation on boot ✅ — status fix only. Retry planned as Slice 10: recover CLI + API; RECOVER_ON_BOOT = retry INTERRUPTED only |
| SSE live updates | Replace 2-second polling with Server-Sent Events |
| Parallel sweep (execution.parallelism > 1) | Planned as Slice 16 (Parallel Sweep Runs); bounded in-process pool first; Celery + Redis when multi-process fairness or isolation is needed |
| Dashboard-triggered runs | Submit experiments from the React UI, not just CLI |
| Experiment cleanup CLI | rag-params-finder cleanup --older-than 30d |
- Extending the System — add new models, chunkers, or endpoints
- Development Guide — dev loop, quality gates, slice playbook
- ADR-001 · ADR-002 · ADR-003 — detailed rationale for key decisions