semantic chunker ignores overlap: produces duplicate runs across overlap sweep values (#44)

**Summary**

When a sweep includes the `semantic` chunking method together with multiple `overlaps`, every overlap value produces byte-for-byte identical chunks, because `semantic` never receives the overlap parameter.

**Where**
- `server/core/chunkers/__init__.py:39` dispatches `chunk_semantic(text, chunk_size)` — the `overlap` argument is dropped.
- `server/core/chunkers/semantic.py:9` — `def chunk_semantic(text, chunk_size)` doesn't accept `overlap`.
- `server/models/config.py`: `expand_sweep` still takes the Cartesian product `chunk_sizes × overlaps` for every method, including `semantic`.
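
To make the failure mode concrete, here is a minimal sketch of the dispatch pattern described above. It is not the project's code: `dispatch`, `chunk_fixed`, and the character-window bodies are hypothetical stand-ins, and only the `chunk_semantic(text, chunk_size)` signature is taken from the files listed.

```python
def chunk_fixed(text, chunk_size, overlap):
    """Hypothetical overlap-aware chunker: a sliding character window."""
    step = max(1, chunk_size - overlap)
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


def chunk_semantic(text, chunk_size):
    """Stand-in with the signature from the issue: no overlap parameter."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def dispatch(method, text, chunk_size, overlap):
    """Hypothetical dispatcher mirroring the reported call site."""
    if method == "semantic":
        # overlap is silently dropped here, so it cannot affect the output
        return chunk_semantic(text, chunk_size)
    return chunk_fixed(text, chunk_size, overlap)


text = "abcdefghijklmnopqrstuvwxyz" * 4
semantic_runs = [dispatch("semantic", text, 20, o) for o in (5, 10, 15)]
fixed_runs = [dispatch("fixed", text, 20, o) for o in (5, 10, 15)]

print(semantic_runs[0] == semantic_runs[1] == semantic_runs[2])
print(fixed_runs[0] == fixed_runs[1])
```

In this sketch the three semantic results are equal for every overlap, while the overlap-aware stand-in changes with each value.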

**Impact**

With e.g. `overlaps: [50, 100, 150]`, the three semantic runs are identical, yet each one is embedded, stored in Atlas, queried, and scored. That is 3× the API/storage/compute cost for a single distinct result, plus three indistinguishable rows in the results table that look like a comparison but aren't.

**Reproduce**

Run any sweep with `methods: [semantic]` and `overlaps: [50, 100, 150]` → three runs, identical chunk output.
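
One way to confirm the duplication without eyeballing chunk output is to fingerprint each run's chunks and group runs that share a fingerprint. The sketch below assumes chunk lists can be collected per run; the `runs` dictionary holds made-up placeholder strings, not real output from the project, and `fingerprint` / `find_duplicate_runs` are hypothetical helper names.

```python
import hashlib
import json


def fingerprint(chunks):
    """Hash a list of chunk strings so runs can be compared cheaply."""
    payload = json.dumps(chunks, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def find_duplicate_runs(runs):
    """Group run labels by fingerprint; return groups with more than one run."""
    groups = {}
    for label, chunks in runs.items():
        groups.setdefault(fingerprint(chunks), []).append(label)
    return [labels for labels in groups.values() if len(labels) > 1]


# Placeholder data for illustration only.
runs = {
    "semantic/overlap=50": ["First group.", "Second group."],
    "semantic/overlap=100": ["First group.", "Second group."],
    "semantic/overlap=150": ["First group.", "Second group."],
    "sentence/overlap=50": ["First group.", "group. Second group."],
}
duplicates = find_duplicate_runs(runs)
print(duplicates)
```

With the placeholder data, the three semantic labels land in one duplicate group and the `sentence` run stays separate.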

**Proposed fix**

Implement a real overlap for semantic chunking: carry the trailing sentence(s) of each semantic group into the start of the next group (sentence-granular, consistent with how the `sentence` chunker handles overlap). This makes `overlap` a meaningful dimension for semantic rather than a no-op.
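
A rough sketch of what the carry-over could look like, under stated assumptions: semantic groups are already available as lists of sentence strings, and `overlap` is treated as a character budget filled with whole trailing sentences. The `sentence` chunker may define overlap differently, so the unit here is a guess, and `add_sentence_overlap` is a hypothetical helper name.

```python
def add_sentence_overlap(groups, overlap):
    """Prefix each group with trailing sentences of the previous group.

    groups:  list of semantic groups, each a list of sentence strings
    overlap: character budget for carried-over sentences (assumed unit)
    """
    result = []
    for i, group in enumerate(groups):
        carried = []
        if i > 0 and overlap > 0:
            budget = overlap
            # Walk the previous (un-overlapped) group backwards and take
            # whole sentences for as long as they fit in the budget.
            for sentence in reversed(groups[i - 1]):
                if len(sentence) > budget:
                    break
                carried.insert(0, sentence)
                budget -= len(sentence)
        result.append(" ".join(carried + group))
    return result


groups = [
    ["Cats purr.", "Dogs bark."],
    ["Birds sing.", "Fish swim."],
    ["Bees buzz."],
]
no_overlap = add_sentence_overlap(groups, 0)
small_overlap = add_sentence_overlap(groups, 10)
large_overlap = add_sentence_overlap(groups, 25)
```

Carried sentences are taken from the original previous group rather than the already-extended one, so overlap does not compound from chunk to chunk. With this in place, different `overlap` values yield different chunk lists, which is what makes the sweep dimension meaningful.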

Alternative considered: dedupe semantic in `expand_sweep` so it runs once. Rejected in favour of giving overlap real meaning, but happy to go that route if you prefer.
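
For comparison, the rejected alternative might look roughly like this. The function below is a hypothetical stand-in, not the real `expand_sweep`; its signature, the `OVERLAP_AGNOSTIC` set, and the dict-shaped run configs are all assumptions made for illustration.

```python
from itertools import product

# Assumption: methods whose output does not depend on overlap.
OVERLAP_AGNOSTIC = {"semantic"}


def expand_sweep(methods, chunk_sizes, overlaps):
    """Expand a sweep, collapsing the overlap axis for agnostic methods."""
    runs = []
    seen = set()
    for method, size, overlap in product(methods, chunk_sizes, overlaps):
        if method in OVERLAP_AGNOSTIC:
            overlap = None  # overlap has no effect, record it as not applicable
        key = (method, size, overlap)
        if key not in seen:
            seen.add(key)
            runs.append({"method": method, "chunk_size": size, "overlap": overlap})
    return runs


runs = expand_sweep(["semantic", "sentence"], [512], [50, 100, 150])
for run in runs:
    print(run)
```

Here the sweep yields one semantic run and three `sentence` runs instead of six runs in total. It removes the wasted cost but leaves `overlap` meaningless for semantic chunking.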

I'm happy to open a PR for the proposed fix if you're on board with the direction.
