Commit bd42530

feat: add semantic search mode (--mode semantic)
Natural-language code search via HNSW vector similarity:
- Embeds query via Ollama at search time using ~/.config/magellan/config.toml
- Loads persisted HNSW index (symbols) from Magellan database
- Resolves vector hits back to graph_entities with distance + similarity score
- Supports --path filtering and all output formats
- Graceful error when no HNSW index exists
- llmgrep only searches; Magellan owns embedding generation

New files:
- src/query/semantic.rs — core implementation + 8 unit tests

Updated:
- src/commands/search.rs — Semantic dispatch branch
- src/display.rs — output_semantic() formatting
- src/output.rs — SemanticMatch + SemanticSearchResponse types
- src/query/mod.rs — module export
- src/cli.rs — SearchMode::Semantic variant + examples
- Cargo.toml — added ureq, toml deps; bumped to 3.8.0
- MANUAL.md, README.md, CHANGELOG.md — documentation

1 parent 849d936 commit bd42530

12 files changed

Lines changed: 994 additions & 10 deletions

CHANGELOG.md

Lines changed: 17 additions & 0 deletions
@@ -5,6 +5,23 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [Unreleased]
+
+### Added
+
+- **`semantic` search mode** (`--mode semantic`) — Natural-language code search via HNSW vector similarity:
+- Embeds the query via Ollama at search time using `~/.config/magellan/config.toml`
+- Loads the persisted HNSW index (`symbols`) from the Magellan database
+- Resolves vector hits back to `graph_entities` with file paths, kinds, and FQNs
+- Returns cosine distance + 0-100 similarity score
+- Supports `--path` filtering and all output formats (`human`, `json`, `pretty`)
+- Graceful error when no HNSW index exists: suggests `magellan embed --db <db>`
+- llmgrep only **searches** embeddings; Magellan owns embedding generation
+```bash
+llmgrep search --db code.db --query "parse command line arguments" --mode semantic
+llmgrep search --db code.db --query "error handling" --mode semantic --output json
+```
+
 ## [3.7.0] - 2026-05-29

 ### Added
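
The changelog entry above reports a cosine distance alongside a 0-100 similarity score but does not say how one maps to the other. The sketch below shows one plausible mapping, assuming the score is `(1 - distance)` clamped to `[0, 1]` and scaled to 100. The function names `cosine_distance` and `similarity_score` are illustrative and are not taken from `src/query/semantic.rs`.

```rust
/// Cosine distance between two equal-length vectors: 1 - cos(theta).
/// Returns None for empty, mismatched, or zero-magnitude input.
fn cosine_distance(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(1.0 - dot / (norm_a * norm_b))
}

/// Hypothetical mapping from cosine distance (0.0..=2.0) to a 0-100 score.
/// Assumption: anything at or beyond orthogonal (distance >= 1.0) scores 0.
fn similarity_score(distance: f32) -> u8 {
    let similarity = (1.0 - distance).clamp(0.0, 1.0);
    (similarity * 100.0).round() as u8
}
```

Under this assumption identical directions score 100 and orthogonal or opposed vectors score 0; the actual implementation may scale the range differently.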

Cargo.lock

Lines changed: 180 additions & 1 deletion
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 3 additions & 1 deletion
@@ -1,6 +1,6 @@
 [package]
 name = "llmgrep"
-version = "3.7.0"
+version = "3.8.0"
 edition = "2021"
 description = "Smart grep over Magellan code maps with schema-aligned JSON output"
 license = "GPL-3.0-only"
@@ -32,6 +32,8 @@ signal-hook = "0.3"
 sqlitegraph = "3.2.5"
 tempfile = "3.10"
 thiserror = "1.0"
+toml = "0.8"
+ureq = "3"

 [dev-dependencies]
 rusqlite = "0.31"
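
The two new dependencies presumably cover reading `~/.config/magellan/config.toml` (`toml`) and calling Ollama over HTTP (`ureq`). The sketch below illustrates only the request that would be sent, using the standard library so it stays self-contained. It assumes Ollama's `/api/embed` endpoint with `model` and `input` fields; the commit may use a different endpoint. The `EmbedConfig` struct, its field names, and both functions are hypothetical and do not reflect Magellan's actual config schema.

```rust
/// Hypothetical view of the settings llmgrep needs from Magellan's config.
/// The real key names in config.toml are not shown in this commit.
struct EmbedConfig {
    endpoint: String, // e.g. "http://localhost:11434" (Ollama's default port)
    model: String,    // must match the model Magellan used when embedding
}

/// Escape a string for embedding inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Build the URL and JSON body for a single query embedding request.
/// Assumption: the server exposes Ollama's `/api/embed` endpoint.
fn embed_request(cfg: &EmbedConfig, query: &str) -> (String, String) {
    let url = format!("{}/api/embed", cfg.endpoint.trim_end_matches('/'));
    let body = format!(
        "{{\"model\":\"{}\",\"input\":\"{}\"}}",
        json_escape(&cfg.model),
        json_escape(query)
    );
    (url, body)
}
```

In a real client the returned pair would be handed to an HTTP library, and the response's embedding vector passed to the HNSW lookup.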

MANUAL.md

Lines changed: 33 additions & 2 deletions
@@ -1,6 +1,6 @@
 # llmgrep Manual

-**v3.6.0** (shipped 2026-05-28)
+**v3.8.0** (shipped 2026-06-07)

 llmgrep is a read-only query tool for Magellan's code map. Part of the sqlitegraph toolset alongside Magellan (indexing), Mirage (CFG analysis), and Splice (precision editing).

@@ -35,18 +35,49 @@ llmgrep evolve --db <FILE> [OPTIONS]
 | `references` | Search references to symbols |
 | `calls` | Search function calls |
 | `implements` | Search type-trait implementations |
+| `semantic` | Natural-language semantic search via vector similarity (requires embeddings) |
 | `docs` | Search source documents (wiki, specs, messages) |
 | `facts` | Search candidate knowledge triples |
 | `auto` | Run symbols, references, and calls modes combined |

+### Semantic search (`--mode semantic`)
+
+Natural-language code search using HNSW vector similarity. Finds symbols by meaning rather than exact name match.
+
+**Prerequisites:**
+- Embeddings must be generated by Magellan: `magellan embed --db <db>`
+- Ollama must be running with the same embedding model configured in `~/.config/magellan/config.toml`
+
+**How it works:**
+1. llmgrep reads `~/.config/magellan/config.toml` to discover the Ollama endpoint and model
+2. The query is embedded via Ollama (query embedding happens at search time)
+3. The persisted HNSW index is loaded from the database
+4. Returns the nearest symbol vectors sorted by cosine similarity
+
+**Important:** llmgrep only **searches** existing embeddings — it does not create them. Magellan owns embedding generation and freshness.
+
+**Examples:**
+```bash
+# Find code related to command-line argument parsing
+llmgrep search --db code.db --query "parse command line arguments" --mode semantic
+
+# Semantic search with path filter
+llmgrep search --db code.db --query "database connection pooling" --mode semantic --path src/db
+
+# JSON output for programmatic use
+llmgrep search --db code.db --query "error handling" --mode semantic --output json
+```
+
+**Graceful degradation:** If the database has no HNSW index, llmgrep returns a clear error suggesting `magellan embed --db <db>`. All other search modes work normally without embeddings.
+
 ### Options

 **Required:**
 - `--db <FILE>` — Path to Magellan SQLite database
 - `--query <STRING>` — Search query string

 **Search mode:**
-- `--mode <MODE>` — Search mode: `symbols` (default), `references`, `calls`, `implements`, `docs`, `facts`, `auto`
+- `--mode <MODE>` — Search mode: `symbols` (default), `references`, `calls`, `implements`, `semantic`, `docs`, `facts`, `auto`

 **Filters:**
 - `--path <PATH>` — Filter by file path prefix
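
The manual text in this diff describes three behaviors that fit together after the vector lookup: a clear error when no HNSW index exists, `--path` prefix filtering, and ordering by nearness. The sketch below is a minimal illustration of that post-processing under stated assumptions. `SemanticHit`, `SemanticError`, `rank_hits`, and the error wording are hypothetical; the commit's real types are `SemanticMatch` and `SemanticSearchResponse` in `src/output.rs`, whose fields are not shown here.

```rust
use std::fmt;

/// Hypothetical stand-in for one resolved vector hit.
#[derive(Debug, Clone, PartialEq)]
struct SemanticHit {
    file_path: String,
    name: String,
    distance: f32, // cosine distance; smaller means closer
}

/// Hypothetical error for a database that has no persisted HNSW index.
#[derive(Debug, PartialEq)]
enum SemanticError {
    MissingIndex { db: String },
}

impl fmt::Display for SemanticError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SemanticError::MissingIndex { db } => write!(
                f,
                "no HNSW index in {db}; generate embeddings with `magellan embed --db {db}`"
            ),
        }
    }
}

/// Filter hits by file path prefix, sort nearest first, and keep `limit`.
/// `index_hits` is None when the database has no HNSW index.
fn rank_hits(
    index_hits: Option<Vec<SemanticHit>>,
    db: &str,
    path_prefix: Option<&str>,
    limit: usize,
) -> Result<Vec<SemanticHit>, SemanticError> {
    let mut hits =
        index_hits.ok_or_else(|| SemanticError::MissingIndex { db: db.to_string() })?;
    if let Some(prefix) = path_prefix {
        hits.retain(|h| h.file_path.starts_with(prefix));
    }
    // total_cmp gives a total order over f32, so NaN cannot panic the sort.
    hits.sort_by(|a, b| a.distance.total_cmp(&b.distance));
    hits.truncate(limit);
    Ok(hits)
}
```

Sorting by ascending cosine distance is equivalent to sorting by descending cosine similarity, which matches step 4 of "How it works". Returning an error value rather than an empty list keeps the missing-index case distinguishable from a query with no matches.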
