Follow-up from the record-link RFC (#48) implementation. Deliberately deferred from the link-model PRs: it is independent of the link model shipping and has the lowest value-per-risk of that work.
Problem
relate(src, dst, ...) unconditionally upserts both endpoints into the entities table (graph_ops.rs). When called with a memory rid as an endpoint, the rid:
- gets a phantom entity_type='unknown' row in entities, AND
- becomes an actual node in the in-memory entity graph: graph_index::build_from_db's edge loader does ensure_entity(src/dst, "unknown") for every edges/claims endpoint, so on each rebuild the rid is re-added as a graph node that expand_entities BFS traverses.
This is happening in production today: cognition/consolidate.rs transfers edges during consolidation via db.relate(&consolidated_rid, &edge.dst, ...), minting phantoms every consolidation.
Verified during #48 work (PR #53/#55). The phantoms clutter list_entities() and add stray BFS nodes; they do NOT produce incorrect recall results (the rid nodes are mostly disconnected from query-relevant entities), which is why this was safe to defer.
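A toy model of the behaviour described above may make the failure mode easier to see. This is a sketch, not the real graph_ops.rs code: ToyDb, its fields, the simplified relate signature, and the mem_0001 rid literal are all hypothetical stand-ins.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for the real store; it models only the `entities`
/// table and the stored edges. All names here are hypothetical.
#[derive(Default)]
pub struct ToyDb {
    /// name -> entity_type
    pub entities: BTreeMap<String, String>,
    /// (src, dst, relation)
    pub edges: Vec<(String, String, String)>,
}

impl ToyDb {
    /// Mirrors the described behaviour: both endpoints are upserted
    /// unconditionally, and unseen names get the type "unknown".
    pub fn relate(&mut self, src: &str, dst: &str, rel: &str) {
        for name in [src, dst] {
            self.entities
                .entry(name.to_string())
                .or_insert_with(|| "unknown".to_string());
        }
        self.edges
            .push((src.to_string(), dst.to_string(), rel.to_string()));
    }
}
```

Calling relate("mem_0001", "alice", ...) on this toy leaves an existing alice row untouched but mints an 'unknown' row for mem_0001, which is the phantom.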
The real fix (not just a guard)
Memory→entity associations belong in the memory_entities join table — which graph_index::build_from_db already handles correctly (rids as keys in memory_to_entities, NOT as graph nodes). The fix:
- consolidate.rs edge-transfer should use link_memory_entity() (memory_entities) instead of relate() for memory→entity associations.
- relate() should guard against rid-shaped endpoints creating entities rows (defense in depth).
- graph_index::build_from_db edge loader should skip rid-shaped endpoints (so existing claims-with-rid don't re-pollute the in-memory graph on rebuild).
- Cleanup migration: delete existing phantom entities rows (rid-shaped name + entity_type='unknown') and the claims-with-rid-endpoints that consolidate.rs created.
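The loader-side part of the fix could look roughly like the sketch below. It is a toy, not the real graph_index code: ToyGraph, build_from_edges, and the is_rid predicate parameter are hypothetical, edges are treated as directed, and dropping the whole edge (rather than only the rid endpoint) is an assumption this issue does not settle.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Toy in-memory graph: adjacency sets keyed by entity name.
#[derive(Default, Debug)]
pub struct ToyGraph {
    pub adjacency: BTreeMap<String, BTreeSet<String>>,
}

impl ToyGraph {
    /// Adds the node if it is not already present.
    fn ensure_entity(&mut self, name: &str) {
        self.adjacency.entry(name.to_string()).or_default();
    }
}

/// Builds the graph from (src, dst) edge rows, skipping any row with an
/// endpoint flagged by `is_rid`, so rids never become graph nodes.
pub fn build_from_edges<F>(rows: &[(&str, &str)], is_rid: F) -> ToyGraph
where
    F: Fn(&str) -> bool,
{
    let mut graph = ToyGraph::default();
    for &(src, dst) in rows {
        if is_rid(src) || is_rid(dst) {
            // Assumption: the whole edge is dropped, not just the rid node.
            continue;
        }
        graph.ensure_entity(src);
        graph.ensure_entity(dst);
        graph
            .adjacency
            .get_mut(src)
            .expect("src was just ensured")
            .insert(dst.to_string());
    }
    graph
}
```

Passing the predicate in as a parameter keeps the sketch independent of the eventual rid-shape helper; the real loader would presumably call that helper directly.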
Why deferred, not done in #53/#55
- Lowest value (hygiene; no correctness impact).
- Highest risk-per-value (touches the entity-graph build path = recall quality).
- Independent of the link model — the substrate + recall + bindings (the actual value) ship without it.
- A partial guard would leave the in-memory graph still polluted on rebuild; the complete fix is a focused multi-touch change (relate + graph_index build + consolidate + migration) that deserves its own pass + soak.
Needs: a rid-shape helper (crate::id::is_rid_shaped), a test asserting relate(rid, entity) creates no entities row, and a test asserting BFS doesn't surface rid nodes post-fix.
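A minimal sketch of the helper and the first test, under stated assumptions. The real rid format is not specified in this issue, so the "mem_" + hex-digits shape below is purely illustrative, and upsert_endpoint is a hypothetical stand-in for the guarded endpoint upsert inside relate().

```rust
use std::collections::BTreeMap;

/// Hypothetical rid shape: the prefix "mem_" followed by one or more
/// ASCII hex digits. Replace with the real format when implementing.
pub fn is_rid_shaped(s: &str) -> bool {
    match s.strip_prefix("mem_") {
        Some(rest) => !rest.is_empty() && rest.bytes().all(|b| b.is_ascii_hexdigit()),
        None => false,
    }
}

/// Guarded endpoint upsert: returns true if the name is present in
/// `entities` after the call, false if it was rejected as rid-shaped.
pub fn upsert_endpoint(entities: &mut BTreeMap<String, String>, name: &str) -> bool {
    if is_rid_shaped(name) {
        return false; // never mint an entities row for a memory rid
    }
    entities
        .entry(name.to_string())
        .or_insert_with(|| "unknown".to_string());
    true
}

#[cfg(test)]
mod tests {
    use super::{is_rid_shaped, upsert_endpoint};
    use std::collections::BTreeMap;

    #[test]
    fn rid_endpoint_creates_no_entities_row() {
        let mut entities = BTreeMap::new();
        assert!(is_rid_shaped("mem_0001"));
        assert!(!upsert_endpoint(&mut entities, "mem_0001"));
        assert!(upsert_endpoint(&mut entities, "alice"));
        assert!(!entities.contains_key("mem_0001"));
        assert_eq!(entities.len(), 1);
    }
}
```

The BFS test would follow the same pattern: build the graph from rows that include rid endpoints, run the expansion, and assert that no returned node satisfies is_rid_shaped.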