links: route memory→entity edges through memory_entities, not claims (phantom entity cleanup) #56

Description

@spranab

Follow-up from the record-link RFC (#48) implementation. Deliberately deferred from the link-model PRs: it is independent of the link model shipping and has the lowest value-per-risk of that work.

Problem

relate(src, dst, ...) unconditionally upserts both endpoints into the entities table (graph_ops.rs). When called with a memory rid as an endpoint, the rid:

  1. gets a phantom entity_type='unknown' row in entities, AND
  2. becomes an actual node in the in-memory entity graph — graph_index::build_from_db's edge loader does ensure_entity(src/dst, "unknown") for every edges/claims endpoint, so on each rebuild the rid is re-added as a graph node that expand_entities BFS traverses.
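
To make the two effects concrete, here is a minimal toy model of the behavior described above. It is not the code in graph_ops.rs or graph_index: ToyDb, its fields, and build_graph_nodes are hypothetical stand-ins, and the rid string is a placeholder.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-in for the `entities` table and the stored edges.
#[derive(Default)]
struct ToyDb {
    entities: BTreeMap<String, String>, // name -> entity_type
    edges: Vec<(String, String)>,       // (src, dst)
}

impl ToyDb {
    /// Mimics the reported behavior: both endpoints are upserted
    /// unconditionally, so a memory rid gets an `unknown` entity row.
    fn relate(&mut self, src: &str, dst: &str) {
        for endpoint in [src, dst] {
            self.entities
                .entry(endpoint.to_string())
                .or_insert_with(|| "unknown".to_string());
        }
        self.edges.push((src.to_string(), dst.to_string()));
    }

    /// Mimics the rebuild: every edge endpoint becomes a graph node,
    /// so the rid comes back as a node on each rebuild.
    fn build_graph_nodes(&self) -> BTreeSet<String> {
        let mut nodes = BTreeSet::new();
        for (src, dst) in &self.edges {
            nodes.insert(src.clone());
            nodes.insert(dst.clone());
        }
        nodes
    }
}
```

In this model, a single relate call with a rid as src leaves both a phantom row in entities and a rid node in the rebuilt node set.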

This is happening in production today: cognition/consolidate.rs transfers edges during consolidation via db.relate(&consolidated_rid, &edge.dst, ...), minting new phantoms on every consolidation.

Verified during #48 work (PR #53/#55). The phantoms clutter list_entities() and add stray BFS nodes; they do NOT produce incorrect recall results (the rid nodes are mostly disconnected from query-relevant entities), which is why this was safe to defer.

The real fix (not just a guard)

Memory→entity associations belong in the memory_entities join table — which graph_index::build_from_db already handles correctly (rids as keys in memory_to_entities, NOT as graph nodes). The fix:

  1. consolidate.rs edge-transfer should use link_memory_entity() (memory_entities) instead of relate() for memory→entity associations.
  2. relate() should guard against rid-shaped endpoints creating entities rows (defense in depth).
  3. graph_index::build_from_db edge loader should skip rid-shaped endpoints (so existing claims-with-rid don't re-pollute the in-memory graph on rebuild).
  4. Cleanup migration: delete existing phantom entities rows (rid-shaped name + entity_type='unknown') and the claims-with-rid-endpoints that consolidate.rs created.
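
Steps 1 to 3 can be sketched on a toy in-memory model. This is only an illustration under assumptions: ToyDb and its fields are hypothetical, the rid predicate is passed in as a parameter because the real rid shape is not stated here, and the guarded relate() keeps the edge while skipping the entities row (rejecting the call outright would be another reasonable choice).

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical stand-ins for `entities`, `memory_entities`, and stored edges.
#[derive(Default)]
struct ToyDb {
    entities: BTreeMap<String, String>,          // name -> entity_type
    memory_entities: BTreeSet<(String, String)>, // (memory rid, entity name)
    edges: Vec<(String, String)>,                // (src, dst)
}

impl ToyDb {
    /// Step 1: memory -> entity associations go to the join table only.
    fn link_memory_entity(&mut self, rid: &str, entity: &str) {
        self.memory_entities
            .insert((rid.to_string(), entity.to_string()));
    }

    /// Step 2: a rid-shaped endpoint never gets an `entities` row.
    /// Assumption: the edge itself is still recorded.
    fn relate(&mut self, src: &str, dst: &str, is_rid: impl Fn(&str) -> bool) {
        for endpoint in [src, dst] {
            if !is_rid(endpoint) {
                self.entities
                    .entry(endpoint.to_string())
                    .or_insert_with(|| "unknown".to_string());
            }
        }
        self.edges.push((src.to_string(), dst.to_string()));
    }

    /// Step 3: the rebuild skips rid-shaped endpoints, so edges that
    /// already exist with a rid endpoint do not re-add the rid as a node.
    fn build_graph_nodes(&self, is_rid: impl Fn(&str) -> bool) -> BTreeSet<String> {
        let mut nodes = BTreeSet::new();
        for (src, dst) in &self.edges {
            for endpoint in [src, dst] {
                if !is_rid(endpoint.as_str()) {
                    nodes.insert(endpoint.clone());
                }
            }
        }
        nodes
    }
}
```

Taking the predicate as a parameter keeps the sketch independent of whatever check the real rid-shape helper ends up performing.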

Why deferred, not done in #53/#55

  • Lowest value (hygiene; no correctness impact).
  • Highest risk-per-value (touches the entity-graph build path = recall quality).
  • Independent of the link model — the substrate + recall + bindings (the actual value) ship without it.
  • A partial guard would leave the in-memory graph still polluted on rebuild; the complete fix is a focused multi-touch change (relate + graph_index build + consolidate + migration) that deserves its own pass + soak.

Needs: a rid-shape helper (crate::id::is_rid_shaped), a test asserting relate(rid, entity) creates no entities row, and a test asserting BFS doesn't surface rid nodes post-fix.
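
A possible shape for the helper, as a sketch only. The rid format is not stated in this issue, so the check below ASSUMES rids are canonical hyphenated UUID strings (8-4-4-4-12 hex digits); the real crate::id::is_rid_shaped would need to match however rids are actually minted.

```rust
/// HYPOTHETICAL: assumes a rid is a canonical hyphenated UUID string
/// (five hex groups of lengths 8-4-4-4-12). Adjust to the real rid format.
pub fn is_rid_shaped(s: &str) -> bool {
    const GROUP_LENS: [usize; 5] = [8, 4, 4, 4, 12];
    let groups: Vec<&str> = s.split('-').collect();
    groups.len() == GROUP_LENS.len()
        && groups.iter().zip(GROUP_LENS).all(|(group, len)| {
            group.len() == len && group.bytes().all(|b| b.is_ascii_hexdigit())
        })
}

#[cfg(test)]
mod tests {
    use super::is_rid_shaped;

    #[test]
    fn accepts_uuid_like_strings_and_rejects_entity_names() {
        assert!(is_rid_shaped("123e4567-e89b-12d3-a456-426614174000"));
        assert!(!is_rid_shaped("rust"));
        assert!(!is_rid_shaped(""));
    }
}
```

Under that assumption, an ordinary entity name would only be misclassified if it happened to be UUID-formatted, which is the main risk to weigh when choosing the real check.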
