All agent implementations are not the same: Hermes Agent integration can't be compared to the Claude Code and OpenClaw plugins #2253

hammerhoundai · 2026-05-26T15:23:34Z

hammerhoundai
May 26, 2026

docs/en/agent-integrations/01-overview.md says:

## Integration depths

Some integrations go beyond what a generic MCP client can do:

- **Generic MCP clients** call OpenViking on demand through tools the model decides to invoke. Setup is one config snippet.
- **Hooks-based / native plugins** (Claude Code, Codex, OpenClaw, Hermes Agent, AstrBot) drive recall and capture from runtime lifecycle events — every prompt, every turn, session start/end, compact, subagent spawn. The model doesn't need to "remember to recall."
- **SDK integrations** (LangChain/LangGraph) wire OpenViking into framework-native abstractions such as retrievers, tools, chat history, stores, and middleware.

For agents whose runtime exposes hooks, middleware, or a context-engine slot, the native integration path is usually the better default.

However this misses the nuance that not all the plugins and builtin integrations work in the same way. Indeed Claude Code and OpenClaw integrations work very similarly as far as they can given the difference in their nature. But hermes agent is a significant outlier which the document doesn't correctly highlight.

Hermes vs Claude Code: OpenViking Plugin

I read both implementations line by line. Here's what I found.

The Headline

Hermes gives the model context about what the user asked LAST turn. Claude Code gives the model context about what the user is asking THIS turn.

That's not a nuance. It's the difference between a working memory system and an architectural misunderstanding.

How Hermes Works (Stale Prefetch)

Turn N (user asks about AWS):
  Agent responds → background thread searches OV for "AWS" → stores in cache

Turn N+1 (user asks about CSS):
  Agent starts → pulls cache → injects results about "AWS"
  The query "CSS" is literally ignored

The evidence from openviking/__init__.py:

def prefetch(self, query: str, ...) -> str:
    # The query parameter? Completely ignored.
    # Just returns whatever the background thread found.
    with self._prefetch_lock:
        result = self._prefetch_result
    return f"## OpenViking Context\n{result}"

First turn of every session gets nothing. Every subsequent turn gets the previous turn's topic. This isn't a bug — it's how MemoryProvider was designed. queue_prefetch() fires at the end of turn N, prefetch() consumes it at the start of turn N+1. The API signature takes a query parameter but the OpenViking provider throws it away.

The background search that produces this stale context is also bare-minimum: single POST /api/v1/search/find with top_k=5, no scope, no score filter, no ranking. Takes top 3 abstracts from whatever comes back and formats them as - [0.87] Abstract text (uri) lines.

How Claude Code Works (Synchronous Recall)

User types "how do I fix the CSS layout bug?"
  → hook fires BEFORE the model sees it
  → searches OV with "how do I fix the CSS layout bug?"
  → ranks results (vector score + query profile boosts + lexical overlap)
  → resolves content within token budget
  → injects <openviking-context> block into the prompt
  → model gets relevant context about CSS bugs

Every turn. With the current query. Within 8 seconds.

What Else Is Different

Recall quality

	Hermes	Claude Code
Search scope	Unscoped (whole OV)	3 parallel scoped searches: user/memories, agent/memories, agent/skills
Candidates	top_k=5, hardcoded	max(recallLimit × 2, 8) per source = up to 36
Ranking	Raw vector score	Vector + query intent boosts + lexical overlap + leaf preference
Score threshold	None	0.35 (configurable)
Dedup	None	By abstract text (events/cases by URI)
Token budget	None — raw top 3	2000 tokens, items beyond budget degrade to URI hints
Content resolution	None — just abstracts	Configurable: prefer abstract or fetch full content
Injection format	`- [score] abstract (uri)` lines	Structured `<openviking-context>` with content lines and hint lines

Capture

	Hermes	Claude Code
Content	`user_content[:4000]` + `assistant_content[:4000]`	Full transcript JSONL, incremental delta only
Tool interactions	Not captured at all	Tool inputs preserved verbatim, tool results dropped (but configurable)
Pollution prevention	None	Strips 5 types of injected context before push
Commit timing	Session end only	When pending_tokens ≥ 20,000 + PreCompact + SessionEnd + SubagentStop
What crashes lose	All pending turns	Nothing (committed incrementally)

The commit timing difference is significant. An 8-hour Hermes session that crashes or gets OOM-killed loses everything since the last session end. A Claude Code session loses at most 20,000 tokens of pending messages (because it commits mid-session whenever that threshold is crossed). The PreCompact hook also commits before context compaction rewrites the transcript.

Subagent isolation

Claude Code gives each subagent its own OV session (cc-<parent>__agent-<agentType>) with a typed namespace in the agent header. Subagent memories segregate by type in viking://agent/<type>/memories/.

Hermes has no subagent isolation. on_delegation() fires on the parent side with just task+result text — no transcript, no separate session.

Configuration

Hermes: 5 env vars (endpoint, key, account, user, agent). Everything else is hardcoded.

Claude Code: 33 env vars + ovcli.conf + ov.conf fallback chain. Every tuning knob is configurable.

Hooks

Hermes has ~5 integration points in the agent loop. Claude Code has 7 dedicated hook scripts with individual timeouts:

SessionStart     (120s) — inject profile + archive overview
UserPromptSubmit  (8s)  — synchronous recall
Stop             (45s)  — incremental capture + threshold commit
PreCompact       (30s)  — synchronous commit before transcript rewrite
SessionEnd       (30s)  — final commit
SubagentStart    (10s)  — create isolated OV session
SubagentStop     (45s)  — push subagent transcript + commit

The most important one that Hermes has no equivalent for is PreCompact. When context compression happens, Hermes loses any pending OV messages. Claude Code commits them first.

Bottom Line

These aren't two flavors of the same integration. The Claude Code plugin is a context engine — it actively manages recall, capture, ranking, injection, commit timing, and subagent isolation. The Hermes plugin is a thin adapter — it converts Python method calls to HTTP and delegates everything else to either the server or the model.

The stale prefetch alone means that for the lifetime of any Hermes session, the automatic memory context is always about the wrong topic. The model can compensate by calling viking_search manually, but the "automatic" recall that every turn gets for free is working against it, not for it.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

All agent implementations are not the same: Hermes Agent integration can't be compared to the Claude Code and OpenClaw plugins #2253

Uh oh!

{{title}}

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

All agent implementations are not the same: Hermes Agent integration can't be compared to the Claude Code and OpenClaw plugins #2253

Uh oh!

hammerhoundai May 26, 2026

Hermes vs Claude Code: OpenViking Plugin

The Headline

How Hermes Works (Stale Prefetch)

How Claude Code Works (Synchronous Recall)

What Else Is Different

Recall quality

Capture

Subagent isolation

Configuration

Hooks

Bottom Line

Replies: 0 comments

hammerhoundai
May 26, 2026