Agentic Stash

Deterministic record and replay for Massive Intelligence (IM) agents. The rr of the agentic world, the git stash of agent runs. Zero runtime dependencies.

Agents are non-deterministic by design: model outputs are sampled, tool and MCP responses depend on external state, and the clock and random sources change between runs. Without replay infrastructure, "worked yesterday, fails today" is the normal state, and a post-mortem becomes forensic guesswork. agenticstash captures every source of non-determinism a run touches and serves it back on replay, turning an irreproducible production failure into one you can step through in a debugger. From there you can fork a run to explore an alternate decision, diff two runs to find exactly where they diverged, seal a run with a tamper-evident digest, and redact secrets before they ever reach storage.

Determinism is by substitution: Agentic Stash replays the values the original run observed; it does not make a model deterministic. That is what makes the replay exact and the debugging real.

Core promise: zero required runtime dependencies, one facade you instrument once, strict TypeScript types, ESM + CJS dual distribution, a node-free core that runs on the edge, and SLSA provenance on every release.


Install

pnpm add @takk/agenticstash
# or: npm install @takk/agenticstash
# or: yarn add @takk/agenticstash
# or: bun add @takk/agenticstash

The package has zero required runtime dependencies. Sibling @takk packages (@takk/alkaline, @takk/keymesh, and the rest of the family) are optional peers, installed only if you bridge to them.


Quickstart, record then replay

Instrument each non-deterministic call once with intercept. Construct the stash without a recording to record; construct it from a recording to replay the same code deterministically.

import { createStash, loadStash } from '@takk/agenticstash';

// The agent step, written once. Runs identically in record and replay.
// callModel stands in for your own model client call; it is not part of this package.
async function plan(stash, prompt: string) {
  return stash.intercept('llm', 'plan', () => callModel(prompt), { input: prompt });
}

// 1. Record a run in production.
const rec = createStash({ id: 'run-2026-06-19' });
const answer = await plan(rec, 'summarize the incident');
const tape = rec.save(); // persist this JSON wherever you keep recordings

// 2. Replay it locally to step through the exact run, no live model call.
const replay = loadStash(tape);
const sameAnswer = await plan(replay, 'summarize the incident'); // served from the tape

On replay, recorded errors re-throw at the original site, and a call whose input no longer matches the recording raises ERR_DIVERGENCE, the signal that the code changed since recording.
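
To make "determinism by substitution" concrete, here is a minimal, self-contained sketch of the idea. It is not the agenticstash implementation: MiniStash, TapeEvent, and the positional matching are hypothetical simplifications chosen for illustration, and the real recording format may differ.

```typescript
// Hypothetical sketch of replay by substitution. MiniStash and TapeEvent are
// illustrative names, not the agenticstash API or its recording format.
type TapeEvent = { channel: string; key: string; input: string; value: unknown };

class MiniStash {
  readonly events: TapeEvent[];
  private readonly replaying: boolean;
  private cursor = 0;

  // No tape: record mode. With a tape: replay mode.
  constructor(tape?: TapeEvent[]) {
    this.replaying = tape !== undefined;
    this.events = tape ? [...tape] : [];
  }

  async intercept<T>(channel: string, key: string, fn: () => Promise<T>, input: string): Promise<T> {
    if (!this.replaying) {
      const value = await fn(); // live call, captured on the tape
      this.events.push({ channel, key, input, value });
      return value;
    }
    const event = this.events[this.cursor];
    this.cursor += 1;
    if (event === undefined || event.channel !== channel || event.key !== key || event.input !== input) {
      throw new Error(`divergence at ${channel}/${key}`);
    }
    return event.value as T; // substituted value; fn is never invoked
  }
}

// Usage: the replayed run never touches the fake "model".
async function demoRecordReplay(): Promise<[string, string, number]> {
  let liveCalls = 0;
  const model = async () => {
    liveCalls += 1;
    return `answer #${liveCalls}`;
  };

  const rec = new MiniStash();
  const first = await rec.intercept('llm', 'plan', model, 'summarize');

  const replay = new MiniStash(rec.events);
  const second = await replay.intercept('llm', 'plan', model, 'summarize');
  return [first, second, liveCalls];
}
```

In this sketch the replayed call returns the recorded value and the live function runs only once, which is the property the quickstart relies on.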


Find exactly where the new code diverged

Replay the old recording against your new code and collect every divergence in one pass, instead of throwing on the first.

const replay = loadStash(tape, { onDivergence: 'collect' });
await runAgent(replay); // your agent, unchanged
const report = replay.report();

report.matched;          // false if anything diverged
report.firstDivergence;  // { kind: 'input-mismatch', channel: 'llm', key: 'plan', ... }
report.divergences;      // input-mismatch, extra-call, missing-call, in order

This is the "bug that took weeks, found in minutes" workflow: the report points at the first call where the run departed from the recording.
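
The three divergence kinds named above can be illustrated with a small standalone sketch. It assumes calls are matched by per-key call order and compares plain input strings; collectDivergences and its types are hypothetical names, not the library's report format.

```typescript
// Hypothetical sketch of divergence collection; not the agenticstash report format.
type Call = { channel: string; key: string; input: string };
type Divergence = {
  kind: 'input-mismatch' | 'extra-call' | 'missing-call';
  channel: string;
  key: string;
};
type Report = { matched: boolean; firstDivergence: Divergence | undefined; divergences: Divergence[] };

function collectDivergences(recorded: Call[], observed: Call[]): Report {
  // Queue recorded calls per channel/key so each observed call meets its ordinal twin.
  const queues = new Map<string, Call[]>();
  for (const call of recorded) {
    const id = `${call.channel}/${call.key}`;
    const queue = queues.get(id) ?? [];
    queue.push(call);
    queues.set(id, queue);
  }

  const divergences: Divergence[] = [];
  for (const call of observed) {
    const expected = queues.get(`${call.channel}/${call.key}`)?.shift();
    if (expected === undefined) {
      // The new code made a call the recording never saw.
      divergences.push({ kind: 'extra-call', channel: call.channel, key: call.key });
    } else if (expected.input !== call.input) {
      // Same call identity, different input: the code changed what it asks.
      divergences.push({ kind: 'input-mismatch', channel: call.channel, key: call.key });
    }
  }

  // Anything still queued was recorded but never requested by the new code.
  queues.forEach((queue) => {
    for (const call of queue) {
      divergences.push({ kind: 'missing-call', channel: call.channel, key: call.key });
    }
  });

  return { matched: divergences.length === 0, firstDivergence: divergences[0], divergences };
}
```

Collecting instead of throwing means one replay pass yields the whole picture; the first entry is usually the one worth reading, since later divergences are often consequences of it.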


Fork a run to explore an alternate decision

import { fork } from '@takk/agenticstash';

// Keep everything before event 4, force a different decision there, drop the tail.
const branch = fork(recording, { at: 4, override: { value: 'use the cheaper model' } });
const replay = loadStash(JSON.stringify(branch));
// Replay serves the shared prefix, applies the forced decision, then runs live.
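
As a rough picture of what a fork does to a recording, the sketch below keeps the prefix, overrides one event, and drops the tail. It assumes zero-based event indexes and a simplified event shape; forkAt, Run, and RunEvent are hypothetical names rather than the package's types.

```typescript
// Hypothetical sketch of forking a recording; event shape and indexing are assumptions.
type RunEvent = { channel: string; key: string; value: unknown };
type Run = { id: string; events: RunEvent[] };

function forkAt(run: Run, at: number, override: { value: unknown }): Run {
  const pivot = run.events[at];
  if (pivot === undefined) {
    throw new RangeError(`no event at index ${at}`);
  }
  return {
    id: `${run.id}-fork-${at}`,
    // Events before `at` are kept, the event at `at` gets the forced value,
    // and everything after it is dropped so the branch can run live from there.
    events: [...run.events.slice(0, at), { ...pivot, value: override.value }],
  };
}
```

The original run is left untouched, so the same recording can be forked at several points to compare alternative decisions.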

Diff two runs

import { diffRecordings } from '@takk/agenticstash';

const d = diffRecordings(before, after);
d.firstDivergence; // { channel, key, ordinal }
d.changed;         // events present in both whose value or outcome changed
d.added; d.removed;
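
One way to think about the changed, added, and removed sets is to identify each event by channel, key, and its per-key ordinal, then compare the two indexes. The sketch below does that under simplifying assumptions: diffRuns is a hypothetical name, values are compared through JSON.stringify (which is sensitive to property order), and the first-divergence calculation is left out.

```typescript
// Hypothetical sketch of diffing two recordings; not the agenticstash diff output.
type Ev = { channel: string; key: string; value: unknown };

// Identify each event as channel/key#ordinal, where ordinal counts calls per key.
function indexByIdentity(events: Ev[]): Map<string, Ev> {
  const counts = new Map<string, number>();
  const byId = new Map<string, Ev>();
  for (const ev of events) {
    const base = `${ev.channel}/${ev.key}`;
    const ordinal = counts.get(base) ?? 0;
    counts.set(base, ordinal + 1);
    byId.set(`${base}#${ordinal}`, ev);
  }
  return byId;
}

function diffRuns(before: Ev[], after: Ev[]): { changed: string[]; added: string[]; removed: string[] } {
  const a = indexByIdentity(before);
  const b = indexByIdentity(after);
  const changed: string[] = [];
  const added: string[] = [];
  const removed: string[] = [];

  a.forEach((ev, id) => {
    const other = b.get(id);
    if (other === undefined) {
      removed.push(id); // present before, gone after
    } else if (JSON.stringify(other.value) !== JSON.stringify(ev.value)) {
      changed.push(id); // present in both, value differs
    }
  });
  b.forEach((_ev, id) => {
    if (!a.has(id)) added.push(id); // new in the second run
  });

  return { changed, added, removed };
}
```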

Seal a run for compliance, then verify it

The integrity seal is a SHA-256 hash chain over the recording, the tamper-evident primitive EU AI Act Article 12 logging asks for. It is an integrity seal, not a digital signature.

import { sealRecording, verifyRecording } from '@takk/agenticstash';

const seal = await sealRecording(recording); // { algorithm: 'SHA-256', root, count }
// ... store the recording and the seal ...
const result = await verifyRecording(recording, seal);
result.valid; // false if a single event, value, order, or id changed
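
For readers unfamiliar with hash chains, the sketch below shows the general technique with the Web Crypto API: each link hashes the previous link together with the next event, so changing any event, its order, or the run id changes the root. This is an illustration only; chainSeal and chainVerify are hypothetical names, and serializing events with JSON.stringify is an assumption (a production seal would need a canonical encoding).

```typescript
// Hypothetical sketch of a SHA-256 hash chain; not the agenticstash seal format.
type SealedEvent = { channel: string; key: string; value: unknown };
type ChainSeal = { algorithm: 'SHA-256'; root: string; count: number };

async function sha256Hex(text: string): Promise<string> {
  const bytes = new TextEncoder().encode(text);
  const digest = await globalThis.crypto.subtle.digest('SHA-256', bytes);
  return Array.from(new Uint8Array(digest), (b) => b.toString(16).padStart(2, '0')).join('');
}

async function chainSeal(id: string, events: SealedEvent[]): Promise<ChainSeal> {
  // Start the chain from the run id so the id is covered by the root as well.
  let link = await sha256Hex(`id:${id}`);
  for (const event of events) {
    // Each link commits to everything before it plus the current event.
    link = await sha256Hex(`${link}|${JSON.stringify(event)}`);
  }
  return { algorithm: 'SHA-256', root: link, count: events.length };
}

async function chainVerify(id: string, events: SealedEvent[], seal: ChainSeal): Promise<boolean> {
  const fresh = await chainSeal(id, events);
  return fresh.count === seal.count && fresh.root === seal.root;
}
```

Verification simply recomputes the chain and compares roots, which is why the seal proves integrity against a trusted root but says nothing about who produced the recording.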

Keep secrets out with redaction

import { createStash, DROP } from '@takk/agenticstash';

const rec = createStash({
  id: 'run',
  redact: (value, { kind }) =>
    typeof value === 'string' ? value.replace(/sk-[A-Za-z0-9]+/g, '[REDACTED]') : value,
});
// Return DROP instead to store only a marker (a metadata-only event).

Redaction is record-time and one-way: a redacted field replays as its redacted form, so redact what replay does not need to reproduce. The seal computes over the stored (redacted) recording, so redaction and integrity compose: you seal exactly what you keep.
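
The hook shown above masks top-level strings only. Tool and MCP responses are often nested objects, so a recursive variant may be useful. The sketch below is one possible shape, assuming the hook receives JSON-like values; redactDeep and the key pattern are illustrative, and whether a function like this can be passed directly as redact depends on the hook signature documented in SPEC.md.

```typescript
// Hypothetical recursive redactor for JSON-like values; the pattern is only an example.
const SECRET_PATTERN = /sk-[A-Za-z0-9]+/g;

function redactDeep(value: unknown): unknown {
  if (typeof value === 'string') {
    return value.replace(SECRET_PATTERN, '[REDACTED]');
  }
  if (Array.isArray(value)) {
    return value.map((item) => redactDeep(item));
  }
  if (value !== null && typeof value === 'object') {
    // Rebuild the object so the caller's original value is never mutated.
    const out: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      out[key] = redactDeep(inner);
    }
    return out;
  }
  return value; // numbers, booleans, null, undefined pass through unchanged
}
```

Because redaction is one-way, anything masked here replays in its masked form, so it is worth restricting the pattern to values replay does not need.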


Entry points

| Entry point | Import | What it gives you |
| --- | --- | --- |
| Core | @takk/agenticstash | createStash, loadStash, every function, types, errors |
| Record | @takk/agenticstash/record | createRecorder, DROP |
| Replay | @takk/agenticstash/replay | createReplayer (divergence collection) |
| Storage | @takk/agenticstash/storage | BlobStore, encodeRecording, decodeRecording, recordingStats |
| Fork | @takk/agenticstash/fork | fork |
| Diff | @takk/agenticstash/diff | diffRecordings |
| Interceptors | @takk/agenticstash/interceptors | deterministic clock, seeded random, wrap, wrapSync |
| MCP | @takk/agenticstash/mcp | interceptMcpClient, recordMcpTool (duck-typed, no SDK import) |
| Seal | @takk/agenticstash/seal | sealRecording, verifyRecording |
| Edge | @takk/agenticstash/edge | the full core under a worker condition |

The whole engine is Node-free (the seal uses the Web Crypto API), so the edge entry is the full core and runs in Cloudflare Workers, Vercel Edge, Deno, Bun, and the browser.
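
The Interceptors entry point lists a deterministic clock and a seeded random source. The sketch below shows what such sources can look like in general, using the well-known mulberry32 generator and a fixed-step clock. seededRandom and steppingClock are hypothetical names and are not claimed to match the package's own interceptors.

```typescript
// Hypothetical deterministic sources; not the agenticstash interceptors.

// mulberry32: a small 32-bit PRNG. The same seed always yields the same sequence.
function seededRandom(seed: number): () => number {
  let state = seed >>> 0;
  return () => {
    state = (state + 0x6d2b79f5) >>> 0;
    let t = state;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // in [0, 1)
  };
}

// A clock that starts at startMs and advances by stepMs on every read.
function steppingClock(startMs: number, stepMs: number): () => number {
  let now = startMs - stepMs;
  return () => {
    now += stepMs;
    return now;
  };
}

// Usage: inject these instead of Math.random and Date.now in code under replay.
const random = seededRandom(42);
const clock = steppingClock(1_700_000_000_000, 10);
const sample = { roll: random(), at: clock() };
```

Both functions use only standard JavaScript, so they run unchanged in Node and in edge runtimes.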


CLI

# Summarize a recording
npx @takk/agenticstash inspect run.json

# Find where two runs diverged
npx @takk/agenticstash diff before.json after.json

# Fork a recording at an event index
npx @takk/agenticstash fork run.json --at 4 --out branch.json

# Seal a recording, then verify it (exit 1 if tampered)
npx @takk/agenticstash seal run.json --out run.seal.json
npx @takk/agenticstash verify run.json run.seal.json

Exit codes follow the sysexits convention: 0 success, 64 usage, 65 bad data, 66 unreadable input, and verify exits 1 when a recording fails its seal.
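
In a CI script it can help to turn these exit codes into readable outcomes. The sketch below maps only the codes listed above; describeExit and isTampered are hypothetical helpers, and how you obtain the exit code (for example from your process runner) is left to your environment.

```typescript
// Hypothetical helper for interpreting the CLI exit codes listed above.
function describeExit(code: number): string {
  switch (code) {
    case 0:
      return 'success';
    case 1:
      return 'recording failed its seal';
    case 64:
      return 'usage error';
    case 65:
      return 'bad data';
    case 66:
      return 'unreadable input';
    default:
      return `unexpected exit code ${code}`;
  }
}

// Only exit code 1 means "tampered"; 64 to 66 mean the check could not run properly.
function isTampered(code: number): boolean {
  return code === 1;
}
```

Keeping "tampered" separate from "could not check" avoids treating a missing file as evidence of tampering, or as a pass.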


Where it fits

The deterministic-replay category is real and forming fast, partly driven by the EU AI Act Article 12 record-keeping requirement. Agentic Stash is one option in it, with a specific shape:

| | Agentic Stash | Forkline | LangGraph time-travel | LangSmith / Braintrust / Langfuse |
| --- | --- | --- | --- | --- |
| Distribution | npm library | library | framework feature | hosted / SDK |
| Runs | in-process, re-executes your code | in-process | reconstructs framework state | dashboards / replay-testing |
| Framework-coupled | no (interceptors) | no | yes (LangGraph) | no |
| Tamper-evident seal | yes (SHA-256) | n/a | n/a | varies |
| Runtime dependencies | 0 | varies | framework | n/a (hosted) |

Pick a hosted platform for dashboards and team analytics. Pick Agentic Stash when you want the recording itself, as a zero-dependency primitive your code, or an agent, can record, replay, fork, diff, and seal in-process and on the edge.


Honest limits

  • Determinism is by substitution. Agentic Stash does not make a model deterministic; it replays recorded values. A code path that changes its calls or their order will diverge (which is exactly what the divergence report is for).
  • You must instrument. Like rr, replay only works for calls you wrapped with intercept or wrap. Provider adapters that lower this friction are on the roadmap.
  • Concurrent same-key calls. Replay matches by per-key call order; for a Promise.all of same-key calls, use distinct keys, or wait for the input-matched mode planned for 1.1.
  • Redaction trades replay fidelity. A redacted field replays as its redacted form; a dual-tape redacted export is on the roadmap.
  • The seal is tamper-evident, not a signature. It proves a recording matches a trusted root, not who produced it. Pair it with your own signing for non-repudiation.
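
The concurrency limit above is easier to see with a toy model. The sketch below assumes a recorder that appends values when calls complete and a replayer that serves them by per-key call order; both are hypothetical stand-ins, and the library's actual mechanism may differ. Under that assumption, a shared key lets a fast call's value land in a slow call's slot, while distinct keys keep them apart.

```typescript
// Hypothetical toy model of same-key ambiguity; not the agenticstash matching logic.
type KeyTape = Map<string, unknown[]>;

// Resolve after n microtask turns, so completion order is controlled without timers.
async function afterTicks<T>(n: number, value: T): Promise<T> {
  for (let i = 0; i < n; i++) await Promise.resolve();
  return value;
}

function makeRecorder(tape: KeyTape) {
  return async function record<T>(key: string, fn: () => Promise<T>): Promise<T> {
    const value = await fn();
    const list = tape.get(key) ?? [];
    list.push(value); // appended in completion order
    tape.set(key, list);
    return value;
  };
}

function makeReplayer(tape: KeyTape) {
  const cursors = new Map<string, number>();
  return function replay(key: string): unknown {
    const n = cursors.get(key) ?? 0;
    cursors.set(key, n + 1);
    return (tape.get(key) ?? [])[n]; // served in call order
  };
}

// Record a two-call fan-out, then replay it, using keyFor to choose each call's key.
async function fanOut(keyFor: (doc: string) => string): Promise<unknown[]> {
  const tape: KeyTape = new Map();
  const record = makeRecorder(tape);
  // "a" is slow and "b" is fast, so "b" completes first.
  await Promise.all([
    record(keyFor('a'), () => afterTicks(8, 'A')),
    record(keyFor('b'), () => afterTicks(1, 'B')),
  ]);
  const replay = makeReplayer(tape);
  return [replay(keyFor('a')), replay(keyFor('b'))];
}
```

In this model, fanOut(() => 'fetch') replays the two values swapped, while fanOut((doc) => `fetch:${doc}`) replays each call with its own value, which is the reason for the distinct-keys advice.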

Quality

  • 86 tests across 14 suites, all passing under Vitest 4, green on Node 20, 22, and 24.
  • Coverage: statements 87.96%, lines 88.62%, functions 89.83%, branches 76.94%.
  • Lint clean under Biome 2; typecheck clean under TypeScript 6 in maximum strict mode (exactOptionalPropertyTypes, useUnknownInCatchVariables, noUncheckedIndexedAccess, noImplicitOverride, noImplicitReturns).
  • publint clean; are-the-types-wrong clean across all ten entry points.
  • A dist smoke test exercises the built artefact end-to-end (record-then-replay, divergence, fork, seal, redaction, MCP, CLI exit codes).
  • Published with --provenance (SLSA attestation by GitHub Actions).

See SPEC.md for the formal specification, public surface, and stability promise.


FAQ

Is this machine learning? No. Agentic Stash is a deterministic record and replay engine: it captures values and serves them back. No model, no training, no inference of its own.

How is it different from LangGraph time-travel or LangSmith replay-testing? LangGraph reconstructs framework state rather than re-executing your code, and it is coupled to LangGraph. Hosted platforms give you dashboards and replay-testing. Agentic Stash is a framework-agnostic, zero-dependency library that re-executes your own code against substituted recorded values, in-process and on the edge.

Does it require runtime dependencies? No. Zero required runtime dependencies. The seal uses the Web Crypto API (a platform standard), not a bundled crypto library.

Does it run in Cloudflare Workers, Vercel Edge, Bun, or Deno? Yes. The whole engine is Node-free; @takk/agenticstash/edge exposes the full core under a worker condition.

My recordings contain prompts and secrets. What do I do? Use the redact hook to mask or drop sensitive values before they reach a recording, and seal the result for integrity. See PRIVACY.md and SECURITY.md.

Can I trust a replay matched the original run? Replay serves recorded values by call identity and checks the input hash; a mismatch surfaces as a divergence rather than a silent substitution. Seal a recording and verify it to prove it was not altered between record and replay.


Contributing

See .github/CONTRIBUTING.md for the contributor guide. For substantive proposals, open a GitHub Issue first; trivial fixes can go straight to a PR. All commits require DCO sign-off (git commit -s). Non-trivial contributions are governed by the Contributor License Agreement.

Community & support

  • Issues & feature requests. Open a GitHub issue at davccavalcante/agenticstash/issues. For each report, include the package version, a minimal reproduction, expected vs. actual behaviour, and, where relevant, the recording or the divergence report.
  • Security disclosures. Do NOT open public issues for vulnerabilities. Follow the responsible-disclosure flow in SECURITY.md and contact davcavalcante@proton.me (or say@takk.ag) with the [SECURITY] prefix.
  • Code of Conduct. This project follows the Contributor Covenant 2.1. Participation in any Agentic Stash space (issues, PRs, discussions) implies agreement.
  • Contributions. All non-trivial contributions go through the Contributor License Agreement. Tests, lint, typecheck, and build must be green before review (pnpm verify).

Author

Created by David C Cavalcante, davcavalcante@proton.me (preferred), say@takk.ag (Takk relay), linkedin.com/in/hellodav, x.com/davccavalcante, takk.ag

agenticstash is one package in a broader portfolio of NPM libraries targeting Massive Intelligence (IM) infrastructure for 2026-2030, built at Takk Innovate Studio.


Related research by the author

The architecture behind agenticstash, separating capture, replay, and verification into composable, independently-governed layers, echoes the author's research frameworks:

  • MAIC (Massive Artificial Intelligence Consciousness), the Universe, the framework: a systemic intelligence framework to coordinate, supervise, and govern large-scale intelligence ecosystems with global context awareness, alignment, and orchestration.
  • HIM (Hybrid Entity Intelligence Model), the spirit, the model: a hybrid intelligence layer that integrates Massive Intelligence with human-defined logic, rules, and strategic intent before and after model execution.
  • NHE (Noumenal Higher-order Entity), the body, the agent: a non-human cognitive entity with a defined functional identity and operational agency, operating through coordinated intelligence layers while maintaining a non-anthropomorphic identity.

These frameworks are published independently of agenticstash and are separate works.


Sponsors

Join the journey as the portfolio continues to ship Massive Intelligence (IM) infrastructure. Your support is the cornerstone of this work.


Privacy

agenticstash runs entirely inside your own process and infrastructure. It makes no outbound calls to the author, collects no telemetry, and ships no analytics. It records only the values you route through it, and persists nothing on its own. See PRIVACY.md for the full data-handling notice, including the redact hook and GDPR/LGPD posture.


License

Licensed under the Apache License 2.0. See LICENSE for the full text and NOTICE for attribution and third-party component licenses. You may use, modify, and distribute the code under the terms of that license, including its patent grant and attribution requirements.

About

Deterministic record and replay for agent runs. A zero-runtime-dependency TypeScript library and CLI that captures every source of non-determinism an agent touches (model output, tools, MCP, clock, randomness) and replays it exactly, with fork, diff, a tamper-evident seal, and redaction.
