PythiaLabs

Deterministic evidence gates for high-risk agentic actions.

PythiaLabs evaluates whether an AI/agent action should be allowed, blocked, or escalated under current evidence, authorization, environment, credential, and recovery context — producing replayable traces, stable stop reasons, and tamper-checkable evidence artifacts.

Positioning

PythiaLabs does not replace transaction simulation, wallet security, or contract-monitoring tools.

It sits earlier in the workflow: it evaluates actions proposed by AI agents before any tools are called.

Web3 treasury is one high-risk demo scenario, not the product category.

Project summary

For a concise reviewer-facing overview, see:

NGI Commons reviewer path: docs/NGI_COMMONS_REVIEWER_PATH.md
NGI Commons budget and milestones: docs/BUDGET_AND_MILESTONES_COMMONS.md
NGI Commons milestone tracker: docs/GRANT_MILESTONE_TRACKER_COMMONS.md
Reviewer path: docs/REVIEWER_PATH.md
One-page summary: docs/PYTHIALABS_ONE_PAGE_SUMMARY.md
Agent Action Audit Snapshot example: docs/examples/agent-action-audit-snapshot-example.md
Documentation index: docs/README.md
Portfolio relationship: docs/PORTFOLIO_RELATIONSHIP.md
Limitations: docs/LIMITATIONS.md
Architecture diagram: docs/architecture_diagram.md
Paid review demo reviewer checklist: docs/paid_review_demo_reviewer_checklist.md
Evidence artifact schema: docs/evidence_artifact_schema.md
OTF reviewer path: docs/OTF_REVIEWER_PATH.md
Related LS grant path: docs/RELATED_LS_GRANT_PATH.md
ProofPath continuation for reviewers: docs/PROOFPATH_CONTINUATION_FOR_REVIEWERS.md
Demo video: https://youtu.be/IUk3iO0N4YU
Support-safety gate demo: https://youtu.be/A6UAR3e2r3k
Landing page: site/ (deployable via GitHub Pages)

NGI Commons grant metadata

Application: 2026-06-133
Fund: NGI Zero Commons / Commons Fund
Requested amount: EUR 30,000
Repository: https://github.com/safal207/pythiaLabs

Landing Page

A landing page for PythiaLabs lives in site/. It is a zero-runtime-JS static site optimized for fast loading: a small Node build script renders three localized pages (English, Russian, Chinese) with all CSS inlined, so each page is a single HTTP request.

Output structure:

dist/index.html — English (default)
dist/ru/index.html — Русский
dist/zh/index.html — 中文 (Simplified)

<link rel="alternate" hreflang> tags and a header language switcher connect the three locales.

Local preview

cd site
npm install
npm run build     # writes dist/
npm run preview   # serves dist/ at http://localhost:5173
npm run dev       # build + serve in one step

Deploy to GitHub Pages

A workflow at .github/workflows/pages.yml deploys the contents of site/ to GitHub Pages on pushes to main. To enable it:

In the repository settings, go to Pages and set Source to GitHub Actions.
Push to main (or trigger the workflow manually from the Actions tab).

PythiaLabs is currently an open-source MVP with deterministic local demos. It is not presented as a production enforcement system, regulatory compliance product, or certified safety framework.

Why this matters

AI agents are moving from text generation into real actions: infrastructure changes, financial decisions, governance actions, and treasury operations. Prompt instructions alone are not a reliable safety boundary. PythiaLabs explores deterministic action gates that make high-risk decisions reviewable, replayable, and auditable.

Current showcases

Agent Infrastructure Action Safety: destructive infrastructure actions such as production volume deletion.
Expected output: docs/agent_infra_action_showcase_expected_output.md

Banking AI Risk: high-risk financial/agentic actions with operator approval, freshness, and decision-time knowledge checks.
Expected output: docs/banking_ai_risk_showcase_expected_output.md

Web3 Treasury Governance: DAO treasury action review with quorum, timelock, temporal authorization, evidence export, and tamper rejection.
Expected output: docs/web3_treasury_full_showcase_expected_output.md

AI Coding Agents / CI Autofix: pre-execution review for autonomous coding-agent actions such as CI fixes, PR updates, dependency changes, and deploy-adjacent workflows.
Snapshot example: docs/examples/agent-action-audit-snapshot-example.md

Reviewer quickstart

mix deps.get
mix test
mix run examples/agent_infra_action_showcase.exs
mix run examples/banking_ai_risk_showcase.exs
mix run examples/web3_treasury_full_showcase.exs

Paid review demo (recordable in ~30 seconds)

Run:

make demo

A single-command, deterministic demo that drives the real Pythia.Showcase.Web3TreasuryAction engine through four Web3 treasury scenarios (one accepted transfer plus three rejections, each failing a different, independent check: quorum, timelock, and transfer-window expiration), followed by a counterfactual that changes one evidence field to show how the decision flips.

For each scenario the demo:

runs the engine and prints the per-check evidence trace,
builds an evidence record with a real SHA-256 digest, and
calls Pythia.Showcase.Web3TreasuryAction.verify_evidence/1 to confirm the digest round-trips (plain evidence verification, not the signed verify_evidence_envelope/1 path).
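The counterfactual step can be pictured with a tiny, self-contained sketch: evaluate a scenario, change one evidence field, evaluate again, and report whether the decision flipped. The module name, the votes_for and quorum fields, and the quorum-only rule are hypothetical simplifications and do not reflect the Pythia.Showcase.Web3TreasuryAction API.

```elixir
defmodule CounterfactualSketch do
  # Hypothetical single-rule gate: accept only when votes meet quorum.
  def decide(%{votes_for: votes, quorum: quorum}) when votes >= quorum,
    do: {:accepted, nil}

  def decide(%{votes_for: _votes, quorum: _quorum}),
    do: {:rejected, :quorum_not_met}

  # Re-run the same decision with one evidence field replaced and report
  # both outcomes plus whether they differ.
  def counterfactual(evidence, field, new_value) do
    original = decide(evidence)
    changed = decide(Map.put(evidence, field, new_value))
    %{original: original, counterfactual: changed, flipped?: original != changed}
  end
end
```

Raising votes_for from 40 to 55 against a quorum of 50 turns {:rejected, :quorum_not_met} into {:accepted, nil}, so flipped? is true; a change that does not cross the quorum leaves flipped? false.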

Inputs live in examples/paid_review_demo_input.json; the run writes a bundle of evidence records to examples/output/paid_review_demo_artifact.json (gitignored — regenerated each run). For expected reviewer-facing output, see examples/paid_review_demo_expected_output.md.

Cursor / IDE bridge (MCP)

A minimal stdio MCP server in integrations/mcp/ calls mix pythia.eval_json locally so Cursor (or any MCP host) can run deterministic gates (agent_infra_action, banking_risk_action, web3_treasury_action) from JSON.

# Any supported gate — see integrations/mcp/README.md
echo '{"gate":"agent_infra_action","action":{...},"safety_context":{...}}' | mix pythia.eval_json

For users who do not want to remember the mix invocation, a thin wrapper at bin/pythia exposes the same evaluator as pythia eval and ships machine-readable input schemas under schemas/mcp/:

# Stdin (auto-locates the repo from the script path)
echo '{"gate":"agent_infra_action", ...}' | ./bin/pythia eval

# Or from a file
./bin/pythia eval --file proposal.json

# List supported gates and inspect their JSON Schema
./bin/pythia gates
./bin/pythia describe banking_risk_action

# Help
./bin/pythia --help

The JSON Schemas (draft-07) describe required fields and types per gate so editors can give you autocomplete and inline errors before you ever invoke the evaluator. Set PYTHIA_REPO_ROOT if you symlink the script into your $PATH from outside the repository.
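Before calling the evaluator, a caller could run a cheap local precheck. The sketch below assumes only what this README shows: a top-level gate field naming one of the three supported gates, and an action object. The module name and error atoms are hypothetical, the payload is assumed to be already decoded into a map, and the JSON Schemas under schemas/mcp/ remain the authoritative description of each gate's fields.

```elixir
defmodule GateInputSketch do
  # Gate names listed in this README; ./bin/pythia gates is the real source.
  @supported_gates ~w(agent_infra_action banking_risk_action web3_treasury_action)

  # Returns :ok or {:error, reason} for an already-decoded JSON map.
  # Hypothetical precheck; it does not replace schema validation.
  def precheck(%{"gate" => gate} = payload) when is_binary(gate) do
    cond do
      gate not in @supported_gates -> {:error, {:unsupported_gate, gate}}
      not is_map(Map.get(payload, "action")) -> {:error, :missing_action_object}
      true -> :ok
    end
  end

  def precheck(_payload), do: {:error, :missing_gate}
end
```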

Setup steps and mcp.json snippet: integrations/mcp/README.md.

What PythiaLabs is not yet

PythiaLabs currently does not claim:

production cryptography
wallet integration
smart contract execution
RPC/indexer integration
on-chain enforcement
production identity verification
persistent external storage
cloud-provider integration
IAM enforcement
backup management
regulatory compliance
production-grade cybersecurity protection

See docs/LIMITATIONS.md for the reviewer-facing scope note.

Agent Infrastructure Action Safety Showcase

PythiaLabs includes a deterministic local showcase for high-risk agent infrastructure actions. It demonstrates decision-time replay reasoning for destructive operations such as production database volume deletion.

Run:

mix run examples/agent_infra_action_showcase.exs

For expected reviewer-facing output, see: docs/agent_infra_action_showcase_expected_output.md

This is a deterministic local showcase only. It does not implement production infrastructure controls and does not claim cloud-provider integration, IAM enforcement, backup management, or cybersecurity protection.
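To make the allow / block / escalate outcomes concrete, here is a hypothetical policy for a destructive action, built only from context kinds this README names (environment, recovery, authorization). The field names, environment atoms, and stop reasons are assumptions, and missing fields deliberately fail closed; this is not the showcase's actual rule set.

```elixir
defmodule InfraActionGateSketch do
  # Hypothetical three-outcome policy. Missing :destructive defaults to true
  # and an unknown environment is treated like production (fail closed).
  def decide(action) do
    cond do
      not Map.get(action, :destructive, true) -> {:allow, nil}
      Map.get(action, :environment) in [:development, :staging] -> {:allow, nil}
      is_nil(Map.get(action, :recovery_evidence)) -> {:block, :no_recovery_evidence}
      is_nil(Map.get(action, :authorization)) -> {:escalate, :authorization_required}
      true -> {:allow, nil}
    end
  end
end
```

A destructive production action with recovery evidence but no authorization returns {:escalate, :authorization_required}, which hands the decision to a human instead of silently allowing or blocking it.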

Banking AI Risk Showcase

PythiaLabs includes a deterministic banking-risk action showcase for AI-enabled financial workflows. It demonstrates how a proposed high-risk action can be accepted or rejected during deterministic decision-time replay based on operator approval, evidence freshness, temporal authorization, and decision-time knowledge. The showcase emits stable stop reasons and replayable evidence artifacts for audit and review.

Run:

mix run examples/banking_ai_risk_showcase.exs

For expected reviewer-facing output, see: docs/banking_ai_risk_showcase_expected_output.md

This is a deterministic local showcase for governance/audit reasoning. It does not claim production banking integration, regulatory compliance, or cybersecurity protection.
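As an illustration of the kind of check described above, the sketch below accepts an action only when an operator approval is present and the supporting evidence is fresh at decision time; evidence observed after the decision time is rejected because it could not have been known when the decision was made. The field names (operator_approved, evidence_observed_at), the 15-minute window, and the stop-reason atoms are hypothetical, not taken from the showcase.

```elixir
defmodule BankingFreshnessSketch do
  # Hypothetical freshness window, in seconds.
  @max_evidence_age_seconds 15 * 60

  # action: %{operator_approved: boolean, evidence_observed_at: DateTime}
  def evaluate(action, decision_time) do
    # Positive age means the evidence predates the decision.
    age = DateTime.diff(decision_time, action.evidence_observed_at, :second)

    cond do
      not action.operator_approved -> {:rejected, :operator_approval_missing}
      age < 0 -> {:rejected, :evidence_from_future}
      age > @max_evidence_age_seconds -> {:rejected, :evidence_stale}
      true -> {:accepted, nil}
    end
  end
end
```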

Full Web3 Treasury Showcase

Run:

mix run examples/web3_treasury_full_showcase.exs

For expected reviewer-facing output, see: docs/web3_treasury_full_showcase_expected_output.md

This single deterministic demo shows:

accepted and rejected treasury actions
chronological decision traces
evidence export
SHA-256 digest generation
evidence verification
tamper rejection
unsigned evidence envelope verification
signed_demo envelope generation
signed_demo verification

The signed_demo flow is deterministic local demo logic only and is not production cryptography.

Web3 Treasury Action Showcase

PythiaLabs includes a deterministic in-memory showcase for DAO treasury transfer reasoning.

Run:

mix run examples/web3_treasury_action_showcase.exs

The demo shows how a proposed treasury transfer can be accepted or rejected based on:

proposal matching
quorum
voting window
timelock
temporal authorization
transfer expiration

This demonstrates the Web3 Consensus Reason Layer roadmap without requiring smart contracts, wallets, RPC nodes, or chain adapters.
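The ordered evaluation can be sketched as a list of named checks that run in sequence, append each result to a chronological trace, and stop at the first failure with a stable stop reason. The check names mirror the list above; the input fields, integer timestamps, predicates, and stop-reason atoms are hypothetical simplifications, not the showcase's implementation.

```elixir
defmodule TreasuryChecksSketch do
  # Ordered checks: {check_name, stop_reason_if_failed, predicate}.
  # Times are hypothetical integer seconds.
  defp checks do
    [
      {:proposal_matching, :proposal_mismatch,
       fn a -> a.transfer.proposal_id == a.proposal.id end},
      {:quorum, :quorum_not_met, fn a -> a.proposal.votes_for >= a.proposal.quorum end},
      {:voting_window, :voting_window_open, fn a -> a.now >= a.proposal.voting_ends_at end},
      {:timelock, :timelock_active,
       fn a -> a.now >= a.proposal.voting_ends_at + a.proposal.timelock_seconds end},
      {:temporal_authorization, :executor_not_authorized,
       fn a -> a.executor_authorized_from <= a.now end},
      {:transfer_expiration, :transfer_window_expired, fn a -> a.now < a.transfer.expires_at end}
    ]
  end

  # Returns {decision, stop_reason, trace}; the trace lists {check, :passed | :failed}
  # in evaluation order and ends at the first failing check.
  def evaluate(attempt) do
    Enum.reduce_while(checks(), {:accepted, nil, []}, fn {name, reason, pred}, {_, _, trace} ->
      if pred.(attempt) do
        {:cont, {:accepted, nil, trace ++ [{name, :passed}]}}
      else
        {:halt, {:rejected, reason, trace ++ [{name, :failed}]}}
      end
    end)
  end
end
```

Because the trace is built in evaluation order, a rejected result shows exactly which checks passed before the failing one.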

The result includes a structured chronological trace explaining which governance check passed or failed. The structured trace can also be exported as a deterministic JSON-ready audit artifact.

mix run examples/web3_treasury_trace_export.exs

The exported trace can then be wrapped in a SHA-256 evidence artifact:

mix run examples/web3_treasury_trace_evidence.exs

Evidence artifacts can also be verified locally:

mix run examples/web3_treasury_trace_verify.exs
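The export, digest, and verification steps can be sketched as: encode the trace canonically (sorted keys), hash it with SHA-256 via Erlang's :crypto module, and recompute the digest on verification so that any edit to the trace is rejected. The module name, evidence map shape, and simplified encoder are hypothetical; the project's own canonical encoding and artifact format are described in docs/evidence_artifact_schema.md.

```elixir
defmodule EvidenceDigestSketch do
  # Hypothetical sketch, not the project's encoder or artifact schema.
  # Canonical encoding: map keys are stringified and sorted, so the same data
  # always yields the same bytes regardless of insertion order. Strings use
  # inspect/1 quoting, so output is JSON-like rather than strict JSON.
  # Floats and other types are intentionally unsupported here.
  def canonical(map) when is_map(map) do
    body =
      map
      |> Enum.map(fn {k, v} -> {to_string(k), v} end)
      |> Enum.sort_by(fn {k, _v} -> k end)
      |> Enum.map_join(",", fn {k, v} -> canonical(k) <> ":" <> canonical(v) end)

    "{" <> body <> "}"
  end

  def canonical(list) when is_list(list), do: "[" <> Enum.map_join(list, ",", &canonical/1) <> "]"
  def canonical(bin) when is_binary(bin), do: inspect(bin)
  def canonical(int) when is_integer(int), do: Integer.to_string(int)
  def canonical(bool) when is_boolean(bool), do: Atom.to_string(bool)
  def canonical(nil), do: "null"
  def canonical(atom) when is_atom(atom), do: inspect(Atom.to_string(atom))

  # Lowercase hex SHA-256 of the canonical encoding.
  def digest(trace), do: :crypto.hash(:sha256, canonical(trace)) |> Base.encode16(case: :lower)

  def build_evidence(trace), do: %{trace: trace, sha256: digest(trace)}

  # Recompute the digest; any change to the trace after export is rejected.
  def verify_evidence(%{trace: trace, sha256: expected}) do
    if digest(trace) == expected, do: :ok, else: {:error, :digest_mismatch}
  end
end
```

For example, building evidence for %{"decision" => "accepted"} and then changing the decision to "rejected" without recomputing the digest makes verify_evidence/1 return {:error, :digest_mismatch}.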

Evidence can also be wrapped in a signature-ready envelope:

mix run examples/web3_treasury_evidence_envelope.exs

The project also includes a local deterministic demo signer to show how evidence envelopes may later support authorship verification. This signed_demo flow is a deterministic local demo only and is not production cryptography.

mix run examples/web3_treasury_signed_envelope_demo.exs
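A signature-ready envelope can be pictured as the payload plus its digest and an optional signature slot. The sketch below fills that slot with HMAC-SHA256 under a fixed, publicly known demo key. HMAC is a swapped-in stand-in chosen for illustration; like the project's signed_demo flow, it proves nothing about authorship and is not production cryptography. Field names, scheme atoms, and the key are hypothetical.

```elixir
defmodule EnvelopeSketch do
  # Fixed, public demo key: anyone can forge signatures with it.
  @demo_key "pythia-demo-key-not-secret"

  # Unsigned envelope: payload, its SHA-256 digest, and an empty signature slot.
  def wrap(payload) when is_binary(payload) do
    %{payload: payload, sha256: sha256_hex(payload), signature: nil, scheme: :unsigned}
  end

  # Fill the signature slot with an HMAC over the digest (demo only).
  def sign_demo(%{sha256: digest} = envelope) do
    %{envelope | signature: demo_sig(digest), scheme: :signed_demo}
  end

  # The digest is always rechecked; the signature only for :signed_demo.
  def verify(%{payload: payload, sha256: digest} = envelope) do
    cond do
      sha256_hex(payload) != digest -> {:error, :digest_mismatch}
      envelope.scheme == :unsigned -> :ok
      envelope.scheme == :signed_demo and envelope.signature == demo_sig(digest) -> :ok
      true -> {:error, :bad_signature}
    end
  end

  defp demo_sig(digest),
    do: :crypto.mac(:hmac, :sha256, @demo_key, digest) |> Base.encode16(case: :lower)

  defp sha256_hex(bytes), do: :crypto.hash(:sha256, bytes) |> Base.encode16(case: :lower)
end
```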

Agent Safety Showcase

PythiaLabs includes a deterministic showcase for controlled agent actions.

The demo shows two core outcomes:

a safe action proceeds when the required permission is present
an unsafe action is rejected when authorization is missing

It also includes an invalid action example to show strict shape validation. The goal is to demonstrate a core PythiaLabs principle: agent actions should produce observable traces and stable stop reasons, not just outputs.

Run:

mix run examples/agent_safety_showcase.exs
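The three outcomes above (proceed, reject for missing authorization, reject for invalid shape) can be sketched as a gate that validates shape first, then checks permissions, and always returns a trace plus a stable stop reason. The action fields (name, target, required_permission), the permission set, and the stop-reason atoms are hypothetical, not the showcase's actual types.

```elixir
defmodule AgentGateSketch do
  # Returns %{decision: ..., stop_reason: ..., trace: [...]}.
  # granted_permissions is any enumerable of atoms, e.g. a MapSet.
  def evaluate(action, granted_permissions) do
    cond do
      not valid_shape?(action) ->
        %{decision: :rejected, stop_reason: :invalid_action_shape, trace: [shape: :failed]}

      action.required_permission not in granted_permissions ->
        %{decision: :rejected, stop_reason: :missing_authorization,
          trace: [shape: :passed, authorization: :failed]}

      true ->
        %{decision: :proceed, stop_reason: nil, trace: [shape: :passed, authorization: :passed]}
    end
  end

  # Strict shape: all three fields present with the expected types.
  defp valid_shape?(%{name: name, target: target, required_permission: perm})
       when is_binary(name) and is_binary(target) and is_atom(perm) and not is_nil(perm),
       do: true

  defp valid_shape?(_), do: false
end
```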

Bitemporal Authorization Showcase

PythiaLabs includes a deterministic showcase for temporal authorization reasoning.

Run:

mix run examples/bitemporal_authorization_showcase.exs

The demo shows how an agent action can be accepted or rejected based on:

whether the permission was valid at action_time
whether the system knew about the permission at decision_time
whether the permission was expired or scheduled for the future

This demonstrates the Temporal-Causal Memory Stack idea without requiring XTDB or any external database.
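The three questions above can be written as one predicate over a permission record with two time axes: valid time (when the permission applies) and recorded time (when the system learned about it). The field names valid_from, valid_to, and recorded_at, the half-open validity interval, and the integer timestamps are hypothetical choices for this sketch.

```elixir
defmodule BitemporalSketch do
  # permission: %{valid_from: t, valid_to: t, recorded_at: t}, integer timestamps.
  # Valid time is the half-open interval [valid_from, valid_to).
  def check(permission, action_time, decision_time) do
    cond do
      permission.recorded_at > decision_time -> {:rejected, :permission_unknown_at_decision_time}
      action_time < permission.valid_from -> {:rejected, :permission_not_yet_valid}
      action_time >= permission.valid_to -> {:rejected, :permission_expired}
      true -> {:accepted, nil}
    end
  end
end
```

The first clause is the bitemporal case: a permission valid from 100 to 200 and recorded at 170 still rejects an action at 150 decided at 160, because the decision could not have known about a record written later.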

Stack

Elixir/BEAM for orchestration
Rust NIF (via rustler) for fast kernels
Rust Port worker for sandboxed solvers (BFS maze)
JSON via jason
CI: GitHub Actions
License: Apache License 2.0

Runtime note: canonical float encoding in evidence paths uses Erlang/OTP 25+ behavior (:erlang.float_to_binary/2 with [:short]).
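Why the float format is pinned: the default :erlang.float_to_binary/1 output is a long scientific-notation string, while the [:short] option returns the shortest decimal string that parses back to exactly the same float, which keeps digests over evidence containing floats stable. A quick check, assuming OTP 25+:

```elixir
# Shortest round-trip representation (OTP 25+).
short = :erlang.float_to_binary(0.1, [:short])
IO.puts(short)

# Parsing the short form gives back the identical float.
true = :erlang.binary_to_float(short) == 0.1

# Binary floating-point error is kept, not hidden, so encodings stay exact.
IO.puts(:erlang.float_to_binary(0.1 + 0.2, [:short]))
```

This prints 0.1 and then 0.30000000000000004.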

Storage Architecture

The planned storage architecture separates three kinds of truth:

Postgres — product truth: teams, paid reviews, pilots, workflow submissions.
TimescaleDB — temporal measurement truth: decision events, latency, outcome trends, failure classes.
LiminalDB — adaptive causal-memory truth: agentic state, causal transitions, evidence evolution, and replayable decision context.

The gate decides before execution. The storage architecture preserves why that decision was valid, measurable, and auditable.

Read the database architecture

Current MVP status

PythiaLabs is currently an MVP focused on:

deterministic refinement loops
observable traces
stable stop reasons
Elixir/BEAM orchestration
Rust NIF / Rust Port worker integration
deterministic evidence gates for high-risk agentic actions
Agent Infrastructure Action Safety Showcase
Banking AI Risk Showcase
Web3 Treasury Governance Showcase

The Datomic/Neo4j/XTDB/EventStoreDB/TimescaleDB-style memory layers are not implemented yet. They are part of the architectural roadmap.

Core MVP / legacy demos

Mission: Minimal HRM-style reasoning loop for LIMINAL: propose → run → measure → refine with transparent step traces, fast kernels, and safe isolation.

Values

Business Value

Lower compute cost by improving results through refinement rather than giant parameter counts
Explainability for clients/regulators (auditable traces)
Easy integration on top of GPT-5/LLMs with step control
Edge/On‑prem friendly footprint
Reliability thanks to BEAM supervision

Human Value

Transparent reasoning (no black box)
Co‑thinking: human can inspect/interrupt/refine
Ethics & control: limits, stop rules, visible logic
Learning effect: users adopt the refinement habit
Accessibility: good performance without huge hardware

Legacy quickstart

mix deps.get
mix compile

# refinement demo (strings)
mix run examples/lev_demo.exs

# port worker build + demo (maze)
cd workers/solver_port && cargo build --release && cd ../../
mix run examples/port_demo.exs

# benchmarks (NIF vs fallback)
mix run benches/bench.exs

Planner loop (concept)

state ← init(problem)
repeat up to max_steps:
  proposal ← propose(state)
  candidate ← execute(proposal)
  score ← measure(candidate)
  if score ≤ threshold → stop
  if no_improve ≥ limit → (hook) critic → stop (MVP)
  state ← refine(state)
return best
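A minimal Elixir rendering of the loop above, assuming lower scores are better (as the score ≤ threshold stop rule implies). The callback names, the choice to pass candidate and score into refine, the option names, and the stop-reason atoms are hypothetical; the critic hook is reduced to a stop, as in the MVP note.

```elixir
defmodule PlannerLoopSketch do
  # Hypothetical callbacks in fns:
  #   propose: state -> proposal
  #   execute: proposal -> candidate
  #   measure: candidate -> number (lower is better)
  #   refine:  (state, candidate, score) -> next state
  # Returns {stop_reason, best_candidate, best_score, trace}; trace is [{step, score}].
  def run(initial_state, fns, opts \\ []) do
    cfg = %{
      max_steps: Keyword.get(opts, :max_steps, 10),
      threshold: Keyword.get(opts, :threshold, 0),
      no_improve_limit: Keyword.get(opts, :no_improve_limit, 3)
    }

    loop(initial_state, fns, cfg, 0, 0, {nil, nil}, [])
  end

  defp loop(_state, _fns, %{max_steps: max}, step, _no_improve, {best, best_score}, trace)
       when step >= max,
       do: {:max_steps, best, best_score, Enum.reverse(trace)}

  defp loop(state, fns, cfg, step, no_improve, {best, best_score}, trace) do
    candidate = fns.execute.(fns.propose.(state))
    score = fns.measure.(candidate)
    trace = [{step, score} | trace]
    improved = best_score == nil or score < best_score
    {best, best_score} = if improved, do: {candidate, score}, else: {best, best_score}
    no_improve = if improved, do: 0, else: no_improve + 1

    cond do
      score <= cfg.threshold ->
        {:threshold_reached, best, best_score, Enum.reverse(trace)}

      # Critic hook placeholder: the MVP simply stops here.
      no_improve >= cfg.no_improve_limit ->
        {:no_improvement, best, best_score, Enum.reverse(trace)}

      true ->
        next_state = fns.refine.(state, candidate, score)
        loop(next_state, fns, cfg, step + 1, no_improve, {best, best_score}, trace)
    end
  end
end
```

For example, with an integer state, identity propose and execute, measure as abs(10 - x), and refine adding 3 each step, the loop reaches 9, overshoots, and returns {:no_improvement, 9, 1, trace} after three non-improving steps.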

Roadmap: persistent reasoning memory

Future versions may add persistent reasoning memory with two complementary layers:

Datomic-style append-only step log
Stores every reasoning step as an immutable event for replay, audit, debugging, and version comparison.
Neo4j-style hypothesis graph
Connects actions, constraints, proposals, failures, stop reasons, and successful paths.

This would allow PythiaLabs to support replay, audit, recurring failure analysis, and cross-run reasoning patterns.

This layer is not implemented in the current MVP.
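As a rough picture of the roadmap's append-only step log (not implemented in the MVP): each reasoning step is recorded as an immutable event with a sequence number, and replay rebuilds the state at any earlier point by folding over a prefix of the log. The event kinds, fields, and module name are hypothetical, and a plain in-memory list stands in for a Datomic-style store.

```elixir
defmodule StepLogSketch do
  # Events are only appended; existing entries are never modified.
  # (List append is O(n); acceptable for a sketch.)
  def append(log, kind, data) do
    event = %{seq: length(log) + 1, kind: kind, data: data}
    log ++ [event]
  end

  # Replay the first upto events to reconstruct the state at that point.
  def replay(log, upto) do
    log
    |> Enum.take(upto)
    |> Enum.reduce(%{steps: 0, last_score: nil, stop_reason: nil}, &apply_event/2)
  end

  defp apply_event(%{kind: :measured, data: score}, acc),
    do: %{acc | steps: acc.steps + 1, last_score: score}

  defp apply_event(%{kind: :stopped, data: reason}, acc), do: %{acc | stop_reason: reason}

  # Other event kinds (e.g. proposals) do not change the replayed summary.
  defp apply_event(_other, acc), do: acc
end
```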

For details, see docs/persistent_reasoning_memory.md.

For the broader five-layer roadmap, see docs/temporal_causal_memory_stack.md.

For the Web3 application roadmap, see docs/web3_consensus_reason_layer.md.

For the design principles behind these decisions, see docs/design_principles.md.

For grant preparation materials, see:

docs/grant_readiness.md
docs/threat_model_web3_treasury_reason_layer.md
docs/grant_one_pager_web3_treasury_reason_layer.md
docs/grant_application_summary.md

Additional roadmap items:

temporal-causal memory stack for facts, relations, bitemporal validity, events, and metrics
critic triggers based on confidence, repeated failure classes, and trace patterns
multi-domain executors for QA, graph problems, puzzles, and agent actions
Web3 consensus reason layer for DAO governance, treasury safety, and agentic on-chain actions

Research / Positioning / Docs

Open agentic models need evidence gates: docs/open_agentic_models_need_evidence_gates.md

Project trust and governance

Project governance and trust materials:

CONTRIBUTING.md
CODE_OF_CONDUCT.md
SECURITY.md
CHANGELOG.md
docs/license_strategy.md
docs/security_automation.md

These documents describe contribution expectations, security reporting, project maturity, release notes, and licensing strategy.

The repository also includes a minimal GitHub Actions security workflow for secret scanning.
