# ai-reliability

Here are 85 public repositories matching this topic...

relai-ai / relai-sdk

A platform for building reliable AI agents

ai-agents ai-reliability

Updated Apr 3, 2026
Python

hermes-labs-ai / zer0dex

zer0dex is a local dual-layer memory pattern for AI agents: a compressed, human-readable markdown index plus a vector store queried automatically before each message. Built for cross-project recall and cross-reference where flat memory files or vector-only RAG fall short. Local-first, low-latency. Reference implementation by Hermes Labs.

persistent-memory ai-agents rag vector-search local-first semantic-memory llm llm-memory ai-memory agent-memory claude-code ai-reliability hermes-labs dual-layer-memory

Updated Jun 13, 2026
Python

hermes-labs-ai / lintlang

lintlang is a static linter for AI agent configs, tool descriptions, and system prompts that runs zero-LLM quality gating in CI. Catches language-level failures (vague tool descriptions, missing stop conditions, schema gaps) before they reach runtime, with deterministic regex + structural detectors and no model calls.

Updated Jun 2, 2026
Python
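
The "deterministic regex + structural detectors" idea in the lintlang description can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the rule names, the word lists, the dictionary shape of a tool definition, and the functions `lint_tool` and `lint_system_prompt` are hypothetical and are not lintlang's actual detectors or API.

```python
import re

# Hypothetical rule set for illustration; not lintlang's actual detectors.
VAGUE_WORDS = re.compile(r"\b(stuff|things|etc|various|somehow)\b", re.IGNORECASE)
STOP_CONDITION = re.compile(r"\b(stop|finish|done|until|at most)\b", re.IGNORECASE)


def lint_tool(tool: dict) -> list[str]:
    """Return findings for one tool definition (assumed shape: description + parameters)."""
    findings = []
    description = tool.get("description", "")
    # Regex-free length check: very short descriptions rarely guide tool selection.
    if len(description.split()) < 5:
        findings.append("description is too short")
    # Regex detector: vague wording.
    if VAGUE_WORDS.search(description):
        findings.append("description uses vague wording")
    # Structural detector: every parameter should declare a type.
    for name, spec in tool.get("parameters", {}).items():
        if "type" not in spec:
            findings.append(f"parameter '{name}' has no type")
    return findings


def lint_system_prompt(prompt: str) -> list[str]:
    """Flag a system prompt that never states when the agent should stop."""
    if STOP_CONDITION.search(prompt):
        return []
    return ["system prompt has no stop condition"]


bad_tool = {"description": "Does stuff", "parameters": {"query": {}}}
good_tool = {
    "description": "Search the product catalog by keyword and return up to ten matches.",
    "parameters": {"query": {"type": "string"}},
}

if __name__ == "__main__":
    for finding in lint_tool(bad_tool):
        print("FAIL:", finding)
```

Because every check is a pure function of the config text, the same input always yields the same findings, which is what makes this style of gate usable in CI without any model calls.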

najeed / ai-agent-eval-harness

The open-source MultiAgentOps evaluation and verification harness for any industry business workflow.

Updated Jun 14, 2026
Python

TaimoorKhan10 / replayd

Turn failed AI agent runs into replayable regression tests. Catch regressions before you ship.

python open-source sdk regression-testing ai-agents release-control prompt-testing llm-ops llm-testing ai-infrastructure agent-ops agent-testing ai-reliability replay-testing

Updated Jun 4, 2026
Python

Harshit-J004 / toolguard

The "Cloudflare for AI Agents". 7-layer security interceptor, real-time observability dashboard, and automated reliability testing for MCP and AI tool chains. Prevent hallucinations, prompt injection, and destructive tool calls.

Updated May 4, 2026
Python

elsium-ai / elsium-ai

Production-grade TypeScript AI runtime focused on reliability, governance, and reproducible LLM systems. Multi-provider gateway, agents, RAG, workflows, policy engine, audit trails, and deterministic testing — built for teams shipping AI in production.

typescript ai-framework rag agent-framework ai-compliance llm ai-governance ai-runtime open-source-ai ai-infrastructure llm-gateway reproducible-ai llm-runtime ai-reliability deterministic-ai ai-production

Updated Jun 4, 2026
TypeScript

ejentum / ejentum-mcp

MCP server for the Ejentum API. 8 cognitive operations across 4 harnesses (reasoning, code, anti-deception, memory) in dynamic and adaptive modes.

typescript mcp code-review claude anthropic llm-tools agentic-ai model-context-protocol mcp-server ai-reliability reasoning-harness ejentum anti-deception cognitive-scaffold

Updated Jun 11, 2026
JavaScript

vishwanathakuthota / openvals

Open-source AI model evaluation and benchmarking framework for LLMs (OpenAI, Ollama, Claude, Gemini)

machine-learning gemini openai ai-safety ai-agents ai-evaluation ai-testing ai-quality llm-tools ollama llm-benchmarking ai-evaluation-framework calude ai-reliability vishwanath-akuthota

Updated Jun 11, 2026
Python

FailproofAI / ai-reliability-standards

Architectural standards and best practices for building reliable AI Agents and LLM workflows. Defining the framework for AI Reliability Engineering (AIRE).

enterprise ai reliability-engineering evaluation sre observability ai-agents aiops evals durable-execution ai-reliability

Updated Feb 14, 2026
Dockerfile

hermes-labs-ai / hermes-blind

Context-compensation scaffold for LLM evaluation prompts. A short language prefix you prepend so the model discloses prior exposure, scores on quoted evidence only, and hedges on thin evidence — for scorers that can see your CLAUDE.md, memory, or session context. Backend-agnostic. Experimental: variance-reduction effect not yet measured.

evaluation scaffold ai-safety ai-agents rubric multi-turn debiasing llm prompt-engineering evals llm-evaluation ai-reliability lpci hermes-labs context-compensation language-as-state agent-scaffold drift-recovery recovery-scaffold

Updated May 27, 2026
Python

AionSystem / VERITAS

Sheldon K. Salmon — AI Reliability Architect. Creator of the AION Constitutional Stack and the CERTUS certainty‑engineering methodology. He designed, directed, and red‑teamed VERITAS — applying epistemic scoring, Uncertainty Mass, and permanent STP seals to community crisis data. Code is open source. The judgment is not.

community ai humanitarian photo undp crisis-support disaster-response disaster-relief damage-assessment crisis-response ai-audit ai-reliability aion-system sheldon-k-salmon fsve certus-engine

Updated May 16, 2026
JavaScript

AionSystem / AION-SCAFFOLDING

AION Scaffold — Intelligent tree-to-filesystem generator. Built by Sheldon K. Salmon, AI Reliability Architect. Part of the AION Constitutional Stack. Free forever. No tracking.

tools scaffold dev developer-tools scaffolder scaffolding dev-tools ai-architect ai-audit ai-reliability aion-system

Updated May 6, 2026
HTML

hermes-labs-ai / quick-gate-js

quick-gate-js (npm: quick-gate) is a deterministic JS/TS CI quality gate that unifies ESLint, TypeScript, build, and Lighthouse checks into one fail-fast result, with bounded auto-repair and structured escalation evidence for humans or agents. Works with Next.js, React, Vue, Svelte, or any Node project. A gate-and-escalate wrapper, not a dashboard.

eslint frontend linting ci static-analysis devtools ci-cd code-quality agents lighthouse quality-gate auto-repair ai-reliability

Updated Jun 1, 2026
JavaScript

vbepipe / vmrrb-benchmark

Benchmark for evaluating advanced reasoning, recursive dependency resolution, and robustness capabilities of large language models in dynamic, noisy, and structurally challenging environments.

benchmark dependency-resolution ai-accuracy multistep-reasoning ai-evaluation large-language-models ai-reasoning llm-evaluation reasoning-benchmark llm-benchmark ai-reliability recursive-reasoning ai-stability long-chain-reasoning

Updated May 15, 2026
Python

Nefza99 / Rebis-AI-auditing-Architecture

Orchestration runtime for AI agent workflows that preserves task-state fidelity, prevents reasoning drift, and reduces wasted computation in long-horizon pipelines.

Updated Mar 19, 2026
JavaScript

LivingFramework / LC-OS

Research archive — eight published papers, Mahdi Ledger, and empirical foundations of the LC-OS governance framework.

ai-safety ai-research prompt-engineering ai-governance llm-framework human-ai-collaboration context-engineering ai-reliability

Updated May 25, 2026

hermes-labs-ai / hermeneutic

hermeneutic is an evidence-first drift gate for AI agents. It mines corrections from your AI chat logs (prior response, user correction, repair), classifies the drift, and runs a cheap-to-expensive pre-flight gate on the next response before drift ships. Regex, then structured scoring, then a pressure probe. MIT, zero dependencies, by Hermes Labs.

ai-agents hallucination drift-detection guardrails prompt-engineering llm-ops llm-evaluation claude-code ai-audit ai-reliability hermes-labs ai-overclaim retrieval-scaffold

Updated May 31, 2026
Python
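
The cheap-to-expensive ordering in the hermeneutic description (regex, then structured scoring, then a pressure probe) can be sketched as a short-circuiting pipeline. This is an illustrative sketch under stated assumptions, not hermeneutic's implementation: the phrase lists, the 0.5 threshold, and the names `regex_stage`, `scoring_stage`, and `run_gate` are all hypothetical, and the pressure probe is a caller-supplied stub because a real one would presumably need a model call.

```python
import re
from typing import Callable, Optional

# Hypothetical phrase lists for illustration only.
OVERCLAIM = re.compile(r"\b(guaranteed|always works|never fails)\b", re.IGNORECASE)
EVIDENCE = re.compile(r"\b(because|according to|source|measured)\b", re.IGNORECASE)

# A stage returns None to pass, or a reason string to block the response.
Stage = Callable[[str], Optional[str]]


def regex_stage(response: str) -> Optional[str]:
    """Cheapest check: block known overclaim phrases."""
    return "overclaim phrase" if OVERCLAIM.search(response) else None


def scoring_stage(response: str) -> Optional[str]:
    """Mid-cost check: fraction of sentences that carry an evidence marker."""
    sentences = [s for s in re.split(r"[.!?]+", response) if s.strip()]
    supported = sum(1 for s in sentences if EVIDENCE.search(s))
    score = supported / len(sentences) if sentences else 0.0
    return None if score >= 0.5 else f"evidence score {score:.2f} below 0.5"


def run_gate(response: str, stages: list[Stage]) -> tuple[bool, Optional[str], int]:
    """Run stages in order and stop at the first failure.

    Returns (passed, reason, number_of_stages_run).
    """
    for index, stage in enumerate(stages, start=1):
        reason = stage(response)
        if reason is not None:
            return False, reason, index
    return True, None, len(stages)


probe_calls = []


def stub_probe(response: str) -> Optional[str]:
    """Placeholder for the expensive stage; records that it was reached."""
    probe_calls.append(response)
    return None


STAGES = [regex_stage, scoring_stage, stub_probe]
```

Ordering the stages by cost means the expensive probe only runs on responses that already passed the cheap checks, so most drift is rejected without reaching it.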

aditikhare007 / ai-decision-intelligence-system

Enterprise AI system for decision intelligence — transforming research into scalable, context-aware insights at production scale | AditiKhare.com — AI Product Ecosystem

ai mlops ai-systems inference-optimization ai-platform ai-product decision-intelligence ai-evaluation llm generative-ai context-engineering ai-reliability

Updated Apr 20, 2026

Yuchi-Wang02 / bizhallu

Span-level hallucination detection for LLM-generated business analysis on Online Retail transaction data.

business-analytics retail-analytics qwen llm-evaluation hallucination-detection ai-reliability

Updated May 26, 2026
Python
