Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

[0.1.0] — 2026-04-17

First public release. 7 failure-mode detectors, 4 data adapters, Markdown/HTML reporters, FastAPI server with Docker, embeddings similarity backend, MkDocs documentation site. 203 tests, 90% coverage.

Added — MAANG-grade polish

Rewritten README: badges, value prop, Mermaid architecture diagram, positioning section explaining the niche without competitor comparisons, examples links, development quickstart
ARCHITECTURE.md — full system design doc with data flow diagrams, extension points, design decisions, performance characteristics, security model
SUPPORT.md — where to go for help, response-time expectations
examples/ directory with 5 runnable end-to-end scripts and an index
.editorconfig — cross-editor consistency
.github/CODEOWNERS — PR review routing
.github/FUNDING.yml — sponsor button (template)
YAML-form issue templates (GitHub's newer format) replacing the old Markdown templates
.github/ISSUE_TEMPLATE/config.yml redirects support questions to Discussions and security issues to private advisories
PUBLISHING.md — first-push instructions, PyPI trusted publisher setup, release process, deprecation policy

Added — FastAPI server + embeddings backend (complete)

chatbot_auditor.server FastAPI application with /healthz, /readyz, /version, /detectors, and /analyze endpoints
Optional bearer-token auth on /analyze via CHATBOT_AUDITOR_API_KEYS environment variable (comma-separated)
CHATBOT_AUDITOR_MAX_CONVERSATIONS_PER_REQUEST env cap (default 1000)
Per-startup detector registry via FastAPI lifespan
Auto-generated OpenAPI docs at /docs (Swagger) and /redoc
Multi-stage Dockerfile running as unprivileged user with healthcheck
docker-compose.yml for local development
chatbot_auditor.backends.embeddings.EmbeddingsSimilarity — drop-in semantic similarity backend for DeathLoopDetector using sentence-transformers; LRU cache by text, injectable encoder for tests
Docs: self-host tutorial, LLM & embedding backends tutorial, server and backends reference pages
203 tests passing, 90% coverage, mypy strict clean, ruff clean, docs strict build clean
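
As an illustration of the optional bearer-token check described above, here is a minimal sketch using only the standard library. The helper names `parse_api_keys` and `is_authorized` are hypothetical and are not taken from chatbot_auditor.server. The sketch also assumes that an empty or unset CHATBOT_AUDITOR_API_KEYS disables auth; the actual server may behave differently.

```python
import hmac


def parse_api_keys(raw):
    """Split a comma-separated value (e.g. from CHATBOT_AUDITOR_API_KEYS) into keys."""
    return [key.strip() for key in (raw or "").split(",") if key.strip()]


def is_authorized(authorization_header, allowed_keys):
    """Return True when the Authorization header carries an allowed bearer token."""
    if not allowed_keys:
        # Assumption: no configured keys means auth is switched off.
        return True
    if not authorization_header:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    token = token.strip()
    # compare_digest avoids leaking key contents through timing differences.
    return any(
        hmac.compare_digest(token.encode(), key.encode()) for key in allowed_keys
    )
```

In a deployment, the raw value would typically come from `os.environ.get("CHATBOT_AUDITOR_API_KEYS")` and the header from the incoming request.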

Added — Reporting (complete)

reporting module with MarkdownReporter, HTMLReporter, Reporter base class, and ReportSummary dataclass
render_markdown() and render_html() convenience functions
Reports include: overall summary metrics, detections-by-severity table, detections-by-detector table, and top-N ranked conversations with evidence
HTML output is a self-contained document with inline CSS — email-safe, Slack-attachable, and fully escapes user-provided content (XSS-safe)
CLI reworked: --format text|json|markdown|html (default text) replaces the previous --json flag; --output PATH writes to a file
Updated docs, tutorials, and reference pages to cover the new commands
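
The XSS-safety note above comes down to escaping every user-provided string before it reaches the HTML output. A minimal sketch of that idea, assuming a hypothetical `render_evidence_row` helper (this is not HTMLReporter's actual code):

```python
from html import escape


def render_evidence_row(conversation_id, evidence):
    """Build one table row, escaping both user-controlled values."""
    # escape() converts &, <, > and quotes into HTML entities.
    return f"<tr><td>{escape(conversation_id)}</td><td>{escape(evidence)}</td></tr>"
```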

Added — Docs site (complete)

MkDocs Material site with home, getting-started, concepts, tutorials, reference sections, auto-deployed to GitHub Pages on push to main via the docs.yml workflow
Auto-generated API reference via mkdocstrings[python] covering schema, detectors, adapters, knowledge bases, and audit entry points
Three tutorials: audit Intercom data, write a custom detector, configure a policy base
[docs] optional dependency group: pip install chatbot-auditor[docs]

Added — Adapters (complete)

Adapter abstract base class defining the common fetch() contract
JSONAdapter: reads conversations from .json (single or list) or .jsonl files with format auto-detection
CSVAdapter: reads conversations from CSV/TSV files with flexible header detection (accepts conversation_id/conv_id/thread_id, role/author, content/message/body), customizable role mapping, and ISO-8601 or Unix timestamp parsing
IntercomAdapter: pulls conversations via Intercom REST API with cursor-based pagination, HTML body cleaning, rate-limit retry/backoff, and role mapping across user/bot/admin author types
ZendeskAdapter: pulls tickets + comments via Zendesk API with OAuth or email+API-token auth, pagination, rate-limit handling, bot user ID configuration, and public/private comment role mapping
CLI analyze-intercom and analyze-zendesk commands for direct API access
CLI analyze command now auto-detects file type from extension
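
The flexible header detection listed for CSVAdapter can be pictured as an alias lookup over the file's header row. The sketch below is an assumption about how such matching could work, not the adapter's implementation. `HEADER_ALIASES`, `resolve_headers`, and `read_rows` are hypothetical names, and case-insensitive matching is an added assumption.

```python
import csv
import io

# Aliases taken from the changelog entry; the canonical names are illustrative.
HEADER_ALIASES = {
    "conversation_id": ("conversation_id", "conv_id", "thread_id"),
    "role": ("role", "author"),
    "content": ("content", "message", "body"),
}


def resolve_headers(fieldnames):
    """Map each canonical field to the header actually present in the file."""
    lookup = {name.strip().lower(): name for name in fieldnames}
    resolved = {}
    for canonical, aliases in HEADER_ALIASES.items():
        match = next((lookup[a] for a in aliases if a in lookup), None)
        if match is None:
            raise ValueError(f"no column found for {canonical!r}")
        resolved[canonical] = match
    return resolved


def read_rows(text, delimiter=","):
    """Parse CSV/TSV text into dicts keyed by canonical field names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    headers = resolve_headers(reader.fieldnames or [])
    return [{field: row[col] for field, col in headers.items()} for row in reader]
```

Passing `delimiter="\t"` would cover the TSV case in the same way.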

Added — Phase 2 (complete)

SentimentCollapseDetector: pluggable SentimentScorer protocol with a stdlib-only KeywordSentimentScorer default; compares early/late thirds of user messages and flags meaningful sentiment drops with severity scaling
BrandDamageDetector: pluggable ContentSafetyChecker with a stdlib-only PatternSafetyChecker default covering profanity, self-deprecation, competitor endorsements, and off-brand content (poems, jokes, politics); configurable competitor names
ConfidentLiesDetector: regex-based detection of bot commitments (refunds, timelines, guarantees, account changes); takes an optional PolicyBase knowledge base to distinguish allowed from disallowed commitments; without a policy, flags all commitments for review
ConfidentMisinformationDetector: regex-based detection of factual claims (pricing, hours, availability, policy); takes an optional FactBase to cross-check claims against ground truth; distinguishes "verified", "contradiction", and "unverifiable" outcomes
knowledge module: PolicyBase and FactBase dataclasses defining the minimal knowledge-base interfaces
default_registry() now includes SentimentCollapseDetector and BrandDamageDetector. ConfidentLiesDetector and ConfidentMisinformationDetector are available but opt-in, because they need a knowledge base to be genuinely useful.
131 tests passing, 93% coverage, mypy strict clean, ruff clean
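
The early/late-thirds comparison used by SentimentCollapseDetector can be sketched as follows. The keyword lists, the `keyword_score` scorer, and the `sentiment_drop` function are all hypothetical stand-ins. The real KeywordSentimentScorer, its vocabulary, and the severity thresholds are not shown in this changelog.

```python
# Illustrative vocabularies only; not the project's keyword lists.
POSITIVE = {"thanks", "great", "helpful", "perfect", "good"}
NEGATIVE = {"useless", "terrible", "angry", "frustrated", "worst"}


def keyword_score(text):
    """Count positive minus negative keywords in one message."""
    words = [word.strip(".,!?") for word in text.lower().split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def mean_score(messages, scorer):
    """Average the scorer's output over a non-empty list of messages."""
    return sum(scorer(m) for m in messages) / len(messages)


def sentiment_drop(user_messages, scorer=keyword_score):
    """Mean score of the first third minus mean score of the last third."""
    count = len(user_messages)
    if count < 3:
        return 0.0  # too short to have distinct early and late thirds
    third = count // 3
    early = user_messages[:third]
    late = user_messages[-third:]
    return mean_score(early, scorer) - mean_score(late, scorer)
```

A positive return value means the conversation ended on a worse note than it began. A detector would then compare that drop against a threshold and scale severity with its size.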

Added — Phase 1 (complete)

similarity module: normalize, lexical_similarity (SequenceMatcher-based, stdlib-only, explicitly symmetric), token_jaccard, and a SimilarityFn type for pluggable backends
DeathLoopDetector: connected-components grouping over pairwise similarity, configurable threshold, min repeat count, minimum content length, pluggable similarity function, confidence scoring with frustration-keyword boost, severity scaling from low to critical
SilentChurnDetector: flags multi-turn conversations that ended with no customer-side resolution signal; confidence boosted when the platform reported the conversation as resolved
EscalationBurialDetector: detects explicit human-agent requests the bot deflects; aggregates multiple burials per conversation into one detection with severity scaling; transfer-confirmation phrases recognized as properly handled escalations
ConversationGenerator: deterministic synthetic generator for healthy conversations, death loops (3 paraphrase levels), silent churn, and escalation burial scripts
audit() and default_registry() entry points; default registry includes all three Phase 1 detectors
CLI analyze command: accepts a JSON file, prints detections as text or emits them as JSON, and returns a non-zero exit code when failures are detected
scripts/benchmark.py: writes docs/benchmarks.md with precision/recall/F1 for every detector against the synthetic corpus
Property-based tests using hypothesis: identical-message detection invariant, unique-message non-detection invariant, detector idempotence
96 tests passing, 90% coverage, mypy strict clean, ruff clean
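
To make the connected-components grouping in DeathLoopDetector concrete, here is a minimal stdlib-only sketch. It is not the project's code. The function names mirror the changelog, but their signatures, the normalization rule, the default threshold of 0.8, and the union-find approach are assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(text.lower().split())


def lexical_similarity(a, b):
    """SequenceMatcher ratio made symmetric by taking the max over both orders."""
    a, b = normalize(a), normalize(b)
    # ratio() may depend on argument order, hence the explicit symmetry.
    return max(SequenceMatcher(None, a, b).ratio(), SequenceMatcher(None, b, a).ratio())


def repeated_groups(messages, threshold=0.8, min_repeats=3, similarity=lexical_similarity):
    """Return index groups that form connected components of the similarity graph."""
    parent = list(range(len(messages)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Link every pair of messages whose similarity reaches the threshold.
    for i, j in combinations(range(len(messages)), 2):
        if similarity(messages[i], messages[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(messages)):
        groups.setdefault(find(i), []).append(i)
    return [group for group in groups.values() if len(group) >= min_repeats]
```

Grouping by connected components rather than by exact match means two paraphrases that are each close to a third message end up in the same loop, even if they are less similar to each other. The `similarity` parameter is where a different backend could be swapped in.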

Added — Phase 0

Initial project scaffold: pyproject.toml, CI, linting, type-checking configuration
Core Pydantic schema: Message, Conversation, Detection, FailureMode, Severity
Abstract Detector base class defining the detection contract
Detector registry for dynamic loading
Typer-based CLI skeleton
Apache 2.0 license, NOTICE file, SECURITY policy, Code of Conduct, contributing guide
GitHub Actions CI pipeline (test matrix on Python 3.11, 3.12, 3.13 across Linux, macOS, Windows)
Pre-commit hooks (ruff, mypy, pytest)

[0.0.1] — 2026-04-17

Initial pre-release. Not published to PyPI.
