Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

[Unreleased]

[0.1.0] — 2026-04-17

First public release. 7 failure-mode detectors, 4 data adapters, Markdown/HTML reporters, FastAPI server with Docker, embeddings similarity backend, MkDocs documentation site. 203 tests, 90% coverage.

Added — MAANG-grade polish

  • Rewritten README: badges, value prop, Mermaid architecture diagram, positioning section explaining the niche without competitor comparisons, examples links, development quickstart
  • ARCHITECTURE.md — full system design doc with data flow diagrams, extension points, design decisions, performance characteristics, security model
  • SUPPORT.md — where to go for help, response-time expectations
  • examples/ directory with 5 runnable end-to-end scripts and an index
  • .editorconfig — cross-editor consistency
  • .github/CODEOWNERS — PR review routing
  • .github/FUNDING.yml — sponsor button (template)
  • YAML-form issue templates (GitHub's newer format) replacing the old Markdown templates
  • .github/ISSUE_TEMPLATE/config.yml redirects support questions to Discussions and security issues to private advisories
  • PUBLISHING.md — first-push instructions, PyPI trusted publisher setup, release process, deprecation policy

Added — FastAPI server + embeddings backend (complete)

  • chatbot_auditor.server FastAPI application with /healthz, /readyz, /version, /detectors, and /analyze endpoints
  • Optional bearer-token auth on /analyze via CHATBOT_AUDITOR_API_KEYS environment variable (comma-separated)
  • CHATBOT_AUDITOR_MAX_CONVERSATIONS_PER_REQUEST env cap (default 1000)
  • Detector registry built once per startup via the FastAPI lifespan hook
  • Auto-generated OpenAPI docs at /docs (Swagger) and /redoc
  • Multi-stage Dockerfile running as unprivileged user with healthcheck
  • docker-compose.yml for local development
  • chatbot_auditor.backends.embeddings.EmbeddingsSimilarity — drop-in semantic similarity backend for DeathLoopDetector using sentence-transformers; LRU cache by text, injectable encoder for tests
  • Docs: self-host tutorial, LLM & embedding backends tutorial, server and backends reference pages
  • 203 tests passing, 90% coverage, mypy strict clean, ruff clean, docs strict build clean

Added — Reporting (complete)

  • reporting module with MarkdownReporter, HTMLReporter, Reporter base class, and ReportSummary dataclass
  • render_markdown() and render_html() convenience functions
  • Reports include: overall summary metrics, detections-by-severity table, detections-by-detector table, and top-N ranked conversations with evidence
  • HTML output is a self-contained document with inline CSS — email-safe, Slack-attachable, and fully escapes user-provided content (XSS-safe)
  • CLI reworked: --format text|json|markdown|html (default text) replaces the previous --json flag; --output PATH writes to a file
  • Updated docs, tutorials, and reference pages to cover the new commands
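
As a rough illustration of what "fully escapes user-provided content" means for the HTML report, the sketch below escapes every user-supplied field before interpolating it into markup. The `render_rows` function and the dictionary keys are hypothetical; the project's HTMLReporter may be structured quite differently.

```python
from html import escape


def render_rows(detections: list[dict[str, str]]) -> str:
    """Render detections as an HTML table, escaping every user-supplied value."""
    rows = []
    for detection in detections:
        # html.escape converts <, >, &, and both quote characters to entities.
        detector = escape(detection["detector"])
        evidence = escape(detection["evidence"])
        rows.append(f"<tr><td>{detector}</td><td>{evidence}</td></tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    sample = [{"detector": "BrandDamageDetector",
               "evidence": "<script>alert('x')</script>"}]
    print(render_rows(sample))
```

Escaping at the point of interpolation, rather than sanitizing input on ingestion, keeps the stored conversation text unchanged for the other report formats.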

Added — Docs site (complete)

  • MkDocs Material site with home, getting-started, concepts, tutorials, reference sections, auto-deployed to GitHub Pages on push to main via the docs.yml workflow
  • Auto-generated API reference via mkdocstrings[python] covering schema, detectors, adapters, knowledge bases, and audit entry points
  • Three tutorials: audit Intercom data, write a custom detector, configure a policy base
  • [docs] optional dependency group: pip install chatbot-auditor[docs]

Added — Adapters (complete)

  • Adapter abstract base class defining the common fetch() contract
  • JSONAdapter: reads conversations from .json (single or list) or .jsonl files with format auto-detection
  • CSVAdapter: reads conversations from CSV/TSV files with flexible header detection (accepts conversation_id/conv_id/thread_id, role/author, content/message/body), customizable role mapping, and ISO-8601 or Unix timestamp parsing
  • IntercomAdapter: pulls conversations via Intercom REST API with cursor-based pagination, HTML body cleaning, rate-limit retry/backoff, and role mapping across user/bot/admin author types
  • ZendeskAdapter: pulls tickets + comments via Zendesk API with OAuth or email+API-token auth, pagination, rate-limit handling, bot user ID configuration, and public/private comment role mapping
  • CLI analyze-intercom and analyze-zendesk commands for direct API access
  • CLI analyze command now auto-detects file type from extension
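
The flexible header detection described for CSVAdapter could look roughly like the following. The alias lists come from the entry above; the case-insensitive matching, the `resolve_headers` and `read_rows` names, and the error behavior are assumptions made for this sketch.

```python
import csv
import io

# Canonical column name -> accepted header spellings (from the changelog entry).
HEADER_ALIASES = {
    "conversation_id": ("conversation_id", "conv_id", "thread_id"),
    "role": ("role", "author"),
    "content": ("content", "message", "body"),
}


def resolve_headers(fieldnames: list[str]) -> dict[str, str]:
    """Map each canonical column to the header actually present in the file."""
    # Assumption: matching ignores case and surrounding whitespace.
    normalized = {name.strip().lower(): name for name in fieldnames}
    resolved = {}
    for canonical, aliases in HEADER_ALIASES.items():
        for alias in aliases:
            if alias in normalized:
                resolved[canonical] = normalized[alias]
                break
        else:
            raise ValueError(f"no column found for {canonical!r}")
    return resolved


def read_rows(text: str, delimiter: str = ",") -> list[dict[str, str]]:
    """Parse delimited text into rows keyed by canonical column names."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    mapping = resolve_headers(list(reader.fieldnames or []))
    return [{canon: row[actual] for canon, actual in mapping.items()}
            for row in reader]


if __name__ == "__main__":
    print(read_rows("thread_id,Author,body\n42,user,hello\n"))
```

Passing `delimiter="\t"` covers the TSV case with the same code path.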

Added — Phase 2 (complete)

  • SentimentCollapseDetector: pluggable SentimentScorer protocol with a stdlib-only KeywordSentimentScorer default; compares early/late thirds of user messages and flags meaningful sentiment drops with severity scaling
  • BrandDamageDetector: pluggable ContentSafetyChecker with a stdlib-only PatternSafetyChecker default covering profanity, self-deprecation, competitor endorsements, and off-brand content (poems, jokes, politics); configurable competitor names
  • ConfidentLiesDetector: regex-based detection of bot commitments (refunds, timelines, guarantees, account changes); takes an optional PolicyBase knowledge base to distinguish allowed from disallowed commitments; without a policy, flags all commitments for review
  • ConfidentMisinformationDetector: regex-based detection of factual claims (pricing, hours, availability, policy); takes an optional FactBase to cross-check claims against ground truth; distinguishes "verified", "contradiction", and "unverifiable" outcomes
  • knowledge module: PolicyBase and FactBase dataclasses defining the minimal knowledge-base interfaces
  • default_registry() now includes SentimentCollapse and BrandDamage. ConfidentLies/Misinformation are available but opt-in — they need a knowledge base to be genuinely useful.
  • 131 tests passing, 93% coverage, mypy strict clean, ruff clean
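
The early/late-thirds comparison behind SentimentCollapseDetector can be sketched with a toy keyword scorer. The word lists, the scoring formula, and the function names below are illustrative assumptions; the project's KeywordSentimentScorer and its thresholds are not reproduced here.

```python
# Hypothetical keyword lists, chosen only for this example.
POSITIVE = frozenset({"thanks", "great", "good", "helpful", "perfect"})
NEGATIVE = frozenset({"angry", "useless", "terrible", "frustrated", "worst"})


def keyword_score(text: str) -> float:
    """Score a message in [-1, 1] from counts of positive and negative keywords."""
    words = [word.strip(".,!?").lower() for word in text.split()]
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    total = positive + negative
    return 0.0 if total == 0 else (positive - negative) / total


def mean_score(messages: list[str]) -> float:
    """Average keyword score over a non-empty list of messages."""
    return sum(keyword_score(message) for message in messages) / len(messages)


def sentiment_drop(user_messages: list[str]) -> float:
    """Mean score of the first third minus mean score of the last third."""
    if len(user_messages) < 3:
        return 0.0  # too short to split into thirds
    third = len(user_messages) // 3
    return mean_score(user_messages[:third]) - mean_score(user_messages[-third:])


if __name__ == "__main__":
    conversation = ["Thanks, great start", "ok", "still waiting",
                    "This is useless", "terrible, worst bot", "I am frustrated"]
    print(sentiment_drop(conversation))
```

A detector built on this would flag the conversation when the drop exceeds a configured threshold and scale severity with the size of the drop.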

Added — Phase 1 (complete)

  • similarity module: normalize, lexical_similarity (SequenceMatcher-based, stdlib-only, explicitly symmetric), token_jaccard, and a SimilarityFn type for pluggable backends
  • DeathLoopDetector: connected-components grouping over pairwise similarity, configurable threshold, minimum repeat count, minimum content length, pluggable similarity function, confidence scoring with frustration-keyword boost, severity scaling from low to critical
  • SilentChurnDetector: flags multi-turn conversations that ended with no customer-side resolution signal; confidence boosted when the platform reported the conversation as resolved
  • EscalationBurialDetector: detects explicit human-agent requests the bot deflects; aggregates multiple burials per conversation into one detection with severity scaling; transfer-confirmation phrases recognized as properly handled escalations
  • ConversationGenerator: deterministic synthetic generator for healthy conversations, death loops (3 paraphrase levels), silent churn, and escalation burial scripts
  • audit() and default_registry() entry points; default registry includes all three Phase 1 detectors
  • CLI analyze command: accepts a JSON file, prints detections as text or emits them as JSON, and returns a non-zero exit code when failures are detected
  • scripts/benchmark.py: writes docs/benchmarks.md with precision/recall/F1 for every detector against the synthetic corpus
  • Property-based tests using hypothesis: identical-message detection invariant, unique-message non-detection invariant, detector idempotence
  • 96 tests passing, 90% coverage, mypy strict clean, ruff clean
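
The connected-components grouping that DeathLoopDetector applies over pairwise similarity can be sketched with difflib and a small union-find. The threshold, the minimum repeat count, and the way symmetry is enforced (taking the larger of both argument orders) are assumptions for illustration; only the use of SequenceMatcher comes from the entries above.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before comparison."""
    return " ".join(text.lower().split())


def lexical_similarity(a: str, b: str) -> float:
    """Symmetric similarity in [0, 1] built on SequenceMatcher.ratio()."""
    a, b = normalize(a), normalize(b)
    # ratio() may depend on argument order, so evaluate both and keep the max.
    return max(SequenceMatcher(None, a, b).ratio(),
               SequenceMatcher(None, b, a).ratio())


def repeated_groups(messages: list[str], threshold: float = 0.85,
                    min_repeats: int = 3) -> list[list[int]]:
    """Return index groups of near-duplicate messages with enough repeats."""
    parent = list(range(len(messages)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair whose similarity clears the threshold.
    for i in range(len(messages)):
        for j in range(i + 1, len(messages)):
            if lexical_similarity(messages[i], messages[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[int]] = {}
    for i in range(len(messages)):
        groups.setdefault(find(i), []).append(i)
    return [group for group in groups.values() if len(group) >= min_repeats]


if __name__ == "__main__":
    bot_messages = [
        "I can help with that. Please provide your order number.",
        "Your order shipped yesterday.",
        "I can help with that! Please provide your order number.",
        "i can help with that.  please provide your order number.",
    ]
    print(repeated_groups(bot_messages))
```

Grouping by connected components means two messages end up in the same loop if a chain of similar messages links them, even when the pair itself falls below the threshold.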

Added — Phase 0

  • Initial project scaffold: pyproject.toml, CI, linting, type-checking configuration
  • Core Pydantic schema: Message, Conversation, Detection, FailureMode, Severity
  • Abstract Detector base class defining the detection contract
  • Detector registry for dynamic loading
  • Typer-based CLI skeleton
  • Apache 2.0 license, NOTICE file, SECURITY policy, Code of Conduct, contributing guide
  • GitHub Actions CI pipeline (test matrix on Python 3.11, 3.12, 3.13 across Linux, macOS, Windows)
  • Pre-commit hooks (ruff, mypy, pytest)

[0.0.1] — 2026-04-17

Initial pre-release. Not published to PyPI.