 [Static Pages mirror](https://karimbaidar.github.io/false-success-lab/) |
 [Core package: agent-consistency](https://github.com/karimbaidar/agent-consistency)
 
-Scan your AI workflow repo for unverified completion risks.
+Stop false "done" before it ships.
 
 False Success Lab is the interactive developer lab for `agent-consistency`.
 It helps you explore false-success risks in AI workflows and see how
@@ -43,7 +43,8 @@ the completion claim.
 
 ## What you can do in the lab
 
-- Scan public repos for false-success risk.
+- Scan public repos for false-success risk, with repo-fit confidence instead of
+  fake certainty.
 - Import local scan reports without giving the browser filesystem access.
 - Run built-in false-success scenarios.
 - Compare naive vs protected behavior.
@@ -69,9 +70,10 @@ https://github.com/org/repo
 
 The FastAPI backend calls the scanner exposed by the installed
 `agent-consistency` package, downloads the public repo to a temporary directory,
-and returns a false-success report card plus Markdown output. If the backend is
-running with an older `agent-consistency` package that does not expose the
-scanner yet, it returns a clear `503` instead of pretending a scan happened.
+and returns a false-success report card plus Markdown output. The scanner accepts
+any public GitHub repo, but reports whether the repo looks like an
+agentic-workflow repo, workflow-adjacent repo, or general code. Weak matches are
+shown as possible risks that need review.
 
 ### Local Report Import
 
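Review note on the hunk above: the new wording describes three repo-fit labels and softer language for weak matches. The following is a minimal Python sketch of that idea only. The `fit_label` and `describe_finding` helpers, the 0-to-1 score, and both thresholds are hypothetical and are not taken from `agent-consistency`.

```python
# Hypothetical sketch: names, score scale, and thresholds are assumptions,
# not the agent-consistency scanner's actual API or logic.


def fit_label(score: float) -> str:
    """Map an assumed 0..1 repo-fit score to one of the three README labels."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= 0.7:  # assumed cutoff for a strong match
        return "agentic-workflow"
    if score >= 0.3:  # assumed cutoff for a partial match
        return "workflow-adjacent"
    return "general code"


def describe_finding(title: str, score: float) -> str:
    """Word a finding conservatively unless the repo is a strong match."""
    if fit_label(score) == "agentic-workflow":
        return f"Risk: {title}"
    # Weak matches are presented as possible risks that need review.
    return f"Possible risk (needs review): {title}"


if __name__ == "__main__":
    for score in (0.9, 0.5, 0.1):
        print(fit_label(score), "|", describe_finding("unverified completion", score))
```

The point of the sketch is the design choice the README states: the scanner still accepts any repo, and the fit label changes how strongly a finding is worded instead of whether the scan runs.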
@@ -135,8 +137,8 @@ report cards, proof trails, and copyable fixes.
   the demo lightweight and deterministic.
 - **Lab backend:** validates public scan requests, calls the scanner, and runs
   the refund scenario through the real workflow path where available.
-- **Scanner:** reads source code and returns report-card metrics, findings,
-  severity, confidence, missing evidence, and suggested fixes.
+- **Scanner:** reads source code and returns repo applicability, grouped
+  findings, severity, confidence, missing evidence, and suggested fixes.
 - **Verified action / outcome gate:** blocks or reviews unverified completions
   before customer-visible claims continue.
 - **Verifier packs:** scenario-specific checks for the expected result, such as
@@ -227,11 +229,9 @@ still have duration and resource limits, so very large repository scans may need
 the local CLI path. The UI stays honest: if the backend is unavailable, it shows
 static demo mode and still supports local report import and built-in scenarios.
 
-The hosted backend currently installs `agent-consistency` from the pinned public
-GitHub commit in `requirements.txt`, aligned with the current scanner-enabled
-`agent-consistency` 0.3.2 source. After PyPI is confirmed to have the same
-scanner APIs, switch the dependency back to a PyPI range such as
-`agent-consistency>=0.3.2,<0.4.0`.
+The hosted backend installs `agent-consistency>=0.3.5,<0.4.0` from PyPI. That
+version includes repo applicability, grouped findings, raw exposure, and
+conservative low-confidence wording for weak matches.
 
 To deploy the backend:
 
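Review note on the dependency change above: the range `>=0.3.5,<0.4.0` admits 0.3.x releases from 0.3.5 onward and excludes 0.4.0. The stdlib-only Python sketch below illustrates that check. It assumes plain numeric versions such as `0.3.5` (no pre-release or local tags), and `in_pinned_range` is a hypothetical helper; pip itself resolves specifiers under the fuller PEP 440 rules.

```python
# Simplified sketch: handles only dotted numeric versions like "0.3.5".
# pip applies the complete PEP 440 rules; this is an illustration, not a replacement.


def parse_version(text: str) -> tuple[int, ...]:
    """Turn "0.3.5" into (0, 3, 5) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def in_pinned_range(version: str, lower: str = "0.3.5", upper: str = "0.4.0") -> bool:
    """Return True when lower <= version < upper."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


if __name__ == "__main__":
    for candidate in ("0.3.2", "0.3.5", "0.3.10", "0.4.0"):
        print(candidate, in_pinned_range(candidate))
```

Comparing integer tuples instead of raw strings matters here: as strings, `"0.3.10"` would sort before `"0.3.5"` and be rejected by mistake.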