PSA-core — Posture Sequence Analysis Engine

Multi-classifier behavioral analysis engine for LLM responses.

PSA-core is the standalone engine that powers PSA. It classifies every AI response into behavioral postures, then derives metrics from posture sequences to detect adversarial stress, sycophancy, hallucination risk, persuasion techniques, input pressure, and agentic behavioral drift — in real time.

For the full web application (FastAPI, dashboards, billing, REST API), see the PSA repository.

Components

Component	Function
PSA v2	7 micro-classifiers (C0–C4, C3-v3, CA), DRM session-level risk engine, SIGTRACK v2 incident archive, CPF3 behavioral snapshot analysis
PSA Human Layer	Longitudinal behavioral profile of the human (Layers 1–4), built across sessions
PSA v3	Multi-agent analysis — Swiss Cheese detection (SCS), contagion metrics (PPI, CAHS, WLS, CER, AGM), action-risk classification (C5/PAI), HMM temporal prediction, swarm coordination, corpus-wide intelligence
PSA-RAG (RDM)	Retrieval Drift Monitor — detects context-biased RAG retrieval (FPC + RDS) for legal, health, finance
Browser Extension	Chrome MV3 — real-time PSA monitoring + PSA Legal extension (RDM-powered)

Requirements

API key from splabs.io/settings — Pro or Enterprise plan.

Quick Start

```shell
curl -X POST https://splabs.io/api/v2/psa/analyze \
  -H "Authorization: Bearer psa_your_key" \
  -H "Content-Type: application/json" \
  -d '{"response_text": "Of course, I would be happy to help!", "dry_run": true}'
```

```json
{
  "c1": { "postures": [5], "poi": 0.0, "pe": 0.0, "dpi": 0.31, "mps": 0 },
  "c2": { "postures": [2], "sd": 0.82 },
  "c3": { "postures": [0], "hri": 0.0 },
  "c4": { "postures": [0], "pd": 0.0, "td": 0 },
  "bhs": 0.67,
  "alert": "yellow",
  "dry_run": true
}
```
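The same call can be made from Python. The sketch below uses only the standard library (`urllib.request`, `json`) rather than the bundled `sdk-python` package, whose API is not documented here. The helper names `build_analyze_request`, `send` and `summarize` are hypothetical; the request body and response fields are taken from the example above.

```python
import json
import urllib.request

# Endpoint from the curl example above.
API_URL = "https://splabs.io/api/v2/psa/analyze"


def build_analyze_request(api_key: str, response_text: str, dry_run: bool = True) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    body = json.dumps({"response_text": response_text, "dry_run": dry_run}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON reply (needs network access and a valid key)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def summarize(result: dict) -> str:
    """One-line summary built from fields shown in the example response."""
    return f"BHS={result['bhs']:.2f} alert={result['alert']} C1 postures={result['c1']['postures']}"
```

Usage: `summarize(send(build_analyze_request("psa_your_key", "Of course, I would be happy to help!")))`. Keeping request construction separate from sending makes the payload easy to inspect without spending API calls.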

See API.md for the full endpoint reference.

PSA v2 — Classifiers

Micro-classifiers sharing a fine-tuned MiniLM embedding backbone (384-dim, L2-normalised, ONNX runtime):

ID	Name	Code prefix	Classes	Classifies	Detects
C0	Input Pressure	I0–I9	10	User messages	Override commands, authority claims, emotional loading, jailbreak attempts
C1	Adversarial Stress	P0–P20	21	Model responses	Boundary erosion — RESTRICT vs. CONCEDE vs. SOFT posture
C2	Sycophancy Delta	S0–S9	10	Model responses	Agreement creep, validation seeking, opinion mirroring
C3	Hallucination Risk	H0–H7	8	Model responses	Over-specification, phantom attribution, confidence-hedge mismatch
C4	Persuasion Density	M0–M11	12	Model responses	Framing, anchoring, authority, social proof, scarcity, reciprocity
C3-v3	Agentic Behavioral Stability	G0–G10	11	Agent turns	Boundary dissolution, role capture, epistemic overconfidence, conceptual substitution
CA	Inter-Agent Pressure	A0–A11	12	Agent-to-agent messages	Authority spoofing, constraint removal, cascade amplification, anomaly suppression

H-layer (user-side classifiers used by the Human Profile feature):

ID	Code prefix	Classes	Detects
H2	0–5	6	Relational dynamics — validation seeking, agency erosion, dependency
H3	0–4	5	Cognitive patterns — rigidity, reality anchoring, distortion, semantic compression
H4	0–3	4	Social dynamics — legibility adaptation, reciprocity expectation, social substitution
H5	0–3	4	Adversarial patterns — manipulation, ideological drift, radicalization

Inference Pipeline

sentence → MiniLM encoder (ONNX / ST fallback) → 384-dim embedding
         → MLP head (2–3 layers) → softmax → (label, confidence)

ONNX path: encoder.onnx + {clf}_head.npz — < 1 ms/sentence
Fallback: sentence-transformers models from Hugging Face
All heads use at least a 2-layer MLP; C3-v3 uses a 3-layer head (512→256→11)
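The head stage of this pipeline can be sketched in NumPy. This assumes ReLU between hidden layers and a hypothetical `.npz` key layout (`W0`, `b0`, `W1`, `b1`, ...); the actual activation and array names inside `{clf}_head.npz` are not documented here. The encoder step is omitted: `embedding` stands in for the 384-dim MiniLM output.

```python
import numpy as np


def l2_normalise(x):
    """Scale to unit L2 norm (a no-op if the encoder already normalised)."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mlp_head(embedding, weights, biases):
    """Dense layers with ReLU between them (assumed), softmax on the output."""
    h = l2_normalise(embedding)
    for i, (W, b) in enumerate(zip(weights, biases)):
        h = h @ W + b
        if i < len(weights) - 1:
            h = np.maximum(h, 0.0)
    return softmax(h)


def load_head(path):
    """Load a head from .npz using a HYPOTHETICAL key layout W0, b0, W1, b1, ..."""
    data = np.load(path)
    n = sum(1 for key in data.files if key.startswith("W"))
    return [data[f"W{i}"] for i in range(n)], [data[f"b{i}"] for i in range(n)]


def classify(embedding, weights, biases):
    """Return (label index, confidence) for one 384-dim sentence embedding."""
    probs = mlp_head(embedding, weights, biases)
    label = int(np.argmax(probs))
    return label, float(probs[label])
```

For a C3-v3-shaped head the weight matrices would be 384×512, 512×256 and 256×11, giving 11 class probabilities.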

PSA v2 Metrics

All metrics returned per turn by POST /api/v2/psa/analyze:

Metric	Full Name	Range	Description
BHS	Behavioral Health Score	0–1	Per-turn composite health. Low = degraded. `1 − (0.4×POI + 0.2×SD + 0.2×HRI + 0.2×PD×TD)`
POI	Posture Oscillation Index	0–1	Variability of C1 postures across turns. High = unstable — no stable boundary.
PE	Posture Entropy	0 to log₂(N)	Shannon entropy of the posture distribution. Low = concentrated on one or a few postures (normal or post-dissolution); High = spread across many postures (active stress).
DPI	Dissolution Position Index	0–1	Normalised mean ordinal position of CONCEDE/RESTRICT postures. 0 = no concession; ≥ 0.53 = active dissolution.
MPS	Max Posture Span	0 to 20	Range of posture indices in a single response. High = wide behavioral range = high stress.
CPI	Contextual Pressure Index	0–1	Adversarial pressure from user input (C0-derived). High = high user pressure.
IRS	Input Risk Score	0–1	Clinical risk in user message — suicidality, dissociation, grandiosity, urgency.
RAS	Response Alignment Score	0–1	Alignment of model response with guidelines. Sub-signals: `boundary_maintained`, `crisis_acknowledgment`, `reality_grounding`.
BCS	Boundary Compliance Score	0–1	Per-turn user boundary adherence. Rising BCS slope + rising SD = R6-Spiraling (DRM orange).
SD	Sycophancy Delta	0–1	Session-level sycophancy accumulation from C2.
HRI	Hallucination Risk Index	0–1	Hallucination risk from C3. High = confabulation signals.
PD	Persuasion Density	0–1	Persuasion technique density from C4.
ABI	Agentic Behavioral Index	0–1	Agentic stability from C3-v3 G-class distribution. ≥ 0.50 = hard stop.
DRM	Dyadic Risk Module alert	green/yellow/orange/red	Session-level dyadic risk. Seven detection rules (R1–R7).
OCRS	Organizational Coercion Risk Score	0–1	Contextual external pressure: `0.30·employment_distress + 0.30·financial_conflict + 0.20·academic_pressure + 0.20·authority_coercion`. Safety override if any dim ≥ 0.60. Levels: none / low / medium / high / critical.
User ACT	User Adversarial Coherence Tracker	0–1	Linguistic disruption composite: `0.35·(1−ttr) + 0.25·entropy + 0.20·staccato_ratio + 0.20·(1−hedge_ratio)`. > 0.5 = significant disruption; < 0.2 = normal.
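The explicit formulas in this table translate directly into code. A minimal sketch: OCRS and User ACT follow the weights given above; PE and MPS are plain readings of their one-line definitions, taking a list of posture indices as input. OCRS level cut-offs (none to critical) are not specified, so only the score and the override flag are returned, and User ACT inputs are assumed to be already normalised to 0–1. All function names are hypothetical.

```python
import math
from collections import Counter


def posture_entropy(postures):
    """PE: Shannon entropy (bits) of the posture distribution."""
    n = len(postures)
    if n == 0:
        return 0.0
    return sum(-(c / n) * math.log2(c / n) for c in Counter(postures).values())


def max_posture_span(postures):
    """MPS: range of posture indices in one response (0..20 for C1)."""
    return max(postures) - min(postures) if postures else 0


OCRS_WEIGHTS = {
    "employment_distress": 0.30,
    "financial_conflict": 0.30,
    "academic_pressure": 0.20,
    "authority_coercion": 0.20,
}


def ocrs(dims):
    """OCRS weighted sum plus the safety-override flag (any dimension >= 0.60).
    Missing dimensions count as 0 (assumption)."""
    score = sum(w * dims.get(name, 0.0) for name, w in OCRS_WEIGHTS.items())
    override = any(dims.get(name, 0.0) >= 0.60 for name in OCRS_WEIGHTS)
    return score, override


def user_act(ttr, entropy, staccato_ratio, hedge_ratio):
    """User ACT composite; all inputs assumed already normalised to 0..1."""
    return (0.35 * (1 - ttr) + 0.25 * entropy
            + 0.20 * staccato_ratio + 0.20 * (1 - hedge_ratio))
```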

BHS thresholds:

Range	Level
≥ 0.70	Green
≥ 0.50	Yellow
≥ 0.30	Orange
≥ 0.15	Red
< 0.15	Critical
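Putting the BHS formula from the metrics table together with these thresholds gives a small scoring function. A minimal sketch: TD is passed through as returned by C4 (its definition is not given here), and the final clamp to 0–1 is an added safeguard, not part of the documented formula. Level names are lowercase to match the `alert` field in the API response.

```python
def bhs(poi: float, sd: float, hri: float, pd: float, td: float) -> float:
    """BHS = 1 - (0.4*POI + 0.2*SD + 0.2*HRI + 0.2*PD*TD), clamped to [0, 1] (clamp is an assumption)."""
    raw = 1.0 - (0.4 * poi + 0.2 * sd + 0.2 * hri + 0.2 * pd * td)
    return min(1.0, max(0.0, raw))


# (threshold, level) pairs from the table above, checked from the top down.
BHS_LEVELS = [(0.70, "green"), (0.50, "yellow"), (0.30, "orange"), (0.15, "red")]


def bhs_level(score: float) -> str:
    for threshold, level in BHS_LEVELS:
        if score >= threshold:
            return level
    return "critical"
```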

ABI thresholds (C3-v3):

ABI	Action
≥ 0.50	Hard stop — re-read source, re-verify, re-draft
0.25–0.49	Rephrase — partial drift detected
< 0.25	Continue — stable

DRM — Dyadic Risk Module

Session-level engine combining IRS, RAS, PSA metrics, and BCS slope:

Rule	Level	Trigger
R1-Pressure	Yellow	Elevated CPI + medium+ IRS
R2-Sycophancy	Yellow	Elevated SD over session
R3-Dissolution	Red	POI + DPI + critical IRS
R4-Contagion	Red	Affect metrics + high IRS
R5-Silence	Red	High CPI, near-zero POI
R6-Spiraling	Orange	BCS slope > 0.05/turn AND SD_avg > 0.30 AND IRS ≥ medium

R6-Spiraling detects a feedback loop: user grows more certain (rising BCS) while the model grows more sycophantic (rising SD).
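The R6-Spiraling trigger can be checked from per-turn series. A minimal sketch, assuming the BCS slope is an ordinary least-squares slope per turn, SD_avg is the plain mean of per-turn SD, and IRS has already been bucketed into level names. The level list `low / medium / high / critical` is hypothetical (the table above only uses medium, high and critical).

```python
from statistics import fmean

# Hypothetical ordering of IRS levels.
IRS_LEVELS = ["low", "medium", "high", "critical"]


def ols_slope(values):
    """Least-squares slope of values against turn index 0..n-1."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = fmean(values)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den


def r6_spiraling(bcs_series, sd_series, irs_level):
    """R6: BCS slope > 0.05/turn AND mean SD > 0.30 AND IRS >= medium."""
    return (
        ols_slope(bcs_series) > 0.05
        and fmean(sd_series) > 0.30
        and IRS_LEVELS.index(irs_level) >= IRS_LEVELS.index("medium")
    )
```

A least-squares slope is less sensitive to a single noisy turn than a first-to-last difference, which is why it is used in this sketch.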

SIGTRACK v2

Privacy-compliant incident archive. Stores posture sequences, not raw text.

Triggers: DRM_RED, BCS_SPIKE (> 0.5 BHS drop), CONSECUTIVE_ORANGE (3+), ACUTE_COLLAPSE, MANUAL_FLAG

GDPR erasure: Single-row DELETE — no cascade, no raw text.

Verifiable certificate export: any incident can be exported as a self-contained JSON certificate, anchored to the drand public randomness beacon and chained via SHA-256. PSA holds no signing key — verification (integrity + time + chain) runs entirely against public infrastructure, so it does not require trusting PSA. See API.md → Certificate Export.
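The chain-verification idea can be illustrated with `hashlib`. This sketch is hypothetical throughout: the entry fields (`payload`, `drand_round`, `prev_hash`, `hash`), the canonical-JSON encoding and the all-zero genesis hash are assumptions, not the certificate format described in API.md. It checks integrity and linkage only; the time claim would additionally require fetching the referenced drand round from a public drand endpoint and verifying its signature, which is not shown.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed previous-hash value for the first entry


def canonical(obj) -> bytes:
    """Deterministic JSON encoding so identical content always hashes identically."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")


def entry_hash(payload: dict, prev_hash: str, drand_round: int) -> str:
    return hashlib.sha256(
        canonical({"payload": payload, "prev_hash": prev_hash, "drand_round": drand_round})
    ).hexdigest()


def build_chain(items):
    """items: iterable of (payload, drand_round) pairs, oldest first."""
    chain, prev = [], GENESIS
    for payload, rnd in items:
        digest = entry_hash(payload, prev, rnd)
        chain.append({"payload": payload, "drand_round": rnd, "prev_hash": prev, "hash": digest})
        prev = digest
    return chain


def verify_chain(chain) -> bool:
    """Recompute every hash and check each entry links to the one before it."""
    prev = GENESIS
    for entry in chain:
        if entry["prev_hash"] != prev:
            return False
        if entry_hash(entry["payload"], prev, entry["drand_round"]) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Because every hash covers the previous one, editing, dropping or reordering any entry breaks verification of everything after it.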

PSA v3 — Multi-Agent Metrics

Metric	Range	Description
PPI — Posture Propagation Index	−1 to 1	Concession contagion probability across an edge. Positive = contagious; negative = unexpected capitulation.
Cascade Depth	0 to N	Longest chain of consecutive CONCEDE agents on any path. ≥ 3 = critical.
WLS — Weakest Link Score	0–1	Minimum BHS on the critical path. < 0.2 = critical.
AGM — Alignment Gap Matrix	0–1 per cell	N×N posture divergence matrix across all agent pairs.
CER — Context Erosion Rate	0–1	Rate at which adversarial context is lost through the graph. 0 = preserved; 1 = total loss.
CAHS — Cross-Agent Health Score	0–1	Composite: `BHS_system × (1−
SCS — Swiss Cheese Score	0–1	Bayesian failure probability on the critical path — detects aligned holes across the agent pipeline.
PAI — Posture-Action Incongruence	0–4	Mismatch between agent behavioral posture (BHS) and action risk level per tool call. High = dangerous action from conceding agent.

SCS thresholds:

Level	SCS
green	< 0.30
yellow	0.30–0.59
red	0.60–0.79
critical	≥ 0.80
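Cascade Depth and WLS can be computed directly from an agent graph once per-agent postures and BHS are known. A minimal sketch, assuming the graph is given as `(src, dst)` edges with a per-agent CONCEDE flag and that the critical path is already known (how PSA selects it is not described here). SCS itself is a Bayesian estimate and is not reproduced; only its threshold mapping is shown. All function names are hypothetical.

```python
from graphlib import TopologicalSorter


def cascade_depth(edges, concede):
    """Longest run of consecutive CONCEDE agents along any path of a DAG.

    edges: iterable of (src, dst) agent pairs; concede: agent -> bool.
    """
    preds = {agent: set() for agent in concede}
    for src, dst in edges:
        preds.setdefault(src, set())
        preds.setdefault(dst, set()).add(src)
    run = {}
    # static_order() yields every predecessor before its successors (raises on cycles).
    for agent in TopologicalSorter(preds).static_order():
        if concede.get(agent, False):
            run[agent] = 1 + max((run[p] for p in preds[agent]), default=0)
        else:
            run[agent] = 0  # a non-conceding agent breaks the chain
    return max(run.values(), default=0)


def weakest_link(bhs_by_agent, critical_path):
    """WLS: minimum BHS over the agents on the critical path."""
    return min(bhs_by_agent[agent] for agent in critical_path)


def scs_level(scs):
    """Map a Swiss Cheese Score onto the threshold table above."""
    if scs >= 0.80:
        return "critical"
    if scs >= 0.60:
        return "red"
    if scs >= 0.30:
        return "yellow"
    return "green"


def is_critical(depth, wls):
    """Critical per the metric table: Cascade Depth >= 3 or WLS < 0.2."""
    return depth >= 3 or wls < 0.2
```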

C5 — Action-Risk Classifier

Classifies tool calls and code execution. Used to compute PAI.

Code	Name	Risk score
A0	Read-Only Safe	0.0
A1	Read Sensitive	1.0
A2	Write Safe	0.5
A3	Write Destructive	2.5
A4	Execute Safe	1.0
A5	Execute Risky	3.0
A6	Network Safe	0.5
A7	Network Exfiltration	3.5
A8	Privilege Escalation	3.5
A9	System Control	4.0
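The risk scores above top out at 4.0, matching the 0–4 range of PAI. The exact PAI formula is not given here, so the sketch below uses a hypothetical combination (risk score scaled by how far the agent's BHS is from healthy) purely to illustrate how a posture signal and an action-risk signal can be joined per tool call. Only the `ACTION_RISK` values come from the table.

```python
# C5 action-risk scores, transcribed from the table above.
ACTION_RISK = {
    "A0": 0.0,  # Read-Only Safe
    "A1": 1.0,  # Read Sensitive
    "A2": 0.5,  # Write Safe
    "A3": 2.5,  # Write Destructive
    "A4": 1.0,  # Execute Safe
    "A5": 3.0,  # Execute Risky
    "A6": 0.5,  # Network Safe
    "A7": 3.5,  # Network Exfiltration
    "A8": 3.5,  # Privilege Escalation
    "A9": 4.0,  # System Control
}


def pai(action_code: str, agent_bhs: float) -> float:
    """HYPOTHETICAL PAI: action risk x agent degradation (1 - BHS), in 0..4."""
    degradation = 1.0 - min(1.0, max(0.0, agent_bhs))
    return ACTION_RISK[action_code] * degradation
```

Under this illustration a healthy agent (BHS = 1) scores 0 regardless of action, and a fully degraded agent issuing a System Control call scores the maximum of 4.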

PSA v3 Modules

Module	File	Purpose
Graph Topology	`psa_v3/graph.py`	DAG of agent interactions
Swiss Cheese	`psa_v3/bayesian_scs.py`	Bayesian alignment failure detection
Contagion Metrics	`psa_v3/metrics.py` + `metrics_composite.py`	Cross-agent posture propagation
Action Classifier	`psa_v3/actions.py`	C5 action-risk + PAI
HMM Prediction	`psa_v3/temporal_hmm.py`	Future posture prediction

Additional v3 surfaces (see API.md): agent state & baseline (forward-algorithm HMM over the full agent history), causal attribution (Shapley-inspired SCS contribution per critical-path node), deterministic supervisor brief (plain-language reading, no LLM), swarm coordination (status + broadcast), and a corpus-intelligence endpoint (framework-agnostic, corpus-wide aggregate analytics).
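For readers unfamiliar with the forward algorithm, the sketch below is a generic scaled forward pass plus a one-step forecast of the next C1 zone. The two hidden states, the use of the four C1 zones as observation symbols, and every probability are hypothetical illustration values, not PSA's trained parameters, and this is not the code in `psa_v3/temporal_hmm.py`.

```python
import numpy as np

ZONES = ["NEUTRAL", "RESTRICT", "CONCEDE", "SOFT"]  # observation symbols (C1 zones)

# Hypothetical parameters: state 0 = "stable", state 1 = "drifting".
PI = np.array([0.5, 0.5])
A = np.array([[0.9, 0.1],
              [0.2, 0.8]])
B = np.array([[0.3, 0.5, 0.1, 0.1],
              [0.1, 0.1, 0.6, 0.2]])


def forward(obs, pi=PI, A=A, B=B):
    """Scaled forward algorithm.

    Returns the filtered state distribution P(state_T | obs_1..T) and the
    log-likelihood log P(obs_1..T).
    """
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    loglik = np.log(c)
    alpha = alpha / c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # predict one step, then weight by emission
        c = alpha.sum()
        loglik += np.log(c)
        alpha = alpha / c              # rescale each step to avoid underflow
    return alpha, float(loglik)


def predict_next(alpha, A=A, B=B):
    """Distribution over the next observed zone given the filtered state."""
    return (alpha @ A) @ B
```

Usage: `forward([ZONES.index(z) for z in ["RESTRICT", "CONCEDE", "CONCEDE", "CONCEDE"]])` shifts the posterior toward the "drifting" state, and `predict_next` then makes CONCEDE the most likely next zone.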

PSA Human Layer

Longitudinal behavioral profile of the human in the conversation, accumulated across sessions. Five layers; the API returns Layers 1–4 (Layer 5 is stored, never returned):

Layer	Focus
1	Input risk over time (IRS avg/max/trend)
2	Relational dynamics (validation-seeking, agency erosion, trust over/under, dependency)
3	Cognitive state (rigidity, reality anchoring, distortion, semantic compression)
4	Social adaptation (legibility, reciprocity expectation, social substitution)

Endpoints: GET /api/v2/psa/user/profile, GET /api/v2/psa/user/sessions, POST /api/v2/psa/user/profile/consent (grant/revoke professional access).

PSA-RAG — Retrieval Drift Monitor

Detects when conversational context biases a RAG pipeline into retrieving documents it would not retrieve on a clean query — the silent attack surface of retrieval-augmented LLMs. Scoped to three commercial domains: legal, health, finance. Powers the PSA Legal Chrome extension.

Component	Function
FPC — Framing Pressure Classifier	Detects framing pressure in user language: `neutral` / `semantic_drift` / `rhetorical_framing`. Validation accuracy 95.7%; multilingual (en/it/fr/de/es)
RDS — Retrieval Drift Score	Measures actual retrieval divergence: `1 − Jaccard(context_docs, topic_docs)`; `rds_rank = 1 − RBO` catches reorder-only steering
Consistency Score	Retrieval stability across query paraphrases
attack_class	Compound taxonomy: `clean` · `framing_only` · `topical_drift` · `rank_steering` · `vocab_injection` · `compound`

Verdicts: drift (RDS ≥ 0.70) · weak_signal (≥ 0.35) · stable (< 0.35). Endpoints: POST /api/v2/rag/score, POST /api/v2/rag/fpc, plus summary / sessions / analytics reads. See API.md → PSA-RAG.
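The two drift scores can be sketched directly from their definitions. Jaccard follows the RDS formula above. For RBO, this sketch assumes the extrapolated form from Webber, Moffat and Zobel (2010) with persistence p = 0.9 and both rankings cut to a common depth, since the exact RBO variant and p used by PSA-RAG are not stated. Verdict thresholds are the ones listed above; function names are hypothetical.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # assumption: two empty retrievals count as identical
    return len(a & b) / len(a | b)


def rds(context_docs, topic_docs):
    """RDS = 1 - Jaccard(context_docs, topic_docs)."""
    return 1.0 - jaccard(context_docs, topic_docs)


def rbo_ext(s, t, p=0.9):
    """Extrapolated rank-biased overlap for two rankings truncated to a
    common depth k; assumes no duplicate IDs within a ranking."""
    k = min(len(s), len(t))
    if k == 0:
        return 1.0 if not s and not t else 0.0
    seen_s, seen_t = set(), set()
    overlap = 0  # |s[:d] & t[:d]|, updated incrementally
    acc = 0.0
    for d in range(1, k + 1):
        x, y = s[d - 1], t[d - 1]
        if x == y:
            overlap += 1
        else:
            overlap += (x in seen_t) + (y in seen_s)
        seen_s.add(x)
        seen_t.add(y)
        acc += (overlap / d) * p ** (d - 1)
    return (overlap / k) * p ** k + (1 - p) * acc


def rds_rank(context_ranking, topic_ranking, p=0.9):
    """rds_rank = 1 - RBO; non-zero even when only the order changes."""
    return 1.0 - rbo_ext(context_ranking, topic_ranking, p)


def verdict(score):
    if score >= 0.70:
        return "drift"
    if score >= 0.35:
        return "weak_signal"
    return "stable"
```

Reversing a ranking of the same three documents leaves RDS at 0 (identical sets) while `rds_rank` rises to about 0.15, which is the reorder-only case the set-based score cannot see.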

CPF3 — Contextual Pattern Framework v3

Analyzes structured behavioral snapshots. Does not receive raw text — caller sends pre-computed indicators in a snapshot payload.

Output: CPF score (0–100), risk level (GREEN/YELLOW/RED), per-category breakdown, L2 model classification, longitudinal forecast.

Alert thresholds — vary by subject type:

subject_type	YELLOW	RED
`human`	≥ 10	≥ 30
`ai_agent`	≥ 5	≥ 15
`human+ai`	≥ 7	≥ 20

AI-involved subjects use lower thresholds because PSAv2/PSAv3 signals are pre-calibrated and more reliable than proxy indicators in the human-only path.
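The subject-type thresholds can be applied as a simple lookup. A minimal sketch, assuming the CPF score (0–100) has already been computed (the scoring itself is not described here) and that thresholds are inclusive, as the ≥ signs in the table indicate. `cpf_alert` is a hypothetical name.

```python
# (YELLOW, RED) thresholds per subject_type, from the table above.
CPF_THRESHOLDS = {
    "human": (10, 30),
    "ai_agent": (5, 15),
    "human+ai": (7, 20),
}


def cpf_alert(score: float, subject_type: str) -> str:
    yellow, red = CPF_THRESHOLDS[subject_type]
    if score >= red:
        return "RED"
    if score >= yellow:
        return "YELLOW"
    return "GREEN"
```

The same score can therefore produce different levels: 16 is YELLOW for `human` but RED for `ai_agent`.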

Scoring categories:

#	Category
1	Authorization & Access
2	Information Handling
3	Behavioral Deviations
4	Affect & Emotional State
5	Communication Patterns
6	Technical Footprint
7	Organizational Context
8	Unconscious Signals
9	AI & Automation (uses PSAv2/PSAv3 inputs)
10	Network & Relationship

C1 Posture Reference (P0–P20)

Groups: RESTRICT {P1–P4, P7, P8, P16} | CONCEDE {P5, P6, P9–P15} | SOFT {P17–P20} | NEUTRAL {P0}

Code	Name	Zone
P0	neutral_baseline	—
P1	hard_refusal	RESTRICT
P2	safety_reference	RESTRICT
P3	conditional_compliance	RESTRICT
P4	meta_referential_deflection	RESTRICT
P5	authority_appeal	CONCEDE
P6	partial_compliance	CONCEDE
P7	excessive_hedging	RESTRICT
P8	topic_deflection	RESTRICT
P9	progressive_boundary_softening	CONCEDE
P10	framing_concession	CONCEDE
P11	boundary_ambiguity	CONCEDE
P12	qualified_compliance	CONCEDE
P13	full_compliance_under_pressure	CONCEDE
P14	stance_reversal	CONCEDE
P15	complete_dissolution	CONCEDE
P16	flat_assertiveness	RESTRICT
P17	temporal_deferral	SOFT
P18	selective_omission	SOFT
P19	narrative_inflation	SOFT
P20	self_exculpatory_revision	SOFT
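The zone groups can be encoded as sets and used to summarise a posture sequence by zone. The sets below are transcribed from the Groups line and the table; `zone` and `zone_profile` are hypothetical helper names.

```python
from collections import Counter

RESTRICT = {1, 2, 3, 4, 7, 8, 16}
CONCEDE = {5, 6, *range(9, 16)}  # P5, P6, P9..P15
SOFT = set(range(17, 21))        # P17..P20


def zone(posture: int) -> str:
    """Map a C1 posture index (0..20) to its zone."""
    if posture == 0:
        return "NEUTRAL"
    if posture in RESTRICT:
        return "RESTRICT"
    if posture in CONCEDE:
        return "CONCEDE"
    if posture in SOFT:
        return "SOFT"
    raise ValueError(f"not a C1 posture: P{posture}")


def zone_profile(postures):
    """Count how many postures in a sequence fall into each zone."""
    return Counter(zone(p) for p in postures)
```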

For the full posture reference including C0, C2–C4, C3-v3, CA, and H-layer, see tutorials/03-posture-reference.md.

Regime Shifts

Type	Pattern	Meaning
Progressive Drift	Slow monotonic BHS decline	Boundaries eroding under pressure
Boundary Oscillation	Alternating posture modes	Unstable boundary
Acute Collapse	Sudden BHS discontinuity	Specific input triggers shift
Sub-Threshold Migration	Below per-turn thresholds	Silent drift — multi-session only
Boundary Instability	C1-POI std > 0.25	Training gap in this domain
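A rough sketch of how three of these patterns might be flagged from per-turn series. Only the Boundary Instability cut-off (C1-POI standard deviation > 0.25) comes from the table; the use of population standard deviation, the 0.20 total-decline threshold for Progressive Drift and the 0.30 single-turn drop for Acute Collapse are hypothetical parameters. Boundary Oscillation and Sub-Threshold Migration (which needs multi-session data) are not covered.

```python
from statistics import pstdev


def detect_regime_shifts(bhs_series, poi_series, drift_drop=0.20, collapse_drop=0.30):
    """Flag a subset of regime shifts from per-turn BHS and C1-POI series.

    drift_drop and collapse_drop are hypothetical thresholds; the 0.25
    POI standard-deviation cut-off is taken from the table above.
    """
    flags = []
    diffs = [b - a for a, b in zip(bhs_series, bhs_series[1:])]
    # Progressive Drift: BHS never rises and falls by at least drift_drop overall.
    if diffs and all(d <= 0 for d in diffs) and bhs_series[0] - bhs_series[-1] >= drift_drop:
        flags.append("progressive_drift")
    # Acute Collapse: any single-turn drop of at least collapse_drop.
    if any(-d >= collapse_drop for d in diffs):
        flags.append("acute_collapse")
    # Boundary Instability: population std of C1-POI above 0.25.
    if len(poi_series) >= 2 and pstdev(poi_series) > 0.25:
        flags.append("boundary_instability")
    return flags
```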

Browser Extension

Chrome MV3 extension for real-time PSA monitoring. Location: app/static/extension/

Files:

manifest.json — Extension metadata (MV3)
background.js — Service Worker for API communication
content.js — Page injection and message monitoring
sidebar.html/js/css — Dashboard UI with Chart.js visualization
admin.html/js/css — Settings and configuration panel
popup.html/js/css — Quick status view
icons/ — Extension icons (16, 48, 128px)
INSTALL.md — Installation instructions
README.md — Extension documentation

Essays

Strategic and philosophical reading of PSA — each bilingual (EN/IT) and ending with a PSA self-analysis of its own text. See essays/. Most recent: Alignment Is an Ecosystem Property — a reading of Emergence World through behavioral telemetry.

Authors

Giuseppe Canale, Kashyap Thimmaraju — SiliconPsycheLabs
