Skip to content

yanjiaju16-art/orgsense

Repository files navigation

OrgSense: LLMs as Organizational Sensors for Strategic Drift

Official implementation of the OrgSense framework and StratDrift-10K corpus from:

Beyond Sentiment: Large Language Models as Organizational Sensors for Detecting Strategic Drift in Corporate Communications
Justin Yan — Baylor University


Overview

Organizations use language to manage institutional expectations and negotiate legitimacy with external stakeholders. OrgSense repurposes instruction-tuned LLMs as theory-grounded organizational sensors capable of detecting strategic drift — the gradual misalignment between an organization's stated strategy and its enacted behaviors.

Key results

Model Zero-Shot F1 Few-Shot F1 vs. Expert
GPT-4o + OrgSense 0.78 0.81 +4.1 pts
Claude 3.5 Sonnet + OrgSense 0.76 0.79 +2.1 pts
Gemini 1.5 Pro + OrgSense 0.73 0.77 +0.1 pts
Llama-3-70B + OrgSense 0.69 0.74 −2.9 pts
Fine-tuned RoBERTa-large 0.72 −4.9 pts
Domain Expert Baseline 0.77 0.0 pts

LLM-extracted drift signals predict financial restatements (AUC = 0.74) and unplanned CEO turnover (AUC = 0.69) in prospective holdout data.


Repository structure

orgsense/
├── orgsense/
│   ├── __init__.py               # Package exports
│   ├── framework.py              # OrgSense prompting framework (Section 4)
│   ├── corpus.py                 # StratDrift-10K corpus construction (Section 3)
│   ├── annotation.py             # Annotation schema & IRR computation (Section 3.3)
│   ├── benchmark.py              # Benchmarking & ablation harness (Section 5)
│   └── predictive_validity.py    # Downstream validity analyses (Section 6)
├── scripts/
│   ├── run_benchmark.py          # Full multi-model benchmark runner
│   ├── collect_corpus.py         # SEC EDGAR corpus collection
│   └── run_predictive_validity.py
├── tests/
│   └── test_orgsense.py          # Unit & integration tests
├── configs/                      # Model and task configuration files
├── notebooks/                    # Analysis notebooks (see below)
├── requirements.txt
└── pyproject.toml

Installation

git clone https://github.com/justinyan-baylor/orgsense.git
cd orgsense
pip install -e ".[dev]"

Set environment variables for your LLM providers:

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."

Quickstart

Analyze a single document

from openai import OpenAI
from orgsense import OrgSenseAnalyzer

client = OpenAI()
analyzer = OrgSenseAnalyzer(client=client, model="gpt-4o")

# Q4 earnings call transcript text
transcript = """
Our cloud infrastructure investment remains our top strategic priority.
We are doubling down on this initiative despite near-term revenue headwinds.
European segment revenue declined 8% year-over-year due to macroeconomic factors.
"""

# Optional: prior-year transcript for Consistency scoring
prior_transcript = """
We are strategically pivoting toward on-premise enterprise solutions
as the core of our long-term value proposition.
"""

# Optional: 10-K MD&A for Audience Divergence scoring
mda_text = """
Management is evaluating strategic priorities and may adjust resource
allocation in the coming fiscal year given market conditions.
"""

result = analyzer.analyze(
    target_text=transcript,
    prior_text=prior_transcript,
    cross_document_text=mda_text,
)

print(f"Composite drift score: {result.composite_drift_score:.2f} / 5.0")
print(f"Consistency:           {result.consistency.score} / 5")
print(f"Commitment escalation: {result.commitment_escalation.score} / 5")
print(f"Hedging asymmetry:     {result.hedging_asymmetry.score} / 5")
print(f"Audience divergence:   {result.audience_divergence.score} / 5")
print(f"\nJustification: {result.composite_justification}")

if result.hallucination_flags:
    print(f"\nWarning: {len(result.hallucination_flags)} unverified evidence quote(s)")

Run the full benchmark

python scripts/run_benchmark.py \
    --test-annotations data/annotations/test_set.jsonl \
    --bundles-dir      data/corpus \
    --output           results/benchmark_results.json \
    --models gpt-4o claude-3-5-sonnet-20241022 \
    --shot-modes zero_shot few_shot \
    --max-docs 100

Build the corpus from SEC EDGAR

python scripts/collect_corpus.py \
    --sp500-tickers    data/sp500_tickers.csv \
    --years 2003 2023 \
    --output-dir       data/corpus \
    --user-agent       "Your Name your@email.edu" \
    --earnings-calls-dir data/earnings_calls

Run predictive validity analyses (Section 6)

python scripts/run_predictive_validity.py \
    --drift-scores results/drift_scores.csv \
    --compustat    data/compustat_annual.csv \
    --restatements data/audit_analytics_restatements.csv \
    --ceo-turnover data/execucomp_departures.csv \
    --output       results/predictive_validity.json

OrgSense prompt architecture (Section 4.2)

The framework uses a four-component prompt structure:

Component Purpose
1. Theoretical Context Communicates the construct's theoretical meaning to the model
2. Dimensional Scaffolding Decomposes strategic drift into four sub-questions answered sequentially
3. Evidence Requirements Requires verbatim quote anchoring for each judgment
4. Output Schema Structured JSON with Likert scores, quotes, confidence, and justification

A calibration block (three annotated examples) is appended by default, reducing hallucination rates from 8.9% to 3.2% (Section 5.4).

Four dimensions of strategic drift

Dimension Theoretical basis F1 (best model)
Consistency Burgelman (2002); Zajac & Shortell (1989) 0.86
Hedging Asymmetry Staw et al. (1983) 0.83
Commitment Escalation Staw (1981) 0.77
Audience Divergence Gioia & Chittipeddi (1991) 0.72

Data

StratDrift-10K corpus

The corpus comprises 10,847 document-year observations across 487 S&P 500 firms (2003–2023). Each observation includes:

  • Q4 earnings call transcript
  • 10-K MD&A section (Item 7)
  • 10-K Risk Factors (Item 1A)
  • 127,412 paragraph-level annotations across four dimensions

Access: Due to data licensing restrictions (Refinitiv Eikon; SEC EDGAR terms), the raw corpus cannot be distributed directly. The annotation JSONL files, codebook, and all prompts are released in this repository. SEC EDGAR 10-K filings can be collected freely using scripts/collect_corpus.py. Earnings call transcripts require a Refinitiv Eikon or equivalent subscription.

Data directory layout

data/
├── sp500_tickers.csv           # S&P 500 constituent tickers (2003–2023)
├── annotations/
│   ├── train_set.jsonl         # Training annotations (70%)
│   ├── val_set.jsonl           # Validation annotations (15%)
│   └── test_set.jsonl          # Test annotations (15%, stratified)
├── few_shot_examples.jsonl     # 8 curated (prompt, response) pairs
├── corpus/                     # Built by collect_corpus.py
│   └── {TICKER}/{YEAR}/
│       ├── earnings_call.txt
│       ├── 10k_mda.txt
│       ├── 10k_risks.txt
│       └── bundle.json
└── codebook/
    └── annotation_codebook.pdf  # Full annotation codebook with examples

Testing

pytest tests/ -v --cov=orgsense --cov-report=term-missing

Citation

@article{yan2024orgsense,
  title   = {Beyond Sentiment: Large Language Models as Organizational Sensors
             for Detecting Strategic Drift in Corporate Communications},
  author  = {Yan, Justin},
  year    = {2024},
  journal = {Working Paper},
  institution = {Baylor University}
}

License

MIT License. See LICENSE for details.

The StratDrift-10K annotation files are released under CC BY 4.0. Raw transcript and filing data is subject to the terms of Refinitiv Eikon and SEC EDGAR respectively.

About

Official implementation of OrgSense — a theory-grounded LLM prompting framework for detecting strategic drift in corporate earnings calls and 10-K filings, with the StratDrift-10K annotation corpus and downstream predictive validity analyses.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages