A comprehensive research project evaluating multiple AI text detection approaches for identifying AI-generated and humanized AI text.
- Athena Baseline: 98.82% accuracy
- Athena Improved: 98.92% accuracy
- Athena User Humanized: 98.90% accuracy (specialized for Undetectable.ai)
- TF-IDF (99.17% test, failed in real-world)
- Structure Detector (86.83%)
- Hybrid TF-IDF+Structure (98.80%, failed in real-world)
- Perplexity Single Feature (90%, failed)
- Enhanced Perplexity (70.30%, failed)
- Transformer (99.84% test, failed in real-world)
- Athena Baseline (98.82%, SUCCESS)
- Athena Improved (98.92%, SUCCESS)
- Athena User Humanized (98.90%, SUCCESS)
- Test accuracy does not equal real-world performance
- Dataset quality matters more than model complexity
- Humanizer detection is tool-specific, not universal
- Training on Undetectable.ai samples enables detection of that specific humanizer
- Python 3.8+
- CUDA-capable GPU (recommended for training)
- Clone this repository:
git clone https://github.com/yourusername/scifair.git
cd scifair
- Install dependencies:
pip install -r requirements.txt
- For GPU support with PyTorch, visit PyTorch.org for CUDA-specific installation instructions.
scifair/
├── analysis/ # Analysis scripts for model behavior
├── docs/ # Documentation and research findings
├── results/ # JSON result files from experiments
├── scripts/
│ ├── training/ # Model training scripts (11 files)
│ │ ├── athena_train*.py
│ │ ├── *_detector.py
│ │ └── retrain_detectors.py
│ ├── testing/ # Model testing scripts (10 files)
│ │ └── test_*.py
│ └── analysis/ # Script analysis tools
└── util/ # Utility functions
# Test baseline model
python scripts/testing/test_athena.py
# Test with adjusted threshold (5% instead of 50%)
python scripts/testing/test_athena_threshold.py baseline
# Test Undetectable.ai specialist
python scripts/testing/test_athena_threshold.py user

# Train Athena baseline
python scripts/training/athena_train.py
# Train improved version
python scripts/training/athena_train_improved.py
# Train specialized humanized detector
python scripts/training/athena_train_user_humanized.py

Note: Large model files and datasets are excluded from this repository due to size constraints.
You'll need to prepare your own datasets with the following structure:
- Training data: CSV files with text and label columns
- Label 0: Human-written text
- Label 1: AI-generated text
- Label 2: Humanized AI text (optional, for specialized models)
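Since the datasets are not shipped with the repository, it can help to check a prepared file against the layout above before training. The following is a minimal sketch using only the standard library; the function names `load_labeled_rows` and `label_counts` and the `LABEL_NAMES` mapping are hypothetical helpers for illustration and are not part of this project's scripts.

```python
import csv
import io
from collections import Counter

# Label meanings as listed in this README (names are illustrative).
LABEL_NAMES = {0: "human", 1: "ai", 2: "humanized"}


def load_labeled_rows(handle):
    """Read (text, label) pairs from a CSV with text and label columns.

    Raises ValueError on a missing column, empty text, or unknown label.
    """
    reader = csv.DictReader(handle)
    missing = {"text", "label"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")

    rows = []
    for row in reader:
        line_no = reader.line_num  # last physical line read for this row
        text = (row["text"] or "").strip()
        try:
            label = int(row["label"])
        except (TypeError, ValueError):
            raise ValueError(f"line {line_no}: label is not an integer") from None
        if not text:
            raise ValueError(f"line {line_no}: empty text")
        if label not in LABEL_NAMES:
            raise ValueError(f"line {line_no}: unknown label {label}")
        rows.append((text, label))
    return rows


def label_counts(rows):
    """Count rows per label name to spot class imbalance."""
    return Counter(LABEL_NAMES[label] for _, label in rows)


if __name__ == "__main__":
    sample = io.StringIO(
        "text,label\n"
        '"Human written text example",0\n'
        '"AI generated text example",1\n'
        '"Humanized AI text example",2\n'
    )
    print(label_counts(load_labeled_rows(sample)))
```

To check a real file, pass an open handle instead of the in-memory sample, for example `open(path, newline="", encoding="utf-8")`, where `path` is whatever location you chose for your own data.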
text,label
"Human written text example",0
"AI generated text example",1
"Humanized AI text example",2

- docs/RESULTS_SUMMARY.md: Complete results summary
- docs/COMPLETE_RESULTS.md: Detailed analysis
- results/: JSON result files from all experiments
- PROJECT_STRUCTURE.md: Detailed explanation of project structure
This project systematically evaluated various approaches to AI text detection:
- Traditional ML: TF-IDF with Logistic Regression
- Structural Analysis: Sentence length, punctuation patterns
- Perplexity-based: Using GPT-2 perplexity scores
- Transformer-based: Fine-tuned DistilBERT models (Athena)
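To make the structural analysis approach more concrete, here is a rough sketch of how sentence-length and punctuation features might be computed from raw text. It is an illustration only: the feature names, the regex-based sentence splitter, and the `structural_features` function are assumptions, not the feature set used by this project's Structure Detector.

```python
import re
import statistics

# Naive sentence boundary: whitespace that follows ., ! or ?
# (an assumption; abbreviations and quotes are not handled).
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def structural_features(text):
    """Return simple sentence-length and punctuation statistics for text."""
    sentences = [s for s in _SENTENCE_END.split(text.strip()) if s]
    lengths = [len(s.split()) for s in sentences]  # words per sentence
    n_chars = max(len(text), 1)  # avoid division by zero on empty input
    return {
        "num_sentences": len(sentences),
        "mean_sentence_len": statistics.fmean(lengths) if lengths else 0.0,
        "std_sentence_len": statistics.pstdev(lengths) if len(lengths) > 1 else 0.0,
        "comma_rate": text.count(",") / n_chars,
        "semicolon_rate": text.count(";") / n_chars,
        "question_rate": text.count("?") / n_chars,
    }


if __name__ == "__main__":
    print(structural_features("Hi there. How are you today? Fine."))
```

A dictionary like this could be turned into a feature vector and passed to any standard classifier. Very uniform sentence lengths (a low `std_sentence_len`) are one signal such a detector might pick up on, though the accuracy figures above suggest that structure alone is weaker than the transformer-based models.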
This project builds upon the Athena AI detector framework. The original Athena dataset and baseline model provided the foundation for our improvements and specialized variants.
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions, issues, and feature requests are welcome! Feel free to check the issues page.
This research is for educational purposes. AI detection is an evolving field, and no detector is 100% accurate. Use these tools responsibly and in conjunction with other verification methods.