Source Code Plagiarism Detection API

About

A REST API for detecting plagiarism in source code files. It is built with FastAPI, and uses RabbitMQ with Celery-style workers for background processing.

Using the REST API, users submit code files for plagiarism analysis. The worker processes the files asynchronously and stores the results in PostgreSQL. If an error occurs during processing, the task is sent to a dead-letter queue for later review.

Python 3.11 or later is required for the web application to work correctly.

Repository Structure:

src/ - FastAPI application with plagiarism detection endpoints
worker/ - Background worker for processing plagiarism checks
frontend/ - React frontend application
database/ - Database migrations
docker-compose.yml - Docker orchestration for all services
scripts/ - Setup and testing scripts

🚀 Quick Start (One-Command Setup)

Prerequisites

Docker
Docker Compose

Option 1: Automatic Setup (Recommended)

Run the comprehensive setup script:

./scripts/setup-complete.sh

This will:

✅ Check all prerequisites
✅ Generate secure .env file with random passwords
✅ Create necessary directories
✅ Build all Docker images
✅ Start all services (API, Worker, Database, RabbitMQ)
✅ Wait for services to be healthy
✅ Run a quick health check

That's it! Your application will be running at:

API: http://localhost:8000
API Docs: http://localhost:8000/docs
RabbitMQ Management: http://localhost:15672

Option 2: Development Mode with Hot Reload

For development with hot reload enabled:

./scripts/setup-complete.sh dev

This starts the API with hot reload for rapid development.

Option 3: Manual Docker Setup

If you prefer manual control:

# Generate environment file
./setup.sh

# Build and start all services
docker-compose up -d --build

# View logs
docker-compose logs -f

Option 4: Manual Setup (Without Docker)

See the Manual Setup section below.

🧪 Testing

Run Integration Tests

To verify everything is working correctly:

./scripts/test-integration.sh

This will test:

✅ API health endpoints
✅ Database connectivity
✅ RabbitMQ connectivity
✅ File upload functionality
✅ Plagiarism check workflow
✅ Frontend serving

Run Smoke Test (Quick)

curl http://localhost:8000/health

🛠️ Development

Frontend Development

The frontend is built with React and TypeScript.

Prerequisites: Node.js 20.19+ or 22.12+ is required.

cd frontend
npm install
npm run dev

The frontend dev server runs at http://localhost:3000

API Development

For API development with hot reload:

./scripts/setup-complete.sh dev

Or manually:

docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d

Running Tests

Unit Tests

cd src
pytest

Integration Tests

./scripts/test-integration.sh

Load Tests

# Run 10 concurrent health checks
for i in {1..10}; do
  curl -s http://localhost:8000/health &
done
wait
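The same check can be sketched in Python when you want the status codes collected in one place. This is a minimal illustration, not part of the project's scripts: the function names are hypothetical, and only the /health URL comes from this README.

```python
from concurrent.futures import ThreadPoolExecutor
import urllib.request


def fetch_status(url):
    """Return the HTTP status code for a single GET request."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.status


def run_concurrent(url, n=10, fetch=fetch_status):
    """Issue n requests in parallel threads and return their results in order."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return list(pool.map(fetch, [url] * n))


# Usage, with the API running locally:
#   run_concurrent("http://localhost:8000/health")
```

The fetch parameter is only there so the helper can be exercised without a running server.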

📊 API Documentation

When the API is running, visit:

Swagger UI: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

API Key Authentication

The API supports authentication via API keys in addition to JWT tokens. This is useful for programmatic access, CI/CD pipelines, and third-party integrations.

Creating an API Key

Log in via /auth/login to get a JWT token
POST to /auth/api-keys with {"name": "my-key-name", "expires_in_days": 30}
Save the returned raw_key value - it won't be shown again!

Example (note that the URLs here include a /plagitype prefix before the paths listed above):

# First, get a JWT token
curl -X POST http://localhost:8000/plagitype/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email":"admin@example.com","password":"your-password"}'

# Use the token to create an API key
curl -X POST http://localhost:8000/plagitype/auth/api-keys \
  -H "Authorization: Bearer YOUR_JWT_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name":"My API Key","expires_in_days":30}'

Using an API Key

Include the X-API-Key header in your requests:

curl -H "X-API-Key: YOUR_API_KEY" \
  http://localhost:8000/plagitype/plagiarism/tasks
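For programmatic access, the same request can be made from Python with only the standard library. This is a sketch under assumptions: the helper names are hypothetical, and the response is assumed to be JSON, which this README does not state.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/plagitype"


def build_request(path, api_key):
    """Build a GET request that carries the X-API-Key header."""
    return urllib.request.Request(BASE_URL + path, headers={"X-API-Key": api_key})


def list_tasks(api_key):
    """Fetch the task list; assumes the endpoint returns a JSON body."""
    request = build_request("/plagiarism/tasks", api_key)
    with urllib.request.urlopen(request, timeout=10) as resp:
        return json.load(resp)


# Usage, with the API running locally:
#   tasks = list_tasks("YOUR_API_KEY")
```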

Managing API Keys

List keys: GET /auth/api-keys (requires authentication)
Create key: POST /auth/api-keys (requires authentication)
Revoke key: DELETE /auth/api-keys/{key_id} (requires authentication)

API keys can also be managed through the web UI in the Settings page at /dashboard/settings.

Security Notes

API keys have the same permissions as the user who created them
Keys are hashed (SHA-256) before storage - the raw key is only returned once during creation
Expired keys are automatically rejected
Revoked keys are permanently deleted
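The hash-before-storage idea can be illustrated as follows. This is a minimal sketch, not the project's actual code: the function names and the key length are assumptions, and only the use of SHA-256 comes from the notes above.

```python
import hashlib
import hmac
import secrets


def hash_api_key(raw_key):
    """Return the SHA-256 hex digest that would be stored instead of the raw key."""
    return hashlib.sha256(raw_key.encode("utf-8")).hexdigest()


def generate_api_key():
    """Create a random key; the raw value is shown once, only the hash is kept."""
    raw_key = secrets.token_urlsafe(32)  # key length is an assumption
    return raw_key, hash_api_key(raw_key)


def verify_api_key(presented_key, stored_hash):
    """Hash the presented key and compare it to the stored hash in constant time."""
    return hmac.compare_digest(hash_api_key(presented_key), stored_hash)
```

Because only the digest is stored, a lost raw key cannot be recovered and has to be replaced with a new one.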

🏗️ Architecture

The system uses a producer-consumer pattern:

API receives file uploads and publishes tasks to RabbitMQ
Worker consumes tasks from the queue and performs plagiarism analysis
Database stores task status and results
Dead Letter Queue handles failed tasks for retry or review
Inverted Index (Redis) enables fast candidate filtering for cross-task comparisons
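The flow above can be sketched with an in-memory queue standing in for RabbitMQ. Everything here is illustrative: the task shape, the demo_analyze function, and the plain list used as a dead-letter store are assumptions, not the worker's real implementation.

```python
import queue


def run_worker(tasks, analyze, results, dead_letters):
    """Drain the queue: store each result, or dead-letter the task on failure."""
    while True:
        try:
            task = tasks.get_nowait()
        except queue.Empty:
            return
        try:
            results[task["id"]] = analyze(task)
        except Exception as exc:
            # A failed task is set aside for later review instead of being lost.
            dead_letters.append({"task": task, "error": str(exc)})


def demo_analyze(task):
    """Hypothetical analysis step that fails when no files were submitted."""
    if not task["files"]:
        raise ValueError("no files submitted")
    return {"status": "completed", "file_count": len(task["files"])}


# The API side (producer) would put tasks on the queue:
#   tasks = queue.Queue()
#   tasks.put({"id": 1, "files": ["a.py", "b.py"]})
#   run_worker(tasks, demo_analyze, results={}, dead_letters=[])
```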

Performance Optimization

The system includes an Inverted Index for efficient cross-task plagiarism detection:

Without Inverted Index: O(n×m) comparisons (new files × all existing files)
With Inverted Index: Only O(n×k) comparisons (new files × viable candidates)

The inverted index uses Redis to store fingerprint-to-files mappings, allowing the system to:

Index all file fingerprints as they're processed
Quickly find candidate files that share significant fingerprint overlap
Skip detailed analysis for files below the minimum overlap threshold (default: 15%)

This dramatically reduces processing time when the database contains thousands of files.

Configuration:

Set INVERTED_INDEX_MIN_OVERLAP_THRESHOLD in .env (default: 0.15 for 15%)
Lower values = more thorough but slower
Higher values = faster but may miss borderline cases
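The candidate filtering idea can be sketched with a plain dictionary in place of Redis. This is an illustration under assumptions: the class and method names are hypothetical, fingerprints are treated as opaque hashable values, and overlap is assumed to mean the share of the new file's fingerprints found in an existing file, which the project may define differently.

```python
from collections import defaultdict


class InvertedIndex:
    """In-memory stand-in for the Redis-backed fingerprint index."""

    def __init__(self, min_overlap=0.15):
        self.min_overlap = min_overlap
        self._files_by_fp = defaultdict(set)  # fingerprint -> ids of files containing it

    def add(self, file_id, fingerprints):
        """Index every fingerprint of a processed file."""
        for fp in set(fingerprints):
            self._files_by_fp[fp].add(file_id)

    def candidates(self, fingerprints):
        """Return {file_id: overlap} for files that meet the overlap threshold."""
        fps = set(fingerprints)
        if not fps:
            return {}
        shared = defaultdict(int)
        for fp in fps:
            for file_id in self._files_by_fp.get(fp, ()):
                shared[file_id] += 1
        overlaps = {file_id: count / len(fps) for file_id, count in shared.items()}
        return {fid: ov for fid, ov in overlaps.items() if ov >= self.min_overlap}
```

Only the files returned by candidates() would go on to the detailed comparison; files that share few or no fingerprints with the new file are never touched.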

Service Architecture

┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Client    │────▶│   API (8000) │────▶│  PostgreSQL │
└─────────────┘     └──────┬───────┘     └─────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │   RabbitMQ   │
                    └──────┬───────┘
                           │
                           ▼
                    ┌──────────────┐
                    │    Worker    │
                    └──────────────┘

🔧 Manual Setup

Start the API

Go to the src directory:

cd src

Create .env file:

touch .env

Add environment variables:

DB_HOST=localhost
DB_PORT=5432
DB_NAME=plagiarism_db
DB_USER=appuser
DB_PASS=password

RMQ_HOST=localhost
RMQ_PORT=5672
RMQ_USER=guest
RMQ_PASSWORD=guest

RMQ_QUEUE_EXCHANGE=plagiarism
RMQ_QUEUE_ROUTING_KEY=plagiarism
RMQ_QUEUE_NAME=plagiarism_queue
RMQ_QUEUE_DEAD_LETTER_EXCHANGE=plagiarism_dlx
RMQ_QUEUE_ROUTING_KEY_DEAD_LETTER=plagiarism.dead
RMQ_QUEUE_DEAD_LETTER_NAME=plagiarism_dead
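The dead-letter variables typically end up as queue arguments on the main queue. The sketch below shows that mapping; x-dead-letter-exchange and x-dead-letter-routing-key are standard RabbitMQ queue arguments, but the function name is hypothetical and this README does not show how the project actually declares its queues.

```python
import os


def main_queue_arguments(env=None):
    """Build the arguments that route rejected messages to the dead-letter exchange."""
    env = os.environ if env is None else env
    return {
        "x-dead-letter-exchange": env["RMQ_QUEUE_DEAD_LETTER_EXCHANGE"],
        "x-dead-letter-routing-key": env["RMQ_QUEUE_ROUTING_KEY_DEAD_LETTER"],
    }


# Usage: pass the result as the arguments of the declaration for RMQ_QUEUE_NAME,
# and bind RMQ_QUEUE_DEAD_LETTER_NAME to the dead-letter exchange separately.
```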

Install packages:

pip install -r requirements.txt

Run the API:

uvicorn app:app --reload

Start the Worker

Go to the worker directory:

cd worker

Create .env file with the same variables as above.

Install packages:

pip install -r requirements.txt

Run the worker:

python3 worker.py

🗄️ Database Setup

To set up the database schema:

Go to the database directory:

cd database

Create .env file:

DB_HOST=localhost
DB_PORT=5432
DB_NAME=plagiarism_db
DB_USER=appuser
DB_PASS=password

Run migrations:

alembic upgrade head

🐳 Docker Commands

# Build and start all services
docker-compose up -d --build

# View logs
docker-compose logs -f

# View specific service logs
docker-compose logs -f api

# Stop all services
docker-compose down

# Stop and remove volumes (clears all data)
docker-compose down -v

# Restart a service
docker-compose restart api

# Scale workers
docker-compose up -d --scale worker=4

🔒 Security

The setup script automatically generates secure passwords for:

Database password
RabbitMQ password
API secret key

Never commit the .env file! It's already in .gitignore.

📝 Environment Variables

See .env.example for all available environment variables and their descriptions.

🤝 Contributing

Fork the repository
Create a feature branch
Run tests: ./scripts/test-integration.sh
Submit a pull request

📄 License

This project is for educational use at the university.

🆘 Troubleshooting

Services won't start

# Check what's running
docker-compose ps

# View logs
docker-compose logs <service-name>

# Restart everything (note: -v also removes volumes and clears all data)
docker-compose down -v
docker-compose up -d --build

Frontend not showing

The frontend is built into the API Docker image. If you need to rebuild:

docker-compose build --no-cache api
docker-compose up -d

Database connection issues

Make sure the database is healthy before starting the API:

docker-compose up -d postgres
sleep 10
docker-compose up -d api worker
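A fixed sleep can be too short on a slow machine. As an alternative, a small script can poll the database port until it accepts connections. This is a sketch, not part of the repository: the function name is hypothetical, port 5432 is taken from the manual setup section and assumes the port is published on localhost, and an open port does not guarantee that PostgreSQL has finished initializing.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection succeeds, or False after the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Usage, between starting postgres and starting the api and worker:
#   if not wait_for_port("localhost", 5432):
#       raise SystemExit("database port did not open in time")
```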
