A REST API for detecting plagiarism in source code files, built with FastAPI for the API layer and RabbitMQ with Celery-style workers for background processing.
Using the REST API, users submit code files for plagiarism analysis. The worker processes the files asynchronously and stores the results in PostgreSQL. If an error occurs during processing, the task is sent to a dead-letter queue for later review.
Python 3.11+ is required for the web application to work correctly.
- src/ - FastAPI application with plagiarism detection endpoints
- worker/ - Background worker for processing plagiarism checks
- frontend/ - React frontend application
- database/ - Database migrations
- docker-compose.yml - Docker orchestration for all services
- scripts/ - Setup and testing scripts
- Docker
- Docker Compose
Run the comprehensive setup script:
./scripts/setup-complete.sh
This will:
- Check all prerequisites
- Generate a secure .env file with random passwords
- Create necessary directories
- Build all Docker images
- Start all services (API, Worker, Database, RabbitMQ)
- Wait for services to be healthy
- Run a quick health check
That's it! Your application will be running at:
- API: http://localhost:8000
- API Docs: http://localhost:8000/docs
- RabbitMQ Management: http://localhost:15672
For development with hot reload enabled:
./scripts/setup-complete.sh dev
This starts the API with hot reload for rapid development.
If you prefer manual control:
# Generate environment file
./setup.sh
# Build and start all services
docker-compose up -d --build
# View logs
docker-compose logs -f
See the Manual Setup section below.
To verify everything is working correctly:
./scripts/test-integration.sh
This will test:
- API health endpoints
- Database connectivity
- RabbitMQ connectivity
- File upload functionality
- Plagiarism check workflow
- Frontend serving

curl http://localhost:8000/health

The frontend is built with React and TypeScript.
Prerequisites: Node.js 20.19+ or 22.12+ is required.
cd frontend
npm install
npm run dev
The frontend dev server runs at http://localhost:3000.
For API development with hot reload:
./scripts/setup-complete.sh dev
Or manually:
docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d

cd src
pytest

./scripts/test-integration.sh

# Run 10 concurrent health checks
for i in {1..10}; do
curl -s http://localhost:8000/health &
done
wait

When the API is running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
The API supports authentication via API keys in addition to JWT tokens. This is useful for programmatic access, CI/CD pipelines, and third-party integrations.
- Log in via /auth/login to get a JWT token
- POST to /auth/api-keys with {"name": "my-key-name", "expires_in_days": 30}
- Save the returned raw_key value; it won't be shown again!
Example:
# First, get a JWT token
curl -X POST http://localhost:8000/plagitype/auth/login \
-H "Content-Type: application/json" \
-d '{"email":"admin@example.com","password":"your-password"}'
# Use the token to create an API key
curl -X POST http://localhost:8000/plagitype/auth/api-keys \
-H "Authorization: Bearer YOUR_JWT_TOKEN" \
-H "Content-Type: application/json" \
-d '{"name":"My API Key","expires_in_days":30}'

Include the X-API-Key header in your requests:
curl -H "X-API-Key: YOUR_API_KEY" \
http://localhost:8000/plagitype/plagiarism/tasks

- List keys: GET /auth/api-keys (requires authentication)
- Create key: POST /auth/api-keys (requires authentication)
- Revoke key: DELETE /auth/api-keys/{key_id} (requires authentication)
API keys can also be managed through the web UI in the Settings page at /dashboard/settings.
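For programmatic access from Python, the same X-API-Key request can be sketched with the standard library alone. The base URL and path are copied from the curl example; the helper names `build_tasks_request` and `fetch_tasks` are hypothetical, and decoding the response as JSON is an assumption, since the response format is not described here.

```python
import json
import urllib.request

# Assumption: base URL and path are taken from the curl example above.
BASE_URL = "http://localhost:8000/plagitype"


def build_tasks_request(api_key: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a GET request authenticated with an API key."""
    return urllib.request.Request(
        f"{base_url}/plagiarism/tasks",
        headers={"X-API-Key": api_key},
        method="GET",
    )


def fetch_tasks(api_key: str):
    """Send the request and decode the body, assuming the API answers with JSON."""
    with urllib.request.urlopen(build_tasks_request(api_key), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the header logic easy to check without a running API.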
- API keys have the same permissions as the user who created them
- Keys are hashed (SHA-256) before storage - the raw key is only returned once during creation
- Expired keys are automatically rejected
- Revoked keys are permanently deleted
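The hash-before-storage and expiry rules above can be illustrated with a short sketch. This is not the project's actual implementation: the function names, the record layout, and the 32-byte key length are assumptions chosen for illustration. Only the SHA-256 hashing, the one-time return of the raw key, and the rejection of expired keys come from the notes above.

```python
import hashlib
import hmac
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional


def create_api_key(expires_in_days: int, now: Optional[datetime] = None):
    """Return (raw_key, record); only the record would be persisted."""
    now = now or datetime.now(timezone.utc)
    raw_key = secrets.token_urlsafe(32)  # shown to the user exactly once
    record = {
        "key_hash": hashlib.sha256(raw_key.encode("utf-8")).hexdigest(),
        "expires_at": now + timedelta(days=expires_in_days),
    }
    return raw_key, record


def is_valid(presented_key: str, record: dict, now: Optional[datetime] = None) -> bool:
    """Check a presented key against a stored record (hypothetical layout)."""
    now = now or datetime.now(timezone.utc)
    if now >= record["expires_at"]:
        return False  # expired keys are rejected
    presented_hash = hashlib.sha256(presented_key.encode("utf-8")).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(presented_hash, record["key_hash"])
```

Because only the digest is stored, a lost raw key cannot be recovered from the database; the user would have to create a new one.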
The system uses a producer-consumer pattern:
- API receives file uploads and publishes tasks to RabbitMQ
- Worker consumes tasks from the queue and performs plagiarism analysis
- Database stores task status and results
- Dead Letter Queue handles failed tasks for retry or review
- Inverted Index (Redis) enables fast candidate filtering for cross-task comparisons
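As a rough illustration of the producer side and the dead-letter routing described above, the sketch below uses the pika client. The project's actual client library, exchange type, and message payload are not stated here, so `pika`, the `direct` exchange type, the helper names, and the `task_id`/`files` payload are all assumptions. The environment variable names and default values are the ones this README lists under manual setup.

```python
import json
import os
from typing import Mapping


def queue_arguments(env: Mapping[str, str] = os.environ) -> dict:
    """Queue arguments that route rejected messages to a dead-letter exchange."""
    return {
        "x-dead-letter-exchange": env.get("RMQ_QUEUE_DEAD_LETTER_EXCHANGE", "plagiarism_dlx"),
        "x-dead-letter-routing-key": env.get("RMQ_QUEUE_ROUTING_KEY_DEAD_LETTER", "plagiarism.dead"),
    }


def encode_task(task_id: str, file_paths: list) -> bytes:
    """Serialize a task message; this payload shape is hypothetical."""
    return json.dumps({"task_id": task_id, "files": file_paths}).encode("utf-8")


def publish_task(task_id: str, file_paths: list, env: Mapping[str, str] = os.environ) -> None:
    """Producer side: declare the main queue and publish one persistent message."""
    import pika  # third-party client (assumed), imported lazily

    params = pika.ConnectionParameters(
        host=env.get("RMQ_HOST", "localhost"),
        port=int(env.get("RMQ_PORT", "5672")),
        credentials=pika.PlainCredentials(
            env.get("RMQ_USER", "guest"), env.get("RMQ_PASSWORD", "guest")
        ),
    )
    connection = pika.BlockingConnection(params)
    try:
        channel = connection.channel()
        exchange = env.get("RMQ_QUEUE_EXCHANGE", "plagiarism")
        queue = env.get("RMQ_QUEUE_NAME", "plagiarism_queue")
        routing_key = env.get("RMQ_QUEUE_ROUTING_KEY", "plagiarism")
        # Assumption: a durable direct exchange; the real type is not documented.
        channel.exchange_declare(exchange=exchange, exchange_type="direct", durable=True)
        channel.queue_declare(queue=queue, durable=True, arguments=queue_arguments(env))
        channel.queue_bind(queue=queue, exchange=exchange, routing_key=routing_key)
        channel.basic_publish(
            exchange=exchange,
            routing_key=routing_key,
            body=encode_task(task_id, file_paths),
            properties=pika.BasicProperties(delivery_mode=2),  # 2 = persistent
        )
    finally:
        connection.close()
```

With these queue arguments, RabbitMQ forwards a message to the dead-letter exchange when a consumer rejects it without requeueing. The dead-letter exchange and its queue still need to be declared and bound separately, which this sketch leaves out.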
The system includes an Inverted Index for efficient cross-task plagiarism detection:
- Without Inverted Index: O(n×m) comparisons (new files × all existing files)
- With Inverted Index: only O(n×k) comparisons (new files × viable candidates)
The inverted index uses Redis to store fingerprint-to-files mappings, allowing the system to:
- Index all file fingerprints as they're processed
- Quickly find candidate files that share significant fingerprint overlap
- Skip detailed analysis for files below the similarity threshold (default: 15%)
The saving is largest when the database contains thousands of files and only a few of them share fingerprints with a new upload, so that k is much smaller than m.
Configuration:
- Set INVERTED_INDEX_MIN_OVERLAP_THRESHOLD in .env (default: 0.15 for 15%)
- Lower values = more thorough but slower
- Higher values = faster but may miss borderline cases
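The candidate-filtering idea can be sketched in a few lines. This version keeps the fingerprint-to-files mapping in a plain dictionary instead of Redis, and it defines overlap as the share of the new file's fingerprints that also occur in a candidate. Both choices are assumptions made for illustration; the class and method names are hypothetical, and the project's real overlap measure may differ.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class InvertedIndex:
    """In-memory stand-in for the Redis fingerprint-to-files mapping."""

    def __init__(self, min_overlap: float = 0.15) -> None:
        self.min_overlap = min_overlap
        self._files_by_fingerprint: Dict[int, Set[str]] = defaultdict(set)

    def add(self, file_id: str, fingerprints: Set[int]) -> None:
        """Index every fingerprint of a processed file."""
        for fp in fingerprints:
            self._files_by_fingerprint[fp].add(file_id)

    def candidates(self, fingerprints: Set[int]) -> List[Tuple[str, float]]:
        """Return (file_id, overlap) pairs at or above the threshold, best first."""
        if not fingerprints:
            return []
        shared: Dict[str, int] = defaultdict(int)
        for fp in fingerprints:
            # Only files sharing at least one fingerprint are ever touched.
            for file_id in self._files_by_fingerprint.get(fp, ()):
                shared[file_id] += 1
        scored = [(fid, count / len(fingerprints)) for fid, count in shared.items()]
        kept = [(fid, score) for fid, score in scored if score >= self.min_overlap]
        return sorted(kept, key=lambda pair: (-pair[1], pair[0]))
```

For example, after indexing a file with fingerprints {1, 2, 3, 4} and another with {4, 50, 60}, querying with ten fingerprints 1 through 10 keeps only the first file (overlap 0.4); the second shares one of ten fingerprints (0.1) and falls below the 0.15 threshold, so it would skip detailed analysis.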
┌─────────────┐     ┌──────────────┐     ┌─────────────┐
│   Client    │────▶│  API (8000)  │────▶│ PostgreSQL  │
└─────────────┘     └──────┬───────┘     └─────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │   RabbitMQ   │
                    └──────┬───────┘
                           │
                           ▼
                    ┌──────────────┐
                    │    Worker    │
                    └──────────────┘
- Go to the src directory:
cd src
- Create the .env file:
touch .env
- Add environment variables:
DB_HOST=localhost
DB_PORT=5432
DB_NAME=plagiarism_db
DB_USER=appuser
DB_PASS=password
RMQ_HOST=localhost
RMQ_PORT=5672
RMQ_USER=guest
RMQ_PASSWORD=guest
RMQ_QUEUE_EXCHANGE=plagiarism
RMQ_QUEUE_ROUTING_KEY=plagiarism
RMQ_QUEUE_NAME=plagiarism_queue
RMQ_QUEUE_DEAD_LETTER_EXCHANGE=plagiarism_dlx
RMQ_QUEUE_ROUTING_KEY_DEAD_LETTER=plagiarism.dead
RMQ_QUEUE_DEAD_LETTER_NAME=plagiarism_dead
- Install packages:
pip install -r requirements.txt
- Run the API:
uvicorn app:app --reload

- Go to the worker directory:
cd worker
- Create a .env file with the same variables as above
- Install packages:
pip install -r requirements.txt
- Run the worker:
python3 worker.py

To set up the database schema:
- Go to the database directory:
cd database
- Create the .env file:
DB_HOST=localhost
DB_PORT=5432
DB_NAME=plagiarism_db
DB_USER=appuser
DB_PASS=password
- Run migrations:
alembic upgrade head

# Build and start all services
docker-compose up -d --build
# View logs
docker-compose logs -f
# View specific service logs
docker-compose logs -f api
# Stop all services
docker-compose down
# Stop and remove volumes (clears all data)
docker-compose down -v
# Restart a service
docker-compose restart api
# Scale workers
docker-compose up -d --scale worker=4

The setup script automatically generates secure passwords for:
- Database password
- RabbitMQ password
- API secret key
Never commit the .env file! It's already in .gitignore.
See .env.example for all available environment variables and their descriptions.
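As one way to consume these variables from Python, the sketch below reads the DB_* settings shown in the manual setup into a small dataclass and builds a PostgreSQL connection URL. The class name, the choice to treat DB_NAME, DB_USER, and DB_PASS as required, and the URL format are assumptions for illustration, not the project's actual configuration code.

```python
import os
from dataclasses import dataclass
from typing import Mapping
from urllib.parse import quote


@dataclass(frozen=True)
class DatabaseSettings:
    host: str
    port: int
    name: str
    user: str
    password: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "DatabaseSettings":
        """Read the DB_* variables, failing loudly if a required one is missing."""
        missing = [k for k in ("DB_NAME", "DB_USER", "DB_PASS") if k not in env]
        if missing:
            raise KeyError(f"missing environment variables: {', '.join(missing)}")
        return cls(
            host=env.get("DB_HOST", "localhost"),
            port=int(env.get("DB_PORT", "5432")),
            name=env["DB_NAME"],
            user=env["DB_USER"],
            password=env["DB_PASS"],
        )

    def url(self) -> str:
        """Connection URL; user and password are percent-encoded."""
        user = quote(self.user, safe="")
        password = quote(self.password, safe="")
        return f"postgresql://{user}:{password}@{self.host}:{self.port}/{self.name}"
```

Percent-encoding matters here because generated passwords may contain characters such as `@` or `/` that would otherwise break the URL.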
- Fork the repository
- Create a feature branch
- Run tests: ./scripts/test-integration.sh
- Submit a pull request
This project is for educational use at the university.
# Check what's running
docker-compose ps
# View logs
docker-compose logs <service-name>
# Restart everything (note: down -v also removes volumes, clearing all data)
docker-compose down -v
docker-compose up -d --build

The frontend is built into the API Docker image. If you need to rebuild:
docker-compose build --no-cache api
docker-compose up -d

Make sure the database is healthy before starting the API:
docker-compose up -d postgres
sleep 10
docker-compose up -d api worker
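A fixed `sleep 10` can be too short on a slow machine and needlessly long on a fast one. A possible alternative is to poll until the database port accepts connections, as sketched below. The helper names are hypothetical, and an open TCP port is only a rough readiness signal: PostgreSQL may accept connections slightly before it is ready to serve queries.

```python
import socket
import time
from typing import Callable


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_until(
    check: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll check() until it returns True or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval)
```

Assuming PostgreSQL is published on localhost:5432 as in the manual setup, `wait_until(lambda: port_open("localhost", 5432))` would replace the sleep and return False if the database never comes up within a minute.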