This is a simple tool that is designed to score financial transactions based on anamolies. The anamolies are trained on an unsurpervised machine learning model and then fed into an LLM to explain each flagged transaction in plain language.
The core idea of this project is to create a simple machine learning model that identifies transactional anamolies and generates an output in plain text language for an analyst at a bank.
The motivation is that anamoly scores alone are difficult for analyst to act on. Thus, an explanation of why a score is a possible fraud alert makes it easier for an analyst to triage.
The code has four key files which are compiled in the fifth file, main.py, to generate the output. The four files are:
- Synthetic Data: Because there was no real data available, this file generates synthetic data and plants them with anamolies.
- Feature Engineering: is a process that computes per users baseline and identifies anamolies based on z-score, rare-country and unusal-hour.
- Scoring: This flags the most unusal transactions in the generated data.
- Explanation: In this, flagged transactions are sent to LLM, which them narrates why each is suspicious.
- Python
- Pandas
- Scikit-Learn
- Anthropic API
- Streamlit
git clone cd transaction-anamoly-detector python3 -m venv venv && source venv/bin/activate pip install -r requirements.txt echo "Antropic-API-Key = Your-key-here" > .env
python generate_data.py python main.py
user_id amount country hour anomaly_score 4516 46 53.48 UK 16 -0.000120 This transaction was flagged because the $53.48 charge is approximately 2x the user's median spend of $28, representing a 1.3 standard deviation increase from their normal spending pattern. Additionally, the transaction occurred in the UK, which is an unusual country for this user's typical transaction history.
- Currently the project uses synthetic data rather than real transactions.
- unusual hour rule is global rather than per user.
- Per user baselines are across all users data, which shows mild data leakage, and a production version should only have prior transactions.