Transaction Anomaly Detector with LLM Explanations

This is a simple tool that scores financial transactions for anomalies. An unsupervised machine learning model flags the anomalous transactions, and each flagged transaction is then passed to an LLM that explains it in plain language.

Overview

The core idea of this project is to build a simple machine learning model that identifies anomalous transactions and produces a plain-language explanation of each one for an analyst at a bank.

Motivation

Anomaly scores alone are difficult for an analyst to act on. An explanation of why a transaction was scored as a possible fraud alert makes it easier for the analyst to triage.

How it works

The code has four key files, which a fifth file, main.py, ties together to generate the output. The four files are:

Synthetic Data (generate_data.py): No real data was available, so this file generates synthetic transactions and plants anomalies in them.
Feature Engineering (features.py): Computes a per-user baseline and derives anomaly signals from it: amount z-score, rare country, and unusual hour.
Scoring (score.py): Flags the most unusual transactions in the generated data.
Explanation (explain.py): Sends the flagged transactions to an LLM, which then narrates why each one is suspicious.
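The feature engineering step could look roughly like the sketch below. This is an illustration, not the contents of features.py: the function name add_features, the toy data, the night-time window (hours 0 to 5), and the 30% rare-country cutoff are all assumptions. Only the column names user_id, amount, country, and hour come from the example output in this README.

```python
import pandas as pd

# Hypothetical toy data; column names follow the example output in this README.
tx = pd.DataFrame({
    "user_id": [1, 1, 1, 1, 2, 2, 2, 2],
    "amount":  [20.0, 22.0, 18.0, 200.0, 50.0, 55.0, 45.0, 52.0],
    "country": ["US", "US", "US", "UK", "FR", "FR", "FR", "FR"],
    "hour":    [10, 12, 14, 3, 9, 11, 13, 15],
})


def add_features(df, night_start=0, night_end=5, rare_share=0.3):
    """Add amount_z, rare_country and unusual_hour columns (assumed names)."""
    out = df.copy()

    # Per-user baseline: mean and sample standard deviation of the amount.
    by_user = out.groupby("user_id")["amount"]
    mean = by_user.transform("mean")
    std = by_user.transform("std")

    # z-score against the user's own baseline; a zero or undefined
    # standard deviation yields a z-score of 0.
    out["amount_z"] = ((out["amount"] - mean) / std.where(std > 0)).fillna(0.0)

    # Rare country: the country accounts for less than rare_share of the
    # user's transactions (the cutoff is an assumption).
    per_country = out.groupby(["user_id", "country"])["amount"].transform("count")
    per_user = by_user.transform("count")
    out["rare_country"] = (per_country / per_user < rare_share).astype(int)

    # Unusual hour: one global night-time window for all users
    # (the window bounds are an assumption).
    out["unusual_hour"] = out["hour"].between(night_start, night_end).astype(int)
    return out


features = add_features(tx)
print(features[["user_id", "amount_z", "rare_country", "unusual_hour"]])
```

In this toy data only the 200.00 UK transaction at hour 3 raises all three signals. Columns like these could then be handed to whatever unsupervised model the scoring step uses.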

Tech Stack

Python
Pandas
Scikit-Learn
Anthropic API
Streamlit

Setup

git clone <repository-url>
cd transaction-anamoly-detector
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
echo "ANTHROPIC_API_KEY=your-key-here" > .env

Usage

python generate_data.py
python main.py

Example Output

      user_id  amount country  hour  anomaly_score
4516       46   53.48      UK    16      -0.000120

This transaction was flagged because the $53.48 charge is approximately 2x the user's median spend of $28, representing a 1.3 standard deviation increase from their normal spending pattern. Additionally, the transaction occurred in the UK, which is an unusual country for this user's typical transaction history.
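The explanation above cites the user's median spend, the z-score, and the unusual country, which suggests those facts are given to the LLM along with the transaction. A minimal sketch of how such a prompt might be assembled follows. The function build_prompt, the dictionary keys, and the prompt wording are hypothetical and are not taken from explain.py; the numbers are the ones from the example output, and the two flags are inferred from the explanation text. The call to the Anthropic API is deliberately left out.

```python
def build_prompt(tx, baseline):
    """Assemble a plain-text prompt for one flagged transaction.

    Both arguments are plain dicts; the key names are assumptions.
    """
    ratio = tx["amount"] / baseline["median_amount"]
    lines = [
        "You are assisting a bank fraud analyst.",
        "Explain in two or three plain sentences why this transaction was flagged.",
        "Use only the facts listed below.",
        "",
        f"Amount: ${tx['amount']:.2f}",
        f"User median amount: ${baseline['median_amount']:.2f} (ratio {ratio:.1f}x)",
        f"Amount z-score: {tx['amount_z']:.1f}",
        f"Country: {tx['country']} (rare for this user: {'yes' if tx['rare_country'] else 'no'})",
        f"Hour of day: {tx['hour']} (unusual hour: {'yes' if tx['unusual_hour'] else 'no'})",
        f"Anomaly score: {tx['anomaly_score']:.6f}",
    ]
    return "\n".join(lines)


# Values taken from the example output; the flags are inferred from its text.
flagged = {
    "amount": 53.48,
    "amount_z": 1.3,
    "country": "UK",
    "rare_country": True,
    "hour": 16,
    "unusual_hour": False,
    "anomaly_score": -0.000120,
}
baseline = {"median_amount": 28.0}

print(build_prompt(flagged, baseline))
```

Passing precomputed facts rather than raw history keeps the prompt short and makes it easier to check the narrative against the numbers that triggered the flag.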

Limitations

The project currently uses synthetic data rather than real transactions.
The unusual-hour rule is global rather than per user.
Per-user baselines are computed over each user's full transaction history, including the transaction being scored and any later ones, which introduces mild data leakage. A production version should build each baseline from prior transactions only.
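One way a baseline built only from prior transactions could be computed is with a shifted expanding window per user, sketched below. This is not part of the current code: the function add_prior_baseline, the timestamp column, and the toy data are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy data; the timestamp column is an assumption.
tx = pd.DataFrame({
    "user_id":   [1, 1, 1, 1, 2, 2],
    "timestamp": pd.to_datetime([
        "2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04",
        "2024-01-01", "2024-01-02",
    ]),
    "amount":    [20.0, 30.0, 40.0, 500.0, 10.0, 12.0],
})


def add_prior_baseline(df):
    """Add per-user mean/std computed from strictly earlier transactions."""
    out = df.sort_values(["user_id", "timestamp"]).copy()
    by_user = out.groupby("user_id")["amount"]

    # shift(1) removes the current row from its own window; expanding()
    # then aggregates everything earlier for the same user.
    out["prior_mean"] = by_user.transform(lambda s: s.shift(1).expanding().mean())
    out["prior_std"] = by_user.transform(lambda s: s.shift(1).expanding().std())

    # The z-score is NaN until a user has at least two earlier transactions.
    out["amount_z_prior"] = (out["amount"] - out["prior_mean"]) / out["prior_std"]
    return out


result = add_prior_baseline(tx)
print(result[["user_id", "amount", "prior_mean", "prior_std", "amount_z_prior"]])
```

Here the 500.00 transaction is compared against a baseline of 20, 30, and 40 only, so the outlier does not inflate its own mean and standard deviation. The cost is that a user's first transactions have no baseline and need a fallback rule.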
