shleshavan/transaction-anomaly-detector

Transaction Anomaly Detector with LLM Explanations

This is a simple tool that scores financial transactions for anomalies. An unsupervised machine learning model scores each transaction, and the flagged transactions are then passed to an LLM, which explains each one in plain language.

Overview

The core idea of this project is to build a simple machine learning model that identifies anomalous transactions and produces a plain-language output for an analyst at a bank.

Motivation

Anomaly scores alone are difficult for an analyst to act on. An explanation of why a transaction's score may indicate fraud makes the alert easier to triage.

How it works

The code has four key files, which the fifth file, main.py, combines to generate the output. The four files are:

  1. Synthetic Data: Because no real data was available, this file generates synthetic transactions and plants anomalies in them.
  2. Feature Engineering: This file computes per-user baselines and derives anomaly signals from them: amount z-score, rare country, and unusual hour.
  3. Scoring: This file flags the most unusual transactions in the generated data.
  4. Explanation: Flagged transactions are sent to an LLM, which then narrates why each one is suspicious.
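Steps 2 and 3 above can be sketched roughly as follows. This is a minimal illustration, not the repository's actual code: the function names, the `rare_share` and `night_hours` thresholds, and the choice of scikit-learn's `IsolationForest` are all assumptions (the README only states that the model is unsupervised and built with Scikit-Learn).

```python
import pandas as pd
from sklearn.ensemble import IsolationForest

# Hypothetical feature names; the real project may use different ones.
FEATURES = ["amount_z", "rare_country", "unusual_hour"]


def add_features(df, rare_share=0.05, night_hours=(0, 5)):
    """Add per-user z-score, rare-country, and unusual-hour columns."""
    out = df.copy()
    amounts = out.groupby("user_id")["amount"]
    mean = amounts.transform("mean")
    std = amounts.transform("std").fillna(0.0)
    # A zero standard deviation would divide by zero, so treat it as "no signal".
    out["amount_z"] = ((out["amount"] - mean) / std.where(std > 0)).fillna(0.0)

    # A country is "rare" when it accounts for a small share of the user's transactions.
    pair_count = out.groupby(["user_id", "country"])["amount"].transform("count")
    user_count = amounts.transform("count")
    out["rare_country"] = (pair_count / user_count) < rare_share

    # Global rule (assumed): the same hour window applies to every user.
    lo, hi = night_hours
    out["unusual_hour"] = out["hour"].between(lo, hi)
    return out


def score(feats, top_n=5, seed=0):
    """Fit an Isolation Forest and return the top_n most anomalous rows."""
    X = feats[FEATURES].astype(float)
    model = IsolationForest(n_estimators=100, random_state=seed).fit(X)
    out = feats.copy()
    # decision_function: lower (more negative) means more anomalous.
    out["anomaly_score"] = model.decision_function(X)
    return out.nsmallest(top_n, "anomaly_score")
```

With a DataFrame that has `user_id`, `amount`, `country`, and `hour` columns, `score(add_features(df))` would return the rows to hand to the explanation step.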

Tech Stack

  • Python
  • Pandas
  • Scikit-Learn
  • Anthropic API
  • Streamlit

Setup

git clone <repository-url>
cd transaction-anomaly-detector
python3 -m venv venv && source venv/bin/activate
pip install -r requirements.txt
echo "Antropic-API-Key = Your-key-here" > .env

Usage

python generate_data.py
python main.py
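For orientation, a data generator of the kind the first command stands for might look like the sketch below. It is a hypothetical illustration, not the contents of generate_data.py: the function name, the distributions, the `is_planted` column, and the way anomalies are planted (larger amount, different country, night-time hour) are all assumptions.

```python
import numpy as np
import pandas as pd


def generate_transactions(n_users=50, per_user=100, n_anomalies=20, seed=0):
    """Create synthetic transactions and plant a few obvious anomalies."""
    rng = np.random.default_rng(seed)
    n = n_users * per_user
    user_id = np.repeat(np.arange(n_users), per_user)
    # Each user gets their own typical spend level (assumed range).
    typical = rng.uniform(10, 80, size=n_users)
    df = pd.DataFrame({
        "user_id": user_id,
        "amount": np.round(rng.lognormal(np.log(typical[user_id]), 0.3), 2),
        "country": "US",
        "hour": rng.integers(8, 23, size=n),  # daytime hours 8..22
        "is_planted": False,
    })

    # Plant anomalies: inflate the amount, change the country, move to night.
    idx = rng.choice(n, size=n_anomalies, replace=False)
    df.loc[idx, "amount"] = np.round(df.loc[idx, "amount"] * 10, 2)
    df.loc[idx, "country"] = "UK"
    df.loc[idx, "hour"] = rng.integers(0, 5, size=n_anomalies)
    df.loc[idx, "is_planted"] = True
    return df
```

Keeping a ground-truth column such as `is_planted` makes it possible to check afterwards how many planted anomalies the scoring step recovers.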

Example Output

index  user_id  amount  country  hour  anomaly_score
 4516       46   53.48       UK    16      -0.000120

This transaction was flagged because the $53.48 charge is approximately 2x the user's median spend of $28, representing a 1.3 standard deviation increase from their normal spending pattern. Additionally, the transaction occurred in the UK, which is an unusual country for this user's typical transaction history.
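A narrative like the one above depends on the facts handed to the LLM. The sketch below shows one way a prompt could be assembled from a flagged row; it is an assumption about how the explanation step might work, not the project's actual prompt. The function name, the dictionary keys, and the wording are hypothetical, and the call to the Anthropic API is deliberately left out.

```python
def build_prompt(txn: dict, baseline: dict) -> str:
    """Turn a flagged transaction and the user's baseline into an LLM prompt."""
    ratio = txn["amount"] / baseline["median_amount"]
    lines = [
        "You are assisting a bank fraud analyst.",
        "Explain in two or three plain sentences why this transaction was flagged.",
        "Use only the facts below; do not speculate beyond them.",
        "",
        f"Amount: ${txn['amount']:.2f} "
        f"({ratio:.1f}x the user's median of ${baseline['median_amount']:.2f})",
        f"Amount z-score: {txn['amount_z']:.1f}",
        f"Country: {txn['country']} (rare for this user: {txn['rare_country']})",
        f"Hour of day: {txn['hour']} (unusual hour: {txn['unusual_hour']})",
        f"Anomaly score: {txn['anomaly_score']:.6f}",
    ]
    return "\n".join(lines)


# Values taken from the example output; unusual_hour=False is an assumption.
example = {
    "amount": 53.48, "amount_z": 1.3, "country": "UK", "rare_country": True,
    "hour": 16, "unusual_hour": False, "anomaly_score": -0.000120,
}
prompt = build_prompt(example, {"median_amount": 28.0})
print(prompt)
```

Passing precomputed facts, rather than raw rows, keeps the model's explanation tied to the same numbers the analyst sees in the table.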

Limitations

  • The project currently uses synthetic data rather than real transactions.
  • The unusual-hour rule is global rather than per user.
  • Per-user baselines are computed across all of a user's transactions, which introduces mild data leakage; a production version should use only prior transactions.
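The last limitation can be illustrated with a baseline that looks only at earlier transactions. The sketch below is one possible approach, not something the project implements: it assumes a hypothetical `timestamp` column for ordering, and the function name and `min_history` parameter are made up for the example.

```python
import pandas as pd


def prior_only_zscore(df, min_history=3):
    """Z-score each amount against the same user's earlier transactions only."""
    out = df.sort_values(["user_id", "timestamp"]).copy()
    amounts = out.groupby("user_id")["amount"]
    # shift(1) drops the current transaction from its own baseline;
    # expanding() then aggregates everything that came before it.
    prior_mean = amounts.transform(
        lambda s: s.shift(1).expanding(min_periods=min_history).mean()
    )
    prior_std = amounts.transform(
        lambda s: s.shift(1).expanding(min_periods=min_history).std()
    )
    z = (out["amount"] - prior_mean) / prior_std.where(prior_std > 0)
    # Rows with too little history (or zero variance) get a neutral score.
    out["amount_z_prior"] = z.fillna(0.0)
    return out
```

For a single user with amounts 10, 12, 11, 9, 100 in time order, the final transaction scores roughly 1.8 against a baseline that includes it, because the outlier inflates its own mean and standard deviation, but roughly 69 against the four prior transactions alone.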

About

The purpose of this project is to detect anomalies in synthetic transaction data and generate an LLM-based explanation for each flagged transaction. The vision is to make triage more efficient through explainable insights on suspicious transactions.
