Neural Reranking for UK Statutory Retrieval

This repository contains the official code for the paper "Neural Reranking for UK Statutory Retrieval". It explains how to reproduce the retrieval, reranking, and evaluation experiments reported in the paper, including the distillation of a Voyage-law-2 teacher into a ModernBERT student model.

📂 Data Access

The dataset is NOT included in this repository. Please access the UK Legislation Corpus and Verified Query Set via the Durham University Research Data Repository:

DOI: 10.15128/r14x51hj064

Expected Data Structure

To run the scripts, place the downloaded data in the data/ directory or adjust paths accordingly:

data/Data.json: The corpus file.
data/queries.jsonl: The evaluation queries with relevance judgments.
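
Before running the pipeline, it can help to confirm that both files are in place and that the query file parses. The following is a minimal sketch, not part of the repository: `read_jsonl` and `check_data_dir` are hypothetical helper names, and nothing is assumed about the fields inside each record.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator


def read_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    """Yield one parsed JSON object per non-empty line of a JSON Lines file."""
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc


def check_data_dir(data_dir: str = "data") -> Dict[str, bool]:
    """Report which of the two expected files exist under data_dir."""
    root = Path(data_dir)
    return {name: (root / name).is_file() for name in ("Data.json", "queries.jsonl")}
```

Calling `check_data_dir()` from the repository root should report `True` for both files once the download has been unpacked into data/.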

🛠️ Installation

pip install -r requirements.txt

Requirements: torch, transformers, sentence-transformers, pyserini, scipy, pandas, faiss-cpu (or faiss-gpu), matplotlib.

🚀 Usage

Scripts are organised by stage in the retrieval/, reranking/, evaluation/, and distillation/ directories.

1. Retrieval (Candidate Generation)

Generate the top-k candidates for each query using either BM25 or dense retrieval (MPNet).

# BM25
python retrieval/bm25_search.py \
  --data_json data/Data.json \
  --eval_jsonl data/queries.jsonl \
  --output_file outputs/candidates_bm25.jsonl \
  --k 100

# Dense (MPNet)
python retrieval/dense_search.py \
  --data_json data/Data.json \
  --eval_jsonl data/queries.jsonl \
  --output_file outputs/candidates_dense.jsonl \
  --k 100
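
For readers unfamiliar with what the BM25 stage computes, here is a small pure-Python sketch of Okapi BM25 scoring over an in-memory corpus. It is an illustration only and is not the code in retrieval/bm25_search.py, which may rely on pyserini. The tokenizer, the function names, and the `k1` and `b` values are assumptions chosen for the example.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase and split on non-alphanumeric characters (illustrative only)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_search(query: str, docs: Dict[str, str], k: int = 100,
                k1: float = 0.9, b: float = 0.4) -> List[Tuple[str, float]]:
    """Return up to k (doc_id, score) pairs, highest BM25 score first."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized.values()) / max(n_docs, 1)

    # Document frequency: number of documents containing each term.
    doc_freq: Counter = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    query_terms = set(tokenize(query))
    scores: Dict[str, float] = {}
    for doc_id, tokens in tokenized.items():
        tf = Counter(tokens)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            length_norm = 1 - b + b * len(tokens) / avg_len
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        if score > 0:
            scores[doc_id] = score

    # Sort by descending score, breaking ties by doc_id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))[:k]
```

Documents that share no term with the query receive no score and are left out of the candidate list, so fewer than `k` results may be returned.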

2. Reranking

Run inference with various rerankers (CrossEncoders, ColBERT, RankZephyr, or APIs).

Distilled ModernBERT (Our Student Model):

python reranking/rerank.py \
  --model_type hf-seqcls \
  --model_name amal1994/distilled-voyage-modernbert \
  --data_json data/Data.json \
  --candidates outputs/candidates_dense.jsonl \
  --out_tsv results/results_modernbert.tsv

ColBERT:

python reranking/rerank_colbert.py \
  --model_name colbert-ir/colbertv2.0 \
  --data_json data/Data.json \
  --candidates outputs/candidates_dense.jsonl \
  --out_tsv results/results_colbert.tsv

RankZephyr (LLM):

python reranking/rerank_llm.py \
  --model_name castorini/rank_zephyr_7b_v1_full \
  --data_json data/Data.json \
  --candidates outputs/candidates_dense.jsonl \
  --out_tsv results/results_rankzephyr.tsv
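
All three commands follow the same pattern: take the candidate list for a query, score each (query, provision) pair with a model, and reorder by score. The sketch below shows that pattern with a pluggable scoring function. The names `rerank` and `overlap_scorer` are hypothetical, and the toy scorer only stands in for a real model (for example, a wrapper around a cross-encoder's predict call).

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A scorer maps (query, candidate texts) to one relevance score per text.
ScoreFn = Callable[[str, Sequence[str]], List[float]]


def rerank(query: str, candidate_ids: Sequence[str], corpus: Dict[str, str],
           score_fn: ScoreFn, top_k: Optional[int] = None) -> List[Tuple[str, float]]:
    """Reorder candidate_ids by descending score; ties keep their input order."""
    texts = [corpus[doc_id] for doc_id in candidate_ids]
    scores = score_fn(query, texts)
    if len(scores) != len(candidate_ids):
        raise ValueError("score_fn must return one score per candidate")
    ranked = sorted(zip(candidate_ids, scores), key=lambda pair: pair[1], reverse=True)
    return ranked if top_k is None else ranked[:top_k]


def overlap_scorer(query: str, texts: Sequence[str]) -> List[float]:
    """Toy scorer: number of distinct query words that appear in each text."""
    query_words = set(query.lower().split())
    return [float(len(query_words & set(text.lower().split()))) for text in texts]
```

Because the scorer is passed in as a function, the same loop can be reused for any model that returns one score per candidate.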

3. Evaluation & Significance

Test whether the base run differs significantly from each of the other runs, apply Holm-Bonferroni correction for multiple comparisons, and generate forest plots.

python evaluation/significance.py \
  --base_file results/results_modernbert.tsv \
  --others results/results_colbert.tsv results/results_rankzephyr.tsv \
  --k 10
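
The Holm-Bonferroni step can be sketched in a few lines. This is a generic implementation of the standard step-down procedure, not the code in evaluation/significance.py. It assumes the raw p-values (one per base-versus-other comparison) have already been computed by whatever paired test the script uses.

```python
from typing import List, Sequence, Tuple


def holm_bonferroni(p_values: Sequence[float],
                    alpha: float = 0.05) -> Tuple[List[float], List[bool]]:
    """Return Holm-adjusted p-values and reject decisions, in input order."""
    m = len(p_values)
    # Visit hypotheses from the smallest to the largest raw p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The rank-th smallest p-value is multiplied by (m - rank), capped at 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never drops below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    reject = [p <= alpha for p in adjusted]
    return adjusted, reject
```

For example, raw p-values of 0.01, 0.04, and 0.03 adjust to 0.03, 0.06, and 0.06, so only the first comparison remains significant at the 0.05 level.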

4. Distillation (Training)

Train your own student model using teacher scores (e.g., from Voyage).

python distillation/train_distill.py \
  --data data/train.jsonl \
  --output_dir checkpoints/distilled_model \
  --model_name nomic-ai/modernbert-embed-base \
  --epochs 3
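
This README does not state which loss train_distill.py optimises. As an illustration of score distillation in general, the sketch below computes a margin-MSE objective, one common choice, in plain Python: the student is penalised when its score gap between a relevant and a non-relevant passage differs from the teacher's gap. The function name and argument layout are assumptions, not the repository's API.

```python
from typing import Sequence


def margin_mse(student_pos: Sequence[float], student_neg: Sequence[float],
               teacher_pos: Sequence[float], teacher_neg: Sequence[float]) -> float:
    """Mean squared difference between student and teacher score margins."""
    n = len(student_pos)
    if n == 0:
        raise ValueError("at least one (positive, negative) pair is required")
    if not (len(student_neg) == len(teacher_pos) == len(teacher_neg) == n):
        raise ValueError("all score lists must have the same length")
    total = 0.0
    for s_pos, s_neg, t_pos, t_neg in zip(student_pos, student_neg,
                                          teacher_pos, teacher_neg):
        # Compare margins rather than raw scores, so the student does not
        # need to match the teacher's absolute score offset.
        diff = (s_pos - s_neg) - (t_pos - t_neg)
        total += diff * diff
    return total / n
```

In a real training loop the same quantity would be computed on tensors so that gradients can flow back into the student model.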

📖 Publication

If you use this code or the UK-STATUTECORPUS in your research, please cite our paper:

Alshehri, A.S., Eken, C., Bencomo, N. et al. Neural reranking for UK statutory retrieval: Provision-level evaluation and an open distilled model. Artif Intell Law (2025).

Full Paper: https://doi.org/10.1007/s10506-025-09501-6
Journal: Artificial Intelligence and Law

BibTeX

@article{alshehri2026neural,
  title={Neural reranking for UK statutory retrieval: Provision-level evaluation and an open distilled model},
  author={Alshehri, Amal Saad and Eken, Can and Bencomo, Nelly and Atapour-Abarghouei, Amir},
  journal={Artificial Intelligence and Law},
  year={2026},
  month={Feb},
  day={05},
  publisher={Springer},
  doi={10.1007/s10506-025-09501-6},
  url={https://doi.org/10.1007/s10506-025-09501-6}
}
