docs: Evaluation module demo and documentation by frayle-ons · Pull Request #194 · datasciencecampus/classifai

frayle-ons · 2026-06-24T09:56:20Z

✨ Summary

These changes add documentation for the recent Evaluation module (#171, #172). They include a new demo notebook, evaluation_workflow_demo.ipynb, which walks users through the important concepts and code of the evaluation module, and a new demo file of ground truth queries that works with the existing fake_soc_dataset.csv data and is used to showcase evaluation.

It also has minor updates to the demo readme file.

📜 Changes Introduced

(feat:) new demo notebook to showcase/explain the evaluation module
(docs:) updated demo readme
(feat:) new mock demo data to simulate ground truth queries for fake_soc_dataset.csv data

✅ Checklist

Please confirm you've completed these checks before requesting a review.

Code passes linting with Ruff
Security checks pass using Bandit
API and Unit tests are written and pass using pytest
Terraform files (if applicable) follow best practices and have been validated (terraform fmt & terraform validate)
DocStrings follow Google-style and are added as per Pylint recommendations
Documentation has been updated if needed

🔍 How to Test

Run the new notebook in a fresh install of the package to verify it works, and read the content carefully to ensure it's correct.
Quickly browse the new mock data file.
Read through the updates to the readme documentation describing the new notebook.

jamie-ons · 2026-06-24T12:50:20Z

Overview

Have read through the notebook and think it's really good and quite clear. Love the diagram.

I have:

Run the new notebook in a fresh install of the package and can confirm it all runs without errors
Read the content and am mostly happy with it
Read the new mock data file and am happy with it
Read through the updates to the readme documentation describing the new notebook

Proposed changes

There are a few changes that I think would improve the demo:

Spelling
This would help it read better in some places. Here are a few examples; there are more in the notebook.

Typo	Correction
provies	provides
mulit-class	multi-class
peformance	performance
mutiple	multiple
there names	their names
Typing this all together	Tying this all together
fake_soc_daraset.csv	fake_soc_dataset.csv

If you could go through and check the spellings that would be great :)

VectorStore creation

I think setting overwrite=True on all VectorStore creations would be better, as it means that if you rerun the cell for whatever reason, you don't get the error:

Output directory already exists. Pass overwrite=True to overwrite the folder.

This applies to both:

demo_vectorstore_full
demo_vectorstore_half

Partially missing comment

results = evaluator.evaluate(
    vectorstores=[demo_vectorstore_full, demo_vectorstore_half],
    vectorstore_names=["full data vectorstore", "half data vectorstore"],
    output_file="./classifai_temp/demo_eval_results.csv",  # leaving this line blank will save the results to
)

I believe this comment should say

    output_file="./classifai_temp/demo_eval_results.csv",  # leaving this line blank will save the results to evaluation_results.csv

DEMO/README.md

This all looks good, except that the Note at the bottom is massive.

I would change it to the following to avoid this

> **Note:** The Evaluation module is currently in development and its API is subject to change in future releases.

Co-authored-by: Jamie Milsom <jamie.milsom@ons.gov.uk>

frayle-ons · 2026-06-24T15:43:54Z

I've applied additional changes based on feedback. I've set all the VectorStore instantiations in the eval notebook to use skip_save=True so that no data is saved to file. This is fine since we don't actually load anything back in, and it resolves some of Jamie's review comments about using overwrite=True.

jamie-ons

Yes, had a chat with Luke and skip_save is a better idea.

Have had another look at this and it all looks good now :)

lukeroantreeONS

This looks good - noted a few small tweaks to be made before merging

lukeroantreeONS · 2026-06-26T13:01:55Z

+   "source": [
+    "To begin, we're going to create 2 VectorStores from our `fake_soc_dataset.csv` file which contains <b>mock</b> SOC survey responses and their corresponding occupation codes. One VectorStore will be built from the full dataset, and the second one will be built from half the dataset. \n",
+    "\n",
+    "Since the second VectorStore will contain only have the training data, we can reason that this lack of coverage will showcase poorer performance against a evaluation dataset that assesses the full coverage of the training data."


"only have half of the training data"

"will not perform as well as the full dataset due to lack of coverage"

lukeroantreeONS · 2026-06-26T13:04:16Z

+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If the code cells in this section ran successfully then we now have 2 VectorStores we can use to evaluate with the Evaluation module."


Suggest cutting this line - it makes it seem like we're not confident it works

lukeroantreeONS · 2026-06-26T13:05:39Z

+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The results object is a dataframe with provided VectorStore names as the row indexes, and each column is associated with a give metric. We should also see the results have been saved to a CSV file in the specified directory."


given metric

"saved to the specified output CSV file"

added evaluation demo notebook, demo query data, and updated demo readme

f75489e

frayle-ons requested a review from a team as a code owner June 24, 2026 09:56

frayle-ons linked an issue Jun 24, 2026 that may be closed by this pull request

Notebook Demo and Docs for Eval Module #187

Open

github-actions Bot added the documentation Improvements or additions to documentation label Jun 24, 2026

jamie-ons self-requested a review June 24, 2026 10:31

frayle-ons added 3 commits June 24, 2026 11:47

added diagram for eva demo notebook

3d15344

render image in eval notebook

2d671cd

added content for eval module to the main readme

f8bb435