docs: Evaluation module demo and documentation#194

Open
frayle-ons wants to merge 8 commits into main from 187-notebook-demo-and-docs-for-eval-module

@frayle-ons

Contributor

✨ Summary

These changes add documentation for the recent Evaluation module (#171, #172). They include a new demo notebook, evaluation_workflow_demo.ipynb, that walks users through the important concepts and code of the evaluation module. They also add a new demo file of ground truth queries that works with the existing fake_soc_dataset.csv data and is used to showcase evaluation.

It also has minor updates to the demo readme file.

📜 Changes Introduced

  • (feat:) new demo notebook to showcase/explain the evaluation module
  • (docs:) updated demo readme
  • (feat:) new mock demo data to simulate ground truth queries for fake_soc_dataset.csv data

✅ Checklist

Please confirm you've completed these checks before requesting a review.

  • Code passes linting with Ruff
  • Security checks pass using Bandit
  • API and Unit tests are written and pass using pytest
  • Terraform files (if applicable) follow best practices and have been validated (terraform fmt & terraform validate)
  • DocStrings follow Google-style and are added as per Pylint recommendations
  • Documentation has been updated if needed

🔍 How to Test

  • Run the new notebook in a fresh install of the package to verify it works, and read the content carefully to ensure it's correct.
  • Quickly browse the new mock data file.
  • Read through the updates to the readme documentation describing the new notebook.

@frayle-ons frayle-ons requested a review from a team as a code owner June 24, 2026 09:56
@frayle-ons frayle-ons linked an issue Jun 24, 2026 that may be closed by this pull request
@github-actions github-actions Bot added the documentation Improvements or additions to documentation label Jun 24, 2026
@jamie-ons jamie-ons self-requested a review June 24, 2026 10:31
@jamie-ons

Overview

Have read through the notebook and think it's really good and quite clear. Love the diagram.

I have:

  • Run the new notebook in a fresh install of the package and can confirm it all runs without errors
  • Read the content and am mostly happy with it
  • Read the new mock data file and am happy with it
  • Read through the updates to the readme documentation describing the new notebook

Proposed changes

There are a few changes that I think would improve the demo:

Spelling
Fixing these would help it read better in some places. Here are a few examples; there are more in the notebook.

| Typo | Correction |
| --- | --- |
| provies | provides |
| mulit-class | multi-class |
| peformance | performance |
| mutiple | multiple |
| there names | their names |
| Typing this all together | Tying this all together |
| fake_soc_daraset.csv | fake_soc_dataset.csv |

If you could go through and check the spellings that would be great :)
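
A minimal sketch of how the spelling pass could be partly automated, using the typo list above. It assumes the notebook follows the standard Jupyter JSON layout (a top-level `cells` list whose entries have `cell_type` and `source`); the helper name `fix_markdown_cells` is hypothetical and is not part of the package. It only touches markdown cells, so a misspelled filename inside a code cell would still need fixing by hand, and the resulting diff should be reviewed rather than trusted blindly.

```python
import json

# Typo -> correction pairs taken from the list in this review comment.
CORRECTIONS = {
    "provies": "provides",
    "mulit-class": "multi-class",
    "peformance": "performance",
    "mutiple": "multiple",
    "there names": "their names",
    "Typing this all together": "Tying this all together",
    "fake_soc_daraset.csv": "fake_soc_dataset.csv",
}


def fix_markdown_cells(notebook, corrections=CORRECTIONS):
    """Apply corrections to markdown cells in place; return the replacement count.

    Hypothetical helper for illustration only.
    """
    replaced = 0
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "markdown":
            continue  # leave code cells and their outputs untouched
        source = cell.get("source", [])
        # Notebook cells may store their source as one string or a list of lines.
        lines = [source] if isinstance(source, str) else source
        fixed = []
        for line in lines:
            for typo, correction in corrections.items():
                replaced += line.count(typo)
                line = line.replace(typo, correction)
            fixed.append(line)
        cell["source"] = fixed[0] if isinstance(source, str) else fixed
    return replaced


# Tiny in-memory notebook standing in for the real .ipynb file.
demo = {
    "cells": [
        {"cell_type": "markdown", "source": ["This provies mulit-class metrics.\n"]},
        {"cell_type": "code", "source": ["peformance = 1\n"]},
    ]
}
count = fix_markdown_cells(demo)
print(count, json.dumps(demo["cells"][0]["source"]))
```

For the real file, the notebook would be read with `json.load`, passed through the helper, and written back with `json.dump`.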

VectorStore creation

I think setting overwrite=True on all VectorStore creations would be better, as it means that if you rerun the cell for whatever reason, you don't end up getting the error:

Output directory already exists. Pass overwrite=True to overwrite the folder. 

This applies to:

  • demo_vectorstore_full
  • demo_vectorstore_half
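
To illustrate why rerunning the cell fails, here is a minimal sketch of the kind of guard that produces this error. It is an assumption about the general pattern, not the package's actual implementation; `prepare_output_dir` is a hypothetical helper written with the standard library only.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_output_dir(path, overwrite=False):
    """Create an empty output directory, refusing to clobber an existing one.

    Hypothetical helper for illustration; not the package's real code.
    """
    out = Path(path)
    if out.exists():
        if not overwrite:
            raise FileExistsError(
                "Output directory already exists. "
                "Pass overwrite=True to overwrite the folder."
            )
        shutil.rmtree(out)  # discard the previous run's contents
    out.mkdir(parents=True)
    return out


# Simulate running the same notebook cell twice.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "demo_vectorstore_full"
    prepare_output_dir(target)  # first run: directory is created
    try:
        prepare_output_dir(target)  # rerun without overwrite: raises
        rerun_failed = False
    except FileExistsError:
        rerun_failed = True
    prepare_output_dir(target, overwrite=True)  # rerun with overwrite: succeeds
    rerun_with_overwrite_ok = target.is_dir()
```

With a guard like this, the first run succeeds, a plain rerun raises, and a rerun with `overwrite=True` replaces the folder, which is the behaviour the suggestion relies on.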

Partially missing comment

results = evaluator.evaluate(
    vectorstores=[demo_vectorstore_full, demo_vectorstore_half],
    vectorstore_names=["full data vectorstore", "half data vectorstore"],
    output_file="./classifai_temp/demo_eval_results.csv",  # leaving this line blank will save the results to
)

I believe this comment should say

    output_file="./classifai_temp/demo_eval_results.csv",  # leaving this line blank will save the results to evaluation_results.csv

DEMO/README.md

This all looks good, except that the Note at the bottom renders very large.

I would change it to the following to avoid this

> **Note:** The Evaluation module is currently in development and its API is subject to change in future releases.

Comment thread DEMO/README.md Outdated
Comment thread DEMO/evaluation_workflow_demo.ipynb Outdated (11 threads)
jamie-ons and others added 3 commits June 24, 2026 14:36
Co-authored-by: Jamie Milsom <jamie.milsom@ons.gov.uk>
Co-authored-by: Jamie Milsom <jamie.milsom@ons.gov.uk>
@frayle-ons

Contributor Author

I've applied additional changes based on feedback. I've set all the VectorStore instantiations in the eval notebook to use skip_save=True so that no data is saved to file. This is fine since we don't actually load anything back in, and it resolves some of Jamie's review comments about using overwrite=True.

@frayle-ons frayle-ons requested a review from jamie-ons June 24, 2026 15:44

@jamie-ons jamie-ons left a comment

Yes, had a chat with Luke and skip_save is a better idea.

Have had another look at this and it all looks good now :)

@lukeroantreeONS lukeroantreeONS left a comment

Contributor

This looks good - noted a few small tweaks to be made before merging

"source": [
"To begin, we're going to create 2 VectorStores from our `fake_soc_dataset.csv` file which contains <b>mock</b> SOC survey responses and their corresponding occupation codes. One VectorStore will be built from the full dataset, and the second one will be built from half the dataset. \n",
"\n",
"Since the second VectorStore will contain only have the training data, we can reason that this lack of coverage will showcase poorer performance against a evaluation dataset that assesses the full coverage of the training data."

"only have half of the training data"

"will not perform as well as the full dataset due to lack of coverage"

"cell_type": "markdown",
"metadata": {},
"source": [
"If the code cells in this section ran successfully then we now have 2 VectorStores we can use to evaluate with the Evaluation module."

Suggest cutting this line - it makes it seem like we're not confident it works

"cell_type": "markdown",
"metadata": {},
"source": [
"The results object is a dataframe with provided VectorStore names as the row indexes, and each column is associated with a give metric. We should also see the results have been saved to a CSV file in the specified directory."

given metric

"saved to the specified output CSV file"

Labels

documentation Improvements or additions to documentation

Development

Successfully merging this pull request may close these issues.

Notebook Demo and Docs for Eval Module

3 participants