Skip to content

shapiq Benchmark#521

Open
IsaH57 wants to merge 33 commits into
mainfrom
benchmark_ep
Open

shapiq Benchmark#521
IsaH57 wants to merge 33 commits into
mainfrom
benchmark_ep

Conversation

@IsaH57

@IsaH57 IsaH57 commented May 18, 2026

Copy link
Copy Markdown
Collaborator

Motivation and Context

This PR consolidates and validates a full shapiq_benchmark layer, ensuring all benchmark types (Interventional, Pathdependent, Local XAI, Image, and TabPFN) share a consistent interface for exact value computation and game access. It also strengthens the string-based dataset/model loading flow and the known-parameter lookup path, so benchmarks can be reproduced reliably across datasets and model families while still supporting custom inputs.

What’s included

  • Benchmark protocol and ground-truth computer abstractions
  • Dataset/model loading and validation, including string-based loaders and known-parameter lookups
  • Benchmark implementations:
    -- Interventional tree explanations in InterventionalBench()
    -- Pathdependent tree explanations in PathdependentBench()
    -- Local XAI explanations in LocalXAIBench()
    -- Image explanations in ImageBench()
    -- TabPFN explanations in TabPFNBench()
  • Metrics for approximation quality and ranking comparison
  • Configs holding optimized parameters for 6 model - dataset combinations

Public API Changes

  • No Public API changes
  • Yes, Public API changes (Details below)

How Has This Been Tested?

  • Manual test runs for all benchmarks
  • Each test covers both string-based loading and custom model/data paths where applicable

Checklist

  • The changes have been tested locally.
  • Documentation has been updated (if the public API or usage changes).
  • An entry has been added to CHANGELOG.md (if relevant for users).
  • The code follows the project's style guidelines.
  • I have considered the impact of these changes on the public API.

@codecov

codecov Bot commented May 20, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@Advueu963 Advueu963 self-requested a review May 25, 2026 15:25

@Advueu963 Advueu963 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall already a very nice PR.
Some things regarding the overall structure need minor changes, namely making the relationship between the benchmark classes more clear. Importantly we can reduce code redundancy, using abstract classes and inheritance of Benchmark and LocalXAIBench.

Comment thread src/shapiq_benchmark/computers.py Outdated
Comment thread src/shapiq_benchmark/computers.py Outdated
Comment thread src/shapiq_benchmark/computers.py Outdated
Comment thread src/shapiq_benchmark/base.py Outdated
Comment thread src/shapiq_benchmark/base.py Outdated
Comment thread src/shapiq_benchmark/setup.py Outdated
Comment on lines +240 to +249
def infer_data_type(model: object) -> Literal["classification", "regression"]:
"""Infer whether a model is a classifier or regressor based on its attributes."""
estimator_type = getattr(model, "_estimator_type", None)
if estimator_type == "classifier":
return "classification"
if estimator_type == "regressor":
return "regression"
if hasattr(model, "predict_proba") or hasattr(model, "classes_"):
return "classification"
return "regression"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recon this will only work with estimators using the sklearn api. I do not thinnk this is bad, but we should make this clear via a comment. Maybe even just enforce that the dataset/user always gives us information on what type of dataset this is. Our loading functions always should provide this. Regarding an external user, we can just enforce that he provides this information.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I adjusted the docstring. We know this information when the dataset is given as a string. How would you expect a user to provide this info on a dataset object? As an additional string? I thought about this before and am not sure whether that is the best solution

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So basically given the GameBenchmarkSetup we always can extract the dataset, as we control the loading itself. If the user also gives a custom dataset, we just require them also to add the type as a string. Or differently, if the target variable in the dataset (which the user must also provide in some way or another) is ordinal --> classification; numerical --> regression. But this latter approach is quite brittle.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here it is somewhat similar to the LocaXAI Benchmark. Try to make it an sub-class and reduce code redundancy

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about make TabPFNBench a subclass of LocalXAIBench. It eg has a different way of uploading model and dataset as it needs labels, too.

Comment thread src/shapiq_benchmark/tabpfn_bench.py Outdated
Comment thread src/shapiq_games/benchmark/__init__.py
@Advueu963 Advueu963 moved this to 🏗 In progress in shapiq development May 29, 2026
@Advueu963 Advueu963 added this to the v1.6 milestone May 29, 2026

@Advueu963 Advueu963 left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay overall tis is a very nice refactor! Some small improvements I think would make the code even better and then it is good to go for a merge! See the comments

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this file contain something really different than the california one? It would be nice to maybe merge them into one file which can be called via command line args. I think this would hold for all these optimisation scripts, that you can control via command args which model to pick and which hyperparameters. Basically the model command arg, would then decide which sweep over hyperparameters is done.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge into bigger script

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge into bigger script

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge into bigger script

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

merge into bigger script

random_state=random_state,
class_index=class_index,
**kwargs,
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is somewhat dangerous to silently set here the model and data without noting this also in the method name. At least the name should suggest _set_and_load_dataset_model(...) with an accompaniying docstring.

random_state: Random state used for data split and model init.
**kwargs: Additional keyword arguments for model building.
"""
class_index, _data_type = self._load_dataset_and_model(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here then also the renaming.

Comment on lines +63 to +84
def exact_values(self, index: IndexType, order: int, **kwargs: object) -> InteractionValues:
"""Compute exact interaction values using the PathdependentBench computer.

Args:
index: The index for which to compute interaction values.
order: The order of interactions to compute.
**kwargs: Additional keyword arguments for computation.

Returns:
InteractionValues: The computed interaction values.
"""
return self._computer.exact_values(index=index, order=order, **kwargs)

@property
def game(self) -> TreeSHAPIQXAI:
"""Game instance used by the Pathdependent Benchmark."""
return self._game

@property
def computer(self) -> PathdependentComputer[IndexType]:
"""Ground truth computer used by the Pathdependent Benchmark."""
return self._computer

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is quite common among all the benchmark classes, So it Should maybe be already content of the base Benchmark class? because this is just redundant code at this point.

Comment on lines +240 to +249
def infer_data_type(model: object) -> Literal["classification", "regression"]:
"""Infer whether a model is a classifier or regressor based on its attributes."""
estimator_type = getattr(model, "_estimator_type", None)
if estimator_type == "classifier":
return "classification"
if estimator_type == "regressor":
return "regression"
if hasattr(model, "predict_proba") or hasattr(model, "classes_"):
return "classification"
return "regression"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So basically given the GameBenchmarkSetup we always can extract the dataset, as we control the loading itself. If the user also gives a custom dataset, we just require them also to add the type as a string. Or differently, if the target variable in the dataset (which the user must also provide in some way or another) is ordinal --> classification; numerical --> regression. But this latter approach is quite brittle.

Comment on lines +112 to +133
def exact_values(self, index: IndexType, order: int, **kwargs: object) -> InteractionValues:
"""Compute exact interaction values using the TabPFNComputer.

Args:
index: The index for which to compute interaction values.
order: The order of interactions to compute.
**kwargs: Additional keyword arguments for computation.

Returns:
InteractionValues: The computed interaction values.
"""
return self._computer.exact_values(index=index, order=order, **kwargs)

@property
def game(self) -> TabPFNImputer:
"""Game instance used by the TabPFN Benchmark."""
return self._game

@property
def computer(self) -> GroundTruthComputer[IndexType]:
"""Ground truth computer used by the TabPFN Benchmark."""
return self._computer

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is redundant and should be in the general class.

@mmschlk mmschlk mentioned this pull request Jun 17, 2026
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: 🏗 In progress

Development

Successfully merging this pull request may close these issues.

2 participants