shapiq Benchmark by IsaH57 · Pull Request #521 · mmschlk/shapiq

IsaH57 · 2026-05-18T19:18:07Z

Motivation and Context

This PR consolidates and validates a full shapiq_benchmark layer, ensuring all benchmark types (Interventional, Pathdependent, Local XAI, Image, and TabPFN) share a consistent interface for exact value computation and game access. It also strengthens the string-based dataset/model loading flow and the known-parameter lookup path, so benchmarks can be reproduced reliably across datasets and model families while still supporting custom inputs.

What’s included

Benchmark protocol and ground-truth computer abstractions
Dataset/model loading and validation, including string-based loaders and known-parameter lookups
Benchmark implementations:
-- Interventional tree explanations in InterventionalBench()
-- Pathdependent tree explanations in PathdependentBench()
-- Local XAI explanations in LocalXAIBench()
-- Image explanations in ImageBench()
-- TabPFN explanations in TabPFNBench()
Metrics for approximation quality and ranking comparison
Configs holding optimized parameters for 6 model-dataset combinations

Public API Changes

No Public API changes
Yes, Public API changes (Details below)

How Has This Been Tested?

Manual test runs for all benchmarks
Each test covers both string-based loading and custom model/data paths where applicable

Checklist

The changes have been tested locally.
Documentation has been updated (if the public API or usage changes).
An entry has been added to CHANGELOG.md (if relevant for users).
The code follows the project's style guidelines.
I have considered the impact of these changes on the public API.

Co-authored-by: Copilot <copilot@github.com>

codecov · 2026-05-20T14:56:14Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

Advueu963

Overall already a very nice PR.
Some aspects of the overall structure need minor changes, namely making the relationship between the benchmark classes clearer. Importantly, we can reduce code redundancy by using abstract classes and inheritance from Benchmark and LocalXAIBench.

Advueu963 · 2026-05-25T15:53:13Z

+def infer_data_type(model: object) -> Literal["classification", "regression"]:
+    """Infer whether a model is a classifier or regressor based on its attributes."""
+    estimator_type = getattr(model, "_estimator_type", None)
+    if estimator_type == "classifier":
+        return "classification"
+    if estimator_type == "regressor":
+        return "regression"
+    if hasattr(model, "predict_proba") or hasattr(model, "classes_"):
+        return "classification"
+    return "regression"


I reckon this will only work with estimators that use the sklearn API. I do not think this is bad, but we should make it clear via a comment. Maybe we should even enforce that the dataset or the user always tells us what type of dataset this is. Our loading functions should always provide this. For an external user, we can simply require that they provide this information.

I adjusted the docstring. We know this information when the dataset is given as a string. How would you expect a user to provide this info for a dataset object? As an additional string? I thought about this before and am not sure whether that is the best solution.

So basically, given the GameBenchmarkSetup, we can always extract the dataset, as we control the loading itself. If the user provides a custom dataset, we simply require them to also pass the type as a string. Alternatively, if the target variable in the dataset (which the user must also provide in some way or another) is ordinal, we treat it as classification; if it is numerical, as regression. But this latter approach is quite brittle.
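
A minimal sketch of the "explicit type first, inference as fallback" idea from this thread. `resolve_data_type` and its `data_type` argument are hypothetical names that do not appear in the PR; `infer_data_type` is repeated from the diff above only so the snippet runs on its own.

```python
from typing import Literal, Optional

DataType = Literal["classification", "regression"]
_VALID_TYPES = ("classification", "regression")


def infer_data_type(model: object) -> DataType:
    """Heuristic from the diff; relies on sklearn-style attributes."""
    estimator_type = getattr(model, "_estimator_type", None)
    if estimator_type == "classifier":
        return "classification"
    if estimator_type == "regressor":
        return "regression"
    if hasattr(model, "predict_proba") or hasattr(model, "classes_"):
        return "classification"
    return "regression"


def resolve_data_type(model: object, data_type: Optional[str] = None) -> DataType:
    """Hypothetical helper: prefer an explicit type, fall back to inference."""
    if data_type is None:
        # Only reliable for estimators that follow the sklearn API.
        return infer_data_type(model)
    if data_type not in _VALID_TYPES:
        msg = f"data_type must be one of {_VALID_TYPES}, got {data_type!r}"
        raise ValueError(msg)
    return data_type  # type: ignore[return-value]


class CustomModel:
    """Stand-in for a non-sklearn model that exposes no type information."""

    def predict(self, x: object) -> object:
        return x


# Without an explicit type, the heuristic falls through to "regression".
print(resolve_data_type(CustomModel()))
# With an explicit type, the user's statement takes precedence.
print(resolve_data_type(CustomModel(), data_type="classification"))
```

The fallback branch shows why the heuristic is fragile for custom models: anything without sklearn-style attributes is labelled as regression, which is the case an explicit string would guard against.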

Advueu963 · 2026-05-25T15:54:07Z

Here, too, it is somewhat similar to the LocalXAI benchmark. Try to make it a subclass and reduce code redundancy.

I am not sure about making TabPFNBench a subclass of LocalXAIBench. For example, it has a different way of loading the model and dataset, as it needs labels, too.

Advueu963

Okay, overall this is a very nice refactor! I think some small improvements would make the code even better, and then it is good to go for a merge! See the comments.

Advueu963 · 2026-06-15T19:19:34Z

Does this file contain anything really different from the california one? It would be nice to merge them into one file that can be called with command-line args. I think this holds for all these optimisation scripts: command-line args would control which model to pick and which hyperparameters to use. Basically, the model arg would then decide which hyperparameter sweep is run.
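
A rough sketch of what such a merged script could look like, using `argparse`. The flag names, the model keys, and the search spaces below are illustrative assumptions and are not taken from the PR's optimisation scripts.

```python
import argparse
from typing import Any, Dict, List, Optional

# Hypothetical search spaces; the real sweeps live in the per-dataset scripts.
SEARCH_SPACES: Dict[str, Dict[str, List[Any]]] = {
    "random_forest": {"n_estimators": [50, 100, 200], "max_depth": [4, 8, None]},
    "mlp": {"hidden_layer_sizes": [(32,), (64, 64)], "alpha": [1e-4, 1e-3]},
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run one hyperparameter sweep.")
    parser.add_argument("--dataset", required=True, help="Name of the dataset to load.")
    parser.add_argument(
        "--model",
        required=True,
        choices=sorted(SEARCH_SPACES),
        help="Model family; selects the sweep that is run.",
    )
    parser.add_argument("--random-state", type=int, default=42)
    return parser


def main(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    args = build_parser().parse_args(argv)
    # The model argument alone decides which hyperparameters are swept.
    search_space = SEARCH_SPACES[args.model]
    # Placeholder: the actual optimisation loop would run here.
    return {
        "dataset": args.dataset,
        "model": args.model,
        "random_state": args.random_state,
        "search_space": search_space,
    }


# Example invocation with an explicit argument list instead of sys.argv.
config = main(["--dataset", "california_housing", "--model", "random_forest"])
print(config["model"], sorted(config["search_space"]))
```

Keeping the search spaces in one mapping means adding a model only requires a new dictionary entry, and `choices` rejects unknown model names before any work starts.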

Advueu963 · 2026-06-15T19:20:46Z

merge into bigger script

Advueu963 · 2026-06-15T19:20:55Z

merge into bigger script

Advueu963 · 2026-06-15T19:21:06Z

merge into bigger script

Advueu963 · 2026-06-15T19:21:15Z

merge into bigger script

Advueu963 · 2026-06-15T19:30:44Z

+            random_state=random_state,
+            class_index=class_index,
+            **kwargs,
+        )


I think it is somewhat dangerous to silently set the model and data here without also noting this in the method name. At the very least, the name should suggest it, e.g. _set_and_load_dataset_model(...), with an accompanying docstring.
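
As an illustration of the suggested naming, the toy class below makes the side effect visible both in the method name and in the docstring. `ExampleBench`, its attributes, and the placeholder loading logic are assumptions for this sketch and do not reproduce the PR's implementation.

```python
from typing import Optional, Tuple


class ExampleBench:
    """Toy stand-in for a benchmark class (hypothetical)."""

    def __init__(self) -> None:
        self._model: Optional[object] = None
        self._data: Optional[object] = None

    def _set_and_load_dataset_model(
        self, dataset: str, model: str
    ) -> Tuple[Optional[int], str]:
        """Load dataset and model and store them on the instance.

        Side effects:
            Sets ``self._data`` and ``self._model``.

        Returns:
            The class index (``None`` in this sketch) and the data type.
        """
        # Placeholder loading: strings stand in for real data and model objects.
        self._data = f"data:{dataset}"
        self._model = f"model:{model}"
        return None, "regression"


bench = ExampleBench()
class_index, data_type = bench._set_and_load_dataset_model("california_housing", "mlp")
print(bench._data, bench._model, class_index, data_type)
```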

Advueu963 · 2026-06-15T19:31:41Z

+            random_state: Random state used for data split and model init.
+            **kwargs: Additional keyword arguments for model building.
+        """
+        class_index, _data_type = self._load_dataset_and_model(


The renaming then applies here as well.

Advueu963 · 2026-06-15T19:32:37Z

+    def exact_values(self, index: IndexType, order: int, **kwargs: object) -> InteractionValues:
+        """Compute exact interaction values using the PathdependentBench computer.
+
+        Args:
+            index: The index for which to compute interaction values.
+            order: The order of interactions to compute.
+            **kwargs: Additional keyword arguments for computation.
+
+        Returns:
+            InteractionValues: The computed interaction values.
+        """
+        return self._computer.exact_values(index=index, order=order, **kwargs)
+
+    @property
+    def game(self) -> TreeSHAPIQXAI:
+        """Game instance used by the Pathdependent Benchmark."""
+        return self._game
+
+    @property
+    def computer(self) -> PathdependentComputer[IndexType]:
+        """Ground truth computer used by the Pathdependent Benchmark."""
+        return self._computer


This is quite common among all the benchmark classes, so maybe it should already be part of the base Benchmark class? At this point it is just redundant code.
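
One possible shape for this, sketched under assumptions: a generic abstract base holds `exact_values`, `game`, and `computer`, and each subclass only implements a setup hook. The `_setup` hook, the type variables, and the dummy classes are hypothetical; only the three shared members mirror the diff above.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Generic, Tuple, TypeVar

GameT = TypeVar("GameT")
ComputerT = TypeVar("ComputerT")


class Benchmark(ABC, Generic[GameT, ComputerT]):
    """Base class holding the members every benchmark currently repeats."""

    def __init__(self, **kwargs: Any) -> None:
        self._game, self._computer = self._setup(**kwargs)

    @abstractmethod
    def _setup(self, **kwargs: Any) -> Tuple[GameT, ComputerT]:
        """Hypothetical hook: build the game and its ground-truth computer."""

    def exact_values(self, index: str, order: int, **kwargs: Any) -> Any:
        """Delegate exact value computation to the ground-truth computer."""
        return self._computer.exact_values(index=index, order=order, **kwargs)

    @property
    def game(self) -> GameT:
        """Game instance used by the benchmark."""
        return self._game

    @property
    def computer(self) -> ComputerT:
        """Ground-truth computer used by the benchmark."""
        return self._computer


class DummyComputer:
    """Stand-in computer that echoes its arguments."""

    def exact_values(self, index: str, order: int, **kwargs: Any) -> Dict[str, Any]:
        return {"index": index, "order": order}


class DummyBench(Benchmark[str, DummyComputer]):
    """Subclass that only defines how game and computer are created."""

    def _setup(self, **kwargs: Any) -> Tuple[str, DummyComputer]:
        return "dummy-game", DummyComputer()


bench = DummyBench()
print(bench.game, bench.exact_values(index="k-SII", order=2))
```

With this layout, the return types of `game` and `computer` stay specific per subclass through the generic parameters, so the properties do not need to be redefined just to narrow their annotations.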

Advueu963 · 2026-06-15T19:35:18Z

+def infer_data_type(model: object) -> Literal["classification", "regression"]:
+    """Infer whether a model is a classifier or regressor based on its attributes."""
+    estimator_type = getattr(model, "_estimator_type", None)
+    if estimator_type == "classifier":
+        return "classification"
+    if estimator_type == "regressor":
+        return "regression"
+    if hasattr(model, "predict_proba") or hasattr(model, "classes_"):
+        return "classification"
+    return "regression"


So basically, given the GameBenchmarkSetup, we can always extract the dataset, as we control the loading itself. If the user provides a custom dataset, we simply require them to also pass the type as a string. Alternatively, if the target variable in the dataset (which the user must also provide in some way or another) is ordinal, we treat it as classification; if it is numerical, as regression. But this latter approach is quite brittle.

Advueu963 · 2026-06-15T19:35:57Z

+    def exact_values(self, index: IndexType, order: int, **kwargs: object) -> InteractionValues:
+        """Compute exact interaction values using the TabPFNComputer.
+
+        Args:
+            index: The index for which to compute interaction values.
+            order: The order of interactions to compute.
+            **kwargs: Additional keyword arguments for computation.
+
+        Returns:
+            InteractionValues: The computed interaction values.
+        """
+        return self._computer.exact_values(index=index, order=order, **kwargs)
+
+    @property
+    def game(self) -> TabPFNImputer:
+        """Game instance used by the TabPFN Benchmark."""
+        return self._game
+
+    @property
+    def computer(self) -> GroundTruthComputer[IndexType]:
+        """Ground truth computer used by the TabPFN Benchmark."""
+        return self._computer


This is redundant and should be in the general class.

Advueu963 and others added 11 commits March 20, 2026 10:51

dump benchmark

7eae600

dump

9f1a576

add IntBench to instantiate a benchmark with either strings or objects

1458edc

add draft for benchmark structure

5bc582a

update computers

6c9a33f

add metrics

4f6c8aa

add benchmark for image and tabpfn

4fc0c44

Co-authored-by: Copilot <copilot@github.com>

fix structure for ty

4370212

Co-authored-by: Copilot <copilot@github.com>

add mlp, make model initialization more flexible, change x_train to x…

3959241

…_test

add loading best params from config and optimization

a6a1eae

fix overriding kwargs

d4be783

github-project-automation Bot added this to shapiq development May 18, 2026

IsaH57 self-assigned this May 18, 2026

IsaH57 added 10 commits May 19, 2026 11:14

handle ruff errors

7751af4

new check after lint/cache cleanup

130e570

delete src/shapiq-benchmark directory

ad1d0a5

Fix model type check for TabPFN

c990381

delete src/shapiq_benchmark/own_test.py

e1f2b5a

fix for ruff

be6b89b

fix for ruff

1f7c128

Merge branch 'main' into benchmark_ep

5908e1f

fix jsons

39c4d04

fix format

6925276

Merge branch 'main' into benchmark_ep

d87f51a

Advueu963 self-requested a review May 25, 2026 15:25

Advueu963 requested changes May 25, 2026

IsaH57 added 3 commits May 28, 2026 15:00

adjust code based on comments

7298e1b

fix format

9fa62da

optimization scripts

2ccfdb2

IsaH57 added 8 commits May 28, 2026 17:16

reformatting

010154d

reformatting

8ba77d6

reformatting

820afb1

reformatting

c5b7bbb

delete src/shapiq_benchmark/tests directory

824709f

fix formatting

bea234c

Merge branch 'main' into benchmark_ep

b2dff28

remove budget

a7fa77d

Advueu963 moved this to 🏗 In progress in shapiq development May 29, 2026

Advueu963 added this to the v1.6 milestone May 29, 2026

Advueu963 requested changes Jun 15, 2026

mmschlk mentioned this pull request Jun 17, 2026

ConfoundingSHAP #551

Open

7 tasks
