shapiq Benchmark#521
Conversation
Co-authored-by: Copilot <copilot@github.com>
Co-authored-by: Copilot <copilot@github.com>
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
Advueu963
left a comment
There was a problem hiding this comment.
Overall already a very nice PR.
Some things regarding the overall structure need minor changes, namely making the relationship between the benchmark classes more clear. Importantly we can reduce code redundancy, using abstract classes and inheritance of Benchmark and LocalXAIBench.
| def infer_data_type(model: object) -> Literal["classification", "regression"]: | ||
| """Infer whether a model is a classifier or regressor based on its attributes.""" | ||
| estimator_type = getattr(model, "_estimator_type", None) | ||
| if estimator_type == "classifier": | ||
| return "classification" | ||
| if estimator_type == "regressor": | ||
| return "regression" | ||
| if hasattr(model, "predict_proba") or hasattr(model, "classes_"): | ||
| return "classification" | ||
| return "regression" |
There was a problem hiding this comment.
I recon this will only work with estimators using the sklearn api. I do not thinnk this is bad, but we should make this clear via a comment. Maybe even just enforce that the dataset/user always gives us information on what type of dataset this is. Our loading functions always should provide this. Regarding an external user, we can just enforce that he provides this information.
There was a problem hiding this comment.
I adjusted the docstring. We know this information when the dataset is given as a string. How would you expect a user to provide this info on a dataset object? As an additional string? I thought about this before and am not sure whether that is the best solution
There was a problem hiding this comment.
So basically given the GameBenchmarkSetup we always can extract the dataset, as we control the loading itself. If the user also gives a custom dataset, we just require them also to add the type as a string. Or differently, if the target variable in the dataset (which the user must also provide in some way or another) is ordinal --> classification; numerical --> regression. But this latter approach is quite brittle.
There was a problem hiding this comment.
Also here it is somewhat similar to the LocaXAI Benchmark. Try to make it an sub-class and reduce code redundancy
There was a problem hiding this comment.
I am not sure about make TabPFNBench a subclass of LocalXAIBench. It eg has a different way of uploading model and dataset as it needs labels, too.
Advueu963
left a comment
There was a problem hiding this comment.
Okay overall tis is a very nice refactor! Some small improvements I think would make the code even better and then it is good to go for a merge! See the comments
There was a problem hiding this comment.
Does this file contain something really different than the california one? It would be nice to maybe merge them into one file which can be called via command line args. I think this would hold for all these optimisation scripts, that you can control via command args which model to pick and which hyperparameters. Basically the model command arg, would then decide which sweep over hyperparameters is done.
There was a problem hiding this comment.
merge into bigger script
There was a problem hiding this comment.
merge into bigger script
There was a problem hiding this comment.
merge into bigger script
There was a problem hiding this comment.
merge into bigger script
| random_state=random_state, | ||
| class_index=class_index, | ||
| **kwargs, | ||
| ) |
There was a problem hiding this comment.
I think it is somewhat dangerous to silently set here the model and data without noting this also in the method name. At least the name should suggest _set_and_load_dataset_model(...) with an accompaniying docstring.
| random_state: Random state used for data split and model init. | ||
| **kwargs: Additional keyword arguments for model building. | ||
| """ | ||
| class_index, _data_type = self._load_dataset_and_model( |
There was a problem hiding this comment.
Here then also the renaming.
| def exact_values(self, index: IndexType, order: int, **kwargs: object) -> InteractionValues: | ||
| """Compute exact interaction values using the PathdependentBench computer. | ||
|
|
||
| Args: | ||
| index: The index for which to compute interaction values. | ||
| order: The order of interactions to compute. | ||
| **kwargs: Additional keyword arguments for computation. | ||
|
|
||
| Returns: | ||
| InteractionValues: The computed interaction values. | ||
| """ | ||
| return self._computer.exact_values(index=index, order=order, **kwargs) | ||
|
|
||
| @property | ||
| def game(self) -> TreeSHAPIQXAI: | ||
| """Game instance used by the Pathdependent Benchmark.""" | ||
| return self._game | ||
|
|
||
| @property | ||
| def computer(self) -> PathdependentComputer[IndexType]: | ||
| """Ground truth computer used by the Pathdependent Benchmark.""" | ||
| return self._computer |
There was a problem hiding this comment.
This is quite common among all the benchmark classes, So it Should maybe be already content of the base Benchmark class? because this is just redundant code at this point.
| def infer_data_type(model: object) -> Literal["classification", "regression"]: | ||
| """Infer whether a model is a classifier or regressor based on its attributes.""" | ||
| estimator_type = getattr(model, "_estimator_type", None) | ||
| if estimator_type == "classifier": | ||
| return "classification" | ||
| if estimator_type == "regressor": | ||
| return "regression" | ||
| if hasattr(model, "predict_proba") or hasattr(model, "classes_"): | ||
| return "classification" | ||
| return "regression" |
There was a problem hiding this comment.
So basically given the GameBenchmarkSetup we always can extract the dataset, as we control the loading itself. If the user also gives a custom dataset, we just require them also to add the type as a string. Or differently, if the target variable in the dataset (which the user must also provide in some way or another) is ordinal --> classification; numerical --> regression. But this latter approach is quite brittle.
| def exact_values(self, index: IndexType, order: int, **kwargs: object) -> InteractionValues: | ||
| """Compute exact interaction values using the TabPFNComputer. | ||
|
|
||
| Args: | ||
| index: The index for which to compute interaction values. | ||
| order: The order of interactions to compute. | ||
| **kwargs: Additional keyword arguments for computation. | ||
|
|
||
| Returns: | ||
| InteractionValues: The computed interaction values. | ||
| """ | ||
| return self._computer.exact_values(index=index, order=order, **kwargs) | ||
|
|
||
| @property | ||
| def game(self) -> TabPFNImputer: | ||
| """Game instance used by the TabPFN Benchmark.""" | ||
| return self._game | ||
|
|
||
| @property | ||
| def computer(self) -> GroundTruthComputer[IndexType]: | ||
| """Ground truth computer used by the TabPFN Benchmark.""" | ||
| return self._computer |
There was a problem hiding this comment.
This is redundant and should be in the general class.
Motivation and Context
This PR consolidates and validates a full
shapiq_benchmarklayer, ensuring all benchmark types (Interventional, Pathdependent, Local XAI, Image, and TabPFN) share a consistent interface for exact value computation and game access. It also strengthens the string-based dataset/model loading flow and the known-parameter lookup path, so benchmarks can be reproduced reliably across datasets and model families while still supporting custom inputs.What’s included
-- Interventional tree explanations in
InterventionalBench()-- Pathdependent tree explanations in
PathdependentBench()-- Local XAI explanations in
LocalXAIBench()-- Image explanations in
ImageBench()-- TabPFN explanations in
TabPFNBench()Public API Changes
How Has This Been Tested?
Checklist
CHANGELOG.md(if relevant for users).