Leverageshap Docs#542

Draft
FabianK-Dev wants to merge 35 commits into
mmschlk:mainfrom
FabianK-Dev:leverageshap-docs

Conversation

@FabianK-Dev


Motivation and Context


Public API Changes

  • No Public API changes
  • Yes, Public API changes (Details below)

How Has This Been Tested?


Checklist

  • The changes have been tested locally.
  • Documentation has been updated (if the public API or usage changes).
  • An entry has been added to CHANGELOG.md (if relevant for users).
  • The code follows the project's style guidelines.
  • I have considered the impact of these changes on the public API.

42logos and others added 30 commits May 11, 2026 02:46
Forward-looking spec for the 3 new SV approximators (LeverageSHAP,
PolySHAP, OddSHAP). Approximator classes are looked up dynamically by
name, so the file auto-skips classes that have not yet been registered
in shapiq.approximator. As each implementation lands, the corresponding
parametrizations activate.
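The name-based lookup described above can be sketched as follows. This is a minimal illustration, not the code from the PR: `lookup_class` and `available` are hypothetical helper names, and the standard-library `collections` module stands in for `shapiq.approximator` so that the example runs without shapiq installed.

```python
import importlib


def lookup_class(module_name, class_name):
    """Return the named class from a module, or None if either is missing."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, class_name, None)


def available(module_name, class_names):
    """Keep only the names that currently resolve; the rest would be skipped."""
    return [name for name in class_names if lookup_class(module_name, name) is not None]


# Stand-in demo: "Nope" plays the role of an approximator that has not landed yet.
print(available("collections", ["OrderedDict", "Nope", "Counter"]))
```

In a test file, a `None` result would typically be turned into a `pytest.skip(...)` call, so the parametrization activates on its own once the class is registered.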

- Interface conformance (always required): index='SV', n_players,
  max_order/min_order, values shape and dtype, interaction_lookup.
- Numerical convergence vs ExactComputer (xfail strict=False): atol
  schedule by budget percentage.
- Determinism: same (n, random_state, budget, game) -> bit-identical
  output.

75 tests, all currently SKIP on main. Will activate as classes land.
Honors the cross-method testing platform promised to the tutor:
unified harness covering every SV approximator in shapiq (the
existing 11 — KernelSHAP, SVARM, Permutation*, ProxySPEX, ... — and
the 3 new ones from this project) instead of only the new line-up.

Approximator list is sourced dynamically from
shapiq.approximator.SV_APPROXIMATORS (canonical registry) plus the
3 new project names, deduplicated. Future shapiq additions land in
the harness automatically.

Split into two scopes:

  * test_interface_conformance — strict shape/dtype/index/lookup
    contract from the API spec. Applied ONLY to the 3 new
    approximators (the contract is ours; existing methods have
    different default output conventions like ProxySPEX defaulting
    to FBII and max_order=n).

  * test_numerical_convergence_vs_exact + test_determinism — apply
    to ALL SV approximators. Cross-method comparison against
    ExactComputer ground truth on identical SOUM games. xfail with
    strict=False so methods that do converge surface as XPASS;
    methods still under development surface as XFAIL.

Two robustness helpers:

  * _construct_or_skip — tries (n=, index='SV', max_order=1,
    random_state=) first (covers multi-index methods like SPEX,
    ProxySPEX, ProxySHAP, MSRBiased, kADDSHAP), then falls back to
    minimal signature for SV-only methods (KernelSHAP, OwenSamplingSV).

  * _safe_approximate — skips on ValueError raised by approximators
    that explicitly refuse a regime (e.g. SPEX 'Insufficient budget
    to compute the transform' at low budgets).
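A rough sketch of how the two helpers could behave, under stated assumptions: `SkipCell` is a hypothetical stand-in for `pytest.skip`, the two toy classes only imitate the constructor shapes mentioned above, and the `approximate(budget=..., game=...)` call is assumed to match the approximator interface. The real helpers in the PR may differ.

```python
class SkipCell(Exception):
    """Hypothetical stand-in for pytest.skip: the test cell should be skipped."""


def construct_or_skip(cls, n, random_state=None):
    """Try the multi-index signature first, then the minimal SV-only one."""
    try:
        return cls(n=n, index="SV", max_order=1, random_state=random_state)
    except TypeError:
        pass  # constructor does not accept index / max_order
    try:
        return cls(n=n, random_state=random_state)
    except TypeError as exc:
        raise SkipCell(f"cannot construct {cls.__name__}: {exc}") from exc


def safe_approximate(estimator, budget, game):
    """Run approximate(); an explicit ValueError refusal becomes a skip."""
    try:
        return estimator.approximate(budget=budget, game=game)
    except ValueError as exc:
        raise SkipCell(str(exc)) from exc


class MultiIndexToy:
    """Toy class with a multi-index constructor that refuses small budgets."""

    def __init__(self, n, index, max_order, random_state=None):
        self.n, self.index, self.max_order = n, index, max_order

    def approximate(self, budget, game):
        if budget < 2 * self.n:
            raise ValueError("Insufficient budget to compute the transform")
        return [0.0] * self.n


class SVOnlyToy:
    """Toy class with the minimal SV-only constructor."""

    def __init__(self, n, random_state=None):
        self.n = n

    def approximate(self, budget, game):
        return [0.0] * self.n
```

One caveat of this pattern: a `TypeError` raised inside a constructor whose signature did match is indistinguishable from a signature mismatch, so it silently triggers the fallback.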

Results: 10 passed, 95 skipped, 90 xfailed, 23 xpassed. The 23
xpassed are existing shapiq SV methods that converge cleanly at
full budget on small SOUM — a useful baseline for the upcoming
benchmark report.
Drop-in framework that any teammate can merge into their feature branch
to run head-to-head benchmarks against ExactComputer across every SV
approximator in shapiq, then plot the standard SHAP-literature metric
curves. No source files are modified — adds a top-level benchmark/
package, a single test file, and a small in-place test-helper sys.path
hook. Does not touch pyproject.toml or any other upstream config.

Files added:

  * benchmark/__init__.py: makes the runner a proper Python package so
    invocation is 'python -m benchmark.performance'.

  * benchmark/_discovery.py: single source of truth for SV approximator
    discovery + SV-mode construction. Holds:
      - PROJECT_APPROXIMATOR_NAMES: LeverageSHAP, PolySHAP,
        PolySHAPKAdd / Partial / Prior, OddSHAP.
      - _SV_CONSTRUCT_OVERRIDES: per-class kwargs for non-standard
        constructors (PolySHAP variants need max_order /
        n_explanation_terms / q_prior).
      - construct_for_sv(): three-stage construction (override ->
        explicit SV signature -> minimal signature), returning
        (estimator, exc) so the caller can report the most informative
        exception. A ValueError from inside a matched signature wins
        over a TypeError from a signature mismatch.
      - safe_approximate(): catches ValueError and RuntimeError so
        sparse approximators that refuse a budget regime (SPEX,
        ProxySPEX, ...) skip the cell cleanly instead of crashing.

  * benchmark/performance.py: CLI runner that consumes _discovery,
    sweeps (method, game, budget, seed), records every cell in a
    long-format CSV, and emits one PNG per (game, metric) plus a
    runtime PNG. Seven metrics chosen from the union of LeverageSHAP,
    PolySHAP, OddSHAP and shapiq.benchmark.metrics literature:
    MSE / MAE / SSE / SAE / Precision@5 / Precision@10 / KendallTau.
    Includes a '--check' interface-probe mode that prints a
    constructibility table without running a sweep.

  * benchmark/README.md: usage doc covering merge workflow, --check,
    sweep CLI, output layout, CSV format, metric definitions, plot
    conventions, and notes on the multi-index approximators that need
    explicit (index='SV', max_order=1).
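For orientation, the metrics named above can be written out in plain Python. These are the textbook definitions and are an assumption here: the definitions in benchmark/README.md or shapiq.benchmark.metrics may differ in detail, for example in how Precision@k ranks features or how Kendall's tau handles ties. The function names are illustrative only.

```python
def top_k(values, k):
    """Indices of the k largest entries by absolute value."""
    order = sorted(range(len(values)), key=lambda i: abs(values[i]), reverse=True)
    return set(order[:k])


def kendall_tau(a, b):
    """Kendall tau-a: (concordant - discordant) / number of pairs."""
    n = len(a)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (a[i] - a[j]) * (b[i] - b[j])
            score += (prod > 0) - (prod < 0)  # tied pairs contribute 0
    return score / (n * (n - 1) / 2)


def metrics(truth, estimate, k=5):
    """Compare estimated Shapley values against the exact ones."""
    errors = [e - t for t, e in zip(truth, estimate)]
    n = len(errors)
    k = min(k, n)
    sse = sum(d * d for d in errors)
    sae = sum(abs(d) for d in errors)
    return {
        "SSE": sse,
        "SAE": sae,
        "MSE": sse / n,
        "MAE": sae / n,
        "Precision@k": len(top_k(truth, k) & top_k(estimate, k)) / k,
        "KendallTau": kendall_tau(truth, estimate),
    }
```

Calling `metrics(exact_values, approx_values, k=5)` once per (method, game, budget, seed) cell would yield one row of a long-format table.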

Files modified:

  * tests/shapiq/tests_unit/tests_approximators/test_approximators_vs_exact.py:
    now imports the shared helpers from benchmark._discovery via a
    tightly-scoped sys.path hook at the top of the file. Picks up the
    ValueError-priority construction and the RuntimeError-catch that
    the test file previously did not have. Interface conformance is
    now applied to the project's six new approximator names
    (LeverageSHAP, PolySHAP + 3 variants, OddSHAP), so Matthias's
    PolySHAP variants are no longer silently skipped by the contract
    check.

Verified locally:

  * pytest test_approximators_vs_exact.py: 10 passed, 170 skipped,
    87 xfailed, 26 xpassed. No failures.
  * python -m benchmark.performance --check: surfaces all 17 method
    names (11 existing on main + 6 project additions) correctly.
  * Drop-in compatibility verified by temporary merge into all three
    feature branches (oddshap_approximator, leverageSHAP, PolySHAP) —
    clean merge in each, --check picks up the local approximator.
…tions set, Z_list and probs_list and create all-true and all-false coalitions
…ferent game variables to avoid access counters interfering; Also compare metadata
…d tiny-n edge case and add comments to document and explain the test
…use its core claim was not reliable

With n = 6, a budget of 100 exceeds 2^n = 64, so the implementation enters the full-budget (exact) regime. In that regime the result is identical for every seed, so asserting that different seeds must produce different results is wrong and fails even though the code is correct.
=> I lowered the budget to budget=20, which is below 2^n = 64 and keeps the test in the stochastic regime.
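The arithmetic behind this fix, as a small sketch. `regime` is a hypothetical helper, and treating a budget of exactly 2^n as exact is an assumption; the commit message only states that a budget above 2^n triggers the exact regime.

```python
def regime(n_players, budget):
    """Classify a budget against the number of coalitions, 2**n_players."""
    n_coalitions = 2 ** n_players
    return "exact" if budget >= n_coalitions else "stochastic"


print(regime(6, 100))  # 100 >= 64, every coalition can be evaluated
print(regime(6, 20))   # 20 < 64, coalitions must be sampled
```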
…o test_exact_regime_seed_independence and test_stochastic_regime_seed_variability
…est_exact_matches_multiple_small_games, test_null_player_axiom and test_minimal_budget_sweep
…using @pytest.mark.parametrize("seed", DIVERSE_SEEDS) to prevent accidental overfitting on fixed seeds
@mmschlk

mmschlk commented Jun 10, 2026

Owner

@FabianK-Dev, what is the difference between this PR and #524? I would like to bundle the whole LeverageSHAP implementation in the Leverageshap PR rather than keep parts of it separate here.

@FabianK-Dev

Author

> @FabianK-Dev, what is the difference between this PR and #524? I would like to bundle the whole LeverageSHAP implementation in the Leverageshap PR rather than keep parts of it separate here.

Hello @mmschlk, thanks for reviewing. The intention behind this PR was to separate task 4 from PR #524, so that we can, for example, work on the benchmark and discussion file (as requested in task 4 of the project description) without bloating #524 with more commits. That way you don't have to re-review incoming commits that aren't part of tasks 1-3.

> I would like to bundle the whole LeverageSHAP implementation in the Leverageshap PR rather than keep parts of it separate here.

No problem! If you prefer, I can continue working on task 4 in #524. In that case, feel free to close this PR. 👍

@mmschlk

mmschlk commented Jun 10, 2026

Owner

Ah I see. Or actually I assume I see, I do not really see a "Task 4" in the PR you linked. 😅

So I will look only at the other PR first and, once it is merged, take a look at this one. You can then merge/rebase the other PR's branch into this one. :)

@mmschlk mmschlk moved this to 📋 Backlog in shapiq development Jun 10, 2026
@FabianK-Dev

Author

> Ah I see. Or actually I assume I see, I do not really see a "Task 4" in the PR you linked. 😅

Great catch! I just opened this as an empty placeholder/draft PR for now. I will push the actual discussion file and anything related to task 4 here later.

> So I will look only at the other PR first and, once it is merged, take a look at this one. You can then merge/rebase the other PR's branch into this one. :)

Sounds great, thanks, will do!
