Track: Track1; Team name: PushparajD; Model: Graph Attention with self/neighbour-separated attention #355
Open
Pdevadiga45 wants to merge 6 commits into
…t, cite Eq. 1/2/4)
Checklist
Description
Adds GATE (Graph Attention with self/neighbour-separated attention) as a Track 1 graph backbone.
GATE is GATv2 with one targeted change (Eq. 4): the attention logit uses a separate learnable vector for a node's self-loop (a_t) versus its neighbours (a_s), so a node can parameterize its own self-attention independently and suppress aggregation from unrelated ("intrusive") neighbours. That is what makes it robust on heterophilic graphs, the axis GraphUniverse sweeps.
Files
- topobench/nn/backbones/graph/gate.py: GATEConv (the attention layer) and GATE (stacked backbone). Docstrings cite the paper's Eq. 1/2/4.
- configs/model/graph/gate.yaml: Hydra config on GNNWrapper + NoReadOut; one config serves both challenge tasks.
- test/nn/backbones/graph/test_gate.py: 10 tests, 100% backbone coverage.
- test/pipeline/test_pipeline.py: registers graph/gate for the CI MUTAG integration test.
- 2026_tdl_challenge/outputs/.../results.json: GraphUniverse grid output
Fidelity
The reference repo doesn't run under modern PyG (it ships a modified, older-PyG MessagePassing base), so I implemented a clean standalone version directly from the paper's equations, and validated it three ways:
- GATv2Conv: in the shared-attention special case our layer matches PyG bit-for-bit (external reference for the routing/softmax/aggregation);
I follow the paper's GATE (Eq. 1/2/4), not the reference repo's optional omega gate or separate self-loop value transform; neither is part of the published model (the paper adds only the d-dimensional a_t). Init follows the paper: zero attention (Thm. 4.3) + random-orthogonal weights.
Initialization
I apply the paper's prescription where it bears on the GATE mechanism: zero attention vectors (Thm. 4.3 - no initial inductive bias, so the layer starts as uniform mean-aggregation) and random-orthogonal weight matrices. I deliberately do not reproduce the paper's full looks-linear (channel-mirroring) weight construction: zero-attention already delivers the at-init uniform-aggregation property that looks-linear is there to support, and orthogonal init captures the random-orthogonal specification it builds on. The mirroring would add implementation surface without changing the attention mechanism this PR contributes. This is documented in the module docstring.
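The self/neighbour-separated logit and the zero-attention initialization described above can be illustrated with a small sketch. It uses plain NumPy and a single head. It is not the GATEConv code from this PR. The function names (gate_attention, random_orthogonal), the LeakyReLU slope, and the exact form of the logit (a GATv2-style a · LeakyReLU(W_src h_j + W_dst h_i)) are assumptions made for illustration only; the source states only that a_t is used on the self-loop and a_s on neighbours.

```python
import numpy as np


def leaky_relu(z, slope=0.2):
    # Slope 0.2 is an assumed value, chosen only for this sketch.
    return np.where(z > 0, z, slope * z)


def random_orthogonal(d, rng):
    # One simple way to draw a random orthogonal matrix: QR of a Gaussian matrix.
    q, _ = np.linalg.qr(rng.normal(size=(d, d)))
    return q


def gate_attention(h, neighbours, W_src, W_dst, a_s, a_t):
    """Per-node attention over its neighbours plus its own self-loop.

    a_t scores the self-loop, a_s scores every other neighbour.
    Returns {node: (source_ids, weights)} with weights summing to 1.
    """
    out = {}
    for i, nbrs in neighbours.items():
        sources = [i] + [j for j in nbrs if j != i]  # self-loop first
        logits = []
        for j in sources:
            z = leaky_relu(W_src @ h[j] + W_dst @ h[i])
            a = a_t if j == i else a_s  # the one change relative to shared attention
            logits.append(a @ z)
        logits = np.array(logits)
        w = np.exp(logits - logits.max())  # numerically stable softmax
        out[i] = (sources, w / w.sum())
    return out


rng = np.random.default_rng(0)
d = 4
h = rng.normal(size=(3, d))                # toy node features
neighbours = {0: [1, 2], 1: [0], 2: [0]}   # toy undirected graph
W_src = random_orthogonal(d, rng)
W_dst = random_orthogonal(d, rng)

# Zero attention vectors: every logit is 0, so aggregation is a uniform mean.
zero = np.zeros(d)
uniform = gate_attention(h, neighbours, W_src, W_dst, zero, zero)

# Non-zero a_t only: the self-loop weight moves, neighbours stay tied with each other.
biased = gate_attention(h, neighbours, W_src, W_dst, zero, rng.normal(size=d))

print(uniform[0][1])
print(biased[0][1])
```

Passing the same vector for a_s and a_t gives the shared-attention special case, which is the regime in which the PR compares against GATv2Conv.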
TopoBench integration
- forward(x, edge_index, edge_weight=None, **kwargs) accepts the GNNWrapper arguments; hidden width matches the encoder so the wrapper's residual is consistent.
- _target_ (topobench.nn.backbones.graph.gate.GATE): the backbone auto-discovery loads files under a synthetic module name, which otherwise breaks PyG MessagePassing's inspector.
Cost. Attention is O(E·H·d); the benchmarked config (hidden 64, 4 heads, 2 layers) has 16,896 trainable parameters.
Results (72 runs, seeds 42/43/44). In-distribution community-detection accuracy 0.31–0.69 (mean 0.45; chance ≈ 0.05 over 20 communities); triangle-count MSE/triangles 0.015–3.80, all finite. Per-setting/per-seed/OOD values are in results.json. Run on CPU (no CUDA device) under WANDB_MODE=offline, so the optional W&B fields are empty; metrics are seeded and device-independent.
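A trainable-parameter figure like the one quoted under Cost is normally obtained with the standard PyTorch idiom sketched below. This is a generic illustration, not code from this PR: count_trainable_parameters is a hypothetical helper name, and a torch.nn.Linear layer stands in for the GATE backbone, whose constructor arguments are not given here.

```python
import torch


def count_trainable_parameters(model: torch.nn.Module) -> int:
    # Sum the element counts of every parameter that receives gradients.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


# Stand-in module (assumption): a 64 -> 64 linear layer with bias.
stand_in = torch.nn.Linear(64, 64)
n_params = count_trainable_parameters(stand_in)
print(n_params)  # 64 * 64 weights + 64 biases
```

Applying the same helper to the instantiated backbone would be one way for a reviewer to check the reported count.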
On results.json generation. The shipped evaluation notebook can't run as-is (its integrity-check cell's stored hash doesn't match its own cells, so it aborts). Without modifying the notebook or utils.py, I called the functions it wraps, i.e., run_challenge_grid + save_challenge_artifacts, which run the identical pipeline.
Issue
Track 1 entry for the TDL Challenge 2026.
Additional context
Python 3.11, torch 2.3.0.
pre-commit (ruff-format, ruff, numpydoc-validation, standard hooks) passes; 10 unit tests at 100% backbone coverage; MUTAG pipeline test passes.