A structured companion to our KAN review paper.
We welcome corrections, discussions, and new contributions. The updates below come from recent communications with researchers and newly released studies.
If you notice any missing or misattributed references, kindly contact amir_noori@hkbu.edu.hk so they can be added in the next GitHub update and preprint revision.
- Citation
- Kolmogorov Superposition Theorem (KST) and Its Refinement Toward Neural Networks
- Review and Survey Papers on KANs
- Representative Repositories
- Bridging KANs and MLPs
- Basis Functions
- Accuracy Improvement
- Efficiency Improvement
- Sparsity & Regularization
- Convergence & Scaling Laws
Paper and repository reference information:
@article{GuideToKAN,
title = {A practitioner's guide to {Kolmogorov--Arnold} networks},
author = {Noorizadegan, Amir and Wang, Sifan and Ling, Leevan and Dominguez-Morales, Juan P.},
journal = {Computer Science Review},
volume = {62},
pages = {100991},
year = {2026},
issn = {1574-0137},
doi = {10.1016/j.cosrev.2026.100991},
url = {https://www.sciencedirect.com/science/article/pii/S1574013726000997}
}

More detailed explanations and citations are provided in our KAN review paper.
| Year | Reference | Key Contribution |
|---|---|---|
| 1900 | Hilbert | Poses Hilbert's 13th problem |
| 1956 | Kolmogorov | Preliminary idea of superpositions; first hint toward the theorem |
| 1957 | Arnol'd | First explicit 3-variable construction (9 terms); counterexample to Hilbert 13 |
| 1957 | Kolmogorov | Full Kolmogorov Superposition Theorem; first general n-dimensional proof |
| 1958 | Arnol'd | Supplies missing lemmas; completes Kolmogorov’s proof |
| 1962 | Lorentz | Simplified canonical form with a single outer function |
| 1965 | Sprecher | First single universal inner function |
| 1967 | Fridman | Shows universal inner functions can be taken Lipschitz-1 |
| 1980 | de Figueiredo | First network-like interpretation; block diagram + learned outer function (Chebyshev basis) |
| 1987 | Hecht–Nielsen | First explicit neural mapping theorem based on KST |
| 1989 | Girosi–Poggio | First rigorous critique: inner functions must be non-smooth; outer functions non-parametric |
| 1989 | Frisch et al. | First computational implementation of Lorentz form; iterative outer-function learning |
| 1991 | Kurková | First approximation-theoretic reinterpretation; relates network size to modulus of continuity |
| 1992 | Kurková | Two-hidden-layer sigmoidal approximants; universal inner weights |
| 1993 | Sprecher | Single universal inner function valid for all input dimensions |
| 1993 | Nakamura et al. | First fully constructive version with guaranteed accuracy |
| 1994 | Nees | First piecewise-linear inner maps with geometric error decay; constructive algorithm |
| 1996 | Sprecher | First executable version of inner function with verified separation property |
| 1997 | Sprecher | Explicit constructive algorithm for the outer functions |
| 2002 | Köppen | Corrected continuous monotone inner function; first training-ready KST inner map |
| 2003 | Igelnik–Parikh | Kolmogorov Spline Network (KSN): trainable spline-based inner/outer functions |
| 2009 | Braun–Griebel | First correct constructive KST; repairs Sprecher’s scheme |
| 2019 | Actor–Knepley | Proves C^1 inner functions impossible; smoothness obstruction |
| 2024 | Liu | Introduces KAN, the first deep architecture inspired by the Kolmogorov–Arnold representation |
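
For orientation, the 1957 theorem in the table is usually written in the form below. This is the standard textbook statement, not a formula taken from this page; the symbols $\Phi_q$ (outer functions) and $\phi_{q,p}$ (inner functions) are notation chosen here.

```latex
% Every continuous f : [0,1]^n -> R admits the representation
f(x_1, \dots, x_n) \;=\; \sum_{q=0}^{2n} \Phi_q\!\left( \sum_{p=1}^{n} \phi_{q,p}(x_p) \right),
% where all \Phi_q and \phi_{q,p} are continuous functions of one variable,
% and the inner functions \phi_{q,p} do not depend on f.
```

In this notation, the Lorentz refinement listed above replaces the $2n+1$ outer functions $\Phi_q$ with a single $\Phi$.
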
| Title | Paper |
|---|---|
| A Practitioner's Guide to Kolmogorov-Arnold Networks | Noorizadegan |
| The first two months of Kolmogorov-Arnold Networks (KANs): A survey of the state-of-the-art | Dutta |
| KAT to KANs: A review of Kolmogorov–Arnold Networks and the neural leap forward | Basina |
| Scientific machine learning with Kolmogorov–Arnold Networks | Faroughi |
| Kolmogorov-Arnold Networks: Overview of Architectures and Use Cases | Essahraui |
| Kolmogorov–Arnold Networks for interpretable and efficient function approximation | Andrade |
| Scalable and interpretable function-based architectures: A survey of Kolmogorov–Arnold Networks | Beatrize |
| Convolutional Kolmogorov–Arnold Networks: A survey | Kilani |
| Convolutional Kolmogorov–Arnold Networks | Bodner |
| A survey on Kolmogorov–Arnold Networks | Somvanshi |
| Kolmogorov-Arnold Networks: A Critical Assessment of Claims, Performance, and Practical Viability | Hou |
| Repository | Description |
|---|---|
| .../pykan | Official PyKAN for “KAN” and “KAN 2.0”. |
| .../Gaussian-KAN | Pure Gaussian RBF-KAN implementation, focusing on Gaussian basis functions and scale-parameter effects. |
| .../PU-GKAN | Partition-of-Unity Gaussian KAN implementation, using normalized Gaussian basis functions. |
| .../pinn_learnable_activation | Compares various KAN bases vs. MLP on PDEs. |
| .../torchkan | Simplified PyTorch KAN with multiple variants. |
| .../awesome-kan | Curated list of KAN resources, projects, and papers. |
| .../Deep-KAN | Spline-KAN examples and PyPI package. |
| .../RBF-KAN | Gaussian RBF-based KAN implementation. |
| .../KANbeFair | Fair benchmarking of KANs vs. MLPs. |
| .../efficient-kan | Efficient PyTorch implementation of KAN. |
| .../jaxKAN | JAX-based KAN with grid extension support. |
| .../fast-kan | FastKAN using RBFs for acceleration. |
| .../faster-kan | Uses reflectional switch activations. |
| .../LKAN | Lightweight KAN variants and experiments. |
| .../neuromancer (fbkans branch) | Partition of unity (FBKAN) for PDE solving. |
| .../relu_kan | Minimal ReLU-KAN example. |
| .../MatrixKAN | Matrix-parallelized KAN implementation. |
| .../PowerMLP | MLP-type network with KAN-level expressiveness. |
| .../FourierKAN | Fourier-based KAN layer. |
| .../FusedFourierKAN | Optimized FourierKAN with fused GPU kernels. |
| .../fKAN | Fractional KAN using Jacobi functions. |
| .../rKAN | Rational KAN (Padé/Jacobi rational designs). |
| .../CVKAN | Complex-valued KANs. |
| .../SincKAN | Sinc-based KAN for PINN applications. |
| .../ChebyKAN | Chebyshev polynomial-based KAN. |
| .../OrthogPolyKANs | Orthogonal polynomial-based KAN implementations. |
| .../kaf_act | RFF-based activation library. |
| .../KAF | Kolmogorov–Arnold Fourier Networks. |
| .../HRKAN | Higher-order ReLU-KANs. |
| .../KINN | PIKAN for solid mechanics PDEs. |
| .../KAN_PointNet_CFD | Jacobi-based KAN for CFD predictions. |
| .../FKAN-GCF | FourierKAN-GCF for graph filtering. |
| .../KKANs_PIML | Kurkova-KANs combining MLP with basis functions. |
| .../MLP-KAN | MLP-augmented KAN activations. |
| .../kat | Kolmogorov–Arnold Transformer. |
| .../FAN | Fourier Analysis Network (FAN). |
| .../Basis_Functions | Polynomial bases for KANs. |
| .../Wav-KAN | Wavelet-based KANs. |
| .../qkan | Quantum-inspired KAN variants and pruning. |
| .../KAN-Converge | Additive & hybrid KANs for convergence-rate experiments. |
| .../BSRBF_KAN | Combines B-spline and RBF bases. |
| .../Bayesian-HR-KAN | Bayesian higher-order ReLU-KANs with uncertainty quantification. |
| .../Legend-KINN | Legendre polynomial–based KAN for efficient PDE solving. |
| .../DeepOKAN | Deep Operator Network based on KAN. |
| .../LeanKAN | A memory-efficient Kolmogorov–Arnold Network. |
| .../SPIKANs | Separation of variables to decompose high-dimensional PDEs into smaller KANs. |
| openkan.org | Features a non-spline KAN trained via Newton–Kaczmarz. |
| .../Anant-Net | High-dimensional PDE solver with tensor sweeps. |
| .../RGA-KANs | Deep cPIKANs with variance-preserving initialization. |
| .../lmkan | Lookup-based KAN for fast high-dimensional mappings. |
| .../KAN_Initialization_Schemes | Initialization schemes for spline-based KANs. |
| .../mlp-kan | KAN vs. MLP for PDEs in DeepONet/GNS frameworks. |
| .../KANQAS_code | KANQAS: KAN for quantum architecture search. |
| .../pkan | Probabilistic KAN via divisive data re-sorting. |
| .../spikans | Separable PIKAN (SPIKAN) for high-dimensional PDEs. |
| Brief result | Paper , Code |
|---|---|
| Equivalence: ReLU^k MLP ↔ B-spline KAN. | Wang |
| Piecewise-linear KAN = ReLU MLP. | Schoots |
| Adaptive spline KANs mimic MLPs with data-driven capacity. | Actor |
| NTK view: richer KAN bases reduce spectral bias vs MLP. | Gao |
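
The piecewise-linear case in the table can be checked by hand. The sketch below is a generic construction, not code from the cited papers: it rewrites a piecewise-linear interpolant (a degree-1 spline, as a KAN edge might use) as a bias plus a sum of shifted ReLUs, i.e. a one-hidden-layer ReLU network. The function names and the sample knots are illustrative assumptions.

```python
def relu(z):
    return max(z, 0.0)

def interp_linear(x, knots, values):
    """Reference: ordinary piecewise-linear interpolation on [knots[0], knots[-1]]."""
    for i in range(len(knots) - 1):
        if knots[i] <= x <= knots[i + 1]:
            w = (x - knots[i]) / (knots[i + 1] - knots[i])
            return (1.0 - w) * values[i] + w * values[i + 1]
    raise ValueError("x lies outside the knot range")

def relu_coefficients(knots, values):
    """Return (bias, coeffs) so that f(x) = bias + sum_i coeffs[i] * relu(x - knots[i])."""
    slopes = [(values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
              for i in range(len(knots) - 1)]
    # First unit carries the first slope; each later unit adds the slope change at its knot.
    coeffs = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, len(slopes))]
    return values[0], coeffs

def relu_network(x, knots, bias, coeffs):
    # zip stops at len(coeffs), so the last knot is not used as a ReLU shift.
    return bias + sum(c * relu(x - t) for c, t in zip(coeffs, knots))

# Illustrative data (not from any paper).
knots = [0.0, 0.5, 1.0, 2.0]
values = [1.0, -1.0, 0.5, 0.5]
bias, coeffs = relu_coefficients(knots, values)
```

Both evaluators agree on the whole interval `[knots[0], knots[-1]]`; the ReLU form needs one hidden unit per spline segment.
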
| Name | Support | Grid | Type | Paper , Code |
|---|---|---|---|---|
| B-spline | Local | Yes | B-spline | Liu & Liu , Code & Actor & Basina & Coffman & Guo & Kalesh & Gao & Zeng & Khedr & Lei & Li & Lin & Pal & Howard & Jacob , Code & Aghaei & Patra & Ranasinghe & Rigas , Code & Shuai & Wang & Zhang & Raffel & Schoots & Wang & Wang & Xu & Shen & Yang & Howard , Code & Code & Gong & Guo & Lee & Mallick & Sen | |
| Chebyshev | Global | No | Chebyshev + tanh | Sidharth , Code & Code & Yang & Mahmoud & Guo & Faroughi & Yu, Code & Rigas 2025 , Code | |
| Stabilized Chebyshev | Global | No | Chebyshev + linear head | Daryakenari | |
| Chebyshev (grid) | Global | Yes | Chebyshev + tanh | Toscano , Code | |
| ReLU-KAN | Local | Yes | Squared ReLU | Qiu , Code | |
| HRKAN | Local | Yes | Polynomial ReLU | So , Code | |
| Adaptive ReLU-KAN | Local | Yes | Adaptive ReLU | Rigas , Code | |
| fKAN (Jacobi) | Global | No | Jacobi | Aghaei , Code | |
| rKAN (Padé/Jacobi) | Global | No | Rational + Jacobi | Aghaei , Code | |
| Jacobi-KAN | Global | No | Jacobi + tanh | Kashefi , Code & Shukla & Xiong & Zhang , Code | |
| FourierKAN | Global | No | Fourier | Xu , Code & Code & Guo & Jiang , Code | |
| KAF | Global | No | Random Fourier + GELU | Zhang , Code | |
| Gaussian + residual | Local | Yes | Gaussian RBF with SiLU | Li , Code & Lee & Abueidda , Code & Koenig , Code & Ta , Code & Buhler & Zhang | |
| Gaussian | Local | Yes | Pure Gaussian RBF | Noorizadegan , Code | |
| Partition of Unity Gaussian | Local | Yes | Partition of Unity Gaussian RBF | Noorizadegan , Code | |
| RSWAF-KAN | Local | Yes | Switch | Code | |
| CVKAN | Local | Yes | Complex Gaussian | Wolff , Code & Che | |
| BSRBF-KAN | Local | Yes | B-spline + Gaussian | Ta , Code | |
| Wav-KAN | Local | No | Wavelet | Bozorgasl , Code & Patra & Pratyush & Seydi & Meshir 2025 | |
| FBKAN | Local | Yes | PoU + B-spline | Howard , Code | |
| SincKAN | Global | Yes | Sinc | Yu , Code | |
| Poly-KAN | Global | No | Polynomial | Seydi , Code & Attouri 2025 |
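
To make the "Global" and "Local" rows concrete, here is a minimal sketch of two basis families from the table: a Chebyshev expansion applied to `tanh(x)` (global support, no grid) and Gaussian RBFs on a set of centers, with an optional partition-of-unity normalization (local support, grid of centers). It is written from the general definitions, not from any listed repository; the function names and the shape parameter `eps` are assumptions.

```python
import math

def chebyshev_features(x, degree):
    """T_0..T_degree evaluated at z = tanh(x), which maps any input into (-1, 1)."""
    z = math.tanh(x)
    feats = [1.0, z]
    for _ in range(2, degree + 1):
        # Three-term recurrence: T_{k+1}(z) = 2 z T_k(z) - T_{k-1}(z)
        feats.append(2.0 * z * feats[-1] - feats[-2])
    return feats[: degree + 1]

def gaussian_features(x, centers, eps):
    """Gaussian RBFs exp(-(eps * (x - c))^2), one per grid center."""
    return [math.exp(-((eps * (x - c)) ** 2)) for c in centers]

def pou_gaussian_features(x, centers, eps):
    """Normalized Gaussians that sum to one (partition of unity).
    Assumes x is close enough to the centers that the sum does not underflow."""
    g = gaussian_features(x, centers, eps)
    total = sum(g)
    return [v / total for v in g]

def edge_function(x, coeffs):
    """One learnable edge: phi(x) = sum_k coeffs[k] * T_k(tanh(x))."""
    feats = chebyshev_features(x, len(coeffs) - 1)
    return sum(c * t for c, t in zip(coeffs, feats))
```

In a trained layer the coefficients would be learned parameters; here they are plain lists so that the basis evaluation itself stays visible.
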
| Brief result | Paper , Code |
|---|---|
| Physics-informed KAN (cPIKAN): residual attention, entropy-viscosity. | Shukla |
| KAN-PINN for strongly nonlinear PDEs (actuator deflection). | Zhang |
| Attention-guided KAN with NSE residuals + BC losses. | Yang |
| Residual physics + sparse regression (variable-coeff. PDEs). | Guo |
| Self-scaled residual reweighting (ssRBA). | Toscano , Code |
| Augmented-Lagrangian PINN–KAN (learnable multipliers). | Zhang |
| Velocity–vorticity loss for turbulence reconstruction. | Toscano |
| Fractional/integro-diff. operators in KAN. | Aghaei |
| Physics-informed KAN for high-index DAEs (dual-network structure). | Lou |
| Holomorphic KAN for elliptic PDEs; trains only on boundary conditions. | Clafa , Code |
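
Several rows above reweight the PDE residual per collocation point. As a rough illustration, the sketch below implements the plain residual-based attention (RBA) update as it is commonly stated, not the self-scaled ssRBA variant and not code from the cited repositories. The decay `gamma`, the rate `eta`, and the function names are assumptions.

```python
def rba_update(weights, residuals, gamma=0.999, eta=0.01):
    """One RBA step: w_i <- gamma * w_i + eta * |r_i| / max_j |r_j|."""
    peak = max(abs(r) for r in residuals)
    if peak == 0.0:
        # All residuals vanish: only the decay term acts.
        return [gamma * w for w in weights]
    return [gamma * w + eta * abs(r) / peak for w, r in zip(weights, residuals)]

def weighted_residual_loss(weights, residuals):
    """Mean of (w_i * r_i)^2 over the collocation points."""
    return sum((w * r) ** 2 for w, r in zip(weights, residuals)) / len(residuals)
```

With these defaults each weight stays below `eta / (1 - gamma) = 10`, and points with persistently large residuals end up with the largest weights.
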
| Brief result | Paper , Code |
|---|---|
| Multilevel knots (coarse→fine) for nested spline spaces. | Actor |
| Free-knot KAN (trainable knots via cumulative softmax). | Actor |
| Grid extension with optimizer state transition. | Rigas , Code |
| Residual-adaptive sampling (RAD). | Rigas , Code |
| Multi-resolution sampling schedule for cPIKAN. | Yang |
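
The "trainable knots via cumulative softmax" row can be read as follows: unconstrained logits are mapped to positive interval widths by a softmax, and a running sum of those widths gives strictly increasing knots on a fixed interval. The sketch below shows only that parameterization, under the assumption that the interval endpoints `a` and `b` stay fixed; it is not taken from the cited paper's code.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def knots_from_logits(logits, a=0.0, b=1.0):
    """Map len(logits) free parameters to len(logits) + 1 ordered knots in [a, b]."""
    widths = softmax(logits)  # positive, sums to one
    knots = [a]
    acc = 0.0
    for w in widths:
        acc += w
        knots.append(a + (b - a) * acc)
    knots[-1] = b  # remove round-off at the right endpoint
    return knots
```

Because every width is positive, the knots can move during training but can never cross, which is the point of the construction.
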
| Brief result | Paper , Code |
|---|---|
| Finite-basis KAN (FBKAN) with PoU blending of local KANs. | Howard , Code |
| Temporal subdomains to improve NTK conditioning. | Faroughi |
| Brief result | Paper , Code |
|---|---|
| Multi-fidelity KAN (freeze LF, learn HF linear + nonlinear heads). | Howard , Code |
| Separable PIKAN (sum of products of 1D KAN factors). | Jacob , Code |
| KAN-SR: recursive simplification for symbolic discovery. | Buhler |
| Brief result | Paper , Code |
|---|---|
| MLP–KAN mixture of experts. | He , Code |
| Parallel KAN ∥ MLP branches with learnable fusion. | Xu |
| KKAN: per-dim MLP features + explicit basis expansion. | Toscano , Code |
| Brief result | Paper , Code |
|---|---|
| FlashKAT: group-rational KAN blocks in Transformers. | Raffel |
| GINN-KAN: interpretable growth + KAN in PINNs. | Ranasinghe |
| KAN-ODE: KAN as the right-hand side of a neural ODE. | Koenig |
| AAKAN-WGAN: adaptive KAN + GAN for data augmentation. | Shen |
| Attention-KAN-PINN for battery SOH forecasting. | Wei |
| KANQAS: uses KAN Double Deep Q-Network for quantum architecture search. | Kundu 2024 , Code |
| Brief result | Paper , Code |
|---|---|
| SincKAN for kinks/boundary layers. | Yu , Code |
| rKAN (rational bases) for asymptotics/jumps. | Aghaei , Code |
| DKAN | Lei |
| KINN for singularities/stress concentrations. | Wang , Code |
| Two-phase PINN–KAN for saturation fronts. | Kalesh |
| Brief result | Paper , Code |
|---|---|
| Adam/RAdam warmup → (L-)BFGS refinement. | Mostajeran & Daryakenari & Zeng |
| Hybrid optimizers for sharp fronts. | Kalesh |
| Bayesian hyperparameter tuning for KANs. | Lin |
| Bayesian PINN–KAN (variational + KL) for UQ. | Giroux , Code |
| NTK perspective: conditioning ↔ convergence. | Faroughi |
| Brief result | Paper , Code |
|---|---|
| ReLU^m activations replace splines (CUDA-friendly). | Qiu , Code |
| Spline→matmul CUDA kernels (GEMM fusion). | Qiu , Code & So , Code |
| Matrix B-spline evaluation fused on GPU. | Coffman , Code |
| Dual-matrix merge + trainable RFF for scaling. | Zhang , Code |
| Custom GPU backward for KAN attention blocks. | Raffel , Code |
| Parallel KAN ∥ MLP branches (stream/layer parallelism). | Xu , Code |
| Domain decomposition parallelism (multi-GPU, PoU/separable). | Shukla & Howard , Code & Jacob , Code |
| JAX/XLA acceleration: jit, vmap, pmap, fusion. | Daryakenari & Rigas , Code |
| lmKANs: multivariate spline lookup tables, CUDA-friendly. | Pozdnyakov 2025 , Code |
| Brief result | Paper , Code |
|---|---|
| ReLU-power vs B-splines: fewer params, vectorized polynomials. | Qiu , Code & Qiu , Code & So , Code |
| Orthogonal polynomials with cheap recurrences. | Shukla & Guo & Mostajeran & Mostajeran & Wang , Code |
| Compact RBF bases (local Gaussians). | Lin & Koenig , Code |
| Wavelets for multi-resolution and sparse coeffs. | Patra |
| Dual-matrix + RFF compression to cut memory traffic. | Zhang , Code |
| Sparsity regularization (ℓ1/group) with pruning. | Guo |
| Hierarchical channel-wise refinement (shared params). | Actor |
| DEKAN: connectivity via Differential Evolution. | Li |
| Mix spectral (derivatives) + spatial (coeffs) sparsity for operators. | Lee |
| Tensor sweeps + selective differentiation for scalable high-D PDEs. | Sidharth , Code |
| Operator-aware spectral–spatial mixing for near-diagonal matvecs. | Lee |
| Brief result | Paper , Code |
|---|---|
| Layerwise ℓ1 on edge activations + entropy balance. | Liu , Code |
| EfficientKAN: direct ℓ1 on weights (simple, practical). | EfficientKAN |
| Sparse symbolic discovery with ℓ1 + entropy. | Wang |
| PDE KAN: ℓ1 + smoothness penalty to denoise coefficients. | Guo |
| Post-training pruning with layerwise ℓ1. | Koenig , Code |
| KAN-SR: magnitude + entropy at subunit level (+ℓ1 on bases). | Buhler |
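
The first row ("layerwise ℓ1 on edge activations + entropy balance") can be sketched in a few lines. The version below follows the usual description of that penalty: each edge's ℓ1 norm is its mean absolute activation over a batch, the layer norm is the sum over edges, and the entropy is taken over the normalized edge norms. The weights `mu1` and `mu2`, the `tiny` stabilizer, and the function name are assumptions, not the API of any listed package.

```python
import math

def layer_regularizer(activations, mu1=1.0, mu2=1.0, tiny=1e-12):
    """activations[e] holds the outputs phi_e(x) of edge e over one batch."""
    # Per-edge l1 norm: mean absolute activation over the batch.
    edge_l1 = [sum(abs(v) for v in acts) / len(acts) for acts in activations]
    layer_l1 = sum(edge_l1)
    # Share of the layer norm carried by each edge.
    probs = [v / (layer_l1 + tiny) for v in edge_l1]
    # Low entropy means a few edges dominate, which is what pruning wants.
    entropy = -sum(p * math.log(p + tiny) for p in probs)
    return mu1 * layer_l1 + mu2 * entropy
```

In training, this value would be summed over layers, scaled by an overall coefficient, and added to the prediction loss.
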
| Brief result | Paper , Code |
|---|---|
| AAKAN: ℓ2 + temporal smoothing + MI regularizer. | Shen |
| Small ℓ2 (e.g., 1e−5) improves stability in PINNs/DeepOKAN. | Shukla & Toscano |
| Brief result | Paper , Code |
|---|---|
| Nested activations (e.g., tanh∘tanh) for bounded outputs & smooth grads. | Daryakenari |
| DropKAN: post-activation masking (noise after spline eval). | Altarabichi , Code |
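
A minimal sketch of post-activation masking, in the spirit of the DropKAN row: the mask is applied to each edge output after the edge function has been evaluated and before the outputs are summed at the node. The `1 / (1 - p)` rescaling is an assumption borrowed from inverted dropout, and the function name is hypothetical; consult the cited code for the actual variants.

```python
import random

def dropkan_sum(post_acts, drop_prob, rng, training=True):
    """Sum the edge outputs of one node, randomly masking some of them in training."""
    if not training or drop_prob == 0.0:
        return sum(post_acts)
    keep = 1.0 - drop_prob
    # Keep each post-activation with probability `keep`; rescale so the
    # expected value of the node sum matches the unmasked sum.
    return sum(v / keep for v in post_acts if rng.random() < keep)
```

Passing an explicit `random.Random(seed)` as `rng` makes the masking reproducible.
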
| Brief result | Paper |
|---|---|
| Depth-based convergence rate for spline KANs. | Wang |
| Optimal Besov approximation; dimension-free sample complexity. | Kratsios 2025 |
| Minimax statistical rates for additive & hybrid KANs; optimal knot scaling. | Liu , Code |
| Generalization bounds via RKHS and coefficient/Lipschitz complexity. | Zhang 2024 |
| Lipschitz-controlled layers improve stability and generalization. | Li 2025 |
| Brief result | Paper |
|---|---|
| KANs show reduced spectral bias vs. MLPs; faster high-frequency learning. | Wang 2025 |
| Learnable bases widen NTK spectra; trade off between reach and curvature. | Farea 2025 , Code |
| Gradient-flow convergence guarantees for two-layer KANs. | Gao |
| Chebyshev/cPIKAN maintain better NTK conditioning for PDEs. | Faroughi 2025 |
| Initialization schemes improve NTK stability. | Rigas 2025 , Code |
| Brief result | Paper |
|---|---|
| KAN error follows consistent power-law decay; grid refinement improves accuracy. | Liu , Code |
| Depth/grid refinement matches theoretical convergence trends. | Wang |
| Scaling behavior influenced by optimization, not just expressivity. | Kratsios 2025 |
| Minimax results align: grid resolution drives learning efficiency. | Liu , Code |
| Power-law patterns observed across PDE benchmarks. | Faroughi 2025 |
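
The power-law claims above are typically checked by fitting `error ≈ C * G**(-alpha)` to measured errors at several grid sizes `G`. The sketch below does this with an ordinary least-squares fit in log-log coordinates. It is a generic procedure, not the protocol of any cited paper, and the function name is an assumption.

```python
import math

def fit_power_law(grid_sizes, errors):
    """Fit error = C * G**(-alpha); return (alpha, C)."""
    xs = [math.log(g) for g in grid_sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the least-squares line through (log G, log error).
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)
```

A straight line in the log-log plot supports a power law; a curve that flattens at fine grids usually points to optimization or precision limits, not to the approximation rate.
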