SOTA low-bit LLM quantization (INT8/FP8/MXFP8/INT4/MXFP4/NVFP4) & sparsity; leading model compression techniques on PyTorch, TensorFlow, and ONNX Runtime
Updated Jun 24, 2026 - Python
Must-read research papers, with links to tools and datasets, on using machine learning for compiler and systems optimisation
Kernel Tuner
[DEPRECATED] Moved to ROCm/rocm-libraries repo
Machine Learning Framework for Operating Systems - Brings ML to the Linux kernel
CLTune: An automatic OpenCL & CUDA kernel tuner
Alchemy Cat: 🔥 Config System for SOTA
Phoebe
Benchmark scripts for TVM
eBPF profiler for the JVM
CLI-based multi-agents for Auto-Tuning (e.g. HPC code optimization loops) supporting Local LLMs
Collective Knowledge crowd-tuning extension that lets users crowdsource their experiments (using portable Collective Knowledge workflows), such as performance benchmarking, auto-tuning, and machine learning, across diverse Linux, Windows, macOS, and Android platforms provided by volunteers. Demo of DNN crowd-benchmarking and crowd-tuning:
K2vTune (Workload-aware Configuration Tuning for RocksDB)
A Generic Distributed Auto-Tuning Infrastructure
A GPU benchmark suite for autotuners
Backoff uses an exponential backoff algorithm to back off between retries, with optional auto-tuning functionality.