71 repositories

LeetCUDA
Public
📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
cuda cuda-kernels cuda-demo cuda-toolkit cuda-library cuda-kernel learn-cuda cuda-cpp hgemm flash-attention
Cuda
•
GNU General Public License v3.0
•1.2k•11k•1•3•Updated Jun 29, 2026
diffusers
Public
🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
Python
•
Apache License 2.0
•7.1k•1•0•0•Updated Jun 29, 2026
sglang
Public
SGLang is a fast serving framework for large language models and vision language models.
Python
•
Apache License 2.0
•6.8k•3•0•0•Updated Jun 29, 2026
ffpa-attn
Public
🤖FFPA: Extends FlashAttention-2 via Split-D for large headdims, 1.5x~3×↑🎉 vs SDPA, up to 430T🎉 on H200.
cuda tensor-cores flash-attention gemma-4 gemma4
Python
•
Apache License 2.0
•22•310•14•0•Updated Jun 29, 2026
modern-gpu-programming-for-mlsys
Public
modern gpu programming
HTML
•60•2•0•0•Updated Jun 28, 2026
vllm-omni
Public
A framework for efficient model inference with omni-modality models
Python
•
Apache License 2.0
•1.2k•1•0•0•Updated Jun 26, 2026
vllm
Public
A high-throughput and memory-efficient inference and serving engine for LLMs
Python
•
Apache License 2.0
•19k•0•0•0•Updated Jun 26, 2026
cutlass
Public
CUDA Templates and Python DSLs for High-Performance Linear Algebra
C++
•
Other
•1.9k•2•0•0•Updated Jun 26, 2026
flash-attention
Public
Fast and memory-efficient exact attention
Python
•
BSD 3-Clause "New" or "Revised" License
•2.9k•0•0•0•Updated Jun 26, 2026
flashinfer
Public
FlashInfer: Kernel Library for LLM Serving
Python
•
Apache License 2.0
•1.1k•0•0•0•Updated Jun 26, 2026
quack
Public
A Quirky Assortment of CuTe Kernels
Python
•
Apache License 2.0
•138•2•0•0•Updated Jun 25, 2026
DeepGEMM
Public
DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
Cuda
•
MIT License
•1.1k•0•0•0•Updated Jun 25, 2026
Awesome-LLM-Inference
Public
📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
mla vllm llm-inference awesome-llm flash-attention tensorrt-llm paged-attention deepseek flash-attention-3 deepseek-v3
Python
•
GNU General Public License v3.0
•417•5.4k•1•1•Updated Jun 23, 2026
deepseek-v4-for-copilot
Public
Pick DeepSeek V4 from the Copilot Chat model picker — and keep everything else Copilot already gives you.
TypeScript
•
MIT License
•96•0•0•0•Updated Jun 20, 2026
GCMP
Public
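Integrates China's mainstream native large-model providers to give developers a richer choice of AI coding assistants that better fits local needs. Built-in support currently covers native providers including Zhipu AI, MiniMax, MoonshotAI, DeepSeek, Alibaba Cloud Bailian, Kuaishou Wanqing, Volcano Ark, Tencent Cloud, and Xiaomi MiMo. In addition, the extension has been adapted to support OpenAI and Anthro…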
TypeScript
•
MIT License
•44•0•0•0•Updated Jun 16, 2026
DeepEP
Public
DeepEP: an efficient expert-parallel communication library
Cuda
•
MIT License
•1.3k•0•0•0•Updated Jun 15, 2026
Awesome-DiT-Inference
Public
📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉
flux wan diffusion dit sora stable-diffusion sdxl sd15 deepcache open-sora-plan
Python
•
GNU General Public License v3.0
•27•573•1•0•Updated Jun 13, 2026
cache-dit
Public
A PyTorch-native inference engine with cache, parallelism, quantization for Diffusion Transformers.
Python
•
Apache License 2.0
•76•4•0•0•Updated Jun 13, 2026
cllms-for-copilot
Public
Pick Qwen, GLM, MiniMax, Xiaomi MiMo, Moonshot Kimi & Tencent Hunyuan models from the Copilot Chat model picker. Vision, thinking, BYOK.
TypeScript
•
MIT License
•2•1•0•0•Updated Jun 13, 2026
.github
Public
•0•1•0•0•Updated Jun 12, 2026
svdquant-kernels
Public
Cross-architecture CUDA kernels for SVDQuant (W4A4 with low-rank correction)
Python
•
Apache License 2.0
•3•2•0•0•Updated May 28, 2026
FlashMLA
Public
FlashMLA: Efficient Multi-head Latent Attention Kernels
C++
•
MIT License
•1.1k•1•0•0•Updated Apr 30, 2026
TileKernels
Public
A kernel library written in tilelang
Python
•
MIT License
•145•0•0•0•Updated Apr 23, 2026
TensorRT-LLM
Public
TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inferen…
Python
•
Other
•2.5k•1•0•0•Updated Apr 1, 2026
nunchaku
Public
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
Python
•
Apache License 2.0
•258•3•0•0•Updated Mar 31, 2026
lite.ai.toolkit
Public
🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉
tensorrt mnn ncnn onnx onnxruntime yolov5 tnn mnn-model yolox robustvideomatting
C++
•
GNU General Public License v3.0
•784•4.4k•0•1•Updated Mar 19, 2026
Triton-distributed
Public
Distributed Compiler based on Triton for Parallel Systems
Python
•
MIT License
•157•0•0•0•Updated Mar 11, 2026
ao
Public
PyTorch native quantization and sparsity for training and inference
Python
•
Other
•548•1•0•0•Updated Mar 10, 2026
ComfyUI-CacheDiT
Public
Cache-DiT Node for ComfyUI
Python
•
Apache License 2.0
•16•1•0•0•Updated Feb 3, 2026
SageAttention
Public
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics a…
Cuda
•
Apache License 2.0
•439•0•0•0•Updated Jan 22, 2026

xlite-dev