    Repositories list

    • LeetCUDA

      Public
      📚LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners🐑, 200+ CUDA Kernels, Tensor Cores, HGEMM, FA-2 MMA.🎉
      Cuda
      GNU General Public License v3.0
      1.2k11k13 Updated Jun 29, 2026
    • diffusers

      Public
      🤗 Diffusers: State-of-the-art diffusion models for image, video, and audio generation in PyTorch and FLAX.
      Python
      Apache License 2.0
      7.1k100 Updated Jun 29, 2026
    • sglang

      Public
      SGLang is a fast serving framework for large language models and vision language models.
      Python
      Apache License 2.0
      6.8k300 Updated Jun 29, 2026
    • ffpa-attn

      Public
      🤖FFPA: Extends FlashAttention-2 via Split-D for large headdims, 1.5x~3×↑🎉 vs SDPA, up to 430T🎉 on H200.
      Python
      Apache License 2.0
      22310140 Updated Jun 29, 2026
    • modern gpu programming
      HTML
      60200 Updated Jun 28, 2026
    • vllm-omni

      Public
      A framework for efficient model inference with omni-modality models
      Python
      Apache License 2.0
      1.2k100 Updated Jun 26, 2026
    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Apache License 2.0
      19k000 Updated Jun 26, 2026
    • cutlass

      Public
      CUDA Templates and Python DSLs for High-Performance Linear Algebra
      C++
      Other
      1.9k200 Updated Jun 26, 2026
    • Fast and memory-efficient exact attention
      Python
      BSD 3-Clause "New" or "Revised" License
      2.9k000 Updated Jun 26, 2026
    • FlashInfer: Kernel Library for LLM Serving
      Python
      Apache License 2.0
      1.1k000 Updated Jun 26, 2026
    • quack

      Public
      A Quirky Assortment of CuTe Kernels
      Python
      Apache License 2.0
      138200 Updated Jun 25, 2026
    • DeepGEMM

      Public
      DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
      Cuda
      MIT License
      1.1k000 Updated Jun 25, 2026
    • 📚A curated list of Awesome LLM/VLM Inference Papers with Codes: Flash-Attention, Paged-Attention, WINT8/4, Parallelism, etc.🎉
      Python
      GNU General Public License v3.0
      4175.4k11 Updated Jun 23, 2026
    • Pick DeepSeek V4 from the Copilot Chat model picker — and keep everything else Copilot already gives you.
      TypeScript
      MIT License
      96000 Updated Jun 20, 2026
    • GCMP

      Public
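      Integrates China's mainstream native large-model providers to give developers a richer choice of AI coding assistants better suited to local needs. Built-in support currently covers native large-model providers including Zhipu AI, MiniMax, MoonshotAI, DeepSeek, Alibaba Cloud Bailian, Kuaishou Wanqing, Volcano Ark, Tencent Cloud, and Xiaomi MiMo. In addition, the extension has been adapted to support OpenAI and Anthro…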
      TypeScript
      MIT License
      44000 Updated Jun 16, 2026
    • DeepEP

      Public
      DeepEP: an efficient expert-parallel communication library
      Cuda
      MIT License
      1.3k000 Updated Jun 15, 2026
    • 📚A curated list of Awesome Diffusion Inference Papers with Codes: Sampling, Cache, Quantization, Parallelism, etc.🎉
      Python
      GNU General Public License v3.0
      2757310 Updated Jun 13, 2026
    • cache-dit

      Public
      A PyTorch-native inference engine with cache, parallelism, quantization for Diffusion Transformers.
      Python
      Apache License 2.0
      76400 Updated Jun 13, 2026
    • Pick Qwen, GLM, MiniMax, Xiaomi MiMo, Moonshot Kimi & Tencent Hunyuan models from the Copilot Chat model picker. Vision, thinking, BYOK.
      TypeScript
      MIT License
      2100 Updated Jun 13, 2026
    • .github

      Public
      0100 Updated Jun 12, 2026
    • Cross-architecture CUDA kernels for SVDQuant (W4A4 with low-rank correction)
      Python
      Apache License 2.0
      3200 Updated May 28, 2026
    • FlashMLA

      Public
      FlashMLA: Efficient Multi-head Latent Attention Kernels
      C++
      MIT License
      1.1k100 Updated Apr 30, 2026
    • A kernel library written in tilelang
      Python
      MIT License
      145000 Updated Apr 23, 2026
    • TensorRT LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and supports state-of-the-art optimizations to perform inferen…
      Python
      Other
      2.5k100 Updated Apr 1, 2026
    • nunchaku

      Public
      [ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
      Python
      Apache License 2.0
      258300 Updated Mar 31, 2026
    • 🛠A lite C++ AI toolkit: 100+ models with MNN, ORT and TRT, including Det, Seg, Stable-Diffusion, Face-Fusion, etc.🎉
      C++
      GNU General Public License v3.0
      7844.4k01 Updated Mar 19, 2026
    • Distributed Compiler based on Triton for Parallel Systems
      Python
      MIT License
      157000 Updated Mar 11, 2026
    • ao

      Public
      PyTorch native quantization and sparsity for training and inference
      Python
      Other
      548100 Updated Mar 10, 2026
    • Cache-DiT Node for Comfyui
      Python
      Apache License 2.0
      16100 Updated Feb 3, 2026
    • Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics a…
      Cuda
      Apache License 2.0
      439000 Updated Jan 22, 2026