vllm-serve

Here are 42 public repositories matching this topic...

xerrors / mvllm

Intelligent load balancer for distributed vLLM server clusters

inference balancer llms vllm vllm-serve

Updated Oct 22, 2025
Python

aminalaee / vllm-doctor

Diagnostic tool for vLLM inference servers

llm vllm llm-inference vllm-serve

Updated Jun 25, 2026
Rust

Perpetue237 / agentsculptor

agentsculptor is an experimental AI-powered development agent designed to analyze, refactor, and extend Python projects automatically. It runs an OpenAI-style planner–executor loop on top of a vLLM backend, combining project context analysis, structured tool calls, and iterative refinement. It has only been tested with gpt-oss-120b served via vLLM.

nlp open-source ai hackathon-project coding-assistant llms vllm agentic-ai vllm-serve gpt-oss gpt-oss-120b gpt-oss-20b vllm-server-config

Updated Sep 17, 2025
Python

KempnerInstitute / distributed-inference-vllm

Distributed Inference with vLLM

hpc slurm vllm llama3 qwen2-5 vllm-serve

Updated Apr 24, 2026
Shell

MekayelAnik / vllm-cpu

Wheels & Docker images for running vLLM on CPU-only systems, optimized for different CPU instruction sets

cpu-inference vllm llm-inference vllm-serve vllm-server

Updated Jun 22, 2026
Shell

BudEcosystem / Awesome-vLLM-plugins

A curated list of plugins built on top of vLLM

plugins vllm vllm-operator vllm-serve vllm-integration vllm-plugins

Updated Dec 12, 2025

inboxpraveen / Easy-vLLM

A simple UI and config generator to run vLLM with Docker, GPU settings, model config parsing, memory estimation, and OpenAI-compatible test clients.

python docker flask docker-compose cuda flask-application model-serving mlops huggingface llm vllm vllm-ui vllm-serve vllm-server vllm-serving

Updated Apr 30, 2026
Python
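
Easy-vLLM lists memory estimation among its features. As a rough illustration only (not Easy-vLLM's actual estimator, which may account for more terms), the KV-cache footprint of a decoder-only transformer is commonly estimated as 2 × layers × KV heads × head dimension × bytes per element × tokens, where the factor 2 covers keys and values. The helper name `kv_cache_bytes` and the example configuration below are hypothetical.

```python
def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    num_tokens: int,
    bytes_per_elem: int = 2,  # 2 bytes per element for fp16/bf16
) -> int:
    """Rough KV-cache size in bytes (hypothetical helper, not Easy-vLLM's API)."""
    # Factor 2: each layer stores one key and one value vector
    # per KV head for every cached token.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * num_tokens


if __name__ == "__main__":
    # Hypothetical config: 32 layers, 8 KV heads (grouped-query attention), head_dim 128.
    per_token = kv_cache_bytes(32, 8, 128, num_tokens=1)
    full_context = kv_cache_bytes(32, 8, 128, num_tokens=8192)
    print(per_token)               # bytes needed per cached token
    print(full_context / 1024**3)  # GiB needed for one 8192-token sequence
```

Model weights, activations, and runtime overhead come on top of this figure, so a real estimator has to budget for them as well.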

project-david-ai / projectdavid-core

The core source files for this self-hostable successor to the OpenAI Assistants API. To contribute to the core logic, fork this repo or submit pull requests to it.

python docker self-hosted orchestration multi-agent firejail gdpr ai-platform llm vllm assistants-api rag-pipeline tool-calling vllm-serve openai-compatible

Updated Apr 24, 2026
Python

hadi-technology / vllm-mlops

Performant LLM inference on Kubernetes via vLLM

kubernetes digitalocean machine-learning mlops vllm vllm-serve

Updated Feb 11, 2025

iguanesolutions / qwen35-rp

Qwen 3.5 reverse proxy that automatically handles instant and thinking modes and their variants

inference reverse-proxy instant thinking openai-api llm vllm genai vllm-serve qwen3-5 sampling-parameters

Updated May 26, 2026
Go

brokedba / vllm-lab

This repository contains Terraform configuration for deploying the vLLM production-stack on cloud-managed Kubernetes.

gke aks civo eks oke vllm llmcache vllm-operator vllm-serve vllm-production-stack

Updated Nov 10, 2025
HCL

SeungjaeLim / Efficient-Road-Repairs-System

[KAIST CS632] Road damage detection using YOLOv8 on a Xilinx FPGA, repair estimation with FAISS-based RAG over a Phi-3.5 model served by vLLM, and data management via GS1 EPCIS v2 and a React dashboard

react gs1 xilinx-fpga epcis faiss lmm rag yolov8 microsoft-phi3 vllm-serve

Updated Dec 19, 2024
Python

kingabzpro / Deploying-the-Magistral-with-Modal

Deploy the Magistral-Small-2506 model using vLLM and Modal

modal mistral openai-api vllm-serve

Updated Jun 16, 2025
Python

arshad234567 / learn-ai-by-building

Documenting my journey of understanding modern AI concepts—from Transformers and LLMs to RAG, Agents, Embeddings, Vector Databases, and vLLM.

python transformers pytorch rag qdrant llms langchain chromadb langgraph agentic-ai vllm-serve

Updated Jun 19, 2026
Jupyter Notebook

AbdulSametTurkmenoglu / vllm_rag_api

This project offers a production-ready RAG (Retrieval-Augmented Generation) API running on FastAPI, utilizing the high-performance vLLM engine.

rag llm vllm rag-chatbot vllm-serve

Updated Oct 31, 2025
Python

harryboi17 / LLM-Evaluation-Pipeline

A production-style LLM evaluation pipeline spanning vLLM serving, lm-eval-harness integration, performance metrics (TTFT/TPOT/p95), deterministic guardrails, and statistically significant benchmark improvements.

benchmark-framework performance-testing tpot lm-evaluation-harness vllm vllm-serve p95-p99-metrics ttft-optimization

Updated Apr 19, 2026
Python
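
The latency metrics named above have common definitions: TTFT (time to first token) is the delay from sending a request to receiving the first output token, and TPOT (time per output token) averages the interval between output tokens after the first one. The sketch below computes both from per-request timestamps and takes a nearest-rank p95. It is a minimal sketch assuming these conventions; `RequestTiming`, `ttft`, `tpot`, and `percentile` are hypothetical names, not the repository's code.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Hypothetical per-request record; all times in seconds."""
    start: float        # request sent
    first_token: float  # first output token received
    end: float          # last output token received
    output_tokens: int  # number of generated tokens


def ttft(t: RequestTiming) -> float:
    """Time to first token."""
    return t.first_token - t.start


def tpot(t: RequestTiming) -> float:
    """Time per output token, excluding the first token."""
    if t.output_tokens <= 1:
        return 0.0  # no inter-token interval to measure
    return (t.end - t.first_token) / (t.output_tokens - 1)


def percentile(values: list[float], p: float) -> float:
    """Nearest-rank percentile for 0 < p <= 100."""
    if not values:
        raise ValueError("percentile of empty list")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


if __name__ == "__main__":
    timings = [
        RequestTiming(0.0, 0.5, 2.5, 5),
        RequestTiming(1.0, 1.25, 2.25, 5),
        RequestTiming(2.0, 3.0, 4.0, 3),
    ]
    ttfts = [ttft(t) for t in timings]
    print("p95 TTFT:", percentile(ttfts, 95))
    print("TPOTs:", [tpot(t) for t in timings])
```

Nearest-rank is only one percentile convention; `numpy.percentile` defaults to linear interpolation, so p95 values reported by different tools can differ slightly on small samples.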

mahimairaja / modal-qwen-3.5-9B

Deploy the SOTA Qwen 3.5 9B model serverlessly

modal llm vllm llm-inference vllm-serve qwen3-5 qwen9b

Updated Mar 4, 2026
Python

Aquiles-ai / load-test-vllm-gpt-oss-20b

Load testing openai/gpt-oss-20b with vLLM and Docker

docker load-testing vllm-serve gpt-oss-20b

Updated Sep 8, 2025
Python

Lin-shuaibi / Ai-forge-VLLM_Cluster_dashboard

A collection of practical open-source projects developed with AI assistance

react benchmark dashboard gpu fastapi ai-inference llm vllm vllm-serve cluster-vllm

Updated Jun 5, 2026
Python

rohitkt10 / vllm-bench

A reproducible benchmarking suite for vLLM inference. Measure latency, throughput, and VRAM across model configurations, quantization schemes, and deployment environments.

modal inference quantization llm vllm vllm-serve

Updated Jan 25, 2026
Python
