This is a suite of hands-on training materials that shows how to scale CV, NLP, and time-series forecasting workloads with Ray.
Updated Feb 13, 2024 - Jupyter Notebook
An Asynchronous Reinforcement Learning Engine for Omni-Modal Post-Training at Scale
Self-hosted, OpenAI-compatible inference for the agentic era: reasoning LLMs, universal tool calling, and the Responses API alongside embeddings, speech, and image models — many models sharing your GPUs, one gateway. Powered by Ray Serve.
Batch LLM Inference with Ray Data LLM: From Simple to Advanced
Building Real-Time Inference Pipelines with Ray Serve
A Production-Ready, Scalable RAG-powered LLM-based Context-Aware QA App
BioEngine is a distributed AI platform that brings the power of cloud computing to bioimage analysis.
Create Context-Aware Q&A Interfaces from Your Own Data with LLMs and Vector Embeddings - Includes an automated embedding pipeline and a model-powered Q&A interface
Plugin-first framework for modular Python services with FastAPI ingress and optional Ray execution.
Distributed RAG platform on Kubernetes using Ray Serve, FastAPI, vector databases, and LLM orchestration.
Production-grade, scalable embedding API server using the SentenceTransformers "intfloat/multilingual-e5-base" model, powered by Ray Serve for multi-GPU orchestration, with Prometheus & Grafana monitoring.
A comprehensive guide to setting up and managing Raspberry Pi, Ray Clusters, and distributed AI workloads. Includes network troubleshooting, IP configuration, Ray Dashboard, and Python script execution for scalable AI applications.
A drop-in replacement for FastAPI that enables scalable, fault-tolerant deployments with Ray Serve
Distributed RAG document-intelligence on Ray: trained ML owns retrieval ranking & query routing, the LLM only writes citation-grounded answers. Runs free on a CPU laptop, scales to a cluster unchanged.
A distributed ML recommendation system — real-time streaming, multi-node distributed training, and fault-tolerant, scalable serving.
Ray Serve backend for Arabic Speech Recognition
Real-time multimodal fashion recommendation system with Java Spring Boot, Ray Serve, Milvus, Redis, Kafka, and Docker Compose.
Overview of our graduation project “Cairo Dictionary AI” – an Arabic dictionary enriched with AI. Includes our speech correction pipeline, HuggingFace models/datasets, backend prototypes (Ray & FastAPI), and academic report.