Minimal FastAPI service that serves text embeddings from OpenCLIP.
- Endpoint: POST /embed → returns embeddings for input texts
- Model: openai/clip-vit-base-patch32 via open-clip-torch
- Device: Auto-selects CUDA if available, else CPU
- Dimension: 512-d embeddings; optional L2 normalization
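As an illustration of what the optional L2 normalization does, here is a minimal pure-Python sketch. It is not the service's code (the notebook's implementation is not shown here, and the helper names `l2_normalize` and `dot` are made up for this example); it only shows that after normalization every vector has unit length, so a plain dot product equals cosine similarity.

```python
import math


def l2_normalize(vec):
    """Scale vec to unit L2 norm; a zero vector is returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Toy 2-d vectors stand in for real 512-d embeddings.
a = l2_normalize([3.0, 4.0])
b = l2_normalize([4.0, 3.0])

print(a)          # [0.6, 0.8]
print(dot(a, a))  # self-similarity of a unit vector is (approximately) 1
print(dot(a, b))  # cosine similarity between a and b
```

With normalized embeddings, ranking by dot product and ranking by cosine similarity give the same order, which is why normalization is a convenient default for retrieval.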
You can use either uv (fast) or pip.
# Using uv (recommended)
uv pip install fastapi uvicorn open-clip-torch torch
# Or using pip
pip install fastapi uvicorn open-clip-torch torch
If you plan to expose the service over the internet from a notebook, also install:
pip install pyngrok
This repo contains openclip_service.ipynb with a ready-to-run FastAPI app.
Steps:
- Open openclip_service.ipynb.
- Run the install cell if needed.
- Run the cell that defines the FastAPI app and the /embed route.
- Start the server: Use the ngrok cell to run uvicorn in a background thread and expose a public URL.
Local server default: http://localhost:8000
Tip: To run outside notebooks, you can convert the notebook to a script:
jupyter nbconvert --to script openclip_service.ipynb
python openclip_service.py
Or copy the FastAPI snippet from the notebook into your own app.py and run:
uvicorn app:app --host 0.0.0.0 --port 8000
The notebook includes a cell that starts uvicorn in a background thread and creates an ngrok tunnel.
You’ll be prompted for your ngrok authtoken (get one from https://dashboard.ngrok.com/get-started/your-authtoken). The cell prints a public URL such as https://xyz.ngrok-free.app.
Use that base URL in requests (see examples below).
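To make the "base URL" idea concrete, the following sketch builds a POST request for /embed from any base URL (local or ngrok) using only the standard library. The helper name `build_embed_request` is hypothetical, and the request is constructed but not sent, so the snippet runs without a live server.

```python
import json
import urllib.request


def build_embed_request(base_url, texts, normalize=True):
    """Build (but do not send) a POST request for the /embed endpoint."""
    # Tolerate a trailing slash on the base URL.
    url = base_url.rstrip("/") + "/embed"
    body = json.dumps({"texts": list(texts), "normalize": normalize}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_embed_request("https://xyz.ngrok-free.app/", ["a photo of a dog"])
print(req.full_url)  # https://xyz.ngrok-free.app/embed

# To actually send it against a running service:
#   with urllib.request.urlopen(req, timeout=15) as resp:
#       embeddings = json.loads(resp.read())["embeddings"]
```

Swapping between http://localhost:8000 and a tunnel URL then only changes the first argument.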
GET /
Response:
{ "status": "ok" }
POST /embed
Content-Type: application/json
Request body:
{
"texts": ["a photo of a dog", "a red car"],
"normalize": true
}
- texts: array of strings (required)
- normalize: boolean (optional, default true). If true, L2-normalizes embeddings.
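The field rules above can be expressed as a small validation routine. This is a standard-library stand-in written for illustration; how the notebook actually validates requests is not shown here, and `parse_embed_request` is a hypothetical name.

```python
import json


def parse_embed_request(raw):
    """Parse a JSON request body into (texts, normalize), applying the default."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("request body must be a JSON object")

    texts = data.get("texts")
    if not isinstance(texts, list) or not all(isinstance(t, str) for t in texts):
        raise ValueError("'texts' is required and must be an array of strings")

    # 'normalize' is optional and defaults to true.
    normalize = data.get("normalize", True)
    if not isinstance(normalize, bool):
        raise ValueError("'normalize' must be a boolean")

    return texts, normalize


print(parse_embed_request('{"texts": ["a photo of a dog"]}'))
# (['a photo of a dog'], True)
```

Whether the real service accepts an empty `texts` array is not specified; this sketch allows it.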
Response body:
{
"embeddings": [[0.12, 0.01, ...], [0.03, -0.07, ...]]
}
- Shape: [num_texts, 512]
- Dtype: float32 (JSON numbers)
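A client can check the documented shape without hard-coding 512. The sketch below, with a hypothetical helper `embedding_shape`, takes an already-decoded response dict and verifies that every row has the same length; the sample uses made-up 3-d vectors for brevity.

```python
def embedding_shape(response):
    """Return (num_texts, dim) for a decoded /embed response, checking all rows agree."""
    rows = response["embeddings"]
    if not rows:
        return (0, 0)
    dim = len(rows[0])
    for i, row in enumerate(rows):
        if len(row) != dim:
            raise ValueError(f"row {i} has length {len(row)}, expected {dim}")
    return (len(rows), dim)


# Made-up values; a real response from the default model would have 512 per row.
sample = {"embeddings": [[0.12, 0.01, 0.5], [0.03, -0.07, 0.2]]}
print(embedding_shape(sample))  # (2, 3)
```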
Auth: No authentication is enforced by default.
Assume the service is running at http://localhost:8000.
Invoke-WebRequest -Method POST `
-Uri http://localhost:8000/embed `
-ContentType 'application/json' `
-Body '{"texts":["a photo of a dog","a red car"],"normalize":true}'

curl -s -X POST http://localhost:8000/embed \
-H 'Content-Type: application/json' \
-d '{"texts":["a photo of a dog","a red car"],"normalize":true}'

import requests
url = "http://localhost:8000/embed"
payload = {"texts": ["a photo of a dog", "a red car"], "normalize": True}
r = requests.post(url, json=payload, timeout=15)
r.raise_for_status()
embeddings = r.json()["embeddings"] # List[List[float]], shape [N, 512]
print(len(embeddings), len(embeddings[0]))

test_embed_api.py exercises the API and prints the returned shape.
python test_embed_api.py --url http://localhost:8000 "a photo of a dog" "a red car"
If you exposed the service via ngrok, pass the public URL:
python test_embed_api.py --url https://YOUR-TUNNEL.ngrok-free.app "a photo of a dog" "a red car"
You can also pass an Authorization header (the server ignores it by default, but it’s useful if you add auth later):
python test_embed_api.py --url http://localhost:8000 --key YOUR_KEY "hello world"
You can replace the model with any OpenCLIP-supported checkpoint.
Option A — Hugging Face Hub repo id (current approach):
- Edit MODEL_ID in openclip_service.ipynb.
- Set it to another HF repo, prefixed with hf-hub:. No other code changes are required when using Hub IDs.
Option B — Built-in OpenCLIP names and weights:
Replace the creation calls in the notebook like this:
model_name = 'ViT-B-32'
pretrained = 'laion2b_s34b_b79k'
model, _, preprocess = open_clip.create_model_and_transforms(model_name, pretrained=pretrained, device=DEVICE)
tokenizer = open_clip.get_tokenizer(model_name)
Notes:
- Embedding dimensionality depends on the chosen model (e.g., 512 for ViT-B/32, 768 for ViT-L/14, etc.). Client code should not assume a fixed size.
- If you change the model, restart the kernel/server so the new weights are loaded.
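One way for client code to avoid assuming a fixed size is to learn the dimension from the first batch it receives and reject later batches that disagree, for example after the server was switched to a different model. The `EmbeddingStore` class below is a hypothetical illustration, not part of this repo.

```python
class EmbeddingStore:
    """In-memory vector list that infers its dimension from the first batch."""

    def __init__(self):
        self.dim = None      # unknown until the first non-empty batch arrives
        self.vectors = []

    def add(self, embeddings):
        batch = [list(vec) for vec in embeddings]
        if not batch:
            return
        dim = self.dim if self.dim is not None else len(batch[0])
        # Validate the whole batch before storing anything.
        for vec in batch:
            if len(vec) != dim:
                raise ValueError(
                    f"expected {dim}-d vectors, got {len(vec)}-d; "
                    "was the server's model changed?"
                )
        self.dim = dim
        self.vectors.extend(batch)


store = EmbeddingStore()
store.add([[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]])  # toy 4-d vectors
print(store.dim, len(store.vectors))  # 4 2
```

Vectors produced by different models are not comparable even when their sizes happen to match, so a real store would likely also record which model produced each batch.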