GitHub - YangTuanAnh/openclip-as-a-service: A deployable notebook to generate text embeddings, taking advantage of Colab's free GPUs. Model card can be replaced with any OpenCLIP supported models.

OpenCLIP as a Service

Minimal FastAPI service that serves text embeddings from OpenCLIP.

What you get

Endpoint: POST /embed → returns embeddings for input texts
Model: openai/clip-vit-base-patch32 via open-clip-torch
Device: Auto-selects CUDA if available, else CPU
Dimension: 512-d embeddings; optional L2 normalization

Quickstart

1) Install dependencies

You can use either uv (fast) or pip.

# Using uv (recommended)
uv pip install fastapi uvicorn open-clip-torch torch

# Or using pip
pip install fastapi uvicorn open-clip-torch torch

If you plan to expose the service over the internet from a notebook, also install:

pip install pyngrok

2) Start the service (from the notebook)

This repo contains openclip_service.ipynb with a ready-to-run FastAPI app.

Steps:

Open openclip_service.ipynb.
Run the install cell if needed.
Run the cell that defines the FastAPI app and the /embed route.
Start the server: Use the ngrok cell to run uvicorn in a background thread and expose a public URL.

Local server default: http://localhost:8000

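Tip: To run outside notebooks, you can convert the notebook to a script. If the converted file contains notebook-only lines (for example, get_ipython() calls that nbconvert generates from !pip install cells), remove them before running it: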

jupyter nbconvert --to script openclip_service.ipynb
python openclip_service.py

Or copy the FastAPI snippet from the notebook into your own app.py and run:

uvicorn app:app --host 0.0.0.0 --port 8000

3) Optional: expose via ngrok

The notebook includes a cell that starts uvicorn in a background thread and creates an ngrok tunnel.

You’ll be prompted for your ngrok authtoken (get one from https://dashboard.ngrok.com/get-started/your-authtoken). The cell prints a public URL such as https://xyz.ngrok-free.app.

Use that base URL in requests (see examples below).

API

Health

GET /

Response:

{ "status": "ok" }
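
When the server is started in a background thread, as the notebook's ngrok cell does, a client may want to wait until GET / answers before sending work. The following is a minimal sketch using only the standard library. The names `fetch_json` and `wait_until_healthy`, their parameters, and the injectable `fetch` argument are hypothetical and not part of this repo.

```python
import json
import time
import urllib.request


def fetch_json(url, timeout=5.0):
    """GET a URL and decode the JSON body (standard library only)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def wait_until_healthy(base_url, attempts=10, delay=1.0, fetch=fetch_json):
    """Poll GET / until it returns {"status": "ok"} or attempts run out.

    Returns True once the service reports healthy, False otherwise.
    """
    url = base_url.rstrip("/") + "/"
    for attempt in range(attempts):
        try:
            body = fetch(url)
            if isinstance(body, dict) and body.get("status") == "ok":
                return True
        except (OSError, ValueError):
            # Connection refused, timeout, or a non-JSON body: not ready yet.
            pass
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

For example, `wait_until_healthy("http://localhost:8000")` could be called right after starting uvicorn; the `fetch` parameter exists only so the polling logic can be exercised without a running server.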

Embed

POST /embed
Content-Type: application/json

Request body:

{
  "texts": ["a photo of a dog", "a red car"],
  "normalize": true
}

texts: array of strings (required)
normalize: boolean (optional, default true). If true, L2-normalizes embeddings.
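
For reference, L2 normalization divides each vector by its Euclidean length so that the result has norm 1. The sketch below shows the arithmetic in plain Python. The service presumably does the equivalent on tensors; `l2_normalize` and its `eps` guard are hypothetical and not taken from the notebook.

```python
import math


def l2_normalize(vector, eps=1e-12):
    """Scale a vector to unit Euclidean length.

    eps guards against division by zero for an all-zero vector
    (an assumption of this sketch, not taken from the notebook).
    """
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / max(norm, eps) for x in vector]


unit = l2_normalize([3.0, 4.0])
print(unit)  # [0.6, 0.8]
```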

Response body:

{
  "embeddings": [[0.12, 0.01, ...], [0.03, -0.07, ...]]
}

Shape: [num_texts, 512]
Dtype: float32 (JSON numbers)
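
A client can check the documented [num_texts, dim] shape before using a response. This is a small sketch, assuming the response is the JSON object shown above; `embedding_shape` is a hypothetical helper, not part of the repo or of test_embed_api.py.

```python
def embedding_shape(payload):
    """Return (num_texts, dim) for a decoded /embed response.

    Raises ValueError if the list is missing, empty, or ragged.
    """
    embeddings = payload.get("embeddings")
    if not isinstance(embeddings, list) or not embeddings:
        raise ValueError("response has no 'embeddings' list")
    dim = len(embeddings[0])
    for i, row in enumerate(embeddings):
        if len(row) != dim:
            raise ValueError(f"row {i} has length {len(row)}, expected {dim}")
    return len(embeddings), dim


print(embedding_shape({"embeddings": [[0.1, 0.2], [0.3, 0.4]]}))  # (2, 2)
```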

Auth: No authentication is enforced by default, so anyone who has the URL (including a public ngrok URL) can call the service.

Examples

Assume the service is running at http://localhost:8000.

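PowerShell

# In Windows PowerShell 5.1, curl is an alias for Invoke-WebRequest; PowerShell 7+ has no such alias,
# so the cmdlet is called by name here. Invoke-RestMethod also parses the JSON response.
Invoke-RestMethod -Method POST `
  -Uri http://localhost:8000/embed `
  -ContentType 'application/json' `
  -Body '{"texts":["a photo of a dog","a red car"],"normalize":true}'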

curl (bash)

curl -s -X POST http://localhost:8000/embed \
  -H 'Content-Type: application/json' \
  -d '{"texts":["a photo of a dog","a red car"],"normalize":true}'

Python (requests)

import requests

url = "http://localhost:8000/embed"
payload = {"texts": ["a photo of a dog", "a red car"], "normalize": True}
r = requests.post(url, json=payload, timeout=15)
r.raise_for_status()
embeddings = r.json()["embeddings"]  # List[List[float]], shape [N, 512]
print(len(embeddings), len(embeddings[0]))
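
A common next step is to compare two returned embeddings. The sketch below computes cosine similarity in plain Python; `cosine_similarity` is a hypothetical helper, and the vectors are stand-ins for `embeddings[0]` and `embeddings[1]` from the example above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Stand-in vectors; in practice pass embeddings[0] and embeddings[1].
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

When the request was sent with "normalize": true, both vectors already have unit length, so the dot product alone gives the same value.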

Provided test script

test_embed_api.py exercises the API and prints the returned shape.

python test_embed_api.py --url http://localhost:8000 "a photo of a dog" "a red car"

If you exposed the service via ngrok, pass the public URL:

python test_embed_api.py --url https://YOUR-TUNNEL.ngrok-free.app "a photo of a dog" "a red car"

You can also pass a key with --key, which is sent as an Authorization header (the server ignores it by default, but it’s useful if you add auth later):

python test_embed_api.py --url http://localhost:8000 --key YOUR_KEY "hello world"

Use a different OpenCLIP model

You can replace the model with any OpenCLIP-supported checkpoint.

Option A — Hugging Face Hub repo id (current approach):

Edit MODEL_ID in openclip_service.ipynb.
Set it to another HF repo, prefixed with hf-hub:. No other code changes are required when using Hub IDs.
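
To make the prefix rule concrete, here is a small sketch that adds hf-hub: only when it is missing. The helper `as_hub_model_id` is hypothetical, and the repo id is a placeholder rather than a real checkpoint.

```python
HUB_PREFIX = "hf-hub:"


def as_hub_model_id(repo_id):
    """Return repo_id with the hf-hub: prefix, adding it only if missing."""
    repo_id = repo_id.strip()
    if not repo_id:
        raise ValueError("repo id must not be empty")
    return repo_id if repo_id.startswith(HUB_PREFIX) else HUB_PREFIX + repo_id


# "some-org/some-clip-model" is a placeholder, not a real checkpoint.
MODEL_ID = as_hub_model_id("some-org/some-clip-model")
print(MODEL_ID)  # hf-hub:some-org/some-clip-model
```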

Option B — Built-in OpenCLIP names and weights:

Replace the creation calls in the notebook like this:

model_name = 'ViT-B-32'
pretrained = 'laion2b_s34b_b79k'
model, _, preprocess = open_clip.create_model_and_transforms(model_name, pretrained=pretrained, device=DEVICE)
tokenizer = open_clip.get_tokenizer(model_name)

Notes:

Embedding dimensionality depends on the chosen model (e.g., 512 for ViT-B/32, 768 for ViT-L/14, etc.). Client code should not assume a fixed size.
If you change the model, restart the kernel/server so the new weights are loaded.
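
One way for client code to avoid hard-coding 512 is to learn the dimension from the first response and reject vectors of any other size, which also catches embeddings from different models being mixed. This is a sketch only; `EmbeddingStore` is a hypothetical class, not part of the repo.

```python
class EmbeddingStore:
    """Collects embeddings and learns the dimension from the first batch."""

    def __init__(self):
        self.dim = None  # unknown until the first vector arrives
        self.vectors = []

    def add(self, embeddings):
        """Append vectors, raising ValueError on a dimension mismatch."""
        for vector in embeddings:
            if self.dim is None:
                self.dim = len(vector)
            elif len(vector) != self.dim:
                raise ValueError(
                    f"expected {self.dim}-d vectors, got {len(vector)}-d; "
                    "was the server model changed?"
                )
            self.vectors.append(vector)


store = EmbeddingStore()
store.add([[0.0] * 512, [1.0] * 512])  # e.g. r.json()["embeddings"]
print(store.dim)  # 512
```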

