Community infrastructure for distributed inference of open source LLMs.
Prometeu turns ordinary machines into a collaborative inference network for open source language models. Many people have unused CPU, RAM, small GPUs, laptops, home servers, VPS instances, lab machines, or workstations. Alone, those resources may look small. Together, they become real capacity for serving open LLMs through a transparent, auditable, community-owned stack.
Prometeu exists to give power back to communities that believe in open source LLMs, democratic access, and infrastructure that can be inspected, self-hosted, improved, and shared. People who cannot or do not want to pay high monthly fees need a collective alternative: a network where users can also contribute, idle capacity becomes access, and every operational component is open.
- Mission
- What Prometeu does
- How it works
- Architecture
- Core capabilities
- Usage flows
- Inference API
- Participant nodes
- Model pools
- Reciprocity and authentication
- Security
- Observability
- Installation
- Testing and development
- Roadmap
- Contributing
- Attribution and license
Open source LLMs only fulfill their promise when communities also control the infrastructure needed to run them.
An open model without accessible inference capacity still leaves many people outside. Prometeu addresses that gap by building a community layer for:
- sharing inference capacity across participants;
- serving open source models through a simple API;
- routing requests to volunteer nodes with available resources;
- measuring real contribution with signed receipts;
- giving contributors higher usage headroom;
- protecting users from poisoned weights with a hash-pinned allowlist;
- supporting full self-hosting without black boxes;
- laying groundwork for a public open AI network maintained by people, projects, independent labs, and local communities.
Prometeu does not try to win by raw throughput alone. It aims to win by sovereignty, shared cost, transparency, and collective capacity.
Prometeu should not be constrained to tiny models. Small models are only the bootstrap path. As community capacity grows, the network must make medium, large, and very large open models visible, measurable, and eventually runnable through dynamic pools.
Prometeu is a full stack for distributed LLM inference:
| Capability | Description |
|---|---|
| /v1 gateway | HTTP entrypoint for apps, CLIs, bots, and UIs using a common chat/completions format. |
| P2P mesh | Nodes connect through a mesh with Ed25519 identity and encrypted Iroh transport. |
| Node registry | Participants announce capacity, active models, limits, and health state. |
| Pool orchestration | Coordinator receives a model request, sizes it, selects peers, warms the pool, and tracks state. |
| Peer-direct routing | Gateway can route inference to the volunteer node serving the requested model. |
| GGUF sizer | Reads GGUF metadata and estimates RAM / peer count requirements. |
| Inference sandbox | Dedicated runner user with systemd/cgroup resource limits. |
| Curated allowlist | Only approved models with known hashes can be loaded by participant nodes. |
| Reciprocity | Nodes that serve tokens receive more usage headroom. |
| Ed25519 auth | Challenge/response proves key ownership without trusting claimed identity. |
| Signed receipts | Sessions produce signed CBOR receipts with counters. |
| Metrics | /metrics exposes Prometheus data. |
| Watchdog | Periodically checks pools and alerts only on real incidents. |
| Self-hosting | Scripts support running coordinator, workers, gateway, and node daemon. |
Prometeu separates three roles.
A user sends a request to /v1/chat/completions with a target model. The gateway applies limits, meters usage, and routes the request to a pool or node able to answer.
sequenceDiagram
participant U as User/App
participant G as Gateway
participant P as Pool/Node
participant M as Model
U->>G: POST /v1/chat/completions
G->>G: rate limit + metering
G->>P: route to active peer/pool
P->>M: inference
M-->>P: tokens
P-->>G: stream/response
G-->>U: tokens
A contributor installs the local daemon. The node detects hardware, sets local limits, announces capacity, and can serve allowlisted models.
sequenceDiagram
participant N as Participant node
participant C as Coordinator
participant A as Allowlist
participant R as Sandbox runner
N->>C: heartbeat with capacity
C->>N: request model load
N->>A: verify model_id + sha256
N->>R: start runner with limits
R-->>N: model ready
N-->>C: READY + session receipts
An operator runs the gateway/coordinator, catalog, registry, Redis, metrics, and policy layer. This can power a public community network, local collective, school lab, research cluster, project-specific network, or self-hosted deployment.
               ┌─────────────────────────────┐
               │  Apps / UIs / CLIs / Bots   │
               └──────────────┬──────────────┘
                              │ HTTP /v1
                              ▼
┌────────────────────────────────────────────────────────────┐
│                   Gateway / Coordinator                    │
│  - inference routing                                       │
│  - rate limiting and metering                              │
│  - pool orchestration                                      │
│  - GGUF sizing                                             │
│  - hash-pinned allowlist                                   │
│  - Ed25519 auth                                            │
│  - reciprocity ledger                                      │
│  - registry and metrics                                    │
└──────────────┬─────────────────────┬───────────────────────┘
               │                     │
               │ Registry/Redis      │ P2P mesh / peer-direct
               ▼                     ▼
       ┌────────────────┐  ┌──────────────────────────┐
       │ Node state     │  │ Participant nodes        │
       │ Pools          │  │ - local daemon           │
       │ Receipts       │  │ - dashboard :8787        │
       │ Ledger         │  │ - sandbox runner         │
       └────────────────┘  │ - llama.cpp server/RPC   │
                           │ - signed receipts        │
                           └──────────────────────────┘
Main components:
| Component | Role |
|---|---|
| gateway/ | FastAPI coordinator: /v1, pools, registry, allowlist, auth, metrics, reciprocity. |
| mesh/ | Rust binary with Iroh P2P, Ed25519 identity, TCP bridge, and signed receipts. |
| node/ | Participant daemon, local dashboard, installer, sandbox runner. |
| node-agent/ | Lightweight telemetry/capacity agent. |
| scripts/ | Build, installation, workers, watchdog, and distribution proof tooling. |
| tests/ | pytest suite covering router, sizer, pools, reciprocity, and allowlist. |
| web/ | Minimal web chat interface, no build step. |
| assets/ | Branding and badge assets. |
| docs/ | Technical notes and design decisions. |
Gateway:
- /v1/chat/completions endpoint;
- streaming support;
- proxy to inference servers that support common chat/completions semantics;
- token metering for streaming and non-streaming responses;
- limits by IP and authenticated identity;
- automatic attribution header.

P2P mesh:
- Iroh transport;
- persistent Ed25519 identity;
- NodeId derived from the public key;
- CBOR handshake;
- bidirectional TCP bridge;
- registry-based discovery;
- no router port-forwarding required in common NAT scenarios.

Pool orchestration:
- model request by model_id;
- automatic resource sizing;
- peer selection based on RAM/capacity;
- remote instruction to load a model;
- states: REQUESTED, WARMING, READY, DEGRADED, FAILED, STOPPED;
- periodic watchdog for incidents.

Participant node:
- local dashboard at http://localhost:8787;
- hardware fingerprint: CPU, RAM, disk, GPU/VRAM when available;
- model selection from catalog;
- local resource limits;
- heartbeat into registry;
- sandbox runner with dedicated user;
- cgroup/systemd limits;
- preflight blockers before serving.

Prometeu prioritizes Q8 quantization for runnable models because quality matters. Lower quantization can help during bootstrap, but it should not define the long-term network.
Policy:
- prefer Q8_0 when the model publishes it;
- use lower quantization only when Q8 is unavailable or explicitly approved for constrained pools;
- show large models in the catalog even before the current network can run them;
- expose capacity gaps so the community knows what is needed to unlock larger models.
Model tiers:
| Tier | Typical size | Role |
|---|---|---|
| Tiny | 0.5B–3B | bootstrap, fast tests, low-resource nodes |
| Small | 4B–8B | early community pools |
| Medium | 12B–34B | growing network target |
| Large | 70B+ | high-capacity community pools |
| MoE | 8x7B+ | scheduler/research target |
| Embed | varied | RAG/search infrastructure |
| Vision | varied | multimodal roadmap |
| Code | varied | developer tooling |
- allowlist with model_id, source, and expected sha256;
- off-list models rejected;
- hash mismatches rejected;
- no fallback to unverified downloads;
- primary defense against poisoned weights.
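
The allowlist rules above can be sketched in a few lines of Python. This is only an illustration, assuming an in-memory allowlist keyed by model_id; the `ALLOWLIST` structure, the `verify_model` and `sha256_of_file` names, and the example entry are hypothetical and are not taken from the Prometeu code base.

```python
import hashlib
from pathlib import Path

# Hypothetical allowlist: model_id -> source and pinned sha256 (illustrative values only).
ALLOWLIST = {
    "org/model/file.gguf": {
        "source": "hf",
        "sha256": hashlib.sha256(b"example weights").hexdigest(),
    },
}


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large GGUF files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(model_id: str, path: Path, allowlist: dict = ALLOWLIST) -> bool:
    """Return True only for an allowlisted model whose file matches the pinned hash."""
    entry = allowlist.get(model_id)
    if entry is None:
        return False  # off-list models are rejected
    # The expected hash comes from the allowlist, never from the caller.
    return sha256_of_file(path) == entry["sha256"]
```

A node following this pattern would refuse to start the runner whenever `verify_model` returns False, with no fallback to an unverified download.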
- contribution measured by tokens served;
- consumption measured by tokens used;
- signed receipts feed the ledger;
- contributors receive higher headroom;
- anonymous users keep a baseline floor;
- designed as a soft quota, not a paywall.
As a user:
- Choose a model available in the catalog.
- Send a request to /v1/chat/completions.
- Receive a JSON response or token stream.
- For more headroom, run a participant node and authenticate your key.

As a contributor:
- Install prometeu-node.
- Set CPU/RAM/bandwidth limits.
- Choose allowlisted models.
- Node announces capacity.
- Coordinator sends workload when demand exists.
- Node generates signed receipts.
- Your reciprocity standing grows.

As an operator:
- Run gateway/coordinator.
- Configure Redis/registry.
- Publish a curated allowlist.
- Install participant nodes.
- Point apps to your coordinator /v1 endpoint.
- Monitor /metrics.

Prometeu uses a common HTTP chat/completions format under /v1.
Example with curl:
curl -X POST http://localhost:3000/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{
    "model": "qwen",
    "messages": [
      {"role": "user", "content": "Explain distributed inference in one sentence."}
    ],
    "stream": false
  }'
Streaming example:
curl -N -X POST http://localhost:3000/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{
    "model": "qwen",
    "messages": [
      {"role": "user", "content": "Write a short manifesto for open AI."}
    ],
    "stream": true
  }'
Useful endpoints:
| Endpoint | Purpose |
|---|---|
| POST /v1/chat/completions | Chat/completions inference. |
| GET /api/catalog/llms?sort=updated | Discover latest GGUF models from the public catalog. |
| GET /api/catalog/llms?sort=downloads | Discover popular GGUF models from the public catalog. |
| GET /api/catalog/allowlist | List verified/runnable model candidates and expected hashes. |
| POST /api/pools/request | Request model pool creation/warmup. |
| GET /api/pools | List pools and states. |
| GET /api/registry/nodes | List registered nodes. |
| GET /api/mesh/peers | List announced P2P peers. |
| POST /api/auth/challenge | Create nonce for an Ed25519 public key. |
| POST /api/auth/verify | Verify signature and issue short-lived token. |
| GET /api/reciprocity/standing | Check reciprocity standing. |
| GET /metrics | Prometheus metrics. |
Use http://localhost:3000 locally or replace it with your own coordinator URL.
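
The same non-streaming request can be issued from Python with only the standard library. The sketch below mirrors the curl example; the helper names `build_chat_request` and `chat` are hypothetical, and the shape of the returned JSON is not specified here, so the code simply returns the decoded body.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"  # replace with your own coordinator URL


def build_chat_request(model: str, prompt: str, stream: bool = False) -> urllib.request.Request:
    """Build the POST request for /v1/chat/completions without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def chat(model: str, prompt: str, timeout: float = 60.0) -> dict:
    """Send a non-streaming request and return the decoded JSON body."""
    request = build_chat_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a coordinator running locally, `chat("qwen", "Explain distributed inference in one sentence.")` would return the parsed response. A request rejected by the soft quota surfaces as `urllib.error.HTTPError` with status 429.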
A participant node is a machine that donates controlled capacity to the network.
Install from the repository:
git clone https://github.com/maxwellmelo/prometeu.git
cd prometeu
sudo bash node/install.sh http://YOUR_COORDINATOR:3000
After installation:
- local dashboard: http://localhost:8787;
- hardware detected automatically;
- runner created with dedicated user;
- limits applied with systemd/cgroups;
- heartbeat sent to coordinator;
- models load only after allowlist verification;
- signed receipts record served work.
Typical node config:
{
  "coordinator_url": "http://YOUR_COORDINATOR:3000",
  "cpu_quota": "200%",
  "memory_max": "6G",
  "bandwidth_mbps": 20,
  "allow_public_inference": true
}
bandwidth_mbps is currently declared but not yet shaped; real bandwidth enforcement is on the roadmap.
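
To make the units in the node config concrete, here is a hedged sketch of how such values could be interpreted. It assumes cpu_quota follows the systemd CPUQuota= convention (200% means two full CPUs) and memory_max uses 1024-based K/M/G/T suffixes as systemd MemoryMax= does. Whether the node daemon parses them this way is an assumption, and the function names are hypothetical.

```python
import json

# 1024-based suffixes, matching how systemd interprets MemoryMax= sizes.
_UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def parse_cpu_quota(value: str) -> float:
    """Convert a percentage such as "200%" into a number of CPU cores."""
    if not value.endswith("%"):
        raise ValueError(f"cpu_quota must end with '%': {value!r}")
    cores = float(value[:-1]) / 100.0
    if cores <= 0:
        raise ValueError("cpu_quota must be positive")
    return cores


def parse_memory_max(value: str) -> int:
    """Convert a size such as "6G" into bytes; bare digits are taken as bytes."""
    suffix = value[-1].upper()
    if suffix in _UNITS:
        return int(float(value[:-1]) * _UNITS[suffix])
    return int(value)


def load_node_config(text: str) -> dict:
    """Parse the JSON config and normalise the resource limits."""
    raw = json.loads(text)
    return {
        "coordinator_url": raw["coordinator_url"],
        "cpu_cores": parse_cpu_quota(raw["cpu_quota"]),
        "memory_bytes": parse_memory_max(raw["memory_max"]),
        "bandwidth_mbps": raw.get("bandwidth_mbps"),  # declared only, not enforced yet
        "allow_public_inference": bool(raw.get("allow_public_inference", False)),
    }
```

Under these assumptions the typical config above would resolve to 2.0 CPU cores and 6 GiB of RAM for the sandbox runner.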
A pool is a group of peers prepared to serve a model.
Flow:
- Client requests a model.
- Coordinator checks the allowlist.
- Sizer estimates required resources.
- Registry finds eligible peers.
- Coordinator instructs peers to load the model.
- Peers download and verify the GGUF.
- Pool warms until quorum.
- Gateway routes inference.
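
The sizing and peer-selection steps in this flow can be illustrated with a small sketch. It assumes the memory requirement can be met by summing free RAM across several peers and uses a simple "weights plus KV cache plus margin" estimate. The formula, the 10% margin, and the `Peer`, `estimate_required_ram`, and `select_peers` names are all hypothetical and do not describe Prometeu's actual sizer or scheduler.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    node_id: str
    free_ram_bytes: int
    healthy: bool = True


def estimate_required_ram(model_bytes: int, kv_cache_bytes: int, overhead_percent: int = 10) -> int:
    """Weights plus KV cache, padded by a safety margin (illustrative formula)."""
    return (model_bytes + kv_cache_bytes) * (100 + overhead_percent) // 100


def select_peers(peers: list[Peer], required_bytes: int) -> list[Peer]:
    """Pick healthy peers, largest free RAM first, until the requirement is covered."""
    candidates = sorted(
        (p for p in peers if p.healthy),
        key=lambda p: p.free_ram_bytes,
        reverse=True,
    )
    chosen: list[Peer] = []
    covered = 0
    for peer in candidates:
        if covered >= required_bytes:
            break
        chosen.append(peer)
        covered += peer.free_ram_bytes
    if covered < required_bytes:
        return []  # not enough capacity yet: the pool cannot reach quorum
    return chosen
```

Taking the largest peers first keeps the pool small, which reduces the number of network hops per request. A real scheduler would likely also weigh latency, health history, and GPU availability.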
Example:
curl -X POST http://localhost:3000/api/pools/request \
  -H 'content-type: application/json' \
  -d '{
    "model_id": "org/model/file.gguf",
    "source": "hf",
    "context": 4096
  }'
Check state:
curl http://localhost:3000/api/pools
States:
| State | Meaning |
|---|---|
| REQUESTED | Request received. |
| WARMING | Peers are loading/verifying the model. |
| READY | Pool ready to serve. |
| DEGRADED | Pool works but lost capacity or partial quorum. |
| FAILED | Pool failed. |
| STOPPED | Pool intentionally stopped. |
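
One way to reason about these states is as a small state machine. The state names below come from the table, but the set of allowed transitions is an assumption made for illustration (for example, treating FAILED and STOPPED as terminal). The `TRANSITIONS` table and `advance` function are hypothetical.

```python
from enum import Enum


class PoolState(str, Enum):
    REQUESTED = "REQUESTED"
    WARMING = "WARMING"
    READY = "READY"
    DEGRADED = "DEGRADED"
    FAILED = "FAILED"
    STOPPED = "STOPPED"


# Assumed transition table: FAILED and STOPPED are treated as terminal.
TRANSITIONS = {
    PoolState.REQUESTED: {PoolState.WARMING, PoolState.FAILED, PoolState.STOPPED},
    PoolState.WARMING: {PoolState.READY, PoolState.FAILED, PoolState.STOPPED},
    PoolState.READY: {PoolState.DEGRADED, PoolState.FAILED, PoolState.STOPPED},
    PoolState.DEGRADED: {PoolState.READY, PoolState.FAILED, PoolState.STOPPED},
    PoolState.FAILED: set(),
    PoolState.STOPPED: set(),
}


def advance(current: PoolState, target: PoolState) -> PoolState:
    """Return the new state, or raise if the transition is not allowed."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target
```

Rejecting illegal transitions explicitly makes it harder for a bug to hide a pool failure, for instance by moving a FAILED pool straight back to READY.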
Prometeu uses reciprocity because community infrastructure must balance usage and contribution.
- Contribution: tokens served by your node, proven by signed receipts.
- Consumption: tokens used through /v1.
- Standing: relationship between contribution and consumption.
- Quota: soft limit derived from standing.
Anonymous users receive a baseline floor. Contributors receive more headroom. Above the limit, the gateway may return 429 with Retry-After.
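
As a rough illustration of a soft quota, the sketch below derives a per-window token limit from a baseline floor plus a multiple of contributed tokens, and answers 429 with Retry-After once the limit is exceeded. The constants, the linear formula, and the `Standing` and `check_request` names are assumptions for this example only; the real ledger may compute standing differently.

```python
from dataclasses import dataclass

BASELINE_TOKENS = 10_000  # hypothetical floor per window for anonymous users
HEADROOM_FACTOR = 2       # hypothetical: each served token unlocks two tokens of use


@dataclass
class Standing:
    contributed_tokens: int = 0  # proven by signed receipts
    consumed_tokens: int = 0     # metered through /v1

    @property
    def quota(self) -> int:
        """Soft limit: the baseline floor plus headroom earned by contribution."""
        return BASELINE_TOKENS + HEADROOM_FACTOR * self.contributed_tokens


def check_request(standing: Standing, requested_tokens: int,
                  seconds_until_reset: int) -> tuple[int, dict[str, str]]:
    """Return (status, headers); 429 carries Retry-After when the quota is exceeded."""
    if standing.consumed_tokens + requested_tokens > standing.quota:
        return 429, {"Retry-After": str(max(1, seconds_until_reset))}
    standing.consumed_tokens += requested_tokens
    return 200, {}
```

Because the floor never drops to zero, an anonymous user is slowed down rather than locked out, which is what distinguishes a soft quota from a paywall.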
Identity is not a password. Identity is a public key.
Flow:
# 1. request challenge
curl -X POST http://localhost:3000/api/auth/challenge \
  -H 'content-type: application/json' \
  -d '{"public_key":"<base64-ed25519-pub>"}'
# 2. sign nonce with secret key and verify
curl -X POST http://localhost:3000/api/auth/verify \
  -H 'content-type: application/json' \
  -d '{
    "public_key":"<base64-ed25519-pub>",
    "nonce":"<nonce>",
    "signature":"<base64-signature>"
  }'
# 3. check standing with bearer token
curl http://localhost:3000/api/reciprocity/standing \
  -H 'authorization: Bearer TOKEN'
Properties:
- single-use nonce;
- short-lived token;
- invalid signature rejected;
- replay rejected;
- no trust-on-first-use.
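
The single-use and replay properties can be illustrated on the server side with a small nonce store. This sketch covers only nonce bookkeeping: Ed25519 verification itself is delegated to a `verify_signature` callable that a real deployment would back with a cryptography library. The `ChallengeStore` class, its method names, and the 60-second lifetime are hypothetical.

```python
import secrets
import time
from typing import Callable


class ChallengeStore:
    """Issue single-use nonces bound to a public key, with a short expiry."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # nonce -> (public_key, expires_at)
        self._pending: dict[str, tuple[str, float]] = {}

    def issue(self, public_key: str) -> str:
        """Create an unpredictable nonce for this key and remember it."""
        nonce = secrets.token_urlsafe(32)
        self._pending[nonce] = (public_key, self._clock() + self._ttl)
        return nonce

    def redeem(self, public_key: str, nonce: str, signature: str,
               verify_signature: Callable[[str, str, str], bool]) -> bool:
        """Consume the nonce; succeed only if key, expiry, and signature all check out."""
        # pop() removes the nonce whether or not verification succeeds,
        # so a replayed nonce can never be tried a second time.
        entry = self._pending.pop(nonce, None)
        if entry is None:
            return False
        expected_key, expires_at = entry
        if expected_key != public_key or self._clock() > expires_at:
            return False
        return verify_signature(public_key, nonce, signature)
```

Consuming the nonce before checking the signature is a deliberate choice: a failed attempt costs the caller a fresh challenge, which limits brute-force attempts against any single nonce.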
Prometeu assumes a community environment, so trust is treated as scarce.
Current controls:
- Hash-pinned allowlist: every model must exist in the curated catalog and match its expected sha256.
- Sandbox runner: inference runs as a dedicated user with CPU/RAM limits.
- Preflight blockers: node refuses to serve if required safety prerequisites are missing.
- Ed25519 identities: peers and users can prove key ownership.
- Signed receipts: sessions generate signed proof of service.
- Rate limiting: /v1 has basic abuse protection.
- Metrics: operational state is exposed for auditability.
- NOTICE/header: downstream use preserves attribution.
Current non-goals:
- running models outside the curated catalog;
- trusting a hash supplied by the client;
- accepting a model when the hash differs;
- counting uptime as contribution without real served work;
- hiding pool failures.
Planned:
- mTLS between coordinator and peers;
- real bandwidth shaping;
- stronger reputation commitments;
- public transparency dashboard.
Prometeu exposes /metrics for Prometheus.
Metrics include:
- pools by state;
- reciprocity consumption;
- recorded contribution;
- normalized route labels to avoid unbounded cardinality;
- node state through the registry;
- watchdog behavior that stays silent when healthy and alerts on incidents.
Example:
curl http://localhost:3000/metrics
Watchdog behavior:
- runs periodically;
- alerts on FAILED;
- alerts on DEGRADED;
- alerts on under-quorum non-terminal pools;
- does not alert on intentional STOPPED pools.
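
The alerting rules above can be expressed as a pure function over pool snapshots. This is only a sketch: the `PoolStatus` fields (`ready_peers`, `quorum`) and the `pool_alerts` name are hypothetical, and the real watchdog in scripts/ may read pool state differently.

```python
from dataclasses import dataclass


@dataclass
class PoolStatus:
    model_id: str
    state: str
    ready_peers: int
    quorum: int


def pool_alerts(pools: list[PoolStatus]) -> list[str]:
    """Return one message per real incident; an empty list means stay silent."""
    alerts: list[str] = []
    for pool in pools:
        if pool.state == "STOPPED":
            continue  # intentional stop: never an incident
        if pool.state in ("FAILED", "DEGRADED"):
            alerts.append(f"{pool.model_id}: pool is {pool.state}")
        elif pool.ready_peers < pool.quorum:
            # Only non-terminal pools reach this branch.
            alerts.append(
                f"{pool.model_id}: under quorum ({pool.ready_peers}/{pool.quorum})"
            )
    return alerts
```

Keeping the decision separate from the notification channel makes the "silent when healthy" behavior easy to test: a healthy network must always yield an empty list.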
Requirements:
- Debian/Ubuntu Linux recommended;
- Python 3.11+;
- Redis;
- systemd for sandbox/cgroups;
- build tools for llama.cpp;
- Rust toolchain if building mesh/;
- disk space for GGUF models.

git clone https://github.com/maxwellmelo/prometeu.git
cd prometeu
python3 -m venv .venv
. .venv/bin/activate
pip install -r gateway/requirements.txt
uvicorn gateway.app:app --host 0.0.0.0 --port 3000
If local module names differ, check gateway/ and the installation scripts in scripts/.
sudo bash scripts/build-llama-cpp.sh
sudo bash scripts/install-worker.sh
sudo bash scripts/install.sh
sudo bash node/install.sh http://YOUR_COORDINATOR:3000
bash scripts/prove-distribution.sh http://YOUR_COORDINATOR:3000
Expected output shows CPU/network/TCP activity on nodes during generation.
Run suite:
cd prometeu
python3 -m venv .venv
. .venv/bin/activate
pip install -r gateway/requirements.txt pytest
pytest tests/ -q
Suite covers:
- routing;
- GGUF sizer;
- pools;
- allowlist;
- reciprocity;
- Ed25519 authentication;
- critical metrics.
Before opening a PR:
pytest tests/ -q
If you change HTTP contracts, add tests. If you touch security behavior, add rejection tests, not only happy-path tests.
Implemented:
- /v1 gateway with streaming and metering.
- Per-IP rate limiting.
- TTL-based node registry.
- Capacity telemetry.
- Iroh P2P mesh.
- Ed25519 identity.
- TCP bridge over mesh.
- Signed CBOR receipts.
- Pool orchestration.
- Pool state machine.
- GGUF sizer.
- Node daemon with local dashboard.
- CPU/RAM/disk/GPU detection.
- Sandbox runner with cgroups.
- Model catalog.
- Hash-pinned allowlist.
- Peer-direct routing.
- Signed-challenge auth.
- Reciprocity ledger.
- Prometheus /metrics.
- Pool watchdog.
- Automated test suite.

Planned:
- Real bandwidth shaping per node.
- Public transparency dashboard.
- mTLS between coordinator and peers.
- Reputation with slashable commitments.
- Better node onboarding UX.
- Dashboard sections for latest discovered models, popular models, verified runnable models, and capacity targets.
- Larger curated catalog with Q8-first policy and explicit lower-quantization fallback only when Q8 is unavailable.
- Capacity planner showing which medium/large models become runnable as more peers join.
- Peer health scoring.
- Automatic recovery for degraded pools.
- Partial layer downloads.
- True layer-level weight sharding.
- Heterogeneous CPU/GPU workers.
- Large and very large models through dynamic community pools.
- Latency/region-aware scheduling.
- Stronger remote execution verification.
Prometeu needs help across many areas:
| Area | Examples |
|---|---|
| Security | threat model, hardening, mTLS, sandboxing, supply chain. |
| Distributed systems | scheduler, pool recovery, peer scoring, gossip/discovery. |
| Inference | llama.cpp, GGUF sizing, streaming, batching, quantization. |
| Frontend | node dashboard, public panel, onboarding UX. |
| DevOps | systemd, CI, releases, packages, observability. |
| Documentation | tutorials, guides, diagrams, integration examples. |
| Community | model curation, tests on diverse hardware, translations. |
How to contribute:
- Open an issue with clear context.
- Run tests before submitting a PR.
- Keep changes small when possible.
- Document new endpoints or behavior.
- Do not add a model to the allowlist without a source and verifiable hash.
- Do not commit tokens, keys, credentials, or sensitive URLs.
Prometeu uses Apache License 2.0 with an attribution clause in NOTICE.
If you use, modify, redistribute, embed, publish an interface, operate a derived API, or build on top of Prometeu, you must preserve attribution.
Main requirements:
- User-facing interfaces must show Powered by Prometeu with a visible link to this repository.
- Derived APIs must preserve this header:
X-Powered-By: Prometeu (https://github.com/maxwellmelo/prometeu)
- Redistributions must include NOTICE.
- Publications, model cards, and materials using Prometeu must cite the project.
Badge:
[Powered by Prometeu](https://github.com/maxwellmelo/prometeu)
Prometeu exists so open source LLM communities can share capacity instead of waiting for access to be granted from above.