Prometeu

Community infrastructure for distributed inference of open source LLMs.

Prometeu turns ordinary machines into a collaborative inference network for open source language models. Many people have unused CPU, RAM, small GPUs, laptops, home servers, VPS instances, lab machines, or workstations. Alone, those resources may look small. Together, they become real capacity for serving open LLMs through a transparent, auditable, community-owned stack.

Prometeu exists to give power back to communities that believe in open source LLMs, democratic access, and infrastructure that can be inspected, self-hosted, improved, and shared. People who cannot or do not want to pay high monthly fees need a collective alternative: a network where users can also contribute, idle capacity becomes access, and every operational component is open.

Mission

Open source LLMs only fulfill their promise when communities also control the infrastructure needed to run them.

An open model without accessible inference capacity still leaves many people outside. Prometeu addresses that gap by building a community layer for:

sharing inference capacity across participants;
serving open source models through a simple API;
routing requests to volunteer nodes with available resources;
measuring real contribution with signed receipts;
giving contributors higher usage headroom;
protecting users from poisoned weights with a hash-pinned allowlist;
supporting full self-hosting without black boxes;
laying groundwork for a public open AI network maintained by people, projects, independent labs, and local communities.

Prometeu does not try to win by raw throughput alone. It aims to win by sovereignty, shared cost, transparency, and collective capacity.

Prometeu should not be constrained to tiny models. Small models are only the bootstrap path. As community capacity grows, the network must make medium, large, and very large open models visible, measurable, and eventually runnable through dynamic pools.

What Prometeu does

Prometeu is a full stack for distributed LLM inference:

Capability	Description
`/v1` gateway	HTTP entrypoint for apps, CLIs, bots, and UIs using a common chat/completions format.
P2P mesh	Nodes connect through a mesh with Ed25519 identity and encrypted Iroh transport.
Node registry	Participants announce capacity, active models, limits, and health state.
Pool orchestration	Coordinator receives a model request, sizes it, selects peers, warms the pool, and tracks state.
Peer-direct routing	Gateway can route inference to the volunteer node serving the requested model.
GGUF sizer	Reads GGUF metadata and estimates RAM / peer count requirements.
Inference sandbox	Dedicated runner user with systemd/cgroup resource limits.
Curated allowlist	Only approved models with known hashes can be loaded by participant nodes.
Reciprocity	Nodes that serve tokens receive more usage headroom.
Ed25519 auth	Challenge/response proves key ownership without trusting claimed identity.
Signed receipts	Sessions produce signed CBOR receipts with counters.
Metrics	`/metrics` exposes Prometheus data.
Watchdog	Periodically checks pools and alerts only on real incidents.
Self-hosting	Scripts support running coordinator, workers, gateway, and node daemon.

How it works

Prometeu separates three roles.

1. Inference user

A user sends a request to /v1/chat/completions with a target model. The gateway applies limits, meters usage, and routes the request to a pool or node able to answer.

sequenceDiagram
    participant U as User/App
    participant G as Gateway
    participant P as Pool/Node
    participant M as Model

    U->>G: POST /v1/chat/completions
    G->>G: rate limit + metering
    G->>P: route to active peer/pool
    P->>M: inference
    M-->>P: tokens
    P-->>G: stream/response
    G-->>U: tokens

2. Node contributor

A contributor installs the local daemon. The node detects hardware, sets local limits, announces capacity, and can serve allowlisted models.

sequenceDiagram
    participant N as Participant node
    participant C as Coordinator
    participant A as Allowlist
    participant R as Sandbox runner

    N->>C: heartbeat with capacity
    C->>N: request model load
    N->>A: verify model_id + sha256
    N->>R: start runner with limits
    R-->>N: model ready
    N-->>C: READY + session receipts

3. Coordinator operator

An operator runs the gateway/coordinator, catalog, registry, Redis, metrics, and policy layer. This can power a public community network, local collective, school lab, research cluster, project-specific network, or self-hosted deployment.

Architecture

                    ┌─────────────────────────────┐
                    │ Apps / UIs / CLIs / Bots     │
                    └──────────────┬──────────────┘
                                   │ HTTP /v1
                                   ▼
┌────────────────────────────────────────────────────────────┐
│ Gateway / Coordinator                                      │
│ - inference routing                                        │
│ - rate limiting and metering                               │
│ - pool orchestration                                       │
│ - GGUF sizing                                              │
│ - hash-pinned allowlist                                    │
│ - Ed25519 auth                                             │
│ - reciprocity ledger                                       │
│ - registry and metrics                                     │
└──────────────┬─────────────────────┬───────────────────────┘
               │                     │
               │ Registry/Redis      │ P2P mesh / peer-direct
               ▼                     ▼
      ┌────────────────┐     ┌──────────────────────────┐
      │ Node state     │     │ Participant nodes         │
      │ Pools          │     │ - local daemon            │
      │ Receipts       │     │ - dashboard :8787         │
      │ Ledger         │     │ - sandbox runner          │
      └────────────────┘     │ - llama.cpp server/RPC     │
                             │ - signed receipts         │
                             └──────────────────────────┘

Main components:

Component	Role
`gateway/`	FastAPI coordinator: `/v1`, pools, registry, allowlist, auth, metrics, reciprocity.
`mesh/`	Rust binary with Iroh P2P, Ed25519 identity, TCP bridge, and signed receipts.
`node/`	Participant daemon, local dashboard, installer, sandbox runner.
`node-agent/`	Lightweight telemetry/capacity agent.
`scripts/`	Build, installation, workers, watchdog, and distribution proof tooling.
`tests/`	pytest suite covering router, sizer, pools, reciprocity, and allowlist.
`web/`	Minimal web chat interface, no build step.
`assets/`	Branding and badge assets.
`docs/`	Technical notes and design decisions.

Core capabilities

Inference gateway

/v1/chat/completions endpoint;
streaming support;
proxy to inference servers that support common chat/completions semantics;
token metering for streaming and non-streaming responses;
limits by IP and authenticated identity;
automatic attribution header.

P2P mesh

Iroh transport;
persistent Ed25519 identity;
NodeId derived from the public key;
CBOR handshake;
bidirectional TCP bridge;
registry-based discovery;
no router port-forwarding required in common NAT scenarios.

Pool orchestration

model request by model_id;
automatic resource sizing;
peer selection based on RAM/capacity;
remote instruction to load a model;
states: REQUESTED, WARMING, READY, DEGRADED, FAILED, STOPPED;
periodic watchdog for incidents.

Node daemon

local dashboard at http://localhost:8787;
hardware fingerprint: CPU, RAM, disk, GPU/VRAM when available;
model selection from catalog;
local resource limits;
heartbeat into registry;
sandbox runner with dedicated user;
cgroup/systemd limits;
preflight blockers before serving.

Model quality and quantization policy

Prometeu prioritizes Q8 quantization for runnable models because quality matters. Lower quantization can help during bootstrap, but it should not define the long-term network.

Policy:

prefer Q8_0 when the model publishes it;
use lower quantization only when Q8 is unavailable or explicitly approved for constrained pools;
show large models in the catalog even before the current network can run them;
expose capacity gaps so the community knows what is needed to unlock larger models.

Model tiers:

Tier	Typical size	Role
Tiny	0.5B–3B	bootstrap, fast tests, low-resource nodes
Small	4B–8B	early community pools
Medium	12B–34B	growing network target
Large	70B+	high-capacity community pools
MoE	8x7B+	scheduler/research target
Embed	varied	RAG/search infrastructure
Vision	varied	multimodal roadmap
Code	varied	developer tooling

Curated catalog security

allowlist with model_id, source, and expected sha256;
off-list models rejected;
hash mismatches rejected;
no fallback to unverified downloads;
primary defense against poisoned weights.

Reciprocity

contribution measured by tokens served;
consumption measured by tokens used;
signed receipts feed the ledger;
contributors receive higher headroom;
anonymous users keep a baseline floor;
designed as a soft quota, not a paywall.

Usage flows

I want to use an open source LLM

Choose a model available in the catalog.
Send a request to /v1/chat/completions.
Receive a JSON response or token stream.
For more headroom, run a participant node and authenticate your key.

I want to contribute capacity

Install prometeu-node.
Set CPU/RAM/bandwidth limits.
Choose allowlisted models.
Node announces capacity.
Coordinator sends workload when demand exists.
Node generates signed receipts.
Your reciprocity standing grows.

I want to run my own network

Run gateway/coordinator.
Configure Redis/registry.
Publish a curated allowlist.
Install participant nodes.
Point apps to your coordinator /v1 endpoint.
Monitor /metrics.

Inference API

Prometeu uses a common HTTP chat/completions format under /v1.

Example with curl:

curl -X POST http://localhost:3000/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{
    "model": "qwen",
    "messages": [
      {"role": "user", "content": "Explain distributed inference in one sentence."}
    ],
    "stream": false
  }'

Streaming example:

curl -N -X POST http://localhost:3000/v1/chat/completions \
  -H 'content-type: application/json' \
  -d '{
    "model": "qwen",
    "messages": [
      {"role": "user", "content": "Write a short manifesto for open AI."}
    ],
    "stream": true
  }'

Useful endpoints:

Endpoint	Purpose
`POST /v1/chat/completions`	Chat/completions inference.
`GET /api/catalog/llms?sort=updated`	Discover latest GGUF models from the public catalog.
`GET /api/catalog/llms?sort=downloads`	Discover popular GGUF models from the public catalog.
`GET /api/catalog/allowlist`	List verified/runnable model candidates and expected hashes.
`POST /api/pools/request`	Request model pool creation/warmup.
`GET /api/pools`	List pools and states.
`GET /api/registry/nodes`	List registered nodes.
`GET /api/mesh/peers`	List announced P2P peers.
`POST /api/auth/challenge`	Create nonce for an Ed25519 public key.
`POST /api/auth/verify`	Verify signature and issue short-lived token.
`GET /api/reciprocity/standing`	Check reciprocity standing.
`GET /metrics`	Prometheus metrics.

Use http://localhost:3000 locally or replace it with your own coordinator URL.

Participant nodes

A participant node is a machine that donates controlled capacity to the network.

Install from the repository:

git clone https://github.com/maxwellmelo/prometeu.git
cd prometeu
sudo bash node/install.sh http://YOUR_COORDINATOR:3000

After installation:

local dashboard: http://localhost:8787;
hardware detected automatically;
runner created with dedicated user;
limits applied with systemd/cgroups;
heartbeat sent to coordinator;
models load only after allowlist verification;
signed receipts record served work.

Typical node config:

{
  "coordinator_url": "http://YOUR_COORDINATOR:3000",
  "cpu_quota": "200%",
  "memory_max": "6G",
  "bandwidth_mbps": 20,
  "allow_public_inference": true
}

bandwidth_mbps is currently declared but not yet shaped; real bandwidth enforcement is on the roadmap.

Model pools

A pool is a group of peers prepared to serve a model.

Flow:

Client requests a model.
Coordinator checks the allowlist.
Sizer estimates required resources.
Registry finds eligible peers.
Coordinator instructs peers to load the model.
Peers download and verify the GGUF.
Pool warms until quorum.
Gateway routes inference.

Example:

curl -X POST http://localhost:3000/api/pools/request \
  -H 'content-type: application/json' \
  -d '{
    "model_id": "org/model/file.gguf",
    "source": "hf",
    "context": 4096
  }'

Check state:

curl http://localhost:3000/api/pools

States:

State	Meaning
`REQUESTED`	Request received.
`WARMING`	Peers are loading/verifying the model.
`READY`	Pool ready to serve.
`DEGRADED`	Pool works but lost capacity or partial quorum.
`FAILED`	Pool failed.
`STOPPED`	Pool intentionally stopped.

Reciprocity and authentication

Prometeu uses reciprocity because community infrastructure must balance usage and contribution.

Ledger

Contribution: tokens served by your node, proven by signed receipts.
Consumption: tokens used through /v1.
Standing: relationship between contribution and consumption.
Quota: soft limit derived from standing.

Anonymous users receive a baseline floor. Contributors receive more headroom. Above the limit, the gateway may return 429 with Retry-After.

Ed25519 auth

Identity is not a password. Identity is a public key.

Flow:

# 1. request challenge
curl -X POST http://localhost:3000/api/auth/challenge \
  -H 'content-type: application/json' \
  -d '{"public_key":"<base64-ed25519-pub>"}'

# 2. sign nonce with secret key and verify
curl -X POST http://localhost:3000/api/auth/verify \
  -H 'content-type: application/json' \
  -d '{
    "public_key":"<base64-ed25519-pub>",
    "nonce":"<nonce>",
    "signature":"<base64-signature>"
  }'

# 3. check standing with bearer token
curl http://localhost:3000/api/reciprocity/standing \
  -H 'authorization: Bearer TOKEN'

Properties:

single-use nonce;
short-lived token;
invalid signature rejected;
replay rejected;
no trust-on-first-use.

Security

Prometeu assumes a community environment, so trust is treated as scarce.

Current controls:

Hash-pinned allowlist: every model must exist in the curated catalog and match its expected sha256.
Sandbox runner: inference runs as a dedicated user with CPU/RAM limits.
Preflight blockers: node refuses to serve if required safety prerequisites are missing.
Ed25519 identities: peers and users can prove key ownership.
Signed receipts: sessions generate signed proof of service.
Rate limiting: /v1 has basic abuse protection.
Metrics: operational state is exposed for auditability.
NOTICE/header: downstream use preserves attribution.

Current non-goals:

running models outside the curated catalog;
trusting a hash supplied by the client;
accepting a model when the hash differs;
counting uptime as contribution without real served work;
hiding pool failures.

Planned:

mTLS between coordinator and peers;
real bandwidth shaping;
stronger reputation commitments;
public transparency dashboard.

Observability

Prometeu exposes /metrics for Prometheus.

Metrics include:

pools by state;
reciprocity consumption;
recorded contribution;
normalized route labels to avoid unbounded cardinality;
node state through the registry;
watchdog behavior that stays silent when healthy and alerts on incidents.

Example:

curl http://localhost:3000/metrics

Watchdog behavior:

runs periodically;
alerts on FAILED;
alerts on DEGRADED;
alerts on under-quorum non-terminal pools;
does not alert on intentional STOPPED pools.

Installation

General requirements

Debian/Ubuntu Linux recommended;
Python 3.11+;
Redis;
systemd for sandbox/cgroups;
build tools for llama.cpp;
Rust toolchain if building mesh/;
disk space for GGUF models.

Run local coordinator

git clone https://github.com/maxwellmelo/prometeu.git
cd prometeu
python3 -m venv .venv
. .venv/bin/activate
pip install -r gateway/requirements.txt
uvicorn gateway.app:app --host 0.0.0.0 --port 3000

If local module names differ, check gateway/ and installation scripts in scripts/.

Build inference backend

sudo bash scripts/build-llama-cpp.sh

Install classic worker

sudo bash scripts/install-worker.sh

Install full stack by script

sudo bash scripts/install.sh

Install participant node

sudo bash node/install.sh http://YOUR_COORDINATOR:3000

Prove distribution

bash scripts/prove-distribution.sh http://YOUR_COORDINATOR:3000

Expected output shows CPU/network/TCP activity on nodes during generation.

Testing and development

Run suite:

cd prometeu
python3 -m venv .venv
. .venv/bin/activate
pip install -r gateway/requirements.txt pytest
pytest tests/ -q

Suite covers:

routing;
GGUF sizer;
pools;
allowlist;
reciprocity;
Ed25519 authentication;
critical metrics.

Before opening a PR:

pytest tests/ -q

If you change HTTP contracts, add tests. If you touch security behavior, add rejection tests, not only happy-path tests.

Roadmap

Done

/v1 gateway with streaming and metering.
Per-IP rate limiting.
TTL-based node registry.
Capacity telemetry.
Iroh P2P mesh.
Ed25519 identity.
TCP bridge over mesh.
Signed CBOR receipts.
Pool orchestration.
Pool state machine.
GGUF sizer.
Node daemon with local dashboard.
CPU/RAM/disk/GPU detection.
Sandbox runner with cgroups.
Model catalog.
Hash-pinned allowlist.
Peer-direct routing.
Signed-challenge auth.
Reciprocity ledger.
Prometheus /metrics.
Pool watchdog.
Automated test suite.

In progress / planned

Real bandwidth shaping per node.
Public transparency dashboard.
mTLS between coordinator and peers.
Reputation with slashable commitments.
Better node onboarding UX.
Dashboard sections for latest discovered models, popular models, verified runnable models, and capacity targets.
Larger curated catalog with Q8-first policy and explicit lower-quantization fallback only when Q8 is unavailable.
Capacity planner showing which medium/large models become runnable as more peers join.
Peer health scoring.
Automatic recovery for degraded pools.

Research

Partial layer downloads.
True layer-level weight sharding.
Heterogeneous CPU/GPU workers.
Large and very large models through dynamic community pools.
Latency/region-aware scheduling.
Stronger remote execution verification.

Contributing

Prometeu needs help across many areas:

Area	Examples
Security	threat model, hardening, mTLS, sandboxing, supply chain.
Distributed systems	scheduler, pool recovery, peer scoring, gossip/discovery.
Inference	llama.cpp, GGUF sizing, streaming, batching, quantization.
Frontend	node dashboard, public panel, onboarding UX.
DevOps	systemd, CI, releases, packages, observability.
Documentation	tutorials, guides, diagrams, integration examples.
Community	model curation, tests on diverse hardware, translations.

How to contribute:

Open an issue with clear context.
Run tests before submitting a PR.
Keep changes small when possible.
Document new endpoints or behavior.
Do not add a model to the allowlist without a source and verifiable hash.
Do not commit tokens, keys, credentials, or sensitive URLs.

Attribution and license

Prometeu uses Apache License 2.0 with an attribution clause in NOTICE.

If you use, modify, redistribute, embed, publish an interface, operate a derived API, or build on top of Prometeu, you must preserve attribution.

Main requirements:

User-facing interfaces must show Powered by Prometeu with a visible link to this repository.
Derived APIs must preserve this header:

X-Powered-By: Prometeu (https://github.com/maxwellmelo/prometeu)

Redistributions must include NOTICE.
Publications, model cards, and materials using Prometeu must cite the project.

Badge:

[![Powered by Prometeu](https://img.shields.io/badge/powered_by-Prometeu-7c3aed?style=flat-square)](https://github.com/maxwellmelo/prometeu)

Prometeu exists so open source LLM communities can share capacity instead of waiting for access to be granted from above.

Name		Name	Last commit message	Last commit date
Latest commit History 40 Commits
.github/workflows		.github/workflows
assets		assets
docs		docs
gateway		gateway
mesh		mesh
node-agent		node-agent
node		node
scripts		scripts
tests		tests
web		web
.gitignore		.gitignore
CONTRIBUTING.md		CONTRIBUTING.md
LICENSE		LICENSE
NOTICE		NOTICE
README.md		README.md

Folders and files

Latest commit

History

Repository files navigation

Prometeu

Table of contents

Mission

What Prometeu does

How it works

1. Inference user

2. Node contributor

3. Coordinator operator

Architecture

Core capabilities

Inference gateway

P2P mesh

Pool orchestration

Node daemon

Model quality and quantization policy

Curated catalog security

Reciprocity

Usage flows

I want to use an open source LLM

I want to contribute capacity

I want to run my own network

Inference API

Participant nodes

Model pools

Reciprocity and authentication

Ledger

Ed25519 auth

Security

Observability

Installation

General requirements

Run local coordinator

Build inference backend

Install classic worker

Install full stack by script

Install participant node

Prove distribution

Testing and development

Roadmap

Done

In progress / planned

Research

Contributing

Attribution and license

About

Topics

Resources

License

Contributing

Uh oh!

Stars

Watchers

Forks

Releases 7

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages