|
1 | | -# Anvil: Open‑Source Object Storage in Rust |
| 1 | +# Anvil: An Open-Source Object Store for AI/ML Research |
2 | 2 |
|
3 | | -**Anvil** is an open‑source, S3‑compatible object storage server written in Rust. Built by the team behind Worka, Anvil is designed to host large files—such as open‑source model weights—with high performance and reliability. It exposes a familiar S3 HTTP gateway, a high‑performance gRPC API, multi‑tenant isolation, and the ability to scale from a single development node to a multi‑region cluster. |
| 3 | +[CI](https://github.com/worka-ai/anvil-enterprise/actions/workflows/ci.yml)
| 4 | +[License: Apache 2.0](https://opensource.org/licenses/Apache-2.0)
| 5 | +[JOSS paper](https://joss.theoj.org/papers/10.21105/joss.XXXXX)
4 | 6 |
|
5 | | ---- |
6 | | - |
7 | | -## 🔥 Why Anvil? |
8 | | - |
9 | | -- **Written in Rust**: Modern, memory-safe, and highly concurrent. |
10 | | -- **S3-Compatible**: Works out of the box with AWS SDKs, CLI, and third-party tools. |
11 | | -- **gRPC API**: For low-latency, high-throughput access. |
12 | | -- **Multi-Tenant**: Serve different model groups or clients in isolation. |
13 | | -- **Clusterable**: Run standalone or as a horizontally-scalable distributed system. |
14 | | -- **Model Hosting Friendly**: Built to serve billions of tokens efficiently. |
| 7 | +**Anvil** is a high-performance, open-source distributed object store built in Rust. It is designed to address the data management and storage challenges inherent in modern computational research, particularly for large-scale Artificial Intelligence (AI) and Machine Learning (ML) workloads. By providing an S3-compatible interface, a native high-throughput gRPC API, and first-class support for content-addressing, Anvil serves as a foundational infrastructure layer for reproducible and efficient research. |
15 | 8 |
|
16 | 9 | --- |
17 | 10 |
|
18 | | -## 🚀 Quick Start (Standalone) |
| 11 | +## Key Features |
19 | 12 |
|
20 | | -```bash |
21 | | -cargo install anvil |
22 | | -anvil server --root ./data --port 9000 |
23 | | -``` |
24 | | - |
25 | | -Now test it: |
26 | | - |
27 | | -```bash |
28 | | -aws --endpoint-url http://localhost:9000 s3 ls |
29 | | -``` |
| 13 | +- **Content-Addressable Storage:** Automatically deduplicates identical data using BLAKE3 hashing, so content repeated across versioned models and datasets is stored only once.
| 14 | +- **High-Performance gRPC Streaming:** A native gRPC API with bidirectional streaming, ideal for high-throughput ML data loaders that feed GPUs directly from storage. |
| 15 | +- **S3-Compatible Gateway:** Provides drop-in compatibility with the vast ecosystem of existing research tools and SDKs that support the S3 API (Boto3, MLflow, Rclone, etc.). |
| 16 | +- **Built for the ML Ecosystem:** Includes features like the `anvil hf ingest` command to import model repositories directly from the Hugging Face Hub. |
| 17 | +- **Modern, Resilient Architecture:** Built in Rust for memory safety and high concurrency, with a SWIM-like gossip protocol over QUIC for clustering and failure detection. |
| 18 | +- **Multi-Tenant by Design:** Provides strong logical isolation between different users, teams, or projects. |
30 | 19 |
|
31 | 20 | --- |
32 | 21 |
|
33 | | -## 🧪 Example: Upload and Fetch via S3 |
| 22 | +## 🚀 Quick Start |
34 | 23 |
|
35 | | -```bash |
36 | | -# Upload a file |
37 | | -aws --endpoint-url http://localhost:9000 s3 cp weights.gguf s3://mymodels/weights.gguf |
| 24 | +The fastest way to get a single-node Anvil instance running is with Docker Compose. |
38 | 25 |
|
39 | | -# Fetch the file |
40 | | -curl http://localhost:9000/mymodels/weights.gguf |
41 | | -``` |
42 | | - |
43 | | ---- |
44 | | - |
45 | | -## 🏗️ Building From Source |
| 26 | +1. **Save the `docker-compose.yml`:** |
| 27 | + Save the example `docker-compose.yml` from the [Getting Started Guide](./docs/01-getting-started.md) to a local file. |
46 | 28 |
|
47 | | -Anvil uses [Rust](https://www.rust-lang.org/tools/install) and requires at least version 1.72. |
| 29 | +2. **Launch Anvil:** |
| 30 | + ```bash |
| 31 | + docker compose up -d
| 32 | + ``` |
48 | 33 |
|
49 | | -```bash |
50 | | -git clone https://github.com/worka-ai/anvil |
51 | | -cd anvil |
52 | | -cargo build --release |
53 | | -``` |
54 | | - |
55 | | ---- |
| 34 | +3. **Create Your First Tenant and App:** |
| 35 | + Use the `admin` tool to create a tenant and an app with API credentials. |
| 36 | + ```bash |
| 37 | + # Create a region and a tenant |
| 38 | + docker compose exec anvil1 admin region create europe-west-1 |
| 39 | + docker compose exec anvil1 admin tenant create my-first-tenant |
56 | 40 |
|
57 | | -## ⚙️ Running in Cluster Mode |
| 41 | + # Create an app and save the credentials |
| 42 | + docker compose exec anvil1 admin app create --tenant-name my-first-tenant --app-name my-cli-app |
| 43 | + ``` |
58 | 44 |
|
59 | | -Start multiple nodes with a shared cluster config (see [docs](https://worka.ai/docs/anvil/operational-guide/scaling)). |
60 | | - |
61 | | ---- |
62 | | - |
63 | | -## 📡 gRPC API |
64 | | - |
65 | | -See full [API reference](https://worka.ai/docs/anvil/user-guide/grpc-api). Example client use: |
66 | | - |
67 | | -```bash |
68 | | -anvil grpc-client --list-buckets |
69 | | -``` |
70 | | - |
71 | | ---- |
72 | | - |
73 | | -## 🔐 Authentication |
74 | | - |
75 | | -Supports API key-based tenant isolation. See [Auth docs](https://worka.ai/docs/anvil/user-guide/auth-permissions). |
| 45 | +4. **Configure the Anvil CLI:** |
| 46 | + Use the credentials from the previous step to configure your local `anvil` CLI. |
| 47 | + ```bash |
| 48 | + anvil configure --host http://localhost:50051 --client-id YOUR_CLIENT_ID --client-secret YOUR_CLIENT_SECRET |
| 49 | + ``` |
76 | 50 |
|
77 | 51 | --- |
78 | 52 |
|
79 | 53 | ## 📘 Documentation |
80 | 54 |
|
81 | | -- [Getting Started](https://worka.ai/docs/anvil/getting-started) |
82 | | -- [Deployment](https://worka.ai/docs/anvil/operational-guide/deployment) |
83 | | -- [S3 Gateway](https://worka.ai/docs/anvil/user-guide/s3-gateway) |
84 | | -- [Cluster Scaling](https://worka.ai/docs/anvil/operational-guide/scaling) |
85 | | -- [Contributing](https://worka.ai/docs/anvil/developer-guide/contributing) |
| 55 | +For complete guides on deployment, architecture, and usage, please see the [**Full Documentation**](./docs/index.md). |
| 56 | + |
| 57 | +- [Getting Started](./docs/01-getting-started.md) |
| 58 | +- [Authentication & Permissions](./docs/03-user-guide-authentication.md) |
| 59 | +- [Using the S3 Gateway](./docs/04-user-guide-s3-gateway.md) |
| 60 | +- [Deployment Guide](./docs/06-operational-guide-deployment.md) |
86 | 61 |
|
87 | 62 | --- |
88 | 63 |
|
89 | 64 | ## 🤝 Contributing |
90 | 65 |
|
91 | | -We welcome PRs! Check out [CONTRIBUTING.md](https://worka.ai/docs/anvil/developer-guide/contributing) and start with [good first issues](https://github.com/worka-ai/anvil/issues?q=is%3Aissue+is%3Aopen+label%3A%22good+first+issue%22). |
| 66 | +We welcome contributions of all kinds! Please read our [**Contributing Guide**](./CONTRIBUTING.md) to get started. All participation in the Anvil community is governed by our [**Code of Conduct**](./CODE_OF_CONDUCT.md). |
92 | 67 |
|
93 | 68 | --- |
94 | 69 |
|
95 | | -## 📣 Community |
96 | | - |
97 | | -- [Discord](https://discord.gg/uCWVg5STGh) — Chat with the team |
98 | | -- [Product Hunt](https://www.producthunt.com/products/worka-anvil) |
| 70 | +## 📜 Citing Anvil |
| 71 | + |
| 72 | +If you use Anvil in your research, please cite it. The BibTeX entry below is a placeholder; its DOI, volume, page, and author fields will be filled in once the paper is published in JOSS.
| 73 | + |
| 74 | +```bibtex |
| 75 | +@article{Anvil2025, |
| 76 | + doi = {10.21105/joss.XXXXX}, |
| 77 | + url = {https://doi.org/10.21105/joss.XXXXX}, |
| 78 | + year = {2025}, |
| 79 | + publisher = {The Open Journal}, |
| 80 | + volume = {X}, |
| 81 | + number = {XX}, |
| 82 | + pages = {XXXXX}, |
| 83 | + author = {Your Name and Other Authors}, |
| 84 | + title = {Anvil: An Open-Source Object Store for AI/ML Research}, |
| 85 | + journal = {Journal of Open Source Software} |
| 86 | +} |
| 87 | +``` |
99 | 88 |
|
100 | 89 | --- |
101 | 90 |
|
102 | 91 | ## License |
103 | 92 |
|
104 | | -Licensed under [Apache 2.0](LICENSE). |
| 93 | +Anvil is licensed under the [Apache 2.0 License](./LICENSE). |
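
The "Content-Addressable Storage" feature in the new README can be illustrated with a toy sketch. This is not Anvil's implementation or API: the `ContentStore` class and its methods are hypothetical, the store is in-memory, and `hashlib.blake2b` from the Python standard library stands in for BLAKE3, which the README names but which the standard library does not ship. The sketch only shows why keying blobs by their digest deduplicates identical data.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, not Anvil's API)."""

    def __init__(self):
        self.blobs = {}  # digest -> bytes, kept once per unique content
        self.names = {}  # object key -> digest of its content

    def put(self, key, data):
        # Stand-in hash: the README names BLAKE3; blake2b is used here
        # only because it is available in the standard library.
        digest = hashlib.blake2b(data, digest_size=32).hexdigest()
        # setdefault stores the bytes only if this digest is new.
        self.blobs.setdefault(digest, data)
        self.names[key] = digest
        return digest

    def get(self, key):
        # Resolve the key to its digest, then the digest to the bytes.
        return self.blobs[self.names[key]]


store = ContentStore()
first = store.put("models/v1/weights.bin", b"same bytes")
second = store.put("models/v2/weights.bin", b"same bytes")
```

Two keys with identical content resolve to the same digest, so the store holds a single blob for both. A real system would also need reference counting or garbage collection before deleting a blob, which this sketch leaves out.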