Quick Start | Agent RL | Blog | Huggingface | Wandb
Scaling digital agents is bottlenecked by environments. Environments demand resources (CPU/memory) orthogonal to model training (GPU). NanoRollout is a lightweight rollout repo that (1) decouples agent harnesses (e.g., OpenHands, mini-swe-agent, Terminus2, OSWorld-MM-Agent, Cocoa-Agent) and environment backends (e.g., Docker, Modal, AWS EC2) from the trainer logic, so each can be developed and scaled independently; and (2) unifies the rollout service in evaluation, distillation, and reinforcment learning (RL) behind a single rollout server endpoint where clients submit a task and receive a trajectory.
Nanorollout powers fast parallel evaluation (SWE-Bench Verified in 18 min with 500 workers), large-scale distillation (300K+ trajectories → Mocha-Coder-32B), and stable RL at large batch sizes (bsz 4,096 → Mocha-RL-Alpha-32B), integrating with miles, veRL, and tunix.
git clone https://github.com/cocoa-org/NanoRollout.git
cd NanoRolloutWe recommend using uv with Python 3.12.
uv python pin 3.12
uv syncThis creates or reuses the project virtual environment and installs NanoRollout from
pyproject.toml/uv.lock.
If you prefer a minimal editable install instead of syncing the lockfile:
uv python pin 3.12
uv venv
uv pip install -e .Check that the CLI is available:
nro --helpFor RL training, also fetch the trainers/ submodule:
git submodule update --init --recursive| Domain | Benchmark | Harness (--agent) |
Sandbox (--env-type) |
|---|---|---|---|
| SWE | SWE-Bench Verified / Pro | oh-core (OpenHands), oh-lite, mini-swe-agent, r2egym, claude-code, qwen-code, opencode |
docker, modal, enroot |
| Terminal | Terminal-Bench 2.0 | terminus2, mini-swe-agent, claude-code, qwen-code, opencode |
docker, modal, enroot |
| Computer Use | OSWorld-Verified | qwen3vl-mmagents |
aws, docker, et al. |
| Unified | CocoaBench | cocoa-agent |
docker, modal |
Run a single SWE instance directly from the CLI:
nro run \
--task swe --agent oh-core \
--model-name deepseek-v4-flash \
--base-url https://api.deepseek.com/v1 --api-key $OPENAI_API_KEY \
--env-type docker --instance-id django__django-11095Scale to 500 parallel workers on Modal:
nro run \
--task swe --agent oh-core \
--model-name deepseek-v4-flash \
--base-url https://api.deepseek.com/v1 --api-key $OPENAI_API_KEY \
--env-type modal \
--request-file examples/eval/swe/data/swebench_verified.jsonl \
--concurrency 500nro run is best suited when environment resources are managed externally (e.g. Modal), so no Ray is needed. For self-hosted model endpoints (e.g. vLLM, SGLang), replace --base-url with your local endpoint (e.g. --base-url http://<server-ip>:8000/v1). For detailed examples across tasks (SWE-Bench, Terminal-Bench, OSWorld, CocoaBench) and agents, see examples/eval/.
We recommend starting an async rollout server for flexible async requests and self-managed resources (like CPU/RAM), for evaluation, distillation, or RL training at scale.
ray start --head
nro serve host=0.0.0.0 port=11000 concurrency=64Clients submit tasks to POST /run and receive trajectories with rewards and messages:
curl -s http://localhost:11000/run \
-H "Content-Type: application/json" \
-d '{
"instance_id": "django__django-11095",
"task": "swe", "agent": "oh-core",
"model_name": "deepseek-v4-flash",
"base_url": "https://api.deepseek.com/v1",
"api_key": "<your-api-key>"
}'RL trainers (miles, veRL, tunix) call this endpoint to generate rollout batches during training. See examples/server/ for multi-node Ray cluster setup.
NanoRollout serves trajectories to RL trainers through the same POST /run endpoint. Start nro serve (see Quick Start) first, then point your trainer at NANOROLLOUT_URL=http://<host>:11000. We have validated integration with miles, veRL, and tunix; veRL and tunix reference code is coming soon.
The miles side captures exact tokens and logprobs from agent calls via a TITO proxy so the trainer sees the same token stream the agent saw. See miles/examples/nanorollout for the launch script, hyperparameters, and full setup for an example to train Qwen3-4B-Instruct.
NanoRollout is an open-source effort to democratize large-scale agent training and evaluation. We are actively seeking collaborators to help build the future of digital agent infra.
- Submit PRs: We welcome contributions to both the core code and expansion of agent harnesses or benchmarks.
- Join the Discussion: Have an idea or need help? Chat with us on Discord.
- Report Bugs: Use GitHub Issues to report bugs or request new features.
If you use NanoRollout in academic work, please cite it using the following BibTeX entry:
@misc{nanorollout,
title = {NanoRollout: A Lightweight Infra for Digital Agent Rollout at Scale},
author = {Wang, Junli and Cheng, Zhoujun and Zhang, Yuxuan and Hao, Shibo and Tang, Yao and Hu, Zhiting and Ammanabrolu, Prithviraj and Zhang, Hao},
year = {2026},
howpublished = {\url{https://cocoa-org.notion.site/nanorollout}},
}
