Commit 514f2bc

Add Docker usage documentation
Comprehensive guide for using the Docker infrastructure, written for both new contributors and experienced users.

Contents:
- Quick start (build → run → develop)
- Image descriptions and when to use each
- Common workflows (training, model merging, evaluation)
- Environment variable reference
- Troubleshooting common issues
- CI/CD integration patterns
1 parent 260b01f commit 514f2bc

1 file changed

Lines changed: 294 additions & 0 deletions

DOCKER.md

@@ -0,0 +1,294 @@
# Docker Virtual Environments for AgentGym-RL

This document describes the Docker-based virtual environment setup for reproducible training, evaluation, and utility operations in AgentGym-RL.

## Overview

The Docker infrastructure provides:

- **Reproducible Environments**: Consistent CUDA 12.4 + PyTorch 2.4 + Python 3.10 across all machines
- **Plug-and-Play Scripts**: Run model merging, formatting, and other utilities without local setup
- **Isolated Training**: GPU-enabled containers for RL training without dependency conflicts
- **Environment Servers**: Containerized environment servers for SearchQA, BabyAI, SciWorld, etc.

## Quick Start

### Prerequisites

- Docker 24.0+ with Docker Compose v2
- NVIDIA Container Toolkit (for GPU support)
- At least 32 GB of disk space for images
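
The prerequisites above can be checked before the first build. The following is a minimal sketch, not part of the repository: `version_ge` and `check_prereqs` are hypothetical helper names, version comparison assumes GNU `sort -V`, and the disk check looks at the filesystem of the directory you pass in, which is only a proxy for where Docker actually stores images.

```bash
#!/usr/bin/env bash
# Preflight sketch for the prerequisites listed above (helper names are hypothetical).

# version_ge A B: succeeds when dotted version A is at least B (needs GNU sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_prereqs [DIR]: prints one line per check; returns non-zero if any check fails.
# DIR is the filesystem to inspect for free space (assumption: defaults to ".").
check_prereqs() {
  local dir="${1:-.}" status=0 docker_version free_kb

  docker_version="$(docker version --format '{{.Server.Version}}' 2>/dev/null)" || docker_version=""
  if [ -n "$docker_version" ] && version_ge "$docker_version" "24.0"; then
    echo "ok: Docker $docker_version"
  else
    echo "fail: Docker 24.0+ not reachable (got '${docker_version:-none}')"; status=1
  fi

  if docker compose version >/dev/null 2>&1; then
    echo "ok: Docker Compose v2 plugin"
  else
    echo "fail: 'docker compose' is not available"; status=1
  fi

  # df -Pk prints available space in 1 KiB blocks in column 4 of the second line.
  free_kb="$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')"
  if [ "${free_kb:-0}" -ge $((32 * 1024 * 1024)) ]; then
    echo "ok: at least 32 GB free on the filesystem of $dir"
  else
    echo "fail: less than 32 GB free on the filesystem of $dir"; status=1
  fi

  return "$status"
}
```

After sourcing the file, `check_prereqs` runs all three checks. GPU access is not covered here; the commands in the Troubleshooting section test it directly.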

### Build All Images

```bash
make docker-build
```

This builds:

1. `agentgym-rl/base:latest` - Base image with CUDA, PyTorch, flash-attention
2. `agentgym-rl/train:latest` - Training environment with verl and agentenv
3. `agentgym-rl/scripts:latest` - Utilities for model merging and formatting
4. `agentgym/eval:latest` - Lightweight evaluation runner

### Start Training Shell

```bash
make docker-train-shell
```

Inside the container:

```bash
# Run training
python -m verl.agent_trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    algorithm.rounds_ctrl.type=fixed \
    algorithm.rounds_ctrl.rounds=5 \
    ...
```

## Docker Images

### Base Image (`docker/base.Dockerfile`)

Foundation image with:

- CUDA 12.4.1 (devel)
- Python 3.10
- PyTorch 2.4.0
- flash-attention 2.7.3

Build independently:

```bash
make docker-build-base
```
68+
69+
### Training Image (`docker/train.Dockerfile`)
70+
71+
Extends base with:
72+
73+
- verl (AgentGym-RL training framework)
74+
- agentenv (environment client)
75+
- All training dependencies
76+
77+
Build:
78+
79+
```bash
80+
make docker-build-train
81+
```
82+
83+
Usage:
84+
85+
```bash
86+
# Interactive shell
87+
make docker-train-shell
88+
89+
# Or via docker compose
90+
docker compose --profile train run --rm train /bin/bash
91+
```

### Scripts Image (`docker/scripts.Dockerfile`)

Extends base with:

- transformers
- huggingface_hub
- yapf (formatter)
- Model loading utilities

Build:

```bash
make docker-build-scripts
```

## Common Operations

### Model Merging

Merge FSDP checkpoints to HuggingFace format:

```bash
# Single checkpoint
make docker-merge LOCAL_DIR=saves/global_step_100/actor

# With custom output directory
make docker-merge LOCAL_DIR=saves/global_step_100/actor SAVE_DIR=models/merged

# Upload to HuggingFace
make docker-merge LOCAL_DIR=saves/global_step_100/actor HF_UPLOAD_PATH=username/model-name
```
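
When a run has saved several checkpoints, picking the newest one by hand is error-prone because `global_step_100` sorts before `global_step_20` lexically. The helper below is a sketch under the assumption that checkpoints follow the `global_step_<N>/actor` layout used in the examples above; `latest_actor_checkpoint` is a hypothetical name, not a script shipped with the repository.

```bash
# latest_actor_checkpoint RUN_DIR: prints RUN_DIR/global_step_N/actor for the largest N.
# Returns non-zero when RUN_DIR holds no such checkpoint.
latest_actor_checkpoint() {
  local run_dir="${1%/}" best=-1 dir step
  for dir in "$run_dir"/global_step_*/actor; do
    [ -d "$dir" ] || continue            # skips the unexpanded glob when nothing matches
    step="${dir%/actor}"                 # .../global_step_100
    step="${step##*global_step_}"        # 100
    case "$step" in
      ''|*[!0-9]*) continue ;;           # ignore non-numeric suffixes
    esac
    if [ "$step" -gt "$best" ]; then
      best="$step"
    fi
  done
  [ "$best" -ge 0 ] || return 1
  printf '%s/global_step_%s/actor\n' "$run_dir" "$best"
}
```

A possible use is `make docker-merge LOCAL_DIR="$(latest_actor_checkpoint saves/experiment1)"`, where `saves/experiment1` stands in for your own run directory.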

### Code Formatting

Format AgentGym-RL code with yapf:

```bash
make docker-format
```

### Environment Servers

Start an environment server:

```bash
# SearchQA (default)
make docker-env

# Other environments
make docker-env ENV=babyai
make docker-env ENV=sciworld
```

### Evaluation

Run evaluation against a running environment server:

```bash
make docker-eval ENV=searchqa
```
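
Because evaluation needs a server that is already accepting connections, a script that starts both in sequence may want to wait in between. The sketch below polls a TCP port using bash's `/dev/tcp` redirection. `wait_for_port` is a hypothetical helper, and an open port only shows that something is listening, not that the environment has finished loading.

```bash
# wait_for_port HOST PORT [TIMEOUT_SECONDS]: polls once per second until a TCP
# connection succeeds. Returns non-zero if the timeout (default 60) elapses first.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}" waited=0
  # /dev/tcp is a bash feature; the subshell closes the socket when it exits.
  until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
}
```

For example, `wait_for_port localhost 36001 120 && make docker-eval ENV=searchqa` uses the default `ENV_PORT` from this guide and assumes the server's port is published on the host.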

## Volume Mounts

The Docker setup uses the following volume mounts:

| Host Path | Container Path | Purpose |
|-----------|----------------|---------|
| `./models` | `/workspace/models` | Pre-trained models |
| `./saves` | `/workspace/saves` | Training checkpoints |
| `./data` | `/workspace/data` | Training data |
| `./AgentItemId` | `/workspace/AgentItemId` | Training item IDs |
| `./AgentEval` | `/workspace/AgentEval` | Evaluation data |
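
If a bind-mounted host directory does not exist, the Docker daemon typically creates it owned by root, which is a common source of the permission problems covered under Troubleshooting. Creating the directories up front avoids that. The snippet is a sketch that assumes the host paths in the table are resolved relative to the repository root; `make_mount_dirs` is a hypothetical helper.

```bash
# make_mount_dirs [ROOT]: creates the five host directories from the table under
# ROOT (default: current directory) so they are owned by the invoking user.
make_mount_dirs() {
  local root="${1:-.}" name
  for name in models saves data AgentItemId AgentEval; do
    mkdir -p "$root/$name"
  done
}
```

Run `make_mount_dirs` once from the repository root before starting any container.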

## Environment Variables

Set these in `.env` or export before running:

| Variable | Default | Description |
|----------|---------|-------------|
| `ENV` | `searchqa` | Environment name |
| `ENV_PORT` | `36001` | Environment server port |
| `MODEL` | `gpt-4o-mini` | Model for evaluation |
| `MAX_ROUND` | `10` | Max interaction rounds |
| `LOCAL_DIR` | `saves/checkpoint` | Checkpoint path for merging |
| `OPENAI_API_KEY` | - | Required for evaluation |
| `WANDB_API_KEY` | - | Optional for training logging |
| `WANDB_MODE` | `offline` | WandB mode |
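
A starting `.env` can be generated from the defaults in the table. This is a sketch rather than a file shipped with the repository: `write_env_template` is a hypothetical helper, the values are copied from the table above, and the two API keys are left commented out because they have no default.

```bash
# write_env_template [PATH]: writes the documented defaults to PATH (default: .env).
# Refuses to overwrite an existing file and restricts permissions, since the
# file may later hold API keys.
write_env_template() {
  local path="${1:-.env}"
  if [ -e "$path" ]; then
    echo "refusing to overwrite $path" >&2
    return 1
  fi
  cat > "$path" <<'EOF'
ENV=searchqa
ENV_PORT=36001
MODEL=gpt-4o-mini
MAX_ROUND=10
LOCAL_DIR=saves/checkpoint
WANDB_MODE=offline
# No defaults: uncomment and fill in as needed
# OPENAI_API_KEY=
# WANDB_API_KEY=
EOF
  chmod 600 "$path"
}
```

Docker Compose reads `.env` from the project directory automatically, and variables exported in the shell take precedence over values in that file.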

## Docker Compose Profiles

The `docker-compose.yml` uses profiles to organize services:

| Profile | Services | Command |
|---------|----------|---------|
| `build` | base | `docker compose --profile build up base` |
| `train` | train | `docker compose --profile train up -d` |
| `scripts` | scripts | `docker compose --profile scripts up -d` |
| `model-merger` | model-merger | `docker compose --profile model-merger run --rm model-merger` |
| `formatter` | formatter | `docker compose --profile formatter run --rm formatter` |
| `env` | env-server | `docker compose --profile env up -d` |
| `eval` | eval-runner | `docker compose --profile eval up` |

## Troubleshooting

### GPU Not Detected

Check that the host driver works and that the NVIDIA Container Toolkit exposes GPUs to containers:

```bash
# Host check: should list your GPUs
nvidia-smi
# Container check: should print the same GPU table from inside a container
docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi
```
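
### Out of Memory

Training can exhaust the container's shared memory (`/dev/shm`, 64 MB by default in Docker). `docker compose run` does not accept a `--shm-size` flag, so either set `shm_size: 32g` on the `train` service in `docker-compose.yml`, or start the training image directly with `docker run`:

```bash
# Bypasses docker-compose.yml, so add any volume mounts you need with -v
docker run --rm -it --gpus all --shm-size=32g agentgym-rl/train:latest /bin/bash
```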

### Build Failures

Clean and rebuild:

```bash
make docker-clean
make docker-build
```

### Permission Issues

If files written to mounted volumes end up owned by root, or the container cannot write to them, run the container as your host user:

```bash
# Run as current user
docker compose --profile train run --rm --user "$(id -u):$(id -g)" train /bin/bash
```

## Development Workflow

### Typical Training Session

```bash
# 1. Build images (first time only)
make docker-build

# 2. Start environment server
make docker-env ENV=searchqa

# 3. Enter training container
make docker-train-shell

# 4. Inside container: run training
# Note: host.docker.internal resolves automatically on Docker Desktop; on Linux it
# resolves only if the service maps it to host-gateway via extra_hosts.
HYDRA_FULL_ERROR=1 python -m verl.agent_trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    algorithm.rounds_ctrl.type=fixed \
    algorithm.rounds_ctrl.rounds=5 \
    data.train_file=AgentItemId/searchqa_train.json \
    actor_rollout_ref.agentgym.task_name=searchqa \
    actor_rollout_ref.agentgym.env_addr=http://host.docker.internal:36001 \
    actor_rollout_ref.model.path=/workspace/models/Qwen2.5-7B-Instruct \
    trainer.default_local_dir=/workspace/saves/experiment1 \
    trainer.total_epochs=20

# 5. Back on the host: merge checkpoint to HuggingFace format
make docker-merge LOCAL_DIR=saves/experiment1/global_step_100/actor
```
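
Long backslash-continued commands break silently when a trailing backslash is lost, so one option is to collect the Hydra overrides in a bash array. The sketch below reuses the override names and values from the session above. `build_train_overrides` and `TRAIN_OVERRIDES` are hypothetical names, and the `AgentItemId/<task>_train.json` pattern is an assumption generalized from the single `searchqa` example.

```bash
# build_train_overrides TASK PORT RUN_NAME: fills the global TRAIN_OVERRIDES array
# with the overrides from the training session above.
build_train_overrides() {
  local task="$1" port="$2" run_name="$3"
  TRAIN_OVERRIDES=(
    "algorithm.adv_estimator=grpo"
    "algorithm.rounds_ctrl.type=fixed"
    "algorithm.rounds_ctrl.rounds=5"
    # Assumption: item ID files are named <task>_train.json
    "data.train_file=AgentItemId/${task}_train.json"
    "actor_rollout_ref.agentgym.task_name=${task}"
    "actor_rollout_ref.agentgym.env_addr=http://host.docker.internal:${port}"
    "actor_rollout_ref.model.path=/workspace/models/Qwen2.5-7B-Instruct"
    "trainer.default_local_dir=/workspace/saves/${run_name}"
    "trainer.total_epochs=20"
  )
}
```

Inside the training container this could be used as `build_train_overrides searchqa 36001 experiment1` followed by `HYDRA_FULL_ERROR=1 python -m verl.agent_trainer.main_ppo "${TRAIN_OVERRIDES[@]}"`.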

### CI/CD Integration

For automated pipelines:

```yaml
# GitHub Actions example
jobs:
  train:
    runs-on: self-hosted
    steps:
      - uses: actions/checkout@v4
      - name: Build images
        run: make docker-build
      - name: Run training
        run: |
          docker compose --profile train run --rm train \
            python -m verl.agent_trainer.main_ppo ...
```

## File Structure

```text
AgentGym-RL/
├── docker/
│   ├── base.Dockerfile       # CUDA + PyTorch base
│   ├── train.Dockerfile      # Training environment
│   └── scripts.Dockerfile    # Utilities environment
├── docker-compose.yml        # Service orchestration
├── Dockerfile.eval           # Evaluation runner
├── .dockerignore             # Build context exclusions
├── Makefile                  # Convenient targets
└── DOCKER.md                 # This documentation
```
