Add Docker virtual environments for reproducible development by supmo668 · Pull Request #14 · WooooDyy/AgentGym-RL

supmo668 · 2025-12-02T00:05:10Z

Overview

This PR introduces Docker-based development environments to solve the reproducibility problem in AgentGym-RL. Currently, setting up the training environment requires careful manual configuration of CUDA, PyTorch, flash-attention, and various dependencies, a process that is error-prone and time-consuming.

With this change, contributors can get a working environment with:

make docker-build-train
make docker-train-shell

What's Changed

1. Docker Infrastructure

Four purpose-built images. The three `agentgym-rl/*` images layer on each other; the evaluation image is built separately from a slim Python base:

| Image | Purpose | Base |
| --- | --- | --- |
| `agentgym-rl/base` | CUDA 12.4 + PyTorch 2.4 + flash-attn | nvidia/cuda |
| `agentgym-rl/train` | Full RL training with verl | base |
| `agentgym-rl/scripts` | Model merging utilities | base |
| `agentgym/eval` | Lightweight evaluation | python:3.10-slim |
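
The layering in the table implies a build order: the base image has to exist before train or scripts can be built on top of it. A minimal sketch of that order follows. It assumes the Dockerfile paths named in the commit messages (`docker/base.Dockerfile` and so on), the repository root as build context, and default tags; the actual Makefile targets may do this differently. The `DOCKER` variable and the function names are hypothetical, and setting `DOCKER=echo` prints the commands instead of running them.

```shell
# Sketch only: the DOCKER variable and function names are illustrative,
# not part of this PR. Use DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"

# build_image <dockerfile> <tag>: build one image from the repository root.
build_image() {
  "$DOCKER" build -f "$1" -t "$2" .
}

# Build the shared base first, then the two images that depend on it.
# The && chain stops at the first failed build.
build_gpu_images() {
  build_image docker/base.Dockerfile agentgym-rl/base &&
  build_image docker/train.Dockerfile agentgym-rl/train &&
  build_image docker/scripts.Dockerfile agentgym-rl/scripts
}
```

Calling `build_gpu_images` after a code-only change would reuse the cached base layers, which is the point of splitting the images.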

2. Service Orchestration

docker-compose.yml with profile-based services:

train: GPU-enabled training container
env: Environment servers (searchqa, babyai, etc.)
eval: Evaluation against environment servers

3. Developer Tooling

Makefile: Simple commands (make docker-build, make docker-train-shell)
.env.example: Template for API keys and configuration
.dockerignore: Keeps builds fast by excluding large files

4. Documentation

DOCKER.md covers quick start, common workflows, and troubleshooting.

Design Decisions

Why separate images? The base image with CUDA/PyTorch is large (~15GB). By layering, we can rebuild train/scripts quickly when only code changes.

Why profiles? Not everyone needs all services. docker compose --profile train up starts only what's needed.
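
Profiles can also be combined on one command line. The sketch below assembles such a command from a list of profile names and prints it rather than running it. `compose_cmd` is a hypothetical helper; the profile names are the ones listed under Service Orchestration.

```shell
# Hypothetical helper: print the `docker compose` command that would start
# only the services belonging to the given profiles.
compose_cmd() {
  cmd="docker compose"
  for profile in "$@"; do
    cmd="$cmd --profile $profile"
  done
  printf '%s up\n' "$cmd"
}

compose_cmd train      # training container only
compose_cmd env eval   # environment servers plus evaluation
```

Docker Compose also reads the `COMPOSE_PROFILES` environment variable, so `COMPOSE_PROFILES=env,eval docker compose up` should select the same services.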

Why volume mounts for models? Baking large model files into images would make them huge and slow to transfer. Mounts are more flexible.
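
A mount of this kind might look like the following. The host directories, the container paths `/models` and `/checkpoints`, and the `:latest` tag on the training image are assumptions for illustration; the compose file in this PR defines the real ones. With `DOCKER=echo` the command is printed instead of executed.

```shell
# Sketch only: paths and variable names below are assumptions.
DOCKER="${DOCKER:-docker}"
MODELS_DIR="${MODELS_DIR:-$PWD/models}"                  # host directory with model weights
CHECKPOINTS_DIR="${CHECKPOINTS_DIR:-$PWD/checkpoints}"   # host directory for training output

# Start an interactive shell in the training image with GPUs enabled and
# both host directories mounted, so no model files end up in an image layer.
train_shell() {
  "$DOCKER" run --rm -it --gpus all \
    -v "$MODELS_DIR:/models:ro" \
    -v "$CHECKPOINTS_DIR:/checkpoints" \
    agentgym-rl/train:latest bash
}
```

Mounting the models read-only (`:ro`) is a choice made for this sketch, so that a training run cannot modify the source weights.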

Testing

$ make test-docker
Docker version 28.5.1
Docker Compose version v2.40.3
docker-compose.yml: OK
base.Dockerfile: OK
train.Dockerfile: OK
scripts.Dockerfile: OK
Dockerfile.eval: OK
.dockerignore: OK
.env.example: OK
All tests PASSED

$ make docker-build-eval  # Built successfully
$ docker run --rm agentgym/eval:latest python -c "import agentenv"  # Works
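
The Makefile is not reproduced on this page, so the following is only a guess at the kind of check that could produce `OK` lines like those above: it tests that each expected file exists, without building anything. The function names are hypothetical and the file paths are taken from the commit messages.

```shell
# Sketch only: not the PR's actual test-docker implementation.

# check_file <path>: print "<basename>: OK" or "<basename>: MISSING".
check_file() {
  if [ -f "$1" ]; then
    echo "$(basename "$1"): OK"
  else
    echo "$(basename "$1"): MISSING"
    return 1
  fi
}

# check_docker_files [root]: check every file the Docker setup relies on.
check_docker_files() {
  root="${1:-.}"
  failed=0
  for f in docker-compose.yml docker/base.Dockerfile docker/train.Dockerfile \
           docker/scripts.Dockerfile Dockerfile.eval .dockerignore .env.example; do
    check_file "$root/$f" || failed=1
  done
  if [ "$failed" -eq 0 ]; then
    echo "All tests PASSED"
  else
    echo "Some tests FAILED"
    return 1
  fi
}
```

An existence check says nothing about whether the files are valid. For the compose file, `docker compose config --quiet` validates the configuration without printing it and could be added as a further step.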

Commits

Add Docker infrastructure - Dockerfiles and compose configuration
Add developer tooling - Makefile, .env.example, .dockerignore
Add documentation - DOCKER.md usage guide

Checklist

Dockerfiles build successfully
Compose configuration validates
Makefile commands work
Documentation is clear and complete
No breaking changes to existing workflows

supmo668 · 2025-12-02T08:13:32Z

Test Results

All validation tests pass. The Docker setup is safe for running alongside existing agent environments.

`make test-docker` Output

Docker version 28.5.1, build e180ab8
Docker Compose version v2.40.3-desktop.1
docker-compose.yml: OK
base.Dockerfile: OK
train.Dockerfile: OK  
scripts.Dockerfile: OK
Dockerfile.eval: OK
.dockerignore: OK
.env.example: OK
All tests PASSED

Eval Image Test

$ docker run --rm agentgym/eval:latest python -c "import agentenv; print('OK')"
OK

`make docker-status` Output

NAMES     STATUS
REPOSITORY          TAG       IMAGE ID       CREATED        SIZE
agentgym/eval       latest    1a1f846ded51   5 hours ago    12.7GB

Safety Notes

All make targets tested and working
No port conflicts with existing services
docker-status shows current container/image state
docker-down safely removes only AgentGym containers

Introduce containerized development environments that ensure consistent setup across machines. This eliminates "works on my machine" issues and simplifies onboarding.

What's included:
- docker/base.Dockerfile: CUDA 12.4 + PyTorch 2.4 + flash-attn
- docker/train.Dockerfile: Full RL training environment
- docker/scripts.Dockerfile: Model merging utilities
- Dockerfile.eval: Lightweight evaluation runner
- docker-compose.yml: Service orchestration with profiles

Key design decisions:
- Multi-stage builds reduce final image size
- Profile-based services (train/eval/env) for flexibility
- GPU resources allocated via nvidia runtime
- Volume mounts for models/checkpoints (not baked into image)

Provide convenient commands and configuration to streamline the Docker-based development experience.

Makefile targets:
- docker-build-*: Build individual or all images
- docker-train-shell: Interactive training environment
- docker-status: Quick health check of containers/images
- test-docker: Validate setup without building

.env.example:
- Template for required environment variables
- Includes API keys, ports, model settings

.dockerignore:
- Excludes checkpoints, caches, and large data
- Keeps build context small for faster builds

Comprehensive guide for using the Docker infrastructure, written for both new contributors and experienced users.

Contents:
- Quick start (build → run → develop)
- Image descriptions and when to use each
- Common workflows (training, model merging, evaluation)
- Environment variable reference
- Troubleshooting common issues
- CI/CD integration patterns

supmo668 force-pushed the feat/docker-virtual-environments branch from 7ba91ac to 9542a57 Compare December 2, 2025 08:12

supmo668 added 3 commits December 2, 2025 00:16

supmo668 force-pushed the feat/docker-virtual-environments branch from 9542a57 to 514f2bc Compare December 2, 2025 08:17

supmo668 changed the title ~~feat: Docker virtual environments for reproducible training and scripts~~ Add Docker virtual environments for reproducible development Dec 2, 2025

supmo668 wants to merge 3 commits into WooooDyy:main from supmo668:feat/docker-virtual-environments
