A Gymnasium-compatible GridWorld environment with a YAML-driven curriculum for adaptive RL research.
This is a research-oriented prototype. It is intentionally small enough to inspect, modify, and run on a normal laptop while still following the agent-environment interface expected by Gymnasium/OpenAI Gym workflows.
Adaptive reinforcement learning systems need environments whose task structure can change during training or evaluation. This project demonstrates a clean agent-environment interface, config-driven task reconfiguration, and modular simulation components that can support curriculum learning, adaptive policy testing, and future runtime adaptation layers.
The environment exposes the standard `reset`, `step`, `render`, `observation_space`, and `action_space` APIs, so it plugs into existing RL tooling, while the task definition can change between episodes without rewriting environment code.
Many RL examples use static benchmark environments. Adaptive systems often need a runtime that can change the task definition while preserving a stable API for agents. This project focuses on that interface: the agent still calls reset and step, but the environment can update goals, obstacles, and reward rules from an external curriculum.
The result is a compact prototype for testing curriculum schedules, task reconfiguration, and environment design choices before moving to larger simulations or game-like platforms.
- `AdaptiveGridWorldEnv`, a custom `gymnasium.Env`
- Configurable grid size, start position, max steps, rewards, and penalties
- YAML-defined task stages in `configs/tasks.yaml`
- Fixed and random goal modes
- Static and dynamic obstacle layouts
- Terminal rendering for quick inspection
- Example random-agent rollout
- Example script for inspecting task transitions
- Random-policy evaluation grouped by active task stage
- Basic tests for API behavior and adaptation stages
    pip install -r requirements.txt

Run a random policy:

    python examples/run_random_agent.py

Inspect task-stage changes:

    python examples/inspect_task_changes.py

Evaluate a random policy by task stage:

    python examples/evaluate_random_policy.py

Run tests:

    python -m pytest

`examples/inspect_task_changes.py` prints the active curriculum stage as episodes progress:

    episode=00 stage=fixed_goal_open_field status=transition goal=(6, 6) obstacles=0
    episode=05 stage=shifting_goal status=transition goal=(1, 0) obstacles=0
    episode=10 stage=static_obstacle_layout status=transition goal=(4, 2) obstacles=4
    episode=15 stage=increased_penalties status=transition goal=(2, 6) obstacles=4
    episode=20 stage=dynamic_obstacle_challenge status=transition goal=(5, 2) obstacles=4

examples/evaluate_random_policy.py summarizes a baseline policy without adding heavier training dependencies:

    Random policy evaluation over 30 episodes
    stage                       episodes  avg_reward  avg_steps  success_rate
    ------------------------------------------------------------------------------
    fixed_goal_open_field              5       -0.60       60.0          0.00
    shifting_goal                      5        0.45       34.8          0.80
    static_obstacle_layout             5       -1.55       49.0          0.40
    increased_penalties                5       -1.72       39.6          0.40
    dynamic_obstacle_challenge        10       -3.78       43.5          0.60

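Note: The results above use a random policy baseline. Low or zero success rates, such as the 0.00 for `fixed_goal_open_field` in this run, are expected because a random agent has no directional strategy, and with only 5 to 10 episodes per stage the figures are noisy. These numbers establish a baseline floor; they do not indicate a bug.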
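The per-stage summary above comes down to a small grouping step. The sketch below is not the code in `examples/evaluate_random_policy.py`. It assumes each finished episode is recorded as a dict with the hypothetical keys `stage`, `reward`, `steps`, and `success`, and shows one way to compute the same four columns.

```python
from collections import defaultdict


def summarize_by_stage(episodes):
    """Group episode records by stage and average the metrics.

    `episodes` is an iterable of dicts with the (assumed, hypothetical)
    keys: "stage", "reward", "steps", "success".
    """
    grouped = defaultdict(list)
    for record in episodes:
        grouped[record["stage"]].append(record)

    summary = {}
    for stage, records in grouped.items():
        count = len(records)
        summary[stage] = {
            "episodes": count,
            "avg_reward": sum(r["reward"] for r in records) / count,
            "avg_steps": sum(r["steps"] for r in records) / count,
            "success_rate": sum(1 for r in records if r["success"]) / count,
        }
    return summary


if __name__ == "__main__":
    # Made-up records purely for illustration, not real results.
    demo = [
        {"stage": "stage_a", "reward": -1.0, "steps": 10, "success": False},
        {"stage": "stage_a", "reward": 1.0, "steps": 20, "success": True},
        {"stage": "stage_b", "reward": 0.5, "steps": 4, "success": True},
    ]
    for stage, row in summarize_by_stage(demo).items():
        print(stage, row)
```

Keeping the aggregation separate from the rollout loop means the same function could summarize a trained policy later without changes.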
    from adaptive_gridworld import AdaptiveGridWorldEnv

    env = AdaptiveGridWorldEnv()
    observation, info = env.reset()
    observation, reward, terminated, truncated, info = env.step(env.action_space.sample())

The observation is a Gymnasium Dict space:
- `agent`: row and column of the agent
- `goal`: row and column of the current goal
- `stage`: integer index of the active task stage

Actions are discrete:
- `0`: move up
- `1`: move right
- `2`: move down
- `3`: move left
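The action encoding can be read as a lookup from action index to a (row, column) offset. The sketch below is an illustration, not the environment's actual code. It assumes row 0 is the top of the grid, so "up" decreases the row, and that moves off the grid are clamped to the boundary. The names `ACTION_DELTAS` and `apply_action` and the 7x7 default grid size are hypothetical.

```python
# Hypothetical mapping from action index to (row delta, column delta).
# Assumes row 0 is the top row, so "up" decreases the row index.
ACTION_DELTAS = {
    0: (-1, 0),  # up
    1: (0, 1),   # right
    2: (1, 0),   # down
    3: (0, -1),  # left
}


def apply_action(position, action, grid_size=7):
    """Return the new (row, col) after an action, clamped to the grid.

    Clamping at the boundary is an assumption; the real environment may
    handle out-of-bounds moves differently (for example, with a penalty).
    """
    d_row, d_col = ACTION_DELTAS[action]
    row = min(max(position[0] + d_row, 0), grid_size - 1)
    col = min(max(position[1] + d_col, 0), grid_size - 1)
    return (row, col)


if __name__ == "__main__":
    position = (0, 0)
    for action in (1, 2, 2, 3):  # right, down, down, left
        position = apply_action(position, action)
    print(position)
```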
The default curriculum uses five stages:
| Stage | Behavior |
|---|---|
| `fixed_goal_open_field` | Fixed goal and no obstacles |
| `shifting_goal` | Goal changes every episode |
| `static_obstacle_layout` | Static obstacles appear |
| `increased_penalties` | Step and obstacle penalties increase |
| `dynamic_obstacle_challenge` | Obstacles can be resampled during an episode |
Task definitions live in configs/tasks.yaml, so new curricula can be added without changing environment logic.
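The schema of `configs/tasks.yaml` is not shown here, so the following is only a sketch of how stage definitions might be turned into typed objects. Every field name (`goal_mode`, `num_obstacles`, `dynamic_obstacles`, `step_penalty`), the `StageSpec` class, and the `parse_stages` helper are hypothetical. The plain dict stands in for what a YAML loader such as PyYAML's `yaml.safe_load` would return.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSpec:
    """Hypothetical typed view of one curriculum stage."""
    name: str
    goal_mode: str = "fixed"        # assumed values: "fixed" or "random"
    num_obstacles: int = 0
    dynamic_obstacles: bool = False
    step_penalty: float = -0.01     # illustrative default only


def parse_stages(raw):
    """Build StageSpec objects from a parsed config mapping.

    Unknown keys raise TypeError, which surfaces typos in the config
    early instead of silently ignoring them.
    """
    return [StageSpec(**entry) for entry in raw["stages"]]


# Stand-in for the result of loading a YAML file; the layout is assumed.
raw_config = {
    "stages": [
        {"name": "fixed_goal_open_field"},
        {"name": "shifting_goal", "goal_mode": "random"},
        {"name": "static_obstacle_layout", "goal_mode": "random", "num_obstacles": 4},
    ]
}

if __name__ == "__main__":
    for stage in parse_stages(raw_config):
        print(stage)
```

With a layout like this, adding a stage is a config change: append one more mapping to the list and leave the environment logic alone.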
flowchart LR
Config["configs/tasks.yaml"] --> Loader["TaskConfig loader"]
Loader --> Env["AdaptiveGridWorldEnv"]
Env --> Spaces["Gymnasium spaces"]
Env --> Stage["Active TaskStage"]
Stage --> Dynamics["Goal, obstacle, reward rules"]
Env --> Renderer["TerminalRenderer"]
Agent["Agent or training loop"] --> Env
Env --> Agent
The project separates configuration parsing, environment dynamics, and rendering to keep the simulation easy to extend. The current adaptation happens at reset, which makes stage transitions clear and reproducible for simple RL experiments.
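Reset-time adaptation can be pictured as a counter that advances on every `reset` and selects the active stage from it. The sketch below is not the project's implementation. `ResetTimeCurriculum`, `on_reset`, and the five-episodes-per-stage schedule are assumptions chosen to be consistent with the sample output above, where stages change at episodes 0, 5, 10, 15, and 20 and the last stage stays active afterwards.

```python
class ResetTimeCurriculum:
    """Hypothetical helper that picks the active stage at each reset."""

    def __init__(self, stage_names, episodes_per_stage=5):
        if not stage_names:
            raise ValueError("at least one stage is required")
        self.stage_names = list(stage_names)
        self.episodes_per_stage = episodes_per_stage
        self.episode = -1  # no episode has started yet

    def on_reset(self):
        """Advance the episode counter and return (stage_index, stage_name).

        The stage only changes here, never mid-episode, so every step of
        an episode sees the same task definition.
        """
        self.episode += 1
        index = min(self.episode // self.episodes_per_stage,
                    len(self.stage_names) - 1)
        return index, self.stage_names[index]


if __name__ == "__main__":
    curriculum = ResetTimeCurriculum([
        "fixed_goal_open_field",
        "shifting_goal",
        "static_obstacle_layout",
        "increased_penalties",
        "dynamic_obstacle_challenge",
    ])
    previous = None
    for _ in range(30):
        index, name = curriculum.on_reset()
        if name != previous:
            print(f"episode={curriculum.episode:02d} stage={name}")
            previous = name
```

Because the stage depends only on the episode count, two runs with the same schedule see the same sequence of stages.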
- Add a Stable-Baselines3 PPO training example behind an optional dependency group
- Support per-step event hooks for richer online reconfiguration
- Add matplotlib rendering for notebooks and reports
- Add vectorized environment examples for faster training
- Expand observations with local obstacle maps for more complex policies