Adaptive GridWorld Gymnasium Environment

A Gymnasium-compatible GridWorld environment with a YAML-driven curriculum for adaptive RL research.

This is a research-oriented prototype. It is intentionally small enough to inspect, modify, and run on a normal laptop while still following the agent-environment interface expected by Gymnasium/OpenAI Gym workflows.

Relevance to Adaptive RL Systems & Simulation Architecture

Adaptive reinforcement learning systems need environments whose task structure can change during training or evaluation. This project demonstrates a clean agent-environment interface, config-driven task reconfiguration, and modular simulation components that can support curriculum learning, adaptive policy testing, and future runtime adaptation layers.

The environment fits into adaptive RL infrastructure by exposing the standard Gymnasium interface (reset, step, render, observation_space, and action_space) while allowing the task definition to change between episodes without rewriting environment code.

Why This Matters for Adaptive RL

Many RL examples use static benchmark environments. Adaptive systems often need a runtime that can change the task definition while preserving a stable API for agents. This project focuses on that interface: the agent still calls reset and step, but the environment can update goals, obstacles, and reward rules from an external curriculum.

The result is a compact prototype for testing curriculum schedules, task reconfiguration, and environment design choices before moving to larger simulations or game-like platforms.

Features

AdaptiveGridWorldEnv, a custom gymnasium.Env
Configurable grid size, start position, max steps, rewards, and penalties
YAML-defined task stages in configs/tasks.yaml
Fixed and random goal modes
Static and dynamic obstacle layouts
Terminal rendering for quick inspection
Example random-agent rollout
Example script for inspecting task transitions
Random-policy evaluation grouped by active task stage
Basic tests for API behavior and adaptation stages

Installation

pip install -r requirements.txt

Usage

Run a random policy:

python examples/run_random_agent.py

Inspect task-stage changes:

python examples/inspect_task_changes.py

Evaluate a random policy by task stage:

python examples/evaluate_random_policy.py

Run tests:

python -m pytest

Quick Demo Output

examples/inspect_task_changes.py prints the active curriculum stage as episodes progress:

episode=00 stage=fixed_goal_open_field        status=transition goal=(6, 6) obstacles=0
episode=05 stage=shifting_goal                status=transition goal=(1, 0) obstacles=0
episode=10 stage=static_obstacle_layout       status=transition goal=(4, 2) obstacles=4
episode=15 stage=increased_penalties          status=transition goal=(2, 6) obstacles=4
episode=20 stage=dynamic_obstacle_challenge   status=transition goal=(5, 2) obstacles=4
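
The transitions above land on episodes 0, 5, 10, 15, and 20, which is consistent with a fixed-length schedule. The sketch below shows one way such a schedule could be expressed. It assumes five episodes per stage, with the last stage persisting afterward; EPISODES_PER_STAGE and stage_for_episode are hypothetical names, not part of the project's API.

```python
# Hypothetical sketch: map an episode index to the active stage name.
STAGES = [
    "fixed_goal_open_field",
    "shifting_goal",
    "static_obstacle_layout",
    "increased_penalties",
    "dynamic_obstacle_challenge",
]
EPISODES_PER_STAGE = 5  # assumption inferred from the demo output above


def stage_for_episode(episode: int) -> str:
    """Return the stage active at `episode`; the final stage persists."""
    if episode < 0:
        raise ValueError("episode must be non-negative")
    index = min(episode // EPISODES_PER_STAGE, len(STAGES) - 1)
    return STAGES[index]
```

Under these assumptions, a 30-episode run spends five episodes in each of the first four stages and ten in the last one.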

examples/evaluate_random_policy.py summarizes a baseline policy without adding heavier training dependencies:

Random policy evaluation over 30 episodes
stage                          episodes  avg_reward  avg_steps  success_rate
------------------------------------------------------------------------------
fixed_goal_open_field                 5      -0.60       60.0         0.00
shifting_goal                         5       0.45       34.8         0.80
static_obstacle_layout                5      -1.55       49.0         0.40
increased_penalties                   5      -1.72       39.6         0.40
dynamic_obstacle_challenge           10      -3.78       43.5         0.60

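Note: The results above use a random policy baseline. With only 5 to 10 episodes per stage, the per-stage success rates are noisy and do not track stage difficulty: the random agent scores 0.00 on fixed_goal_open_field and 0.80 on shifting_goal. Low or zero success rates are expected because a random agent has no directional strategy. These numbers establish a baseline floor and do not indicate a bug.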
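
The per-stage columns in the table can be computed from a flat list of episode records. The following is a minimal sketch of that aggregation, not the code in examples/evaluate_random_policy.py; EpisodeResult and summarize_by_stage are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EpisodeResult:
    """One finished episode (hypothetical record type)."""
    stage: str
    total_reward: float
    steps: int
    success: bool


def summarize_by_stage(results):
    """Group episodes by stage and compute mean reward, mean steps, and success rate."""
    grouped = defaultdict(list)
    for result in results:
        grouped[result.stage].append(result)

    summary = {}
    for stage, rows in grouped.items():
        n = len(rows)
        summary[stage] = {
            "episodes": n,
            "avg_reward": sum(r.total_reward for r in rows) / n,
            "avg_steps": sum(r.steps for r in rows) / n,
            # bool is summed as 0/1, so this is the fraction of successful episodes
            "success_rate": sum(r.success for r in rows) / n,
        }
    return summary
```

With only five episodes in a stage, each success moves success_rate by 0.20, which is one reason small evaluation runs look noisy.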

Environment API

from adaptive_gridworld import AdaptiveGridWorldEnv

env = AdaptiveGridWorldEnv()
observation, info = env.reset()
observation, reward, terminated, truncated, info = env.step(env.action_space.sample())
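
The calls above extend to a full episode loop. The sketch below rolls out one episode against any object that follows the same reset/step five-tuple convention. CorridorEnv is a tiny stand-in so the example runs without the package installed; it is not the project's environment, and run_episode is a hypothetical helper.

```python
class CorridorEnv:
    """Stand-in environment (NOT AdaptiveGridWorldEnv): a 1-D corridor with the goal at the right end."""

    def __init__(self, length=4, max_steps=20):
        self.length = length
        self.max_steps = max_steps
        self.pos = 0
        self.steps = 0

    def reset(self, seed=None):
        self.pos = 0
        self.steps = 0
        return {"agent": self.pos}, {}

    def step(self, action):
        # Action 1 moves right, anything else moves left; clamp to the corridor.
        delta = 1 if action == 1 else -1
        self.pos = max(0, min(self.length - 1, self.pos + delta))
        self.steps += 1
        terminated = self.pos == self.length - 1
        truncated = not terminated and self.steps >= self.max_steps
        reward = 1.0 if terminated else -0.01
        return {"agent": self.pos}, reward, terminated, truncated, {}


def run_episode(env, policy):
    """Roll out one episode; return (total_reward, steps, terminated)."""
    observation, info = env.reset()
    total_reward, steps = 0.0, 0
    terminated = truncated = False
    while not (terminated or truncated):
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps, terminated
```

With the real environment, the same helper would presumably be called as run_episode(env, lambda _obs: env.action_space.sample()) after env = AdaptiveGridWorldEnv().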

The observation space is a Gymnasium Dict with three keys:

agent: row and column of the agent
goal: row and column of the current goal
stage: integer index of the active task stage

Actions are discrete:

0: move up
1: move right
2: move down
3: move left
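
Because positions are given as (row, column), each action id corresponds to a row or column offset. The sketch below is one plausible mapping, assuming row 0 is the top of the grid and that a move off the grid leaves the agent in place. Both are assumptions; the project's boundary handling may differ, and ACTION_DELTAS and apply_action are hypothetical names.

```python
# (row_delta, col_delta) per action id; assumes row 0 is the top row.
ACTION_DELTAS = {
    0: (-1, 0),  # up
    1: (0, 1),   # right
    2: (1, 0),   # down
    3: (0, -1),  # left
}


def apply_action(position, action, grid_size):
    """Return the new (row, col), clamped so the agent stays on the grid."""
    if action not in ACTION_DELTAS:
        raise ValueError(f"unknown action {action!r}")
    row, col = position
    d_row, d_col = ACTION_DELTAS[action]
    new_row = min(max(row + d_row, 0), grid_size - 1)
    new_col = min(max(col + d_col, 0), grid_size - 1)
    return new_row, new_col
```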

Example Task Adaptation

The default curriculum uses five stages:

Stage                        Behavior
fixed_goal_open_field        Fixed goal and no obstacles
shifting_goal                Goal changes every episode
static_obstacle_layout       Static obstacles appear
increased_penalties          Step and obstacle penalties increase
dynamic_obstacle_challenge   Obstacles can be resampled during an episode

Task definitions live in configs/tasks.yaml, so new curricula can be added without changing environment logic.
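
To illustrate the idea of config-driven stages, the sketch below validates a parsed stage list before it would reach an environment. The schema is entirely hypothetical: the keys name, goal_mode, num_obstacles, and step_penalty, the tiny_maze stage, and the StageSpec and parse_stages names are invented for this example and may not match configs/tasks.yaml. RAW_CONFIG stands in for what a YAML loader might return.

```python
from dataclasses import dataclass

# Stand-in for parsed YAML; every key and value here is hypothetical.
RAW_CONFIG = {
    "stages": [
        {"name": "fixed_goal_open_field", "goal_mode": "fixed", "num_obstacles": 0},
        {"name": "tiny_maze", "goal_mode": "random", "num_obstacles": 6, "step_penalty": -0.05},
    ]
}

VALID_GOAL_MODES = {"fixed", "random"}


@dataclass(frozen=True)
class StageSpec:
    """Validated description of one curriculum stage (hypothetical schema)."""
    name: str
    goal_mode: str = "fixed"
    num_obstacles: int = 0
    step_penalty: float = -0.01


def parse_stages(raw):
    """Convert raw stage mappings to StageSpec objects, rejecting invalid entries."""
    stages = []
    for entry in raw.get("stages", []):
        spec = StageSpec(**entry)  # unknown keys raise TypeError
        if spec.goal_mode not in VALID_GOAL_MODES:
            raise ValueError(f"unknown goal_mode {spec.goal_mode!r} in stage {spec.name!r}")
        if spec.num_obstacles < 0:
            raise ValueError(f"num_obstacles must be >= 0 in stage {spec.name!r}")
        stages.append(spec)
    if not stages:
        raise ValueError("curriculum must define at least one stage")
    return stages
```

Validating at load time means a malformed curriculum fails before training starts rather than partway through a run.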

Architecture

flowchart LR
    Config["configs/tasks.yaml"] --> Loader["TaskConfig loader"]
    Loader --> Env["AdaptiveGridWorldEnv"]
    Env --> Spaces["Gymnasium spaces"]
    Env --> Stage["Active TaskStage"]
    Stage --> Dynamics["Goal, obstacle, reward rules"]
    Env --> Renderer["TerminalRenderer"]
    Agent["Agent or training loop"] --> Env
    Env --> Agent

Design Notes

The project separates configuration parsing, environment dynamics, and rendering to keep the simulation easy to extend. Stage adaptation currently happens at reset, which keeps stage transitions clear and reproducible for simple RL experiments.
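
The reset-time design can be pictured with a small stand-in class: the stage is chosen when an episode starts and is not switched by individual steps. This is a sketch under assumed names (ResetTimeCurriculum, episodes_per_stage), not the project's implementation.

```python
class ResetTimeCurriculum:
    """Stand-in showing stage selection at reset only (not the project's class)."""

    def __init__(self, stage_names, episodes_per_stage):
        self.stage_names = list(stage_names)
        self.episodes_per_stage = episodes_per_stage
        self.episode = -1  # becomes 0 on the first reset
        self.stage_index = 0

    def reset(self):
        # The active stage may change here, between episodes.
        self.episode += 1
        self.stage_index = min(
            self.episode // self.episodes_per_stage,
            len(self.stage_names) - 1,
        )
        return self.stage_names[self.stage_index]

    def step(self):
        # Steps never switch stages, so each episode runs under one stage.
        return self.stage_names[self.stage_index]
```

Because the stage depends only on the episode counter, replaying the same number of resets yields the same sequence of stages.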

Future Improvements

Add a Stable-Baselines3 PPO training example behind an optional dependency group
Support per-step event hooks for richer online reconfiguration
Add matplotlib rendering for notebooks and reports
Add vectorized environment examples for faster training
Expand observations with local obstacle maps for more complex policies
