Sonofmecury/adaptive-gridworld-gymnasium

Adaptive GridWorld Gymnasium Environment

A Gymnasium-compatible GridWorld environment with a YAML-driven curriculum for adaptive RL research.

This is a research-oriented prototype. It is intentionally small enough to inspect, modify, and run on a normal laptop while still following the agent-environment interface expected by Gymnasium/OpenAI Gym workflows.

Relevance to Adaptive RL Systems & Simulation Architecture

Adaptive reinforcement learning systems need environments whose task structure can change during training or evaluation. This project demonstrates a clean agent-environment interface, config-driven task reconfiguration, and modular simulation components that can support curriculum learning, adaptive policy testing, and future runtime adaptation layers.

The environment connects directly to adaptive RL infrastructure by exposing standard reset, step, render, observation space, and action space APIs while allowing the task definition to change between episodes without rewriting environment code.

Why This Matters for Adaptive RL

Many RL examples use static benchmark environments. Adaptive systems often need a runtime that can change the task definition while preserving a stable API for agents. This project focuses on that interface: the agent still calls reset and step, but the environment can update goals, obstacles, and reward rules from an external curriculum.

The result is a compact prototype for testing curriculum schedules, task reconfiguration, and environment design choices before moving to larger simulations or game-like platforms.

Features

  • AdaptiveGridWorldEnv, a custom gymnasium.Env
  • Configurable grid size, start position, max steps, rewards, and penalties
  • YAML-defined task stages in configs/tasks.yaml
  • Fixed and random goal modes
  • Static and dynamic obstacle layouts
  • Terminal rendering for quick inspection
  • Example random-agent rollout
  • Example script for inspecting task transitions
  • Random-policy evaluation grouped by active task stage
  • Basic tests for API behavior and adaptation stages

Installation

pip install -r requirements.txt

Usage

Run a random policy:

python examples/run_random_agent.py

Inspect task-stage changes:

python examples/inspect_task_changes.py

Evaluate a random policy by task stage:

python examples/evaluate_random_policy.py

Run tests:

python -m pytest

Quick Demo Output

examples/inspect_task_changes.py prints the active curriculum stage as episodes progress:

episode=00 stage=fixed_goal_open_field        status=transition goal=(6, 6) obstacles=0
episode=05 stage=shifting_goal                status=transition goal=(1, 0) obstacles=0
episode=10 stage=static_obstacle_layout       status=transition goal=(4, 2) obstacles=4
episode=15 stage=increased_penalties          status=transition goal=(2, 6) obstacles=4
episode=20 stage=dynamic_obstacle_challenge   status=transition goal=(5, 2) obstacles=4

examples/evaluate_random_policy.py summarizes a baseline policy without adding heavier training dependencies:

Random policy evaluation over 30 episodes
stage                          episodes  avg_reward  avg_steps  success_rate
------------------------------------------------------------------------------
fixed_goal_open_field                 5      -0.60       60.0         0.00
shifting_goal                         5       0.45       34.8         0.80
static_obstacle_layout                5      -1.55       49.0         0.40
increased_penalties                   5      -1.72       39.6         0.40
dynamic_obstacle_challenge           10      -3.78       43.5         0.60

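Note: The results above use a random policy baseline. Low or zero success rates are expected because a random agent has no directional strategy. In this run the only zero falls on fixed_goal_open_field, where episodes averaged 60.0 steps without reaching the fixed goal, while stages with a changing goal show nonzero success. These numbers establish the baseline floor, not a bug.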
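The per-stage grouping in the table above can be sketched with the standard library alone. This is a minimal illustration, not the code in examples/evaluate_random_policy.py: the function name `summarize_by_stage`, the record keys, and the sample records are all assumptions made for the example.

```python
from collections import defaultdict


def summarize_by_stage(episodes):
    """Group episode records by stage name and average their metrics.

    Each record is assumed (hypothetically) to be a dict with the keys
    "stage", "reward", "steps", and "success".
    """
    grouped = defaultdict(list)
    for episode in episodes:
        grouped[episode["stage"]].append(episode)

    summary = {}
    for stage, records in grouped.items():
        count = len(records)
        summary[stage] = {
            "episodes": count,
            "avg_reward": sum(r["reward"] for r in records) / count,
            "avg_steps": sum(r["steps"] for r in records) / count,
            "success_rate": sum(1 for r in records if r["success"]) / count,
        }
    return summary


# Made-up records for illustration only; these are not project results.
records = [
    {"stage": "shifting_goal", "reward": 1.0, "steps": 20, "success": True},
    {"stage": "shifting_goal", "reward": -1.0, "steps": 60, "success": False},
    {"stage": "static_obstacle_layout", "reward": -2.0, "steps": 60, "success": False},
]

summary = summarize_by_stage(records)
for stage, stats in summary.items():
    print(
        f"{stage:<30} {stats['episodes']:>8} {stats['avg_reward']:>11.2f} "
        f"{stats['avg_steps']:>10.1f} {stats['success_rate']:>13.2f}"
    )
```

Keeping the aggregation separate from the rollout loop means the same summary can be reused for a trained policy later without touching the environment.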

Environment API

from adaptive_gridworld import AdaptiveGridWorldEnv

env = AdaptiveGridWorldEnv()
observation, info = env.reset()
observation, reward, terminated, truncated, info = env.step(env.action_space.sample())

The observation space is a Gymnasium Dict with three keys:

  • agent: row and column of the agent
  • goal: row and column of the current goal
  • stage: integer index of the active task stage

Actions are discrete:

  • 0: move up
  • 1: move right
  • 2: move down
  • 3: move left
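The action indices above can be read as (row, column) offsets. The sketch below assumes row 0 is the top of the grid, so "up" decreases the row, and assumes that moves into a wall leave the agent on the boundary cell. Both are assumptions for illustration; the names `ACTION_DELTAS` and `apply_action` are hypothetical and not taken from the project.

```python
# Assumed mapping from action index to (row delta, column delta),
# with row 0 at the top of the grid.
ACTION_DELTAS = {
    0: (-1, 0),  # up
    1: (0, 1),   # right
    2: (1, 0),   # down
    3: (0, -1),  # left
}


def apply_action(position, action, grid_size):
    """Return the new (row, col) after a move, clamped to the grid bounds."""
    row, col = position
    d_row, d_col = ACTION_DELTAS[action]
    new_row = min(max(row + d_row, 0), grid_size - 1)
    new_col = min(max(col + d_col, 0), grid_size - 1)
    return (new_row, new_col)


print(apply_action((3, 3), 1, grid_size=7))  # one cell to the right
print(apply_action((0, 0), 0, grid_size=7))  # blocked by the top wall
```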

Example Task Adaptation

The default curriculum uses five stages:

Stage                        Behavior
--------------------------------------------------------------------------
fixed_goal_open_field        Fixed goal and no obstacles
shifting_goal                Goal changes every episode
static_obstacle_layout       Static obstacles appear
increased_penalties          Step and obstacle penalties increase
dynamic_obstacle_challenge   Obstacles can be resampled during an episode

Task definitions live in configs/tasks.yaml, so new curricula can be added without changing environment logic.
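To show why a data-driven curriculum avoids code changes, here is a minimal sketch of turning parsed stage entries into typed records. The schema is an assumption: the field names (`goal_mode`, `num_obstacles`, `dynamic_obstacles`, `step_penalty`), the `StageSpec` class, the `parse_stages` helper, and all values are hypothetical, and a plain dict stands in for what a YAML loader would return. The actual layout of configs/tasks.yaml may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageSpec:
    """Hypothetical typed view of one curriculum stage."""

    name: str
    goal_mode: str = "fixed"          # "fixed" or "random"
    num_obstacles: int = 0
    dynamic_obstacles: bool = False
    step_penalty: float = -0.01       # illustrative value only


def parse_stages(raw):
    """Validate raw stage entries and convert them to StageSpec records."""
    stages = []
    for entry in raw["stages"]:
        goal_mode = entry.get("goal_mode", "fixed")
        if goal_mode not in {"fixed", "random"}:
            raise ValueError(f"unknown goal_mode: {goal_mode!r}")
        stages.append(StageSpec(**entry))
    return stages


# Stand-in for the parsed YAML file; keys and values are assumptions.
raw_config = {
    "stages": [
        {"name": "fixed_goal_open_field"},
        {"name": "shifting_goal", "goal_mode": "random"},
        {"name": "static_obstacle_layout", "goal_mode": "random", "num_obstacles": 4},
    ]
}

stages = parse_stages(raw_config)
for stage in stages:
    print(stage)
```

Under this sketch, adding a stage means appending one entry to the config; the parsing and environment code stay unchanged.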

Architecture

flowchart LR
    Config["configs/tasks.yaml"] --> Loader["TaskConfig loader"]
    Loader --> Env["AdaptiveGridWorldEnv"]
    Env --> Spaces["Gymnasium spaces"]
    Env --> Stage["Active TaskStage"]
    Stage --> Dynamics["Goal, obstacle, reward rules"]
    Env --> Renderer["TerminalRenderer"]
    Agent["Agent or training loop"] --> Env
    Env --> Agent

Design Notes

The project separates configuration parsing, environment dynamics, and rendering to keep the simulation easy to extend. Stage transitions currently happen only at reset, which keeps them clear and reproducible for simple RL experiments. Within the dynamic_obstacle_challenge stage, obstacles can still be resampled during an episode.
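Reset-time adaptation can be sketched as a small counter that picks the active stage whenever a new episode starts. The class `ResetTimeCurriculum` and its `on_reset` method are hypothetical, and the cadence of five episodes per stage is inferred from the demo output above rather than confirmed from the project's code.

```python
class ResetTimeCurriculum:
    """Pick the active stage once per episode, at reset time only."""

    def __init__(self, stage_names, episodes_per_stage):
        self.stage_names = list(stage_names)
        self.episodes_per_stage = episodes_per_stage
        self.episode = -1  # no episode has started yet

    def on_reset(self):
        """Advance the episode counter and return (stage index, stage name)."""
        self.episode += 1
        index = min(
            self.episode // self.episodes_per_stage,
            len(self.stage_names) - 1,  # the final stage persists
        )
        return index, self.stage_names[index]


stage_names = [
    "fixed_goal_open_field",
    "shifting_goal",
    "static_obstacle_layout",
    "increased_penalties",
    "dynamic_obstacle_challenge",
]
curriculum = ResetTimeCurriculum(stage_names, episodes_per_stage=5)

transitions = []
previous = None
for episode in range(30):
    index, name = curriculum.on_reset()
    if index != previous:
        transitions.append(episode)
        print(f"episode={episode:02d} stage={name}")
    previous = index
```

Because the stage is fixed for the whole episode, every step within an episode sees the same goal mode and reward rules, which is what makes per-stage comparisons straightforward.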

Future Improvements

  • Add a Stable-Baselines3 PPO training example behind an optional dependency group
  • Support per-step event hooks for richer online reconfiguration
  • Add matplotlib rendering for notebooks and reports
  • Add vectorized environment examples for faster training
  • Expand observations with local obstacle maps for more complex policies
