2D Unity ML-Agents game where two players dodge, jump, crouch, dash, and shoot bullets. One side can be human controlled, or both sides can run trained policies. Play the game here
If you find this helpful, consider supporting on Patreon!
- Included model: Assets/RLAgents/results/env_sp11/BulletShooter/BulletShooter-3101431.onnx
- Main config: Assets/RLAgents/config/selfPlay.yaml
- Network: PPO self-play policy, feed-forward MLP, 3 hidden layers, 512 units per layer, no recurrent memory.
Self-play is a machine learning technique in which the AI plays against previous versions of itself until it gets better! During training, we save older policy snapshots and reuse them as opponents for the current model. This pushes the policy to stay capable against a variety of strategies instead of overfitting to a single opponent.
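The snapshot idea above can be sketched in a few lines of Python. This is only an illustration of the concept, not the ML-Agents implementation: `SnapshotPool`, `window`, and `latest_ratio` are hypothetical names, and policies are stood in for by plain IDs.

```python
import random
from collections import deque


class SnapshotPool:
    """Hypothetical pool of past policy snapshots used as opponents."""

    def __init__(self, window=10, latest_ratio=0.5, seed=None):
        # A bounded deque drops the oldest snapshot once the window is full.
        self.snapshots = deque(maxlen=window)
        self.latest_ratio = latest_ratio
        self.rng = random.Random(seed)

    def save(self, policy_id):
        """Store a snapshot of the policy (represented here by an ID)."""
        self.snapshots.append(policy_id)

    def sample_opponent(self, current_policy_id):
        """Pick the current policy or a stored snapshot as the opponent."""
        # Face the current policy when no snapshot exists yet, or with
        # probability latest_ratio; otherwise pick a stored snapshot uniformly.
        if not self.snapshots or self.rng.random() < self.latest_ratio:
            return current_policy_id
        return self.rng.choice(list(self.snapshots))
```

Calling `save` periodically and `sample_opponent` whenever the opponent is swapped gives the learner a rotating mix of recent and older opponents.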
Open the project in Unity, then go to File > Build Settings.
Select WebGL, include Scenes/MainMenu and Scenes/GameEnv, then build to a folder such as Builds-WebGL.
For easiest local testing, set Player Settings > Publishing Settings > Compression Format to Disabled.
After the build finishes, serve the folder locally:
cd Builds-WebGL
python3 -m http.server 8080
Then open:
http://localhost:8080
Play!!
To train headlessly, first make a standalone Unity build. In Unity, use File > Build Settings, choose Windows, Mac, Linux, include MainMenu and GameEnv, then build to a local path such as Builds/RLBuild.app.
Then run ML-Agents against that build:
mlagents-learn Assets/RLAgents/config/selfPlay.yaml --run-id env_sp_new --env Builds/RLBuild.app --num-envs 8 --no-graphics
Training outputs go under Assets/RLAgents/results/. Keep only the final .onnx model and its .meta file in git.
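If you launch several runs from a script, the same command can be assembled programmatically. A minimal sketch, assuming `mlagents-learn` is on your PATH; `build_train_command` is a hypothetical helper, and only the flags already shown above are used.

```python
import shlex


def build_train_command(config, run_id, env_path, num_envs=8, no_graphics=True):
    """Return the mlagents-learn argument list for a headless training run."""
    cmd = [
        "mlagents-learn", config,
        "--run-id", run_id,
        "--env", env_path,
        "--num-envs", str(num_envs),
    ]
    if no_graphics:
        cmd.append("--no-graphics")
    return cmd


cmd = build_train_command(
    "Assets/RLAgents/config/selfPlay.yaml", "env_sp_new", "Builds/RLBuild.app"
)
# Print the command for inspection; pass `cmd` to subprocess.run to launch it.
print(shlex.join(cmd))
```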
- Initial state: both players reset to their spawn positions with zero velocity, alive state, dash available, crouch off, and bullets reset.
- State space: player position, velocity, crouch state, dash state, dash cooldown, shot cooldown, plus ray perception sensors for nearby bullets, walls, ground, and opponent.
- Action space: five discrete controls: move left/right/idle, jump, crouch, dash, and shoot. Shooting is masked while crouching.
- Reward space: death gives -1, winning gives +1, and bullet dodge reward is currently 0. The env is super sparse, which is great for self-play!
- Terminal state: a player dies by bullet, falling, or inactivity. The dead player ends the episode and the surviving player gets the win reward.
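The reward and masking rules above can be sketched in Python as follows. This illustrates the logic only and is not the project's Unity code: `PlayerState`, `shoot_allowed`, and `terminal_rewards` are hypothetical names, and the case where both players die on the same step is an assumption, since the description above does not cover it.

```python
from dataclasses import dataclass

WIN_REWARD = 1.0
DEATH_REWARD = -1.0
DODGE_REWARD = 0.0  # dodge shaping is currently disabled


@dataclass
class PlayerState:
    alive: bool = True
    crouching: bool = False


def shoot_allowed(player):
    """Shooting is masked while the player is crouching."""
    return not player.crouching


def terminal_rewards(a, b):
    """Return (reward_a, reward_b, done) for the current step."""
    if a.alive and b.alive:
        return 0.0, 0.0, False  # sparse: no reward until someone dies
    if not a.alive and not b.alive:
        # Assumption: a simultaneous death penalizes both players.
        return DEATH_REWARD, DEATH_REWARD, True
    if a.alive:
        return WIN_REWARD, DEATH_REWARD, True
    return DEATH_REWARD, WIN_REWARD, True
```

Because the two rewards are opposite in every decisive outcome, the game is effectively zero-sum, which is the setting self-play is designed for.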
Playlist: https://www.youtube.com/playlist?list=PLGXWtN1HUjPdoJwzrCmfVCtOY2GN2kzEb
These were some of my first videos, so the audio is a bit weak.
The shared policy run used PPO with self-play:
- Learning rate: 0.0002 - small enough to keep self-play training stable.
- Batch size: 2048 - large batch for smoother PPO updates.
- Buffer size: 20480 - collects enough experience before each update.
- Time horizon: 256 - lets rewards connect to longer dodge/shoot sequences.
- Self-play window: 10 - keeps a pool of older opponents.
- Swap steps: 5,000 - changes opponents often enough to avoid overfitting.
- Opponent mix: 0.5 latest model ratio - half recent opponent, half older snapshots.
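For reference, here is roughly how these values map onto ML-Agents trainer config keys, written as a Python dict that mirrors the YAML layout. This is a reconstruction from the values listed in this README, not a copy of selfPlay.yaml, which may set more options. The behavior name `BulletShooter` is an assumption inferred from the results path.

```python
# Mirrors the nesting of an ML-Agents trainer config (assumed behavior name).
config = {
    "behaviors": {
        "BulletShooter": {
            "trainer_type": "ppo",
            "hyperparameters": {
                "learning_rate": 0.0002,
                "batch_size": 2048,
                "buffer_size": 20480,
            },
            "network_settings": {"hidden_units": 512, "num_layers": 3},
            "time_horizon": 256,
            "self_play": {
                "window": 10,
                "swap_steps": 5000,
                "play_against_latest_model_ratio": 0.5,
            },
        }
    }
}

hp = config["behaviors"]["BulletShooter"]["hyperparameters"]
# Each PPO update splits the filled buffer into this many minibatches per epoch.
minibatches_per_epoch = hp["buffer_size"] // hp["batch_size"]
print(minibatches_per_epoch)  # 10
```

A buffer ten times the batch size means every update sees ten minibatches of fresh experience per epoch.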
- Move: A/D or left / right
- Jump: space
- Crouch: down / S
- Dash: dash input from the Unity input map
- Shoot: K
This project predates vibe-coding. I made it to learn Unity and build something cool with self-play. There may be superfluous files, old experiments, and dead code.
