Robust Reinforcement Learning for Mixed Autonomy Traffic Systems
PPO, TRPO, SUMO — Multi-Agent POMDP Traffic Control
📋 Project Overview
Designed and tuned PPO and TRPO agents with KL annealing, entropy regularization, and return normalization, stabilizing policy learning in multi-agent partially observable Markov decision process (POMDP) settings.
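As a rough illustration of these stabilization techniques, the sketch below shows one common way to normalize returns with running statistics and to anneal the KL and entropy coefficients over training. It is a minimal sketch, not the project's actual code: the class and function names, the linear schedule, and the penalty form of the objective are all assumptions made for illustration.

```python
import math


class RunningReturnNormalizer:
    """Tracks a running mean/variance (Welford's algorithm) to rescale returns."""

    def __init__(self, eps: float = 1e-8):
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.eps = eps

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)

    @property
    def std(self) -> float:
        # Fall back to 1.0 until there are enough samples for a variance.
        if self.count < 2:
            return 1.0
        return math.sqrt(self.m2 / self.count)

    def normalize(self, value: float) -> float:
        return (value - self.mean) / (self.std + self.eps)


def linear_anneal(start: float, end: float, step: int, total_steps: int) -> float:
    """Linearly interpolate a coefficient from start to end (assumed schedule)."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return start + frac * (end - start)


def penalized_objective(surrogate: float, kl: float, entropy: float,
                        kl_coef: float, entropy_coef: float) -> float:
    """Policy surrogate with a KL penalty and an entropy bonus (to be maximized)."""
    return surrogate - kl_coef * kl + entropy_coef * entropy
```

In a training loop, `update` would be called on each observed return before `normalize`, and `linear_anneal` would supply `kl_coef` and `entropy_coef` for the current step.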
Scaled training by orchestrating 40+ parallel SUMO simulations with Python multiprocessing, which increased training throughput and improved cross-scenario generalization. Compared to rule-based systems, the learned policies increased traffic throughput by 20% while recording zero safety violations, as validated through trajectory and time-space analytics.
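The parallel rollout pattern can be sketched with a `multiprocessing.Pool`, as below. This is only an assumed structure: `run_episode` is a hypothetical stand-in that returns a seeded placeholder value, where a real worker would launch its own SUMO instance, and the function names and throughput range are invented for illustration.

```python
import multiprocessing as mp
import random
from statistics import mean


def run_episode(seed: int) -> dict:
    """Hypothetical stand-in for one simulation rollout.

    A real worker would start and step its own SUMO instance here. This stub
    only draws a seeded placeholder value so the sketch runs anywhere.
    """
    rng = random.Random(seed)
    return {"seed": seed, "throughput": rng.uniform(1000.0, 2000.0)}


def collect_rollouts(seeds, workers: int = 4) -> list:
    """Run one rollout per seed across a pool of worker processes."""
    with mp.Pool(processes=workers) as pool:
        return pool.map(run_episode, seeds)


def mean_throughput(results) -> float:
    """Average the placeholder throughput metric over all rollouts."""
    return mean(r["throughput"] for r in results)


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block on import.
    results = collect_rollouts(range(8), workers=4)
    print(f"{len(results)} rollouts, mean throughput {mean_throughput(results):.1f}")
```

Giving each worker its own seed keeps rollouts reproducible and lets different workers cover different scenarios, which is one plausible way to get the cross-scenario coverage described above.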
⚡ Key Highlights
- PPO & TRPO: KL annealing, entropy regularization, return normalization
- Multi-Agent POMDP: Partially observable traffic control scenarios
- Parallel Training: 40+ SUMO simulations via Python multiprocessing
- 20% Throughput Increase: vs. rule-based systems
- Zero Safety Violations: Validated through trajectory and time-space analytics
- Cross-Scenario Generalization: Improved via scaled training
Skills Demonstrated
PPO
TRPO
Reinforcement Learning
SUMO
Multi-Agent
POMDP
Python
Multiprocessing
More details and images coming soon.