Robust Reinforcement Learning for Mixed Autonomy Traffic Systems

PPO, TRPO, SUMO — Multi-Agent POMDP Traffic Control

📋 Project Overview

Designed and tuned PPO and TRPO agents with KL annealing, entropy regularization, and return normalization, stabilizing policy learning in multi-agent partially observable Markov decision process (POMDP) settings.
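As a rough illustration of the three stabilizers named above, the following sketch shows one common way each can be written. It is not the project's code: every name here (`RunningReturnNormalizer`, `adapt_kl_coef`, `penalized_surrogate`) is hypothetical, and the KL rule is the adaptive-penalty heuristic (double or halve the coefficient around a target KL), which may differ from the annealing schedule actually used.

```python
import numpy as np


class RunningReturnNormalizer:
    """Tracks a running mean/variance of returns and rescales new batches."""

    def __init__(self, eps: float = 1e-8):
        self.mean, self.var, self.count, self.eps = 0.0, 1.0, 0, eps

    def update(self, returns: np.ndarray) -> None:
        # Merge batch statistics into the running statistics
        # (parallel-variance formula), so batches can arrive in any size.
        b_mean, b_var, b_n = returns.mean(), returns.var(), returns.size
        total = self.count + b_n
        delta = b_mean - self.mean
        m2 = self.var * self.count + b_var * b_n + delta**2 * self.count * b_n / total
        self.mean += delta * b_n / total
        self.var = m2 / total
        self.count = total

    def normalize(self, returns: np.ndarray) -> np.ndarray:
        return (returns - self.mean) / np.sqrt(self.var + self.eps)


def adapt_kl_coef(beta: float, observed_kl: float, target_kl: float) -> float:
    """Assumed rule: strengthen the KL penalty when the policy moved too far,
    relax it when the update was overly cautious."""
    if observed_kl > 1.5 * target_kl:
        return beta * 2.0
    if observed_kl < target_kl / 1.5:
        return beta / 2.0
    return beta


def penalized_surrogate(ratio, advantages, kl, entropy, beta, ent_coef) -> float:
    """Objective to maximize: surrogate - beta * KL + ent_coef * entropy."""
    return float(
        np.mean(ratio * advantages) - beta * np.mean(kl) + ent_coef * np.mean(entropy)
    )
```

In a training loop of this shape, the normalizer would be updated with each batch of returns before computing advantages, and `adapt_kl_coef` would be called once per policy update with the measured KL divergence between the old and new policies.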

Scaled training by orchestrating 40+ parallel SUMO simulations with Python multiprocessing, accelerating training throughput and improving cross-scenario generalization. Achieved a 20% increase in traffic throughput over rule-based systems with zero safety violations, validated through trajectory and time-space analytics.
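The parallel rollout pattern can be sketched with the standard library alone. This is a minimal sketch under stated assumptions, not the project's orchestration code: `run_episode` and `collect_rollouts` are hypothetical names, and the episode body is a seeded placeholder standing in for a real SUMO run, so the numbers it returns are not simulation results.

```python
import multiprocessing as mp
import random


def run_episode(seed: int) -> dict:
    """Hypothetical stand-in for one SUMO rollout.

    A real worker would start its own SUMO instance here (for example through
    TraCI), step the simulation with the current policy, and return trajectories.
    """
    rng = random.Random(seed)
    # Placeholder "vehicles per hour" figure; not real simulation output.
    throughput = 1000.0 + rng.uniform(-50.0, 50.0)
    return {"seed": seed, "throughput": throughput}


def collect_rollouts(seeds, num_workers: int = 4) -> list:
    """Run one episode per seed, spread across worker processes."""
    if num_workers <= 1:
        # Serial path, handy for debugging a single worker.
        return [run_episode(s) for s in seeds]
    with mp.Pool(processes=num_workers) as pool:
        # pool.map returns results in the same order as `seeds`.
        return pool.map(run_episode, seeds)
```

A script would call something like `collect_rollouts(range(40), num_workers=8)` from under an `if __name__ == "__main__":` guard, which is required on platforms that spawn rather than fork worker processes. Giving each worker its own seed, and in a real setup its own simulator instance, keeps episodes independent and reproducible.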

⚡ Key Highlights

  • PPO & TRPO: KL annealing, entropy regularization, return normalization
  • Multi-Agent POMDP: Partially observable traffic control scenarios
  • Parallel Training: 40+ SUMO simulations via Python multiprocessing
  • 20% Throughput Increase: vs. rule-based systems
  • Zero Safety Violations: Validated through trajectory and time-space analytics
  • Cross-Scenario Generalization: Improved via scaled training

Skills Demonstrated

PPO, TRPO, Reinforcement Learning, SUMO, Multi-Agent POMDP, Python Multiprocessing

More details and images coming soon.