Reinforcement Learning Algorithm Development via CARLA UE5 Simulator: control implementation using local planner and agent
Özaydin, Ali (2025)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:amk-2025120332296
Abstract
This thesis aimed to develop an autonomous driving agent in the Town10 environment of the CARLA simulator (version 0.10.0) using reinforcement learning. The purpose of the work was to implement and evaluate a learning-based driving controller capable of following a planned route and navigating safely by complying with traffic light states and avoiding collisions.
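For context, the simulation setup described above can be reproduced roughly as follows. This is a minimal sketch using the standard CARLA Python API; the map name ('Town10HD_Opt'), host, port, and sensor choice are assumptions for illustration, not the exact configuration used in the thesis.

```python
# Minimal sketch (assumed values): connect to a running CARLA server,
# load the Town10 map, spawn a vehicle, and attach a collision sensor.
import carla

client = carla.Client('localhost', 2000)   # default host/port (assumed)
client.set_timeout(10.0)
world = client.load_world('Town10HD_Opt')  # map name may differ per release

blueprints = world.get_blueprint_library()
vehicle_bp = blueprints.filter('vehicle.*')[0]        # any available vehicle
spawn_point = world.get_map().get_spawn_points()[0]   # first predefined spawn
vehicle = world.spawn_actor(vehicle_bp, spawn_point)

# A collision sensor's callback can end an episode and feed a
# negative reward term during training.
collision_bp = blueprints.find('sensor.other.collision')
collision_sensor = world.spawn_actor(
    collision_bp, carla.Transform(), attach_to=vehicle)
collision_sensor.listen(lambda event: print('collision:', event.other_actor))
```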
The methodology incorporated reinforcement learning concepts relevant to policy-gradient methods and employed the Proximal Policy Optimization (PPO) algorithm with a multi-input model structure. The implementation relied on curriculum-based training in CARLA, where several training phases were conducted with different reward functions, environment configurations, and step counts. Training data was collected via the CARLA Python API from various sensors attached to the vehicle. For model evaluation, TensorBoard logs and training metrics were used to analyze and compare learning behaviour across experiments.
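A multi-input PPO setup of this kind maps naturally onto Stable-Baselines3's 'MultiInputPolicy' with a Gymnasium Dict observation space. The sketch below is illustrative only: the stand-in environment returns random observations instead of real CARLA sensor data, and the observation shapes, action layout, and TensorBoard log directory are assumptions, not taken from the thesis.

```python
# Minimal sketch, assuming Stable-Baselines3 and Gymnasium.
import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO

class CarlaDriveEnv(gym.Env):
    """Stand-in for a CARLA wrapper: emits random observations only."""

    def __init__(self):
        super().__init__()
        # Multi-input observation: camera image plus a vector of
        # vehicle/route state (shapes are hypothetical).
        self.observation_space = spaces.Dict({
            'camera': spaces.Box(0, 255, shape=(84, 84, 3), dtype=np.uint8),
            'state': spaces.Box(-np.inf, np.inf, shape=(8,), dtype=np.float32),
        })
        # Continuous control: e.g. steering and throttle/brake.
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        return self.observation_space.sample(), {}

    def step(self, action):
        reward = 0.0  # the thesis tunes this across curriculum phases
        return self.observation_space.sample(), reward, False, False, {}

# 'MultiInputPolicy' routes each Dict entry through its own feature
# extractor; tensorboard_log enables the metrics used for evaluation.
model = PPO('MultiInputPolicy', CarlaDriveEnv(),
            tensorboard_log='./ppo_carla_logs/')
model.learn(total_timesteps=10_000)
```

Curriculum-based training, as described above, would then rerun `model.learn` in phases, swapping in different reward functions and environment configurations between phases.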
The results show that the agent's driving behaviour improved as the reward function and training structure were refined. Improvements were visible in episode rewards, but overall driving behaviour remained inconsistent. The trained agent still demonstrated clear limitations: it failed to reliably stop at red lights and occasionally collided with vehicles and roadside objects. These outcomes indicate that while PPO is suitable for continuous-control driving tasks, further refinement of the reward design, sensing strategy, and training setup is required. The results nevertheless provide a foundation for continued development of autonomous driving agents within simulation-based research environments.
