Training quadruped robot controllers in Isaac Sim using reinforcement learning : exploring quadruped robot performance in different simulated surfaces
Jubaer, A S M (2026)
All rights reserved. This publication is copyrighted. You may download, display and print it for your own personal use. Commercial use is prohibited.
The permanent address of the publication is
https://urn.fi/URN:NBN:fi:amk-202605049154
Abstract
This thesis presents a deep reinforcement learning approach that uses the Proximal Policy Optimization (PPO) algorithm to obtain robust quadrupedal locomotion on the Unitree Go2 robot in Isaac Lab simulation. The research examined the learning performance, convergence and generalization ability of the learned policy across four terrains: flat ground, slope, stairs and an obstacle course. The experimental data indicated that learning behavior was similar across terrains when the same PPO approach was used, although stable and predictable convergence was achieved on the flat and slope terrains. Besides providing a basis for smooth and efficient locomotion on easier terrain, the method adapted to changing terrain conditions by adjusting foot placement and balance strategies. Conversely, variance increased significantly and performance degraded on the stair and obstacle course terrains because of their high complexity and non-continuous contact dynamics. Training on simpler terrains required significantly less time than training on more complex terrains, which showed both slower training and higher variability. Progressive training methods such as curriculum learning are therefore highlighted as essential for improving the robustness of the learned policy. Although the results are encouraging and demonstrate that PPO can be applied to develop terrain-adaptive quadrupedal locomotion, some limitations remain: locomotion can become unstable on irregular terrain, and visual perception for balance and navigation was not considered.
