RoboSubSim: Robust Reinforcement Learning Policies Trained Using Sim-to-Real Methods and Aquatic, Randomized, Simple Physics

Control of Autonomous Underwater Vehicles (AUVs) is still mostly done with manually tuned rules such as PID controllers, which have limited predictive and adaptive properties. Recent Reinforcement Learning (RL) research also focuses on robotic settings, where policies can generalize quite well to unknown, shifting environments with variable levels of interference, including varying environment dynamics, differing sensory inputs, differing positional points of interest and external forces from the environment. As ordinary submarine vehicles cannot be remotely instructed at great depths, we argue that this issue could be mitigated by deploying RL policies that complete missions autonomously, without human interference. However, most prior work on RL for AUVs has only been tested in simulation, without much consideration for the physical dynamics of the environments in which these vehicles eventually operate. To this end, we attempt to train Proximal Policy Optimization (PPO) policies on various RL environments constructed within Isaac Labs (formerly Isaac Orbit), combining Domain Randomization (DR), Potential-Based Reward Shaping (PBRS) and several other novel approaches to create different configurations under which the policy can be trained, varying from abstract to more realistic environmental configurations. After training, we observe convergence in simpler simulations in one of the environments, but training becomes more unstable or fails to converge as the complexity of the environments increases and begins to match real scenarios. We provide an extensive discussion of the factors that could explain this. Our work contributes to the expanding field of RL in terms of its application in the real world, which has remained quite limited so far.
We also list several approaches that are interesting for future research; for instance, convergence could be easier if the model 'sticks' to one action for multiple time steps, or if extreme actions are forced through discrete action spaces. We additionally found that many dynamics had to be implemented on our end within Isaac Labs, and we therefore present a variety of dynamics-related features that could be investigated further to see their effect on convergence and to increase the realism of the simulator.
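As a rough illustration of three ideas named in the abstract, the sketch below shows the standard potential-based shaping term (reward plus gamma times the next potential minus the current potential), a held ("sticky") action schedule, and a sign-based mapping to extreme actions. It is not the thesis implementation: the distance-based potential, the discount value, the repeat count, the thrust limit and all function names are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, TypeVar

A = TypeVar("A")


def potential(position: Sequence[float], goal: Sequence[float]) -> float:
    """Hypothetical potential: negative Euclidean distance to the goal."""
    return -math.dist(position, goal)


def shaped_reward(reward: float, phi_s: float, phi_next: float,
                  gamma: float = 0.99) -> float:
    """Potential-based reward shaping: r + gamma * Phi(s') - Phi(s)."""
    return reward + gamma * phi_next - phi_s


def repeat_actions(actions: Sequence[A], repeat: int) -> List[A]:
    """Hold each chosen action for `repeat` consecutive time steps."""
    held: List[A] = []
    for action in actions:
        held.extend([action] * repeat)
    return held


def to_extreme(action: Sequence[float], limit: float = 1.0) -> List[float]:
    """Map each continuous command to -limit, 0 or +limit by its sign.

    This is one assumed reading of 'forcing extreme actions through
    discrete action spaces'; the thesis may define it differently.
    """
    return [limit if a > 0 else -limit if a < 0 else 0.0 for a in action]


if __name__ == "__main__":
    goal = (3.0, 4.0, 0.0)
    phi_now = potential((0.0, 0.0, 0.0), goal)   # 5 units from the goal
    phi_next = potential(goal, goal)             # at the goal
    print(shaped_reward(0.0, phi_now, phi_next, gamma=1.0))
    print(repeat_actions(["forward", "left"], repeat=2))
    print(to_extreme([0.3, -0.2, 0.0]))
```

With a distance-based potential like this one, a step that moves the vehicle closer to the goal receives a positive shaping bonus, while the shaping term leaves the optimal policy of the underlying task unchanged.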

Keywords

Reinforcement Learning; AUVs; Isaac Labs; Proximal Policy Optimization; PBRS; Domain Randomization

URI

https://studenttheses.uu.nl/handle/20.500.12932/50765
