RoboSubSim: Robust Reinforcement Learning Policies Trained Using Sim-to-Real Methods and Aquatic, Randomized, Simple Physics
Document Type
Master Thesis
License
CC-BY-NC-ND
Abstract
In the context of Autonomous Underwater Vehicles (AUVs), control is still
mostly done using manually tuned rules known as PID controllers, which
exhibit limited predictive and adaptive properties. Recent advances in
Reinforcement Learning research also focus on its application in robotic
settings, where it can generalize quite well to unknown, shifting
environments with variable levels of interference, including varying
environment dynamics, differing sensory inputs, differing positional points
of interest and external forces from the environment.
As conventional submarine vehicles cannot be remotely controlled at great
depths, we argue that this issue could be mitigated by deploying
Reinforcement Learning (RL) policies to complete missions autonomously,
without human intervention. However, most prior work on RL for AUVs has only
been tested in simulation, without much consideration for the physical
dynamics of the environments in which these vehicles are eventually deployed.
To this end, we attempt to train Proximal Policy Optimization (PPO) RL
policies on several RL environments constructed within Isaac Lab (formerly
Isaac Orbit), combining Domain Randomization (DR), Potential-Based Reward
Shaping (PBRS) and several other novel approaches to create different
configurations under which these policies can be trained, ranging from
abstract to more realistic environmental configurations. After training, we
observe convergence in the simpler simulations of one of the environments,
but training becomes unstable or fails to converge as the complexity of the
environments increases and begins to match real scenarios. We provide an
extensive discussion of the factors that could explain this behavior.
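As an illustration of the two named techniques, the following is a minimal
sketch and not the implementation used in this thesis. It assumes a
hypothetical potential function (negative Euclidean distance to a goal), an
assumed discount factor, and made-up randomization ranges for water
properties; none of these names or values come from the thesis or from the
simulator's API.

```python
import math
import random
from dataclasses import dataclass

GAMMA = 0.99  # assumed discount factor, chosen only for illustration


def potential(position, goal):
    """Hypothetical potential: negative Euclidean distance to the goal."""
    return -math.dist(position, goal)


def shaped_reward(base_reward, pos, next_pos, goal, gamma=GAMMA):
    """PBRS: add F(s, s') = gamma * Phi(s') - Phi(s) to the base reward."""
    return base_reward + gamma * potential(next_pos, goal) - potential(pos, goal)


@dataclass
class WaterParams:
    """Hypothetical per-episode physics parameters to randomize."""
    density: float           # kg/m^3
    drag_coefficient: float  # dimensionless
    current: tuple           # constant water current in m/s, (x, y, z)


def sample_water_params(rng):
    """Domain Randomization: draw fresh parameters at every episode reset.

    The ranges below are illustrative assumptions, not measured values.
    """
    return WaterParams(
        density=rng.uniform(990.0, 1030.0),
        drag_coefficient=rng.uniform(0.5, 1.5),
        current=tuple(rng.uniform(-0.3, 0.3) for _ in range(3)),
    )
```

In this sketch, moving closer to the goal yields a positive shaping term,
while the randomized parameters would be applied to the simulated vehicle at
the start of each episode so that the policy does not overfit to a single
set of dynamics.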
Our work contributes to the expanding field of RL in terms of its
application within the real world, which has remained quite limited so far.
We also list several approaches that are interesting for future research; for
instance, convergence could be easier if the model 'sticks' to one action for
multiple time steps, or if extreme actions are forced through discrete action
spaces. We additionally found that many dynamics had to be implemented on our
end within Isaac Lab, and we therefore present a variety of dynamics-related
features that could be investigated further to see their effect on
convergence and to increase the realism of the simulator.
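The two future-work ideas mentioned above can be sketched as follows. This
is a hedged illustration only: the environment interface (reset() and
step(action) returning an observation, a reward and a done flag), the class
name ActionRepeat, the function name bang_bang_actions and the default
repeat count are all assumptions made for this example, not part of the
thesis or of any simulator API.

```python
import itertools


def bang_bang_actions(num_thrusters, levels=(-1.0, 0.0, 1.0)):
    """Enumerate a discrete action set of extreme (or zero) thruster commands.

    Each action is a tuple with one command per thruster.
    """
    return list(itertools.product(levels, repeat=num_thrusters))


class ActionRepeat:
    """Wrap an environment so that each chosen action is held for several steps.

    Assumes a hypothetical env with reset() -> obs and
    step(action) -> (obs, reward, done).
    """

    def __init__(self, env, repeat=4):
        self.env = env
        self.repeat = repeat

    def reset(self):
        return self.env.reset()

    def step(self, action):
        total_reward = 0.0
        for _ in range(self.repeat):
            obs, reward, done = self.env.step(action)
            total_reward += reward  # accumulate reward over the held action
            if done:
                break  # stop early when the episode ends
        return obs, total_reward, done
```

With two thrusters and the default levels, the discrete action set contains
nine combinations; the policy would then pick an index into this list, and
the wrapper would hold the selected command for the configured number of
simulator steps.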
Keywords
Reinforcement Learning; AUVs; Isaac Lab; Proximal Policy Optimization; PBRS; Domain Randomization