Social curiosity in deep multi-agent reinforcement learning

In Multi-Agent Reinforcement Learning (MARL), social dilemma environments make cooperation hard to learn. It is even harder for decentralized models, where agents do not share model components. Intrinsic rewards have been only partially explored as a solution to this problem, and training still requires a large number of samples and therefore a long time. To speed up training, we propose combining the two main categories of intrinsic rewards, curiosity and empowerment. We run experiments in the cleanup and harvest social dilemma environments for several types of models, both with and without intrinsic motivation. We find no conclusive evidence that intrinsic motivation significantly alters experiment outcomes when using the PPO algorithm. We also find that PPO is unable to succeed in the harvest environment. However, we show both findings only for the case without hyperparameter tuning.

Keywords

reinforcement learning, multi-agent, multi-agent reinforcement learning, policy gradient, PPO, A3C, actor-critic, social dilemmas, sequential social dilemmas, tragedy of the commons, commons dilemma, intrinsic reward, intrinsic motivation, empowerment, curiosity, social curiosity module

URI

https://studenttheses.uu.nl/handle/20.500.12932/38059
