Reinforcement Learning Coach by Intel

Oct 20, 2017

Today, Intel is announcing the release of our Reinforcement Learning Coach — an open source research framework for training and evaluating reinforcement learning (RL) agents by harnessing the power of multi-core CPU processing to achieve state-of-the-art results. Coach contains multi-threaded implementations of some of today’s leading RL algorithms, combined with various games and robotics environments. It enables efficient training of reinforcement learning agents on a desktop computer, without requiring any additional hardware.

Since the introduction of asynchronous methods for deep reinforcement learning[1] in 2016, many algorithms have been able to achieve better policies faster by running multiple instances in parallel on many CPU cores. So far, these algorithms include A3C[1], DDPG[2], PPO[3], DFP[4] and NAF[5], and we believe that this is only the beginning. Coach includes implementations of these and other state-of-the-art algorithms, and is a good starting point for anyone who wants to use and build on the best techniques available in the field.

How do you use Coach? Start by defining the problem you would like to solve, or select an existing one. Then choose a set of reinforcement learning algorithms and make progress towards solving the problem. Coach enables easy experimentation with existing algorithms and serves as a sandbox that simplifies the development of new ones. The framework defines a set of APIs and key reinforcement learning components that let users easily reuse building blocks and construct new algorithms on top of existing ones.
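
To make this workflow concrete, here is a minimal, framework-agnostic sketch of the agent/environment decomposition that such APIs revolve around. It uses plain OpenAI Gym rather than Coach's own classes, and the RandomAgent, act, and observe names are purely illustrative placeholders; refer to the documentation in the GitHub repository for Coach's actual API.

import gym

class RandomAgent(object):
    """A trivial agent that samples uniformly from the action space."""
    def __init__(self, action_space):
        self.action_space = action_space

    def act(self, observation):
        return self.action_space.sample()

    def observe(self, observation, reward, done):
        pass  # a learning agent would update its policy here

def run_episode(env, agent):
    """Generic interaction loop, reusable with any agent exposing act/observe."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = agent.act(observation)
        observation, reward, done, _ = env.step(action)
        agent.observe(observation, reward, done)
        total_reward += reward
    return total_reward

env = gym.make('CartPole-v0')
print(run_episode(env, RandomAgent(env.action_space)))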

Coach is integrated with some of the top available environments, such as OpenAI Gym*, Roboschool* and ViZDoom*. It also offers various techniques for visualizing the training process and understanding the underlying mechanisms of the agents. All of the algorithms are implemented using Intel-optimized TensorFlow, and some are also available through our neon™ framework.

 

The Agents

Coach contains implementations of many agent types and supports a seamless transition from single-threaded to multi-threaded implementations. The agents are implemented in a modular way, so that building blocks can be reused when constructing new and more complex agents. Moreover, Coach lets you write a new agent with a single worker in mind and then switch to a synchronous or asynchronous multi-worker implementation with a minimal amount of changes.
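
The snippet below is a rough, hypothetical illustration of the asynchronous multi-worker pattern in the spirit of A3C[1] (not Coach's actual implementation): several workers each interact with a private environment instance while updating a shared object under a lock, which is where a real agent would apply gradients to a shared network.

import threading
import gym

shared = {'steps': 0}                    # stands in for shared network parameters
lock = threading.Lock()

def worker(worker_id, num_episodes=3):
    env = gym.make('CartPole-v0')        # each worker owns its own environment copy
    for _ in range(num_episodes):
        env.reset()
        done = False
        while not done:
            _, _, done, _ = env.step(env.action_space.sample())
            with lock:                   # a real agent would apply gradients here
                shared['steps'] += 1

threads = [threading.Thread(target=worker, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print('total environment steps collected:', shared['steps'])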

A variety of agent types introduced in the past few years are implemented in Coach. This allows users to solve environments with different requirements and means of interacting with the agent, such as continuous and discrete action spaces, visual observation spaces, or observation spaces that include only raw measurements.
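
For reference, the short snippet below (plain OpenAI Gym, not Coach-specific code) shows how discrete and continuous action spaces, and raw-measurement observation spaces, appear to an agent:

import gym

discrete_env = gym.make('CartPole-v0')
print(discrete_env.action_space)         # Discrete(2): two discrete actions
print(discrete_env.observation_space)    # Box(4,): raw cart/pole measurements

continuous_env = gym.make('Pendulum-v0')
print(continuous_env.action_space)       # Box(1,): a continuous torque value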

 

The Environments

Coach uses OpenAI Gym as the main tool for interacting with different environments. It also supports external extensions to Gym such as Roboschool, gym-extensions[6] and PyBullet, and its environment wrapper makes it possible to add custom environments and solve a much wider variety of learning problems.
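
The sketch below illustrates the general idea behind such a wrapper: any simulator that exposes a Gym-like reset/step interface can be plugged into the training loop. The class and its toy dynamics are hypothetical and are not Coach's actual environment-wrapper API.

import numpy as np

class MyCustomEnvironment(object):
    """Hypothetical custom environment exposing a Gym-style reset/step contract."""
    def __init__(self):
        self.state = None

    def reset(self):
        self.state = np.zeros(4, dtype=np.float32)
        return self.state

    def step(self, action):
        # toy dynamics: a real environment would advance its simulator here
        self.state = self.state + np.random.randn(4).astype(np.float32) * 0.01
        reward = 1.0
        done = bool(np.abs(self.state).max() > 0.05)
        return self.state, reward, done, {}

env = MyCustomEnvironment()
observation, done = env.reset(), False
while not done:
    observation, reward, done, info = env.step(0)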

Visualizations

As a complementary tool for visualization and debugging, we are releasing a dashboard for the Reinforcement Learning Coach. The dashboard is a graphical user interface that displays different signals from the training process and lets you compare the quality of different runs in an easy and comprehensible fashion. During training, Coach tracks meaningful internal information and stores it so that progress can be visualized both during execution and after completion.

Other debug and visualization methods are also supported through Coach, such as storing GIF animations of the best episodes, displaying action values during game play, and more.

Getting Started

Start by heading over to our GitHub repository and following the instructions for installing Coach on your machine. Using Coach is quite straightforward, and there are a few simple examples in the GitHub repository README for you to try. Comprehensive usage and implementation documentation is also available here.

We have already prepared over 60 predefined presets that exercise different agents with many of the available environments. These presets have been used to train hundreds of agents and were verified to achieve good performance. Using them is not mandatory, however; creating a new preset is as easy as picking an existing agent and an existing environment and giving them a try. As a simple starting point, try running the following command:

python3 coach.py -r -p CartPole_DQN

What’s Next?

We are planning to add more algorithms and environments in future releases, and we encourage you to contribute by submitting pull requests, suggestions, and comments on GitHub.

 

References
  1. Asynchronous Methods for Deep Reinforcement Learning
    Volodymyr Mnih, Adrià Puigdomènech Badia, Mehdi Mirza, Alex Graves, Timothy P. Lillicrap, Tim Harley, David Silver, and Koray Kavukcuoglu. CoRR, abs/1602.01783, 2016.
  2. Data-efficient Deep Reinforcement Learning for Dexterous Manipulation
    Ivaylo Popov, Nicolas Heess, Timothy P. Lillicrap, Roland Hafner, Gabriel Barth-Maron, Matej Vecerik, Thomas Lampe, Yuval Tassa, Tom Erez, and Martin A. Riedmiller. CoRR, abs/1704.03073, 2017.
  3. Emergence of Locomotion Behaviours in Rich Environments
    Nicolas Heess, Dhruva TB, Srinivasan Sriram, Jay Lemmon, Josh Merel, Greg Wayne, Yuval Tassa, Tom Erez, Ziyu Wang, S. M. Ali Eslami, Martin A. Riedmiller, and David Silver. CoRR, abs/1707.02286, 2017.
  4. Learning to Act by Predicting the Future
    Alexey Dosovitskiy and Vladlen Koltun. CoRR, abs/1611.01779, 2016.
  5. Deep Reinforcement Learning for Robotic Manipulation
    Shixiang Gu, Ethan Holly, Timothy P. Lillicrap, and Sergey Levine. CoRR, abs/1610.00633, 2016.
  6. Benchmark environments for multitask learning in continuous domains
    Peter Henderson, Wei-Di Chang, Florian Shkurti, Johanna Hansen, David Meger, and Gregory Dudek. CoRR, abs/1708.04352, 2017.