Blog Post (Part III): Deep Reinforcement Learning with OpenAI Gym

Apr 27, 2016

JD Co-Reyes

Nervana Systems Intern

This is part 3 of a blog series on deep reinforcement learning. See “Part 1: Demystifying Deep Reinforcement Learning” for an introduction to the topic and “Part 2: Deep Reinforcement Learning with Neon” for the original Simple-DQN implementation.

In this blog post, we will extend Simple-DQN to work with OpenAI Gym, a new toolkit for developing and comparing reinforcement learning algorithms (read more about the release on their blog). We will cover how to train and test an agent in the new environment using Neon.

Update: Code has been updated and is now at https://github.com/tambetm/simple_dqn.

GymEnvironment

Figure 1. Agent-Environment Loop (OpenAI Gym integration)

OpenAI Gym provides a simple interface for interacting with the environment. Given the current observation and reward, the agent chooses an action; performing that action on the environment yields the next observation and reward.

observation, reward, done, info = environment.step(action)
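
As a quick illustration of this loop, here is a minimal sketch that plays one episode with a random policy standing in for a trained agent:

import gym

env = gym.make("Breakout-v0")        # create an Atari environment by id
observation = env.reset()            # start a new episode
done = False
total_reward = 0
while not done:
    action = env.action_space.sample()                   # random action in place of the agent
    observation, reward, done, info = env.step(action)   # advance the game one step
    total_reward += reward
print(total_reward)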


In our case, the environment is an Atari game, the observation is a game screen, and the reward is the score obtained from that action. Since OpenAI Gym uses a different interface (atari_py) to the Arcade Learning Environment (ALE), we create a wrapper class, GymEnvironment, around the OpenAI Gym environment so that it works with the Simple-DQN training code. Previously, Simple-DQN retrieved the screen and terminal state directly from the ALE environment after performing an action, whereas the OpenAI Gym environment returns this data each time the agent acts. We therefore store these values as fields in the wrapper and return them when requested. Creating an environment also differs slightly: we specify the game with an environment id such as “Breakout-v0” instead of loading a ROM file directly.

class GymEnvironment(Environment):
    def __init__(self, env_id, args):
        import gym
        self.gym = gym.make(env_id)   # e.g. "Breakout-v0"
        self.obs = None               # last observation returned by step()
        self.terminal = None          # last terminal flag returned by step()

    def numActions(self):
        return self.gym.action_space.n

    def restart(self):
        self.gym.reset()
        self.obs = None
        self.terminal = None

    def act(self, action):
        self.obs, reward, self.terminal, _ = self.gym.step(action)
        return reward

    def getScreen(self):
        assert self.obs is not None
        return self.obs

    def isTerminal(self):
        assert self.terminal is not None
        return self.terminal
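
As a rough sketch of how these methods fit together (the actual training loop is driven by Simple-DQN's agent code, so this is only illustrative; args is unused by the wrapper, so None is passed here):

env = GymEnvironment("Breakout-v0", None)
env.restart()
for _ in range(100):
    reward = env.act(env.gym.action_space.sample())   # random action for illustration
    screen = env.getScreen()                          # raw RGB frame from OpenAI Gym
    if env.isTerminal():
        env.restart()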

Training

To train with OpenAI Gym instead of ALE, we just specify the environment (OpenAI Gym or ALE) and the game. OpenAI Gym returns the full RGB screen (210, 160, 3), which we then convert to grayscale and resize to (84, 84).

./train.sh Breakout-v0 --environment gym

This will train a model using the OpenAI Gym environment and save model snapshots every epoch.
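
The grayscale conversion and resizing mentioned above happen inside the training code; as a hedged sketch of that step (Simple-DQN's actual preprocessing code may differ slightly), the idea with OpenCV is:

import cv2

def preprocess(rgb_screen):
    # Convert the (210, 160, 3) RGB frame to grayscale and shrink it to 84x84.
    gray = cv2.cvtColor(rgb_screen, cv2.COLOR_RGB2GRAY)
    return cv2.resize(gray, (84, 84))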

Testing

To test a trained model on OpenAI Gym, we will first create a GymAgent that

  • Stores the last four screen observations in memory
  • Given the last four screen observations, uses the trained model to choose the action with the highest Q-value

import random
import numpy as np

class GymAgent():
    def __init__(self, env, net, memory, args):
        self.env = env        # OpenAI Gym environment
        self.net = net        # trained Simple-DQN network
        self.memory = memory  # buffer of the last history_length screens
        self.history_length = args.history_length
        self.exploration_rate_test = args.exploration_rate_test

    def add(self, observation):
        # Shift the history buffer and append the newest screen.
        self.memory[0, :-1] = self.memory[0, 1:]
        self.memory[0, -1] = np.array(observation)

    def get_action(self, t, observation):
        self.add(observation)
        # Act randomly until the history is full, and occasionally during testing.
        if t < self.history_length or random.random() < self.exploration_rate_test:
            action = self.env.action_space.sample()
        else:
            qvalues = self.net.predict(self.memory)
            action = np.argmax(qvalues[0])
        return action

Then we can simply instantiate the agent with the environment and saved model, and call get_action during the test loop described here to find the best action to play at each time step.

agent = GymAgent(env, net, memory, args)
env.monitor.start(args.output_folder, force=True)
num_episodes = 10
for i_episode in xrange(num_episodes):
    observation = env.reset()
    for t in xrange(10000):
        action = agent.get_action(t, observation)
        observation, reward, done, info = env.step(action)
        if done:
            break
env.monitor.close()

The full testing code is in this script, which can be run with

python src/test_gym.py Breakout-v0 <output_folder> --load_weights <saved_model_pkl>

This will log the testing results and record videos to the specified output_folder, which we can then upload to OpenAI Gym for evaluation. It is also recommended to upload a gist describing how to reproduce your results.
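
As a minimal sketch of the upload step (assuming an OpenAI Gym account and API key; <output_folder> is the same directory passed to the test script):

import gym

gym.upload("<output_folder>", api_key="YOUR_API_KEY")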

Evaluation Results on OpenAI Gym

Figure 2. Evaluation Results on OpenAI Gym

An example video of an agent playing several episodes:

Using Nervana Cloud

To train a model on Nervana Cloud, first install and configure ncloud, a command-line client for using and managing Nervana's deep learning cloud.

Assuming the necessary dependencies are installed, we can run training with:

ncloud train src/main.py --args "Breakout-v0 --environment gym" --custom_code_url https://github.com/tambetm/simple_dqn

and testing with:

ncloud train src/test_gym.py --args "Breakout-v0 --load_weights <saved_model_pkl>" --custom_code_url https://github.com/tambetm/simple_dqn

To find out more about Nervana Cloud, visit Nervana's Products page.

Conclusion

OpenAI Gym provides a nice toolkit for training and testing reinforcement learning algorithms. Extending Simple-DQN to work with OpenAI Gym was relatively straightforward, and hopefully others can easily extend this work to develop better learning algorithms.
