|
| 1 | +{ |
| 2 | + "cells": [ |
| 3 | + { |
| 4 | + "cell_type": "markdown", |
| 5 | + "metadata": {}, |
| 6 | + "source": [ |
| 7 | + "## Using OpenAI Gym on Skills Network Labs" |
| 8 | + ] |
| 9 | + }, |
| 10 | + { |
| 11 | + "cell_type": "markdown", |
| 12 | + "metadata": {}, |
| 13 | + "source": [ |
| 14 | + "Skills Network Labs executes your code remotely. If you were running OpenAI Gym on your own local computer, you could use your own display. Since we're _not_ running locally, we have to use a virtual display:" |
| 15 | + ] |
| 16 | + }, |
| 17 | + { |
| 18 | + "cell_type": "code", |
| 19 | + "execution_count": null, |
| 20 | + "metadata": {}, |
| 21 | + "outputs": [], |
| 22 | + "source": [ |
| 23 | + "!pip install pyvirtualdisplay\n", |
| 24 | + "from pyvirtualdisplay import Display\n", |
| 25 | + "display = Display(visible=0, size=(1400, 900))\n", |
| 26 | + "display.start()" |
| 27 | + ] |
| 28 | + }, |
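| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "If you want to confirm the virtual display is active, one quick optional check (an addition, not part of the original tutorial) is the `DISPLAY` environment variable, which `pyvirtualdisplay` sets when the display starts:" |
| | + ] |
| | + }, |
| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "import os\n", |
| | + "# pyvirtualdisplay points DISPLAY at the virtual X server, so this\n", |
| | + "# should print something like ':1001' rather than None\n", |
| | + "print(os.environ.get('DISPLAY'))" |
| | + ] |
| | + }, |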
| 29 | + { |
| 30 | + "cell_type": "markdown", |
| 31 | + "metadata": {}, |
| 32 | + "source": [ |
| 33 | + "And of course we'll need `gym`:" |
| 34 | + ] |
| 35 | + }, |
| 36 | + { |
| 37 | + "cell_type": "code", |
| 38 | + "execution_count": null, |
| 39 | + "metadata": {}, |
| 40 | + "outputs": [], |
| 41 | + "source": [ |
| 42 | + "!pip install gym\n", |
| 43 | + "import gym" |
| 44 | + ] |
| 45 | + }, |
| 46 | + { |
| 47 | + "cell_type": "markdown", |
| 48 | + "metadata": {}, |
| 49 | + "source": [ |
| 50 | + "We'll use the classic arcade game _Space Invaders_ for this demo:" |
| 51 | + ] |
| 52 | + }, |
| 53 | + { |
| 54 | + "cell_type": "code", |
| 55 | + "execution_count": null, |
| 56 | + "metadata": {}, |
| 57 | + "outputs": [], |
| 58 | + "source": [ |
| 59 | + "!pip install 'gym[atari]'\n", |
| 60 | + "env = gym.make('SpaceInvaders-v0')" |
| 61 | + ] |
| 62 | + }, |
| 63 | + { |
| 64 | + "cell_type": "markdown", |
| 65 | + "metadata": {}, |
| 66 | + "source": [ |
| 67 | + "Here we use gym's `Monitor` wrapper to modify our environment so that recordings of each run are stored in a `gym-tutorial-results` folder:" |
| 68 | + ] |
| 69 | + }, |
| 70 | + { |
| 71 | + "cell_type": "code", |
| 72 | + "execution_count": null, |
| 73 | + "metadata": {}, |
| 74 | + "outputs": [], |
| 75 | + "source": [ |
| 76 | + "from gym import wrappers\n", |
| 77 | + "env = wrappers.Monitor(env, \"/resources/gym-tutorial-results\", force=True)\n", |
| 78 | + "# force=True means we'll overwrite past results,\n", |
| 79 | + "# only keeping one result at a time" |
| 80 | + ] |
| 81 | + }, |
| 82 | + { |
| 83 | + "cell_type": "code", |
| 84 | + "execution_count": null, |
| 85 | + "metadata": {}, |
| 86 | + "outputs": [], |
| 87 | + "source": [ |
| 89 | + "env.reset()\n", |
| 90 | + "for t in range(1000):\n", |
| 91 | + " env.render()\n", |
| 92 | + " action = env.action_space.sample() # take a random action\n", |
| 93 | + " observation, reward, done, info = env.step(action)\n", |
| 94 | + " if done:\n", |
| 95 | + " print(\"Episode finished after {} timesteps\".format(t+1))\n", |
| 96 | + " break\n", |
| 97 | + "env.close()" |
| 98 | + ] |
| 99 | + }, |
| 100 | + { |
| 101 | + "cell_type": "markdown", |
| 102 | + "metadata": {}, |
| 103 | + "source": [ |
| 104 | + "Our result is stored as an `.mp4` file in the `gym-tutorial-results` folder we specified. You can display the video using the `display_result_video()` function provided below:" |
| 105 | + ] |
| 106 | + }, |
| 107 | + { |
| 108 | + "cell_type": "code", |
| 109 | + "execution_count": null, |
| 110 | + "metadata": {}, |
| 111 | + "outputs": [], |
| 112 | + "source": [ |
| 113 | + "from IPython.display import Video\n", |
| 114 | + "\n", |
| 115 | + "def display_result_video():\n", |
| 116 | + " filename = '/resources/gym-tutorial-results/openaigym.video.%s.video000000.mp4' % env.file_infix\n", |
| 117 | + " return Video(filename, width=600, embed=True)\n", |
| 118 | + "\n", |
| 119 | + "display_result_video()" |
| 120 | + ] |
| 121 | + }, |
| 122 | + { |
| 123 | + "cell_type": "markdown", |
| 124 | + "metadata": {}, |
| 125 | + "source": [ |
| 126 | + "Taking random actions probably didn't go so well. You can view the available actions by running:" |
| 127 | + ] |
| 128 | + }, |
| 129 | + { |
| 130 | + "cell_type": "code", |
| 131 | + "execution_count": null, |
| 132 | + "metadata": {}, |
| 133 | + "outputs": [], |
| 134 | + "source": [ |
| 135 | + "env.unwrapped.get_action_meanings()" |
| 136 | + ] |
| 137 | + }, |
| 138 | + { |
| 139 | + "cell_type": "markdown", |
| 140 | + "metadata": {}, |
| 141 | + "source": [ |
| 142 | + "And then we can take those actions by using `env.step()`, e.g.:\n", |
| 143 | + "\n", |
| 144 | + "- `env.step(0)` -> noop (no operation; do nothing)\n", |
| 145 | + "- `env.step(1)` -> fire\n", |
| 146 | + "- `env.step(2)` -> go right\n", |
| 147 | + "- `env.step(3)` -> go left\n", |
| 148 | + "\n", |
| 149 | + "and so on. A quick single-step sanity check follows below." |
| 153 | + ] |
| 154 | + }, |
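| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "As that sanity check (an addition, not part of the original tutorial), you can inspect what a single `env.step()` call returns: a 4-tuple of observation, reward, done flag, and diagnostic info. We use a fresh, unwrapped environment here, with `probe_env` as a purely illustrative name, so we don't disturb the `Monitor` recording:" |
| | + ] |
| | + }, |
| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "# A minimal sketch: see what one step returns.\n", |
| | + "# probe_env is a throwaway env, separate from the monitored one above.\n", |
| | + "probe_env = gym.make('SpaceInvaders-v0')\n", |
| | + "probe_env.reset()\n", |
| | + "observation, reward, done, info = probe_env.step(1)  # action 1 = fire\n", |
| | + "print(observation.shape)  # raw screen pixels, e.g. (210, 160, 3)\n", |
| | + "print(reward, done, info)\n", |
| | + "probe_env.close()" |
| | + ] |
| | + }, |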
| 155 | + { |
| 156 | + "cell_type": "markdown", |
| 157 | + "metadata": {}, |
| 158 | + "source": [ |
| 159 | + "They say the best defense is a good offense. Let's test that theory by just standing still and shooting for the whole game:" |
| 160 | + ] |
| 161 | + }, |
| 162 | + { |
| 163 | + "cell_type": "code", |
| 164 | + "execution_count": null, |
| 165 | + "metadata": {}, |
| 166 | + "outputs": [], |
| 167 | + "source": [ |
| 168 | + "env = gym.make('SpaceInvaders-v0')  # recreate the env, since we closed it above\n", |
| | + "env = wrappers.Monitor(env, \"/resources/gym-tutorial-results\", force=True)\n", |
| 169 | + "env.reset()\n", |
| 170 | + "for t in range(1000):\n", |
| 171 | + " env.render()\n", |
| 172 | + " action = 1 # always fire!\n", |
| 173 | + " observation, reward, done, info = env.step(action)\n", |
| 174 | + " if done:\n", |
| 175 | + " print(\"Episode finished after {} timesteps\".format(t+1))\n", |
| 176 | + " break\n", |
| 177 | + "env.close()" |
| 178 | + ] |
| 179 | + }, |
| 180 | + { |
| 181 | + "cell_type": "code", |
| 182 | + "execution_count": null, |
| 183 | + "metadata": {}, |
| 184 | + "outputs": [], |
| 185 | + "source": [ |
| 186 | + "display_result_video()" |
| 187 | + ] |
| 188 | + }, |
| 189 | + { |
| 190 | + "cell_type": "markdown", |
| 191 | + "metadata": {}, |
| 192 | + "source": [ |
| 193 | + "How did it go? Can you do better? One simple starting point is sketched below.\n", |
| 194 | + "\n", |
| 195 | + "**Tip:** The episode finishes automatically when you run out of lives or when the loop reaches its 1000-timestep limit, so increase the range in the loop if your algorithm survives longer than that." |
| 196 | + ] |
| 197 | + }, |
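| | + { |
| | + "cell_type": "markdown", |
| | + "metadata": {}, |
| | + "source": [ |
| | + "For example, here is one simple hand-rolled policy you could try (a sketch added to this tutorial, not part of the original): fire on every other timestep while sweeping right and left. `sweep_policy` is just an illustrative name, not a gym API:" |
| | + ] |
| | + }, |
| | + { |
| | + "cell_type": "code", |
| | + "execution_count": null, |
| | + "metadata": {}, |
| | + "outputs": [], |
| | + "source": [ |
| | + "# A minimal sketch of a hand-rolled policy, reusing gym and wrappers\n", |
| | + "# from the cells above. sweep_policy is a hypothetical helper.\n", |
| | + "env = gym.make('SpaceInvaders-v0')\n", |
| | + "env = wrappers.Monitor(env, \"/resources/gym-tutorial-results\", force=True)\n", |
| | + "\n", |
| | + "def sweep_policy(t):\n", |
| | + "    # Fire on even timesteps; on odd ones, sweep right for 100 steps, then left\n", |
| | + "    if t % 2 == 0:\n", |
| | + "        return 1  # fire\n", |
| | + "    return 2 if (t // 100) % 2 == 0 else 3  # go right, then go left\n", |
| | + "\n", |
| | + "env.reset()\n", |
| | + "for t in range(2000):  # a larger budget, per the tip above\n", |
| | + "    env.render()\n", |
| | + "    observation, reward, done, info = env.step(sweep_policy(t))\n", |
| | + "    if done:\n", |
| | + "        print(\"Episode finished after {} timesteps\".format(t+1))\n", |
| | + "        break\n", |
| | + "env.close()\n", |
| | + "display_result_video()" |
| | + ] |
| | + } |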
| 198 | + ], |
| 199 | + "metadata": { |
| 200 | + "kernelspec": { |
| 201 | + "display_name": "Python", |
| 202 | + "language": "python", |
| 203 | + "name": "conda-env-python-py" |
| 204 | + }, |
| 205 | + "language_info": { |
| 206 | + "codemirror_mode": { |
| 207 | + "name": "ipython", |
| 208 | + "version": 3 |
| 209 | + }, |
| 210 | + "file_extension": ".py", |
| 211 | + "mimetype": "text/x-python", |
| 212 | + "name": "python", |
| 213 | + "nbconvert_exporter": "python", |
| 214 | + "pygments_lexer": "ipython3", |
| 215 | + "version": "3.6.7" |
| 216 | + } |
| 217 | + }, |
| 218 | + "nbformat": 4, |
| 219 | + "nbformat_minor": 4 |
| 220 | +} |