Commit c5a9631

Added tutorial with assets
1 parent 4f26743 commit c5a9631

8 files changed: +4735 -2 lines

Dockerfile

+38
@@ -0,0 +1,38 @@
FROM nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04

RUN apt-get update && \
    apt-get install -y \
    build-essential \
    cmake \
    git \
    wget \
    unzip \
    yasm \
    pkg-config \
    curl

# Install Python 3
RUN apt-get install -y \
    python3-dev \
    python3-numpy \
    python3-pip

# Install OpenCV
RUN apt-get install -y \
    libopencv-dev \
    python-opencv

# Cleanup
RUN rm -rf /var/lib/apt/lists/*

# Install Python dependencies
RUN pip3 --no-cache-dir install \
    opencv-python \
    Pillow \
    pyyaml \
    tqdm

# Install project specific dependencies
RUN pip3 --no-cache-dir install \
    tensorflow-gpu==1.8.0 \
    git+git://github.com/JiahuiYu/neuralgym.git@88292adb524186693a32404c0cfdc790426ea441

README.md

+259 -2
@@ -1,2 +1,259 @@
- # Dockertainment
- The Docker Tutorial You've Been Wanting!
# Run the ML Model Zoo in Docker – A Friendly Tutorial for Artists and Designers

By [@b-g](https://github.com/b-g) Benedikt Groß

We find ourselves in an exciting technological moment. In the last few years, it seems magic has started to happen in Artificial Intelligence “AI”. After a long AI winter, machine learning “ML” methods and techniques have started to work.
If you follow ML news outlets like [Two Minute Papers](https://www.youtube.com/channel/UCbfYPyITQ-7l4upoX8nvctg) (ML papers nicely explained in 2-minute videos) or the recent development of the [Runway ML App](https://runwayml.com/) (think of a "ready-made" ML effects app), one interesting ML model after another seems to pop up. And often these ML models come with example code on GitHub! Yay!
But the excitement often fades quickly: even though the example code or the demo of the ML model doesn't look crazy complicated ... it can quickly become pure hell to actually get it running on your computer. Most often simply because of ultra-specific software and GPU driver dependencies, which probably don't go well with what you already have installed on your machine. It is often a total mess :(
This tutorial tries to help by showing you an approach to handle the ML software dependency complexity better! We are going to use a [Docker](https://en.wikipedia.org/wiki/Docker_(software)) container (think of an entire operating system nicely bundled in a container) which has access to the GPU and the files of the host machine running it. So instead of installing everything directly in the operating system of your local machine, we create a layer of abstraction for every model. You could also say every ML model gets its own (virtual) operating system, as if every ML model had its own dedicated computer.
Let's assume you stumbled upon the "DeepFill" paper [Generative Image Inpainting with Contextual Attention](https://arxiv.org/abs/1801.07892) and the corresponding GitHub repository [JiahuiYu/generative_inpainting](https://github.com/JiahuiYu/generative_inpainting). Here is a quick illustration of what DeepFill does:
![deepfill-illustration](assets/deepfill-illustration.png)
Fancy! You provide a mask and DeepFill hallucinates content that matches the surrounding context.
The following sections are step-by-step instructions for getting DeepFill running in a Docker container. The process should be fairly similar for other models and hence can be seen as a general approach to encapsulate the setup complexity that comes with state-of-the-art ML models.
My hope is furthermore that dedicated Docker containers will make ML models a lot more shareable and accessible to a wider audience in art & design, to facilitate the much-needed debate about the wider implications of AI/ML.
## 0. TOC

[TOC]
## 1. Prerequisites 🐧

You will need the following hardware and software setup to be able to run Docker with GPU support:

- An Ubuntu computer/server with an Nvidia CUDA GPU
- Docker with API version >= 1.40 (i.e. Docker 19.03 or newer)
- Nvidia drivers with version >= 361
## 2. Install Party: CUDA, Docker and nvidia-container-toolkit 💻

### Install Docker

Follow the official documentation: [https://docs.docker.com/install/linux/docker-ce/ubuntu/](https://docs.docker.com/install/linux/docker-ce/ubuntu/)

Verify the Docker version:

```bash
docker version
```

The output should be a long list of info, including a line like "API version: 1.40".
### Install the Nvidia CUDA driver

Install CUDA along with the latest Nvidia driver for your graphics card:

- Go to: https://developer.nvidia.com/cuda-downloads
- Select Linux > x86_64 > Ubuntu
- Select your Ubuntu version
- Select the installer type (we tested with deb local and deb network)
- Follow the instructions
- After the install, reboot your machine
- Test whether the Nvidia driver is installed with: `nvidia-smi`
Verify the Nvidia driver version:

```bash
nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 1050    Off  | 00000000:02:00.0 Off |                  N/A |
| N/A   40C    P0     8W /  N/A |      0MiB /  2002MiB |      1%      Default |
+-------------------------------+----------------------+----------------------+
```
### Install nvidia-container-toolkit

- Follow the official [quickstart documentation](https://github.com/NVIDIA/nvidia-docker#quickstart), e.g. for Ubuntu 16.04/18.04:

```bash
# Add the package repositories
$ distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
$ curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
$ curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list

# Install the toolkit and restart Docker
$ sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker
```
- Verify the installation by running a dummy docker image:

```bash
$ sudo docker run --gpus all nvidia/cuda:10.0-base nvidia-smi

# Should output something like
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87       Driver Version: 418.87       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
```
Yay 🎉🎉🎉 ! You can now go on to finally run ML models in Docker on your machine! The good news is that you only have to do the horrible installation part once.

_⚠️ Currently (December 2019) the nvidia-docker project seems to be in an odd transition phase of supporting two slightly different ways of leveraging Nvidia GPUs in docker containers. At the moment best practice seems to be to install the nvidia-container-toolkit and, if needed, the deprecated nvidia-docker2 as well. You can install both without running into conflicts, and the install order doesn't matter either._
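The two ways show up as two different flags on `docker run`; which one works depends on which of the two packages you have installed, and both flags appear in this tutorial. A quick side-by-side, reusing the `nvidia/cuda:10.0-base` image from the verification step above:

```bash
# New style, provided by the nvidia-container-toolkit (Docker 19.03+):
sudo docker run --gpus all nvidia/cuda:10.0-base nvidia-smi

# Old style, provided by the deprecated nvidia-docker2 runtime:
sudo docker run --runtime=nvidia nvidia/cuda:10.0-base nvidia-smi
```

Both should print the same `nvidia-smi` table as above.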
## 3. Example: Getting DeepFill running in Docker 📦🏃

So let's have a look at the DeepFill GitHub repository [JiahuiYu/generative_inpainting](https://github.com/JiahuiYu/generative_inpainting). The README.md is nice in the sense that it explains with a few images what DeepFill does, has references and even sections on requirements and how to run the demo. But you still won't be able to run it out of the box. Like other ML models, DeepFill relies on very specific software dependencies. And as ML researchers are busy with their research, documenting software setups for a wider audience currently doesn't seem to be a priority in those circles. The dream situation would be that there is already a `Dockerfile` (e.g. [Detectron2](https://github.com/facebookresearch/detectron2) is a notable exception) or at least a `requirements.txt` (used in Python to define dependencies).
### Requirements spotting

We have to create the Docker container on our own. This is where we have to start playing detective! :)

Stroll around in the repository and try to find clues about what we are going to need. I found the following:

- there is a badge saying TensorFlow v1.7.0
- under requirements the author states: python3, tensorflow, neuralgym
- OpenCV (cv2) is mentioned
- and we have to download the pretrained models and copy them to the folder `model_logs/`
### Fork the DeepFill repository

Fork the repository to your own GitHub account by pressing the "Fork" button at JiahuiYu/generative_inpainting. The result should be github.com/{your-username}/generative_inpainting.

![fork_button](assets/fork_button.jpg)

Check out your fork of generative_inpainting to your own computer, e.g. as sketched below.
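A minimal sketch of this checkout step, assuming you clone via HTTPS and replace `{your-username}` with your own GitHub username:

```bash
# Clone your fork and change into the repository folder
git clone https://github.com/{your-username}/generative_inpainting.git
cd generative_inpainting
```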
### Create a Dockerfile

Now we will create a Docker container which reflects all the requirements we have spotted. Add an empty file named `Dockerfile` to your DeepFill repository.
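For example, from inside the checked-out repository folder:

```bash
# Create the (for now empty) Dockerfile at the root of the repository
touch Dockerfile
```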
We will base everything on an official Docker image from Nvidia. That way we get a clean Ubuntu with correctly installed CUDA / cuDNN / GPU drivers. The `Dockerfile` syntax is quite understandable:

- **FROM**: the new container should be based on "cuda" with the tag "9.0-cudnn7-runtime-ubuntu16.04", published by "nvidia"
- **RUN**: run a single terminal command (or many of them) to install something
- **apt-get**: the command line package manager of Ubuntu. Think of an app store for command line apps
- **pip3**: the command line package manager of Python 3
Here is how the `Dockerfile` looks:

```dockerfile
FROM nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04

# Install a few basic command line tools e.g. zip, git, cmake
RUN apt-get update && \
    apt-get install -y \
    build-essential \
    cmake \
    git \
    wget \
    unzip \
    yasm \
    pkg-config \
    curl

# Install Python 3, numpy and pip3
RUN apt-get install -y \
    python3-dev \
    python3-numpy \
    python3-pip

# Install OpenCV
RUN apt-get install -y \
    libopencv-dev \
    python-opencv

# Cleanup apt-get installs
RUN rm -rf /var/lib/apt/lists/*

# Install Python dependencies
RUN pip3 --no-cache-dir install \
    opencv-python \
    Pillow \
    pyyaml \
    tqdm

# Install project specific dependencies
RUN pip3 --no-cache-dir install \
    tensorflow-gpu==1.8.0 \
    git+git://github.com/JiahuiYu/neuralgym.git@88292adb524186693a32404c0cfdc790426ea441
```
You can also think of the `Dockerfile` as a very long list of installation instructions. There are additional Docker keywords for configuration settings, e.g. which network ports should be available etc.

We will talk later about strategies for finding which versions / packages / cuda ... match your ML model and go well together.
### Build the DeepFill container

```bash
docker build -t deepfill:v0 .
```

**deepfill** is the name of our image and **v0** is our version tag.

During the build Docker prints out what is going on. Keep your eyes open for red lines (errors). If everything goes well, you can run the container now.
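Before running it, you can optionally double-check that the build actually produced an image (a quick sanity check, not from the original instructions):

```bash
# Should list the deepfill repository with the v0 tag
docker images deepfill
```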
### Run the DeepFill container

```bash
docker run -it --runtime=nvidia --volume $(pwd)/:/shared --workdir /shared deepfill:v0 bash
```

- **-it** and **bash** run the container interactively, so the container provides a bash terminal prompt

- **--runtime=nvidia** gives the container access to the GPU

- **--volume** mounts the current folder (the DeepFill repo) to the folder /shared in the docker container's filesystem. The folder is shared between the host and the docker container

- **deepfill:v0** runs a docker container from the deepfill image with the version tag v0 (a few sanity checks for the running container are sketched below)
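Once the prompt of the running container appears, a few quick checks (a sketch, assuming the container was started as above) confirm that the GPU, the shared folder and TensorFlow are all in place:

```bash
# Run these inside the container:
nvidia-smi        # the GPU should be listed
ls /shared        # should show the files of the DeepFill repo
python3 -c "import tensorflow as tf; print(tf.__version__)"   # should print 1.8.0
```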
### Download pretrained DeepFill models

Download the [pretrained models](https://github.com/JiahuiYu/generative_inpainting#pretrained-models), e.g. Places2 (places backgrounds) or CelebA-HQ (faces), and copy them to the folder `model_logs/`. The demo relies on them.
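For the Places2 model used in the demo below, the checkpoint files are expected under `model_logs/release_places2_256`. A sketch of the copy step, assuming (hypothetically) that the downloaded checkpoint folder landed in `~/Downloads`:

```bash
# Adjust the source path to wherever your download actually landed
mkdir -p model_logs
cp -r ~/Downloads/release_places2_256 model_logs/
ls model_logs/release_places2_256   # should list the checkpoint files
```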
### Run the DeepFill demo in the container

Copy these two images to the DeepFill repo folder:

| input.png | mask.png |
| ------------------- | ----------------- |
| ![input](input.png) | ![mask](mask.png) |
Paste the command below into the terminal of our running DeepFill container:

```bash
python3 test.py --image input.png --mask mask.png --output output.png --checkpoint_dir model_logs/release_places2_256
```
The terminal will print a lot of debugging info ... don't worry about it. After a few seconds you should get this result:

| output.png |
| ----------------------------------- |
| ![0001](assets/deepfill-output.png) |

Yay 🎉🎉🎉 !
## 4. Strategies for finding the requirements 🤯

To be honest, it can take quite a while to figure out the requirements. Yes, it is a mess. These strategies helped me make figuring it out a bit less painful:

- Pin the dependencies as specifically as possible, as there is a steady stream of updates which don't always go well together. You want to be able to run your container a few weeks later:
  - Bad: pip3 install tensorflow-gpu
  - Good: pip3 install tensorflow-gpu==1.8.0
  - Bad: FROM nvidia/cuda
  - Good: FROM nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04
- Guess a cuda version which matches the age of the core ML library used, e.g.:
  - Tensorflow 2.0 → cuda:10.0-cudnn7
  - Tensorflow 1.8.0 → cuda:9.0-cudnn7
  - If it is unclear, start with the newer ones and gradually move back in time
- All available docker images published by Nvidia can be found here: https://hub.docker.com/r/nvidia/cuda/
- Google the error message in combination with the specific tensorflow / cuda versions, e.g. `FROM nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04` and then running the demo gave me `ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory`. After reading a few posts it turned out that this is a typical error which can be avoided by using cuda 9.0.
- If the requirements state e.g. tensorflow-gpu==1.7.0 but you still have problems, e.g. `E tensorflow/stream_executor/cuda/cuda_dnn.cc:396] Loaded runtime CuDNN library: 7603 (compatibility version 7600) but source was compiled with 7005 (compatibility version 7000)`, try to gently bump the version up or down. In the case of DeepFill, using tensorflow-gpu==1.8.0 solved the issue. A small sanity-check workflow for candidate base images is sketched below.
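One way to iterate on these guesses without rebuilding the full image each time is to test a candidate base image in a throwaway container first. A rough sketch, assuming the candidate is `nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04` and the pinned library is `tensorflow-gpu==1.8.0`:

```bash
# Start a throwaway container from the candidate base image
docker run --rm -it --runtime=nvidia nvidia/cuda:9.0-cudnn7-runtime-ubuntu16.04 bash

# ... then, inside the container:
apt-get update && apt-get install -y python3-pip
pip3 install tensorflow-gpu==1.8.0
# True means the CUDA / cuDNN / TensorFlow combination fits together
python3 -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
```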
