# Depth from a Single Image by Harmonizing Overcomplete Local Network Predictions
Copyright (C) 2016, Authors.

This directory contains code and instructions for training the local prediction
network using the [Caffe](https://github.com/BVLC/caffe) framework.

The primary network definition is in the file `train.prototxt` in this directory.
In addition to the prediction network, we also use existing Caffe layers to
compute depth derivatives, and generate classification targets for each depth
map on the fly.
| 11 | + |
## Custom Layers

Our network employs two custom layers, included in the `layers/` sub-directory.

1. The first is a Python data layer, `layers/NYUdata.py`, which handles
   loading training data from the NYUv2 dataset (details on how to prepare the
   data are in the next section). Make sure you compile Caffe with Python layers
   enabled, and place the above file in the current directory or somewhere
   in your `PYTHONPATH`.

2. The second is the SoftMax + KL-Divergence loss layer, which you will need to
   compile into Caffe. Copy the header file `softmax_kld_loss_layer.hpp`
   into the `include/caffe/layers/` directory of your Caffe distribution, and
   the `softmax_kld_loss_layer.c*` files into the `src/caffe/layers/` directory.
   Then run `make` to compile/update Caffe.
| 27 | + |
## Preparing NYUv2 Data

Download the RAW distribution and toolbox from the [NYUv2 depth dataset
page](http://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html). Read
the documentation to figure out how to process the RAW data to
create aligned RGB-depth image pairs, and to *fill in* missing depth
values. Also, make sure you only use scenes corresponding to the training
set in the official train-test split.
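The NYUv2 toolbox distributes the official split as a MATLAB file (`splits.mat`, with 1-based `trainNdxs`/`testNdxs` index arrays, loadable via `scipy.io.loadmat`). A minimal, hypothetical helper for keeping only training scenes might look like this (the function name and calling convention are our own, not part of the toolbox):

```python
def select_train_scenes(scene_names, train_idxs):
    """Keep only scenes whose 1-based index appears in the official train split."""
    train_set = set(int(i) for i in train_idxs)
    return [name for i, name in enumerate(scene_names, start=1)
            if i in train_set]

# Example with synthetic names; with the real dataset you would pass
# scipy.io.loadmat('splits.mat')['trainNdxs'].ravel() as train_idxs.
scenes = ['scene1', 'scene2', 'scene3', 'scene4']
print(select_train_scenes(scenes, [1, 3]))  # ['scene1', 'scene3']
```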
| 36 | + |
For each scene, generate a pair of PNG files to store the RGB and depth data
respectively. These should be named with a common base name and different
suffixes: `_i.png` for the 8-bit, 3-channel PNG corresponding to the
RGB image, and `_f.png` for a 16-bit, 1-channel PNG corresponding
to depth---the depth PNG should be scaled so that the max UINT16 value
(2^16-1) corresponds to a depth of 10 meters.
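The scaling above amounts to a linear map between meters and UINT16 codes. A minimal sketch of the conversion (function names are ours; writing the resulting array to a 16-bit PNG can then be done with a library of your choice, e.g. OpenCV or imageio):

```python
import numpy as np

def depth_to_uint16(depth_m, max_depth=10.0):
    """Map metric depth (meters) to UINT16 so that 2^16-1 encodes max_depth."""
    d = np.clip(np.asarray(depth_m, dtype=np.float64), 0.0, max_depth)
    return np.round(d / max_depth * 65535.0).astype(np.uint16)

def uint16_to_depth(depth_u16, max_depth=10.0):
    """Inverse mapping: UINT16 codes back to meters."""
    return np.asarray(depth_u16, dtype=np.float64) / 65535.0 * max_depth

codes = depth_to_uint16([0.0, 2.5, 10.0])  # [0, 16384, 65535]
```
The quantization step is 10 m / 65535 ≈ 0.15 mm, so the round trip is lossless for practical purposes.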
| 43 | + |
All images should be of size 561x427, corresponding to the valid projection
area (you can use the `crop_image` function in the NYU toolbox). If you
decide to train on a different dataset, you might need to edit the data layer
and the network architecture to work with different resolution images.
| 48 | + |
Place all pairs you want to use in the same directory, and prior to calling
caffe, set the environment variable `NYU_DATA_DIR` to its path, e.g. as
`export NYU_DATA_DIR=/pathto/nyu_data_dir`. Then, create a text file called
`train.txt` (and place it in the same directory from which you are calling caffe).
Each line in this file should correspond to the common prefix for each scene. So,
if you have a line with `scene1_frame005`, then the data layer will read the
files:

```
/pathto/nyu_data_dir/scene1_frame005_i.png
/pathto/nyu_data_dir/scene1_frame005_f.png
```

for the image and depth data respectively.
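In other words, each prefix expands to an image/depth path pair rooted at `NYU_DATA_DIR`. A small sketch of that expansion (the helper is illustrative, not part of the data layer's actual code):

```python
import os

def scene_paths(prefix, data_dir=None):
    """Build the (image, depth) file pair for one train.txt prefix."""
    data_dir = data_dir or os.environ.get('NYU_DATA_DIR', '.')
    return (os.path.join(data_dir, prefix + '_i.png'),
            os.path.join(data_dir, prefix + '_f.png'))

img, dep = scene_paths('scene1_frame005', '/pathto/nyu_data_dir')
# img == '/pathto/nyu_data_dir/scene1_frame005_i.png'
# dep == '/pathto/nyu_data_dir/scene1_frame005_f.png'
```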


## Training

Use the provided `train.prototxt` file for the network definition, and create a
solver prototxt file based on the description in the paper (momentum of 0.9, no
weight decay, and the learning rate schedule described there).
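A solver file along those lines might look like the sketch below. Only the momentum and weight decay values come from the paper; the learning rate, policy, and snapshot settings are placeholders you should replace with the schedule the paper describes:

```
net: "train.prototxt"
base_lr: 1e-4          # placeholder; use the schedule from the paper
lr_policy: "step"      # placeholder policy
gamma: 0.1             # placeholder decay factor
stepsize: 100000       # placeholder step interval
momentum: 0.9          # as specified in the paper
weight_decay: 0        # no weight decay
snapshot: 10000
snapshot_prefix: "snapshots/depth"
solver_mode: GPU
```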
| 70 | + |
When you begin training, you should provide as an option to caffe:

```
-weights filters_init.caffemodel.h5,/path/to/vgg19.caffemodel
```

where `vgg19.caffemodel` is the pre-trained VGG-19 model from the Caffe model
zoo. `filters_init.caffemodel.h5` is provided in this directory, and initializes
the weights of various layers in `train.prototxt` that compute depth derivatives,
mixture weights with respect to various bins, perform bilinear up-sampling
of the scene features, etc. These layers have a learning rate factor of 0, and
will not change during training. However, they will be saved with model
snapshots, so you only need to provide the above option the first time you
start training.
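Putting the pieces together, a first training invocation might look like the following (the solver filename and the VGG-19 model path are placeholders for wherever you keep those files):

```
caffe train -solver solver.prototxt \
    -weights filters_init.caffemodel.h5,/path/to/vgg19.caffemodel
```

Subsequent runs restarted from a snapshot can drop the `-weights` option, since the zero-learning-rate layers are stored in the snapshot.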
| 85 | + |
Please see the paper for more details, and contact <[email protected]> if you
still have any questions.