Commit 3ecd0a5

Updated architecture and added training code.

1 parent: d0eeaf9

9 files changed (+1536 −34 lines)

README.md (+25 −14)
@@ -1,20 +1,25 @@
 # Depth from a Single Image by Harmonizing Overcomplete Local Network Predictions
 Copyright (C) 2016, Authors.

-This is a reference implementation of the algorithm described in the
-paper, ["**Depth from a Single Image by Harmonizing Overcomplete Local Network Predictions**"
-*arXiv:1605.07081 [cs.CV]*](https://arxiv.org/abs/1605.07081). It is
-being made available for non-commercial research use only. If you find
-this code useful in your research, please consider citing the paper.
+This is a reference implementation of the algorithm described in the paper:

-Contact <[email protected]> with any questions.
+Ayan Chakrabarti, Jingyu Shao, and Gregory Shakhnarovich, ["**Depth from
+a Single Image by Harmonizing Overcomplete Local Network Predictions**,"
+](https://arxiv.org/abs/1605.07081), NIPS 2016.
+
+It is being made available for non-commercial research use only. If you
+find this code useful in your research, please consider citing the paper.
+
+Please see the [project page][proj] and contact <[email protected]> with
+any questions.

 ### Requirements

-The inference code is in MATLAB and has no external Caffe dependencies.
+The top directory contains the inference code. It is entirely in MATLAB
+and has no external Caffe dependencies.

 1. You can download our trained neural network model weights,
-   available as a .caffemodel.h5 file [here][model.h5].
+   available as a .caffemodel.h5 file from the [project page][proj].

 2. This implementation requires a modern CUDA-capable high-memory GPU
    (it has been tested on an NVIDIA Titan X), and a recent version of
@@ -25,20 +30,26 @@ The inference code is in MATLAB and has no external Caffe dependencies.
    versions of MATLAB, this can be done by running `mexcuda
    postMAP.cu`. Requires the CUDA toolkit with `nvcc` to be installed.

-[model.h5]: http://www.ttic.edu/chakrabarti/mdepth/wts.caffemodel.h5
+[proj]: http://www.ttic.edu/chakrabarti/mdepth/

 ### Usage

-First, you will need to load the network weights from the model file as:
+First, you will need to load the network weights from the model file
+as:

-```>>> net = load('/path/to/wts.caffemodel.h5');```
+```>>> net = load('/path/to/mdepth.caffemodel.h5');```

-Then given a floating-point RGB image `img`, normalized to `[0,1]`, estimate the corresponding depth map as:
+Then given a floating-point RGB image `img`, normalized to `[0,1]`,
+estimate the corresponding depth map as:

 ```>>> Z = mdepth(img,net);```

-Note that we expect `img` to be of size `561x427`, which corresponds to the axis aligned crops in the NYU dataset where there is a valid depth map projection. You can recover these as: `img = imgOrig(45:471, 41:601, :)`.
+Note that we expect `img` to be of size `561x427`, which corresponds
+to the axis aligned crops in the NYU dataset where there is a valid
+depth map projection. You can recover these as:
+`img = imgOrig(45:471, 41:601, :)`.

 ### Training with Caffe

-Training code will be released soon.
+See the `training/` directory for code and instructions for training
+your own network.
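Taken together, the updated usage instructions chain into the following end-to-end run. This is a minimal sketch with hypothetical file paths; `imgOrig` is assumed to be a full 480x640 NYU frame, and the crop yields a 427x561x3 array (i.e. `561x427` as width by height):

```matlab
% End-to-end inference sketch (hypothetical paths; mdepth and the
% model file come from this repository).
net = load('/path/to/mdepth.caffemodel.h5');  % trained network weights

imgOrig = imread('/path/to/nyu_frame.png');   % full 480x640x3 NYU frame
img = im2double(imgOrig(45:471, 41:601, :));  % valid-projection crop, in [0,1]

Z = mdepth(img, net);                         % estimated depth map
imagesc(Z); axis image; colorbar;             % quick look at the result
```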

doForward.m (+66 −2)
@@ -11,6 +11,8 @@
 %-- Ayan Chakrabarti <[email protected]>
 function act = doForward(img,net)

+glob = doVGG(img,net);
+
 img = gpuArray(single(img));
 act = img*2-1;

@@ -29,7 +31,7 @@

 if i > 1
   if size(act,3) < size(l{1},3)
-    act = cat(3,act,net.glob);
+    act = cat(3,act,glob); clear glob;
   end;
 end;

@@ -39,7 +41,65 @@
 fprintf('\n');
 act = reshape(act,[size(act,1) size(act,2) net.numk net.nbins]);

+%%%%%%%%%%%%%%%%%%%%
+% Do VGG forward pass
+function glob = doVGG(img,net)
+
+img = double(img);
+
+img = img(22:end-22,25:end-25,:);
+
+img = permute(img,[2 1 3]); img = img(:,:,end:-1:1);
+img = img*255;
+img = bsxfun(@minus,img, ...
+             reshape([103.939 116.779 123.68],[1 1 3]));
+
+act = gpuArray(single(img));
+

+% Do all the conv layers
+idx = 1;
+for i = 1:length(net.vconvs)
+  for j = 1:net.vconvs(i)
+    fprintf('\r--- Layer %d,%d ',i,j);
+    l = net.vlayers{idx}; idx = idx+1;
+
+    pad = (size(l{1},1)-1)/2;
+    if pad > 0
+      act = padarray(act,[pad pad],0,'both');
+    end;
+    act = vConv(act,l{1},l{2},1,1);
+  end;
+  act0 = max(act(1:2:end,:,:),act(2:2:end,:,:));
+  act = max(act0(:,1:2:end,:),act0(:,2:2:end,:));
+end;
+fprintf('\n');
+
+act0 = act(1:2:end,1:2:end,:)+act(1:2:end,2:2:end,:)+...
+       act(2:2:end,1:2:end,:)+act(2:2:end,2:2:end,:);
+act = act0(:)/4;
+act = max(0,net.vgg_fc1{1}*act + net.vgg_fc1{2});
+
+act = net.vgg_gfp{1}*act + net.vgg_gfp{2};
+
+bw = net.gsz(1); bh = net.gsz(2);
+fac = net.gsz(4); nUnits = net.gsz(3);
+
+act = reshape(act,[bw bh nUnits]);
+act = permute(act,[2 1 3]);
+
+cx = (bw-1)*fac+1; cx = (cx-561)/2;
+cy = (bh-1)*fac+1; cy = (cy-427)/2;
+
+glob = zeros([427,561,nUnits],'single','gpuArray');
+for i = 1:nUnits
+  us = interp2(act(:,:,i),log2(fac));
+  glob(:,:,i) = us(1+cy:end-cy,1+cx:end-cx);
+end;
+
+
+%%%%%%%%%%%%%%%%%%%%
+% Conv layer forward
 function out = vConv(in,wts,bias,dil,relu)

 % Define a global variable MAX_SPACE to adjust memory usage.
@@ -52,6 +112,8 @@
 [H,W,C] = size(in);
 [K1,K2,~,C2] = size(wts);

+wts = gpuArray(single(wts)); bias = gpuArray(single(bias));
+
 % Check if its simply a 1x1 conv
 if K1 == 1 && K2 == 1
   in = reshape(in,[H*W C]);
@@ -62,6 +124,7 @@
 if relu == 1
   out = max(0,out);
 end;
+clear wts bias
 return
 end;

@@ -92,4 +155,5 @@
 out = reshape(out,[(H-K1eq+1) (W-K2eq+1) C2]);
 if relu == 1
   out = max(0,out);
-end;
+end;
+clear wts bias
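One detail worth noting in the new `doVGG` path above: the input is permuted to Caffe's width-first, BGR layout with the ImageNet channel means subtracted, and the 2x2 max-pooling between conv stages is implemented with strided `max` calls rather than a pooling routine. A toy sketch of that pooling idiom, assuming even height and width:

```matlab
% Strided-max idiom for 2x2, stride-2 max-pooling, as used in doVGG.
A = reshape(1:16, [4 4]);                  % toy 4x4 activation map
rowMax = max(A(1:2:end,:), A(2:2:end,:));  % max over row pairs    -> 2x4
pooled = max(rowMax(:,1:2:end), rowMax(:,2:2:end));  % column pairs -> 2x2
% pooled(i,j) is the max over the (2i-1:2i, 2j-1:2j) block of A
```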

loadModel.m (+34 −18)
@@ -13,6 +13,7 @@
 % Build struct with all details
 net = struct;

+% Get filters and bin centers
 k = squeeze(h5read(mh5,'/data/derFilt/0'));
 net.numk = size(k,3);
 k = k(end:-1:1,end:-1:1,:);
@@ -27,6 +28,7 @@
 scales = reshape(scales,[1 1 net.numk]);
 net.k = bsxfun(@times,k,scales);

+% Set up local path
 net.layers = {}; rsize = 1;
 for i = 1:length(layers)
   l = layers{i};
@@ -41,35 +43,49 @@

 net.rsize = rsize;

-%Global tensor
-tmp=h5read(mh5,'/data/gusamp/0');
-fac = size(tmp,1); fac = (fac+1) * 4;
+% Set up VGG-19 path

-b_w = ceil(560/fac)+1;
-b_h = ceil(426/fac)+1;
-
-gfip = h5read(mh5,'/data/gfip0/0');
-nUnits = prod(size(gfip))/b_w/b_h;
+net.vconvs = [2 2 4 4 4];
+net.vlayers = {};
+for i = 1:length(net.vconvs)
+  for j = 1:net.vconvs(i)
+    w = h5read(mh5,sprintf('/data/conv%d_%d/0',i,j));
+    b = h5read(mh5,sprintf('/data/conv%d_%d/1',i,j));
+    net.vlayers{end+1} = {w,b};
+  end;
+end;

-gfip = reshape(gfip,[b_w b_h nUnits]);
-gfip = permute(gfip,[2 1 3]);
+w = h5read(mh5,'/data/vgg_fc1/0');
+b = h5read(mh5,'/data/vgg_fc1/1');
+net.vgg_fc1 = {w',b};

-cx = (b_w-1)*fac+1; cx = (cx-561)/2;
-cy = (b_h-1)*fac+1; cy = (cy-427)/2;
+w = h5read(mh5,'/data/vgg_fc2/0');
+b = h5read(mh5,'/data/vgg_fc2/1');
+net.vgg_gfp = {w',b};

-net.glob = zeros([427,561,nUnits],'single');
-for i = 1:nUnits
-  us = interp2(gfip(:,:,i),log2(fac));
-  net.glob(:,:,i) = us(1+cy:end-cy,1+cx:end-cx);
-end;
+fac = 32;
+bw = ceil(560/fac)+1; bh = ceil(426/fac)+1;
+nUnits = length(b)/bw/bh;
+net.gsz = [bw bh nUnits fac];

 % Move everything to gpu
+if 1 > 2  % always false: block disabled, since vConv now uploads weights on demand
 for i = 1:length(net.layers)
   net.layers{i}{1} = gpuArray(single(net.layers{i}{1}));
   net.layers{i}{2} = gpuArray(single(net.layers{i}{2}));
 end;
-net.glob = gpuArray(single(net.glob));

+for i = 1:length(net.vlayers)
+  net.vlayers{i}{1} = gpuArray(single(net.vlayers{i}{1}));
+  net.vlayers{i}{2} = gpuArray(single(net.vlayers{i}{2}));
+end;
+
+net.vgg_fc1{1} = gpuArray(single(net.vgg_fc1{1}));
+net.vgg_fc1{2} = gpuArray(single(net.vgg_fc1{2}));
+
+net.vgg_gfp{1} = gpuArray(single(net.vgg_gfp{1}));
+net.vgg_gfp{2} = gpuArray(single(net.vgg_gfp{2}));
+end;
 %%%% Precompute things for consensus

 %%% Choose regularizer
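For reference, `interp2(A,k)` refines the sample grid `k` times, doubling resolution at each step, so the `interp2(...,log2(fac))` call in `doVGG` upsamples the `bw`-by-`bh` global-feature grid by a factor of `fac`. A small sketch of the size bookkeeping, using the fixed `fac = 32` and the 561x427 crop (the margin values are worked out from the expressions above):

```matlab
% Size bookkeeping for the dyadic upsampling of the global features.
fac = 32;
bw = ceil(560/fac)+1;            % = 19 grid columns
bh = ceil(426/fac)+1;            % = 15 grid rows

A  = rand(bh, bw, 'single');     % one toy global-feature channel
us = interp2(A, log2(fac));      % 5 refinements -> (n-1)*fac+1 per dim: 449x577

cx = ((bw-1)*fac+1 - 561)/2;     % = 8 columns to trim from each side
cy = ((bh-1)*fac+1 - 427)/2;     % = 11 rows to trim from each side
chan = us(1+cy:end-cy, 1+cx:end-cx);  % -> 427x561, aligned with the image crop
```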

training/README.md (+87 −0)
@@ -0,0 +1,87 @@
+# Depth from a Single Image by Harmonizing Overcomplete Local Network Predictions
+Copyright (C) 2016, Authors.
+
+This directory contains code and instructions for training the local prediction
+network using the [Caffe](https://github.com/BVLC/caffe) framework.
+
+The primary network definition is in the file `train.prototxt` in this directory.
+In addition to the prediction network, we also use existing Caffe layers to
+compute depth derivatives and generate classification targets for each depth
+map on the fly.
+
+## Custom Layers
+
+Our network employs two custom layers, included in the `layers/` sub-directory.
+
+1. The first is simply a Python data layer in `layers/NYUdata.py`, which handles
+   loading training data from the NYUv2 dataset (details on how to prepare the
+   data are in the next section). Make sure you compile Caffe with Python layers
+   enabled, and place the above file in the current directory or somewhere
+   in your `PYTHONPATH`.
+
+2. The second layer is the SoftMax + KL-Divergence loss layer. You will need to
+   compile this into Caffe. Copy the header file `softmax_kld_loss_layer.hpp`
+   into the `include/caffe/layers/` directory of your Caffe distribution, and the
+   `softmax_kld_loss_layer.c*` files into the `src/caffe/layers/` directory.
+   Then run `make` to recompile Caffe.
+
+## Preparing NYUv2 Data
+
+Download the RAW distribution and toolbox from the [NYUv2 depth dataset
+page](http://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html). Read
+the documentation to figure out how to process the RAW data to
+create aligned RGB-depth image pairs, and to *fill in* missing depth
+values. Also, make sure you only use scenes corresponding to the training
+set in the official train-test split.
+
+For each scene, generate a pair of PNG files to store the RGB and depth data
+respectively. These should be named with a common base name and different
+suffixes: `_i.png` for the 8-bit 3-channel PNG corresponding to the
+RGB image, and `_f.png` for a 16-bit 1-channel PNG corresponding
+to depth---the depth PNG should be scaled so that the max UINT16 value
+(2^16-1) corresponds to a depth of 10 meters.
+
+All images should be of size 561x427, corresponding to the valid projection
+area (you can use the `crop_image` function in the NYU toolbox). If you
+decide to train on a different dataset, you might need to edit the data layer
+and the network architecture to work with different resolution images.
+
+Place all pairs you want to use in the same directory, and prior to calling
+Caffe, set the environment variable `NYU_DATA_DIR` to its path, e.g. as
+`export NYU_DATA_DIR=/pathto/nyu_data_dir`. Then, create a text file called
+`train.txt` (and place it in the same directory from which you are calling Caffe).
+Each line in this file should correspond to the common prefix for each scene. So,
+if you have a line with `scene1_frame005`, then the data layer will read the
+files:
+
+```
+/pathto/nyu_data_dir/scene1_frame005_i.png
+/pathto/nyu_data_dir/scene1_frame005_f.png
+```
+
+for the image and depth data respectively.
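For concreteness, here is a minimal MATLAB sketch of writing one such pair in the expected format (variable names are hypothetical; `rgb` is the cropped 8-bit image and `depth` holds the filled-in depths in meters):

```matlab
% Write one NYUv2 training pair for the data layer (a sketch).
base = '/pathto/nyu_data_dir/scene1_frame005';  % common prefix listed in train.txt

imwrite(rgb, [base '_i.png']);                  % 8-bit, 3-channel RGB

% Scale so that the uint16 max (65535) corresponds to 10 meters.
d16 = uint16(round(depth / 10 * double(intmax('uint16'))));
imwrite(d16, [base '_f.png'], 'BitDepth', 16);  % 16-bit, 1-channel depth
```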
+
+## Training
+
+Use the provided `train.prototxt` file for the network definition, and create a
+solver prototxt file based on the description in the paper (momentum of 0.9, no
+weight decay, and the learning rate schedule described in the paper).
+
+When you begin training, you should provide as an option to Caffe:
+
+```
+-weights filters_init.caffemodel.h5,/path/to/vgg19.caffemodel
+```
+
+where `vgg19.caffemodel` is the pre-trained VGG-19 model from the Caffe model
+zoo. `filters_init.caffemodel.h5` is provided in this directory, and initializes
+the weights of various layers in `train.prototxt` that compute depth derivatives,
+produce mixture weights with respect to the various bins, perform bilinear
+up-sampling of the scene features, etc. These layers have a learning rate factor
+of 0, and will not change during training. However, they will be saved with model
+snapshots, so you will need to provide the above option only the first time you
+start training.
+
+Please see the paper for more details, and contact <[email protected]> if you
+still have any questions.
