Commit 3ecd0a5

Updated architecture and added training code.

1 parent: d0eeaf9

9 files changed (+1536 −34 lines)

README.md (+25 −14)
@@ -1,20 +1,25 @@
 # Depth from a Single Image by Harmonizing Overcomplete Local Network Predictions
 Copyright (C) 2016, Authors.

-This is a reference implementation of the algorithm described in the
-paper, ["**Depth from a Single Image by Harmonizing Overcomplete Local Network Predictions**"
-*arXiv:1605.07081 [cs.CV]*](https://arxiv.org/abs/1605.07081). It is
-being made available for non-commercial research use only. If you find
-this code useful in your research, please consider citing the paper.
+This is a reference implementation of the algorithm described in the paper:

-Contact <[email protected]> with any questions.
+Ayan Chakrabarti, Jingyu Shao, and Gregory Shakhnarovich, ["**Depth from
+a Single Image by Harmonizing Overcomplete Local Network Predictions**,"
+](https://arxiv.org/abs/1605.07081), NIPS 2016.
+
+It is being made available for non-commercial research use only. If you
+find this code useful in your research, please consider citing the paper.
+
+Please see the [project page][proj] and contact <[email protected]> with
+any questions.

 ### Requirements

-The inference code is in MATLAB and has no external Caffe dependencies.
+The top directory contains the inference code. It is entirely in MATLAB
+and has no external Caffe dependencies.

 1. You can download our trained neural network model weights,
-   available as a .caffemodel.h5 file [here][model.h5].
+   available as a .caffemodel.h5 file from the [project page][proj].

 2. This implementation requires a modern CUDA-capable high-memory GPU
    (it has been tested on an NVIDIA Titan X), and a recent version of
@@ -25,20 +30,26 @@ The inference code is in MATLAB and has no external Caffe dependencies.
    versions of MATLAB, this can be done by running `mexcuda
    postMAP.cu`. Requires the CUDA toolkit with `nvcc` to be installed.

-[model.h5]: http://www.ttic.edu/chakrabarti/mdepth/wts.caffemodel.h5
+[proj]: http://www.ttic.edu/chakrabarti/mdepth/

 ### Usage

-First, you will need to load the network weights from the model file as:
+First, you will need to load the network weights from the model file
+as:

-```>>> net = load('/path/to/wts.caffemodel.h5');```
+```>>> net = load('/path/to/mdepth.caffemodel.h5');```

-Then given a floating-point RGB image `img`, normalized to `[0,1]`, estimate the corresponding depth map as:
+Then given a floating-point RGB image `img`, normalized to `[0,1]`,
+estimate the corresponding depth map as:

 ```>>> Z = mdepth(img,net);```

-Note that we expect `img` to be of size `561x427`, which corresponds to the axis aligned crops in the NYU dataset where there is a valid depth map projection. You can recover these as: `img = imgOrig(45:471, 41:601, :)`.
+Note that we expect `img` to be of size `561x427`, which corresponds
+to the axis aligned crops in the NYU dataset where there is a valid
+depth map projection. You can recover these as:
+`img = imgOrig(45:471, 41:601, :)`.

 ### Training with Caffe

-Training code will be released soon.
+See the `training/` directory for code and instructions for training
+your own network.
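Taken together, the updated usage instructions chain into the following end-to-end run. This is a minimal sketch with hypothetical file paths; `imgOrig` is assumed to be a full 480x640 NYU frame, and the crop yields a 427x561x3 array (i.e. `561x427` as width by height):

```matlab
% End-to-end inference sketch (hypothetical paths; mdepth and the
% model file come from this repository).
net = load('/path/to/mdepth.caffemodel.h5');  % trained network weights

imgOrig = imread('/path/to/nyu_frame.png');   % full 480x640x3 NYU frame
img = im2double(imgOrig(45:471, 41:601, :));  % valid-projection crop, in [0,1]

Z = mdepth(img, net);                         % estimated depth map
imagesc(Z); axis image; colorbar;             % quick look at the result
```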

doForward.m (+66 −2)
@@ -11,6 +11,8 @@
 %-- Ayan Chakrabarti <[email protected]>
 function act = doForward(img,net)

+glob = doVGG(img,net);
+
 img = gpuArray(single(img));
 act = img*2-1;

@@ -29,7 +31,7 @@

 if i > 1
   if size(act,3) < size(l{1},3)
-    act = cat(3,act,net.glob);
+    act = cat(3,act,glob); clear glob;
   end;
 end;

@@ -39,7 +41,65 @@
 fprintf('\n');
 act = reshape(act,[size(act,1) size(act,2) net.numk net.nbins]);

+%%%%%%%%%%%%%%%%%%%%
+% Do VGG forward pass
+function glob = doVGG(img,net)
+
+img = double(img);
+
+img = img(22:end-22,25:end-25,:);
+
+img = permute(img,[2 1 3]); img = img(:,:,end:-1:1);
+img = img*255;
+img = bsxfun(@minus,img, ...
+             reshape([103.939 116.779 123.68],[1 1 3]));
+
+act = gpuArray(single(img));
+

+% Do all the conv layers
+idx = 1;
+for i = 1:length(net.vconvs)
+  for j = 1:net.vconvs(i)
+    fprintf('\r--- Layer %d,%d ',i,j);
+    l = net.vlayers{idx}; idx = idx+1;
+
+    pad = (size(l{1},1)-1)/2;
+    if pad > 0
+      act = padarray(act,[pad pad],0,'both');
+    end;
+    act = vConv(act,l{1},l{2},1,1);
+  end;
+  act0 = max(act(1:2:end,:,:),act(2:2:end,:,:));
+  act = max(act0(:,1:2:end,:),act0(:,2:2:end,:));
+end;
+fprintf('\n');
+
+act0 = act(1:2:end,1:2:end,:)+act(1:2:end,2:2:end,:)+...
+       act(2:2:end,1:2:end,:)+act(2:2:end,2:2:end,:);
+act = act0(:)/4;
+act = max(0,net.vgg_fc1{1}*act + net.vgg_fc1{2});
+
+act = net.vgg_gfp{1}*act + net.vgg_gfp{2};
+
+bw = net.gsz(1); bh = net.gsz(2);
+fac = net.gsz(4); nUnits = net.gsz(3);
+
+act = reshape(act,[bw bh nUnits]);
+act = permute(act,[2 1 3]);
+
+cx = (bw-1)*fac+1; cx = (cx-561)/2;
+cy = (bh-1)*fac+1; cy = (cy-427)/2;
+
+glob = zeros([427,561,nUnits],'single','gpuArray');
+for i = 1:nUnits
+  us = interp2(act(:,:,i),log2(fac));
+  glob(:,:,i) = us(1+cy:end-cy,1+cx:end-cx);
+end;
+
+
+%%%%%%%%%%%%%%%%%%%%
+% Conv layer forward
 function out = vConv(in,wts,bias,dil,relu)

 % Define a global variable MAX_SPACE to adjust memory usage.
@@ -52,6 +112,8 @@
 [H,W,C] = size(in);
 [K1,K2,~,C2] = size(wts);

+wts = gpuArray(single(wts)); bias = gpuArray(single(bias));
+
 % Check if its simply a 1x1 conv
 if K1 == 1 && K2 == 1
   in = reshape(in,[H*W C]);
@@ -62,6 +124,7 @@
 if relu == 1
   out = max(0,out);
 end;
+clear wts bias
 return
 end;

@@ -92,4 +155,5 @@
 out = reshape(out,[(H-K1eq+1) (W-K2eq+1) C2]);
 if relu == 1
   out = max(0,out);
-end;
+end;
+clear wts bias
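One detail worth noting in the new `doVGG` path above: the input is permuted to Caffe's width-first, BGR layout with the ImageNet channel means subtracted, and the 2x2 max-pooling between conv stages is implemented with strided `max` calls rather than a pooling routine. A toy sketch of that pooling idiom, assuming even height and width:

```matlab
% Strided-max idiom for 2x2, stride-2 max-pooling, as used in doVGG.
A = reshape(1:16, [4 4]);                  % toy 4x4 activation map
rowMax = max(A(1:2:end,:), A(2:2:end,:));  % max over row pairs    -> 2x4
pooled = max(rowMax(:,1:2:end), rowMax(:,2:2:end));  % column pairs -> 2x2
% pooled(i,j) is the max over the (2i-1:2i, 2j-1:2j) block of A
```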

loadModel.m (+34 −18)
@@ -13,6 +13,7 @@
 % Build struct with all details
 net = struct;

+% Get filters and bin centers
 k = squeeze(h5read(mh5,'/data/derFilt/0'));
 net.numk = size(k,3);
 k = k(end:-1:1,end:-1:1,:);
@@ -27,6 +28,7 @@
 scales = reshape(scales,[1 1 net.numk]);
 net.k = bsxfun(@times,k,scales);

+% Set up local path
 net.layers = {}; rsize = 1;
 for i = 1:length(layers)
   l = layers{i};
@@ -41,35 +43,49 @@

 net.rsize = rsize;

-%Global tensor
-tmp=h5read(mh5,'/data/gusamp/0');
-fac = size(tmp,1); fac = (fac+1) * 4;
+% Set up VGG-19 path

-b_w = ceil(560/fac)+1;
-b_h = ceil(426/fac)+1;
-
-gfip = h5read(mh5,'/data/gfip0/0');
-nUnits = prod(size(gfip))/b_w/b_h;
+net.vconvs = [2 2 4 4 4];
+net.vlayers = {};
+for i = 1:length(net.vconvs)
+  for j = 1:net.vconvs(i)
+    w = h5read(mh5,sprintf('/data/conv%d_%d/0',i,j));
+    b = h5read(mh5,sprintf('/data/conv%d_%d/1',i,j));
+    net.vlayers{end+1} = {w,b};
+  end;
+end;

-gfip = reshape(gfip,[b_w b_h nUnits]);
-gfip = permute(gfip,[2 1 3]);
+w = h5read(mh5,'/data/vgg_fc1/0');
+b = h5read(mh5,'/data/vgg_fc1/1');
+net.vgg_fc1 = {w',b};

-cx = (b_w-1)*fac+1; cx = (cx-561)/2;
-cy = (b_h-1)*fac+1; cy = (cy-427)/2;
+w = h5read(mh5,'/data/vgg_fc2/0');
+b = h5read(mh5,'/data/vgg_fc2/1');
+net.vgg_gfp = {w',b};

-net.glob = zeros([427,561,nUnits],'single');
-for i = 1:nUnits
-  us = interp2(gfip(:,:,i),log2(fac));
-  net.glob(:,:,i) = us(1+cy:end-cy,1+cx:end-cx);
-end;
+fac = 32;
+bw = ceil(560/fac)+1; bh = ceil(426/fac)+1;
+nUnits = length(b)/bw/bh;
+net.gsz = [bw bh nUnits fac];

 % Move everything to gpu
+if 1 > 2  % always false: block disabled, since vConv now uploads weights on demand
 for i = 1:length(net.layers)
   net.layers{i}{1} = gpuArray(single(net.layers{i}{1}));
   net.layers{i}{2} = gpuArray(single(net.layers{i}{2}));
 end;
-net.glob = gpuArray(single(net.glob));

+for i = 1:length(net.vlayers)
+  net.vlayers{i}{1} = gpuArray(single(net.vlayers{i}{1}));
+  net.vlayers{i}{2} = gpuArray(single(net.vlayers{i}{2}));
+end;
+
+net.vgg_fc1{1} = gpuArray(single(net.vgg_fc1{1}));
+net.vgg_fc1{2} = gpuArray(single(net.vgg_fc1{2}));
+
+net.vgg_gfp{1} = gpuArray(single(net.vgg_gfp{1}));
+net.vgg_gfp{2} = gpuArray(single(net.vgg_gfp{2}));
+end;
 %%%% Precompute things for consensus

 %%% Choose regularizer
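For reference, `interp2(A,k)` refines the sample grid `k` times, doubling resolution at each step, so the `interp2(...,log2(fac))` call in `doVGG` upsamples the `bw`-by-`bh` global-feature grid by a factor of `fac`. A small sketch of the size bookkeeping, using the fixed `fac = 32` and the 561x427 crop (the margin values are worked out from the expressions above):

```matlab
% Size bookkeeping for the dyadic upsampling of the global features.
fac = 32;
bw = ceil(560/fac)+1;            % = 19 grid columns
bh = ceil(426/fac)+1;            % = 15 grid rows

A  = rand(bh, bw, 'single');     % one toy global-feature channel
us = interp2(A, log2(fac));      % 5 refinements -> (n-1)*fac+1 per dim: 449x577

cx = ((bw-1)*fac+1 - 561)/2;     % = 8 columns to trim from each side
cy = ((bh-1)*fac+1 - 427)/2;     % = 11 rows to trim from each side
chan = us(1+cy:end-cy, 1+cx:end-cx);  % -> 427x561, aligned with the image crop
```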

training/README.md (+87 −0)
@@ -0,0 +1,87 @@
+# Depth from a Single Image by Harmonizing Overcomplete Local Network Predictions
+Copyright (C) 2016, Authors.
+
+This directory contains code and instructions for training the local prediction
+network using the [Caffe](https://github.com/BVLC/caffe) framework.
+
+The primary network definition is in the file `train.prototxt` in this directory.
+In addition to the prediction network, we also use existing Caffe layers to
+compute depth derivatives and generate classification targets for each depth
+map on the fly.
+
+## Custom Layers
+
+Our network employs two custom layers, included in the `layers/` sub-directory.
+
+1. The first is simply a Python data layer in `layers/NYUdata.py`, which handles
+   loading training data from the NYUv2 dataset (details on how to prepare the
+   data are in the next section). Make sure you compile Caffe with Python layers
+   enabled, and place the above file in the current directory or somewhere
+   in your `PYTHONPATH`.
+
+2. The second layer is the SoftMax + KL-Divergence loss layer. You will need to
+   compile this into Caffe. Copy the header file `softmax_kld_loss_layer.hpp`
+   into the `include/caffe/layers/` directory of your Caffe distribution, and the
+   `softmax_kld_loss_layer.c*` files into the `src/caffe/layers/` directory.
+   Then run `make` to recompile Caffe.
+
+## Preparing NYUv2 Data
+
+Download the RAW distribution and toolbox from the [NYUv2 depth dataset
+page](http://cs.nyu.edu/~silberman/datasets/nyu_depth_v2.html). Read
+the documentation to figure out how to process the RAW data to
+create aligned RGB-depth image pairs, and to *fill in* missing depth
+values. Also, make sure you only use scenes corresponding to the training
+set in the official train-test split.
+
+For each scene, generate a pair of PNG files to store the RGB and depth data
+respectively. These should be named with a common base name and different
+suffixes: `_i.png` for the 8-bit 3-channel PNG corresponding to the
+RGB image, and `_f.png` for a 16-bit 1-channel PNG corresponding
+to depth---the depth PNG should be scaled so that the max UINT16 value
+(2^16-1) corresponds to a depth of 10 meters.
+
+All images should be of size 561x427, corresponding to the valid projection
+area (you can use the `crop_image` function in the NYU toolbox). If you
+decide to train on a different dataset, you might need to edit the data layer
+and the network architecture to work with different resolution images.
+
+Place all pairs you want to use in the same directory, and prior to calling
+Caffe, set the environment variable `NYU_DATA_DIR` to its path, e.g. as
+`export NYU_DATA_DIR=/pathto/nyu_data_dir`. Then, create a text file called
+`train.txt` (and place it in the same directory from which you are calling Caffe).
+Each line in this file should correspond to the common prefix for each scene. So,
+if you have a line with `scene1_frame005`, then the data layer will read the
+files:
+
+```
+/pathto/nyu_data_dir/scene1_frame005_i.png
+/pathto/nyu_data_dir/scene1_frame005_f.png
+```
+
+for the image and depth data respectively.
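For concreteness, here is a minimal MATLAB sketch of writing one such pair in the expected format (variable names are hypothetical; `rgb` is the cropped 8-bit image and `depth` holds the filled-in depths in meters):

```matlab
% Write one NYUv2 training pair for the data layer (a sketch).
base = '/pathto/nyu_data_dir/scene1_frame005';  % common prefix listed in train.txt

imwrite(rgb, [base '_i.png']);                  % 8-bit, 3-channel RGB

% Scale so that the uint16 max (65535) corresponds to 10 meters.
d16 = uint16(round(depth / 10 * double(intmax('uint16'))));
imwrite(d16, [base '_f.png'], 'BitDepth', 16);  % 16-bit, 1-channel depth
```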
+
+## Training
+
+Use the provided `train.prototxt` file for the network definition, and create a
+solver prototxt file based on the description in the paper (momentum of 0.9, no
+weight decay, and the learning rate schedule described in the paper).
+
+When you begin training, you should provide as an option to Caffe:
+
+```
+-weights filters_init.caffemodel.h5,/path/to/vgg19.caffemodel
+```
+
+where `vgg19.caffemodel` is the pre-trained VGG-19 model from the Caffe model
+zoo. `filters_init.caffemodel.h5` is provided in this directory, and initializes
+the weights of various layers in `train.prototxt` that compute depth derivatives,
+produce mixture weights with respect to the various bins, perform bilinear
+up-sampling of the scene features, etc. These layers have a learning rate factor
+of 0, and will not change during training. However, they will be saved with model
+snapshots, so you will need to provide the above option only the first time you
+start training.
+
+Please see the paper for more details, and contact <[email protected]> if you
+still have any questions.
