
Commit 939d0ba

Add DDFD paper
1 parent 560013d commit 939d0ba

4 files changed: 35 additions and 0 deletions

README.md

Lines changed: 3 additions & 0 deletions
@@ -2,6 +2,9 @@
 This repository contains short summaries of some machine learning papers.

+* Added 2017/08/08:
+* [Multi-view Face Detection Using Deep Convolutional Neural Networks](neural-nets/Multi-view_Face_Detection_Using_Deep_Convolutional_Neural_Networks.md) (aka DDFD)
+
 * Added 2017/06/11:
 * [On the Effects of Batch and Weight Normalization in Generative Adversarial Networks](neural-nets/On_The_Effects_of_BN_and_WN_in_GANs.md)
 * [BEGAN](neural-nets/BEGAN.md)

neural-nets/Multi-view_Face_Detection_Using_Deep_Convolutional_Neural_Networks.md

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
# Paper

* __Title__: Multi-view Face Detection Using Deep Convolutional Neural Networks
* __Authors__: Sachin Sudhakar Farfade, Mohammad Saberian, Li-Jia Li
* __Link__: [https://arxiv.org/abs/1502.02766](https://arxiv.org/abs/1502.02766)
* __Tags__: Deep Learning, CNN, Face, Detection, DDFD
* __Year__: 2015

# Summary

* What
* They propose a CNN-based approach, the Deep Dense Face Detector (DDFD), that detects faces in a wide range of orientations using a single model. However, because the training set is skewed toward upright faces, the network is more confident about upright faces.
* The model does not require additional components such as segmentation, bounding-box regression, or SVM classifiers.

* How
* __Data augmentation__: the training set has 24K face annotations. To increase the number of positive samples, the authors randomly sampled sub-windows of the images, kept those with IoU > 50% against a ground-truth face as positives, and randomly flipped them. In total, there were 200K positive and 20M negative training samples.
* __CNN Architecture__: 5 convolutional layers followed by 3 fully-connected layers. The fully-connected layers were converted to convolutional layers. Non-maximum suppression (NMS) is applied to merge the predicted bounding boxes.
* __Training__: the CNN was trained with the Caffe library on the AFLW dataset with the following settings:
* Fine-tuned from a pre-trained AlexNet model
* Input image size = 227x227
* Batch size = 128 (32 positive and 96 negative samples)
* Stride = 32
* __Test__: the model was evaluated on the PASCAL FACE, AFW, and FDDB datasets.
* __Running time__: since the fully-connected layers were converted to convolutional layers, the input image at test time may be of any size, and the output is a heat map of face scores. To detect faces of different sizes, the image is scaled up and down and a new heat map is computed at each scale. The authors found that rescaling the image 3 times per octave gives reasonably good performance.
![DDFD heatmap](images/DDFD__heatmap.png?raw=true "DDFD heatmap")
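The multi-scale scan described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' code: the function names are hypothetical, the window size (227) and stride (32) are taken from the training settings listed earlier, and the heat-map size formula is an approximation that assumes one output cell per 32-pixel shift of the 227x227 window.

```python
import math


def pyramid_scales(min_scale, max_scale, scales_per_octave=3):
    """Geometric scale factors from max_scale down to min_scale (inclusive).

    With 3 scales per octave, consecutive scales differ by a factor of 2**(1/3).
    """
    if not 0 < min_scale <= max_scale:
        raise ValueError("need 0 < min_scale <= max_scale")
    step = 2.0 ** (1.0 / scales_per_octave)
    octaves = math.log2(max_scale / min_scale)
    # Small epsilon guards against floating-point round-off at exact octaves.
    n_steps = int(math.floor(scales_per_octave * octaves + 1e-9))
    return [max_scale / step ** i for i in range(n_steps + 1)]


def heatmap_size(height, width, window=227, stride=32):
    """Approximate heat-map size (rows, cols) for a rescaled image.

    Assumption: one heat-map cell per `stride`-pixel shift of the window.
    """
    if height < window or width < window:
        return (0, 0)
    return ((height - window) // stride + 1, (width - window) // stride + 1)


def face_size_for_scale(scale, window=227):
    """Face size, in original-image pixels, that fills the window at `scale`."""
    return window / scale
```

For example, `pyramid_scales(0.25, 1.0)` spans two octaves and therefore yields seven scales. Smaller scale factors shrink the image, so the fixed 227x227 window responds to larger faces in the original image.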
* The authors observed that the model is more confident about upright faces than rotated or occluded ones. They attribute this to the lack of good training examples representing such faces. Better results may be achievable with better sampling strategies and more sophisticated data augmentation techniques.
![DDFD example](images/DDFD__example.png?raw=true "DDFD example")
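For reference, the baseline positive sampling from the data augmentation step (random sub-windows kept when their IoU with a face annotation exceeds 50%, plus random flips) might look like the sketch below. The summary only states the IoU criterion and the flipping, so the way candidate windows are proposed here (random shift and rescale of the annotated box) and all parameter names are assumptions made for illustration.

```python
import random


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def sample_positive_windows(face_box, n, iou_threshold=0.5, max_shift=0.3,
                            scale_range=(0.8, 1.25), rng=None, max_tries=1000):
    """Sample up to n windows around face_box whose IoU exceeds the threshold.

    Returns a list of (window, flip) pairs, where flip marks a horizontal flip.
    The jitter distribution (max_shift, scale_range) is an assumption.
    """
    rng = rng or random.Random()
    x1, y1, x2, y2 = face_box
    w, h = x2 - x1, y2 - y1
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    windows = []
    tries = 0
    while len(windows) < n and tries < max_tries:
        tries += 1
        s = rng.uniform(*scale_range)
        ncx = cx + rng.uniform(-max_shift, max_shift) * w
        ncy = cy + rng.uniform(-max_shift, max_shift) * h
        nw, nh = w * s, h * s
        cand = (ncx - nw / 2.0, ncy - nh / 2.0, ncx + nw / 2.0, ncy + nh / 2.0)
        if iou(cand, face_box) > iou_threshold:
            windows.append((cand, rng.random() < 0.5))
    return windows
```

A sampler of this kind treats every annotation the same way, which is one reason a training set dominated by upright faces stays dominated by them after augmentation.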
* The authors tested different NMS strategies and the effect of bounding-box regression on face detection. NMS-avg performed better than NMS-max in terms of average precision. On the other hand, adding a bounding-box regressor degraded performance for both NMS strategies because of a mismatch between the annotations of the training set and the test set. This mismatch occurs mostly for side-view faces.
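The difference between the two NMS variants can be illustrated with a simplified sketch. The summary does not spell out how either variant is implemented, so the greedy IoU clustering below is an assumption chosen for brevity, not the paper's procedure: both functions group overlapping detections around the highest-scoring box, then `nms_max` keeps that box while `nms_avg` averages the coordinates of the group.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def _greedy_clusters(detections, iou_threshold):
    """Group (box, score) detections around the current top-scoring box."""
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    clusters = []
    while remaining:
        top, rest = remaining[0], remaining[1:]
        cluster = [top] + [d for d in rest if iou(top[0], d[0]) > iou_threshold]
        remaining = [d for d in rest if iou(top[0], d[0]) <= iou_threshold]
        clusters.append(cluster)
    return clusters


def nms_max(detections, iou_threshold=0.5):
    """Keep only the highest-scoring box of each cluster."""
    return [cluster[0] for cluster in _greedy_clusters(detections, iou_threshold)]


def nms_avg(detections, iou_threshold=0.5):
    """Average the box coordinates of each cluster; score is the cluster max."""
    merged = []
    for cluster in _greedy_clusters(detections, iou_threshold):
        n = len(cluster)
        box = tuple(sum(d[0][i] for d in cluster) / n for i in range(4))
        merged.append((box, cluster[0][1]))
    return merged
```

With two overlapping detections `((0, 0, 10, 10), 0.9)` and `((1, 1, 11, 11), 0.8)`, `nms_max` returns the first box unchanged, while `nms_avg` returns `(0.5, 0.5, 10.5, 10.5)` with score 0.9.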
* Results:
* In comparison to R-CNN, the proposed face detector performed significantly better regardless of the NMS strategy. The authors attribute the inferior performance of R-CNN to two factors: loss of recall, since selective search may miss some face regions; and loss of localization accuracy, since bounding-box regression is imperfect and may not fully align the segmentation bounding boxes provided by selective search with the ground truth.
* Compared with other state-of-the-art methods such as structural models, TSM, and cascade-based methods, DDFD achieves similar or better results. However, this comparison is not completely fair, since most of these methods use extra information such as pose annotations or facial landmarks during training.

neural-nets/images/DDFD__example.png

346 KB

neural-nets/images/DDFD__heatmap.png

531 KB