An implementation of the following papers:
- Speech2Face: Learning the Face Behind a Voice (Tae-Hyun Oh, Tali Dekel, Changil Kim, Inbar Mosseri, William T. Freeman, Michael Rubinstein, Wojciech Matusik) CVPR 2019
- Synthesizing Normalized Faces from Facial Identity Features (Forrester Cole, David Belanger, Dilip Krishnan, Aaron Sarna, Inbar Mosseri, William T. Freeman) CVPR 2017
The repository includes the following code:
- Data preprocessing scripts for the facial decoder and voice encoder models
- PyTorch models for Facial Encoder (VGG-face recognition), Facial Decoder and Voice Encoder
- Flask Server to deploy all these models
- Links to datasets for Facial Decoder and Voice Encoder
- Python notebooks for training the Facial Decoder and Voice Encoder models
References:
Face Morphing Library: https://github.com/alyssaq/face_morpher
Data pre-processing for Voice Encoder: https://github.com/saiteja-talluri/Speech2Face
Face image generation from speech recordings
The project consists of 2 major models:
- Sound to FaceVector: converts a sound waveform into a facial recognition feature vector
- FaceVector to Image: converts the above-mentioned vector into an image
The current implementation includes the FaceVector to Image model
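The two-stage pipeline above can be sketched in PyTorch as follows. This is a minimal illustration, not the repository's actual architecture: the module internals, layer sizes, and the 4096-dimensional face-vector size are assumptions made for the example.

```python
import torch
import torch.nn as nn

FACE_VECTOR_DIM = 4096  # assumed size of the facial recognition embedding

class VoiceEncoder(nn.Module):
    """Sound to FaceVector: maps a spectrogram to a face embedding (sketch)."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),        # pool over frequency and time
        )
        self.fc = nn.Linear(32, FACE_VECTOR_DIM)

    def forward(self, spectrogram):          # (B, 1, freq, time)
        h = self.conv(spectrogram).flatten(1)
        return self.fc(h)                    # (B, FACE_VECTOR_DIM)

class FaceDecoder(nn.Module):
    """FaceVector to Image: maps a face embedding to an RGB image (sketch)."""
    def __init__(self, out_size=64):
        super().__init__()
        self.fc = nn.Linear(FACE_VECTOR_DIM, 3 * out_size * out_size)
        self.out_size = out_size

    def forward(self, face_vec):
        img = torch.sigmoid(self.fc(face_vec))   # pixel values in (0, 1)
        return img.view(-1, 3, self.out_size, self.out_size)

encoder, decoder = VoiceEncoder(), FaceDecoder()
spec = torch.randn(2, 1, 257, 100)           # dummy batch of spectrograms
face = decoder(encoder(spec))                # (2, 3, 64, 64)
```

Since only the FaceVector to Image model is implemented here, the face vector would normally come from the pretrained facial encoder rather than the voice encoder.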
INSTRUCTIONS:
- Upload notebook onto Google Drive
- For the VGG-16 backend, make sure the allocated GPU has at least 10 GB of CUDA memory
- For the FaceNet backend, any GPU available on Colab will suffice
- Connect to Google Drive
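A small helper like the following can verify the 10 GB memory requirement before running the VGG-16 backend. The helper name is hypothetical (not part of this repository); the `total_bytes` parameter exists only so the check can be demonstrated without a GPU.

```python
def has_enough_cuda_memory(required_gb, total_bytes=None):
    """Return True if the GPU (or a given byte count) meets the requirement.

    Hypothetical helper: with no `total_bytes`, it queries the active
    CUDA device via PyTorch; otherwise it checks the supplied value.
    """
    if total_bytes is None:
        import torch  # imported lazily: only needed to query a live device
        if not torch.cuda.is_available():
            return False
        total_bytes = torch.cuda.get_device_properties(0).total_memory
    return total_bytes >= required_gb * 1024 ** 3

# Example: a 16 GB card passes the 10 GB requirement for the VGG-16 backend.
print(has_enough_cuda_memory(10, total_bytes=16 * 1024 ** 3))  # True
```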
TEST INSTRUCTIONS:
- Run the cells containing imports, model classes and model loading
- Upload test images
- Run the cell for testing
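The testing cell boils down to something like the sketch below: restore saved weights, preprocess an uploaded image, and run the model with gradients disabled. The function names, checkpoint handling, and preprocessing choices are assumptions for illustration, not the notebook's exact code.

```python
import numpy as np
import torch

def preprocess(image_u8):
    """uint8 HxWx3 array -> float tensor of shape (1, 3, H, W) in [0, 1]."""
    t = torch.from_numpy(np.ascontiguousarray(image_u8)).float() / 255.0
    return t.permute(2, 0, 1).unsqueeze(0)

def run_test(model, image_u8, checkpoint_path=None):
    """Optionally restore weights, then run the model on one test image."""
    if checkpoint_path is not None:
        model.load_state_dict(torch.load(checkpoint_path, map_location="cpu"))
    model.eval()                       # disable dropout/batch-norm updates
    with torch.no_grad():              # no gradients needed at test time
        return model(preprocess(image_u8))
```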
TRAIN INSTRUCTIONS:
- Download the required batches from Google Drive
- Specify the required learning rate and number of iterations
- Load the pre-saved model
- Select "Run all" in Google Colab
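The training steps above can be sketched as a single loop: set the learning rate and iteration count, optionally resume from a pre-saved checkpoint, then optimize. The hyperparameter defaults, loss choice, and checkpoint handling are assumptions for the sketch, not the notebook's exact configuration.

```python
import os
import torch
import torch.nn as nn

def train(model, batches, lr=1e-4, iterations=1000, ckpt=None):
    """Train on a list of (inputs, targets) batches.

    ckpt: optional path of a pre-saved state_dict to resume from.
    """
    if ckpt is not None and os.path.exists(ckpt):
        model.load_state_dict(torch.load(ckpt, map_location="cpu"))
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()             # assumed reconstruction loss
    model.train()
    for step in range(iterations):
        inputs, targets = batches[step % len(batches)]  # cycle over batches
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()
    return model
```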