Speech2Face: Learning the Face Behind a Voice

Category
International Conference
Journal/Conference
IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
Author
Tae-Hyun Oh*, Tali Dekel*, Changil Kim*, Inbar Mosseri, William T. Freeman, Michael Rubinstein, Wojciech Matusik
Year
2019
Award
Empty
tags
CVPR
2019
Published
Created
3/14/2021, 1:48:00 PM
This work has been covered by many media outlets; searching for it online turns up plenty of articles.
Speech2Face synthesizes an image of a person's face from a recording of their speech. We train it on 2 million video clips containing the faces of nearly 100,000 different people.
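Conceptually, a pipeline of this kind factors into a voice encoder, trained to map a speech spectrogram to a face-feature vector, followed by a face decoder that turns that feature into a canonical face image. The toy sketch below illustrates only this data flow; all dimensions, weight matrices, and function names are hypothetical placeholders (the real model uses CNNs over spectrograms and a pretrained face decoder), not the authors' implementation.

```python
import random

# Stand-in dimensionalities (hypothetical; chosen only for illustration).
SPEC_DIM = 16   # flattened speech spectrogram
FEAT_DIM = 8    # face-feature vector
IMG_DIM = 32    # flattened face image

def linear(x, w):
    """Multiply vector x (length n) by matrix w (n x m) -> vector of length m."""
    n, m = len(w), len(w[0])
    return [sum(x[i] * w[i][j] for i in range(n)) for j in range(m)]

def random_matrix(n, m, rng):
    return [[rng.uniform(-1, 1) for _ in range(m)] for _ in range(n)]

rng = random.Random(0)
W_voice = random_matrix(SPEC_DIM, FEAT_DIM, rng)   # voice encoder weights (learned)
W_decode = random_matrix(FEAT_DIM, IMG_DIM, rng)   # face decoder weights (kept fixed)

def voice_encoder(spectrogram):
    # Maps a speech spectrogram to a predicted face-feature vector.
    return linear(spectrogram, W_voice)

def face_decoder(face_feature):
    # Maps a face-feature vector to a canonical face image.
    return linear(face_feature, W_decode)

def l2_loss(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

# Training signal: for a video clip, the target face feature would come from a
# face-recognition network applied to a frame of the speaker (simulated here).
spectrogram = [rng.uniform(-1, 1) for _ in range(SPEC_DIM)]
target_feature = [rng.uniform(-1, 1) for _ in range(FEAT_DIM)]

pred_feature = voice_encoder(spectrogram)
loss = l2_loss(pred_feature, target_feature)   # what training would minimize
face_image = face_decoder(pred_feature)        # inference-time output
```

The key design point this sketch mirrors is that supervision comes from face features extracted from the video frames themselves, so no manually labeled voice-to-face pairs are needed.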
This work is an effort to better understand the capabilities of machine perception, specifically the association between speech and facial appearance. When we hear a voice on the radio or over the phone, we humans often build a mental model of how the speaker looks. Our work can be considered a machine replication of this human mental model. It has been poorly understood how much facial information we can actually infer from a voice, and whether such inferences are correct or merely noisy bias; the faces reconstructed by Speech2Face can serve as a proxy for studying these questions.
We can imagine a range of applications, for example for privacy-minded people who would rather keep real photos of themselves off the internet and out of video calls.