Learning to generate customized dynamic 3D facial expressions
File(s)
2007.09805v2.pdf (10.74 MB)
Accepted version
Author(s)
Potamias, Rolandos Alexandros
Zheng, Jiali
Ploumpis, Stylianos
Bouritsas, Giorgos
Ververas, Evangelos
Type
Working Paper
Abstract
Recent advances in deep learning have significantly pushed the state of the art
in photorealistic video animation from a single image. In this paper, we extend
those advances to the 3D domain by studying 3D image-to-video translation, with
a particular focus on 4D facial expressions. Although 3D facial generative
models have been widely explored in recent years, 4D animation remains
relatively unexplored. To this end, we employ a deep mesh encoder-decoder-like
architecture that synthesizes realistic, high-resolution facial expressions from
a single neutral frame together with an expression label. Processing 3D meshes
remains non-trivial compared with data that live on grid-like structures, such
as images. Building on recent progress in mesh processing with graph
convolutions, we use a recently introduced learnable operator that acts directly
on the mesh structure by exploiting local vertex orderings. To generalize 4D
facial expressions across subjects, we trained our model on a high-resolution
dataset of 4D scans of six facial expressions from 180 subjects. Experimental
results demonstrate that our approach preserves the subject's identity, even for
unseen subjects, and generates high-quality expressions. To the best of our
knowledge, this is the first study to tackle 4D facial expression synthesis.
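The abstract describes, without naming it, a learnable operator that acts directly on the mesh by exploiting local vertex orderings. One operator that fits this description is the spiral convolution. The following is a minimal NumPy sketch of that idea, not the authors' code, assuming each vertex comes with a precomputed, fixed-length, ordered list of neighbor indices (the `spirals` array). The function name `spiral_conv` and the toy tetrahedron are hypothetical and for illustration only.

```python
import numpy as np


def spiral_conv(x, spirals, weight, bias):
    """Spiral-style convolution on per-vertex mesh features (illustrative sketch).

    x:       (V, C_in) features, one row per vertex
    spirals: (V, L) int indices; row v is vertex v followed by L - 1 neighbors
             in a fixed, precomputed order (assumed given, same for every mesh)
    weight:  (L * C_in, C_out) weights shared by all vertices
    bias:    (C_out,)
    returns: (V, C_out) output features
    """
    num_vertices, length = spirals.shape
    # Gather features along each vertex's ordered neighborhood: (V, L, C_in).
    gathered = x[spirals]
    # Flatten the sequence so every position in the ordering has its own weights.
    flat = gathered.reshape(num_vertices, length * x.shape[1])
    return flat @ weight + bias


if __name__ == "__main__":
    # Toy tetrahedron (4 vertices, all mutually adjacent) with hypothetical
    # length-3 spirals: each row starts at the center vertex.
    rng = np.random.default_rng(0)
    feats = rng.standard_normal((4, 2))
    spirals = np.array([[0, 1, 2], [1, 2, 3], [2, 3, 0], [3, 0, 1]])
    w = rng.standard_normal((3 * 2, 3))
    b = np.zeros(3)
    print(spiral_conv(feats, spirals, w, b).shape)  # (4, 3)
```

Because each position in the ordered sequence gets its own weights, the operator can distinguish neighbors by their place in the ordering rather than simply averaging them. The sketch assumes all meshes share one template topology, so the same `spirals` array can be reused for every input.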
Date Issued
2020-07-19
Citation
2020
Publisher
arXiv
Copyright Statement
© 2020 The Author(s)
Identifier
http://arxiv.org/abs/2007.09805v2
Subjects
cs.CV
Notes
Accepted at the European Conference on Computer Vision (ECCV) 2020.
Publication Status
Published