In Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy
will defend his dissertation
Real-time Facial Performance Capture and Manipulation
Acquisition and editing of facial performance is an essential and challenging task in computer graphics, with broad applications in films, cartoons, VR systems, and electronic games. The creation of high-resolution, realistic facial animations often involves controlled lighting setups, multiple cameras, active markers, depth sensors, and substantial post-editing by experienced artists. This dissertation focuses on the capture and manipulation of facial performance from regular RGB video.
First, we propose a novel method to reconstruct high-resolution facial geometry and appearance in real time by capturing an individual-specific face model with fine-scale details from monocular RGB video input. Specifically, after reconstructing a coarse facial model from the input video, we refine it using shape-from-shading techniques, where illumination, albedo texture, and displacements are recovered by minimizing the difference between the synthesized face and the input RGB video. To recover wrinkle-level details, we build a hierarchical face pyramid through adaptive subdivision and progressive refinement of the mesh from a coarse level to a fine level. Our approach produces results close to those of off-line methods and better than those of previous real-time methods.
Building on the reconstruction method, we propose two manipulation approaches for facial expressions and facial appearance, namely facial expression transformation and face swapping. In facial expression transformation, we directly generate desired, photo-realistic facial expressions on top of the input monocular RGB video without the need for any driving source actor. We develop an unpaired learning framework to learn the mapping between any two facial expressions in the facial blendshape space. Our method automatically transforms the source expression in an input video clip to a specified target expression by combining 3D face reconstruction, the learned bi-directional expression mapping, and automatic lip correction. It can be applied to new users with different identities, ages, speech, and expressions without additional training.
In face swapping, we present a high-fidelity method to replace the face in a target video clip with the face from a single source portrait image. We first run our face reconstruction method on both the source image and the target video. Then, the albedo of the source face is modified by a novel harmonization method to match the target face. The face geometry is predicted as the source identity performing the target expression with a person-specific wrinkle style. Finally, the source face is re-rendered and blended into the target video using the lighting and camera parameters from the target video. Our method runs fully automatically and in real time on any target face, whether captured by a camera or taken from legacy video. More importantly, unlike existing deep-learning-based methods, our method does not require pre-training any models; that is, no large image/video dataset of the source or target face needs to be collected in advance for model training.
Date: Tuesday, April 2, 2019
Time: 2:30 - 4:30 PM
Place: Online Presentation - MS Teams Meeting
https://teams.microsoft.com/l/team/19%3afb9d5754f8484440abed99b071905485%40thread.tacv2/conversations?groupId=066e3443-0db2-4ae8-8a2c-4939ba9fd00c&tenantId=170bbabd-a2f0-4c90-ad4b-0e8f0f0c4259
Access code: ml36l0i
Advisor: Dr. Zhigang Deng
Faculty, students, and the general public are invited.