Pushing the envelope for estimating poses and actions via full 3D reconstruction
Author(s)
Baek, Seungryul
Type
Thesis or dissertation
Abstract
Estimating poses and actions of human bodies and hands is an important task in the computer vision community due to its vast applications, including human
computer interaction, virtual reality and augmented reality, medical image analysis. Challenges: There are many in-the-wild challenges in this task (see chapter 1). Among them, in this thesis, we focused on two challenges which could be relieved by incorporating the 3D geometry: (1) inherent 2D-to-3D ambiguity driven by the non-linear 2D projection process when capturing 3D objects. (2) lack of sufficient and quality annotated datasets due to the high-dimensionality of subjects' attribute space and inherent difficulty in annotating 3D coordinate values. Contributions: We first tried to jointly tackle the 2D-to-3D ambiguity and insufficient data issues by (1) explicitly reconstructing 2.5D and 3D samples and use them as new training data to train a pose estimator. Next, we tried to (2) encode 3D geometry in the training process of the action recognizer to reduce the 2D-to-3D ambiguity. In appendix, we proposed a (3) new hand pose synthetic dataset that can be used for more complete attribute changes and multi-modal experiments in the future. Experiments: Throughout experiments, we found interesting facts: (1) 2.5D depth map reconstruction and data augmentation can improve the accuracy of the depth-based hand pose estimation algorithm, (2) 3D mesh reconstruction can be used to generate a new RGB data and it improves the accuracy of RGB-based dense hand pose estimation algorithm, (3) 3D geometry from 3D poses and scene layouts could be successfully utilized to reduce the 2D-to-3D ambiguity in the action recognition problem.
computer interaction, virtual reality and augmented reality, medical image analysis. Challenges: There are many in-the-wild challenges in this task (see chapter 1). Among them, in this thesis, we focused on two challenges which could be relieved by incorporating the 3D geometry: (1) inherent 2D-to-3D ambiguity driven by the non-linear 2D projection process when capturing 3D objects. (2) lack of sufficient and quality annotated datasets due to the high-dimensionality of subjects' attribute space and inherent difficulty in annotating 3D coordinate values. Contributions: We first tried to jointly tackle the 2D-to-3D ambiguity and insufficient data issues by (1) explicitly reconstructing 2.5D and 3D samples and use them as new training data to train a pose estimator. Next, we tried to (2) encode 3D geometry in the training process of the action recognizer to reduce the 2D-to-3D ambiguity. In appendix, we proposed a (3) new hand pose synthetic dataset that can be used for more complete attribute changes and multi-modal experiments in the future. Experiments: Throughout experiments, we found interesting facts: (1) 2.5D depth map reconstruction and data augmentation can improve the accuracy of the depth-based hand pose estimation algorithm, (2) 3D mesh reconstruction can be used to generate a new RGB data and it improves the accuracy of RGB-based dense hand pose estimation algorithm, (3) 3D geometry from 3D poses and scene layouts could be successfully utilized to reduce the 2D-to-3D ambiguity in the action recognition problem.
Version
Open Access
Date Issued
2019-09
Date Awarded
2020-03
Copyright Statement
Creative Commons Attribution NonCommercial Licence
Advisor
Kim, Tae-Kyun
Publisher Department
Electrical and Electronic Engineering
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)