|Abstract: ||3D hand pose regression is a fundamental component of many modern human-computer interaction applications, such as sign language recognition, virtual object manipulation and game control. This thesis focuses on 3D pose regression of a single hand from depth data. The problem poses many challenges, including high degrees of freedom, severe viewpoint changes, self-occlusion and sensor noise.
The main contribution of this work is a series of decision forest-based methods, proposed in a progressive manner so that each improves upon the previous one and the last achieves state-of-the-art performance. The thesis first introduces a novel algorithm called semi-supervised transductive regression (STR) forest, which combines transductive learning and semi-supervised learning to bridge the gap between synthetically generated, noise-free training data and real noisy data. Moreover, it incorporates a coarse-to-fine training quality function to handle viewpoint changes more efficiently. As a patch-based method, the STR forest has high complexity during inference. To address this, the thesis proposes the latent regression forest (LRF), a method that models pose estimation as a coarse-to-fine search. This inherently combines the efficiency of a holistic method with the flexibility of a patch-based method, and runs at 62.5 FPS without CPU/GPU optimisation. Targeting the drawbacks of LRF, a new algorithm called hierarchical sampling forests is proposed, which models the problem as a progressive search guided by kinematic structure. The intermediate results (partial poses) can therefore be verified by a new, efficient energy function, and consequently the method produces more accurate full poses. All these methods are thoroughly described and compared, and have been published. The conclusion discusses and analyses their differences, limitations and usage scenarios, and proposes a few ideas for future work.|