HyperSDFusion: Bridging hierarchical structures in language and geometry for enhanced 3D Text2Shape generation
Author(s)
Leng, Zhiying
Birdal, Tolga
Liang, Xiaohui
Tombari, Federico
Type
Conference Paper
Abstract
3D shape generation from text is a fundamental task in 3D representation learning. Text-shape pairs exhibit a hierarchical structure: a general text like "chair" covers all 3D shapes of chairs, while more detailed prompts refer to more specific shapes. Furthermore, both text and 3D shapes are inherently hierarchical structures. However, existing Text2Shape methods, such as SDFusion, do not exploit this hierarchy. In this work, we propose HyperSDFusion, a dual-branch diffusion model that generates 3D shapes from a given text. Since hyperbolic space is suitable for handling hierarchical data, we propose to learn the hierarchical representations of text and 3D shapes in hyperbolic space. First, we introduce a hyperbolic text-image encoder to learn the sequential and multi-modal hierarchical features of text in hyperbolic space. In addition, we design a hyperbolic text-graph convolution module to learn the hierarchical features of text in hyperbolic space. To fully utilize these text features, we introduce a dual-branch structure to embed text features in 3D feature space. Finally, to endow the generated 3D shapes with a hierarchical structure, we devise a hyperbolic hierarchical loss. Our method is the first to explore hyperbolic hierarchical representations for text-to-shape generation. Experiments on the existing paired text-to-shape dataset, Text2Shape, show that our method achieves state-of-the-art results. We release our implementation at HyperSDFusion.github.io.
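The abstract's premise that hyperbolic space suits hierarchical data can be illustrated with the standard Poincaré-ball model. The sketch below is a generic illustration, not the HyperSDFusion implementation: it assumes the unit ball (curvature -1), and the helper names `expmap0` and `poincare_distance` are hypothetical. It maps Euclidean feature vectors onto the ball and measures geodesic distance between them.

```python
import math

def expmap0(v):
    """Exponential map at the origin of the unit Poincare ball (curvature -1).

    Takes a Euclidean (tangent-space) vector and returns a point strictly
    inside the unit ball. Hypothetical helper for illustration only.
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    scale = math.tanh(norm) / norm
    return [scale * x for x in v]

def poincare_distance(x, y):
    """Geodesic distance between two points of the unit Poincare ball."""
    def sq_norm(u):
        return sum(a * a for a in u)
    diff = sq_norm([a - b for a, b in zip(x, y)])
    denom = (1.0 - sq_norm(x)) * (1.0 - sq_norm(y))
    return math.acosh(1.0 + 2.0 * diff / denom)

# Toy hierarchy (hypothetical): a general concept at the origin and two
# more specific concepts pushed toward the boundary of the ball.
root = [0.0, 0.0]
leaf_a = expmap0([3.0, 0.0])
leaf_b = expmap0([0.0, 3.0])
print(poincare_distance(root, leaf_a))
print(poincare_distance(leaf_a, leaf_b))
```

Here each leaf lies at distance 6 from the root, while the two leaves are about 11.31 apart, close to the 12 of a path through the root. Distances therefore behave much like path lengths in a tree; for the same points under the Euclidean metric, the direct distance is only about 0.71 of the path through the origin. This tree-like behavior is the usual motivation for embedding hierarchies such as "chair" versus a detailed chair description in hyperbolic space.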
Date Issued
2024-09-16
Date Acceptance
2024-06-01
Citation
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp.19691-19700
Publisher
IEEE
Start Page
19691
End Page
19700
Journal / Book Title
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Copyright Statement
©2024 The Author(s). This CVPR paper is the Open Access version, provided by the Computer Vision Foundation.
Except for this watermark, it is identical to the accepted version; the final published version of the proceedings is available on IEEE Xplore.
Identifier
https://doi.org/10.1109/cvpr52733.2024.01862
Source
Conference on Computer Vision and Pattern Recognition (CVPR)
Publication Status
Published
Start Date
2024-06-16
Finish Date
2024-06-22
Coverage Spatial
Seattle, WA, USA