GeoDream:
Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation

arXiv 2023
Baorui Ma1*,   Haoge Deng2,1*,   Junsheng Zhou3,1,   Yu-Shen Liu3,   Tiejun Huang1,4,   Xinlong Wang1

1 Beijing Academy of Artificial Intelligence,  2 BUPT,  3 Tsinghua University,  4 Peking University

* Equal contribution

Abstract

Text-to-3D generation by distilling pretrained large-scale text-to-image diffusion models has shown great promise but still suffers from inconsistent 3D geometric structures (the Janus problem) and severe artifacts. These problems mainly stem from the fact that 2D diffusion models lack 3D awareness during the lifting process. In this work, we present GeoDream, a novel method that incorporates explicit generalized 3D priors with 2D diffusion priors to enhance the capability of obtaining unambiguous, 3D-consistent geometric structures without sacrificing diversity or fidelity. Specifically, we first utilize a multi-view diffusion model to generate posed images and then construct a cost volume from the predicted images, which serves as native 3D geometric priors, ensuring spatial consistency in 3D space. Subsequently, we propose to harness the 3D geometric priors to unlock the great potential of 3D awareness in 2D diffusion priors via a disentangled design. Notably, disentangling the 2D and 3D priors allows us to refine the 3D geometric priors further. We show that the refined 3D geometric priors aid the 3D-aware capability of the 2D diffusion priors, which in turn provide superior guidance for further refinement of the 3D geometric priors. Our numerical and visual comparisons demonstrate that GeoDream generates more 3D-consistent textured meshes with high-resolution, realistic renderings (i.e., 1024 × 1024) and adheres more closely to semantic coherence.

Videos

Rendered Images from Generated Results

3D stylized game little building

A delicious creamy lemon cake

A flamingo standing on one leg

A colorful toucan with a large beak

More Results (Coming Soon)

Exported Textured Meshes

An astronaut riding a horse

A delicious creamy lemon cake

Wes Anderson style Red Panda, reading a book, super cute

A high quality photo of a dragon

More Results (Coming Soon)

Applications and Mesh Animations

Method Overview

We focus on generating 3D content with consistently accurate geometry and delicate visual detail by equipping 2D diffusion priors with the capability to produce 3D-consistent geometry while retaining their generalizability. GeoDream consists of the following two stages. i) Following MVS-based methods, given multi-view images predicted by multi-view diffusion models, we construct a cost volume as native 3D geometric priors within 2 minutes. ii) During priors refinement, we show that the geometric priors can be further fine-tuned to boost rendering quality and geometric accuracy by incorporating a 2D diffusion model. We argue that disentangling 3D and 2D priors is a potentially exciting direction for maintaining both the generalization of 2D diffusion priors and the consistency of 3D priors.
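To make the first stage concrete, below is a minimal NumPy sketch of the generic plane-sweep cost volume idea from MVS pipelines: reference-view pixels are back-projected onto a set of depth hypothesis planes, projected into the source views, and the per-view feature variance is used as the matching cost (low variance indicates a photo-consistent depth). All function and variable names are illustrative; this is not GeoDream's implementation, which operates on learned feature maps with differentiable (bilinear) warping rather than the nearest-neighbour sampling used here.

```python
import numpy as np

def build_cost_volume(feats, Ks, Rs, ts, depths, ref=0):
    """Illustrative plane-sweep cost volume (not the paper's code).

    feats:  list of (C, H, W) feature maps, one per view
    Ks:     per-view intrinsics, each (3, 3)
    Rs, ts: per-view world-to-camera rotation (3, 3) and translation (3,),
            i.e. x_cam = R @ x_world + t
    depths: (D,) depth hypotheses for the reference view
    Returns a (D, H, W) variance cost volume.
    """
    C, H, W = feats[ref].shape
    # Reference-view pixel grid in homogeneous coordinates, shape (3, H*W).
    xs, ys = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([xs, ys, np.ones_like(xs)], 0).reshape(3, -1).astype(np.float64)
    K_ref_inv = np.linalg.inv(Ks[ref])
    volume = np.empty((len(depths), H, W))
    for di, d in enumerate(depths):
        # Back-project reference pixels onto the depth-d plane, then to world frame.
        cam_pts = K_ref_inv @ pix * d                      # (3, H*W), ref camera frame
        world = Rs[ref].T @ (cam_pts - ts[ref][:, None])   # (3, H*W), world frame
        samples = []
        for v in range(len(feats)):
            if v == ref:
                samples.append(feats[v].reshape(C, -1))
                continue
            # Project the 3D points into source view v and sample features
            # (nearest neighbour with border clamping, for simplicity).
            cam = Rs[v] @ world + ts[v][:, None]
            uvw = Ks[v] @ cam
            u = np.clip(np.round(uvw[0] / uvw[2]).astype(int), 0, W - 1)
            vv = np.clip(np.round(uvw[1] / uvw[2]).astype(int), 0, H - 1)
            samples.append(feats[v][:, vv, u])
        # Variance across views, averaged over channels: the MVS matching cost.
        volume[di] = np.stack(samples).var(axis=0).mean(axis=0).reshape(H, W)
    return volume
```

A quick sanity check of the convention: if two views share the same pose and features, every depth hypothesis is trivially photo-consistent, so the variance cost is zero everywhere. In practice the cost volume is built from the posed images predicted by the multi-view diffusion model and then serves as the native 3D geometric prior that the second stage refines.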

BibTeX

@article{Ma2023GeoDream,
    title   = {GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation},
    author  = {Baorui Ma and Haoge Deng and Junsheng Zhou and Yu-Shen Liu and Tiejun Huang and Xinlong Wang},
    journal = {arXiv preprint arXiv:2311.17971},
    year    = {2023}
}