Text-to-3D generation by distilling pretrained large-scale text-to-image diffusion
models has shown great promise but still suffers from inconsistent 3D geometric
structures (Janus problems) and severe artifacts. The aforementioned problems mainly
stem from 2D diffusion models lacking 3D awareness during the lifting. In this work,
we present GeoDream, a novel method that incorporates explicit generalized 3D priors with
2D diffusion priors to enhance the capability of obtaining unambiguous 3D consistent
geometric structures without sacrificing diversity or fidelity. Specifically, we first
utilize a multi-view diffusion model to generate posed images and then construct a cost
volume from the predicted images, which serves as a native 3D geometric prior,
ensuring spatial consistency in 3D space. Subsequently, we further propose to harness
3D geometric priors to unlock the great potential of 3D awareness in 2D diffusion priors
via a disentangled design. Notably, disentangling 2D and 3D priors allows us to refine
3D geometric priors further. We show that the refined 3D geometric priors strengthen the
3D-aware capability of 2D diffusion priors, which in turn provide superior guidance for
the refinement of the 3D geometric priors. Our numerical and visual comparisons demonstrate
that GeoDream generates textured meshes with greater 3D consistency and high-resolution,
realistic renderings (i.e., 1024 × 1024), while adhering more closely to the semantics of the input prompts.
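
To make the first stage concrete, below is a minimal sketch (assuming PyTorch; this is not the authors' released code) of MVS-style plane-sweep cost volume construction from posed views, the kind of native 3D geometric prior described above. The function names, the world-to-camera extrinsics convention, and the variance-based aggregation across views are illustrative assumptions following standard multi-view stereo practice, not GeoDream's exact formulation.

import torch
import torch.nn.functional as F

def plane_sweep_warp(src_feat, K_src, E_src, K_ref, E_ref, depth):
    # Warp source-view features onto a fronto-parallel plane of the
    # reference view at the given sweep depth (plane-sweep stereo).
    # src_feat: (B, C, H, W); K_*: (3, 3) intrinsics; E_*: (4, 4)
    # world-to-camera extrinsics, shared across the batch for simplicity.
    B, C, H, W = src_feat.shape
    device = src_feat.device
    y, x = torch.meshgrid(
        torch.arange(H, device=device, dtype=torch.float32),
        torch.arange(W, device=device, dtype=torch.float32),
        indexing="ij")
    pix = torch.stack([x, y, torch.ones_like(x)], dim=0).reshape(3, -1)
    # Back-project reference pixels to 3D points at the sweep depth.
    cam_pts = torch.linalg.inv(K_ref) @ pix * depth
    # Reference camera -> world -> source camera.
    R_ref, t_ref = E_ref[:3, :3], E_ref[:3, 3:]
    R_src, t_src = E_src[:3, :3], E_src[:3, 3:]
    world_pts = R_ref.T @ (cam_pts - t_ref)
    src_pts = R_src @ world_pts + t_src
    # Project into the source image and normalize to [-1, 1] for grid_sample.
    proj = K_src @ src_pts
    uv = proj[:2] / proj[2:].clamp(min=1e-6)
    u = uv[0].reshape(H, W) / (W - 1) * 2 - 1
    v = uv[1].reshape(H, W) / (H - 1) * 2 - 1
    grid = torch.stack([u, v], dim=-1).unsqueeze(0).expand(B, H, W, 2)
    return F.grid_sample(src_feat, grid, align_corners=True)

def build_cost_volume(feats, Ks, Es, depths):
    # feats: list of per-view feature maps (B, C, H, W); view 0 is the
    # reference. Returns a (B, C, D, H, W) variance-based cost volume:
    # low variance across views indicates photo-consistent geometry.
    slices = []
    for d in depths:
        warped = [feats[0]] + [
            plane_sweep_warp(feats[i], Ks[i], Es[i], Ks[0], Es[0], d)
            for i in range(1, len(feats))]
        slices.append(torch.stack(warped, dim=0).var(dim=0))
    return torch.stack(slices, dim=2)

In MVS-based pipelines the per-view features would typically come from a learned encoder and the volume would be decoded into density or SDF values by a 3D network; the sketch uses raw features only for brevity.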
Gallery of GeoDream results for the prompts: "3D stylized game little building", "A delicious creamy lemon cake", "A flamingo standing on one leg", "A colorful toucan with a large beak", "An astronaut riding a horse", "Wes Anderson style Red Panda, reading a book, super cute", "A high quality photo of a dragon".
We focus on generating 3D content with consistently accurate geometry and delicate visual detail by equipping 2D diffusion priors with the capability to produce 3D consistent geometry while retaining their generalizability. GeoDream consists of two stages. i) Following MVS-based methods, given multi-view images predicted by multi-view diffusion models, we construct a cost volume as a native 3D geometric prior within 2 minutes. ii) During prior refinement, we show that the geometric prior can be further fine-tuned to boost rendering quality and geometric accuracy by incorporating guidance from a 2D diffusion model. We argue that disentangling 3D and 2D priors is a promising direction for maintaining both the generalization of 2D diffusion priors and the consistency of 3D priors.
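
As a rough illustration of the second stage, here is a minimal sketch (again assuming PyTorch) of one score-distillation refinement step in the style of DreamFusion's SDS loss, the standard way 2D diffusion priors guide 3D optimization; renderer, diffusion, and their methods (render_random_view, encode, add_noise, predict_noise, alphas_cumprod) are hypothetical stand-ins, not GeoDream's actual interfaces.

import torch
import torch.nn.functional as F

def sds_refinement_step(renderer, diffusion, text_emb, optimizer,
                        num_train_timesteps=1000):
    # One refinement step: render a random view, perturb its latents with
    # diffusion noise, and use the predicted-noise residual as a gradient
    # on the rendering (DreamFusion-style score distillation).
    image = renderer.render_random_view()       # (B, 3, H, W), differentiable
    latents = diffusion.encode(image)           # e.g., VAE latents
    t = torch.randint(20, num_train_timesteps, (latents.shape[0],),
                      device=latents.device)
    noise = torch.randn_like(latents)
    noisy = diffusion.add_noise(latents, noise, t)
    with torch.no_grad():                       # the 2D prior stays frozen
        noise_pred = diffusion.predict_noise(noisy, t, text_emb)
    w = (1.0 - diffusion.alphas_cumprod[t]).view(-1, 1, 1, 1)
    grad = w * (noise_pred - noise)
    # Reparameterize the SDS gradient as an MSE loss whose gradient
    # with respect to the latents equals `grad`.
    target = (latents - grad).detach()
    loss = 0.5 * F.mse_loss(latents, target, reduction="sum")
    optimizer.zero_grad()
    loss.backward()   # gradients flow into both geometry and texture params
    optimizer.step()
    return loss.item()

In this disentangled setup the frozen 2D prior supplies only view-wise gradients through rendered images, while the optimizer updates the 3D geometric prior, matching the mutual-refinement loop described in the abstract.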
@article{Ma2023GeoDream,
  title   = {GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation},
  author  = {Baorui Ma and Haoge Deng and Junsheng Zhou and Yu-Shen Liu and Tiejun Huang and Xinlong Wang},
  journal = {arXiv preprint arXiv:2311.17971},
  year    = {2023}
}