G-buffer Objaverse: High-Quality Rendering Dataset of Objaverse

Institute for Intelligent Computing, Alibaba Group
¹TIDE Rendering  ²3D Object Annotation and Generation  ³Simulation Platform

Samples of the dataset. From top to bottom: RGB, Albedo, Normal, and Depth images.

Introduction

G-buffer Objaverse (GObjaverse), proposed in Sparse3D (ECCV 2024), is rendered from Objaverse using the TIDE renderer on A10 GPUs for about 2,000 GPU hours, yielding 30,000,000 Albedo, RGB, Depth, and Normal images. We propose a rendering framework for high-quality, high-speed dataset rendering. The framework is a hybrid of rasterization and path tracing: the first ray-scene intersection is obtained by hardware rasterization, and accurate indirect lighting is computed by full hardware path tracing. Additionally, we use adaptive sampling, denoising, and path guiding to further reduce rendering time.

In this rendering framework, we render 38 views of each centered object: 24 views at elevations from 5° to 30° with rotations {r × 15° | r ∈ [0, 23]}, 12 views at elevations from -5° to 5° with rotations {r × 30° | r ∈ [0, 11]}, and 2 views for the top and bottom, respectively, as sketched below. In addition, we manually split the subset of the Objaverse dataset into 10 general categories: Human-Shape (41,557), Animals (28,882), Daily-Used (220,222), Furnitures (19,284), Buildings&&Outdoor (116,545), Transportations (20,075), Plants (7,195), Food (5,314), Electronics (13,252), and Poor-quality (107,001).
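
A minimal sketch (Python) of the 38-view camera layout described above. The function name is hypothetical, and the text does not specify how each elevation is chosen inside its stated range, so the uniform sampling here is an assumption.

    import random

    def gobjaverse_view_angles(seed=0):
        """Enumerate (elevation_deg, azimuth_deg) pairs for the 38 views."""
        rng = random.Random(seed)
        views = []
        # 24 views: elevation somewhere in [5, 30] deg (assumed uniform),
        # azimuth = r * 15 deg for r in [0, 23]
        for r in range(24):
            views.append((rng.uniform(5.0, 30.0), r * 15.0))
        # 12 views: elevation somewhere in [-5, 5] deg (assumed uniform),
        # azimuth = r * 30 deg for r in [0, 11]
        for r in range(12):
            views.append((rng.uniform(-5.0, 5.0), r * 30.0))
        # 2 extra views: top and bottom
        views += [(90.0, 0.0), (-90.0, 0.0)]
        assert len(views) == 38
        return views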


Video

Applications

Application: Label-free Image-Conditional Generation

In Sparse3D, we use 8 general classes (Human-Shape, Animals, Daily-Used, Furnitures, Transportations, Plants, Food, and Electronics) from Objaverse, which are labeled manually, and perform a label-free conditional generation experiment. The splits are generated by randomly shuffling the data and splitting it into proportions of 0.9, 0.05, and 0.05 for train, validation, and test, respectively. As shown in the figure, we visualize typical samples generated by Sparse3D, Point-E, Shap-E, and SyncDreamer.
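
A minimal sketch of the 0.9 / 0.05 / 0.05 split described above. The actual seed and any per-class handling used in Sparse3D are not specified here, so this only illustrates the proportions; the function name is hypothetical.

    import random

    def make_splits(object_ids, seed=0):
        """Shuffle and split into train / val / test with 0.9 / 0.05 / 0.05 proportions."""
        ids = list(object_ids)
        random.Random(seed).shuffle(ids)
        n_train = int(0.9 * len(ids))
        n_val = int(0.05 * len(ids))
        train = ids[:n_train]
        val = ids[n_train:n_train + n_val]
        test = ids[n_train + n_val:]
        return train, val, test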



Application: MultiView Normal-Depth Diffusion Model

Additionally, we have used G-buffer Objaverse to train a MultiView Normal-Depth diffusion model (ND-MV) and a depth-conditioned MultiView Albedo diffusion model (Albedo-MV), which are employed for 3D object generation through score-distillation sampling (SDS) in RichDreamer.
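
For reference, score-distillation sampling optimizes the parameters θ of a 3D representation by backpropagating a denoising residual through a rendered image x. The standard gradient (as introduced in DreamFusion) takes the form below, where x_t is the noised rendering, y the condition, ε̂_φ the diffusion model's noise prediction, and w(t) a timestep weighting; the exact weighting and multi-view conditioning used with ND-MV and Albedo-MV in RichDreamer may differ.

    \nabla_{\theta}\,\mathcal{L}_{\mathrm{SDS}}
        = \mathbb{E}_{t,\,\epsilon}\!\left[ w(t)\,\bigl(\hat{\epsilon}_{\phi}(x_t;\, y,\, t) - \epsilon\bigr)\,\frac{\partial x}{\partial \theta} \right]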


BibTeX

@inproceedings{zuo2024sparse3d,
  title={High-Fidelity 3D Textured Shapes Generation by Sparse Encoding and Adversarial Decoding},
  author={Zuo, Qi and Gu, Xiaodong and Dong, Yuan and Zhao, Zhengyi and Yuan, Weihao and Qiu, Lingteng and Bo, Liefeng and Dong, Zilong},
  booktitle={European Conference on Computer Vision},
  year={2024}
}

@inproceedings{qiu2024richdreamer,
  title={RichDreamer: A Generalizable Normal-Depth Diffusion Model for Detail Richness in Text-to-3D},
  author={Qiu, Lingteng and Chen, Guanying and Gu, Xiaodong and Zuo, Qi and Xu, Mutian and Wu, Yushuang and Yuan, Weihao and Dong, Zilong and Bo, Liefeng and Han, Xiaoguang},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={9914--9925},
  year={2024}
}

@article{objaverse,
  title={Objaverse: A Universe of Annotated 3D Objects},
  author={Deitke, Matt and Schwenk, Dustin and Salvador, Jordi and Weihs, Luca and Michel, Oscar and VanderBilt, Eli and Schmidt, Ludwig and Ehsani, Kiana and Kembhavi, Aniruddha and Farhadi, Ali},
  journal={arXiv preprint arXiv:2212.08051},
  year={2022}
}