Soon, anybody can be animated everywhere!

Video from the official website


In a recent research effort at the Institute for Intelligent Computing, Alibaba Group, researchers have unveiled a paper titled “En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data.” The study represents a significant step forward in generative modeling of 3D human avatars.

Video from the official website

The core innovation of En3D lies in its departure from traditional approaches, which are constrained by scarce 3D datasets and imbalanced 2D collections. Unlike its predecessors, En3D is a zero-shot 3D generative scheme that produces visually realistic, geometrically accurate, and content-diverse 3D human avatars without relying on pre-existing 3D or 2D assets.

Methodologically, En3D follows a carefully crafted workflow that leverages accurate physical modeling. By learning from synthetic 2D data, the approach ensures that the generative model produces high-quality results. During the inference phase, En3D integrates optimization modules that bridge the gap between realistic appearances and coarse 3D shapes.


An overview of the proposed system

Official Page

The proposed scheme is structured around three integral modules: 3D generative modeling (3DGM), geometric sculpting (GS), and explicit texturing (ET). The 3DGM phase employs a triplane-based architecture that learns generalizable 3D human representations from synthesized, diverse, balanced, and structured human images with accurate camera parameters (φ). The GS module is an optimization component that incorporates multi-view normal constraints to refine geometry details. ET, in turn, uses UV partitioning and a differentiable rasterizer to disentangle explicit UV texture maps. Together, these modules yield both multi-view renderings and realistic 3D models as the final results.
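To make the triplane idea concrete, below is a minimal sketch of triplane feature sampling in the common EG3D-style formulation that triplane architectures typically follow. The function name, tensor layout, and fusion by summation are illustrative assumptions, not En3D's published implementation:

import torch
import torch.nn.functional as F

def sample_triplane(planes, xyz):
    # Hypothetical sketch; not the authors' API.
    # planes: (B, 3, C, H, W) feature planes for the XY, XZ, and YZ axes
    # xyz:    (B, N, 3) query points, assumed to lie in [-1, 1]^3
    B, _, C, H, W = planes.shape
    # Project each 3D point onto the three axis-aligned planes.
    coords = torch.stack([
        xyz[..., [0, 1]],  # XY plane
        xyz[..., [0, 2]],  # XZ plane
        xyz[..., [1, 2]],  # YZ plane
    ], dim=1)                               # (B, 3, N, 2)
    grid = coords.reshape(B * 3, 1, -1, 2)  # grid_sample expects (B, H_out, W_out, 2)
    feats = F.grid_sample(
        planes.reshape(B * 3, C, H, W), grid,
        mode="bilinear", align_corners=False,
    )                                       # (B * 3, C, 1, N)
    feats = feats.reshape(B, 3, C, -1).sum(dim=1)  # fuse the three planes by summation
    return feats.permute(0, 2, 1)           # (B, N, C) per-point features

The per-point features would then be decoded into density and color and volume-rendered under the camera parameters φ to produce the multi-view images.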


To recap the three modules: the 3D generator models generalizable 3D humans with realistic appearance, thanks to the synthesized, balanced, diverse, and structured image set; the geometry sculptor raises shape quality with multi-view normal constraints, capturing human anatomy in fine detail; and the texturing module disentangles explicit texture maps, with both fidelity and editability, via semantic UV partitioning and a differentiable rasterizer.
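As a rough illustration of the geometry sculptor's multi-view normal constraint, the sketch below penalizes disagreement between normals rendered from the current mesh and reference normals for the same views. The function name, tensor layout, and cosine-based penalty are assumptions for illustration; the paper's exact objective may differ:

import torch

def normal_consistency_loss(mesh_normals, ref_normals, masks):
    # Hypothetical sketch; not the authors' API.
    # mesh_normals: (V, H, W, 3) unit normals rendered from the current mesh, one image per view
    # ref_normals:  (V, H, W, 3) unit reference normals predicted for the same V views
    # masks:        (V, H, W)    foreground masks selecting valid pixels
    cos = (mesh_normals * ref_normals).sum(dim=-1)  # per-pixel cosine similarity
    return ((1.0 - cos) * masks).sum() / masks.sum().clamp(min=1.0)

Likewise, the explicit texturing stage ultimately amounts to looking up an explicit UV texture map at each fragment's UV coordinates, which a differentiable rasterizer provides. A minimal sketch of that lookup, again with assumed names and layouts:

import torch
import torch.nn.functional as F

def sample_uv_texture(texture, uv):
    # Hypothetical sketch; not the authors' API.
    # texture: (1, C, H, W) explicit UV texture map
    # uv:      (1, N, 2) per-fragment UV coordinates in [0, 1]
    grid = uv.unsqueeze(1) * 2.0 - 1.0  # map to grid_sample's [-1, 1] range
    colors = F.grid_sample(texture, grid, mode="bilinear", align_corners=False)
    return colors.squeeze(2).permute(0, 2, 1)  # (1, N, C) per-fragment colors

Because the lookup is differentiable, image-space losses can optimize the texture map directly, while the explicit UV representation keeps it editable.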

Experimental results demonstrate En3D's superiority over prior work in image quality, geometry accuracy, and content diversity. The generated avatars also integrate seamlessly into animation and editing pipelines, highlighting the scalability and adaptability of the approach across content styles.

The paper “En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data” stands as a notable contribution to AI and generative modeling.


Paper Citation

@article{men2024en3d,
  title={En3D: An Enhanced Generative Model for Sculpting 3D Humans from 2D Synthetic Data},
  author={Men, Yifang and Lei, Biwen and Yao, Yuan and Cui, Miaomiao and Lian, Zhouhui and Xie, Xuansong},
  journal={arXiv preprint arXiv:2401.01173},
  year={2024}
}
