Motionshop-2

Institute for Intelligent Computing, Alibaba Group
¹3D AIGC   ²TIDE Rendering
*Engineering Lead   ⁺Project Lead

Abstract

We present Motionshop-2, an advanced version of the original Motionshop, which improves three core modules and adds an essential feature. The improvements in the three core modules are as follows: (1) the inpainting module has undergone substantial optimization, resulting in significantly improved processing speed; (2) the accuracy of the human pose estimation module has greatly improved compared to the previous version; and (3) the overall rendering quality of our system is considerably better than before, particularly in terms of visual perception. As for the new feature, Motionshop-2 introduces hybrid rendering, allowing for the simultaneous rendering of 3D Gaussian models and manually designed traditional meshes in certain scenes. To help users explore this new feature, we design AniGS, a system that efficiently produces realistic animatable 3D Gaussian avatars from a single image.

More Visual Results

Please refer to AniGS!

Upgrades

Video Inpainting

In this update, we have made substantial engineering optimizations to the video inpainting module. Consequently, the generation speed has improved by 20%, while the quality of the generated videos has not degraded compared to the previous version. These enhancements ensure a more efficient workflow and a smoother user experience, making it easier to achieve high-quality video restoration with faster processing times.

Human Pose Estimation

In this update, we first collect a large-scale dataset of human video sequences and run an off-the-shelf human pose estimator to predict SMPL-X parameters. Subsequently, we manually correct incorrect SMPL-X predictions. Then, we fine-tune the SMPL-X estimator on our new dataset. Compared to the previous version, our new model achieves significant improvements, especially in the locations of the feet and hands. Additionally, we incorporate a temporal regularization term to enhance the smoothness of the output SMPL-X sequences. These enhancements allow Motionshop-2 to generate smoother and more accurate human animation sequences.
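The exact form of the temporal regularization term is not specified here; a common choice is a first-order smoothness penalty on the per-frame SMPL-X parameters. Below is a minimal sketch of such a penalty (the function name and the `(T, D)` parameter layout are illustrative assumptions, not the released implementation):

```python
import numpy as np

def temporal_regularization(pose_seq: np.ndarray, weight: float = 1.0) -> float:
    """First-order temporal smoothness penalty on a SMPL-X parameter sequence.

    pose_seq: (T, D) array of per-frame SMPL-X parameters
              (e.g. body pose rotations flattened per frame).
    Returns a scalar loss that grows with frame-to-frame jitter.
    """
    diffs = pose_seq[1:] - pose_seq[:-1]        # (T-1, D) finite differences
    return weight * float(np.mean(diffs ** 2))  # mean squared "velocity"
```

In practice such a term is added to the estimator's fitting objective, so jittery parameter trajectories are penalized while slowly varying ones are nearly free; a second-order (acceleration) variant is an equally common alternative.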

UniTIDE Rendering

In this update, our rendering pipeline expands its capability to support hybrid rendering of triangle meshes with 3D Gaussian splats (see New Feature). Additionally, our TIDE team has substantially optimized and updated the core rendering algorithm, resulting in more realistic and vivid rendering quality. For example, we add a relighting function to the traditional Gaussian-splatting rasterization, which enhances the realism and harmony of the rendering results.
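The section does not detail how the mesh and Gaussian-splat passes are merged; one standard approach is to render each representation to its own color and depth buffers and then composite them with a per-pixel depth test. The following sketch illustrates that idea (the function and buffer names are illustrative assumptions, not the UniTIDE implementation):

```python
import numpy as np

def composite_hybrid(mesh_rgb: np.ndarray, mesh_depth: np.ndarray,
                     gs_rgb: np.ndarray, gs_depth: np.ndarray) -> np.ndarray:
    """Depth-composite a rasterized mesh pass with a Gaussian-splatting pass.

    mesh_rgb, gs_rgb:     (H, W, 3) color buffers
    mesh_depth, gs_depth: (H, W) depth buffers (smaller = closer to camera)
    Returns the (H, W, 3) merged image: at each pixel, the closer layer wins.
    """
    mesh_wins = (mesh_depth < gs_depth)[..., None]  # (H, W, 1) boolean mask
    return np.where(mesh_wins, mesh_rgb, gs_rgb)
```

A production renderer would additionally handle alpha blending at splat boundaries, but a hard per-pixel depth test is the core of making the two representations occlude each other correctly.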

New Feature 🔥🔥🔥

We are excited to announce a significant update that introduces a new feature: Hybrid Rendering Support. This enhancement allows for the simultaneous rendering of generated Gaussian models and manually designed traditional meshes. To help users explore this new feature, we present a fancy Gaussian model generation system: 3D Animation Engine for Your Photos!

3D Animation Engine for your Photos

Generating animatable characters from a single image is challenging due to the lack of sufficient global information when relying solely on that image. Recently, our community has witnessed a bloom of diffusion-based image-to-video (I2V) models and their extensions. Several pioneering works, like Animate Anyone, Champ, and MIMO, have explored how to generate human animation videos conditioned on a reference image and human motion hints. Although these diffusion-based human animation methods obtain compelling results, they are time-consuming and memory-intensive.

In this project, we propose a fancy Gaussian model generation system that not only effectively generates animatable characters from any single image input but also achieves results competitive with diffusion-based human animation approaches. Instead of generating each frame solely with a video diffusion model and no intermediate representation, we utilize 3D Gaussian Splatting (3DGS) as a representation to reconstruct an A-pose 3D character informed by a multi-view diffusion prior model. More details can be found in AniGS. Once generated, the 3DGS avatar can be seamlessly integrated into Motionshop-2. Notably, our model requires only a single run and enables real-time animation on an RTX 3090 with 24GB of memory.
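Real-time animation of a reconstructed A-pose avatar is typically achieved by driving the Gaussian primitives with linear blend skinning (LBS) over the SMPL-X skeleton. The sketch below shows the core step for the Gaussian means only, assuming skinning weights and per-bone pose transforms are already available (the function name and array layouts are illustrative assumptions, not the AniGS code):

```python
import numpy as np

def skin_gaussian_centers(centers: np.ndarray, skin_weights: np.ndarray,
                          bone_transforms: np.ndarray) -> np.ndarray:
    """Linear blend skinning of 3D Gaussian centers.

    centers:         (N, 3) A-pose Gaussian means
    skin_weights:    (N, J) per-Gaussian bone weights (rows sum to 1)
    bone_transforms: (J, 4, 4) rigid transform of each bone for the target pose
    Returns the (N, 3) posed Gaussian means.
    """
    homo = np.concatenate([centers, np.ones((len(centers), 1))], axis=1)  # (N, 4)
    # Blend the per-bone 4x4 transforms with the skinning weights: (N, 4, 4)
    blended = np.einsum('nj,jab->nab', skin_weights, bone_transforms)
    posed = np.einsum('nab,nb->na', blended, homo)                        # (N, 4)
    return posed[:, :3]
```

A full pipeline would also rotate each Gaussian's covariance (or quaternion) by the blended rotation, but because this step is a few batched matrix products per frame, it maps directly onto the GPU and supports the real-time playback described above.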