About me

I am a researcher at StepFun, working with Dr. Gang Yu on advancing AI-generated content (AIGC), personalized content creation, 3D generation, and computer graphics. My work focuses on applying advanced AI techniques to creative content and exploring their potential applications. Prior to joining StepFun, I held research roles at Tencent, SenseTime Research, and Shanghai AI Lab.

Selected Projects

  • Step1X-Edit

    Step1X-Edit & GEdit-Bench

    Step1X-Edit is an open-source general editing model that achieves proprietary-level performance with comprehensive editing capabilities.

    GEdit-Bench is a benchmark that evaluates editing models on genuine user instructions.
    >> Project Page

  • OmniSVG

    OmniSVG

    OmniSVG is a family of SVG generation models built on the pre-trained vision-language model Qwen-VL and equipped with an SVG tokenizer. It progressively generates high-quality SVGs across a wide spectrum of complexity, from simple icons to intricate anime characters. It demonstrates remarkable versatility through multiple generation modalities, including Text-to-SVG, Image-to-SVG, and Character-Reference SVG, making it a powerful and flexible solution for diverse creative tasks.
    >> Project Page

  • MVPaint

    MVPaint

    MVPaint explores synchronized multi-view diffusion to create consistent, detailed 3D textures from textual descriptions, delivering seamless, high-resolution textures with minimal dependence on UV unwrapping.
    >> Project Page

  • MeshXL

    MeshXL

    MeshXL is a family of generative pre-trained foundation models for 3D mesh generation. With the Neural Coordinate Field representation, the generation of unstructured 3D mesh data can be seamlessly addressed by modern LLM methods.
    >> Project Page

  • DNA-Rendering

    DNA-Rendering

    DNA-Rendering is a large-scale, high-fidelity repository of human performance data for neural actor rendering, containing a large volume of data with diverse attributes and rich annotations. Along with the dataset, a large-scale, quantitative benchmark covering multiple human rendering tasks is provided.
    >> Project Page

  • GNR & GeneBody

    GNR & GeneBody

    Generalizable Neural Performer (GNR) learns a generalizable and robust neural body representation over varied geometry and appearance, using a Geometric Body Embedding strategy that anchors body shape priors to the implicit field and Screen-Space Occlusion-Aware Appearance Blending to aid image blending from source views. The GeneBody dataset is constructed to demonstrate the effectiveness of the proposed algorithm.
    >> Project Page

  • RenderMe-360

    RenderMe-360

    RenderMe-360 is a comprehensive 4D human head dataset that drives advances in head avatar research, containing massive data assets with high fidelity, high diversity, and rich annotations. It also provides a comprehensive benchmark for head avatar research, with 16 state-of-the-art methods evaluated on five main tasks, opening the door for future exploration in head avatars.
    >> Project Page

  • MonoHuman

    MonoHuman

    MonoHuman robustly renders view-consistent, high-fidelity avatars under arbitrary novel poses from monocular videos. The key insight is to model the deformation field with bi-directional constraints and explicitly leverage off-the-shelf keyframe information to reason about feature correlations for coherent results. Extensive experiments demonstrate the superiority of MonoHuman over state-of-the-art methods.
    >> Project Page

Resume

Education

  1. M.Phil, Hong Kong University of Science and Technology

    Thesis: Human Reconstruction and Motion Capture using a Single Flying Camera
    pdf, e-print

  2. B.Eng., Huazhong University of Science and Technology

Publications

  1. * denotes equal contribution, † denotes project lead.

  2. Yiying Yang*, Wei Cheng*, Sijin Chen, Xianfang Zeng, and others. 2025. OmniSVG: A Unified Scalable Vector Graphics Generation Model. In arXiv preprint 2025.

    arXiv, project page, code, data

  3. Shiyu Liu, Yucheng Han, Peng Xing, Fukun Yin, Rui Wang, Wei Cheng, and others. 2025. Step1X-Edit: A Practical Framework for General Image Editing. In arXiv preprint 2025.

    arXiv, project page, code, benchmark

  4. Weiyu Li, Xuanyang Zhang, Zheng Sun, Di Qi, Hao Li, Wei Cheng, and others. 2025. Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets. In arXiv preprint 2025.

    arXiv, project page, code, demo, data

  5. Wei Cheng*†, Juncheng Mu*, Xianfang Zeng, Xin Chen, and others. 2025. MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D. In CVPR 2025.

    arXiv, project page, video, code

  6. Cailin Zhuang, Yaoqi Hu, Xuanyang Zhang, Wei Cheng, Jiacheng Bao, and others. 2025. StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians. In arXiv preprint 2025.

    arXiv, project page, code

  7. Chongjun Tu*, Lin Zhang*, Pengtao Chen*, Peng Ye, Xianfang Zeng, Wei Cheng and others. 2025. FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding. In arXiv preprint 2025.

    arXiv, project page, code, benchmark

  8. Sijin Chen, Xin Chen, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu and others. 2024. MeshXL: Neural Coordinate Field for Generative 3D Foundation Models. In NeurIPS 2024.

    arXiv, project page, video, code

  9. Wei Cheng, Ruixiang Chen*, Wanqi Yin*, Siming Fan*, Keyu Chen*, Honglin He, Huiwen Luo, Zhongang Cai, Jingbo Wang, Yang Gao, and others. 2023. DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering. In ICCV 2023.

    arXiv, project page, video, code

  10. Dongwei Pan, Long Zhuo*, Jingtan Piao*, Huiwen Luo*, Wei Cheng*, Yuxin Wang*, Siming Fan, Shengqi Liu, Lei Yang, Bo Dai, and others. RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars. In NeurIPS 2023 Dataset and Benchmark Track.

    arXiv, project page, video, code

  11. Wei Cheng, Su Xu, Jingtan Piao, Chen Qian, Wayne Wu, Kwan-Yee Lin, and Hongsheng Li. Generalizable Neural Performer: Learning Robust Radiance Fields for Human Novel View Synthesis. In arXiv preprint 2022.

    arXiv, project page, video, code

  12. Zhengming Yu, Wei Cheng, Xian Liu, Wayne Wu, and Kwan-Yee Lin. MonoHuman: Animatable Human Neural Field from Monocular Video. In CVPR 2023.

    arXiv, project page, video, code

  13. Lan Xu, Wei Cheng, Kaiwen Guo, Lei Han, Yebin Liu, and Lu Fang. FlyFusion: Realtime Dynamic Scene Reconstruction using a Flying Depth Camera. In TVCG 2019.

    paper, video

  14. Wei Cheng*, Lan Xu*, Lei Han, Yuanfang Guo, and Lu Fang. 2018. iHuman3D: Intelligent Human Body 3D Reconstruction using a Single Flying Camera. In ACMMM 2018.

    paper, video

  15. Lan Xu, Yebin Liu, Wei Cheng, Kaiwen Guo, Guyue Zhou, Qionghai Dai, and Lu Fang. FlyCap: Markerless Motion Capture using Multiple Autonomous Flying Cameras. In TVCG 2017.

    paper