About me

I am a researcher at StepFun, working with Dr. Gang Yu on advancing AI-generated content (AIGC), personalized content creation, 3D generation, and computer graphics. My work focuses on applying advanced AI techniques to creative content and exploring their practical applications. Prior to joining StepFun, I held research roles at Tencent, SenseTime Research, and Shanghai AI Lab.

Selected Projects

  • Step1X-Edit

    Step1X-Edit & GEdit-Bench

    Step1X-Edit is an open-source general image editing model that delivers comprehensive editing capabilities with performance comparable to proprietary models.

    GEdit-Bench is a benchmark that evaluates editing models on genuine user instructions.
    >> Project Page

  • OmniSVG

    OmniSVG

    OmniSVG is a family of SVG generation models built on the pre-trained vision-language model Qwen-VL and equipped with an SVG tokenizer. It progressively generates high-quality SVGs across a wide spectrum of complexity, from simple icons to intricate anime characters, and supports multiple generation modalities, including Text-to-SVG, Image-to-SVG, and Character-Reference SVG, making it a powerful and flexible solution for diverse creative tasks. A toy sketch of the tokenization idea is given below.
    >> Project Page
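
    The snippet below is a minimal, illustrative sketch (plain Python) of the general idea behind an SVG tokenizer: path commands and quantized coordinates are flattened into one discrete token stream that an autoregressive vision-language model can learn to predict. The command vocabulary and quantization scheme here are hypothetical placeholders, not OmniSVG's actual tokenizer.

    ```python
    # Toy SVG "tokenizer": flatten path commands into a discrete token sequence.
    CMD_TOKENS = {"M": 0, "L": 1, "C": 2, "Z": 3}   # moveto, lineto, cubic Bezier, closepath
    NUM_CMDS = len(CMD_TOKENS)
    GRID = 256  # quantize coordinates onto a 256 x 256 grid

    def quantize(coord, grid=GRID):
        """Map a coordinate in [0, 1] to an integer bin, offset past the command ids."""
        return NUM_CMDS + min(int(coord * grid), grid - 1)

    def tokenize_path(commands):
        """commands: list of (cmd, [(x, y), ...]) pairs with coordinates normalized to [0, 1]."""
        tokens = []
        for cmd, points in commands:
            tokens.append(CMD_TOKENS[cmd])
            for x, y in points:
                tokens.extend([quantize(x), quantize(y)])
        return tokens

    # A triangle: a decoder-only model would emit such a sequence token by token.
    triangle = [("M", [(0.1, 0.1)]), ("L", [(0.9, 0.1)]), ("L", [(0.5, 0.9)]), ("Z", [])]
    print(tokenize_path(triangle))
    ```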

  • MVPaint

    MVPaint

    MVPaint explores synchronized multi-view diffusion to create consistent and detailed 3D textures from textual descriptions, delivering seamless, high-resolution textures with minimal dependency on UV unwrapping. A toy sketch of the per-step view synchronization is given below.
    >> Project Page
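
    The sketch below illustrates only the synchronization step: after each denoising iteration, per-view predictions are aggregated in a shared UV texture space and broadcast back to every view, so the views cannot drift apart. The visibility masks and the "denoiser" are toy stand-ins under assumed correspondences, not MVPaint's actual pipeline.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    num_views, tex_size = 4, 64

    # Hypothetical view-to-UV correspondence: each view observes a subset of texels.
    visible = [rng.random(tex_size * tex_size) < 0.5 for _ in range(num_views)]

    def fake_denoise(view_estimate):
        """Stand-in for one per-view diffusion denoising step."""
        return view_estimate + 0.1 * rng.standard_normal(view_estimate.shape)

    texture = np.zeros(tex_size * tex_size)  # shared UV texture estimate
    for step in range(10):
        per_view = [fake_denoise(np.where(visible[v], texture, 0.0)) for v in range(num_views)]
        # Synchronize: average every texel over the views that observe it.
        num = sum(np.where(visible[v], per_view[v], 0.0) for v in range(num_views))
        den = sum(visible[v].astype(float) for v in range(num_views))
        texture = num / np.maximum(den, 1.0)
    ```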

  • MeshXL

    MeshXL

    MeshXL is a family of generative pre-trained foundation models for 3D mesh generation. With the Neural Coordinate Field representation, the generation of unstructured 3D mesh data can be seamlessly addressed by modern LLM methods; a toy sketch of the underlying mesh-to-sequence idea is given below.
    >> Project Page
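
    The snippet below is a simplified sketch of the mesh-to-sequence intuition: a triangle mesh is flattened, face by face, into one long sequence of discretized vertex coordinates that a decoder-only transformer can model autoregressively. The ordering and quantization are illustrative placeholders, not MeshXL's exact Neural Coordinate Field formulation.

    ```python
    import numpy as np

    BINS = 128  # coordinate quantization resolution

    def mesh_to_sequence(vertices, faces):
        """vertices: (V, 3) floats in [0, 1]; faces: (F, 3) vertex indices."""
        # Sort faces for a canonical ordering before serialization.
        ordered = sorted(map(tuple, faces))
        seq = []
        for face in ordered:
            for vid in face:
                q = np.clip((vertices[vid] * BINS).astype(int), 0, BINS - 1)
                seq.extend(int(c) for c in q)
        return seq  # 9 tokens per face: (x, y, z) for each of its 3 vertices

    # A single unit triangle becomes a 9-token sequence.
    verts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    faces = np.array([[0, 1, 2]])
    print(mesh_to_sequence(verts, faces))
    ```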

  • DNA-Rendering

    DNA-Rendering

    DNA-Rendering is a large-scale, high-fidelity repository of human performance data for neural actor rendering, containing a large volume of data with diverse attributes and rich annotations. Along with the dataset, a large-scale quantitative benchmark covering multiple human-rendering tasks is provided.
    >> Project Page

  • GNR & GeneBody

    GNR & GeneBody

    Generalizable Neural Performer (GNR) learns a generalizable and robust neural body representation across varied geometry and appearance, using a Geometric Body Embedding strategy that anchors body shape priors to the implicit field and Screen-Space Occlusion-Aware Appearance Blending that blends images from the source views. The GeneBody dataset is constructed to demonstrate the effectiveness of the proposed algorithm.
    >> Project Page

  • RenderMe-360

    RenderMe-360

    RenderMe-360 is a comprehensive 4D human head dataset designed to drive advances in head avatar research, containing massive data assets with high fidelity, high diversity, and rich annotations. A comprehensive benchmark is provided, with 16 state-of-the-art methods evaluated on five main tasks, opening the door for future exploration of head avatars.
    >> Project Page

  • MonoHuman

    MonoHuman

    MonoHuman robustly renders view-consistent, high-fidelity avatars under arbitrary novel poses from monocular video. The key insight is to model the deformation field with bi-directional constraints and to explicitly leverage off-the-peg keyframe information to reason about feature correlations for coherent results; a toy sketch of the bi-directional constraint is given below. Extensive experiments demonstrate the superiority of MonoHuman over state-of-the-art methods.
    >> Project Page
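
    The sketch below shows only the bi-directional constraint in its simplest form: a forward field maps points from observation space to a canonical pose, a backward field maps them back, and a cycle-consistency loss ties the two together. The tiny MLPs and residual parameterization are illustrative assumptions, not MonoHuman's actual architecture.

    ```python
    import torch
    import torch.nn as nn

    def mlp():
        return nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))

    forward_deform = mlp()    # observation space -> canonical space (residual)
    backward_deform = mlp()   # canonical space  -> observation space (residual)

    x_obs = torch.rand(1024, 3)              # points sampled in observation space
    x_can = x_obs + forward_deform(x_obs)    # deform to the canonical pose
    x_rec = x_can + backward_deform(x_can)   # deform back to observation space

    # Bi-directional (cycle) consistency: forward then backward should return
    # every point to where it started.
    cycle_loss = (x_rec - x_obs).norm(dim=-1).mean()
    cycle_loss.backward()
    ```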

Resume

Education

  1. M.Phil, Hong Kong University of Science and Technology

    Thesis: Human Reconstruction and Motion Capture using a Single Flying Camera
    pdf, e-print

  2. B.Eng., Huazhong University of Science and Technology

Publications

  1. * denotes equal contribution, † denotes project lead.

  2. OmniSVG: A Unified Scalable Vector Graphics Generation Model. In arXiv preprint 2025.
    Yiying Yang*, Wei Cheng*, Sijin Chen, Xianfang Zeng, and others.

    arXiv project page code data

  3. Step1X-Edit: A Practical Framework for General Image Editing. In arXiv preprint 2025.
    Shiyu Liu, Yucheng Han, Peng Xing, Fukun Yin, Rui Wang, Wei Cheng, and others.

    arXiv project page code benchmark

  4. Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets. In arXiv preprint 2025.
    Weiyu Li, Xuanyang Zhang, Zheng Sun, Di Qi, Hao Li, Wei Cheng, and others.

    arXiv project page code demo data

  5. MVPaint: Synchronized Multi-View Diffusion for Painting Anything 3D. In CVPR 2025.
    Wei Cheng*†, Juncheng Mu*, Xianfang Zeng, Xin Chen, and others.

    arXiv project page video code

  6. ViStoryBench: Comprehensive Benchmark Suite for Story Visualization. In arXiv preprint 2025.
    Cailin Zhuang*, Ailin Huang*†, Wei Cheng, Jingwei Wu, Yaoqi Hu, Jiaqi Liao, and others.

    arXiv project page code benchmark

  7. OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation. In arXiv preprint 2025.
    Jingjing Chang, Yixiao Fang, Peng Xing, Shuhan Wu, Wei Cheng, and others.

    arXiv project page code benchmark

  8. Sparse-vDiT: Unleashing the Power of Sparse Attention to Accelerate Video Diffusion Transformers. In arXiv preprint 2025.
    Pengtao Chen, Xianfang Zeng, Maosen Zhao, Peng Ye, Mingzhu Shen, Wei Cheng, and others.

    arXiv code

  9. StyleMe3D: Stylization with Disentangled Priors by Multiple Encoders on 3D Gaussians. In arXiv preprint 2025.
    Cailin Zhuang, Yaoqi Hu, Xuanyang Zhang, Wei Cheng, Jiacheng Bao, and others.

    arXiv project page code

  10. FAVOR-Bench: A Comprehensive Benchmark for Fine-Grained Video Motion Understanding. In arXiv preprint 2025.
    Chongjun Tu*, Lin Zhang*, Pengtao Chen*, Peng Ye, Xianfang Zeng, Wei Cheng and others.

    arXiv project page code benchmark

  11. MeshXL: Neural Coordinate Field for Generative 3D Foundation Models. In NeurIPS 2024.
    Sijin Chen, Xin Chen, Anqi Pang, Xianfang Zeng, Wei Cheng, Yijun Fu and others.

    arXiv project page video code

  12. DNA-Rendering: A Diverse Neural Actor Repository for High-Fidelity Human-centric Rendering. In ICCV 2023.
    Wei Cheng, Ruixiang Chen*, Wanqi Yin*, Siming Fan*, Keyu Chen*, Honglin He and others.

    arXiv project page video code

  13. RenderMe-360: A Large Digital Asset Library and Benchmarks Towards High-fidelity Head Avatars. In NeurIPS 2023 Dataset and Benchmark Track.
    Dongwei Pan, Long Zhuo*, Jingtan Piao*, Huiwen Luo*, Wei Cheng*, Yuxin Wang*, Siming Fan and others.

    arXiv project page video code

  14. Generalizable Neural Performer: Learning Robust Radiance Fields for Human Novel View Synthesis. In arXiv preprint 2022.
    Wei Cheng, Su Xu, Jingtan Piao, Chen Qian, Wayne Wu, Kwan-Yee Lin, and Hongsheng Li.

    arXiv project page video code

  15. MonoHuman: Animatable Human Neural Field from Monocular Video. In CVPR 2023.
    Zhengming Yu, Wei Cheng, Xian Liu, Wayne Wu, and Kwan-Yee Lin.

    arXiv project page video code

  16. FlyFusion: Realtime Dynamic Scene Reconstruction using a Flying Depth Camera. In TVCG 2019.
    Lan Xu, Wei Cheng, Kaiwen Guo, Lei Han, Yebin Liu, and Lu Fang.

    paper video

  17. iHuman3D: Intelligent Human Body 3D Reconstruction using a Single Flying Camera. In ACM MM 2018.
    Wei Cheng*, Lan Xu*, Lei Han, Yuanfang Guo, and Lu Fang.

    paper video

  18. FlyCap: Markerless Motion Capture using Multiple Autonomous Flying Cameras. In TVCG 2017.
    Lan Xu, Yebin Liu, Wei Cheng, Kaiwen Guo, Guyue Zhou, Qionghai Dai, and Lu Fang.

    paper