My name is Zemin Yang. I am a master's student in the Visual & Data Intelligence (VDI) Center, 4DV Lab at ShanghaiTech University, supervised by Yuexin Ma and by Xinge Zhu of MMLab at The Chinese University of Hong Kong. Before that, I received my bachelor's degree from China University of Geosciences (Wuhan). I'm interested in computer vision, machine learning, and their applications in robotics, particularly embodied AI and vision-language-action models.

My goal 📌 is to help humans interact with robots, devices, and data through AI in a more efficient and natural way.

📝 Publications

* indicates equal contribution, † indicates corresponding author

ICML 2026 (Under Review)

ResVLA: From Noise to Intent: Anchoring Generative VLA Policies with Residual Bridges

Yiming Zhong*, Yaoyu He*, Zemin Yang*, Pengfei Tian, Yifan Huang, Qingqiu Huang, Xinge Zhu, Yuexin Ma†

This paper proposes ResVLA, a generative vision-language-action framework that shifts robot control from generation-from-noise to refinement-from-intent by anchoring low-frequency semantic intent and refining high-frequency residual dynamics, achieving strong robustness, faster convergence, and competitive real-world performance.

PDF Project Page GitHub

AAAI 2026 (Oral)

Affordance-R1: Reinforcement Learning for Generalizable Affordance Reasoning in Multimodal Large Language Model

Hanqing Wang*, Shaoyang Wang*, Yiming Zhong, Zemin Yang, Jiamin Wang, Zhiqing Cui, Jiahao Yuan, Yifan Han, Mingyu Liu, Yuexin Ma†

We introduce Affordance-R1, a model that generates explicit reasoning alongside its final answer. Guided by the proposed affordance reasoning reward, it achieves robust zero-shot generalization and exhibits emergent test-time reasoning capabilities.

PDF Paper GitHub

ICLR 2026

HUMOF: Human Motion Forecasting in Interactive Social Scenes

Caiyi Sun*, Yujing Sun*, Xiao Han, Zemin Yang, Jiawei Liu, Xinge Zhu, Siu-Ming Yiu†, Yuexin Ma†

This paper proposes HUMOF, a human motion forecasting framework for interactive social scenes that jointly models human-human and human-scene interactions with hierarchical representations and coarse-to-fine reasoning, achieving state-of-the-art performance across four public datasets.

PDF Paper

NeurIPS 2025

FreqPolicy: Frequency Autoregressive Visuomotor Policy with Continuous Tokens

Yiming Zhong, Yumeng Liu, Chuyang Xiao, Zemin Yang, Youzhuo Wang, Yufei Zhu, Yujing Sun, Xinge Zhu, Yuexin Ma†

This paper proposes FreqPolicy, a frequency-domain autoregressive visuomotor policy that progressively models hierarchical frequency components with continuous latent representations, achieving superior accuracy and efficiency in robotic manipulation tasks.

PDF Paper Project Page GitHub

ICCV 2025

EvolvingGrasp: Evolutionary Grasp Generation via Efficient Preference Alignment

Yufei Zhu*, Yiming Zhong*, Zemin Yang, Peishan Cong, Jingyi Yu, Xinge Zhu†, Yuexin Ma†

This paper introduces EvolvingGrasp, which integrates Handpose-wise Preference Optimization with a Physics-aware Consistency Model to enable efficient evolutionary grasp generation, achieving improved grasp success rates and computational efficiency.

PDF Paper Project Page GitHub

CVPR 2025

EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild

Yumeng Liu, Xiaoxiao Long†, Zemin Yang, Yuan Liu, Marc Habermann, Christian Theobalt, Yuexin Ma†, Wenping Wang

This paper explores reconstructing hand-object interactions from a single-view image. We develop a pipeline to estimate hand pose and object shape using large models and apply a prior-guided optimization to adjust the hand pose, ensuring it meets 3D physical constraints while aligning with the 2D image.

PDF Paper GitHub

AAAI 2025

UniDemoiré: Towards Universal Image Demoiréing with Data Generation and Synthesis

Zemin Yang*, Yujing Sun*, Xidong Peng, Siu-Ming Yiu, Yuexin Ma†

This paper proposes UniDemoiré, a universal image demoiréing solution with superior generalization capability. Notably, we propose novel and effective data generation and synthesis methods that automatically provide large quantities of high-quality moiré images for training a universal demoiréing model.

PDF Project Page GitHub

🎖 Honors and Awards

  • 11/2025 Outstanding Master's Student
  • 10/2025 National Scholarship for Graduate Students
  • 10/2025 Huahong Scholarship
  • 06/2024 Outstanding Graduate Award
  • 06/2023 First Prize in the 13th MathorCup Mathematical Contest in Modeling
  • 02/2023 Meritorious Winner in the American Collegiate Mathematical Contest in Modeling
  • 10/2022 First Prize in the China Undergraduate Mathematical Contest in Modeling (Hubei Region)

📖 Education

  • 2024 - Present Master of Computer Science and Technology

    ShanghaiTech University

  • 2019 - 2024 Bachelor of Data Science and Big Data Technology

    China University of Geosciences (Wuhan)

🎨 Hobbies

📷 Photography, ⚽ Football, 🏀 Basketball, 🎮 FPS Games