My name is Zemin Yang, and I am a master's student in the 4DV Lab at the Visual & Data Intelligence (VDI) Center, ShanghaiTech University, supervised by Yuexin Ma and by Xinge Zhu from MMLAB at The Chinese University of Hong Kong. Before that, I received my bachelor's degree from China University of Geosciences (Wuhan). I'm interested in computer vision, machine learning, and their applications in robotics, particularly embodied AI and vision-language-action models.

My goal 📌 is to help humans interact with robots, devices, and data through AI in a more efficient and natural way.

📝 Publications

* indicates equal contribution, † indicates corresponding author

ICML 2026 (Under Review)

ResVLA: From Noise to Intent: Anchoring Generative VLA Policies with Residual Bridges

Yiming Zhong*, Yaoyu He*, Zemin Yang*, Pengfei Tian, Yifan Huang, Qingqiu Huang, Xinge Zhu, Yuexin Ma†

This paper proposes ResVLA, a generative vision-language-action framework that shifts robot control from generation-from-noise to refinement-from-intent by anchoring low-frequency semantic intent and refining high-frequency residual dynamics, achieving strong robustness, faster convergence, and competitive real-world performance.

PDF Project page Github

AAAI 2026 (Oral)

Affordance-R1: Reinforcement Learning for Generalizable Affordance Reasoning in Multimodal Large Language Model

Hanqing Wang*, Shaoyang Wang*, Yiming Zhong, Zemin Yang, Jiamin Wang, Zhiqing Cui, Jiahao Yuan, Yifan Han, Mingyu Liu, Yuexin Ma†

We introduce Affordance-R1, which generates explicit reasoning alongside the final answer. With the help of the proposed affordance reasoning reward, it achieves robust zero-shot generalization and exhibits emergent test-time reasoning capabilities.

PDF Paper Github

ICLR 2026

HUMOF: Human Motion Forecasting in Interactive Social Scenes

Caiyi Sun*, Yujing Sun*, Xiao Han, Zemin Yang, Jiawei Liu, Xinge Zhu, Siu-Ming Yiu†, Yuexin Ma†

This paper proposes HUMOF, a human motion forecasting framework for interactive social scenes that jointly models human-human and human-scene interactions with hierarchical representations and coarse-to-fine reasoning, achieving state-of-the-art performance across four public datasets.

PDF Paper

NeurIPS 2025

FreqPolicy: Frequency Autoregressive Visuomotor Policy with Continuous Tokens

Yiming Zhong, Yumeng Liu, Chuyang Xiao, Zemin Yang, Youzhuo Wang, Yufei Zhu, Yujing Sun, Xinge Zhu, Yuexin Ma†

This paper proposes FreqPolicy, a frequency-domain autoregressive visuomotor policy that progressively models hierarchical frequency components with continuous latent representations, achieving superior accuracy and efficiency in robotic manipulation tasks.

PDF Paper Project Page Github

ICCV 2025

EvolvingGrasp: Evolutionary Grasp Generation via Efficient Preference Alignment

Yufei Zhu*, Yiming Zhong*, Zemin Yang, Peishan Cong, Jingyi Yu, Xinge Zhu†, Yuexin Ma†

This paper introduces EvolvingGrasp, which integrates Handpose-wise Preference Optimization with a Physics-aware Consistency Model to enable efficient evolutionary grasp generation, achieving improved grasp success rates and computational efficiency.

PDF Paper Project Page Github

CVPR 2025

EasyHOI: Unleashing the Power of Large Models for Reconstructing Hand-Object Interactions in the Wild

Yumeng Liu, Xiaoxiao Long†, Zemin Yang, Yuan Liu, Marc Habermann, Christian Theobalt, Yuexin Ma†, Wenping Wang

This paper explores reconstructing hand-object interactions from a single-view image. We develop a pipeline to estimate hand pose and object shape using large models and apply a prior-guided optimization to adjust the hand pose, ensuring it meets 3D physical constraints while aligning with the 2D image.

PDF Paper Github

AAAI 2025

UniDemoiré: Towards Universal Image Demoiréing with Data Generation and Synthesis

Zemin Yang*, Yujing Sun*, Xidong Peng, Siu Ming Yiu, Yuexin Ma†

This paper proposes a universal image demoiréing solution, UniDemoiré, which has superior generalization capability. Notably, we propose innovative and effective data generation and synthesis methods that can automatically provide vast high-quality moiré images to train a universal demoiréing model.

PDF Project Page Github

🎖 Honors and Awards

11/2025 Outstanding Master Student
10/2025 National Scholarship for Graduate Students
10/2025 Huahong Scholarship
06/2024 Outstanding Graduate Award
06/2023 1st Award in the 13th MathorCup Mathematical Contest in Modeling
02/2023 Meritorious Winner in the American Collegiate Mathematical Contest in Modeling
10/2022 1st Award in the China Undergraduate Mathematical Contest in Modeling at Hubei Region

📖 Education

2024 - Now Master of Computer Science and Technology

ShanghaiTech University

2019 - 2024 Bachelor of Data Science and Big Data Technology

China University of Geosciences (Wuhan)

🎨 Hobbies

📷 Photography, ⚽ Football, 🏀 Basketball, 🎮 FPS Games
