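Shengyu Zhang is a researcher under Zhejiang University's platform "Hundred Talents Program" and a doctoral supervisor. His research interests include large-small model collaboration, cross-media generation, and large-model agents. In recent years he has published more than fifty papers in CCF-A journals and conferences such as TPAMI, TKDE, NeurIPS, and CVPR. He serves as Area Chair for KDD and ACM MM, and as Senior Program Committee member for conferences including SIGIR, IJCAI, and WSDM. He has received the First Prize of the 2023 Computer Federation Science and Technology Progress Award, the First Prize of the 2024 Shanghai Science and Technology Progress Award, the First Prize of the 2025 Society of Image and Graphics Natural Science Award, and the First Prize of the 2025 Zhejiang Province Science and Technology Progress Award.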
Research group's public account for sharing research work: AI4GC (search on WeChat or Xiaohongshu)
Google Scholar: https://scholar.google.com.hk/citations?user=l4Dyt7EAAAAJ
Academic homepage: https://shengyuzhang.github.io/
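The team recruits on a long-term basis students (undergraduates, master's students, or interns) who aspire to do research in areas such as large-model agents, visual generation, and efficient machine learning, who have strong coding or mathematical skills (preference is given to those proficient in Python and PyTorch or with experience in programming contests such as ACM ICPC), and who are strongly self-motivated. To apply, please fill out the form: https://docs.qq.com/form/page/DSmJQb0d5WkdsbmZP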
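Destinations of the group's graduates (in order of graduation or offer date):
1. Dong Yao (Alibaba)
2. Jiahao Xun (Xiaohongshu)
3. Ziqi Jiang (The Hong Kong University of Science and Technology, PhD)
4. Zihao Tang (Microsoft 亚洲工程院, Applied Scientist)
5. Zhonghua Jiang (College of Computer Science, Zhejiang University, transferred to the PhD program)
6. Yuhang Li (The Hong Kong Polytechnic University, PhD, admission pending)
7. Kunxi Li (School of Software, Zhejiang University, transferred to the PhD program)
8. Kairui Fu (ByteDance)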
Fine-Tuning Optimization for Multimodal Large Models and GUI Agents
InfiGUIAgent series (InfiGUIAgent, InfiGUI-R1, InfiGUI-G1)
Yuhang Liu, Zeyu Liu, Shuanghe Zhu, Pengxiang Li, Congkai Xie, Jiasheng Wang, Xueyu Hu, Xiaotian Han, Jianbo Yuan, Xinyao Wang, Shengyu Zhang*, Hongxia Yang, Fei Wu:
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization. AAAI 2026
Wenkai Wang, Hongcan Guo, Zheqi Lv, Shengyu Zhang*:
A Rolling Stone Gathers No Moss: Adaptive Policy Optimization for Stable Self-Evaluation in Large Multimodal Models. AAAI 2026
Yurun Chen, Xueyu Hu, Yuhan Liu, Zigi Wang, Zeyi Liao, Lin Chen, Feng Wei, Yuxi Qian, Bo Zheng, Keting Yin, Shengyu Zhang*:
Graph2Eval: Automatic Multimodal Task Generation for Agents via Knowledge Graphs. CVPR 2026
Wenkai Wang, Xiyun Li, Hongcan Guo, Wenhao Yu, Tianqing Fang, Haitao Mi, Dong Yu, Shengyu Zhang*:
Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding. ACL 2026
Yuqing Zhang, Honghui Sheng, Xueyu Hu, Shengyu Zhang*, Fei Wu:
DAC-Bench: A Decision-Aware Benchmark for Compositional Mobile GUI Tasks. ACL 2026
Inference Optimization for Multimodal Large Models and GUI Agents
Biao Yi, Xueyu Hu, Yurun Chen, Shengyu Zhang*, Hongxia Yang, Fan Wu:
EcoAgent: An Efficient Device-Cloud Collaborative Multi-Agent Framework for Mobile Automation. AAAI 2026
Kunxi Li, Zhonghua Jiang, Zhouzhou Shen, Zhaode Wang, Chengfei Lv, Shengyu Zhang*, Fan Wu, Fei Wu:
MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference. ACL 2025
Zhonghua Jiang, Kui Chen, Kunxi Li, Keting Yin, Yiyun Zhou, Zhaode Wang, Chengfei Lv, Shengyu Zhang*:
AccKV: Towards Efficient Audio-Video LLMs Inference via Adaptive-Focusing and Cross-Calibration KV Cache Optimization. AAAI 2026
Sihao Liu, YuFan Xiong, Zhonghua Jiang, Zhaode Wang, Chengfei Lv, Shengyu Zhang*:
RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction. ACL 2026
Yurun Chen, Xavier Hu, Keting Yin, Juncheng Li, Shengyu Zhang:
Evaluating the Robustness of Multimodal Agents Against Active Environmental Injection Attacks. ACM MM 2025
Visual Content Generation (AIGC)
Keming Ye, Zhou Zhao, Fan Wu, Shengyu Zhang*:
CIAR: Interval-based Collaborative Decoding for Image Generation Acceleration. ICLR 2026
Keming Ye, Zhipeng Huang, Canmiao Fu, Qingyang Liu, Jiani Cai, Zheqi Lv, Chen Li, Jing LYU, Zhou Zhao, Shengyu Zhang*:
UnicEdit-10M: A Dataset and Benchmark Breaking the Scale-Quality Barrier via Unified Verification for Reasoning-Enriched Edits. CVPR 2026
Jiajian Xie, Shengyu Zhang*, Mengze Li, Chengfei Lv, Zhou Zhao, Fei Wu:
EcoFace: Audio-Visual Emotional Co-Disentanglement Speech-Driven 3D Talking Face Generation. ICLR 2025
Zhan Qu, Shengyu Zhang*, Mengze Li, Zhuo Chen, Chengfei Lv, Zhou Zhao, Fei Wu:
ExpTalk: Diverse Emotional Expression via Adaptive Disentanglement and Refined Alignment for Speech-Driven 3D Facial Animation. IJCAI 2025



