Fine-Tuning Optimization for Multimodal Large Models and GUI Agents
InfiGUIAgent series (InfiGUIAgent, InfiGUI-R1, InfiGUI-G1)
Yuhang Liu, Zeyu Liu, Shuanghe Zhu, Pengxiang Li, Congkai Xie, Jiasheng Wang, Xueyu Hu, Xiaotian Han, Jianbo Yuan, Xinyao Wang, Shengyu Zhang*, Hongxia Yang, Fei Wu:
InfiGUI-G1: Advancing GUI Grounding with Adaptive Exploration Policy Optimization. AAAI 2026
Wenkai Wang, Hongcan Guo, Zheqi Lv, Shengyu Zhang*:
A Rolling Stone Gathers No Moss: Adaptive Policy Optimization for Stable Self-Evaluation in Large Multimodal Models. AAAI 2026
Yurun Chen, Xueyu Hu, Yuhan Liu, Ziqi Wang, Zeyi Liao, Lin Chen, Feng Wei, Yuxi Qian, Bo Zheng, Keting Yin, Shengyu Zhang*:
Graph2Eval: Automatic Multimodal Task Generation for Agents via Knowledge Graphs. CVPR 2026
Wenkai Wang, Xiyun Li, Hongcan Guo, Wenhao Yu, Tianqing Fang, Haitao Mi, Dong Yu, Shengyu Zhang*:
Measure Twice, Click Once: Co-evolving Proposer and Visual Critic via Reinforcement Learning for GUI Grounding. ACL 2026
Yuqing Zhang, Honghui Sheng, Xueyu Hu, Shengyu Zhang*, Fei Wu:
DAC-Bench: A Decision-Aware Benchmark for Compositional Mobile GUI Tasks. ACL 2026
Inference Optimization for Multimodal Large Models and GUI Agents
Biao Yi, Xueyu Hu, Yurun Chen, Shengyu Zhang*, Hongxia Yang, Fan Wu:
EcoAgent: An Efficient Device-Cloud Collaborative Multi-Agent Framework for Mobile Automation. AAAI 2026
Kunxi Li, Zhonghua Jiang, Zhouzhou Shen, Zhaode Wang, Chengfei Lv, Shengyu Zhang*, Fan Wu, Fei Wu:
MadaKV: Adaptive Modality-Perception KV Cache Eviction for Efficient Multimodal Long-Context Inference. ACL 2025
Zhonghua Jiang, Kui Chen, Kunxi Li, Keting Yin, Yiyun Zhou, Zhaode Wang, Chengfei Lv, Shengyu Zhang*:
AccKV: Towards Efficient Audio-Video LLMs Inference via Adaptive-Focusing and Cross-Calibration KV Cache Optimization. AAAI 2026
Sihao Liu, YuFan Xiong, Zhonghua Jiang, Zhaode Wang, Chengfei Lv, Shengyu Zhang*:
RetentiveKV: State-Space Memory for Uncertainty-Aware Multimodal KV Cache Eviction. ACL 2026
Yurun Chen, Xavier Hu, Keting Yin, Juncheng Li, Shengyu Zhang:
Evaluating the Robustness of Multimodal Agents Against Active Environmental Injection Attacks. ACM MM 2025
Visual Content Generation (AIGC)
Keming Ye, Zhou Zhao, Fan Wu, Shengyu Zhang*:
CIAR: Interval-based Collaborative Decoding for Image Generation Acceleration. ICLR 2026
Keming Ye, Zhipeng Huang, Canmiao Fu, Qingyang Liu, Jiani Cai, Zheqi Lv, Chen Li, Jing LYU, Zhou Zhao, Shengyu Zhang*:
UnicEdit-10M: A Dataset and Benchmark Breaking the Scale-Quality Barrier via Unified Verification for Reasoning-Enriched Edits. CVPR 2026
Jiajian Xie, Shengyu Zhang*, Mengze Li, Chengfei Lv, Zhou Zhao, Fei Wu:
EcoFace: Audio-Visual Emotional Co-Disentanglement Speech-Driven 3D Talking Face Generation. ICLR 2025
Zhan Qu, Shengyu Zhang*, Mengze Li, Zhuo Chen, Chengfei Lv, Zhou Zhao, Fei Wu:
ExpTalk: Diverse Emotional Expression via Adaptive Disentanglement and Refined Alignment for Speech-Driven 3D Facial Animation. IJCAI 2025