👨‍🎓 Biography

Hi! This is Shenghao Xie. I am currently a first-year Ph.D. student at the Academy for Advanced Interdisciplinary Studies, Peking University, where I am fortunate to work with Prof. Lei Ma and Prof. Tiejun Huang. I am also a visiting student in the Tsinghua Statistical Artificial Intelligence & Learning (TSAIL) Group, Tsinghua University, advised by Prof. Hang Su and Prof. Jun Zhu. Previously, I received my B.E. degree from the School of Cyber Science and Engineering, Wuhan University in 2024, supervised by Prof. Shanghang Zhang.

My long-term research goal is to pursue vision AGI and create corresponding social good. My recent interests focus primarily on vision foundation models and their applications in AI4Healthcare:

· Vision Foundation Model. First, I attempt to unlock the scaling law and zero-shot generalization in vision foundation models by integrating various data (both spatial and temporal, e.g., 2D, 3D, videos, and 4D) and tasks (both perception and generation, e.g., segmentation, captioning, translation, and editing). Then I seek to further equip them with reasoning and embodied interaction capabilities.

· AI for Healthcare. I am committed to addressing valuable medical problems and building AI systems that effectively assist doctors. Specifically, I develop data-driven discriminative models based on large-scale medical images (e.g., early cancer detection). I also explore how to leverage generative models with clinically meaningful evaluation metrics in data-scarce scenarios (e.g., rare diseases).

I am open to both collaborations and discussions. Please feel free to send me an email.

🔥 News

📝 Selected Publications

(* denotes co-first author. † denotes corresponding author. View the full publication list on my Google Scholar.)

arXiv 2024

Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective

Shenghao Xie, Wenqiang Zu, Mingyang Zhao, Duo Su, Shilong Liu, Ruohua Shi, Guoqi Li, Shanghang Zhang, Lei Ma

Paper Repo

  • The first comprehensive survey to dive deep into the trend of unifying understanding and generation in vision foundation models from the autoregression perspective.
MIA 2024

Embedded Visual Prompt Tuning

Wenqiang Zu, Shenghao Xie*, Qing Zhao, Guoqi Li, Lei Ma

Paper Code

  • Embed, don’t prepend: let’s insert prompts into embedding channels!

🎖 Honors and Awards

  • 2024.06 Outstanding Bachelor’s Degree Thesis at Wuhan University.
  • 2023.11 Lei Jun Computer Science Undergraduate Scholarship.
  • 2023.08 First Prize at National College Student Information Security Contest.

💻 Internships