Dr. Shuo Chen (陈硕)

(Pronouns: he/him/his)

AI Algorithm Engineer
Huawei, Singapore
[Singapore MRT Gate Info]
Email: shanshuo1992 at gmail dot com
[Github] [Google Scholar] [LinkedIn] [Twitter]

About Me (CV)

I am an AI Algorithm Engineer at Huawei Singapore Research Center, focusing on Text-to-Video generation. Before that, I was a Research Scientist at TikTok Singapore, also working on Text-to-Video generation.

Previously, I was a PhD candidate at the University of Amsterdam under the supervision of Prof. Cees Snoek and Dr. Pascal Mettes.

I received my master's degree from the Department of Electronic Engineering, Tsinghua University, under the supervision of Prof. Qingmin Liao and Dr. Fei Zhou. I did my undergraduate studies in the School of Internet of Things at Nanjing University of Posts and Telecommunications.

I was a visiting student at the Multimedia Research Center, Shenzhen Institute of Advanced Technology, under the supervision of Prof. Yu Qiao.

News

▸ 2025.05: We released Selftok.
▸ 2024.01: We released MagicVideo-V2.
▸ 2023.07: I successfully defended my doctoral thesis. View my thesis here.
▸ 2023.05: I started my full-time job at TikTok, Singapore.
▸ 2023.04: One paper is accepted by ICMR 2023.
▸ 2022.12: I started my internship at Snap, New York.
▸ 2022.08: I had a fantastic internship experience advised by Dr. Erhan Gundogdu and Dr. Loris Bazzani at Amazon Berlin.
▸ 2022.05: I started my internship at Amazon, Berlin.
▸ 2021.10: Two papers are accepted by BMVC 2021.
▸ 2021.09: I had a fantastic internship experience advised by Dr. Tan Yu at Baidu Research.
▸ 2021.07: One paper is accepted by ICCV 2021. This paper is also accepted by the ICCV'21 Workshop on Structured Representations for Video Understanding.
▸ 2021.06: I started my internship at Baidu Research, Beijing.
▸ 2020.03: One paper is accepted by ICMR 2020.
▸ 2019.10: One paper is accepted by ACMMM 2019.
▸ 2018.1.15: I joined the VIS Lab group.

Publications

Selftok: Discrete Visual Tokens of Autoregression, by Diffusion, and for Reasoning
Bohan Wang, Zhongqi Yue, Fengda Zhang, Shuo Chen, Li'an Bi, Junzhe Zhang, Xue Song, Kennard Yanting Chan, Jiachun Pan, Weijia Wu, Mingze Zhou, Wang Lin, Kaihang Pan, Saining Zhang, Liyu Jia, Wentao Hu, Wei Zhao, Hanwang Zhang
arXiv, 2025.
[arXiv] [project] [code] [pdf]

MagicVideo-V2: Multi-Stage High-Aesthetic Video Generation
Weimin Wang*, Jiawei Liu*, Zhijie Lin, Jiangqiao Yan, Shuo Chen, Chetwin Low, Tuyen Hoang, Jie Wu, Jun Hao Liew, Hanshu Yan, Daquan Zhou, Jiashi Feng
arXiv, 2024.
[arXiv] [project]

Multi-Label Meta Weighting for Long-Tailed Dynamic Scene Graph Generation
Shuo Chen, Yingjun Du, Pascal Mettes, Tao Hu and Cees G.M. Snoek
ACM International Conference on Multimedia Retrieval (ICMR), 2023. (Oral)
[arXiv] [code]

R2-MLP: Round-Roll MLP for Multi-View 3D Object Recognition
Shuo Chen, Tan Yu, and Ping Li
arXiv, 2022.
[arXiv] [code]

MVT: Multi-view Vision Transformer for 3D Object Recognition
Shuo Chen, Tan Yu, and Ping Li
British Machine Vision Conference (BMVC), 2021.
[arXiv] [code]

Diagnosing Errors in Video Relation Detectors
Shuo Chen, Pascal Mettes, and Cees G.M. Snoek
British Machine Vision Conference (BMVC), 2021.
[arXiv] [supplemental material] [code]

Social Fabric: Tubelet Compositions for Video Relation Detection
Shuo Chen, Zenglin Shi, Pascal Mettes, and Cees G.M. Snoek
IEEE International Conference on Computer Vision (ICCV), 2021.
[arXiv] [code]

Interactivity Proposals for Surveillance Videos
Shuo Chen, Pascal Mettes, Tao Hu and Cees G.M. Snoek
ACM International Conference on Multimedia Retrieval (ICMR), 2020. (Oral)
[pdf] [ACM] [code]

Interactive Exploration of Journalistic Video Footage through Multimodal Semantic Matching
Sarah Ibrahimi, Shuo Chen, Devanshu Arya, Arthur Camara, Yunlu Chen, Tanja Crijns, Maurits van der Goes, Thomas Mensink, Emiel van Miltenburg, Daan Odijk, William Thong, Jiaojiao Zhao, and Pascal Mettes
ACM International Conference on Multimedia (ACMMM), 2019.
[ACM]

Research on Transfer Learning Algorithm based on Generating Weighted Subspaces
Shuo Chen
Master's Thesis (in Chinese), 2017.
[pdf]

Visual Domain Adaptation using Weighted Subspace Alignment
Shuo Chen, Fei Zhou, Qingmin Liao
IEEE International Conference on Visual Communications and Image Processing (VCIP), 2016. (Oral)
[pdf] [IEEE] [slides] [code]

Education Background

University of Amsterdam
PhD Candidate | 2018.1-2023.7

Tsinghua University
Department of Electronic Engineering, Master of Engineering | 2014.9-2017.6

Nanjing University of Posts and Telecommunications
School of Internet of Things, Bachelor of Engineering | 2010.9-2014.6

Research Experience

Multimedia Research Center, Shenzhen Institute of Advanced Technology
Visiting Student | 2016.11-2018.1

Visual Information Processing Lab, Graduate School at Shenzhen, Tsinghua University
Research Assistant | 2015.2-2017.6

Teaching

University of Amsterdam

TA, Applied Machine Learning (Autumn 2020)
TA, Computer Vision 1 (Autumn 2020)
TA, Big Data (Spring 2020)
TA, Leren (Autumn 2019)
TA, Computer Vision 2 (Spring 2019)
TA, Computer Vision 2 (Spring 2018)

Internships

Snap, New York
Research Intern | 2022.12-2023.3

Amazon, Berlin
Applied Scientist Intern | 2022.5-2022.8

Baidu, Beijing
Research Intern | 2021.6-2021.9

Shenzhen Cloudream Technology Co., Ltd.
Algorithm Developer | 2016.3-2016.7

Professional Activities

Reviewer: NeurIPS 2022
Reviewer: Neurocomputing
Reviewer: CVPR 2022, 2023, 2024
Reviewer: CVIU
Reviewer: ICLR 2022, 2023, 2024
Reviewer: ICCV 2021 Workshop on Structured Representations for Video Understanding
Reviewer: ICCV 2021, 2023
Reviewer: ECCV 2020, 2022
Reviewer: ACM TOMM
Reviewer: TVCG