Yiqing Liang

I am a final-year PhD candidate in Computer Science at Brown University and a member of the Brown Visual Computing Group. I am very fortunate to have Professor James Tompkin as my advisor.

I received my Master's degree in Computer Science from Columbia University, advised by Professor Shuran Song and Professor Shih-Fu Chang. I completed my Bachelor's degree in Computer Science at Fudan University. I was a visiting student at MIT EECS (CSAIL).

I was a research intern at NVIDIA Research with Abhishek Badki, Hang Su, and Orazio Gallo, and at Meta Reality Labs with Numair Khan, Lei Xiao, and Douglas Lanman.

Email  /  CV  /  Google Scholar  /  GitHub  /  LinkedIn  /  Twitter

I am actively seeking full-time research positions and postdoctoral opportunities in industry labs, with flexibility to start at any time. If you think I might be a good fit, please feel free to reach out!


Selected Research
MoDoMoDo: Multi-Domain Data Mixtures for Multimodal LLM Reinforcement Learning
Yiqing Liang, Jielin Qiu, Wenhao Ding, Zuxin Liu, James Tompkin, Mengdi Xu, Mengzhou Xia, Zhengzhong Tu, Laixi Shi, Jiacheng Zhu
Under Review, 2025
paper  /  data  /  code  /  bibtex

We introduce MoDoMoDo, a systematic post-training framework for multimodal LLM reinforcement learning with verifiable rewards (RLVR), featuring a rigorous formulation of the data mixture problem and a benchmark implementation.

E3D-Bench: A Benchmark for End-to-End 3D Geometric Foundation Models
Wenyan Cong, Yiqing Liang, Yancheng Zhang, Ziyi Yang, Yan Wang, Boris Ivanovic, Marco Pavone, Chen Chen, Zhangyang Wang, Zhiwen Fan
Under Review, 2025
paper  /  code  /  bibtex

We present the first comprehensive benchmark for end-to-end 3D geometric foundation models, covering five core tasks (sparse-view depth estimation, video depth estimation, 3D reconstruction, multi-view pose estimation, and novel view synthesis) and spanning both standard and challenging out-of-distribution datasets.

Zero-Shot Monocular Scene Flow Estimation in the Wild
Yiqing Liang, Abhishek Badki*, Hang Su*, James Tompkin, Orazio Gallo
CVPR, 2025 Oral, Award Candidate (0.48%)
paper  /  video  /  code  /  bibtex

We present ZeroMSF, the first generalizable 3D foundation model that understands monocular scene flow in diverse real-world scenarios, using our curated data recipe of 1M synthetic training samples.

Monocular Dynamic Gaussian Splatting is Fast and Brittle and Scene Complexity Rules
Yiqing Liang, Mikhail Okunev, Mikaela Angelina Uy, Runfeng Li, Leonidas J. Guibas, James Tompkin, Adam Harley
TMLR, 2025
paper  /  data  /  code  /  bibtex

We present a benchmark of dynamic Gaussian Splatting methods for monocular view synthesis, combining existing datasets and a new synthetic dataset to provide standardized comparisons and identify key factors affecting efficiency and quality.

GauFRe: Gaussian Deformation Fields for Real-time Dynamic Novel View Synthesis
Yiqing Liang, Numair Khan, Zhengqin Li, Thu Nguyen-Phuoc, Douglas Lanman, James Tompkin, Lei Xiao
CVPR CV4MR Workshop, 2024
WACV, 2025
paper  /  code  /  bibtex

We propose GauFRe: a dynamic scene reconstruction method using deformable 3D Gaussians for monocular video that is efficient to train, renders in real time, and separates static and dynamic regions.

Semantic Attention Flow Fields for Monocular Dynamic Scene Decomposition
Yiqing Liang, Eliot Laidlaw, Alexander Meyerowitz, Srinath Sridhar, James Tompkin
ICCV, 2023
paper  /  code  /  bibtex

We present SAFF: a dynamic neural volume reconstruction of a casually captured monocular video that consists of time-varying color, density, scene flow, semantics, and attention information.

SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning
Zhecan Wang*, Haoxuan You*, Liunian Harold Li, Alireza Zareian, Suji Park, Yiqing Liang, Kai-Wei Chang, Shih-Fu Chang
AAAI, 2022
paper  /  bibtex

We propose a Scene Graph Enhanced Image-Text Learning (SGEITL) framework that incorporates visual scene graphs into visual commonsense reasoning.

SSCNav: Confidence-Aware Semantic Scene Completion for Visual Semantic Navigation
Yiqing Liang, Boyuan Chen, Shuran Song
ICRA, 2021
paper  /  video  /  code  /  bibtex

We explicitly model scene priors using a confidence-aware semantic scene completion module to complete the scene and guide the agent's navigation planning.


Based on Jon Barron's template.