About Me
I am a Research Scientist at Core AI, Meta Reality Labs, where I build multimodal spatial vision systems for scalable video and world understanding. My research focuses on developing spatial intelligence models that integrate vision, language, motion, and geometry to enable robust perception, reasoning, and generation across modalities. Representative work includes VideoAutoThink (CVPR 2026), VLM-3R (CVPR 2026 and Best Workshop Paper at ACM Multimedia 2025), and MV-DUSt3R+ (CVPR 2025 Oral). In addition to research, I work on the on-device perception stack powering Meta Quest experiences, including semantic segmentation and real-time MR/VR spatial understanding.
Before joining Meta Reality Labs, I was a technical lead and senior machine learning/computer vision engineer in the Video Engineering Group at Apple Inc. I led algorithm development and delivered multiple groundbreaking products, including Room Tracking on VisionPro, RoomPlan Enhancement, and RoomPlan. Additionally, I collaborated with Apple AIML on 3D scene style generation, where we pioneered RoomDreamer, the first work to enable text-driven 3D indoor scene synthesis with coherent geometry and texture.
I received my Ph.D. and M.S. degrees from the University of Maryland, College Park, where I was advised by Prof. Rama Chellappa. I completed my B.S. degree in Electrical Engineering and Information Science at the University of Science and Technology of China. Additionally, I completed internships at Snap Research and the Palo Alto Research Center.
Highlights
- Feb, 2026. 🚀✨ Three papers — 1) VideoAutoThink, 2) VLM-3R, and 3) MoS (Mixture of States) — have been accepted to CVPR 2026! Huge thanks and congratulations to all co-authors and collaborators 🙌🎉
  - 🧠🎬 VideoAutoThink: Video Auto Reasoning via Thinking Once, Answering Twice
    An adaptive video reasoning framework that challenges unconditional chain-of-thought by adopting a thinking-once, answering-twice paradigm, enabling confidence-based reasoning activation for improved accuracy and efficiency.
    📄 Paper [arXiv] · 📦 GitHub [Code] · 🤗 Model [Hugging Face]
  - 🧭📐 VLM-3R: Instruction-Aligned 3D Reconstruction and Reasoning
    A spatial vision-language model aligning natural language instructions with 3D reasoning from monocular video.
    💻 GitHub [Code] · 📦 Project Page [Link]
  - 🎨⚡ Mixture of States: Routing Token-Level Dynamics for Multimodal Generation
    A dynamic token-wise routing mechanism for multimodal diffusion models, enabling adaptive layer selection and input-dependent text–vision alignment for scalable generation and editing.
    📄 Paper [arXiv]
- Jan, 2026. VideoAuto-R1 is now online — try our 🤗 Demo [Hugging Face]. 📦 GitHub [Code] · Model [Hugging Face]
- Dec, 2025. MoS (Mixture of States) is now online. Check out our [Paper]. Congrats to Haozhe Liu and Ding Liu for leading the work.
- Nov, 2025. We’re grateful to have received the Best Paper Award from the ACM MM 2025 Multimodal Foundation Models for Spatial Intelligence Workshop. Congrats to Zhiwen for leading the work, and thanks to all collaborators. 📦 GitHub and 📄 Paper
- Nov, 2025. DynamicVerse: Physically-Aware Multimodal Modeling for Dynamic 4D Worlds is accepted to NeurIPS 2025. 📦 GitHub
- Jun, 2025. 🚀 VLM-3R is online! Check out our Project Website, read the arXiv Paper, and explore the Code.
- Mar, 2025. MV-DUSt3R+ is accepted as an Oral at CVPR 2025. Check out our Demo and Project. Congratulations to Zhenggang Tang, Yuchen Fan, Dilin Wang, Rakesh Ranjan, Alexander Schwing, and Zhicheng Yan!
- Jan, 2025. MV-DUSt3R+ is open-sourced. Let’s further push the boundary!
- Dec, 2024. MV-DUSt3R+ is online: a single-stage, multi-view, multi-path model that reconstructs large-scale scenes from sparse, unconstrained views in just 2 seconds!
- Jun, 2024. Room Tracking on VisionPro is unveiled at Apple WWDC 2024. This technology identifies room boundaries, supports precisely aligned geometries, and recognizes transitions between rooms.
- Oct, 2023. Our paper “RoomDreamer: Text-Driven 3D Indoor Scene Synthesis with Coherent Geometry and Texture” is accepted to ACM Multimedia 2023. [arXiv] [demo]. Congratulations to Liangchen Song, Liangliang Cao, and all co-authors.
- Jun, 2023. RoomPlan Enhancement is introduced at Apple WWDC 2023. It added numerous powerful features to RoomPlan, including multi-room scanning, multi-room layout, object attributes, polygon walls, improved furniture representation, room-type identification, and floor-shape recognition.
- Oct, 2022. Our research article, “3D Parametric Room Representation with RoomPlan,” is published at Apple Machine Learning Research. Read our research article to learn more!
- Jun, 2022. RoomPlan is first released at Apple WWDC 2022. Combining the power of Apple LiDAR, state-of-the-art 3D machine learning, and an intuitive scanning UI, RoomPlan empowers developers to create innovative solutions in interior design, architecture, real estate, and e-commerce.
