Vincent Liu
vincent [dot] liu15 [at] gmail [dot] com

I’m building a robotics company to scale robot learning from humans in the real world. To prove out this approach, I recently first-authored foundational research on zero-shot human-to-robot learning. I post about AI, robotics, and deep tech startups on X.
I received my BA in Mathematics and MS in Computer Science from Stanford University. My past work in AI has spanned theoretical and applied topics in 3D world models, self-driving, speech generation, multi-task learning, self-supervised learning, computer vision, reinforcement learning, and natural language processing.
news
Date | News |
---|---|
Jun 03, 2025 | We released our research on tactile robot learning from humans: Feel the Force: Contact-Driven Learning from Humans 🤲 |
May 27, 2025 | We released our research on zero-shot human-to-robot learning: EgoZero: Robot Learning from Smart Glasses 📝 |
research
CSM Implicit Shape Foundation Model (October 2024). I helped train and scale CSM’s Implicit Shape Foundation Model, a 3D-native latent diffusion Transformer that produces 3D occupancy fields from a single image in under one minute, and I helped build its pre- and post-training data infrastructure. My work helped CSM land #1 on the 3D Arena Leaderboard and close initial enterprise contracts.
CSM Neural Radiance Field Foundation Model (March 2024). I helped train and scale CSM’s Neural Radiance Field (NeRF) Foundation Model, a 2D-native NeRF model that reconstructs 3D occupancy fields from a single image in under one minute. This model plugs into a slow refinement process based on Poole et al., 2022 and grew CSM to over 300k users. My work was presented at NVIDIA GTC 2024.
End-to-end Speech Synthesis with Generative Adversarial Networks (June 2022). I developed adversarial methods for end-to-end text-to-speech models as part of my Master’s thesis. I also lectured on these concepts in the Winter 2021 offering of Stanford’s CS 236G: Generative Adversarial Networks. My implementation is an adaptation of Donahue et al., 2021 and Kim et al., 2020.
Tesla Full Self-Driving (September 2021). I trained and shipped models to the v9.2, v10.1, v10.2, and v10.3 releases of Tesla’s Full Self-Driving (FSD). My work was also featured in Tesla’s 2021 AI Day event.
NVIDIA Neural Modules (September 2020). I trained and shipped lightweight speech vocoders to the v1.0.0 release of NVIDIA’s Neural Modules (NeMo), an open-source speech AI toolkit for developers that has since expanded to conversational AI and LLMs.
publications
teaching
CS 236G: Generative Adversarial Networks (March 2021). I helped create and teach CS 236G: Generative Adversarial Networks, which also became a Coursera course. I designed advanced material based on papers such as Pix2PixHD, GauGAN, SRGAN, BigGAN, and MUNIT. I created workshops (1, 2) covering mode collapse, spectral normalization, orthogonal initialization, mixed precision, distributed training, and PyTorch best practices. I also gave a guest lecture on adversarial methods in speech synthesis.