Research

My recent work explores generative 3D world models, specifically diffusion-based occupancy prediction for exploration, mapping, and long-horizon planning in mobile robots. I develop algorithms and AI models that enable robots to perceive, predict, and act effectively in complex, partially observed environments from egocentric observations. My work focuses on multi-modal machine perception for 3D scene understanding, generative occupancy modeling, and Vision-Language-Action (VLA) models that unify perception, language, and control for grounded decision-making.
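
As a rough illustration of what diffusion-based occupancy prediction means in practice, the sketch below performs a single x0-prediction step of DDPM-style denoising over a voxel occupancy volume, keeping observed voxels fixed and letting the model fill in unobserved space. The toy network, tensor shapes, and noise-schedule value are placeholder assumptions for illustration, not the architecture from the publications listed below.

```python
import torch

# Placeholder denoiser: in practice this would be a 3D network trained to
# predict the noise added to an occupancy volume at diffusion step t.
class ToyDenoiser(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.net = torch.nn.Conv3d(1, 1, kernel_size=3, padding=1)

    def forward(self, x, t):
        # A real model would condition on t (e.g. via timestep embeddings);
        # this toy version ignores it and applies a single 3D convolution.
        return self.net(x)

def predict_x0(model, x_t, t, alpha_bar, observed_mask, observed_occ):
    """Estimate the clean occupancy volume x0 from a noisy sample x_t.

    This is the core piece of a DDPM-style reverse step. Voxels marked in
    observed_mask are clamped back to their measured occupancy, so the
    model only completes the unobserved region of the map.
    """
    eps_hat = model(x_t, t)  # predicted noise
    x0_hat = (x_t - torch.sqrt(1 - alpha_bar) * eps_hat) / torch.sqrt(alpha_bar)
    x0_hat = x0_hat.clamp(0.0, 1.0)  # keep occupancy probabilities in [0, 1]
    # Re-impose the partial observation: keep known voxels, predict the rest.
    return torch.where(observed_mask, observed_occ, x0_hat)

# Tiny usage example on a random 16^3 grid.
model = ToyDenoiser()
x_t = torch.randn(1, 1, 16, 16, 16)              # noisy occupancy sample
observed_occ = torch.zeros_like(x_t)             # measured occupancy (all free here)
observed_mask = torch.zeros_like(x_t, dtype=torch.bool)
observed_mask[..., :8] = True                    # pretend half the volume was observed
x0 = predict_x0(model, x_t, t=10, alpha_bar=torch.tensor(0.5),
                observed_mask=observed_mask, observed_occ=observed_occ)
print(x0.shape)  # torch.Size([1, 1, 16, 16, 16])
```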


Publications

RF-Modulated Adaptive Communication Improves Multi-Agent Robotic Exploration

Robust Robotic Exploration and Mapping Using Generative Occupancy Map Synthesis

Online Diffusion-Based 3D Occupancy Prediction at the Frontier with Probabilistic Map Reconciliation

SceneSense: Diffusion Models for 3D Occupancy Synthesis from Partial Observation


Projects and Presentations

Advancing Robotics with Vision-Language-Action Models

Sim-to-Sim Transfer Framework

  • Developed a simulation-to-simulation transfer pipeline to validate control policies across different physics engines (comparing IsaacSim with MuJoCo); a minimal sketch of the validation idea follows this list.
  • Read the project write-up here
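
A minimal version of that sim-to-sim check might look like the following: roll out the same policy in MuJoCo and compare the resulting state trajectory against one logged from the other simulator. The model file, policy interface, and logged-trajectory format are hypothetical placeholders, not the pipeline's actual API.

```python
import numpy as np
import mujoco

def rollout(model_path, policy, horizon=500):
    """Roll out `policy` (a callable obs -> action) in MuJoCo and return states."""
    model = mujoco.MjModel.from_xml_path(model_path)   # hypothetical robot model file
    data = mujoco.MjData(model)
    states = []
    for _ in range(horizon):
        obs = np.concatenate([data.qpos, data.qvel])   # simple proprioceptive observation
        data.ctrl[:] = policy(obs)                     # apply the policy's action
        mujoco.mj_step(model, data)                    # advance the physics one step
        states.append(np.concatenate([data.qpos, data.qvel]))
    return np.stack(states)

def compare_to_reference(mujoco_states, reference_states):
    """Per-step L2 gap between a MuJoCo rollout and a logged reference trajectory."""
    horizon = min(len(mujoco_states), len(reference_states))
    gap = np.linalg.norm(mujoco_states[:horizon] - reference_states[:horizon], axis=1)
    return gap.mean(), gap.max()

# Hypothetical usage: `trained_policy` and the logged trajectory would come
# from the IsaacSim side of the pipeline.
# states = rollout("robot.xml", trained_policy)
# mean_gap, max_gap = compare_to_reference(states, np.load("isaac_rollout.npy"))
```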

Vision-Language Models: PaliGemma from Scratch

  • Recreated the PaliGemma Vision-Language Model (VLM) architecture entirely from scratch using PyTorch.
  • Implemented the complete model structure to deepen understanding of multimodal integration and large-scale model design; a sketch of the core integration step follows this list.
  • View the GitHub repo here
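
Reduced to a sketch, the multimodal integration works like this: a projection layer maps vision-encoder patch embeddings into the language model's embedding space, and the projected image tokens are prepended to the text token embeddings before the decoder attends over the combined sequence. The dimensions and module names below are illustrative stand-ins, not the actual PaliGemma configuration.

```python
import torch
import torch.nn as nn

class MultimodalProjector(nn.Module):
    """Map vision-encoder patch embeddings into the language model's token space."""
    def __init__(self, vision_dim=768, text_dim=2048):
        super().__init__()
        self.proj = nn.Linear(vision_dim, text_dim)

    def forward(self, patch_embeds):
        # patch_embeds: (batch, num_patches, vision_dim) from a ViT-style encoder
        return self.proj(patch_embeds)

def build_decoder_input(patch_embeds, text_token_ids, projector, text_embedding):
    """Prepend projected image tokens to text token embeddings.

    The concatenated sequence is what the autoregressive decoder attends over,
    so text tokens can condition on the image through self-attention.
    """
    image_tokens = projector(patch_embeds)                # (B, P, text_dim)
    text_tokens = text_embedding(text_token_ids)          # (B, T, text_dim)
    return torch.cat([image_tokens, text_tokens], dim=1)  # (B, P + T, text_dim)

# Illustrative shapes only (not PaliGemma's real sizes).
projector = MultimodalProjector(vision_dim=768, text_dim=2048)
text_embedding = nn.Embedding(32000, 2048)
patch_embeds = torch.randn(1, 256, 768)                   # 256 image patch embeddings
text_token_ids = torch.randint(0, 32000, (1, 16))         # 16 prompt tokens
decoder_input = build_decoder_input(patch_embeds, text_token_ids,
                                    projector, text_embedding)
print(decoder_input.shape)  # torch.Size([1, 272, 2048])
```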