Aim My Robot: Precision Local Navigation to Any Object

Existing navigation systems mostly count a run as a “success” when the robot reaches within a 1 m radius of the goal. This precision is insufficient for emerging applications that require a robot to be positioned precisely relative to an object for downstream tasks such as docking, inspection, and manipulation. To this end, we design and implement Aim-My-Robot (AMR), a local navigation system that enables a robot to reach any object in its vicinity at a desired relative pose, with centimeter-level accuracy.

Haotian Zhang

Haotian Zhang is a Senior Research Scientist at NV Cosmos. His research aims to enable embodied agents to understand the world around them. To that end, he works on designing modules that learn effective representations of vision-and-language information. His work on GLIP was a CVPR 2022 Best Paper Finalist. Prior to joining NVIDIA, he obtained his Ph.D. at the University of Washington. Haotian believes that an interesting life comes from doing interesting things with interesting people, and that’s what he hopes to do.

Chi-Pin Huang

Chi-Pin Huang is a Research Scientist at NVIDIA Research Taiwan. His research focuses on vision-language generative models and vision-language-action models (VLAs), with a particular interest in bridging perception, generation, and decision-making. He received his Ph.D. from National Taiwan University in 2026 under the supervision of Prof. Yu-Chiang Frank Wang, and earned his B.S.

Sameer Dharur

Sameer Dharur is a research scientist on the Cosmos team at NVIDIA, helping to build vision-language models (VLMs) that reason better about the world. Prior to that, he spent roughly 4.5 years as a researcher and engineer at Apple, specializing in computer vision and natural language processing for problems in image and video understanding, question answering, and robotics.