Shiyi Lan

OmniDrive: A Holistic Vision-Language Dataset for Autonomous Driving with Counterfactual Reasoning
A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties
Hydra-MDP: End-to-end Multimodal Planning with Multi-target Hydra-Distillation
Is Ego Status All You Need for Open-Loop End-to-End Autonomous Driving?
FocalFormer3D: Focusing on Hard Instance for 3D Object Detection
Fully Attentional Networks with Self-emerging Token Labeling
FB-OCC: 3D Occupancy Prediction based on Forward-Backward View Transformation
Vision Transformers Are Good Mask Auto-Labelers
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
DiscoBox: Weakly Supervised Instance Segmentation and Semantic Correspondence from Box Supervision