Knowledge Distillation

Unified Reinforcement and Imitation Learning for Vision-Language Models

Vision-Language Models (VLMs) have achieved remarkable progress, yet their large scale often renders them impractical for resource-constrained environments. This paper introduces Unified Reinforcement and Imitation Learning (RIL), a novel and …

VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models

The recent surge in high-quality visual instruction tuning samples from closed-source vision-language models (VLMs) such as GPT-4V has accelerated the release of open-source VLMs across various model sizes. However, scaling VLMs to improve …