Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning

Publication
arXiv Preprint