Machine Learning | NVIDIA Research Tel-Aviv Lab

An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

Text-to-image models offer unprecedented freedom to guide creation through natural language. Yet, it is unclear how such freedom can be exercised to generate images of specific unique concepts, modify their appearance, or compose them in new roles and novel scenes.

"This is my unicorn, Fluffy": Personalizing frozen vision-language representations

Large Vision & Language models pretrained on web-scale data provide representations that are invaluable for numerous V&L problems. However, it is unclear how they can be used for reasoning about user-specific visual concepts in unstructured language.

Perception and Reasoning

Understanding a complex scene goes far beyond top-down perception. When people operate in a natural scene, they can detect and recognize objects and relations using context, predict how objects and people will move next, and even reason about why they behave as they do.

From Local Structures to Size Generalization in Graph Neural Networks

Graph neural networks (GNNs) can process graphs of different sizes, but their ability to generalize across sizes, specifically from small to large graphs, is still not well understood. In this paper, we identify an important type of data where …

How to Stop Epidemics: Controlling Graph Dynamics with Reinforcement Learning and Graph Neural Networks

We consider the problem of monitoring and controlling a partially-observed dynamic process that spreads over a graph. This problem naturally arises in contexts such as scheduling virus tests or quarantining individuals to curb a spreading epidemic; …

Compositional Video Synthesis with Action Graphs

Videos of actions are complex signals, containing rich compositional structure. Current video generation models are limited in their ability to generate such videos. To address this challenge, we introduce a generative model (AG2Vid) that can be conditioned on an Action Graph, a structure that naturally represents the dynamics of actions and interactions between objects.

GP-Tree: A Gaussian Process Classifier for Few-Shot Incremental Learning

Gaussian processes (GPs) are non-parametric, flexible models that work well in many tasks. Combining GPs with deep learning methods via deep kernel learning is especially compelling due to the strong expressive power induced by the network.

Known unknowns: Learning novel concepts using exploratory reasoning-by-elimination

If you use the contents of this project, please cite our paper:
@article{hagrawal2021unknown,
  title={Known unknowns: Learning novel concepts using exploratory reasoning-by-elimination},
  author={Harsh Agrawal and Eli Meirom and Yuval Atzmon and Shie Mannor and Gal Chechik},
  journal={Uncertainty in Artificial Intelligence},
  year={2021}
}

Personalized Federated Learning using Hypernetworks

Personalized federated learning is tasked with training machine learning models for multiple clients, each with its own data distribution. The goal is to train personalized models in a collaborative way while accounting for data disparities across clients and reducing communication costs.

A causal view of compositional zero-shot recognition

People easily recognize new visual categories that are new combinations of known components. This compositional generalization capacity is critical for learning in real-world domains like vision and language because the long tail of new combinations dominates the distribution.