



Introduction

The 1st International Workshop on Interactive Physical AI (IPA 2026) at CVPR 2026 will bring together researchers from computer vision, robotics, and multimodal AI. Building on prior workshops that have explored subsets of this space, it provides the first comprehensive forum to address the full scope of interactive physical AI systems. The workshop topics include (but are not limited to):

  • Human-AI interaction in physical environments
  • Embodied conversational AI and multimodal learning
  • Full-duplex multimodal conversational models
  • Social intelligence and communication for robots and avatars
  • Egocentric vision and first-person perception
  • Real-time audio-visual processing for interactive systems
  • Safe and cooperative human-robot interaction
  • Personalization and lifelong learning for physical AI
  • Privacy-aware learning in interactive settings
  • Physically authentic perception and generation for avatars and agents

We will host invited speakers and also accept submissions of full, unpublished papers. These papers will be peer-reviewed through a double-blind process, published in the official CVPR 2026 workshop proceedings, and presented at the workshop itself.

What is Interactive Physical AI?

Advances in multimodal learning, embodied intelligence, and conversational AI are transforming how humans interact with intelligent systems situated alongside them in the physical world. We define such systems as Interactive Physical AI (IPA). IPA systems simultaneously:

  1. Perceive humans and scenes using audio-visual signals
  2. Generate communication signals via verbal and nonverbal behaviors (speech, prosody, backchannels, visual cues such as gaze and gestures)
  3. Act safely and effectively under physical-world constraints in shared spaces

Embodiments of IPA that interact with humans in the physical world include:

  • Robots (both humanoid and non-humanoid)
  • Physically grounded, environment-aware avatars (e.g., AR telepresence)
  • On-device audio-visual agents


Call for Papers

Submission: We invite authors to submit unpublished papers (8-page CVPR format) to our workshop; accepted papers will be presented in a poster session. All submissions will go through a double-blind review process. All contributions must be submitted (along with supplementary materials, if any) on OpenReview (the link will be provided soon).

Accepted papers will be published in the official CVPR Workshops proceedings and the Computer Vision Foundation (CVF) Open Access archive.

Note: Authors of previously rejected main conference submissions are also welcome to submit their work to our workshop. When doing so, you must include the previous reviewers' comments (named previous_reviews.pdf) and a letter of changes (named letter_of_changes.pdf) in your supplementary materials to clearly demonstrate how those comments have been addressed.



Important Dates


Paper Submission Deadline: February 28, 2026 (23:59 PST)
Notification to Authors: March 20, 2026
Camera-Ready Deadline: April 10, 2026


Schedule

Schedule to be announced.


Keynote Speakers


Yaser Sheikh
Professor at Carnegie Mellon University
ex-VP at Meta
Founder of AI Venture in Stealth

Yaser Sheikh builds frontier systems that enable machines to perceive and predict. He is currently a Consulting Professor at Carnegie Mellon University and the founder of a new AI venture focused on foundational advances in long-horizon foresight.

Previously, he served as a Vice President at Meta (2015–2025), where he founded the Meta Reality Lab in Pittsburgh and led the invention and productization of Codec Avatars, a breakthrough in real-time, photorealistic telepresence that will usher in the next generation of global communication.

Before Meta, he spent over a decade as faculty at CMU's Robotics Institute (2006–2019), where his group developed fundamental advances in machine perception, including OpenPose and the Panoptic Studio, systems that reshaped how AI understands human motion and behavior.

More speakers to be announced.


Organizers

Leena Mathur
Carnegie Mellon University
Koki Nagano
NVIDIA



Workshop sponsored by: