Learning Physically Simulated Tennis Skills from Broadcast Videos

Abstract

We present a system that learns diverse, physically simulated tennis skills from large-scale demonstrations of tennis play harvested from broadcast videos. Our approach is built upon hierarchical models, combining a low-level imitation policy and a high-level motion planning policy to steer the character in a motion embedding learned from broadcast videos. When deployed at scale on large video collections that encompass a vast set of examples of real-world tennis play, our approach can learn complex tennis shotmaking skills and realistically chain together multiple shots into extended rallies, using only simple rewards and without explicit annotations of stroke types. To address the low quality of motions extracted from broadcast videos, we correct estimated motion with physics-based imitation, and use a hybrid control policy that overrides erroneous aspects of the learned motion embedding with corrections predicted by the high-level policy. We demonstrate that our system produces controllers for physically simulated tennis players that can hit the incoming ball to target positions accurately using a diverse array of strokes (serves, forehands, and backhands), spins (topspin and slice), and playing styles (one/two-handed backhands, left/right-handed play). Overall, our system can synthesize two physically simulated characters playing extended tennis rallies with simulated racket and ball dynamics.
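The hybrid control idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the joint names, the `decode_embedding` stand-in for the learned motion embedding, and the choice of which joint gets corrected are all hypothetical, picked only to show how a correction from the high-level policy might override part of a decoded target pose before the low-level imitation policy tracks it.

```python
from typing import Dict, List

# Hypothetical pose representation: joint name -> target angle in radians.
Pose = Dict[str, float]


def decode_embedding(latent: List[float]) -> Pose:
    """Toy stand-in for a learned motion-embedding decoder (assumption)."""
    s = sum(latent)
    return {"shoulder": 0.5 * s, "elbow": 0.25 * s, "wrist": 0.1 * s}


def hybrid_target(latent: List[float], corrections: Pose) -> Pose:
    """Decode a target pose, then override the joints that have a correction.

    In this sketch the high-level policy is assumed to output both `latent`
    and `corrections`; joints without a correction keep the decoded value.
    """
    pose = decode_embedding(latent)
    unknown = set(corrections) - set(pose)
    if unknown:
        raise KeyError(f"corrections for unknown joints: {sorted(unknown)}")
    pose.update(corrections)
    return pose


# The decoded wrist angle is replaced; shoulder and elbow are left untouched.
target = hybrid_target([1.0, 1.0], {"wrist": 0.3})
```

In a full system, the resulting target pose would be passed to the low-level imitation policy running in the physics simulator; that step is omitted here.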

Video

Diverse Tennis Skills

Serve

Forehand topspin

Backhand topspin

Backhand slice

Simulation/Visualization

Simulation

Visualization

Task Performance

Same incoming ball + Different target

Different incoming ball + Same target

Different Player Styles

Right-hand, two-handed backhand

Left-hand, two-handed backhand

Two Player Rallies

Right-hand, one-handed backhand (near) vs. Right-hand, two-handed backhand (far)

View 1

View 2

View 3

Left-hand, two-handed backhand (near) vs. Right-hand, one-handed backhand (far)

View 1

View 2

View 3

Citation

      
@article{zhang2023vid2player3d,
  author = {Zhang, Haotian and Yuan, Ye and Makoviychuk, Viktor and Guo, Yunrong and Fidler, Sanja and Peng, Xue Bin and Fatahalian, Kayvon},
  title = {Learning Physically Simulated Tennis Skills from Broadcast Videos},
  journal = {ACM Trans. Graph.},
  issue_date = {August 2023},
  numpages = {14},
  doi = {10.1145/3592408},
  publisher = {ACM},
  address = {New York, NY, USA},
  keywords = {physics-based character animation, imitation learning, reinforcement learning},
}

Acknowledgement

All the broadcast videos are provided courtesy of the Tennis Channel.

Contact

For any questions regarding this paper, please contact Haotian Zhang at haotianz@nvidia.com.

Template adapted from GLAMR.
