Democratizing Immersive Experiences with NVIDIA AI

GTC 2025 XR Pavilion

Figure 1: We present novel immersive experiences with free-viewpoint static and dynamic 3D content. Depicted here are a dynamic (boxer) scene and a static (footballer) scene, shown on a large 65-inch Looking Glass light field display. Our exhibit demonstrates QUEEN, a method for compressing dynamic 3D scenes and rendering free viewpoints in real time, and an efficient method for rendering light field quilts from 3D representations.


Abstract

The AMRI team at NVIDIA Research presents novel immersive 3D experiences at GTC 2025 that allow users to move around inside a streaming video, in 3D, in real time. This makes for a highly immersive video-viewing experience, especially when paired with 3D displays such as the Looking Glass Go or virtual/mixed reality headsets. To make free-viewpoint videos streamable, we contribute two key advancements: (a) QUEEN, an efficient, on-the-fly compression scheme for dynamic 3D videos, and (b) a novel, efficient rendering scheme for light field displays. QUEEN compresses dynamic 3D videos at high ratios with short training times and renders them at over 300 FPS, while our light field renderer produces the multiple viewpoints of a light field quilt in one efficient pass, running at >30 FPS.

QUEEN: QUEEN is a novel framework for streaming compressed free-viewpoint videos using 3D Gaussian Splatting. It learns per-frame Gaussian attribute residuals without structural constraints and employs quantization and sparsity (via a learned gating module) to efficiently compress dynamic and static scene content. On diverse benchmarks, QUEEN achieves high-quality reconstructions with models as small as 0.7 MB/frame, trains in under 5 seconds per frame, and renders at roughly 350 FPS.
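To put the 0.7 MB/frame figure in streaming terms, here is a quick back-of-the-envelope bandwidth estimate; the 30 fps capture rate is an illustrative assumption on our part, not a number from the exhibit.

```python
# Back-of-the-envelope bandwidth estimate for a QUEEN stream.
# The 0.7 MB/frame model size comes from the text above; the
# 30 fps capture rate is an assumed value for illustration only.
mb_per_frame = 0.7
capture_fps = 30  # assumption

mb_per_s = mb_per_frame * capture_fps  # 21 MB/s
mbit_per_s = mb_per_s * 8              # 168 Mbit/s
print(f"~{mb_per_s:.0f} MB/s (~{mbit_per_s:.0f} Mbit/s)")
```

At that rate, a per-frame model fits comfortably within the bandwidth of a wired gigabit network, which is what makes on-the-fly streaming of the representation plausible.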



Experiences

At our booth, we show two innovative and immersive experiences for viewing dynamic 3D videos.

Free-viewpoint Video

We use QUEEN to compress a dynamic 3D scene into a quantized set of 3D Gaussian primitives, then play it back on both a large 2D display (with mouse-based viewpoint control) and a Looking Glass light field display. The explicit, efficiently compressed 3D Gaussian representation readily allows 3D scene modifications and camera changes, both at later stages of content creation and production and by the end user, all while rendering at a high frame rate (>300 FPS on a standard display).

(a) Family scene on a light field display

(b) Boxer scene on a 2D monitor with viewpoint control

Figure 2: Viewing dynamic 3D scenes using (a) a light field display or (b) large 2D display with viewpoint control. The viewpoint control allows viewers to take control of their immersive 3D experience.
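For the Looking Glass playback, the display consumes a quilt: a single image tiled with many renders of the same scene from cameras swept across a horizontal view cone. The sketch below shows the naive approach of rendering the tiles one view at a time; our renderer instead produces the whole quilt in one efficient pass. The 8x6 grid, cone angle, and the `render_view` callable are illustrative assumptions, not the exhibit's actual configuration.

```python
# Minimal sketch of rendering a Looking Glass "quilt": a grid of tiles,
# each a render of the scene from a camera swept across a horizontal
# view cone. `render_view` is a hypothetical stand-in for any 3DGS
# renderer; the 8x6 grid and 40-degree cone are assumed values.
import numpy as np

def quilt_cameras(n_views=48, cone_deg=40.0, radius=2.0):
    """Yield simple camera positions swept left-to-right across the cone."""
    for a in np.radians(np.linspace(-cone_deg / 2, cone_deg / 2, n_views)):
        # A real renderer would build full view/projection matrices here.
        yield np.array([radius * np.sin(a), 0.0, radius * np.cos(a)])

def render_quilt(render_view, cols=8, rows=6, tile=(512, 512)):
    """Tile per-view renders into one quilt image (view 0 at bottom-left)."""
    h, w = tile
    quilt = np.zeros((rows * h, cols * w, 3), dtype=np.uint8)
    for i, eye in enumerate(quilt_cameras(n_views=cols * rows)):
        r, c = divmod(i, cols)
        y = (rows - 1 - r) * h  # Looking Glass quilts index from the bottom
        quilt[y:y + h, c * w:(c + 1) * w] = render_view(eye, tile)
    return quilt
```

With dozens of views per frame, avoiding this per-view loop in favor of a single-pass quilt renderer is what keeps playback above 30 FPS.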

Immersive XR Video

3D video is best experienced in the most immersive form factor: virtual reality. We render and display dynamic 3D scenes on a VR headset and allow viewpoint control with both the mouse and head movement. Our high-resolution, high-framerate content is highly realistic (e.g., the boxer's punches make some users flinch).

(a) Boxer scene in VR

(b) Basketball scene in VR

Figure 3: Viewing static and dynamic 3D scenes on a virtual reality headset allows higher levels of realism and immersion, as the scene moves and animates in response to the user's head movement.


Key Features

Here, we highlight the core concepts that enable our immersive demonstrations.

Figure 4: In contrast to traditional 2D video streaming, 3D free-viewpoint video streaming requires simultaneous 3D/4D reconstruction and compression to handle dynamic 3D scenes. This is highly challenging to do in an efficient manner, while retaining visual quality and realism.

QUEEN addresses the problem of making free-viewpoint videos streamable. Figure 4 shows that, in addition to the typical encoding and decoding required for 2D video streaming, streaming free-viewpoint video requires simultaneous reconstruction, compression, and rendering capabilities.

QUEEN solves these challenges with a system for learning compressed representations of dynamic 3D Gaussian splats, as shown in Figure 5. Our approach learns streamable 3D Gaussian attribute residuals at each time-step. We develop a quantization-sparsity framework that compresses position residuals via sparsity and all other attribute residuals via quantization. The compressed latents are learned in an end-to-end differentiable manner during 3DGS training. We also develop an adaptive masking technique that splits static and dynamic Gaussians, along with the corresponding image regions, to speed up per-frame training.

Figure 5: QUEEN incrementally updates Gaussian attributes by simultaneously learning and compressing residuals between consecutive time-steps via a quantization-sparsity framework. We additionally render only the dynamic Gaussians for masked regions for faster performance.
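As a concrete illustration of the quantization-sparsity framework described above, here is a minimal PyTorch sketch: position residuals pass through a learned sigmoid gate that sparsifies them, while other attribute residuals are quantized in the forward pass and kept trainable with a straight-through estimator. The class names, quantization step, and gate parameterization are our illustrative assumptions, not QUEEN's published implementation.

```python
# Minimal sketch of the quantization-sparsity idea: a learned gate
# sparsifies position residuals, while other attribute residuals are
# quantized with a straight-through estimator (STE). All names and
# hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn

class QuantizedResidual(nn.Module):
    """Residual that is quantized in the forward pass but still receives
    gradients, via the straight-through estimator."""
    def __init__(self, shape, step=1e-3):
        super().__init__()
        self.residual = nn.Parameter(torch.zeros(shape))
        self.step = step

    def forward(self):
        q = torch.round(self.residual / self.step) * self.step
        # STE: forward the quantized value, backprop through the identity.
        return self.residual + (q - self.residual).detach()

class GatedPositionResidual(nn.Module):
    """Per-Gaussian position residuals sparsified by a learned gate, so
    static Gaussians contribute (near) zero update."""
    def __init__(self, num_gaussians):
        super().__init__()
        self.residual = nn.Parameter(torch.zeros(num_gaussians, 3))
        self.gate_logit = nn.Parameter(torch.zeros(num_gaussians))

    def forward(self, hard=False):
        gate = torch.sigmoid(self.gate_logit).unsqueeze(-1)  # (N, 1)
        if hard:  # at encode time, drop residuals the gate switched off
            gate = (gate > 0.5).float()
        return gate * self.residual

# Per time-step update: attributes at time t = previous attributes + residual.
num_g = 10_000
prev_xyz = torch.randn(num_g, 3)
pos_res = GatedPositionResidual(num_g)
col_res = QuantizedResidual((num_g, 3))

xyz_t = prev_xyz + pos_res()                              # updated positions
sparsity_loss = torch.sigmoid(pos_res.gate_logit).mean()  # encourages few updates
```

At encode time, residuals the hard gate has zeroed out can be skipped entirely and the quantized values entropy-coded, which keeps the per-frame bitstream small.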

QUEEN achieves state-of-the-art performance for compressed, streamable 3DGS video, with higher compression rates (fewer megabytes per frame) and significantly improved visual fidelity (Figure 6).

Figure 6: QUEEN compresses dynamic 3D scenes with high visual quality while reducing the model size greatly, training in just a few seconds and rendering at >300 FPS. (Multi-view training images courtesy of Google's DeepView Video Project).


Partners

We thank our partners for providing high-quality 3D and 4D content.


Acknowledgments

This website is based on the EG3D website.



