AI-Mediated 3D Video Conferencing

We present an AI-mediated 3D video conferencing system that can reconstruct and autostereoscopically display a life-sized talking head using consumer-grade compute resources and minimal capture equipment. Our 3D capture uses a novel 3D lifting method that encodes a given 2D input into an efficient triplanar neural representation of the user, which can be rendered from novel viewpoints in real time. Our AI-based techniques drastically reduce the cost of 3D capture, while providing a high-fidelity 3D representation on the receiver's end at the cost of traditional 2D video streaming.
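
The triplanar representation mentioned above can be illustrated with a minimal sketch of how a feature is commonly looked up in a generic triplane: a 3D point is projected onto three axis-aligned feature planes and the three fetched features are combined. This is not the system's actual encoder or renderer. The names `sample_triplane` and `_to_index`, the nearest-neighbor lookup, the summation, and the [-1, 1] coordinate range are all assumptions made for illustration.

```python
from typing import List, Sequence

# A plane is a square grid of feature vectors: plane[row][col] -> features.
Plane = List[List[List[float]]]


def _to_index(coord: float, resolution: int) -> int:
    """Map a coordinate in [-1, 1] to the nearest grid index (clamped)."""
    scaled = (coord + 1.0) * 0.5 * (resolution - 1)
    return min(max(int(round(scaled)), 0), resolution - 1)


def sample_triplane(xy: Plane, xz: Plane, yz: Plane,
                    point: Sequence[float]) -> List[float]:
    """Fetch one feature per plane for a 3D point and sum them.

    Assumption: all three planes share the same resolution and
    feature dimension, and features are combined by summation.
    """
    x, y, z = point
    res = len(xy)
    ix, iy, iz = (_to_index(c, res) for c in (x, y, z))
    f_xy = xy[ix][iy]  # projection that drops z
    f_xz = xz[ix][iz]  # projection that drops y
    f_yz = yz[iy][iz]  # projection that drops x
    return [a + b + c for a, b, c in zip(f_xy, f_xz, f_yz)]
```

In a full pipeline the combined feature would typically be decoded into color and density by a small network; that step is omitted here.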

Flexible Isosurface Extraction for Gradient-Based Mesh Optimization

This work considers gradient-based mesh optimization, where we iteratively optimize for a 3D surface mesh by representing it as the isosurface of a scalar field, an increasingly common paradigm in applications including photogrammetry, generative modeling, and inverse physics.
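
As a minimal sketch of what representing a mesh as the isosurface of a scalar field means, the snippet below places a surface vertex where the field crosses zero along a grid edge, using linear interpolation. This is the generic construction used by Marching Cubes style extractors, not the extraction scheme proposed in this work. The names `sphere_field` and `edge_crossing` are hypothetical.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def sphere_field(p: Sequence[float], radius: float = 1.0) -> float:
    """Example scalar field: signed distance to a sphere at the origin."""
    return (p[0] ** 2 + p[1] ** 2 + p[2] ** 2) ** 0.5 - radius


def edge_crossing(p0: Sequence[float], p1: Sequence[float],
                  s0: float, s1: float) -> Vec3:
    """Return the point on edge p0-p1 where the interpolated field is zero.

    s0 and s1 are the field values at p0 and p1; they must not have
    the same strict sign, otherwise the edge does not cross the surface.
    """
    if s0 * s1 > 0.0:
        raise ValueError("field does not change sign along this edge")
    # If both values are exactly zero, fall back to the edge midpoint.
    t = 0.5 if s0 == s1 else s0 / (s0 - s1)
    return (
        p0[0] + t * (p1[0] - p0[0]),
        p0[1] + t * (p1[1] - p0[1]),
        p0[2] + t * (p1[2] - p0[2]),
    )
```

Because the vertex position is a smooth function of `s0` and `s1`, a loss defined on the extracted mesh can be differentiated back to the scalar field values, which is what makes gradient-based optimization of the surface possible.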

Tight Bounding Boxes for Voxels and Bricks in a Signed Distance Field Ray Tracer

We present simple methods to compute tight axis-aligned bounding boxes for voxels and for bricks of voxels in a signed distance function renderer based on ray tracing. Our results show total frame time reductions of 20-31% in a real-time path tracer.

GPU-Accelerated Machine Learning in Non-Orthogonal Multiple Access

Non-orthogonal multiple access (NOMA) is an interesting technology that enables massive connectivity as required in future 5G and 6G networks. While purely linear processing already achieves good performance in NOMA systems, in certain scenarios, non-linear processing is mandatory to ensure acceptable performance. In this paper, we propose a neural network architecture that combines the advantages of both linear and non-linear processing. Its real-time detection performance is demonstrated by a highly efficient implementation on a graphics processing unit (GPU).

Neural LiDAR Fields for Novel View Synthesis

We present Neural Fields for LiDAR (NFL), a method to optimize a neural field scene representation from LiDAR measurements, with the goal of synthesizing realistic LiDAR scans from novel viewpoints. NFL combines the rendering power of neural fields with a detailed, physically motivated model of the LiDAR sensing process, thus enabling it to accurately reproduce key sensor behaviors like beam divergence, secondary returns, and ray dropping.

Text2LIVE: Text-Driven Layered Image and Video Editing

We present a method for zero-shot, text-driven appearance manipulation in natural images and videos. Specifically, given an input image or video and a target text prompt, our goal is to edit the appearance of existing objects (e.g., an object's texture) or augment the scene with new visual effects (e.g., smoke, fire) in a semantically meaningful manner.

Neural Congealing: Aligning Images to a Joint Semantic Atlas

We present Neural Congealing -- a zero-shot self-supervised framework for detecting and jointly aligning semantically common content across a given set of images. Our approach harnesses the power of pre-trained DINO-ViT features to learn: (i) a joint semantic atlas -- a 2D grid that captures the mode of DINO-ViT features in the input set, and (ii) dense mappings from the unified atlas to each of the input images.

Late Breaking Results: Test Selection For RTL Coverage By Unsupervised Learning From Fast Functional Simulation

Functional coverage closure is an important but RTL-simulation-intensive aspect of constrained random verification. To reduce these computational demands, we propose test selection for functional coverage via machine learning (ML) based anomaly detection in the structural coverage space of fast functional simulators. We achieve promising results on two units from a state-of-the-art production GPU design.
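
The general idea of selecting tests by how unusual their coverage is can be sketched as follows. The abstract does not specify the anomaly detector, so this sketch substitutes a simple rarity score: each test is scored by the summed negative log-frequency of the coverage bins it hits, and the highest-scoring tests are kept for RTL simulation. The function names, test names, and coverage bins are hypothetical.

```python
import math
from typing import Dict, List, Set


def rarity_scores(coverage: Dict[str, Set[str]]) -> Dict[str, float]:
    """Score each test by how rare its covered bins are across all tests.

    coverage maps a test name to the set of structural coverage bins it
    hit in the fast functional simulator (assumed input format).
    """
    n_tests = len(coverage)
    counts: Dict[str, int] = {}
    for bins in coverage.values():
        for b in bins:
            counts[b] = counts.get(b, 0) + 1
    # A bin hit by every test contributes 0; rarely hit bins contribute more.
    return {
        name: sum(-math.log(counts[b] / n_tests) for b in bins)
        for name, bins in coverage.items()
    }


def select_tests(coverage: Dict[str, Set[str]], k: int) -> List[str]:
    """Return the k most anomalous tests, breaking ties by name."""
    scores = rarity_scores(coverage)
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:k]
```

A learned anomaly detector would replace `rarity_scores` in practice; the selection step stays the same.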