3D Gaussian Splatting (3DGS) has shown great potential for efficient reconstruction and high-fidelity real-time rendering of complex scenes on consumer hardware. However, due to its rasterization-based formulation, 3DGS is constrained to ideal pinhole cameras and lacks support for secondary lighting effects. Recent methods address these limitations by tracing volumetric particles instead; however, this comes at the cost of significantly slower rendering speeds. In this work, we propose 3D Gaussian Unscented Transform (3DGUT), which replaces the EWA splatting formulation in 3DGS with the Unscented Transform, approximating each particle through sigma points that can be projected exactly under any nonlinear projection function. This modification enables trivial support for distorted cameras with time-dependent effects such as rolling shutter, while retaining the efficiency of rasterization. Additionally, we align our rendering formulation with that of tracing-based methods, enabling the secondary ray tracing required to represent phenomena such as reflections and refractions within the same 3D representation.
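To make the idea concrete, the following is a minimal NumPy sketch of the unscented transform applied to a single Gaussian particle: sigma points are drawn from its 3D mean and covariance, pushed through an arbitrary (possibly nonlinear) camera projection, and a 2D mean and covariance are re-fitted from the projected points. The function name, weights, and `kappa` parameterization are illustrative and not the exact implementation used in 3DGUT.

```python
import numpy as np

def unscented_project(mu, cov, project, kappa=0.0):
    """Project a 3D Gaussian (mu, cov) through an arbitrary camera model.

    `project` maps a 3D point to 2D image coordinates and may be nonlinear
    (pinhole, fisheye, rolling shutter, ...). Returns the mean and covariance
    of a 2D Gaussian fitted to the projected sigma points.
    """
    n = mu.shape[0]                                   # n = 3 for a 3D Gaussian
    L = np.linalg.cholesky((n + kappa) * cov)         # matrix square root of the covariance

    # 2n + 1 sigma points: the mean plus symmetric offsets along the columns of L.
    sigma_pts = np.vstack([mu, mu + L.T, mu - L.T])
    w = np.full(2 * n + 1, 1.0 / (2.0 * (n + kappa)))
    w[0] = kappa / (n + kappa)

    # Each sigma point is projected exactly -- no Jacobian / linearization needed.
    proj = np.array([project(p) for p in sigma_pts])

    # Re-fit a 2D Gaussian from the weighted projected points.
    mu_2d = w @ proj
    d = proj - mu_2d
    cov_2d = (w[:, None] * d).T @ d
    return mu_2d, cov_2d
```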
Qualitative comparison of our novel-view synthesis results against 3DGS on the MipNERF360 dataset. 3DGUT achieves comparable perceptual quality.
The unscented transform enables our method to support complex camera models, such as fisheye cameras, without requiring a true ray-tracing formulation. We compare our approach against FisheyeGS, demonstrating through both quantitative and qualitative evaluations that 3DGUT significantly outperforms FisheyeGS across all perceptual metrics. Notably, 3DGUT achieves this with fewer than half the particles (0.38M vs. 1.07M). While FisheyeGS relies on deriving a Jacobian specific to this particular fisheye camera model—restricting its generalizability even to closely related models (e.g., fisheye cameras with distortions)—our simple yet robust formulation delivers superior performance and can be effortlessly adapted to any camera model.
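As an illustration of how little is needed to support a new camera, a hypothetical equidistant fisheye model is sketched below; it can be passed directly as the `project` callback of the unscented-transform sketch above, with no model-specific Jacobian. The intrinsics are made-up values.

```python
import numpy as np

def fisheye_equidistant(p, fx=600.0, fy=600.0, cx=960.0, cy=540.0):
    """Equidistant fisheye projection (r = f * theta) with illustrative intrinsics.

    Maps a 3D point in camera coordinates to 2D pixel coordinates. Because the
    unscented transform only needs point-wise evaluations, extending this to
    distorted fisheye variants means editing this function alone.
    """
    x, y, z = p
    r = np.hypot(x, y)
    theta = np.arctan2(r, z)                 # angle from the optical axis
    scale = theta / r if r > 1e-8 else 0.0   # guard against points on the axis
    return np.array([cx + fx * scale * x, cy + fy * scale * y])
```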
Beyond modeling distorted cameras, 3DGUT can also faithfully incorporate camera motion into the projection formulation, and thus supports time-dependent camera effects such as rolling shutter, which are commonly encountered in autonomous driving and robotics. Although optical distortion can be addressed with image rectification[1], incorporating the time dependency of the projection function into the linearization framework is highly non-trivial.
[1] Image rectification is generally effective only for low-FoV cameras and results in information loss.
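To illustrate what folding camera motion into the projection can look like, here is a minimal sketch, assuming a row-wise readout and a linearly interpolated world-to-camera pose; the fixed-point iteration over the readout row is an assumption for illustration, not the exact scheme used in 3DGUT.

```python
import numpy as np

def rolling_shutter_project(p_world, pose_start, pose_end, project, rows=1080, iters=3):
    """Project a world-space point under a simple rolling-shutter model.

    `pose_start` / `pose_end` are (R, t) world-to-camera poses at the start and
    end of the frame readout; rows are assumed to be read out top to bottom.
    The readout time depends on the image row the point lands on, which itself
    depends on the projection, so a short fixed-point iteration is used.
    """
    def interp_pose(alpha):
        (R0, t0), (R1, t1) = pose_start, pose_end
        # Linear blend for brevity; a real system would interpolate rotations properly.
        return (1.0 - alpha) * R0 + alpha * R1, (1.0 - alpha) * t0 + alpha * t1

    alpha = 0.5                                  # initial guess: mid-frame readout
    uv = None
    for _ in range(iters):
        R, t = interp_pose(alpha)
        uv = project(R @ p_world + t)            # project with the time-dependent pose
        alpha = float(np.clip(uv[1] / (rows - 1), 0.0, 1.0))
    return uv
```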
Our method enables the simulation of reflections and refractions—effects traditionally achievable only through ray tracing—using a hybrid rendering scheme. Specifically, we begin by computing all primary ray intersections with the scene. These primary rays are then rendered using our splatting method by discarding Gaussian hits that fall behind a ray's closest intersection. Finally, we compute secondary rays and trace them using 3DGRT. This capability is made possible by our method's ability to generate a 3D representation fully consistent with 3DGRT.
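The key change on the primary pass can be sketched with a simple front-to-back compositing loop: Gaussian responses along a ray are accumulated only up to the ray's closest surface intersection, and the remaining transmittance is what weights the traced secondary radiance. The (t_hit, alpha, rgb) tuples below are a simplified stand-in for the rasterizer's per-ray output, not the actual data layout.

```python
import numpy as np

def composite_primary(sorted_hits, t_surface):
    """Front-to-back compositing of Gaussian responses along one primary ray.

    `sorted_hits` is a depth-sorted list of (t_hit, alpha, rgb) tuples.
    Contributions behind the closest surface intersection `t_surface`
    (np.inf if the ray hits no reflective/refractive surface) are discarded;
    the returned transmittance weights the traced secondary radiance.
    """
    color = np.zeros(3)
    transmittance = 1.0
    for t_hit, alpha, rgb in sorted_hits:
        if t_hit > t_surface:
            break                                # handled by the secondary (traced) pass
        color += transmittance * alpha * np.asarray(rgb, dtype=float)
        transmittance *= (1.0 - alpha)
    return color, transmittance
```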
Real-world AV and robotics applications often need to account for distorted intrinsic camera models and time-dependent effects such as rolling-shutter distortion caused by fast sensor motion during readout. 3DGUT (sorted) handles these effects naturally and reaches performance comparable to ray-tracing-based reconstruction methods. Below, we show qualitative results on the Waymo dataset against 3DGRT.
While Monte Carlo sampling is expensive to compute, it provides accurate reference distributions for assessing the quality of both the EWA and our UT-based projection methods. This assessment can be quantified using the Kullback-Leibler (KL) divergence between the 2D distributions, where lower KL values indicate that the projected Gaussians better approximate the reference projections. In the figure below, we evaluate the KL divergence for a fixed reconstruction. Specifically, for each visible Gaussian, we compare the projections obtained with either method under different camera and pose configurations against MC-based references (using 500 samples per reference). The resulting KL divergence distributions are visualized in the histograms at the bottom.
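For completeness, such an evaluation could be computed along the lines below: the closed-form KL divergence between two Gaussians, with a reference 2D Gaussian moment-matched to Monte Carlo-projected samples. The helper names and the moment-matching step are our own simplification of the evaluation described above.

```python
import numpy as np

def kl_gaussian(mu_p, cov_p, mu_q, cov_q):
    """Closed-form KL(p || q) between two Gaussians (2D in image space)."""
    k = mu_p.shape[0]
    cov_q_inv = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    return 0.5 * (np.trace(cov_q_inv @ cov_p) + diff @ cov_q_inv @ diff - k
                  + np.log(np.linalg.det(cov_q) / np.linalg.det(cov_p)))

def mc_reference(mu3d, cov3d, project, n_samples=500, seed=0):
    """Fit a reference 2D Gaussian to Monte Carlo-projected samples of a 3D Gaussian."""
    rng = np.random.default_rng(seed)
    pts = rng.multivariate_normal(mu3d, cov3d, size=n_samples)
    proj = np.array([project(p) for p in pts])
    return proj.mean(axis=0), np.cov(proj.T)
```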