
Research Whitepaper

HiGS

A hierarchical rendering architecture that lets 3D Gaussian Splatting scale from dense scenes to real-time views without giving up exact alpha compositing.

Dawid Pająk, Martin Bisson, Rodolfo Lima

NVIDIA

Render rate comparison at equal compute budget: 4K resolution, 75M gaussians, RTX Pro 6000. gsplat reference (left) vs. gsplat with HiGS (right). HiGS renders at 3.5× the rate of the reference while maintaining image quality.

Abstract

3D Gaussian Splatting (3DGS) has become the standard for real-time novel view synthesis on commodity GPUs. Its pipeline ties spatial partitioning and rasterization to one tile size, yet the two pull in opposite directions: partitioning, which bins and depth-sorts gaussians, grows cheaper with larger tiles, while rasterization gets cheaper with smaller ones. Prior acceleration work reduces the cost of individual stages but keeps both locked to that single scale, where a few dense tiles dominate frame time. We present Hierarchically Tiled Gaussian Splatting (HiGS), which gives each its own scale: partitioning runs over coarse macro-tiles, while rasterization runs over the fine render tiles within them. Rasterization work is then issued in proportion to the gaussians in each macro-tile rather than per tile, so dense regions spread across many parallel units instead of serializing through one. Across tested scenes, HiGS renders up to ∼15.8× faster than the original 3DGS and outperforms every other rasterizer we evaluate, while preserving exact front-to-back alpha compositing.

Highlights

Problem

The tile-size bottleneck

3DGS renderers usually bin, sort, and rasterize at one tile size. Large tiles reduce sort work, small tiles reduce wasted pixel work, and dense views leave a few overloaded tiles controlling frame time.
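
The trade-off can be made concrete with a toy count. The sketch below is illustrative only: it treats each projected gaussian as an axis-aligned pixel bounding box (real renderers use tighter intersection tests), and `tiles_touched` and `pair_stats` are hypothetical helper names, not part of any rasterizer's API. It reports how many tile-gaussian pairs must be sorted and how many pixels a tile-granular rasterizer would visit per pixel actually covered.

```python
def tiles_touched(bbox, tile):
    """Count tile x tile cells overlapped by a half-open pixel bbox (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = bbox
    tx = (x1 - 1) // tile - x0 // tile + 1
    ty = (y1 - 1) // tile - y0 // tile + 1
    return tx * ty


def pair_stats(bboxes, tile):
    """Return (tile-gaussian pairs, visited pixels / covered pixels) for one tile size."""
    pairs = sum(tiles_touched(b, tile) for b in bboxes)
    # A tile-granular rasterizer visits every pixel of every touched tile.
    visited = pairs * tile * tile
    covered = sum((x1 - x0) * (y1 - y0) for x0, y0, x1, y1 in bboxes)
    return pairs, visited / covered


if __name__ == "__main__":
    footprint = [(10, 10, 30, 30)]  # one 20x20-pixel footprint
    for tile in (8, 16, 64):
        print(tile, pair_stats(footprint, tile))
```

For this single footprint, growing the tile from 8 to 64 pixels cuts the pairs to sort from 9 to 1 but raises the visited-to-covered pixel ratio from 1.44 to 10.24, which is the tension a single tile size cannot resolve.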

Approach

Two-scale rendering

HiGS bins and depth-sorts gaussians by coarse macro-tile, then rasterizes finer render tiles from shared local batches, turning sorting, data reuse, and load balancing into one coherent hierarchy.

Evidence

Faster exact compositing

Across tested scenes at 1080p and 4K, HiGS runs 1.8–2.2× faster than state-of-the-art rasterizers while preserving exact front-to-back alpha compositing and comparable image quality.

Method Overview

HiGS decouples the spatial scale used for partitioning from the fine render tiles used for blending. Coarse macro-tiles organize and sort gaussian work once, then fused render-tile kernels reuse each local batch across the pixels that actually need it.
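
The abstract's statement that rasterization work is issued in proportion to the gaussians in each macro-tile can be read as a batching scheme. The sketch below is one possible interpretation, not the HiGS scheduler: `work_items` and the batch size of 256 are assumptions made for illustration, and how partial results from consecutive batches are combined in depth order is not covered here.

```python
def work_items(counts, batch=256):
    """Split each macro-tile's sorted list into fixed-size batches.

    counts: dict mapping macro-tile id -> number of gaussians in its list.
    Returns a list of (macro_tile, start, end) half-open ranges.
    """
    items = []
    for tile, n in counts.items():
        for start in range(0, n, batch):
            items.append((tile, start, min(start + batch, n)))
    return items


if __name__ == "__main__":
    counts = {"dense": 1000, "sparse": 10}
    items = work_items(counts)
    # Largest unit of work per item, versus one item per tile.
    print(len(items), max(end - start for _, start, end in items), max(counts.values()))
```

With one work item per tile, the dense tile alone carries 1000 gaussians; with batching, no item carries more than 256, so the dense region is spread over four items that can be scheduled independently.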

1. Macro-tile partitioning

Visible gaussians are intersected with coarse macro-tiles and written directly into per-macro-tile depth-keyed lists, shrinking pair counts before sorting begins.

2. Segmented sort cascade

Each macro-tile list becomes an independent depth-sort segment, replacing one global composite-key sort with narrower 32-bit in-segment sorting.
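
The difference between the two sorting strategies can be sketched on the CPU. The example assumes a 64-bit composite key with the tile id in the upper 32 bits and the float32 depth bits in the lower 32, as in the original 3DGS implementation; `depth_bits`, `global_sort`, and `segmented_sort` are hypothetical names, and Python's `sorted` stands in for a GPU radix sort.

```python
import struct


def depth_bits(depth):
    """32-bit pattern of a non-negative float32; its integer order matches depth order."""
    return struct.unpack("<I", struct.pack("<f", depth))[0]


def global_sort(pairs):
    """One sort over 64-bit keys: tile id in the high word, depth bits in the low word."""
    keyed = sorted(((tile << 32) | depth_bits(d), gid) for tile, d, gid in pairs)
    return [(key >> 32, gid) for key, gid in keyed]


def segmented_sort(segments):
    """Sort each tile's list independently using only the 32-bit depth key."""
    return {
        tile: [gid for _, gid in sorted((depth_bits(d), gid) for d, gid in seg)]
        for tile, seg in segments.items()
    }


if __name__ == "__main__":
    pairs = [(1, 0.5, 10), (0, 2.0, 11), (0, 1.0, 12), (1, 0.25, 13)]
    segments = {0: [(2.0, 11), (1.0, 12)], 1: [(0.5, 10), (0.25, 13)]}
    print(global_sort(pairs))
    print(segmented_sort(segments))
```

Both paths yield the same front-to-back order within each tile. The segmented version never compares keys across tiles, so the tile id drops out of the key and each segment can be sorted in isolation.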

3. Fused tile rasterization

Gaussians are loaded once per macro-tile batch, filtered into render-tile visibility masks inline, and blended front-to-back by dynamically scheduled render tiles.
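
The following sketch illustrates the two ideas in this step, visibility masks and front-to-back blending, in heavily simplified form. It is not the HiGS kernel: the 16-pixel render tile, the 4×4 layout per macro-tile, the early-exit threshold, and the names `visibility_mask` and `composite_tile` are all assumptions, and each gaussian contributes one constant alpha per render tile rather than a per-pixel falloff.

```python
RENDER = 16   # hypothetical render-tile edge in pixels
PER_SIDE = 4  # hypothetical: 4x4 render tiles per macro-tile


def visibility_mask(bbox):
    """Bit i is set when the macro-tile-local bbox overlaps render tile i."""
    x0, y0, x1, y1 = bbox
    mask = 0
    for ty in range(max(0, y0 // RENDER), min(PER_SIDE - 1, (y1 - 1) // RENDER) + 1):
        for tx in range(max(0, x0 // RENDER), min(PER_SIDE - 1, (x1 - 1) // RENDER) + 1):
            mask |= 1 << (ty * PER_SIDE + tx)
    return mask


def composite_tile(batch, tile_index, t_min=1e-4):
    """Blend a depth-sorted batch of (mask, alpha, color) front to back for one render tile."""
    color, transmittance = 0.0, 1.0
    for mask, alpha, c in batch:
        if not (mask >> tile_index) & 1:
            continue  # this gaussian does not touch the tile
        color += transmittance * alpha * c
        transmittance *= 1.0 - alpha
        if transmittance < t_min:
            break  # later gaussians can no longer contribute visibly
    return color, transmittance


if __name__ == "__main__":
    batch = [
        (visibility_mask((0, 0, 16, 16)), 0.5, 1.0),
        (visibility_mask((0, 0, 32, 16)), 0.5, 0.0),
        (visibility_mask((16, 0, 32, 16)), 0.25, 1.0),
    ]
    for tile in range(3):
        print(tile, composite_tile(batch, tile))
```

The batch is built once and shared by every render tile in the macro-tile; each tile consults only its own mask bit, so a gaussian that misses a tile costs one bit test rather than a blend.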

Results

HiGS is evaluated on Mip-NeRF 360 scenes and a large nvcampus park capture, comparing throughput, image quality, and scaling behavior against modern 3D Gaussian Splatting rasterizers.

Throughput Across Rasterizers

Table 4 reports mean FPS across seven Mip-NeRF 360 scenes. HiGS leads every compared rasterizer at both resolutions, including a 1.8–2.2× speedup over state-of-the-art rasterizers and a 3.6–4.4× speedup over gsplat.

Method         1080p FPS   4K FPS   HiGS speedup (1080p)   HiGS speedup (4K)
HiGS           1937        1214     1.00×                  1.00×
FlashGS        893         670      2.17×                  1.81×
Faster-GS      897         588      2.16×                  2.06×
TC-GS          765         499      2.53×                  2.43×
Speedy-Splat   643         385      3.01×                  3.16×
StopThePop     573         293      3.38×                  4.14×
gsplat         541         275      3.58×                  4.42×
3DGS           286         102      6.77×                  11.86×
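
The speedup columns are the HiGS frame rate divided by each method's frame rate at the same resolution. The short script below recomputes a few rows from the tabulated FPS values; `FPS` and `speedup` are names introduced here for illustration.

```python
# Mean FPS copied from Table 4 as (1080p, 4K).
FPS = {
    "HiGS": (1937, 1214),
    "FlashGS": (893, 670),
    "Faster-GS": (897, 588),
    "gsplat": (541, 275),
    "3DGS": (286, 102),
}


def speedup(method, res_index):
    """HiGS FPS over the given method's FPS (res_index 0 = 1080p, 1 = 4K)."""
    return FPS["HiGS"][res_index] / FPS[method][res_index]


if __name__ == "__main__":
    for name in FPS:
        print(f"{name:10s} {speedup(name, 0):6.2f}x {speedup(name, 1):6.2f}x")
```

Ratios recomputed from the rounded FPS values can differ from the table in the last digit (for example, 1214 / 275 ≈ 4.41 against the tabulated 4.42), presumably because the table was derived from unrounded measurements.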

Quality Remains Comparable

Table 5 isolates rendering-kernel differences by comparing each method's output against the gsplat reference render, and also compares each method against the ground-truth (GT) COLMAP test images. The ground-truth PSNR spread is only 0.04 dB across methods; HiGS preserves the same absolute quality level while using the faster fp16 rendering path.

Method              PSNR vs. gsplat (dB)   GT PSNR (dB)   GT SSIM   GT LPIPS
HiGS w/o SH comp.   67.03                  27.68          0.8649    0.1034
HiGS w/ SH comp.    55.59                  27.67          0.8645    0.1034
FlashGS             49.74                  27.65          0.8633    0.1046
Faster-GS           73.86                  27.68          0.8649    0.1035
Speedy-Splat        94.43                  27.68          0.8649    0.1034
StopThePop          94.37                  27.68          0.8649    0.1034
3DGS                75.04                  27.68          0.8649    0.1034
gsplat reference    -                      27.68          0.8649    0.1034
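
For readers interpreting the PSNR columns, the standard definition is PSNR = 10 log10(peak² / MSE). The sketch below implements it on flat sequences of pixel values in [0, 1]; `psnr` and `rmse_from_psnr` are illustrative helpers, not the evaluation code behind Table 5.

```python
import math


def psnr(img_a, img_b, peak=1.0):
    """PSNR in dB between two equal-length sequences of pixel values in [0, peak]."""
    if len(img_a) != len(img_b):
        raise ValueError("images must have the same number of samples")
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak * peak / mse)


def rmse_from_psnr(psnr_db, peak=1.0):
    """Invert the definition: root-mean-square error implied by a PSNR value."""
    return peak * 10.0 ** (-psnr_db / 20.0)


if __name__ == "__main__":
    print(psnr([0.5, 0.5], [0.6, 0.6]))
    print(rmse_from_psnr(67.03))
```

Under this definition, the 67.03 dB that HiGS without SH compression scores against gsplat corresponds to a root-mean-square difference of roughly 4.5e-4 of full scale, below one 8-bit quantization step (1/255 ≈ 3.9e-3). Identical images give an infinite PSNR, which is presumably why the gsplat reference row has no value in that column.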

Scaling To Dense Scenes

Figure 5 evaluates the nvcampus park scene from 5M to 75M gaussians. HiGS keeps the lowest frame time across the sweep and scales roughly linearly, reaching 9.97 ms at 1080p and 10.29 ms at 4K for the 75M-gaussian capture.

Figure 5. HiGS scaling results on nvcampus across 5M to 75M gaussians at 1080p and 4K: frame time and tile-gaussian pair counts, comparing HiGS with prior rasterizers on the nvcampus park capture.
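
As a quick reading aid, the 75M-gaussian frame times quoted above convert to frame rates and per-gaussian cost as follows. This is plain arithmetic on the two reported numbers; the closing interpretation is an inference, not a claim from the source.

```latex
\[
\text{FPS} = \frac{1000}{t_{\text{frame}}\,[\text{ms}]}:
\qquad
\frac{1000}{9.97} \approx 100.3 \ \text{(1080p)},
\qquad
\frac{1000}{10.29} \approx 97.2 \ \text{(4K)}
\]
\[
\frac{9.97\ \text{ms}}{75 \times 10^{6}\ \text{gaussians}} \approx 0.13\ \text{ns per gaussian},
\qquad
\frac{t_{4\text{K}}}{t_{1080\text{p}}} = \frac{10.29}{9.97} \approx 1.03
\]
```

Quadrupling the pixel count adds only about 3% to the frame time at this density, which suggests that per-gaussian work, rather than per-pixel work, dominates at 75M gaussians.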

Citation

Cite the arXiv preprint as follows.

@misc{higs2026,
  title        = {HiGS: A Hierarchical Rendering Architecture for Real-Time 3D Gaussian Splatting},
  author       = {Dawid Paj{\k{a}}k and Martin Bisson and Rodolfo Lima},
  howpublished = {arXiv preprint},
  year         = {2026},
  url          = {https://arxiv.org/abs/2606.00352}
}