The tile-size bottleneck
3DGS renderers usually bin, sort, and rasterize at one tile size. Large tiles reduce sort work, small tiles reduce wasted pixel work, and dense views leave a few overloaded tiles controlling frame time.
3D Gaussian Splatting (3DGS) has become the standard for real-time novel view synthesis on commodity GPUs. Its pipeline ties spatial partitioning and rasterization to one tile size, yet the two pull in opposite directions: partitioning, which bins and depth-sorts gaussians, grows cheaper with larger tiles, while rasterization gets cheaper with smaller ones. Prior acceleration work reduces the cost of individual stages but keeps both locked to that single scale, where a few dense tiles dominate frame time. We present Hierarchically Tiled Gaussian Splatting (HiGS), which gives each stage its own scale: partitioning runs over coarse macro-tiles, while rasterization runs over the fine render tiles within them. Rasterization work is then issued in proportion to the number of gaussians in each macro-tile rather than per tile, so dense regions spread across many parallel units instead of serializing through one. Across tested scenes, HiGS renders up to ∼15.8× faster than the original 3DGS and outperforms every other rasterizer we evaluate, while preserving exact front-to-back alpha compositing.
HiGS bins and depth-sorts gaussians by coarse macro-tile, then rasterizes finer render tiles from shared local batches, turning sorting, data reuse, and load balancing into one coherent hierarchy.
Across tested scenes at 1080p and 4K, HiGS runs 1.8–2.2× faster than state-of-the-art rasterizers while preserving exact front-to-back alpha compositing and comparable image quality.
HiGS decouples the spatial scale used for partitioning from the fine render tiles used for blending. Coarse macro-tiles organize and sort gaussian work once, then fused render-tile kernels reuse each local batch across the pixels that actually need it.
Visible gaussians are intersected with coarse macro-tiles and written directly into per-macro-tile depth-keyed lists, shrinking pair counts before sorting begins.
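To illustrate why coarser binning shrinks the pair count, here is a minimal sketch that bins the same splats at two tile sizes and counts the resulting (tile, gaussian) pairs. The bounding-box overlap test, the 16 px and 64 px tile sizes, the splat tuples, and all function names are assumptions made for illustration; the source does not specify the intersection test or the sizes HiGS uses.

```python
from collections import defaultdict

def tile_range(lo, hi, tile, n_tiles):
    """Inclusive range of tile indices overlapped by the pixel interval [lo, hi]."""
    first = max(0, int(lo // tile))
    last = min(n_tiles - 1, int(hi // tile))
    return range(first, last + 1)

def bin_gaussians(gaussians, width, height, tile):
    """Return {tile_id: [(depth, gaussian_index), ...]} for square tiles of edge `tile`."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    lists = defaultdict(list)
    for idx, (cx, cy, radius, depth) in enumerate(gaussians):
        # Conservative overlap: every tile touched by the splat's bounding box.
        for ty in tile_range(cy - radius, cy + radius, tile, tiles_y):
            for tx in tile_range(cx - radius, cx + radius, tile, tiles_x):
                lists[ty * tiles_x + tx].append((depth, idx))
    return lists

def pair_count(lists):
    """Total number of (tile, gaussian) pairs that the sort stage would process."""
    return sum(len(items) for items in lists.values())

# Hypothetical screen-space splats: (center_x, center_y, radius_px, view_depth)
gaussians = [
    (100.0, 100.0, 40.0, 2.5),
    (130.0, 90.0, 20.0, 1.0),
    (500.0, 300.0, 10.0, 4.0),
]

fine = bin_gaussians(gaussians, 1920, 1080, tile=16)    # fine tiles
coarse = bin_gaussians(gaussians, 1920, 1080, tile=64)  # coarse macro-tiles
print(pair_count(fine), pair_count(coarse))  # prints "52 12"
```

In this toy input the large splat dominates the fine-tile pair count, which is the case where coarse binning helps most.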
Each macro-tile list becomes an independent depth-sort segment, replacing one global composite-key sort with narrower 32-bit in-segment sorting.
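The difference between the two sorting strategies can be sketched on the CPU as follows. This is an illustrative comparison, not the HiGS kernels: the 64-bit composite key layout (tile ID in the high bits, depth in the low bits), the float-to-key conversion, and the sample data are assumptions. It shows that sorting each segment on a 32-bit depth key yields the same per-tile order as one global sort on the wider key.

```python
import struct

def depth_key(depth):
    """32-bit key: the IEEE-754 bit pattern of a non-negative float32 preserves order."""
    return struct.unpack("<I", struct.pack("<f", depth))[0]

def global_composite_sort(pairs):
    """One sort over 64-bit keys built as (tile_id << 32) | depth_key."""
    keyed = sorted(((tile << 32) | depth_key(depth), g) for tile, depth, g in pairs)
    out = {}
    for key, g in keyed:
        out.setdefault(key >> 32, []).append(g)
    return out

def segmented_sort(segments):
    """Each macro-tile list is sorted independently on the 32-bit depth key alone."""
    return {
        tile: [g for _, g in sorted((depth_key(depth), g) for depth, g in items)]
        for tile, items in segments.items()
    }

# Hypothetical pairs: (macro_tile_id, view_depth, gaussian_index)
pairs = [(3, 2.5, 0), (3, 1.0, 1), (7, 4.0, 2), (3, 0.25, 3), (7, 0.5, 4)]

# The same data, already grouped into per-macro-tile lists by the binning stage.
segments = {3: [(2.5, 0), (1.0, 1), (0.25, 3)], 7: [(4.0, 2), (0.5, 4)]}

print(segmented_sort(segments))  # prints "{3: [3, 1, 0], 7: [4, 2]}"
```

Because the tile ID is implied by the segment, it never has to be part of the key, and the segments can be sorted in parallel.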
Gaussians are loaded once per macro-tile batch, filtered into render-tile visibility masks inline, and blended front-to-back by dynamically scheduled render tiles.
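A rough single-threaded sketch of the masking and blending logic is given below. The 16 px render tile, the 4×4 layout inside a macro-tile, the isotropic gaussian footprint, and the function names are all assumptions for illustration. The alpha clamp and cutoff values follow conventions common in 3DGS rasterizers and are not taken from this source. Dynamic scheduling of render tiles is not modeled here.

```python
import math

RENDER_TILE = 16     # assumed render-tile edge in pixels
TILES_PER_SIDE = 4   # assumed render tiles per macro-tile edge (64 px macro-tile)

def visibility_mask(cx, cy, radius):
    """Set bit (ty * TILES_PER_SIDE + tx) when the splat's bounding box touches that render tile."""
    mask = 0
    for ty in range(TILES_PER_SIDE):
        for tx in range(TILES_PER_SIDE):
            x0, y0 = tx * RENDER_TILE, ty * RENDER_TILE
            if (cx + radius >= x0 and cx - radius < x0 + RENDER_TILE and
                    cy + radius >= y0 and cy - radius < y0 + RENDER_TILE):
                mask |= 1 << (ty * TILES_PER_SIDE + tx)
    return mask

def blend_pixel(px, py, tile_bit, batch, masks):
    """Front-to-back alpha compositing of one pixel over a depth-sorted batch."""
    color, transmittance = 0.0, 1.0
    for (cx, cy, sigma, opacity, value), mask in zip(batch, masks):
        if not mask & tile_bit:
            continue  # this render tile is not covered by the gaussian
        d2 = (px - cx) ** 2 + (py - cy) ** 2
        alpha = min(0.99, opacity * math.exp(-0.5 * d2 / (sigma * sigma)))
        if alpha < 1.0 / 255.0:
            continue  # negligible contribution
        color += transmittance * alpha * value
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:
            break  # pixel is effectively opaque; later gaussians cannot contribute
    return color, transmittance

# Hypothetical batch in macro-tile-local pixels, sorted near to far:
# (center_x, center_y, sigma_px, opacity, grayscale_value)
batch = [(8.0, 8.0, 2.0, 0.8, 1.0), (40.0, 40.0, 2.0, 0.5, 0.5)]

# Masks are computed once per batch and reused by every render tile.
masks = [visibility_mask(cx, cy, 3.0 * sigma) for cx, cy, sigma, _, _ in batch]

near_color, near_t = blend_pixel(8.0, 8.0, 1 << 0, batch, masks)    # render tile (0, 0)
empty_color, empty_t = blend_pixel(24.0, 24.0, 1 << 5, batch, masks)  # render tile (1, 1)
```

The mask test lets a render tile skip gaussians without touching their parameters, while the shared, depth-sorted batch keeps the compositing order exact.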
HiGS is evaluated on Mip-NeRF 360 scenes and a large nvcampus park capture, comparing throughput, image quality, and scaling behavior against modern 3D Gaussian Splatting rasterizers.
Table 4 reports mean FPS across seven Mip-NeRF 360 scenes. HiGS leads every compared rasterizer at both resolutions, including a 1.8–2.2× speedup over state-of-the-art rasterizers and a 3.6–4.4× speedup over gsplat.
| Method | 1080p FPS | 4K FPS | HiGS speedup (1080p) | HiGS speedup (4K) |
|---|---|---|---|---|
| HiGS | 1937 | 1214 | 1.00× | 1.00× |
| FlashGS | 893 | 670 | 2.17× | 1.81× |
| Faster-GS | 897 | 588 | 2.16× | 2.06× |
| TC-GS | 765 | 499 | 2.53× | 2.43× |
| Speedy-Splat | 643 | 385 | 3.01× | 3.16× |
| StopThePop | 573 | 293 | 3.38× | 4.14× |
| gsplat | 541 | 275 | 3.58× | 4.42× |
| 3DGS | 286 | 102 | 6.77× | 11.86× |
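The speedup columns appear to be the ratio of HiGS's mean FPS to each method's mean FPS. As a worked example under that reading, the 1080p FlashGS entry is:

```latex
\text{speedup} = \frac{\text{FPS}_{\text{HiGS}}}{\text{FPS}_{\text{method}}},
\qquad
\frac{1937}{893} \approx 2.17
```

Recomputing other entries from the rounded FPS values can differ slightly in the last digit, presumably because the reported ratios come from unrounded measurements.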
Table 5 isolates rendering-kernel differences and also compares against COLMAP test images. The ground-truth PSNR spread is only 0.04 dB across methods; HiGS preserves the same absolute quality level while using the faster fp16 rendering path.
| Method | PSNR vs. gsplat (dB) | GT PSNR (dB) | GT SSIM | GT LPIPS |
|---|---|---|---|---|
| HiGS w/o SH comp. | 67.03 | 27.68 | 0.8649 | 0.1034 |
| HiGS w/ SH comp. | 55.59 | 27.67 | 0.8645 | 0.1034 |
| FlashGS | 49.74 | 27.65 | 0.8633 | 0.1046 |
| Faster-GS | 73.86 | 27.68 | 0.8649 | 0.1035 |
| Speedy-Splat | 94.43 | 27.68 | 0.8649 | 0.1034 |
| StopThePop | 94.37 | 27.68 | 0.8649 | 0.1034 |
| 3DGS | 75.04 | 27.68 | 0.8649 | 0.1034 |
| gsplat | reference | 27.68 | 0.8649 | 0.1034 |
Figure 5 evaluates the nvcampus park scene from 5M to 75M gaussians. HiGS keeps the lowest frame time across the sweep and scales roughly linearly, reaching 9.97 ms at 1080p and 10.29 ms at 4K for the 75M-gaussian capture.
Cite the arXiv preprint as follows.
@misc{higs2026,
title = {HiGS: A Hierarchical Rendering Architecture for Real-Time 3D Gaussian Splatting},
author = {Dawid Paj{\k{a}}k and Martin Bisson and Rodolfo Lima},
howpublished = {arXiv preprint},
year = {2026},
url = {https://arxiv.org/abs/2606.00352}
}