Spatial Intelligence Lab (SIL) NVIDIA Research

Neural Harmonic Textures for High-Quality Primitive Based Neural Reconstruction

1NVIDIA
2Università della Svizzera italiana, Lugano, Switzerland

Neural Harmonic Textures for novel view synthesis. We attach learnable feature vectors (right) to the vertices of virtual bounding tetrahedra encapsulating each primitive (center). After harmonic encoding and accumulation along the ray, a small neural network decodes the resulting signal into RGB color (left).

Abstract


Primitive-based methods such as 3D Gaussian Splatting have recently become the state-of-the-art for novel-view synthesis and related reconstruction tasks. Compared to neural fields, these representations are more flexible, adaptive, and scale better to large scenes. However, the limited expressivity of individual primitives makes modeling high-frequency detail challenging.

We introduce Neural Harmonic Textures, a neural representation approach that anchors latent feature vectors on a virtual scaffold surrounding each primitive. These features are interpolated within the primitive at ray intersection points. Inspired by Fourier analysis, we apply periodic activations to the interpolated features, turning alpha blending into a weighted sum of harmonic components. The resulting signal is then decoded in a single deferred pass using a small neural network, significantly reducing computational cost.

Neural Harmonic Textures yield state-of-the-art results in real-time novel view synthesis while bridging the gap between primitive- and neural-field-based reconstruction. Our method integrates seamlessly into existing primitive-based pipelines such as 3DGUT, Triangle Splatting, and 2DGS. We further demonstrate its generality with applications to 2D image fitting and semantic reconstruction.

Method Overview


We present our approach using 3D Gaussian primitives in the context of novel view synthesis, following 3DGUT. Our method has three core components: particle-bound feature embedding (latent features on a virtual scaffold per primitive), harmonic texturing (periodic encoding of interpolated features before alpha blending), and neural deferred decoding (a single MLP evaluation per pixel to reconstruct color from the accumulated harmonics).

Neural Harmonic Textures pipeline

Neural Harmonic Textures applied to novel-view synthesis. We attach feature vectors \(\mathbf{f}_i\) to the vertices of tetrahedra inscribing the Gaussian primitives. We evaluate the point along the ray where the projected Gaussian has maximum response, barycentrically interpolate vertex features there, encode them with sine and cosine, alpha-blend along the ray, and decode the sum of harmonics with a shallow MLP in a single image-space pass.

Particle-bound feature embedding

Each primitive is bounded by an ellipsoid in world space, which becomes a sphere in whitened canonical space. We define a virtual bounding tetrahedron in this canonical space and attach one \(N_f\)-dimensional feature vector \(\mathbf{f}^j \in \mathbb{R}^{N_f}\) to each of its four vertices (\(j \in \{0,1,2,3\}\)). For each ray–primitive intersection, we take the point \(\mathbf{p}^*\) where the projected Gaussian has maximum response along the ray (following 3DGUT). The feature at that location is given by barycentric interpolation of the vertex features:

\[ \mathbf{f} = \sum_{j=0}^{3} w_j \, \mathbf{f}^j, \qquad \text{with } \sum_{j=0}^{3} w_j = 1 \text{ and } w_j \geq 0, \] where \(w_j\) are the barycentric coordinates of \(\mathbf{p}^*\) in the tetrahedron.
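The interpolation above can be sketched as follows. This is a minimal numpy illustration, not the paper's implementation: the tetrahedron geometry, the max-response point \(\mathbf{p}^*\), and the feature dimensions are placeholder inputs, and the barycentric solve assumes \(\mathbf{p}^*\) lies inside a non-degenerate tetrahedron.

```python
import numpy as np

def barycentric_weights(p, tet_vertices):
    """Barycentric coordinates of point p inside a tetrahedron.

    tet_vertices: (4, 3) array of vertex positions in canonical space.
    Solves [v1-v0, v2-v0, v3-v0] @ [w1, w2, w3] = p - v0,
    with w0 = 1 - w1 - w2 - w3, so the weights sum to 1.
    """
    v0 = tet_vertices[0]
    T = (tet_vertices[1:] - v0).T           # (3, 3) edge matrix
    w123 = np.linalg.solve(T, p - v0)       # (3,)
    return np.concatenate([[1.0 - w123.sum()], w123])

def interpolate_feature(p_star, tet_vertices, vertex_features):
    """Interpolated feature f = sum_j w_j f^j at the max-response point p*.

    vertex_features: (4, N_f) array of learnable per-vertex feature vectors.
    """
    w = barycentric_weights(p_star, tet_vertices)   # (4,), sums to 1
    return w @ vertex_features                      # (N_f,)
```

At the centroid of a tetrahedron all four weights are 0.25, so the interpolated feature is the mean of the vertex features; at a vertex, the feature is recovered exactly.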

Harmonic texturing

Rather than decoding each sample with an MLP before blending (which would require many evaluations per ray), we encode the interpolated features with periodic functions and blend the encodings along the ray. Concretely, we map \(\mathbf{f}_i\) to \(\bigl[\sin(\mathbf{f}_i); \cos(\mathbf{f}_i)\bigr]\), turning the signal into a sum of harmonic components—we call this harmonic texturing. The primitive opacity \(\alpha_i\) and transmittance \(T_i\) act as the amplitude of each harmonic. Large differences between vertex features within a primitive yield rapidly varying (high-frequency) textures; the interpolation thus acts as a frequency modulator.
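A sketch of the encoding and blending step, under the usual front-to-back compositing convention \(T_i = \prod_{j<i}(1-\alpha_j)\). The feature dimension and opacities below are placeholder inputs; the real renderer performs this accumulation per ray on the GPU.

```python
import numpy as np

def harmonic_encode(f):
    """Map an interpolated feature f in R^{N_f} to [sin(f); cos(f)] in R^{2 N_f}."""
    return np.concatenate([np.sin(f), np.cos(f)], axis=-1)

def blend_harmonics(features, alphas):
    """Front-to-back alpha blending of harmonic encodings along one ray.

    features: (K, N_f) interpolated features of the K intersected primitives,
              ordered front to back.  alphas: (K,) per-primitive opacities.
    Returns sum_i alpha_i * T_i * [sin(f_i); cos(f_i)], where
    T_i = prod_{j<i} (1 - alpha_j) is the accumulated transmittance.
    """
    T = 1.0
    acc = np.zeros(2 * features.shape[1])
    for f, a in zip(features, alphas):
        acc += a * T * harmonic_encode(f)   # alpha_i * T_i is the harmonic amplitude
        T *= (1.0 - a)                      # update transmittance for the next primitive
    return acc
```

Because the encoding is periodic, the blended result is a weighted sum of harmonics whose amplitudes are set by opacity and transmittance, matching the description above.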

Latent vectors

(a) Latent vectors on virtual tetrahedron

Interpolation

(b) Interpolation at \(\mathbf{p}^*\)

Harmonic decomposition

(c) Harmonic decomposition

Neural deferred decoding

After alpha-compositing these harmonic textures along the ray, we decode the final pixel color in a single image-space pass—neural deferred decoding. The rendering equation is

\[ \mathbf{c} = \mathrm{MLP}_\theta \left( \; \sum_{i \in \mathcal{G}} \alpha_i \, T_i \, \begin{bmatrix} \sin(\mathbf{f}_i) \\ \cos(\mathbf{f}_i) \end{bmatrix}, \;\; k \cdot \mathrm{SH}_2(\mathbf{d}) \; \right), \] where \(\mathcal{G}\) is the set of primitives intersected by the ray, \(\mathbf{f}_i\) is the interpolated feature at \(\mathbf{p}_i^*\), and \(\mathrm{SH}_2(\mathbf{d})\) encodes the ray direction for view-dependent effects.
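The deferred pass can be sketched as below. The MLP width, depth, activations, and the scalar \(k\) are placeholder assumptions (the paper does not fix them here beyond a "small" network); only the structure matters: one network evaluation per pixel, taking the blended harmonics concatenated with a degree-2 spherical-harmonics encoding of the ray direction.

```python
import numpy as np

def sh2_basis(d):
    """Real spherical harmonics up to degree 2 at unit direction d (9 values)."""
    x, y, z = d
    return np.array([
        0.28209479177387814,                    # l = 0
        0.4886025119029199 * y,                 # l = 1
        0.4886025119029199 * z,
        0.4886025119029199 * x,
        1.0925484305920792 * x * y,             # l = 2
        1.0925484305920792 * y * z,
        0.31539156525252005 * (3 * z * z - 1),
        1.0925484305920792 * x * z,
        0.5462742152960396 * (x * x - y * y),
    ])

def deferred_decode(accum, d, params, k=1.0):
    """One MLP evaluation per pixel: blended harmonics (+ view direction) -> RGB.

    accum:  (2 * N_f,) alpha-blended [sin; cos] encodings for this pixel.
    params: list of (W, b) pairs for a small ReLU MLP with a sigmoid output head
            (hypothetical architecture for illustration).
    """
    h = np.concatenate([accum, k * sh2_basis(d)])
    for W, b in params[:-1]:
        h = np.maximum(W @ h + b, 0.0)          # hidden layers, ReLU
    W, b = params[-1]
    return 1.0 / (1.0 + np.exp(-(W @ h + b)))   # RGB in (0, 1)
```

The key cost saving is that the network runs once per pixel in image space, rather than once per ray-primitive intersection.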

Results


We consistently outperform prior work on standard benchmarks (MipNeRF360, Tanks and Temples, Deep Blending) and excel at high-frequency detail, specular highlights, and view-dependent effects. Below we show qualitative comparisons and per-scene videos.

Main benchmark comparison

Comparison on MipNeRF360, Tanks & Temples, and Deep Blending against neural-field, primitive-based, and mixed methods. Our method uses 64 features per primitive (16 per vertex) and a 128×3 MLP, with 2M primitives for indoor scenes and 5M for outdoor scenes. All MipNeRF360 results use the original compressed JPEG references; with gsplat's default downscaling, expect roughly +0.3 dB PSNR. Note also that Spherical Voronoi tunes learning rates per dataset, while all other methods, including ours, do not.
Columns are grouped by dataset, left to right: MipNeRF360, Tanks & Temples, Deep Blending. Missing entries are marked with a dash.

| Method | PSNR↑ | SSIM↑ | LPIPS↓ | PSNR↑ | SSIM↑ | LPIPS↓ | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|---|---|---|---|---|
| Instant NGP-Big | 25.59 | 0.695 | 0.375 | 21.92 | 0.740 | 0.342 | 24.96 | 0.815 | 0.459 |
| Mip-NeRF 360 | 27.60 | 0.788 | 0.275 | 22.22 | 0.754 | 0.290 | 29.40 | 0.899 | 0.306 |
| ZipNeRF | 28.55 | 0.829 | 0.218 | 23.64 | 0.836 | 0.179 | — | — | — |
| 2DGS | 27.22 | 0.804 | 0.275 | 22.85 | 0.827 | 0.244 | 29.56 | 0.904 | 0.325 |
| 3DGS-MCMC | 27.99 | 0.830 | 0.229 | 24.46 | 0.866 | 0.174 | 29.49 | 0.912 | 0.306 |
| 3DGUT-MCMC | 27.82 | 0.826 | 0.233 | 24.20 | 0.861 | 0.180 | 29.87 | 0.913 | 0.309 |
| Beta Splatting-MCMC | 28.12 | 0.831 | 0.238 | 24.54 | 0.866 | 0.196 | 29.56 | 0.907 | 0.316 |
| Spherical Voronoi | 28.56 | 0.835 | 0.228 | 24.80 | 0.871 | 0.172 | 30.34 | 0.914 | 0.299 |
| Triangle Splatting | 27.00 | 0.808 | 0.231 | 23.05 | 0.843 | 0.191 | 28.92 | 0.891 | 0.308 |
| Textured Gaussians | 27.35 | 0.827 | — | 24.26 | 0.854 | — | 28.33 | 0.891 | — |
| NeST | 26.54 | 0.776 | 0.260 | — | — | — | — | — | — |
| Radiance Meshes | 27.15 | 0.810 | 0.274 | 23.13 | 0.851 | 0.200 | 29.39 | 0.901 | 0.362 |
| Neural Harmonic Textures (Ours) | 28.74 | 0.834 | 0.216 | 25.68 | 0.882 | 0.141 | 30.94 | 0.919 | 0.302 |

Novel View Synthesis (3DGUT + NHT)

Controlled comparison: same framework (gsplat, 3DGUT), varying only the appearance model—Spherical Harmonics (SH), Spherical Voronoi (SV), and ours (NHT). NHT consistently outperforms SH and SV without sacrificing real-time performance.

[Per-scene video comparisons: Spherical Voronoi vs. Ours vs. 3DGUT side by side, with an Ours vs. Ground Truth toggle for each scene.]

Radiance Field Reconstruction

Comparison on MipNeRF360 and Tanks and Temples

Comparison with previous works on MipNeRF360 and Tanks and Temples. Our method models high-frequency detail and view-dependent effects more faithfully.

Compactness

Reconstruction quality vs. primitive count (1K–4M). NHT outperforms 3DGS and 3DGUT across the full range, especially in the low-primitive regime; we match the 1M-primitive quality of prior work with roughly a third as many primitives.

Compactness study

Bonsai — 10K Gaussians

Side-by-side comparison on the Bonsai scene using only 10K Gaussians. Neural Harmonic Textures effectively detaches geometry from appearance, recovering significantly more detail than 3DGUT-MCMC with the same primitive budget.

3DGUT-MCMC
Neural Harmonic Textures (Ours)
| Method | PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|
| 3DGUT-MCMC | 23.97 | 0.724 | 0.548 |
| Neural Harmonic Textures (Ours) | 27.63 (+3.7) | 0.856 (+0.132) | 0.410 (−0.138) |

Other Applications


NHT enables applications beyond radiance fields. Because it detaches signal dimensionality from primitive complexity, it can fit high-dimensional signals at any resource budget, and it is agnostic to the underlying primitive type: we integrate it with 2DGS and with Triangle Splatting (using bounding triangles and three feature vectors per primitive). We further demonstrate semantic field reconstruction (joint RGB + 512-dimensional LSEG features), where we outperform Feature 3DGS; 2D image fitting on high-resolution HDR RAW images, with strong perceptual quality at high compression ratios; and real-time rendering of PBR material stacks with intrinsics generated by RGB2X.

2D image fitting

2D image compression

2D image fitting: NHT vs Instant NGP at 100× compression (45.7MP 14-bit HDR RAW). We achieve substantially better perceptual quality (LPIPS) at similar training time.

Semantic field reconstruction

We train a joint RGB and LSEG 512-d semantic feature field and compare against Feature 3DGS. LSEG feature maps are projected to RGB via PCA for visualization. Below we show PCA visualizations on test views from MipNeRF360: ground-truth (GT) vs our rendered features (Ours) for four scenes and two views each. Our method faithfully reconstructs semantic structure with sharp boundaries at higher resolution and real-time rates.


LSEG feature PCA visualizations on test views from MipNeRF360. Rows: bicycle, garden, bonsai, kitchen. Columns: ground-truth (GT) and our rendered features (Ours) for two views each.

PBR material stacks

Since NHT decouples signal dimensionality from primitive complexity, we can fit and render full PBR material stacks (albedo, normals, roughness, metallic) extracted from RGB2X at 60+ fps.

Real-time rendering of PBR material stacks (albedo, normals, roughness, metallic) extracted from RGB2X, fitted and rendered with NHT at 60+ fps.

BibTeX

@misc{condor2026nht,
    title={Neural Harmonic Textures for High-Quality Primitive Based Neural Reconstruction},
    author={Jorge Condor and Nicolas Moenne-Loccoz and Merlin Nimier-David and Piotr Didyk and Zan Gojcic and Qi Wu},
    year={2026},
    eprint={2604.01204},
    archivePrefix={arXiv},
    primaryClass={cs.CV},
    url={https://arxiv.org/abs/2604.01204},
}