Primitive-based methods such as 3D Gaussian Splatting have recently become the state-of-the-art for novel-view synthesis and related reconstruction tasks. Compared to neural fields, these representations are more flexible, adaptive, and scale better to large scenes. However, the limited expressivity of individual primitives makes modeling high-frequency detail challenging.
We introduce Neural Harmonic Textures, a neural representation approach that anchors latent feature vectors on a virtual scaffold surrounding each primitive. These features are interpolated within the primitive at ray intersection points. Inspired by Fourier analysis, we apply periodic activations to the interpolated features, turning alpha blending into a weighted sum of harmonic components. The resulting signal is then decoded in a single deferred pass using a small neural network, significantly reducing computational cost.
Neural Harmonic Textures yield state-of-the-art results in real-time novel view synthesis while bridging the gap between primitive- and neural-field-based reconstruction. Our method integrates seamlessly into existing primitive-based pipelines such as 3DGUT, Triangle Splatting, and 2DGS. We further demonstrate its generality with applications to 2D image fitting and semantic reconstruction.
We present our approach using 3D Gaussian primitives in the context of novel view synthesis, following 3DGUT. Our method has three core components: particle-bound feature embedding (latent features on a virtual scaffold per primitive), harmonic texturing (periodic encoding of interpolated features before alpha blending), and neural deferred decoding (a single MLP evaluation per pixel to reconstruct color from the accumulated harmonics).
Neural Harmonic Textures applied to novel-view synthesis. We attach feature vectors \(\mathbf{f}_i\) to the vertices of tetrahedra circumscribing the Gaussian primitives. We evaluate the point along the ray where the projected Gaussian has maximum response, barycentrically interpolate the vertex features there, encode them with sine and cosine, alpha-blend along the ray, and decode the accumulated sum of harmonics with a shallow MLP in a single image-space pass.
Each primitive is bounded by an ellipsoid in world space, which becomes a sphere in whitened canonical space. We define a virtual bounding tetrahedron in this canonical space and attach one \(N_f\)-dimensional feature vector \(\mathbf{f}^j \in \mathbb{R}^{N_f}\) to each of its four vertices (\(j \in \{0,1,2,3\}\)). For each ray–primitive intersection, we take the point \(\mathbf{p}^*\) where the projected Gaussian has maximum response along the ray (following 3DGUT). The feature at that location is given by barycentric interpolation of the vertex features:
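The maximum-response point along a ray has a closed form for a Gaussian. A minimal NumPy sketch (function names are our own, not from the paper's code): for a Gaussian with mean \(\boldsymbol{\mu}\) and covariance \(\boldsymbol{\Sigma}\), maximizing the density along the ray \(\mathbf{o} + t\mathbf{d}\) gives \(t^* = \mathbf{d}^\top \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - \mathbf{o}) / (\mathbf{d}^\top \boldsymbol{\Sigma}^{-1} \mathbf{d})\).

```python
import numpy as np

def max_response_point(o, d, mu, Sigma):
    """Point p* = o + t* d where a 3D Gaussian N(mu, Sigma) peaks along the ray.

    Maximizing exp(-0.5 (p - mu)^T Sigma^{-1} (p - mu)) over t yields
    t* = d^T Sigma^{-1} (mu - o) / (d^T Sigma^{-1} d).
    """
    P = np.linalg.inv(Sigma)                     # precision matrix
    t_star = d @ P @ (mu - o) / (d @ P @ d)
    return o + t_star * d

# For an isotropic Gaussian, the peak is the ray point closest to the mean.
o = np.array([0.0, 0.0, -5.0])
d = np.array([0.0, 0.0, 1.0])
mu = np.array([0.3, -0.2, 1.0])
p_star = max_response_point(o, d, mu, np.eye(3))  # -> [0.0, 0.0, 1.0]
```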
\[ \mathbf{f} = \sum_{j=0}^{3} w_j \, \mathbf{f}^j, \qquad \text{with } \sum_{j=0}^{3} w_j = 1 \text{ and } w_j \geq 0, \] where \(w_j\) are the barycentric coordinates of \(\mathbf{p}^*\) in the tetrahedron.
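The interpolation above can be sketched in a few lines of NumPy (a hypothetical standalone implementation, not the paper's CUDA kernel): solve a small linear system for the barycentric coordinates of \(\mathbf{p}^*\), then take the weighted sum of vertex features.

```python
import numpy as np

def barycentric_weights(p, verts):
    """Barycentric coordinates of point p inside a tetrahedron (4x3 vertices).

    Solves [v1-v0, v2-v0, v3-v0] w' = p - v0, then sets w0 = 1 - sum(w').
    """
    T = (verts[1:] - verts[0]).T                 # 3x3 edge matrix
    w123 = np.linalg.solve(T, p - verts[0])
    return np.concatenate([[1.0 - w123.sum()], w123])

def interpolate_features(p, verts, feats):
    """f = sum_j w_j f^j, where feats is a 4 x N_f array of vertex features."""
    w = barycentric_weights(p, verts)
    return w @ feats

verts = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
feats = np.eye(4)                                # one-hot feature per vertex
centroid = verts.mean(axis=0)
f = interpolate_features(centroid, verts, feats)  # -> [0.25, 0.25, 0.25, 0.25]
```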
Rather than decoding each sample with an MLP before blending (which would require many evaluations per ray), we encode the interpolated features with periodic functions and blend the encodings along the ray. Concretely, we map \(\mathbf{f}_i\) to \(\bigl[\sin(\mathbf{f}_i); \cos(\mathbf{f}_i)\bigr]\), turning the signal into a sum of harmonic components—we call this harmonic texturing. The primitive opacity \(\alpha_i\) and transmittance \(T_i\) act as the amplitude of each harmonic. Large differences between vertex features within a primitive yield rapidly varying (high-frequency) textures; the interpolation thus acts as a frequency modulator.
Figure panels: (a) latent vectors on the virtual tetrahedron; (b) interpolation at \(\mathbf{p}^*\); (c) harmonic decomposition.
After alpha-compositing these harmonic textures along the ray, we decode the final pixel color in a single image-space pass—neural deferred decoding. The rendering equation is
\[ \mathbf{c} = \mathrm{MLP}_\theta \left( \; \sum_{i \in \mathcal{G}} \alpha_i \, T_i \, \begin{bmatrix} \sin(\mathbf{f}_i) \\ \cos(\mathbf{f}_i) \end{bmatrix}, \;\; k \cdot \mathrm{SH}_2(\mathbf{d}) \; \right), \] where \(\mathcal{G}\) is the set of primitives intersected by the ray, \(\mathbf{f}_i\) is the interpolated feature at \(\mathbf{p}_i^*\), and \(\mathrm{SH}_2(\mathbf{d})\) encodes the ray direction for view-dependent effects.
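The whole pipeline for one ray can be sketched as follows (NumPy, with placeholder random values standing in for rasterizer outputs and learned MLP weights; the view-direction SH input is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
N_f, n_prims = 8, 5

# Interpolated features and opacities for the primitives hit by one ray
# (hypothetical values; in the method these come from the rasterizer).
f = rng.normal(size=(n_prims, N_f))
alpha = rng.uniform(0.2, 0.9, size=n_prims)
T = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])  # front-to-back transmittance

# Harmonic texturing: periodic encoding, then alpha blending of the encodings.
enc = np.concatenate([np.sin(f), np.cos(f)], axis=-1)      # n_prims x 2*N_f
accum = ((alpha * T)[:, None] * enc).sum(axis=0)           # blended sum of harmonics

# Neural deferred decoding: a single tiny-MLP evaluation per pixel
# (random weights here stand in for the learned decoder).
W1 = rng.normal(size=(2 * N_f, 16)); b1 = np.zeros(16)
W2 = rng.normal(size=(16, 3));       b2 = np.zeros(3)
rgb = np.maximum(accum @ W1 + b1, 0.0) @ W2 + b2           # decoded pixel color
```

Note that the decoder runs once per pixel on the blended encoding, rather than once per intersected primitive, which is what makes the deferred pass cheap.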
We consistently outperform prior work on standard benchmarks (MipNeRF360, Tanks and Temples, Deep Blending) and excel at high-frequency detail, specular highlights, and view-dependent effects. Below we show qualitative comparisons and per-scene videos.
| Method | MipNeRF360 PSNR↑ | SSIM↑ | LPIPS↓ | Tanks & Temples PSNR↑ | SSIM↑ | LPIPS↓ | Deep Blending PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|---|---|---|---|---|
| Instant NGP-Big | 25.59 | 0.695 | 0.375 | 21.92 | 0.740 | 0.342 | 24.96 | 0.815 | 0.459 |
| Mip-NeRF 360 | 27.60 | 0.788 | 0.275 | 22.22 | 0.754 | 0.290 | 29.40 | 0.899 | 0.306 |
| ZipNeRF | 28.55 | 0.829 | 0.218 | 23.64 | 0.836 | 0.179 | — | — | — |
| 2DGS | 27.22 | 0.804 | 0.275 | 22.85 | 0.827 | 0.244 | 29.56 | 0.904 | 0.325 |
| 3DGS-MCMC | 27.99 | 0.830 | 0.229 | 24.46 | 0.866 | 0.174 | 29.49 | 0.912 | 0.306 |
| 3DGUT-MCMC | 27.82 | 0.826 | 0.233 | 24.20 | 0.861 | 0.180 | 29.87 | 0.913 | 0.309 |
| Beta Splatting-MCMC | 28.12 | 0.831 | 0.238 | 24.54 | 0.866 | 0.196 | 29.56 | 0.907 | 0.316 |
| Spherical Voronoi | 28.56 | 0.835 | 0.228 | 24.80 | 0.871 | 0.172 | 30.34 | 0.914 | 0.299 |
| Triangle Splatting | 27.00 | 0.808 | 0.231 | 23.05 | 0.843 | 0.191 | 28.92 | 0.891 | 0.308 |
| Textured Gaussians | 27.35 | 0.827 | — | 24.26 | 0.854 | — | 28.33 | 0.891 | — |
| NeST | 26.54 | 0.776 | 0.260 | — | — | — | — | — | — |
| Radiance Meshes | 27.15 | 0.810 | 0.274 | 23.13 | 0.851 | 0.200 | 29.39 | 0.901 | 0.362 |
| Neural Harmonic Textures (Ours) | 28.74 | 0.834 | 0.216 | 25.68 | 0.882 | 0.141 | 30.94 | 0.919 | 0.302 |
Controlled comparison: same framework (gsplat, 3DGUT), varying only the appearance model—Spherical Harmonics (SH), Spherical Voronoi (SV), and ours (NHT). NHT consistently outperforms SH and SV without sacrificing real-time performance.
Comparison with previous works on MipNeRF360 and Tanks and Temples. Our method models high-frequency detail and view-dependent effects more faithfully.
Reconstruction quality vs. primitive count (1K–4M). NHT outperforms 3DGS and 3DGUT across the range, especially in the low-primitive regime; we can match 1M-primitive performance of prior work with about a third of the primitives.
Side-by-side comparison on the Bonsai scene using only 10K Gaussians. Neural Harmonic Textures effectively detaches geometry from appearance, recovering significantly more detail than 3DGUT-MCMC with the same primitive budget.
| Method | PSNR | SSIM | LPIPS |
|---|---|---|---|
| 3DGUT-MCMC | 23.97 | 0.724 | 0.548 |
| Neural Harmonic Textures (Ours) | 27.63 (+3.66) | 0.856 (+0.132) | 0.410 (-0.138) |
NHT enables applications beyond radiance fields. Because it decouples signal dimensionality from primitive complexity, it can fit high-dimensional signals at any resource budget, and it is agnostic to the underlying primitive type: we integrate it with 2DGS and Triangle Splatting using bounding triangles with three features per primitive. We further demonstrate semantic field reconstruction (joint RGB + 512-d LSEG features), outperforming Feature 3DGS; 2D image fitting on high-resolution HDR RAW images with strong perceptual quality at high compression ratios; and real-time rendering of PBR material stacks with intrinsics generated by RGB2X.
2D image fitting: NHT vs Instant NGP at 100× compression (45.7MP 14-bit HDR RAW). We achieve substantially better perceptual quality (LPIPS) at similar training time.
We train a joint RGB and LSEG 512-d semantic feature field and compare against Feature 3DGS. LSEG feature maps are projected to RGB via PCA for visualization. Below we show PCA visualizations on test views from MipNeRF360: ground-truth (GT) vs our rendered features (Ours) for four scenes and two views each. Our method faithfully reconstructs semantic structure with sharp boundaries at higher resolution and real-time rates.
LSEG feature PCA visualizations on test views from MipNeRF360. Rows: bicycle, garden, bonsai, kitchen. Columns: ground-truth (GT) and our rendered features (Ours) for two views each.
Since NHT decouples signal dimensionality from primitive complexity, we can fit and render full PBR material stacks (albedo, normals, roughness, metallic) extracted from RGB2X at 60+ fps.
@misc{condor2026nht,
title={Neural Harmonic Textures for High-Quality Primitive Based Neural Reconstruction},
author={Jorge Condor and Nicolas Moenne-Loccoz and Merlin Nimier-David and Piotr Didyk and Zan Gojcic and Qi Wu},
year={2026},
eprint={2604.01204},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2604.01204},
}