Primitive-based methods such as 3D Gaussian Splatting have recently become the state-of-the-art for novel-view synthesis and related reconstruction tasks. Compared to neural fields, these representations are more flexible, adaptive, and scale better to large scenes. However, the limited expressivity of individual primitives makes modeling high-frequency detail challenging.
We introduce Neural Harmonic Textures, a neural representation approach that anchors latent feature vectors on a virtual scaffold surrounding each primitive. These features are interpolated within the primitive at ray intersection points. Inspired by Fourier analysis, we apply periodic activations to the interpolated features, turning alpha blending into a weighted sum of harmonic components. The resulting signal is then decoded in a single deferred pass using a small neural network, significantly reducing computational cost.
Neural Harmonic Textures yield state-of-the-art results in real-time novel view synthesis while bridging the gap between primitive- and neural-field-based reconstruction. Our method integrates seamlessly into existing primitive-based pipelines such as 3DGUT, Triangle Splatting, and 2DGS. We further demonstrate its generality with applications to 2D image fitting and semantic reconstruction.
We present our approach using 3D Gaussian primitives in the context of novel view synthesis, following 3DGUT. Our method has three core components: particle-bound feature embedding (latent features on a virtual scaffold per primitive), harmonic texturing (periodic encoding of interpolated features before alpha blending), and neural deferred decoding (a single MLP evaluation per pixel to reconstruct color from the accumulated harmonics).
Neural Harmonic Textures applied to novel-view synthesis. We attach feature vectors \(\mathbf{f}_i\) to the vertices of tetrahedra circumscribing the Gaussian primitives. We evaluate the point along the ray where the projected Gaussian has maximum response, barycentrically interpolate the vertex features there, encode them with sine and cosine, alpha-blend along the ray, and decode the accumulated sum of harmonics with a shallow MLP in a single image-space pass.
Each primitive is bounded by an ellipsoid in world space, which becomes a sphere in whitened canonical space. We define a virtual bounding tetrahedron in this canonical space and attach one \(N_f\)-dimensional feature vector \(\mathbf{f}^j \in \mathbb{R}^{N_f}\) to each of its four vertices (\(j \in \{0,1,2,3\}\)). For each ray–primitive intersection, we take the point \(\mathbf{p}^*\) where the projected Gaussian has maximum response along the ray (following 3DGUT). The feature at that location is given by barycentric interpolation of the vertex features:
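The maximum-response point along a ray has a closed form for a Gaussian. A minimal NumPy sketch (function names are our own, not from the paper's code): for a Gaussian with mean \(\boldsymbol{\mu}\) and covariance \(\boldsymbol{\Sigma}\), maximizing the density along the ray \(\mathbf{o} + t\mathbf{d}\) gives \(t^* = \mathbf{d}^\top \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu} - \mathbf{o}) / (\mathbf{d}^\top \boldsymbol{\Sigma}^{-1} \mathbf{d})\).

```python
import numpy as np

def max_response_point(o, d, mu, Sigma):
    """Point p* = o + t* d where a 3D Gaussian N(mu, Sigma) peaks along the ray.

    Maximizing exp(-0.5 (p - mu)^T Sigma^{-1} (p - mu)) over t yields
    t* = d^T Sigma^{-1} (mu - o) / (d^T Sigma^{-1} d).
    """
    P = np.linalg.inv(Sigma)                     # precision matrix
    t_star = d @ P @ (mu - o) / (d @ P @ d)
    return o + t_star * d

# For an isotropic Gaussian, the peak is the ray point closest to the mean.
o = np.array([0.0, 0.0, -5.0])
d = np.array([0.0, 0.0, 1.0])
mu = np.array([0.3, -0.2, 1.0])
p_star = max_response_point(o, d, mu, np.eye(3))  # -> [0.0, 0.0, 1.0]
```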
\[ \mathbf{f} = \sum_{j=0}^{3} w_j \, \mathbf{f}^j, \qquad \text{with } \sum_{j=0}^{3} w_j = 1 \text{ and } w_j \geq 0, \] where \(w_j\) are the barycentric coordinates of \(\mathbf{p}^*\) in the tetrahedron.
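The interpolation above can be sketched in a few lines of NumPy (a hypothetical standalone implementation, not the paper's CUDA kernel): solve a small linear system for the barycentric coordinates of \(\mathbf{p}^*\), then take the weighted sum of vertex features.

```python
import numpy as np

def barycentric_weights(p, verts):
    """Barycentric coordinates of point p inside a tetrahedron (4x3 vertices).

    Solves [v1-v0, v2-v0, v3-v0] w' = p - v0, then sets w0 = 1 - sum(w').
    """
    T = (verts[1:] - verts[0]).T                 # 3x3 edge matrix
    w123 = np.linalg.solve(T, p - verts[0])
    return np.concatenate([[1.0 - w123.sum()], w123])

def interpolate_features(p, verts, feats):
    """f = sum_j w_j f^j, where feats is a 4 x N_f array of vertex features."""
    w = barycentric_weights(p, verts)
    return w @ feats

verts = np.array([[0.0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
feats = np.eye(4)                                # one-hot feature per vertex
centroid = verts.mean(axis=0)
f = interpolate_features(centroid, verts, feats)  # -> [0.25, 0.25, 0.25, 0.25]
```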
Rather than decoding each sample with an MLP before blending (which would require many evaluations per ray), we encode the interpolated features with periodic functions and blend the encodings along the ray. Concretely, we map \(\mathbf{f}_i\) to \(\bigl[\sin(\mathbf{f}_i); \cos(\mathbf{f}_i)\bigr]\), turning the signal into a sum of harmonic components—we call this harmonic texturing. The primitive opacity \(\alpha_i\) and transmittance \(T_i\) act as the amplitude of each harmonic. Large differences between vertex features within a primitive yield rapidly varying (high-frequency) textures; the interpolation thus acts as a frequency modulator.
Figure panels: (a) latent vectors on the virtual tetrahedron; (b) interpolation at \(\mathbf{p}^*\); (c) harmonic decomposition.
After alpha-compositing these harmonic textures along the ray, we decode the final pixel color in a single image-space pass—neural deferred decoding. The rendering equation is
\[ \mathbf{c} = \mathrm{MLP}_\theta \left( \; \sum_{i \in \mathcal{G}} \alpha_i \, T_i \, \begin{bmatrix} \sin(\mathbf{f}_i) \\ \cos(\mathbf{f}_i) \end{bmatrix}, \;\; k \cdot \mathrm{SH}_2(\mathbf{d}) \; \right), \] where \(\mathcal{G}\) is the set of primitives intersected by the ray, \(\mathbf{f}_i\) is the interpolated feature at \(\mathbf{p}_i^*\), and \(\mathrm{SH}_2(\mathbf{d})\) encodes the ray direction for view-dependent effects.
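The whole pipeline for one ray can be sketched as follows (NumPy, with placeholder random values standing in for rasterizer outputs and learned MLP weights; the view-direction SH input is omitted for brevity):

```python
import numpy as np

rng = np.random.default_rng(0)
N_f, n_prims = 8, 5

# Interpolated features and opacities for the primitives hit by one ray
# (hypothetical values; in the method these come from the rasterizer).
f = rng.normal(size=(n_prims, N_f))
alpha = rng.uniform(0.2, 0.9, size=n_prims)
T = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])  # front-to-back transmittance

# Harmonic texturing: periodic encoding, then alpha blending of the encodings.
enc = np.concatenate([np.sin(f), np.cos(f)], axis=-1)      # n_prims x 2*N_f
accum = ((alpha * T)[:, None] * enc).sum(axis=0)           # blended sum of harmonics

# Neural deferred decoding: a single tiny-MLP evaluation per pixel
# (random weights here stand in for the learned decoder).
W1 = rng.normal(size=(2 * N_f, 16)); b1 = np.zeros(16)
W2 = rng.normal(size=(16, 3));       b2 = np.zeros(3)
rgb = np.maximum(accum @ W1 + b1, 0.0) @ W2 + b2           # decoded pixel color
```

Note that the decoder runs once per pixel on the blended encoding, rather than once per intersected primitive, which is what makes the deferred pass cheap.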
We consistently outperform prior work on standard benchmarks (MipNeRF360, Tanks and Temples, Deep Blending) and excel at high-frequency detail, specular highlights, and view-dependent effects. Below we show qualitative comparisons and per-scene videos.
| Method | MipNeRF360 PSNR↑ | SSIM↑ | LPIPS↓ | Tanks & Temples PSNR↑ | SSIM↑ | LPIPS↓ | Deep Blending PSNR↑ | SSIM↑ | LPIPS↓ |
|---|---|---|---|---|---|---|---|---|---|
| Instant NGP-Big | 25.59 | 0.695 | 0.375 | 21.92 | 0.740 | 0.342 | 24.96 | 0.815 | 0.459 |
| Mip-NeRF 360 | 27.60 | 0.788 | 0.275 | 22.22 | 0.754 | 0.290 | 29.40 | 0.899 | 0.306 |
| ZipNeRF | 28.55 | 0.829 | 0.218 | 23.64 | 0.836 | 0.179 | — | — | — |
| 2DGS | 27.22 | 0.804 | 0.275 | 22.85 | 0.827 | 0.244 | 29.56 | 0.904 | 0.325 |
| 3DGS-MCMC | 27.99 | 0.830 | 0.229 | 24.46 | 0.866 | 0.174 | 29.49 | 0.912 | 0.306 |
| 3DGUT-MCMC | 27.82 | 0.826 | 0.233 | 24.20 | 0.861 | 0.180 | 29.87 | 0.913 | 0.309 |
| Beta Splatting-MCMC | 28.12 | 0.831 | 0.238 | 24.54 | 0.866 | 0.196 | 29.56 | 0.907 | 0.316 |
| Spherical Voronoi | 28.56 | 0.835 | 0.228 | 24.80 | 0.871 | 0.172 | 30.34 | 0.914 | 0.299 |
| Triangle Splatting | 27.00 | 0.808 | 0.231 | 23.05 | 0.843 | 0.191 | 28.92 | 0.891 | 0.308 |
| Textured Gaussians | 27.35 | 0.827 | — | 24.26 | 0.854 | — | 28.33 | 0.891 | — |
| NeST | 26.54 | 0.776 | 0.260 | — | — | — | — | — | — |
| Radiance Meshes | 27.15 | 0.810 | 0.274 | 23.13 | 0.851 | 0.200 | 29.39 | 0.901 | 0.362 |
| Neural Harmonic Textures (Ours) | 28.74 | 0.834 | 0.216 | 25.68 | 0.882 | 0.141 | 30.94 | 0.919 | 0.302 |
Controlled comparison: same framework (gsplat, 3DGUT), varying only the appearance model—Spherical Harmonics (SH), Spherical Voronoi (SV), and ours (NHT). NHT consistently outperforms SH and SV without sacrificing real-time performance.
Comparison with previous works on MipNeRF360 and Tanks and Temples. Our method models high-frequency detail and view-dependent effects more faithfully.
Reconstruction quality vs. primitive count (1K–4M). NHT outperforms 3DGS and 3DGUT across the range, especially in the low-primitive regime; we can match 1M-primitive performance of prior work with about a third of the primitives.
Side-by-side comparison on the Bonsai scene using only 10K Gaussians. Neural Harmonic Textures effectively detaches geometry from appearance, recovering significantly more detail than 3DGUT-MCMC with the same primitive budget.
| Method | PSNR | SSIM | LPIPS |
|---|---|---|---|
| 3DGUT-MCMC | 23.97 | 0.724 | 0.548 |
| Neural Harmonic Textures (Ours) | 27.63 (+3.66) | 0.856 (+0.132) | 0.410 (-0.138) |
NHT enables applications beyond radiance fields. Because it decouples signal dimensionality from primitive complexity, it can fit high-dimensional signals at any resource budget, and it is agnostic to the underlying primitive type: we integrate it with 2DGS and Triangle Splatting using bounding triangles with three features per primitive. We further demonstrate semantic field reconstruction (joint RGB + 512-d LSEG features), outperforming Feature 3DGS; 2D image fitting on high-resolution HDR RAW images with strong perceptual quality at high compression ratios; and real-time rendering of PBR material stacks with intrinsics generated by RGB2X.
2D image fitting: NHT vs Instant NGP at 100× compression (45.7MP 14-bit HDR RAW). We achieve substantially better perceptual quality (LPIPS) at similar training time.
We train a joint RGB and LSEG 512-d semantic feature field and compare against Feature 3DGS. LSEG feature maps are projected to RGB via PCA for visualization. Below we show PCA visualizations on test views from MipNeRF360: ground-truth (GT) vs our rendered features (Ours) for four scenes and two views each. Our method faithfully reconstructs semantic structure with sharp boundaries at higher resolution and real-time rates.
LSEG feature PCA visualizations on test views from MipNeRF360. Rows: bicycle, garden, bonsai, kitchen. Columns: ground-truth (GT) and our rendered features (Ours) for two views each.
Since NHT decouples signal dimensionality from primitive complexity, we can fit and render full PBR material stacks (albedo, normals, roughness, metallic) extracted from RGB2X at 60+ fps.
@misc{condor2026nht,
title={Neural Harmonic Textures for High-Quality Primitive Based Neural Reconstruction},
author={Jorge Condor and Nicolas Moenne-Loccoz and Merlin Nimier-David and Piotr Didyk and Zan Gojcic and Qi Wu},
year={2026},
eprint={2604.01204},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2604.01204},
}