Recent text-to-3D generation approaches produce impressive 3D results but require time-consuming optimization that can take up to an hour per prompt. Amortized methods like ATT3D optimize multiple prompts simultaneously to improve efficiency, enabling fast text-to-3D synthesis. However, ATT3D cannot capture high-frequency geometry and texture details and struggles to scale to large prompt sets, so it generalizes poorly. We introduce Latte3D, addressing these limitations to achieve fast, high-quality generation on a significantly larger prompt set.