Toronto AI Lab NVIDIA Research
LLaMA-Mesh

LLaMA-Mesh:
Unifying 3D Mesh Generation with Language Models

1 Tsinghua University   2 NVIDIA
* Work done while interning at NVIDIA
We enable large language models (LLMs) to understand and generate 3D meshes by representing them as text and fine-tuning. This unifies the 3D and text modalities in a single model and preserves language abilities, unlocking conversational 3D creation with mesh understanding.


News

event [Nov 27, 2024] Blender addon released courtesy of Dylan Ebert.
event [Nov 17, 2024] Model weights and online demo released.
event [Nov 14, 2024] Project page released.
event [Nov 14, 2024] Paper released on arXiv!

Mesh Generation

Can you give me some ideas for storage furniture?
Yes, definitely. You can consider using cabinets, shelves, or chests to store items like books, linens, or clothing. They are space-efficient, keep your floor clear, and provide good storage capacity. You can also use cabinets to display decorative items or books in a visually appealing way. Here is a 3D model of a cabinet that you can use as a reference
I am trying to create a 3D model for a game of ancient time. Can you help me to create a 3D model of a sword?
I'd be happy to help! What kind of sword are you envisioning? Is it a simple design or something more elaborate?
A simple design is okay for start.
Let's start with a basic design. A sword typically has a long, straight or slightly curved blade with a sharp edge. We can add some details like a handle or a guard to give it a more authentic look. I can generate a 3D model for you.
I am decorating my garden and I need to add some seating. What kind of benches are the most popular?
Bench styles vary, but a simple bench is always a great choice. It's easy to fit into any garden design and can be made from a variety of materials, like wood or metal. Let me show you an example
Create a 3D obj file using the following description: a lamp
Can you show me a 3D model of the desk?
Here is a 3D model of a desk with a computer and a mouse on it.
what's the height of a desk like this in general?
The height of a desk can vary depending on the design and purpose, but a typical desk height is around 29-30 inches (74-76 cm) from the floor to the top of the surface. This allows users to sit with their feet flat on the floor and their elbows at a 90-degree angle while typing. However, some desks, like those designed for standing or ergonomic use, may be taller or shorter. Does that help?
Create a 3D object file using the following description: a soccer ball
tell me some fun fact about it
Soccer balls are made from a combination of leather, polyurethane, and rubber. They are designed to be aerodynamic and have a consistent bounce, making them perfect for playing soccer. Did you know that the first soccer ball was made from inflated pig bladders in the 1800s?
oh really
Yes, it's true! The pig bladders were inflated with air and then wrapped in leather to create the first soccer balls. It wasn't until later that the modern materials like polyurethane and rubber were used to make soccer balls. And now, soccer balls come in all sorts of colors and designs, making them a fun and recognizable part of the game.

Mesh Understanding

What is this shape
This is a wooden barrel. Barrels have been used for centuries to store and transport goods, from wine and oil to grains and other commodities. They're often made of wood, like this one, and come in various shapes and sizes
What is this shape
A 3D model of a metal trash can.
What is the function of a trash can?
The primary function of a trash can is to collect and store waste, keeping it separate from other items and preventing litter. Trash cans also help to maintain hygiene and cleanliness by containing waste until it is disposed of properly. They come in various shapes, sizes, and materials, but their purpose remains the same.
What is this shape
A 3D model of a pixelated cartoon character wearing a red shirt and black hat.
How is this character created?
This character is created using pixel art, a digital art form that uses small, square pixels to create images. The pixelated style gives the character a retro or nostalgic feel. The red shirt and black hat are just a few of the many design elements that make up the character's appearance.
What is this shape
3D model of a wooden ladder.
Why do people make ladders out of wood?
People make ladders out of wood because it is a durable and versatile material that can be easily shaped and crafted into the desired form. Wood is also relatively inexpensive compared to other materials, making it a cost-effective choice for ladder construction. Additionally, wood ladders can be made to be aesthetically pleasing, which is why they are often used in decorative or architectural applications.
What is this shape
3D model of an axe with a wooden handle.

Abstract

This work explores expanding the capabilities of large language models (LLMs) pretrained on text to generate 3D meshes within a unified model. This offers key advantages of (1) leveraging spatial knowledge already embedded in LLMs, derived from textual sources like 3D tutorials, and (2) enabling conversational 3D generation and mesh understanding. A primary challenge is effectively tokenizing 3D mesh data into discrete tokens that LLMs can process seamlessly. To address this, we introduce LLaMA-Mesh, a novel approach that represents the vertex coordinates and face definitions of 3D meshes as plain text, allowing direct integration with LLMs without expanding the vocabulary. We construct a supervised fine-tuning (SFT) dataset enabling pretrained LLMs to (1) generate 3D meshes from text prompts, (2) produce interleaved text and 3D mesh outputs as required, and (3) understand and interpret 3D meshes. Our work is the first to demonstrate that LLMs can be fine-tuned to acquire complex spatial knowledge for 3D mesh generation in a text-based format, effectively unifying the 3D and text modalities. LLaMA-Mesh achieves mesh generation quality on par with models trained from scratch while maintaining strong text generation performance.


Method


An overview of our method. LLaMA-Mesh unifies text and 3D meshes in a uniform format by representing the numerical values of vertex coordinates and face definitions of a 3D mesh as plain text. Our model is trained using text and 3D interleaved data in an end-to-end manner. Therefore, our model can generate both text and 3D meshes in a unified model.


Mesh Collection

The following meshes are generated with text prompt “Create a 3D obj file using the following description: {caption}”.

3D model of a workbench

Low poly chess piece

Polygonal windmill

3D rendering of a model chair

Model of a small barrel

3D rendering of a tower

Low poly model of a hammer

TV bench

TV

bunk

chair

fence

horse

knife

soda

toy car

tree

Animation of Mesh Generation Process

The following videos show an animation of the mesh generation process. We generate all vertices, and then all faces sequentially.


Mesh Representation and Quantization

An illustration of our 3D representation approach. We represent the vertex coordinates and face definitions of 3D meshes as plain text, allowing direct integration with LLMs without expanding the vocabulary. Left: A snippet of an OBJ file represented as plain text containing vertex (v) and face (f) definitions. Right: The 3D object rendered from the OBJ file.


Mesh Type # Tokens # Vertices # Faces
With Quantization 1280 64 96

v 22 44 22
      : Tokens

v 40 44 22
v 22 63 22
v 40 63 22
v 22 17 26
v 40 17 26
v 22 44 26
v 40 44 26
v 22 0 27
v 30 0 27
v 32 0 27
v 40 0 27
v 22 17 27
v 30 17 27
v 32 17 27
v 40 17 27
v 15 21 29
v 22 21 29
v 40 21 29
v 47 21 29
v 15 42 29
v 22 42 29
v 40 42 29
v 47 42 29
v 20 50 30
v 23 50 30
v 39 50 30
v 42 50 30
v 20 54 30
v 23 54 30
v 39 54 30
v 42 54 30
v 20 50 33
v 23 50 33
v 39 50 33
v 42 50 33
v 20 54 33
v 23 54 33
v 39 54 33
v 42 54 33
v 22 0 35
v 30 0 35
v 32 0 35
v 40 0 35
v 22 17 35
v 30 17 35
v 32 17 35
v 40 17 35
v 22 17 36
v 40 17 36
v 15 21 36
v 22 21 36
v 40 21 36
v 47 21 36
v 15 42 36
v 22 42 36
v 40 42 36
v 47 42 36
v 22 44 36
v 40 44 36
v 22 44 40
v 40 44 40
v 22 63 40
v 40 63 40
f 1 3 2
f 1 2 62
f 1 63 3
f 1 62 61
f 1 61 63
f 2 3 4
f 2 4 62
f 3 63 4
f 4 64 62
f 4 63 64
f 5 7 6
f 5 6 50
f 5 59 7
f 5 50 49
f 5 49 59
f 6 7 8
f 6 8 50
f 7 59 8
f 8 60 50
f 8 59 60
f 9 13 10
f 9 10 42
f 9 45 13
f 9 42 41
f 9 41 45
f 10 13 14
f 10 14 42
f 11 15 12
f 11 12 44
f 11 47 15
f 11 44 43
f 11 43 47
f 12 15 16
f 12 16 44
f 13 45 14
f 14 46 42
f 14 45 46
f 15 47 16
f 16 48 44
f 16 47 48
f 17 21 18
f 17 18 52
f 17 55 21
f 17 52 51
f 17 51 55
f 18 21 22
f 18 22 52
f 19 23 20
f 19 20 54
f 19 57 23
f 19 54 53
f 19 53 57
f 20 23 24
f 20 24 54
f 21 55 22
f 22 56 52
f 22 55 56
f 23 57 24
f 24 58 54
f 24 57 58
f 25 29 26
f 25 26 34
f 25 37 29
f 25 34 33
f 25 33 37
f 26 29 30
f 26 30 34
f 27 31 28
f 27 28 36
f 27 39 31
f 27 36 35
f 27 35 39
f 28 31 32
f 28 32 36
f 29 37 30
f 30 38 34
f 30 37 38
f 31 39 32
f 32 40 36
f 32 39 40
f 33 34 38
f 33 38 37
f 35 36 40
f 35 40 39
f 41 42 46
f 41 46 45
f 43 44 48
f 43 48 47
f 49 50 60
f 49 60 59
f 51 52 56
f 51 56 55
f 53 54 58
f 53 58 57
f 61 62 64
f 61 64 63


Note that 3D mesh coordinates are typically stored as floating-point numbers, leading to long token sequences. Above: To address this, we quantize the vertex coordinates into a fixed number of bins (64 here). Below: The original OBJ file represents vertex coordinates in decimal values, splitting a single coordinate into several language tokens.


Mesh Type # Tokens # Vertices # Faces
Without Quantization 4607 64 96

v 0.50000000 1.62500000 0.50000000
      : Tokens

v -0.50000000 0.62500000 0.50000000
v 0.50000000 0.62500000 0.50000000
v -0.50000000 1.62500000 0.50000000
v -0.50000000 1.62500000 -0.50000000
v 0.50000000 0.62500000 -0.50000000
v -0.50000000 0.62500000 -0.50000000
v 0.50000000 1.62500000 -0.50000000
v 0.50000000 1.62500000 -0.50000000
v 0.50000000 0.62500000 0.50000000
v 0.50000000 0.62500000 -0.50000000
v 0.50000000 1.62500000 0.50000000
v 0.50000000 0.62500000 0.50000000
v -0.50000000 0.62500000 -0.50000000
v 0.50000000 0.62500000 -0.50000000
v -0.50000000 0.62500000 0.50000000
v -0.50000000 1.62500000 0.50000000
v -0.50000000 0.62500000 -0.50000000
v -0.50000000 0.62500000 0.50000000
v -0.50000000 1.62500000 -0.50000000
v 0.50000000 1.62500000 -0.50000000
v -0.50000000 1.62500000 0.50000000
v 0.50000000 1.62500000 0.50000000
v -0.50000000 1.62500000 -0.50000000
v 0.56250000 1.18750000 0.12122800
v 0.43750000 0.93750000 0.12122800
v 0.56250000 0.93750000 0.12122800
v 0.43750000 1.18750000 0.12122800
v 0.43750000 1.18750000 -0.06627199
v 0.56250000 0.93750000 -0.06627199
v 0.43750000 0.93750000 -0.06627199
v 0.56250000 1.18750000 -0.06627199
v 0.56250000 1.18750000 -0.06627199
v 0.56250000 0.93750000 0.12122800
v 0.56250000 0.93750000 -0.06627199
v 0.56250000 1.18750000 0.12122800
v 0.56250000 0.93750000 0.12122800
v 0.43750000 0.93750000 -0.06627199
v 0.56250000 0.93750000 -0.06627199
v 0.43750000 0.93750000 0.12122800
v 0.43750000 1.18750000 0.12122800
v 0.43750000 0.93750000 -0.06627199
v 0.43750000 0.93750000 0.12122800
v 0.43750000 1.18750000 -0.06627199
v 0.56250000 1.18750000 -0.06627199
v 0.43750000 1.18750000 0.12122800
v 0.56250000 1.18750000 0.12122800
v 0.43750000 1.18750000 -0.06627199
v -0.43750000 1.18750000 0.12122800
v -0.56250000 0.93750000 0.12122800
v -0.43750000 0.93750000 0.12122800
v -0.56250000 1.18750000 0.12122800
v -0.56250000 1.18750000 -0.06627199
v -0.43750000 0.93750000 -0.06627199
v -0.56250000 0.93750000 -0.06627199
v -0.43750000 1.18750000 -0.06627199
v -0.43750000 1.18750000 -0.06627199
v -0.43750000 0.93750000 0.12122800
v -0.43750000 0.93750000 -0.06627199
v -0.43750000 1.18750000 0.12122800
v -0.43750000 0.93750000 0.12122800
v -0.56250000 0.93750000 -0.06627199
v -0.43750000 0.93750000 -0.06627199
v -0.56250000 0.93750000 0.12122800
v -0.56250000 1.18750000 0.12122800
v -0.56250000 0.93750000 -0.06627199
v -0.56250000 0.93750000 0.12122800
v -0.56250000 1.18750000 -0.06627199
v -0.43750000 1.18750000 -0.06627199
v -0.56250000 1.18750000 0.12122800
v -0.43750000 1.18750000 0.12122800
v -0.56250000 1.18750000 -0.06627199
v 0.50000000 0.65625000 0.28125000
v -0.50000000 -0.78125000 0.28125000
v 0.50000000 -0.78125000 0.28125000
v -0.50000000 0.65625000 0.28125000
v -0.50000000 0.65625000 -0.28125000
v 0.50000000 -0.78125000 -0.28125000
v -0.50000000 -0.78125000 -0.28125000
v 0.50000000 0.65625000 -0.28125000
v 0.50000000 0.65625000 -0.28125000
v 0.50000000 -0.78125000 0.28125000
v 0.50000000 -0.78125000 -0.28125000
v 0.50000000 0.65625000 0.28125000
v 0.50000000 -0.78125000 0.28125000
v -0.50000000 -0.78125000 -0.28125000
v 0.50000000 -0.78125000 -0.28125000
v -0.50000000 -0.78125000 0.28125000
v -0.50000000 0.65625000 0.28125000
v -0.50000000 -0.78125000 -0.28125000
v -0.50000000 -0.78125000 0.28125000
v -0.50000000 0.65625000 -0.28125000
v 0.50000000 0.65625000 -0.28125000
v -0.50000000 0.65625000 0.28125000
v 0.50000000 0.65625000 0.28125000
v -0.50000000 0.65625000 -0.28125000
v 0.87500000 0.56250000 0.25000000
v 0.50000000 -0.56250000 0.25000000
v 0.87500000 -0.56250000 0.25000000
v 0.50000000 0.56250000 0.25000000
v 0.50000000 0.56250000 -0.12500000
v 0.87500000 -0.56250000 -0.12500000
v 0.50000000 -0.56250000 -0.12500000
v 0.87500000 0.56250000 -0.12500000
v 0.87500000 0.56250000 -0.12500000
v 0.87500000 -0.56250000 0.25000000
v 0.87500000 -0.56250000 -0.12500000
v 0.87500000 0.56250000 0.25000000
v 0.87500000 -0.56250000 0.25000000
v 0.50000000 -0.56250000 -0.12500000
v 0.87500000 -0.56250000 -0.12500000
v 0.50000000 -0.56250000 0.25000000
v 0.50000000 0.56250000 0.25000000
v 0.50000000 -0.56250000 -0.12500000
v 0.50000000 -0.56250000 0.25000000
v 0.50000000 0.56250000 -0.12500000
v 0.87500000 0.56250000 -0.12500000
v 0.50000000 0.56250000 0.25000000
v 0.87500000 0.56250000 0.25000000
v 0.50000000 0.56250000 -0.12500000
v -0.50000000 0.56250000 0.25000000
v -0.87500000 -0.56250000 0.25000000
v -0.50000000 -0.56250000 0.25000000
v -0.87500000 0.56250000 0.25000000
v -0.87500000 0.56250000 -0.12500000
v -0.50000000 -0.56250000 -0.12500000
v -0.87500000 -0.56250000 -0.12500000
v -0.50000000 0.56250000 -0.12500000
v -0.50000000 0.56250000 -0.12500000
v -0.50000000 -0.56250000 0.25000000
v -0.50000000 -0.56250000 -0.12500000
v -0.50000000 0.56250000 0.25000000
v -0.50000000 -0.56250000 0.25000000
v -0.87500000 -0.56250000 -0.12500000
v -0.50000000 -0.56250000 -0.12500000
v -0.87500000 -0.56250000 0.25000000
v -0.87500000 0.56250000 0.25000000
v -0.87500000 -0.56250000 -0.12500000
v -0.87500000 -0.56250000 0.25000000
v -0.87500000 0.56250000 -0.12500000
v -0.50000000 0.56250000 -0.12500000
v -0.87500000 0.56250000 0.25000000
v -0.50000000 0.56250000 0.25000000
v -0.87500000 0.56250000 -0.12500000
v -0.03125000 -0.78125000 0.21875000
v -0.46875000 -1.71875000 0.21875000
v -0.03125000 -1.71875000 0.21875000
v -0.46875000 -0.78125000 0.21875000
v -0.46875000 -0.78125000 -0.21875000
v -0.03125000 -1.71875000 -0.21875000
v -0.46875000 -1.71875000 -0.21875000
v -0.03125000 -0.78125000 -0.21875000
v -0.03125000 -0.78125000 -0.21875000
v -0.03125000 -1.71875000 0.21875000
v -0.03125000 -1.71875000 -0.21875000
v -0.03125000 -0.78125000 0.21875000
v -0.03125000 -1.71875000 0.21875000
v -0.46875000 -1.71875000 -0.21875000
v -0.03125000 -1.71875000 -0.21875000
v -0.46875000 -1.71875000 0.21875000
v -0.46875000 -0.78125000 0.21875000
v -0.46875000 -1.71875000 -0.21875000
v -0.46875000 -1.71875000 0.21875000
v -0.46875000 -0.78125000 -0.21875000
v -0.03125000 -0.78125000 -0.21875000
v -0.46875000 -0.78125000 0.21875000
v -0.03125000 -0.78125000 0.21875000
v -0.46875000 -0.78125000 -0.21875000
v 0.46875000 -0.78125000 0.21875000
v 0.03125000 -1.71875000 0.21875000
v 0.46875000 -1.71875000 0.21875000
v 0.03125000 -0.78125000 0.21875000
v 0.03125000 -0.78125000 -0.21875000
v 0.46875000 -1.71875000 -0.21875000
v 0.03125000 -1.71875000 -0.21875000
v 0.46875000 -0.78125000 -0.21875000
v 0.46875000 -0.78125000 -0.21875000
v 0.46875000 -1.71875000 0.21875000
v 0.46875000 -1.71875000 -0.21875000
v 0.46875000 -0.78125000 0.21875000
v 0.46875000 -1.71875000 0.21875000
v 0.03125000 -1.71875000 -0.21875000
v 0.46875000 -1.71875000 -0.21875000
v 0.03125000 -1.71875000 0.21875000
v 0.03125000 -0.78125000 0.21875000
v 0.03125000 -1.71875000 -0.21875000
v 0.03125000 -1.71875000 0.21875000
v 0.03125000 -0.78125000 -0.21875000
v 0.46875000 -0.78125000 -0.21875000
v 0.03125000 -0.78125000 0.21875000
v 0.46875000 -0.78125000 0.21875000
v 0.03125000 -0.78125000 -0.21875000
f 1 2 3
f 1 4 2
f 5 6 7
f 5 8 6
f 9 10 11
f 9 12 10
f 13 14 15
f 13 16 14
f 17 18 19
f 17 20 18
f 21 22 23
f 21 24 22
f 25 26 27
f 25 28 26
f 29 30 31
f 29 32 30
f 33 34 35
f 33 36 34
f 37 38 39
f 37 40 38
f 41 42 43
f 41 44 42
f 45 46 47
f 45 48 46
f 49 50 51
f 49 52 50
f 53 54 55
f 53 56 54
f 57 58 59
f 57 60 58
f 61 62 63
f 61 64 62
f 65 66 67
f 65 68 66
f 69 70 71
f 69 72 70
f 73 74 75
f 73 76 74
f 77 78 79
f 77 80 78
f 81 82 83
f 81 84 82
f 85 86 87
f 85 88 86
f 89 90 91
f 89 92 90
f 93 94 95
f 93 96 94
f 97 98 99
f 97 100 98
f 101 102 103
f 101 104 102
f 105 106 107
f 105 108 106
f 109 110 111
f 109 112 110
f 113 114 115
f 113 116 114
f 117 118 119
f 117 120 118
f 121 122 123
f 121 124 122
f 125 126 127
f 125 128 126
f 129 130 131
f 129 132 130
f 133 134 135
f 133 136 134
f 137 138 139
f 137 140 138
f 141 142 143
f 141 144 142
f 145 146 147
f 145 148 146
f 149 150 151
f 149 152 150
f 153 154 155
f 153 156 154
f 157 158 159
f 157 160 158
f 161 162 163
f 161 164 162
f 165 166 167
f 165 168 166
f 169 170 171
f 169 172 170
f 173 174 175
f 173 176 174
f 177 178 179
f 177 180 178
f 181 182 183
f 181 184 182
f 185 186 187
f 185 188 186
f 189 190 191
f 189 192 190

Paper


LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models

Zhengyi Wang, Jonathan Lorraine, Yikai Wang, Hang Su, Jun Zhu, Sanja Fidler, Xiaohui Zeng

description ArXiv
insert_comment BibTeX
integration_instructions Code
integration_instructions Model Weights
integration_instructions Online Demo
integration_instructions Blender Addon

Citation


Wang, Z., Lorraine, J., Wang, Y., Su, H., Zhu, J., Fidler, S., & Zeng, X. (2024). LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models. arXiv preprint arXiv:2411.09595.

@misc{wang2024llamameshunifying3dmesh,
        title={LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models}, 
        author={Zhengyi Wang and Jonathan Lorraine and Yikai Wang and Hang Su and Jun Zhu and Sanja Fidler and Xiaohui Zeng},
        year={2024},
        eprint={2411.09595},
        archivePrefix={arXiv},
        primaryClass={cs.LG},
        url={https://arxiv.org/abs/2411.09595}, 
    }