Microsoft TRELLIS.2 Fast Image To 3D Model Generation

The wait is over. Microsoft has just open-sourced TRELLIS.2, an AI tool that's rewriting the rules of 3D content creation.

Imagine this workflow: You have a single image. You run it through TRELLIS.2. Within a minute (on an H100, even at the highest resolution), you're holding a production-ready 3D model, complete with full textures, transparency channels, and game-quality precision.

This isn't vaporware or another half-baked AI experiment. TRELLIS.2 delivers real results that 3D artists and game developers can use today.

Speed That Changes Everything

Here's what caught my attention: on an H100 GPU, TRELLIS.2 generates a basic-quality model (512³ resolution) in about 3 seconds. Not 3 minutes. Three seconds.

The output? Complete .glb files with full PBR materials that import directly into Blender, Unity, and Unreal Engine. No pipeline gymnastics. No post-processing marathons. Just drag, drop, and start creating.

Why This Matters for Your Workflow

Traditional 3D modeling takes hours per asset. Even photogrammetry leaves you stuck with cleanup and optimization. TRELLIS.2 removes most of these bottlenecks.

Whether you're:

Prototyping game environments
Building asset libraries for client projects
Visualizing concept art in 3D
Working solo without a full art team

...this tool compresses weeks of work into minutes.

What Makes TRELLIS.2 Special

The Triple Threat: Quality + Resolution + Speed

TRELLIS.2 runs on a 4B-parameter Diffusion Transformer (DiT), but the real magic isn't in the parameter count; it's in the architecture.

Its sparse 3D VAE applies 16× spatial downsampling, compressing high-resolution 3D assets into a compact latent space. This preserves fine detail while slashing computational cost.
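To make the 16× figure concrete, here is a back-of-the-envelope sketch. It assumes the factor applies along each axis, so one latent cell stands in for a 16×16×16 block of 4096 voxels. It reports dense grid sizes only. Because the VAE is sparse, only occupied cells are actually stored, so real latents should be smaller than these upper bounds.

```python
DOWNSAMPLE = 16  # per-axis factor, assumed from the "16x spatial downsampling" figure


def latent_side(resolution: int, downsample: int = DOWNSAMPLE) -> int:
    """Side length of the latent grid for a cubic voxel grid of the given resolution."""
    if resolution % downsample:
        raise ValueError("resolution must be divisible by the downsampling factor")
    return resolution // downsample


if __name__ == "__main__":
    for res in (512, 1024, 1536):
        side = latent_side(res)
        # Dense upper bound: a sparse latent keeps only the cells that contain geometry.
        print(f"{res}^3 voxels -> {side}^3 latent cells ({res**3 // side**3}x fewer cells)")
```

For the three supported resolutions, this gives latent grids of 32³, 64³ and 96³.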

The system generates models at resolutions from 512³ to 1536³. On an H100, you're looking at roughly 3 seconds for 512³, 17 seconds for 1024³, and about a minute for 1536³.
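Those three timings also hint at how cost scales. The snippet below simply divides the quoted, rounded figures, treating "about a minute" as 60 s. It is a rough reading of three numbers, not a benchmark.

```python
# Approximate H100 timings quoted above, in seconds; "about a minute" taken as 60 s.
TIMINGS = {512: 3.0, 1024: 17.0, 1536: 60.0}


def scaling(timings: dict, base: int = 512) -> list:
    """Growth of dense volume, surface area and time relative to a base resolution."""
    rows = []
    for res, seconds in sorted(timings.items()):
        scale = res / base
        rows.append((res, scale ** 3, scale ** 2, seconds / timings[base]))
    return rows


if __name__ == "__main__":
    for res, volume, surface, t in scaling(TIMINGS):
        print(f"{res}^3: volume x{volume:.0f}, surface x{surface:.0f}, time x{t:.1f}")
```

Going from 512³ to 1536³ multiplies the dense voxel count by 27 and a surface's area by 9, while the time grows by about 20×. Time landing between surface-like and volume-like growth is at least consistent with a sparse representation that avoids paying for empty space.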

Handles Any Topology You Throw At It

TRELLIS.2 introduces a new representation method called O-Voxel.

It gets past the limitations of isosurface fields (surfaces extracted as the level set of a volumetric function, which tend to force results to be closed and manifold). It also handles complex structures that used to be nightmares:

Thin shells: Reconstructs clothes, leaves, paper, anything with thin geometry.

Non-manifold geometry: Handles open surfaces and non-manifold structures, such as edges shared by more than two faces, without breaking a sweat.

Internal structures: Goes beyond just the surface to capture enclosed interior structures.

What this means: the models it generates aren't just "close enough" – they're structurally sound and actually usable.
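The topology terms above can be checked on any triangle mesh by counting how many faces share each edge. In a closed, manifold surface, every edge has exactly two faces. A single-sided thin shell shows up as boundary edges with one face. Fins and junctions show up as non-manifold edges with three or more. This is a generic sketch, not TRELLIS.2 code. `classify_edges` is a hypothetical helper that takes any list of (i, j, k) vertex-index triangles.

```python
from collections import Counter


def classify_edges(faces) -> dict:
    """Count edges by how many triangles share them.

    1 face  -> boundary edge (open surface, e.g. a single-sided leaf)
    2 faces -> manifold edge
    3+      -> non-manifold edge
    """
    incidence = Counter()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            incidence[(min(u, v), max(u, v))] += 1  # undirected edge key
    summary = Counter()
    for n in incidence.values():
        summary["boundary" if n == 1 else "manifold" if n == 2 else "non-manifold"] += 1
    return dict(summary)


if __name__ == "__main__":
    tetrahedron = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (2, 3, 0)]  # closed and manifold
    fin = [(0, 1, 2), (0, 1, 3), (0, 1, 4)]  # three triangles sharing edge (0, 1)
    print(classify_edges(tetrahedron), classify_edges(fin))
```

The closed tetrahedron reports six manifold edges. The fin reports one non-manifold edge plus six boundary edges, which is exactly the kind of geometry a closed isosurface cannot represent.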

Rich Texture Modeling

Most 3D generation models give you a single color map, and when you try to render it, everything looks like cheap plastic.

Not TRELLIS.2.

It generates the complete PBR quartet:

Base Color
Roughness
Metallic
Opacity

You can drop these straight into a real rendering pipeline. That glass you generated? Once its opacity map is hooked up, it's actually transparent in Unity.
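In a GLB file, those four maps don't travel as four separate textures. glTF 2.0's metallic-roughness material stores opacity in the alpha channel of the base color texture. It packs roughness (green channel) and metallic (blue channel) into one shared texture. The sketch below shows that packing with NumPy. It illustrates the file format, not TRELLIS.2's own exporter, and `pack_gltf_textures` is a hypothetical name.

```python
import numpy as np


def to_uint8(channel: np.ndarray) -> np.ndarray:
    """Map floats in [0, 1] to 8-bit texel values."""
    return np.round(np.clip(channel, 0.0, 1.0) * 255).astype(np.uint8)


def pack_gltf_textures(base_color, roughness, metallic, opacity):
    """Pack four PBR maps into glTF 2.0's two metallic-roughness textures.

    base_color: (H, W, 3) floats in [0, 1]
    roughness, metallic, opacity: (H, W) floats in [0, 1]
    Returns (baseColorTexture as RGBA, metallicRoughnessTexture as RGB), both uint8.
    """
    # baseColorTexture: RGB = albedo, A = opacity (honored only if alphaMode is MASK or BLEND)
    base_rgba = np.concatenate([base_color, opacity[..., None]], axis=-1)
    # metallicRoughnessTexture: glTF reads roughness from G and metallic from B;
    # the material ignores R (tools often reuse it for ambient occlusion).
    mr = np.zeros(roughness.shape + (3,), dtype=np.float64)
    mr[..., 1] = roughness
    mr[..., 2] = metallic
    return to_uint8(base_rgba), to_uint8(mr)


if __name__ == "__main__":
    h, w = 4, 4
    rgba, mr = pack_gltf_textures(
        np.full((h, w, 3), 0.8), np.full((h, w), 0.3), np.ones((h, w)), np.full((h, w), 0.5)
    )
    print(rgba.shape, mr.shape)
```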

Speaking of generation: the quality of your PBR materials depends heavily on the detail in your input image. I used z-image.ai to create the source image for my armor example because it handles lighting well, and TRELLIS.2 benefits from clear lighting cues.

That metal armor? It genuinely reflects light.

Minimal Processing Required

Almost no post-processing is needed, and converting between meshes and O-Voxels is fast in both directions:

< 10 seconds (single-core CPU): Textured mesh → O-Voxel
< 100ms (CUDA): O-Voxel → Textured mesh

Bottom line: Take a photo → Run the model → Grab your .glb file → You're done.

How to Use It

Microsoft has deployed a Space on Hugging Face where you can try it right away.

Just open the page, upload an image of your subject (person, animal, or object), and hit generate. The entire process genuinely takes under 20 seconds. Once it's done, you can toggle through different texture views – the 3D effect is pristine. Then download the GLB and import it into your 3D software for further refinement.

Self-Hosting Requirements

If you want to deploy it yourself, you'll need a computer or server with an NVIDIA GPU packing at least 24GB of VRAM.
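Before installing anything, you can ask PyTorch whether your GPU clears that bar. This sketch reads "24GB" as decimal gigabytes, because a card's reported total may come in a little under its nominal size in binary GiB. The helper names are hypothetical.

```python
MIN_VRAM_BYTES = 24 * 10**9  # the article's "24GB", read as decimal gigabytes


def has_enough_vram(total_bytes: int, min_bytes: int = MIN_VRAM_BYTES) -> bool:
    """True if a device's total memory meets the stated minimum."""
    return total_bytes >= min_bytes


def report_gpus() -> None:
    try:
        import torch  # imported lazily so has_enough_vram() works without PyTorch
    except ImportError:
        print("PyTorch is not installed")
        return
    if not torch.cuda.is_available():
        print("No CUDA device visible to PyTorch")
        return
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        verdict = "OK" if has_enough_vram(props.total_memory) else "below 24GB"
        print(f"GPU {i}: {props.name}, {props.total_memory / 1e9:.1f} GB -> {verdict}")


if __name__ == "__main__":
    report_gpus()
```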

First, get the pre-trained model, TRELLIS.2-4B (published on Hugging Face as microsoft/TRELLIS.2-4B).
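One way to fetch the weights ahead of time is `snapshot_download` from the `huggingface_hub` library. This is useful on a server with limited network access at run time. The repo ID is the one the example script passes to `from_pretrained`. With no `local_dir`, files land in the standard Hugging Face cache. Whether the pipeline's loader reuses that cache is an assumption this article doesn't confirm. `download_weights` is a hypothetical helper.

```python
MODEL_REPO = "microsoft/TRELLIS.2-4B"  # same ID the example script passes to from_pretrained


def download_weights(local_dir=None) -> str:
    """Download every file in the model repo and return the local folder path.

    local_dir=None keeps the files in the default Hugging Face cache.
    """
    # Imported here so this module loads even where huggingface_hub isn't installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=MODEL_REPO, local_dir=local_dir)
```

Call `download_weights()` once on a machine with network access. It returns the folder the files were written to.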

Clone the repo (--recursive also pulls in its submodules) and install dependencies:

git clone -b main https://github.com/microsoft/TRELLIS.2.git --recursive
cd TRELLIS.2

. ./setup.sh --new-env --basic --flash-attn --nvdiffrast --nvdiffrec --cumesh --o-voxel --flexgemm
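After setup finishes, a quick sanity check is to confirm Python can locate the key packages without importing them. `trellis2` and `o_voxel` are the import names used in the example script. `flash_attn` and `nvdiffrast` are assumed to be those projects' usual import names. The other setup flags (--nvdiffrec, --cumesh, --flexgemm) are left out because their import names aren't stated here.

```python
import importlib.util

# trellis2 and o_voxel come from the example script; flash_attn and nvdiffrast
# are assumed to be those projects' usual top-level import names.
EXPECTED = ["torch", "trellis2", "o_voxel", "flash_attn", "nvdiffrast"]


def check_modules(names) -> dict:
    """Map each top-level module name to whether Python can locate it (without importing it)."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Run from the repository root so the local trellis2 package is on the path.
    for name, found in check_modules(EXPECTED).items():
        print(f"{name:<12} {'found' if found else 'MISSING'}")
```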

Image-to-3D Code Example

import os
os.environ['OPENCV_IO_ENABLE_OPENEXR'] = '1'  # Lets OpenCV read the .exr HDRI; set before importing cv2
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"  # Can save GPU memory
import cv2
import imageio
from PIL import Image
import torch
from trellis2.pipelines import Trellis2ImageTo3DPipeline
from trellis2.utils import render_utils
from trellis2.renderers import EnvMap
import o_voxel

# 1. Setup Environment Map
envmap = EnvMap(torch.tensor(
    cv2.cvtColor(cv2.imread('assets/hdri/forest.exr', cv2.IMREAD_UNCHANGED), cv2.COLOR_BGR2RGB),
    dtype=torch.float32, device='cuda'
))

# 2. Load Pipeline
pipeline = Trellis2ImageTo3DPipeline.from_pretrained("microsoft/TRELLIS.2-4B")
pipeline.cuda()

# 3. Load Image & Run
image = Image.open("assets/example_image/T.png")
mesh = pipeline.run(image)[0]
mesh.simplify(16777216) # nvdiffrast limit

# 4. Render Video
video = render_utils.make_pbr_vis_frames(render_utils.render_video(mesh, envmap=envmap))
imageio.mimsave("sample.mp4", video, fps=15)

# 5. Export to GLB
glb = o_voxel.postprocess.to_glb(
    vertices            =   mesh.vertices,
    faces               =   mesh.faces,
    attr_volume         =   mesh.attrs,
    coords              =   mesh.coords,
    attr_layout         =   mesh.layout,
    voxel_size          =   mesh.voxel_size,
    aabb                =   [[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
    decimation_target   =   1000000,
    texture_size        =   4096,
    remesh              =   True,
    remesh_band         =   1,
    remesh_project      =   0,
    verbose             =   True
)
glb.export("sample.glb", extension_webp=True)

After running the script, you'll get:

sample.mp4: A video visualization showing the generated 3D asset with PBR materials and environmental lighting.
sample.glb: The extracted GLB-format, PBR-ready 3D asset.

Important note: Files are exported in OPAQUE mode by default. While the alpha channel is preserved in the texture map, it's not activated initially. To enable transparency, import the asset into your 3D software and manually connect the texture's alpha channel to the material's transparency or alpha input.
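If you'd rather flip the setting in the file itself, the GLB container is simple enough to patch with Python's standard library. The sketch below reads the glTF 2.0 binary header and JSON chunk. It reports each material's `alphaMode` (glTF treats a missing value as `OPAQUE`) and can rewrite every material to `BLEND`. It assumes the preserved alpha lives in the base color texture, which is where glTF reads opacity from. The function names and `sample_blend.glb` are hypothetical. `BLEND` can cause sorting artifacts, so `MASK` with an `alphaCutoff` may suit cut-out materials like leaves better.

```python
import json
import os
import struct

GLB_MAGIC = 0x46546C67   # b"glTF" read as a little-endian uint32
CHUNK_JSON = 0x4E4F534A  # b"JSON"


def read_glb(data: bytes):
    """Split a glTF 2.0 binary into its parsed JSON and the bytes that follow it."""
    magic, version, length = struct.unpack_from("<III", data, 0)
    if magic != GLB_MAGIC or version != 2:
        raise ValueError("not a glTF 2.0 binary (.glb) file")
    chunk_len, chunk_type = struct.unpack_from("<II", data, 12)
    if chunk_type != CHUNK_JSON:
        raise ValueError("the first GLB chunk must be JSON")
    gltf = json.loads(data[20:20 + chunk_len].decode("utf-8"))
    rest = data[20 + chunk_len:length]  # binary chunk, header included (may be empty)
    return gltf, rest


def write_glb(gltf: dict, rest: bytes = b"") -> bytes:
    """Serialize JSON plus trailing chunks back into a GLB container."""
    payload = json.dumps(gltf, separators=(",", ":")).encode("utf-8")
    payload += b" " * (-len(payload) % 4)  # spec: pad the JSON chunk with spaces to 4 bytes
    json_chunk = struct.pack("<II", len(payload), CHUNK_JSON) + payload
    total = 12 + len(json_chunk) + len(rest)
    return struct.pack("<III", GLB_MAGIC, 2, total) + json_chunk + rest


def alpha_modes(gltf: dict) -> list:
    """alphaMode of each material; glTF treats a missing value as OPAQUE."""
    return [m.get("alphaMode", "OPAQUE") for m in gltf.get("materials", [])]


def enable_blending(data: bytes) -> bytes:
    """Return a copy of the GLB with every material switched to alphaMode BLEND."""
    gltf, rest = read_glb(data)
    for material in gltf.get("materials", []):
        material["alphaMode"] = "BLEND"
    return write_glb(gltf, rest)


if __name__ == "__main__":
    if os.path.exists("sample.glb"):
        with open("sample.glb", "rb") as f:
            original = f.read()
        print("before:", alpha_modes(read_glb(original)[0]))
        with open("sample_blend.glb", "wb") as f:
            f.write(enable_blending(original))
```

The binary chunk is copied through untouched. Its buffer offsets are relative to the chunk itself, so growing or shrinking the JSON doesn't invalidate them.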

Web Demo

Launch the web demo locally from the repository root:

python app.py

My Take on This

If past Image-to-3D tools felt like AI art experiments, TRELLIS.2 is legitimately approaching the realm of automated 3D asset production.

It's not just generating fuzzy shapes for you to admire – it's delivering structurally accurate, texture-rich assets with transparency channels that meet professional standards.

For anyone working in e-commerce, gaming, or VR/AR, this isn't just a tool upgrade. It's a productivity revolution.

If you've got photos lying around that you want to turn into 3D models, head over to the official project page or try the Hugging Face demo and give it a shot!

Keep creating, and I'll catch you in the next one!

— Dora