I have a friend who does a lot of product prototyping, starting from simple perspective sketches. Keeping interest in a project can be hard when it involves tedious 3D modeling, and AI-generated 3D models from sketches would cut down on that time-consuming work. So I set out to look for the available options, and possibly to do a study of their applicability, which led me to threestudio and the stable-zero123 model.
What follows is a guide to setting up and generating 3D meshes from images with stable-zero123 on an Nvidia A100 compute instance, along with initial results.
Results and conclusion at the bottom.
→ Colab notebook ←
→ GitHub/threestudio ←
→ Background removal tool ←
→ 3dviewer.net ←
Walkthrough
Clone repository
!git clone https://github.com/threestudio-project/threestudio.git
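In Colab the clone lands in /content, and the launch.py commands later in the walkthrough assume the repository root as the working directory, so it does not hurt to change into it right away:
%cd /content/threestudio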
Install dependencies
!pip install ninja
!pip install lightning==2.0.0 omegaconf==2.3.0 jaxtyping typeguard diffusers transformers accelerate opencv-python tensorboard matplotlib imageio imageio[ffmpeg] trimesh bitsandbytes sentencepiece safetensors huggingface_hub libigl xatlas networkx pysdf PyMCubes wandb torchmetrics controlnet_aux
!pip install einops kornia taming-transformers-rom1504 git+https://github.com/openai/CLIP.git # zero123
!pip install open3d plotly # mesh visualization
- Note: The instance needs to be restarted after installing one of the dependencies above.
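After the restart, a quick sanity check of my own (not part of the original walkthrough) confirms the A100 is visible to PyTorch before moving on:
# Verify the runtime after the restart; torch is already present in the Colab environment
import torch
print(torch.__version__)
print(torch.cuda.is_available(), torch.cuda.get_device_name(0) if torch.cuda.is_available() else "no GPU")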
Build additional dependencies
!pip install git+https://github.com/ashawkey/envlight.git
!pip install git+https://github.com/KAIR-BAIR/nerfacc.git@v0.5.2
!pip install git+https://github.com/NVlabs/nvdiffrast.git
!pip install git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch
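These packages compile CUDA extensions against the running environment, so it is worth confirming they import cleanly before training; a minimal check, assuming the builds above finished without errors:
# The module names below correspond to the packages built above
import nerfacc
import nvdiffrast.torch as dr
import tinycudann as tcnn
print("nerfacc", nerfacc.__version__)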
Fetch model and sample images
%cd /content/threestudio/load/images
!wget https://example.com/example_rgba.png
%cd /content/threestudio/load/zero123/
!wget https://huggingface.co/stabilityai/stable-zero123/resolve/main/stable_zero123.ckpt
%cd /content/threestudio
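The checkpoint is large, and a truncated download only fails once training starts, so a quick check of what was fetched (paths from the wget commands above, relative to the repository root) can save a wasted run:
# Confirm the checkpoint and the RGBA sample image arrived intact
import os
from PIL import Image
print(round(os.path.getsize("load/zero123/stable_zero123.ckpt") / 1e9, 2), "GB")
print(Image.open("load/images/example_rgba.png").mode)  # expected: RGBA, alpha marks the removed background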
Fetch custom config
%cd /content/threestudio/configs/
!wget https://example.com/run.yaml
%cd /content/threestudio
The custom config uses stable-zero123.yaml as its foundation. I changed data/random_camera/batch_size from [12, 6, 4] to [6, 4, 2] to reduce memory usage, at the cost of some model effectiveness. Training parameters can be modified at the bottom of the config; the defaults seem to be optimal for most cases.
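If you would rather not edit the YAML by hand, the same change can be applied programmatically; a sketch using OmegaConf (installed with the dependencies above), assuming run.yaml exposes the same keys as stable-zero123.yaml:
# Lower the per-stage batch sizes in the downloaded config to reduce GPU memory usage
from omegaconf import OmegaConf

cfg = OmegaConf.load("configs/run.yaml")
cfg.data.random_camera.batch_size = [6, 4, 2]  # down from the default [12, 6, 4]
OmegaConf.save(cfg, "configs/run.yaml")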
Define sample image path
imgpath = "./load/images/example_rgba.png"
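To run on your own sketch or photo instead of the sample, the background has to be removed first (hence the background removal tool linked at the top). A hedged sketch using rembg, which is not in the dependency list above and would need its own pip install rembg:
# Hypothetical preprocessing: strip the background so zero123 receives an RGBA image
from rembg import remove
from PIL import Image

img = Image.open("my_sketch.png")  # hypothetical input file
remove(img).save("load/images/my_sketch_rgba.png")  # alpha channel marks the removed background
imgpath = "./load/images/my_sketch_rgba.png"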
Train model on sample image
!python launch.py --config configs/run.yaml --train --gpu 0 data.image_path="$imgpath"
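The results below compare runs of 600 and 1200 training steps; the step count can be overridden from the command line in the same dot notation as data.image_path, assuming the config exposes trainer.max_steps as the stock threestudio configs do:
# Example: a shorter run with the training length overridden on the command line
!python launch.py --config configs/run.yaml --train --gpu 0 data.image_path="$imgpath" trainer.max_steps=600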
Define training output save directory
# Training save output folder; the quotes are part of the string because the folder
# name contains spaces and brackets and is substituted into shell commands below
save_dir = '"outputs/zero123-sai/[64, 128, 256]_example_rgba.png@20240728-205147/"'
- Note: use !ls outputs/zero123-sai/ to list the available training sessions / save directories.
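Instead of pasting the folder name by hand, the most recent run can also be picked automatically; a small helper of my own, not part of the original walkthrough:
# Pick the newest run under outputs/zero123-sai/ and keep the shell quoting used above
import glob, os

runs = sorted(glob.glob("outputs/zero123-sai/*"), key=os.path.getmtime)
save_dir = '"' + runs[-1] + '/"'
print(save_dir)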
Export mesh
!python launch.py --config $save_dir/configs/parsed.yaml --export --gpu 0 data.image_path="$imgpath" resume=$save_dir/ckpts/last.ckpt system.exporter.context_type=cuda system.exporter_type=mesh-exporter system.exporter.fmt=obj
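The exact layout of the export output inside the run folder can vary between threestudio versions, so the simplest way to locate the OBJ is to search for it:
# List every OBJ produced by the export step; the quotes inside $save_dir handle the spaces and brackets
!find $save_dir -name "*.obj"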
Results
Output from the model is:
- The training image.
- A 360° spin video of the model in different shadings, saved at every validation step.
- The OBJ file.
OBJ files can be viewed in any 3D model viewer.
Online 3D viewer: 3dviewer.net
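For a quick look without leaving the notebook, the open3d and plotly packages installed earlier can render a mesh inline; a rough sketch, with an illustrative OBJ path:
# Inline preview of an exported OBJ (replace the path with the file found by the find command above)
import numpy as np
import open3d as o3d
import plotly.graph_objects as go

mesh = o3d.io.read_triangle_mesh("model.obj")
v = np.asarray(mesh.vertices)
f = np.asarray(mesh.triangles)
fig = go.Figure(go.Mesh3d(x=v[:, 0], y=v[:, 1], z=v[:, 2], i=f[:, 0], j=f[:, 1], k=f[:, 2]))
fig.show()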
Kayak - 128x128 - 600 training steps
→ kayak initial.obj ←
Sword - 512x512 - 600 training steps
→ sword512-600.obj ←
Sword - 512x512 - 1200 training steps
→ sword512-1200.obj ←
Sword - 128x128 - 600 training steps
→ sword128-600.obj ←
Basketball - 128x128 - 600 training steps
→ basketball128-600.obj ←
Comparing the validation videos with the exported models, it is clear that something is going haywire during the export phase.
Sword 128x128 looks promising.
Conclusion
The current models appear to be at a preliminary stage and are not yet consumer ready. The required compute power is an issue, since the system has to be run locally. In time it may become possible to run inference on modest consumer hardware, given an adequate model. With the current model, however, the training phase appears to be baked into the mesh volume generation, so it demands resources well beyond what consumer hardware offers. Until further developments and refinements, 3D modeling will have to be done by mouse and keyboard.