As of this writing in mid-February 2023, social media has been abuzz about generative AI tools. The conversation is very polarized: people are either gushing in utopian terms about its blessings or, on the flip side, are in abject terror of AI and its imminent evil takeover of the world. Over the past few months I've had the chance to explore OpenAI's ChatGPT, Stable Diffusion, Leonardo.ai and Midjourney. Like most people commenting on social media, I was completely blown away by the power and potential of these tools.
Great new technological advances disrupt, change, and reshape society. AI is doing this right now. Long-standing paradigms of how we create, learn and work are morphing. Students heading into college and university need to seriously (re)consider what vocations they wish to pursue. Working people across all domains and levels will need to adapt, retool their skill sets or completely reinvent themselves in order to future-proof their careers.
Grand, existential arguments aside, let's focus on the current state of AI tools in the context of 3D model generation. Current AI 3D model generators are not as advanced or polished as 2D image, text or sound generators. This is largely due to the inherent complexity of 3D modeling: an algorithm has to translate input data (prompts) into acceptable 3D geometry, physicality, color and texture, and do so efficiently. The tech is in its early stages but is evolving rapidly.
The rest of this post will feature several emerging AI-powered 3D form generators. Full disclaimer: I have no deep experience with any of these tools. I am a layman providing a quick survey of what is currently on the market, and I will attempt to give a basic description of what these tools do and how they do it. Please reach out to me if I missed anything or made an error - let's go!
The first tool in our survey is Sloyd, a 3D asset generator for video game creators. Sloyd accelerates the game-making process by offering a huge catalog of archetypal objects called generators. A cool interface lets you fine-tune a slew of variables to completely customize your object.
Instead of creating 3D game assets from scratch, creators can leverage this tool and quickly export as many assets as needed to build out their game environments. Props, weapons, collectibles, and architectural assets are the only categories currently available. Sloyd has basically created a custom dataset of parametric, off-the-shelf objects which, combined with their custom ML algorithm, allows users to create custom assets in real time with game-ready results.
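To make the idea of a parametric generator more concrete, here is a minimal sketch in Python. To be clear, this is not Sloyd's code or API: the `TableParams` fields and the `generate_table` function are hypothetical names I made up for illustration. The point is only that an object is described by a handful of parameters, and the generator turns those parameters into geometry (here, simple boxes).

```python
from dataclasses import dataclass


@dataclass
class TableParams:
    """Hypothetical parameters a user might tweak with sliders."""
    width: float = 1.2           # tabletop size along X, in meters
    depth: float = 0.8           # tabletop size along Y
    height: float = 0.75         # total height of the table
    top_thickness: float = 0.05
    leg_size: float = 0.06       # square leg cross-section


def box(center, size):
    """Return the 8 corner vertices of an axis-aligned box."""
    cx, cy, cz = center
    hx, hy, hz = (s / 2 for s in size)
    return [
        (cx + dx * hx, cy + dy * hy, cz + dz * hz)
        for dx in (-1, 1) for dy in (-1, 1) for dz in (-1, 1)
    ]


def generate_table(p):
    """Build a table as a list of boxes: one top plus four legs."""
    leg_height = p.height - p.top_thickness
    parts = [box((0, 0, p.height - p.top_thickness / 2),
                 (p.width, p.depth, p.top_thickness))]
    # Place one leg under each corner of the tabletop.
    x = p.width / 2 - p.leg_size / 2
    y = p.depth / 2 - p.leg_size / 2
    for sx in (-1, 1):
        for sy in (-1, 1):
            parts.append(box((sx * x, sy * y, leg_height / 2),
                             (p.leg_size, p.leg_size, leg_height)))
    return parts


# Changing one parameter regenerates a consistent variant of the object.
coffee_table = generate_table(TableParams(height=0.45))
print(len(coffee_table), "boxes,", sum(len(b) for b in coffee_table), "vertices")
```

Lowering `height` gives a coffee table, widening `width` gives a dining table, and so on. A real tool would of course produce far richer geometry than boxes.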
Next up is Google's DreamFusion, a text-to-3D model that uses 2D diffusion. Wow, that's a mouthful - so how does it work? DreamFusion is a technology that can create 3D scenes based on written captions or text prompts.
DreamFusion starts from a randomly initialized NeRF (Neural Radiance Field). A NeRF is in and of itself a small machine learning model: given a point in space, it predicts density and color, so the scene can be rendered as a 2D image from any camera position and angle. At each step, the scene is rendered from a random viewpoint, at low resolution to keep processing fast, and some noise is added to the rendering. Imagen (Google's text-to-image diffusion model) is then asked how it would denoise that image given the text prompt. Score Distillation Sampling (SDS) turns the difference between the noise Imagen predicts and the noise that was actually added into an update for the NeRF. In other words, Imagen never hands over a finished set of 2D images. It acts as a critic that nudges every rendered view toward something that matches the prompt.
This loop is repeated thousands of times from different camera angles and positions. Rendering is done at the pixel level: for each pixel, a ray is cast through the scene and the colors along it are blended according to density. Along the way the model works out color, lighting, shading and surface normals (data that allows for accurate surfacing/representation of a 3D object). After this insanely complex process, you have a full color, shaded, relightable model that can be viewed from any angle in 3D virtual space.
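The score distillation idea is easier to see in a toy version. The sketch below is my own simplification, not DreamFusion's implementation: the "scene" is just four pixel values (so the renderer is the identity instead of a NeRF), and `toy_denoiser` is a stand-in for Imagen that simply assumes the clean image is a fixed `TARGET`. All names and numbers are assumptions for illustration. What it does show faithfully is the shape of the update: add noise, ask the denoiser to predict it, and step along the weighted difference between predicted and injected noise.

```python
import math
import random

TARGET = [0.9, 0.1, 0.5, 0.7]  # stand-in for "what the text prompt describes"


def toy_denoiser(noisy, alpha, sigma):
    """Predict the noise in `noisy`, assuming the clean image is TARGET.

    This plays the role of the frozen, text-conditioned diffusion model.
    """
    return [(x_t - alpha * y) / sigma for x_t, y in zip(noisy, TARGET)]


def sds_step(pixels, lr, rng):
    """One score-distillation-style update on the rendered pixels."""
    t = rng.uniform(0.02, 0.98)  # random noise level
    alpha = math.cos(t * math.pi / 2)
    sigma = math.sin(t * math.pi / 2)
    eps = [rng.gauss(0.0, 1.0) for _ in pixels]
    noisy = [alpha * x + sigma * e for x, e in zip(pixels, eps)]
    eps_hat = toy_denoiser(noisy, alpha, sigma)
    weight = sigma ** 2  # one possible choice of weighting w(t)
    # Gradient: w(t) * (predicted noise - injected noise).
    # With an identity "renderer", d(image)/d(parameters) is 1.
    grad = [weight * (eh - e) for eh, e in zip(eps_hat, eps)]
    return [x - lr * g for x, g in zip(pixels, grad)]


def optimize(steps=2000, lr=0.1, seed=0):
    """Start from random pixels and apply many small updates."""
    rng = random.Random(seed)
    pixels = [rng.random() for _ in TARGET]  # random initial "scene"
    for _ in range(steps):
        pixels = sds_step(pixels, lr, rng)
    return pixels


result = optimize()
print([round(x, 3) for x in result])
```

In this toy the pixels drift steadily toward `TARGET`. In the real system the parameters being updated are the weights of the NeRF, and the gradient flows back through the renderer, but the denoiser itself stays frozen in both cases.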
Finally, the generated NeRF models can be exported to meshes - usable 3D geometry - using the marching cubes algorithm for easy integration into 3D renderers or CAD modeling software. A user could import the file into CAD, manipulate the 3D model and of course send it to a 3D printer.
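Once a mesh exists, getting it into other software is mostly a matter of file formats. As a small sketch, here is how a triangle mesh could be written out as a Wavefront OBJ file, a plain-text format that most 3D tools can import. The `write_obj` helper is my own hypothetical function, and the tetrahedron is just a stand-in for whatever marching cubes would actually produce.

```python
import io


def write_obj(vertices, faces, stream):
    """Write a triangle mesh in Wavefront OBJ format.

    vertices: list of (x, y, z) floats
    faces: list of (i, j, k) zero-based vertex indices
    """
    for x, y, z in vertices:
        stream.write(f"v {x:.6f} {y:.6f} {z:.6f}\n")
    for i, j, k in faces:
        # OBJ face indices are one-based, so shift each index by one.
        stream.write(f"f {i + 1} {j + 1} {k + 1}\n")


# A tetrahedron standing in for the output of marching cubes.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
# Each face lists its corners counter-clockwise when seen from outside.
faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]

buffer = io.StringIO()
write_obj(vertices, faces, buffer)
print(buffer.getvalue())
```

To save to disk instead of printing, pass an open file (for example `open("model.obj", "w")`) in place of the `io.StringIO` buffer.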
To automatically convert text into actual 3D model assets (not just virtual renderings) using DreamFusion's methodologies, check out this open source project on GitHub.
3DFY is an interesting company based in Israel - they've created a technology that allows 3D model creation based on a sampling of 2D images. Their proprietary process comprises four components: input module, data engine, computational pipeline and validation.
The input module accepts any type of image - neither complexity nor quality limits its ability to process images. Next, the data engine compares new images with its training data, which is made up of synthetic, high quality 3D models. Then, leveraging cloud-native infrastructure, it quickly and economically generates 3D assets.
The pipeline consists of several deep learning (DL) models with custom training methods and infrastructure that optimize for training time and cost. Finally, the validation module sharpens these DL models and aligns the final output with client specifications via a series of iterative re-training and optimization steps.
Applications for this technology are far-reaching. In the consumer retail space, furniture brands can, for example, showcase their offerings in a customer's home via AR/VR. In gaming, 2D artwork can quickly be converted to 3D assets such as peripheral and environmental objects (a table in a room, etc.).
Vision-based perception systems in robots or self-driving cars need massive amounts of 3D data in their models to learn how to navigate our complex world. This data is usually made up of conventional video or photography, which takes much effort and cost to collect. The use of synthetic or computationally created assets can make training these models more efficient and cost-effective.
I'm keeping an eye on this tech. As a product designer, I can see it dramatically reducing the time it takes to design physical products. For example, I could design a new chair, toy or lighting fixture and, with only 2D orthographic images, use 3DFY to output a CAD model to refine, optimize and even print in 3D if desired.
OpenAI has practically become a household name since it introduced ChatGPT, the AI-based chatbot that is now doing everyone's work! (I've used it to fine-tune this article and to check my HTML in coding this blog post.) Alongside ChatGPT, there is the very popular and powerful 2D graphics generator, DALL-E, and for 3D generation, they have Point-E.
Like the other tools surveyed, Point-E is a 3D model generator that can create an object from text input. However, unlike competing models, Point-E generates 3D objects made up of discrete, color-assigned data points in space, collectively called a point cloud. Let's take a look at how this is achieved.
The premise behind Point-E is that it pairs a text-to-image model with an image-to-3D model. First, Point-E creates a 2D image from the text prompt using a diffusion model called GLIDE. The version of GLIDE used here has been fine-tuned on renderings of 3D models, so the image it produces, called a synthetic view, shows only the object, as if it were a rendering of a 3D model (confused yet?).
Next, a point cloud diffusion model translates the synthetic view into a full color, RGB point cloud that can be manipulated in virtual space. These 3D objects (eventually) can then be used in video games, metaverse apps, or in post-production for movies. To make the models usable in CAD programs, the point cloud has to be converted into a 3D mesh. Point-E's open source release includes a separate model for this step, and the resulting mesh can then be opened in Blender, an open-source 3D creation software, or in other tools.
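If "point cloud" still sounds abstract, it really is just a long list of points, each with a position and a color. The sketch below does not use Point-E at all. It fabricates a small colored sphere as a stand-in object and writes it out as an ASCII PLY file, a common plain-text format for point clouds. The function names `make_sphere_cloud` and `write_ply` are my own, chosen for illustration.

```python
import io
import math


def make_sphere_cloud(n=200):
    """Generate n colored points on a unit sphere (a stand-in object)."""
    points = []
    golden = math.pi * (3 - math.sqrt(5))  # golden angle for even spacing
    for i in range(n):
        z = 1 - 2 * (i + 0.5) / n
        r = math.sqrt(1 - z * z)
        x, y = r * math.cos(golden * i), r * math.sin(golden * i)
        # Color each point by its position: map [-1, 1] to [0, 255].
        rgb = tuple(int(round((c + 1) / 2 * 255)) for c in (x, y, z))
        points.append((x, y, z) + rgb)
    return points


def write_ply(points, stream):
    """Write (x, y, z, r, g, b) points as an ASCII PLY point cloud."""
    stream.write("ply\nformat ascii 1.0\n")
    stream.write(f"element vertex {len(points)}\n")
    for name in ("x", "y", "z"):
        stream.write(f"property float {name}\n")
    for name in ("red", "green", "blue"):
        stream.write(f"property uchar {name}\n")
    stream.write("end_header\n")
    for x, y, z, r, g, b in points:
        stream.write(f"{x:.5f} {y:.5f} {z:.5f} {r} {g} {b}\n")


cloud = make_sphere_cloud()
buffer = io.StringIO()
write_ply(cloud, buffer)
print(buffer.getvalue()[:200])
```

Notice there are no faces or surfaces here, only points. That is why a separate meshing step is needed before the object can be used like a regular 3D model.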
Point-E's two-step process is of course not perfect, but according to my research it is claimed to be 600 or more times faster than DreamFusion and to require significantly less GPU power. Here is more information and access to open source documentation.
It's a very exciting time to be a content creator. Whether you're a writer, artist, designer, musician or programmer, AI has super-charged our creativity and potential in unbelievable new ways. As I move forward in my journey, I'm looking forward to seeing how I can leverage AI in my creative work with Artefacture - stay tuned to find out!
Thank you, until next time! - Andreu O.