In the metaverse age, 3D content built from meticulously detailed models is transforming multimedia experiences across gaming, virtual reality, and film. The 3D modeling process itself, however, remains time-consuming for designers. They start with basic shapes (such as cubes, spheres, or cylinders) and use tools like Blender for precise contouring, detailing, and texturing; rendering and post-processing then complete this labor-intensive pipeline and deliver the final polished model.
Procedural generation, built on adjustable parameters and rule-based systems, is effective at automating content creation. It demands, however, a deep understanding of the generation rules, the algorithmic framework, and each individual parameter, and aligning these procedures with clients' creative intent through effective communication adds further complexity. This underscores the need to streamline the traditional 3D modeling workflow and empower creators in the metaverse age.
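To make the idea concrete, here is a minimal sketch of what parameter-driven, rule-based generation looks like in practice. The function and parameter names are illustrative only, not taken from 3D-GPT; the point is that every aspect of the output is controlled by named parameters, which is exactly the interface an LLM would need to learn to operate.

```python
import random

# Minimal sketch of rule-based procedural generation: the output is fully
# determined by named parameters plus a seed, so changing one value
# regenerates the whole scene. Parameter names here are hypothetical.
def generate_forest(tree_count=20, height_range=(3.0, 8.0),
                    trunk_radius=0.2, area=50.0, seed=0):
    rng = random.Random(seed)  # deterministic for a given parameter set
    trees = []
    for _ in range(tree_count):
        trees.append({
            "position": (rng.uniform(-area, area), rng.uniform(-area, area)),
            "height": rng.uniform(*height_range),
            "trunk_radius": trunk_radius * rng.uniform(0.8, 1.2),
        })
    return trees

# Tuning a single parameter reshapes the entire scene:
sparse = generate_forest(tree_count=5)
dense = generate_forest(tree_count=200, height_range=(2.0, 5.0))
```

Knowing which of these parameters to touch, and what values match a client's description, is the expertise burden the paragraph above describes.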
Large language models (LLMs) demonstrate exceptional skills in planning, tool use, language understanding, and characterizing object qualities. Researchers from the Australian National University, the University of Oxford, and the Beijing Academy of Artificial Intelligence introduce 3D-GPT, a framework designed to facilitate instruction-driven 3D content synthesis. The framework divides the 3D modeling process into smaller segments, allowing LLMs to act as problem-solving agents.
3D-GPT consists of three agents: a conceptualization agent, a 3D modeling agent, and a job dispatch agent. The first two collaborate to carry out 3D conceptualization and 3D modeling by invoking and adjusting the 3D generation functions. The dispatch agent steers the system: it accepts text input, manages commands, and mediates communication between the other two agents.
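The dispatch pattern described above can be sketched as an LLM choosing among a registry of documented procedural functions and filling in their arguments from the user's instruction. Everything below is a hypothetical illustration of that pattern, not code from 3D-GPT: `call_llm` is a placeholder for any chat-completion API, and the registry entries are invented.

```python
import json

# Hypothetical registry of procedural functions with documented parameters.
FUNCTION_REGISTRY = {
    "add_tree": {"params": {"height": "float, metres", "leaf_density": "0..1"}},
    "add_sky": {"params": {"cloud_coverage": "0..1", "time_of_day": "hours"}},
}

def call_llm(prompt: str) -> str:
    """Placeholder for any chat-completion API; returns a JSON string."""
    raise NotImplementedError

def dispatch(instruction: str) -> dict:
    # Ask the model to map the instruction onto one function and its arguments.
    prompt = (
        "Choose one function and its arguments for this instruction.\n"
        f"Functions: {json.dumps(FUNCTION_REGISTRY)}\n"
        f"Instruction: {instruction}\n"
        'Reply as JSON: {"function": ..., "arguments": {...}}'
    )
    return json.loads(call_llm(prompt))

# e.g. dispatch("a tall tree under an overcast afternoon sky") might yield
# {"function": "add_tree", "arguments": {"height": 12.0, "leaf_density": 0.7}}
```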
Guided by users' written descriptions, 3D-GPT delivers accurate and customizable 3D creation. It enriches initial scene descriptions and adapts to further textual directions during refinement. In complex scenes, this spares users the effort of manually specifying every controllable parameter of the procedural generation. 3D-GPT also integrates smoothly with Blender, giving users access to its full range of manipulation tools.
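The Blender integration point is its Python API, `bpy`. The snippet below is generic `bpy` usage, a sketch of how parameters inferred from a text description could drive scene construction, not code from the 3D-GPT framework itself; it is meant to run inside Blender's scripting environment.

```python
import bpy

def build_simple_tree(location=(0.0, 0.0, 0.0), height=4.0, trunk_radius=0.2):
    x, y, z = location
    # Trunk: a cylinder whose depth and radius come straight from parameters.
    bpy.ops.mesh.primitive_cylinder_add(
        radius=trunk_radius, depth=height, location=(x, y, z + height / 2)
    )
    # Canopy: a sphere sitting on top of the trunk.
    bpy.ops.mesh.primitive_uv_sphere_add(
        radius=height * 0.35, location=(x, y, z + height)
    )

# Parameters extracted from an instruction can be passed in directly:
build_simple_tree(height=6.0, trunk_radius=0.3)
```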
The researchers present three key contributions: the introduction of 3D-GPT as a framework for 3D scene creation, the exploration of an alternative approach to text-to-3D generation that works through Python programs, and empirical studies showcasing the potential of LLMs for generating 3D content.
Overall, this research aims to enhance the productivity and flexibility of procedural 3D modeling with LLMs, ultimately benefiting end users and promoting their participation in the creative process.