
Transforming 2D dog images into 3D models

16th March 2024
Sheryl Miles

In a study carried out by the University of Surrey, researchers have found a way to create accurate 3D models from photographs of dogs – by using the technology behind the video game Grand Theft Auto.

The award-winning research utilises artificial intelligence to transform 2D images into detailed 3D structures, and promises to advance various fields by enhancing the accuracy and efficiency of 3D modelling processes.

Accurate 3D models are crucial for a multitude of applications across diverse sectors, including virtual reality, animation, medical imaging, and even the design and manufacturing industries. They provide a detailed representation of objects in three dimensions, offering insights and perspectives that are impossible to achieve with 2D counterparts. Accurate 3D models are integral in reducing costs and enhancing product development. For instance, in the medical field, precise 3D models of organs can improve surgical planning and patient outcomes. In the entertainment industry, realistic 3D models can enhance the visual experience, making digital worlds more realistic and, therefore, more immersive.

The University of Surrey's research team tackled the challenge of creating accurate 3D models from 2D images by teaching an AI system to predict the 3D pose of a dog from a single photograph.

Traditional methods of teaching AI this transformation require 3D 'ground truth' data, often acquired through motion capture technology. However, fitting motion capture suits to dogs is impractical for a multitude of reasons, not least their varied shapes, sizes, and temperaments.

To circumvent this obstacle, the researchers modified the video game Grand Theft Auto V to generate a virtual environment populated with dogs. By replacing the game's main character with one of eight different dog breeds, the team created 118 videos depicting dogs performing various actions such as sitting, walking, barking, and running under different weather and lighting conditions. This approach allowed the production of a comprehensive database named 'DigiDogs', comprising 27,900 frames, without the need for real dogs to wear motion capture suits.

The next phase involved refining the AI using Meta's DINOv2 model, to ensure that its predictions from real dog photographs matched the accuracy obtained on the virtual DigiDogs database – a crucial step in translating the model's capabilities to real-world applications.
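To make the refinement step concrete, the setup can be sketched as training a small regression head on top of frozen image features to predict 3D keypoint coordinates. This is a minimal, illustrative sketch only: the keypoint count, head architecture, and loss are assumptions, and a random tensor stands in for DINOv2's image embeddings (the real study uses Meta's DINOv2 backbone and the DigiDogs frames).

```python
import torch
import torch.nn as nn

# Illustrative sketch: regress 3D dog keypoints from frozen backbone features.
# A stand-in random tensor replaces DINOv2's image embeddings so the example
# is self-contained; all sizes below are assumptions, not the paper's values.

NUM_KEYPOINTS = 26   # assumed skeleton size
FEATURE_DIM = 384    # embedding width of a DINOv2 ViT-S/14 backbone

class PoseHead(nn.Module):
    """Maps a pooled image embedding to an (x, y, z) triple per keypoint."""
    def __init__(self, feature_dim=FEATURE_DIM, num_keypoints=NUM_KEYPOINTS):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.mlp = nn.Sequential(
            nn.Linear(feature_dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_keypoints * 3),
        )

    def forward(self, features):
        out = self.mlp(features)                     # (batch, K * 3)
        return out.view(-1, self.num_keypoints, 3)   # (batch, K, 3)

# Stand-in for DINOv2 features: one 384-d vector per image in a batch of 4.
features = torch.randn(4, FEATURE_DIM)
head = PoseHead()
pred = head(features)               # (4, 26, 3): predicted 3D coordinates
target = torch.randn_like(pred)     # synthetic ground truth, e.g. DigiDogs poses
loss = nn.functional.mse_loss(pred, target)
loss.backward()                     # gradients flow only into the head
```

In this framing, the synthetic DigiDogs frames supply the ground-truth 3D poses that real photographs lack, and only the lightweight head needs training while the pretrained backbone stays frozen.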

This research is noteworthy in its demonstration of a novel method to overcome the limitations of traditional 3D modelling techniques, particularly in scenarios where obtaining ground truth data is challenging. The use of a popular video game to create a virtual training environment showcases the potential of cross-disciplinary approaches in advancing technological solutions. The study's success opens the door to more accurate and efficient 3D model generation, which could impact various industries – improving design processes and enhancing user experiences.

Beyond its immediate application, the methodology could be adapted for other subjects where motion capture is impractical or impossible, such as offshore wind farms.

The potential implications of this work are vast, and it holds the promise of changing the way we create, interact with, and utilise 3D models across industries.

© Copyright 2024 Electronic Specifier