Nvidia showcases cutting-edge generative AI research at NeurIPS 2022

Nvidia showcased cutting-edge artificial intelligence (AI) innovations at NeurIPS 2022. The hardware giant continues to push the boundaries of technology in machine learning (ML), self-driving cars, robotics, graphics, simulation and more.

The three categories of awards at NeurIPS 2022 were Outstanding Main Track Papers, Outstanding Datasets and Benchmarks Track Papers, and the Test of Time Paper. Nvidia won two awards this year for its AI research papers: one exploring diffusion-based generative AI models, the other on training generalist AI agents.

Nvidia also presented a series of AI advances it has been working on over the past year, including two papers that build on its 3D and generative AI work: one on a novel lighting approach and one on 3D model generation.

“NeurIPS is a major conference in machine learning and we see great value in participating in the show among other leaders in the field. We showcased 60+ research projects at the conference and were proud to have two papers honored with the NeurIPS 2022 Awards for their contribution to machine learning,” Sanja Fidler, VP of AI research at Nvidia and an author of both the 3D MoMa and GET3D papers, told VentureBeat.

Synthetic data generation for images, text and video was the main topic of several Nvidia-authored papers. Other topics included reinforcement learning, data collection and augmentation, weather models, and federated learning.

Nvidia reveals a new way to design diffusion-based generative models

Diffusion-based models have emerged as one of the most disruptive techniques in generative AI, showing exciting potential for superior image sample quality compared to traditional methods such as generative adversarial networks (GANs). Nvidia researchers won an Outstanding Main Track Paper award for their work on diffusion model design, which proposes improvements based on an analysis of multiple diffusion models.

Their paper, titled “Elucidating the design space of diffusion-based generative models,” breaks down the components of a diffusion model into a modular design, helping developers identify processes that can be changed to improve overall model performance. Nvidia claims that these proposed design modifications can dramatically improve diffusion models’ efficiency and quality.

The methods defined in the paper are largely independent of model components such as network architecture and training details. The researchers first measured baseline results for different models using their original sampler implementations, then re-evaluated them in a unified framework with a fixed formulation, followed by minor adjustments that yielded improvements. This approach enabled the research team to properly assess different practical choices and propose general improvements to the diffusion model’s sampling process that are applicable across models.
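For intuition, here is a minimal sketch of what such a modular design looks like in practice: the noise schedule, the denoiser network and the ODE integration step are separate, swappable pieces, in the spirit of the paper’s deterministic second-order sampler. The schedule constants are illustrative defaults and the `dummy_denoise` stand-in is purely hypothetical; any trained diffusion model exposing a `denoise(x, sigma)` interface could be plugged in.

```python
import numpy as np

def sigma_schedule(n_steps, sigma_min=0.002, sigma_max=80.0, rho=7.0):
    """Noise-level schedule with denser steps at low noise (illustrative defaults)."""
    ramp = np.linspace(0.0, 1.0, n_steps)
    return (sigma_max ** (1 / rho)
            + ramp * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho

def heun_sampler(denoise, x, sigmas):
    """Deterministic second-order (Heun) sampler for the probability-flow ODE.

    `denoise(x, sigma)` is assumed to return the model's estimate of the clean
    image at noise level `sigma`; the sampler itself is model-agnostic.
    """
    for sigma, sigma_next in zip(sigmas[:-1], sigmas[1:]):
        d = (x - denoise(x, sigma)) / sigma                # ODE slope at sigma
        x_next = x + (sigma_next - sigma) * d              # Euler predictor step
        if sigma_next > 0:                                 # Heun corrector step
            d_next = (x_next - denoise(x_next, sigma_next)) / sigma_next
            x_next = x + (sigma_next - sigma) * 0.5 * (d + d_next)
        x = x_next
    return x

# Hypothetical stand-in for a trained denoiser, just so the sketch runs.
dummy_denoise = lambda x, sigma: x / (1.0 + sigma ** 2)

sigmas = np.append(sigma_schedule(18), 0.0)                # end at zero noise
x_init = np.random.randn(64, 64, 3) * sigmas[0]            # start from pure noise
sample = heun_sampler(dummy_denoise, x_init, sigmas)
```

Because the schedule, denoiser and integrator are decoupled, each can be tuned or swapped independently, which is the kind of design-space exploration the paper describes.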

The methods described in the paper also proved highly effective in practice, allowing models to achieve record scores on benchmark datasets such as CIFAR-10 and ImageNet-64.

Results of Nvidia’s architecture tested on various benchmarking datasets. Image source: Nvidia

That said, the research team also noted that such advances in sample quality can amplify negative societal effects when used in a large-scale system like DALL·E 2, including disinformation, reinforcement of stereotypes and harmful biases. Moreover, training and sampling such diffusion models requires a lot of electricity; Nvidia’s project consumed roughly 250 MWh on an internal cluster of Nvidia V100s.

Generate complex 3D shapes from 2D images

Most tech giants are gearing up to show off their metaverse capabilities, and Nvidia is no exception. Earlier this year, the company demonstrated how Omniverse can serve as a platform for building metaverse applications. It has now developed a model that can generate high-fidelity 3D models from 2D images, further extending its metaverse technology stack.

Named Nvidia GET3D (for its ability to generate explicit textured 3D meshes), the model is trained only on 2D images, yet it can generate 3D shapes with intricate details and a high polygon count. It creates the shapes as a triangle mesh, like a papier-mâché model, covered with a layer of textured material.

“The metaverse consists of large, consistent virtual worlds. These virtual worlds need to be filled with 3D content—but there aren’t enough experts in the world to create the vast amount of content required by metaverse applications,” Fidler said. “GET3D is an early example of the kind of 3D generative AI we’re building to provide users with a diverse and scalable set of content creation tools.”

Overview of GET3D architecture. Image source: Nvidia

The model also generates these shapes in the same triangle-mesh format used by popular 3D applications, so creative professionals can quickly import the assets into game engines, 3D modeling software and film renderers and start working on them, as in the sketch below. These AI-generated objects can populate 3D representations of buildings, outdoor locations or entire cities, as well as digital environments developed for robotics, architecture and social media.
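To illustrate that interoperability, the sketch below writes a triangle mesh to formats that game engines and modeling tools ingest directly, using the open-source trimesh library. The icosphere and random vertex colors are placeholders standing in for a generated asset; GET3D’s actual output API is not shown here.

```python
import numpy as np
import trimesh

# Placeholder geometry standing in for a generated asset: an icosphere with
# random vertex colors. A real GET3D output would supply the vertices,
# triangle faces and texture instead.
proxy = trimesh.creation.icosphere(subdivisions=3, radius=1.0)
colors = np.random.randint(0, 255, size=(len(proxy.vertices), 4), dtype=np.uint8)
colors[:, 3] = 255  # fully opaque

mesh = trimesh.Trimesh(vertices=proxy.vertices, faces=proxy.faces,
                       vertex_colors=colors)

# Export to formats that game engines and DCC tools read directly.
mesh.export("generated_asset.glb")  # binary glTF, e.g. for Unity or Omniverse
mesh.export("generated_asset.obj")  # Wavefront OBJ for modeling software
```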

According to Nvidia, previous 3D generative AI models were significantly limited in the level of detail they could produce; even the most sophisticated inverse rendering algorithms could only construct 3D objects from 2D photographs captured from multiple angles, requiring developers to build one 3D shape at a time.

Manually modeling a realistic 3D world is time- and resource-intensive. AI tools like GET3D can greatly streamline the 3D modeling process and let artists focus on what matters. For example, when running inference on a single Nvidia GPU, GET3D can generate around 20 shapes per second, working like a generative adversarial network for 2D images while producing 3D objects.

The larger and more diverse the training dataset, the more varied and detailed the output. The model was trained on Nvidia A100 Tensor Core GPUs, using one million 2D images of 3D shapes captured from multiple camera angles.

Once a GET3D-generated shape is exported to a graphics tool, artists can apply realistic lighting effects as the object moves or rotates in a scene. By combining GET3D with StyleGAN-NADA, another AI tool from Nvidia, developers can also use text prompts to render an object in a specific style. For example, they can turn a rendered car into a burned car or a taxi, or convert an ordinary house into a haunted one.
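StyleGAN-NADA steers a generator between text-described domains using directions in CLIP’s embedding space. The minimal sketch below shows that directional CLIP objective in isolation, assuming OpenAI’s clip package; the prompts and helper names are illustrative and the generator images are left as placeholders, so this is not Nvidia’s implementation.

```python
import torch
import torch.nn.functional as F
import clip  # OpenAI CLIP: pip install git+https://github.com/openai/CLIP.git

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def text_direction(source_prompt, target_prompt):
    """Normalized CLIP-space direction from a source domain to a target one."""
    tokens = clip.tokenize([source_prompt, target_prompt]).to(device)
    with torch.no_grad():
        emb = model.encode_text(tokens).float()
    emb = emb / emb.norm(dim=-1, keepdim=True)
    direction = emb[1] - emb[0]
    return direction / direction.norm()

def directional_loss(src_images, edited_images, txt_dir):
    """Push the image-space change to align with the text-space direction.

    `src_images` come from a frozen generator and `edited_images` from the
    generator being fine-tuned; both are CLIP-preprocessed batches
    (placeholders in this sketch).
    """
    e_src = model.encode_image(src_images).float()
    e_edit = model.encode_image(edited_images).float()
    img_dir = e_edit - e_src
    img_dir = img_dir / img_dir.norm(dim=-1, keepdim=True)
    return (1.0 - F.cosine_similarity(img_dir, txt_dir.unsqueeze(0))).mean()

# Example: steer renders of an ordinary car toward "burned car".
car_to_burned = text_direction("photo of a car", "photo of a burned car")
```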

According to the researchers, a future version of GET3D may include camera pose estimation techniques, which would allow developers to train the model on real-world data instead of synthetic datasets. The model will also be updated to support universal generation, meaning developers will be able to train GET3D on all kinds of 3D shapes at once rather than on one object category at a time.

Enhance 3D rendering pipelines with lighting

At the CVPR conference in New Orleans in June, Nvidia Research introduced 3D MoMa. Developers can use this inverse rendering method to generate 3D objects composed of three parts: a 3D mesh model, materials applied to the model, and lighting.

Since then, the team has made significant progress in extracting materials and lighting from 3D objects, allowing artists to alter AI-generated shapes by changing materials or adjusting lighting as the object moves around a scene. Now presented at NeurIPS 2022, 3D MoMa relies on a more realistic shading model that uses Nvidia RTX GPU-accelerated ray tracing.

Recent advances in differentiable rendering have enabled high-quality reconstruction of 3D scenes from multiview images. However, Nvidia says most methods still rely on simple rendering algorithms such as pre-filtered direct illumination or learned representations of irradiance. Nvidia’s 3D MoMa model incorporates Monte Carlo integration, an approach that significantly improves decomposition into shape, materials and lighting.
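In rendering terms, Monte Carlo integration estimates the lighting integral at each shading point by averaging randomly sampled light directions instead of relying on a pre-filtered approximation. The toy estimator below, for a diffuse surface under a placeholder environment light, illustrates the idea; it is a sketch of the general technique, not 3D MoMa’s renderer.

```python
import numpy as np

def sample_hemisphere_cosine(n, rng):
    """Cosine-weighted directions about the surface normal (0, 0, 1)."""
    u1, u2 = rng.random(n), rng.random(n)
    r, phi = np.sqrt(u1), 2.0 * np.pi * u2
    return np.stack([r * np.cos(phi), r * np.sin(phi), np.sqrt(1.0 - u1)], axis=-1)

def env_radiance(directions):
    """Placeholder environment light: a bright patch near the zenith."""
    return np.where(directions[:, 2] > 0.9, 5.0, 0.2)

def outgoing_radiance(albedo, n_samples=256, seed=0):
    """Monte Carlo estimate of L_o = integral of (albedo / pi) * L_i(w) * cos(theta) dw.

    With cosine-weighted sampling the pdf is cos(theta) / pi, so those factors
    cancel and the estimator reduces to albedo * mean(L_i).
    """
    rng = np.random.default_rng(seed)
    dirs = sample_hemisphere_cosine(n_samples, rng)
    return albedo * env_radiance(dirs).mean()

print(outgoing_radiance(albedo=0.7))
```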

3D MoMa’s Monte Carlo integration. Image source: Nvidia

Unfortunately, Monte Carlo integration yields estimates with significant noise, even at large sample counts, which makes gradient-based inverse rendering challenging. To address this, the team incorporated multiple importance sampling and denoising into a novel inverse-rendering pipeline, significantly improving convergence and enabling gradient-based optimization at low sample counts.
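To see why importance sampling matters at low sample counts, consider a one-dimensional toy integral with a sharp peak, loosely analogous to a glossy lighting lobe. The sketch below compares a plain uniform estimator against one that draws samples from a Gaussian matched to the peak; it illustrates the variance-reduction principle only, not 3D MoMa’s estimator.

```python
import numpy as np

def integrand(x):
    """Toy integrand with a sharp peak, loosely like a glossy lighting lobe."""
    return np.exp(-50.0 * (x - 0.5) ** 2)

def mc_uniform(n, rng):
    """Plain Monte Carlo: uniform samples on [0, 1]."""
    x = rng.random(n)
    return integrand(x).mean()

def mc_importance(n, rng, mu=0.5, sigma=0.1):
    """Importance sampling: draw from a Gaussian proposal matched to the peak."""
    x = rng.normal(mu, sigma, n)
    pdf = np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
    inside = (x >= 0.0) & (x <= 1.0)          # the integral is only over [0, 1]
    return np.where(inside, integrand(x) / pdf, 0.0).mean()

rng = np.random.default_rng(0)
uniform_runs = [mc_uniform(16, rng) for _ in range(1000)]
importance_runs = [mc_importance(16, rng) for _ in range(1000)]
print("uniform    std dev:", np.std(uniform_runs))     # noisy at 16 samples
print("importance std dev:", np.std(importance_runs))  # far lower variance
```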

The 3D MoMa paper also presents an efficient method for jointly reconstructing geometry (explicit triangle meshes), materials and lighting, which significantly improves material and light separation compared to previous work. Finally, Nvidia envisions that denoising could become an integral part of high-quality inverse rendering pipelines.

Fidler emphasized that realistic lighting is essential to a convincing 3D scene.

“By reconstructing the geometry and decoupling lighting effects from the material properties of objects, we can produce content that supports relighting effects and augmented reality (AR) — which is much more useful for creators, artists and engineers,” Fidler told VentureBeat. “With AI, we want to accelerate and generate these 3D objects by learning from a wide variety of images rather than manually creating each piece of content.”


3D MoMa achieves this. As a result, the content it produces can be imported directly into existing graphics software and used as building blocks for complex scenes.

The 3D MoMa model has limitations, including a lack of efficient regularization of specular material parameters and a reliance on a foreground segmentation mask. In addition, the researchers note in the paper that the approach is computationally intensive, requiring a high-end GPU for optimization runs.

Overall, the paper presents a novel Monte Carlo rendering method combined with variance-reduction techniques that is practical and applicable to multiview 3D object reconstruction with explicit triangle-mesh models.

Nvidia’s future AI focus

Fidler said Nvidia is very excited about generative AI, as the company believes the technology will soon open up opportunities for more people to be creators.

“You’re already seeing generative AI, and our work in the field, being used to create amazing images and beautiful works of art,” she said. “Take, for example, Refik Anadol’s exhibition at MoMA, which uses Nvidia StyleGAN.”

Fidler said other new domains Nvidia is currently working on include foundation models, self-supervised learning and the metaverse.

“Foundation models can be trained on huge, unlabeled datasets, opening the door to more scalable approaches to solving a variety of problems with AI. Similarly, self-supervised learning aims to learn from unlabeled data to reduce the need for human annotation, which can be a barrier to progress,” explained Fidler.

“We also see a lot of opportunity in gaming and the metaverse, using AI to generate content on the fly so that the experience is unique every time. In the near future, you’ll be able to use it for entire villages, landscapes and cities, providing an example image to generate an entire 3D world.”
