NVIDIA breakthrough emulates images from small datasets for groundbreaking AI training

2020/12/08 Innoverview Read

NVIDIA’s latest breakthrough emulates new images from existing small datasets with truly groundbreaking potential for AI training.

The company demonstrated its latest AI model using a small dataset – just a fraction of the size typically used for a Generative Adversarial Network (GAN) – of artwork from the Metropolitan Museum of Art.

From the dataset, NVIDIA’s AI was able to create new images which replicate the style of the original artist’s work. These images can then be used to help train further AI models.

The AI achieved this impressive feat by applying a breakthrough neural network training technique similar to the popular NVIDIA StyleGAN2 model. 

The technique is called Adaptive Discriminator Augmentation (ADA) and NVIDIA claims that it reduces the number of training images required by 10-20x while still getting great results.

David Luebke, VP of Graphics Research at NVIDIA, said:

“These results mean people can use GANs to tackle problems where vast quantities of data are too time-consuming or difficult to obtain.

I can’t wait to see what artists, medical experts and researchers use it for.”

Healthcare is a particularly exciting field where NVIDIA’s research could be applied. For example, it could help to create cancer histology images to train other AI models.

The breakthrough will help with the issues around most current datasets.

Large datasets are often required for AI training but aren’t always available. On the other hand, large datasets are difficult to ensure their content is suitable and does not unintentionally lead to algorithmic bias.

Earlier this year, MIT was forced to remove a large dataset called 80 Million Tiny Images. The dataset is popular for training AIs but was found to contain images labelled with racist, misogynistic, and other unacceptable terms.

A statement on MIT’s website claims it was unaware of the offensive labels and they were “a consequence of the automated data collection procedure that relied on nouns from WordNet.”

The statement goes on to explain the 80 million images contained in the dataset – with sizes of just 32×32 pixels – meant that manual inspection would be almost impossible and couldn’t guarantee all offensive images would be removed.

By starting with a small dataset that can be feasibly checked manually, a technique like NVIDIA’s ADA could be used to create new images which emulate the originals and can scale up to the required size for training AI models.

In a blog post, NVIDIA wrote:

“It typically takes 50,000 to 100,000 training images to train a high-quality GAN. But in many cases, researchers simply don’t have tens or hundreds of thousands of sample images at their disposal.

With just a couple thousand images for training, many GANs would falter at producing realistic results. This problem, called overfitting, occurs when the discriminator simply memorizes the training images and fails to provide useful feedback to the generator.”

You can find NVIDIA’s full research paper here (PDF). The paper is being presented at this year’s NeurIPS conference as one of a record 28 NVIDIA Research papers accepted to the prestigious conference.

(Source: AI News https://artificialintelligence-news.com/2020/12/07/nvidia-emulates-images-small-datasets-ai-training/ )