University Research Teams Open-Source Natural Adversarial Image DataSet for Computer-Vision AI

Research teams from three universities recently released a dataset called ImageNet-A, containing natural adversarial images: real-world images that are misclassified by image-recognition AI. When the dataset is used as a test set for several state-of-the-art pre-trained models, the models achieve an accuracy of less than 3%.

In a paper published in July, researchers from UC Berkeley, the University of Washington, and the University of Chicago described their process for creating the dataset of 7,500 images, which were deliberately chosen to "fool" a pre-trained image-recognition system. While there has been previous research on adversarial attacks on such systems, most of that work studies how to modify images in a way that causes the model to output the wrong answer. By contrast, the team used real-world, or "natural," images collected unmodified from the internet. The team used their images as a test set for a pre-trained DenseNet-121 model, which has a top-1 error rate of 25% when tested on the popular ImageNet dataset. The same model, when tested on ImageNet-A, has a top-1 error rate of 98%. The team also used their dataset to measure the effectiveness of "defensive" training measures developed by the research community; they found that "these techniques hardly help."
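To make the metric concrete: top-1 error is the fraction of test images whose highest-scoring predicted class is not the true label, and top-k error counts an image as wrong only if the true label is missing from the k highest-scoring classes. The following is a minimal sketch of that calculation. The function name `top_k_error` and the toy scores are illustrative assumptions, not code or data from the paper.

```python
from typing import Sequence


def top_k_error(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int = 1
) -> float:
    """Fraction of examples whose true label is NOT among the k top-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; keep the first k.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Made-up class scores for 4 images over 3 classes (not real model output).
toy_scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.3, 0.5],
    [0.5, 0.4, 0.1],
]
toy_labels = [0, 2, 2, 1]

top1 = top_k_error(toy_scores, toy_labels, k=1)
top2 = top_k_error(toy_scores, toy_labels, k=2)
```

In this toy case two of the four images have the wrong class ranked first, but every true label appears in the top two, so the top-2 error is lower than the top-1 error.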


Adversarial Image Examples


Computer vision systems have made great strides in recent years, thanks to deep-learning models such as convolutional neural networks (CNNs) and to large, curated image datasets such as ImageNet. However, these systems are still vulnerable to attacks in which images that humans recognize easily are modified so that the AI recognizes them as something else. These attacks could have serious consequences for, say, autonomous vehicles: researchers have shown that it is possible to modify stop signs in a way that causes many computer vision systems to recognize them as yield signs. And while there is research on techniques for defending against these attacks, so far "only two methods have provided a significant defense."

One of these methods is called adversarial training: in addition to "clean" input images, the model is trained on adversarial images that have noise or other perturbations applied to them. The ImageNet-A team used adversarial training and the ImageNet dataset to train a ResNeXt-50 model. This slightly improved the model's robustness when tested on their ImageNet-A adversarial data. However, on the "clean" ImageNet test data, the model's top-5 accuracy degraded from its usual 92.2% to 81.88%, which the team considers unacceptable given the modest robustness gains. On the other hand, the team found that simply increasing the model size (for example, by adding layers) did improve robustness, in some model architectures almost doubling accuracy.
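The idea behind adversarial training can be sketched on a very small model. The article does not say which perturbation method the team used, so the example below makes two loudly stated assumptions: it uses the fast gradient sign method (FGSM) as the perturbation, and a two-feature logistic-regression classifier stands in for an image model. All names, weights, and inputs are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Tuple


def predict(w: List[float], b: float, x: List[float]) -> float:
    """Probability of class 1 under a logistic-regression model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def fgsm(w: List[float], b: float, x: List[float], y: int, eps: float) -> List[float]:
    """Shift each feature by eps in the direction that increases the log loss."""
    # For logistic regression, d(loss)/d(x_i) = (p - y) * w_i.
    p = predict(w, b, x)
    return [xi + eps * sign((p - y) * wi) for xi, wi in zip(x, w)]


def adversarial_training_step(
    w: List[float], b: float, x: List[float], y: int, eps: float, lr: float
) -> Tuple[List[float], float]:
    """One SGD step on the clean example, then one on its perturbed copy."""
    x_adv = fgsm(w, b, x, y, eps)
    for sample in (x, x_adv):
        err = predict(w, b, sample) - y  # gradient of the log loss w.r.t. z
        w = [wi - lr * err * si for wi, si in zip(w, sample)]
        b = b - lr * err
    return w, b


# Hypothetical starting weights and a single clean example with label 1.
w0, b0 = [2.0, -1.0], 0.0
x_clean, y_true = [0.3, 0.1], 1

x_adv = fgsm(w0, b0, x_clean, y_true, eps=0.3)
p_clean = predict(w0, b0, x_clean)    # above 0.5: classified correctly
p_adv_before = predict(w0, b0, x_adv)  # below 0.5: the perturbation flips the decision

w1, b1 = adversarial_training_step(w0, b0, x_clean, y_true, eps=0.3, lr=0.5)
p_adv_after = predict(w1, b1, x_adv)   # higher than before the training step
```

After one step that includes the perturbed copy, the model assigns a higher probability to the correct class on that perturbed input. This toy setup does not reproduce the accuracy trade-off on clean data that the team reported.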

Several commenters on Reddit questioned whether the ImageNet-A images were truly "adversarial," based on the "l_p" definition. First author Daniel Hendrycks joined the discussion and defended his team's definition ("inputs to machine learning models that an attacker has intentionally designed to cause the model to make a mistake"), explaining:

I also agree it is absolutely not adversarial in the typical l_p sense... By my measure, the definition [we] provided is most representative of the Google Brain Red Team's definition. Much of the reason for categorizing these examples as "adversarial" is to draw attention to the motivations and circumscriptions of adversarial examples research. If these are to be excluded, then must "adversarial examples" only be "gradient perturbed examples"? Why should a model's train-test mismatch robustness be excluded from security analysis, as it currently is?
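For context, in the "l_p" sense raised by the commenters, an adversarial example is usually understood as a clean input plus a perturbation whose l_p norm stays within a small budget. Natural images such as those in ImageNet-A have no clean counterpart to measure against, which is why they fall outside that definition. A minimal sketch of the l_p check follows; the function names and the budget `eps` are hypothetical.

```python
from typing import Sequence


def lp_norm(delta: Sequence[float], p: float) -> float:
    """l_p norm of a perturbation; p = float('inf') gives the largest absolute change."""
    if p == float("inf"):
        return max(abs(d) for d in delta)
    return sum(abs(d) ** p for d in delta) ** (1.0 / p)


def within_lp_ball(
    clean: Sequence[float],
    perturbed: Sequence[float],
    eps: float,
    p: float = float("inf"),
) -> bool:
    """True if `perturbed` differs from `clean` by at most eps in l_p norm."""
    delta = [b - a for a, b in zip(clean, perturbed)]
    return lp_norm(delta, p) <= eps


# Two-pixel toy "image": only the second pixel is changed, by 0.25.
clean_pixels = [0.5, 0.5]
perturbed_pixels = [0.5, 0.75]

inside_budget = within_lp_ball(clean_pixels, perturbed_pixels, eps=0.25)
outside_budget = within_lp_ball(clean_pixels, perturbed_pixels, eps=0.1)
```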

The image dataset, along with Python scripts for using it as input to a pre-trained model, is available for download on GitHub.
