Facebook, Microsoft, and Partners Announce Deepfake Detection Challenge


Facebook, Microsoft, the Partnership on AI, and researchers from several universities have created the Deepfake Detection Challenge (DDC), a contest to build AI systems that can detect misleading images and video generated by AI. The challenge offers grants and awards to the teams that build the best detectors using the DDC's dataset of real and fake videos.

In a recent blog post, Facebook's CTO Mike Schroepfer announced that Facebook will be contributing US $10 million to the effort as well as a curated dataset containing video of paid actors. The dataset will be "freely available" to the community, and the money will be used to fund research grants and cash prizes for developers who produce the winning solution. The challenge's goal, Schroepfer says, is

to spur the industry to create new ways of detecting and preventing media manipulated via AI from being used to mislead others.

Deepfakes are a "technique for human image synthesis based on artificial intelligence." In short, they are "fake" images and videos created by deep-learning models. Image manipulation is of course not new, but until recently, creating realistic-looking fake video required a Hollywood-sized budget. With the advent of ubiquitous cloud computing, deep-learning frameworks, and open-source implementations of "face-swapping" technology, the bar is much lower. The technology does have harmless entertainment uses, such as replacing one actor's face with another's, but many are concerned that deepfakes could be used maliciously, either to manipulate public opinion or as part of a "social engineering" attack by hackers. Similar concerns about potential abuse of its deep-learning results prompted OpenAI not to release its full GPT-2 text-generation model.

Facebook and the other partners in the DDC hope to set an AI to catch an AI by establishing a "Kaggle-style" competition with a cash prize. Competitors will download the DDC dataset, which will contain both real and fake videos, and use it to train a machine-learning model that can identify the fakes. The challenge includes a "test mechanism" that will score the competitors' models on a subset of the dataset that is not made public but is instead held back for testing only.
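The held-back scoring idea can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `split_dataset` and `score_submission`, the 20% held-out fraction, the toy video IDs, and the use of plain accuracy as the metric. The article does not describe how the DDC's actual test mechanism splits or scores the data.

```python
import random


def split_dataset(samples, held_out_fraction=0.2, seed=0):
    """Shuffle labeled samples and split them into a public part
    (released to competitors) and a private part (kept for scoring).

    The fraction and the seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_private = int(len(shuffled) * held_out_fraction)
    return shuffled[n_private:], shuffled[:n_private]


def score_submission(predict, private_set):
    """Return the accuracy of predict(video_id) -> bool on the private split.

    Accuracy is a stand-in metric chosen for simplicity.
    """
    if not private_set:
        raise ValueError("private set is empty")
    correct = sum(
        1 for video_id, is_fake in private_set if predict(video_id) == is_fake
    )
    return correct / len(private_set)


# Toy labeled data: (video_id, is_fake). Even-numbered videos are "fake".
dataset = [(f"video_{i:03d}", i % 2 == 0) for i in range(100)]
public_set, private_set = split_dataset(dataset)


def oracle(video_id):
    """A hypothetical perfect detector that reads the label off the toy ID."""
    return int(video_id.split("_")[1]) % 2 == 0


oracle_score = score_submission(oracle, private_set)
```

Because competitors never see the private split, a model that merely memorizes the public videos gains nothing on the score; only detectors that generalize to unseen videos do well.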

Deepfake detection is already an area of active research among the universities involved in the challenge. For example, Professor Siwei Lyu at the University at Albany-SUNY has published several papers on the topic. Lyu points out that people in fake videos often do not blink. The image generation algorithms also leave behind other more subtle "fingerprints" that can be detected by deep-learning systems.

There is some concern that the DDC could itself lead to more convincing fakes. Most deepfake models are based on the generative adversarial network (GAN) architecture, which consists of two competing neural networks: a generative network that learns to produce convincing images, and a discriminative network that judges whether an image looks real. The two networks are trained simultaneously, in an AI "arms race," until the generative network can produce realistic-looking images. Users on Twitter asked Facebook VP and chief AI scientist (and deep-learning pioneer) Yann LeCun whether improved deepfake detectors could be used in such an adversarial training setting to create even better fakes. The DDC site's FAQ tries to address this concern somewhat by restricting the use of the dataset:

We will be gating access to the training dataset so only researchers accepted into the challenge can access it. Each participant will need to agree to terms of use on how they use, store, and handle the data, and there are strict restrictions on who else the data can be shared with.
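The adversarial training loop behind GANs, described above, can be sketched in a few lines of Python. This is a toy under loudly stated assumptions, not how deepfake tools are built: the "generator" is a linear map of 1-D noise, the "discriminator" is a single logistic unit, the "real data" are numbers drawn around a hypothetical mean of 4.0, and the generator uses the common non-saturating loss. The function name `train_toy_gan` and all hyperparameters are invented for illustration.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def train_toy_gan(steps=2000, lr=0.01, real_mean=4.0, seed=0):
    """Alternate discriminator and generator updates on 1-D toy data.

    Generator:      x_fake = a * z + b, with noise z ~ N(0, 1)
    Discriminator:  D(x) = sigmoid(w * x + c), the estimated P(x is real)
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters
    w, c = 0.0, 0.0  # discriminator parameters
    for _ in range(steps):
        x_real = rng.gauss(real_mean, 1.0)
        z = rng.gauss(0.0, 1.0)
        x_fake = a * z + b

        # Discriminator step: minimize -[log D(x_real) + log(1 - D(x_fake))].
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        grad_w = -(1.0 - d_real) * x_real + d_fake * x_fake
        grad_c = -(1.0 - d_real) + d_fake
        w -= lr * grad_w
        c -= lr * grad_c

        # Generator step: minimize -log D(x_fake), i.e. try to fool
        # the discriminator that was just updated.
        d_fake = sigmoid(w * x_fake + c)
        grad_a = -(1.0 - d_fake) * w * z
        grad_b = -(1.0 - d_fake) * w
        a -= lr * grad_a
        b -= lr * grad_b
    return {"generator": (a, b), "discriminator": (w, c)}


params = train_toy_gan()
```

The generator's gradient flows through the discriminator's weight `w`, which is why a stronger detector could, in principle, also serve as a stronger training signal for a forger. Toy setups like this one often oscillate rather than settle, so the sketch makes no claim about convergence.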

The DDC dataset is not yet available, but the site notes that events will begin in October 2019 and run until March 2020.
