Key Takeaways
- Machine learning models, just like most software, can be hacked
- Privacy attacks against machine learning systems, such as membership inference attacks and model inversion attacks, can expose personal or sensitive information
- Several attacks do not require direct access to the model and can instead be carried out against the model's API
- Personalized models, such as predictive text, can expose highly sensitive information
- Sensitive training data, and the models trained on it, should be secured
Machine learning is an exciting field of new opportunities and applications; but like most technology, there are also dangers present as we expand the reach of machine learning systems within our organizations. The use of machine learning on sensitive information, such as financial data, shopping histories, conversations with friends and health-related data, has expanded in the past five years -- and so has the research on vulnerabilities within those machine learning systems.
In the news and commentary today, the most common example of hacking a machine learning system is adversarial input. Adversarial inputs, like the one in the video shown below, are crafted examples that fool a machine learning system into making a false prediction. In this video, a group of researchers at MIT showed that they could 3D print an adversarial turtle which a computer vision system misclassifies as a rifle from multiple angles.
In general, adversarial examples are used to confuse the decision or outcome of a machine learning system in order to achieve some goal. This could be a simple misclassification of an example or a targeted, attacker-chosen outcome. This has been covered again and again in the press, and it does pose significant security threats to systems like self-driving cars.
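To make the idea of crafting such inputs concrete, here is a minimal sketch of the fast gradient sign method (FGSM), one of the simplest known techniques for generating adversarial examples. It is not the method used in the MIT turtle work, and the stand-in classifier, image shape and epsilon value are placeholder assumptions for illustration only.

```python
import torch
import torch.nn as nn

def fgsm_adversarial_example(model, image, true_label, epsilon=0.03):
    """Perturb `image` in the direction that increases the classifier's loss."""
    image = image.clone().detach().requires_grad_(True)
    loss = nn.functional.cross_entropy(model(image), true_label)
    loss.backward()
    # Step along the sign of the gradient with respect to the input pixels.
    adversarial = image + epsilon * image.grad.sign()
    return adversarial.clamp(0, 1).detach()

# Stand-in classifier and input; any differentiable image model would do.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
image = torch.rand(1, 3, 32, 32)   # placeholder input image
label = torch.tensor([3])          # its correct class
adv_image = fgsm_adversarial_example(model, image, label)
```

The perturbation is small enough to be barely visible to a person, yet it can push the model's prediction away from the correct class.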
However, an often overlooked danger within machine learning is the research on privacy attacks against machine learning systems. In this article, we’ll explore several privacy attacks which show that machine learning systems are susceptible to inadvertently revealing sensitive information. This is just as dangerous as adversarial examples, if not more so, given the wealth of online services where private information can be bought and sold.
I Know Your HIV Status: Exposing Sensitive Training Data
What if I told you I could determine whether your data was used to train a model with 70-90% accuracy, even without complete information? As ever-more sophisticated, personalized or group-specific machine learning becomes commonplace, these attacks can expose sensitive information about individuals.
This type of attack is called a Membership Inference Attack (MIA), and it was introduced by Professor Reza Shokri, who has been working on several privacy attacks over the past four years, and his co-authors. The paper Membership Inference Attacks against Machine Learning Models, which won a prestigious privacy award, outlines the attack method. First, adequate training data must be collected, either from the model itself via sequential queries of possible inputs or from public or private datasets the attacker has access to. Then, the attacker builds several shadow models -- which should mimic the target model (i.e. take similar inputs and produce similar outputs). These shadow models should be tuned for high precision and recall on samples of the collected training data. Note: the attack requires a different training and testing split for each shadow model, so you must have enough data to perform this step. Finally, in an architecture somewhat similar to a generative adversarial network (or GAN), the attacker trains a discriminator that learns, from the inputs and outputs of the numerous shadow models, how predictions on seen training data differ from predictions on unseen test or validation data. This discriminator is then used to evaluate the target API and determine whether a data point is in or out of the target model's training data. These attacks can, therefore, be run without full access to the model, in what is called a “black-box” setting.
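The sketch below illustrates the shadow-model idea in scikit-learn terms. The synthetic data, the RandomForestClassifier shadows and the logistic-regression attack model are all illustrative assumptions, not the exact setup from the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(6000, 20))            # stand-in for data drawn from the same
y = (X[:, 0] + X[:, 1] > 0).astype(int)    # distribution as the target's training data

attack_features, attack_labels = [], []
for _ in range(5):                          # train several shadow models
    X_in, X_out, y_in, y_out = train_test_split(X, y, test_size=0.5)
    shadow = RandomForestClassifier(n_estimators=50).fit(X_in, y_in)
    # For the shadows we know the ground truth: "in" (1) or "out" (0).
    for data, member in ((X_in, 1), (X_out, 0)):
        attack_features.append(shadow.predict_proba(data))
        attack_labels.append(np.full(len(data), member))

# The attack model learns to tell members from non-members by their prediction
# vectors; it is then applied to the prediction vectors returned by the target API.
attack_model = LogisticRegression(max_iter=1000).fit(
    np.vstack(attack_features), np.concatenate(attack_labels))
```

In the full attack described in the paper, one such attack model is typically trained per output class of the target, but the core idea is the same: members of the training data tend to receive noticeably more confident predictions than non-members.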
Professor Shokri and his fellow researchers tested these attacks against Amazon ML, Google Cloud machine learning services and a local model. From their experiments, they achieved 74% accuracy against Amazon’s machine learning as a service and 94% accuracy against Google’s machine learning as a service -- fairly high certainty for a machine learning task that exposes private information. In a more recent paper, they were also able to expose this information in supposedly “completely private and decentralized” systems such as federated learning, a type of machine learning architecture supported by Google and other large companies as a way to promote sharing of private data in a “secure” way.
Of particular import when reviewing this attack is the fact that recommender systems are becoming increasingly personalized -- employing demographic information, net worth and other sensitive fields. For a model trained on a very small population selected by an attribute such as net worth, an attacker who knows a few pieces of information about you can determine whether you were included in the training dataset, and thereby expose your net worth. The same applies to any other attribute used to select the training population that the attacker may not have known before (such as gender, age, race, location, immigration status, sexual orientation, political affiliation and buying preferences).
What data could these attacks expose? For AI in healthcare, it could be something as sensitive as HIV status. AI in health treatment often segments patients by disease. If an attacker can query the API of a model which recommends treatment for HIV patients, exposing that someone was part of the training data also reveals their positive HIV status. As AI use grows in the healthcare industry, we need to be cognizant of privacy and security vulnerabilities in the models we train and use.
Now I See You: Extracting Images from Facial Recognition Systems
What if I told you I could learn your face just by knowing your name? You might think I’m using Google Image Search, but actually, this information is exposed simply by having access to a facial recognition model which has been trained to recognize your face.
This type of attack is called a model inversion attack, and it was created by Matt Fredrikson and fellow researchers, who presented it in the 2015 paper Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures. Similar to the membership inference attack, this attack only needs API access to the machine learning model and can be run as a series of progressive queries. To begin, the attacker chooses a base image -- if they know anything about their target (such as age, gender or race), they can pick a closer starting point. Then, the attacker runs a series of queries performing the inversion attack, adjusting pixels to increase the model's confidence that the image matches the target person. At some point, high confidence is reached, producing an image like the one shown above -- which, although not perfect, is fairly demonstrative of how the person looks. The paper also details several other attacks using this method against different types of datasets.
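A minimal sketch of the inversion loop is shown below. For readability it assumes direct gradient access to a stand-in classifier; the published attack only needs confidence scores from the API and can approximate this optimization with black-box queries. The tiny model, image size and step count are placeholder assumptions.

```python
import torch
import torch.nn as nn

def invert_class(model, target_class, shape=(1, 1, 32, 32), steps=500, lr=0.1):
    """Gradient-ascent reconstruction of an input the model labels as `target_class`."""
    x = torch.zeros(shape, requires_grad=True)   # start from a blank (or average) face
    optimizer = torch.optim.SGD([x], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        confidence = torch.softmax(model(x), dim=1)[0, target_class]
        # Minimize (1 - confidence): push the image toward the target identity.
        (1 - confidence).backward()
        optimizer.step()
        x.data.clamp_(0, 1)                      # keep pixel values valid
    return x.detach()

# Stand-in face recognizer: 40 identities over 32x32 grayscale crops.
model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 40))
reconstructed = invert_class(model, target_class=7)
```

The reconstructed tensor is what would be rendered as the approximate face of the person whose class the attacker targeted.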
How plausible is this attack? We now see regular use of facial recognition systems, including US immigration and federal databases shared for things like airline check-ins, which means that access to and use of these systems is also growing. With every contractor or subcontractor who has access to the system or API, there is another potential attack on the horizon. Beyond the privacy violation of the attack itself, reconstructed passport, visa or other important official photos can be a potential security threat.
Your Credit Card Number Is…: Sharing Secrets in Deep Learning for Text
What if I told you I could gather your credit card number with access only to a personalized or semi-personalized text-based model? You can think of this like a predictive keyboard… what happens if you type “My phone number is…”?
This type of attack was investigated by Nicholas Carlini and several other Google Brain and academic researchers in a paper called The Secret Sharer: Measuring Unintended Neural Network Memorization & Extracting Secrets. Unlike the two previous attacks, this one requires white-box access to the model -- the attacker needs to get ahold of the trained model or binary themselves. They then repeatedly query the text model, following the most likely next characters to search for known patterns (such as ID numbers, credit card numbers or phone numbers). In the paper, they were able to extract secrets that had been seen as few as 9 times in the training data.
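The sketch below shows the core idea of ranking candidate secrets by how likely the language model finds them. The `model_log_prob` interface and the toy model are assumptions standing in for whatever the attacker's copy of the model exposes, and the brute-force enumeration is a simplification of the paper's more efficient search.

```python
import itertools
import math

def sequence_log_likelihood(model_log_prob, prefix, candidate):
    """Sum of next-character log-probabilities for `candidate` following `prefix`."""
    total, context = 0.0, prefix
    for ch in candidate:
        total += model_log_prob(context, ch)   # assumed model interface
        context += ch
    return total

def extract_secret(model_log_prob, prefix="my credit card number is ", digits=4):
    # Enumerate all digit strings of the known format and rank them;
    # a memorized secret stands out with an unusually high likelihood.
    candidates = ("".join(c) for c in itertools.product("0123456789", repeat=digits))
    return max(candidates, key=lambda c: sequence_log_likelihood(model_log_prob, prefix, c))

# Toy stand-in model that has "memorized" the digit sequence 3157.
def toy_log_prob(context, ch):
    memorized = "3157"
    pos = len(context) - len("my credit card number is ")
    if 0 <= pos < len(memorized) and ch == memorized[pos]:
        return math.log(0.9)
    return math.log(0.1 / 9)

print(extract_secret(toy_log_prob))  # -> "3157"
```

Real secrets are longer, so exhaustive enumeration is infeasible; the paper instead expands only the most promising partial sequences, which is why a memorized secret can still be recovered efficiently.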
As part of the research, they trained models on the email of 7 individuals from the Enron dataset -- and were able to recover a social security number and 2 credit card numbers. Given the increased use of deep learning for text, particularly for conversational AI in customer service, the ability to memorize and then extract secrets is a real security risk. It essentially enables a phishing-style attack against any user who has ever shared private details with the bot.
Protecting Your Data: Secure Machine Learning
Given that machine learning exposes a new attack surface, how can we protect and secure machine learning against these privacy attacks? One proposed solution, federated learning, has already been shown to have vulnerabilities by Professor Shokri and his group at the National University of Singapore. Others, such as PATE from Nicholas Papernot, require large amounts of training data and employ a differential privacy mechanism (meaning they might still expose some information). Still others apply encryption techniques to the training data (using homomorphic encryption or homomorphic pseudonymization). It seems there is not yet a generally feasible way to guarantee that private or sensitive information cannot leak from a machine learning system.
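To make the differential privacy idea concrete, here is a small sketch of noisy vote aggregation in the spirit of PATE: a student model only ever sees labels that several independently trained “teacher” models agreed on, with calibrated noise added to the vote counts. The number of teachers and the noise scale are placeholder assumptions; real deployments must account for a formal privacy budget.

```python
import numpy as np

def noisy_aggregate(teacher_votes, num_classes, noise_scale=1.0, rng=None):
    """Return the class with the most teacher votes after adding Laplace noise."""
    rng = rng or np.random.default_rng()
    counts = np.bincount(teacher_votes, minlength=num_classes).astype(float)
    counts += rng.laplace(scale=noise_scale, size=num_classes)  # privacy noise
    return int(np.argmax(counts))

# Ten hypothetical teachers voting on one query; most say class 2.
votes = np.array([2, 2, 2, 1, 2, 2, 0, 2, 2, 1])
label_for_student = noisy_aggregate(votes, num_classes=3)
```

Because no single teacher (and therefore no single partition of the sensitive data) can decisively change the noisy outcome, the influence of any one training record on the released label is bounded.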
That said, in computing, we’ve been managing private and sensitive data for decades. It hasn’t always been perfect, or even very secure, but as computing specialists, we’ve developed ways to encrypt and decrypt data in real-time, to create secure computing enclaves, to protect data held in memory and many other security measures for protecting sensitive and privileged information from attackers. By continuing research and development on new ways to apply secure computing principles and encryption methods to machine learning problems, we can likely discover and apply better security for sensitive data contained in these models.
We’ve also found ways to protect APIs and API access, to harden software systems and still allow basic interoperability. Applying this knowledge to our machine learning systems by regularly testing them for exploits and vulnerabilities is key to securing these systems in general. Developing new ways of granting access to machine learning APIs that can also assess the plausibility of incoming input could strengthen the security of those systems and provide peace of mind for both the security and machine learning teams. Essentially, we need to make sure OWASP application security standards are applied to machine learning APIs and are expanded to include these special types of attacks.
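As one hedged example of what hardening a model API could look like, the wrapper below throttles per-client query volume and rejects out-of-range inputs, since the membership inference and inversion attacks above rely on large numbers of sequential, often unusual queries. The thresholds and feature bounds are illustrative assumptions, not an established standard.

```python
import time
from collections import defaultdict

class GuardedModelAPI:
    """Wrap a model's predict function with simple rate and input checks."""

    def __init__(self, predict_fn, max_queries_per_minute=60, feature_bounds=(-10.0, 10.0)):
        self.predict_fn = predict_fn
        self.max_qpm = max_queries_per_minute
        self.bounds = feature_bounds
        self.history = defaultdict(list)   # client id -> recent query timestamps

    def predict(self, client_id, features):
        now = time.time()
        recent = [t for t in self.history[client_id] if now - t < 60]
        if len(recent) >= self.max_qpm:
            raise RuntimeError("rate limit exceeded; request flagged for review")
        low, high = self.bounds
        if any(not (low <= f <= high) for f in features):
            raise ValueError("input outside expected range; request rejected")
        self.history[client_id] = recent + [now]
        return self.predict_fn(features)
```

Coarsening what the API returns -- for example, rounding confidence scores or returning only the top label -- is another commonly discussed mitigation, since several of the attacks described above depend on fine-grained prediction vectors.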
Incorporating security protocols, testing and system review as a regular part of machine learning deployment would allow security and machine learning teams to work together to solve these problems. With experts from both areas developing and standardizing new techniques, I am confident we can build a more secure and privacy-aware future for machine learning systems.
About the Author
Katharine Jarmul is a pythonista and co-founder of KIProtect, a data science and machine learning security company in Berlin, Germany. She's been using Python since 2008 to solve and create problems. She helped form the first PyLadies chapter in Los Angeles in 2010 and co-authored an O'Reilly book along with several video courses on Python and data.