Analyzing & Preventing Unconscious Bias in Machine Learning
by Rachel Thomas on Jun 12, 2018 | Duration: 38:12

Summary
Rachel Thomas presents three case studies, attempts to diagnose bias and identify some of its sources, and discusses what it takes to avoid it.

Bio
Rachel Thomas was selected by Forbes as one of “20 Incredible Women Advancing AI Research.” She is co-founder of fast.ai and a researcher-in-residence at the University of San Francisco Data Institute, where she teaches in the Masters in Data Science program. Her background includes working in energy trading, as a data scientist and backend engineer at Uber, and as a full-stack software instructor at Hackbright.

QCon.ai is an AI and Machine Learning conference held in San Francisco for developers, architects & technical managers focused on applied AI/ML.

Transcript

I just briefly wanted to say a little bit about my background. I studied Math and Computer Science in college and then did a Ph.D. in Math. I worked as a quant in Energy Trading and that's where I first started working with data. I was an early data scientist and backend developer at Uber. I taught full stack software development at Hackbright. I really love teaching and I think I'll always return to teaching in some form.

And then two years ago, together with Jeremy Howard, I started fast.ai with the goal of making deep learning more accessible and easier to use. I'm on Twitter @math_rachel and, as William said, I blog about diversity on Medium @racheltho, and I blog about data science at fast.ai.

fast.ai

I just have one slide about fast.ai. We have this, as William mentioned, a totally free course, "Practical Deep Learning for Coders." The only prerequisite is one year of coding experience. It's distinctive in that there are no advanced math prerequisites, yet it takes you to the state-of-the-art. It's a very kind of code-first approach.

We've had a lot of success. We've had students get jobs at Google Brain, have their work featured on HBO and in Forbes, launch new companies, get new jobs. So a lot of success stories. I wanted to let you know that this is out here, and this was a partnership between fast.ai, which is a non-profit research lab, and the University of San Francisco's Data Institute.

Case Study: Software for Hiring, Firing & Criminal Justice Systems

So, I think you've been hearing about a lot of awesome things that deep learning can do, and I'm going to talk about some pitfalls and risks with encoding bias. Algorithms are increasingly being used to make life-impacting decisions on hiring and firing, and in the criminal justice system.

So ProPublica did an investigation in 2016 on a recidivism algorithm that is used in making pretrial decisions, so who is required to pay bail and who isn't. A lot of people can't afford bail, and so this really impacts their lives. It's also used in sentencing and determining parole. And ProPublica found that the false-positive rate for black defendants, so these are people labeled "high-risk" who did not re-offend, was nearly twice as high as for white defendants. So there was an error rate of 45% for black defendants and 24% for white defendants.

So this is pretty horrifying. It's also notable that this report came out in 2016 and the Wisconsin Supreme Court upheld the use of this algorithm last year, and I'll come back to it. So I'm going to talk through some negative case studies and then circle back to solutions in the latter part of my talk.

“Blindness” Doesn’t Work

But one thing I wanted to say now is that blindness doesn't work. Race was not an explicit input variable to this algorithm, but race and gender are latently encoded in a lot of other variables, like where we live, our social networks, our education, and all sorts of things. I think sometimes people, as a first solution, think, "Oh, let's just not look at race or gender," but that does not at all guarantee that you won't have bias.

There was a study this year in 2018 on this algorithm that compared how the algorithm did to Mechanical Turk workers and they did about the same in their accuracy. So this, in addition to being very racially biased, is not even a very accurate algorithm. The same study also found that COMPAS, this algorithm I'm talking about, has 137 variables. It's this proprietary black box. Dartmouth researchers found that it performed just as well as a linear classifier on two variables.

So it's horrifying that this is still in use and then also it highlights a few things. It's important to have a good baseline to know what good performance is, how you could be doing with a simpler model. Just because something's complicated, it doesn't mean that it works.

The Use of AI for Predictive Policing

An area, kind of in keeping with this line, that really scares me is the use of AI for predictive policing. TASER acquired two AI companies last year and is marketing predictive policing software to police departments. They own 80% of the police body camera market in the U.S. So they have a lot of video data.

And then, The Verge did an investigation a month ago revealing that New Orleans had been using predictive policing software from Palantir for the last six years in a top-secret program, and applications like this really scare me for a number of reasons. One, there's no transparency. Because these are private companies, they're not subject to state public records laws in the same way that police departments are, and often they're kind of protected even in court from having to reveal what they're doing.

Also, there's a lot of racial bias in existing police data. So the data sets that these algorithms are going to be learning from are very biased. And then finally, there's been a repeated failure of computer vision to work on people of color, and I'm going to dig into that in a moment, but I think this is a really scary combination of what could go wrong.

Bias in Image Software

So I mentioned that computer vision has often failed on people of color. One of the most infamous examples comes from 2015: Google Photos, which automatically labels your photos, which is a useful tool. So it picked out "graduation photos" and "buildings." It labeled black people as gorillas, which is very offensive.

In 2016, beauty.ai ran the first AI-judged beauty competition. It found that people with light skin were judged much more attractive than people with dark skin. In 2017, FaceApp, which uses neural networks to create filters for photographs, created a "hotness filter" that lightened people's skin and gave them more European features. And so this is a picture. On the left is a user's actual face. On the right is what they output as a "hotter" version of him, and he is saying that he is uninstalling the app.

And this continues, 2018. So this is a paper that came out a few months ago by Joy Buolamwini and Timnit Gebru and they are definitely two researchers you should be following. This was part of Joy's Ph.D. thesis at MIT, but they evaluated a lot of commercial computer vision classifiers from Microsoft, IBM, and Face++ which is a giant Chinese company, and they found that the classifiers worked better on men than on women, and better on people with light skin than people with dark skin.

And you can see it's a pretty noticeable gap. So between light-skinned males and dark-skinned females, the classifier is between 20 and 35% worse. And Joy and Timnit broke it down further. So this is just for women: they've grouped pictures of women together by skin shade and showed what the error rate is, and you can see it getting progressively higher for women with darker skin, and the error rates for the category of the darkest skin are ridiculously high. So they're 25% and 47%. And again, I'll be returning to these case studies later in the talk with some suggestions about what we can do.

Case Study: Word Embeddings

The third case study is word embeddings. So word embeddings are used in products like Google Translate. So this is a very well-documented example. If you take the pair of sentences, "She is a doctor. He is a nurse," translate them to Turkish, and then translate them back to English, the genders have been flipped to fit the stereotype and it's now saying, "He is a doctor. She is a nurse." Turkish has a gender-neutral singular pronoun and you can see this in other languages with gender-neutral singular pronouns. People have documented this for a variety of words, with translations saying that women are lazy, women are unhappy, a lot of stereotypes.

So I want to dig in with just a brief lesson on word embeddings and why this is happening. This is not intentional on the part of Google. At a high level, computers and machine learning treat pictures and words as numbers. And oh, before I get into what a word embedding is, I want to talk about some places that they're used, because this is a tool for a lot of other products. It's used in speech recognition, and computers are better at speech recognition than humans are now. It's used in image captioning. So these are algorithms where you give it a picture and the algorithm is outputting "Man in black shirt is playing guitar" or "Construction worker in orange vest is working on the road." They are also used in Google Smart Replies. So this automatically suggests responses to emails for you. So someone asked about your vacation plans and Smart Reply suggests you might want to say, "No plans yet," or, "I just sent them to you." So those are all tools that use word embeddings.

This is an example from our course "Practical Deep Learning for Coders" where you can give words and get back a picture. And so, here we've given it the words "Tench" which is a type of fish, and "net." And it's returned a picture of a Tench in a net. So that's pretty cool.

We need a way to represent words as numbers, and a not-very-good approach would be just to go through and number a bunch of words. Can you think about what the problem with this is? It doesn't give us any notion of what it means for words to be similar. So cat and catastrophe might be sequential number-wise, but there's not any sort of semantic relationship between them.
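As a rough illustration of the numbering problem described above, here is a minimal Python sketch. The four-word vocabulary and the alphabetical numbering scheme are assumptions made up for this example; they are not from the talk's slides.

```python
# A made-up mini vocabulary, numbered in alphabetical order (an assumed scheme).
vocabulary = ["airplane", "cat", "catastrophe", "kitten"]
word_to_id = {word: index for index, word in enumerate(sorted(vocabulary))}


def id_gap(a, b):
    """Absolute difference between the integer IDs of two words."""
    return abs(word_to_id[a] - word_to_id[b])


print(word_to_id)
print(id_gap("cat", "catastrophe"))  # neighbouring IDs, unrelated meanings
print(id_gap("cat", "kitten"))       # related meanings, IDs further apart
```

The gap between IDs says nothing about meaning: "cat" sits next to "catastrophe" only because of spelling.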

A Better Approach: Words as Vectors

So a better approach is to represent words as vectors. This is a simplified toy example I made up, but take a look at the first column in black and guess what sort of meaning it might correspond to. So you'll see that puppy and dog are both values near one, so maybe this is capturing something about dogginess.

And now, look at the second column in red; what might that be capturing? Yeah, so youthfulness. Because puppy and kitten both have this property and dog and cat don't. And so having this multidimensional vector is letting you capture several things about words. And these are called word embeddings and in practice, you do not want to make them up by hand, you want a computer to learn them for you using machine learning.
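The toy table from the slide is not reproduced in this transcript, so the following Python sketch uses invented numbers in the same spirit: the first column loosely stands for "dogginess" and the second for "youthfulness". The values, the threshold, and the helper name are all assumptions for illustration.

```python
# Invented 2-dimensional word vectors (not the values from the slide).
# Column 0 ~ "dogginess", column 1 ~ "youthfulness" (hypothetical meanings).
toy_vectors = {
    "puppy":  [0.9, 1.0],
    "dog":    [1.0, 0.1],
    "kitten": [0.0, 1.0],
    "cat":    [0.0, 0.1],
}


def words_high_in(column, threshold=0.5):
    """Return the words whose value in the given column exceeds the threshold."""
    return sorted(w for w, v in toy_vectors.items() if v[column] > threshold)


print(words_high_in(0))  # words that score high on the first column
print(words_high_in(1))  # words that score high on the second column
```

Real embeddings are learned rather than written by hand, and their individual dimensions are usually not this easy to name.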

Word embeddings are represented as high-dimensional vectors. So this is an example of kitten and puppy and duckling might all be close to each other in space because they're all baby animals. Avalanche might be far away since there's not really a connection. And they're also nice because you can get analogies and kind of you know, you get from king to man by going the same direction and the same distance as you get from queen to woman.
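The analogy property can be sketched with vector arithmetic. The 2-dimensional vectors below are invented so that the arithmetic works out exactly; learned embeddings only satisfy such relationships approximately.

```python
# Invented 2-d vectors: dimension 0 ~ "royalty", dimension 1 ~ "gender".
toy = {
    "king":  [1.0, 1.0],
    "queen": [1.0, -1.0],
    "man":   [0.0, 1.0],
    "woman": [0.0, -1.0],
}


def offset(start, end):
    """Vector that takes you from the start word to the end word."""
    return [e - s for s, e in zip(toy[start], toy[end])]


king_to_man = offset("king", "man")
queen_to_woman = offset("queen", "woman")
print(king_to_man == queen_to_woman)  # same direction and same distance

# Equivalent view: king - man + woman lands on queen in this toy space.
analogy = [k - m + w for k, m, w in zip(toy["king"], toy["man"], toy["woman"])]
print(analogy == toy["queen"])
```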

What is Word2Vec?

So Word2Vec is a library of word embeddings released by Google. There are other ones out there. So Facebook has released FastText, Stanford has one called GloVe. They're not deep learning although they're often used as an input in deep learning. Algorithms were used to train them but Word2Vec refers to the library itself. And it's really useful because it can take a lot of data and time and computational power to train something like Word2Vec, and so it's handy that Google has done this for us and released it so we can use it. And it's much easier to use. This is an already trained version.

I gave a whole workshop on this topic. If you want more detail, you can find this on YouTube and all my code is on https://github.com/fastai/word-embeddings-workshop and you can run through it yourself in a Jupyter notebook, and it's kind of fun to play around with because you can try out different words, but that's where the code samples I'm about to show come from.

So the word kangaroo might be represented by a 100-dimensional vector and they actually have these in all, or not all, but lots of different sizes, but I'm using a 100-dimensional version. And so this is not very human readable. So we've got this array of 100 numbers. Well, what's nice is that we can do things with it, like look at distance. And so here, I'm looking at the distance between puppy and dog, and they're pretty close. I look at the distance between queen and princess and those are also pretty close.

And the distance between unrelated words is higher. So the distance between celebrity and dusty is higher, kitten and airplane are far away. These are kind of unrelated words. And I'm using cosine similarity here, not Euclidean distance, since you don't want to use Euclidean distance in high dimensions.

So this seems kind of useful. I could capture something about language. And you can also get what are the ten closest words to something [...] So here I look for the words "closest to swimming" and I see it's swim, rowing, diving, volleyball, gymnastics, pool, indoor Olympic. These all make sense. They kind of fit with what I would expect.
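The workshop code itself is not shown in this transcript. As a stand-in, here is a minimal pure-Python sketch of cosine similarity and a "closest words" lookup. The 3-dimensional vectors are invented, and `closest_words` is a hypothetical helper written for this example, not a function from the workshop or from any embedding library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Invented 3-d vectors standing in for real 100-dimensional embeddings.
toy_vectors = {
    "puppy":    [0.9, 0.8, 0.1],
    "dog":      [1.0, 0.2, 0.1],
    "kitten":   [0.1, 0.9, 0.1],
    "airplane": [0.0, 0.1, 1.0],
}


def closest_words(word, k=2):
    """Rank the other words by cosine similarity to the given word."""
    target = toy_vectors[word]
    scored = [
        (cosine_similarity(target, vec), other)
        for other, vec in toy_vectors.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


print(closest_words("puppy"))
# Cosine similarity ignores vector length, unlike Euclidean distance:
print(cosine_similarity([1, 2, 3], [2, 4, 6]))
```

With a real pretrained model you would load the published vectors instead of typing them in, but the comparison logic is the same idea.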

Word Analogies Are Useful

So word analogies are useful. We've got this useful tool. They also capture things like “Spain is to Madrid as Italy is to Rome”. However, there's a lot of opportunity for bias here. So I looked at the distance between man and genius and found that it was much closer than the distance between woman and genius. And this is just one example but you can look through others and you can try this out.

Word Associations

There are researchers who have studied this more systematically. So this is a group from Princeton and the University of Bath, and they looked at what they called baskets of words. So they would take a group of words, like these are all flowers: clover, poppy, marigold, iris. They had another basket that was insects: locust, spider, bedbug, maggot. They had a basket of pleasant words: health, love, peace, cheer, and a basket of unpleasant words: abuse, filth, murder, death, and they looked at the distances between these different word baskets. They found that flowers are closer to pleasant words, insects are closer to unpleasant words. So this all seems reasonable so far, but then they looked at stereotypically black names compared to stereotypically white names and found that the black names were closer to unpleasant words and the white names were closer to pleasant words. So it is really dangerous to have this kind of bias. They found a number of racial and gender biases in what words are close to each other, looking at entire groups of words. So it's not just a pairwise comparison.
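The basket comparison can be sketched as "average similarity to pleasant words minus average similarity to unpleasant words". This is a simplified association score on invented 2-dimensional vectors, offered as an assumption-laden illustration; it is not the exact test statistic used in the paper.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Invented vectors; real studies use pretrained embeddings and larger baskets.
vectors = {
    "love": [1.0, 0.1], "peace": [0.9, 0.0],    # pleasant basket
    "filth": [0.1, 1.0], "murder": [0.0, 0.9],  # unpleasant basket
    "poppy": [0.8, 0.2],                        # a flower
    "maggot": [0.2, 0.8],                       # an insect
}
pleasant = ["love", "peace"]
unpleasant = ["filth", "murder"]


def mean_similarity(word, basket):
    """Average cosine similarity between a word and every word in a basket."""
    return sum(cosine_similarity(vectors[word], vectors[b]) for b in basket) / len(basket)


def association(word):
    """Positive: closer to the pleasant basket. Negative: closer to the unpleasant one."""
    return mean_similarity(word, pleasant) - mean_similarity(word, unpleasant)


print(association("poppy"))
print(association("maggot"))
```

Swapping the target words for baskets of names is what exposes the kind of bias described above.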

And so this is why we kind of end up with a sexist output from Word2Vec. So we get analogies like “father is to doctor as mother is to nurse”, “man is to computer programmer as woman is to homemaker”. So these are all analogies found in Word2Vec and also in GloVe.

Language

So going back, that's kind of explaining what we saw with Google Translate earlier. There's a blog post by Rob Speer where he talks about a system for restaurant reviews that ranked Mexican restaurants lower because the word embeddings had negative connotations with Mexican.

And I guess there's something I should mention - these word embeddings were learned using a giant corpus of text, and there are a lot of racial and gender biases in text, and that's where the word embeddings learned this, at the same time that they were learning the semantic meanings that we want them to know.

Word embeddings improve web search results. So I think this could be dangerous, thinking this is being used as an underlying building block. What sort of biases will be put in there? What if someone is searching for grad students and neural networks and it becomes disproportionately more likely to return male names because of what the word embeddings have learned?

Machine Learning Can Amplify Bias

So what can and should we do about these problems? And before I get into the section on steps toward solutions, I first want to address some objections that I frequently hear. So I often write and tweet about bias and one common objection is, aren't these models just reflecting biases in the world? Like if this is how the world is, don't we just want to reflect it? And then a second objection I often hear is, "Well, the problem's coming from something downstream or upstream of my work. Is this really my responsibility?" And I disagree with both of those and just wanted to say why before I get to the solutions.

So, one is that machine learning can actually amplify bias. There is an interesting paper called "Men also like shopping" where they looked at a commonly cited dataset in which 67% of people cooking are women, and the algorithm predicted that 84% of the people cooking would be women. So you have this risk of even amplifying what we see in the world.

So Zeynep Tufekci, I definitely recommend following her on Twitter. She is a really insightful researcher into the intersection of technology and society. She also has a New York Times column. She pointed out that the number of people telling her that YouTube autoplay ends up with white supremacist videos from all sorts of starting points is staggering. And then there's just this whole thread of people reporting this phenomenon of getting shown white supremacist videos on YouTube. Someone says, "I was watching a leaf blower video and three videos later, it was white supremacy." Someone else said, "I was watching an academic discussion of the origins of plantation slavery and the next video was from holocaust deniers." Someone else said, "I was watching a video with my daughters on Nelson Mandela and the next video was something saying that the black people in South Africa are the true racists and criminals."
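One way to read the two percentages quoted above is to compare each against an even 50/50 split. The framing relative to 50% is an assumption made for this sketch, not a metric taken from the paper.

```python
# Percentages quoted in the talk.
dataset_share_women = 67    # share of "cooking" examples showing women in the data
predicted_share_women = 84  # share the trained model predicted

# Gap between what the model predicts and what the data contains.
amplification_points = predicted_share_women - dataset_share_women

# Distance from an even split (assumed reference point for illustration).
dataset_skew = dataset_share_women - 50
predicted_skew = predicted_share_women - 50

print(amplification_points)
print(dataset_skew, predicted_skew)
```

Read this way, the model moved 17 percentage points further from parity than the data it learned from.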

And so this is really scary and dangerous stuff that's influencing our world to have YouTube recommending white supremacist videos so much. I actually had already prepared the slide for this talk. And a few days ago, a fast.ai student contacted us to say, "All I use YouTube for is to watch your videos and I'm getting recommended white supremacist videos now," which even already knowing that this was a problem, is still kind of upsetting to hear.

Someone else to check out about this is Guillaume Chaslot. He is a former engineer that worked on YouTube's recommendation system and he has written about this phenomenon as well.

So this is an example of a runaway feedback loop. So Zeynep has written about it for the New York Times.

Renée DiResta, who is an expert in disinformation and how propaganda spreads, noticed a few years ago with Facebook, if you join an anti-vaccine group, you also get recommended to join cure cancer naturally, chemtrails, the earth is flat, and of all sorts of anti-science groups so that these networks are doing a lot to promote this kind of propaganda.

This is a paper from a group of computer scientists on how runaway feedback loops can work on predictive policing. So this is the idea of where, if you predict there's more crime in certain areas, you might send more police there, but because the more police there, they might make more arrests, which might cause you to think that there's more crime there, which might cause you to send even more police there, and you can easily get this runaway feedback loop.
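A deliberately simplified sketch of such a loop follows. It assumes one patrol per day, a greedy rule that sends the patrol to whichever district has more recorded incidents, and identical true incident rates in both districts. This is an illustration of the idea only, not the model from the paper.

```python
def simulate_patrols(days, recorded, observed_per_day):
    """Send one patrol per day to the district with the most recorded incidents.

    Only the patrolled district can add to its record, so an early lead
    keeps growing even when the underlying rates are identical.
    """
    recorded = dict(recorded)  # copy so the caller's starting data is untouched
    for _ in range(days):
        target = max(recorded, key=recorded.get)
        recorded[target] += observed_per_day[target]
    return recorded


# Hypothetical districts with the same true rate and a tiny initial difference.
observed_per_day = {"A": 1, "B": 1}
start = {"A": 11, "B": 10}

result = simulate_patrols(30, start, observed_per_day)
print(result)  # {'A': 41, 'B': 10}
```

District B's record never changes because nobody is sent there to observe anything, which is the core of the feedback problem.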

Going back to the example of COMPAS, this recidivism algorithm, Abe Gong is a data scientist who gave a great talk on it at ODSC, and he dug into what some of the inputs were. And it included things like: if you lived with both your parents and they separated, how old were you at the time? I think many people will think it is really unethical to have people's prison sentences directly related to things that happened when they were a child, that they had no control over.

And so we really need to think about just what variables are ethical to include in our models. Just because you have access to data, and even if it helps your model performance, is it ethical to use? And is it in keeping with our values as a society?

So my answer to the question of "aren't these models just reflecting biases in the world?" is no, in some cases they're amplifying them. No, in some cases they're part of these runaway feedback loops. And then also I'll say that I want to make the world a better place through my work. Our technology does impact the world and I think that's a responsibility that we should use well.

So there was a paper on using neural networks for gang crime classification. And at the conference where it was presented, several audience members had questions about the ethics of it, what will happen if someone is incorrectly classified as a gang member. And the computer scientist who has a Ph.D. and is working at Harvard said, "I'm just an engineer." And he got a lot of criticism for this, which I think is rightly so, that even as engineers we need to be asking ethical questions about our work and able to answer ethical questions about it. And I think that we're going to see less and less tolerance from society for this sort of answer as well.

There was an article in The Verge within the past month about an algorithm or software that's used to determine healthcare benefits and it's used in over half of U.S. states, and they gave a case study. When it was implemented in Arkansas, a lot of people drastically had their healthcare cut.

So, people with severe disabilities, for instance. It profiled a woman with cerebral palsy who needs an aide to help her get out of bed, go to the bathroom, and get food, and they cut her help by 20 hours a week from what she was getting before, and she couldn't get an explanation for why, and there was not a meaningful way to appeal it.

And they interviewed the creator of the algorithm, who is a professor and earning royalties off of this software, and he was asked whether there should be a way to communicate decisions. And he said, "It's probably something we should do. I should also probably dust under my bed." This sounds really callous. And then he later says, "No, that's somebody else's responsibility. It's not mine." And I think these responses are callous and I also do think that people are getting increasingly frustrated with that sort of response.

Angela Bassa, the director of Data Science at iRobot said, "It's not that data can be biased. Data is biased. If you want to use data, you need to understand how it was generated," and I think that's very true.

Ways to Address Bias in Word Embeddings

All right, so toward solutions. So going back to our problem of bias in word embeddings. There are two different schools of thought about this, and one is to try to de-bias the word embeddings. There's an academic paper by Bolukbasi that gives a technique for doing so. Rob Speer has released a de-biased set of word embeddings called ConceptNet.

Then the Caliskan-Islam paper, which is another academic paper, says that the de-biasing should happen at the point of decision or the point you're taking an action, that humans are able to perceive the world with bias and so computers should be able to as well, and that it's at the point of action where you want to make sure there's no bias.

And they warn and I think this is true that even if you remove bias earlier in your model, there's so many places that bias can seep in that you need to continue to be on the lookout for it, regardless of which route you take, but I wanted you to know these are two different schools of thought that researchers have about it.

More Representative Data Sets

So going back to our computer vision problem, more representative datasets is one solution. So Joy Buolamwini and Timnit Gebru, as part of their work mentioned before, where they identified these failures of computer vision products, also put together a much more representative dataset of men and women with all different skin shades. And that I believe is available at gendershades.org and you can also find their academic paper as well as a short video about their work.

Timnit also recently released a paper called Datasheets for Datasets. So she has a background in electrical engineering and she talks about in electronics, there are datasheets for any circuit or resistor you buy that give you a lot of information about how it was manufactured, under what conditions it's safe and reliable to use. And so she in this group proposed this as something that we could do with datasets. And they gave some sample questions that could be included that relate to how the dataset was created, how it's composed, what sort of preprocessing was done, what sort of work is needed to maintain it, what are any legal or ethical considerations, and this could, because our model's not going to be better than our data. It might be worse. And so it's really important to understand the datasets that go into building our models.

This paper, I definitely recommend checking it out. It also has really interesting case studies of how safety regulations and standardization came to the electronics industry, to the car industry, and to the pharmaceutical industry. There are a lot of interesting tidbits. For instance, with cars, I didn't know that historically crash test dummies represented a prototypical male anatomy, and so when women were in collisions of a similar strength, women were 47% more likely to be injured. And it wasn't until 2011 that regulations started requiring crash test dummies that represent a prototypical female anatomy as well. So, a lot of interesting examples like that show why it matters who is involved in this process.

Choosing NOT to Just Maximize a Metric

So I love this example. This is from Evan Estola, who is a data scientist at Meetup, and he gave a talk at MLconf Seattle. And he talks about how at Meetup they realized more men were interested in technical meetups than women, and they felt like this was something that could become a runaway feedback loop of "let's recommend fewer technical meetups to women, maybe even fewer women will go to technical meetups, and then we'll recommend even fewer to them. They won't find out about them." And so they chose not to do that and they chose not to maximize the metric. And so I think that's a great example of a company doing the right thing and not just going with "this is how to maximize this metric."

Talk to Domain Experts & Those Impacted

Another example of doing things well is this session from the Fairness, Accountability, and Transparency Conference that happened last month. So this was Kristian Lum, who has a Ph.D. in Statistics and is the lead data scientist at the Human Rights Data Analysis Group. And she organized a session together with a public defender where they talked about what are physical obstacles that really happen in the legal system. And they also had an innocent man who had been arrested and couldn't afford bail, to really talk about what the impact of using an algorithm in the legal system looks like.

And I think I've been guilty, having a math background, of wanting to think in these kinds of pure abstract systems, but there are all these really complicated and messy systems in the real world where things don't go how they should. The public defender shared that to get to Rikers Island, where this man was held, she had to take a bus for two hours each way, and then they only get 30 minutes to see their defender. Then, sometimes the guards are late bringing them in, and so there are a lot of details that people outside the legal system or outside the prison system wouldn't know. And so I think it's really important that as programmers, we be talking to people who understand the messy real-world systems that our work is involved with, and you can find this tutorial online.

It’s Your Job to Think about Unintended Consequences in Advance

Then I would say it's your job to think about unintended consequences in advance. So think about how trolls or harassers could use your platform, how authoritarian governments could use it. I mean there's definitely an argument for not storing data you don't need so that nobody can ever take that data. Think about how your platform could be used for propaganda or disinformation. I think Facebook just announced last week that they're going to start threat modeling what could go wrong, and everyone was like, "Why haven't they been doing this for the last 14 years?"

But it's really I think our job to think about how our software could be misused before it happens, and this is something that I think in InfoSec happens regularly and is very much a part of the culture there, but that we need to start doing more of thinking about how things could go wrong.

Questions to Ask about AI

So here's the list of some questions that I think are helpful to ask about AI. These can be systems that you're working on or products that you hear about. So one – What bias is in the data? As I said, there's some bias in all data and we need to understand what it is and how the data was created.

Can the code and data be audited? Are they open source? So I definitely think there's a risk when closed source proprietary algorithms are being used to decide things around healthcare and criminal justice and who gets hired or fired. What are error rates for different subgroups? So if you do not have a representative dataset you may not notice that your algorithm's performing very poorly on some subgroup, if there are not that many people from it in your dataset, and it's important to check this, just like ProPublica did with the recidivism algorithm looking at race.

What is the accuracy of a simple rule-based alternative? It is really important to have a good baseline, and I think that should be the first step whenever you're working on a problem, because sometimes you'll hear, "I don't know, is 95% accuracy good?" Who knows? It really depends on the context and what you're trying to do, and this came up with the recidivism algorithm, that it was doing the same as a linear classifier on two variables. So it's good to know what that simple alternative is.
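Checking error rates per subgroup can be as simple as the sketch below, which computes a false-positive rate for each group. The records and group labels are made up for illustration, and the function name is hypothetical; this is not ProPublica's analysis code or data.

```python
from collections import defaultdict

# Made-up records: (group, predicted_high_risk, actually_reoffended).
records = [
    ("x", True, False), ("x", True, False), ("x", False, False), ("x", False, False),
    ("x", True, True),
    ("y", True, False), ("y", False, False), ("y", False, False), ("y", False, False),
    ("y", True, True),
]


def false_positive_rates(rows):
    """Per group: share of people who did NOT re-offend but were labeled high-risk."""
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, predicted_high_risk, reoffended in rows:
        if not reoffended:
            negatives[group] += 1
            if predicted_high_risk:
                false_positives[group] += 1
    return {group: false_positives[group] / negatives[group] for group in negatives}


print(false_positive_rates(records))  # {'x': 0.5, 'y': 0.25}
```

An overall accuracy number would hide this gap, which is why the breakdown by subgroup matters.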
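To see why a bare accuracy figure needs a baseline, consider a made-up dataset in which 95% of the labels belong to one class. The data below is an assumption for illustration; the "model" is just the rule "always predict the most common label".

```python
# Hypothetical labels: 95 negatives and 5 positives.
labels = [0] * 95 + [1] * 5

# Simplest possible rule: always predict the most common label.
majority_label = max(set(labels), key=labels.count)
baseline_accuracy = sum(1 for y in labels if y == majority_label) / len(labels)

print(majority_label)
print(baseline_accuracy)  # 0.95
```

On data like this, a model reporting 95% accuracy has learned nothing beyond what the trivial rule already achieves.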

And then what processes are in place to handle appeals or mistakes? So I think we really, really need a human appeals process for things that are impacting people's lives, and I think as engineers, we also kind of have relatively more power in asking these questions within our companies.

So how diverse is the team that built it? So this is the other side of it. I feel that the teams building our technology should be representative of the people that are going to be impacted by it, which is increasingly all of us.

Your Responsibility in Hiring

So your responsibility in hiring, this could be a whole separate talk. I'll just say a few things about it. Research shows that diverse teams perform better and that believing you're meritocratic actually increases bias. It really takes time and effort to do interviews consistently, and one blog post I find really inspiring is "Small Cultural Changes You Can Make" by Julia Evans. Julia is an engineer at Stripe and she writes about being an individual contributor engineer, and she just personally wanted to be more consistent in her phone screens. So she developed a rubric for them and eventually it was adopted by all of Stripe to make their process more consistent.

So you don't have to be a manager to make change. I have a blog post I've written, "How to Make Tech Interviews a Little Less Awful", where I compiled a lot of different research on the topic and did some case studies of companies that I think seem to be doing things well.

Advanced Technology is Not a Substitute for Good Policy

And then so advanced technology is not a substitute for good policy. You wouldn't be able to tell from this talk, but a lot of the talks I give are really kind of hopeful, optimistic. I tell stories about fast.ai students all over the world applying deep learning to social impact problems, to help save the rainforest or improve the care of patients with Parkinson's disease. Sometimes people ask me, "Oh, is this the solution? If we democratize AI enough, will this help us have a good impact and not a negative impact?" And I think that is a very important component, but it's not going to replace the need for good policy.

We Have Regulations

People sometimes suggest that it will be hard to regulate AI, but I want to point out that we have some regulations that are relevant. So there's the Age Discrimination in Employment Act from 1967. It seems like Facebook may be violating it by letting companies show ads just to young users, not showing them to older users. With the Fair Housing Act, Facebook was also found to be violating it in 2016, letting people place housing ads and say, "I don't want people of certain races to see this ad."

Facebook said they were sorry. This was awful. It's not what they intended. Over a year later, ProPublica found that they were still violating it. So our regulations aren't perfect, but I think that they are a step in the right direction and I wish that they were enforced more for tech companies.

And I will say, with the Equal Credit Opportunity Act, again there's still bias in lending, but lenders do have to do a number of things to try to show that they're not showing bias, and it is better than not having any protection. I think we really do need to think about what rights we want to protect as a society.

You Can Never Be Done Checking for Bias

So you can never be done checking for bias. So I've given some steps towards solutions but bias could just seep in from so many places. So there's not a checklist that assures you, "Okay, bias is gone. I don't have to worry." It's something that we always have to continue to be on the lookout for.

Resources

So I have links to some resources. The word embeddings that I just gave a brief tidbit from is part of a larger workshop. All the code is on GitHub. This Fairness, Accountability, and Transparency Conference, really interesting stuff. All the talks are available either on the website or on YouTube, and in particular, I wanted to highlight Arvind Narayanan. He is a professor at Princeton in computer science. He also does a lot of stuff around privacy. He gave a talk on 21 definitions of fairness that was fascinating, since this is a very active field.

Latanya Sweeney is a professor at Harvard, and if you're not familiar with her, you should look her up because she has just a really fascinating career. As a grad student, she de-anonymized the Governor of Massachusetts's health data and had it sent to him, because he had guaranteed that this anonymization would work well, and she has done a lot of interesting stuff since then.

There's Kristian Lum's tutorial that I mentioned earlier, where she worked with a public defender, as well as an innocent man who could not afford bail. Gender Shades is the website for Timnit and Joy's work, and then "Weapons of Math Destruction" is a book by Cathy O'Neil that covers a lot of biased algorithms.

So that's all I have and I'm open to questions. You can find me on Twitter @math_rachel and then I also blog about data science at fast.ai.
