
Leslie Miley on Bias in Big Data/ML and AI - QCon San Francisco

by Shane Hastie on Nov 20, 2017. Estimated reading time: 5 minutes

At QCon San Francisco last week, Leslie Miley gave a keynote talk in which he explained how inherent bias in data sets has affected things from the 2016 Presidential race to criminal sentencing in the United States.

He started by emphasizing that 2017 has been an unprecedented year because social media has been overwhelmed with bias powered by fake news, machine learning and AI.  He recounted the numbers that Facebook has acknowledged: in 2016 they claimed that fake news was not a problem; in October 2017 they stated that 10 million people saw fake ads; and in November they revised this number to 126 million, and climbing.  Twitter identified 6,000 Russian-linked bots which generated 131,000 tweets between September and November 2016, viewed by 288 million people globally and 68 million in the USA.  He asked the question: how can this happen?

He explained that while he was at Twitter he ran abuse, safety and security in the accounts team.  They identified hundreds of millions of accounts which had been created in Ukraine and Russia; he does not know if they were removed.  Facebook has stated that up to 200 million of its accounts may be false, fake or compromised.  There is a significant problem which is not being addressed.

He explained that in 2016 Twitter released its algorithmic timeline, which is designed to ensure that you see more tweets from the people you interact with the most, and that it:

    Ensured the most popular tweets are far more widely seen than they used to be, enabling them to go viral on an unprecedented scale.

They have achieved this goal very effectively; however, there is a problem when the most popular tweets and posts are falsified news.  He said the system didn't deliver news, it delivered propaganda; it didn't deliver your cat video, it delivered biased information.  Fake posts told people to go out and protest against Black Lives Matter; they told someone to go and shoot up a pizza parlour in the middle of the country, and someone did that because of fake information they received from social media.

He maintains that Facebook and Twitter are publishers, media companies; however, they are not held to account as media companies are, because they are treated as a "platform".  There is extensive ongoing debate and discussion about the role of Facebook and Twitter as media companies versus platforms.

There could be nearly one billion false accounts on social media generating fake posts and taking advantage of the algorithmic timeline features to spread their content very widely, exploiting bias and altering people's moods and behaviours.  He cited a Facebook experiment which showed how inserting different posts into people's timelines altered their mood and actions.  He asked whether, having published that this is possible, Facebook did anything to prevent others from using the same techniques; his contention is that they did not.

The false data becomes a part of the training data which determines what the timeline algorithms present to people.  

He related this to the 2008 mortgage crisis: information collected and presented with very little control, without understanding how the system works or why it works the way it does.

He explained why this concerns him: he is sure that the "next big thing" will be an AI/ML company, and he asks whether it will repeat the mistakes of the past.  Without conscious care and effort, this is a very likely outcome.

There is a growing and thriving industry emerging around the use of algorithms in a wide range of areas.  He gave the example of their use in ride-sharing: what happens if the algorithm determines that in a particular area most rides are under $5.00?  Will the company still send drivers to pick up in those areas, or will it send lower-rated drivers?  What is the impact on the people who live in those areas?  This is already happening, and there is no visibility into what is going on.

It is also happening in sentencing guidelines, where an algorithm resulted in African-Americans being 45% more likely to be sent to prison for the same crime, because the dataset used to train the model was inherently biased.  This algorithm has been deployed in 25 US states without being changed.
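The mechanism behind this is worth making concrete. The following is a minimal, purely illustrative sketch (the groups and records are invented, not drawn from the talk or any real dataset): when historical outcomes in the training data already differ by group, any model fit to that data inherits the disparity as its baseline.

```python
# Hypothetical training records: (group, sent_to_prison).
# These values are invented for illustration only.
records = [
    ("group_a", True),  ("group_a", True),  ("group_a", True),  ("group_a", False),
    ("group_b", True),  ("group_b", False), ("group_b", False), ("group_b", False),
]

def outcome_rate(records, group):
    """Fraction of records in `group` with a positive (prison) outcome."""
    outcomes = [hit for g, hit in records if g == group]
    return sum(outcomes) / len(outcomes)

rate_a = outcome_rate(records, "group_a")  # 0.75
rate_b = outcome_rate(records, "group_b")  # 0.25

# A model trained to predict `sent_to_prison` from features correlated
# with group membership will reproduce this 3x disparity.
disparity = rate_a / rate_b  # 3.0
```

No learning algorithm is even needed to expose the problem: the bias is measurable in the raw data before training begins.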

There is no transparency around how the algorithms are put together and trained and these algorithms are making more and more life and death decisions in society – around employment, health care, mortgage rates and in many other areas of our lives.

When these problems become apparent and come crashing down, the public will be left to pick up the pieces.

He then identified concrete things we can do to prevent these problems from happening.  This starts with having a discussion about where the training data comes from: is it over-sampled or under-sampled?  How are the algorithms built?  Be transparent about what information is collected, how it is used, and what elements are taken into account in the calculations.

He presented some actionable steps we can take:

  • Seek people outside of your circle for your data training experiments – widen your datasets
  • Practice radical transparency in what data is being used – ensure the data set identification and algorithm is peer reviewed
  • Hire more women in engineering – just do it; engineering teams with more women in them produce better results
  • Work on empathy and self-awareness – every day try to wring a little bit of the bias out of yourself (referencing President Obama); refactor your empathy and self-awareness
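The first two steps above can be sketched as a simple sampling audit. This is an assumed approach, not something presented in the talk: compare each group's share of the dataset against its share of a reference population, and flag ratios far from 1.0 as over- or under-sampling.

```python
from collections import Counter

def sampling_audit(samples, population_shares):
    """Ratio of each group's share of the dataset to its share of the
    reference population: >1 means over-sampled, <1 under-sampled."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {group: (counts[group] / total) / share
            for group, share in population_shares.items()}

# Invented example: an 80/20 dataset drawn from an assumed 50/50 population.
samples = ["group_x"] * 80 + ["group_y"] * 20
ratios = sampling_audit(samples, {"group_x": 0.5, "group_y": 0.5})
# group_x: 1.6 (over-sampled), group_y: 0.4 (under-sampled)
```

Publishing such ratios alongside a model is one small, concrete form of the radical transparency he calls for.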

He provided a list of sources for the audience to delve further into these topics, and encouraged them:

    Let’s not build an ML weapon of mass destruction and then stand back after five or so years and try to say “but we’re just a platform”.

He ended by saying that in technology we have worked with little oversight and regulation – let’s aim to be self-regulated rather than waiting for these types of problems to cause government regulation.  Think about the impact of what we’re building on people who are less privileged than the people who build the systems.


I am glad someone is saying these things. by Ron Hodges

As William Gibson famously said, "the street finds its own uses for things." Usually, these uses are perversions of the inventors' intent. I find a staggering amount of what is either childish naiveté or willful self-serving denial on the behalf of those inventors with regards to "unintended" outcomes from all these wonderful tools. You build it, you have responsibility for how it is used.
