ML/AI Panel



The panelists discuss what makes ML different from other types of applications and why it requires special tooling. They also talk about where the low-hanging fruit is and how they recommend acquiring those initial quick wins.


Chris Albon is the Director of Data Science at Devoted Health. Paige Bailey is the product manager for Swift for TensorFlow at Google. Amy Unruh is a Staff Developer Relations Engineer for the Google Cloud Platform. June Andrews works on AI Instruments at Stitch Fix. Melanie Warrick works at Google.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.


Moderator: I chose these particular panelists because they have each had a strong influence in shaping my own career, and they have a breadth of experience across lots of different companies and industries. They have graciously agreed to share their opinions and views with us today.

Our goal for today's session is to convey just how approachable ML is. The field has come a long way since the days of needing a Ph.D. just to build a model, much less do something with it. We now have frameworks that take much of the pain away. For the next 50 minutes, we'll give you a window into our careers, how we got to where we are, and you'll get a sense of the current landscape and what you can do with it.

A successful outcome for us is that you take what you've learned today, you go home, and you apply that to your own problems. Let's kick things off with some introductions.

Andrews: I am June Andrews, I currently work at AI Instruments at Stitch Fix. I've been in the industry for a while now. I originally started as a search engineer before we had all the distinction between data science, ML, AI, etc. In the latest project that I finished, I led the efforts to integrate real-time ML into monitoring and diagnostics of airplane engines at GE, which produces 60% of the airplane engines in commercial jets. I found myself doing more and more research and project development around the integration of ML into larger systems.

Warrick: My name's Melanie Warrick. I'm at Google. I work in cloud and do things around AI. I was at a startup before this, building out an open-source neural net platform, not TensorFlow. Before that, I was working at and I was helping to implement machine learning into applications. Another history before that with business and film.

Unruh: I'm Amy Unruh, also at Google, and also on the cloud platform, working on machine learning related things. In terms of my background, I actually got an AI Ph.D. long ago, back when the hot thing was symbolic reasoning. Then came what people sometimes call the AI winter, and I moved out of doing AI-related things. Obviously there's been a resurgence in the last decade or so, and so I feel like I've come full circle and moved back into it. It's been very interesting.

Albon: My name's Chris Albon, I'm the Director of Data Science at Devoted Health. I wrote an O'Reilly book on machine learning. I created machine learning flashcards. That's probably the thing I'm most known for.

Bailey: I'm Paige Bailey, I currently work on the TensorFlow team at Google particularly as a product manager for Swift for TensorFlow as well as building out some of our frameworks for machine learning researchers. Prior to this, I had a history as a person who is doing geoscience and planetary science at NASA, machine learning before it was called machine learning, so when it was just called research. Then, after that, I went into Chevron building geostatistics and machine learning plugins for an application called PETRO. After that, worked a bit at Microsoft in the office of the Azure CTO, so helping build out machine learning for cloud platforms.

What Makes ML Different?

Moderator: I'd like to start by talking about what it is about machine learning that makes it different. We're at a software development conference, it's not machine learning-focused and there's really only one track on ML and that's the strength. That's what makes me so excited about being here. Instead of being hyper-focused on the ML, the idea is how is ML different, why does it need special frameworks, what can we learn from traditional software development practices and apply it to this space, where are we going to run into challenges, and where are we going to run into specifics that just don't quite fit the normal paradigms that we see? Who wants to jump in with what makes ML different?

Bailey: I could start. One of the things that I really love about machine learning is that it feels very exploratory and very scientific. Again, my background was focused in geophysics and applied math and a lot of the research is about having a hypothesis, testing it out on some data, and then iterating over and over again to see if your hypothesis was correct.

This feels very different from traditional software development to me, in that often, if you're building a web application, you can look up, "Ok, here's what I absolutely need to do in order to put a button on a webpage," but it's very deterministic as opposed to exploratory. Machine learning feels like a scientific mindset applied to programming, and I really enjoyed that as a career.

Andrews: I think that's a good call out, I'll build off of that. For a lot of software, I think the lines are becoming increasingly blurred as we go into probabilistic failures and really complex environments where even deployments need ML to gauge how healthy a deployment is. It is a good call out that when you release in software engineering, if you've done a good job with reasonable unit tests, you can walk away.

In ML, that is not the case; you very much need a whole other level of guarantees in order to walk away safely from something in the industrial world. The reason is that the quality of the results delivered by ML inherently changes over time as your data changes. With software engineering, one thing I love to pull up is my undergrad website that calculated the traveling salesman problem for 12 points. It still works today because it was built on a very nice Java container type [inaudible 00:06:51].

If I were to pull up the ML that I did for search five years ago, it would not work. There's the aspect of having to maintain and upkeep it once you release, but then there's also the development cycle. People are pretty good at imagining what they need from engineering as long as they have a reasonable amount of experience, but it can be really hard, when you start, to understand what you need from ML, and so that development has a fundamentally different iteration cycle.

Albon: Just to amplify that, as someone who is in charge of implementing data science at a company, the thing that happens with ML, which is one of the hardest challenges, is that you can actually do something that fails. You can say, "We're going to predict X, Y, and Z," and then you put all the work into it and it just doesn't work. The data isn't clean enough, the data isn't there, it's just not a possible problem to solve, which is not the case with a lot of software. If you're just building a shopping cart, you know you just do these steps and the shopping cart will be there and it'll work.

In ML actually, because there's more of the research side, you can actually just be wrong. This could not be a research problem that's solvable in a way that actually works. That, from a bureaucracy point of view as part of an organization, requires I think a different approach where you can't go in and just promise everything that you can possibly do and then be guaranteed that you're going to deliver it, because you might not, it might not work.

Warrick: There's also been a lot of misunderstanding about it too. People are getting more and more comfortable in having an awareness of what it means, and I suspect many of you in this room have a better understanding right now than a few years ago. It's becoming more and more understood, but there are still a lot of questions. What's been interesting to watch over the last few years in particular is that, despite that lack of understanding, there's still an awareness: "People can understand their customers in more detail, I want that." Then, can you actually accomplish that?

There's been this challenge to overcome and understand what the problems are, like you're saying, and whether those are problems you're able to solve. We're seeing this become something more and more people have an education in, understand, and are able to grasp with more clarity, I'd say, than just a few years ago. It's amazing how much we've changed. We still have a lot of room to grow.

Unruh: I'm hopeful that in five years or so, we're starting to see tooling come out that's starting to tackle some of these problems in a really interesting way. Things like tools for allowing you to get explanations for why your models gave you a particular prediction, for example, all sorts of tools for visualizing your results, all sorts of tools for analyzing very large scale datasets to see if they're skewed in some way.

I think also in terms of for anyone who follows the field, there's clearly been a boom in the last few years of interesting neural architectures, deep learning architectures that are coming out and I think the theory is trailing the practice by a number of years and so I'm hopeful that some of this will catch up.

Andrews: I do want to take a moment and call out. How many folks here are software engineers? I don't want to mitigate the challenges with software engineering. I know that when you build some things that are really big at the last minute you're, "Why is this database so unstable?" and that can be a huge challenge. I don't want to de-emphasize the challenges that software engineering has, but I want to highlight that the ratio of solvable challenges where if you have enough time, money, resources, and knowledge, you can actually solve them, to unsolvable ones where we're having to do research just to get our project out the door, is different between software engineering and ML.

Moderator: Yes, that's a really good thing to point out. It's a different class of problems that we're looking at. We've evolved over time and traditional software engineering is just better understood. We've been doing it longer, we've been pouring more hours into it and ML is just newer. We're now just putting into practice a lot of this theory that we're just now cracking.

A Good Foundation

Moderator: If you're just now getting started, what would you say are some good foundation skills for it? I don't necessarily mean that you need to go back to basics and, "Go learn some algebra before you can learn anything with ML," but what are some skills that maybe some people in this room already have that lay a good foundation for them?

Andrews: One thing I love about software engineering, and I heard it in Paige [Bailey]'s talk earlier and it brought back happy memories, is how rapidly I used to learn computer languages. It was fun. Software engineers, you guys are good at learning new stuff. I'd say that's your foundational skill for going into ML: take all of that ability to learn.

The other thing I loved in my personal learning style for a lot of software engineering stuff was to just try it out, just build it, throw it together, and see what you got. That's actually a pretty good approach to getting off the ground in ML, because so much of it is domain-specific. The delta that you learn is about your specific problems, your specific data, and your specific infrastructure. Combining all of that, you'll get largely what you need just by throwing it together and seeing what happens.

Warrick: I know you said not linear algebra and statistics, and frankly, that's the thing I tell everybody: "Learn the language and then use a tool." If I were really to think about it, something I've had others tell me is that it's like the scientific method, frankly: just getting to the basics of coming up with a hypothesis, and then researching it and testing it and then revisiting it. That cycle that we all learned when we were younger is really crucial to doing machine learning and doing it right.

Unruh: More pragmatically maybe. There's a lot of that on GitHub. You have to really take it with a grain of salt, or a cup of salt maybe, because a large subset of what's out there is from people who are learning as well, so it doesn't necessarily mean that what you see is best practice. But learning by example can be really powerful, and there's a lot to see on GitHub and in papers that get published as well.

Also, I would add, again pragmatically, whatever machine learning framework you decide to use, figure out how to use a hyperparameter tuning tool with that framework, because as you start to build models you'll be, "This really isn't getting me the accuracy I thought maybe I could get. What do I tweak?" There's a systematic way to explore that, and it can be very helpful.
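To illustrate what a systematic way to explore hyperparameters can look like, here is a minimal sketch of a grid search in plain Python. It is not the tool any panelist uses; the toy data, the `fit`, `mse`, and `grid_search` helpers, and the search space are all hypothetical stand-ins. Real frameworks ship their own tuning tools that do the same thing at scale.

```python
import itertools

# Hypothetical toy data for y = 3x + 1; stands in for a real train/validation split.
train = [(x, 3 * x + 1) for x in range(-5, 6)]
valid = [(x + 0.5, 3 * (x + 0.5) + 1) for x in range(-5, 5)]


def fit(data, learning_rate, epochs):
    """Fit y = w*x + b with plain batch gradient descent on squared error."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def mse(params, data):
    """Mean squared error of the fitted line on a dataset."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def grid_search(grid):
    """Train once per hyperparameter combination and rank by validation error."""
    results = []
    for lr, epochs in itertools.product(grid["learning_rate"], grid["epochs"]):
        params = fit(train, lr, epochs)
        results.append((mse(params, valid), lr, epochs))
    return sorted(results)  # lowest validation error first


# Hypothetical search space: every combination is tried exactly once.
search_space = {"learning_rate": [0.001, 0.01, 0.05], "epochs": [10, 100, 500]}
ranked = grid_search(search_space)
best_error, best_lr, best_epochs = ranked[0]
print(f"best: lr={best_lr}, epochs={best_epochs}, validation MSE={best_error:.3g}")
```

The point of the sketch is the shape of the loop: fix a search space up front, score every candidate on held-out data, and compare the results in one place instead of tweaking values by hand.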

Warrick: I was about to say you had a plug right there.

Andrews: A startup that started in my kitchen five years ago, was specifically for helping with that.

Bailey: I also think it's really interesting to see, especially in the last couple of years how many prepackaged models and callable REST APIs have been created specifically for machine learning, so even if you don't necessarily have a background or an interest in wanting to build your own models, you can borrow and adopt some models that have been built by others.

Azure Cognitive Services are an example of this from the Microsoft side. There's also AutoML Tables and a number of computer vision services from Google and from other folks that have been building out great things. What I've been really delighted to see at least in the TensorFlow example space is the creativity of software developers who are adopting machine learning techniques for their applications.

With CodePen, if anybody's a JavaScript-y type person, you can have HTML, CSS, and JavaScript just readily available for people to play with in a browser. People have been building these really interesting experiments where you have MobileNet, which is a model that's able to understand partially how you can move, so it can pluck out arms and torso and legs and such, and it's really easy to just have an overlay of Totoro or Spider-Man or something.

As you're interacting with the webcam in your browser you can overlay that across a person. That doesn't require you to have any deep, complex background in machine learning. It's just how well you can call some JavaScript that implements this, and your creativity in expressing it.

Albon: To amplify that, one of the things that I found that's been really nice for me is this: I don't have a Ph.D. in computer science, I don't have a Ph.D. in mathematics, but there are enough tutorials out now and there are enough books out now that whatever your strength is, there is that strength plus ML. If you're, "I am JavaScript at the core," there is a JavaScript and ML book that you can walk through.

Bailey: There are multiple ones.

Albon: There are multiple ones. If you're really deep in math, there are books that are just the math of ML and you can walk through it because that's what you're comfortable with. There are so many ways to just use your strength and take one more step into ML. I did see a Snowflake ML package on GitHub. I have no idea if that works or not, but if you're really into data warehouses, who knows what they do with that.

Unruh: I really liked Paige [Bailey]'s suggestion of just trying some of the APIs that are out there, especially the ones that allow you to tune the models with your own data. I think all the major cloud providers have some variant of that. Really, if you're just learning about machine learning, it can be a good way to get a sense of what this technology can do without having to dig in and actually build a model yourself.

Shortcuts and Tips

Moderator: Paige [Bailey], thank you for mentioning a few APIs. I'm going to pose that question to the rest of the group. I am interested in some specific suggestions that you have, so specific tools that can help jump-start the process. If there is a library that you really like or a specific tutorial. Most people in here are probably familiar with CI/CD tooling and how does that apply to ML. If there are any specific shortcuts, any tips that you have, we'd love to share that with the audience.

Warrick: Paige [Bailey], you mentioned this in your last talk, but I'm a huge fan of Rachel Thomas and Jeremy Howard's fast.ai. They've got some great material out there that's all free to check out.

Moderator: I believe they even have a Swift version of it.

Bailey: They do, and it's called SwiftAI because puns are awesome. The REST API services are wonderful, but Keras is another option as a high-level API, and it actually has a number of implementations in languages outside of Python. If you're interested in trying a JavaScript interface to Keras, that's available as part of TensorFlow.js. There's an R interface that's very Keras-y for TensorFlow, and multiple others. It's a great way to understand machine learning quite quickly and to get started with 10 lines or less.

Unruh: Then, there's TensorFlow Hub as well.

Bailey: There's TensorFlow Hub where you can bring in models and model components and use them for transfer learning which is that concept of taking a base model that somebody has gone through a lot of energy and time to create and just adding a little bit of incremental data and having it tweaked minutely to fit your use case.

Andrews: I think a lot of those are great suggestions for standalone work or if you want to downsample your data. Another thing to consider is to use the infrastructure you already have in-house. Chances are there's some form of ML already being used. I love using infrastructure that's in-house because that means you have folks who have installed it and are maintaining it, and it's connected to your data. One of the big caveats with data, or with ML, is having that feature store, and so much of the power of the rest of the ML pipeline depends on the power of your feature store. If you can bootstrap off of an existing system in-house, that can really get you to the high ROI components quite quickly.

The other thing I really love about that is in-house data science, in-house ML, they'll have a bunch of testing specific to your application. For search, they'll have, "We've pulled up the top 10 head queries." You don't need to calculate the top 10 head queries, you can just say, "I made this change. Afghanistan restaurants are suddenly very popular. What is this modification?" I also recommend considering using your in-house tools as well.

Unruh: I'll have to give a call out to an open-source project called Kubeflow, which I'm actually giving a workshop on here in a couple of days, and which is tackling what I like to call the ML Ops aspect of machine learning. Once you've built your model, maybe with a smaller subset of data in isolation, what do you do with it? How do you codify the workflow to build it and make sure that it's reproducible and auditable and all that stuff? Kubeflow is built on Kubernetes, an open-source project, but there are other similar tools and frameworks coming out. Those kinds of things can be really helpful to not have your explorations just lapse into chaos as you expand them.

Warrick: I'm also going to give a call out to Project Jupyter, Jupyter Notebooks, all the Jupyter tools. It used to be IPython Notebooks. If you've not seen this and played with it and you're just starting out in the space, definitely set that up. It makes it so much easier to explore data and look at it.

Bailey: It's not just exclusive to Python either. You can have kernels in a variety of languages: F# if you're an F# person, Swift if you're a Swift person.

Unruh: .NET apparently.

Andrews: Just don't start with R, please. I respect R developers, sorry.

Moderator: I'm not an R developer. Python was a very easy place for me to start and there's so many libraries in it. As long as what you're working with is interoperable with Python libraries then you should be set for the future.

First ML Applications

Moderator: Let's take a step back and go a little bit back in time. I'd like to hear from each of you: what's the first ML application that you ever built? You don't have to go into too much detail, but tell us a little bit about it and, if you were going to build it again today, what tools would you use to build it?

Warrick: I did a boot camp and I built an application. This is 2014 and I know people hadn't fully started to do this yet but I wanted my email to identify when somebody was trying to meet up with me and then to make me aware of it, and then I wanted to do a lot more than just that but I was basically trying to do a recommender running within NLP to assess all the emails that were coming in.

I know it's, "We already have that," but in 2014, which wasn't that long ago, we didn't have that. I was trying to build that out, but me on my own was not going to accomplish what Google has accomplished as quickly. That was something I played around with. I got it to at least identify some of the emails, probably 20% of the time.

Bailey: My first machine learning project was probably logistic regression, which isn't very interesting, to help better qualify data that was coming in for a project. For my first deep learning project, I was in the earth sciences, and people would get Ph.D.s for looking at very detailed satellite images of reefs and hand counting the different kinds of atolls that are in the reef, so hundreds and hundreds of them, and they're like one of five shapes or something. People were divided up into sections and then they would hand count, and then they would write a paper and defend it, and it would be a good portion of their Ph.D. I did not want to do that.

The idea was to go through and build an image classification tool to help with classifying those five different kinds of reefs. I was trying to do it by myself and failing miserably, but the end of that year was also when TensorFlow was open-sourced, and their first tutorial was doing image classification on about five different types, and I was able to solve that task just by plugging in my own data. I was so delighted and then I showed my advisor and he's, "This isn't science." I was, "Ok. I guess." Then, I turned into a computer science major. True story.

Unruh: Mine is going way back, many more years than Michelle [Casbon] wanted to go. Strictly speaking, my first machine learning project was when I was in grad school. My desktop machine was a Symbolics Lisp machine and I wrote this whole thing in Lisp. It was a technology that was popular at the time in academic circles, called explanation-based learning. Probably no one in here has ever heard of that. Symbolic reasoning. I won't go into the details of what it was, but basically, I was building an agent to do some planning to figure out how to accomplish a task and then reason about what were the important aspects of that task and its environment that were crucial to accomplishing it. That was the learning, and then it could use that information again in the future.

Andrews: I'll share the first time something I did was called ML. I was working at LinkedIn and I came home. My husband has a degree, we're both applied mathematicians. I came home and I was going through my LinkedIn feed and it said, "Someone endorsed you for machine learning," and I said, "That's wrong. I don't do machine learning." He said, "No, you do." I said, "What are you talking about? I'm a social network analyst."

What I was working on was the People You May Know recommendations at LinkedIn and I was doing it from a social analyst perspective of, "What are the features of... What do you have to have in common, your interaction rates and how this predicts who actually knows who in the real world?" He's, "No. Look it up, that's machine learning." That's at least the first time I recognized I was doing machine learning.

Before that, I'd always considered it social network analysis, linear algebra. The earliest one that I guess you could technically call ML was when I worked on supercomputers and did data transfer for real-time brain surgery. When you do brain surgery the brain deforms, and so you need imagery to see where the tumor went when you take off the top of someone's head. I analyzed the TeraGrid for what were the most likely routes by which I could get the processing back for that imagery.

Albon: My first time was a few years after I got my Ph.D. I did quantitative research, classic statistics. I was working at a Kenyan nonprofit called Ushahidi and we had these streams of data that people would send in when they were doing election monitoring: at this polling station there's violence, this polling station's safe, and all this stuff. I had just read ISL, An Introduction to Statistical Learning, and I was super pumped and I was going to apply that thing no matter what.

I went to town on it. I basically made these huge promises of what would be possible based on these short little text messages we were getting in about what I was going to be able to show, I was going to map everything, we were going to understand everything that was happening in this country based on these messages. It didn't work at all, not even close to working. I could basically tell you nothing. I think actually it turned out that a text search for keywords was actually far more efficient than my terrible model. It was fun.

Beginner's Mistakes

Moderator: I think not telling you anything is better than telling you something wrong. As with any technology, the more approachable it becomes the easier it is to misuse it. My question for you is, what are some of the common mistakes you see from beginners who don't fully understand the technology that they're using, and what are some common misconceptions you see from people just starting out?

Andrews: I'll take this one. I just gave a talk on this at MLconf last week. I particularly look at how ML is integrated into human decision systems, and I'm going to abstract this and say that if you take a machine and you smash it into a human decision system, things don't always go well. Unfortunately, there's a large amount of hype about how well ML and AI can deliver in these environments. You have two complex systems, and you have to be very careful about how you bring them together. Oftentimes, we overlook that complexity and just smash them together.

In the talk, I showed that the judicial system is being smashed together with ML, particularly around predicting risk scores for bail and whether or not folks are granted parole, to the point that it's being legislated that ML is required in those systems. But with that legislation, there's no set of deliverables for how accurate it needs to be, and no oversight for how it's being audited. I was able to show that depending on the UI, I can change 40% of how many people are released from prison. I think that is a phenomenal area for the greatest mistakes from the smallest details. When you want to smash ML into a system, you've got to be careful.

Albon: One thing that I think relates to Paige [Bailey]'s earlier point is that ML is more on the research angle. There are things that can go well, and things that do not. The thing that people who do sales and marketing love about ML is that no matter what, it'll give you a prediction. If you run the code and it works, you enter in some values, the model runs, and it'll classify something into some group, which is so cool from a sales perspective because it always works, but it doesn't always work.

That prediction might be meaningless, it might not tell you anything, it might be totally wrong, but because the model always gives you something, it's so common to have these tools sold as if they just work. They might not work. They'll look at all the commas in someone's application or the spelling mistakes in their handwriting and they'll tell you what they ate for lunch and whether or not they're going to steal from your company, some craziness. It'll predict. There'll be a thing. There'll be a prediction. There'll be a percent. Who knows if it works or not?

That thing, fueled by the hype train around ML, has just caused a huge number of companies to attempt to build things like that and market them and sell them. In particular, I see it being sold to the government, because the government isn't great at vetting these things and it's easy to be, "Oh no, totally. We'll protect you. He'll go back to prison," or who will use this feature or who will behave in a certain way, and we'll do it all based on the shape of their face or some craziness like that. ML is a ripe tool for that because it always gives you some answer, but it's not necessarily the correct answer.

Bailey: I think it's also important that we're seeing that most of the data that's being used to create these machine learning models is very western-centric. A great example is image classification tasks again. You might think of a wedding picture in the United States and it looks a very particular way. A wedding picture in India or in China might look very different but based on the data that's been used to feed the algorithm, you end up with very dissimilar circumstances. Some of them, especially when dealing with people in different cultures, might be more negative than anticipated.

Yes, so I think there are a number of startups and a number of grassroots initiatives that have been spun up to help acquire additional data from places outside of the United States and outside of these western-centric contexts, but finding ways to encourage those, build out those datasets, and open-source them is, I think, incredibly important.

Unruh: Compounding that issue, sometimes the original source of the data can be a bit obscure. There's a big image database called ImageNet. Part of it, relating to labels of people was quietly removed recently because it was so problematic, some of its labels. Yet, out in the world are many machine learning models that are trained on that large dataset that people are now fine-tuning and using transfer learning to tune to their own data that have those problematic labels and biases baked in with no real obvious audit trail as to how to see what they were.

Warrick: To both Chris [Albon]'s and June [Andrews]'s point, I agree. A huge challenge is too much faith in the results. Then, to you all's point about data, I think a huge challenge is not involving the people who are actually impacted by the machine results in the actual data that's being put into the system, as well as in deciding what answers you really want to try to get out of the system. Something that I learned early on, and I still struggle with this, and I think it's true for everybody.

When you're building these models really questioning – this goes to all of it – questioning the errors. You can be, "It's 99% correct." Why is that 1% wrong? You might be, "I don't need to spend time on that." It's not a bad question to ask and also to question, "Is 99% right?" It's a good idea to really dig in especially when you see errors. Why are those errors there? Because there may be something silly in that moment that you are not able to see unless you look.

Unruh: In fact, I just had that come up, a dataset I was playing with that was predicting safe drivers. It was 98% accurate when I trained it and it turned out that in the places that it was making a mistake, it was getting wrong all of the cases where someone did get in an accident and made a claim. The problem was essentially that the data for that class was too sparse and if you don't dig into your accuracy results you'll miss a lot of important things.
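
The effect Unruh describes can be sketched in a few lines. This is a minimal illustration, not her actual dataset or model: the 1,000 drivers, the 20 claims, and the always-predict-"safe" model are all hypothetical values chosen so that the overall accuracy comes out to 98%.

```python
# Hypothetical data: 1,000 drivers, 20 of whom filed a claim (label 1).
labels = [1] * 20 + [0] * 980
# A hypothetical model that always predicts "safe driver" (label 0).
predictions = [0] * 1000


def accuracy(y_true, y_pred):
    """Fraction of all examples predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall_per_class(y_true, y_pred):
    """For each true class, the fraction of its examples predicted correctly."""
    result = {}
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        result[cls] = hits / total
    return result


overall = accuracy(labels, predictions)
by_class = recall_per_class(labels, predictions)
print(f"accuracy: {overall:.2f}")
print(f"recall by class: {by_class}")
```

The headline accuracy looks strong while the recall for the claim class is zero, which is why a per-class breakdown is worth computing alongside the overall number.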

Andrews: I want to build off of that. It's a bit of a mind shift from what we're used to, monitoring uptime and page load time: "Here's my 90th percentile. Here's my 95th percentile." When you come to analyzing these ML systems in the real world, you've got to take another viewpoint. You've got to look at what's the worst-case scenario here and explicitly measure the downside. It's not straightforward oftentimes.

A thing to think about: if you're working with people, there are obvious vulnerable populations, particularly children and seniors. Whatever your application is, think about who's the vulnerable person here, what is the big downside, and go explicitly measure it. Don't let it get folded into the means and the overall accuracy. The people part, that's straightforward to think about, but even when you go to airplane engines, there are more vulnerable airplane engines. Whatever your population is, there's always one angle in ML that you have to protect against.
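
One way to keep the downside from being folded into the overall mean is to report the error rate per group and look at the worst one. The sketch below assumes hypothetical group names and counts; which groups count as vulnerable depends on the application.

```python
from collections import defaultdict

# Hypothetical records: (group, true_label, predicted_label).
records = (
    [("adult", 1, 1)] * 90 + [("adult", 1, 0)] * 2
    + [("senior", 1, 1)] * 5 + [("senior", 1, 0)] * 3
)


def error_rate_by_group(rows):
    """Return {group: fraction of that group's rows predicted wrongly}."""
    counts = defaultdict(lambda: [0, 0])  # group -> [errors, total]
    for group, truth, pred in rows:
        counts[group][0] += int(truth != pred)
        counts[group][1] += 1
    return {g: errors / total for g, (errors, total) in counts.items()}


overall_error = sum(t != p for _, t, p in records) / len(records)
group_errors = error_rate_by_group(records)
worst_group = max(group_errors, key=group_errors.get)

print(f"overall error: {overall_error:.3f}")
for group, rate in group_errors.items():
    print(f"{group}: {rate:.3f}")
print(f"worst group: {worst_group}")
```

In this made-up data the overall error is 5%, but the smaller group sees a 37.5% error rate, which the overall number hides.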

Moderator: Melanie [Warrick] brought up a good point. If you're training a model and it's 99% accurate, what is that 1% that it's not accurate for? Who is that 1%? That 1% doesn't refer to 1% of the population, it refers to 1% of what your data represents, which is not usually 1% of the population. Datasets come from humans and humans are biased. Just remember when you're training that 99% may actually be very skewed and only represent half or less.

Showing Value Early

Moderator: One important question is, as you get started in ML, how do you show value early and convince your leadership to continue investing in your product? What are some things you can do to quickly show that investing your time in this is worthwhile?

Andrews: This is something I continually worry about, and something I think should be addressed upfront when you're entering this space, because ML does require an investment. It requires your time, it requires infrastructure, and ultimately, it requires something else not being done. It can be tempting to say, "I'm going to go and build this great, amazing, shiny new ML space, and then once we have that, we can deliver on these great results." The problem is that the shiny ML space you need is changing faster than you can build it, so even if you did get it, you'd want to build a different shiny space. You need to balance building up your company's infrastructure and tools with showing ROI on that investment, so they continue to invest in it. It's also a good checkpoint to make sure that you are investing in the right infrastructure.

A couple of techniques for this: every company has its hearsay, things it believes to be true but doesn't quite know. Maybe they don't have the right resolution on what that actually looks like in different parts of the population. I really like that approach because it's an avenue through which to engage with leadership and start that conversation: if you think something's true, "Let me go test that." That's one way to build buy-in. The other is to use that Boehm spiral: can you break down the space that you want to attack? Ok, recommender systems are probably 50 recommenders under the hood. Can you build it in a scalable way, building up the infrastructure by saying, "Ok, I know multiple recommenders are going to come into this home feed, but I'll build it end to end, put two recommenders in, then show that home feed and build from there"?

Bailey: I think that's great. Part of the value of showing results early is that they can be imperfect as long as everyone understands that it's a proof of concept. That's another way that these REST API services can be incredibly valuable, and things like hyperparameter tuners. You can try to get a first pass at setting up an experiment that's maybe 80% of the way there, that can show that it might be worthwhile to pursue more concrete results, but it gets you there in a couple of hours as opposed to a couple of months, and that's substantial in an engineering infrastructure.

Andrews: I'm going to build off of that. We've heard hyperparameters come up here and there and they're absolutely right. Hyperparameters matter, and the reason they matter is because of the uncanny valley of ML. The difference between an ML deliverable that absolutely fails and one that can actually deliver value is oftentimes a couple of percent. That's where hyperparameters get you from something that is not going to work to something that you can build your career off of.

Albon: I think another angle to take, and one that I've taken before, is this: you have an organization, and the organization doesn't do any ML. One of the things you could do is actually apply really simple models. It's not ML, but we could do OLS or you could do this [inaudible 00:41:51]. Just get the organization comfortable with the idea that we are not just a health insurance company, we're a health insurance company that's ok with using predictions in our decision-making process. Get them to make that one leap, and then you can show, "No, the predictions that we're making, even the simple ones, are 5% better than we were doing before, or 10% better. Great, now let's take something more complicated. Now, let's move over to something like TensorFlow. We've shown to you that there's value in this and I'm not a snake-oil salesman who's going for it."

Five years ago someone sitting in this room could say, "I'm going to go back to the CEO and say, 'AI,' and they're giving me $5 million." We've passed that point. Now, you have to deliver some value. Just start with something simple where you don't need as much infrastructure. There's a lot of infrastructure that you need, but you don't need as much. Before, you were using the mean as your prediction; now we're using an OLS and we're just getting you a little bit farther, a little bit farther. Once you build that confidence, then you can actually go big and say, "I'm going to go away for six months. We're going to build out this deep learning infrastructure." It's going to take a while, it's going to take resources, it's going to take time. We're not going to see a result right away, but when you come back, "This is better than we did before." You're already comfortable with the idea of making decisions based on predictions, and this does better.
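
The step from "the mean as your prediction" to OLS can be shown with nothing beyond the standard library. This is a sketch on made-up numbers, with one hypothetical feature `xs` and outcome `ys`; it compares errors on the same data it was fit on, whereas a real comparison would use held-out data.

```python
# Hypothetical data: one feature x and one outcome y.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]


def mean(values):
    return sum(values) / len(values)


def fit_ols(x, y):
    """Closed-form ordinary least squares for a single feature."""
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    s_xx = sum((a - x_bar) ** 2 for a in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def mse(y_true, y_pred):
    """Mean squared error between outcomes and predictions."""
    return mean([(t - p) ** 2 for t, p in zip(y_true, y_pred)])


# Baseline: predict the mean of y for everyone.
baseline_pred = [mean(ys)] * len(ys)
# Simple model: a straight line fit by OLS.
slope, intercept = fit_ols(xs, ys)
ols_pred = [intercept + slope * a for a in xs]

baseline_mse = mse(ys, baseline_pred)
ols_mse = mse(ys, ols_pred)
print(f"baseline MSE: {baseline_mse:.3f}")
print(f"OLS MSE:      {ols_mse:.3f}")
```

Reporting both numbers side by side is the kind of small, concrete improvement that can be shown before asking for heavier infrastructure.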

Andrews: You hit on something that's good to call out. In software engineering, when you want to do something rogue, you say, "Just give me two days. It's a hackathon." For ML, it does require a little bit more than that. You will burn yourself out trying to build a substantial ML project in two days. Try to get some buy-in, try to get some support before you go down the path of, "I'm going to build this."

Unruh: Also, when you're dipping your toes in, pick something that doesn't require a lot of data cleaning and pre-processing. That's not always possible, and it can suck up 90% of your time right there. If you have the choice, look for something where the data is plentiful, because generally the larger the dataset, the better your model will train, and where you don't have to do too much cleaning ahead of time. That'll help.

Warrick: I'm just going to add a couple of things because y'all hit on it pretty much. Work with the people who deal with your data analytics, especially to understand what specific metrics you want to show your performance on: customer churn, whatever. Figure out what you think you can try to go after. It's one thing that helps you guide this discussion when you're trying to have it with the people in your company. Then, pick a tool that's already been built. Translator tools are a great example. This is a solved problem if you're trying to translate between languages. You don't have to create this from scratch. You don't have to get the data for it. You just get the tool and implement it. That's one way to quickly turn something around and say, "Here's the thing that works." Or, low-hanging fruit, as the terminology goes: pick something that's already been built, because there are definitely pre-built models out there that you can start with.

ML Pipelines and Frameworks

Participant 1: I'm going to give context as an analogy because I don't know how to phrase this, but for my developers that write code, I can give them run-time options, like deploying with red/black or gradual rollouts or canaries or things like that. I'm interested in what you think the future is for data frameworks and pipelines, or ML pipelines and frameworks, and what things we can give for free post-model.

Andrews: I actually want that to be rolled into a field. I want a field that understands the ways to roll out ML, because although Kubeflow has made rolling out ML nicely scalable, it's more than that. You have to get people to listen to the ML, and that needs to be folded into that rollout and iteration. I actually want to see a field give me some options.

Albon: Ditto. I agree. It's definitely something that's more than just one little answer. This is a big problem, and we have to work at it slowly. I think as ML as a field gets more developed, more of these auxiliary things that are actually super important will start to develop. Five years ago we were still figuring out some basic parts of how to take research ML and turn it into production, and now a lot of that framework's in place and we can move on to other areas.

Unruh: Now, people are starting to talk about CI/CD in the context of ML, and then you get into very interesting questions. Suppose you have a machine learning workflow where you're always retraining a model on your data. How do you evaluate this model? Can you automate the decision of whether or not to push it to production, or do you need a human? All very interesting questions that people are just starting to tackle in the last couple of years, I think.
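
One possible shape for such a gate, as a sketch only: compare a retrained candidate against the production model, promote automatically only on a clear improvement, reject on a regression in the sparse class, and otherwise hand off to a human. The `EvalReport` fields, the thresholds, and the three outcomes are illustrative assumptions, not a standard from any CI/CD tool.

```python
from dataclasses import dataclass


@dataclass
class EvalReport:
    """Hypothetical evaluation summary for one model on a held-out set."""
    accuracy: float
    minority_recall: float


def promotion_decision(candidate, production, min_gain=0.01, max_recall_drop=0.0):
    """Return 'promote', 'review', or 'reject' for a retrained model.

    Thresholds are illustrative assumptions, not recommended values.
    """
    # Never auto-ship a model that got worse on the sparse, high-stakes class.
    if candidate.minority_recall < production.minority_recall - max_recall_drop:
        return "reject"
    # Auto-promote only when the overall gain is clear.
    if candidate.accuracy >= production.accuracy + min_gain:
        return "promote"
    # Anything ambiguous goes to a human.
    return "review"


production = EvalReport(accuracy=0.90, minority_recall=0.60)
better = EvalReport(accuracy=0.93, minority_recall=0.65)
regressed = EvalReport(accuracy=0.95, minority_recall=0.40)
marginal = EvalReport(accuracy=0.902, minority_recall=0.60)

for name, report in [("better", better), ("regressed", regressed), ("marginal", marginal)]:
    print(name, promotion_decision(report, production))
```

The "review" outcome is the design choice that matters here: the gate automates the clear cases and leaves the unclear ones to a person.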

Bailey: Absolutely. The concept of having unit tests for input data and other components of your machine learning framework is really critical. With the models that we were deploying to rig sites at Chevron, if you didn't have unit tests on the incoming data being used for inference, then if you added a new bit type, or suddenly had a weight on bit or a pressure increase that you hadn't seen before, a problematic decision could impact someone's life.

The decisions that our models usually made were "continue drilling" or "stop drilling, something looks wrong." If you get that wrong, that's really problematic. You need to ensure that you have appropriate tests in place to tell when the data coming in is something you can make a good assessment on, versus something that you haven't seen before and have no context for.
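
A check of this kind on incoming inference data might look like the following sketch. The field names come from the discussion above, but the known bit types, the numeric ranges, and the units are hypothetical placeholders, not values from any real drilling system.

```python
# Hypothetical values that a training set might have covered.
KNOWN_BIT_TYPES = {"pdc", "roller_cone"}
TRAINING_RANGES = {
    "weight_on_bit": (0.0, 50.0),   # hypothetical units and bounds
    "pressure": (0.0, 5000.0),      # hypothetical units and bounds
}


def validate_reading(reading):
    """Return a list of reasons the reading falls outside what the model has seen."""
    problems = []
    if reading.get("bit_type") not in KNOWN_BIT_TYPES:
        problems.append(f"unseen bit_type: {reading.get('bit_type')!r}")
    for name, (low, high) in TRAINING_RANGES.items():
        value = reading.get(name)
        if value is None:
            problems.append(f"missing {name}")
        elif not low <= value <= high:
            problems.append(f"{name}={value} outside training range [{low}, {high}]")
    return problems


familiar = {"bit_type": "pdc", "weight_on_bit": 22.5, "pressure": 3100.0}
unfamiliar = {"bit_type": "hybrid", "weight_on_bit": 22.5, "pressure": 7200.0}

for reading in (familiar, unfamiliar):
    issues = validate_reading(reading)
    if issues:
        print("do not trust the model here:", issues)
    else:
        print("input looks like training data; safe to run inference")
```

A non-empty result is a signal to withhold the model's recommendation or escalate to a person, rather than let the model guess on inputs it has never seen.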

Andrews: One trend I am seeing in terms of these rollout options is that we are starting to integrate more and more people into that rollout. We're doing less and less of this "move the ROC by 0.5." That'd be huge. We're doing fewer abstract metrics and more quantitative, safety-based rollouts.




Recorded at:

Mar 10, 2020