Panel: ML for Developers/SWEs

Summary

The panelists cover how they've adopted applied machine learning to software engineering.

Bio

Hien Luu is an engineering manager at LinkedIn and he is an AI & big data enthusiast. Jeff Smith is an engineering manager at Facebook AI where he supports the PyTorch team. Brad Miro is a Developer Programs Engineer at Google where he specializes in machine learning and big data solutions.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Luu: Welcome, everyone to the session for machine learning for developers. This is the panel, we're going to have a lot of interesting discussions. The theme of this panel that we're going to be discussing is about how to apply machine learning to your application systems. We want to treat it from the beginner’s perspective. If I'm a software engineer, I'm new to machine learning, I heard it's an awesome thing. I want to apply this to my application, in the system. How do I get started? How do I go about doing that? What should I be worrying about? How do I convince my management or my team? How do I do all that stuff? Those are the kind of questions that we're going to be talking about.

The structure of this would be, we'll do an introduction, I'll seed the panel with some questions, and then I'll turn it over to the questions from the audience after about 10 or 15 minutes. Let's get to know our panel a little bit. We have Brad [Miro] from Google, and then we'll have Jeff [Smith]. I want to let Brad [Miro] go first to do a quick intro about himself. What do you do, and maybe some fun facts about you?

Miro: My name is Brad Miro. I'm a developer programs engineer at Google based here in New York City. A developer programs engineer is someone who splits their time between doing traditional software engineering and also reaching out to the community and speaking to developers and just learning about things that you like, things that you don't like. The products that I specifically work on are in Google Cloud as well as open-source. I do a lot of work with TensorFlow and then also Spark on Google Cloud. That's a little bit about me, and a fun fact is I can beatbox.

Smith: My name is Jeff [Smith]. I work at Facebook within our AI group, and I work specifically as an engineering manager supporting the PyTorch team. I spend most of my time really focused on users outside of Facebook, trying to understand where is the field of AI going and how can we get there together through open collaborations across companies, across research labs, and in open source.

Journeys in Machine Learning

Luu: I'm going to start with that first question. You guys have a lot of experience and have been working with machine learning and deep learning for a while. What is your journey like? How do you get started?

Smith: I think a little bit about my own personal journey, but I also think a lot about the teams I've been on. When I started working within the space of AI more than a decade ago, it was kind of a weird niche thing to do, and it was really strongly associated with academia, and I didn't really know anyone else who had this job other than a couple of people sitting next to me. That's changed a lot. A big part of how that has evolved is that a bunch of people who had strong technical skills in other domains of software engineering have started to realize that they can put those skills to use working on machine learning problems.

Some of the common features I see of people who have successfully made a transition from working within software engineering to working within ML engineering specifically, is a focus on building a baseline of knowledge, focused on the real-world machine learning problems that they can solve, doing some independent studies, some dedicated classroom study. I've seen people do things like go back to Grad school or just read a couple of books, dive into projects head first. It really depends a lot on your career goals, but one of the messages I want to talk about in this panel and then in my talk later is that there are a whole lot of organizations collaborating, like Google, Facebook, Amazon, Microsoft, building things for you as a developer to get better at machine learning, to understand how to integrate it into your products, into solving problems that you're working on. It's a much more open door than it used to be, and I would encourage everyone to get started and talk about what your challenges are and what specific questions you have as someone new to the field.

Miro: I just want to piggyback off that. One thing that I am super excited about in the field generally right now is just the access to information that's available. Jeff [Smith] mentioned books, but there are also so many online courses through organizations such as Coursera and Udacity. Some cost nothing or almost nothing. The access that we have to all of this nowadays, it's never been like this before. The fact that it's available for you is just great. There's a lot of information available on the Internet, so definitely go seek it out if you're interested.

Luu: How did you get into your role?

Miro: I studied math and computer science in college and I guess I came at it from an approach where I liked math, I liked computer science. How can I do both? Machine learning is a perfect candidate for that. I think, fundamentally just the idea of building these algorithms that can teach themselves really fascinated me while I was in school. I actually held various software engineering jobs that weren't necessarily doing machine learning, but just through taking online courses, just building toy models myself, and just really gaining an understanding. Some of it was at work too, just playing with some smaller data sets and seeing if I could derive any business value out of it. It was really just getting my hands dirty. Having the interest is definitely important, and I think nowadays as long as you're at least interested, the means are there for you to continue to learn. That's basically how it was for me. I just took online courses.

How to Get Started with Machine Learning

Luu: Awesome. I think this field has different types of roles: I can see data scientists, machine learning engineers, software engineers. It depends on what you're interested in and what your background is, and there are different paths to get there. I'm going to start with this: imagine I'm a software engineer focusing on building applications, services, backends, and so on, and I heard this awesome thing about machine learning and I want to see if I can apply it or introduce it into my application system. It's quite a wide-open question. How do I get started? What do I need to know about? Maybe some of you have similar questions, but it's quite daunting, it's a big field. What do I do? Any thoughts?

Smith: It depends on how close you are to having what you think is the right opportunity to deploy machine learning technology within your application. Certainly, getting oriented and building some of that base knowledge through courses and books or whatever the case may be is a good start there. If you are, say, specifically motivated and, "I want to be able to do these things with the text of users interacting within my application. I want to understand things about it. I want to use that to make suggestions that help their experience." That gives you a lot more focus, because frankly, the field of ML is really big. Being able to pick a domain where you can put things to use gives you focus and allows you to have a more targeted path in learning about what are the techniques in use.

Some of that's going to be general knowledge, like deep learning is a general technique, but some of it is going to be really specific. Working with text is very different from working with audio or working with images. Having that sort of focus allows you to find the right resources. Some of those resources that you can find out there are things like courses that are specific to those topics, but very often libraries as well. We're getting to the point where it's not just about, "Well, you have to solve all this yourself." Libraries often contain functionality like pre-trained models which give you access to some of the best techniques in use, developed through pioneering research, but now pre-trained and sitting ready for you to use. You can get that via various libraries. We, at PyTorch, have a tool called PyTorch Hub, which I'll show you a little bit in a talk later, that gives you direct access to leading techniques for a given research problem in a pre-trained model that you just call in a one-liner: you say torch.hub.load and load your model.
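As a rough illustration of the one-liner Smith describes, loading a pre-trained model through PyTorch Hub might look something like the sketch below. The repository `pytorch/vision`, the model name `resnet18`, and the helper `load_pretrained` are illustrative choices that were not named in the panel. The sketch assumes `torch` and `torchvision` are installed and that the machine can reach the network.

```python
def load_pretrained(repo="pytorch/vision", name="resnet18", **kwargs):
    """Load a pre-trained model from PyTorch Hub (hypothetical helper)."""
    import torch  # imported lazily so this file loads even without PyTorch

    # torch.hub.load fetches the repository's hubconf.py and builds the model.
    model = torch.hub.load(repo, name, **kwargs)
    model.eval()  # switch to inference mode before making predictions
    return model


# Example call (needs PyTorch and network access, so it is left commented out).
# Which keyword selects the weights depends on the torchvision version:
# older releases take pretrained=True, newer ones take weights="IMAGENET1K_V1".
# model = load_pretrained(weights="IMAGENET1K_V1")
```

From there, the model can be applied to your own data to see whether its results are useful before any training effort is committed.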

Miro: You definitely touched on this a lot, but I think something that's really important when getting started with incorporating machine learning into a production or an enterprise environment is really figuring out where in the pipeline you are at with your organization. By that I mean, are you an organization that just wants to apply machine learning but doesn't necessarily know where? Are you an organization that wants to apply machine learning and knows exactly how you want to do it? I think that needs to be the first step. I've personally worked at organizations on both sides of that: organizations that wanted to do machine learning but had no idea what or how, and organizations that did know what they wanted to do. Figure out where you are in that and then take the appropriate steps. In those two examples, if you know you want to do machine learning, you know you have a lot of data and you don't necessarily have any ideas of what to do, one way to do it is go attend conferences like QCon and speak to other people who are actively doing this stuff and get ideas from them and see what other people in the industry are doing.

In a lot of cases, you'll notice that a lot of problems are fundamentally the same across multiple organizations. By just going and speaking with people, you'll generally be able to figure out ways that you can then do something similar in your own organization. If you do know what you want to do, yes, it's going out and seeking that specific knowledge. If you have all this text data and you don't necessarily know what to do, go learn about natural language processing, go learn about deep learning for text. I think it's really important to figure out where you are along that path, and whether you actually have the resources to do this. Because you can definitely get started doing this stuff just as a hobbyist, with something simple. But to actually add this stuff to production does take a lot of work, and you need to make sure that you have all the appropriate steps in place if you really want to generate business value using machine learning.

Introducing Machine Learning to Teams & Management

Luu: Let's talk a little bit more about the human side. I want to convince my team or the management to start adopting ML and stuff and doing applications. How should I go about doing that? What kind of risks should I be aware of? What other teams in the organization should I talk to in the journey of trying to introduce machine learning to the company itself?

Smith: I think that's a really big topic. I'll start with one that I think is pretty clear for almost everyone in the room; if you're a software engineer inside a business or something like that, you probably need to have some concept of how your investment is going to make it out to a production use case with live users. That's going to be one of the biggest topics and really has been with the adoption of ML in general. I think people are afraid that someone's just going to be sitting in the corner, running up a bunch of server bills, running Python and using up a bunch of GPUs and never delivering value.

Having an understanding of what is the infrastructure you're going to use to take any sort of successful approach from research to production, I think is super important. That's true whatever toolchain you use, whatever cloud you want to be on. That's a story that's gotten a lot better within the past few years, specifically within this field. That's one of the things that I remember from the early part of my career; all I focused on was ML infrastructure. We just needed all of these bits and pieces that we had to put together to build a model and take it out to production. These days, for the vast majority of use cases, you don't necessarily need to reinvent the wheel for a lot of the things you need to do. Being able to build that picture, I think, can be a big part of convincing a larger organization, particularly if you're thinking about the fact that when this is deployed, it is going to cost something to run a team that supports this live in production and that has these servers. Being able to fill out that picture is a great way of establishing that there is a viable path from innovative ML techniques to user value.

Miro: I think one way to look at machine learning is, by nature it is very cool and very fun, but at the end of the day, it is also just another new thing that can help improve your business, just like how computers did when computers started to become mainstream in offices. It's fundamentally just finding out how this can actually benefit you and bringing it down to the numbers, "Right now we have these systems in place and these people are doing this and it costs this much. If we have these automated systems, then it'll cost us this much."

There are definitely risks involved. Given that machine learning is, fundamentally, just statistical methods, you do need to weigh the costs of what happens if you have an unexpected result. Is that extremely detrimental to your business? Is that just a blip that you can ignore? I think with something like machine learning, there are definitely risks associated with it that you absolutely should take into account before going into this. Again, that's going to be completely situational.

Failed Machine Learning Projects

Luu: In your experience, have you encountered or seen examples where machine learning projects failed?

Smith: My answer is, too many, way too many. If I want to extract some commonalities from the situations in which I've seen ML fail, a lack of organizational alignment's a big one: we say that we want to build a product where, at its heart, a huge portion of its value is derived from machine learning, but then we start to impose business expectations on it that machine learning simply can't meet. There are limitations to the techniques. ML continues to amaze me every time I open up Hacker News and someone's achieved something new, but it has specific properties as a technology, the same as databases do, as web servers do. The projects where I've seen organizations really fail, be unhappy about that failure, and not really learn from it involved an over-commitment: spending a bunch of money to do something that couldn't possibly conform to all of the expectations of a given business stakeholder, really wishing for a sort of magical technology, and not really mapping to the constraints of what's possible with true real-world machine learning.

Miro: Honestly, most of the examples that I have fall into a very similar field. I've also seen other examples where the data just isn't necessarily representative of the actual problem. This is not a project that I specifically worked on, but to give you an idea of what might happen, let's say you're building a self-driving car and you only train the car when it’s good weather. You don't want to be bothered driving the car in the rain because that's no fun. Then what happens when you want to actually use the self-driving car in the rain? It has no data to actually reflect on, so it won't necessarily be able to perform very well. It's those sorts of things: when you're actually building a data set to start training these models, you need to make sure that you account for all specific use cases. That can be hard, and it does require a lot of human intuition to be able to figure that out. I've definitely seen some projects falter and not necessarily be as effective because the data just wasn't representative of all of the use cases that it was trying to account for.
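A check for the kind of gap Miro describes can be sketched in a few lines: list the conditions the system is expected to handle and see which are absent or rare in the training data. The function name `missing_conditions`, the 5% minimum share, and the weather labels are assumptions made up for this illustration.

```python
from collections import Counter


def missing_conditions(training_conditions, expected_conditions, min_share=0.05):
    """Return expected conditions that are absent or rare in the training data."""
    counts = Counter(training_conditions)
    total = sum(counts.values())
    gaps = {}
    for condition in expected_conditions:
        share = counts[condition] / total if total else 0.0
        if share < min_share:
            gaps[condition] = share  # share of training examples with this condition
    return gaps


# Hypothetical drive logs: everything was recorded in dry weather.
training = ["clear"] * 90 + ["cloudy"] * 10
gaps = missing_conditions(training, ["clear", "cloudy", "rain", "snow"])
print(gaps)  # {'rain': 0.0, 'snow': 0.0}
```

Deciding which conditions belong on the expected list is still the part that needs human intuition; the code only makes the gap visible.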

Resources for Learning about Machine Learning

Participant 1: Let me posit a slightly different scenario. Suppose I'm in a company that, for a good or bad reason, has no interest in machine learning, yet I realize that for my career development, this is something that I need to know. This is sort of a two-fold question. Since I have no business case to constrain my learning, how would you suggest I start? One of the things I know from just development in general, is what I don't know I don't know, is what's really going to kill me, and what falls into that category?

Smith: I think there are a lot of educational resources that start with a very generalist focus. I think that's not an uncommon scenario to say, "I have broad interests and I will specialize later." In particular, I think I would echo a point that Brad [Miro] made earlier, which is there are resources that specifically introduce you to the concept of machine learning, and that deep learning, in particular, has proven to be a very successful variant of machine learning, but you can't skip core concepts and not shoot yourself in the foot later. Concretely, the Andrew Ng classic course from Coursera is still one of the best foundations out there in terms of introducing the must-know concepts that allow you to go deeper, even if your specific interest is, "Well, I want to get up and running with deep learning models."

Participant 1: See, that's my point. I have no deep interest in doing anything except that I think it's good for where I should go.

Miro: Something that I'm really interested in doing is just increasing knowledge about artificial intelligence and machine learning generally. The same Andrew Ng that Jeff [Smith] was mentioning has a famous quote. He says, "AI is the new electricity," and he believes that AI will be as impactful to your life as electricity is. In that sense, even if you're not necessarily interested for business reasons, I think it's just good to stay in the know about how this stuff is changing society in general, because it's only going to keep increasing. So the best way to do that would be following something like TechCrunch or the InfoQ newsletter, those sorts of things, to just stay up-to-date on what's happening in the world. Then you could even do a search for just "machine learning in my industry". You might be surprised. There might be some edge cases that you may never have even thought of that are actually impacting your industry.

Luu: I can share one specific resource that may be useful; it's on deeplearning.ai. There's a new course there which starts at the beginning of the year. It's called "AI for Everyone." It gives you a broad overview of AI at a very high level. I think that might be useful if you are in that kind of a situation.

Developing a Well-Developed System & Monitoring It

Participant 2: This is sort of related to that question about failure. You develop a system that is tested, it's giving good results. What's the state of monitoring for that? What's the current situation or processes that you follow to say, "Ok, now we're depending on this data to be right." How do we know that it's continuing to be right?

Miro: A lot of times when you're building a system, these things do continue to grow and you do need to continuously be retraining them just to make sure that they are up-to-date. Testing with machine learning can definitely be a little tricky because it's not as deterministic as it might be for more traditional systems. Make sure you’re running tests, passing in sample data and checking that the output is correct, and then constantly retraining the model and making sure that the real-world data that it's working on is being represented appropriately as environments change.
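The "pass in sample data and check the output" step Miro mentions could be sketched as below. Because model outputs are often floating-point scores, the comparison uses a tolerance rather than exact equality. The stand-in `predict` function, its weights, and the sample cases are hypothetical.

```python
def predict(features):
    """Stand-in model (hypothetical): a fixed linear scorer."""
    weights = [0.4, 0.6]
    return sum(w * x for w, x in zip(weights, features))


def check_known_cases(model, cases, tolerance=1e-6):
    """Return the cases whose prediction is further than tolerance from expected."""
    failures = []
    for features, expected in cases:
        actual = model(features)
        if abs(actual - expected) > tolerance:
            failures.append((features, expected, actual))
    return failures


# Inputs with outputs that were verified by hand when the model was accepted.
SAMPLE_CASES = [([1.0, 0.0], 0.4), ([0.0, 1.0], 0.6), ([1.0, 1.0], 1.0)]
failures = check_known_cases(predict, SAMPLE_CASES)
```

Running the same cases after every retraining gives an early signal that a new model version has changed behavior on inputs that used to be handled correctly.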

Smith: I'll add to that a little bit. A footnote: I have a lot of thoughts on this one and I put them in a book called "Machine Learning Systems," which is unusually focused on production concerns like failure and monitoring. Some of the things I would call out that are uniquely different, and that I think Brad was referring to; he didn't use the term, but in passing he described concept drift, which is when, in some respect, the concept that your machine learning model has been built upon no longer holds in the out-of-sample set, the real world, as opposed to what your machine learning system was trained on.

You're absolutely right that monitoring has to be a key piece of that. It's pretty integrated to how you serve your models, how easy that's going to be. What is the infrastructure you sit on top of? That's gotten a lot better, but it's very toolchain specific at this point. One project I'm going to talk about later today is an example of something that comes out of our practice inside Facebook AI and our adaptive experimentation framework where we get into how can we understand the real-world behavior of a deployed machine learning model and how much does it match up with our simulations of what should be going on, and when should we detect some form of failure. It's a super deep and complicated topic and we're going to continue to see a lot of new approaches and platforms launched within that.

Luu: I'm going to add a little more on that. I think a machine learning model is kind of unique compared to your standard microservices and whatnot. The performance of your model can degrade over time if it's not being retrained with fresh data. That's one thing I learned that's quite unique.
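One simple way to notice the degradation discussed in this section is to track accuracy over a rolling window of recent predictions and compare it with the accuracy measured at deployment. This is only a sketch under the assumption that true labels eventually arrive for deployed predictions; the class name `AccuracyMonitor`, the window size, and the allowed drop are invented for illustration, and it is not the tooling the panelists use.

```python
from collections import deque


class AccuracyMonitor:
    """Track accuracy over the most recent labeled predictions (hypothetical)."""

    def __init__(self, baseline, window=100, max_drop=0.05):
        self.baseline = baseline              # accuracy measured at deployment
        self.max_drop = max_drop              # tolerated drop before flagging
        self.outcomes = deque(maxlen=window)  # True/False per recent prediction

    def record(self, predicted, actual):
        """Store whether one prediction matched the label that arrived later."""
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        """Accuracy over the current window, or None if nothing is recorded."""
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def needs_retraining(self):
        """Flag only once the window is full, to avoid reacting to a few samples."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.baseline - self.max_drop


monitor = AccuracyMonitor(baseline=0.90, window=10)
for _ in range(10):
    monitor.record("fraud", "fraud")      # model still matches reality
healthy = monitor.needs_retraining()
for _ in range(5):
    monitor.record("fraud", "not_fraud")  # real-world behavior has shifted
degraded = monitor.needs_retraining()
```

A drop in windowed accuracy does not say why the model degraded, only that the training data and the live data have likely diverged enough to warrant a closer look or a retrain.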

Identifying Business Problems to Apply Machine Learning to

Participant 3: My question is - and maybe this may be just me - even after taking preliminary courses and things like that, I feel like I have a problem with getting a good intuition among the set of my company's business problems; which ones can I actually apply machine learning to? I struggle to find out which ones could I actually do. I wonder if you have any thoughts on that?

Smith: One good heuristic I would usually use is how much data do we have. Are we in a data-poor application? A further slice of that is actually how many labels do we have? Because this is where we get into supervised learning. Are we talking about labeled versus unlabeled data? Some business processes naturally produce labels. For example, fraud detection does that. If we detected fraud, with any sort of due diligence, we marked it as fraud, so we have natural labels.

Some things don't. You can imagine having a whole bunch of unlabeled text documents for various purposes or images or something like that. If you don't have labels in a given case, you can do some really easy back of the napkin math around what that might cost you to do. There are great tools for getting labels for new training sets, but if you estimate based on, "This paper, I'm probably going to need something like a million labeled examples. I look and, yes, maybe we have a million examples, none of them are labeled. Can I afford to do that within realistic business constraints?" Those would be my first starting points for almost any business problem or idea.
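The back-of-the-napkin math Smith mentions can be written down directly. The one million examples come from his hypothetical; the price per label, the labeling speed, and the function name `labeling_estimate` are placeholder assumptions to be replaced with real quotes.

```python
def labeling_estimate(examples_needed, labeled_available,
                      cents_per_label, labels_per_hour):
    """Estimate how many labels are missing and what producing them costs."""
    to_label = max(examples_needed - labeled_available, 0)
    return {
        "to_label": to_label,
        "cost_cents": to_label * cents_per_label,  # integer cents avoid float error
        "hours": to_label / labels_per_hour,
    }


# Assumed figures: 5 cents per label, 200 labels per annotator-hour.
estimate = labeling_estimate(
    examples_needed=1_000_000,
    labeled_available=0,
    cents_per_label=5,
    labels_per_hour=200,
)
print(estimate["cost_cents"] / 100, "dollars,", estimate["hours"], "annotator-hours")
```

With these made-up rates, a million labels comes to 50,000 dollars and 5,000 annotator-hours, which is the kind of number to hold up against realistic business constraints.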

Miro: Another way to look at this is, I'm going to do a plug here for Google Cloud Platform for instance. We have a lot of prebuilt models and very easy-to-use products for just getting started with machine learning, and some of them are as simple as doing computer vision with a simple API call. You send it a picture and it'll tell you what's in it. If you look at what something like GCP or other companies out there are offering, and at the actual business use cases that they're looking to solve, you might see some of these and be like, "This could potentially be interesting to me." Even if you don't necessarily go and use the products, it might just give you an idea of some other problems that some of these organizations are trying to solve.

Luu: Does your company have a data scientist team?

Participant 3: No, not yet.

Luu: Because I would suggest that you can go and talk to them, that you would ask what are they thinking, and then maybe pick their brain.

Participant 3: Yes. My company is healthcare-based and there's just tons of information but it's labeled, unlabeled.

Smith: One small detail though, which I'm sure you're aware of but maybe the audience isn't, is that the usage of data within a healthcare situation is obviously highly restricted and can be highly regulated. Understanding how machine learning models make decisions on data, and thus, in your case, on healthcare data, is a really complicated and difficult topic, requiring a lot of technical depth in the work of how you would actually deploy something like that, I would presume. That might be a very specific concern in your organization around building up the skill set of people, even establishing the principles by which you would use data within a healthcare situation in combination with automated reasoning. From talking with folks who've been able to successfully do things like that, it's a super challenging intersection of business and technical problems.

Luu: Privacy, security concerns, and all that good stuff.

Miro: Don't let that discourage you. There's a lot of really cool work going on there.

Bringing the Agile World into Machine Learning

Participant 4: We're experimenting with machine learning at our company. One of the things we want to avoid is getting into places where you research for five, six months, build a model and so on. How can we bring the Agile world into machine learning and start small and continuously improve, and not get into projects that who knows what will end up being?

Smith: I think this overlaps with some of the things we've footnoted a bit before. I'll bring them up again. Pre-trained models are a great place to start. One way you might access a pre-trained model is from a service, from a cloud provider like Google or Amazon. Another way that you might use a pre-trained model in a different manner is to actually plug it into your application as part of its code in some way. We have support for doing this within PyTorch, ways to load existing models.

Some of these models are good examples of what we know for sure that we can do pretty well, whether that's around natural language understanding or object detection, something like that. You can work pretty fast from something like that. We're talking about one-liners to load a preexisting model. Then basically you're saying, "Here's my data, apply it to the model and see what sort of results I get out." It's not the case that you have to be thinking in terms of, “We need to be spending three to six months training models before we have the slightest clue whether it applies to our data.” If you have the ability to start with something like a pre-trained model, whether it comes in the form of a service or just as an artifact in a line of code, I would encourage you to start there.

Luu: Have your team or company considered a crawl, walk, run, kind of approach to adopting or introducing machine learning?

Participant 4: Those are exactly the kind of things I'm interested in. We applied machine learning for fraud detection in online media, and those are not exactly off-the-shelf models, especially for our specialized data. What we're looking for is how we can bring in more systems that help us create models, get them into production, and see value before we invest months. In fraud specifically, and in our field specifically, your model might be irrelevant in three to six months. How do you make this process of research and deployment and so on Agile?

How Do You Know You Have Enough Data for ML?

Participant 5: I actually have a data science team, which is kind of neat. We have to solve problems where we often don't have much data. We may have one decision and we need to help automate that class of decision across a large body of problems. I know a lot of these examples when we talk about models being trained on millions of decisions. How do I deal with the fact that I often only have a few? Are there techniques that I can use to know for example that, "Maybe this decision isn't something that we should even put through the algorithm. We're going to have to handle with humans for now." How do I know how much data is enough to really get good answers?

Luu: Can you share more about what's the industry?

Participant 5: We work in tax compliance, and so what we'll often do is we'll have to figure out, "How would you tax this object?" We may not have a ton of details about the object, and the tax decision may be very specific.

Miro: In my experience with a lot of these things, you just have to sometimes try. I imagine there's a degree of text data that's involved in your decision, and text specifically can sometimes be very difficult to work with. I personally find that text is one of the harder sub-disciplines to really apply machine learning to. A lot of it is just trying and seeing whether you can actually pinpoint it. Maybe looking out and seeing if there are other organizations who have solved, maybe not the same problem, but similar problems, and how they were able to make those fine-grained decisions. That might be one way to do it.

I think there are also a lot of misconceptions out there on how machine learning is being used in a lot of organizations. In some cases, it is completely taking over the role of what humans are doing, but in a lot of cases, it's not. A lot of the time it's done in tandem with humans, in what's called a human in the loop, where the machine learning might get you 80% of the way there, but you still do need a human, at least in today's world, to do those last few steps.
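One common shape for a human-in-the-loop setup is to let the model handle only the predictions it is confident about and queue the rest for a person. The sketch below assumes the model reports a confidence score per prediction; the function name `route`, the 0.8 threshold, and the tax categories are hypothetical and would need tuning against real review outcomes.

```python
def route(predictions, threshold=0.8):
    """Split predictions into automated decisions and a human review queue."""
    automated, review = [], []
    for item_id, label, confidence in predictions:
        if confidence >= threshold:
            automated.append((item_id, label))  # confident enough to act on
        else:
            review.append((item_id, label))     # a person makes the final call
    return automated, review


# Hypothetical model output: (item id, predicted tax category, confidence).
predictions = [
    ("item-1", "exempt", 0.97),
    ("item-2", "standard", 0.55),
    ("item-3", "reduced", 0.91),
]
automated, review = route(predictions)
```

Raising the threshold sends more items to people and fewer to automation, which is the dial to turn when the cost of a wrong automated decision is high.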

Participant 5: Yes. We call that machine curation. Are there tools out there that can help with that? We've had to hand roll quite a lot, which is great, but it feels like this is something a lot of people are doing. I know that Stitch Fix, for example, has hundreds of people working in this, and I just am not aware of any tool that makes it easier and is open-source or well-managed startups.

Smith: Sorry, I'm just trying to understand the question. You're talking about human in the loop doing the top supervision?

Participant 5: Is there something to make human in the loop?

Smith: Are we saying labeling an instance or are we saying something like overriding a business decision, like for fraud or something like that?

Participant 5: It's more like I just want to quickly be able to see what the decisions of the model were and quickly decide whether they are good enough to move forward and that sort of thing. Something I need to put in the hands of an expert, not an engineer.

Smith: The point I was trying to get at is, you're touching on a space where there are two possible scenarios, and if I heard you correctly, we're talking about a domain expert who needs to make a human expertise decision about some set of data. Briefly, I'll point to the other large problem of having humans in the loop, which is having any sort of human annotation that does not require domain expertise. This is like, "This is a cat, this bounding box is a stop sign," things like that.

There are a lot of startups chasing that and there's a lot of open source tooling as well for being able to do that. When you're talking about managing humans in the loop, I do know the problem that you're talking about. I'm not sure that there's a generic one-size-fits-all answer for actually having domain experts in the loop rather than someone with the expertise of, say, a Mechanical Turk worker. I've seen organizations do it and I can talk about some of the tips and tricks of how you make that happen, but I've never seen it done without some amount of architecture-specific integration, where you are the team which builds the technology, which controls that decision point. You need to be building that step in the business flow, and then producing the appropriate dashboard for the human to review, because it sounds very domain-specific.

Toolchain Selection and Polyglot Environments

Participant 6: This question's related to toolchain selection. We tolerate a polyglot environment for our software engineers: all kinds of languages, runtime environments, architectures and frameworks. Are there any gotchas or pitfalls with allowing that in terms of ML toolchain selection, and even all the infrastructure instances that we can then tie to that?

Smith: Yes, I think there are. It's my personal opinion. I would say that compared to where machine learning was before the age of deep learning, there's substantially less heterogeneity in people's tech stacks, particularly within the field of deep learning. Deep learning is really enabled by hardware acceleration like GPUs and TPUs from Google, and there are just only so many ways to do that. I think you're going to find that your toolchain is substantially narrower than for something like web application engineering, and that in particular, the more that you say, "Just use whatever random client-side language you want and then bind to whatever within any powerful deep learning framework," you're not necessarily going to have access to the full range of capabilities being provided by those technologies. I'm speaking generically here. I can talk a little bit more specifically about what it's like with PyTorch later, but I think this is just a generic property of deep learning that has emerged.

Miro: I more or less agree. The advice I always give people is find something you like that you know works and just stick with it. Specifically, TensorFlow and PyTorch tend to be two of the more popular ones, and they're honestly both good. Pick one. They have a lot of overlap in what they can do, so just pick one, learn it, and stick with it; it's generally a safe bet.

Classifying in Isolation

Participant 7: I don't know much about machine learning, I'm still trying to get started, but what I know of classification is, you're trying to classify something in isolation, just itself. Is this label based on a model and some data set you have? Can you point me towards some examples or some research I could look at? If it's just one thing in isolation, you maybe label it one way, but if over some kind of time window you're seeing this over and over again, it becomes labeled something else? What is that considered? Also, positionally: if you have this network of nodes, where these events are happening in the nodes changes the labels. The field is security, if you're trying to find a hacker who's trying to hack a system. Certain things are benign, like someone actually doing a login, but not based on the timeframe and some kind of sequence.

Miro: I think that piggybacks nicely off of some of the conversations we were having earlier where, a lot of the time, you do just need to keep retraining it and keeping up-to-date with what's happening in the world. Jeff [Smith] mentioned the term concept drift, so just making sure that your models do continue to be accurate.

Smith: Yes. I think, depending on what you're talking about, you may be talking about model retraining. You might not be though. If we imagine that we have some human-engineered features and they provide some sort of signal on what's going on within the real-world concept, you're saying, if I heard you correctly, at T0 only one of these features is firing. At T1 more features are firing. At T2, even more features are firing. Those are multiple classification, so inference, operations. You could even call these the same model with different features extracted over time, which is a common approach there. I think you also mentioned some stuff around graphs. Working with graphs is definitely a big sub-problem within machine learning. There are a lot of specific libraries you could look into. My personal favorite, no surprise, would be PyTorch BigGraph.
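The idea of running the same model repeatedly while the extracted features change over time can be sketched as follows. This is a minimal illustration under stated assumptions: the `WindowedFeatures` class, the event names, and the fixed-threshold `classify` function (a stand-in for a trained model) are all hypothetical, using the security login scenario from the question.

```python
from collections import deque


class WindowedFeatures:
    """Keeps recent events and derives features from a sliding time window."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, event_type), oldest first

    def add(self, timestamp, event_type):
        self.events.append((timestamp, event_type))
        # Drop events that have fallen out of the window.
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            self.events.popleft()

    def features(self):
        failed = sum(1 for _, e in self.events if e == "login_failed")
        return {"failed_logins": failed, "total_events": len(self.events)}


def classify(features, max_failed=3):
    """Stand-in for a trained model: a hypothetical fixed threshold rule."""
    return "suspicious" if features["failed_logins"] >= max_failed else "benign"


if __name__ == "__main__":
    window = WindowedFeatures(window_seconds=60)
    for t in (0, 10, 20):
        window.add(t, "login_failed")
        # Same classifier each time; only the windowed features change.
        print(t, classify(window.features()))
```

A single failed login is labeled benign in isolation, while the same event repeated inside the window changes the features and therefore the label, without any retraining.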

Fighting Fraud

Luu: What about a very interesting challenge and use case, fighting fraud? As you know, the techniques that are used are changing all the time. How do you keep your models up to date with the changes in what fraudsters are doing? Talk about maybe some of the interesting developments in real-time training, for example. Any thoughts on that?

Smith: I'm not an expert on fraud. I wrote one book chapter in which I worked through an example to try and understand some of the techniques in use, and I would say that the challenges you're talking about there point me towards retraining; when you have a dynamic concept, it becomes an important capability to have. Depending on what you're doing, training may have more or less wall clock time for you. This gets into really the whole story of your toolchain in your infrastructure. Do you have the velocity within your process to simply have an automatic job, which is constantly retraining or retraining at a sufficient cadence, that gets it retrained in sufficient time to allow you to adapt to a rapidly moving concept? Some of this gets into cost, because if you want to say, "I just want to run a cluster of many dozens of GPU servers all the time," at some point, that maybe pushes against how much business value you're providing for something like that.

Miro: To get meta here, you might even be able to use machine learning to predict when you should be retraining your machine learning models. That might be one potential use case.
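A simpler, non-learned variant of this idea is to monitor recent accuracy on outcomes whose true labels have arrived and trigger retraining when it drops. The sketch below is an assumption-laden illustration rather than anything the panelists described: the `RetrainTrigger` class, the window size, and the accuracy threshold are hypothetical.

```python
from collections import deque


class RetrainTrigger:
    """Signals retraining when rolling accuracy falls below a threshold."""

    def __init__(self, window=100, min_accuracy=0.9):
        # Only the most recent `window` outcomes are kept.
        self.outcomes = deque(maxlen=window)
        self.min_accuracy = min_accuracy

    def record(self, predicted, actual):
        self.outcomes.append(predicted == actual)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_retrain(self):
        # Wait for a full window so a few early errors do not trigger a job.
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return self.accuracy() < self.min_accuracy


if __name__ == "__main__":
    trigger = RetrainTrigger(window=4, min_accuracy=0.75)
    for predicted, actual in [("ok", "ok")] * 4 + [("ok", "fraud")] * 2:
        trigger.record(predicted, actual)
        print(trigger.accuracy(), trigger.should_retrain())
```

How often such a check fires feeds directly into the cost question raised above, since each trigger implies a training job.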

Biggest Triggers in the Last Few Years & Future of the Industry

Participant 8: What changed in the last three to five years that machine learning became vital and essential for the big companies? The core concepts like neural networks have been around for much longer. What was the biggest trigger or driver in the last few years? Then where do you see the industry heading in the next five to seven years?

Miro: I think the short answer is the hardware, between the access to hardware in the cloud and hardware continuing to grow. I think that has enabled a lot of this. These neural networks are extremely computationally intensive. They just couldn't necessarily work on computers from the '90s and the early 2000s. I think we're seeing a lot of advancements there. That being said, in terms of the next 5 to 10 years, we are bumping into the physical limits of Moore's law, Moore's law being that the number of transistors doubles roughly every two years. The idea is that there are physical limitations to that, once we get down to the quantum level.

There are a lot of talks about quantum computing being used as another medium to continue to accelerate these things. You're just seeing more research being done than ever before and actually improving these models. You'll see improved models that are more reliable, might actually rely on less data for being able to retrain. Right now deep learning is really intensive, but there are things such as one-shot learning and zero-shot learning, that allow you to train your models on less data. I think we're also going to see a lot there. We're also just going to see a lot of new data in general in the world just as we continue to turn to the Internet for more and more things. It's just going to be more data to train on.

Smith: I'm going to give an answer that overlaps a little bit there and really focus on the work of Yann LeCun, who's the founder of Facebook AI; he works with us here in New York. I think most of what he's pointed people at, in trying to understand where deep learning is today, involves a couple of related breakthroughs in techniques like convolution, which he personally developed, and with Geoff Hinton and Yoshua Bengio, his Turing Award colleagues. Some of those techniques are substantially newer; convolution, LSTM, RNNs, are substantially newer than neural networks. Neural networks go back many decades but a lot of the work that constitutes what we call deep learning today is done by people who are still actively working today, just down the street sometimes.

A big part of what allowed that to become successful is that underlying GPU acceleration. Brad [Miro] was talking a little bit about the story of how CPUs progressed; GPUs are a massive leap forward, if you have the correct software techniques to be deployed on GPUs. In the transition from graphics processing units to general-purpose GPUs, that got unlocked. Key parts of that involve software breakthroughs that have been made more recently, such as the development of the CUDA tech stack on Nvidia GPUs and similar technologies.

That also points towards what's coming next, at least at the hardware level. We know that we have to continue to make breakthroughs in performance efficiency. One major area that Google's invested in is domain-specific architectures like the tensor processing unit, and I think it's clear that that's going to be a very promising area. Something that we care a lot about within the PyTorch project is, will we see better and better GPUs that get you to the next breakthrough, or do we need to continue to support things like domain-specific architectures like TPUs? We see both of those things right now.

Getting back to the actual computer science techniques that allow us to have the deep learning breakthrough, and again, stealing from Yann, a lot of Yann's work today focuses on self-supervised learning. We've talked a little bit about the business problems of getting someone to pay for human labels, and pay for human labels at the scale of millions or billions, and some of those things just start to become infeasible. There aren't enough humans, there aren't enough hours in the day for some of the problems we want to solve. Self-supervised learning, where we're working in the absence of labels and having deep learning networks learn how to supervise themselves, is a really promising area of research right now, and you can see a lot of interesting papers on that. I think some of the most interesting ones that we've released out of Facebook involve things like working on very large image data sets, where there are no labels available, or even weakly supervised learning, a closely related technique where you don't really have labels, but you have some sort of information which guides you in that direction.

Luu: There's a lot of research in reinforcement learning as well, I believe.

Team Structuring around Machine Learning

Participant 9: I'm curious, just stepping back about team dynamics and how this really works at a practical level. I have a team in which none of us know anything about ML, but we have a lot of good use cases. We can learn, we can hire, but what do you see on teams as far as how the ML specialist interacts with the application designers and so forth? What's the breakdown there?

Smith: You call out that you need different skills on a team to be able to solve a problem like that. Absolutely. You need to have some idea of what decisions and responsibilities you're trying to bring someone in to take on. I would say that, speaking fairly generically, some of the unsuccessful teams I've seen in this respect hire as if they're pursuing major research goals and trying to develop novel techniques, when their business context is to apply the existing techniques to the given business problem, and then you get a massive impedance mismatch. Maybe that's at a personal level. Someone is, in fact, a researcher interested in novel algorithmics and then they're asked, "Could I use this tool on this problem?" You get a shrug back and, "Sure. Probably," which isn't really the useful business work.

There are starting to be, I would say, structures within machine learning career paths that you're going to see. You want to make sure that you, as you build a team, find people who are actually interested in the application of ML techniques to business problems. The good news is, thanks to a lot of educational resources and growth within even the higher educational system, there are more and more people who have the right knowledge and an interest in real-world business problems. As long as you're setting that fit, I think you'll have a much better intuition of how could this individual person and another person help your team, rather than just saying, "Well, I guess we need to hire someone to write papers in the corner and never deliver any value."

Participant 9: Just to clarify, though, and again, I know it may be hard to answer at a general level, but thinking of it from an Agile point of view, "Ok, what kind of stories go to the ML engineer and in the broader scheme of developing this application?" How does that break down? How can you break down that work?

Miro: I think one way to look at it is just how you would look at any other engineering project, where the machine learning, retraining the model or doing the machine learning specific things would - not to be profound - go to the machine learning engineers; then the infrastructure, depending on how you set it up, might go to the ML engineers. It might go to the data engineers or the software engineers. Especially on the engineering side, there can sometimes be a lot of overlap in the responsibilities of the positions, as well as a lot of ambiguity about what they should be doing in the first place.


Recorded at:

Aug 28, 2019
