
Panel: Predictive Architectures in Practice


Summary

The panelists discuss the unique challenges of building and running data architectures for predictions, recommendations and machine learning.

Bio

Sumit Rangwala is a Senior Staff Software Engineer, Artificial Intelligence, at LinkedIn. Josh Wills is a Software Engineer working on Search and Learning at Slack (@SlackHQ). Eric Chen leads the offline model processing pipelines and the online and offline model serving accuracy of the Michelangelo ML Platform at Uber. Emily Samuels is a Staff Engineer at Spotify. Anil Muppalla is a Data Engineer at Spotify.

About the conference

QCon.ai is a practical AI and machine learning conference bringing together software teams working on all aspects of AI and machine learning.

Transcript

Moderator: Thank you all for being here at the tail end of the "Predictive Architectures in The Real World" track. I hope you learned a lot of cool things and enjoyed yourself in general. My brain is bursting from all the things I've heard, I'm sure for you too. I hope you have some questions for the people who taught us all those amazing things today.

I'm going to kick it off with my own question because that's track host privilege. I'll let the panel answer and then I'll turn it to you, you can think of your questions, scribble them, and then if you run out of questions, I have a bunch of my own, so don't feel under the gun that you have to ask something, or there will be a very boring session, although my questions may be more boring than yours, who knows?

Just to start by introducing our experts, even though you've probably seen them earlier. We have Eric Chen, who is a software engineering manager from Uber. He talked about feature engineering and the Michelangelo Palette, so he can probably answer a bunch of questions about that. We have Sumit from LinkedIn, who worked on People You May Know and the transition from batch to streaming, and the specially optimized databases that they have built. We have Emily and Anil from Spotify; I totally missed their talk, but I heard it's awesome, and all kinds of good things about real-time recommendations. Then we have Josh Wills, who knows everything about how to monitor your machine learning pipeline, so it will work at 2 a.m. just as well as it does at 2 p.m., which may be not at all.

Evaluating Success

My question for this esteemed panel of experts is, how do you know that what you did actually worked? How do you evaluate your success? If your manager asks you, "Well, did you do a good job?" How do you know?

Chen: Short answer, I don't know; some people tell me. Longer answer, the customers usually come back to us and tell us their stories. For example, the Eats and Maps examples we showed here actually come from our interactions with our teams. They are data scientists, they have their own KPIs, and they know, by productionizing a particular model, how much money they saved or how much efficiency they improved. That's how we know, "Ok, we're actually making some impact here."

Rangwala: Yes, for us it's always the metrics; we always measure our success based on the metrics. I happen to be in a machine learning team, so at the end of the day, those are our true north metrics. If we don't move them, then most of the time that means we haven't succeeded.

Moderator: What's your most important metric?

Rangwala: Our most important metric is what we call invites and accepts: how many members invited another member and were accepted. We also have other metrics, and at LinkedIn, pretty much all the metrics are tracked in real time. At any time, we can go and see how we are performing on any of the metrics; the report card is out there.

Samuels: I would also like to add that another thing I think about is how easy it is to experiment on our platform. We have metrics to determine how people are listening to home and whether they're happy with our recommendations, but we also always want to be trying out new things and always be improving. One of the ways that we know our architecture is good is if we're able to easily experiment.

Wills: I'll echo that, and I guess I'll throw in reliability as being a major concern for me; it's like speed times reliability. I studied operations research in college, so either I'm trying to optimize the number of models per week subject to different reliability and uptime constraints, or vice versa, or I'm trying to optimize a product of the two, with sort of a logarithmic measure of reliability, times iteration speed. That's more or less what I'm after: speed subject to reliability, that's my jam.

Limitations of Tooling

Participant 1: In the areas that you work on, what are the areas that you feel that you're limited in and you wish you had more power, or you had more tools to build up better [inaudible 00:05:36]?

Moderator: Do you want to repeat the question because of the lack of mic?

Wills: Sure. The question is, in what areas do we feel limited by our tooling, I think, is the core of it. What tooling do we wish we had? I feel somewhat limited by the monitoring. My talk was primarily about monitoring and alerts, counters and logs, and stuff like that. I do feel somewhat constrained by my monitoring tools, in the sense that they weren't really built with machine learning in mind. One of the ways we have screwed up Prometheus in the past is by inadvertently passing basically Infinity to one of our counters, incrementing by Infinity, and Prometheus just doesn't handle Infinity very well. I don't blame it, I can't handle Infinity that well either.

It's the kind of thing that's an occupational hazard in fitting models: singularities and dividing by zero. You have to adapt the tool to be aware of that use case. I fantasize (if there are any VCs here who want to give me money) about machine learning monitoring tools that are designed with machine learning as a first-class citizen, as opposed to performance and errors.
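As a rough illustration of the kind of guard Wills is alluding to (not Slack's actual code), the sketch below uses Python and the prometheus_client library; the metric names and the error-reporting function are hypothetical:

```python
import math

from prometheus_client import Counter

# Hypothetical metrics, for illustration only.
PREDICTION_ERROR = Counter(
    "model_prediction_error_total",
    "Cumulative absolute prediction error reported by the model",
)
NON_FINITE_SAMPLES = Counter(
    "model_non_finite_error_samples_total",
    "Error values dropped because they were NaN or infinite",
)

def record_error(error: float) -> None:
    """Increment the error counter, but never with NaN or +/-Inf.

    Model code can emit non-finite values (division by zero, numerical
    blow-ups while fitting), and a single Infinity poisons a counter for
    good, so non-finite samples are dropped and counted separately.
    """
    if not math.isfinite(error):
        NON_FINITE_SAMPLES.inc()
        return
    PREDICTION_ERROR.inc(abs(error))
```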

Samuels: I feel like a lot of times we're fighting against the data that we have to feed into the models, and if you put garbage in you get garbage out, so a lot of times, we have to do a lot of work upfront to make sure that the data that we're putting in is of good quality. That's something that can be limiting.
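One common shape that upfront work takes is a set of cheap validation checks run before a batch of training data is accepted. A minimal sketch, with made-up column names and thresholds rather than Spotify's actual pipeline:

```python
import pandas as pd

def validate_training_batch(df: pd.DataFrame) -> list[str]:
    """Return a list of data-quality problems found in a training batch."""
    problems = []
    # Missing labels make rows unusable for supervised training.
    if df["label"].isna().any():
        problems.append("rows with missing labels")
    # A spike in null features usually means an upstream join or logger broke.
    null_rate = df["listen_seconds"].isna().mean()
    if null_rate > 0.05:
        problems.append(f"listen_seconds null rate {null_rate:.1%} exceeds 5%")
    # Values outside a plausible range point at unit or logging bugs.
    if not df["listen_seconds"].dropna().between(0, 24 * 3600).all():
        problems.append("listen_seconds outside [0, 86400] seconds")
    return problems
```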

Rangwala: For us, I can think of two things. One is the impedance mismatch between the offline and online worlds. Whenever you have to train a model, you work in the offline world, but when you have to actually use that model, you have to come to the online world and serve it, and making sure that both worlds are exactly the same is actually a very big problem. The offline world is fairly mature, but there are lots of gaps in the online world. Trying to do exactly what you're doing offline, online, is extremely hard at this point. It also involves moving data from offline to online; Uber talked about how to move features, but there is also the problem of how you move the models. How do you make this whole system seamless, so that the training happens automatically and the model gets picked up automatically by the online system, without affecting performance?

The second thing is, maybe because of the nature of the company I work at, that we have solved many of the problems, and now the only big problem we deal with is the size of the data. Again and again, we are limited in what we want to do; algorithms that work at a smaller scale don't work at a larger scale, and we have to figure out new ways of doing things.
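One common way to narrow the offline/online mismatch Rangwala describes first is to define each feature transformation exactly once, in code that both the training pipeline and the online service import, rather than re-implementing it in two places. A toy sketch with made-up feature names:

```python
def connection_strength(common_connections: int, days_since_last_visit: int) -> float:
    """Shared feature definition, imported by both the offline training job
    and the online scoring service so the two worlds cannot drift apart."""
    recency = 1.0 / (1.0 + max(days_since_last_visit, 0))
    return common_connections * recency

# Offline: applied row by row while building training examples.
# Online: called with the same raw inputs at request time, just before scoring.
```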

Chen: First of all, one thing we're trying to emphasize is the whole experience: from the time you have an experimental idea you want to try out, to the time you can deploy it in real production. All of these steps need a lot of tools at hand to analyze your model, on two fronts. One is, how can you, as a user of the system, interactively define all your needs? The other is analytics and evals: how can we provide a set of evals that help you understand online/offline consistency, and how your model behaves in the real situation versus in a simulation system? Then, eventually, how can your model be pushed into production, with monitoring on top of it? This is one big thing. The other thing we're fighting for is reliability; the size of the data keeps growing. How can we build a system which can magically grow with the size of your data? That's also what we're imagining.

A/B Testing Systems

Participant 2: I have a question for the entire panel. I'm interested in learning what A/B testing system you use, and whether it's proprietary, open-source, or a combination. I'm really interested in the entire A/B testing workflow: from bucketizing users, to tracking their results, to visualizing the experiment results.

Moderator: Ok, what tools do you use for A/B testing?

Chen: Short answer, it's an internal tool we built ourselves; it's not part of Michelangelo, because A/B testing is used everywhere. Longer story, even A/B testing itself requires different tools behind the scenes. Some of them, as you mentioned, are about bucketizing; some are about ramping: I have a default, I want to turn something on a little bit, and then turn it off, controlling it, probably, by time. It's also related to how I can organize multiple experiments, because I want to isolate their impact. Some of these concerns are actually operational, say, because we're operating in different cities. That's why our A/B testing is internal; it's very complicated.
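None of the panelists describe their internals, but the bucketizing and isolation Chen mentions is commonly done with salted hashing, roughly as in the sketch below; the experiment names and the 100-bucket split are illustrative, not Uber's implementation:

```python
import hashlib

def assign_variant(user_id: str, experiment: str, ramp_percent: int) -> str:
    """Deterministically bucket a user for one experiment.

    Hashing the user id together with a per-experiment salt keeps the
    assignment stable across requests, and using a different salt per
    experiment decorrelates bucketing between experiments, which is one
    way to isolate their impact.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # 100 equal-size buckets
    if bucket < ramp_percent:       # the ramp controls what fraction is exposed
        return "treatment"
    return "control"

# Example: expose 10% of users to a hypothetical new ranking model.
assign_variant("user-42", "home-ranking-v2", ramp_percent=10)
```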

Rangwala: I think it is fairly well-known that LinkedIn has a fairly sophisticated A/B testing system, which we use extensively; every time we are ramping any model or any feature, it goes through an A/B test. There is a whole set of pipelines, which come for free for anyone who is ramping their model, that will calculate all the metrics we care about, product metrics and metrics at the company level. We come to know if our experiment is causing any problem anywhere.

It's a very big team, and it's almost state of the art at this point. One thing it is still missing is something like what we do for software systems, where whenever you do a code commit, there is an automatic pipeline where the code gets deployed to the next stage and, if there are no issues, it automatically moves on to the next stage. I would like to have the same system for models, where the A/B testing system is the baseline, the information that you use to figure out when to move anything automatically to the next stage.

Muppalla: At Spotify we have a complex way of doing A/B testing, we have a team that's built a platform to generalize how you divide and bucketize users. For home, we have our own way of testing and using this infrastructure that we have and also, we have a couple of data scientists who see the exposures for each treatment that we expose the users to and come up with, "How is this affecting our core metrics that we care about for the homepage?"

Wills: Yes, echoing what other folks said, at Slack we built our own. I was not actually involved in building it; we had some ex-Facebook people who built a Facebook-centric experiment framework. I had my own biases here because I wrote the Java version of Google's experiment framework, which is unified across C++, Go, Java, different binaries; all have the same experiment configuration and diversion criteria, bucketing, logging, all that kind of good stuff. It was interesting to me to learn how the Facebook folks did it; it was educational, the pros and cons. These sorts of systems almost always end up being homegrown for the machine learning use cases, outside of the marketing team using Optimizely or one of those kinds of systems. I think one factor is that, A, these things are super fun to write, so it's an awesome thing to do. Why would I pay someone to do something that I enjoy doing? I'm just going to do it myself.

Two, they're almost always built on top of a bunch of other assumptions you've made. The same thing is true for monitoring stuff, I'm not generally bullish on machine learning as a service, because I want my machine learning to be monitored exactly the same way the rest of my application is monitored, whether I'm using ELK and Grafana, StatsD, Sumo, Datadog, whatever it is I've chosen, I want to use the exact same thing that I use for everything else.

One of my lessons in my old age is that there's no such thing as a greenfield opportunity. You're always hemmed in by the constraints of choices other people have made, and an experiment framework is not generally built in the first week of a company's life. It's built after years and years, and so you build it in the context of everything else you've built. I loved the Uber talk, by the way, and I totally get building the Palette work on top of Flink, on top of Hive, and on top of all these other systems that are built and trusted and used for lots of different things, as opposed to building something de novo on top of Spark, which would be insane.

Accountability and Protecting Your Systems

Participant 3: I wanted to ask you a question. Sometimes recommendations can have real consequences; with self-driving cars, you can imagine. Our company makes recommendations to farmers in digital agriculture; if somebody follows a recommendation and they don't get what they want, you could be open to being sued or something. What keeps you up at night? In what ways do you audit or keep track of your system, so that you can replay, or explain to someone, that your system actually worked correctly?

Moderator: “What keeps you up at night?” is a good question on its own right. Then, how do you audit and protect yourself against lawsuits because the recommendations went wrong or went right?

Wills: I know what keeps me up at night, I don't have a good answer for the auditing stuff.

Moderator: You have a three-year-old son.

Wills: Definitely my three-year-old son keeps me up, I actually haven't slept in about four days. The thing that keeps Slack up at night more than anything else, is serving messages to someone that they should not be able to see. There's a private channel somewhere, there's a DM conversation, something like that, and if you somehow did a search, and the search resulted in you getting back messages that you should not be able to see from a privacy perspective, that is existential threat level terror to the company. It 100% keeps me up at night, and is the thing I obsess over.

We have written triple redundancy checks to prevent it from happening. We are paranoid about it to the point that we're willing to sacrifice 100 milliseconds of latency to ensure that it never happens. That is the thing that honestly keeps me up at night more than anything else, more than any recommendation ranking relevance explanation, that's my personal terror. I'm sure everyone has their own, that's mine. I'll give some thought to the other question though, that was a good one.

Muppalla: The thing that I am scared or worried about, or my team constantly thinks about, is a user not seeing anything on the homepage. That's a bit scary for us. We also have a lot of checks. With respect to checking whether a recommendation actually works when you make it, we have different users that we are trying to target and different use cases that they belong to. We have a nice way of creating a user that fits a given profile, seeing what happens, and checking whether the recommendation fits that user.

Samuels: I'll just add that I think at Spotify in general, one of the things is if you hit play, and you don't hear anything, that's scary, too. Making sure that if nothing else works, at least you can play a song.

Rangwala: I think for LinkedIn there are existential threats if the website goes down, but we have a lot of systems in place to take care of that. I would rather answer the question about explainability. Many times people see something which they find offensive. We have systems which go and find anything which could be low quality or offensive, and we remove it from the feed for everyone. That aside, for every machine learning product there is always this question: if we recommend something to the user which the user does not find ok, or it leads to a huge amount of traffic to the user, what do we do? The general approach we use at LinkedIn is to do a lot of tracking of the data. Everything which has been shown to the user is tracked, with fine-grained information about what the reason was, what the score was, and what features we used in order to recommend it, so that whenever a problem is reported to us, we can go back, replay what happened, and come up with an explanation of why that recommendation was generated.
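A minimal sketch of the kind of tracking record Rangwala describes; the field names are hypothetical, and a local file stands in for whatever tracking or event bus a real system would use:

```python
import json
import time

def log_recommendation(user_id: str, item_id: str, score: float,
                       features: dict, model_version: str) -> None:
    """Write one tracking record per recommendation shown.

    Capturing the model version, the score, and the exact feature values
    used at serving time is what makes it possible to replay a reported
    problem later and explain why the item was recommended.
    """
    record = {
        "ts": time.time(),
        "user_id": user_id,
        "item_id": item_id,
        "model_version": model_version,
        "score": score,
        "features": features,  # snapshot as seen by the model, not re-derived later
    }
    with open("impressions.log", "a") as f:
        f.write(json.dumps(record) + "\n")
```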

Chen: To answer the question, I think it's really a layered question. Michelangelo is really the inference layer; in some sense, we don't know where you're serving, so we don't answer the question directly. The way we are solving the problem is by providing tools for our customers, so they can understand their particular business domain and then explain to their customers. So what tools are we building? For example, when things are not behaving exactly the way they did at training time, it could be for several reasons. One is that your data is not behaving exactly the same; in our world, that's more or less data evaluation tools. Is your data drifting? Are you sending a lot of noise to us, in which case we'll fall back to something? Those things are built into our system, and we serve them to our customers.

The other one is understanding your model itself. Models usually end up being a linear regression model, or a decision tree model, or, more popularly, a deep learning model. In some sense, we don't like deep learning models, because it's very hard to interpret what's going on. For example, we build tools to help you understand how a decision tree is working: at which point which particular decision is made, and why the decision tree is behaving this way. We have visualization tools to help you understand those.

Then, back to your question, I think it really becomes, is this a single instance or this is a trend? If it's a single instance, then you probably need those model evaluation tools to understand what exactly is happening to your particular instance. If this is a trend, it's probably more about, "Do I see some special trends for recent data coming into the system?" I think different problems, we're solving differently.

Moderator: I have a question for you because Uber did have a bunch of lawsuits. Did you ever find yourself, or one of the data scientists, in a situation where you had to explain a machine learning model to a lawyer?

Chen: Not so far as a manager.

Moderator: Because I think it's coming for all of us.

Chen: I don't hear any particular request coming to the team yet.

Understanding Predictions & Dealing with Feedback

Participant 4: Once you've acted on behalf of a machine learning model, how do you know that the predictions were correct?

 

Moderator: That was a feedback question. How do you know your predictions were correct? You made a prediction, someone took action, how do you know if it turned out right?

Chen: In Uber’s case, a lot of those are not so hard. Let's say I'm trying to make an ETA guess for your meal. When your meal arrives, I already know the answer; it's really a feedback collection loop.

Participant 4: What if you were prioritizing those people who have a longer ETA?

Chen: Sorry, your question is, what if I keep giving longer ETA?

Moderator: Yes, if you notice that right now all your ETAs are, kind of, you missed the mark and meals actually arrived too late. Have you built a feedback loop into your system?

Chen: There are two parts. One is, what is the action you need to take immediately? That's monitoring. Let's say I know my ETA is supposed to be in this range on this day, during this time slot. Now I see my ETA following a different time pattern; that's purely anomaly detection, and it's actually unrelated to the model itself. It's basically, "Here is my trend, am I still following the trend?" Then the second part is the modeling, when the result really comes back, and that takes a longer time. I think the two problems need to be solved separately.
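The "am I still following the trend?" check is, at its simplest, a rolling statistic over recent values. A toy version is sketched below; a production system would also account for time-of-day and per-city seasonality, and the window and threshold here are arbitrary:

```python
from collections import deque
import statistics

class TrendMonitor:
    """Toy anomaly check: is the current value still following the recent trend?"""

    def __init__(self, window: int = 288, threshold: float = 3.0):
        self.history = deque(maxlen=window)   # e.g. 288 five-minute buckets = 1 day
        self.threshold = threshold

    def is_anomalous(self, eta_minutes: float) -> bool:
        anomalous = False
        if len(self.history) >= 30:           # need some history before judging
            mean = statistics.fmean(self.history)
            std = statistics.pstdev(self.history) or 1e-9
            z = abs(eta_minutes - mean) / std
            anomalous = z > self.threshold    # flag values far from the recent trend
        self.history.append(eta_minutes)
        return anomalous
```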

Moderator: Maybe someone else has different ways of dealing with feedback? If you predicted the two people are related but they keep telling you, "No, we never met each other"? Or people hate music you recommend, how do you handle those?

Rangwala: At LinkedIn, it's fairly similar to Uber: the metrics that matter are all measured, and we also have an anomaly detection system on top of the metrics. Individual teams can choose to put their metrics into this anomaly detection system, and it will flag whenever something bad is happening to their system.

Samuels: For music recommendations, it can be a bit more subjective. If you open up your phone and you look, it can be hard to tell whether this is right or wrong. You can know for your own self whether something makes sense, but it's hard to look at somebody else's data and see whether it makes sense for them. We do have a lot of metrics that we measure, like how often people are coming to home, whether they are consuming things from home, and things like that. We look at those metrics to make sure that we're doing a good job, and if they're tanking, then we know we have to change something.

Wills: I would just throw in that search ranking is much the same. I generally know within seconds whether we've done a good job of finding what you were looking for or not. Either you click on something or, honestly, at Slack, sharing a message you found back into a channel is the single strongest positive signal that we did a good job of finding what you were looking for. Google's ad system is the same kind of thing; you get very fast feedback. I have not personally worked at Netflix, but I have friends who have, and my understanding there is that, ultimately, all they care about is churn. Generally speaking, they need on the order of 30 days to get a full understanding of the impact of an experiment, since they're waiting to see how many people in treatment A versus treatment B churn.

From their perspective, they have things that correlate with churn that they can detect early, that they use as a kill switch mechanism, but their iteration time on experiments has to be long because their ground truth is long. I imagine folks who are doing fraud stuff for credit cards and stuff like Stripe, again, have very long lead times with that similar kill switch things early, but let the thing run to really fully understand the impact of it. I think that sounds unpleasant to me and I would not like to work in that kind of environment. I mean, it's nice, fast feedback is good.

Best Practices

Participant 5: A question for the whole panel. You've built successful ML systems, and this is a pretty new space. The question is, what have you learned, what are the basic principles you can share with us? Like, when it comes to data, the data must be immutable. What kinds of practices like that apply to ML-based systems?

Moderator: One top best practice each?

Wills: What was hard for me in coming to Slack (I was at Google before, I may have mentioned that) was that I very much had it in my head that there was the Google way of doing things, and the Google way was the truth. It was the received wisdom, it was the one and only way. What was hard for me at Slack was learning that that was not the case. Getting to work with folks who'd been at Facebook and Twitter and seeing other things that worked helped me understand the difference between good principles and just path-dependent coincidence. There's a lot of stuff at Google that's been built up around a decision that Kevin, the intern, made in the year 2000, to make it seem like it was the greatest decision ever, when it wasn't. It was just a random decision, and you could have done it another way and it would have been totally fine.

The only thing I take as absolute law, and imposed in my first week at Slack during my management days, was having evolvable log file formats. I got rid of the JSON in the data warehouse and moved everyone over to Thrift. That was, for me, the equivalent of the guy who, on September 10th, mandated that every airplane had to have a locking cockpit door. I have saved Slack hundreds of millions of dollars with that decision, and no one can tell; it just seems invisible. From a machine learning perspective, being able to go back and replay your logs for all time using the same ETL code, without having to worry about, "How the hell do I parse this? What does this field mean? Oh, my God, the type changed," not worrying about that nonsense is the only 100% always-true thing everyone should do everywhere right now.
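The property Wills is after is schema evolution: old logs stay parseable forever because fields are only ever added, with defaults, and never retyped or reused. Thrift (like Protobuf or Avro) enforces this at the schema level; the sketch below is a rough Python rendering of the same rule, with hypothetical field names rather than Slack's schema:

```python
from dataclasses import dataclass, fields

@dataclass
class SearchEvent:
    user_id: str
    query: str
    clicked_result: str = ""          # added later; the default keeps old logs parseable
    ranking_model: str = "baseline"   # same rule: add-only, never retype or reuse a field

def parse(record: dict) -> SearchEvent:
    """Parse a raw log record written under any historical version of the schema."""
    known = {f.name for f in fields(SearchEvent)}
    return SearchEvent(**{k: v for k, v in record.items() if k in known})
```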

Muppalla: To add to what Josh mentioned, when you're running ML experiments, the quality of the data is something you have to be able to take for granted in order to retrain and iterate on your models. That's something you have to make sure is good: have some counters, or some way of measuring that the data you're training on is reliable. I think that's very important.

Samuels: Then, to add to that, the ability to experiment really fast. Having a good offline evaluation system has been really important for us, to be able to try out a lot of different experiments rather than having to run so many A/B tests, which take time to set up and where you have to wait a couple of weeks to get the results. Offline evaluation allows you to experiment much faster; that's been really important for us.
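A hedged sketch of what such an offline evaluation can look like: replay held-out impression logs through a candidate model and compute a ranking metric in minutes, instead of waiting weeks for an A/B test. The model interface and the session fields here are hypothetical, not Spotify's system:

```python
def hit_rate_at_k(model, held_out_sessions, k: int = 10) -> float:
    """Offline evaluation sketch: how often would an item the user actually
    played have appeared in the candidate model's top-k recommendations?

    `model.rank(context)` is an assumed interface returning items in
    descending score order; each session dict carries the logged context
    and the items the user really played.
    """
    hits = 0
    for session in held_out_sessions:
        ranked = model.rank(session["context"])[:k]
        if any(item in session["played_items"] for item in ranked):
            hits += 1
    return hits / max(len(held_out_sessions), 1)
```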

Rangwala: Ok, I'll take a different stand here. In a previous life, I was an infrastructure engineer; I have even written networking protocols. One of my key takeaways when I started working with the machine learning folks is that when you're building the architecture, don't think like an infrastructure engineer. The key is that the metrics are all that matter, and you have to understand that machine learning algorithms are by their very nature statistical, so your infrastructure need not be fully deterministic. Sometimes there are a lot of variations you can try out if you understand that your true north metric is statistical in nature.

I have done that a few times in my career, where I have understood that part and exploited it to build systems which otherwise couldn't have been built if I had only thought about full consistency or absolutely deterministic systems. In our talk, we mentioned that at a certain scale, you have to start thinking about machine learning and infra in conjunction with each other. Without giving a specific example, I can think of a problem that can be solved in two different ways using machine learning, where one of them is much more amenable to the infrastructure than the other. It's actually very important to go as far as you can up the stack to figure out what things can be changed. True north metrics are all that matter; beyond that, everything should be rethought whenever you hit a scaling problem.

Chen: To my point, I would say learn with your partners. Be simple and stupid in the beginning, I can say this. Three years back, while building Michelangelo, nobody on the team even understood what you mean by supervised learning. Why do you need a table where categorical features need string indexing? We thought, "A decision tree? How can a decision tree handle categorical features? We don't know." Our data scientists knew that. I think to make a system really useful, first, learn with your partner and understand their needs, and second, advocate after you understand, so you can educate more people. You eventually become a central hub: you absorb knowledge from different users, and then you advocate the same to more users. Then you make this practice more unified across the whole company. That makes your life easier, and it makes your customers' lives easier as well.

Quantifying the Impact of Platforms

Participant 6: Some of you have built platforms for machine learning. I was going to ask, how do you quantify the impact of those platforms you've built on the productivity of your practitioners, your data scientists, or ML people, or whatever, and what part of that platform had the biggest impact on that matter?

Moderator: That's a good one. What part of your platform had the biggest impact, and how do you know?

Chen: Yes, how do you quantify it? I don't think this is really just a machine learning problem, because I'm coming from the data org. The whole data org idea, because we work very closely with the data scientists, is that we talk about data scientist to data engineer ratios, and I think that still applies. Basically, as a single data engineer, how many different data scientists can you support? How many different use cases can you support?

Rangwala: I would just second that; most of the time it's either the metrics, or the productivity of the machine learning engineers, that you can improve. At times, it's measured by how long it takes to actually build a model or iterate on a model. One of the metrics sometimes used is the number of experiments, or the number of models, that you can put into production and iterate over in a given period of time. Those are the usual metrics to measure success.

Samuels: Yes, I don't have much else to add, other than another thing that we look at: how fast can you add features to the model? That was another big effort on our part, to improve our infrastructure, and when we got that down to a shorter amount of time, that was how we knew we were doing a better job.

Muppalla: One more thing is, once you start getting the logs that are the results of your experiment, how soon can you make them available for research and analysis, and are there tools that support this automatically? The smaller the number of systems an engineer has to touch to get results, the better.

Wills: I hate panel questions where everyone agrees, I'm going to disagree vociferously. Everyone else is totally wrong. No, I'm kidding, everyone else is totally right. The thing I will chime in is, data scientists are great, but they don't necessarily write the most efficient code in the world, I think the dollars and cents impact of bringing in data engineering and infrastructure was to watch the AWS costs fall precipitously. We stopped letting them write some of the ridiculous things they were trying to do. We didn't really need quite so many whatever x32 758 terabyte instances to fit models anymore. That was a good thing.

Moderator: Yes, I like the idea of AWS costs, it's the ultimate metric we try to optimize.

Wills: It's very satisfying, it's more money.

Machine Learning with Different Amounts of Data

Moderator: We have five minutes, so I'm going to ask a last question for the panel. A lot of you came here and gave presentations about how you do machine learning at this humongous scale. Is this inherent to the problem? Is "machine learning" always part of a "machine learning at scale" sentence? Or, if I'm a small company and I have a bit of data, can I still do machine learning on my tiny bits of data?

Wills: The answer is yes, 100%, yes. Those people are not at these conferences, though, because they're just trying to keep their company open. They're just trying not to run out of money; if they were here, I'd be like, "What the hell are you doing here? Get back to work." All of us have the good fortune that I don't think any of our companies would go out of business if we stepped away for the day.

I think it's a biased sample we're getting. I would say, honestly, my friends at small seed and Series A companies doing machine learning on small data are far cleverer and work much harder than I do doing things on large data. Large data makes a lot of problems much easier; it's definitely not without challenges, not without hard stuff, but generally speaking, it's honestly just way easier. Yes, it very much can be done, and I'm sure there'll be an incredibly valuable next wave of startups that'll come out of that. Then in a couple of years, we'll be able to come to QCon and talk about it.

Moderator: Can you tell which ones so we can all go look for jobs while the options are still good?

Wills: I guess it's hard. Personally, I'm a big fan of Robin Healthcare, which is doing some very cool stuff. Go check out Robin, a tiny Series A startup out in Berkeley that is doing automatic transcription of doctors' notes to enter the information into the EMR for them, so the doctor doesn't have to spend half their week entering information into the EMR. This is a very hard problem; they have a very small amount of data, and they have to be much cleverer than I am to figure out how to do it.

Moderator: Oh yes, doctor handwriting is the worst, that's an amazing problem to solve.

Wills: Precisely, it's general A.I. level stuff. Yes, I'm trying to figure out how to profit from the fact that I just mentioned them right now. Nevertheless. Yes, I mean there's a lot of that stuff, I don't really know half of it.

Samuels: I don't have too much to add here, because I've just mostly been working on things at scale, but it seems like there's a lot of really interesting problems when you get to this scale, in dealing with all of the data, and wrangling it. I feel sometimes the hard problems are also the organizations that you have to deal with at your company, in terms of who you have to interface with, and who you have to work with, and what are their systems doing, and how do they interact with yours? How can you coordinate with other teams so that you're all doing the same thing, instead of reinventing things in different pockets of your company? At scale, I feel like those are the other types of problems you have to deal with, not just the technology, but the org and the people.

Moderator: The organizational scale.

Samuels: Yes.

Rangwala: I think it's both easy and difficult. Generally, what we have seen is that when you have a lot of data, sometimes even simple models are able to perform really well. With a small amount of data, you will probably have to use different machine learning techniques compared to the ones you use when you have large amounts of data. In a way, the problem shifts: when you have a small amount of data, the machine learning techniques that you use in order to learn those models are the challenging part. When you are at high scale, sometimes it is the data, or dealing with that large amount of data, that becomes the bigger challenge. However, you also reach the limit of diminishing returns, which means that once you have tried a certain model, even at high scale, you won't be able to discover new things; you would probably have to try new techniques, like deep learning, to discover them.

Moderator: Can you even do feature engineering at small scale? Is that even a thing?

Chen: Sure, why not? To me, it's not about big or small, it's about the thinking style. It's about how you solve your problem. I think we're on a journey of changing people's minds about how to understand and solve their problems. I see machine learning really as a part of data-driven applications. We're here changing people's minds; brainwashing, that's how I see it.

 


 

Recorded at:

Jun 27, 2019
