
Machine Learning Engineering - A New Yet Not So New Paradigm


Summary

Sravya Tirukkovalur discusses how ML engineering leverages principles, tools, and development and testing practices from other engineering disciplines.

Bio

Sravya Tirukkovalur is a Senior Machine Learning Engineer at Adobe, working on the Sensei Content ML Framework team, which powers ML capabilities across the Adobe product line, enabling consistent, reusable building blocks with high software engineering rigour across the board.

About the conference

QCon.ai is a practical AI and machine learning conference bringing together software teams working on all aspects of AI and machine learning.

Transcript

Tirukkovalur: I would like to start with why I'm interested in talking about this. This is some information I shared approximately a year ago, when I had just finished interviewing for ML roles in the industry. The experience of interviewing for ML roles was quite different from what I had experienced as a software engineer, so I wanted to share what I learned through the process, and clearly many people were interested. There was no consistent definition of the roles; the roles are very broad. To give you an example, in one interview I found myself explaining the Central Limit Theorem, and in the next I was designing a fault-tolerant web service on a whiteboard, so it was quite a big spectrum.

Fast forward to now: having been working in the industry in one of these roles, I wanted to look back and see why these roles are so broad, dynamic, and varied. If you look, some interesting patterns have emerged; although the role is quite newly termed, lots of people have been doing this work, even though it has changed quite significantly. I work on the Sensei team at Adobe, which is a centralized machine learning platform team that works very closely with our various products, as well as with research, to build and enable machine learning capabilities across our product line.

My talk is going to be mainly focused on the blue box that you see here: opportunities and challenges in ML engineering. Before that, I would like to provide some context on what ML problems we're solving at Adobe and what our focus areas are. I would like to close by sharing some of my perspectives on where I see machine learning engineering going.

ML at Adobe

We have many products which leverage ML today; one of the use cases, for example, is Content-Aware Fill. Adobe has a lot of products that cater to creative professionals, and we would like to leverage machine learning to make creative professionals more effective and to make more people creatives. Content-Aware Fill is one such example. As a creative, if you want to remove an object from a beautiful scene like this, for example, this boat, the process, which used to be very cumbersome to get to aesthetically pleasing results, is now very simple with Content-Aware Fill. You just select the areas you want removed, and the feature fills in the missing pixels with something that comes from a natural image manifold, so it looks very real and aesthetically pleasing.

We also use machine learning for visual search in Adobe Stock: you can use an existing image to find similar-looking images. What does similar mean? It could mean many things based on user intent: a similar color composition, or similar content. This example is for color: I can pick a color palette I want and search for all landscape images with a similar palette. Similarly, I can search on composition: given a landscape image with a certain kind of composition, like a pathway leading toward the horizon, you can search for images with a similar composition.

There are many more features across the creative product lines, like auto-cropping videos, for example. A video you take might look really nice on a laptop, but the moment you see it on a phone, it doesn't look good, so how do you automatically crop it to look nice on different surfaces? How do you auto-caption your images? There are various applications that we power using machine learning. I mostly focus on the creative product line, but we also have many use cases around documents and marketing: for PDFs, being able to understand them at a very deep level, and for marketing, use cases like user segmentation, recommendations, and so on. You can find a really elaborate list on the Adobe Sensei landing page. There's a fun video too, if you're interested.

Adobe is really interested in various modalities of learning, from visual to language, and from structured data to sequential data like time series. We're interested in understanding as well as generation. We're also especially interested in vector art: being able to generate, and aid in the process of creating, vector art, which looks nothing like a natural image manifold. While we ship these features with ML capabilities, there are some new challenges, and some that are actually not so new.

Typical ML Lifecycle

As most of you are well aware, this is what a typical, very simplified ML life cycle looks like. It starts with identifying the right problem to solve using ML; then you collect the required data. None of this is serial; you often go back and iterate. You collect the data, potentially label it, clean it, and prep it, then you train and evaluate on a test set. We also often deploy the ML capabilities on the cloud to serve potentially multiple teams looking for a similar capability, and in doing so we would like to deploy these services at a much larger scale and in a robust way. In some cases, we're interested in deploying these capabilities on devices, for various reasons. One is privacy: we would like users to use these features without having to send their data to the cloud all the time. Another is connectivity: users who are not connected to the internet still want these capabilities.
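
As a rough illustration of those stages, here is a minimal sketch in Python, assuming scikit-learn and pandas; the "data.csv" file and its "label" column are hypothetical placeholders, and a real pipeline is far more iterative than this.

    # Minimal sketch of the simplified life cycle: prep data, train, evaluate on a
    # held-out test set. The dataset path and column names are illustrative only.
    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    # Collect and prep (in practice this step loops back many times)
    df = pd.read_csv("data.csv").dropna()
    X, y = df.drop(columns=["label"]), df["label"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

    # Evaluate on the held-out test set before any deployment decision
    print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))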

Another is interactivity: when you're in a creative workflow, even millisecond-level latency would interrupt your flow, so where interactive workflows are very important, we tend to deploy on devices. Today we do this in a somewhat isolated way: we deploy, retrain, and so on. Coming to the role of the ML engineer, especially from a platform perspective, it is about leveraging and enabling ML capabilities in our products. This is a very dynamic role for two main reasons: the changing landscape of opportunities and the changing landscape of challenges.

Opportunities

What is possible today is changing at breakneck speed. The need for labeled data is changing: not all problems today require millions of data points; many problems work with just a hundred labeled samples. What is possible is really changing, and it's important to stay up to date with how much you can leverage these capabilities. As most of you know, language modeling saw a revival last year with techniques like transfer learning across multiple tasks. Model interpretation is very important for the industry; we would like to understand why a model performs the way it does.

Generation, in both the visual and language domains, has seen great progress in recent times. Last but not least, framework capabilities: if you look at the commit history of PyTorch, it's outstanding to see the pace at which the frameworks are moving. As we turn to challenges and needs, I'm going to focus first on the ones that have parallels in other domains we can borrow ideas from.

Needs – Robust Deployments on the Cloud

We deploy most of our ML capabilities on the cloud, and we want these services to be easy to scale. Today one product might have a thousand users, and tomorrow another product with a million-user base is going to use the same capability. We would like to be able to scale without an entirely new team building infrastructure for each of these services.

We would also like to be able to manage them easily: since we will soon be talking about hundreds of capabilities being spun up very fast, we want to make sure the entire cycle is set up with monitoring and alerting in place. Then there is the velocity of iteration: we want each individual team to execute and deliver their ML capabilities and integrated data products, and at the same time we would like to increase the velocity of the organization as a whole, where teams can leverage each other's work when appropriate.

Along with that comes standardization: does each service have its own way of being called, or is there some standard so that people can quickly use all the services? Also, in an industry context, security is a very big need; we want to bake in security best practices so that not everybody has to reinvent the wheel. If you look at all these requirements, most of them are not very ML-specific. We can leverage microservices best practices and adopt a DevOps mindset for machine learning, where a single owner can go from identifying the problem to deployment, A/B testing, getting the feedback, collecting the right data, retraining, and so on.
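
As one hedged illustration of what a standardized service contract might look like, here is a minimal sketch assuming FastAPI and pydantic; the route, payload schema, and predict() stub are illustrative, not Adobe's actual API.

    # Minimal sketch of a standardized ML inference endpoint (assumes FastAPI).
    # The route, payload schema, and predict() stub are illustrative only.
    from typing import List
    from fastapi import FastAPI
    from pydantic import BaseModel

    app = FastAPI()

    class PredictRequest(BaseModel):
        inputs: List[float]          # uniform input contract across services

    class PredictResponse(BaseModel):
        outputs: List[float]         # uniform output contract across services
        model_version: str           # surfaced for monitoring and A/B testing

    def predict(inputs: List[float]) -> List[float]:
        # placeholder for the real model call
        return [sum(inputs)]

    @app.post("/v1/predict", response_model=PredictResponse)
    def predict_endpoint(req: PredictRequest) -> PredictResponse:
        return PredictResponse(outputs=predict(req.inputs), model_version="1.0.0")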

Based on many conversations with colleagues, with other teams, and with other companies, there are a lot of handoffs happening in ML engineering. Someone trains the models, somebody else collects the data, someone else deploys, and someone else again does the A/B testing, so how do you close the feedback loop? It takes many months before you can start learning from what you shipped.

Needs - Performance

The next one I would like to touch upon: unlike software engineering and unlike Big Data systems, there is a unique need in [inaudible 00:12:19] of machine learning, in the sense that machine learning workloads are both data and compute intensive. That often means you need to do performance optimizations to really make the most of it, from single-data-point, low-latency inference, where you're serving online inference, to high-throughput batch inference, where you have an ML capability that detects objects in an image and now you want to run a billion assets through this model. How do you run this efficiently without waiting for months to get that piece of information?
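
To make the high-throughput batch-inference side concrete, here is a minimal sketch assuming PyTorch; the tiny model and random tensors are stand-ins for a real detector and a real asset store.

    # Minimal sketch of high-throughput batch inference (assumes PyTorch).
    # The model and dataset are placeholders for a real detector and asset store.
    import torch
    from torch.utils.data import DataLoader, TensorDataset

    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 64 * 64, 10))
    model.eval()

    # Stand-in for a large collection of preprocessed images
    dataset = TensorDataset(torch.randn(2_048, 3, 64, 64))
    loader = DataLoader(dataset, batch_size=256, pin_memory=True)

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)

    results = []
    with torch.inference_mode():                 # no autograd bookkeeping
        for (batch,) in loader:
            batch = batch.to(device, non_blocking=True)
            results.append(model(batch).cpu())   # keep accelerator memory bounded
    print(torch.cat(results).shape)              # torch.Size([2048, 10])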

If you look at it from a high-performance computing angle, solving compute-intensive problems is the backbone of high-performance computing. GPUs have been leveraged for decades in SIMD style, Single Instruction Multiple Data, where the algorithms are embarrassingly parallel, just like [inaudible 00:13:23] operations, where you have a matrix-to-matrix multiplication and can easily parallelize it over multiple data.

Techniques like pipelining, prefetching, and using the memory hierarchies well to optimize these tensor operations are now becoming mainstream in the industry. These are the backbones of high-performance computing, and we should fully leverage that knowledge. Similarly, there are lots of principles we could be leveraging from the data-intensive applications world as well. In machine learning, it's not just a little data on which we run hundreds of computations; we run those hundreds of computations on huge amounts of data, so we're talking about extending these compute-intensive operations to huge amounts of data. Sometimes it doesn't fit in memory and you need a cluster, so you can borrow principles from distributed systems and the Hadoop ecosystem, where you exploit data locality and send the compute to the data where needed. It's really a good combination of both, and that's one of the things that makes it very interesting.
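
As a hedged sketch of sending the compute to the data, here is what a simple cleaning and aggregation step might look like with PySpark; the Parquet path and column names are made up for illustration.

    # Minimal sketch of pushing compute to the data (assumes PySpark).
    # The input path and columns are illustrative; the filter and aggregation
    # execute on the cluster nodes that hold the data partitions.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("data-prep").getOrCreate()

    # Hypothetical Parquet dataset of image metadata, too large for one machine
    df = spark.read.parquet("s3://example-bucket/image-metadata/")

    clean = (
        df.dropna(subset=["label"])
          .filter(F.col("width") >= 224)
          .groupBy("label")
          .count()
    )
    clean.show()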

Experimentation as a Core Mindset

There are many things we could be leveraging from parallel domains, but there are also some very ML-specific needs. One of the main ones, I feel, is the experimentation mindset. In software engineering, you often get some specifications; I'm not saying specifications are frozen, they're always moving, but still, you know what you're building for, and it's very deterministic. Experimentation is usually left to the research teams.

With machine learning, I think these two fields are really merging, and all ML engineers will be doing lots of experimentation, as you've seen in the first talk. Lots of experiments are run, there will be a few winners, and then you want to take those forward, test them in the real, wild world, and iterate. How do we enable experimentation as a core software tenet? We want these experiments to be easily trackable and reproducible, and we should treat an experiment as a first-class piece of software.

We leverage Docker containers and a strong orchestration framework to achieve this, where anybody can spin up and execute experiments on the fly. All of this is tracked and [inaudible 00:16:18], so you know exactly how to reproduce an experiment.
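
The talk doesn't name a specific tracking tool, so purely as an illustration, here is a minimal sketch of recording a run's configuration, code version, and metrics so it can be reproduced later; the directory layout and field names are assumptions.

    # Minimal sketch of experiment tracking: persist config, code version, and
    # metrics per run so any experiment can be reproduced. The directory layout
    # and field names are illustrative assumptions.
    import json, subprocess, time, uuid
    from pathlib import Path

    def log_experiment(config: dict, metrics: dict, root: str = "experiments") -> Path:
        run_dir = Path(root) / f"{time.strftime('%Y%m%d-%H%M%S')}-{uuid.uuid4().hex[:8]}"
        run_dir.mkdir(parents=True)
        git_sha = subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True
        ).stdout.strip()
        record = {"config": config, "metrics": metrics, "git_sha": git_sha}
        (run_dir / "run.json").write_text(json.dumps(record, indent=2))
        return run_dir

    # Example usage
    log_experiment({"lr": 1e-3, "batch_size": 64}, {"val_accuracy": 0.91})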

Device Deployments

The second one is on-device deployments, which I mentioned briefly. There are some ML-specific problems in deploying ML models on devices. Mainly, the binary sizes of these ML frameworks are humongous: you can't just ship TensorFlow or PyTorch on mobile devices with very limited storage. The model sizes can be too huge, and so can the compute needs. I heard this from one of my colleagues, and I found it very interesting and funny: they managed to squeeze in a model, ship it and everything, and finally they realized that whenever users use the app, the phone gets so hot that they can't use it.

Even after you've managed to do all of that and make sure it works, compute is really at a premium on mobile, and you should be very careful about how you use it, so there are some mitigations for how you can go about it. By the way, for context, Adobe has tons of mobile products, and that's one of the reasons this is very important for us. One mitigation is to use the optimized runtimes: every OS has its own hardware-accelerated runtime. Apple has Core ML, Android has NNAPI, and Windows has WinML, which is part of the OS.

That mitigates the first problem of ML framework binary sizes, but it comes with its own challenges: okay, how do you go from framework X to runtime engine Y? Add to this the complexity of each device having its own hardware stack, which can be AMD, Intel, or something else. It's a huge O(n²) matrix of conversions to support, and the lack of a good standard intermediate representation really exacerbates this problem.

This is another area where I feel we can leverage high-performance computing principles. We'll see a Java-bytecode-like movement in machine learning, where interoperability becomes the main thing people focus on, and at that point deployment becomes much simpler. The next thing is that we often do some kind of quantization to reduce the size and complexity of the model before we deploy it to the device.
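
As a concrete example of that quantization step, here is a minimal sketch of post-training dynamic quantization, assuming PyTorch; the toy model stands in for a real one, and the right quantization scheme depends on the model and the target hardware.

    # Minimal sketch of post-training dynamic quantization (assumes PyTorch).
    # Linear layers are converted to int8, shrinking the model before device deployment.
    import torch

    model = torch.nn.Sequential(
        torch.nn.Linear(512, 256), torch.nn.ReLU(), torch.nn.Linear(256, 10)
    )

    quantized = torch.quantization.quantize_dynamic(
        model, {torch.nn.Linear}, dtype=torch.qint8
    )

    # The quantized model is a drop-in replacement for inference
    x = torch.randn(1, 512)
    print(quantized(x).shape)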

Data Collection, Labelling

Data collection and labeling is quite a bottleneck right now; it's both time-consuming and expensive. Also, it's very easy to fall into some common pitfalls while you're collecting and labeling this data, whether it's a semi-automated labeling process or a human labeler.

Our models are only as good as the data we feed them, and we'll see a lot of maturity in this area going forward. How do we collect labels and data in a robust way, with guardrails in the process, to make sure the distribution of your training data represents what you want to achieve in the wild? How do you detect bias when you're collecting data?
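
One simple guardrail along these lines, sketched here with made-up labels and an arbitrary threshold: compare the label distribution of a newly collected batch against a reference distribution and flag large drifts for review.

    # Minimal sketch of a label-distribution guardrail for data collection.
    # Flags a batch whose class frequencies drift too far from a reference
    # distribution. The labels and threshold below are illustrative.
    from collections import Counter

    def label_distribution(labels):
        counts = Counter(labels)
        total = sum(counts.values())
        return {k: v / total for k, v in counts.items()}

    def drift_exceeds(reference, batch, tol=0.1):
        classes = set(reference) | set(batch)
        return any(abs(reference.get(c, 0.0) - batch.get(c, 0.0)) > tol for c in classes)

    reference = {"landscape": 0.5, "portrait": 0.5}
    batch = label_distribution(["landscape"] * 9 + ["portrait"])   # heavily skewed
    print(drift_exceeds(reference, batch))                         # True -> investigate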

Evaluation and Test Driven ML Development

This is perhaps the most critical aspect of enabling ML at scale in various products. Evaluation is a very critical and non-trivial task in the real world; not all models can have a purely quantitative measure. We have an aesthetics filter: how do you define and quantify aesthetics? Or similarity: something that looks similar to me might not look similar to you, based on my intent. There are many subjective tasks that we're targeting which don't have quantitative measures, and that makes evaluation even harder.

Even after you have a reasonable evaluation metric, how do you make sure it reflects the Key Performance Indicators of your products? It's also about being able to detect bias and rectify it. You might make sure your labeling process doesn't have bias and your data is representative; even then, when you deploy in the wild, there will be cases you haven't thought of before, and your model performs really badly in those cases. How do you have a feedback loop so that you detect those in time and rectify them?

All of this brings us to the idea of test-driven ML development. We use test-driven software development very widely, and it's a known fact that it really increases your velocity and your ability to collaborate once you have CI/CD set up. For machine learning, where the nature of the work is so inherently iterative, it is at least as important as it is in software engineering. I'm looking forward to seeing how we can bring the A/B testing cycle very close to the development cycle, where we'll be able to quickly deploy to a very small set of users, test it out, collect feedback, and iterate from there.
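
As one hedged sketch of what such a test could look like in a CI/CD pipeline, here is a pytest-style gate on an offline metric; the load_* helpers, the toy model, and the 90% threshold are illustrative stand-ins.

    # Minimal sketch of an offline evaluation gate for CI/CD (pytest-style).
    # The helpers, toy model, and threshold below are illustrative stand-ins.
    def load_model():
        # placeholder: load the candidate model artifact
        return lambda xs: [1 if x[0] > 0.5 else 0 for x in xs]

    def load_test_set():
        # placeholder: a fixed, representative offline test set
        return [[0.1], [0.4], [0.9], [0.8]], [0, 0, 1, 1]

    def precision(preds, labels, positive=1):
        tp = sum(p == positive and y == positive for p, y in zip(preds, labels))
        fp = sum(p == positive and y != positive for p, y in zip(preds, labels))
        return tp / (tp + fp) if (tp + fp) else 0.0

    def test_precision_gate():
        model = load_model()
        X, y = load_test_set()
        assert precision(model(X), y) >= 0.9, "model fell below the agreed precision bar"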

Summary and Future Trends

In summary, there are three main aspects where we'll see progress in ML engineering from a platform standpoint. I haven't said anything about training, because training tooling is very powerful, but it's not the bottleneck right now. The tooling has progressed enormously in the past few years, so it is reasonably easy to train a model once you have everything else figured out. We'll see a lot of progress in all the other aspects of ML engineering, from data labeling to evaluation to deployment, and in compiler technology to make deployments easy.

In terms of standardization, I think we're still very early in the framework evolution. Some of the things I think will come up more prominently are ONNX, which tries to turn this O(n²) matrix into O(n): all frameworks convert to ONNX, and from ONNX there is a path to all these device deployments. Google just released something called MLIR; it stands for Multi-Level Intermediate Representation, though you could also read it as Machine Learning Intermediate Representation.
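
As a hedged illustration of that ONNX path, here is a minimal sketch of exporting a PyTorch model to ONNX; the toy model, file name, and axis names are placeholders, and each device runtime still needs its own converter from the ONNX file.

    # Minimal sketch of exporting a model to ONNX as the common interchange format
    # (assumes PyTorch's ONNX exporter). The toy model is a placeholder.
    import torch

    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 224 * 224, 10))
    model.eval()

    dummy_input = torch.randn(1, 3, 224, 224)    # example input defines the graph shape
    torch.onnx.export(
        model,
        dummy_input,
        "model.onnx",                            # downstream converters can target device runtimes
        input_names=["image"],
        output_names=["logits"],
        dynamic_axes={"image": {0: "batch"}},    # allow variable batch size
    )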

The idea of a standard intermediate representation is that you can have a pluggable compiler architecture to convert that intermediate representation to any hardware-specific accelerator you might want, or even to heterogeneous hardware accelerators, so you can have the CPU and GPU working together. And last but not least, I think test-driven ML development is the single thing that would most accelerate innovation in the ML engineering landscape.

When you're coding in your IDE, imagine getting feedback on what you're coding only after a day. That would be horrible; you can't really progress fast, you can't learn from your mistakes. Imagine you only get your compiler errors a day after you code. That's where we are in ML development today: we have very long feedback cycles, and the moment we bring that cycle down to a short time, I think the rate of innovation will be much faster.

Questions and Answers

Participant 1: Do you know of any tools for following the pipeline that you recommend? I don't know of any tools, but it looks very important, so the group can know exactly what each member is doing, follow along, and know what data you need, what data you need to collect, and what is missing.

Tirukkovalur: I'm predominantly focused on deep learning in my current job, but we also support classical machine learning frameworks. In the case of training, I think it's very clear: we mostly use TensorFlow, PyTorch, CycleGAN, and so on. For data exploration and collection, I'm grouping many things here; if you already have your data repositories in Hadoop, it's usually Spark to clean the data.

Participant 1: I mean project-wise: if you have a big project that includes the software engineering, the machine learning, and all the steps you said are very important, like A/B testing, which tool do you use at the project level to coordinate projects that include machine learning?

Tirukkovalur: In terms of project management, we use a variety of Jira boards.

Participant 2: I have a similar question. Test-driven ML development sounds great. What do the tests look like in that? Because you made the point on the same slide that defining a metric of success is fraught, so, how do you write a test?

Tirukkovalur: I think of testing in multiple layers. First, validating the inputs and outputs, and functional tests to make sure your pipeline doesn't simply fail in some cases. Then you can have offline metrics that you track. One step beyond that would be actual A/B testing; in an ideal world, it would be A/B testing tied to the development cycle, where you can test on a small subset of the user base and iterate.

Participant 2: For your ideal?

Tirukkovalur: Not necessarily ideal. Ideally, maybe, it's in your microservices pipeline, where you deploy to prod but only to a subset of users and get the feedback.

Participant 3: How do you think about testing if the model is stochastic?

Tirukkovalur: It's hard to test; there's no determinism, so it's definitely harder to test compared to software engineering. There are a few things you can test, like functional correctness: are you handling the cases, for example, if your model is designed to take a 224 by 224 image? Are you doing the correct validation? Those things are more like software engineering validation, but the actual output is hard to test.

The best thing we can do today is have an offline test set that we make sure is representative of our real world. It might change; your test set might also change over a period of time. To make sure the model performs, you can have a metric, like "OK, it has to be at least 90% precision on this test set," and have that in a continuous integration loop.
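
To illustrate the functional-correctness checks mentioned above, here is a minimal sketch of validating that inputs match a model that expects 224 by 224 RGB images; the normalization policy is an assumption.

    # Minimal sketch of the input-validation style checks described above:
    # a model expecting 224x224 RGB input should reject or normalize anything else.
    # The validation policy here (raise on wrong shape) is an illustrative assumption.
    import numpy as np

    EXPECTED_SHAPE = (224, 224, 3)

    def validate_input(image: np.ndarray) -> np.ndarray:
        if image.ndim != 3 or image.shape != EXPECTED_SHAPE:
            raise ValueError(f"expected {EXPECTED_SHAPE}, got {image.shape}")
        if image.dtype != np.float32:
            image = image.astype(np.float32) / 255.0   # normalize to [0, 1]
        return image

    def test_rejects_wrong_shape():
        bad = np.zeros((100, 100, 3), dtype=np.uint8)
        try:
            validate_input(bad)
            assert False, "wrong-sized input should be rejected"
        except ValueError:
            pass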

 


 

Recorded at:

Jun 12, 2019
