
Making AI FaaSt


Summary

Dragos Dascalita Haut and Akhilesh Kumar demo an AI app built with serverless, composing multiple AI functions into one workflow. The functions are deployed into a FaaS platform powered by Apache OpenWhisk. They talk about FaaS architectures, open source technologies, as well as areas where serverless streamlines the experience for developers.

Bio

Dragos Dascalita Haut is the Principal Engineer for Adobe I/O, working with the Apache OpenWhisk community on extending Adobe's Cloud Services using a distributed serverless platform. Akhilesh Kumar is a senior machine learning engineer at Adobe. He works in the applied machine learning team at Adobe, which is primarily responsible for putting deep learning models into production.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Dascalita Haut: Today we're going to talk about functions, and in particular Functions as a Service. This talk analyzes this new compute model and applies it to AI, in order to present a solution that seems to bring strategic advantages when deploying AI services at scale.

During this session, it may feel like we're dancing a bit, moving through tools, new technologies, maybe you might even see some new steps like workflows or methods to work with AI. And for those of you that know salsa, you know that it starts with a step forward. So today, I'm going to start with some bold statements, but bear with us, I'm going to take a step back, and then me and AK are going to rehearse something through a live demo, which hopefully is going to go just fine, to illustrate what we're talking about.

FaaS Value Props

Let me start with a step forward: the FaaS value props. What does FaaS bring that more and more people are talking about? I came up with three reasons. Number one is FaaSter to prototype, FaaSter to create services, because we work with code, with functions, just code, and we push the code as it is. Second, never pay for idle. FaaS platforms have the capability to shut down the parts of the system that are not used, so we don't incur any cost. And the third one is low maintenance overhead. That's because FaaS platforms usually take away the burden of creating containers, keeping them up to date, applying security updates, auto-scaling the functions, and deploying them in multiple regions.

In other words, FaaS boldly claims that you will find it easier to build more services, and you're going to pay less. Now, this is a pretty bold statement, isn't it? So allow me to take a step back and look at how developers are producing microservices today. A few years ago, we realized that microservices are better than monoliths because in essence, they add flexibility, and they simplify the experience. At the same time, it's also less risky to independently update parts of the system. And I would assume that many of us know what microservices are.

A very high-level microservice architecture is in this slide. The final solution basically consists of isolated pieces, each with its own independent deployment lifecycle. Now, microservices used to be deployed in their own VMs, and then containers came, and it was such a revolution because we were able to run multiple services in isolation on the same VM. Then Apache Mesos and Kubernetes, the container managers, came into the picture, and they gave us a beautiful API to deploy as many microservices as we want across thousands of VMs.

However, we take this simple picture, we put it in multiple regions, we add monitoring and tracing, we pay for logging, and we add all these services. It makes me wonder, did we actually simplify a lot of things? Because it looks like we have more things to care about. We don't just monitor the VM, we monitor the containers, we monitor the container manager, and so forth.

This brings me to this chart, which is about how much it costs to run a microservice and the cost of goods sold; that's what COGS stands for. There are two ways to look at this chart. First, my organization has an allocated budget. As long as I stay within that budget, I'm completely fine; I can run as many long-running microservices as I want, it doesn't matter. The second view is that my organization cares about the cost of goods sold and wants to know how much it costs to run a service. So what is the cost model for microservices dependent on? It's dependent on how many microservices I deploy and how many microservices I run, and then we add to this the cost for monitoring, alerting, and so forth.

Now, when the traffic is unpredictable, spiky, how do we compute the cost of goods sold? If your manager asks you how much it costs to put up the new service, what do you answer? Well, the answer is kind of a range. I'm going to say, "Well, I'm provisioned for 100,000 users, but I have fewer, so which cost do you want - the cost I'm provisioned for, or the actual cost?" I don't want to say that microservice architectures are wrong just because the cost model is not perfect. But I do want to bring into the conversation the idea that there could be something better. And today, as you can probably guess, we're going to talk about functions.

Let's observe how the cost of running microservices compares to the cost of running functions. On the top-right chart, the vertical axis is the cost and the horizontal axis is the number of functions deployed. The line is flat. Why? Because when I deploy functions, I don't actually pay for them. The only cost I incur, shown in the chart on the bottom right, is when requests come in, and I have a fixed cost for every function that I invoke. So this cost model seems to be a little better.
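A minimal sketch of the two cost models described here, using purely hypothetical prices to illustrate the shape of the curves (real provider pricing differs and changes over time):

# Hypothetical prices, for illustration only.

def microservice_monthly_cost(num_services, instances_per_service=2,
                              price_per_instance_month=50.0):
    """Provisioned model: you pay for every running instance, busy or idle."""
    return num_services * instances_per_service * price_per_instance_month

def faas_monthly_cost(invocations, avg_duration_s=0.5, memory_gb=0.5,
                      price_per_million_invocations=0.20,
                      price_per_gb_second=0.0000167):
    """Pay-per-use model: deploying costs nothing; only invocations are billed."""
    request_cost = invocations / 1_000_000 * price_per_million_invocations
    compute_cost = invocations * avg_duration_s * memory_gb * price_per_gb_second
    return request_cost + compute_cost

if __name__ == "__main__":
    # With zero traffic the FaaS bill is zero, while provisioned services still cost money.
    print(microservice_monthly_cost(num_services=10))   # fixed cost, traffic or not
    print(faas_monthly_cost(invocations=0))             # 0.0
    print(faas_monthly_cost(invocations=2_000_000))     # grows with usage only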

Why am I talking about this? The reason I spoke about this so far is that I'd like to suggest - this is another bold statement - that when we start to build a new service, we should consider doing it serverless, because we may end up paying less. And if it's not going to be used a lot, then we can iterate faster. But how can serverless actually promise this? Here's my answer: I think FaaS platforms just happen to have better premises. They handle code, which is a smaller unit of deployment and scaling than containers. Second, they have request-based auto-scaling, in the sense that if I have 100 users concurrently hitting my system, I'll have 100 instances of my functions deployed to serve those 100 concurrent requests.

Making AI FaaSt

Let's take a step forward and talk about making AI FaaSt. You may be wondering why there is so much talk about AI recently, and I like how Andrew Ng puts it in a very simple chart. The horizontal axis is the amount of data, and traditional machine learning algorithms wouldn't perform better regardless of how much data we added. But in recent years - and I would probably mention 2012; AK, keep me honest here, 2012, AlexNet?

Kumar: AlexNet, yes.

Dascalita Haut: AlexNet, the first neural network that actually performed really well for computer vision. 2014, 2015, DeepSpeech?

Kumar: DeepSpeech, yes.

Dascalita Haut: Yes, a neural network used for speech. We noticed that if we are able to train larger neural networks, our performance is much, much better. It's much more fun, so we can actually do stuff with it, and hence we're talking so much about AI. What's interesting to observe when starting to make apps - and I like how Peter Norvig, the director of research at Google, puts it: "With AI, we look at the programmer more like a teacher assisting computers in learning rather than a micro-manager" who just imagines a program that starts at the top and runs all the way to the bottom, very deterministically. "It's interesting to note that we spent the last 40 years building up tools and programs to deal with text, which is code in a good way. But right now, we're creating models instead of text. And we just don't have the tools to deal with that."

Peter Norvig goes on to say that "We need to retool the industry." He's probably talking about zooming in on the model, maybe putting a breakpoint on a neuron to see what happens, and seeing if we can change it. For the scope of our talk today, we're not going to go that deep, but we are going to touch on some new tools and workflows. The last quote I'm going to leave you with is from Andrej Karpathy, who says that "Neural networks are not just another thing for software." He goes on to say it's a fundamental shift in how we write software, and he calls it Software 2.0.

Please keep these quotes in mind when you see that we're experimenting with some tools that may be new to you. The goal of our presentation is to show how the developer experience can be improved with these tools and processes, and how AI development could be made faster. So if these tools are new to you, or it's the first time you see something, don't let that confuse you. Just please focus on the process and on the intention that we want to show here. As with dancing, the more you practice, the better you get.

Building AI Apps

Going forward, let's analyze how AI applications are built, at a very high level. This is just my view. You start with the process: you have an idea, you write some code, you experiment, and you iterate. As you do this, you use some tools; today we're going to see Jupyter Notebook in action. This code needs to run on some compute. There are, at a high level, two types of jobs: long-running jobs, which are used for training, and short or on-demand jobs, which are used for inference. And these jobs may run in the cloud, on our own computers, or on the client device.

I mentioned training - training versus inference. Training is about learning; inference is about answering. Inference is taking a new data sample and passing it through a network to infer an answer. To put it more simply, I have two images on my left, and I put an image through the network to infer, on the right side, whether it's a dog or a cat. So inference runs faster than training, and a neural network processes one data input at a time.

This logic matches very well with the FaaS model. There is just enough code for a function, and each function processes one request at a time. If you look at my slide, I've sketched conceptually what an AI function would look like. It would first download and cache the model. Second, it would run the inference and return the output, whatever the output of that inference is. There are also the additional FaaS benefits I mentioned: it's faster to deploy just the code, and in some cases the code together with the model. You never pay for idle, so you can experiment with as many algorithms as you want. And it's a low maintenance overhead.
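A minimal sketch of that conceptual AI function, written as an Apache OpenWhisk Python action; the model URL and the inference step are placeholders, not the code used in the talk:

import os
import urllib.request

MODEL_CACHE = "/tmp/model.pb"   # /tmp survives while the container stays warm
_model = None                   # cached across invocations of a warm container

def _load_model(model_url):
    """Download the model once and reuse it on subsequent (warm) invocations."""
    global _model
    if _model is None:
        if not os.path.exists(MODEL_CACHE):
            urllib.request.urlretrieve(model_url, MODEL_CACHE)
        # Placeholder: here you would load the file into your framework,
        # e.g. a TensorFlow graph.
        _model = MODEL_CACHE
    return _model

def _run_inference(model, image_url):
    """Placeholder for the actual inference call."""
    return {"label": "example", "score": 0.0}

def main(params):
    # OpenWhisk passes all input parameters as a single dict and expects a dict back.
    model = _load_model(params["model_url"])
    return _run_inference(model, params["image_url"])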

The question is though, if I have an AI algorithm, do I have an application? What do I need to have a real application? From what I noticed, real applications actually integrate multiple algorithms into workflows, and if I already have an existing algorithm, and I build a new app, I might want to try to reuse existing functions or existing models. Sounds fair?

Demo

Without further ado, let's go into the demo. To illustrate how AI and FaaS work together, we simulated a real-world use case, a workflow. AK and I have been given the wonderful task of creating an intelligent content workflow to make it easier to build product catalogs, starting directly from photo shoots. The user uploads the photos, and they are stored in cloud storage - in our demo it's going to be Creative Cloud. The cloud storage event will trigger a function in the background; the function will then invoke our workflow, which will first check whether the image is good - whether it's blurry or not, whether it's interesting enough, whether it's good enough to work with.

Then my task today is to create a body crop function that crops from eyes to hips. Then we're going to go through an auto-tag function that will extract the product SKU, whatever the subject wears, and maybe other tags. And in the end, we upload the final image to a product catalog so that it can be used. We also call this content velocity; it just goes super fast. I'm going to reuse two existing functions: the image quality function, which I already had, and the auto-tag function, which is already in place.

To make this demo, we're going to use an open source serverless platform, Apache OpenWhisk. We're going to use Composer, which is a nice project from IBM Research that allows us to put multiple functions together into a workflow - I'm going to show you this, so don't be confused if it's new. And then for editing AI actions and deploying them, we're going to use Jupyter Notebooks.

Let's go into the demo. I've created this composition. I'm in the IBM Cloud Shell, which includes the OpenWhisk Composer, and I've created a composition called "assets created composition". If we zoom in, it basically shows the same thing that we saw previously, the workflow that I was just describing. Let me put this to work and then I'll explain a little more.

I'll go into Creative Cloud and I will upload a picture. There - the picture was uploaded. Behind the scenes, the event will be triggered, and I can watch all my activations, the activation list, in the background to see what's happening. Yes, the event hasn't triggered yet. Let me go on and show you a little bit of what's behind the composition while the event works. I will click on this icon. Okay, the workflow is small. I just want to show the code behind this workflow, what I used to create it.

This is something that I really, really like, and it super excites me. A lot of workflow engines so far use their own language to describe workflows - learn an XML, learn a JSON, learn something new. Well, that's not the case here. Composer uses JavaScript. If I want to compose, I just call composer.sequence and put something there. For instance, if I want to run the image quality check - which should show here, this is the image quality - I just create a sequence and I pass the parameters.

What's neat about this model is that if I just need to do some simple JavaScript, or I just need to tweak a little bit - for instance, insert an extra tag that says this was created with Adobe Runtime, which is our OpenWhisk deployment - I will just find it here. You see, I do result.tags.push, and I have my JavaScript code. I think this is really powerful; I'm not dependent on someone else's language or definition - JSON, schema, or whatever - to define a workflow. I can just do it with my code.

Let me get back and see what happened. Let me do the session list. I have at least one composition that I executed. If I look into the session flow, I see some boxes. The nice thing is that I can see the trace and how the request went. You see something grayed out on the right side, and you see something in blue. The blue is the path that my script took, and the gray is the path that my script did not take.

Another nice thing is that if I roll over an action, in this case, the smart body crop in my composition, it gives me whatever the output of that action was, the x, the y, the width, and the height, to crop the image. For instance, if I roll over the image quality it will give me the scores. In this case, we do look at multiple scores: color harmony, depth of field, there are a bunch of features that we're looking at.

Now, let me show the end result. The image went through all this flow, and then it went into my assets. Let me refresh my assets. I have to delete some - we were just playing with it; it's supposed to identify a piece of clothing, but it's not working. So allow me to delete it, you didn't see anything. And my smart auto crop is not that smart. AK, would you help me from here?

Kumar: Yes, I'm going to fix it. Would you mind if I come here? Thank you, Dragos, thank you, Sangeeta. How's everyone doing today? Before I start, how many of you are familiar with machine learning, or have some idea about it? Wow, quite a few of you. Now that we have seen the broken body crop from Dragos, I'm going to show you how we can test a model on our local system. And once we are convinced that it is doing a good job, we can deploy it on serverless.

I'm going to share with you a Jupyter Notebook. In case you are not familiar with Jupyter Notebook, it is an open source tool that is used for visualizing and analyzing data. The model that I'm using today is called OpenPose. It is an open source model developed by Carnegie Mellon University; I'll be talking about it a bit later. This demo is divided into two parts: first I test the model, and once I'm convinced that it is doing a good job, I deploy it.

Now, the reason Dragos' body crop didn't do well is because it is rule-based; the rules for cropping are hardcoded, and that's why it is not doing a good job. I'm going to replace those rules with an AI model, which in this case is the OpenPose model. Let's start. I'll run my fourth cell. What this cell is doing is basically importing all the important libraries like TensorFlow - the main action is happening in the inference class here. And then I'm going to download the deep learning model, OpenPose. I'm downloading it from Amazon, so I'll give the remote URL and download it. Yes, just a second. So while it is downloading - just a second.

Dascalita Haut: We actually saved it. I saved it beforehand because of the WiFi.

Kumar: The model is already there on my local system. Now, I'm going to show you the image that I'm going to crop. This is the image that we are going to crop, the image that Dragos tried to crop and didn't do a very good job on. Now, we can look at this image and see where the eyes are, where the nose is, the shoulders, the hips, and so on, because we are human beings, right? Let me show you how the machine sees the body.

I'll run this cell. What is happening in this cell is that my model is running in inference mode, and it is trained to detect body parts. The version of the model that I'm using is trained to identify 18 human body parts. It is running in inference mode, so it takes around 9 to 10 seconds. Let's see what we have got. You can see that it has identified some body parts - the nose, the eyes, the shoulders - and it has done a relatively better job than what Dragos had. One thing to note here is that it has not identified all the body parts. You can see that it has not identified the right knee or the ankles, and that's okay.

The thing with a machine learning model is that it is not 100% perfect. Again, if you have your own data and you don't want to use this open source model, you can train your own model. Now, once my machine learning model has identified the body parts in the image, I can crop it. Let's say I want to crop my image from eyes to hips. That's why I have given the upper body part as eyes, and the lower body part as hips.

So I'll run this again. It is doing the same thing: running my model in inference mode, my model is returning the coordinates corresponding to the different body parts, and I'm using those coordinates to crop my image. You can see that it has done a good job; it has cropped from eyes to hips. And you can see different stats also, like how much time it took to do that.
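A rough sketch of the "crop from one body part to another" idea, assuming the pose model returns a dict of keypoint coordinates; the keypoint names, margin, and PIL usage are illustrative assumptions, not the demo's actual code:

from PIL import Image

def crop_between(image_path, keypoints, upper="eyes", lower="hips", margin=0.1):
    """Crop an image vertically from one detected body part to another.

    keypoints: {"eyes": (x, y), "hips": (x, y), ...} in pixel coordinates.
    Returns (x, y, w, h) and the cropped PIL image.
    """
    img = Image.open(image_path)
    width, height = img.size

    top = keypoints[upper][1]
    bottom = keypoints[lower][1]
    pad = int((bottom - top) * margin)

    y0 = max(0, top - pad)
    y1 = min(height, bottom + pad)
    box = (0, y0, width, y1)               # full width, vertical slice
    x, y, w, h = 0, y0, width, y1 - y0

    return (x, y, w, h), img.crop(box)

# Example usage with made-up coordinates:
# coords, cropped = crop_between("person.jpg", {"eyes": (310, 120), "hips": (325, 610)})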

Now let me try to visualize the part of the image that was cropped. What I can do is draw a bounding box around the area of the image that was cropped, by running this cell. You can see that the part of the image inside my bounding box is the area that was cropped. So that's just for visualization.

Now let's try another image, and this time, let's say I want to crop from eyes to elbows. Last time I cropped from eyes to hips; this time I'm cropping from eyes to elbows. So I'll change my parameters here: I'll give the upper body part as eyes and the lower body part as elbows. I'll run this cell, and again, the same thing is happening: my model is running in inference mode, it is identifying the different body parts, and it is returning the coordinates corresponding to those body parts. Again, it takes around 9 to 10 seconds. This time, I have also printed the x, y, width, and height coordinates. You can see on my screen the new image that it has cropped.

Now, both crops I have done so far were on images present on my local system. Let's repeat the same thing on a remote image. For that, I'll give the image URL, which is my first parameter. At the end, I'll give the same upper body part and lower body part - you can see I want to crop from eyes to elbows. I'll run this cell and it will again take 9 to 10 seconds. The same thing is happening: the image is getting downloaded, the model is run in inference mode, and things are getting cropped.

Now you can see that it has done a good job, but not a very good one. The reason may be that the person here is wearing sunglasses and the image resolution is not good. But I'm convinced that for my purpose, this OpenPose model is a good enough model. Up until here, it was all about testing the model, whether my model is doing a good job or not. Now I'm convinced that it is doing a good job.

The next step is to deploy this model on serverless. To deploy it, I'm going to write a file called smartbodycrop.py. This file is injectable code that is injected on the fly into a Docker container running inside OpenWhisk. This file takes four parameters: the model URL (the location of my model), the image URL (the location of my image), the upper body part, and the lower body part. I expect that this code will return x, y, w, and h - the four coordinates from which I'm going to crop.

I'll run this cell and you can see that it has written a file called smartbodycrop.py. Now, before deploying this code, let me test this file locally, to check whether it has been written properly or not. I'm going to test this file by running it using the model that is on my local system. We have not touched the serverless part yet - I mean, we have not touched the remote server yet. Everything is happening right now on my local system.

I'll run this cell, and it takes some time because my model is running in inference mode. You can see that it has returned four parameters: w, h, y, x. It also shows how much time it took. That means the file that I wrote, smartbodycrop.py, is a good file; it's ready to be deployed.

Now, the easiest way to deploy anything on serverless is using the Apache OpenWhisk shell. It is an open source tool, which I will be using for deploying the function. I'm going to download it and install it by running this cell. The next step is configuring my Apache OpenWhisk credentials. I'll put all my credentials in the .wskprops file, which is here, and I'm going to run this cell. Now, deploying the function also expects that you have all the dependent Python files zipped together with the entry file, which I'll do by running this cell. Of course, I have to specify the location of my model here.
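A small sketch of that packaging step: OpenWhisk Python actions with extra files are uploaded as a zip whose entry point is a __main__.py exposing main(). The file names below (smartbodycrop.py, inference.py) follow the demo's narration and are otherwise assumptions:

import shutil
import zipfile

def build_action_zip(zip_path="smartbodycrop.zip"):
    # The entry file must be called __main__.py inside the archive.
    shutil.copy("smartbodycrop.py", "__main__.py")
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write("__main__.py")
        zf.write("inference.py")      # the class doing the actual model work
    return zip_path

# The resulting zip is what gets pushed with something like:
#   wsk action update smartbodycrop smartbodycrop.zip --kind python:3
# (flags shown for orientation only; check your OpenWhisk runtime version)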

Now, you saw the smart body crop action that Dragos showed, and it was broken. I'm going to update that action with all the necessary files and the model that I tested on my local system. So I'm going to update my action. You can see that it says updated; my action has been updated. One last time, I'm going to check whether my action is up to date or not. I can get its status by doing an action get. I'll run this cell. You can see that it's returning all the proper output; that means my function has been updated properly.

The last step is invoking the function, which is like making a function call. For that, I can do an action invoke, and I'll pass the parameters - the URL to my image, the upper body part, and the lower body part - and I'll run this cell. You can see that it has returned a unique function ID. This ID is associated with each unique function call.

Now, the first time my model runs in serverless, it takes some time. That's because it is downloading the model and warming up the container, because the containers are cold. In my case, it is taking around 40 to 50 seconds. I can check what has happened to my function call by copying this ID, which is associated with my function call - which I did - pasting it here, and then getting the status.
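A sketch of what this non-blocking invoke-then-poll flow looks like against the OpenWhisk REST API, assuming a standard deployment; the host, credentials, action name, and parameters are placeholders:

import time
import requests

APIHOST = "https://openwhisk.example.com"          # placeholder
AUTH = ("user", "password")                        # the two halves of the wsk auth key
ACTION = "smartbodycrop"

def invoke(params):
    # Non-blocking invoke returns immediately with an activation ID.
    r = requests.post(
        f"{APIHOST}/api/v1/namespaces/_/actions/{ACTION}",
        params={"blocking": "false"},
        json=params, auth=AUTH)
    r.raise_for_status()
    return r.json()["activationId"]

def wait_for_result(activation_id, timeout=120):
    # Polling before the activation finishes returns an error status, which is the
    # "error message, and that's okay" moment from the demo.
    deadline = time.time() + timeout
    while time.time() < deadline:
        r = requests.get(
            f"{APIHOST}/api/v1/namespaces/_/activations/{activation_id}", auth=AUTH)
        if r.status_code == 200:
            return r.json()["response"]["result"]
        time.sleep(5)
    raise TimeoutError("activation did not complete in time")

# result = wait_for_result(invoke({"image_url": "...", "upper_body_part": "eyes",
#                                  "lower_body_part": "hips"}))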

Now, you can see that it is giving me an error message, and that's okay. That means my function call has not completed yet, so I'll wait some more time. Generally it takes 40 to 50 seconds, but sometimes, if things are running slowly, it may take, let's say, one minute or so. I'll run this again. Now you can see that it has done the proper job; it is returning all four coordinates that I was expecting. You can also get all the other stats, like how much time it took to download the image, how much time it took to download the model, how long the TensorFlow session ran, and so on.

Now, after I have made my function call, if I make a second function call before my container is recycled, you can see that the response is almost instantaneous. You can see that it took only 1 to 10 seconds, instead of waiting for 10 to 15 seconds. And that's the beauty of serverless, that's the beauty of OpenWhisk. Once your containers are hot, it's just like any server-full environment.

I want to tell you that Dragos and I both worked on this, and with the way we were able to collaborate on this demo, and the way we tested and deployed things, my experience as a developer was very good. That is my conclusion for this demo. From here, I'll hand back to Dragos.

Dascalita Haut: Thank you, AK. Let's see how this function works in the real workflow. I'm going to upload three images. AK didn't quite agree with me, but for those of you who saw Silicon Valley, I was really curious: if we put in a hot dog, what's going to happen to the algorithm? Would it explode or would it actually work? Where is the hot dog? I had the image of the hot dog. If you remember the show, there was this algorithm that was all of a sudden able to see the hot dog. So let me put in the first person - get this guy - and I'm also going to put in our hot dog, and let's see what happens.

Now that AK has deployed the function, I can go right now and put in 1,000 images and it will still work. I think this is a very nice thing with serverless: it scales. And yes, it may not warm up so fast; you noticed that the first time it had to download the model and do those things. But it's so peaceful to know that if I have a background job that, in this case, ingests the images, I can go from 0 to 1,000 images, I only pay for that, and then the system shuts down. Besides this, I really love the experience that we were able to do it together. Basically, AK deployed a new model right from the JupyterLab notebook.

Let's see how our models are doing here. If I do a session list, I get this one, 28 seconds, and this one, 9 seconds. I think this is the hot dog, because it took less time - let's see. Just bear with me a little bit. I don't have the image name; it was on top, somewhere on top. Yes, hotdog.jpg. All right, so it looks like the hot dog didn't pass the image quality check, because the image quality - let me go into the session flow - is only 54%, and we only let through images that score more than 61%. So this is very interesting.

Anyway, the algorithm thinks that this may actually be a person - who knows, it was very, very close, I don't know why. So it didn't go down the path of doing the body crop and so forth; it just uploaded the image back into Creative Cloud, so the photographer knows that it didn't go through.

Now let's see how my poor body crop algorithm is doing right now. All right, this is the person. Thank you very much, AK, you saved my day. In terms of tags, let's look at some of them. This is the one that I put in manually, "created with runtime", and it was able to identify fashion, male, adult, and so forth. You can imagine that, if you have pictures, you can even train it to identify the product SKU, but we just didn't have the data for this demo. I think the possibilities are pretty much endless here; you can imagine any workflow you want. If you fall under this event-based workflow, and if latency is not super important for you, then you can build workflows, you can deploy AI models right from a Jupyter Notebook, and do this at scale.

To wrap it up, what have we seen? So we've seen that software 2.0 entangles the model with the code. That we were able to use JupyterLab to assist us in the model development and functions to assist us in deploying the model. And that machine learning engineers can collaborate with software engineers. By the way, we're not here to say that everybody should use JupyterLab, or that everybody should use Apache OpenWhisk. But it's just one workflow that we came up with to make it easy, to streamline how engineers can collaborate on this.

Lastly, with FaaS it's easy to deploy a new AI model as a function. That's what we wanted to share today. So thank you very much for coming. I'll leave you with some conclusions: FaaS platforms are still maturing, but we can make it faster to deploy AI models, build more services, and pay less with serverless.

Kumar: Thank you, everyone.

Questions & Answers

Participant 1: Thanks for the talk, really awesome work. I can see that with this approach, you can definitely iterate and move much faster. I wonder if there's any sacrifice from this approach in terms of service stability, and how to, for example, test the code that you wrote before you push it, things like that?

Dascalita Haut: Yes, that's a very good question. Right now, in our demo, the only testing we've done was in the Jupyter Notebook - you just do Shift+Enter and make sure it works. To some extent, you can write some unit tests; for vision it's quite interesting - what would the unit test look like? You would run inference and then compare the image, and yes, it's still an open ...

Participant 1: Not necessarily in the case of deploying a model, because sometimes you will probably want to add some preprocessing or some business logic before or after you run the model, and for that code you might want to add more testing. And even when it comes to the CV model itself, you might want to run it on some fixed dataset, or run it on some traffic just to observe for a while and see if it breaks, things like that.

Kumar: I can answer this question. In the Jupyter Notebook that I showed, everything that I wrote is in a class, the inference class, at the top - everything that I'm doing happens in a class. Now, you can write your own function, your own class, which handles all the use cases that you just talked about. That's very easy to write. And the way we deploy things inside Adobe is: let's say I want to test first on, say, 100,000 images, and I don't want to do it on my local machine. What I generally do is deploy my model to stage, test it on 100,000 images, and get the results. And then once I'm convinced, I deploy it to production. So all the use cases and all the error handling that you want can be taken care of in the inference class, which I have written here.

Dascalita Haut: We can talk more if you want.

Kumar: Yes, we can talk more.

Dascalita Haut: I think there was a question in the back also, Sangeeta. Thank you. I think it was all the way in the back.

Participant 2: My question is, do you know how often the functions are recycled? Like once they are warmed up, can they be recycled again?

Dascalita Haut: They are, yes - I'd say it's configurable. I'm trying to come up with a number; for AWS Lambda, for instance, it's probably minutes.

Participant 2: Eight minutes or something, right?

Dascalita Haut: I don't know exactly, between 5 and 10, if I'm correct.

Participant 2: So let's say another request comes in within those eight minutes - is it going to, again, kind of extend the time slot, or would it be ...?

Dascalita Haut: After?

Participant 2: Yes.

Dascalita Haut: Yes, you're going to pay for the cold start, and this is indeed something that is a trade-off. You pay for a cold start, but you don't pay when you deploy, you only pay per usage. So yes, it's one of the challenges. Thank you for mentioning this.

Participant 2: How would deployment and versioning work? I understand that this is serverless, but what if we want to add, let's say, a couple more models to do some kind of A/B testing, and then we will want to deploy a new version of the function. Or we have a more complex system where we are using multiple models?

Dascalita Haut: I can give an answer - unless you wanted to take this?

Kumar: Yes. We had an action called smart body crop. You can create your own actions with something like a smart body crop version one action, a smart body crop version two action, a smart body crop version three action. And in those actions, you have different versions of the models. Did I answer your question?

Dascalita Haut: When you create the action here - where was the code to create the action? I'll just get to it here. I can even create a package, but I can also name it body crop v1. I think there could always be improvements to how we solve versioning, but right now, serverless platforms kind of offer this. I know in Lambda, for instance, you have the latest version and you have other versions, and you can say, "my requests on these endpoints should always use this version," something like that. Because, you see, if I name the function v1, I also provide a model URL as a parameter here, so I can actually link a version of the model with a version of the action, one to one.

Participant 3: How does serverless manage the dependency libraries?

Dascalita Haut: That's a great question. Per function? The answer is it does not. It expects that if you have multiple libraries, you just create a zip, and you upload the zip to the function. We actually did this here. I will just show you, maybe we went too fast, it's right here. We actually created a zip because AK created some Python classes, and they had to be zipped together. So we created a zip.

Kumar: You can actually control which version of [inaudible 00:41:45] you are using, which version of TensorFlow you are using - those things you can control. Actually, when we were doing this, there was a library called OpenCV, and we tried to use something like the PIL (Pillow) library instead, because of license issues. So those things you can take care of.

Participant 4: It was a great talk. Usually there are trade-offs in software development. What are the cases where this is a good idea to implement, and what are the cases where it would be a bad idea?

Dascalita Haut: Thank you for this question. Yes, I wanted to put in a table and I missed it. I once saw a slide from AWS. Picture this: serverless makes some choices for you, for the framework, so you just push the code. In this case, we use an action that has the version of TensorFlow embedded in it, and maybe other versions. With containers, you have full control over those versions. Second aspect: when you have a latency-sensitive use case, where an application actually waits for an answer, I wouldn't use this - unless my system behind the scenes figures out how to use a predictive model to pre-warm my containers beforehand. Which you can do if you know you have traffic at 8 a.m.: you just send 1,000 requests at 7:50 or something. So I'd say event-based cases, and cases where latency is not critical for the application.
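A minimal sketch of that "send requests at 7:50 to pre-warm" idea: fire a few concurrent warm-up invocations shortly before the expected peak so the containers are already hot. The URL and payload are placeholders, and the action would need to recognize the warm-up flag:

from concurrent.futures import ThreadPoolExecutor
import requests

WARMUP_URL = "https://openwhisk.example.com/api/v1/web/guest/default/smartbodycrop"
CONTAINERS_TO_WARM = 5   # roughly one warm container per concurrent request

def ping(_):
    # A lightweight request; the action could short-circuit on a "warmup" flag.
    return requests.post(WARMUP_URL, json={"warmup": True}, timeout=60).status_code

def prewarm():
    with ThreadPoolExecutor(max_workers=CONTAINERS_TO_WARM) as pool:
        return list(pool.map(ping, range(CONTAINERS_TO_WARM)))

# Run this from a scheduler (cron, or a periodic trigger) just before the
# expected traffic spike, e.g. at 7:50 for an 8 a.m. peak.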

Participant 5: I just want to understand how the decision of taking down that container is made on the cloud side. I would not want my container to be destroyed, I would like it to be idle and run next time without needing to be warmed up again. But the other side, they have different objectives, right?

Dascalita Haut: I've even seen libraries on GitHub that work with Lambda: let's say you want to keep five functions alive, so you schedule the functions to run every few minutes to keep those five warm. What do I expect to happen? FaaS platforms are still maturing, but what I expect is that teams will work on a predictive model, and if your traffic can be predicted, then the platform will pre-warm ahead of time.

And second, how long it takes to warm up a container should be improved by these platforms. For instance, let's say you deploy a lot of models. Imagine a serverless platform that would give you temporary storage that is very close to the action code - like EFS, Elastic File System, from Amazon, if you're familiar with it, or a network file system - so you don't have to download the model. These things can be made faster; they're not as fast right now.

Participant 5: I can just artificially make a heartbeat kind of request to that container until Amazon kills me?

Dascalita Haut: If you actually have traffic, the containers never die; they just stay warm.

Participant 6: I'm also new to serverless. How portable are these functions between different cloud providers? If I wrote a function on Lambda on AWS, can I move it to Azure Functions or Google Cloud Functions, or do I have to change my code significantly?

Dascalita Haut: I think today, you have to make some small changes. I can try to show you - this was the code. Let me see if I can make this bigger. I don't know how to make this bigger. Let me find the slides. It's usually in the input parameters.

Moderator: While you're looking for that, Dragos, what about things like the Serverless Framework? Part of their goal is to abstract the serverless platform away, isn't it?

Dascalita Haut: There's the Serverless Framework, which is like a manifest where you can define things. At the function code level, in this case with OpenWhisk, I only have one input object. With AWS Lambda, Google Cloud Functions, and Azure Functions, you have a second object, which is called the context, and in the context you may have something else. So you might have to have a wrapper. You could write your function to work on all systems; that would be lovely. I call that "library as a service" - you just publish a library, and if somebody needs it, it's just deployable.

But you have to write a little bit. My hope is that these platforms will standardize on the input, and they will be more portable-ish, because every provider will try to make it easy to use their platform. If you're on Lambda, you will have some pre-existing libraries which won't exist in Azure Functions or Google Cloud Functions. So that's why it's "ish".
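A sketch of that wrapper idea: keep the core logic in one place and add thin, provider-specific entry points. The OpenWhisk and AWS Lambda signatures below are the standard conventions; the core function itself is a placeholder:

import json

def crop_core(image_url, upper_body_part, lower_body_part):
    """Provider-agnostic business logic (placeholder)."""
    return {"x": 0, "y": 0, "w": 100, "h": 100}

# Apache OpenWhisk: a single dict in, a dict out.
def main(params):
    return crop_core(params["image_url"],
                     params["upper_body_part"],
                     params["lower_body_part"])

# AWS Lambda: an event plus a context object; behind API Gateway the payload
# typically arrives as a JSON string under "body".
def handler(event, context):
    body = json.loads(event.get("body") or "{}")
    result = crop_core(body["image_url"],
                       body["upper_body_part"],
                       body["lower_body_part"])
    return {"statusCode": 200, "body": json.dumps(result)}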

Participant 7: What are the guarantees from Amazon on the data I upload there to run as FaaS - that this data is owned by me, so even Amazon is not able to use it for any purpose, to share it, or for commercial uses?

Dascalita Haut: Correct. In this case, we copied the model from an open Dropbox location and actually put it in our own account in Amazon. Yes, so if you train your custom model, you don't want to make it public. In this case, I think you can use any blob storage you name.

Kumar: A machine learning model is just like a file. It can be stored in any secure environment; it doesn't have to be Amazon. For this demo, we uploaded our model to S3, but it could be anything - it could be Dropbox, it could be our own custom provider, something like that. So it all boils down to how secure the platform where you store the model is.

Dascalita Haut: Did that answer your question? We have to end with this question - thank you, thank you very much for highlighting that. I'd say if I were to make a choice right now, I would still use object cloud storage, but there are a ton of providers, like 20 or 30 of them. I will show you: I use a tool called Rclone, and real quickly, you can see the list of object cloud storage providers that it works with. So you can actually make something generic, upload the model anywhere you want, and then just pull the model from there. You should have security there; it's fine, yes.

Moderator: I guess that's orthogonal to serverless, I would say, is it? But anyway.

Participant 7: The model, the data, and the model algorithm should not be accessible to anyone hosting this service.

 


 

Recorded at:

Mar 23, 2019
