
Deep Learning on Microcontrollers


Summary

Pete Warden discusses why Deep Learning is a great fit for tiny, cheap devices, what can be built with it, and how to get started.

Bio

Pete Warden is technical lead on the mobile and embedded side of TensorFlow, and was previously CTO of Jetpac, acquired by Google in 2014.

About the conference

QCon.ai is a practical AI and machine learning conference bringing together software teams working on all aspects of AI and machine learning.

Transcript

Warden: I'm going to be talking about running deep learning on really tiny devices, in less than 100 kilobytes. I'm a tech lead on the TensorFlow team, and I've been with TensorFlow since the beginning. Initially, I worked mostly on the mobile side of TensorFlow, getting things running on Android and iOS, but over the last year or two I've been working much more on the embedded side of TensorFlow.

Why am I here? Why am I talking about running TensorFlow, running machine learning, on these embedded devices? It's not something people usually associate with deep learning; you usually think about running in the cloud, in massive data centers. So why is this such an important topic? Why does this matter?

The biggest number that jumps out at me is that there are already 150 billion embedded devices out there in the world, which by far dwarfs any other computing platform. Not only that, the number is growing at roughly 20% every year. The good news is that we've connected a lot of the world, but that means the number of internet users is not growing nearly as fast as it was over the last 10 years. Smartphone sales have actually been pretty flat, so the growth there has been very small.

In terms of where the compute is happening, these embedded devices are by far the most common platform that's out there. There's a massive amount of compute sitting out there in the world already, these are increasingly 32-bit devices that actually have a lot of capability, they're able to do a lot of computation. It's a really interesting platform that isn't really being used anywhere near as much as it could be.

You may have noticed I haven't used the word "edge" when I've been talking about this. It really bugs me when we talk about edge computing. This is just a random diagram I picked up when I searched for "edge computing," and there's something missing from it: the people who are actually using these devices. If you put the people on here, they would be shoved off the edge. I don't think we are at the edge when we're doing our computing; I think our users, the people who are actually doing this stuff, should be at the center of all of our diagrams, and everything further away from them is what should be considered the edge. I see "edge" as a very cloud-centric way of thinking about the world. We actually want our compute and our data to be near us so that we can have interactivity and control over it. I really try to avoid the word "edge" because I think it marginalizes what's actually happening here.

Why ML: Energy

There are a lot of embedded computers out there, but why does machine learning matter for these devices? There's one big sticking point for all of them, and that is energy usage. Of these 150 billion devices, almost none are wired into mains power. They rely either on a battery or, increasingly, on energy harvesting, so they don't even need a battery. The reason is pretty simple: as soon as something relies on a battery, somebody has to change or recharge that battery.

We already have 10 or 20 of these devices for every single person in the world, and if that number keeps growing, we will soon be spending all of our time running around changing batteries for our robot overlords. Anything with batteries you need to change, wires you need to run, or anything else you need to touch rapidly becomes unworkable in terms of what we can actually manage.

One of the things I've found interesting is that even in places where you think, "Hey, we've got access to power here," like factory floors, you very often find it extremely hard or extremely expensive to tap into that power. Having something you can peel and stick, for example for predictive maintenance, is a much more practical solution than going through all of the processes required to plug something into mains power. So there's a big bottleneck on deploying these devices: energy usage is crucial, and you have to keep it really low.

One of the things I've discovered over the last couple of years of working in this area is that there are very hard limits on how little power it takes to transmit data. Even sending data over something like Bluetooth Low Energy, just a few meters, uses tens to hundreds of milliwatts. To give you an idea of what that means: if you are below one milliwatt, you can probably run on a coin battery for weeks to months. That's the minimum you want to reach for if you're designing a device that's going to be out in the world for a long time, possibly on energy harvesting, but definitely on a battery.
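To make the milliwatt figures concrete, here is a back-of-the-envelope sketch of how long a coin cell lasts at a given average power draw. The capacity (about 225 mAh) and voltage (3.0 V) are assumptions typical of a CR2032-class cell, not figures from the talk, and real cells deliver less under heavy current draw, so treat the results as optimistic upper bounds.

```python
# Rough coin-cell runtime at a constant average power draw.
# ASSUMPTION: CR2032-class cell, ~225 mAh nominal at 3.0 V (illustrative values).

COIN_CELL_CAPACITY_MAH = 225.0  # assumed nominal capacity
COIN_CELL_VOLTAGE_V = 3.0       # assumed nominal voltage


def battery_energy_joules(capacity_mah: float, voltage_v: float) -> float:
    """Convert a capacity rating (mAh) at a nominal voltage into joules."""
    amp_hours = capacity_mah / 1000.0
    watt_hours = amp_hours * voltage_v
    return watt_hours * 3600.0  # 1 Wh = 3600 J


def runtime_days(average_power_mw: float,
                 capacity_mah: float = COIN_CELL_CAPACITY_MAH,
                 voltage_v: float = COIN_CELL_VOLTAGE_V) -> float:
    """Days until the cell is empty at a constant average power draw (ideal cell)."""
    energy_j = battery_energy_joules(capacity_mah, voltage_v)
    seconds = energy_j / (average_power_mw / 1000.0)
    return seconds / 86400.0


if __name__ == "__main__":
    for label, mw in [("radio streaming", 100.0),
                      ("low-end radio", 20.0),
                      ("sensing + local inference", 1.0)]:
        print(f"{label:>26}: {runtime_days(mw):7.2f} days at {mw} mW")
```

With these assumed figures, the cell holds about 2,430 J: roughly four weeks at 1 mW, but under seven hours at 100 mW, which is why staying below a milliwatt matters so much.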

There seems to be a very hard floor here: none of these radio transmission technologies has managed to get much below the range of high tens to low hundreds of milliwatts. It's extremely hard to send data even a short distance, and once you get into sending cellular data, it gets really tough.

What's interesting is that the sensors that capture that data can run with very low power usage, well below a milliwatt and into the microwatt range, whether for video capture, audio capture, accelerometers, or all the other things you would expect. Running arithmetic on the CPUs in these embedded processors is also extremely cheap in terms of energy, and it's getting cheaper. You can already do millions of operations per second for well under a milliwatt, and there are lots of signs that it's going to get even cheaper, so you'll be able to do much more processing.

So you have a situation where you can capture lots of data and process lots of data, but if you ever try streaming it to the cloud or sending it anywhere, you rapidly drain the battery. That means almost all the data currently being captured is being dropped on the floor. These sensors are capable of amazing data capture, but currently we can't do anything with it.

One example that really stuck with me, to make this more concrete: I was talking with some people who were working on satellites, and they were using cell phone components for their satellites. I was like, "Oh, that's awesome." They had image sensors up there that could capture HD resolution at a really high frame rate; all of the wonderful things that have come from innovation in smartphone cameras, you can actually do on the satellites. They shuffled their feet and said, "Well, yes, we can capture at that resolution and at that frame rate, but we can only send a few hundred megabytes down to the base station every few hours."

Because of how hard it is to transmit data, especially in their case, over the few hundred miles from low Earth orbit down to the ground, they were, and still are, unable to take advantage of the amazing sensors they've launched into orbit. It means they're sending down really low-res images.

To give you an idea of how machine learning could help with that, here are some practical examples that excite me: "Hey, a lot of these images are of clouds. Let's throw away the cloud images immediately, on the satellite, so you don't have to download them." Going further: "Hey, most of the images are of blank sea. But if there's a ship or some other feature in there, let's zoom in on it and send a much higher-res image of just the parts that are interesting."

It's the idea that if you can do some clever things where the data is being captured, where you have cheap processing power and cheap sensor data capture, then you have a chance to turn all of this really messy data into something actionable: something that's a lot easier to transmit and a lot more valuable when you do transmit it.
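The satellite triage idea can be sketched as a simple decision rule over image tiles. This assumes two small on-board models that score each tile for cloud cover and for "ship or other feature"; the `Tile` type, both scores, the thresholds, and the byte sizes are all hypothetical, chosen only to illustrate discarding cloud and sending full resolution just for interesting regions.

```python
from dataclasses import dataclass
from typing import List, Tuple

# HYPOTHETICAL thresholds; real values would be tuned on mission data.
CLOUD_THRESHOLD = 0.8   # at or above this, treat a tile as mostly cloud
OBJECT_THRESHOLD = 0.5  # at or above this, treat a tile as containing something of interest


@dataclass
class Tile:
    """One tile of a captured frame plus scores from hypothetical on-board models."""
    row: int
    col: int
    cloud_score: float   # 0..1 from a hypothetical cloud classifier
    object_score: float  # 0..1 from a hypothetical ship/feature detector


def plan_downlink(tiles: List[Tile]) -> Tuple[List[Tuple[int, int]], int]:
    """Decide on the satellite which tiles deserve a full-resolution downlink.

    Returns (coordinates of tiles to send at full resolution,
             number of tiles discarded as cloud).
    Everything else is covered only by a low-resolution overview image.
    """
    full_res: List[Tuple[int, int]] = []
    discarded = 0
    for tile in tiles:
        if tile.cloud_score >= CLOUD_THRESHOLD:
            discarded += 1                          # drop cloud before it costs bandwidth
        elif tile.object_score >= OBJECT_THRESHOLD:
            full_res.append((tile.row, tile.col))   # zoom in on the interesting part
    return full_res, discarded


def downlink_bytes(num_full_res: int, tile_bytes: int, overview_bytes: int) -> int:
    """Bytes needed for one low-res overview plus the selected full-res crops."""
    return overview_bytes + num_full_res * tile_bytes


if __name__ == "__main__":
    tiles = [Tile(0, 0, 0.95, 0.10), Tile(0, 1, 0.05, 0.90),
             Tile(1, 0, 0.10, 0.02), Tile(1, 1, 0.90, 0.70)]
    full_res, discarded = plan_downlink(tiles)
    print(full_res, discarded)  # [(0, 1)] 2
    # Illustrative sizes only: selective downlink vs. sending every tile at full res.
    print(downlink_bytes(len(full_res), 500_000, 200_000), "vs", len(tiles) * 500_000)
```

Checking cloud before anything else is a deliberate choice in this sketch: a tile the cloud model is confident about is dropped even if the object detector fires on it.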

Demo: How is This Done?

To give you an idea of what you can actually do with machine learning on a very small device, I will say, "Yes." That's a $15 board you can buy from SparkFun. It has a Cortex-M4 microcontroller, a couple of microphones, and a coin battery, and it's able to run on that coin battery for weeks, listening for the wake word.

In this case, I've just used the wake word "Yes," but you can imagine customizing it for whatever word you'd want an interface to wake up with. It uses open source TensorFlow code, so you can grab the code yourself and buy the device from SparkFun for $15; it comes pre-flashed with this demo, and you can alter it and do whatever you'd like with it.

What Are the Challenges?

To get this working, the challenge of running on these tiny devices is that they have very small amounts of memory to work with. You have less than 100 kilobytes of RAM and less than 100 kilobytes of storage, running at fewer than 10 million arithmetic operations per second. You can't rely on having fast floating-point hardware, and there's no operating system; this runs on bare metal. There isn't even a way to do things like malloc or new to allocate memory, or any of the other things you might be used to having with an operating system, let alone files or anything else you'd hope for.
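Without malloc, the usual embedded pattern is to reserve one fixed block of memory up front and carve every buffer out of it. As far as we know, TensorFlow Lite for Microcontrollers works along these lines, asking the application for a fixed "tensor arena"; the sketch below is a much simpler, hypothetical bump allocator that shows the general idea, not the library's actual memory planner. The buffer sizes in the usage are illustrative.

```python
class StaticArena:
    """Bump allocator over one fixed-size buffer reserved up front.

    Simplified illustration of the 'no malloc' constraint: nothing is ever
    freed individually, and running out of space is a hard error.
    """

    def __init__(self, size_bytes: int, alignment: int = 16) -> None:
        self.buffer = bytearray(size_bytes)  # reserved once, like a static array in C
        self.alignment = alignment
        self.offset = 0                      # next free byte

    def allocate(self, num_bytes: int) -> memoryview:
        # Round the start up to the alignment boundary.
        start = (self.offset + self.alignment - 1) // self.alignment * self.alignment
        end = start + num_bytes
        if end > len(self.buffer):
            raise MemoryError(f"arena exhausted: need {end} bytes, have {len(self.buffer)}")
        self.offset = end
        # A view into the arena, not a copy: writes land in the shared buffer.
        return memoryview(self.buffer)[start:end]

    def used(self) -> int:
        return self.offset

    def reset(self) -> None:
        """Release everything at once, e.g. before setting up a different model."""
        self.offset = 0


if __name__ == "__main__":
    arena = StaticArena(10 * 1024)         # illustrative 10 KB arena
    features = arena.allocate(1960)        # e.g. an int8 input "image"
    scratch = arena.allocate(4000)         # e.g. intermediate activations
    print(arena.used(), "of", len(arena.buffer), "bytes used")
```

Failing loudly when the arena is too small is the point: on a device with no operating system, you want to discover the memory budget at setup time, not partway through inference.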

Model Design

One of the things we needed to do was, "Hey, figure out how to fit this into 20 kilobytes," because we needed a model small enough to fit on this device and to actually run on it. Part of the inspiration came from my own experience. When I first joined Google back in 2014, I discovered a whole bunch of projects within Google that, as an outsider, I hadn't known about, but the one that really blew my mind was that the speech recognition team behind OK Google was already running deep learning models that were only 13 kilobytes in size.

That was amazing to me because, at that time, I was coming from a world where image models were many megabytes at a minimum, and they were running these on DSPs. These were very pragmatic engineers, not head-in-the-clouds researchers; they were hard-bitten embedded microcontroller programmers. They had spent a lot of time experimenting and had found that deep learning was the most practical approach, even on these really tiny devices.

That was not something that was widely known at the time, but it really gave me the itch to try this out in all sorts of other areas, to get this technology into other people's hands and see what they could do with it. To fit this sort of model into that sort of size, we had to quantize it. We didn't have floating-point hardware to use here, so we quantized it down to eight bits.

One of the interesting things we discovered is that one of the most common ways to do this is, essentially, image recognition. You take the raw sample data and turn it into a frequency-domain image: you slice the last second into chunks of roughly 20 milliseconds, run an FFT on each chunk, and figure out what the frequencies are. Then you run a very simple convolutional network, very familiar if you've done image recognition, on that spectrogram. We have a tutorial showing you how to do this. It uses fewer than 400,000 arithmetic operations per inference, so it's comparatively lightweight next to a lot of models you'd see out there.
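The audio-to-image step and the eight-bit quantization can be sketched together in pure Python. This is a minimal illustration under stated assumptions: non-overlapping 20 ms frames, a Hann window, a naive DFT in place of an FFT (same values, just slower), log compression, and simple per-tensor affine int8 quantization. The production TensorFlow Lite for Microcontrollers feature pipeline differs in its details, so treat this as a teaching sketch, not that implementation.

```python
import cmath
import math
from typing import List, Tuple


def frame_signal(samples: List[float], frame_len: int, stride: int) -> List[List[float]]:
    """Split audio into fixed-length frames (a trailing partial frame is dropped)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, stride)]


def dft_magnitudes(frame: List[float]) -> List[float]:
    """Magnitude spectrum for bins 0..N/2 via a naive O(N^2) DFT of a Hann-windowed frame."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def spectrogram(samples: List[float], sample_rate: int,
                frame_ms: int = 20, stride_ms: int = 20) -> List[List[float]]:
    """One row of frequency magnitudes per ~20 ms slice: the 'image' the CNN sees."""
    frame_len = sample_rate * frame_ms // 1000
    stride = sample_rate * stride_ms // 1000
    return [dft_magnitudes(f) for f in frame_signal(samples, frame_len, stride)]


def quantize_int8(values: List[float], scale: float, zero_point: int) -> List[int]:
    """Affine quantization: q = round(x / scale) + zero_point, clamped to [-128, 127]."""
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in values]


def dequantize_int8(q: List[int], scale: float, zero_point: int) -> List[float]:
    """Approximate inverse: x ~= (q - zero_point) * scale."""
    return [(v - zero_point) * scale for v in q]


def quantize_spectrogram(spec: List[List[float]]) -> Tuple[List[List[int]], float, int]:
    """Log-compress magnitudes, then map the observed range onto int8."""
    logs = [[math.log1p(m) for m in row] for row in spec]
    # The range must include 0 so that 0.0 is exactly representable.
    lo = min(0.0, min(min(row) for row in logs))
    hi = max(0.0, max(max(row) for row in logs))
    scale = (hi - lo) / 255.0 if hi > lo else 1.0
    zero_point = round(-128 - lo / scale)
    return [quantize_int8(row, scale, zero_point) for row in logs], scale, zero_point


if __name__ == "__main__":
    rate = 1000  # tiny illustrative sample rate so the naive DFT runs quickly
    tone = [math.sin(2 * math.pi * 100 * i / rate) for i in range(rate)]  # 1 s at 100 Hz
    spec = spectrogram(tone, rate)
    q, scale, zp = quantize_spectrogram(spec)
    print(len(spec), "frames x", len(spec[0]), "bins; scale", scale, "zero point", zp)
```

Each bin here is `sample_rate / frame_len` = 50 Hz wide, so the 100 Hz test tone peaks in bin 2 of every frame. Storing the result as int8 costs a quarter of the memory of float32, and the quantization error is at most half a step (`scale / 2`) for values inside the range.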

Software Design

The other challenge we faced was that TensorFlow Lite, despite having "Lite" in its name, was aimed at Android and iOS phones. In those circumstances, "lite" means, "Hey, if you're 200 to 300 kilobytes, that's pretty lightweight for a framework that's going to be added to an app." That's more memory than a lot of these devices have in total. We still haven't figured out a really good name for this; we're calling it TensorFlow Lite for Microcontrollers, but we had to figure out how to put TensorFlow Lite on even more of a diet.

We also had a lot of dependencies, as you would, on POSIX functions and standard C and C++ library calls. We used malloc, new, and things like that, because you would, and these are all things you can't rely on in the microcontroller world. At the same time, we didn't want to do something completely different, because there's a whole bunch of stuff we wanted to carry over from mainline TensorFlow. It already has a whole bunch of op implementations, a well-documented API, a file format, and a lot of conversion tooling. A lot of the work here is actually going from the Python environment of TensorFlow down to something that can be run as an inference model.

We really didn't want to lose all of those advantages; we wanted to have our cake and eat it too. What we ended up doing was modularizing the existing code and breaking it into smaller pieces, so that we could use just the pieces we needed on the microcontroller side. We made a very clear distinction between the API definitions and the implementations, and we wrote a lot of reference code: deliberately non-optimized, very simple code that takes up very little room and uses very few specialized dependencies. On top of that, we added a new runtime layer for microcontrollers.

The other thing: if you look at TensorFlow Lite for Microcontrollers, you might see that we only have a handful of ops implemented. That's because we really wanted to focus on getting one example running end-to-end, rather than a much broader but shallower implementation. We were aiming for being able to yell "Yes" at the microcontroller and have it light up the little yellow light at least some of the time, rather than something much less focused.
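The split between API definitions and implementations, with only the needed ops included, is the idea behind TensorFlow Lite's "op resolver," which maps the ops a model uses to kernel code. The sketch below is a hypothetical, heavily simplified Python version of that idea, not the real C++ API: ops are looked up by name, a plain reference kernel is registered first, and a hypothetical vendor kernel can replace it without the model changing.

```python
from typing import Callable, Dict, Iterable, List

# A kernel takes a flat list of values and returns a flat list (simplified on purpose).
Kernel = Callable[[List[int]], List[int]]


def reference_relu(xs: List[int]) -> List[int]:
    """Plain, portable reference kernel: easy to read, test, and port."""
    out = []
    for x in xs:
        out.append(x if x > 0 else 0)
    return out


def vendor_relu(xs: List[int]) -> List[int]:
    """HYPOTHETICAL stand-in for a vendor-optimized kernel; must match the reference."""
    return [max(0, x) for x in xs]


class OpResolver:
    """Maps op names used by a model to kernel implementations (hypothetical sketch)."""

    def __init__(self) -> None:
        self._kernels: Dict[str, Kernel] = {}

    def register(self, op_name: str, kernel: Kernel) -> None:
        # Registering again under the same name replaces the kernel, which is
        # how an optimized implementation could slot in behind the same API.
        self._kernels[op_name] = kernel

    def lookup(self, op_name: str) -> Kernel:
        try:
            return self._kernels[op_name]
        except KeyError:
            raise NotImplementedError(f"op {op_name!r} is not registered") from None

    def missing_ops(self, model_ops: Iterable[str]) -> List[str]:
        """Ops the model needs that this build does not provide."""
        return [op for op in model_ops if op not in self._kernels]


if __name__ == "__main__":
    resolver = OpResolver()
    resolver.register("RELU", reference_relu)
    print(resolver.missing_ops(["CONV_2D", "RELU", "SOFTMAX"]))  # ['CONV_2D', 'SOFTMAX']
    resolver.register("RELU", vendor_relu)  # swap in the "optimized" version
    print(resolver.lookup("RELU")([-3, 0, 5]))  # [0, 0, 5]
```

Keeping the reference kernel around matters even after an optimized one exists: it serves as the ground truth that unit tests compare the optimized version against.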

What Does This Mean in Practice?

So you have something that lights up a little yellow light; what does this actually mean in practice? It's really about opening up a whole bunch of applications that have never been possible to build before. One of my dreams, just on the voice side, is: can we have a 50-cent chip that runs for a year on a coin battery and does voice recognition? That isn't a million miles away; it's coming into view. One of the things we recently launched at Google for the Pixel was server-quality on-device voice transcription: the same quality as the best models we've managed to put in the cloud, now running on Pixel phones. That's an 80-megabyte model, and it requires a high-end Arm Cortex-A-class CPU to run.

It's still not something you can run on microcontrollers, but you can see that it's getting into the realm of possibility. There are no big theoretical reasons why, with a few more advances in hardware and software, we can't get to that point. Then you suddenly have a component you can use everywhere, giving a voice interface to anything out in the world for basically the same cost as adding a button or switch.

It also means you can start to think about all sorts of other applications, such as spotting pests in agriculture. One group I've been talking to, a nonprofit, is able to distinguish different species of mosquitoes, some of them disease-carrying, as well as other insects, by the frequency of their buzz. As I mentioned, I've also been working with some people who are trying to put these things into space. There are some really fascinating nanosatellites that weigh less than 10 grams each, so they have massive power constraints. If you could send them up there doing image processing and sending back just the information you need, that gets really interesting.

Just to give you an idea, and I won't dwell on this code, machine learning code is not as scary as you might think; the fundamental ideas behind machine learning are very straightforward mathematics. This is one of the more complicated operations, but if you write it out as reference code, it's basically a bunch of for loops.
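As an illustration of "basically a bunch of for loops," here is a minimal reference 2D convolution written out directly. It is a sketch, not the code from the slide: it assumes a single image, no padding ("valid" convolution), integer inputs, and the layouts `input[y][x][channel]` and `filters[out_channel][ky][kx][in_channel]`, all chosen for this example.

```python
from typing import List

Tensor3 = List[List[List[int]]]        # [height][width][channels]
Tensor4 = List[List[List[List[int]]]]  # [out_channels][kernel_h][kernel_w][in_channels]


def conv2d_reference(inputs: Tensor3, filters: Tensor4, bias: List[int],
                     stride: int = 1) -> Tensor3:
    """Direct 2D convolution with no padding: six nested loops, nothing clever."""
    in_h, in_w, in_c = len(inputs), len(inputs[0]), len(inputs[0][0])
    out_c, k_h, k_w = len(filters), len(filters[0]), len(filters[0][0])
    out_h = (in_h - k_h) // stride + 1
    out_w = (in_w - k_w) // stride + 1
    output = [[[0] * out_c for _ in range(out_w)] for _ in range(out_h)]

    for out_y in range(out_h):
        for out_x in range(out_w):
            for oc in range(out_c):
                acc = bias[oc]  # accumulate into a wide integer, start from the bias
                for ky in range(k_h):
                    for kx in range(k_w):
                        for ic in range(in_c):
                            in_y = out_y * stride + ky
                            in_x = out_x * stride + kx
                            acc += inputs[in_y][in_x][ic] * filters[oc][ky][kx][ic]
                output[out_y][out_x][oc] = acc
    return output


if __name__ == "__main__":
    image = [[[1], [2], [3]],
             [[4], [5], [6]],
             [[7], [8], [9]]]            # 3x3, one channel
    box = [[[[1], [1]], [[1], [1]]]]     # one 2x2 filter of ones
    print(conv2d_reference(image, box, bias=[0]))
```

The 2x2 filter of ones just sums each 2x2 window, so the output is `[[[12], [16]], [[24], [28]]]`. A quantized implementation would typically use int8 inputs and weights with an int32 accumulator and then rescale, but the loop structure stays the same, which is what makes this kind of code straightforward to port and then optimize per platform.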

This is why having the reference code is so important for us here, and why we're really trying to get this into people's hands and democratize writing this sort of code. You shouldn't have to understand the underlying machine learning to port this code to new platforms. What we're trying to do is provide reference code and unit tests. We're not trying to optimize for every single microcontroller platform, because there are a lot of them; instead we're trying to make the code really understandable and have the vendors themselves optimize it.

There is No Killer App

What do I really want you to take away from this? We're still figuring out what the killer app is going to be for these microcontrollers. But given how many of them are out there, their capabilities, and the fact that we're dropping so much of the data we capture on the floor, I'm certain that incredibly strong applications are going to emerge. The one that seems closest to reality is a voice interface on everything: the idea that you should be able to walk up to anything we manufacture, talk to it, and have a useful conversation with it.

There's so much more we could get from vision sensors, accelerometers, and audio sensors that we're not making use of yet, and we would love to figure out, "OK, what can we actually do?" One of the reasons I'm up here talking is that I'm betting a bunch of people in this room have problems that might be helped by this new idea of using machine learning on microcontrollers to make sense of the world right where the data is being gathered.

I'm really hoping this might spark some ideas about things you want to solve, and start some discussions and collaborations around, "Hey, wouldn't it be cool if we could actually do this?"

Think about Your Domain

I'm hoping you'll go away and think: in your domain, if we had this tiny, cheap chip that ran essentially forever doing machine learning on sensor data, what cool things would you be able to do?

This code is all open source, so you can grab it. I will be sharing the slides after this, so you don't need to try to write down these URLs; we have some docs and examples there too. Please do reach out to me with any questions, things you want to talk about, or ideas you want to run by me around this work.

Questions and Answers

Participant 1: A very interesting talk, thank you. The basic premise, if I understand correctly, is that you know what sort of model can be used, and then you downsize it to fit on the device. The problem I face, and that I've seen other companies face, is that a lot of data is being generated, but we don't know what model to use. It's really hard to collect all that data and transfer it back to the cloud to figure out the model first so that it can be deployed. Have you seen any best practices around that?

Warden: It's one of the biggest problems we face, especially in the open-source world, because there are very few open-source data sets for things like predictive maintenance. When you're training a model, you really want data that comes from the device you're actually going to deploy on, because it will have its own distinctive noise characteristics and ranges. It's a big challenge. One of the things I had to do to get this voice demo working was collect our own open-source data set, trying to get 100,000 utterances from volunteers all over the world talking into their computer microphones, just to be able to train something up. We still need far more data than we actually have in the open. The biggest issue with the accuracy of the model is that we don't have enough data to train it with.

One of the things I'm hoping for with platforms like the SparkFun Edge board that we're passing around is that, if we can at least have some standardized devices for things like accelerometers, we'll be able to generate public data sets that work well on those, and then maybe figure out how to translate them to other devices if we need to. At least we'd have one or a few common prototyping platforms where we can share data sets with each other.

That's still by far the biggest problem: how do you actually gather the data? At the moment you have to do a bunch of data gathering before you can even show any value, so it's very hard for a company to say, "Hey, we need to spend six months and all of this money gathering data, and at the end of it, maybe we'll have a model." I wish I had better answers, but I feel your pain.

Participant 2: Where do we get the slides from?

Moderator: They will be put up on the InfoQ website together with the talk we recorded, and it will be put online later. If you paid to get here, you get early access to things.


Recorded at: Jun 04, 2019

Pete Warden
