Inside a Self-driving Uber



Matt Ranney discusses the software components that come together to make a self-driving Uber drive itself, and how they test new software before it is deployed to the fleet.


Matt Ranney works on Uber ATG's simulation systems to exercise autonomy software before it gets onto the road. Before that, he worked on distributed systems, architecture, and performance at Uber.

About the conference: an AI and Machine Learning conference held in San Francisco for developers, architects, and technical managers focused on applied AI/ML.


1.3 Million People Die in a Car Crash Every Year

So hello, everyone. Thanks for having me. I'm happy I could come to talk to you today. Are we up? No, we're not. Okay, great. So I want to start with a pretty staggering statistic, and that is that 1.3 million people die in car crashes every year. That, on average, is over 3,000 deaths a day. While I'm talking here, 100 people are going to die in a car crash. Now, this very, very staggering number doesn't even include the 20 million people every year that will be injured or disabled from cars. Cars are pretty lethal. If you live in the United States and you're a younger person, it is the most likely way that you will die.
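The per-day figure follows from simple division. A quick check in Python, where the 40-minute talk length is an assumption (it is not stated in the talk) used only to sanity-check the "while I'm talking" estimate:

```python
# Figure quoted in the talk.
deaths_per_year = 1_300_000

deaths_per_day = deaths_per_year / 365
deaths_per_minute = deaths_per_day / (24 * 60)

# Assumed talk length in minutes (hypothetical, not from the talk).
talk_minutes = 40
deaths_during_talk = deaths_per_minute * talk_minutes

print(round(deaths_per_day))      # 3562
print(round(deaths_during_talk))  # 99
```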

So, you can see on this chart here, MV stands for motor vehicle. So the black areas are the categories and the demographics you see, that like if you are between 8 and 24, the most likely way you will die is in a car. This is a really, really big deal. And the reason for all of this is the drivers. It is the human beings. The humans behind the wheels of these cars are causing all of this sort of death and injury. Sometimes like maybe it's the vehicle or the environment, but like the vast majority of this is caused by the humans. So what are these people doing, like what's happening here?

The biggest thing is recognition error, which means you're not paying attention or you just couldn't pay attention, you can't see everywhere at once. It's just there's a lot going on out there on the road. Decision error is number two; that means you made a bad decision, like you decided to break the law. You're driving too fast or you just decided to do something that was unwise. And these are not the kind of decisions that software is going to make; software can fix these problems. And this is the thing that you can work on as an engineer.

And for me personally, I'm sure many of you in the audience have lost family members to car crashes. I know that I have; my parents were in a very severe car crash, one of them died. This is the thing that as engineers, you can change. It's not like making slightly better ads. It's like saving lives.

I mean, this is pretty powerful, not only is it saving lives but also, especially as Americans, we spend a ton of time in our cars. And these cars are incredibly expensive and yet most of the time they sit around doing nothing. So, the financial responsibility it takes to sort of own and operate a car is a pretty big burden for a lot of people, and yet this big investment sits idle most of the time.

Self-Driving Cars Will Give Us Our Time and Space Back

In many, especially American cities, huge amounts of land are devoted to parking. Here are a couple of the worst offenders. The red there is surface parking and the yellow is a parking structure and the green is parks, space for people. So, cars really take up a big chunk of our cities. Automating the driver out of this equation will give us our time and our space back. Okay, cool. It sounds very noble and interesting but why specifically, why is Uber doing this?

So the reason is, it's part of Uber's broader business model to improve urban mobility, to remove the requirement for owning a car to get around in many cities. This is directly in line with that business. But so specifically, why Uber? It's because Uber already has a ride-sharing network. And until these vehicles can go everywhere, they'll be able to go somewhere, and as long as they can operate safely in some smaller area, and there are people that want to use them to get around in that area, we can deploy them there. We have a ride-sharing network, so we can sort of slowly feed these self-driving cars onto this network.

So there are also trucks. These trucks drive a lot. The life, especially of a long-haul truck driver, is very hard. The safety consequences are significant. And so we are looking at applying the same hardware and software to make self-driving trucks, as well. So a big question people often ask about self-driving trucks particularly is “What about all the jobs? Aren't you sort of automating away this like major source of employment?” And so we've done this analysis, and the data is up here on GitHub. I would encourage you to download it and poke around with it, if you like.

A bunch of experts have looked at this and according to this analysis, it's actually likely to increase the number of trucking jobs by automating away these sort of dangerous and hard to do long-haul jobs. There are still plenty of short-haul jobs that are going to take a lot longer to get fully automated and they're easy; they are the kind of jobs that people would rather do. And so, if you have self-driving trucks, how might you deploy them? Well, similarly to ride sharing, on a freight network. So we have a freight network very similar to the ride-sharing network that is currently right now moving freight around, and until the self-driving trucks can go everywhere, they'll be able to go somewhere. And on certain stretches of long-haul routes, some self-driving trucks will augment those portions of that haul.


Okay. So let's get into the hardware. Let's get into the pieces that make this thing go. I've got some slides here that show the car. The stuff on the truck is pretty similar. So you may have seen these before, this is a Volvo XC90 hybrid. It is a plug-in hybrid. It's pretty handy for that, that we can run it off shore power. It's got an ample source of electricity to run all the instruments and looks pretty cool too. So on top, we've got a 360-degree LIDAR. This spins around, paints the world with 64 lasers, and this gives us a 3D image of the surroundings. That's that thing that you see spinning around on top. There are a whole bunch of cameras. There's cameras facing forward, cameras facing backwards, sideways, lots and lots of cameras.

Under the bumper, cleverly concealed are a bunch of radars, and in the back is a bunch of compute storage and cooling. Now, you might think that with all of this gear, this would end up being not a very nice vehicle to be in. But inside the back, it looks like a normal Volvo XC90. That is a picture of just one of the cars when they put the carpeting in place, so still ends up looking pretty good, pretty well-integrated.

So how do people use this thing? Well, the way that you use it, like I said, is on a ride-sharing network. So you would get out the Uber app and you'd hit a button. And if the place where you're going matches the AV's capabilities, we can dispatch an AV to do that trip. And in the back, there's a little tablet that you interact with and off you go. At the moment, we see the safety driver in the front seat. As the capabilities increase and we can prove that it's safer than a human, that person won't need to be there anymore.

So we've got some sensors. They feed into some compute and then they drive the vehicle. So we got a system here that is kind of unique compared to a lot of different kinds of software, in that it is like moving a physical object in the real world, and there's a feedback cycle there, that we move the vehicle through space. It interacts with the world and there's no coordination externally, all the decisions are made onboard. There's no kind of remote large computer somewhere else that decides what to do. But there are some external inputs. People tell us where they want to go and there’s sort of a tablet on the inside.


And so if we dig in a little bit more into the software, what kind of software goes on here? So inside the compute, well, so we start with the sensors. The sensors hook up to something and they hook up to a compute blade. This is an x86 computer like you might find in a data center, fairly powerful and it runs Linux. There are normal things that you might have on a computer like storage and so on. On this local storage, we have our maps, our models, as well as all of the logs. So all the data gets logged. As we take sensor data and make decisions about it, that all gets logged. And running on this compute are a bunch of tasks. These tasks are Linux processes. They might have multiple threads, but there are different tasks that sort of drive the autonomy system. And the tasks talk to each other with an event bus. And so this event bus is not entirely unlike, say, a Kafka. But in this case, it's optimized for exchanging events very efficiently for processes running on the same machine. And the message format that they exchange is not unlike protocol buffers, but it's a different thing that we use for various legacy reasons.
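To make the idea of tasks exchanging events over a bus concrete, here is a minimal publish/subscribe sketch. It runs in a single process and is purely illustrative: the `EventBus` class, the topic name `"objects"`, and the task names are all hypothetical, and the real system described in the talk uses separate Linux processes and its own message format.

```python
from typing import Any, Callable, Dict, List

Handler = Callable[[Any], None]


class EventBus:
    """Toy in-process publish/subscribe bus (hypothetical; not Uber's implementation)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[Handler]] = {}

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers.setdefault(topic, []).append(handler)

    def publish(self, topic: str, event: Any) -> None:
        # Deliver the event to every task subscribed to this topic.
        for handler in self._subscribers.get(topic, []):
            handler(event)


bus = EventBus()
received = []

# Two hypothetical tasks listen to the same stream of perceived objects.
bus.subscribe("objects", lambda event: received.append(("prediction", event)))
bus.subscribe("objects", lambda event: received.append(("logger", event)))

# A hypothetical perception task publishes one event.
bus.publish("objects", {"id": 1, "type": "pedestrian"})
print(received)
```

The publisher does not know who is listening, which is the property that makes it easy to add a task (such as a logger) without touching the others.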

But there's more than one of these compute blades. And so each one runs some different sets of tasks, similarly it's got storage and the tasks on there can talk to each other. But then sometimes the tasks need to communicate and so there is a little bit of networking there. The routers on each blade can talk to each other over an on-vehicle network. So this is kind of interesting because people talk about, "Oh, events are better than RPC in the sort of microservices, like scaling your website. Events are this kind of cool technology." That is indeed what we use for communicating between all the processes on this self-driving car.

So, getting even a little bit deeper into what are the actual tasks that run on this thing? Before we can do that, before we can get into the details of what the function of each of these things is, we have to have a map. And so the map is the way that the vehicle knows where it is. It's the way that it understands what the rules are. And these are not regular maps like you have on your phone with GPS. These are very, very detailed 3D maps. And the way we get these is we use the cars to collect data. So we spin the LIDAR around and we drive roads that we want to operate in autonomously. We calibrate the sensors, that's the cool sensor calibration rig there, and we record the data of what it looks like to drive around on these roads. Then we algorithmically subtract away all of the vehicles and we have humans look at these maps and make sure that they are correct, that all the traffic laws are sort of encoded correctly, that sort of thing. And once all of this happens, we load those maps onto the car. Most of the autonomy capabilities access the map in some way.

Okay, so starting off, there's some kind of input. The rider says like, "I want to go somewhere, please take me there." So the first system that we're going to talk about is navigation. So navigation is how we get around the road network. This takes an input from the Uber dispatch system. So this is where we use regular maps, like the normal maps you see on your phone. These are road networks, like “take me from this address to that address”. And so with navigation, now the vehicle knows where we would like it to go.

So then it's time to figure out where we are. And the way we figure out where we are is we use the sensors. We spin the LIDAR around and then we match what we currently see up to the map, the recorded LIDAR data, and that process is called localization. That's how we figure out where we are. So, this gives us incredibly accurate, compared to GPS anyway, localization down to like single-digit centimeters. This is a reliable way of figuring out where you are even in urban canyons or places where GPS doesn't work so well.
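The idea of matching what the LIDAR currently sees against the recorded map can be sketched as a search over candidate positions. The following is a deliberately simplified illustration, not the method used on the vehicle: it works on a 2D integer grid, searches translation only (no rotation), and scores a candidate by counting scan points that land on map cells. The function name `localize` and all parameters are assumptions made for this example.

```python
def localize(scan, map_cells, guess, radius=5):
    """Return the grid position near `guess` where `scan` best overlaps `map_cells`.

    scan:      list of (dx, dy) points relative to the vehicle
    map_cells: set of (x, y) occupied cells in the prior map
    guess:     coarse (x, y) starting estimate, e.g. from GPS
    """
    gx, gy = guess
    best_pose, best_score = guess, -1
    for dx in range(-radius, radius + 1):
        for dy in range(-radius, radius + 1):
            px, py = gx + dx, gy + dy
            # Count scan points that fall on occupied map cells at this pose.
            score = sum((px + sx, py + sy) in map_cells for sx, sy in scan)
            if score > best_score:
                best_pose, best_score = (px, py), score
    return best_pose


# An L-shaped wall as the "map".
map_cells = {(x, 10) for x in range(0, 10)} | {(10, y) for y in range(0, 11)}

# What the vehicle would see from its true position (3, 4).
true_pose = (3, 4)
scan = [(mx - true_pose[0], my - true_pose[1]) for mx, my in map_cells]

# Start from a coarse, slightly wrong estimate.
print(localize(scan, map_cells, guess=(5, 2)))  # (3, 4)
```

Real systems work in 3D with continuous poses and far more robust matching, but the principle of scoring candidate poses against a prior map is the same.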


So now that we know where we are, we have to figure out what is around us, like where everything is. And for that, we use a system called perception. Perception takes multiple sensor data, data from multiple sensors. It takes the radars, it takes the LIDAR, it takes the cameras, and it fuses them together and produces a single output stream of objects, their type, a classification for those objects as well as their simple trajectory sort of over time. So we want to make sure that we can track an object over time as the same object. And the reason that we use three sensors is every sensor has something that it is particularly good at. I'm not going to get into the details but they're better together than they are separately. So this is mostly learned models.
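Tracking "an object over time as the same object" can be illustrated with a greedy nearest-neighbor association between existing tracks and new detections. This is a toy sketch under stated assumptions: positions are 2D, the 2-meter gating distance is arbitrary, and the function name `associate` is hypothetical. The talk says the real perception system is mostly learned models, which this does not attempt to reproduce.

```python
import math


def associate(tracks, detections, max_dist=2.0):
    """Match detections to existing tracks so object IDs persist across frames.

    tracks:     dict of track_id -> (x, y) from the previous frame
    detections: list of (x, y) positions in the current frame
    Returns a new dict of track_id -> (x, y). Unmatched tracks are dropped,
    and unmatched detections get fresh IDs.
    """
    updated = {}
    unused = list(detections)
    for track_id, pos in tracks.items():
        if not unused:
            break
        nearest = min(unused, key=lambda d: math.dist(pos, d))
        if math.dist(pos, nearest) <= max_dist:
            updated[track_id] = nearest
            unused.remove(nearest)
    next_id = max(tracks, default=0) + 1
    for det in unused:
        updated[next_id] = det
        next_id += 1
    return updated


tracks = {1: (0.0, 0.0), 2: (10.0, 0.0)}
detections = [(10.5, 0.2), (0.4, 0.1), (30.0, 30.0)]
updated = associate(tracks, detections)
print(updated)
```

Here objects 1 and 2 keep their IDs after moving slightly, and the far-away detection becomes a new object with ID 3.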

And in order to build a good model, you need a lot of training data. And so this is labeled data from driving around San Francisco that human beings drew boxes around. And we have a whole lot of labeled data that we used to train all of the detectors. This is why we need the sensors calibrated so well: so that we can provide the labelers with a relevant stream that they can work against. And once we have all that data, that allows us to do this, which is a view of the perception system. And so you can see the yellow are pedestrians and the red are vehicles. And the sort of rainbow radial coming out is the LIDAR landing on something. So you can see how it makes little shadows as the vehicles pass through.

But the thing I want to point out is look at what's going on in this scene; there are 50 pedestrians crossing in front, they're sort of milling around. A vehicle just did this weird U-turn behind us. And if you can imagine, like imagine being a human being driving this car, how many of those objects do you think you can keep track of? I don't know, a few, maybe right in front of you. Maybe your peripheral vision would sort of pick up on some, but we're tracking pedestrians weaving between cars way behind us, and so this is where we can change the game in the safety of operating vehicles because we can track this many objects, and in full 360 degrees, so it doesn't get distracted, doesn't fiddle with the stereo, tracks all those objects.


Now that we figured out where all the objects are, we need to figure out where they're going to be. And so to do that, we have a system called prediction. So prediction is a thing that you also do. As a human driver, you mostly unconsciously make judgments about where you think the different actors in a scene are going to go and you make your own plan accordingly. And that's exactly what this system does as well. This system, for every object in the scene, it produces at least one but potentially multiple predictions with a ranking about where these objects are going to be over the next several seconds. This is both discrete decisions, like “We think this person's going to turn right”, as well as continuous predictions about “Where specifically is this person going to be over the next five seconds?”
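A ranked set of predictions per object, combining a discrete choice (straight, left, right) with a continuous path over the next five seconds, might be represented like this. It is a hand-coded sketch only: the maneuver probabilities and turn rate are placeholder values invented for illustration, and `Prediction` and `predict` are hypothetical names.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Prediction:
    label: str                               # discrete decision, e.g. "turn_left"
    probability: float                       # ranking score (placeholder values)
    path: List[Tuple[float, float, float]]   # continuous path as (t, x, y)


def predict(x, y, speed, heading, horizon=5.0, dt=1.0, turn_rate=0.3):
    """Return ranked hypotheses for where one actor will be over `horizon` seconds."""
    # (label, made-up probability, heading change in rad/s) for each hypothesis.
    maneuvers = [("straight", 0.7, 0.0),
                 ("turn_left", 0.2, turn_rate),
                 ("turn_right", 0.1, -turn_rate)]
    predictions = []
    for label, probability, rate in maneuvers:
        px, py, h = x, y, heading
        path = []
        for step in range(1, int(horizon / dt) + 1):
            h += rate * dt
            px += speed * math.cos(h) * dt
            py += speed * math.sin(h) * dt
            path.append((step * dt, px, py))
        predictions.append(Prediction(label, probability, path))
    return sorted(predictions, key=lambda p: p.probability, reverse=True)


preds = predict(0.0, 0.0, speed=2.0, heading=0.0)
for p in preds:
    t, px, py = p.path[-1]
    print(p.label, p.probability, round(px, 2), round(py, 2))
```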

And we can visualize this a little bit here. So the little dots, the dotted lines, those are the predictions of all the different actors in the scene. So you can see how the vehicles they have the sort of cyan prediction and then the pedestrians will get a sort of orange prediction or a purple one if they're a cyclist. Oh, yeah, here's a more interesting scene - just computing predictions for all of the objects in the scene. So the way that this works is it's a combination of hand-coded algorithms as well as learned models. It's not a pure machine learning solution by any means, but there's definitely some machine learning involved.


So now that we know where we are, we know what's around us, and we know where they're going to be, we want to make a plan for where we should go. And this is the planning system. This is where we have to take into account the navigation. So navigation tells us where we want to go in the road network. And now prediction will tell us places we need to avoid basically because of the other actors. So planning is where we now enforce the traffic laws, the speed limits, the turn restrictions on lanes. That's where all of this gets implemented. And we can see, so the green carpet here, that represents the plan. So that's what our vehicle plans to do for the next, looks like, about 10 seconds there. And you can see, it's sort of constantly updating the plan.
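As a toy illustration of planning under a speed limit while avoiding predicted actor positions, the sketch below picks the fastest candidate speed along a straight lane that never comes within a minimum gap of any predicted obstacle position. Everything here is an assumption made for the example: the one-dimensional lane, the candidate speeds, the 3-meter gap, and the function name `plan_speed`. A real planner reasons over full trajectories, lanes, and many more rules.

```python
def plan_speed(speed_limit, predicted_obstacles,
               candidates=(0.0, 2.0, 4.0, 6.0, 8.0, 10.0),
               horizon=5, min_gap=3.0):
    """Pick the fastest legal speed (m/s) that stays clear of predicted obstacles.

    predicted_obstacles: list of dicts mapping time (s) -> position along the
    lane (m) where another actor is predicted to be at that time.
    """
    for speed in sorted(candidates, reverse=True):
        if speed > speed_limit:
            continue  # enforce the traffic rule
        safe = all(
            abs(speed * t - path[t]) >= min_gap
            for path in predicted_obstacles
            for t in range(1, horizon + 1)
            if t in path
        )
        if safe:
            return speed
    return 0.0  # fall back to stopping


# A pedestrian is predicted to be in the lane, 20 m ahead, at t=3 s and t=4 s.
pedestrian = {3: 20.0, 4: 20.0}
print(plan_speed(7.0, [pedestrian]))  # 4.0
print(plan_speed(7.0, []))            # 6.0
```

With the pedestrian present, 6 m/s would put the vehicle 18 m down the lane at t=3 s, only 2 m from the predicted position, so the sketch falls back to 4 m/s.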

As we make progress, we'll plan to make a left here. And then the plan, of course, also has to get updated as different actors in the scene make different choices. Yeah, it's sort of hard to see there. Anyway, so this is, again, also what you do as a human but you kind of don't think too much about it, sort of an unconscious process. So now we've got a plan for how we want to move the vehicle. And the output of that goes into controls, and controls are the part that don't necessarily understand why they're doing what they're doing. They're just implementing this plan. And these are the parts that connect directly to the vehicle. So this is the steering, the brakes, the accelerator. That is how it works. That is the full pipeline of how the autonomy software works.
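The point that controls just implement the plan without knowing why can be illustrated with a simple proportional speed controller. This is a generic textbook technique chosen for illustration, not something the talk describes; the gain and the acceleration limits are arbitrary assumed values.

```python
def control_step(current_speed, target_speed, gain=0.5,
                 max_accel=2.0, max_brake=4.0):
    """Return an acceleration command (m/s^2) that moves toward the planned speed."""
    error = target_speed - current_speed
    command = gain * error
    # Clamp to what the (assumed) vehicle can physically do.
    return max(-max_brake, min(max_accel, command))


# Follow a planned speed of 4 m/s starting from rest, with 1-second steps.
speed = 0.0
for _ in range(20):
    speed += control_step(speed, target_speed=4.0) * 1.0

print(round(speed, 3))  # 4.0
```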


How do we ever test such software? So this is like a ton of software, and like I said, it's in a feedback cycle with the real world. And that makes it a lot harder to test than traditional software, whatever that means. So the main thing that we did was we built a track because we wanted to be able to test these things in a realistic environment that we can control though, and so we can control every aspect of this environment, and so we built this. And this is a facility in Pittsburgh, and we've got traffic lights. We've got a whole bunch of different vehicles. We've got city buses and emergency vehicles, a whole lot of just any other kind of passenger car that you would see on American roads and we have a set of exercises that the vehicles go through with every new software version. We also have automated obstacles, like this one. And this one; note the material the door is made out of, not a steel door. Also it looks like it's been replaced a couple of times perhaps. This is a reasonable thing to do, like build the track and sort of test the software on a vehicle in the real world.

And yeah, that sounds like hard and it is hard, but it's even harder because it's more than just having a bunch of cars. We also need a bunch of vehicle operators, and these vehicle operators go through a lot of training. They learn all about the autonomy system. They learn how all systems work together. It's a multi-week process of training. They do a bunch of time on the track. Like on our track, they go to an actual racetrack and learn performance driving, like getting out of problems. They spend a lot of time before they actually go out on the road. And when they do, this is how they're operating the vehicle - hands on the wheel, ready to take over. And all of this, this process is very carefully, very expensively done. And boy, it is expensive and time consuming.

Online Testing

This is the kind of testing that we call online testing, and it takes a long time because from the time it takes when you release new software to when you can actually get it on the vehicle, get it on the track, and then get it on the road, it can be weeks. It's a pretty long feedback cycle; like imagine if your unit tests took that long to run, that would be not a good way to build software. Obviously, very expensive. There are physical assets we had to purchase and maintain. We have a huge staff that needs to be trained and work on this thing. It's also kind of inconsistent. Even though we have automated obstacles and we can rerun the same scenarios on the track again and again, they're never exactly the same, they're always going to be subtly different because it's the real world, and the real world has more variables, more inconsistency than computers do. And it's also hard to make sure that we have encountered every possible permutation of driving behaviors that need to be handled.

Offline Testing

However, online testing is absolutely necessary. It's the only way that we can ever tell that it's truly going to work. The software meant to drive a car has to be tested on a car. And this is the part that I work on: simulation. So we call this offline testing because it doesn't use a car. And what we can do to speed this up and work around some of the issues with online testing is we can test the software in a simulator where we get results faster and for a lot less money. Yeah, it's faster because you don't have to wait for it to cycle through and get to the track and into the road, you can just put it right on your computer. It's cheaper, a lot cheaper per mile to run in a simulator than on the road, and it is consistent. You can get the same results, repeatable results, because you control the environment perfectly. You put things where you want them to be and then you will get the same result. And it allows us to get a very consistent understanding of the coverage of the scenarios that we're testing. Much like online testing, it's still absolutely necessary. It's the only way to evaluate the software.

So, now we've seen this is what the stack looks like on the car. So how do we change it to do simulation? Well, if you look at the sensors and the vehicle, we replace them - so one way that we do it is we have this thing called log SIM. We replace the sensors with logged sensor data and we replace the actual vehicle with a vehicle model. So, we replay sensor data back through the whole stack and then we pretend to follow the control output. And we send the pose back into the perception system and we run the software. And this is pretty cool because we can run the full stack, we can run it from perception down. And the problem is that logged actors only do what they did in the log. So if you need a new kind of interaction with an actor because the new software does something different, that is not going to happen because the actors only do what they did in the log.
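The log SIM loop, and its limitation, can be sketched in a few lines. In this toy version the "log" holds the position of a car that followed the vehicle in the original drive, the "software" is any function that returns an acceleration, and the vehicle model is a one-line integrator. All names (`run_log_sim`, `vehicle_model`, `follower_pos`) and numbers are hypothetical.

```python
def vehicle_model(pos, speed, accel, dt):
    """Stand-in for the real vehicle: integrate acceleration into speed and position."""
    speed = max(0.0, speed + accel * dt)
    return pos + speed * dt, speed


def run_log_sim(frames, software, dt=1.0):
    """Replay logged frames through `software` and return the simulated ego positions."""
    pos, speed = 0.0, frames[0]["ego_speed"]
    trace = []
    for frame in frames:
        accel = software(pos, speed, frame)   # new software decides
        pos, speed = vehicle_model(pos, speed, accel, dt)
        trace.append(pos)
    return trace


# Logged drive: the ego vehicle was doing 5 m/s with a car following 10 m behind.
frames = [{"t": t, "ego_speed": 5.0, "follower_pos": -10.0 + 5.0 * t}
          for t in range(1, 7)]


def new_software(pos, speed, frame):
    return -5.0  # the new version decides to stop


trace = run_log_sim(frames, new_software)

# The logged follower cannot react, so it drives straight through the stopped ego.
ghost_overlap = [f["t"] for f, p in zip(frames, trace) if f["follower_pos"] >= p]
print(trace)
print(ghost_overlap)  # [2, 3, 4, 5, 6]
```

The follower's positions come straight from the log, so once the simulated vehicle diverges from what it did in the original drive, the follower passes through it.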

So here's an example - you'll see a ghost car diverge from the car. So a new version of the software decided to stop and then now these other cars are just plowing through us because that's what happened in the log was those cars moved. So this is unfortunate, but it's useful in a great many scenes that don't require sort of different actor interactions. But if they do require actor interactions, that's when we have to go to virtual SIM. The virtual SIM is where we take the sensors, the logged sensors or whatever, and actually all of perception, and we run this in a game engine. We use Unreal and we simulate the world in Unreal and we have the software play this video game of the scenario that we're trying to test. And this is very cool because now we have perfect control of the world algorithmically. If we need a specific type of actor reaction or behavior, that can be programmed. But also, we can explore the space. We can sweep through different parameters and test out scenarios that maybe we haven't even encountered yet.

Here's a simple example of pedestrian interaction. You can vary the speeds and just understand what the performance envelope is and figure out like how this thing is going to work. So one thing that we're not necessarily doing is we're not doing a fancy 3D world like this because we are simulating perception. So what our world mostly looks like is kind of like this, where if we're doing some sensor simulations, so this is a virtual LIDAR and you can see the LIDAR bouncing off objects and that's actually what LIDAR looks like. If you slow it way down, each beam makes like these sort of pulsed dot patterns. So we can simulate in Unreal the way the LIDAR returns would look. And that is a useful technique.
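Sweeping parameters to find a performance envelope can be illustrated with a very small pedestrian-crossing model. The physics here is a simplification invented for this example (constant speeds, a fixed reaction time and deceleration, braking assumed to start from 30 m away), and `pedestrian_scenario` is a hypothetical name; none of the numbers come from the talk.

```python
def pedestrian_scenario(ego_speed, ped_speed, crossing_at=30.0, ped_start=6.0,
                        lane_width=3.5, decel=6.0, reaction=0.5):
    """Return True if the scenario passes (no conflict, or the ego can stop in time).

    Speeds are in m/s and distances in m. The pedestrian starts `ped_start`
    meters from the lane and walks across it at `ped_speed`.
    """
    t_ego = crossing_at / ego_speed                 # when ego reaches the crossing
    t_in = ped_start / ped_speed                    # pedestrian enters the lane
    t_out = (ped_start + lane_width) / ped_speed    # pedestrian leaves the lane
    if not (t_in <= t_ego <= t_out):
        return True  # paths never conflict
    stopping_distance = ego_speed * reaction + ego_speed ** 2 / (2 * decel)
    return stopping_distance <= crossing_at


# Sweep both parameters and collect the combinations that fail.
failures = [(ego, ped)
            for ego in (5.0, 10.0, 15.0, 20.0)
            for ped in (1.0, 2.0, 4.0)
            if not pedestrian_scenario(ego, ped)]
print(failures)  # [(20.0, 4.0)]
```

Only the fastest vehicle meeting the fastest pedestrian fails in this toy model, which is the kind of boundary a sweep is meant to expose.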

The real challenge with these virtual SIMs is that they're incredibly time-consuming to produce. So to make them realistic, you need, like, artists, video game level designers, to sit there with the authoring tools and tediously create all of these. I mean, this just goes on and on for a long time to make a not even that interesting of a scene, just to get all the lines right and then all of the additional details. So building realistic virtual worlds is pretty time-consuming... so this is the kind of view that most engineers look at. It's just sort of simple shapes. So we built a system to take logs like logged data and make virtual SIMs out of them. And so the data is labeled, it runs through the labeling pipeline because we need a lot of labels anyway. And for labeled scenes, we recreate them in the virtual SIM and then that allows us to have reactive actors as well as to permute the behaviors algorithmically.

Okay. So putting it all together, how do you validate new software; how do you make sure that the software that we're going to put on the cars is any better? So somebody writes some new software, they train a new model, and the first thing we do is test it with a simulation suite. There are a whole bunch of different ways that we test. I showed you log SIM and virtual SIM. We also compare against human driving. There's a whole bunch of different simulation steps that it goes through. And, if it performs better, then we promote it to track testing and it spends ... I forget how long it is, but it's days on the track going through the paces and assuming that it's better on the track than the previous version, then we deploy it to the test fleet. And the test fleet is a small subset of the hundreds of vehicles that are on the road. Assuming that that goes well, then it gets deployed to the full fleet and then this whole time we're extracting all the data, so all the data that the vehicles are logging, they are all logging everything and we offload it all. And after it is all offloaded, we look for interesting events. We use those interesting events to inform what the current problems are and like what we want to work on next. And so then the whole thing starts all over. But you can see that this feedback cycle is lengthy, it takes a long time to get all the way through to the full fleet, which is why simulation is so important.
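The promotion sequence described here (simulation, then track, then test fleet, then full fleet) amounts to a series of gates, each requiring the new version to beat the previous one. A minimal sketch, assuming a single "higher is better" score per stage; the scores, the `GATES` table, and the `promote` function are all hypothetical.

```python
# Each gate names the stage being evaluated and the stage it unlocks.
GATES = [("simulation", "track"),
         ("track", "test_fleet"),
         ("test_fleet", "full_fleet")]


def promote(candidate, baseline):
    """Return the furthest stage the candidate software reaches.

    candidate, baseline: dicts of stage -> score (higher is better, made-up metric).
    """
    stage = "simulation"
    for gate, next_stage in GATES:
        if candidate[gate] <= baseline[gate]:
            return stage  # not better than the previous version; stop here
        stage = next_stage
    return stage


baseline = {"simulation": 0.75, "track": 0.75, "test_fleet": 0.75}
candidate = {"simulation": 0.90, "track": 0.80, "test_fleet": 0.70}
print(promote(candidate, baseline))  # test_fleet
```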

So I'll show you an example of a problem that we fixed in simulation. So these are some scenes from Pittsburgh where they're coming down a kind of fairly narrow street and somebody is opening their door and they're getting out and the safety driver had to take over because we're too close. And the problem was not that we didn't detect the person. We actually have a pretty good pedestrian detector. The problem was we didn't detect the door. So the door opening is the thing. So we labeled a bunch more data, retrained, and then before we put this software on the road, we took those exact same logs and we ran the new code against it. And you can see the purple path on the bottom is what the new code did. So the original plan was we were going to get too close to that door. The new plan was we were going to give it enough space. And so we could test this entirely in simulation faster and without having to put it on the road where we're getting too close to people.

So wrapping it all up here, I want to make a plea to everyone here, come work on this problem, this is a huge problem. This is the leading cause of death in the United States for younger people. This is a thing you can help fix and it doesn't have to be all just machine learning or robotics; we need help from multiple disciplines, although machine learning and robotics are definitely useful things. Me personally, I did this. This is the team I was on in the DARPA Grand Challenge 2005. I didn't know anything about self-driving cars then, but I knew about working on cars and I knew about software networks. And so I got on this team and I helped weld and do control systems for the integration of the accelerator and brake. That was super cool. In the process, I learned a lot about how all of these things fit together to make a working system.

And this is, I think, a good example of where we find ourselves now. I'm not saying you should all ... this is 2005. These days things are so much more sophisticated and yet we need so much more help. This is like scaling up. It's becoming real; we need people to build the data pipelines and to do everything from making websites so that people can look at the results down to the actual training of models and evaluating their performance; it is truly a full-stack problem in computing. And it's one that can really save people's lives. So I hope if you're not working in this space, you will consider doing so. And that's all I have. Thanks a lot.

Recorded at:

Jun 27, 2018