Uber Unveils its Realtime Market Platform
Matt Ranney, Chief Systems Architect at Uber, gave an overview of their dispatch system, which is responsible for matching Uber's partners, i.e. drivers, with riders. Ranney explained the driving forces that led to a complete rewrite of this system. He described the architectural principles that underpin it, driven by availability and performance, several of the algorithms it implements, and why Uber decided to design and implement its own RPC protocol.
The old dispatch system was designed for private transportation (1 driver, 1 vehicle, 1 rider) and built around the concept of moving people, but Uber wants to enter new markets, such as goods transportation. Uber also sharded its data by city. As Uber grew and expanded to ever more cities, some of them quite big, it became difficult to manage its data. Finally, the dispatch system had multiple points of failure, a consequence of Uber's hectic growth and the system's scramble to keep up with that growth.
The new dispatch system has two major services: supply, the drivers, and demand, the riders. These services track all the capabilities and the state machines of supply and demand. For instance, the supply service knows how many seats a vehicle has or whether it can fit a wheelchair. The dispatch system has a third service, called Disco (Dispatch Optimization), whose main function is to match supply and demand. Disco enables Uber to "look into the future" and to use information as it comes in. For instance, the old dispatch system only looked at currently available supply. Since most partners are usually busy at any given moment, the set of available drivers was small, which allowed Uber to maintain a global index. The new dispatch system is more efficient, but it requires much more data. Uber wants this new system to handle one million writes a second and a much higher read rate, so it needed to shard its data.
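The capability tracking described above can be pictured with a small sketch. All names and fields here are illustrative assumptions, not Uber's actual API: a supply service keeps each vehicle's capabilities and lets the matching layer filter candidates against a demand's requirements.

```javascript
// Hypothetical sketch of a supply service tracking vehicle capabilities.
class SupplyService {
  constructor() {
    this.vehicles = new Map(); // driverId -> capabilities and state
  }
  update(driverId, info) {
    this.vehicles.set(driverId, info);
  }
  // Return drivers whose vehicle satisfies the demand's requirements.
  candidates(demand) {
    const out = [];
    for (const [driverId, v] of this.vehicles) {
      if (v.seats >= demand.seats &&
          (!demand.wheelchair || v.wheelchairAccessible)) {
        out.push(driverId);
      }
    }
    return out;
  }
}

const supply = new SupplyService();
supply.update('d1', { seats: 4, wheelchairAccessible: false, state: 'available' });
supply.update('d2', { seats: 6, wheelchairAccessible: true, state: 'en_route' });
console.log(supply.candidates({ seats: 4, wheelchair: true })); // [ 'd2' ]
```

A real matcher would of course also consult the driver's state machine (available, en route, on trip) before proposing a match.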
To achieve that kind of scale, Uber chose to use Google's S2 Geometry Library. S2 splits a sphere into cells, each with an id. The Earth is roughly spherical, so S2 can address every square centimeter of it with a 64-bit integer. S2 has two properties that are important to Uber: it is possible to define each cell's resolution, and it is possible to find the cells that cover a given area. Uber uses 3.31 km² cells to shard its data. All this new data enables Uber to reduce wait times, extra driving by partners, and the overall estimated time of arrival (ETA). So, what happens when a rider wants to use Uber? Uber uses the rider's location and S2's area coverage function to look for drivers that can be matched with the rider. Uber then chooses the shortest ETA, taking into account not only the drivers who are currently available, but also those that will become available in time to pick up the rider.
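S2 itself is a full geometry library, but the core idea — fixed-resolution cells addressed by integer ids, plus a covering function for a search area — can be sketched with a simplified flat grid. This is an illustrative stand-in, not real S2; the cell size and id packing are arbitrary:

```javascript
// Simplified stand-in for S2-style cell sharding: bucket coordinates
// into fixed-resolution grid cells and derive shard keys from cell ids.
const CELL_DEG = 0.02;  // illustrative cell width in degrees
const COLS = 18000;     // number of columns: 360 / CELL_DEG

function cellId(lat, lng) {
  const row = Math.floor((lat + 90) / CELL_DEG);
  const col = Math.floor((lng + 180) / CELL_DEG);
  return row * COLS + col; // pack row/col into a single integer id
}

// "Covering" a small search area: the id of the cell containing the
// rider plus the ids of its eight neighbours.
function coveringCells(lat, lng) {
  const ids = [];
  for (let dlat = -CELL_DEG; dlat <= CELL_DEG; dlat += CELL_DEG) {
    for (let dlng = -CELL_DEG; dlng <= CELL_DEG; dlng += CELL_DEG) {
      ids.push(cellId(lat + dlat, lng + dlng));
    }
  }
  return ids;
}

// Nearby points fall into the same cell, so they land on the same shard.
console.log(cellId(0, 0) === cellId(0.001, 0.001)); // true
```

The real S2 covering function returns cells of possibly mixed resolutions that tile an arbitrary region on the sphere; the lookup pattern, however, is the same — query only the shards owning the covering cells.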
The dispatch system is mostly built with Node.js, meaning that it is single-threaded. Uber wants to take advantage of all the cores of a machine, but it also needs to add new nodes to the system with ease. Ranney also argues that servers need to be stateful, or else the datastores won't be able to cope with the load. Uber thus opted to treat all dispatch processes the same, whether they are running on the same machine or not. It built ringpop to handle this problem. Ringpop uses a consistent hash ring, a technique also used by Amazon's Dynamo, memcached and Riak, to distribute state across nodes. To manage cluster membership and failure detection, ringpop uses SWIM, which stands for Scalable Weakly-consistent Infection-style Process Group Membership Protocol. It is the same gossip protocol used by Hashicorp's Serf. Ringpop uses TChannel, also built by Uber, as its RPC protocol.
Most of Uber's architectural choices are driven by availability and performance, as it is easy for drivers and riders to turn to the competition. At Uber, everything has to be retryable, and therefore idempotent, and killable, including databases. Each piece of the system must be built on the assumption that the only way to shut down a process is by crashing. All these constraints also favour small services, so that if any one crashes, the disruption is contained.
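One common way to make an operation safely retryable is an idempotency key: a retry carrying the same key returns the cached result instead of repeating the side effect. The sketch below illustrates that general pattern, not Uber's actual mechanism:

```javascript
// Illustrative idempotency-key pattern: crashing and retrying a
// request must not duplicate its side effects.
class DispatchStore {
  constructor() {
    this.applied = new Map(); // idempotency key -> cached result
    this.trips = [];
  }
  createTrip(key, trip) {
    if (this.applied.has(key)) {
      // Retry of an already-applied request: return the cached result.
      return this.applied.get(key);
    }
    this.trips.push(trip);
    const result = { tripId: this.trips.length };
    this.applied.set(key, result);
    return result;
  }
}

const store = new DispatchStore();
const first = store.createTrip('req-1', { rider: 'r1' });
const retry = store.createTrip('req-1', { rider: 'r1' }); // client retried after a crash
console.log(first.tripId === retry.tripId, store.trips.length); // true 1
```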
The proliferation of small services and their extreme distribution can have an impact on performance: the overall latency of a request is greater than or equal to the latency of the slowest component. Ranney likes the approach of Google's Jeff Dean on this subject. For instance, TChannel supports "backup requests with cross-server cancellation". This means that the same request might be sent to two instances of the same service, with a slight delay between the two. The first instance to reply wins, and the request on the other instance is cancelled to cut redundant work.
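The hedging half of that idea can be sketched with promises. This is illustrative only: it takes the first reply but omits the cross-server cancellation that TChannel performs, so the losing request here simply runs to completion:

```javascript
// Backup-request sketch: fire the primary immediately, fire the backup
// after a short delay, and take whichever replies first.
function callReplica(name, latencyMs) {
  return new Promise((resolve) =>
    setTimeout(() => resolve(`${name} replied`), latencyMs));
}

function withBackup(primary, backup, delayMs) {
  const backupCall = new Promise((resolve) =>
    setTimeout(() => backup().then(resolve), delayMs));
  return Promise.race([primary(), backupCall]);
}

withBackup(
  () => callReplica('replica-A', 200), // slow primary
  () => callReplica('replica-B', 10),  // fast backup
  50                                   // hedge after 50 ms
).then((winner) => console.log(winner)); // replica-B replied
```

The delay bounds the extra load: if the primary answers within the hedge window, the backup request is never even sent.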
Uber's approach to data center failure is ingenious. No data is replicated across data centers, as that would put a lot of constraints on availability and consistency. Instead, Uber uses the drivers' phones to distribute the data. Since a driver's phone posts a location update to the server every four seconds, the server periodically replies with an encrypted state digest. If a data center fails, the driver's phone will contact a new data center to post a location update. The new data center doesn't know anything about this particular driver, so it asks the phone for the state digest and picks up from there.
As mentioned before, the dispatch system is mostly built with Node.js, but Ranney mentioned that Uber wants to switch to io.js, a Node.js fork. Ranney also briefly talked about other components of Uber's architecture. Maps and ETAs are written in several languages, such as C++ and Java, due to the need to integrate with different kinds of services. All the business logic is written in Python. Uber is building its own column-oriented distributed data store, but it also uses Postgres, Redis, MySQL and Riak.