Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ


Choose your language

InfoQ Homepage Presentations Massively Scaling MySQL Using Vitess

Massively Scaling MySQL Using Vitess



Sugu Sougoumarane gives an overview of the salient features of Vitess, and at the end, covers some advanced features with a demo.


Sugu Sougoumarane is the Co-Founder and CTO at PlanetScale. He is the co-creator of Vitess, and has been working on it since 2010. Prior to Vitess, he worked on scalability at YouTube and was also part of PayPal in the early days. His recent interest is in distributed systems and consensus algorithms.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.


A quick show of hands how many of you have heard of Vitess before? If you wonder what Vitess is and you go search Hacker News, you'll probably not find much. If you go search Stack Overflow, again, you'll see a few references here and there. Twitter, maybe a few more, but then that's because Vitess has been mostly an open secret that very few people know about. Because, if you go and ask serious hardcore software companies what they are doing to solve their data problem, they'll all say that they are either using Vitess, or they're seriously looking at it. So it's kind of a product that is mostly grown from the grassroots, and it's been mostly us developing working hard on it. Vitess is about eight years old now. And over time, what has happened is, one by one, companies that have difficult scalability problems have been slowly adopting Vitess over time. So I will talk about who are the adaptors and stuff.

It's something that I've been excited about. When we started Vitess, this was in 2010, myself and my co-creator, we thought we would be done in maybe six months, one year. This was in YouTube. 2010. YouTube was having a lot of trouble, and we said, "Okay, let's write something that'll protect the database." That's how Vitess was born. We wrote a proxy to do some connection pooling. And lo and behold, people said, "Oh, can you add this feature? Can you add that feature?" And over time Vitess has gone into this huge, pretty big, solid product.

Just to give you an idea what is Vitess, a one-sentence overview of what Vitess is: it's essentially a sharding middleware for currently MySQL, but it could take on other databases also. That'll help you scale it massively, like tens of thousands of nodes, millions of QPS, basically, essentially, indefinitely. So that's what Vitess is, and we scaled it at YouTube. And initially, we decided to open-source it right from the beginning. But it was mostly us using it at YouTube for our own sake. We used to submit the code in open source, import it back, build it, and deploy it within YouTube. But about 2015 or so, Flipkart came on board and said, "Hey, it looks like what you're doing is something that can be useful for us, too." So they took a look at it. We helped them out a bit. And eventually, they went into production. And slowly like that, more and more companies started coming on board. And now, Vitess is now pretty much the de facto standard for sharding MySQL definitely. Hopefully, in the future, other databases also.

In terms of trends, there are two trends that have made Vitess more relevant today than it was when we actually started the project. When we started it, we thought it's going to be like huge companies, like YouTube's and Facebook, that need this kind of product. But now what is happening is the data that people have is exploding so much, that it has now caught up to everything, including transactional data. And saying that you have a terabyte of data, people laugh at you. It's nothing, right? You're talking 50 terabytes, 100 terabytes. But if you look at MySQL instance, it can't take more than a terabyte comfortably. Beyond a terabyte, it starts to run out of steam.

And the other big trend, obviously, is everybody is moving to the cloud. And in a way, data people are kind of left behind. It's mostly applications that are moving to the cloud, and data is kind of being left behind in on-prem situations.

Transactional Data Options

If I have transactional data, what options do I have if I need to scale? Obviously, I can basically milk my MySQL instance as much as possible, keep buying runway, keep buying runway, optimize this, optimize that. Eventually, that is going to run out of steam. So once you decide to go out beyond a single MySQL instance, you actually don't have many options. You don't have any good options, in my opinion. What do you do? Most people have just gone to NoSQL, because at least NoSQL scales forever, right? You can just scale your data, deal with the fact that you have now had a major downgrade in functionality. You don't have transactions. You don't have joins. You don't have all these other cool things that databases give you, but you don't have a choice. You have to scale, so NoSQL is your only solution.

Some companies that have good engineering bandwidth have adopted their own sharding. Basically, they shard their database, put some logic in the application to handle that sharding and have scaled that way, but that has generally been very painful. Usually, sharding is very exciting in the beginning. But after that initial excitement wears off, all you're left with is the pain of maintaining this sharded system. And as it grows, you have to keep scaling it, and that is usually a burden for most companies.

There is a new trend, which is the NewSQL trend, that is coming up. There are a few products there, few projects and products there, but they are still relatively green. In a way, Vitess is a NewSQL solution. Some people may disagree, but it solves it differently. Instead of reinventing the wheel, building it from the ground up, what we do is instead leverage what is already there, basically, shard MySQL, and give you a better abstraction of what is sharded.


The other trend is the fact that data is moving to the cloud. And this is very scary for somebody who has critical data because the cloud is not the same as an on-prem environment. If I am a DBA or a sysadmin that is maintaining data, I would say, "Well, if I want to do maintenance, I will schedule it on a Tuesday evening, tell everyone, and then do whatever it takes to perform the maintenance myself." But in the cloud, that is not the case. It's the software. The cloud decides when it wants to do its maintenance, and it expects the software to react to that maintenance. When it says, "I'm going to take you down," and if you are a master of database, you have to know what to do about it. The cloud expects you to build all these automations. And then if my master database is going to go down, I need to react, and I need to know how to promote another replica as a master. And all that has to happen automatically because it may happen at 2 a.m. in the morning when all of us are sleeping.

The other one thing the cloud expects is, it treats these nodes as mostly disposable. Any time, it'll throw away your node and says, "Well, I'm going to restart your somewhere else." But if that node contains data, that is very, very scary and uncomfortable for that person. And the general approach in cloud is to actually achieve durability through replication. Which means that when you write a data in on-prem, writing data to disk is generally considered a durable operation. But in the cloud, it's not. You'll write it once, and then you have to send that data somewhere else. And the data has to be saved in another machine and then it is called durable.

And then also, you can't have these massive multi-terabyte instances anymore. They want 2 core, 4 core kind of VMs. That's the sweet spot for the cloud. And finally, because the cloud moves things around, you need a good discovery mechanism. So all these things put together have mostly discouraged people from moving their data to the cloud. But it is happening, but much slower. And this is another place where Vitess will help you, because Vitess was actually built in the cloud. It was built to run within Google's cloud, and we actually had to deal with all these problems and solve them in Vitess. And as a matter of fact, when Kubernetes came out, we were the first ones to say, "Hey, we are ready for the cloud because we run in Google. So, what do you think?"


So Vitess at a high level basically gives you the semblance that you're talking to just one database, except underneath it can be a form of MySQL instances. It gives you a unified view of all that data. And it's not 100% acid and perfect, but not many people want that. They just want scalability. Remember, what are your options? If you don't have this, you go to NoSQL. So instead having a sharded database system still continues to give you many of the properties of those databases compared to, for example, moving to a NoSQL system.

And it can scale massively; like I said, tens of thousands of nodes. And it has run in YouTube for many years, and now many other companies are also adopting it. It runs on Kubernetes. It runs on Mesos. It runs on AWS. It runs on GCP. It runs on on-prem. There are use cases of Vitess running in production in all of these environments.

These are some of the stats. As I said, we started in 2010. The project has been steadily gaining popularity. We now are at, what? 6000 plus shards. Lots of commits because it's an old project. The Slack is pretty active actually. We have over 500 Slack members there. They are all people that have found us organically and joined the Slack channel, and I would encourage you to come and join our channel.

But more importantly, it's who is using Vitess. There are not that many users, but who they are. For example, Slack is actually a major user of Vitess. They are actually going to be fully migrating to Vitess. They are about 20% migrated so far. And, as a matter of fact, at 3:00 there is a presentation by Michael Demmer from Slack. Make sure you attend, and I'm sure he'll talk about how they're using Vitess. how many of you have heard of Oh, that's good. That's good. is actually the second largest portal in China. They have a huge Kubernetes instance, and they now, at this point, the last number I got from them, is they run over 10,000 Vitess tablets in their Kubernetes. MySQL in Kubernetes, people used to joke about it, and here's an example where there are 10,000 plus instances of MySQL running. And they've been running it for a couple of years now.

Square. I don't know how many of you know about the Cash App. It is now topping the shards in Android for finance category, fully runs on Vitess. They actually started with one MySQL instance and were about to die. And they deployed Vitess on top and re-sharded it. I believe they are at 32 shards right now, and they're saying they're continuously re-sharding because you can keep doing that with Vitess.

And in reality, this is kind of the bigger picture. You can see some big names here like HubSpot, GitHub, Pinterest. And recently, Twitter has also announced that they are going to be evaluating Vitess. So it's a solid product because it's been used for a long time and now it's proven. And we also got the Community Award from MySQL for application of the year recently.

Growing with Vitess

I'm going to go through it like a story, and show you what Vitess can do for you if you are a person who has databases and is trying to grow them. Initially, you start a company, you have one database. And you connect your app servers, connect to the database, and everything is fine. You could immediately bring Vitess on board, even for a single database. And Vitess gives you some built-in benefits. These are the benefits that we built at YouTube because we were struggling to even just maintain, protect the database, the only database that we have. The first one is connection pooling. MySQL is not good at handling too many connections. So that is an instant gain that you get from Vitess.

But then eventually, programmers write bad queries, programmers write unbounded queries, and all these things happen. When that happens, the penalty is paid by everyone else. So what Vitess does is any misbehaving queries, it will kill you. Any query that is fetching more rows than it should, it will give you an error and, thereby, keeping the rest of the system healthy. If there are too many queries that hit the same row, it will protect that row. For example, it'll actually hold off the other query saying, "Just wait. Let this query return the result, and as I'll share the same result as everybody else because it's the same query so the results are going to be the same."

So there are all these cool optimizations that Vitess has built that you gain immediately. But eventually, you're going to have some scalability problems, your main DB. So usually, the first problem that you face is you can't scale your reads. And obviously, the easiest solution to scale reads is to actually bring some more replicas, and then send some of the re-traffic over to the replicas. And when you do that, the way Vitess helps you is it will route your traffic accordingly. It will load balance across replicas. If one goes down, it will uniformly balance against the others. And if a master goes down, it'll promote a new replica as the master and continue to serve traffic as if nothing has happened as there no downtime at all. So at YouTube, for example, nobody wakes up if a master goes down because everything is automated.

And eventually, what will happen is your writes are going to reach capacity. When that happens, you have to figure out how to shard your data. The first way you shard it is actually you split tables. You say, "Oh, these groups of tables are not related to that group. Let me move them into a different database." And so that's the first thing that you do generally when you run out of capacity. And when you do that, what Vitess gives you is, it's where its magic starts to happen. The application doesn't have to know that this is happening. So you split the data into two, the application just thinks that it's still talking to one database.

But eventually, this approach won't work for you forever because some tables are going to get too big to fit even in one database. So when that happens, you perform horizontal sharding where you actually split tables into smaller parts, and then shard them widely. This is where Vitesse's power truly comes where once you do this, you still don't have to re-write your application at all. Because this Vitess system will look at your query, look at your WHERE clause and say, "Oh, this WHERE clause, I know where this data is. I will send it to that shard, or I'll send it to this shard." So as far as the application is concerned, you're still looking as if your entire data is one database.

And then finally, as your business grows, you are going to be deploying your databases all over the world, multiple data centers, and stuff. Vitess understands those concepts. And as you add all these things, Vitess will continue to grow with your growing data, and your apps will continue to remain simple, like they were in the beginning. And finally, last icing on the cake is eventually, your company is going to make a decision, "Oh, it's time for us to move to the cloud. So when you do that, Vitess knows how to live in the cloud. You can move this entire thing into the cloud, and still continue to operate. Essentially, all your databases appear as one database. This is a reference to a tech culture for a while ago.


If you drill down into Vitess, these are the different parts that Vitess has. This architecture is what makes Vitess cloud-ready. So if you are in a cloud and you just move your MySQL instances there, and the cloud starts to Whac-A-Mole with your databases, every time it brings on a database, you need to update your app saying that, "Oh, this database has moved from here to there, there to here." So that actually makes it hard for deploying a database in the cloud. What Vitess actually has these vtgates that are essentially stateless proxies that can scale elastically up and down based on your need. If one vtgate goes down, the app just connects to through the load balancer to another vtgate and continues to do what it wanted to do.

And that means that what happens in the background about databases going down, new keyspaces coming up and down, are completely isolated from the application. And this allows a system like this to be very healthy and happy in a cloud environment. When the app server sends a query, vtgate is the one that analyzes it and decides where it should be routed. Sometimes, you may have a query where the WHERE clause is ambiguous, which means that they may not be sharding key in the WHERE clause. In that case, it sends it to all shards, gathers the results, and sends them back. If you do a complex join, it'll say, "Oh, even though this join is complex, this is all a single shard join. I'm going to just send it to one shard." Or sometimes it says, "Oh, it's cross-shard join. I'll go fetch some data from this shard, some data from that shard, combine the results and then return it back to you." So as far as you're concerned, you are still looking at a unified database.

And the way the system maintains itself is through this lock server, which can be ZooKeeper, console, or etcd. And what happens is that log server is used for discovery. Every time you bring up a vttablet, it registers itself with the log server, and the vtgates are subscribed to those changes. And as soon as they see a new vttablet come up, it'll start sending traffic to those vttablets.

I talked about vttablet a little bit. It's essentially the protector of every MySQL. So every MySQL instance has a vttablet attached to it. It receives a query, and makes sure that that query is not naughty. If it is naughty, it'll kill it. And vttablet also does housekeeping work like backups, restores. All things that a DBA has to manually do, it can actually do them automated for you. And vtctld is the dashboard. You can watch your entire topology in the dashboard, and you can also initiate workflows. If you want to do re-sharding, you can initiate those workloads using these daemon.

Sharding Concepts

So this is the true secret behind Vitess. Unfortunately, I have a more exciting demo that I want to show you, so I'm going to skim through this. Do go to the later and look at how well we have extended relational concepts into a sharding system. Typically, when you shard a database, the only thing you'd think about is, what's your sharding key and shard it that way, right? But what we have done is we have studied the database, what are its features, what is important, and how many of those features actually make sense in a sharded environment. So, it is not just the sharding key.

The first thing that we did was, just like databases have Schema, Vitess has a VSchema, which is basically, just as a Schema describes how your tables are organized, the VSchema describes how your sharding is organized. And databases have primary keys. Vitess has primary vindex, which is like a sharding key but much more, because in a sharded system, you only choose your sharding key, and the system decides where your shard goes, where your data goes. In Vitess, you'll define your sharding key, and you'll also define the mapping function.

So why we did this is because data is distributed differently in different scenarios. Sometimes, it incrementally grows. Sometimes, you don't want that incremental growth to cause hot shards. Sometimes, you want ranges to be in the same shard. So all those decisions mean that you want to decide how your system is sharded. Primary vindex allows you to choose the column and also what mapping function you want to use to decide where those rows go.

The secondary vindex is another very important concept. What it essentially is, is allowing you to use a WHERE clause, that is other than the sharding keys. It's basically a cross-shard index. But the way we implement it is, it is just another table in Vitess. It's just another table that maps the column to what we call as the key space ID, which is kind of the street address for a row. And the advantage is that you insert a row in your main table. Vitess will automatically populate this table. It'll insert another reverse lookup for the table to be able to find the row later if there is a WHERE clause with that column. And, again, you can have unique, non-unique - this is basically elegantly extending database concepts into a sharded environment.

And there are foreign keys in databases. The parallel to that is shared vindexes; I won't go into those details, but maybe later, if you can ask me, I'll talk about how they are related. And then MySQL has auto increments and Vitess has sequences. So these actually basically reduce the pain of migrating from an unsharded system into a sharded system.

And the best part of Vitess in all of this, is that once you're sharded and suddenly your shards have reached a certain size, and you now have to grow bigger, you can reshard in Vitess with no downtime. And you can do splits and merges, and the application doesn't even know anything about it happening. And it is all done safely with all kinds of data checks to make sure that no data is lost. And this is actually one of the most popular features of Vitess, one of the most loved features of Vitess.


I have about 30 minutes to do this cool demo. So here is example from hell. Let's say we are building a marketplace. And in this marketplace, there are users, or customers, that are going to come in and buy items from merchants. A simple application. But there are three main entities here. There is a user, which is their own entity, there is a merchant, and there are products. If you say, "I'm going to shard the system," you have to split this, and assume these are billions of orders, millions of rows, and etc, right? If you do that, you have to separate the users from the merchants because the user's ID is different from the merchant name. Product, again, has its own key. So this is basically a system that has three major products, three major keys, entry points, right?

The way I have designed it, this is a decision that you have to make as a person who's deciding to shard. “How do I shard the system?” I'm going to say, product is a small table I don't need to shard it. I'm going to put it in an unsharded database, single database. User is going to be massive. It's going to be millions and millions of users, and, therefore, I'm going to put users in their own sharded database. Let's assume two shards for now. In my example, it's just two shards. And merchant, again, is going to be a pretty big database, and they are going to be sharded in their own database separately.

Now I come in and say, "Ok, the order table, when a user places an order, where do I put that table? How do I shard it?" And so far, until even a month ago, we said, “You have to choose.” You have to choose what the strongest relationship that exists between order and the other tables. And the order table has a relationship with product, user, and merchant. In this case, we say, "Let's go with user because that is the strongest relationship," which means that I am now giving up on something. There is no choice. If you are going to shard a system, you have to make this tough decision, which is order is going to live with user. If a merchant's order has to be found out, you have to get the merchant's record and then send essentially a scatter query through all the user shards to find out the orders for that merchant. So far so good.

Also, when an order is placed and you want to know the product information, you have to read the order record first. And then go to the product database and say, "What's the information about this product? What is its price, and whatever else?” So this is a tough problem to solve in a sharded environment and is essentially so far unsolved, but I have a proof of concept that solves this.

Remember I told you about the lookup or the secondary vindex? The problem with the secondary vindex is that you insert, let's say, in this case, the secondary vindex is actually something that maps a username to their ID, right? So I inserted a card into the user database. That means that a reverse lookup record has to be inserted in currently the unsharded database. Now, I issue a commit. This is now a distributed transaction. You have to go to the user commit, and you have to go to the unsharded commit. What happens if it fails in the middle? So that's a problem.

Vitess gives you the ability to do two-phase commits. You can use two-phase commits. It gives you a guarantee that all or none will happen. But two-phase commits, as you know, have their costs. Is there a way to avoid this two-phase commit? It's another problem that we will talk about solving. Let me quickly fire up my cluster while I keep talking. Actually the cluster I am firing up, it´s all within my laptop. It's a full legitimate Vitess cluster. That's another cool thing about Vitess; you can pack a large number of MySQL instances in one box, and they scale actually pretty well.

In this cluster, the cluster is actually going to have nine databases. If time permits, I will show more magic with the last four databases, but, definitely, the first five. The first five are one lookup database, two user shards, and two merchant shards, a live example that I'm going to run. Let's see if my cluster is up. Cluster is up. Let's go. Skip out, go to localhost. This is your e-commerce app, but it's kind of a lower level version of the e-commerce app. What this is actually, is a view of all the tables in the system, and a way for which, instead of a UI that allows you to place orders, it's a UI that allows you to directly run SQL statements into Vitess. Before I go in, let me describe what is here. You can see the product column. You can see these two, user0, user1 are the two shards, and merchant0 and merchant1 are the two shards for the merchant. They are all empty because it's a brand new database.

So Sougou comes in and creates an account for himself. And Sougou has gone into shard0 of user. And then Demmer comes in and creates an account for himself. Demmer is going to be speaking later in the Slack session. He is a major contributor to Vitess, and that's why his name is here. Demmer has ended up in shard1. As you can see, these insert statements are not saying anything about which shard you should go to. They are just issuing more insert. In the same way, I'm going to create some merchants. I create monoprice and I create newegg are two merchants.

As you can see right here, the vindex in play, the primary vindex for user is an integer so there is a hashing algorithm that is used for that. And that's different primary vindex function. Then what do you do for merchant because, in the merchant, it's a merchant name, so we use an MD5# for the merchant, a ---# for the user. Now, we are going to also insert some products. Let's create a keyboard and a monitor. Now the marketplace is ready to accept orders, and we are going to say, "User Suogou comes and buys a keyboard from monoprice." So here what I have done is I have made the choice that orders are going to live with the users. The way I tell Vitess that that is the case, is by saying that the sharding key for order is the user_id. And it uses the same sharding scheme as the user, which means that that row will go with that same user. And here, user Sougou places another order, and this time he buys it from newegg, and he buys a monitor.

Now, if Demmer comes in and places an order that will go into Demmer zone shard, which is in shard1. So far so good? Everyone still with me? Now comes the magic. Here is a big join, it joins user with the order, with the product. Say, "Show me all the users along with their orders, and show me the product description for each and every one of these orders." If I run this query, what should it do?

Product table is separate. What it should do is, it should do a full scatter of user and user order together because it knows they are together. It should split this join into two different joins: one that joins user and user order, the other that joins the product. So issue a scatter query to all shards of the user, get the rows, and then, this is the part that's bad, for each row, fetch the product information. So we run this query, which it did. It says, "My first shard, I run the scatter query, two queries ran on each shard. But for each row, I did a select the product ID." And you can see the results here: Sougou keyboards, Sougou monitor, and Demmer monitor.

This is bad if you want to do millions of rows of joins in this. So how do you solve this problem? Product is a small table. What if we materialized the product table on the fly into each and every one of these shards? So, in other words, I'm going to maintain a live copy of what's happening in product into these user tables. And how do we do this is a feature called V replication. This V replication is what we used in Vitess for sharding and re-sharding. What it does is it subscribes to the binlogs of the MySQL, and looks at every statement and filters based on what value it sees. Which means that it is basically very, very smart about understanding data. It can do a lot of tricks.

What I'm going to do is I'm going to create a table. I've already created the table. I'm going to just reveal it. I am creating a new table called uproduct, and it's in every shard. It's currently empty. And I'm going to say, "Please replicate from …" I have a Python script for this vproduct. You can see here, the way we are going to name this is we are going to name this materialized views, but it's actually a sharded materialized views. What I'm going to say is, basically, "Replicate all the rows that are in the product into my target." There are two targets, which is why there are two commands. These are low-level commands. You will build some more human-readable ways to do this. But today, you can do it.

If I now go and hit Refresh, boom, your product table is now replicated into your user shards. Now let's say if I insert a new product called mouse into simple insert, right, main product, mouse is inserted, instantly replicated into your target shards. And what can you do now, is you can do that big join that we talked about; it becomes a simple scatter. Just send it to user shards because the product table is now local to those shards.

Now let's talk about the other problem. The problem that name_user_idx, the index table is - I'm now going to reveal that table. It is right here, name_user_idx. This table is unpopulated. This table is going to be used to select * from user where username = 'X' because user is sharded by ID. But now my WHERE clause does not have ID; it has a username, which means that we are going to populate this table with the username as the main key and the target being the user_id. Then there is metadata, which I don't have time to go into, it's all part of the VSchema. It tells you if there is a WHERE clause with name in your user table, makes use of this index. And instead of doing a full scatter, make use of this index.

What I'm trying to do is replicate from the user table into this table. And the advantage of replication is that you're not using two PC, but it is guaranteed because replication has delivery guarantee, which means that, eventually, the row will make it into this table. I'm going to run _rename. These are, again, Python scripts, and you can see here the new select statement. It's, again, the same feature that I used previously for replicating product, but now it's a different select statement because I'm not selecting all the columns. I'm only selecting the name and the user_id.

The select statement is actually a materialization of a view. It's as if you did create view name_user_idx as select, right? Once you do this, it refreshes, and there you go. So Sougou and Demmer have shown up in name_user_idx. And now if I issue a query called, select, select * from user where name = 'Sougou', you can see that it went to the lookup, did the select, and used that result to route the query to just shard0.

Let's create a user call. What if we create a user called Rafael? Rafael is another major contributor to Vitess. So Rafael was added here and immediately shows up in name_user_idx, which means that I can now query with a username of where = 'Rafael.' All right. If I change the username, I change Sougou to Sougou1, that also gets reflected. So it's not a blind copy; it's actually a very smart system that understands what is changing, and instead of inserting new rows, it updates what is already there.

More tricks. The last one, this is the toughest problem. This is a join where I'm joining merchant with their orders. I already explained this problem. You have to do a scatter select in all the merchants, take the result, and then scatter it to the user because the merchant could be on any user, right? If I run this query, you can see there's the scatter select. But then for each result that I got, I had to do a scatter again into the user. It's a scatter followed by nested loops scatters. Very, very expensive.

But merchants need to have their orders. I mean, it's a common query, and this needs to be solved. The way we are going to solve this is by replicating your order into the merchant database but sharded by the merchant name. It's basically a crisscross, N X N stream of the source into the destination where the target is, on the fly, re-sharded using a different key.

I'm going to run that magic command, which is vmerchant. I'm going to reveal the table now using i3, and there you go. As you can see, in the source, monoprice was in shard0 and shard1. But in the target, all monoprice orders have ended up in merchant0, and all newegg orders have ended up in merchant1. And if I go and play around with some more queries, let's see. Let's say I insert a new order that, let's say, Rafael places, so newegg order went to shard0. But in the target, it went into shard1. And now, I can do that same join, which is now super cheap because it's all a simple scatter into the merchant. And here we go. And last, again, to prove my point that this is a smart system, let's say I'm going to rename - Sougou changes his mind, changes order from monoprice to newegg. I changed the price, and there you go. That row has now moved from shard0 into shard1.

Cool. Now, what more can we do with this, if you think that this is not enough? Because this is materialized view, because Vitess understands queries, we can do roll ups, not just moving rows. We can actually do on the fly live roll-ups which means that we can do real-time stats for Vitess. Let me show you a quick demo of that.

The theoretical company here is a company that's collecting click data. It has IDs, user_ids, they have some tenants, and etc. Basically, wide rows that have lots of columns. But what they are collecting this data for is so that they can later do analytics to report on what trend is happening and stuff like that. Typically, this data goes into columnar stores. And columnar stores are good for use cases where your analytics queries touch one, two or three columns. But if your queries become more complex, your columnar stores start to run out of steam. You are better off actually bringing some of the data back into row-based storage, which is actually the use case that I'm showing here. I'm going to insert some random events here.

For example, here's another event. There are some events that are getting inserted. And actually, let me even do the replication on the fly. In this replication, I'm going to show you the query here, select. You can see what I've done here. It's a complex query. It goes all the way up to here. If you see here, there are some simple column expressions. But there are also some aggregation expressions like sum, end count. So, in other words, I'm now creating a materialized view using all these expressions, and I'm saying, not only materialize it, also re-shard it by tenant ID. That's the last expression which is key range re-sharded by tenant ID using a hash algorithm.

If you do that, you can see that it's materializing here. Since Vitess speaks MySQL, you can actually run a MySQL client. I have a SQL file that just loads data into the system. And you can see, boom. As soon as you load data, instantly materialized as queryable. You can see, in this case, the results are sums and counts, which means that you can now run more efficient queries on these target databases, and, basically, pull real-time analytics out of the system.

So lots of exciting stuff. Unfortunately, each and every one of these subjects is probably a one-hour talk so I try to crunch everything all together in one. Definitely make sure that you go into the There's a top right click, you can get into our Slack channel, and very friendly people, very excited people. You will see how passionate people are about Vitess when you enter the Slack channel. So get in there, and feel free to ask questions. And any other questions? Yes. There is a mic.

Questions & Answers

Participant 1: Let me ask the question clearly. In one of your queries, you were supposed to ask for data from one of the replicated tables that were re-sharded. Are you smart enough to know which replica of the table to create from U order instead of M order?

Sougoumarane: Correct, yes. The replicated table has a different name. So the U order table was with user and the M order table was with the merchant. The user, at least in my demo, currently you had to request that, "I want merchant to be joining with M order but not with U order." We can look at hiding that- since we have the VSchema where you can represent that relationship, we can make that automatically happen. But we haven't thought about what that means, whether a user would want that control well or whether they won't want that control.

Participant 2: So like a query planner. You can have a distributed coordinator layer, that obstructs that away from user.

Sougoumarane: Correct.

Participant 2: I think that's maybe where your question is going, that the end users sort of needs to know how the data is laid out, as opposed to a query planning layer that would just take care of that. Is that your question?

Sougoumarane: Yes.

Participant 2: Is there a plan to add that in the future or...?

Sougoumarane: This is the first time I'm publically speaking about this feature. I'm pretty sure people will have opinions based on how people want to use this feature. My guess, the thing about Vitess users, the community I know is they want both. Somebody will want full fine control over which tables they want to access, and somebody will say, "I don't want this detail. I will send you the query.” And then in the VSchema, when I say order, optimize based on which is the best way to do it.

Participant 3: Is there a downside to the replication? I mean, why wouldn't someone always, unless there is a delay, want to read their own right kind of consistency?

Sougoumarane: Yes. There are two downsides to the replication. One is you increase storage and network costs, and the other one is the fact that that replication is eventually consistent. Which means that you insert and instantly read from the other. You may not see that row yet. So those are the two downsides that you have to deal with.

Participant 3: My question was with the reverse lookup indexes that you had on, like username and user_id. Can you also have those sharded because if the users make the indexes [inaudible 00:49:19].

Sougoumarane: Yes. That is something that I didn't have time to cover. Because it's just a table, as far as Vitess is concerned, it doesn't care how the table is laid out. So that can be in a single shard or it can be separately sharded.

Participant 4: I have two questions basically. The first one is about backups. Can Vitess help somehow these backups?

Sougoumarane: Yes. Actually that part I didn't have time to cover at all. There are a whole bunch of cluster management features for Vitess, and backups and restores are one of its prime functions. You can pretty much schedule when you want to take backups, and it'll actually take RD only, which is actually a disposable instance, use that for backup, and bring it back online.

Participant 4: Understood. And the second one, you had reverse index for name and users, for when I woke up for users by name and it goes for that table and then for the main table for the user data. So my question was does Vitess cache somehow this data about where, basically, what's the partition for the user?

Sougoumarane: No caching yet. The only cache that currently Vitess relies on is MySQL zone cache, which has been pretty good so far because the cost of a round trip from vtgates to vttablet is about a millisecond. If the row is in Vitesse's cache, it won't take very long to fix the data. But we are not giving up. I have a secret plan to, maybe once we finish all these V replications, to actually build a row cache at the vtgate layer. So it's been brewing in my mind.


See more presentations with transcripts


Recorded at:

Feb 19, 2019