One to Many: The Story of Sharding at Box
Summary
Tamar Bercovici presents Box’s transition from a single MySQL database to a fully sharded MySQL architecture, all the while serving 2 billion queries per day.
Bio
Tamar Bercovici is a Staff Software Engineer at Box, where she leads the Data Access Team in scaling Box’s database architecture and ORM layer. Prior to Box, Tamar was an early-stage employee at XMPie (now a Xerox company), where she drove the development of the award-winning uImage product. Tamar holds a Ph.D. in Computer Science from the Technion – Israel Institute of Technology.
About the conference
Cloud Tech is the largest gathering of cloud technologists & engineers in the Bay Area. Our speakers include the top cloud computing entrepreneurs & experts.
Community comments
Funny how websites built with PHP end up re-inventing sharding
by peter lin
It's 2013 and people are still rolling their own sharding.
re-inventing sharding
by Robert Sullivan
She didn't dwell on why Box's requirements ruled out existing implementations, but the 2 billion-plus queries per day probably have something to do with it. When you are talking about Facebook, Twitter, Amazon, etc., scaling is everything. Sure, some might ask whether Facebook's Thrift looks exactly like CORBA, why Facebook reinvented a PHP compiler, or whether it would have been easier to run PHP on the JVM than to build its own VM, but by that logic we should all still be coding in assembly language on the mainframe. As one example, here's what Facebook has to say about CORBA:
When you've got a few smart folks on your staff, or even a few PhDs, you probably have the talent to build things like this that give you a competitive advantage. And why not?
Re: re-inventing sharding
by peter lin
Proper sharding, which used to be called database partitioning, isn't new. There are literally dozens of papers on how to properly shard, manage and scale partitioned databases. From database partitions in DB2 on the mainframe to federated partitions, there's just far too much prior art to ignore. Partitions will eventually become unbalanced, especially if the partitioning scheme is based on something like username. Random partitioning tends to require less management, but at some point the partitions need to be rebalanced as nodes are added or removed.
Having a few smart PhDs is NOT enough to build a robust, scalable and easy-to-manage partitioned database. There are many lessons that only come from firsthand experience using and building partitioned databases.
My point isn't "don't re-invent." My point is: look at existing prior art and learn what has been done, so you avoid mistakes others have already made. Seeing how many PHP shops have re-invented sharding poorly gives me the impression that PHP developers don't like to spend time reading prior art or making sure they avoid known issues with naive implementations.
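The rebalancing point in the comment above can be illustrated with a minimal sketch. It assumes keys are assigned to shards with a stable hash taken modulo the shard count, a common stand-in for "random" partitioning; the helper names `shard_for`, `shard_sizes`, and `moved_fraction` are hypothetical and do not come from Box or the talk. The sketch shows that hashing spreads keys evenly, but growing from 4 to 5 shards reassigns most of them.

```python
import hashlib
from typing import List, Sequence


def shard_for(key: str, num_shards: int) -> int:
    # Hypothetical helper: a stable hash (SHA-256) is used because Python's
    # built-in hash() for str is randomized per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def shard_sizes(keys: Sequence[str], num_shards: int) -> List[int]:
    # Count how many keys land on each shard (hash partitioning stays balanced).
    sizes = [0] * num_shards
    for key in keys:
        sizes[shard_for(key, num_shards)] += 1
    return sizes


def moved_fraction(keys: Sequence[str], old_n: int, new_n: int) -> float:
    # Fraction of keys whose shard changes when the shard count goes old_n -> new_n,
    # i.e. data that would have to be migrated during rebalancing.
    if not keys:
        return 0.0
    moved = sum(1 for k in keys if shard_for(k, old_n) != shard_for(k, new_n))
    return moved / len(keys)


if __name__ == "__main__":
    keys = [f"user{i}" for i in range(10_000)]
    print("shard sizes with 4 shards:", shard_sizes(keys, 4))
    print(f"fraction moved going 4 -> 5 shards: {moved_fraction(keys, 4, 5):.2f}")
```

For a uniform hash h, a key keeps its shard only when h mod 4 equals h mod 5, which holds for 4 of every 20 residues, so roughly 80% of keys move. Schemes such as consistent hashing, or many small logical shards mapped onto fewer physical nodes, are commonly used to reduce that movement.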
Re: re-inventing sharding
by Den Samo
Hi Peter,
What papers on sharding would you recommend?
Thank you!