Article: Scalability Best Practices: Lessons from eBay

| by Floyd Marinescu on May 28, 2008. Estimated reading time: 1 minute |
At eBay, scalability is one of the primary architectural forces the team contends with every day. It colors and drives every architectural and design decision made. With hundreds of millions of users worldwide, over two billion page views a day, and petabytes of data in its systems, this is not a choice - it is a necessity. There are many facets to scalability - transactional, operational, development effort. In this article, eBay Distinguished Engineer Randy Shoup outlines several of the key best practices the team has learned over time to scale the transactional throughput of a web-based system.

Read Scalability Best Practices: Lessons from eBay. 

Randy goes into depth on the following best practices:
  • Best Practice #1: Partition by Function
  • Best Practice #2: Split Horizontally
  • Best Practice #3: Avoid Distributed Transactions
  • Best Practice #4: Decouple Functions Asynchronously
  • Best Practice #5: Move Processing To Asynchronous Flows
  • Best Practice #6: Virtualize At All Levels
  • Best Practice #7: Cache Appropriately
Randy concludes: "Scalability is sometimes called a 'non-functional requirement,' implying that it is unrelated to functionality, and strongly implying that it is less important. Nothing could be further from the truth. Rather, I would say, scalability is a prerequisite to functionality - a 'priority-0' requirement, if ever there was one."

eBay has also presented at all of our past QCon events, and Dan Pritchett, eBay technical fellow, is hosting the architecture case studies track at our next QCon San Francisco (Nov 19-21).
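
To make Best Practice #2 (Split Horizontally) from the list above a little more concrete, here is a minimal Python sketch of routing a record to one of several shards by a stable hash of its key. This is an illustration only, not eBay's actual scheme: the shard names, the `shard_for` function, and the choice of SHA-256 with a modulo are all assumptions made for the example.

```python
import hashlib

# Hypothetical shard names; the article does not list eBay's actual databases.
SHARDS = ["users_db_0", "users_db_1", "users_db_2", "users_db_3"]


def shard_for(user_id: str, shards=SHARDS) -> str:
    """Map a user ID to one shard using a stable hash.

    A cryptographic digest is used instead of Python's built-in hash(),
    which is salted per process and would route the same key differently
    after a restart.
    """
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for uid in ("alice", "bob", "carol"):
        print(uid, "->", shard_for(uid))
```

One caveat with a plain modulo: changing the number of shards remaps most keys, so a real system would likely need a lookup table or consistent hashing to rebalance gracefully.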


I like the way this guy thinks. by ARI ZILKA

After having spoken on a few InfoQ panels with Randy and talking offline, I have always intended to jot down some of his frameworks and concepts for others. Glad he did it himself.

Anyone who questions whether to start simple or not, and whether or not to carefully weigh architecture decisions through a financial lens should seek Randy's guidance.



Re: I like the way this guy thinks. by Randy Shoup

Hi, Ari --

Right back at you! I've very much enjoyed our conversations, both public and private, and have a lot of respect for your approach to problems as well.

Thanks for raising the points on simplicity and economics. As we have discussed, over time I have become increasingly convinced that every one of the decisions we make as architects and system designers -- what should we do and when should we do it -- ultimately comes down to costs and benefits, and those costs and benefits can be denominated in some common currency of money or time. Probably another article in there at some point ...

Glad you enjoyed this article.

Take care,

-- Randy

Re: I like the way this guy thinks. by David Zonsheine

Thanks for this right-to-the-point article.
Even today, few architects at early-stage companies are aware of architectural costs in terms of real money. In my last company, the CEO, a businessman who knew nothing about development or architecture, had a monthly meeting with the chief architect to figure out what his architectural decisions meant in terms of budget. He did not understand the answers, but he raised the architect's awareness.

Re: I like the way this guy thinks. by Vishal Srivastava

Can I view the conversation between Ari and Randy?

Re: I like the way this guy thinks. by Randy Shoup

I don't remember whether the scalability panel discussions at QCon SF and London were videotaped. If so, they are not yet available on InfoQ, as far as I am aware.

The private discussions were, well, private ;-).

Take care,

-- Randy

Does replication play a part? by Sid Young

Reading this article gives me hope that there are others who think like me! Just some questions: with 16,000 servers, is clustering used within a partition, or is it load balancing with logic to skip dead servers in the app sets? And with 400 databases in the design, are all DBs replicated? What DB technology is used, and what is the preferred mechanism to ensure DB recovery?


Re: Does replication play a part? by Randy Shoup

Hi, Sid --

Glad to offer you hope! ;-) Some answers to your questions:

* eBay's application servers are not clustered in the sense of shared state -- all the application servers are by design completely stateless. Servers in a pool are load-balanced, and the load-balancer can detect a dead server.

* All databases have several copies for availability -- at least one close by for rapid failover, and one far away for disaster recovery. A single instance is the primary. We spread the load by partitioning a single logical set of data into multiple logical database instances ("shards").

* eBay uses Oracle databases. The various copies are there to allow us to recover from different types of failures.


-- Randy
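
As an illustration of the first point in the reply above (a pool of stateless servers behind a load balancer that can detect a dead server), here is a minimal Python sketch of round-robin selection that skips unhealthy servers. It is not a description of eBay's load balancers: the `Pool` class, the server names, and the `is_alive` health-check callback are hypothetical and exist only for this example.

```python
from itertools import cycle


class Pool:
    """Round-robin over a pool of stateless servers, skipping dead ones."""

    def __init__(self, servers, is_alive):
        self._servers = list(servers)
        self._is_alive = is_alive          # hypothetical health-check callback
        self._ring = cycle(self._servers)  # endless round-robin iterator

    def next_server(self):
        # Try each server at most once per call so a fully dead pool
        # raises instead of looping forever.
        for _ in range(len(self._servers)):
            server = next(self._ring)
            if self._is_alive(server):
                return server
        raise RuntimeError("no live servers in pool")


if __name__ == "__main__":
    dead = {"app2"}
    pool = Pool(["app1", "app2", "app3"], lambda s: s not in dead)
    print([pool.next_server() for _ in range(4)])
```

Because the servers hold no session state, any live server can take any request, which is what lets the balancer skip a dead one without further coordination.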

What if eBay would have been built today? by Nati Shalom


I'm happy I had the chance to speak to you in person and learn more about the thinking behind the eBay architecture, which you outlined very nicely in your article.

As we've discussed, I think that beyond the architecture principles it would be interesting to know what the implementation choices would be if you were to build eBay today. Obviously the landscape of product and technology choices available today is very different and could potentially make the implementation of those same principles significantly simpler.

Nati S.


See: "Scalable as Google, Simple as Spring" presentation from SpringOne

How will Cloud Services help? by J Q

Good article. I am wondering how cloud services will help. In my view, cloud services include a central large-scale queue service; a central large-scale storage service, like S3; and a central large-scale database service, like Bigtable. Any thoughts?
Is a platform required to leverage cloud services?

Avoid distributed transactions? Only when you HAVE to. by Guy Pardon

Some common misconceptions:

-ACID/XA transactions are to be applied everywhere. CAP is absolute. Not so: use XA wisely, and CAP is much less of a limitation...

-XA does not scale. Not so. The recipe:

Of course, making everything one BIG XA transaction all over the place is a mistake...

But discarding XA is just as short-sighted. If you want reliable queue processing, you end up doing something like application-level XA, with the same limitations, only much less tested, more complex to develop, and less robust.

Of course, my opinion is biased (but whose isn't? :-)

Atomikos - Reliability Through Atomicity

Test Automation - 10 (sometimes painful) Lessons Learned by Dominik Dary

eBay's European quality engineering team has broad experience implementing end-to-end test automation in different software development environments (agile, waterfall, co-located, outsourced, distributed). This presentation illustrates the key lessons learned from a technical and business perspective:

1. Write the right tests
2. A tool is not a strategy
3. Automation is software development itself
4. Speak the same language as the developers
5. Everyone knows what's automated
6. Instant feedback is essential
7. Flip the testing triangle
8. Invest in the test infrastructure
9. Maintainability is king
10. Manual testing is still very important

You can find the video of our presentation on YouTube.
