Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ


Choose your language

InfoQ Homepage Articles Scalability Principles

Scalability Principles

This item in japanese


At the simplest level, scalability is about doing more of something. This could be responding to more user requests, executing more work or handling more data. While designing software has its complexities, making that software capable of doing lots of work presents its own set of problems. This article presents some principles and guidelines for building scalable software systems.

1. Decrease processing time

One way to increase the amount of work that an application does is to decrease the time taken for individual work units to complete. For example, decreasing the amount of time required to process a user request means that you are able to handle more user requests in the same amount of time. Here are some examples of where this principle is appropriate and some possible realisation strategies.

  • Collocation : reduce any overheads associated with fetching data required for a piece of work, by collocating the data and the code.
  • Caching : if the data and the code can't be collocated, cache the data to reduce the overhead of fetching it over and over again.
  • Pooling : reduce the overhead associated with using expensive resources by pooling them.
  • Parallelization : decrease the time taken to complete a unit of work by decomposing the problem and parallelizing the individual steps.
  • Partitioning : concentrate related processing as close together as possible, by partitioning the code and collocating related partitions.
  • Remoting : reduce the amount of time spent accessing remote services by, for example, making the interfaces more coarse-grained. It's also worth remembering that remote vs local is an explicit design decision not a switch and to consider the first law of distributed computing - do not distribute your objects.

As software developers, we tend to introduce abstractions and layers where they are often not required. Yes, these concepts are great tools for decoupling software components, but they have a tendency to increase complexity and impact performance, particularly if you're converting between data representations at each layer. Therefore, the other way in which processing time can be minimised is to ensure that the abstractions aren't too abstract and that there's not too much layering. In addition, it's worth understanding the cost of runtime services that we take for granted because, unless they have a specific service level agreement, it's possible that these could end up being the bottlenecks in our applications.

2. Partition

Decreasing the processing time associated with a particular work unit will get you so far, but ultimately you'll need to scale out your system when you reach the limits of a single process deployment. In a typical web application, scaling out could be as easy as starting up additional web servers to handle the user requests and load balancing between them. What you might find, however, is that parts of your overall architecture will start to become points of contention because everything will get busy at the same time. A good example is a single database server sitting behind all those web servers. When that starts to become the bottleneck, you have to change your approach and one way to do this is to adopt a partitioning strategy. Put simply, this involves breaking up that single piece of the architecture into smaller more manageable chunks. Partitioning that single element into smaller chunks allows you to scale them out and this is exactly the technique that large sites such as eBay use to ensure that their architectures scale. Partitioning is a good solution, although you may find that you trade-off consistency.

As to how you partition your system, well that depends. Truly stateless components can simply be scaled out and the work load balanced between them, ideally with all instances of the component running in an active manner. If, on the other hand, there is state that needs to be maintained, you need to find a workload partitioning strategy that will allow you to have multiple instances of those stateful components, where each instance is responsible for a distinct subset of the work and/or data.

3. Scalability is about concurrency

Scalability is inherently about concurrency; after all, it's about doing more work at the same time. Technologies such as the early versions of Enterprise JavaBeans (EJB) attempted to provide a simplified programming model and encouraged us to write components that were single-threaded. Unfortunately, these components typically had dependencies on other components and this led to concurrency problems. If concurrency isn't thought about, you have systems where data can easily become corrupted. On the other hand, too many guards around concurrency lead to systems that are essentially serial in nature and limited in the degree to which they can scale. Concurrent programming isn't that hard to do, but there are some simple principles that can help when building scalable systems.

  • If you do need to hold locks (e.g. local objects, database objects, etc), try to hold them for as little time as possible.
  • Try to minimize contention of shared resources and try to take any contention off of the critical processing path (e.g. by scheduling work asynchronously).
  • Any design for concurrency needs to be done up-front, so that it's well understood which resources can be shared safely and where potential scalability bottlenecks will be.

4. Requirements must be known

In order to build a successful software system, you need to know what your goals are and what you're aiming for. While the functional requirements are often well-known, it's the non-functional requirements (or system qualities) that are usually absent. If you do genuinely need to build a piece of software that is highly scalable, then you need to understand the following types of things up-front for the critical components/workflows.

  • Target average and peak performance (i.e. response time, latency, etc).
  • Target average and peak load (i.e. concurrent users, message volumes, etc).
  • Acceptable limits for performance and scalability.

It might be that performance isn't critical, but you need to know this information as early as possible, because your approach to dealing with scalability will be driven by the performance requirements.

5. Test continuously

Once you understand the requirements you can start designing and building the solution. The design that we come up with and the code that we write is static in nature, so you can never quite tell how it will work until it is executed. It's for this reason, then, that all decisions on performance and scalability should be backed up by evidence, and this evidence should be gathered and reviewed from the start of the project and on a continuous basis thereafter. In other words; set measurable goals throughout the system, verify and measure the real performance and consider performance at all stages of the project.

One of the most frequently made mistakes is that our view of a system's performance and scalability can be clouded by our own experience or hearsay. Satisfying the non-functional qualities of a system is one reason why you may need to review the other decisions on your project. For example, the non-functional requirements might influence you to choose to not use a standard or to use something that might not be considered mainstream/fashionable. Non-functional requirements may invalidate religious dogma, and evidence trumps dogma.

6. Architect up front

Probably the most important principle for building scalable systems is that, if you need your system to exhibit this characteristic, you have to design it in up front. One of the pitfalls that many people (including myself) have fallen into, particularly during the early days of J2EE, was that you could build an application and have it automatically scale up and scale out. Applications designed to scale out will almost always scale up, yet applications designed to scale up will almost never scale out. Most applications can be scaled up by running them on more powerful hardware, but scaling out is a more complex problem. For example, how do you ensure that data remains consistent between application instances? And how do you make your singletons and synchronized code blocks work across processes?

Of course, thinking about this stuff up front is not necessarily the same as doing a waterfall style big design up front. Iterative and agile processes are there to help us, providing a framework in which we can do just enough design in order to be able to solve the problem. Just be pragmatic. Oh, and despite how good we think we are at designing scalable applications, it's always best to act as if you can't trust yourself and write/test code as early as possible.

7. Look at the bigger picture

Finally, remember to take the bigger picture view - look at the wood before looking at the trees. It's really easy for us to get carried away tuning components at the fine-grained code level, but ultimately it's the system as a whole that needs to be optimised. Focus on the end-to-end performance and scalability, sacrificing local optimisations if necessary. If you need to use a profiling tool to identify bottlenecks, then do so, but don't start doing this until you have a view of the end-to-end performance. Since performance is inversely related to the aggregation of all latencies throughout the system, any operation whose latency increases relative to load will become a problem. Having said that, if you're struggling to meet your performance and scalability goals, it's worth questioning whether you have chosen the right architecture. Again, look at the bigger picture and ensure that somebody is taking on the architect role.


This article has presented a number of principles and guidelines for building scalable applications, covering a number of different aspects of the software development process. The best advice I can give to anybody building scalable systems is that you need to explicitly think about and design your system. Scalability isn't magic, but it doesn't come for free. On a final note, while it might be true that faster hardware can save your ass, don't count on it!

About the article

The majority of the principles for this article have been sourced from some notes taken during a scalability discussion that took place at a private summit for architects held in London, UK, in late 2005. The summit was organized by Alexis Richardson, Floyd Marinescu, Rod Johnson, John Davies, and Steve Ross-Talbot. The video entitled "JP Rangaswami on open source in the enterprise & the future of information" is also from the summit.

About the author

Simon is a hands-on software architect who works within Detica's Global Financial Markets group. Simon has been involved in projects ranging from desktop clients and web applications through to highly scalable distributed systems and service-oriented architectures (SOA). His specialist technology is Java and, as a hands-on technical authority, he's called upon to advise and shape solutions; defining, delivering and assuring that the chosen architecture is fit for purpose and meets the non-functional requirements. Simon has written and co-written a number of books about Java EE web technologies, spoken at several conferences and founded Coding the Architecture - a website that presents a practical and pragmatic view of software architecture.

Rate this Article


Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Community comments

  • Up-front Concurrency Design

    by Luiz Esmiralha,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Thanks for your article, Simon. It will prove useful in my next projects!

    But, I am somewhat puzzled by your affirmation that "any design for concurrency must be done up-front". Can you ellaborate further on the reasons for that, if possible with a real-world scenario where postponing concurrency design to a later moment proves to be a very expensive decision?

  • Re: Up-front Concurrency Design

    by Simon Brown,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Thanks Luiz, I'm glad that you found the article useful.

    Adding concurrency in later is definitely possible ... I just think it's trickier. If I write something with concurrency in mind, I tend to *test* it with concurrency in mind. If I add concurrency afterwards, I tend to not test it as thoroughly and/or introduce a bunch of nasty side-effects!

    This principle can be applied to data too. If you're building a big distributed system, thinking about concurrent data access (e.g. how data will be locked/synchronized/shared) is easier to do when you have a blank canvas. As another example, think about what you might need to do to add concurrency features to a GUI application - you'd need to figure out your concurrency strategy (e.g. pessimistic locking by the user vs optimistic locking by the application) and then modify code right from the GUI through to the back-end.

    At the end of the day, there's no "right answer". I just find that I make a better job of concurrency if I think about it up front.

  • good article

    by Surender Kumar,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Thanks for a good article Simon. I believe it states the points inline with KISS approach which we tend to overlook in quest for abstraction and loose-coupling. My current simple application gave me a throughput of ~20 requests/second on a single CPU linux server which I definitely want to improve. I may not know what's optimum. Is the count an embarrassment :-) or is it okay? Can you please cite some statistics with respect to this. Also increasing the servers in a cluster I'm sure of increasing it many folds.

  • Re: good article

    by Simon Brown,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Thanks ... exactly, simple is good.

    Unfortunately, the answer to your question is "it depends". ~20 requests/sec doesn't sound much, but you don't say what each of those requests does. If they are very large in nature, then "20" might be an excellent result. Increasing the number of servers will help you scale this number, but it might not provide you with linear scalability. That too depends on things like shared state, contention and so on.

    The best advice I can give is this - if your project sponsors are happy with the performance/scalability of your system, then your job is done. If you need additional scale, then you need to get another server and see what sort of numbers you get out. Stats are useful, but not as useful as testing your software yourself. :-)

  • Good principles

    by Randy Shoup,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    Hi, Simon --

    Good summary of some critical scaling principles. You will see your points on partitioning echoed in my article on Scalability Best Practices: Lessons from eBay.

    I particularly like the point that scaling is about concurrency. Very clearly stated. That is, after all, the fundamental reason why partitioning helps.

    Ditto the point that scaling out rarely comes for free. If your only option is to scale up, sooner or later you will run out of runway. eBay has seen this time and again in its history, and the rearchitecture efforts to remove those bottlenecks (first in the database, and then in search) were long and painful. What I would add, though, is that this does not necessarily mean that it is wrong to design such a system -- just that it is important to be aware that such a system will not scale. While it is inarguably cheaper to design in scaling from the beginning, the additional time and effort it requires may not be worth it at that moment. Just make that tradeoff in full recognition of the fact that when the time comes, it will be more expensive than it otherwise would have been.

    Take care,

    -- Randy

  • Re: good article

    by Surender Kumar,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    thanks for the comments simon. Will test/profile to gauge the optimal throughput. Good work on your site btw.
    keep up.

  • advises for junior programmers

    by Giuseppe Proment,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    This article is enough just for junior programmers, what about something more profound like patterns for how to build a scalable domain model ?

  • Nope. Concurrency is not about Scalability

    by navr sale,

    Your message is awaiting moderation. Thank you for participating in the discussion.

    You got it all wrong. A system can scale well within performance metrics (throughput, responsiveness) yet fail concurrency metric. For example, if the concurrency requirement is 1000 and system can only support 100, yet at the same time the system will scale linearly as the total number of users and data increases in time all the way to the projected targets. But If the concurrency was afterthought, it is too late to address it even when more servers are added to serve requests (adding servers is about throughput and not concurrency). There are inherent limitations in the system like internal TCP/IP queues, switches, routers, maximum number of threads in the pool etc. Once the system is delivered and the peak throughput is over the capacity, it is too late. A system should specify concurrency requirement specifically before the implementation starts. Most of the time it is not possible to remediate later on.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p