New patterns and middleware architecture needed for true linear scalability?

by Johan Strandler on Aug 03, 2007 |
Is there a need for new patterns and middleware architectures when building linearly scalable applications? Nati Shalom, CTO at GigaSpaces, believes that existing middleware designed for a tier-based approach cannot be used for a truly linearly scalable architecture. Instead he proposes a new middleware stack based on self-sufficient processing units that support a partitioned/scale-out model. While Shalom proposes a new middleware stack, Pat Helland at Microsoft some years ago proposed new transactional patterns and formalizations to be used in what he calls almost-infinitely scalable systems.

Nati Shalom argues that the tiered approach (messaging, data and business processing) is a dead end, since it introduces a lot of state and "messaging ping-pong" in and between the tiers, solely for the purpose of keeping the shared data in sync. He argues that the tiered approach is bound to yield non-linear scalability, making it necessary to add CPUs exponentially in order to achieve a linear increase in throughput.

Instead, Nati suggests a different architectural approach, where these tiers are combined into one processing unit, keeping the messaging, data and processing in the same address space. Combined with a shared-nothing architecture between processing units, this yields a linearly scalable solution: you simply add machines as the processing needs increase. This model is of course very suitable for stateless applications, but for stateful applications things get a bit more complicated. Nati previously spoke about how to scale a stateful application by giving two basic rules:

  1. You need to reduce contention on the same data source
  2. You need to remove dependency between the various units in your application. You can only achieve linear scalability if each unit of work is self-sufficient and shares nothing with other units

These are basic principles of scalability. A common pattern for achieving these two principles in a stateful environment is partitioning, i.e., you break your application into units of work, with each unit handling a specific subset of your application data. You can then scale by simply adding more processing units.
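The partitioning pattern above can be sketched in a few lines. This is an illustrative toy (the `PartitionedStore` name and the use of in-process dicts are assumptions, not part of any product's API): each "processing unit" owns a disjoint subset of the key space and shares nothing with the others.

```python
# Minimal sketch of hash-based partitioning: each processing unit
# owns a disjoint subset of the key space and shares nothing with
# the others. All names here are illustrative.
import hashlib

class PartitionedStore:
    def __init__(self, num_partitions):
        # One independent dict per processing unit; no shared state.
        self.partitions = [{} for _ in range(num_partitions)]

    def _route(self, key):
        # Stable hash, so the same key always lands on the same unit.
        digest = hashlib.sha256(str(key).encode()).hexdigest()
        return int(digest, 16) % len(self.partitions)

    def put(self, key, value):
        self.partitions[self._route(key)][key] = value

    def get(self, key):
        return self.partitions[self._route(key)].get(key)

store = PartitionedStore(4)
store.put("session:42", {"user": "alice"})
assert store.get("session:42") == {"user": "alice"}
```

Note that naive modulo routing reshuffles keys when partitions are added; real scale-out platforms use consistent hashing or explicit rebalancing to avoid that.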

If the data is divisible into disjoint subsets of application data, an application can be scaled out to many independent processing units, where each unit owns all the data for one subset. A typical example of data that is divisible in this way is session information for a web application. This partitioning model does not work, however, when many application processes need to access or update the same shared data. Shalom says: "In such cases the data can be referenced through a remote partition i.e. the business logic and messaging would seat in one processing-unit and the data in a remote separate partition - in this way you still the get the scaling at the cost of latency."

But what if the volume of the shared data is huge? One solution is to split the same type of data across different data-store partitions, but this raises two major problems:

  • Aggregation. How do you perform a query over a decentralized data store (i.e. one query over many data-store partitions)?
  • Atomic transactions. Distributed transactions don't scale well, so some other solution is needed.

Shalom gives a solution for the aggregation problem:

You parallelize the query such that each query runs against each partition. By doing so you leverage the CPU and Memory capacity in each partition to truly parallelize your request. Note that the client issuing the queries can be unaware of the physical separation of the partitions and get the aggregated results as if it was running against a single gigantic data store with one main difference - it will be much faster!
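The parallel-query idea in the quote above is the classic scatter-gather pattern, and it can be sketched as follows (the data, names and thread-based execution are illustrative assumptions; a real data grid runs each branch on a separate machine):

```python
# Sketch of scatter-gather: the same predicate runs against every
# partition concurrently, and the client sees one merged result set,
# as if it had queried a single store. Names are illustrative.
from concurrent.futures import ThreadPoolExecutor

partitions = [
    {"p1": {"price": 10}, "p2": {"price": 30}},
    {"p3": {"price": 25}},
    {"p4": {"price": 5}, "p5": {"price": 50}},
]

def query_partition(partition, predicate):
    # Each partition answers using only its own CPU and memory.
    return [k for k, v in partition.items() if predicate(v)]

def scatter_gather(predicate):
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        futures = [pool.submit(query_partition, p, predicate)
                   for p in partitions]
        results = []
        for f in futures:
            results.extend(f.result())  # gather and merge
        return sorted(results)

print(scatter_gather(lambda v: v["price"] > 20))  # → ['p2', 'p3', 'p5']
```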

For a solution to the atomic transaction problem, we turn to Pat Helland, who addressed it in a paper, "Life beyond Distributed Transactions: an Apostate's Opinion", written during his time at Amazon.com. In the paper he concludes that one basically shouldn't use cross-system transactions in large-scale systems.

Reacting to the lack of well-known terminology for the concepts and abstractions used in building scalable systems, Helland defines:

- Entities are collections of named (keyed) data which may be atomically updated within the entity but never atomically updated across entities.

- Activities comprise the collection of state within the entities used to manage messaging relationships with a single partner entity.

Workflow to reach decisions, as has been discussed for many years, functions within activities within entities. It is the fine-grained nature of workflow that is surprising as one looks at almost-infinite scaling.

By this, Helland means that two entities cannot be updated in the same transaction. Instead he assumes "multiple disjoint scopes of transactional serializability", and later in the paper he defines such a scope as an Entity. An update of multiple entities can therefore not be performed within a single atomic transaction; it has to be done through messaging between the entities in a peer-to-peer fashion. This messaging in turn introduces the need to manage conversational state, and Helland defines this management of state for each entity partner as an Activity. He gives an example:

Consider the processing of an order comprising many items for purchase. Reserving inventory for shipment of each separate item will be a separate activity. There will be an entity for the order and separate entities for each item managed by the warehouse. Transactions cannot be assumed across these entities.

Within the order, each inventory item will be separately managed. The messaging protocol must be separately managed. The per-inventory-item data contained within the order-entity is an activity. While it is not named as such, this pattern frequently exists in large-scale apps.
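Helland's order example can be sketched in code. This is an editorial illustration, not code from the paper: the class names and the synchronous "message send" (a plain method call) are simplifying assumptions; the point is that each entity is atomic only within itself, and the order keeps one activity record per inventory-item partner.

```python
# Sketch of Helland's Entities and Activities: the order entity and
# each inventory-item entity live in disjoint transactional scopes,
# so reserving stock happens via messages, not one big transaction.
# The per-item bookkeeping inside the order is the "activity".
class InventoryItemEntity:
    def __init__(self, sku, stock):
        self.sku, self.stock = sku, stock

    def handle_reserve(self, qty):
        # This update is atomic only within this single entity.
        if self.stock >= qty:
            self.stock -= qty
            return ("reserved", self.sku)
        return ("rejected", self.sku)

class OrderEntity:
    def __init__(self, items):
        self.items = items
        # One activity record per partner entity (per inventory item).
        self.activities = {sku: "pending" for sku in items}

    def run(self, warehouse):
        for sku, qty in self.items.items():
            # "Send" a reserve message to the item entity and record
            # the reply in that item's activity state.
            status, _ = warehouse[sku].handle_reserve(qty)
            self.activities[sku] = status
        return self.activities

warehouse = {"hammer": InventoryItemEntity("hammer", 5),
             "nail": InventoryItemEntity("nail", 0)}
order = OrderEntity({"hammer": 2, "nail": 10})
print(order.run(warehouse))  # → {'hammer': 'reserved', 'nail': 'rejected'}
```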

The lack of transactional atomicity between entities, and the messaging it introduces, causes new problems that lurk their way all the way up to the business logic: message retries, and processes that must be able to handle idempotence. Also, the need for asynchronous messaging between entity peers enforces the implementation of fine-grained workflows, including tentative operations with subsequent cancellation/confirmation operations.
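The idempotence requirement mentioned above has a standard shape: under at-least-once delivery, a retry may redeliver a message that was already processed, so the handler must deduplicate before applying the side effect. A minimal sketch (names are illustrative):

```python
# Sketch of an idempotent receiver for at-least-once messaging:
# retries may redeliver the same message, so the handler dedupes on
# a message id before applying the side effect.
class IdempotentAccount:
    def __init__(self):
        self.balance = 0
        self.seen = set()  # ids of messages already applied

    def handle(self, msg_id, amount):
        if msg_id in self.seen:
            return self.balance  # duplicate delivery: apply nothing
        self.seen.add(msg_id)
        self.balance += amount
        return self.balance

acct = IdempotentAccount()
acct.handle("m1", 100)
acct.handle("m1", 100)  # retry of the same message
acct.handle("m2", 50)
assert acct.balance == 150
```

In a durable system the `seen` set would itself have to be persisted atomically with the state change, inside the entity's own transactional scope.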

This architecture described by Nati Shalom has been implemented in the GigaSpaces platform, which recently released version 6. Pat Helland's paper is timeless and definitely worth a thorough read.

Server down by Martin Probst

Uh ha - Gigaspaces.com seems to have some performance problems currently ;-)

Re: Server down by Nati Shalom

works for me

New patterns and middleware architecture needed for true linear scalability by Nati Shalom

Johan you covered this topic quite well.


The lack of transactional atomicity between entities, and the messaging it introduces, causes new problems that lurk their way all the way up to the business logic: message retries, and processes that must be able to handle idempotence. Also, the need for asynchronous messaging between entity peers enforces the implementation of fine-grained workflows, including tentative operations with subsequent cancellation/confirmation operations.


That's part of the reason why we're seeing a move toward Event-Driven Architecture (EDA) and High-Performance SOA as the architecture of choice for building scalable applications, rather than the synchronous tier-based approach.

Quoting Werner Vogels, the CTO of Amazon.com, from his post "A Word on Scalability": "Why is scalability so hard? Because scalability cannot be an after-thought."

As Werner argues, we cannot treat scalability as just an optimization; it needs to be part of our architecture. The main challenge is to make it simple enough that users not familiar with all the theory behind scalability would be able to build scalable applications. As a lesson from the JEE experience, the key to success is simplicity. By simplicity I mean how simple it would be to design, implement, test and deploy this style of application.

So far all we have seen coming from the industry is new acronyms in different shapes and forms promising that the future will be brighter and nicer, but very little focus has been put on providing a clear path for how I can take my existing tier-based application and move it to this new style of scale-out model.

Our approach to that challenge was to expose a programming model similar to the one used in the existing middleware stack, and to focus on replacing the underlying implementations of those programming models in a way that fits the scale-out model that Johan covered in this article. In this way the transition from the existing tier-based approach becomes significantly simpler and more intuitive. Using Spring as the underlying platform makes the transition even smoother, especially for those already familiar with Spring. There are a few code examples here that show how simple it can be to write such an application.

Nati S.
GigaSpaces
Write Once Scale Anywhere

New patterns and architecture available for true linear scalability by Cameron Purdy

In addition to supporting all of the models described above (e.g. Data Grid dynamic partitioning, parallel query and parallel aggregation were all originally introduced in our Coherence software), we have added once-and-only-once processing guarantees in a distributed environment, including during failover and re-balancing conditions, and including support for both scalar and aggregate operations. This means that you can achieve information reliability without sacrificing scalability, and even without having to hold open two-phase commits, even for multi-server operations occurring in parallel, even if they are mutating operations. As anyone doing serious distributed software systems will tell you, once-and-only-once guarantees are one of the "holy grails" of distributed systems.

Coupled with our partitioned data affinity support, many of the "cross-system transactions" can be reduced to one guaranteed once-and-only-once atomic transaction that occurs with 100% data locality. In more complex cases, a distributed EDA (using the same guarantees) can create a reliable series of such transactions, each atomic and with 100% data locality.
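The data-affinity idea can be sketched as routing related keys by a shared affinity key, so an order and its lines are guaranteed to land in the same partition and can be updated in one local, atomic step. This is an editorial sketch (the key format, `route` function and partition count are illustrative assumptions, not Coherence's actual API):

```python
# Sketch of key affinity: related keys (an order and its lines)
# are routed by a shared affinity key, so they are co-located in
# one partition. Names and key formats are illustrative.
import hashlib

NUM_PARTITIONS = 4

def route(affinity_key):
    # Stable digest, so routing is deterministic across processes.
    h = hashlib.md5(affinity_key.encode()).hexdigest()
    return int(h, 16) % NUM_PARTITIONS

def affinity(key):
    # The order id before "/" is the shared affinity key.
    return key.split("/")[0]

keys = ["order:17", "order:17/line:1", "order:17/line:2"]
partitions = {route(affinity(k)) for k in keys}
assert len(partitions) == 1  # all three land on the same partition
```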

The end result: Systems with continuous availability, information reliability, better scalability, and much higher transaction rates.

Peace,

Cameron Purdy
Oracle Coherence: The Java Data Grid

Re: New patterns and middleware architecture needed for true linear scalabi by Steven Devijver

Nati,

Thanks for your interesting views. I can testify from the business end of scalability that, well, there is nothing simple about it.

The biggest problem with scalability is that customers either don't want it/think they don't need it, or, if they want it/think they need it, most of the time they don't get it.

The reason for this is due in part to a host of economic and business-practice realities, all of which can be changed given some time.

But they don't change, and for a good reason: new approaches that diverge from the beaten track - for example, new approaches to scalability - create cognitive dissonance, starting at the moment one is introduced to the new topics.

If you open one book, Entity means this; if you open any other one, Entity means something entirely different. Cognitive dissonance.

Scalability cannot be an afterthought, yet I can migrate my layered application - where scalability has never even been considered - to a fully scalable beast. Cognitive dissonance.

And so it continues. Sure, banks and the CIA know how to do scalability. The challenge is to spread whatever they are doing beyond them. History shows that this is almost never possible except for trivial technologies that work, always.

Examples are transactions and web services. Distributed memory is vastly different, since most people's brains start to hurt when they start thinking about how it affects them.

Cognitive dissonance by Julian Browne

I'd agree that it's difficult to cross the scalability-need dissonance chasm, but it's by no means impossible. Businesses certainly do tend either not to know they need scalability or not to prioritise it, but that's to be expected, as they tend to live in the very tangible world of functional requirements.

When it comes to enlightening them on their need for scalability, though (a classic non-functional requirement), it's illustrating the lack of it that makes them sit up in their chairs. Marketing might start falling asleep at the sight of TPS profile curves, but if they hear they'll lose millions of dollars because no more than 200 customers can get on the web site at lunchtime, they suddenly develop an interest in the subject.

It also needn't be expensive (at least not compared to migration costs later) or overly complex - Gigaspaces, Coherence and others conveniently hide a lot of the transactional details so you can concentrate on other things.

If your business is selling spanners online then you'll probably find that a tiered architecture will do just fine, but you don't need to be up into thousands of transactions per second to benefit from a grid approach - the availability and event-processing capabilities can make the business case on their own.

Re: New patterns and middleware architecture needed for true linear scalabi by Nati Shalom

The biggest problem with scalability is that customers either don't want it/think they don't need it, or, if they want it/think they need it, most of the time they don't get it.


Look at the architectures of Google, Amazon, eBay, MySpace, LiveJournal and other Web stalwarts. They have all come to the same conclusion - with different nuances. They have all realized that the level of scalability, reliability and performance they need -- while keeping cost and complexity down -- will not come from a J2EE app server + database + messaging. It will not come from an n-tier architecture. Instead, they moved to a scale-out architecture, which aims for a shared-nothing approach.

The above references obviously represent the very high-end organizations that were first to hit the wall in terms of scalability. With the emergence of Web 2.0, the exponential growth in the volume of data applications need to process, and the requirement to process this increasing amount of data in a shorter time and yet at lower cost, drive lots of other organizations toward a scale-out approach. Regardless of the solution they choose, they all realize that the required level of scalability cannot be achieved just by tweaking the database or application server.


The reason for this is due in part to a host of economic and business-practice realities, all of which can be changed given some time.


I have to disagree with you on this one.
The economic drive is very clear and quite big.

Imagine an application that can handle 100 transactions/sec. That application doesn't scale linearly, due to contention around the database (a typical scenario in most tier-based applications). Let's say that to meet the 100 transactions/sec it needs a single server with 4 cores, which costs $5k.
Now, instead of 100 transactions/sec, you are asked to support 500 transactions/sec. Since the scaling is not linear, you can't just add 5 machines to handle that 5x growth; instead you need 10 machines (which already doubles the cost). Now imagine the case where the requirement grows to 5000 transactions/sec - at this level the application is totally saturated and limited by the centralized-database approach. That means that regardless of the amount of hardware resources you add, you will not be able to meet the requirement. Now go figure the cost in that case! Add to all that the cost related to the complexity of scaling, the cost of testing and managing such an environment, as well as the cost of continuous changes to your application to apply the necessary optimizations, and you get a clear economic justification for scaling.
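The arithmetic in the example above can be reproduced under an assumed scaling-efficiency factor. The 50% figure below is an illustrative assumption chosen to match the 10-machine figure in the example; it is not stated in the original.

```python
# Reproducing the cost arithmetic: with linear scaling, 5x the load
# needs 5x the servers; with ~50% effective scaling efficiency (an
# assumed figure matching the 10-machine example), it needs 10.
import math

SERVER_COST = 5_000   # dollars per 4-core server
BASE_TPS = 100        # throughput of one server on its own

def servers_needed(target_tps, efficiency):
    # efficiency = fraction of a server's standalone throughput
    # that each added server actually contributes under contention.
    return math.ceil(target_tps / (BASE_TPS * efficiency))

print(servers_needed(500, 1.0) * SERVER_COST)  # linear: 25000
print(servers_needed(500, 0.5) * SERVER_COST)  # contended: 50000
```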

HTH
Nati S.

New Patterns by Frank Bolander

A lot of this stuff sounds like particle systems. Most current research is about graphics systems and physical modelling, but there is no reason the concepts couldn't be translated to business processes.

Re: New patterns and middleware architecture needed for true linear scalabi by Arnon Rotem-Gal-Oz


Look at the architectures of Google, Amazon, eBay, MySpace, LiveJournal and other Web stalwarts. They have all come to the same conclusion - with different nuances. They have all realized that the level of scalability, reliability and performance they need -- while keeping cost and complexity down -- will not come from a J2EE app server + database + messaging. It will not come from an n-tier architecture. Instead, they moved to a scale-out architecture, which aims for a shared-nothing approach.


Right - but it wasn't a data grid either - they all came to the conclusion that it is better to put an emphasis on partitioning (which, by the way, is the idea behind SOA) and availability. The other result of that (as explained in CAP) is that they've settled for eventual consistency.

Arnon

Re: New patterns and middleware architecture needed for true linear scalabi by Nati Shalom


They all came to the conclusion that it is better to put an emphasis on partitioning


I agree - partitioning is clearly the common denominator in terms of architecture, but it has little to do with SOA.


Right - but it wasn't a data grid either - they all came to the conclusion that it is better to put an emphasis on partitioning (which, by the way, is the idea behind SOA) and availability. The other result of that (as explained in CAP) is that they've settled for eventual consistency.


Arnon, I think that you're missing the point.
1. The suggestion I'm making is not to use a Data Grid as the solution for scalability - I have actually argued on several occasions in the past that a Data Grid is only part of the solution.
2. My second argument is that SOA is too broad, and there is a need for a more specific pattern/blueprint that defines how you can utilize SOA/EDA and partitioning concepts to scale stateful low-latency applications - this is where Space-Based Architecture comes into play.

HTH
Nati S.

Not so fast ;-) by Cameron Purdy


Look at the architectures of Google, Amazon, eBay, MySpace, LiveJournal and other Web stalwarts. They have all come to the same conclusion - with different nuances. [..]


Right - but it wasn't a data grid either - they all came to the conclusion that it is better to put an emphasis on partitioning (which, by the way, is the idea behind SOA) and availability. The other result of that (as explained in CAP) is that they've settled for eventual consistency.


Actually, all the ones mentioned above do use distributed caching and/or Data Grid products to achieve their scalability.

Peace,

Cameron Purdy
Oracle Coherence: The Java Data Grid

Re: Not so fast ;-) by Geva Perry


Actually, all the ones mentioned above do use distributed caching and/or Data Grid products to achieve their scalability.


Correct, and the links above mention it explicitly.

Re: Not so fast ;-) by Arnon Rotem-Gal-Oz

I don't know enough about Google's architecture, but when I spoke with Randy Shoup (eBay architect), this was not what I understood regarding their architecture.
eBay only caches immutable (or slow-changing) data, and they actually partition their data carefully, rather than use it in the data-grid-like manner both of you are selling (at least as I understand Tangosol and Gigaspaces, i.e. a data cluster which is universally accessible).
Also, if you listen to Amazon's Werner presentation here on InfoQ, you'd hear the same ideas as eBay's.

Arnon

Old ideas, new people by Humphrey Bogart

There is nothing new here, it is just new to the transaction processing crowd.

Look at CATOCS for a parallel example. You need to base this stuff on the business logic to solve it. It just doesn't fit in the platform layer, because you don't know enough about the contention and the process there, so you end up doing full transactions and synchronization where you don't need them. doi.acm.org/10.1145/168619.168623

Entities are what the business world calls "documents", and activities are what the business world calls "conversations". Business people solved these problems on paper hundreds of years ago with manual procedures.

Once your typical OO programmer gets his hands on the carving knife (IDE), the documents get replaced with objects (de-reified) and the conversations disappear (de-reified), and the knowledge of the process is replaced by global synchs and locks that become exclusive. Very few software projects have fully reified formal descriptions of document and conversation types.

Take a look at SWIFT or FIX. There are tens of thousands of corporations exchanging messages with fewer problems and more scalability than the typical one-team app discussed here. This is because they have both messages and conversations well defined.

Pat is right, but not new.
