Should developers write their own transaction coordination logic?
Mark Little is the Director of Standards and Development Manager for the JBoss division of Red Hat. He has spent most of his career working on distributed computing and distributed transaction protocols, and has been involved in several working groups chartered to define a transaction protocol for Web Services, including BTP, WS-CAF and WS-TX.
In December, Mark blogged on "Large-Scale Distributed Transactions". He explains that his past work on replication protocols led him to the conclusion that you need to trade consistency for performance and availability. This is in line with what Werner Vogels explained in his talk at QCon London 2007.
Mark argues that:
It turns out that the same is true for transactions: in fact, it's necessary in Web Services if you want to glue together disparate services and domains, some of which may not be using the same transaction implementation behind the service boundary.
He points out that other people, such as Pat Helland, are thinking along the same lines of relaxing transactionality. Mark uses Heisenberg's Uncertainty Principle to illustrate his point:
you can either know the state that all participants will have, but not when; or vice versa.
He also points us to the WS-BP specification as part of the WS-CAF (Composite Application Framework) set of specifications:
One key difference between the business process transaction model and that of traditional 2PC is that it assumes success, that is the BP model is optimistic and assumes the failure case is the minority and can be handled or resolved offline if necessary, or through replay/void/compensation, but not always automatically, often requiring human interaction.
This approach is echoed in the example that Pat uses to illustrate his paper:
Consider a house purchase and the relationships with the escrow company. The buyer enters into an agreement of trust with the escrow company. So does the seller, the mortgage company, and all the other parties involved in the transaction.
When you go to sign papers to buy a house, you do not know the outcome of the deal. You accept that, until escrow closes, you are uncertain. The only party with control over the decision-making is the escrow company. This is a hub-and-spoke collection of two-party relationships that are used to get a large set of parties to agree without use of distributed transactions.
When you consider almost-infinite scaling, it is interesting to think about two-party relationships. By building up from two-party tentative/cancel/confirm (just like traditional workflow).
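The hub-and-spoke relationship Pat describes can be sketched in plain Java. The class and method names below (`EscrowHub`, `Party`, `tentative`, and so on) are illustrative inventions for this sketch, not from Pat's paper: each party has a two-party relationship with the hub only, and the hub alone decides whether the deal closes.

```java
import java.util.ArrayList;
import java.util.List;

// A minimal sketch of the two-party tentative/cancel/confirm idea:
// a hub (the "escrow company") collects tentative promises from each
// party and only confirms once every party has tentatively agreed.
public class EscrowHub {
    interface Party {
        boolean tentative();  // party reserves resources, may still back out
        void confirm();       // escrow closed: make the work permanent
        void cancel();        // deal fell through: release the reservation
    }

    private final List<Party> parties = new ArrayList<>();

    void enroll(Party p) { parties.add(p); }

    // The hub is the only decision-maker; parties never talk to each other.
    boolean close() {
        List<Party> agreed = new ArrayList<>();
        for (Party p : parties) {
            if (p.tentative()) {
                agreed.add(p);
            } else {
                // One party backed out: cancel the earlier tentative promises.
                agreed.forEach(Party::cancel);
                return false;
            }
        }
        agreed.forEach(Party::confirm);
        return true;
    }
}
```

No distributed transaction is involved: uncertainty is explicit (a tentative promise may still be cancelled), which matches the "until escrow closes, you are uncertain" framing above.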
Greg Pavlik, co-author of many of the Web Services transaction specifications, is starting to think differently:
the application may subsume the role of the coordinator at which point it becomes less and less clear why the protocol is needed at all above and beyond the business logic itself.
There are patterns here that need to be understood by application developers. Perhaps supported by frameworks. But it may simply be that transaction management won't play a broad role in that context.
Mark responded that, yes,
[He doesn't] see distributed ACID transactions having much of a future in large scale systems
There's still a reliable coordinator in there that controls the state transitions and can "do the right thing" on failure and recovery.
when several software agents are involved in performing a common unit of work, reinventing ad hoc transaction protocols to achieve state alignment is not necessarily a good idea.
Distributed Transactions in a distributed environment
I believe the reason we're seeing the rise of all these second thoughts on consistency in general, and distributed transaction semantics specifically, is the inherent conflict between distribution, loose coupling, and the 2PC (two-phase commit) protocol imposed by current distributed transaction semantics. 2PC introduces a runtime dependency between all the supposedly loosely coupled entities and requires that they all agree on something at the same time. Obviously that breaks the idea behind loose coupling, i.e. separation of concerns, the ability to add new services without breaking or affecting the others, scaling, etc., and introduces very tight runtime coupling.
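The runtime coupling in question is visible in even a skeletal 2PC coordinator. The sketch below (names are illustrative, not from any particular transaction manager) shows why: no participant may commit until every participant has voted, so all of them must be reachable and holding their locks at the same moment.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of the two-phase commit protocol discussed above.
public class TwoPhaseCommit {
    interface Participant {
        boolean prepare();  // phase 1: vote yes (holding locks) or no
        void commit();      // phase 2a: make the prepared work durable
        void rollback();    // phase 2b: undo the prepared work
    }

    static boolean run(List<Participant> participants) {
        // Phase 1: collect votes. A single "no" (or an unreachable
        // participant) forces every prepared participant to roll back.
        List<Participant> prepared = new ArrayList<>();
        for (Participant p : participants) {
            if (!p.prepare()) {
                for (Participant q : prepared) q.rollback();
                return false;
            }
            prepared.add(p);
        }
        // Phase 2: only once *all* votes are in may anyone commit.
        for (Participant p : prepared) p.commit();
        return true;
    }
}
```

Every participant's outcome is held hostage to the slowest or least available peer, which is exactly the tight runtime coupling described above.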
There are several ways in which I see this issue being dealt with using the alternative approaches suggested in this article. The first is relaxing the requirement for consistency (eventual consistency, for that matter), and the second is splitting a transaction's scope into smaller steps that can be "coordinated" locally and synchronized later on through workflow and messaging. I have written my thoughts on this matter in the following post: Lessons from Pat Helland: Life Beyond Distributed Transactions.
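The second approach, splitting one global transaction into smaller locally coordinated steps, can be sketched roughly as follows. This is one possible shape (the `Step`/`compensate` names are illustrative): each step is its own local unit of work, and if a later step fails, the already-completed steps are undone by running their compensations in reverse order rather than by a global rollback.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Sketch of splitting a transaction into locally coordinated steps with
// compensation, instead of one globally locked unit of work.
public class CompensatingWorkflow {
    interface Step {
        void execute();     // a small, locally consistent unit of work
        void compensate();  // semantic undo, applied if a later step fails
    }

    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step s : steps) {
            try {
                s.execute();
                completed.push(s);
            } catch (RuntimeException e) {
                // Undo completed steps in reverse order.
                while (!completed.isEmpty()) completed.pop().compensate();
                return false;
            }
        }
        return true;
    }
}
```

Between steps the system is visibly in an intermediate state, which is the consistency relaxation being traded for looser coupling; this also mirrors the WS-BP "replay/void/compensation" model quoted earlier.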
The challenge we face with those alternative approaches is that they expose a high degree of complexity to the application and often require re-architecting the application.
IMO there are ways to abstract a large part of that complexity from the application code and make it possible to maintain consistency and scalability in a distributed environment without exposing the complexity imposed by the alternative patterns. Data Grids, XTP (Extreme Transaction Processing) and SBA (Space-Based Architecture) aim to provide the means and concrete patterns to abstract that complexity. In a nutshell, the idea is to partition your application into groups, each containing all the services that have a latency or tight-consistency dependency between them. We refer to such a group as a processing-unit. In some cases each group would also collocate the data required for its own processing in that same processing-unit. The processing-unit becomes the unit of scale and fail-over of our application. By doing so we can handle the consistency concerns of those services in a single VM (thus avoiding network coordination overhead) and use messaging and workflow to communicate between those groups.
You can read more about that idea here: Write Once Scale Anywhere
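A minimal sketch of the processing-unit idea, under the assumption of an in-memory state map and an outbound message queue (the `ProcessingUnit` class and its members are invented for illustration, not taken from any product): state and the logic that needs tight consistency with it live in one partition, so updates there require no cross-network coordination, while other partitions learn about changes only asynchronously, via messages.

```java
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

// Sketch of a processing-unit: collocated state plus an outbox of
// messages for other partitions (eventual consistency between units).
public class ProcessingUnit {
    private final Map<String, Integer> localState = new ConcurrentHashMap<>();
    private final Queue<String> outbox = new ConcurrentLinkedQueue<>();

    // A consistent update handled entirely inside this unit (single VM):
    // no remote coordinator, no prepare/commit round-trips.
    void handle(String key, int delta) {
        localState.merge(key, delta, Integer::sum);
        // Anything other partitions need to know goes out as a message,
        // to be applied there later.
        outbox.add(key + ":" + delta);
    }

    int get(String key) { return localState.getOrDefault(key, 0); }

    Queue<String> outbox() { return outbox; }
}
```

Routing requests by a partition key so that all work for one key lands in one unit is what keeps the tight-consistency dependencies local.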
Re: Distributed Transactions in a distributed environment
That's an implementation issue or maybe even down to the specific transaction model. But there's nothing inherent in the general concept that would require your statement to be true.
In general I'm pretty disappointed with XTP. However, what you describe in your final paragraph is superficially very similar to WS-BA or WS-BP (neither of which actually need to be tied to Web Services). I'm sure there must be some differences and I'll try to take a look when I get a chance.
Re: Distributed Transactions in a distributed environment
The processing-unit becomes the unit of scale and fail-over of our application. By doing so we can handle the consistency concerns of those services in a single VM (thus avoiding network coordination overhead) and use messaging and workflow to communicate between those groups.
Nice, sounds exactly like our _perfectly scalable_ 2PC architecture in the Atomikos product range.
BTW: the trick for successful transaction use in your architectures is using them where appropriate. To reach 'eventually consistent' properties there is nothing better than JTA/XA between the queues involved!!!