Are Cross-Service Transactions A Violation of the Autonomous Tenet of Service Orientation?
WS-TX, the third attempt since 2001 to create a web service transaction standard, was ratified as an OASIS standard last May. Evan H. started a discussion last week on the MSDN Distributed Transaction and Services forum by asking:
Are distributed transactions (i.e. WS-Transaction) a violation of the "Autonomous" tenet of service orientation?
The goal of a transaction is to achieve state alignment between two or more software agents collaborating on performing a unit of work, whether the unit of work fails (because of a business or technical error) or succeeds.
Transaction standards are typically composed of three elements:
- A Context Management facility which manages the shared state of transaction instances. The state may be passed by value or by reference.
- A Coordination facility which offers participant registration and transaction protocol management.
- Transaction Protocols which are key in achieving the state alignment (whether a communication error happens or a business exception is thrown by one or more participants).
WS-TX chose to design the context management and coordination facilities under the responsibility of the transaction coordinator and specified two transaction protocols: WS-AtomicTransaction (WS-AT) and WS-BusinessActivity (WS-BA).
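As a rough illustration of how these three elements fit together, here is a minimal Python sketch. The names `CoordinationContext`, `Coordinator`, `create_context` and `register` are hypothetical and are not taken from the WS-TX specifications; the context is passed by value and the protocol is reduced to a simple label.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CoordinationContext:
    """Shared state identifying one transaction instance (passed by value here)."""
    transaction_id: str
    protocol: str  # hypothetical label, e.g. "atomic" or "business-activity"


@dataclass
class Coordinator:
    """Owns context creation and participant registration (assumed design)."""
    participants: dict = field(default_factory=dict)

    def create_context(self, protocol: str) -> CoordinationContext:
        # Context management: mint a new transaction instance.
        ctx = CoordinationContext(str(uuid.uuid4()), protocol)
        self.participants[ctx.transaction_id] = []
        return ctx

    def register(self, ctx: CoordinationContext, participant: str) -> None:
        # Coordination: a participant enlists under an existing context.
        self.participants[ctx.transaction_id].append(participant)

    def registered(self, ctx: CoordinationContext) -> list:
        return list(self.participants[ctx.transaction_id])
```

A caller would create a context, pass it along with each service request, and let each service register itself with the coordinator before the chosen protocol drives the outcome.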
"The [OASIS] technical committee recognized that there is no single transaction model appropriate for all use cases, and so WS-Transaction defines an extensible coordination framework that accommodates classic two-phase-commit, as well as more relaxed forms of transactions with isolation behavior appropriate in loosely-coupled systems.", said Ian Robinson, co-chair of the WS-TX working group.
Choreology presents a classification of transaction protocols which is worth reading. In this classification, WS-AT belongs to the “provisional-confirm” category while WS-BA belongs to the “do-compensate” category. Please note that I don’t consider WS-AT to be ACID, because WS-AT relaxes the isolation requirement. The goal of WS-TX is simply to offer a choice of behaviors between “provisional” and “do”.
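The difference between the two categories can be sketched with a toy stock-keeping participant. This is only an illustration of the two behaviors; the class and method names are hypothetical and neither class implements the WS-AT or WS-BA message exchanges.

```python
class ProvisionalConfirmStock:
    """Provisional-confirm: hold a reservation, then confirm or cancel it."""

    def __init__(self, available: int):
        self.available = available
        self.reserved = 0

    def provision(self, qty: int) -> None:
        # The final state is not touched yet; only a provisional hold is taken.
        if qty > self.available - self.reserved:
            raise ValueError("not enough unreserved stock")
        self.reserved += qty

    def confirm(self, qty: int) -> None:
        self.reserved -= qty
        self.available -= qty

    def cancel(self, qty: int) -> None:
        self.reserved -= qty


class DoCompensateStock:
    """Do-compensate: apply the change at once, undo it later if needed."""

    def __init__(self, available: int):
        self.available = available

    def do(self, qty: int) -> None:
        if qty > self.available:
            raise ValueError("not enough stock")
        self.available -= qty  # immediately visible to everyone else

    def compensate(self, qty: int) -> None:
        self.available += qty
```

In the first class other parties never see a state that may later be withdrawn; in the second they can observe the stock level after `do` and before `compensate`.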
So services can be autonomous and yet implement a transaction protocol which coordinates the units of work in which they participate. Does it hurt scalability? Certainly, since state alignment is achieved at the expense of additional message exchanges between the transaction participants and the coordinator.
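To give a feel for that overhead, the following sketch counts messages in a simplified textbook two-phase commit. It is an assumption-laden model, not the WS-AT wire protocol: every participant is assumed to receive a prepare request, return a vote, receive the decision and acknowledge it, and the function name `two_phase_commit` is hypothetical.

```python
def two_phase_commit(votes: dict) -> tuple:
    """votes maps participant name -> True (can commit) or False (must abort).

    Returns (outcome, number_of_protocol_messages) under the simplified model.
    """
    messages = 0

    # Phase 1: coordinator asks each participant to prepare and collects votes.
    for _ in votes:
        messages += 2  # prepare request + vote reply

    outcome = "commit" if all(votes.values()) else "abort"

    # Phase 2: coordinator announces the decision and collects acknowledgements.
    for _ in votes:
        messages += 2  # decision + acknowledgement

    return outcome, messages
```

Under this model the protocol traffic grows linearly, at four messages per participant, on top of the application messages themselves.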
The debate in the MSDN forum centers on two opinions. Juval Löwy, Chief Architect of IDesign.net, suggests that we cannot do without state alignment when several software agents are involved in performing a common unit of work, and that reinventing ad hoc transaction protocols to achieve this state alignment is not necessarily a good idea. He argues that, whenever practical, it is a good thing to use atomic transactions because they are simpler (note that WS-BA requires specific logic managed by the coordinator to achieve compensation). However, if you cannot hold a provisional state for the duration of the unit of work, then you have to use a different protocol, for instance a do-compensate protocol such as WS-BA. The trade-off is that you can only use this protocol if you have “tolerance to the truth”.
On the other hand, Roger Sessions, Ollie Richies, Ahmed Nagy and Arnon Rotem-Gal-Oz argue that:
“it is a very bad idea to have cross-service transaction”… “Do you really want to flow transactions between services that are spread over large geographical distances, multiple trust authorities, and distinct execution environments? I know I don't.”
They all warn against implementing WS-AT using database locks. Arnon suggests that we should be using two different names for database-level transactions and so-called “long running transactions”. The latter are often called “sagas”.
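A saga in this sense can be sketched as a sequence of steps, each paired with a compensating action, where a failure triggers the compensations of the already completed steps in reverse order. The helper `run_saga` below is a hypothetical, in-process illustration; a real implementation would also need durable state and retries.

```python
def run_saga(steps) -> bool:
    """Run (do, compensate) pairs in order; undo completed steps on failure.

    Returns True if every step succeeded, False if the saga was compensated.
    """
    completed = []  # compensations for the steps that have already succeeded
    for do, compensate in steps:
        try:
            do()
        except Exception:
            # No locks were held, so earlier effects are undone explicitly.
            for undo in reversed(completed):
                undo()
            return False
        completed.append(compensate)
    return True
```

No step holds a lock while waiting for the others, which is why the intermediate states are visible to the outside world until the saga finishes or is compensated.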
Pat Helland, who just returned to Microsoft (though he did not participate in the forum discussion), provides some clarifications when comparing database transactions and SOA transactions.
Inside of a database, the meaning of the data can be interpreted with a clear and crisp sense of "now" provided by the current transaction. Nothing moves when you are in a transaction unless the currently running application that began the transaction changes the data. There is a strong sense of stillness and of now. Inside data is very much what we have historically programmed to.
In SOA (again, how I think of it), we are acknowledging the existence of independent machines. [i.e. Autonomous]
When System-A sends a message to System-B, the data contained in the message will be unlocked before sending it. That means that the data is a historic artifact. System-B can only see what some of System-A's data used to look like. This is an essential aspect of these independent systems which do not share transactions.
Beyond isolation, Pat also sees that imposing a consistency constraint may not be the most efficient approach when dealing with distributed transactions:
More and more, I see businesses being willing to loosen Consistency even more than what I was describing above. They are willing to occasionally give the wrong answer because it is more cost-effective.
Many companies claim they never use a transaction protocol and a transaction coordinator to achieve state alignment, yet in effect they do; it is simply not an XA-compliant two-phase-commit transaction protocol. Their systems typically log the operations that are invoked, along with their outcome. At a later time, agents perform the clean-up whenever an operation failed and would have resulted in a roll-back. In this case, scalability is improved by not having additional communications related to the transaction protocol during the unit of work. The “transaction protocol” in effect happens offline. A friend of mine who once worked for such a company, a large computer manufacturer, told me that they planned to “bribe customers” as part of their transaction protocol when the coordination agents could not return to a satisfactory state, yet they never had to do it!
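A minimal sketch of such an offline clean-up pass might look like the following. The `LogEntry` record and the `units_needing_cleanup` function are hypothetical names chosen for illustration; the sketch assumes every operation is logged with the unit of work it belongs to and whether it succeeded.

```python
from dataclasses import dataclass


@dataclass
class LogEntry:
    unit_of_work: str
    operation: str
    succeeded: bool


def units_needing_cleanup(log) -> dict:
    """For each unit of work with a failed operation, list the operations
    that did succeed and therefore have to be undone by a clean-up agent."""
    failed_units = {e.unit_of_work for e in log if not e.succeeded}
    cleanup = {}
    for entry in log:
        if entry.unit_of_work in failed_units and entry.succeeded:
            cleanup.setdefault(entry.unit_of_work, []).append(entry.operation)
    return cleanup
```

Nothing here runs during the unit of work itself, which is where the scalability gain comes from; the cost is that inconsistent states exist until the clean-up agent has run.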
To summarize, it looks like everyone is correct in this debate. Both WS-AT and WS-BA are valid transaction protocols for SOA. Neither protocol would violate the autonomy of the participants. However, when using WS-AT, it is unwise to try to achieve full ACID properties (WS-AT does not require it). Isolation, i.e. the ability for operation invocations to be isolated from the effect of other operation invocations that are part of another unit of work, is typically very expensive to achieve and would be responsible for most, if not all, of the scalability problems. In a database, isolation is often implemented via some form of serialization of the processing of incoming requests, which is almost always unacceptable in the SOA world. Pat suggests that we might go even further and relax the consistency requirement, which also goes in the right direction because it might be impossible to enforce business rules when participants are assembled dynamically in arbitrary units of work. This reinforces the notion of autonomy of the transaction participants. However, state alignment is essential for SOA and, regardless of how you achieve it, you will need to bring together a context management facility, a coordinator and an operation invocation protocol (a.k.a. transaction protocol).
Re: Previous InfoQ coverage
My understanding is that WS-AT does not require ACID transactions. WS-AT is just a state alignment protocol that can be used to align the state of participants in a unit of work. I am not sure that all state alignment in a service oriented architecture can be achieved with a do/compensate semantic.
For instance, I use a well-known brokerage firm (one of the largest). I don't trade stocks often, but if I happen to do two trades in the same day I get this wonderful message telling me that if I happen to buy more stock than I have money for, I will be penalized because my account doesn't do it. Clearly they are wrongly using a do/compensate semantic when a provision/do semantic would be a lot better, even though they cannot, and don't need to, implement their transaction in isolation from the rest of the world.
There are plenty of business activities where provision/do is a good fit. I don't see the need for recommending that people not use it. I do, however, agree with you that ACID is not recommended (and not feasible) for SOA.
Re: Previous InfoQ coverage
If you look at the specifications (pretty much any of the versions), you'll see traditional TP assumptions creeping in. These are all to make sure that when you are interoperating between existing heterogeneous ACID transaction systems, things work as you'd expect. But as I said before, of course we can't stop people using WS-AT as an atomic coordinator protocol, and it will work for that too.