Nobody Needs Reliable Messaging

Posted by Marc de Graauw on Jun 18, 2010

A received view in SOA and Web Services is the need for reliable messaging. Reliable messaging is the guarantee that a message sent by a sending application is indeed received at the other end, and received only once. One of the most common objections to REST is that REST doesn't offer reliable messaging. Stefan Tilkov writes: 'It’s often pointed out that there is no equivalent to WS-ReliableMessaging for RESTful HTTP, and many conclude that because of this, it can’t be applied where reliability is an issue (which translates to pretty much every system that has any relevance in business scenarios)'[1]. Tilkov does not agree, of course, and prefers a solution on the application level. Joe Gregorio made a similar point in RESTify DayTrader [2]. Rightly so: the assumption that we need reliable messaging for business purposes is simply false. The reverse is true: from a business perspective, there is absolutely no need for reliable messaging. If we have well-defined business semantics and business logic, separate reliable messaging is redundant.

Web Services and Reliability

Web Services offer a way of insulating message exchange details from business logic. Basically the idea is that we describe our business in terms of services (e.g. 'browse catalog', 'place order', 'check order status'). The services are implemented by exchanging business documents, which embody the business semantics. If we use Web Services, the business documents are carried in SOAP Envelopes. The SOAP Envelopes may also carry SOAP Headers, which implement pieces of messaging functionality: message security, integrity, addressing, reliability et cetera. Each piece of messaging functionality is independent of the other pieces: for example, it is possible to do message integrity without reliability, the reverse, neither or both.


The above picture highlights some important features of SOA as implemented using Web Services:

  • the business layer is independent of the messaging layer;
  • Web Services add independent 'plug & play' pieces of message functionality;
  • message headers carry information on required messaging functionality.

Web Services are not always reliable by themselves. The basic problem: if I send a message, say I order a book, and through some network glitch the message never arrives, I get no book. Simply resending the message would solve this, as scenario 1 shows.

However, if the message arrives, but the response is lost, resending will not work: if I've ordered a book, I'll receive two books in scenario 2.

Reliable messaging solutions usually solve this problem through acknowledgements, duplicate detection and duplicate removal, as scenario 3 shows.

Web Services offer WS-ReliableMessaging[3] as the standard for reliable messaging. WS-ReliableMessaging can offer several guarantees: that messages sent arrive at least once, at most once, exactly once and/or in order. Since what one usually wants is that messages arrive once and only once, I'll only discuss the 'exactly once' and 'in order' cases. Let's start with in order arrival of messages.

The Right Order

There is friction between the guarantees of WS-ReliableMessaging and the ordinary processing of messages from a business perspective. A common case where in-order processing is important is online banking. If I transfer money from my savings account to my checking account, which is near-zero, and subsequently transfer money from my checking account to a third party, I want the money transfers to be processed in order: otherwise the second transfer might bounce due to insufficient funds. I know this is important: my bank doesn't offer in-order processing, and my transfers regularly bounce when I forget to submit them in two separate sessions...

It seems WS-ReliableMessaging is ideally positioned to cure such situations. However, on further inspection, it's not that clear-cut. What does WS-ReliableMessaging do to achieve in-order processing? Unsurprisingly, it attaches an incremental sequence number to each message. If the messages arrive out of order (e.g. 2-1-4), the WS-ReliableMessaging software at the receiving end waits till the missing messages arrive, and then submits them in order to the next layer, which contains the business logic (in this example it submits 1 and 2, waits for 3, then submits 3 and 4). The first strange thing is that apparently the order is a property of messages which is important to the business layer. So if it is important to the business layer, why isn't there a sequence number in the business message itself? We have a message, with its own business-level semantics, and the order is important: so why isn't there some element or attribute in the message, on a business level, which indicates the order?
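The buffering behaviour described above can be sketched in a few lines of Python. This is only an illustration of sequence-number resequencing in general, not the WS-ReliableMessaging wire protocol; the `Resequencer` class and the `msg-N` payloads are hypothetical names introduced for the example.

```python
class Resequencer:
    """Buffers out-of-order messages and releases them in sequence order."""

    def __init__(self, first_seq=1):
        self.next_seq = first_seq  # next sequence number that may be released
        self.pending = {}          # seq -> payload, held until the gap closes

    def receive(self, seq, payload):
        """Accept one message and return the payloads that are now deliverable."""
        if seq < self.next_seq or seq in self.pending:
            return []              # duplicate: already delivered or already buffered
        self.pending[seq] = payload
        released = []
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released


r = Resequencer()
# Arrival order 2-1-4, followed by the missing message 3.
deliveries = [r.receive(seq, f"msg-{seq}") for seq in (2, 1, 4, 3)]
print(deliveries)  # [[], ['msg-1', 'msg-2'], [], ['msg-3', 'msg-4']]
```

Note that the sequence numbers live only inside the resequencer: once the payloads are handed over, nothing in them records the order.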

There are two possible answers. First: there is also a sequence number in the payload, with business semantics regarding order attached to it. If there is, why do we need WS-ReliableMessaging? It's doing things twice. Maybe, in a few cases, doing things twice may be the thing to do (say my very fast and efficient WS-ReliableMessaging box can do this really really fast, and then the business layer receives only in-order messages and just checks to make sure) but in general doing things twice makes me wary. It introduces redundancy. If the sequence indicators in the WS-ReliableMessaging headers and in the payload differ, what do I do? How do I make sure the same rules apply to errors in in-order processing on both levels (say a missing message never arrives: do I submit none of the remaining messages, or all with an error condition, or alert a human to sort things out?). Things will only work if both levels, WS-ReliableMessaging and business, conform to the same logic.

The second answer is: there is no sequence number in the business payload. After all, one might say, we've got WS-ReliableMessaging, so why do we need it? This, frankly, is turning the world upside down. If the order is important on the business level, the business level needs to indicate order, ensure its proper processing and persist it as well. If we make the order, important on a business level, dependent on the order in which messages are shoved into a WS-ReliableMessaging bus at one side, and the order in which they come out at the other side, we make a permanent feature of the business logic (order) dependent on a transient feature of the message processing: the order in which they come out of a message bus. The WS-ReliableMessaging sequence number is lost after the WS-ReliableMessaging bus has done its work, making any decent logging or auditing impossible. Of course it is possible to attach a new sequence number to the messages, which indicates the order in which the messages came out of the WS-ReliableMessaging box, but still, without logging the entire WS-ReliableMessaging stream this carries little weight for serious auditing purposes. And logging the entire WS-ReliableMessaging stream is certainly possible, but it all seems so much like doing things in the wrong places.

Moreover, the in-order processing is not just a feature of the interaction between the WS-ReliableMessaging bus and the business layer. If my bank has resulted from a merger of two banks, and my savings account happens to be in a different database than my checking account, on a different machine in a different location, then just submitting the messages from the WS-ReliableMessaging bus to the business layer in the right order doesn't do the job at all. The business layer needs to make sure the savings software and the checking software do their jobs in the right order too. This example shows how deeply embedded in-order processing may be in the business logic. Implementing this without order indicators in the message payload itself is insanity.

To summarize: if in-order processing of messages is a property of the business we're conducting, we need order indicators in the messages on the business level, with appropriate business semantics and business logic attached. If we follow this simple and sound design guideline, then we don't need WS-ReliableMessaging. Maybe in some cases using WS-ReliableMessaging might be more efficient. From a business perspective, however, functionally there is no need for WS-ReliableMessaging in a properly implemented business layer.
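As a rough worked example of the banking case, the sketch below carries an order indicator in the business payload itself. The field names (`order`, `from`, `to`, `amount`), the account names and the balances are all invented for illustration and do not come from any real banking interface.

```python
def apply_transfers(balances, transfers):
    """Apply transfers in the given order and return the ids of bounced ones."""
    bounced = []
    for t in transfers:
        if balances[t["from"]] < t["amount"]:
            bounced.append(t["id"])  # insufficient funds: the transfer bounces
            continue
        balances[t["from"]] -= t["amount"]
        balances[t["to"]] += t["amount"]
    return bounced


# The transfers arrive in the wrong order; "order" is the business-level indicator.
transfers = [
    {"id": "T2", "order": 2, "from": "checking", "to": "third_party", "amount": 100},
    {"id": "T1", "order": 1, "from": "savings", "to": "checking", "amount": 100},
]
start = {"savings": 500, "checking": 5, "third_party": 0}

bounced_arrival = apply_transfers(dict(start), transfers)
bounced_ordered = apply_transfers(dict(start), sorted(transfers, key=lambda t: t["order"]))
print(bounced_arrival, bounced_ordered)  # ['T2'] []
```

Because the indicator is part of the payload, it can be stored with the transfer and is still available for auditing after the messaging layer has finished its work.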

Once, and Only Once

A similar line of reasoning applies to exactly-once delivery. I've got a message for you: and on a business level, it's important that it's delivered once and just once. Say I have a book order: I don't want to receive the same title twice, nor do I want it not at all. Now, if it's important to us, on a business level, that I get my book exactly once, what does the assurance that my message has been received exactly once really bring me? I want to know that your book ordering system has received it. If the WS-ReliableMessaging bus accepts it, and subsequently the book system rejects it because I've entered a wrong client number or a non-existent catalog item, knowing the message has been received brings me precious little assurance. And even when the message is syntactically and semantically correct, it's no good if the title is out of stock: I want my book, not just the certainty my message has been received well. If the processing of my message once, and exactly once, is important on a business level, I need to confirm the exactly-once processing on a business level. As the figure below shows, the transport acks are pretty meaningless on a business level: we need the business ack.

Some may argue that the WSRM module should do syntax checks on incoming messages as well. And of course a WSRM module could do most of the syntax check, say a schema validation. But look at customer numbers, or catalog items: it is impossible to know whether a certain customer number or catalog item is a valid one without doing a database lookup. And knowing whether a title is in stock isn't possible at all on a syntactical level. There is no way to guarantee that a message will be acceptable to the business level without actually submitting it to the business level. And if the business level may refuse my message, knowing the WSRM module has properly received it is not what I need. I need a business reply, assuring my message has been accepted once, and only once, on a business level. If receiving every message exactly once is important on the business level, the business level should respond with a message saying it has received, and accepted, the message. Again, if this simple design guideline is followed, there is functionally no need for separate reliable messaging from a business point of view.
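The difference between a transport-level acknowledgement and a business reply can be made concrete with a small sketch. Everything here is hypothetical: the `CUSTOMERS` and `STOCK` tables stand in for the database lookups mentioned above, and the field names are made up for the example.

```python
CUSTOMERS = {"C-100"}                  # stand-in for a customer database
STOCK = {"book-1": 0, "book-2": 3}     # stand-in for a catalog with stock levels


def transport_ack(message):
    """What a messaging layer can verify on its own: the message is well formed."""
    return all(field in message for field in ("order_id", "customer", "item"))


def business_reply(message):
    """What only the business layer can decide: whether the order is accepted."""
    if message["customer"] not in CUSTOMERS:
        return "rejected: unknown customer"
    if message["item"] not in STOCK:
        return "rejected: unknown item"
    if STOCK[message["item"]] < 1:
        return "rejected: out of stock"
    return "accepted"


order = {"order_id": "O-1", "customer": "C-100", "item": "book-1"}
print(transport_ack(order), business_reply(order))  # True rejected: out of stock
```

The message is acknowledged at the transport level, yet the sender still gets no book; only the business reply says so.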

Let's look at the 'exactly once' requirement in some more detail. Stating that it is important on the business level that every message is received exactly once, as it is in order processing, means that every message constitutes a unique business transaction. As with in-order processing, WSRM guarantees exactly-once delivery through attaching unique numbers to messages, acknowledging receipt, and possibly resending or duplicate removal. Again, if every message is a unique business transaction, clearly there must be a unique id on the business level: an order id, a reservation number, some unique token. And if we need such a unique token on the business level, the business level should assert its uniqueness. Uniqueness on the business level should not be dependent on the transient uniqueness on the message level, but must be a persistent feature of the business message, and business semantics must guarantee it.

Idempotency to the Rescue

When the business logic requires in-order processing or exactly-once delivery, I clearly need a business reply: the business reply is the only guarantee that, on the business level, my message was received and processed correctly. Simply returning the business reply instead of all the WSRM magic does the trick, and way better than WSRM can do. And what happens if we do implement unique business transaction ids and business acks on the business level? Basically we make every message idempotent on the message transport level. If we have unique business ids, duplicate detection on the business level, and business acks, it is always safe to resend a message on the message level. This makes reliable messaging a no-brainer: if I receive an HTTP 200 OK response (or some other 'success' response) on my message, everything is fine: my message has been received, and if the business response is not sent in the HTTP response, I may wait till I get it. Of course, in implementing the web service, we need to make sure the incoming message is stored on some persistent medium before we respond with '200 OK' - otherwise the message would still be lost if a computer crashes. But with WSRM we would need a similar guarantee; WSRM by itself doesn't offer it. And if I don't receive a '200 OK' through some communication glitch, it is safe to resend the now idempotent message until a response is received.
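A minimal sketch of this resend-until-acknowledged pattern follows. It simulates the transport in memory instead of using real HTTP, and the `OrderService` class, the scripted fault list and the order id are assumptions made for the example; a real service would write to durable storage where the sketch uses a dictionary.

```python
class OrderService:
    """Receiver that stores each order before acknowledging and ignores repeats."""

    def __init__(self):
        self.stored = {}        # order_id -> order; stands in for persistent storage
        self.times_handled = 0  # how many copies actually reached the service

    def handle(self, order_id, order):
        self.times_handled += 1
        if order_id not in self.stored:
            self.stored[order_id] = order  # persist first, acknowledge afterwards
        return 200                         # a resend of a known id is harmless


def make_flaky_transport(service, faults):
    """Simulated transport that applies one scripted fault per attempt."""
    faults = iter(faults)

    def send(order_id, order):
        fault = next(faults, None)
        if fault == "request_lost":
            return None                    # the service never sees the message
        status = service.handle(order_id, order)
        if fault == "response_lost":
            return None                    # processed, but the sender cannot know
        return status

    return send


def send_until_acknowledged(send, order_id, order, max_attempts=10):
    """Resend the same message until a success response arrives."""
    for attempt in range(1, max_attempts + 1):
        if send(order_id, order) == 200:
            return attempt
    raise RuntimeError("no response after %d attempts" % max_attempts)


service = OrderService()
send = make_flaky_transport(service, ["request_lost", "response_lost"])
attempts = send_until_acknowledged(send, "order-42", {"title": "a book"})
print(attempts, service.times_handled, len(service.stored))  # 3 2 1
```

Three attempts were needed and two copies reached the service, but only one order exists: the business id makes the resend safe.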

The Case of Dutch Healthcare

In the Netherlands, we're setting up a national healthcare infrastructure. All healthcare organizations will exchange information through a central Healthcare Information Broker. All healthcare professionals with relevant credentials will be able to access information for their patients through the national exchange. A national standardization organization, Nictiz[4], develops the relevant national standards based on HL7v3, the medical vocabulary and messaging framework, and Web Services.

Originally there was no standard for reliable messaging available: in 2003 and 2004, the turf wars about WS-Reliability versus WS-ReliableMessaging were still going on, and we decided to use a temporary home-grown solution till the dust had settled. In 2008 and 2009 we returned to the reliability issue: since the national exchange was coming up to steam quickly, the temporary solution was no longer viable. We designed a solution based on WS-ReliableMessaging and decided against it. Let's look at some of the details.

In-order processing was hardly relevant in our case: exactly-once delivery was, or so we thought. We use synchronous communication, SOAP over HTTP, where a message is sent as an HTTP request and the business answer is carried over the HTTP response. There are, simplifying a bit, two kinds of transactions:

  1. queries, such as a query for a patient's medication history, where the query response is returned as HTTP response;
  2. orders, such as a medical prescription, where a business response (usually an HL7v3 acknowledgement) is carried over the HTTP response.

In the first case, queries, there is simply no need for reliable messaging. If, through some communication glitch, the query or the answer is lost, the query can simply be submitted again. Queries are safe: the state of the server is not changed in any way (other than maybe traffic counters and other non-relevant side-effects).

For orders, the case is different. If a GP sends a prescription to a pharmacist, it is important to know that it has been received, and received only once. If all goes well, there is no problem: the GP sends a prescription, the pharmacist's server returns an HTTP '200 OK' response, and the GP's application reports the prescription has arrived. If things do not go well, there is a problem. If the GP's application does not receive a '200 OK' response, what to do? If the prescription never arrived at the pharmacist's server, it should be resent. If it did arrive, it may not be resent: that might be interpreted as a second prescription, not a duplicate.

However, prescriptions already carry a unique prescription id.

<Prescription>
    <id extension="0003000201"
        root="2.16.840.1.113883.2.4.6.1.6005465.12.1"/>
    <!-- remaining prescription elements omitted -->
</Prescription>

This XML fragment shows the prescription identifier in HL7v3 format. The 'root' part is an OID, which is assigned to each healthcare provider's application: no two applications will have the same root attribute. The 'extension' part is a local unique key for the prescription: the same number which appears in print on prescriptions as well. Together they constitute a globally unique identifier.
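To illustrate how the two attributes combine into one key, here is a small sketch using Python's standard `xml.etree.ElementTree`. It assumes the fragment is closed with `</Prescription>` and carries no XML namespace, exactly as printed above; real HL7v3 messages are namespaced, so the lookup would need adjusting. The function name `prescription_key` is made up for the example.

```python
import xml.etree.ElementTree as ET

# The fragment from the article, closed so that it parses as a document.
FRAGMENT = """
<Prescription>
    <id extension="0003000201"
        root="2.16.840.1.113883.2.4.6.1.6005465.12.1"/>
</Prescription>
"""


def prescription_key(xml_text):
    """Return (root, extension), which together identify the prescription globally."""
    id_element = ET.fromstring(xml_text.strip()).find("id")
    if id_element is None:
        raise ValueError("prescription has no id element")
    return (id_element.get("root"), id_element.get("extension"))


key = prescription_key(FRAGMENT)
print(key)  # ('2.16.840.1.113883.2.4.6.1.6005465.12.1', '0003000201')
```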

Since the prescription id is unique, we require a receiver to check for double prescriptions, using the prescription id. If a prescription is received twice, an error message is returned. (In some cases, where needed, we also require the receiver to return a duplicate of the original answer, if it contains information which is needed.) What does this do? Since duplicates are removed on the business level, all messages become idempotent: it is always permitted to resend a message in case of doubt about the communication.

Much of the use case for transport level reliable messaging is gone: if no acknowledgement is received, the prescription is simply resent. The error which is received when the first prescription has been received after all, is as much proof of successful transmission as the original acknowledgement. We've tightened our specs a bit on returning and interpreting this specific error condition, and all of a sudden separate reliable messaging is no longer necessary. It's even hardly an effort: since HL7v3 has a prescription id, which must be unique, each healthcare application has to handle duplicate prescription ids anyway. If the acknowledgement carries information which the GP has to receive, we could require the pharmacist to reconstitute the original acknowledgement and return it on receiving a duplicate prescription: the situation doesn't occur too often though, and for simple acks it is not needed. So a bit of tightening of the business rules has removed the case for separate transport reliability: all it would do is add another layer which assigns unique ids and handles duplicates again. Note that it isn't just WS-ReliableMessaging. It may be the main contender, but for alternatives such as ebXML Messaging or WS-Reliability, the same line of reasoning applies.
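The duplicate handling described above might look roughly like the sketch below. It is not the actual Nictiz specification: the `PharmacyReceiver` class, the `DUPLICATE_ID` error code and the shape of the answers are invented for illustration, and a dictionary stands in for the pharmacist's database.

```python
class PharmacyReceiver:
    """Receiver that detects duplicate prescriptions by their business id."""

    def __init__(self, replay_original=False):
        self.replay_original = replay_original  # return the first answer again?
        self.answers = {}                       # (root, extension) -> first answer

    def receive(self, root, extension, body):
        key = (root, extension)
        if key in self.answers:
            if self.replay_original:
                return self.answers[key]        # reconstituted original answer
            return {"status": "error", "code": "DUPLICATE_ID"}
        answer = {"status": "accepted", "id": key}
        self.answers[key] = answer
        return answer


def was_delivered(answer):
    """Sender-side rule: a duplicate error proves the first copy arrived."""
    return answer["status"] == "accepted" or answer.get("code") == "DUPLICATE_ID"


ROOT = "2.16.840.1.113883.2.4.6.1.6005465.12.1"
pharmacy = PharmacyReceiver()
first = pharmacy.receive(ROOT, "0003000201", "prescription body")
resend = pharmacy.receive(ROOT, "0003000201", "prescription body")
print(first["status"], resend["status"], was_delivered(resend))  # accepted error True
```

With `replay_original=True` the receiver returns the first answer again, which covers the case where the acknowledgement carries information the sender needs.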

WS-ReliableMessaging alone isn't good enough for synchronous messaging either. In cases where the client is behind a firewall, or has an unreliable mobile connection, the client isn't addressable directly by the server. So the server has no way to resend unacknowledged responses to the client if the HTTP connection is closed unexpectedly. Another WS-* specification is needed for this case: WS-MakeConnection, which enables the client to set up a new HTTP connection and poll for potentially waiting response messages. Since all our traffic in Dutch healthcare is synchronous, this addition is necessary. So instead of upgrading all necessary clients with just WS-ReliableMessaging capabilities, the much newer WS-MakeConnection is also needed (and most clients today simply do not have the necessary libraries yet). WS-MakeConnection also basically makes all synchronous traffic asynchronous in case of failure. While this is not necessarily a bad thing, it would make the Dutch healthcare specification much more complex. The WS-* mantra is often: the complexity of the specification is hidden from the developer by the software: install your WS-* libraries, and they will do the magic for you! I never believed in this 'complexity-hiding-philosophy'. Any developer worth her salt will want to know what's going on behind the curtains on this level. It's impossible to debug a live session if you do not even understand whether your traffic is synchronous or secretly split up.

Conclusion

In Dutch healthcare, given the complexities of reliable messaging, and given the fact that the use case from a business perspective mostly evaporates, we've decided not to use reliable messaging on the transport level. With a bit of further tightening of the business logic, which requires unique prescriptions anyway, we can have a much simpler solution.

To summarize: if reliability is important on the business level, do it on the business level. A Reliable Messaging layer can handle only generic logic, but that's not what we want: we want business-specific logic for in-order and exactly-once processing. WS-ReliableMessaging (or its competitors) may sometimes have some value in optimizing solutions, especially point-to-point. But from a business perspective, a well-designed business solution does not need reliable messaging.

About the Author

Marc is an independent consultant with over 20 years of IT experience. He specializes in cross-enterprise interoperability and semantics, and is a frequent speaker and author. Marc lives and works in Amsterdam, the Netherlands.


[1] See Stefan Tilkov, Addressing Doubts about REST (http://www.infoq.com/articles/tilkov-rest-doubts), point 7: 'REST is unreliable', for a discussion.

[2] See Joe Gregorio, RESTify DayTrader, http://bitworking.org/news/201/RESTify-DayTrader

[3] See Paul Fremantle, An Introduction to Web Services Reliable Messaging, for a good overview: http://www.infoq.com/articles/fremantle-wsrm-introduction

[4] http://www.nictiz.nl/

But by Duraid Duraid

But isn't TCP reliable? What if it wasn't and we had to code its reliability logic by hand all the time? And sometimes we'd get it right and sometimes not? Isn't this reliability at the transport level?

See, business logic is hard and expensive to get right, and that's why those software standards exist, so that we don't have to take care of one more thing.

Re: But by James Watson

There is no coding by hand in the REST approach. The point is that WS-Reliable standard is redundant and adds no value. Basically all of the WS* standards are re-inventing the wheel because the original web service approach was created in order to (ab)use HTTP to get through firewalls.

Re: But by Duraid Duraid

How is there no coding by hand when he says that reliability has to be done in the business tier?

Re: But by James Watson

Reliability always has to be done in the business tier; WS-Reliable doesn't change that. For example: if a customer doesn't receive a response for their order, they must send another request.

In a proper REST design, if you attempt to create the same order twice, it has no effect. There is no special coding around this. It's a property of the architecture. It's kind of like a 3-prong outlet. As long as you don't tamper with the design, you can't plug it in the wrong way.

Re: But by Duraid Duraid

What about the TCP example that I brought up? With TCP you don't have to care about the packet ordering because it's taken care of for you at the transport level, and I believe that's a good thing, isn't it? Isn't WS-* trying to do the same thing?

Re: But by Stuart Charlton

You're absolutely right that TCP does a level of reliability, as does WS-*. The point, I think, is that these are all practical examples of the "End to End Principle" at work.

The original paper, still worth reading, is here: web.mit.edu/Saltzer/www/publications/endtoend/e...

"[Certain functions] in question can completely and correctly be implemented only with the knowledge and help of the application standing at the end points of the communication system. Therefore, providing that questioned function as a feature of the communication system itself is not possible. (Sometimes an incomplete version of the function provided by the communication system may be useful as a performance enhancement.)"

You can't guarantee application-level "reliability" in an application by using a lower layer's subsystem, whether WS-* or TCP. At best, providing those qualities at a lower layer is a performance or modularity optimization. And many times, it is neither -- it gets in the way.

TCP is very well designed and configurable (the way TCP is with adjustable windows, NAGLE disabling, backoff, etc.) but for some applications (e.g. video conferencing, some gaming) it makes sense to avoid it and use UDP.

WS-RM doesn't have the field tests that TCP does, I'm sure it adds some modularity value but it isn't "required" to enable application-level message reliability.

Re: But by Marc de Graauw

Thanks, Stu, for pointing to that paper. Did not know it yet, but indeed makes the same points I'm trying to make in a more general way.

'in order to achieve careful ... transfer, the application program that performs the transfer must supply a ... end-to-end reliability guarantee'

Same for duplicate removal, later on.

Marc

Re: But by James Watson

"What about the TCP example that I brought up? With TCP you don't have to care about the packet ordering because it's taken care of for you at the transport level, and I believe that's a good thing, isn't it? Isn't WS-* trying to do the same thing?"


It's not really meaningful to compare a transport layer protocol with a transaction layer protocol. In my work we use reliable messaging but we would never expose this detail to our service consumers. In my opinion, specs like WS-Reliable violate a number of SOA principles. In practical terms, it's really of no concern to the client how their request is satisfied. And as I think this article makes very clear, if you have a business requirement for ordering, then you must address it at the business layer. The history of computing fads is littered with failed attempts to abstract functional requirements in protocols.

Re: But by Duraid Duraid

James, your statement is very contradictory. You say 'The history of computing fads is littered with failed attempts to abstract functional requirements in protocols' but TCP is a protocol and is not a fad.

Re: But by Stuart Charlton

YW, Marc. Yes, it is a deep insight and a cornerstone of Internet architecture, still subject to debate to this day. What's important is that it is an "argument", not a law, or maxim -- there are times where it makes sense (for modularity or performance reasons) to do functions like de-duplication or security at lower layers. But it's important to realize what that does and does not buy you.

Re: But by Stuart Charlton

Duraid -- put another way, how many examples are there of successes like TCP? I'd say this is the exception, not the rule.

TCP has proven to be a very useful abstraction, hitting the right balance of modularity and configurability for most Internet applications.

One cannot say anything close to the same with regard to WS-RM. Proprietary messaging protocols & subsystems such as MQSeries have had much more field testing and deployment. And as anyone who's used MQ will tell you, it ain't necessarily wonderful out of the box (as TCP is).

Re: But by Duraid Duraid

I see your point. The article sounded very strong in abolishing all reliability at the transport level and I thought to mention TCP as a glaring example since it's the air that we breathe on the internet.

I agree with you: maybe it's hard to get reliability right when the scale is big, but maybe there is nothing wrong with doing it in the context of one application or organization, and maybe that's the way to go. Don't change the method, reduce the scale.

Re: But by Marc de Graauw

Duraid,

"The article sounded very strong in abolishing all reliability at the transport level" - my point is there is no *functional* need for reliability at the transport level - it should be done at the business level. That does not mean reliability at transport level is always wrong. Also for WSRM, there are cases where it's useful. If your transport is not very reliable, and you're doing a lot of asynchronous messaging, it may make sense to optimize at the transport level. So transport level reliability may be useful as an optimization, but even then, business level acks will still be required.

Do it at the business level? by Paul Fremantle

The history of computing is a set of people trying to provide useful abstractions. Turing completeness says that almost any dumb computing platform can do anything. Similarly you can write everything in assembler language. WS-RM and Reliable Message Queueing are abstractions that can make life easier.

So the first thing that might motivate people to use reliable messaging is to re-use code that is tested and optimized. However, I think Marc, you make a good argument why it doesn't hurt to think about it in business logic if you can.

Unfortunately, many people using reliable messaging systems (like Apache QPid, WS-RM, IBM MQSeries) are not writing business logic from scratch. They are often integrating with existing systems and CANNOT write Idempotent logic. So for all those people who aren't writing logic from scratch, WS-RM and other reliable messaging systems are very useful - and often essential.

2 reliable messages in 1 transaction? by Peter Verkest

Interesting article!
How would you handle the classic example where I want to send 2 messages, one for booking a flight and another one for booking a hotel? Do you propose to implement compensation logic in the business layer (for rolling back the flight booking) in case the second message (booking the hotel) doesn't return an ACK or returns an error?

Re: Do it at the business level? by Paul Fremantle

Marc

One other mistake in your article is to say that WSRM 1.0 doesn't support working through firewalls. The replay model is documented and supported in most WSRM1.0 stacks and works fine. And all stacks that support WSRM 1.2 also support WS-MC as far as I know, so in either case you get reliable messaging through a firewall.

Paul

Re: Do it at the business level? by Marc de Graauw

Hi Paul,

Glad you responded!

"many people using reliable messaging systems ... are not writing business logic from scratch. They are often integrating with existing systems and CANNOT write Idempotent logic."

True. In the Netherlands, the existing HL7 messages weren't always idempotent. Fortunately we can add extra requirements in our national standards. If not, we indeed could not have implemented reliability on the business level.

That's why I stress in my article that a well-designed business solution does not need transport reliability. And I'd argue that those non-idempotent legacy systems without reliability on the business level aren't designed as well as they should have been; still, there are plenty of those around, and yes, in those cases transport reliability does add value.

Re: Do it at the business level? by Marc de Graauw

Paul,

The Replay Model is indeed a potential solution, but it is not an open standard. So using it means depending on a non-standard extension to an open standard... We considered it, but it didn't make me happy.

As for WSRM 1.2 stacks supporting WSMC, I immediately believe you. However, in Dutch healthcare we're talking about thousands of different software environments, very modern to very old. Fact is we have to be conservative in what we require, and WSRM 1.2 stacks won't be ubiquitous yet... That the standards and stacks are moving forward is a good thing, and I'd agree that the firewall problem is a temporary thing.

I'm not convinced by George Skalley

By applying the ExactlyOnce message pattern, you will guarantee that the message reaches the Application Destination (Pharmacist Prescription Application). The business acknowledgement should also respond using the ExactlyOnce message pattern, albeit in reverse (where the Pharmacist acts as the sender and the GP application is the Receiver). Message correlation would ensure that the business transaction is contained as a whole.

Also employing infrastructure that allows Guaranteed Delivery (store and forward) in conjunction with Durable Subscription patterns for infrequent connections should address these issues and keep WS-RM entirely relevant.

This would prevent the prescription being sent repeatedly and treat business acknowledgements as "equally important", something which your example says it expects, but doesn't deliver. Instead you've bypassed this by allowing extra conversations between the servers to take place and re-interpreting the error response as a successful condition.

After the initial success of sending the prescription, that message should be stored and not repeated. Wait for the response to come (as it will be coming if implemented back using WS-RM) before finalising the transaction.

Business obviously understands that reliability is a serious concern; that is why WS-RM was formulated in the first place. I'm not at all convinced that RESTful services currently cover this subject area well enough, since they rely on development teams abstracting the messaging patterns developed for WS-RM and supplying them through additional business logic.

Given a requirement for high volumes of mission critical transactions over low latency networks I'd advocate open standards that at least attempt to embody the required patterns over an architecture which does not.

Re: But by James Watson

James your statement is very contradictory. You say 'The history of computing fads is littered with failed attempts to abstract functional requirements in protocols' but TCP is a protocol and is not a fad.


TCP doesn't abstract functional requirements. Specifically, it doesn't guarantee delivery. What it does is hide the details of resending dropped packets and determining when a message failed to be sent correctly. REST Web services and WS-* are 'always' implemented over TCP, but we still need something else to guarantee that our transactions are delivered. The reason is that, ultimately, only the business requirement of the transaction can tell you whether it has succeeded.

Any system can fail, including reliable messaging systems. Therefore the client will always need to have some ability to submit retries. There is no way to abstract that detail away.
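To make that concrete, here is a minimal sketch in Python of a client that retries when it cannot tell whether a request succeeded. Everything in it is hypothetical and only for illustration: `FlakyService` stands in for any receiver whose reply can get lost, and `send_with_retries` is not taken from any real system. The one assumption that matters is that the client picks the message ID once and reuses it on every retry, so the receiver can recognise a resend.

```python
import uuid


class FlakyService:
    """Hypothetical receiver: it processes every request, but some replies get lost."""

    def __init__(self, lost_replies=1):
        self.lost_replies = lost_replies
        self.processed = {}  # message id -> stored result

    def submit(self, message_id, payload):
        # Processing is keyed on the message id, so a resend changes nothing.
        if message_id not in self.processed:
            self.processed[message_id] = "accepted:" + payload
        if self.lost_replies > 0:
            self.lost_replies -= 1
            raise TimeoutError("reply lost on the way back")
        return self.processed[message_id]


def send_with_retries(service, payload, max_attempts=5):
    """Return (result, attempts_used); retry while the outcome is unknown."""
    message_id = str(uuid.uuid4())  # chosen once, reused on every retry
    for attempt in range(1, max_attempts + 1):
        try:
            return service.submit(message_id, payload), attempt
        except TimeoutError:
            continue  # the client cannot know what happened, so it asks again
    raise RuntimeError("gave up after %d attempts" % max_attempts)
```

With two lost replies the client needs three attempts, yet the receiver ends up with a single stored result. The retry loop lives in the client whatever sits underneath it.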

Re: I'm not convinced by James Watson


This would prevent the prescription being sent repeatedly and treat business acknowledgements as "equally important", something which your example says it expects, but doesn't deliver. Instead you've bypassed this by allowing extra conversations between the servers to take place and re-interpreting the error response as a successful condition.


I don't see where you get "re-interpreting the error response as a successful condition" from this but I can offer some information about how this actually works from the US perspective (Paul giving a European perspective.)

Pharmacies are 'client' only. They do not run app-servers that can receive responses. Therefore all requests are synchronous.

A large number of claim requests (perhaps most) that I see are rejected and resubmitted two or three times; sometimes there are a hundred attempts to process a single claim. Sometimes there is a timeout or other system error requiring that the client resubmit. Each claim request has a unique ID, so duplicate requests are never treated as separate requests if a request were to succeed but not get reported back to the client properly.

None of this is done with SOAP, XML, HTTP or any 'reliable messaging'.

Re: I'm not convinced by James Watson

(Paul giving a European perspective.)


Sorry, I should have written 'Marc' instead of 'Paul'.

Re: Nobody Needs Reliable Messaging by ben wilcock

Great article Marc!

As a retail SOA architect who has designed loads of services, I agree that good idempotency rules and idempotent messages can allow you to create quite a large SOA ecosystem without the need for the WS-* standards. In fact, on my next system I'm going to try and reduce the complexity of the SOAP headers I designed back when I thought that sequence of delivery etc. was important.

Thanks for the write-up and case study. I'll be bookmarking this.

benwilcock.wordpress.com

Nice solution by Robert Sullivan

Thanks for the article - it's always interesting to see how typical problems like this are solved. One thing to clarify here is you are doing reliable messaging, not WS-ReliableMessaging. This requires some logic on the server, to send back the duplicate exists error message, and some logic on the client to keep resending. Perhaps I shall take the liberty of coining the term "Optimistic Messaging" as this reminds me of the Optimistic Locking pattern in web apps, a "soft lock" where you hope that the data didn't change, but if it did, have some error handling logic.

Anyway, there's code involved, admittedly in your situation it's limited, but I will follow Bjarne Stroustrup's philosophy which is that if some functionality is already done in the language, or in this case some other layer, use it.

For example, before the days of web services, and when standards like HTTP-r (I think) and WS-Reliable were in their infancy, MQSeries was used quite heavily for interoperability, and still is, and this feature is built-in, it's only a matter of setting some message properties. It's always interesting to see the newer "silver bullets" come out, much simpler at first since they don't have all the complexity of previous technologies - security, QoS, etc, and then have to frantically add that in, as is the case with WS, and now REST. Then it gets too complicated, and we start all over with the next new new thing.

Re: Nice solution by James Watson

Thanks for the article - it's always interesting to see how typical problems like this are solved. One thing to clarify here is you are doing reliable messaging, not WS-ReliableMessaging. This requires some logic on the server, to send back the duplicate exists error message, and some logic on the client to keep resending.


WS-ReliableMessaging requires this too.

Anyway, there's code involved, admittedly in your situation it's limited, but I will follow Bjarne Stroustrup's philosophy which is that if some functionality is already done in the language, or in this case some other layer, use it.


What code? Do you mean the code that e.g. captures a constraint violation from the DB? You are going to have to do that anyway.

That's the point. A REST architecture already guarantees that duplicates won't be reprocessed. There's no need to build any special message formats or code to do that à la WS-ReliableMessaging.
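A minimal sketch of what that amounts to, assuming a relational store with a unique key on a client-supplied order ID. The table, column and function names are made up for illustration, and Python with SQLite is used only to keep the example runnable:

```python
import sqlite3


def store_order(conn, order_id, body):
    """Return True if the order was stored now, False if it already existed."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "INSERT INTO orders (order_id, body) VALUES (?, ?)",
                (order_id, body),
            )
        return True
    except sqlite3.IntegrityError:
        # The primary key rejected a second insert with the same order id.
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT PRIMARY KEY, body TEXT)")

first = store_order(conn, "order-42", "10 widgets")   # new order
second = store_order(conn, "order-42", "10 widgets")  # resend of the same order
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The duplicate is caught by the constraint the database needs anyway; the handler only has to decide what to tell the client when it happens.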

For example, before the days of web services, and when standards like HTTP-r (I think) and WS-Reliable were in their infancy, MQSeries was used quite heavily for interoperability, and still is, and this feature is built-in, it's only a matter of setting some message properties. It's always interesting to see the newer "silver bullets" come out, much simpler at first since they don't have all the complexity of previous technologies - security, QoS, etc, and then have to frantically add that in, as is the case with WS, and now REST. Then it gets too complicated, and we start all over with the next new new thing.


Who says you can't implement REST using MQ-Series or any other reliable messaging system? We need to be clear that REST is an architectural style. It is not a protocol or technology. Even when we talk about RESTful Webservices (what is really meant by REST here) there is no specific approach that must be followed to satisfy the request. The only thing that REST indicates is the structure of the interface.

Easy to use RESTful resources to help evangelize REST by Juergen Brendel

Thank you for the article, very interesting. There certainly is still a lot of uncertainty in some communities about REST, while others have started to embrace it readily. Your point about the need to implement reliability in the business logic is well made and helpful in this discussion.

We have been working on RESTx, an open source project that makes it simple and straightforward for business users to create new RESTful resources merely by filling out a form, and easy for developers to write components that implement RESTful services. Maybe by exposing more users to REST and showing how straightforward and simple resources and architectures can be, this project can help to educate user and developer communities so that we don't have to explain the same things over and over when trying to introduce REST as an architectural option.

1 service by Guy Pardon

Hi,

This all looks very reasonable, and I agree that WS-* is too heavy. But: if you need more than one service to address then you probably need something like TCC: www.atomikos.com/Publications/TryCancelConfirm

Guy

Alternative like diversity is a good thing by Arturo Hernandez

I am evaluating WS-ReliableMessaging. Even if useful, this article doesn't do it for me. Depending on the effort level, I'd rather do both application and transport reliability. We all know bugs are a fact of life, so why depend on only one thing?

So the cost or effort issue is most likely the determining factor, but it is only assumed here. Complex systems are harder to maintain, so why add more complexity when we can kill two birds with one stone? We HAVE to do the application level anyway? Even if WS-ReliableTransaction was easy to configure and maintain, the added reliability is questionable.

The question then becomes: how much effort do we place on making every single transaction reliable at the application layer? Keep in mind that not only banking applications benefit from it. If the effort is significant, then we can ask this next question: how much more reliability will we gain from application level reliability, if we have WS-ReliableTransaction? Even if it may be more sound, do we really NEED reliability at the application level?

I guess the answer still depends on the application domain and effort. Others here say they have done it before without WS-RM. In that case they already have reusable code, and they are familiar with it. So it is less effort for them. But for a newbie like me, I don't really have a good idea of how difficult it is to implement WS-ReliableTransaction.

Silly Rant by Chuck Farley

So instead of letting the infrastructure handle the RM contract and implementation of a mature international standard with incredible momentum, implemented now by just about everybody, we should all reinvent the wheel separately with different from-scratch code at the "business level" rather than focus on our core competencies? This nitpicky article missed the whole point. Not sure if this is some silly RESTful service prejudice or this Dutch guy's wooden shoes had some splinters that day. :-)

Re: Silly Rant by Arbenz Guido

Chuck,

the article is trying to say that reliability on a lower level can never guarantee reliability on any higher level. Look for example at TCP. Even though it is reliable - that's the whole point of TCP - why do higher level protocols that sit on top of TCP (be it via HTTP or SMTP etc.) like WS-RM try to add "more reliability"? Isn't the lower level guarantee enough?

The reason is that the lower layer cannot guarantee that there are no errors on higher levels, or the interface between lower and higher levels. There will always be potential points of failure between those levels. Say an application crashes. Now although TCP reliably transported a number of bytes - did the application, before crashing, manage to process (and e.g. store in a database) those bytes or did it not? How is TCP expected to know that? It cannot, and so the higher layer has to ensure that.

What I said about TCP and the higher layer is true for any lower level / higher level scenario. Reliability can be guaranteed on a level, but not across levels. And WS-RM is lower level than your application that uses it.
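The crash scenario can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not a real transport: `Receiver` is a hypothetical application that may crash after the lower layer has handed over the bytes but before they are stored, and the only acknowledgement the sender trusts is the one issued after the store.

```python
class Receiver:
    """Hypothetical application that can crash between receipt and storage."""

    def __init__(self, crash_on_attempts=()):
        self.crash_on_attempts = set(crash_on_attempts)
        self.attempts = 0
        self.database = {}

    def deliver(self, message_id, body):
        # By the time this method runs, the lower layer has done its job:
        # the bytes have arrived.
        self.attempts += 1
        if self.attempts in self.crash_on_attempts:
            raise ConnectionError("application crashed before storing")
        self.database.setdefault(message_id, body)  # store at most once
        return "ACK " + message_id  # issued only after the store


def send_until_acknowledged(receiver, message_id, body, max_attempts=5):
    """Resend until the application itself confirms the message is stored."""
    for _ in range(max_attempts):
        try:
            return receiver.deliver(message_id, body)
        except ConnectionError:
            continue  # delivered, perhaps, but not confirmed as stored
    raise RuntimeError("no acknowledgement received")
```

If the receiver crashes on the first attempt, the second attempt stores the message and returns the acknowledgement. A transport-level receipt for the first attempt would have told the sender nothing about the missing row.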

Re: Silly Rant by Arturo Hernandez

On top of saying that the bottom level is not good enough, he thinks that once you make your application level reliable, WS is not 'what we want'.

Let's stay with simpler examples. Say I have a known set of files to transfer via FTP. My code could be written to transfer any non-transferred file in my known set. Would that eliminate the benefit of a reliable FTP? That depends on the reliable FTP implementation; it may very well be that a reliable FTP could recover much quicker than an application level retry.
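The application-level version of this is small. A rough sketch in Python, where `transfer` is a placeholder for whatever actually moves a file (an FTP upload, say) and is assumed to raise `OSError` on failure; all names are invented for the example:

```python
def transfer_missing(known_files, already_transferred, transfer):
    """Try every file not yet transferred; return the set still outstanding.

    already_transferred is a set that is updated in place, so it can be
    kept between runs.
    """
    pending = set(known_files) - set(already_transferred)
    for name in sorted(pending):
        try:
            transfer(name)
        except OSError:
            continue  # leave it pending; the next run picks it up again
        already_transferred.add(name)
    return set(known_files) - set(already_transferred)
```

Calling it again with the same arguments retries only the files that failed, which is the whole retry logic. How quickly it recovers compared with a reliable transport is the open question.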

So in a way the author is making two arguments: a generic, flawed argument about reliability at different application levels, and an argument about how WS-Reliable may not have worked in particular cases. In the end it did not give me the information I was expecting.

Better and simpler by Ad Gerrits

Nice how your 'business logic at the business level' approach leads to better and simpler solutions. Often in practice it's hard to convince people that simpler solutions can be better (also nicely pointed out in community.spiceworks.com/topic/308372-reliabili...).

InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.