An Introduction to Web Services Reliable Messaging
The OASIS WS-RX Technical Committee recently released the Web Services Reliable Messaging 1.1 specification for public review. As one of the two co-chairs of the committee, I thought this was a good time to introduce WSRM, give an overview of the specification, and talk about how it might be used in real systems. This article is based on the WSRM 1.1 Committee Draft 4, which is available for public review.
Web Services Reliable Messaging (WSRM) is a specification that allows two systems to send messages between each other reliably. The aim of this is to ensure that messages are transferred properly from the sender to the receiver. Reliable Messaging is a complex thing to define, but you can think about WSRM as providing a similar level of guarantee for XML messaging that a JMS system provides in the Java world. There is one key difference though - JMS is a standard API or programming model, with lots of different implementations and wire-protocols underneath it. WSRM is the opposite - a standard wire-protocol with no API or programming model of its own. Instead it composes with existing SOAP-based systems. Later in the article I will address the exact meaning of reliability and what sort of guarantees the specification offers.
Before I explain the wire protocol, I'd like to explain the way it fits into an existing SOAP interaction. Unlike a queue-based system, WSRM is almost transparent to the existing applications. In a queue-based system, there is an explicit third party (the queue) where the sender must put messages and from which the receiver gets them. In RM, there are handlers or agents that sit inside the client's and server's SOAP processing engines and transfer messages, handle retries and do delivery. These agents aren't necessarily visible at the application level; they simply ensure that the messages get re-transmitted if lost or undelivered. So if, for example, you have set up a SOAP/JMS system to do reliable SOAP messaging, you will have had to define queues, and change the URLs and endpoints of the service to use those queues. In WSRM that isn't necessary, because it fits into the existing HTTP (or other) naming scheme and URLs.
In WSRM there are logically two of these agents - the RM Source (RMS) and the RM Destination (RMD). They may be implemented by one or more handlers in any given SOAP stack.
The RM Source:
- Requests creation and termination of the reliability contract
- Adds reliability headers into messages
- Resends messages if necessary
The RM Destination:
- Responds to requests to create and terminate a reliability contract
- Accepts and acknowledges messages
- (Optionally) drops duplicate messages
- (Optionally) holds back out-of-order messages until missing messages arrive
It is important not to confuse the Source and Destination with the "service client/requester" and "service server/provider". In a two-way reliable scenario (where both requests and responses are delivered reliably) there will be an RMS and an RMD in the client, and the same in the server.
The main concept in WSRM is that of a Sequence. A sequence can be thought of as the "reliability contract" under which the RMS and RMD agree to reliably transfer messages from the sender to the receiver. Each sequence has a lifetime, which could range from being very short (I create a sequence, deliver a few messages, and terminate) to very long. In fact the default maximum number of messages in a sequence is 2^63, which is equivalent to sending 1000 messages a second for the next 292 million years!
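The arithmetic behind that figure can be checked with a short Python sketch. The limit and the sending rate come from the text; the length of a year (365.25 days) is an assumption made here purely for the calculation.

```python
# Rough check of the "292 million years" figure quoted above.
MAX_MESSAGES = 2 ** 63                      # default maximum quoted in the text
RATE_PER_SECOND = 1000                      # sending rate quoted in the text
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60    # assumption: 365.25-day year

years = MAX_MESSAGES / RATE_PER_SECOND / SECONDS_PER_YEAR
print(f"about {years / 1e6:.0f} million years")
```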
A Sequence is created using a CreateSequence interaction, and terminated when finished with a TerminateSequence interaction.
Example of a CreateSequence message:
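A minimal sketch of such a message, built with Python's standard `xml.etree.ElementTree`. The SOAP 1.1 envelope, the WSRM namespace URI (inferred from the Committee Draft 4 location) and the helper names `q` and `create_sequence` are assumptions for illustration; a real message would also carry WS-Addressing headers such as the action and message ID, which are omitted here.

```python
import xml.etree.ElementTree as ET

# Assumed namespaces; check them against the draft you are targeting.
SOAP = "http://schemas.xmlsoap.org/soap/envelope/"
WSA = "http://www.w3.org/2005/08/addressing"
WSRM = "http://docs.oasis-open.org/ws-rx/wsrm/200608"
ANONYMOUS = WSA + "/anonymous"   # WS-Addressing anonymous URI (use the backchannel)

for prefix, uri in (("soap", SOAP), ("wsa", WSA), ("wsrm", WSRM)):
    ET.register_namespace(prefix, uri)


def q(namespace, name):
    """Build a namespace-qualified tag name in ElementTree's {uri}name form."""
    return f"{{{namespace}}}{name}"


def create_sequence(acks_to=ANONYMOUS):
    """Return a SOAP envelope whose body asks the RMD to create a sequence."""
    envelope = ET.Element(q(SOAP, "Envelope"))
    body = ET.SubElement(envelope, q(SOAP, "Body"))
    request = ET.SubElement(body, q(WSRM, "CreateSequence"))
    acks = ET.SubElement(request, q(WSRM, "AcksTo"))
    ET.SubElement(acks, q(WSA, "Address")).text = acks_to
    return envelope


create_sequence_xml = ET.tostring(create_sequence(), encoding="unicode")
print(create_sequence_xml)
```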
Each message in a sequence has a message number, which starts at one and increments by one for each message.
Example of a Sequence Header and message number:
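As a sketch, the header can be generated like this in Python. The namespace URI is an assumption inferred from the Committee Draft 4 location, and `urn:example:seq-1` is a made-up identifier; only the header element is built, not the surrounding SOAP envelope.

```python
import xml.etree.ElementTree as ET

WSRM = "http://docs.oasis-open.org/ws-rx/wsrm/200608"   # assumed namespace URI
ET.register_namespace("wsrm", WSRM)


def sequence_header(sequence_id, message_number):
    """Build a Sequence header carrying the sequence identifier and message number."""
    if message_number < 1:
        raise ValueError("message numbers start at one")
    header = ET.Element(f"{{{WSRM}}}Sequence")
    ET.SubElement(header, f"{{{WSRM}}}Identifier").text = sequence_id
    ET.SubElement(header, f"{{{WSRM}}}MessageNumber").text = str(message_number)
    return header


# Three consecutive messages on one (hypothetical) sequence: numbers 1, 2, 3.
headers = [sequence_header("urn:example:seq-1", n) for n in (1, 2, 3)]
for header in headers:
    print(ET.tostring(header, encoding="unicode"))
```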
The message number is used to acknowledge the message in a SequenceAcknowledgement header.
Example of a SequenceAcknowledgement Header:
<wsrm:AcknowledgementRange Lower="1" Upper="1"/>
<wsrm:AcknowledgementRange Lower="3" Upper="3"/>
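The two ranges above say that messages 1 and 3 have arrived and message 2 has not. A small Python sketch of how a destination might collapse the message numbers it has received into such ranges (the function name `ack_ranges` is hypothetical, not something the specification defines):

```python
def ack_ranges(received):
    """Collapse received message numbers into sorted (lower, upper) ranges."""
    ranges = []
    for number in sorted(set(received)):
        if ranges and number == ranges[-1][1] + 1:
            # Contiguous with the previous range: extend its upper bound.
            ranges[-1] = (ranges[-1][0], number)
        else:
            # Gap before this number: start a new range.
            ranges.append((number, number))
    return ranges


print(ack_ranges([1, 3]))          # the situation in the header above
print(ack_ranges([1, 2, 3, 5]))    # a longer run with one gap
```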
Example One-Way Scenario
Let's walk through a simple example. For simplicity we will add reliability to a one-way interaction so in this case there is just an RMS in the client and just an RMD in the server. After this I'll talk through some of the options.
- The client wants to send an application message, so the RMS first sends a CreateSequence message to the same URL that the application messages go to.
- The RMD intercepts the message and responds with a CreateSequenceResponse. This includes the all-important SequenceID, which is the identifier by which this sequence will be known.
- The RMS now adds a Sequence header into the original application message. This has the SequenceID and the message number (in this case it will be 1).
- The RMS continues to add incrementing Sequence headers into application messages.
- The RMD delivers these messages to the server application, maintaining any guarantees that it offers, such as exactly-once and in-order
- According to its timing policy, at some point the RMD will send SequenceAcknowledgements back to the RMS. When an RMS creates a sequence, it passes an address for acknowledgements (the AcksTo address) to the RMD. In this particular scenario, we will assume that the AcksTo address is the WS-A anonymous URI - which implies you use the transport backchannel. In this case the RMD will send the acknowledgement on the HTTP response channel. Because this is a one-way interaction, there is no SOAP envelope flowing back to the client, so the RMD will create an empty SOAP envelope, add the header, and return it on the HTTP response. The RMS will pick this up before it gets to the client application.
Note that the acknowledgement isn't just for one message, it acknowledges all the messages successfully received by the RMD.
- If there are any missing messages, the RMS will resend those
- Once the RMS has had all the messages that it has sent acknowledged, it can terminate the sequence. To do this it sends a TerminateSequence message to the RMD.
- The RMD responds to the RMS with a TerminateSequenceResponse.
- That's all folks!
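The walkthrough above can be sketched as a toy in-memory simulation in Python. Everything here is an assumption for illustration: the class and method names are hypothetical, there is no SOAP or HTTP involved, and message loss is modelled deterministically by dropping the first transmission of chosen message numbers.

```python
import uuid


class RMD:
    """Toy RM Destination: creates sequences, stores and acknowledges messages."""

    def __init__(self):
        self.sequences = {}

    def create_sequence(self):
        sequence_id = f"urn:uuid:{uuid.uuid4()}"   # identifier format is an assumption
        self.sequences[sequence_id] = {}
        return sequence_id

    def receive(self, sequence_id, number, payload):
        self.sequences[sequence_id][number] = payload
        # The acknowledgement covers every message received so far.
        return sorted(self.sequences[sequence_id])

    def terminate(self, sequence_id):
        return self.sequences.pop(sequence_id)


class RMS:
    """Toy RM Source: numbers messages and resends until all are acknowledged."""

    def __init__(self, rmd, lose_first_attempt=()):
        self.rmd = rmd
        self.lose = set(lose_first_attempt)   # numbers whose first send is dropped
        self.attempts = 0

    def _transmit(self, sequence_id, number, payload):
        self.attempts += 1
        if number in self.lose:
            self.lose.discard(number)
            return None                        # lost in transit: no acknowledgement
        return self.rmd.receive(sequence_id, number, payload)

    def send_all(self, payloads):
        sequence_id = self.rmd.create_sequence()        # CreateSequence exchange
        unacked = dict(enumerate(payloads, start=1))    # message numbers start at one
        while unacked:
            for number, payload in list(unacked.items()):
                acked = self._transmit(sequence_id, number, payload)
                for n in acked or ():
                    unacked.pop(n, None)
        return self.rmd.terminate(sequence_id)          # TerminateSequence exchange


rmd = RMD()
rms = RMS(rmd, lose_first_attempt={2})
delivered = rms.send_all(["a", "b", "c"])
print(delivered, "transmissions:", rms.attempts)
```

With message 2 lost once, the source makes four transmissions for three messages, which is the "extra message flow" cost of a loss.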
Actually, spelt out in that level of detail it seems like quite a lot, but if we recap, there were two extra service calls (Create and Terminate), and then a few extra headers floating around. I don't think that is unnecessary overhead. At one point an early draft of the spec had an inline or implicit CreateSequence. Unfortunately, that left the first message in doubt. The current design means that once you have successfully created a sequence, you have a "contract" with the other end to deliver messages. In most implementations, if no TerminateSequence is sent the sequence will be timed out automatically. And of course, you do get extra message flows if messages are lost, as in that case they will have to be resent.
So what could have gone differently? In other words, what options are there?
Well firstly, the acknowledgements don't have to use the backchannel. The RMS can open up its own HTTP port (or other endpoint) to receive acknowledgements on. This is specified in the AcksTo address. If the AcksTo address is the same as the WS-Addressing ReplyTo address, the RMD may piggyback acknowledgements in response messages flowing back to the client in some circumstances.
Secondly, the RMD doesn't have to acknowledge the messages it has received. Instead, if it is missing just one message in a million, it can Nack just the missing message. This is like a prompt to the RMS saying, I'd really like this missing message. Thirdly, the RMS could have requested an acknowledgement. Suppose the RMD is set to only acknowledge rarely (minimizing extra bandwidth), but the RMS wants to clean up its store of messages, then it can ask for an acknowledgement by adding an AckRequested header. The RMD will respond immediately with a SequenceAcknowledgement.
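Whichever style is used, both sides are reasoning about the same gaps. As a sketch (the function name `unacknowledged` is hypothetical), this is how an RMS could work out from a set of acknowledgement ranges which messages are still outstanding, and so which ones to keep in its store or resend; they are the same numbers a Nack from the RMD would name.

```python
def unacknowledged(highest_sent, ranges):
    """Return the message numbers up to highest_sent not covered by any ack range."""
    acked = set()
    for lower, upper in ranges:
        acked.update(range(lower, upper + 1))
    return [n for n in range(1, highest_sent + 1) if n not in acked]


# Five messages sent; the acknowledgement covers 1 and 3-4.
missing = unacknowledged(5, [(1, 1), (3, 4)])
print(missing)
```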
Closing a sequence
The other thing that could have been different is that the RMS might, for some reason, decide to shut down the sequence before all messages are delivered. Why? Maybe my server is being closed down and I want to clean up in an orderly manner, or maybe there is one message that... In this case, it's tricky. Once I terminate the sequence, I can't ask for an acknowledgement, because the RMD will have cleared its state. If I ask for an acknowledgement first and then terminate, I might not get a true picture - maybe some extra messages might end up being delivered after I receive the SequenceAcknowledgement but before the Terminate happens. Arggg.
Well, we thought of this. So, we added the ability to Close a sequence. This is an extra interaction that allows the RMS to say that it won't be delivering any more messages. The RMD then responds with a Final sequence acknowledgement showing the ultimate state of delivered messages. After that it's OK to terminate the sequence.
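A minimal Python sketch of why Close removes the race: once the destination has closed the sequence it refuses further messages, so the final acknowledgement cannot be invalidated by a late arrival. The class names are hypothetical, and raising an exception is just this sketch's stand-in for however a real RMD rejects messages on a closed sequence.

```python
class SequenceClosed(Exception):
    """Raised by this sketch when a message arrives after the sequence is closed."""


class DestinationSequence:
    def __init__(self):
        self.received = set()
        self.closed = False

    def accept(self, number):
        if self.closed:
            raise SequenceClosed(f"message {number} rejected: sequence is closed")
        self.received.add(number)

    def close(self):
        """Stop accepting messages and return the final list of received numbers."""
        self.closed = True
        return sorted(self.received)


sequence = DestinationSequence()
sequence.accept(1)
sequence.accept(3)
final_ack = sequence.close()        # the true, final picture: 1 and 3 arrived

try:
    sequence.accept(2)              # a straggler turning up after the close
    late_accepted = True
except SequenceClosed:
    late_accepted = False

print(final_ack, late_accepted)
```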
In the case of request response, there is very little difference, except that there is a sequence in each direction. The sequences are independent - so there is no linkage between transmission of the messages on one sequence with transmission of the messages on the other sequence. The only "linkage" is that you can optimize the creation of the two sequences by sending an Offer of a return sequence in the outgoing CreateSequence.
Imagine you are a client and it is clear that there will be a two-way reliable connection. In that case the client can create a sequence and Offer it to the server for responses. Effectively this lets you create two sequences in one message exchange. However, after that the sequences are independent: for example you can terminate one and still use the other.
Most internet users can't just start up an HTTP server on their machines and have other systems connect in. The problem doesn't come with running an HTTP server - that's simple. The real problem comes with getting the packets to your machine. For example, many home users have a broadband router/firewall that performs Network Address Translation. Without complex configuration these will drop all inbound packets. Similarly if I walk into a coffee shop and use the wireless LAN, I have the same problem - my IP address isn't globally accessible. Why do we care? Well, if I just want to do one-way reliable, then this doesn't matter. In fact, in the example above we showed how it works. By piggybacking the acks on the HTTP response flow, everything works just fine. But if I have a request-response flow, things change.
Suppose a response goes missing. The server wants to resend that message to the client. But the client isn't addressable. There is no open connection to resend the message on, and no way of the server opening one. Help!!!
MakeConnection to the rescue
MakeConnection is a simple one-way message that logically flows from the client to the server. By opening up an HTTP connection, this allows the server to respond with any "queued" or waiting messages that need to be transmitted to the client. Effectively the client "polls" the server every once in a while for any waiting messages. If you think about this carefully, you will see that this message flows from the RMD to the RMS, because it is designed for the return (response) path. Effectively the client's RMD is asking the server's RMS if there are any messages waiting. Of course, the client has to identify itself to make this happen. There are two options in MakeConnection. One is to modify the WS-Addressing headers to use a special URI that includes a unique ID. This is really there for complex scenarios. For simpler scenarios, the following approach works well:
- Client creates a sequence and offers a sequence at the same time
- Client sends requests, ideally receives response on the backchannel
- For some reason, some responses are timed out or connections lost
- Client initiates MakeConnection, passing the Sequence identifier of the offered sequence
- Server responds with missing message, plus a flag to indicate if more are waiting
- Once no more messages are waiting the client can terminate the sequences
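The polling loop in those steps can be sketched in Python as follows. This is an in-memory illustration only: the `Server` class, its method names and the `urn:example:offered-seq` identifier are hypothetical, and the "more waiting" flag is modelled as a plain boolean.

```python
from collections import deque


class Server:
    """Holds responses that could not be delivered on the original backchannel."""

    def __init__(self):
        self.pending = {}   # offered sequence id -> queue of waiting responses

    def queue_response(self, sequence_id, message):
        self.pending.setdefault(sequence_id, deque()).append(message)

    def make_connection(self, sequence_id):
        """Answer a poll: return (message or None, whether more are waiting)."""
        queue = self.pending.get(sequence_id)
        if not queue:
            return None, False
        message = queue.popleft()
        return message, bool(queue)


def poll_until_empty(server, sequence_id):
    """Client side: keep polling until the server reports nothing more is waiting."""
    received = []
    while True:
        message, more_waiting = server.make_connection(sequence_id)
        if message is not None:
            received.append(message)
        if not more_waiting:
            return received


server = Server()
server.queue_response("urn:example:offered-seq", "response-2")
server.queue_response("urn:example:offered-seq", "response-5")
recovered = poll_until_empty(server, "urn:example:offered-seq")
print(recovered)
```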
In many ways RM just plugs in with whatever other security model is already in place. However, there are some issues that need watching out for. In particular, there is the possibility of a "sequence attack". In this model, imagine there are two valid "clients" each with a sequence. Both are authorized at the service level, but one of the clients is actually a maverick, and he wants to attack the other sequence. If he can guess (or sniff) the sequence identifier, then he can start a Denial of Service attack, by for example, terminating the sequence. So the RM spec addresses how to associate the sequence with a particular credential or security session. This means that the RM agent can protect against this kind of attack. This is particularly important with MakeConnection, because otherwise an unauthorized user could retrieve messages destined for another system.
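The idea of tying a sequence to a credential can be sketched in a few lines of Python. This is a simplification with hypothetical names: credentials are modelled as opaque strings, whereas a real RM agent would rely on the security session or token established by the underlying security mechanism.

```python
class SequenceRegistry:
    """Remembers which credential created each sequence."""

    def __init__(self):
        self.owner = {}

    def create(self, sequence_id, credential):
        self.owner[sequence_id] = credential

    def terminate(self, sequence_id, credential):
        # Knowing (or guessing) the identifier is not enough: the caller must
        # also present the credential the sequence was created under.
        if sequence_id not in self.owner or self.owner[sequence_id] != credential:
            raise PermissionError("credential does not match the sequence owner")
        del self.owner[sequence_id]


registry = SequenceRegistry()
registry.create("urn:example:seq-1", "alice-session")

try:
    registry.terminate("urn:example:seq-1", "mallory-session")
    attack_succeeded = True
except PermissionError:
    attack_succeeded = False

print(attack_succeeded)
```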
As well as the core spec, the TC has published a Policy Assertion Language for WSRM that can be used with the WS-Policy Framework model. In the previous spec (1.0) the policy model was fairly complex. There were a number of timing parameters that were published in WSRMP. Firstly the TC decided a number of these were "unhelpful" as they tied the parties to using static timing models instead of dynamically adjusting them. Secondly, it was felt that it would be better to have any remaining timing agreed during the CreateSequence. This means that WSRM can be used very successfully without needing to use WS-Policy. Now WS-Policy is simply used to signal whether WSRM is optional or required on a given endpoint.
So what does Reliability mean anyway?
Are you still reading? Congratulations on making it this far! Well we've covered the protocol in a reasonable amount of depth. Now let's step back and see what it actually gives us! The first question that challenges people about WSRM is: "What level of reliability do I get?". And the answer isn't that simple, unfortunately. WSRM was designed as a wire protocol not as an end-to-end application level protocol. There are two reasons for this. One is that the Web Services standards (WS-*) are generally designed to cover the externally visible view of a service and not the implementation, to promote the concept of loose-coupling. The second reason is composability: to provide end-to-end reliability you need to have some kind of transaction manager associated with the application. Because there are other WS-* specifications that cover transactions, and different ways of implementing transactions, it doesn't make sense for this specification to cover that aspect. This is a thorny issue that comes up every time I discuss WSRM with customers or potential users, who are looking for much more of a plug-in replacement for existing messaging systems that tightly integrate with transactional applications.
The guarantee that WSRM - by itself - offers, is simply that the message was successfully transferred from the RMS to the RMD and that the RMD acknowledged it. Different implementations can have different guarantees behind this. For example, Apache Sandesha2, an open source implementation of WSRM, has a pluggable storage manager. This means that you can have a persistent store behind the RMD, so the acknowledgement is only sent when the message has been written to disk. This means that Sandesha can support server failure and restart. The WSO2 Tungsten server supports this model of operation.
The previous specification (WSRM 1.0) specifically talked about delivery assurances such as AtLeastOnce, AtMostOnce, ExactlyOnce and InOrder. However, these assurances are really guarantees between the RMD and the application, not across the wire. So as a committee, we removed these from the specification. We still expect implementations to offer these levels of assurance, but they are part of the implementation not the wire protocol.
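To make the distinction concrete, here is a sketch of the kind of logic an implementation might place between the RMD and the application to offer ExactlyOnce and InOrder delivery. It is not taken from any particular implementation; the class name and the `deliver` callback are hypothetical.

```python
class InOrderExactlyOnce:
    """Drops duplicates and holds back out-of-order messages before delivery."""

    def __init__(self, deliver):
        self.deliver = deliver      # callback into the application
        self.next_expected = 1      # message numbers start at one
        self.held = {}              # out-of-order messages awaiting their turn

    def on_message(self, number, payload):
        if number < self.next_expected or number in self.held:
            return                  # duplicate: already delivered or already held
        self.held[number] = payload
        # Release every message that is now contiguous with what was delivered.
        while self.next_expected in self.held:
            self.deliver(self.held.pop(self.next_expected))
            self.next_expected += 1


delivered_to_app = []
destination = InOrderExactlyOnce(delivered_to_app.append)
for number, payload in [(2, "b"), (1, "a"), (2, "b"), (3, "c")]:
    destination.on_message(number, payload)
print(delivered_to_app)
```

On the wire the messages arrived out of order and with a duplicate; the application still sees each one once, in sequence.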
Programming model and implications
If you are a JMS or messaging developer, you will be used to learning a programming model (PM) for reliable messaging, such as JMS. So WSRM might come as a shock to you, because it can be used without any new programming model. Of course it's hard to generalize, because each implementation can have its own approach, but the core spec doesn't imply any particular PM. For example, Sandesha allows you to turn on RM. If there is no sequence in place, it automatically creates one, and then when no more messages are being sent, it times out and terminates the sequence. The fact that the RMS and RMD are just "handlers" in the chain of processing also means that there are no new "visible entities" such as queues that need to be configured or that show up in the client code - the RM infrastructure can share the same URIs that the existing Web Service uses. So WSRM can be added into an existing Web Services interaction with no extra application code. (By the way, Sandesha also has a full programming API that gives access to sequences if users wish to hand-code the RM behaviour.)
Despite this transparency, it is worth thinking about the implications for coding. Many recent Web Service stacks and APIs, including Microsoft WCF (Indigo), JAX-WS, and Apache Axis2, offer the ability to call a Web service asynchronously (non-blocking). In this model, instead of the client blocking until the response comes back, the client passes a callback object in when it makes the outbound call. Processing then continues on the client thread, and when the response comes back a separate thread passes the response to the callback handler.
This style of programming is very important for WSRM, because it means that even if the server goes down, RM can resend the request and response messages until the response is received. With a blocking call, at some point the client would time out, leaving the reliable response "orphaned" - properly delivered back to the client but without any code available to process it. So in general, if you think you might use RM, it makes sense to write clients using this non-blocking approach. (It's actually good practice anyway: imagine a web application server that is making calls out to a third party using Web services; if too many requests are blocking waiting for responses, the server's thread pool would end up exhausted and the server couldn't handle incoming requests.)
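The callback style itself is independent of any Web services stack. As a rough illustration using only Python's standard `concurrent.futures` module (this is not the API of WCF, JAX-WS or Axis2; `call_service` is a stand-in for the real outbound call and the request value is made up):

```python
import time
from concurrent.futures import ThreadPoolExecutor


def call_service(request):
    """Stand-in for a slow remote call; a real client would send a SOAP request."""
    time.sleep(0.05)
    return f"response to {request}"


responses = []


def on_response(future):
    """Callback run when the response is available, not on the caller's flow."""
    responses.append(future.result())


with ThreadPoolExecutor(max_workers=2) as pool:
    future = pool.submit(call_service, "order-42")
    future.add_done_callback(on_response)
    # The calling thread is free to carry on with other work here
    # instead of blocking until the response arrives.
    client_continued = True
# Leaving the "with" block waits for outstanding work to finish.

print(responses)
```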
History and differences from the existing 1.0 specification
WS ReliableMessaging dates all the way back to March 2003, when it was originally published. In June 2005 the 1.0 specification was submitted to OASIS for standardization. The current draft reflects a number of changes from the 1.0 spec. Without listing all of them I can summarize the main changes:
- Namespace changes: Since the specifications have significant changes they are not compatible at the wire level. The 1.1 spec has a different set of namespaces reflecting the ownership by OASIS.
- Cleanup: The TC really worked through the specification with a fine-toothed comb, and found many small issues ranging from potential errors to potential problems interoperating.
- Addition of CloseSequence: As discussed above, there are cases where it is necessary to close an incomplete sequence, and CloseSequence allows that to happen cleanly.
- Removal of LastMessage: The 1.0 spec had a marker on a message to indicate it was the last message, which was largely superfluous.
- Improved security composition: The original spec had very specific composition with WS-Security/WS-SecureConversation. The 1.1 spec has a much more flexible approach that also supports composition with SSL/TLS based security sessions.
- Updated to use the W3C WS-Addressing Recommendation: The 1.1 spec uses the recommended version of WS-Addressing from the W3C.
- Simplification of WSRM-Policy: The published policy assertion is much simpler - it basically says whether RM is on or optional. The previous spec had a number of timing parameters which would not allow for dynamic adjustment of the protocol, so they were removed, or moved into the CreateSequence.
- Support for two-way reliability with firewall crossing: The MakeConnection support was added in the 1.1 spec.
There are a number of implementations of the existing WSRM 1.0 specification, including Microsoft WCF (formerly known as Indigo), and Apache Sandesha2. The OASIS WSRX TC hosted an interop based on the last Committee Draft earlier in 2006, and 5 companies turned up with implementations. Although the interop didn't produce 100% coverage, three companies managed to interop fully between their implementations in all scenarios. The TC is hosting a second interop during the public review period, to fully test the implementations on the latest specification. We are also expecting more companies to take part this time.
In this article we've covered a lot of ground, from the overall model down to the main elements of the wire protocol. There are more complicated scenarios I haven't covered, and I encourage you to read the spec itself to understand the nuances, but I hope it's been useful. I'd like to finish off by looking at some of the potential uses I see for WSRM, and some of the ideas that customers have talked to me about.
- B2B messaging: A number of people see WSRM playing a key part in business-to-business links. Many companies are looking for a low-cost, simple way of ensuring that orders, invoices, etc. are reliably and securely transmitted over the Internet to partners. WSRM is an ideal technology to provide the reliability for those links.
- Internal department-to-department or server-to-server links: WSRM is also a very useful protocol inside the enterprise. More and more companies are developing and using Web services and XML communications internally, and as those links become "line-of-business" WSRM will become a key technology to ensure reliability.
- JMS replacement: Some companies are looking at WSRM as a long-term replacement for existing proprietary JMS systems. The next release of Windows, Vista, will include WSRM support built-in. That makes it tempting for companies that currently have to install proprietary JMS clients on many workstations.
- JMS bridge: You could use WSRM as a standard protocol to bridge between two different JMS implementations. The Apache Synapse open source project is designed to help you do this, amongst other things.
- Browser-based scenarios and notifications: As AJAX applications get more interesting, the idea of doing reliable messaging directly from a browser becomes pretty exciting, especially if you were building, for example, an AJAX trading application. At least one effort is creating a plug-in for the Firefox browser that supports a SOAP-based AJAX model. RM support is coming and will make it very simple to create reliable AJAX applications. Since AJAX already uses a non-blocking asynchronous approach it is ideally suited to being composed with WSRM. The ability to cross firewalls using the MakeConnection facility also means that RM can be used without the client needing to open ports. This approach can also be used to support subscriptions, where the browser makes a single request (subscribe) and receives multiple responses (notifications) back using MakeConnection.
All in all, I see a bright future for WSRM. It's taken a while to pull together all the companies and the technology into a single approach, but we are making good progress, and the public review of the specification is a major milestone on that path.
- The WSRM 1.1 Public Review specification from OASIS: http://docs.oasis-open.org/ws-rx/wsrm/200608/wsrm-1.1-spec-cd-04.pdf
- The WSRM Policy 1.1 Public Review specification: http://docs.oasis-open.org/ws-rx/wsrmp/200608/wsrmp-1.1-spec-cd-04.pdf
- The OASIS WSRX TC homepage: http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=ws-rx
- The "submitted" WSRM 1.0 specification (ZIP): http://lists.oasis-open.org/archives/ws-rx/200506/zip00000.zip
- Coverpages overview of WS-RX and reliable messaging: http://xml.coverpages.org/reliableMessaging.html#ws-rx
- The Sandesha2 WSRM implementation (supports 1.0 and 1.1): ws.apache.org/sandesha/sandesha2/
- Microsoft Windows Communication Foundation (currently includes support for WSRM 1.0): http://msdn.microsoft.com/winfx/technologies/communication/default.aspx
About the author
Paul Fremantle is a co-founder and VP of Technology at Open Source Web services startup WSO2. He co-chairs the OASIS Technical Committee standardizing Web Services Reliable Messaging, and is a committer and release manager on the Apache Synapse project. Before co-founding WSO2, Paul was an SOA and Web Services leader in IBM's WebSphere division. Paul is a member of the Apache Software Foundation.
WS-* versus JMS
Jack van Hoof
And all messaging infrastructures will understand WS-*, even the network hardware devices like routers and switches. This way the message can - and will - break out of the closed systems and travel freely across heterogeneous infrastructures. Within or outside your company. With your ESB as the company's business events container, representing the real-time state of your business; including all the historical states in a fluent time-line to the past. Imagine its huge potential...
Jack van Hoof
Re: WS-* versus JMS
You are quite right! The term that is most used is "composable". In other words, you can write a service-based interface and use it. Then when it becomes necessary to add reliability, you can do that without changing code. That is a huge benefit.
In comparing WSRM to JMS there is another interesting aspect - the naming convention. JMS has a complex naming model based on JNDI with Connection Factories as well as Queue or Topic names. This can be complex to set up, especially when trying to interoperate between different systems (e.g. SOAP/JMS). WSRM just uses the same URI that the SOAP message is going to, making it much simpler and cleaner. It's almost a REST-like argument why a WS-standard is good :-)
This is why god invented tools...
I'm very happy to see WS-RX (featuring RM) out there in the wild now as it makes some of the B2B scenarios much more possible.
One of the bits that seems to be overlooked with Reliable Exchange is that it's very simple from a programmatic sense, but very complex from a policy and infrastructure perspective. I'm extremely glad to see that the tools vendors are looking to wrap WS-RX up so we don't all have to read the specification just to send a message reliably.
One question I have is whether WS-RX should become the default position for all externally facing web services.
Re: WS-* versus JMS
It's almost a REST-like argument why a WS-standard is good :-)
A REST-like argument for WS-*? There can be no such thing :-)
Excellent article, BTW; I learned a lot even though I had closely looked at the earlier versions.
Re: This is why god invented tools...
Certainly we simplified WSRM Policy (to simply say: RM is on, optional). Because it is so simple, it is possible to have only one switch - "RM is on" - and bypass the requirement to have a policy XML.
From an infrastructure perspective, it's very simple to make RM completely automatic. For example, WSO2 Tungsten has the ability to turn on fully persistent reliability just by selecting RM from a drop-down box. That's all it takes.
Very nice article! But I remain puzzled about the persistence part: why doesn't the spec say anything about it? And why doesn't WS-RM Policy allow you to configure whether the communication should use persistent messaging (queueing)?
This is in my opinion a big hurdle in the adoption of WS-RM and WS-* in general as an alternative to JMS, AS2, RosettaNet and other reliable protocols.
Kind regards, Guy Crets
PS: looking forward to your talk about WS-RM at JavaPolis 2006!
Re: Persistent messaging?
The reason we don't say anything about persistence in the spec is because of two key SOA concepts:
1. Composability. If the RM spec defines transactional/persistent properties then that would overlap with other specs such as WS-AT.
2. Independence from implementation. WS-RM is designed to be a wire-protocol, not an end-to-end protocol, because it is meant to shield the user from the implementation. The wire protocol does not and should not care about issues such as persistence.
I personally would like a WS-Policy assertion that defines reliability, but the OASIS TC preferred to leave this out of the specification as being out-of-scope for the group.
WSRM vs JMS
The question is whether your business intends to expose its SOA services to the outside; if so, WSRM may make sense. If your business is looking to quickly turn around data in a reliable fashion, JMS is the best and simplest to implement out there.
RMD vs RD
We have a 'legacy' system that conforms to an earlier version of the Web Services specification (SOAP 1.1 and WSDL 1.1). The legacy system however does not natively conform to the ws-rm specification. We are using BEA Aqualogic as a middleware platform to expose all Web Services within our landscape. BEA does conform to the ws-rm specification.
We will expose these Web Services from the legacy systems as proxies via the ESB. Thus all interested consumers call these Web Services via the proxies. The proxies themselves call the native 'legacy' Web Services synchronously over http. This of course is natively unreliable.
If we expose a ws-rm compliant "proxy" via the ESB, representing a native Web Service of the legacy system, does this then become overall reliable?
What I've come to understand thus far is that the scope of ws-rm concerns only the communications between the RMS and RMD.
Now if I take the following remark in your article:
"In RM, there are handlers or agents that sit inside the client's and server's SOAP processing engines and transfer messages, handle retries and do delivery"
.. and considering the scenario I describe above, my understanding is that the RD is actually the "proxy" and not the actual Web Service in the legacy system itself.
Thus any communication between the RMS and the RMD is reliable, so calls to the proxy itself are reliable, but what's to be said for the unreliable communication between the proxy and the actual web service in the underlying system?
My understanding is that the RMD should be an agent that sits "on the other side of the unreliable infrastructure separating it from the RMS" and inside the SOAP processing engine of the legacy system?
Given this then the communication between the proxy and the actual web service remains the "weakest link"... or perhaps I miss something?