Implementing Exceptions in SOA
In an ideal world, a service invocation always completes successfully and returns the required results. Unfortunately, in reality, services may and do fail. Such failures can have many causes. A failure can originate in the service itself (for example, failed validation of the incoming parameters or simply a bug in the service implementation) or in communication (for example, the service cannot be reached, or the implementation cannot reach its underlying database). Finally, a failure can be caused by a deployment issue: for example, following a software upgrade, one of the required libraries is not deployed correctly.
A widely adopted mechanism for dealing with failures is exception handling, which encompasses both capturing and logging the error and choosing an alternative execution path when a failure occurs. It has become a standard mechanism in application and component implementations, where it typically relies on the ability of application designers and developers to anticipate the possible exception conditions and instrument the code to handle them at run time. This approach relies on the following assumptions:
- The application is designed as a whole in all of its completeness, including all of the possible exception situations. This means that all of the execution paths of applications can be fully defined and, as a result, completely tested by an application team.
- The application is executed on a single machine (or a limited set of machines) and reports all of its exceptions in local log files using standardized exception reporting schemas.
- Changes in the application components are administered centrally, meaning that the application development team is in complete control of changes.
In distributed systems, implementing these exception handling approaches becomes significantly more complex. Exceptions can be caused not only by the application code itself but also by malfunctioning infrastructure (for example, the network), which makes it harder to analyze all possible exception scenarios. Additionally, exception logs are spread across multiple physical machines, which makes reconciling them significantly more difficult. In the context of Service-Oriented Architecture (SOA), features such as loose coupling (both organizational and technological), autonomy, and reliance on existing applications to implement business functionality complicate exception handling even more.
Because every service is designed, implemented, and maintained on its own, and can be used in multiple enterprise solutions that might not be known at the time of the service design, exception handling for a given service revolves around processing and logging exceptions local to the service implementation and reporting them to the service consumer when they cannot be resolved locally. If special measures are not taken, this results in "islands of exception handling" (Figure 1; see footnote 2).
Figure 1 Islands of exception handling
The following complex issues, which are not a major concern in monolithic applications, have to be dealt with to support exception processing in SOA.
- The distributed and heterogeneous nature of SOA makes it particularly prone to failures, causing exceptions at multiple levels. System-level exceptions result from messaging, communication, and other infrastructure failures. Application-level exceptions result from incorrect message semantics or logical errors within the application. Business-level exceptions result from violations of best practices, compliance laws, regulations, or business policies mandated by business managers. The latter might not even be visible at the level of the service execution itself and might require a service management solution to detect them.
- Exceptional conditions that are related to a particular business transaction spanning one or more services across different business processes (sometimes across multiple companies) cannot be detected by exception handling localized within one of the participants. In this case, exception processing might require aggregating the exception information at the business transaction level. This means that the exception information of a particular service has to be separated by the business transactions that employ this service. Additional requirements for this type of separation can be imposed by privacy rules, the Health Insurance Portability and Accountability Act (HIPAA), and other compliance requirements.
- Individual services have no visibility into the entire solution (business process). Fixing errors without a process-wide perspective is hard. Consequently, upon detecting an exception, a service implementation cannot always choose an alternative execution path and has to notify the consumer, which may have the context required to correct the situation. However, the recursive composition characteristic of SOA complicates this further: the consumer is often another service and consequently is in no position to resolve the issue.
- Not all exceptions can be processed automatically; sometimes the only way of dealing with a failure is through human intervention. Doing so requires determining who the appropriate people are and notifying them about the exceptions.
- Loosely coupled, heterogeneous services often discover and process exceptions differently. Some may use specialized components such as log4j, log4net, .NET Enterprise Library, etc. Others employ proprietary solutions. Additionally, wrapping the functionality of existing applications currently represents the prevalent approach to service implementation. These legacy applications can detect, log, and communicate exceptions in different ways.
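The three exception levels and the need to aggregate exception information per business transaction can be illustrated with a small data structure. This is only a sketch in Python under stated assumptions: the names ExceptionLevel, ExceptionRecord, transaction_id, and the sample service names are hypothetical and do not come from any product or standard.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class ExceptionLevel(Enum):
    SYSTEM = "system"            # messaging, communication, infrastructure failures
    APPLICATION = "application"  # incorrect message semantics, logical errors
    BUSINESS = "business"        # policy, regulation, or best-practice violations


@dataclass(frozen=True)
class ExceptionRecord:
    service: str
    level: ExceptionLevel
    transaction_id: str  # hypothetical business-transaction identifier
    message: str


def group_by_transaction(records):
    """Aggregate records reported by many services per business transaction."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record.transaction_id].append(record)
    return dict(grouped)


# Illustrative records, as three different services might report them.
records = [
    ExceptionRecord("quote-service", ExceptionLevel.SYSTEM, "tx-1", "database unreachable"),
    ExceptionRecord("billing-service", ExceptionLevel.APPLICATION, "tx-2", "invalid policy number"),
    ExceptionRecord("claims-service", ExceptionLevel.BUSINESS, "tx-1", "approval limit exceeded"),
]
by_transaction = group_by_transaction(records)
```

Grouping by transaction rather than by service is what lets a failure that spans several participants be seen as one event; it presupposes that every record carries the transaction identifier.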
An elegant solution to exception handling in SOA is to apply SOA principles to the exception handling implementation itself. This leads to "servicizing" all of the major elements of exception management (i.e., logging, exception resolution, and notifications). Figure 2 shows the overall architecture for exception logging, resolution, and notifications.
Figure 2 Unified architecture for exceptions logging, resolution and notifications
Instrumentation code within the service implementation detects and logs system- and application-level exceptions. Logging takes place through general-purpose APIs exposed by standard logging components such as log4j, log4net, .NET Enterprise Library, and so on. The logging implementation translates invocation requests into service calls to the exception logging service. To lower the performance impact of exception logging, these service calls are typically asynchronous. Although this implementation revolves around the logging service, exception handling as a whole relies on the following elements:
- The logging service accepts all logging requests, stores them in the logging database, and forwards them to the exception resolution service.
- The exception resolution service processes each log message using exception resolution rules. These rules specify whether the message should be ignored (e.g., information messages), resolved automatically, or whether human intervention is required.
- The notification service receives notification requests and uses a set of rules to dispatch the notification (e.g., to an email gateway, pager gateway, or enterprise management solution).
- The exceptions/logging portal allows people to view and browse the logged exception information.
- Service management monitors service traffic to detect business-level exceptions and reports them to the logging service, which treats them the same way as any other exceptions in the system.
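The asynchronous hand-off from a standard logging API to the logging service can be sketched with Python's standard logging module, used here in place of log4j or log4net. QueueHandler and QueueListener are real standard-library classes; LoggingServiceStub, ServiceCallHandler, and the list standing in for the exception resolution service are hypothetical in-process substitutes for what would be remote service calls.

```python
import logging
import logging.handlers
import queue


class LoggingServiceStub:
    """Hypothetical in-process stand-in for the remote exception logging service."""

    def __init__(self, forward_to):
        self.stored = []              # stands in for the logging database
        self.forward_to = forward_to  # stands in for the call to the resolution service

    def accept(self, entry):
        self.stored.append(entry)
        self.forward_to(entry)


class ServiceCallHandler(logging.Handler):
    """Translates a log record into a call to the logging service."""

    def __init__(self, service):
        super().__init__()
        self.service = service

    def emit(self, record):
        self.service.accept({
            "logger": record.name,
            "level": record.levelname,
            "message": record.getMessage(),
        })


resolution_inbox = []  # stands in for the exception resolution service
service = LoggingServiceStub(forward_to=resolution_inbox.append)

# The application thread only enqueues records; a background thread
# owned by the listener performs the (potentially slow) service call.
log_queue = queue.Queue()
listener = logging.handlers.QueueListener(log_queue, ServiceCallHandler(service))
listener.start()

logger = logging.getLogger("order-service")
logger.setLevel(logging.INFO)
logger.propagate = False
logger.addHandler(logging.handlers.QueueHandler(log_queue))

logger.error("database %s unreachable", "orders")
listener.stop()  # drains the queue before returning
```

The service code keeps calling the ordinary logging API, so switching from local log files to the logging service is a configuration change rather than a code change.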
The above partitioning of functionality ensures that exception logging and resolution take place in a consistent fashion. This makes it possible to formalize enterprise best practices ("common knowledge") and improves the auditing, monitoring, and control of exceptions, which is a big step toward regulatory compliance.
The centralized exception resolution service allows changes in the handling of specific exception types to be implemented faster. The most common approaches to exception resolution are:
- Automatic resolution that resolves the problem without the need for human intervention.
- Semi-automatic resolution that evaluates a rule set and suggests possible resolutions.
- Fallback to humans for manual resolution.
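The three resolution approaches can be sketched as an ordered rule set evaluated by the exception resolution service. The rule format, the exception codes, and the notify callback below are assumptions made for illustration only; a real deployment would more likely use a rules engine and a separate notification service.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    matches: Callable[[dict], bool]
    action: str       # "ignore", "auto", or "suggest"
    detail: str = ""


# Hypothetical rule set; the first matching rule wins.
RULES = [
    Rule(lambda e: e["level"] == "INFO", "ignore"),
    Rule(lambda e: e["code"] == "CONNECTION_TIMEOUT", "auto", "retry on standby node"),
    Rule(lambda e: e["code"] == "VALIDATION_FAILED", "suggest",
         "ask the consumer to correct the input"),
]


def resolve(entry, notify):
    """Apply the first matching rule; fall back to manual resolution."""
    for rule in RULES:
        if rule.matches(entry):
            if rule.action == "suggest":
                notify(f"Suggested resolution for {entry['code']}: {rule.detail}")
            return rule.action
    notify(f"Manual resolution needed for {entry['code']}")
    return "manual"


notifications = []  # stands in for the notification service
outcomes = [
    resolve({"level": "INFO", "code": "STARTED"}, notifications.append),
    resolve({"level": "ERROR", "code": "CONNECTION_TIMEOUT"}, notifications.append),
    resolve({"level": "ERROR", "code": "VALIDATION_FAILED"}, notifications.append),
    resolve({"level": "ERROR", "code": "UNKNOWN_STATE"}, notifications.append),
]
```

Because the rules live in one place, changing how a given exception type is handled means editing a rule rather than redeploying every service that can raise it.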
Sidebar: Service exceptions definitions
Exception processing in SOA relies heavily on the semantics of service exception definitions. Service exceptions can belong to one of three groups (system, application, or business) and consequently might be represented differently in the service payload (reply). The service payloads could be encoded as SOAP messages; however, these ideas are applicable even when the messages employ other encodings. The SOAP envelope provides a specialized element (SOAP Fault) for exception propagation. The best practices for using SOAP Fault are described elsewhere.
The SOAP Fault is best suited for reporting system-level exceptions and should always be used for exceptions of this type. (J2EE implementations usually provide additional information about the problem by supplementing the SOAP Fault element with the contents of the execution stack.) System-level exceptions are usually related to failures of system software and hardware. This means that a service call can be retried after the problem has been resolved, either automatically (e.g., through failover) or manually (e.g., after physically replacing the faulty element).
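The observation that system-level failures are retryable, while other failures are not, can be sketched on the consumer side as follows. The exception classes and the call_with_retry helper are hypothetical; in a SOAP stack the system-level case would typically surface as a fault mapped to a language exception.

```python
import time


class SystemLevelError(Exception):
    """Hypothetical marker for infrastructure failures (network, database, host)."""


class ApplicationLevelError(Exception):
    """Hypothetical marker for failures that a retry cannot fix."""


def call_with_retry(operation, attempts=3, delay_seconds=0.0):
    """Retry only system-level failures; let everything else propagate."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except SystemLevelError:
            if attempt == attempts:
                raise
            time.sleep(delay_seconds)  # give failover a chance to complete


calls = {"count": 0}


def flaky_service():
    """Simulated service that becomes reachable on the third call."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise SystemLevelError("service unreachable")
    return "policy issued"


result = call_with_retry(flaky_service)
```

Application- and business-level errors are deliberately not caught: repeating the same request would only reproduce the same error.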
Application- and business-level exceptions require significantly more data to properly define the cause of the error. These situations often call for special schemas that define the information for every specific error, effectively extending the domain semantic model to describe failure scenarios. Another complication is that a single service invocation might need to return multiple errors. Validation errors are a typical example. To lower the communication overhead, it is usually advantageous to return all of the validation errors together (i.e., batch them) so that they can be addressed at the same time.
These types of exceptions can be delivered back either as SOAP Faults with the extended detail information, or as a normal service response with a payload denoting an error. Both approaches have their advantages and shortcomings, outlined below.
SOAP Fault: advantages
- Supported directly by Web Services specifications and tools.
- Provides uniform delivery of all kinds of exceptions.
- Maps directly into exceptions in programming languages like Java and C#.
Specialized payload: advantages
- Provides clear separation between system-level and application-level exceptions.
- Allows for the introduction of a single semantic model supporting both success and failure scenarios.
- Does not require the use of SOAP.
SOAP Fault: disadvantages
- Requires splitting the semantic model into success and failure scenarios.
Specialized payload: disadvantages
- Requires the service consumer to examine reply payloads and explicitly distinguish between application-level successes and failures.
Despite the drawbacks outlined above, using a special payload to report application and business exceptions is the better approach because of its increased flexibility and extensibility.
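A specialized payload that batches validation errors might look like the following sketch. The field names (status, errors, result), the error codes, and the order attributes are assumptions chosen for illustration; an actual service would define them in its enterprise semantic model.

```python
def validate_order(order):
    """Collect every validation error instead of stopping at the first one."""
    errors = []
    if not order.get("customer_id"):
        errors.append({"field": "customer_id", "code": "REQUIRED",
                       "message": "customer_id is required"})
    if order.get("quantity", 0) <= 0:
        errors.append({"field": "quantity", "code": "OUT_OF_RANGE",
                       "message": "quantity must be positive"})
    if order.get("currency") not in {"USD", "EUR"}:
        errors.append({"field": "currency", "code": "UNSUPPORTED",
                       "message": "currency must be USD or EUR"})
    return errors


def place_order(order):
    """Return one reply shape for both outcomes; 'status' tells them apart."""
    errors = validate_order(order)
    if errors:
        return {"status": "failure", "errors": errors}
    return {"status": "success", "result": {"order_id": "ORD-1"}}


reply = place_order({"customer_id": "", "quantity": 0, "currency": "USD"})

# The consumer has to inspect the payload explicitly.
if reply["status"] == "failure":
    failed_fields = [error["field"] for error in reply["errors"]]
else:
    failed_fields = []
```

Both outcomes share one envelope, so the consumer can parse every reply the same way and only then branch on the status. The cost is the explicit check shown at the end, which a SOAP stack would otherwise perform by raising an exception.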
The solution presented in Figure 2 has the following prerequisites:
- All logging messages, including information, warnings, exceptions, etc., must follow a standard format, for example, Common Base Events (CBE).
- All participants (i.e., service consumers and providers, and the logging, exception resolution, and notification services) should be able to interpret the exception/logging information, which should conform to the enterprise semantic model (see the Service exceptions definitions sidebar).
- Analyzing and understanding a failure entails linking log messages across service boundaries. This requires a unique correlation ID that spans the scope of the business transaction.
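The correlation ID prerequisite can be sketched with Python's standard logging and contextvars modules. The record attribute name correlation_id, the service names, and the log format are assumptions. In a real deployment the ID would travel between services in a message header rather than in a process-local context variable.

```python
import contextvars
import io
import logging
import uuid

# Holds the ID of the business transaction currently being processed.
correlation_id = contextvars.ContextVar("correlation_id", default="-")


class CorrelationFilter(logging.Filter):
    """Stamps every log record with the current correlation ID."""

    def filter(self, record):
        record.correlation_id = correlation_id.get()
        return True


stream = io.StringIO()  # stands in for the unified log
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(correlation_id)s %(name)s %(message)s"))
handler.addFilter(CorrelationFilter())

for name in ("quote-service", "payment-service"):
    service_logger = logging.getLogger(name)
    service_logger.setLevel(logging.INFO)
    service_logger.propagate = False
    service_logger.addHandler(handler)


def handle_business_transaction(transaction_id=None):
    """Run two simulated service steps under one correlation ID."""
    token = correlation_id.set(transaction_id or str(uuid.uuid4()))
    try:
        logging.getLogger("quote-service").info("quote accepted")
        logging.getLogger("payment-service").error("card declined")
    finally:
        correlation_id.reset(token)


handle_business_transaction("tx-42")
lines = stream.getvalue().splitlines()
```

With the ID on every line, messages from different services can be joined into a single trace of the business transaction, which is what the unified log and the portal rely on.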
The exception management approach described in this article applies the principles of Service-Oriented Architecture to provide a foundation for the effective management of exceptions in SOA implementations. It introduces specialized infrastructure services for building flexible, extensible exception handling solutions. It improves implementation consistency by providing a uniform approach to exception handling throughout the enterprise. It also simplifies maintenance and improves testability by providing a single, unified log that spans multiple service consumers and providers.
About the Author
Boris Lublinsky has over 25 years of experience in software engineering and technical architecture. For the last several years he has focused on Enterprise Architecture, SOA, and Process Management. Throughout his career, Dr. Lublinsky has been a frequent technical speaker and author. He has over 40 technical publications in different magazines, including Avtomatika i telemechanica, IEEE Transactions on Automatic Control, Distributed Computing, Nuclear Instruments and Methods, Java Developer's Journal, XML Journal, Web Services Journal, JavaPro Journal, Enterprise Architect Journal and EAI Journal. Currently Dr. Lublinsky works for a large insurance company, where his responsibilities include developing and maintaining SOA strategy and frameworks. He can be reached at email@example.com.
References
1. B. Lublinsky. Defining SOA as an architectural style. IBM developerWorks, January 2007.
2. Ramesh Ranganathan. Managing exceptions in a SOA world. IT Toolbox: Emerging Technologies, September 2005.
3. Sean Fitts. When exceptions are the rule. Achieving reliable and traceable service oriented architectures. SOA/WebServices Journal, September 2005.
4. Peter Abrahams. Resolution accelerator-exception handling for SOA. IT-Director, September 2005.
5. Andy Brodie, Amanda Watkinson. The common event infrastructure: From technical preview to production. IBM developerWorks, April 2005.
6. Russell Butek, Ping Wang. Web services programming tips and tricks: Exception handling with JAX-RPC. IBM developerWorks, February 2004.
Footnote 2: Compare to "islands of data" and "islands of automation".
Commercial Exception Handler
Nice article, we have implemented the features listed as well as making the product highly configurable.
SOAP Fault vs. Specialized Payload
I clearly see the third advantage of using "Specialized Payload": you can use this strategy without SOAP as well. However, when using SOAP I prefer using the SOAP fault for communicating application and process errors to the service consumer. This avoids the need for special exception detection code in the client, since this is already provided by the SOAP stack (at least in the usual Java stacks).
If using "Specialized Payload", it might be worth thinking about using a common data structure "within the payload", not "instead of the payload". This makes it possible to always use the same parser for a normal response and for an error response. Otherwise you need to detect the error situation in a first parsing step in order to unmarshal the response correctly. Although this might be obvious, I have seen real-world services that did not follow this recommendation, so it might be worth considering.
Should definitely be asynchronous
As you mentioned, applications should log the exception messages asynchronously, for performance reasons, using any of the logging APIs.
A good solution for applications using the Log4j framework would be to either write a custom appender or use the existing JMS appender to post exception messages on the message bus.
The logging service can now act as a message consumer and consume messages. Of course, one has to come up with a standard exception logging message schema for all the enterprise applications.
Even better alternative for applications which could be aspectized, using AOP, is to design a logging aspect which could post messages to the message bus. As enterprises are gradually getting comfortable using more AOP based solutions, they might already have several aspect libraries and adding the exception logging aspect to their arsenal will only help them.
But how to handle exceptions now?
I agree with the classification of the exceptions you propose. I agree as well that the best approach is to have some special payload to transport replies (either responses without faults or faults). I even see the use case for centralized logging and exception handling.
But I ask myself: How can I really handle faults with that infrastructure? A real life example would be really helpful. Take business processes as an example: if I have an application or business fault, I may want to react in the business process, not independent of it. Human interaction to resolve the problem may be a human task modeled in that business process (since it may be part of the business process, how to deal with these kinds of faults), but how do I propagate the fault to the business process in the solution you propose?
Would be nice if you could shed some light on your ideas to this...
Thanks a lot and cheers