Nothing Is Permanent Except Change - How Software Architects Can Embrace Change
Michael Stal discusses system architecture quality, how to avoid architectural erosion, how to deal with refactoring, and design principles for architecture evolution.

Posted by Hari Poolla on Sep 13, 2010
Thorough analysis of error handling requirements during the SOA analysis and design phases is key to designing and implementing services correctly. When detailed requirements that identify error handling scenarios are missing, or when teams do not understand how to incorporate them into SOA analysis and design, development efforts tend to design services for happy-path functionality first. Such an approach can lead to significant project cost overruns, because adding error handling considerations later requires considerable rework and may force the redesign of some components.
This paper looks at the error handling considerations associated with the design of re-usable services. It outlines which considerations apply during the SOA analysis and design phases and describes some best practices for designing them in, so that services are designed and implemented completely.
Unlike in monolithic applications, error handling is a significant step in the design of SOA applications, because SOA applications integrate heterogeneous IT systems across organizational boundaries, including vendor and partner IT assets. Focusing on error handling early in the analysis and design phases ensures that appropriate error handling standards and guidelines are put in place for modules on different platforms. This paper identifies common error handling considerations that architects and designers need to address while going through the SOA solution design. SOA analysis and design tasks are broadly classified into three major phases, namely Service Identification, Service Specification and Service Realization, as identified in Service Oriented Modeling and Architecture by Ali Arsanjani. The rest of this paper is organized around the error handling considerations that apply to these three phases.
The goal of service identification is to come up with a candidate service portfolio from which the re-usable service portfolio is identified. This phase involves analysis of the business artifacts package, which includes key requirements, business goals, capability models, the Business Process Analysis Model (BPAM), use cases, etc.
Errors are broadly classified into two types: business errors, which are recoverable, and system errors, which are non-recoverable.
Analyzing the business artifacts package provides many opportunities to discover business errors associated with services. If there are existing assets for a business service, their component interfaces can be used to discover additional business errors that top-down analysis would otherwise miss. Business errors are what are referred to as recoverable errors. Once the service portfolio is in its final draft stages, evaluate the re-usable services for the following error handling considerations:
The attributes that define the business errors can either go into the service contract or be packaged into the service response, as needed.
The Service Specification phase consists of tasks that define input and output messages, service and operation names, schemas, service composition, non-functional requirements and other service characteristics, such as sync/async and invocation style, for the services that are marked to be exposed.
Common service characteristics that are related to error handling are:
Error propagation to service consumers can be accomplished in so many different ways that it is important to make an architecture design decision that chooses the most appropriate style for the enterprise.
The two most popular choices for returning error information are SOAP faults and custom error payloads. While each approach has its own pros and cons, the choice mostly comes down to the existing development and runtime platforms. Services implemented on web services oriented platforms find SOAP faults a natural choice because a lot of web services tooling supports them, while custom payloads are better suited to services implemented on more traditional message oriented middleware (MOM) platforms. For a deeper discussion of this subject, refer to the comparison of the two approaches by Boris Lublinsky. In general, non-recoverable system errors are better returned as SOAP faults, given the varying degrees of client tooling and server support for the various profiles, while recoverable business errors are better described using custom error payloads, because custom error schemas are flexible and extensible. However, custom error payloads require the service consumer to do additional client-side handling: it must parse each response message to determine whether the invocation was successful.
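To make the trade-off concrete, here is a minimal Java sketch rather than a definitive implementation. All names (`BusinessError`, `TransferResponse`, `ServiceFaultException`, `TransferService`) are hypothetical and not taken from any framework. An unchecked exception stands in for a SOAP fault, and a response envelope carries recoverable business errors as a custom payload.

```java
import java.util.List;

// Hypothetical business error entry carried inside a normal response payload.
final class BusinessError {
    final String code;
    final String description;

    BusinessError(String code, String description) {
        this.code = code;
        this.description = description;
    }
}

// Hypothetical response envelope: the call itself returns normally even when
// the business operation was rejected, so the consumer must inspect `errors`.
final class TransferResponse {
    final String confirmationId;      // null when the request was rejected
    final List<BusinessError> errors; // empty when the request succeeded

    TransferResponse(String confirmationId, List<BusinessError> errors) {
        this.confirmationId = confirmationId;
        this.errors = errors;
    }

    boolean isSuccessful() {
        return errors.isEmpty();
    }
}

// Stands in for a SOAP fault: a non-recoverable system error.
class ServiceFaultException extends RuntimeException {
    ServiceFaultException(String message) {
        super(message);
    }
}

final class TransferService {
    // Business rule violations travel in the payload; system failures become faults.
    static TransferResponse transfer(long amountCents, long balanceCents, boolean backendUp) {
        if (!backendUp) {
            throw new ServiceFaultException("Ledger backend unavailable");
        }
        if (amountCents > balanceCents) {
            return new TransferResponse(null, List.of(
                new BusinessError("INSUFFICIENT_FUNDS", "Balance is lower than the requested amount")));
        }
        return new TransferResponse("CONF-1", List.of());
    }
}
```

A consumer of `TransferService.transfer` has to call `isSuccessful()` on every response, which is the additional client-side handling that custom payloads impose. The fault path needs no such check because it interrupts the normal flow.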
Identify metadata and common schemas to describe errors consistently across the enterprise. Common attributes could include date, time, error code, description, severity level, message source, correlation id, etc. Thorough analysis of this metadata is very useful for setting up effective service monitoring.
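As an illustration, the attributes listed above could be captured in one shared type. This is only a sketch under assumptions: the class name `ErrorInfo`, the `Severity` levels and the pipe-delimited log format are hypothetical choices, not an enterprise standard.

```java
import java.time.Instant;

// Hypothetical severity scale; a real enterprise standard may define other levels.
enum Severity { INFO, WARNING, ERROR, CRITICAL }

// One common shape for error metadata, shared by every service.
final class ErrorInfo {
    final Instant timestamp;     // date and time of the error
    final String errorCode;
    final String description;
    final Severity severity;
    final String messageSource;  // component or service that raised the error
    final String correlationId;  // links related log entries across platforms

    ErrorInfo(Instant timestamp, String errorCode, String description,
              Severity severity, String messageSource, String correlationId) {
        this.timestamp = timestamp;
        this.errorCode = errorCode;
        this.description = description;
        this.severity = severity;
        this.messageSource = messageSource;
        this.correlationId = correlationId;
    }

    // A single consistent line format makes monitoring tools easier to configure.
    String toLogLine() {
        return String.join("|", timestamp.toString(), severity.name(), errorCode,
                messageSource, correlationId, description);
    }
}
```

Keeping the fields in one type means that monitoring rules, for example alerting on `CRITICAL` entries from a given message source, can be written once and applied to every service.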
The Service Realization phase is where the service model is mapped to the service component and runtime/deployment models. This step typically involves designing service components, allocating the components to SOA stack layers, choosing component interaction styles and runtime platforms, and making architectural design decisions (ADD). The discussion that follows focuses on best practices for implementing error handling considerations in the three layers of a typical enterprise SOA stack: business processes or choreography, mediation/bus, and component layers, as highlighted in the figure below.

Components deployed to this layer implement business process flows or choreographies. The following error handling considerations apply here:
The Enterprise Service Bus (ESB) layer is at the core of a typical enterprise SOA stack. This layer supports the transformation and routing capabilities required of the enterprise re-usable services. Components in this layer provide a well-defined interface to the various provider implementations, such as existing underlying assets and partner or vendor services, by applying appropriate message and protocol transformations. Error handling by the mediation components mostly involves transforming the provider error structures into well-defined error structures defined in the context of the business domain. These components can also apply complex transformation and mapping rules to the errors returned from the backend functional components, in order to provide simpler error information to the service consumers within the enterprise.
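ESB products usually express this kind of error transformation as configuration. Purely to illustrate the idea, the mapping is shown here as a plain Java lookup. The provider codes, enterprise codes and class names are all hypothetical.

```java
import java.util.Map;

// Error structure defined in the context of the business domain.
final class EnterpriseError {
    final String code;
    final String message;

    EnterpriseError(String code, String message) {
        this.code = code;
        this.message = message;
    }
}

final class ErrorMediator {
    // Hypothetical mapping from provider-specific codes to enterprise codes.
    private static final Map<String, EnterpriseError> MAPPING = Map.of(
        "LEGACY-017", new EnterpriseError("BUS-DUPLICATE", "The record already exists"),
        "VENDOR-404", new EnterpriseError("BUS-NOT-FOUND", "The requested item was not found"));

    // Anything the mapping does not know is reported as a generic system error,
    // so provider internals never leak to service consumers.
    private static final EnterpriseError FALLBACK =
        new EnterpriseError("SYS-UNKNOWN", "An unexpected provider error occurred");

    static EnterpriseError mediate(String providerCode) {
        return MAPPING.getOrDefault(providerCode, FALLBACK);
    }
}
```

Consumers then only ever see the enterprise codes, regardless of which backend or vendor produced the original error.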
Many of the error handling considerations mentioned for this layer could also be implemented in the component layer. But there are a number of ESBs and frameworks on the market that do these things in a far more configurable and flexible manner than individual platform developers could achieve in their functional component implementations. Separating these error handling mediation concerns into the ESB layer relieves platform developers from having to satisfy a variety of error handling considerations and lets them focus on implementing the business functionality, resulting in greater developer productivity.
Error handling by the components in this layer includes handling abnormal execution conditions, such as non-availability of a resource, or runtime conditions that the component is not programmed to handle or that violate its logic. Components are required to handle such events in order to notify client programs, and to do appropriate logging to facilitate troubleshooting and service monitoring. In the Java programming language, such events are thrown as exceptions, and the API provides two different types of exceptions: checked and unchecked. Checked exceptions inherit from the Exception class and are used to handle recoverable errors such as business error scenarios. Unchecked exceptions, which are descendants of the RuntimeException class, are the ideal candidates for handling non-recoverable errors such as resource non-availability or null pointers.
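The distinction can be sketched as follows. The exception and component names are hypothetical, and the boolean flag merely simulates a resource outage.

```java
// Checked: a recoverable business error that callers must catch or declare.
class InsufficientFundsException extends Exception {
    InsufficientFundsException(String message) {
        super(message);
    }
}

// Unchecked: a non-recoverable system error that callers are not forced to handle.
class ResourceUnavailableException extends RuntimeException {
    ResourceUnavailableException(String message) {
        super(message);
    }
}

final class AccountComponent {
    // Returns the new balance. `databaseUp` simulates resource availability.
    static long withdraw(long balance, long amount, boolean databaseUp)
            throws InsufficientFundsException {
        if (!databaseUp) {
            throw new ResourceUnavailableException("Account database is not reachable");
        }
        if (amount > balance) {
            throw new InsufficientFundsException(
                "Requested " + amount + " but only " + balance + " available");
        }
        return balance - amount;
    }
}
```

The compiler forces every caller of `withdraw` to deal with `InsufficientFundsException`, which suits an error the caller can recover from. `ResourceUnavailableException` propagates until some outer layer chooses to catch it.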
The second part of component-level error handling is appropriate logging. It is good practice to log as close as possible to the source where the error occurred. When components throw application errors, they can log the exception at the appropriate interface within the component boundaries and then throw it. Using correlation ids to identify events, and passing them on to calling applications, greatly enhances error tracking and monitoring by linking logs across different platforms.
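A minimal sketch of this log-then-throw pattern, using the standard `java.util.logging` API. The component, the exception type and the `correlationId=` message format are hypothetical.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Carries the correlation id to the caller so its logs can be linked to ours.
class TrackedException extends RuntimeException {
    final String correlationId;

    TrackedException(String correlationId, String message, Throwable cause) {
        super(message, cause);
        this.correlationId = correlationId;
    }
}

final class QuoteComponent {
    private static final Logger LOG = Logger.getLogger(QuoteComponent.class.getName());

    static String priceQuote(String productId, String correlationId) {
        try {
            return lookupPrice(productId);
        } catch (IllegalStateException e) {
            // Log once, at the component boundary closest to the failure...
            LOG.log(Level.SEVERE,
                "correlationId=" + correlationId + " price lookup failed for '" + productId + "'", e);
            // ...then rethrow with the same id attached.
            throw new TrackedException(correlationId, "Price lookup failed", e);
        }
    }

    // Stand-in for a backend call; fails when no product id is supplied.
    private static String lookupPrice(String productId) {
        if (productId.isEmpty()) {
            throw new IllegalStateException("No product id supplied");
        }
        return "100.00";
    }
}
```

Because the id travels with the exception, the calling application can write it into its own log entry, and a search for that id finds both records.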
Designing appropriate fault-tolerant mechanisms to maintain ACID (Atomicity, Consistency, Isolation and Durability) properties in process flows is a big challenge in designing SOA solutions. These solutions typically involve business processes that invoke services spanning multiple platforms, interaction styles and resource providers. It is more than likely that not all services participating in a business process are transactional. If any particular transaction in a process flow fails, appropriate recovery implementations must be designed to preserve data integrity. Transaction rollback and compensation transactions are two approaches aimed at solving this problem.
Transaction rollbacks can be implemented by coordinating the transactions through a transaction monitor, provided the business process spans a confined domain and the resources are all transactional. If the business process is more complicated, failure to complete it not only requires rollbacks to bring the data back to a consistent state but might also require the process to invoke compensation transactions, such as sending notifications or invoking reversal actions on some of the previous service invocations. It is beyond the scope of this paper to elaborate on these topics. Readers are encouraged to refer to the upcoming web services standards in this space from OASIS: WS-Coordination (WS-C) and WS-Transaction (WS-T).
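The compensation idea can be illustrated with a small in-process sketch. It does not implement WS-Coordination or WS-Transaction, and the names `ProcessStep` and `CompensatingProcess` are hypothetical. Each step pairs an action with the reversal action that undoes it. When a step fails, the steps that already completed are compensated in reverse order.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical process step: an action plus the compensation that reverses it.
final class ProcessStep {
    final String name;
    final Runnable action;
    final Runnable compensation;

    ProcessStep(String name, Runnable action, Runnable compensation) {
        this.name = name;
        this.action = action;
        this.compensation = compensation;
    }
}

final class CompensatingProcess {
    // Runs the steps in order and returns true if all of them succeed.
    // On the first failure, compensates the completed steps, newest first,
    // and returns false. The failed step itself is not compensated.
    static boolean run(List<ProcessStep> steps) {
        Deque<ProcessStep> completed = new ArrayDeque<>();
        for (ProcessStep step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException failure) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }
}
```

For a funds transfer, a debit step would be paired with a refund as its compensation. If the subsequent credit step fails, only the refund runs. A production design would also need to handle a compensation that itself fails, which this sketch leaves out.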
This paper gives SOA architects techniques for discovering error handling requirements from the business artifacts package and for analyzing them during the SOA analysis and design phases. It also provides some best practices for implementing error handling in the three layers of SOA, namely the orchestration, mediation and component layers.
A thorough upfront analysis of the various error handling considerations helps architects make the right decisions during the design and implementation phases, and when choosing platforms and SOA stack products.
Hari Poolla is a SOA practitioner at a large insurance and financial services company in the Midwest, USA. He focuses on application architecture, BPM, solution design, integration and collaboration in the enterprise applications space, specializing in building enterprise re-usable services. He provides expertise in enterprise SOA adoption roadmaps and in designing custom SOA solution development methodologies. He is an IBM certified SOA solution designer and a Sun certified enterprise architect for the Java 2 platform. He has been part of the architecture and implementation of a SOA-based business solution for electronically moving funds (EFT) between different lines of business. He can be reached at hari_poolla@yahoo.com.
Design of error management is very important in SOA projects since by nature they cross existing software/system boundaries. I'm looking for implementation best practices: how to manage SOAP errors, BPM process errors, REST errors, non-recoverable errors, etc.
Here is an example using utility services
www.perficient.com/Solutions-and-Services/Busin...