
A Message Type Architecture for SOA


One of the main objectives of a SOA Governance organization is to define the processes and policies that foster the development of reusable services. As such, a SOA Governance organization is involved across the entire service lifecycle: identification, funding, design, deployment, operation, versioning and retirement.

One key aspect of SOA Governance that is often overlooked is how Data Governance can complement it. Even though the two have very different objectives, they share a set of metadata often called the “Enterprise Data Model” (EDM). An EDM is a Logical Data Model (an ontology, if you will) of the overall information system. Its structure is often abstract and only loosely related to the physical structure of the systems of record. However, every data element stored in any given system of record should be traceable to an element in the EDM. The EDM is often used to construct maps that transform data being synchronized or replicated from one system to another.

In addition to the EDM, Data Governance owns processes that have an impact on Service Design, Operation, Versioning and Consumption: these processes include Data Quality, Metadata Management, Reference Data changes, Business Rules Changes, External Data requirements, Data Model changes…

In this article we will not focus on the necessary alignment between Data and SOA Governance processes. Instead, we focus on the first step that must happen before effective collaboration is possible: the shared use of the Enterprise Data Model.

Ever since XML was invented in the mid 90s, people have argued about the best way to describe the structure of XML documents, especially when it comes to creating reusable XML fragments. Three camps emerged with different, sometimes diverging, requirements: the Web camp, the Document camp and the Data camp. When it became clear to everyone that DTDs were not going to suffice, the W3C quickly published the XML Schema specification, which is now nearly a decade old; only minor changes are expected in the upcoming version (XML Schema 1.1). Despite criticisms (complexity, shortcomings…) and the development of alternative technologies (Relax NG) or complementary ones (Schematron), XML Schema has been, and will remain, the standard used to describe XML Message Types. Yet no one has really found an efficient way to model an EDM using XML Schema definitions alone. This has helped keep the two disciplines, Data Governance and SOA Governance, separate.

In this article we argue that Message Types should be generated from EDM metadata. We also argue that traditional notations such as XML Schema, ERD or UML are not suited to consuming the EDM for this purpose. We propose two complementary DSLs (Domain Specific Languages): one for the EDM and one for Message Types that reference the elements of the EDM. These DSLs provide a textual notation in which EDM and Message Type definitions can be captured, and they are also well suited to a graphical notation.

An Enterprise Data Model Domain Specific Language

UML and ERDs both have a strong lineage, heavily anchored in concrete M2 metamodels (OO and ER respectively). Both are fundamentally incompatible with the hierarchical nature of XML documents and their associated schema language. Nor does it matter which XML Schema design pattern you use (Salami Slice, Russian Doll, Venetian Blind, or any combination of them): XML Schema cannot effectively serve as the modeling language for an EDM. It is not a modeling language; it is an XML document structure definition that enables you to validate documents. In addition, the complex XML Schema import structures imposed by these patterns break interoperability, as not all frameworks are capable of generating classes from such intricate import structures.

In fact, these three types of data models (OO, ER and hierarchical XML) can only be reliably transformed into one another via an intermediary data model called the Hyper-Graph Data Model.

It is, of course, possible to create XML documents from UML or ER models, and vice versa, based on a series of well-defined (and ad hoc) rules, in particular rules that specify how to process associations. However, these rule-based mappings have never yielded realistic message types. The reasons are simple to understand:

a) Data is relational in nature. This was true 8000 years ago when man invented writing, accounting and contracts; it was true 100 years ago when all information systems were paper based; and it will probably remain true as long as man manipulates data. In an Enterprise Data Model, pretty much every entity is related to every other entity. However, neither UML nor ERD establishes a clear boundary separating the classes that make up an entity from the other entities.

b) Message Type elements often contain a subset of the attributes of the corresponding entity in the EDM. People have tried to create a set of granular, reusable data elements, but that strategy has proven brittle and introduces unnecessary intricacies in the EDM.

We suggest creating a metamodel with semantics specific to the needs of an EDM:

a) An Entity, which defines a scope within which associations (aggregation or composition) will most often result in elements belonging to the same message type (or database table, or report, in other applications of this metamodel). An example of this type of association is the “Customer-Address” relation.

b) An Entity Association, which describes the relationships between Entities. An example would be “Customer-Account”.

This metamodel is well aligned with the concepts of Domain-Driven Design (DDD), which include “Entity”, “Value Object” and “Aggregate”.

If we take an example from the healthcare insurance space, we see that the Member-Address association belongs to the Member scope, while Member and Group (or Coverage) define separate scopes.

Figure 1. A simple UML diagram illustrating the different types of associations

Our EDM DSL grammar is shown in Figure 2. We used the Eclipse Modeling Framework with the OpenArchitectureWare plugin to create a textual DSL. Markus Voelter recently gave an introduction to “Textual DSLs”.

Our DSL allows us to define Entities and Datatypes. An entity can be qualified as “basic”, in which case it can only be referenced within the scope of another entity (via either an aggregation or a composition relationship). This is the equivalent of a value object in DDD. Of course, a basic entity can be referenced by different entities.

The DSL specifies “associations” between entities. These associations may extend an Entity type when you need to specify association properties.

Figure 2. The Enterprise Data Model DSL (defined as an Xtext grammar)

From this Xtext grammar, OpenArchitectureWare generates a couple of Eclipse plugins that allow us to edit models using the syntax specified by the grammar and to transform them into configuration or deployment artifacts.
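To make the metamodel concepts above concrete (entities, “basic” entities that behave like DDD value objects, and associations between entity scopes), here is a minimal Python sketch. It is not the Xtext grammar of Figure 2: the class names, fields and the single `validate` rule (a basic entity may never be an end of an association, since it can only be referenced within another entity's scope) are our own assumptions, applied to the Member, Address and Group entities from the healthcare example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    """An EDM entity. A basic entity plays the role of a DDD value object."""
    name: str
    attributes: list[str] = field(default_factory=list)
    basic: bool = False
    # Basic entities held within this entity's scope (aggregation or composition).
    scoped: list[Entity] = field(default_factory=list)


@dataclass
class Association:
    """A relationship between two entities that each define their own scope."""
    name: str
    source: Entity
    target: Entity
    properties: list[str] = field(default_factory=list)  # optional association properties


def validate(entities: list[Entity], associations: list[Association]) -> list[str]:
    """Report basic entities used outside a scope, i.e. at either end of an association."""
    errors = []
    for assoc in associations:
        for end in (assoc.source, assoc.target):
            if end.basic:
                errors.append(f"{assoc.name} references basic entity {end.name}")
    return errors


# Address lives in Member's scope; Member and Group define separate scopes.
address = Entity("Address", ["street", "city", "zip"], basic=True)
member = Entity("Member", ["memberId", "name"], scoped=[address])
group = Entity("Group", ["groupId", "groupName"])
member_group = Association("Member-Group", member, group)

print(validate([address, member, group], [member_group]))  # []
```

Keeping scoped basic entities inside the owning `Entity`, rather than modeling them as associations, is what later lets a projection reuse them by default.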

One could argue that this metamodel could easily be expressed using a UML Profile, and they would probably be correct. However, the complexity of the UML metamodel itself makes it difficult to create a transformation from a UML class diagram (the Enterprise Data Model expressed graphically) into deployment artifacts. In addition, the UML profile itself has fairly limited value, as it does not offer a particular way to represent the classes scoped within an entity definition; you would have to use a special notation for that purpose.

Overall, there is also a tendency to move away from graphical representations as the main means of editing and managing metadata, and likewise a tendency to move away from XML representations. As Markus Voelter argued in his presentation, the main reason people have not used textual DSLs is simply that there was no easy way to create parsers and syntax-aware editors (including IntelliSense-style completion). Xtext makes it very simple to create the grammar and the rich editor experience that everyone expects (Figure 3).

Figure 3. A sample EDM (from Fig. 1) based on the EDM DSL (Fig. 2)

A Message Type Domain Specific Language

The idea of relating an Enterprise Data Model to message types is not new. For instance, in their book “Applied SOA”, Mike Rosen et al. argue that (Chapter 3):

Information represents the data resources of the organization. Data resides in a variety of different stores, applications, and formats. Different levels of data are used by different levels of SOA constructs. The semantic information model defines the data for business processes and services. The information passed in business processes in the form of documents is based on the semantic information model. The documents provide a form of semantic message between processes and services. The SOA defines the mechanisms for transforming data from its native operational format to the semantic data required for the business processes.

The question is: how can we establish this relationship effectively (Figure 4)?


Figure 4. How can we relate Message Type Definitions to the Enterprise Data Model?

Michael Rosen et al. recommend relying heavily on the EDM (they use the term Semantic Information Model) to design effective Message Types (see “Applied SOA”, Chapter 6, p. 249). The authors argue that one of the core benefits is increased compatibility between new consumers and existing service providers, since the interface footprint is designed with the Enterprise Data Model in mind rather than the boundaries of a back-end system or a particular project. However, the authors do not provide a model to approach this problem, let alone an automated way to perform this task.

The first step, of course, is to define an EDM metamodel, as we saw in the previous section.

Figure 5 details our Message Type Architecture. Once we have an EDM DSL and a Message Type DSL, we are in a position to generate the Message Type Schemas and WSDL files. The resulting schemas are standalone and only referenced by the corresponding WSDL file. There is no need to create complex import structures to achieve any kind of type or element reuse, since reuse comes from the references to the EDM within the message type definitions.

Figure 5. A Message Type Architecture

The message type DSL is shown in Figure 6. The core concept of this DSL is the “Projection” element, used in combination with the scope defined by Entities in the EDM DSL. The Entity scope defines data fields that often go together in a message type, a database table or a class definition, and these scoped elements are what enable “reuse”. For instance, a basic entity “address” may appear in different entities in the EDM. Scoped elements are automatically reused in the message type definition unless they are explicitly “excluded” from the entity’s projection; a projection may exclude any attribute or basic entity from the message type definition. Conversely, it is often desirable for some attributes or basic entities of the entities associated with the base entity to be part of a given message. In the example of Figure 1, the groupID is often part of a message representing member information. In that case, the projection may also “include” elements outside the base entity scope. Through a composition of projections, the model also supports the inclusion of elements that belong to entities not directly associated with the base entity.
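The default-reuse, exclude and include rules can be illustrated with a small resolver. This Python sketch assumes a simplified entity structure (attributes plus scoped basic entities given as a dictionary) and a hypothetical `Projection` type with made-up attribute names; composition of projections, and the check that an included entity is actually associated with the base entity, are left out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    attributes: list[str]
    # Basic entities in this entity's scope, mapped to their attributes.
    scoped: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class Projection:
    base: Entity
    exclude: set[str] = field(default_factory=set)  # attributes or basic entities to drop
    include: list[tuple[Entity, str]] = field(default_factory=list)  # (associated entity, attribute)


def resolve(p: Projection) -> list[str]:
    """Flatten a projection into the EDM element paths that make up a message type."""
    paths = []
    # Everything in the base entity's scope is reused by default...
    for attr in p.base.attributes:
        if attr not in p.exclude:
            paths.append(f"{p.base.name}.{attr}")
    for basic, attrs in p.base.scoped.items():
        if basic not in p.exclude:
            paths.extend(f"{p.base.name}.{basic}.{a}" for a in attrs)
    # ...while elements outside the base scope must be included explicitly,
    # and only elements that exist in the EDM can be projected.
    for entity, attr in p.include:
        if attr not in entity.attributes:
            raise ValueError(f"{entity.name}.{attr} is not defined in the EDM")
        paths.append(f"{entity.name}.{attr}")
    return paths


member = Entity("Member", ["memberId", "name", "ssn"], {"Address": ["street", "city"]})
group = Entity("Group", ["groupId", "groupName"])
member_info = Projection(member, exclude={"ssn"}, include=[(group, "groupId")])
print(resolve(member_info))
```

Raising on unknown attributes mirrors the design decision that a message type may only contain projections of EDM elements.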

This approach completely alleviates the need to manage one-to-one or one-to-many “associations” at the Message Type and XML Schema levels. Elements and attributes from related entities are simply “projected” into the message type. In the example of Figure 4, our DSL can be used to generate XML Schemas where an Orders collection element is a child of the Customer element, and each order may also include its shipment information. This is typically where XML Schemas have failed, since people had to model associations with complex XML Schema imports and references.
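To make the containment idea concrete, here is a Python sketch that emits nested XML Schema element declarations from a projected tree, using only the standard library's `xml.etree.ElementTree`. The `Node` type, the element names and the blanket `xs:string` typing are illustrative assumptions; a real generator would take types and cardinalities from the EDM.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


@dataclass
class Node:
    """A projected entity: its own fields plus related entities nested as children."""
    name: str
    fields: list[str]
    children: list[Node] = field(default_factory=list)
    many: bool = False  # True for the "many" side of a one-to-many association


def emit(parent: ET.Element, node: Node) -> None:
    el = ET.SubElement(parent, f"{{{XS}}}element", name=node.name)
    if node.many:
        el.set("minOccurs", "0")
        el.set("maxOccurs", "unbounded")
    seq = ET.SubElement(ET.SubElement(el, f"{{{XS}}}complexType"), f"{{{XS}}}sequence")
    for name in node.fields:
        ET.SubElement(seq, f"{{{XS}}}element", name=name, type="xs:string")
    for child in node.children:
        emit(seq, child)  # an association becomes plain containment: no imports, no refs


def to_xsd(root: Node) -> str:
    schema = ET.Element(f"{{{XS}}}schema")
    emit(schema, root)
    return ET.tostring(schema, encoding="unicode")


shipment = Node("Shipment", ["carrier", "trackingNumber"])
order = Node("Order", ["orderId", "orderDate"], [shipment], many=True)
customer = Node("Customer", ["customerId", "name"], [Node("Orders", [], [order])])
print(to_xsd(customer))
```

The result is a single standalone schema, which is what allows each message type to be generated and versioned independently.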

Many-to-many relationships (for instance orders and products) are treated with the dependencies element in the EntityArea definition. In that case, the XML Schema generator will have to use key and keyref elements to ensure the proper representation of the association between the message elements. There is no need, however, to manage these elements in the Message Type DSL since this information will be coming from the EDM definition during the XML Schema generation.
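For the many-to-many case, the generator only needs to emit identity constraints. The sketch below builds an `xs:key` on products and an `xs:keyref` from order lines, attached to a hypothetical `OrderMessage` element; the XPath selectors, element names and the `key_constraints` helper are assumptions about how a generated payload might be laid out.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


def key_constraints(entity: str, key_selector: str, key_field: str,
                    ref_selector: str, ref_field: str) -> list[ET.Element]:
    """Build an xs:key on the referenced entity and an xs:keyref pointing at it."""
    key = ET.Element(f"{{{XS}}}key", name=f"{entity}Key")
    ET.SubElement(key, f"{{{XS}}}selector", xpath=key_selector)
    ET.SubElement(key, f"{{{XS}}}field", xpath=key_field)

    keyref = ET.Element(f"{{{XS}}}keyref", name=f"{entity}Ref", refer=f"{entity}Key")
    ET.SubElement(keyref, f"{{{XS}}}selector", xpath=ref_selector)
    ET.SubElement(keyref, f"{{{XS}}}field", xpath=ref_field)
    return [key, keyref]


# Identity constraints follow the type definition inside the enclosing xs:element.
message = ET.Element(f"{{{XS}}}element", name="OrderMessage")
ET.SubElement(message, f"{{{XS}}}complexType")  # Orders and Products content omitted here
message.extend(key_constraints("Product", "Products/Product", "productId",
                               "Orders/Order/OrderLine", "productRef"))
print(ET.tostring(message, encoding="unicode"))
```

Because both constraints hang off the element that contains both collections, each product appears once in the message and order lines refer to it by key.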

The message type may only contain projections, i.e. references to entities, basic entities, associations and attributes of the EDM. There is no provision for adding elements that are not part of the EDM. This is a deliberate design decision: the EDM is supposed to represent the enterprise data model, and every data element stored in the systems of record is supposed to be traceable to an element of the EDM.

Figure 6. A Message Type DSL

The message type itself is composed of:

  • a Verb (GET, NOTIFY, PATCH, CONFIRM, CANCEL, PREPARE, SUBMIT, SHOW or ANY; see the Uniform Interface and Verb definitions in Figure 6)
  • a Noun (the base entity)
  • an indication of whether processing this message is idempotent
  • a payload, which can be of three different types
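The four components of a message type map naturally onto a small record. This Python sketch is our own illustration, not the DSL of Figure 6: the verb list comes from the text (with POST/PROCESS, PUT and DELETE deliberately absent), and the payload check anticipates the payload rules described below (a Query Area goes with GET, an Event Area with NOTIFY, an Entity Area with every other verb).

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Verb(Enum):
    # POST/PROCESS, PUT and DELETE are deliberately absent (no CRUD-style verbs).
    GET = "GET"
    NOTIFY = "NOTIFY"
    PATCH = "PATCH"
    CONFIRM = "CONFIRM"
    CANCEL = "CANCEL"
    PREPARE = "PREPARE"
    SUBMIT = "SUBMIT"
    SHOW = "SHOW"
    ANY = "ANY"


class Payload(Enum):
    QUERY_AREA = "QueryArea"    # query-by-example, carried by GET
    EVENT_AREA = "EventArea"    # occurrence of a state, published with NOTIFY
    ENTITY_AREA = "EntityArea"  # argument of all the other verbs


@dataclass(frozen=True)
class MessageType:
    verb: Verb
    noun: str  # the base entity
    idempotent: bool
    payload: Payload

    def __post_init__(self) -> None:
        expected = {Verb.GET: Payload.QUERY_AREA, Verb.NOTIFY: Payload.EVENT_AREA}
        if self.payload is not expected.get(self.verb, Payload.ENTITY_AREA):
            raise ValueError(f"{self.verb.value} cannot carry a {self.payload.value}")


get_member = MessageType(Verb.GET, "Member", idempotent=True, payload=Payload.QUERY_AREA)
submit_po = MessageType(Verb.SUBMIT, "PurchaseOrder", idempotent=False, payload=Payload.ENTITY_AREA)
```

Since `Verb` has no `POST` member, `Verb["POST"]` fails with a `KeyError`, so the design decision is enforced by construction rather than by a separate check.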

The first three elements are used to configure the business envelope of the message, such as the one defined by the Open Applications Group. When the XML Schema is generated, all the elements of the envelope are woven into every message type. These elements include a message identifier, sender information, date...

The verbs are a subset of the Open Applications Group verbs and the HTTP verbs. As a design decision, we chose to prevent the use of the POST (or PROCESS in the case of OAGIS), PUT and DELETE verbs, which tend to encourage CRUD interface definitions; CRUD-like interactions tend to create strong coupling between service consumers and service providers.

The payload of the message may be of three different types (again following some of the Open Applications Group design guidelines):

a) a Query Area (expressed as a Query-by-Example, or QBE; in this case the verb is GET)

b) an Event Area (representing the occurrence of a state of the source entity; the event message is published with the NOTIFY verb)

c) an Entity Area (representing the argument of all the other verbs)
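A Query-by-Example payload is simply a partial instance of the projected entity. The following Python sketch shows one plausible way a provider could interpret such a Query Area against records held as dictionaries; the matching rule (equality on every field present, recursion into basic entities such as Address) and the sample data are our assumptions, not an OAGIS definition.

```python
from __future__ import annotations

from typing import Any


def matches(example: dict[str, Any], record: dict[str, Any]) -> bool:
    """Query-by-Example: each field given in the example must match; omitted fields are wildcards."""
    for key, value in example.items():
        if isinstance(value, dict):
            nested = record.get(key)
            if not isinstance(nested, dict) or not matches(value, nested):
                return False
        elif record.get(key) != value:
            return False
    return True


members = [
    {"memberId": "M1", "groupId": "G1", "Address": {"city": "Boston", "zip": "02101"}},
    {"memberId": "M2", "groupId": "G1", "Address": {"city": "Cambridge", "zip": "02139"}},
    {"memberId": "M3", "groupId": "G2", "Address": {"city": "Boston", "zip": "02110"}},
]
# A Query Area for "members of group G1 living in Boston".
query = {"groupId": "G1", "Address": {"city": "Boston"}}
print([m["memberId"] for m in members if matches(query, m)])  # ['M1']
```

Because the example reuses the shape of the projected entity, the Query Area schema can be derived from the same projection as the response, with every element made optional.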

Figure 7. A Message Type Definition

The PATCH verb is used in combination with an entity area and is associated with a REQUEST-(CHANGE)-UPDATE pattern. This pattern has been implemented with DataSets in the .NET world and with SDO (Service Data Objects) in the Java world. The pattern is particularly useful for BPM applications that need to implement human activities whose goal is to change the content of a base entity (and related elements). However, when an action that triggers a state change in the base entity is invoked, it is advisable to use its corresponding verb rather than a generic one like POST. It is important to point out that the proposed metamodels do not readily support the definition of a Change Summary. One might think, however, that a Change Summary schema could be generated from an entity definition with the appropriate algorithm.
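What such an algorithm would need to capture can be illustrated at the instance level: for a PATCH, the path, old value and new value of each modified element. This Python sketch is our own, not SDO's ChangeSummary API; nested dictionaries stand in for basic entities, and lists would be compared as whole values, a deliberate simplification.

```python
from __future__ import annotations

from typing import Any


def change_summary(before: dict[str, Any], after: dict[str, Any],
                   prefix: str = "") -> list[tuple[str, Any, Any]]:
    """List (path, old value, new value) for every leaf that differs between two snapshots."""
    changes: list[tuple[str, Any, Any]] = []
    for key in sorted(before.keys() | after.keys()):
        old, new = before.get(key), after.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            changes.extend(change_summary(old, new, f"{prefix}{key}."))
        elif old != new:
            changes.append((f"{prefix}{key}", old, new))
    return changes


original = {"memberId": "M1", "name": "Ann Lee", "Address": {"city": "Boston", "zip": "02101"}}
edited = {"memberId": "M1", "name": "Ann Lee", "Address": {"city": "Cambridge", "zip": "02139"}}
for path, old, new in change_summary(original, edited):
    print(path, old, "->", new)
```

A generated Change Summary schema would then only need one optional “old value” element per path that the entity projection defines.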

The PREPARE and SUBMIT verbs are examples of commonly used verbs that are worth standardizing across an enterprise within a Uniform Interface. For instance, it is common for a particular Entity such as a purchase order to be “prepared”, i.e. created and updated until it is ready for “submission”, at which point it starts its lifecycle. The response verb to such a submission (or to any other action, for that matter) is “confirm”.

The CANCEL verb is used as a standard action verb which indicates the intent to terminate the lifecycle of the business entity (e.g. Cancel Purchase Order).

The SHOW verb is used for responses, for instance, in response to a GET request.

The CONFIRM verb is used in response to an action request, for instance, in response to a SUBMIT or CANCEL request.

These two verbs (SHOW and CONFIRM) do not exist in REST because REST is RPC oriented and not message oriented. A response in REST has no particular semantics except in the case of Atom collections. REST does not make any distinction between a "technical" acknowledgement and a "business" acknowledgement.  
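The request/response pairs described above can be expressed as a small dispatch table. This Python sketch uses a hypothetical purchase-order lifecycle (the state names `draft`, `submitted` and `cancelled` are our own) in which PREPARE may be repeated, SUBMIT starts the lifecycle, CANCEL terminates it, every accepted action is answered with CONFIRM, and GET is answered with SHOW.

```python
from __future__ import annotations

# Hypothetical purchase-order lifecycle: state -> {action verb: next state}.
TRANSITIONS = {
    "draft": {"PREPARE": "draft", "SUBMIT": "submitted", "CANCEL": "cancelled"},
    "submitted": {"CANCEL": "cancelled"},
    "cancelled": {},
}


def handle(state: str, verb: str) -> tuple[str, str]:
    """Return (next state, response verb) for a request against a purchase order."""
    if verb == "GET":
        return state, "SHOW"  # a query is answered with SHOW and never changes state
    next_state = TRANSITIONS[state].get(verb)
    if next_state is None:
        raise ValueError(f"{verb} is not allowed in state {state!r}")
    return next_state, "CONFIRM"  # every accepted action is answered with CONFIRM


state = "draft"
for verb in ("PREPARE", "PREPARE", "SUBMIT", "GET", "CANCEL"):
    state, response = handle(state, verb)
    print(verb, "->", response, "/", state)
```

The response verb carries business meaning (“your submission was accepted”), which is the distinction from a purely technical acknowledgement drawn above.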

This Message Type DSL can then be used to generate all Message Type Schemas. Certain aspects, such as versioning or a business envelope, can be woven in by the target schema generator (Figure 5). A business envelope wraps the message payload within an envelope that contains information relevant to the interaction and to the parties or components involved in it. The Open Applications Group, for instance, uses a “BOD” (Business Object Document) for that purpose. We highly recommend using this business envelope pattern rather than developing proprietary SOAP headers. SOAP headers should only be used by service containers that need to implement policies or qualities of service (security, reliability, transaction...).
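At the instance level, the woven envelope places interaction metadata next to, not inside, the payload. In this Python sketch the element names (`Envelope`, `MessageId`, `Sender`, `CreationDateTime`, `Verb`, `Noun`, `DataArea`) and the `wrap` helper are hypothetical, only loosely inspired by the OAGIS BOD structure; they are not the OAGIS schema.

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def wrap(verb: str, noun: str, sender: str, payload: ET.Element) -> ET.Element:
    """Wrap a payload in a hypothetical business envelope (not the OAGIS schema)."""
    message = ET.Element(f"{verb.capitalize()}{noun}")  # e.g. SubmitPurchaseOrder
    envelope = ET.SubElement(message, "Envelope")
    ET.SubElement(envelope, "MessageId").text = str(uuid.uuid4())
    ET.SubElement(envelope, "Sender").text = sender
    ET.SubElement(envelope, "CreationDateTime").text = datetime.now(timezone.utc).isoformat()
    # Verb and noun travel in the envelope, outside the payload, so events and actions
    # can be monitored without parsing the business data.
    ET.SubElement(envelope, "Verb").text = verb
    ET.SubElement(envelope, "Noun").text = noun
    ET.SubElement(message, "DataArea").append(payload)
    return message


order = ET.Element("PurchaseOrder")
ET.SubElement(order, "orderId").text = "PO-42"
print(ET.tostring(wrap("SUBMIT", "PurchaseOrder", "purchasing-app", order), encoding="unicode"))
```

None of this information lives in SOAP headers, which stay available to the service container for security, reliability and similar policies.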

You may also want to go all the way and create a Service Interface DSL, such as the one in Figure 8, which can be used to generate abstract WSDLs (without bindings or service endpoints).

Figure 8. A Service Interface DSL well suited to generate abstract WSDL artifacts
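As a rough illustration of what “abstract” means here, this Python sketch emits only WSDL 1.1 `message` and `portType` elements from a list of request/response message type names. The service name, target namespace and the `abstract_wsdl` helper are hypothetical, and the `wsdl:types` section that would reference the generated standalone schemas is omitted for brevity.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

WSDL = "http://schemas.xmlsoap.org/wsdl/"
ET.register_namespace("wsdl", WSDL)


def abstract_wsdl(service: str, tns: str, operations: list[tuple[str, str]]) -> ET.Element:
    """Emit WSDL 1.1 messages and a portType only: no binding, no service endpoint."""
    # The tns prefix is used inside attribute values, so it is declared explicitly.
    defs = ET.Element(f"{{{WSDL}}}definitions",
                      {"name": service, "targetNamespace": tns, "xmlns:tns": tns})
    port_type = ET.Element(f"{{{WSDL}}}portType", name=service)
    declared: set[str] = set()
    for request, response in operations:
        for msg in (request, response):
            if msg not in declared:  # several actions may share one Confirm message
                declared.add(msg)
                message = ET.SubElement(defs, f"{{{WSDL}}}message", name=msg)
                ET.SubElement(message, f"{{{WSDL}}}part", name="body", element=f"tns:{msg}")
        op = ET.SubElement(port_type, f"{{{WSDL}}}operation", name=request)
        ET.SubElement(op, f"{{{WSDL}}}input", message=f"tns:{request}")
        ET.SubElement(op, f"{{{WSDL}}}output", message=f"tns:{response}")
    defs.append(port_type)  # portType follows the message declarations
    return defs


wsdl = abstract_wsdl("PurchaseOrderService", "urn:example:purchasing", [
    ("SubmitPurchaseOrder", "ConfirmPurchaseOrder"),
    ("CancelPurchaseOrder", "ConfirmPurchaseOrder"),
    ("GetPurchaseOrder", "ShowPurchaseOrder"),
])
print(ET.tostring(wsdl, encoding="unicode"))
```

Bindings and endpoints can then be added per deployment without touching the message types.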

We will not detail here how XML Schemas are generated from the Message Type definitions; this topic warrants its own article. It is important to note, however, that our approach to Message Type generation enables two important behaviors:

  • First, any Message Type XML Schema can be generated individually. This means that when a change occurs in the EDM, we can selectively regenerate XML Schemas (from the same message type definitions). You may want to keep track of the EDM version with which each schema was generated. By contrast, when complex XML Schema imports are used, a change in one of the imported schemas propagates to all message type schemas.
  • Second, our approach does not require a “Canonical Schema Model” (CSM) between service providers and service consumers. A CSM, by default, forces two transformations in the path of every message (both request and response). Modern SOA concepts advocate “smart” edges that are capable of transforming incoming and outgoing messages. If a message is “produced” and “consumed” by a single back-end system, there is no reason to transform it to a canonical format; it is just as easy to transform it on the consumer edge from whatever the provider schema is.
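Selective regeneration only requires knowing which EDM entities each message type projects. This Python sketch assumes a hypothetical `GeneratedSchema` record that stores those entities together with the EDM version each schema was generated from; the message type names and versions are made up for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class GeneratedSchema:
    message_type: str
    entities: frozenset[str]  # EDM entities referenced by the message type's projections
    edm_version: str          # EDM version the schema was generated from


def regenerate(schemas: list[GeneratedSchema], changed: set[str],
               new_version: str) -> list[GeneratedSchema]:
    """Regenerate only the schemas whose projections touch a changed entity."""
    return [replace(s, edm_version=new_version) if s.entities & changed else s
            for s in schemas]


schemas = [
    GeneratedSchema("GetMember", frozenset({"Member", "Address"}), "1.0"),
    GeneratedSchema("NotifyEnrollment", frozenset({"Member", "Group"}), "1.0"),
    GeneratedSchema("SubmitClaim", frozenset({"Claim"}), "1.0"),
]
for s in regenerate(schemas, {"Group"}, "1.1"):
    print(s.message_type, s.edm_version)
```

Only `NotifyEnrollment` moves to the new EDM version here; with a shared import structure, the same change would have touched every schema.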

Jack van Hoof’s statement:

In a technical sense [a Canonical Schema Model]  has a benefit in that at the endpoints only one transformation service per message type has to be configured. A subscriber needs to subscribe to only one message type, regardless whether there are multiple sources or not.

does not really apply to SOA and the Service Provider/Consumer pattern. Jack’s recommendation makes sense in synchronization/replication scenarios where the endpoints are in effect the systems of record themselves, but the main focus of SOA is precisely to avoid synchronization/replication patterns by not creating new systems of record when building new solutions. SOA is not Integration, though it shares technologies with Integration. SOA is about creating services that encapsulate existing systems of record, such that new solutions can be developed by consuming these services without duplicating information from other systems of record (Figure 9). When information is not duplicated, it does not need to be synchronized or replicated.

In a Service Oriented Architecture, the service interface is “the” canonical model (Figure 9). It isolates service consumers from the systems of record. When a service is well designed, all consumers invoke that particular service, and the service, in turn, invokes all the necessary back-end systems. Introducing a “Canonical Schema Model” above the service interface is, in our opinion, superfluous. Some may argue that without it service interfaces will not be consistent, as each developer will create their own semantics. We certainly agree with this concern; this is why our approach is based on defining Message Types from the EDM definition. The Message Type DSL gives you some flexibility to deviate from the EDM semantics, but doing so requires more work. By default it uses the EDM semantics and structure, which contributes to consistent message types across service interfaces.

At the same time, it is unlikely that a service consumer will need to interact with several different services providing the same information. This can happen in B2B scenarios, of course, and also in large organizations, but it is generally undesirable, as it may simply mean that the service interface was designed incorrectly.

Figure 9. Services supporting new solutions with existing systems of record. The figure represents the footprint of Entities in each system of record, behind the service interface

A CSM also runs counter to many SOA versioning principles, as it “forces” all endpoints to conform to the same “canonical” schema. As we have seen, this is not practical: without the proper versioning strategy, organizations often end up creating one service interface per consumer, thereby defeating the purpose of a CSM. SOA technologies (such as XML or WSDL) were actually designed to alleviate the need to always conform to a CSM, which was a key design pattern for Integration Platforms in the 90s. In the end, that pattern propagated too many changes to endpoints that would not have needed to change had contracts been designed in a compatible way.

There are also some important benefits stemming from a CSM. For instance, Jack emphasizes that:

Defining canonical message formats creates the opportunity to supply the company with an unambiguous catalog of available messages about business events, representing valuable business assets.

Nick Malik calls this a “Business Event Ontology”.

This capability is critical to Business Activity Monitoring and Complex Event Processing, for instance. We provide it with the business envelope concept that we introduced in the message type definition. In our model, Events and Actions are clearly expressed and agreed upon at the enterprise level, outside the message payload.

Event-Driven Architecture (EDA) is appropriate when you want to integrate silos, i.e. autonomous information systems, but EDA principles do not generally apply to SOA. Events (defined as the occurrence of a state) and Message Events are of course an important part of a Service Oriented Architecture, but EDA alone cannot be the foundation of a Composite Programming Model.


The management of Message Types in a Service Oriented Architecture is a complex topic. Approaches have ranged from creating complex XML Schema import structures to using “Data Contracts” expressed as a set of OO classes and annotations (used as a DSL). None of these approaches seems to have given satisfactory results, as they lack the backbone of an Enterprise Data Model. The approach presented here reinforces the need to start designing service interfaces from the “contract” perspective (not the code) and establishes a reuse strategy founded on the EDM as a key enterprise asset.

Our approach also surfaces the synergy between Data Governance and SOA Governance. As new services are discovered, funded and implemented, it is critical to rely on the Enterprise Data Model to design message types that have an enterprise footprint. Without an EDM, service interfaces tend to be designed with a footprint specific to projects and back-end systems, reducing their ability to be reused by other consumers. The approach also enables Data Governance to effectively communicate changes to the Enterprise Data Model to the SOA Governance team, which will trigger a new versioning phase in the service lifecycles when necessary.

The author would like to thank Kjell-Sverre Jerijærvi and Boris Lublinsky for very stimulating discussions which contributed to this paper.
