One of the main objectives of a SOA Governance organization is to define the processes and policies that foster the development of reusable services. As such, a SOA Governance organization is involved across the entire service lifecycle: identification, funding, design, deployment, operation, versioning and retirement.
One key aspect of SOA Governance that is often overlooked is how Data Governance can complement it. Even though the two disciplines have very different objectives, they share a set of metadata often called the “Enterprise Data Model” (EDM). An EDM is a Logical Data Model, an Ontology if you will, of the overall Information System. Its structure is often abstract and only loosely related to the physical structure of the systems of record. However, every data element stored in any given system of record should be traceable to an element of the EDM. The EDM is often used to construct maps that transform data being synchronized or replicated from one system to another.
In addition to the EDM, Data Governance owns processes that have an impact on Service Design, Operation, Versioning and Consumption: Data Quality, Metadata Management, Reference Data changes, Business Rule changes, External Data requirements, Data Model changes…
In this article we will not focus on the necessary alignment between Data and SOA Governance processes. Instead, we will focus on the step that must happen before any effective collaboration can take place: the shared use of the Enterprise Data Model.
Ever since XML was invented in the mid-90s, people have argued about the best way to describe the structure of XML documents, especially when it comes to creating reusable XML fragments. Three camps emerged with different, sometimes diverging, requirements: the Web camp, the Document camp and the Data camp. When it became clear to everyone that DTDs were not going to suffice, the W3C quickly published the XML Schema specification, which is now nearly a decade old; only minor changes are expected in the coming version (XML Schema 1.1). Despite the criticisms (complexity, shortcomings…) and the development of alternative technologies (Relax NG) or complementary ones (Schematron), XML Schema has been, and will remain, the standard used to describe XML Message Types. Yet no one has really found an efficient way to model EDMs using XML Schema definitions alone. This has contributed to keeping the two disciplines, Data Governance and SOA Governance, separate.
In this article we will argue that Message Types should be generated from EDM metadata. We will also argue that traditional representations such as XML, ERD or UML are not suited to making the EDM consumable for this purpose. We will propose two complementary DSLs (Domain Specific Languages): one for the EDM, and one for the Message Types that reference elements of the EDM. These DSLs will be used to generate a textual notation in which EDM and Message Type definitions can be captured. They are also well suited to the creation of a graphical notation.
An Enterprise Data Model Domain Specific Language
UML and ERDs both have a strong lineage, heavily anchored in concrete M2 metamodels (OO and ER respectively). Both models are actually incompatible with the hierarchical nature of XML documents and their associated schema language. Nor can XML Schema itself be used effectively to model an EDM, whichever pattern you use (salami slice, Russian doll or Venetian blind, or any combination of them): it is not a modeling language, but an XML document structure definition that enables you to validate documents. In addition, the complex XML Schema import structures imposed by these patterns break interoperability, as not all frameworks are capable of generating classes from such intricate import structures.
In fact, these three types of data models can only be reliably transformed into one another via an intermediary data model called the Hyper-Graph Data Model.
It is, of course, possible to create XML documents from UML or ER models, and vice versa, based on a series of well-defined (and ad hoc) rules, in particular rules that specify how to process associations. However, these rule-based mappings have never yielded realistic message types. The reasons are simple to understand:
a) Data is relational in nature. This was true 8,000 years ago, when man invented writing, accounting and contracts; it was true 100 years ago, when all information systems were paper-based; and it will probably remain true as long as man manipulates data. In an Enterprise Data Model, pretty much every entity is related to every other entity. However, neither UML nor ERD establishes a clear boundary between the classes that make up an entity and the other entities.
b) Message Type elements often contain only a subset of the attributes of the corresponding entity in the EDM. People have tried to create sets of granular, reusable data elements, but that strategy has proven brittle and introduces unnecessary intricacies into the EDM.
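To make reason a) concrete, here is a minimal sketch, assuming a hypothetical and much simplified EDM in which each entity simply lists the entities it is associated with (the Claim entity and the helper name naive_message_content are illustrative, not part of the article's model). A mapping that follows every association, as a purely rule-based UML/ERD-to-XML translation tends to, pulls the whole model into a single message:

```python
from collections import deque

# Hypothetical, simplified EDM: each entity lists the entities it is associated with.
# Entity names are illustrative (Claim is not part of the article's Figure 1).
EDM_ASSOCIATIONS: dict[str, list[str]] = {
    "Member": ["Address", "Group"],
    "Address": [],
    "Group": ["Member", "Coverage"],
    "Coverage": ["Group", "Claim"],
    "Claim": ["Member", "Coverage"],
}


def naive_message_content(root: str) -> set[str]:
    """Collect every entity reachable from root by following all associations
    (breadth-first), which is what a purely rule-based mapping tends to do."""
    seen = {root}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for target in EDM_ASSOCIATIONS.get(current, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Starting from Member, the "message" ends up containing the entire model.
print(sorted(naive_message_content("Member")))
```

Without an explicit scope boundary, nothing tells the mapping where a Member message should stop.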
We suggest creating a metamodel with semantics specific to the needs of an EDM:
a) An Entity, which defines a scope: elements associated with the entity through aggregation or composition will most often end up in the same message type (or database table, or report, for other applications of this metamodel). An example of this type of association is the “Customer-Address” relation.
b) An Entity Association, which describes the relationship between Entities. An example would be “Customer-Account”.
This metamodel is well aligned with the concepts of Domain-Driven Design (DDD), which include “Entity”, “Value Object” and “Aggregate”.
If we take an example in the healthcare insurance space, we see that the Member-Address association belongs to the Member scope while Member and Group (or Coverage) define separate scopes.
Figure 1. A simple UML diagram illustrating the different types of associations
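The two metamodel concepts can be sketched in Python as follows. This is only an illustration of the Figure 1 example, not the Xtext grammar of Figure 2; the class names, the attribute names (memberID, SSN, plan...) and the scope helper are assumptions. Composed basic entities such as Address fall inside the owning entity's scope, while Entity Associations such as Member-Group link separate scopes:

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    attributes: list[str] = field(default_factory=list)
    basic: bool = False  # a "basic" entity plays the role of a DDD value object
    parts: list["Entity"] = field(default_factory=list)  # composed/aggregated: inside this scope


@dataclass
class EntityAssociation:
    source: Entity
    target: Entity  # an association crosses a scope boundary; it never merges scopes


def scope(entity: Entity) -> list[str]:
    """Qualified names of every element that lives inside an entity's scope."""
    names = [f"{entity.name}.{attr}" for attr in entity.attributes]
    for part in entity.parts:
        names += [f"{entity.name}.{name}" for name in scope(part)]
    return names


# Hypothetical rendering of the Figure 1 example.
address = Entity("Address", ["street", "city"], basic=True)
member = Entity("Member", ["memberID", "name", "SSN"], parts=[address])
group = Entity("Group", ["groupID", "groupName"])
coverage = Entity("Coverage", ["coverageID", "plan"])
associations = [EntityAssociation(member, group), EntityAssociation(group, coverage)]

print(scope(member))  # Address fields appear; Group fields do not
```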
Our EDM DSL grammar is shown in Figure 2. We used the Eclipse Modeling Framework with the OpenArchitectureWare plugin to create a textual DSL. Markus Voelter recently gave an introduction to “Textual DSLs”.
Our DSL allows us to define Entities and Datatypes. An entity can be qualified as “basic”, in which case it can only be referenced within the scope of another entity (through either an aggregation or a composition relationship). This is the equivalent of a Value Object in DDD. Of course, a basic entity can be referenced by several different entities.
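The rule on basic entities lends itself to a simple model check. The sketch below assumes a hypothetical in-memory representation (EdmModel, check_basic_entities) rather than the model actually generated from the Xtext grammar; it rejects a basic entity used as either end of an Entity Association, while allowing the same basic entity to be composed by several entities:

```python
from dataclasses import dataclass, field


@dataclass
class EdmModel:
    basic_entities: set[str] = field(default_factory=set)
    # (owner, part) pairs: composition or aggregation inside the owner's scope
    compositions: list[tuple[str, str]] = field(default_factory=list)
    # (source, target) pairs: Entity Associations between separate scopes
    associations: list[tuple[str, str]] = field(default_factory=list)


def check_basic_entities(model: EdmModel) -> list[str]:
    """Report basic entities that are used outside another entity's scope."""
    errors = []
    for source, target in model.associations:
        for end in (source, target):
            if end in model.basic_entities:
                errors.append(f"basic entity {end!r} used in association {source}-{target}")
    return errors


model = EdmModel(
    basic_entities={"Address"},
    compositions=[("Member", "Address"), ("Group", "Address")],  # shared by two scopes: allowed
    associations=[("Member", "Group")],
)
print(check_basic_entities(model))  # []
```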
The DSL specifies “associations” between entities. These associations may extend an Entity type when you need to specify association properties.
Figure 2. The Enterprise Data Model DSL (defined as an Xtext grammar)
From this Xtext grammar, OpenArchitectureWare generates a couple of Eclipse plugins that allow us to edit models using the syntax specified by the grammar and to transform them into configuration or deployment artifacts.
One could argue that this metamodel could easily be expressed as a UML Profile, and that is probably correct. However, the complexity of the UML Metamodel itself makes it difficult to create a transformation from a UML class diagram (the Enterprise Data Model expressed graphically) into deployment artifacts. In addition, the UML profile itself has fairly limited value, as it does not offer a particular way to represent the classes scoped within an entity definition; you would have to use a special notation for that purpose.
Overall, there is a tendency to move away from graphical representations as the main means of editing and managing metadata; likewise, there is a tendency to move away from XML representations. As Markus Voelter argued in his presentation, the main reason people have not used textual DSLs is simply that there were no easy ways to create parsers and syntax-aware editors (including IntelliSense-style completion). Xtext makes it very simple to create the grammar and the rich editor experience that everyone expects (Figure 3).
Figure 3. A sample EDM (from Fig. 1) based on the EDM DSL (Fig. 2)
A Message Type Domain Specific Language
The idea of relating an Enterprise Data Model to message types is not new. For instance, in their book “Applied SOA” (Chapter 3), Mike Rosen et al. argue:
Information represents the data resources of the organization. Data resides in a variety of different stores, applications, and formats. Different levels of data are used by different levels of SOA constructs. The semantic information model defines the data for business processes and services. The information passed in business processes in the form of documents is based on the semantic information model. The documents provide a form of semantic message between processes and services. The SOA defines the mechanisms for transforming data from its native operational format to the semantic data required for the business processes.
The question is: how can we establish this relationship effectively (Figure 4)?
Figure 4. How can we relate Message Type Definitions to the Enterprise Data Model?
Michael Rosen et al. recommend relying heavily on the EDM (they use the term Semantic Information Model) to design effective Message Types (see “Applied SOA”, Chapter 6, p. 249). The authors argue that one of the core benefits is increased compatibility between new consumers and existing service providers, since the interface footprint is designed with the Enterprise Data Model in mind rather than the boundaries of a back-end system or a particular project. However, the authors do not provide a model to approach this problem, let alone an automated way to perform this task.
The first step is, of course, to define an EDM metamodel, as we have seen in the previous section.
Figure 5 details our Message Type Architecture. Once we have created an EDM DSL and a Message Type DSL, we are in a position to generate the Message Type Schemas and WSDL files. The resulting schemas are standalone and referenced only by the corresponding WSDL file. There is no need to create complex import structures to achieve any kind of type or element reuse, since reuse comes from the references to the EDM within the message type definitions.
Figure 5. A Message Type Architecture
The message type DSL is shown in Figure 6. The core concept of this DSL is the “Projection” element, used in combination with the scope defined by Entities in the EDM DSL. The Entity scope defines data fields that often go together in a message type, a database table or a class definition. The scope elements enable “reuse”: for instance, a basic entity “address” may appear in different entities of the EDM. Scoped elements are automatically reused in the message type definition unless they are explicitly “excluded” from the entity’s projection; a projection may exclude any attribute or basic entity from the message type definition. Conversely, it is often desirable for some attributes or basic entities of the entities associated with the base entity to be part of a given message. In the example of Figure 1, the groupID is often part of a message representing member information. In that case, the projection may also "include" elements outside the base entity scope. Through a composition of projections, the model also supports the inclusion of elements from entities that are not necessarily directly associated with the base entity.
This approach completely removes the need to manage one-to-one or one-to-many “associations” at the Message Type and XML Schema levels. Elements and attributes from related entities are simply “projected” into the message type. Taking the example of Figure 4, our DSL can be used to generate XML Schemas in which an Orders collection element is a child element of the Customer element, and each order may also include its shipment information. This is typically where XML Schemas have failed, since people had to model associations with complex XML Schema imports and references.
Many-to-many relationships (for instance between orders and products) are handled by the dependencies element in the EntityArea definition. In that case, the XML Schema generator has to use key and keyref elements to represent the association between the message elements. There is no need, however, to manage these elements in the Message Type DSL, since this information comes from the EDM definition during XML Schema generation.
The message type may only contain projections, i.e. references to entities, basic entities, associations and attributes of the EDM. There is no provision for adding elements that are not part of the EDM. This is a deliberate design decision: the EDM is supposed to represent the enterprise data model, and every data element stored in the systems of record is supposed to be traceable to an element of the EDM.
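A projection can be resolved into its list of EDM elements with a few lines of code. The sketch below reuses the projection definitions the author gives in the comments (basicMemberDetailResponse and groupInformationForMember), but the EDM scopes, the Projection class and resolve() are hypothetical. Excluded elements must belong to the base entity's scope, which mirrors the rule that a message type may only reference EDM elements:

```python
from dataclasses import dataclass, field

# Hypothetical EDM scopes (attribute and basic-entity names are illustrative).
EDM_SCOPES: dict[str, list[str]] = {
    "Member": ["memberID", "name", "SSN", "address"],
    "Group": ["groupID", "groupName", "coverages"],
}


@dataclass
class Projection:
    name: str
    entity: str  # base entity whose scope is projected by default
    exclude: list[str] = field(default_factory=list)
    include: list["Projection"] = field(default_factory=list)  # composition of projections


def resolve(p: Projection) -> list[str]:
    """Flatten a projection into the EDM elements that end up in the message type."""
    scope = EDM_SCOPES[p.entity]  # raises KeyError if the base entity is not in the EDM
    unknown = [e for e in p.exclude if e not in scope]
    if unknown:
        raise ValueError(f"{p.name}: {unknown} not in the scope of {p.entity}")
    elements = [f"{p.entity}.{e}" for e in scope if e not in p.exclude]
    for inner in p.include:
        elements += resolve(inner)
    return elements


group_info = Projection("groupInformationForMember", "Group", exclude=["coverages"])
response = Projection("basicMemberDetailResponse", "Member",
                      exclude=["SSN", "address"], include=[group_info])
print(resolve(response))
# ['Member.memberID', 'Member.name', 'Group.groupID', 'Group.groupName']
```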
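As an illustration of the output, here is a minimal sketch that builds, with Python's standard xml.etree.ElementTree module, the kind of nested XML Schema a generator could emit for a Customer projection that includes its Orders and each Order's Shipment. The element names and the container/leaf helpers are assumptions; the point is that the one-to-many association becomes plain nesting with maxOccurs="unbounded", with no xs:import or cross-schema references:

```python
import xml.etree.ElementTree as ET
from typing import Optional

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


def container(parent: ET.Element, name: str, max_occurs: Optional[str] = None) -> ET.Element:
    """Add <xs:element name=...> with an anonymous complex type; return its xs:sequence."""
    el = ET.SubElement(parent, f"{{{XS}}}element", name=name)
    if max_occurs is not None:
        el.set("maxOccurs", max_occurs)
    complex_type = ET.SubElement(el, f"{{{XS}}}complexType")
    return ET.SubElement(complex_type, f"{{{XS}}}sequence")


def leaf(parent: ET.Element, name: str, xsd_type: str = "xs:string") -> None:
    """Add a simple-typed element declaration."""
    ET.SubElement(parent, f"{{{XS}}}element", name=name, type=xsd_type)


# Hypothetical projection result: Customer, its Orders, and each Order's Shipment.
schema = ET.Element(f"{{{XS}}}schema")
customer = container(schema, "Customer")
leaf(customer, "customerID")
orders = container(customer, "Orders")
order = container(orders, "Order", max_occurs="unbounded")  # one-to-many: plain nesting
leaf(order, "orderID")
shipment = container(order, "Shipment")
leaf(shipment, "trackingNumber")

print(ET.tostring(schema, encoding="unicode"))
```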
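The identity constraints mentioned above could be emitted along these lines. This is a sketch assuming hypothetical OrderMessage, Product and OrderLine elements with productID/productRef attributes, and a schema without a targetNamespace (otherwise the refer attribute would need a namespace prefix):

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)


def add_key_and_keyref(element_decl: ET.Element, target: str, target_field: str,
                       referrer: str, referrer_field: str) -> None:
    """Append xs:key / xs:keyref so that every referrer points at an existing target.
    XSD requires identity constraints to follow the element's xs:complexType."""
    key_name = f"{target}Key"
    key = ET.SubElement(element_decl, f"{{{XS}}}key", name=key_name)
    ET.SubElement(key, f"{{{XS}}}selector", xpath=f".//{target}")
    ET.SubElement(key, f"{{{XS}}}field", xpath=target_field)
    keyref = ET.SubElement(element_decl, f"{{{XS}}}keyref",
                           name=f"{referrer}To{target}", refer=key_name)
    ET.SubElement(keyref, f"{{{XS}}}selector", xpath=f".//{referrer}")
    ET.SubElement(keyref, f"{{{XS}}}field", xpath=referrer_field)


# Hypothetical message root for an Order/Product many-to-many dependency.
root_decl = ET.Element(f"{{{XS}}}element", name="OrderMessage")
ET.SubElement(root_decl, f"{{{XS}}}complexType")  # content model omitted in this sketch
add_key_and_keyref(root_decl, "Product", "@productID", "OrderLine", "@productRef")
print(ET.tostring(root_decl, encoding="unicode"))
```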
Figure 6. A Message Type DSL
The message type itself is composed of:
- a Verb (GET, NOTIFY, PATCH, CONFIRM, CANCEL, PREPARE, SUBMIT, SHOW or ANY; see the Uniform Interface and Verb definitions in Figure 6)
- a Noun (the base entity)
- an indicator of whether the processing of this message is idempotent
- a payload, which can be of three different types
The first three elements are used to configure the business envelope of the message, such as the one defined by the Open Applications Group. When the XML Schema is generated, all the elements of the envelope are woven into every message type. These elements include a message identifier, sender information, a date...
The verbs are a subset of the Open Applications Group verbs and the HTTP verbs. As a design decision, we chose to prevent the use of the POST (or PROCESS in the case of OAGIS), PUT and DELETE verbs, because they tend to encourage CRUD interface definitions, and CRUD-like interactions tend to create strong coupling between service consumers and service providers.
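The message type header (verb, noun, idempotency flag) and the verb restriction can be captured in a small value object. In this sketch the MessageType class is hypothetical, verbs outside the uniform set stand for the ANY option (PAY is an assumed enterprise-specific verb), and the forbidden set encodes the design decision above:

```python
from dataclasses import dataclass

# Verbs of the uniform interface; any other verb stands for the "ANY" option.
UNIFORM_VERBS = frozenset({"GET", "NOTIFY", "PATCH", "CONFIRM", "CANCEL", "PREPARE", "SUBMIT", "SHOW"})
# Excluded by design because they encourage CRUD-style interfaces.
FORBIDDEN_VERBS = frozenset({"POST", "PROCESS", "PUT", "DELETE"})


@dataclass(frozen=True)
class MessageType:
    verb: str
    noun: str  # the base entity
    idempotent: bool

    def __post_init__(self) -> None:
        if self.verb.upper() in FORBIDDEN_VERBS:
            raise ValueError(f"verb {self.verb!r} is not allowed (CRUD-style)")

    @property
    def uniform(self) -> bool:
        """True when the verb belongs to the standardized uniform interface."""
        return self.verb.upper() in UNIFORM_VERBS

    @property
    def name(self) -> str:
        return self.verb.capitalize() + self.noun


submit = MessageType("SUBMIT", "PurchaseOrder", idempotent=False)
pay = MessageType("PAY", "Claim", idempotent=True)  # enterprise-specific "ANY" verb
print(submit.name, submit.uniform, pay.name, pay.uniform)
# SubmitPurchaseOrder True PayClaim False
```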
The payload of the message may be of three different types (again following some of the Open Applications Group design guidelines):
a) a Query Area (expressed as a Query-by-Example, QBE; in this case the verb is GET)
b) an Event Area (which represents the occurrence of a state of the source entity; the event message is published with the NOTIFY verb)
c) an Entity Area (which represents the argument of all the other verbs)
Figure 7. A Message Type Definition
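The mapping from verb to payload type, and the Query-by-Example semantics of a Query Area, can be sketched as follows. The function names and the sample member records are hypothetical; QBE is implemented here in its simplest form, where every field present in the example must match exactly:

```python
from enum import Enum
from typing import Any


class PayloadKind(Enum):
    QUERY_AREA = "QueryArea"    # Query-by-Example, used with GET
    EVENT_AREA = "EventArea"    # occurrence of a state, published with NOTIFY
    ENTITY_AREA = "EntityArea"  # argument of every other verb


def payload_kind(verb: str) -> PayloadKind:
    v = verb.upper()
    if v == "GET":
        return PayloadKind.QUERY_AREA
    if v == "NOTIFY":
        return PayloadKind.EVENT_AREA
    return PayloadKind.ENTITY_AREA


def matches_example(record: dict[str, Any], example: dict[str, Any]) -> bool:
    """Query-by-Example: a record matches when it has the same value for every field
    the example sets; fields left out of the example are unconstrained."""
    return all(record.get(k) == v for k, v in example.items())


members = [
    {"memberID": "M1", "groupID": "G7", "name": "Ann"},
    {"memberID": "M2", "groupID": "G9", "name": "Bob"},
]
query = {"groupID": "G7"}  # the QueryArea payload of a GET Member message
print([m["memberID"] for m in members if matches_example(m, query)])  # ['M1']
```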
The PATCH verb is used in combination with an entity area and is associated with a REQUEST-(CHANGE)-UPDATE pattern. This pattern has been implemented with DataSets in the .NET world and with SDO (Service Data Objects) in the Java world. It is particularly useful for BPM applications that need to implement certain kinds of human activities whose goal is to change the content of a base entity (and related elements). However, when an action (which triggers a state change in the base entity) is invoked, it is advisable to use its corresponding verb rather than a generic one like POST. It is important to point out that the proposed metamodels do not readily support the definition of a Change Summary. One might think, however, that a Change Summary schema could be generated from an entity definition with the appropriate algorithm.
The PREPARE and SUBMIT verbs are examples of commonly used verbs that are worth standardizing across an enterprise within a Uniform Interface. It is common, for instance, for an entity such as a purchase order to be “prepared”, i.e. created and updated until it is ready for “submission”, at which point it starts its lifecycle. The response verb to such a submission (or to any other action, for that matter) is “confirm”.
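As a hint of what such an algorithm might look like, the sketch below computes a change summary (changed paths with their old and new values) by comparing two versions of an entity represented as nested dictionaries. It is an assumption for illustration only; it does not follow the SDO ChangeSummary API or the .NET DataSet diffgram format:

```python
from typing import Any


def change_summary(original: dict[str, Any], updated: dict[str, Any],
                   prefix: str = "") -> dict[str, tuple[Any, Any]]:
    """Map each changed path to its (old, new) values; nested dicts (for instance a
    basic entity such as an address) are compared field by field."""
    summary: dict[str, tuple[Any, Any]] = {}
    for key in sorted(original.keys() | updated.keys()):
        path = f"{prefix}{key}"
        old, new = original.get(key), updated.get(key)
        if isinstance(old, dict) and isinstance(new, dict):
            summary.update(change_summary(old, new, prefix=f"{path}."))
        elif old != new:
            summary[path] = (old, new)  # a missing field is reported as None
    return summary


before = {"memberID": "M1", "name": "Ann", "address": {"city": "Boston", "zip": "02110"}}
after = {"memberID": "M1", "name": "Ann", "address": {"city": "Cambridge", "zip": "02139"}}
print(change_summary(before, after))
```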
The CANCEL verb is used as a standard action verb which indicates the intent to terminate the lifecycle of the business entity (e.g. Cancel Purchase Order).
The SHOW verb is used for responses, for instance, in response to a GET request.
The CONFIRM verb is used in response to an action request, for instance, in response to a SUBMIT or CANCEL request.
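The request/response pairing described above can be kept in a simple lookup table. This is a sketch; the RESPONSE_VERB table and helper names are hypothetical, and only the pairings stated in the text are included (enterprise-specific action verbs from the ANY option would presumably map to CONFIRM as well):

```python
from typing import Optional

# SHOW answers a GET; CONFIRM answers an action such as PREPARE, SUBMIT or CANCEL.
RESPONSE_VERB = {
    "GET": "SHOW",
    "PREPARE": "CONFIRM",
    "SUBMIT": "CONFIRM",
    "CANCEL": "CONFIRM",
}


def response_verb(request_verb: str) -> Optional[str]:
    """Return the response verb for a request verb, or None when this sketch
    defines no pairing (for example NOTIFY)."""
    return RESPONSE_VERB.get(request_verb.upper())


def response_message_name(request_verb: str, noun: str) -> Optional[str]:
    verb = response_verb(request_verb)
    return None if verb is None else verb.capitalize() + noun


print(response_message_name("GET", "Member"))            # ShowMember
print(response_message_name("CANCEL", "PurchaseOrder"))  # ConfirmPurchaseOrder
```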
These two verbs (SHOW and CONFIRM) do not exist in REST because REST is RPC oriented and not message oriented. A response in REST has no particular semantics except in the case of Atom collections. REST does not make any distinction between a "technical" acknowledgement and a "business" acknowledgement.
This Message Type DSL can then be used to generate all Message Type Schemas. Certain aspects, such as versioning or a business envelope, can be woven in by the target schema generator (Figure 5). A business envelope wraps the message payload within an envelope that contains information relevant to the interaction and to the parties or components involved in it. The Open Applications Group, for instance, uses a “BOD” (Business Object Document) for that purpose. We highly recommend using this business envelope pattern rather than developing proprietary SOAP headers. SOAP headers should only be used by service containers that need to implement policies or qualities of service (security, reliability, transactions...).
You may also want to go all the way and create a Service Interface DSL, such as the one in Figure 8, which can be used to generate abstract WSDLs (without bindings or service endpoints).
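The business envelope pattern can be sketched with xml.etree.ElementTree. The element names below (BusinessMessage, Header, MessageID...) are hypothetical and are not the OAGIS BOD schema; the sketch only shows that the verb, noun, message identifier, sender and date travel in the envelope, outside the payload and outside any SOAP header:

```python
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from typing import Optional


def wrap_in_envelope(verb: str, noun: str, sender: str, payload: ET.Element,
                     message_id: Optional[str] = None,
                     created: Optional[datetime] = None) -> ET.Element:
    """Wrap a payload in a business envelope carrying interaction metadata.
    Element names are illustrative; they are not the OAGIS BOD schema."""
    envelope = ET.Element("BusinessMessage")
    header = ET.SubElement(envelope, "Header")
    ET.SubElement(header, "MessageID").text = message_id or str(uuid.uuid4())
    ET.SubElement(header, "Sender").text = sender
    ET.SubElement(header, "CreationDateTime").text = (
        created or datetime.now(timezone.utc)).isoformat()
    ET.SubElement(header, "Verb").text = verb
    ET.SubElement(header, "Noun").text = noun
    ET.SubElement(envelope, "Payload").append(payload)
    return envelope


order = ET.Element("PurchaseOrder")
ET.SubElement(order, "orderID").text = "PO-42"
message = wrap_in_envelope("SUBMIT", "PurchaseOrder", "procurement-portal", order)
print(ET.tostring(message, encoding="unicode"))
```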
Figure 8. A Service Interface DSL well suited to generate abstract WSDL artifacts
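A generator for such abstract contracts could produce WSDL 1.1 messages and a portType, and stop there. The sketch below assumes a hypothetical PurchaseOrderService with two operations and omits the wsdl:types section that would carry the generated message type schemas:

```python
import xml.etree.ElementTree as ET

WSDL = "http://schemas.xmlsoap.org/wsdl/"
ET.register_namespace("wsdl", WSDL)


def abstract_wsdl(service: str, tns: str,
                  operations: list[tuple[str, str, str]]) -> ET.Element:
    """Build WSDL 1.1 messages and a portType; operations are
    (operation name, request message type, response message type)."""
    defs = ET.Element(f"{{{WSDL}}}definitions", name=service, targetNamespace=tns)
    # ElementTree only declares prefixes used in tag/attribute names, so the
    # prefix used inside attribute values is declared by hand.
    defs.set("xmlns:tns", tns)
    # wsdl:types (the generated message type schemas) is omitted in this sketch.
    for msg_name in sorted({m for _, req, resp in operations for m in (req, resp)}):
        message = ET.SubElement(defs, f"{{{WSDL}}}message", name=msg_name)
        ET.SubElement(message, f"{{{WSDL}}}part", name="body", element=f"tns:{msg_name}")
    port_type = ET.SubElement(defs, f"{{{WSDL}}}portType", name=f"{service}PortType")
    for op_name, request, response in operations:
        op = ET.SubElement(port_type, f"{{{WSDL}}}operation", name=op_name)
        ET.SubElement(op, f"{{{WSDL}}}input", message=f"tns:{request}")
        ET.SubElement(op, f"{{{WSDL}}}output", message=f"tns:{response}")
    return defs  # no wsdl:binding or wsdl:service: the contract stays abstract


wsdl = abstract_wsdl("PurchaseOrderService", "urn:example:purchasing", [
    ("submitPurchaseOrder", "SubmitPurchaseOrder", "ConfirmPurchaseOrder"),
    ("cancelPurchaseOrder", "CancelPurchaseOrder", "ConfirmPurchaseOrder"),
])
print(ET.tostring(wsdl, encoding="unicode"))
```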
We will not detail here how XML Schemas are generated from the Message Type definitions; this topic deserves an article of its own. It is important to note, however, that our approach to Message Type generation enables two important behaviors:
- First, any Message Type XML Schema can be generated individually. This means that when a change occurs in the EDM, we can selectively regenerate XML Schemas (from the same message type definitions). You may want to keep track of the EDM version from which each schema was generated. By contrast, when complex XML Schema imports are used, a change in one of the imported schemas propagates to all message type schemas.
- Second, our approach does not require a “Canonical Schema Model” (CSM) between service providers and service consumers. A CSM, by design, forces two transformations in the path of every message (both request and response). Modern SOA concepts advocate “smart” edges that are capable of transforming incoming and outgoing messages. If a message is “produced” and “consumed” by a single back-end system, there is no reason to transform it into a canonical format; it is just as easy to transform it on the consumer edge from whatever the provider schema is.
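The selective regeneration described in the first point can be sketched as a dependency lookup: each generated schema records the EDM entities its projections reference and the EDM version it was generated from. The GeneratedSchema record, the message type names and the version numbers are hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneratedSchema:
    message_type: str
    entities: frozenset[str]  # EDM entities referenced by the message type's projections
    edm_version: str          # EDM version the schema was generated from


def to_regenerate(schemas: list[GeneratedSchema], changed: set[str]) -> list[str]:
    """Only schemas whose projections touch a changed EDM entity need regenerating."""
    return sorted(s.message_type for s in schemas if s.entities & changed)


def not_generated_from(schemas: list[GeneratedSchema], edm_version: str) -> list[str]:
    """Schemas generated from an EDM version other than the given one."""
    return sorted(s.message_type for s in schemas if s.edm_version != edm_version)


schemas = [
    GeneratedSchema("GetMember", frozenset({"Member", "Group"}), "1.3"),
    GeneratedSchema("SubmitClaim", frozenset({"Claim", "Member"}), "1.3"),
    GeneratedSchema("NotifyGroupChange", frozenset({"Group"}), "1.2"),
]
print(to_regenerate(schemas, {"Claim"}))  # ['SubmitClaim']
print(to_regenerate(schemas, {"Group"}))  # ['GetMember', 'NotifyGroupChange']
print(not_generated_from(schemas, "1.3"))  # ['NotifyGroupChange']
```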
In a technical sense [a Canonical Schema Model] has a benefit in that at the endpoints only one transformation service per message type has to be configured. A subscriber needs to subscribe to only one message type, regardless whether there are multiple sources or not.
This argument, however, does not really apply to SOA and the Service Provider/Consumer pattern. Jack’s recommendation makes sense in synchronization/replication scenarios, where the endpoints are in effect the systems of record themselves, but the main focus of SOA is precisely to avoid synchronization/replication patterns by not creating new systems of record when building new solutions. SOA is not Integration, though it shares technologies with Integration. SOA is about creating services that encapsulate existing systems of record, such that new solutions can be developed by consuming these services without the need to duplicate information from other systems of record (Figure 9). When information is not duplicated, it does not need to be synchronized or replicated.
In a Service Oriented Architecture, the service interface is “the” canonical model (Figure 9). It isolates the service consumers from the systems of record. When a service is well designed, all consumers invoke that particular service, and this service, in turn, invokes all the necessary back-end systems. Introducing a “Canonical Schema Model” above the service interface is, in our opinion, superfluous. Some may argue that without it service interfaces will not be consistent, and each developer will create their own semantics. We certainly agree with this argument; this is why our approach is based on defining Message Types from the EDM definition. The Message Type DSL gives you some flexibility to deviate from the EDM semantics, but doing so requires more work. By default it uses the EDM semantics and structure, which contributes to consistent message types across service interfaces.
At the same time, it is unlikely that a service consumer will interact with different types of services. This can happen in B2B scenarios, of course, and in large organizations, but it is generally undesirable, as it may simply mean that you have designed your service interface incorrectly.
Figure 9. Services supporting new solutions with existing systems of record. The figure represents the footprint of Entities in each system of record, behind the service interface
A CSM also runs counter to many SOA Versioning principles, as it “forces” all endpoints to conform to the same “canonical” schema. We have seen that this is not practical: without a proper versioning strategy, organizations often end up creating one service interface per consumer, thereby defeating the purpose of a CSM. SOA technologies (such as XML or WSDL) were actually designed to remove the need to always conform to a CSM. The CSM was a key design pattern of the Integration Platforms of the 90s, but in the end it propagated too many changes to endpoints that would not have needed to change if contracts had been designed in a compatible way.
There are also some important benefits stemming from a CSM. For instance Jack emphasizes that:
Defining canonical message formats creates the opportunity to supply the company with an unambiguous catalog of available messages about business events, representing valuable business assets.
Nick Malik calls this a “Business Event Ontology”.
This capability is critical to Business Activity Monitoring and Complex Event Processing, for instance. We provide it with the business envelope concept that we introduced in the message type definition. In our model, Events and Actions are clearly expressed and agreed upon at the enterprise level, outside the message payload.
Event Driven Architectures are an appropriate architecture when you want to integrate silos, i.e. autonomous information systems, but EDA's principles do not generally apply to SOA. Events (defined as the occurrence of a state) and Message Events are of course an important part of a Service Oriented Architecture but EDA alone cannot be the foundation of a Composite Programming Model.
Conclusion
The management of Message Types in a Service Oriented Architecture is a complex topic. Different approaches have been taken, ranging from complex XML Schema import structures to “Data Contracts” expressed as a set of OO classes and annotations (used as a DSL). None of these approaches seems to have given satisfactory results, as they lack the backbone of an Enterprise Data Model. The approach presented here reinforces the need to start designing service interfaces from the “contract” perspective (and not from the code) and establishes a reuse strategy founded on the EDM as a key enterprise asset.
Our approach also surfaces the synergy between Data Governance and SOA Governance. As new services are discovered, funded and implemented, it is critical to rely on the Enterprise Data Model to design message types that have an enterprise footprint. Without an EDM, service interfaces tend to be designed with a footprint specific to projects and back-end systems, reducing their ability to be reused by other consumers. The approach also enables Data Governance to communicate changes to the Enterprise Data Model effectively to the SOA Governance team, which will trigger a new versioning phase in the services' lifecycles when necessary.
The author would like to thank Kjell-Sverre Jerijærvi and Boris Lublinsky for very stimulating discussions which contributed to this paper.
Community comments
Missing words and typos
by Gregor Rosenauer,
There are some missing words and rough edges in the first part of the article:
Section "An Enterprise Data Model Domain Specific Language":
Re: Missing words and typos
by Jean-Jacques Dubray,
Yes, sorry, I just realized that. This is a bug in FrontPage and Expression: when you copy a selection, they sometimes change its content. I'll correct them later today.
Idempotency, Safety
by Stefan Tilkov,
Interesting to see you, of all people, adopt a uniform interface …
You define idempotency as an attribute of the message, independent of the verb. Isn't, e.g., a GET or CONFIRM request always idempotent in your model? On a related note: any particular reason not to include a verb (or a message attribute) that signals that a method is "safe", to enable access without consequences and support caching?
Can you give a small example (with source in your DSL, I mean) of a projection?
REST and RPC
by Stefan Tilkov,
Unsurprisingly, I strongly disagree with applying the "RPC" moniker to REST. My guess is that what you are referring to is just one typical characteristic of RPC, which is that it's a request/response model. REST doesn't share the other two main characteristics of RPC, an application/service-specific interface and the idea of hiding networking behind a local programming model.
Re: Idempotency, Safety
by Jean-Jacques Dubray,
Well, the verbs are UniformInterface | Any, so the uniform interface only supports the "standard" verbs and is an extensible mechanism to standardize as many verbs as you need within your organization. So it is not as uniform as REST for instance.
The DSL here is a proposal; I highly encourage people to include the attributes that they feel are important for a given message. I am not sure idempotency is a property of the verb. Of course, in REST it makes total sense to associate idempotency with a verb. However, REST is missing the action dimension. When you consider actions, it might be hard to always assert that (action, noun) will behave consistently whatever the noun.
If I take the verb "Pay" and in my enterprise I have "Claims", "Bills", "Invoices",... I may want to standardize on the verb Pay, but I can only guarantee that my payClaim operation is idempotent. For some reason, when I implemented my billing service, we could not achieve that result.
In SOA, by definition all messages are "safe" :-) since they do not mandate any changes anywhere (remember we are not CRUDing). The receiver of the message is ultimately responsible for deciding what to do with it. The mere existence of "safe" proves that REST had CRUD in mind when it was designed.
Caching does not work in the enterprise except in very limited cases, which are then controlled by the Service Provider. The service consumer does not need to know anything about it. I don't know of any instance where a service consumer would want to cache (or have someone else cache) the content and state of a business object (Account, Claim, Bill...). None of the business entities are cacheable.
Figure 7 provides examples of Projection definitions:
projection basicMemberDetailRequest { &entity Member
exclude { address; }
}
projection basicMemberDetailResponse { &entity Member
exclude { SSN; address; }
include {groupInformationForMember;}
}
projection groupInformationForMember {&entity Group
exclude { coverages;}
}
Re: REST and RPC
by Jean-Jacques Dubray,
After seeing how people use REST in practice (for instance Doug Purdy, explaining how he created a "Hello World" RESTful "Service" with Oslo by offering a POST /service/helloworld/{string} syntax that returns a "Hello {string}" resource representation, as you can see a "safe" and "idempotent" VERB+NOUN combination), I can safely say that 99% of developers and architects will use REST that way.
The problem is not so much whether REST was intended to be RPC or not; the problem is what kind of architectural constraints the REST model provides for people to do the right thing (no RPC and no CRUDing). I don't want to argue about REST much more, but the fact of the matter is, people that use REST are either remoting or CRUDing, and we know what kind of connected systems you can build with these two approaches.
The remoting people have tried to make WS-* very remotish (as hard as they could), but fortunately, WS-* has intrinsic properties (inherited from the B2B days and ebXML) that let some people use a message oriented approach. WS-* or SCA support:
- bidirectional interfaces
- forwards compatible versioning mechanism
- assembly mechanisms
- orchestrations (and choreographies)
- the ability to substitute one service provider for another by changing a single endpoint
- federated security
- ...
Re: Idempotency, Safety
by Stefan Tilkov,
You have your logic backwards. In HTTP, there's a guarantee that "safe" methods (OPTIONS, HEAD, GET) can be called without negative consequences for the caller. For this reason, they can be used by a search engine, or to receive metadata, or to get something to present to the user, or to get at the next possible state transitions.
Re: Idempotency, Safety
by Jean-Jacques Dubray,
Stefan:
I understand why REST needs to make the distinction, but again, this only exists because REST has a very small set of verbs. In practice and in reality you have N+4 verbs, so I don't see any real value in making this distinction. If a method is directly acting on the system of record in REST, you may want to flag it, but again, this is a pattern that is not desired for connected systems. I don't want anyone to manipulate my system of record. As a service provider, I am responsible for making this decision. So you can argue that all (~N+2) messages are safe, or that none (~N+2) of the messages are safe. Since you can argue in both directions, it tells me that this attribute is not suited for SOA.
Now, if you have special agents arbitrarily calling methods of a certain type you may want to flag it with whatever attribute you want (calling it "safe" is ok). I would however point out that you don't let agents roam around your systems of records. So, in SOA, there is a "safe" method which is getEmployeeSafeDetails and there is another one that is really not "safe" getEmployeeCompleteDetails (with salaries, ssn,...). Again, I don't see the distinction valid in the enterprise, while I fully understand why it is in REST.
Re: Idempotency, Safety
by Jean-Jacques Dubray,
Stefan:
Incidentally, I have not talked about it, but you can design your EDM DSL to mark fields that are sensitive with respect to privacy.
A very important aspect in the enterprise is figuring out whether a message contains "private" data or not. So I guess you could also declare a message to be "sensitive" or not, based on whether it contains sensitive EDM elements.
This message type architecture is not the be-all and end-all. It simply details an approach that can help you manage message types based on any requirement your organization may have, without having to reify your enterprise-specific requirements behind very general concepts.
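A minimal Python sketch of the "sensitive message" idea follows. It assumes a toy in-memory EDM in which each attribute carries a hypothetical `sensitive` flag. A message type (a projection of EDM attributes) is flagged as sensitive if any attribute it projects is sensitive. The entity and attribute names are illustrative and are not taken from the author's DSL; the two projections mirror the getEmployeeSafeDetails / getEmployeeCompleteDetails example earlier in the thread.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attribute:
    name: str
    sensitive: bool = False  # hypothetical privacy marker declared in the EDM


# Toy EDM: entity name -> attributes (illustrative, not the article's model).
EDM = {
    "Employee": [
        Attribute("firstName"),
        Attribute("lastName"),
        Attribute("department"),
        Attribute("salary", sensitive=True),
        Attribute("ssn", sensitive=True),
    ]
}


def project(entity: str, include: set[str]) -> list[Attribute]:
    """Select the EDM attributes that a message type projects."""
    return [a for a in EDM[entity] if a.name in include]


def is_sensitive(message: list[Attribute]) -> bool:
    """A message is sensitive if it carries at least one sensitive EDM attribute."""
    return any(a.sensitive for a in message)


employee_safe_details = project("Employee", {"firstName", "lastName", "department"})
employee_complete_details = project("Employee", {"firstName", "lastName", "salary", "ssn"})

print(is_sensitive(employee_safe_details))      # False
print(is_sensitive(employee_complete_details))  # True
```

Because the flag lives on the EDM attribute rather than on the message, every message type that projects `salary` or `ssn` inherits the classification automatically.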
Re: Idempotency, Safety
by Stefan Tilkov,
I don't understand how you can't see the value of a safe operation; it's the very basis for much of the Web's success. Being able to distinguish between safe and unsafe operations gives a lot of power to the infrastructure, e.g. for indexing purposes. Admittedly, you'd not get much out of it, since you don't do links. So I'll move on to the next topic.
Re: Idempotency, Safety
by Stefan Tilkov,
Again, I strongly disagree. Let me point to the most obvious example: An invoice that has already been sent out will quite probably never change again. Why should a client not cache it? The same is true for a claim that has already been processed, or an account that has been closed.
Let me elaborate on the last example: An account that is updated only once per night can be cached all through the day.
We can argue about the relevance or the percentage of cases where this is useful. I claim it's useful in a vast majority of cases. I accept when you disagree and claim it's lower, but not when you say it's zero.
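Stefan's nightly-update example can be expressed as a simple expiry rule: a cached representation stays fresh until the next batch run. The following Python sketch assumes a hypothetical batch that runs once per day at a fixed hour (02:00 here). It computes how many seconds a cached copy may be served. An HTTP server could expose that number to clients and intermediaries as a `Cache-Control: max-age` value.

```python
from datetime import datetime, timedelta


def seconds_until_next_refresh(now: datetime, refresh_hour: int = 2) -> int:
    """Return how many seconds a cached account representation stays valid.

    Assumes (hypothetically) that the system of record updates accounts
    once per day at `refresh_hour`:00 local time.
    """
    next_refresh = now.replace(hour=refresh_hour, minute=0, second=0, microsecond=0)
    if next_refresh <= now:
        # Today's batch has already run; the next one is tomorrow.
        next_refresh += timedelta(days=1)
    return int((next_refresh - now).total_seconds())


def cache_control_header(now: datetime, refresh_hour: int = 2) -> str:
    """Build a Cache-Control value so clients and proxies can cache until the next batch."""
    return f"max-age={seconds_until_next_refresh(now, refresh_hour)}"


if __name__ == "__main__":
    noon = datetime(2009, 6, 1, 12, 0, 0)
    print(cache_control_header(noon))  # max-age=50400 (14 hours until 02:00)
```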
Re: Idempotency, Safety
by Jean-Jacques Dubray,
Stefan:
I can see the value for the Web, absolutely; it is just that I can't see the value in the enterprise. I understand the value of links, especially at the presentation layer: if you are a CSR, you need to navigate freely and arbitrarily to get to whatever piece of information is relevant to your conversation with the customer.
But I don't see a service consumer getting an entity representation and then arbitrarily following links and exploring the content of linked entity representations. I am sure that some enterprises might have a need for something like this. But I don't see why, when you get a Purchase Order in your business process, you would need to navigate to the customer information. The PO contains the customer information that the business process needs. Nobody wants to make a second call to get a "shipping address"; it is expected to be in the PO document.
If you had an example that was not "indexing" or "search engine", that would be great. (Again for the Web, I totally understand the criticality of the concept and why you say this is one of the keystones of the Web success, no question there).
Re: Idempotency, Safety
by Jean-Jacques Dubray,
Yes, I think the key is to consider how often a particular instance is going to be accessed. How often do you call a company to find out the state of an order or whether you paid your bill? How many times do you need to fetch an order before you set out to pay it?
I can see caching helping in some MDM scenarios, but even in CDI, it will really depend on how often a customer interacts with your call center. So I am sorry, again, but I don't see a big advantage in caching. Nevertheless, all of this can be put in the model if you see any benefit; I have no problem with that.
I think what is clear to me today is that:
- every enterprise is different
- technologies cannot reflect that variability
- reifying concepts into whatever a given technology provides does not work: tools are missing, and it creates inconsistencies between developers and projects
- today there are great technologies such as Xtext that allow you to pick your preferred technology stack (HTTP+ATOM or WS-*+SCA) and layer the semantics that you need on top. If one day you need to change stacks for whatever reason, or, God forbid, you actually need to use both stacks to build realistic connected systems, then it is possible.
So for me, these debates are less important today than they were a month ago. All I know is that neither REST nor WS-*/SCA can give me the semantics that I need, though I can use all the capabilities that they provide, including caching.
Nice article
by Peter Rajsky,
It is a really interesting article.
We use UML to generate XML Schema. A week ago I wrote a blog entry about the same problem I see with our approach.
A combination of UML for defining business entities and a message type DSL for the "envelopes" would work well.
Re: Nice article
by Jean-Jacques Dubray,
Peter:
thank you for your comments. Yes, it is probably possible to do it with UML. I tried and failed because of the complexity of the metamodel. One advantage of using a DSL for both is that Xtext knows how to cross-reference elements in two different files (I have not implemented that yet).
In 2007 I worked on a project with adaptive software that used UML and this approach to generate schemas. I left the company I was working for before the project was finished. The consultant who continued the work told me that he was able to generate XML Schemas from it, but I think they had to be hand-tuned by integration developers to get the final product.
I think this article provides a solution to the problem you described in your post: it.toolbox.com/blogs/system-integration-theory/...
JJ-
Re: Nice article
by Peter Rajsky,
Yes, this approach really solves my problem, but there are probably a few minor drawbacks (apart from the first one):
You are right that UML profiles are too weak for defining projections (views). I hope there will be better tool support for PIM-to-PSM transformations, which could probably be used for this purpose.
Meanwhile I will store message DSL definition as tagged value :(
Thanks for sharing your ideas.
Re: Nice article
by Tiberiu Fustos,
Hi JJ & Peter,
I agree - it's a nice article, but it requires some "digesting" and tooling...Just from our experience (SOA approach to deal with legacy silos in a telco): we have done our BOM in UML and we have partitioned it into different domains. Each domain exposes capabilities via services to the other domains. In order to avoid the complexity problem when inferring the message model from the CDM (described in your blog), we allow the messages to have different representations (schemas for each domain) - I assume these are the "projections" or views you are talking about.
Example: the order management domain does not need all the Customer entity attributes and structure (associations) used in the CRM domain. The CRM Domain remains the master of the Customer Entity in the BOM, but the message models are allowed to differ (thus the Order Management programmer only has to worry about the relevant sub-set of his domain).
The tooling - well, we use a widely used UML tool (EA) with a home-made plug-in and some self-defined constraints for generating the XML schemas and service contracts from each message model. The BOM -> Message Model transformation is however a manual process, we only use the tool support for the Message Model -> XSD (WSDL) transformation.
Re: Nice article
by Jean-Jacques Dubray,
thanks guys,
@peter,
Again, I think it is necessary to walk away from UML. If you need a graphical notation, use GMF on top of the Xtext grammar DSL. Xtext creates an Ecore metamodel, and I believe that Xpand (which transforms a model into XML Schema) will simply transform the Ecore-based metadata into XML Schema.
You might also consider creating an XMI file from the xtext model definition. The bottom line is that you don't have to use UML.
>> There could be namespace chaos
I may be wrong, but I don't think so. Remember that the schemas are now generated, so you don't need to slice and dice them, and there are no imports. The only use of the namespace is to keep track of the major version of the message type. (It's okay to also keep a classification by business area and process area, for instance, but it is not mandatory.)
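A minimal Python sketch of that versioning rule follows. It assumes a hypothetical namespace scheme, `urn:example:messages:<name>:v<major>`. Only the major version appears in the namespace, so minor, backward-compatible revisions of a generated schema keep the same namespace, while a breaking change yields a new one. The URN prefix and the function names are illustrative; they are not part of the author's generator.

```python
import re

# Hypothetical namespace scheme: only the major version is encoded.
NS_TEMPLATE = "urn:example:messages:{name}:v{major}"


def namespace_for(message_type: str, version: str) -> str:
    """Derive the target namespace of a generated schema from a 'major.minor' version."""
    match = re.fullmatch(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"expected 'major.minor', got {version!r}")
    return NS_TEMPLATE.format(name=message_type, major=int(match.group(1)))


def same_namespace(message_type: str, old_version: str, new_version: str) -> bool:
    """Under this scheme, two versions share a namespace only if the major version matches."""
    return namespace_for(message_type, old_version) == namespace_for(message_type, new_version)


if __name__ == "__main__":
    print(namespace_for("getMemberBasicInformation", "1.3"))
    # urn:example:messages:getMemberBasicInformation:v1
    print(same_namespace("getMemberBasicInformation", "1.0", "1.3"))  # True
    print(same_namespace("getMemberBasicInformation", "1.3", "2.0"))  # False
```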
@Tiberiu
>> the order management domain does not need all the Customer
>> entity attributes and structure (associations) used in the CRM
>> domain.
If you are saying that a PO has some customer data, but that when you ask for a customer profile you get a lot more attributes, this is already covered by the projection mechanism. This is actually the whole goal of the projection mechanism.
Now if you mean that a PO in an order management system looks different from a PO in a CRM system, I would argue that's covered too. The model can expose many Query/Response interactions via different operations. The projection mechanism ensures that each query/response message is defined by reusing as much of the EDM as possible.
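To illustrate the projection idea, here is a Python sketch under simplifying assumptions. The EDM is reduced to a dict of entities with attributes and associations. A projection is an exclude-style selection over one entity, similar in spirit to the `exclude { address; }` clause in the DSL quoted later in this thread. Two operations can then define differently shaped messages while reusing the same EDM elements. The data structures are hypothetical and are not the author's Xtext metamodel.

```python
from typing import Optional

# Toy EDM (hypothetical): each entity lists its attributes and its associations.
EDM = {
    "Customer": {
        "attributes": ["id", "name", "email", "creditRating"],
        "associations": {"shippingAddress": "Address", "orders": "Order"},
    },
    "Address": {
        "attributes": ["street", "city", "postalCode"],
        "associations": {},
    },
}


def projection(entity: str, exclude: Optional[set[str]] = None) -> dict:
    """Return the EDM elements of `entity` that a message type keeps.

    Every element in the result comes from the EDM: elements can only be
    removed, never added, mirroring the constraint described in the article.
    """
    exclude = exclude or set()
    spec = EDM[entity]
    return {
        "entity": entity,
        "attributes": [a for a in spec["attributes"] if a not in exclude],
        "associations": {k: v for k, v in spec["associations"].items() if k not in exclude},
    }


# CRM view: the full customer profile.
customer_profile = projection("Customer")
# Order-management view: only what a purchase order needs about the customer.
po_customer = projection("Customer", exclude={"creditRating", "orders"})

print(po_customer["attributes"])    # ['id', 'name', 'email']
print(po_customer["associations"])  # {'shippingAddress': 'Address'}
```

Both message shapes stay traceable to `Customer` in the EDM, which is the property Tiberiu reported losing when the BOM-to-message-model step was done by hand.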
>> The BOM -> Message Model transformation is however a manual
>> process
I think this is where the projection concept is going to help you.
The Message Type to XML transformation is fairly straightforward: you have simply created a UML model of the message type. The holy grail is to integrate the EDM into the chain.
I have added more comments here: www.ebpml.org/blog/167.htm
Re: Nice article
by Peter Rajsky,
JJ,
I do not understand why you "mix" two (for me) different problems in your article:
I do not see any relation between these two problems. I really like your approach to message type architecture, but I can't accept your approach to EDA and ESB.
Note:
I agree that REST is related to internet-wide integration. I am a fan of REST for this purpose, but not in the enterprise.
Re: Nice article
by Jean-Jacques Dubray,
Peter:
I am not necessarily "fighting" EDA, what I am fighting is more the notion that Gartner introduced years ago that somehow EDA would replace SOA (and WOA too).
My point was to show that:
a) in general, a CIM is not useful (B2B being the exception); the service interface is the CIM.
b) there is a difference between SOA and Integration, even though they use the same technologies
c) Events can be integrated very easily into the SOA model, and in that case they are real events. If you use a "pure" EDA approach (which I saw, for instance, at Sun in the late 90s), you are forced to "reify" lots of semantics behind a pub/sub mechanism.
In the article's model, you can naturally define an event as the occurrence of a state of a particular entity and nothing more. So ultimately this article is really the foundation for unifying resources, services and events.
I am pragmatic: I think it is better to give people the means to do that than to tell them precisely how they should do it. Every company is different and, as I said earlier, it is best to define the semantics that fit your problem model. For instance, in healthcare "privacy" is a key issue, much more than "safe" and "idempotency". In finance, transaction volume is (or was) the problem, and privacy is less of a concern (it matters, but it is not as spread across the EDM), so there, concepts like idempotency and safety are important.
By having very precise semantics at that level, you can start weaving aspects consistently, which will make your SOA that much simpler and more efficient.
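Defining an event as "the occurrence of a state of a particular entity" can be sketched in a few lines of Python. The sketch assumes a hypothetical PurchaseOrder entity with an explicit lifecycle and a plain list of subscriber callbacks. The event is just (entity, id, new state), with no separate event taxonomy or pub/sub topic design. It is an illustration of the idea, not the author's model.

```python
from dataclasses import dataclass, field
from typing import Callable

# An event is just "entity X reached state S": (entity type, entity id, state).
Event = tuple[str, str, str]

# Hypothetical lifecycle of a purchase order: state -> states reachable from it.
TRANSITIONS: dict[str, set[str]] = {
    "created": {"approved", "cancelled"},
    "approved": {"shipped", "cancelled"},
}


@dataclass
class PurchaseOrder:
    order_id: str
    state: str = "created"
    subscribers: list[Callable[[Event], None]] = field(default_factory=list)

    def transition(self, new_state: str) -> None:
        """Move to new_state and publish the state that was reached as an event."""
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"cannot go from {self.state!r} to {new_state!r}")
        self.state = new_state
        event: Event = ("PurchaseOrder", self.order_id, new_state)
        for notify in self.subscribers:
            notify(event)


if __name__ == "__main__":
    received: list[Event] = []
    po = PurchaseOrder("PO-1001", subscribers=[received.append])
    po.transition("approved")
    po.transition("shipped")
    print(received)
    # [('PurchaseOrder', 'PO-1001', 'approved'), ('PurchaseOrder', 'PO-1001', 'shipped')]
```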
Thank you
by Hermann Schmidt,
Jean-Jacques,
thank you for this inspiring article! You are addressing so many of the problems I am struggling with. I am currently burning in hell with a UML-based class model (in Enterprise Architect) and a generator built with oAW, which I have to maintain. In the same project we have a humongous XML Schema with "the canonical model". Two big mistakes in a row.
I have suffered from three naive implementations of a canonical model now and I am fed up with it. I'll not watch a fourth false attempt silently.
I will pursue the strategy you are suggesting. I've had some similar ideas in my mind but I never got them nailed down.
Re: Nice article
by Tiberiu Fustos,
JJ wrote:
This is exactly the point. We have the BOM (EDM, CIM), and we have the message types per domain, but they are not traceable. If I understand you correctly, with the DSL you still have to define the projections "manually", but the whole thing stays consistent. You avoid creating the scary "company-wide XSD" that Hermann described, and you don't lose consistency (which is what I experienced).
You definitely hit a pain point here, it's worth digging deeper!
Re: Nice article
by Jean-Jacques Dubray,
Yes, I think we have all been there; I too have struggled with UML and tried to create modular XSDs. The only reason I prefer a DSL over UML profiles is that you have a lot more control over the semantics of your model. In theory profiles are very powerful; in practice they are hard to deal with. There are also semantics that are very hard to represent in profiles. For instance, a "choice" is a great data structure semantic that XSD innovated on, but it is hard to model in UML. A DSL gives you complete flexibility.
I prefer textual over graphical simply because graphical plugins are not as robust (or as easy to use) as the ones from Xtext. Of course, if I could manage the EDM graphically I would, but that is not easy either with DSL Tools (MS) or EMF.
You guys can email me, and I'll send you the Xtext files gmail / jdubray
Re: Nice article
by Kjell-Sverre Jerijærvi,
Unfortunately, the term CIM has been reified by parts of the community from being a common representation of business entity objects like "Customer" as defined by Eric Roch (it.toolbox.com/blogs/the-soa-blog/soa-and-xml-7517) to include also the event data such as the payload for "CustomerHasMoved" messages - and even the event taxonomy.
Two examples of CIM reification:
"you need to create your common information model. That model must contain not only information entities, but also a notion of what business documents you will communicate with, and what events occur on each document"
"CIM is a completely controlled and totally governed centralized data model that defines the dataflow of an Enterprise Service Bus (ESB)"
I use the same definition as Eric, so CIM and EDM are similar concepts that model business entities and differ only in scope. The EDM encompasses all data in all systems of record, while the CIM is domain-driven and only comprises the parts of the systems of record that pertain to SOA.
Re: Nice article
by Kjell-Sverre Jerijærvi,
An explanation of CIM based on articles by Mike Rosen and Eric Roch: Common Information Model.
My post also relates CIM to the recognized SOA design patterns "canonical schema" and "schema centralization", as I agree with JJD that the DSL approach has fewer negative side effects.
Re: Nice article
by Jean-Jacques Dubray,
Yes, I think it is important to avoid making the service interfaces static via a canonical schema or schema centralization. Personally, I am more of a "bottom-up" guy when it comes to building the EDM, so I am OK with a CIM approach. Again, "static" schemas must be avoided; this is what kills reuse in the enterprise. In B2B, of course, the problem is different: you want more "staticity" to build very large consumer communities.
Really nice post
by Alejandro Raiczyk,
It helped me so much.
One question, when you say:
"The message type may only contain projections, i.e. references to entities, basic entities, associations and attributes of the EDM. There is no provision to add elements that may not be part of the EDM. This is a design decision as the EDM is supposed to represent the enterprise data model and all data elements stored in the systems of record are supposed to be traced to an element of the EDM."
Do you mean that every field in a message should be projected from the EDM? What happens with fields that are not part of the EDM? For example, what about a flag that is not a query-by-example filter, like “sendEmailConfirmation” in a transaction message?
Another question: you've defined
projection basicMemberDetailRequest { &entity Member
    exclude { address; }
}
…
query MemberQBE on basicMemberDetailRequest;
...
message getMemberBasicInformation {
    verb GET;
    noun Member;
    query MemberQBE;
}
What kind of XML do you generate from that definition? Something like this?
...
<query>
  <member>
    <SSN></SSN>
    <firstName></firstName>
    <lastName></lastName>
  </member>
</query>
...
I would like to see some generated requests/responses, I don't know if this is possible.
Thanks a lot!
Re: Really nice post
by Jean-Jacques Dubray,
Alejandro:
thank you for your kind comments, I just saw your question today. Sorry for the late reply.
>> You mean that every field in a message should be projected to the EDM.
Yes, they come from the EDM: messages are basically a subset of the EDM, but they do not necessarily follow the boundaries of the entities defined in the EDM.
>> for example a filter that is not by example like “sendEmailConfirmation”
Well, I would argue that somewhere there should be an attribute of the transaction called "emailConfirmationRequired".
>>what kind of xml do you generate from that definition? Something like this?
Yes exactly, in this case (for a QBE) the fields are optional (query by first name, by last name, ...)
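As a sketch of what JJ describes, the Python below builds a query-by-example document shaped like Alejandro's `<query><member>…` sample, using the standard library's `xml.etree.ElementTree`. It also shows the optional-field semantics: only the fields the consumer fills in become criteria. The field list (SSN, firstName, lastName) follows the sample XML above. The matching function and the in-memory member list are hypothetical stand-ins for the provider's lookup, not the author's generated code.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Fields of the basicMemberDetailRequest projection, as in the sample XML.
QBE_FIELDS = ("SSN", "firstName", "lastName")


def build_qbe(**criteria: Optional[str]) -> str:
    """Serialize a QBE request; fields left unset (None) are simply omitted."""
    query = ET.Element("query")
    member = ET.SubElement(query, "member")
    for name in QBE_FIELDS:
        value = criteria.get(name)
        if value is not None:
            ET.SubElement(member, name).text = value
    return ET.tostring(query, encoding="unicode")


def parse_qbe(xml_text: str) -> dict[str, str]:
    """Read back only the criteria that the consumer actually supplied."""
    member = ET.fromstring(xml_text).find("member")
    return {child.tag: child.text or "" for child in member}


def matches(record: dict[str, str], criteria: dict[str, str]) -> bool:
    """Hypothetical provider-side match: the record must agree with every supplied field."""
    return all(record.get(k) == v for k, v in criteria.items())


# Illustrative data only.
members = [
    {"SSN": "000-00-0001", "firstName": "Ann", "lastName": "Lee"},
    {"SSN": "000-00-0002", "firstName": "Bob", "lastName": "Lee"},
]
request = build_qbe(lastName="Lee", firstName="Bob")
print(request)  # <query><member><firstName>Bob</firstName><lastName>Lee</lastName></member></query>
print([m["SSN"] for m in members if matches(m, parse_qbe(request))])  # ['000-00-0002']
```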