Avoid a Canonical Data Model

by Jan Stenberg on Apr 12, 2015. Estimated reading time: 1 minute

Standardizing on common models for business objects that are exchanged within an enterprise, e.g. Customer, Order and Product together with their attributes and associations, might seem like a compelling goal, but for Stefan Tilkov the creation of such Canonical Data Models (CDMs) is a horrible idea that he strongly advises against.

Tilkov, co-founder and principal consultant at innoQ, has in his experience seen that organisations are notorious for creating work based on bad assumptions, and notes that a CDM as he defines it requires numerous meetings and coordination between all parties involved. Commonly the result of all this work is a set of models containing lots of optional attributes and strange behaviour, added to satisfy the needs and restrictions of all systems intended to use the models.

To avoid such models Tilkov refers to bounded context, a concept from Domain-Driven Design (DDD), for dividing a large model into smaller contexts, thus allowing business objects to be modelled differently according to the needs of each context. He emphasizes that this is important for large systems but even more so for an enterprise-wide architecture; not all systems have the same needs, and each should be allowed a design that fits its own.
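
As a minimal sketch of the bounded-context idea, assume two hypothetical contexts, sales and billing, that each model a customer in their own way and translate at the boundary. The class names, fields and the `to_billing` helper are illustrative assumptions, not taken from Tilkov's material.

```python
from dataclasses import dataclass


# Hypothetical "sales" context: cares about contact details and leads.
@dataclass(frozen=True)
class SalesCustomer:
    customer_id: str
    name: str
    email: str
    lead_source: str


# Hypothetical "billing" context: cares about invoicing details only.
@dataclass(frozen=True)
class BillingCustomer:
    customer_id: str
    legal_name: str
    vat_number: str
    payment_terms_days: int


def to_billing(customer: SalesCustomer, vat_number: str,
               payment_terms_days: int = 30) -> BillingCustomer:
    """Translate at the context boundary instead of sharing one model."""
    return BillingCustomer(
        customer_id=customer.customer_id,
        legal_name=customer.name,
        vat_number=vat_number,
        payment_terms_days=payment_terms_days,
    )


sales = SalesCustomer("c-1", "Acme Ltd", "ops@acme.example", "webinar")
billing = to_billing(sales, vat_number="SE000000000001")
```

Only the identifier is shared between the two models here; each context remains free to add or drop attributes without coordinating with the other.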

For organisations that are still aiming for CDMs, Tilkov has some guidance to address problems that might arise:

  • Allow for independently specified parts, so that parts can be defined by different teams.
  • Standardize on formats and create building blocks smaller than business objects instead of a large consistent model, allowing teams to combine these building blocks as suits their needs.
  • Don’t push models onto teams; instead let them pull a model into their context when they see value in doing so.
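
The second point could look roughly like the sketch below, which assumes two small shared value types (`Money` and `PostalAddress`) that teams combine into their own objects. All names and fields are hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal


# Shared, standardized building blocks (hypothetical): small value types.
@dataclass(frozen=True)
class Money:
    amount: Decimal
    currency: str  # ISO 4217 code, e.g. "EUR"


@dataclass(frozen=True)
class PostalAddress:
    street: str
    city: str
    postal_code: str
    country: str  # ISO 3166-1 alpha-2 code, e.g. "DE"


# A shipping team pulls in only the block it needs.
@dataclass(frozen=True)
class Shipment:
    order_id: str
    destination: PostalAddress


# An invoicing team combines both blocks in its own way.
@dataclass(frozen=True)
class Invoice:
    order_id: str
    total: Money
    billing_address: PostalAddress


address = PostalAddress("Hauptstr. 1", "Monheim", "40789", "DE")
shipment = Shipment("o-42", address)
invoice = Invoice("o-42", Money(Decimal("99.90"), "EUR"), address)
```

Neither team has to agree on what an "Order" is; they only agree on the format of an address and of a monetary amount.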

For Tilkov an enterprise architect should avoid centralization and CDMs, instead establishing a minimal set of overarching goals and delegating much of the responsibility to the teams and the people within them.

Back in 2008 Bill Poole wrote about centralised vs. decentralised data from a SOA perspective. Based on the disadvantages he saw with a centralised model and on his own experience, he favoured a decentralised view, which caused some debate.

yah I agree, in most cases CDM are a RPITA by Pierluigi Vernetto

You end up having an over-bloated model where lots of fields are not relevant for a specific interaction, yet the developer has to worry about providing a meaningful value for them or deciding what is optional and what is required in a specific context...

single model ...no by Erik Gollot

You're right, but I'm not sure your reasons are totally exact. Of course there are different contexts in an IS. But what is more important is to know which system is responsible for which business entity. Other systems can use an entity owned by a system, and define additional data and remove other data in their own context.
But the "other" systems are not allowed to modify the part of the entity owned by its owner.

So yes, a single definition of an entity in the entire IS does not mean anything.

it is not easy by Jakub Jóźwicki

It is hard to create a good CDM. Its architect needs to have long experience with different business areas and knowledge of real-life processes, also beyond the scope of the current company. The designer needs foresight. A CDM entity should not just take two domain systems and merge their attributes. Common parameters should be named and there should be a place for extensions using key-value containers. All current and future systems/messages should fit into the current CDM. A popular mistake is to create a model that is too tight and too small, which then needs serious changes during some project. Another mistake is to create one CDM for the whole lifetime, which is too big and too complex for easy usage. A CDM is not given once and forever; it evolves over time, constantly and iteratively improving. Not using a CDM is the biggest mistake.
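
One way to read "named common parameters plus key-value extensions" is sketched below. The `CanonicalCustomer` class, its fields and the dotted key naming are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CanonicalCustomer:
    # Named, common parameters shared by all systems (hypothetical fields).
    customer_id: str
    name: str
    # Key-value container for system-specific extensions.
    extensions: Dict[str, str] = field(default_factory=dict)

    def extension(self, key: str, default: str = "") -> str:
        """Read an extension value without requiring every system to set it."""
        return self.extensions.get(key, default)


customer = CanonicalCustomer("c-1", "Acme Ltd")
customer.extensions["crm.segment"] = "enterprise"  # added by a CRM system
```

A system that does not know about `crm.segment` can ignore it, while the named fields stay small and stable.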

It is not black and white by Pavel Grushetzky

I don't think standardizing a model is inherently bad, even in the context of the enterprise. It is hard and painful - yes. And along the way people often forget why they started the whole thing, and end up building a model which abstractly looks good to anyone (design by committee in action).

A standardized model greatly helps in implementing orchestrated business scenarios, those that spread across multiple systems. So it has value in a public API and in the public contracts involved in those use cases.
For internal contracts or private API, standardization for its own sake is just a waste of time.

In my experience, the people driving a CDM often don't spend enough time and effort understanding the systems, private/public touch points, data flows and orchestrated scenarios, let alone building an inventory of these and designing with them in mind. Now, I don't think doing all those things would make CDM design easier. In fact I think it would make CDM design significantly harder. It's just that the end result would make sense.

Oversimplifying article, which has purpose to attract clicks only by Peter Rajsky

Using bounded contexts is a great idea, but in the end the individual system models need to be structurally/semantically compatible to allow creation of a "system of systems" (e.g. CRM and billing system customer models can be different, but it must be possible to integrate the Customer entity from CRM into billing). And this is the purpose of a Canonical Data Model. It is the tool which should be used to promote and validate compatibility between system models.
It does not mean that:
- An interface shall contain all attributes of the canonical entity
- Entities which are not heavily shared across systems should be "heavily standardized by some committee". You can think about these parts of the CDM as a DDD "published language".

If you're talking about enterprise wide master data model, then yes by peter lin

I've seen health care companies try to create a master data model for the whole enterprise, which ends up being gigantic with 10K+ entities. That's not an exaggeration either; that's rather common in health insurance.

If, on the other hand, the CDM scope is focused and the subject matter can be captured in a couple hundred classes, then a CDM can make life easier. The problem isn't the technique, it's the person that's using it.

CDM needs tooling support by Faisal Waris

For the record, our CDM efforts did not exactly work out.

I am OK with the CDM concept but without adequate tooling support, the manual effort involved and chances of making errors are just too high.

It could work if:

a) There is a central repository for managing the CDM in a flexible way. I think something like OWL would be a good modeling language to use.

b) There is tooling to transform slices of the CDM into various concrete model formats (XML Schema, Swagger etc.) for use in service contracts.

Perhaps a structured approach to solving the CIM / Message Format problem could be considered... by Jean-Jacques Dubray

I have developed a free Eclipse plugin using Xtext to manage the definition of message formats in relation to a Common Information Model. The metamodel behind the plugin is capable of versioning the data model independently of the message formats. Message formats are expressed as "projections" of the CIM such that no one has to deal with extraneous attributes. Projections of course support a different multiplicity for Queries (optional parameters) and Commands (required parameters).

Swagger and API Blueprint are on the roadmap as well.
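
The projection idea can be illustrated with a rough sketch. It is not the plugin's metamodel or syntax; the `CUSTOMER_CIM` dictionary and the `project` and `validate` helpers are hypothetical names, and versioning is left out.

```python
from typing import Any, Dict, Iterable

# Hypothetical CIM entity: attribute name -> expected Python type.
CUSTOMER_CIM: Dict[str, type] = {
    "customer_id": str,
    "name": str,
    "email": str,
    "vat_number": str,
}


def project(cim: Dict[str, type], attributes: Iterable[str],
            required: bool) -> Dict[str, Dict[str, Any]]:
    """Build a message format from a subset of the CIM attributes."""
    message_format = {}
    for name in attributes:
        if name not in cim:
            raise KeyError(f"{name!r} is not defined in the CIM")
        message_format[name] = {"type": cim[name], "required": required}
    return message_format


def validate(message: Dict[str, Any],
             message_format: Dict[str, Dict[str, Any]]) -> bool:
    """Accept a message only if it has no extraneous attributes,
    no missing required attributes and correctly typed values."""
    if any(key not in message_format for key in message):
        return False
    for name, rule in message_format.items():
        if name not in message:
            if rule["required"]:
                return False
            continue
        if not isinstance(message[name], rule["type"]):
            return False
    return True


# Command: every projected attribute is required.
create_customer = project(CUSTOMER_CIM, ["customer_id", "name"], required=True)
# Query: every projected attribute is an optional filter.
find_customers = project(CUSTOMER_CIM, ["name", "email"], required=False)
```

Each message format sees only the attributes it projects, so a command or query never has to carry the rest of the model.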

Re: Perhaps a structured approach to solving the CIM / Message Format probl by Faisal Waris

Very cool. I will check it out.

And I totally agree with him... by Jérôme Avoustin

What an awful idea to model a concept in a "one for all" manner... This inevitably leads to code complexity when functional complexity grows...

Re: Perhaps a structured approach to solving the CIM / Message Format probl by Jean-Jacques Dubray

Yes, I believe this is exactly what you were describing above, with full versioning support, which is key to making the tool practical to use; otherwise we would be back in dependency hell, as with schema languages.

Valid points but some gaps to address by Steve Carter

As others have mentioned, it is not black and white, and there are assumptions built into the ideas presented. You can use bounded contexts and write to a shared table set, but you still have to manage concurrency. Properly implemented, the CDM is only known to the component orchestrating the data conversations between the systems and components. Therefore the bounded context, or optimal data representation, is maintained for each system. Concurrency is easily managed via the middleware that orchestrates the conversations. The question you have to answer is whether your organization has reached a level of sophistication to support such an approach. You will always have debates over common elements shared across departments, but if you have rules such as "a shared element is owned by one department", you can usually uncover workflows that determine how the element is updated. CDM or CIM (common information model) is not a panacea and takes considerable work to implement and maintain. It can provide for very powerful integration of systems that enables an organization to work with real-time data and make better decisions.

Re: Valid points but some gaps to address by Jean-Jacques Dubray

Steve,

I fully agree with your statement. I would say at a minimum, a CDM must define for each data element which Systems are:
a) system of truth
b) system of record

otherwise consistency will suffer greatly (as the organization grows and the footprint of information systems does too).

Without proper tooling, I would not encourage trying to connect integration with CDM / CIM (say at the message format level).
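
A per-element ownership registry of this kind might be sketched as follows. The element names, system names and the rule that only the system of truth may update an element are assumptions for illustration; the thread does not define how the two roles differ.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ElementOwnership:
    system_of_truth: str
    system_of_record: str


# Hypothetical registry: data element -> owning systems.
OWNERSHIP: Dict[str, ElementOwnership] = {
    "customer.email": ElementOwnership("crm", "crm"),
    "customer.credit_limit": ElementOwnership("billing", "erp"),
}


def may_update(system: str, element: str) -> bool:
    """Assumed rule: only the system of truth for an element may change it."""
    ownership = OWNERSHIP.get(element)
    return ownership is not None and ownership.system_of_truth == system
```

Integration middleware could consult such a registry before applying an update, rejecting writes to elements that have no declared owner.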

Depends how you do it and who is managing it. by Om Soni

I drive the data delivery strategy for a large US company. Many presented the same view when I proposed the canonical data model, but we are running it very successfully. You don't need to push the model on other teams and force-fit it for every use case or database. I agree there should be no centralized data modelling at all. We built our Enterprise Data Cache, Semantic Layer and Data APIs strategy on top of the CDM and we have done it very successfully.
