Building a Data Maturity Model for Data Governance
On the second day, the concept of a natural progression through the data maturity model is introduced:
What is the lowest maturity level of data in your dataset? Your answer could be something along the lines of, "Unreviewed, unmodeled, no metadata, have no idea what it is", "It is in our corporate data model but no information other than field name, not in metadata", or "It's in our model, we have some old definition that we can no longer consider reliable".
This may take some time to do, but it is important to establish this baseline. The main things to capture are:
1. Is it in your datamodel?
2. Do you have metadata for it?
3. Do you trust the information in the metadata, if it is in there?
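The three baseline questions above can be recorded per field. The following is a minimal sketch, not the blog author's method: the `FieldBaseline` class, its attribute names, and the baseline labels are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldBaseline:
    """Answers to the three baseline questions for one data field (hypothetical structure)."""
    name: str
    in_data_model: bool      # 1. Is it in your data model?
    has_metadata: bool       # 2. Do you have metadata for it?
    metadata_trusted: bool   # 3. Do you trust that metadata?


def describe_baseline(field: FieldBaseline) -> str:
    """Return a short label for the field's current state.

    Assumption: the questions are checked in order, so a field that is not
    in the data model is reported as unmodeled regardless of its metadata.
    """
    if not field.in_data_model:
        return "unmodeled"
    if not field.has_metadata:
        return "modeled, no metadata"
    if not field.metadata_trusted:
        return "modeled, metadata unreliable"
    return "modeled, metadata trusted"
```

Collecting these labels across the in-scope fields gives a rough picture of the states the data is already in, which is the baseline the later steps build on.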
The third blog entry flips the focus from the lowest maturity level to the highest, and then attempts to bridge the gap by using consistent terminology:
What we are looking for here is if there are any natural progressions you can see in the data as it stands today. Starting from your lowest level, what is the next step up in maturity that you already see? If your first level was "Unmodeled, No metadata, no idea what it is", the next step you see in your data could be, "It's in our datamodel but we have no supporting information on it"... Rather than creating a data maturity model and forcing your data to fit into it, we are letting the various stages the data is already in define the maturity path.
In the fourth blog entry, working templates for a maturity model are provided, and both entry 4 and entry 5 walk you through customizing the maturity model appropriately for your organization.
In essence, I want you to take what you did on Day 1 and write down the complete opposite. This should help you identify the highest maturity level. So if your lowest level is "Unmodeled, Unreviewed, No metadata" then the highest optimum level would be "In Datamodel, Reviewed and Governed by the Data Governance Council, Metadata verified and up to date". What this does is keep your maturity model framed around the same items. If you talk about your data model in your lowest level, you should talk about it in every other level, including the highest.
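One way to keep every level framed around the same items is to write each level against a fixed set of dimensions and check that none is left out. This is a hedged sketch: the dimension keys (`data_model`, `review`, `metadata`) and the helper `missing_dimensions` are hypothetical; only the level wording comes from the quoted example.

```python
# Dimensions that every maturity level should talk about (assumed names).
DIMENSIONS = ("data_model", "review", "metadata")

# Lowest and highest levels, using the wording from the quoted example.
LOWEST = {
    "data_model": "Unmodeled",
    "review": "Unreviewed",
    "metadata": "No metadata",
}
HIGHEST = {
    "data_model": "In Datamodel",
    "review": "Reviewed and Governed by the Data Governance Council",
    "metadata": "Metadata verified and up to date",
}


def missing_dimensions(level: dict) -> list:
    """Return the dimensions a level definition fails to mention, in DIMENSIONS order."""
    return [dim for dim in DIMENSIONS if dim not in level]
```

Any intermediate level added later can be run through the same check, so a level that forgets to mention, say, metadata is caught before the model is used.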
First things first, get the following out: Your data governance maturity model template (filled in) and the in-scope data for your program. What you are going to do is take a sample of your data and make sure that you can easily find the position of that data on the maturity model. If I were doing this again, I’d randomly pull out about 40 fields and go one-by-one through them. I’d look at the field, check the model, check if there is metadata, etc., and see if it falls into a level. You need to make sure that all of the data fields fit somewhere on the maturity model… if it is questionable you may not have defined your levels clearly enough. If it falls right between two levels, you may need to define a new level to account for the difference, or incorporate the characteristics into one of your existing levels.
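The sampling check described above can be sketched in a few lines. This is an illustration under assumptions, not the author's procedure: fields are represented as plain dictionaries, each level is a hypothetical predicate function, and the names `sample_fields`, `matching_levels`, and `audit` are invented here.

```python
import random


def sample_fields(fields, k=40, seed=None):
    """Randomly pick up to k fields without replacement."""
    pool = list(fields)
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))


def matching_levels(field, levels):
    """Return the names of all levels whose predicate accepts the field."""
    return [name for name, accepts in levels.items() if accepts(field)]


def audit(fields, levels):
    """Map field name -> matched levels for every field that does not land
    on exactly one level (no match, or a match on several levels)."""
    problems = {}
    for field in fields:
        matches = matching_levels(field, levels)
        if len(matches) != 1:
            problems[field["name"]] = matches
    return problems
```

A field that appears in the result with an empty list fits no level, which suggests a missing or unclear level; a field listed with two or more levels suggests that the level definitions overlap.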
Coping with "Model Natural Progression" using the technology of the day.
Modern technologies are great. Widespread adoption often produces parts that can be used independently of the suggested framework.
The "Natural Progression" that you speak of might be akin to how systems morphed from RPC-style activities to synchronous/asynchronous message-oriented activities.
In a B2B sense, WSDL is far too heavy to manage when the same infrastructure provides XML Schema marshallers that can themselves carry the interface definitions.
In effect you are transmitting a reference into a HashMap, where there is no requirement for the client to change any explicit endpoint reference.
Additionally, the message-oriented version of a function call can relay its own system-dependent hash index as a performance option.
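One possible reading of the hash-map idea is a single endpoint that looks up a handler by the message's root element name, so clients never change the endpoint they address. The sketch below is an assumption-laden illustration of that reading: the message types `Ping` and `GetCustomer`, the handlers, and the `dispatch` function are all hypothetical, and namespaces are ignored for brevity.

```python
import xml.etree.ElementTree as ET


def handle_ping(root):
    """Hypothetical handler for a <Ping/> message."""
    return "pong"


def handle_get_customer(root):
    """Hypothetical handler that reads the <id> child of <GetCustomer>."""
    return f"customer {root.findtext('id')}"


# The hash map: message type (root element name) -> handler.
HANDLERS = {
    "Ping": handle_ping,
    "GetCustomer": handle_get_customer,
}


def dispatch(xml_text):
    """Parse one message and route it by its root element name.

    Assumption: messages carry no XML namespace; with a namespace,
    ElementTree reports the tag as '{namespace}Name' and the keys
    of HANDLERS would need to match that form.
    """
    root = ET.fromstring(xml_text)
    handler = HANDLERS.get(root.tag)
    if handler is None:
        raise KeyError(f"no handler registered for message type {root.tag!r}")
    return handler(root)
```

Adding a new operation then means registering one more entry in the map and publishing its schema, rather than regenerating a service description for the whole endpoint.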
It is far more manageable to work at the XML Schema level than the WSDL level. All sorts of enterprise problems disappear when you approach the problem with XML Schema itself.
In fact, this model is perfect for REST and implementable for Web Services.
Still, building arrays of schemas to transmit and receive is not as fast as binary mechanisms doing exactly the same activities. So mapping this methodology between XML Schema and existing binary IDLs would be a good gap to fill.
Using ORM to guide data maturity models.
InfoQ Sep 01, 2015