Author Q&A: Patterns of Information Management
Mandy Chessell and Harald Smith have written the book Patterns of Information Management in which they present approaches to structuring and managing information assets based on their experiences across a range of customers. They use a Patterns approach to identify ways of addressing information problems which are common to many of the organisations they have worked with. InfoQ talked to the authors of the book.
InfoQ: Please briefly introduce yourselves to the InfoQ audience.
I’m Mandy Chessell, an IBM Distinguished Engineer, Master Inventor and member of the IBM Academy of Technology Leadership Team. My current role is the Chief Architect for InfoSphere Solutions in the IBM Information Management CTO office. I lead the design of reference architectures for different industries and solutions, including IBM’s Next Best Action solution, the Big Data Lake solution and Information Virtualization. In addition to my technical responsibilities, I’m involved in initiatives designed to enhance technical vitality within the IBM technical community, including mentoring, serving on technical career development and promotion boards, and Women in Technology initiatives. Outside of IBM, I’m a Fellow of the Royal Academy of Engineering and a visiting professor at the University of Sheffield, UK. In 2001 I was the first woman to be awarded a Silver Medal by the Royal Academy of Engineering and this year received an Honorary Doctorate of Science from Plymouth University.
I’m Harald Smith, a Software Architect for IBM's InfoSphere and Information Server products. I’ve had a diverse career specializing in information quality, integration, and governance products and solutions with a long-time interest in patterns and design practices. I write extensively to help our customers utilize our IBM product line, particularly methodology, best practices, and capabilities, and was recently recognized as an IBM developerWorks Contributing Author. In the course of 30 years, my work has spanned software product management, information technology, consulting services, technical support, system auditing, and business process re-engineering to provide a broad view of the many cross-industry challenges with information.
InfoQ: Why did you write this book? What problem did you set out to solve?
We’ve worked with many clients over the last 10+ years and we found that most faced similar challenges in managing their information. These clients could find books on how to store their information (data warehouses, for instance), or how to cleanse their information (data quality strategies), but it’s been very difficult to find anything that took a comprehensive approach to the information moving through the many information systems and applications in an organization.
So, the journey to writing the book began with the concept of an “Information Supply Chain”, a term used by IBM’s marketing organization to describe the flow of information from varied sources through operational systems, into data warehouses and marts, and finally reaching end consumers through reporting and business intelligence solutions. The concept is similar to that of a manufacturing supply chain producing products from raw materials. In this case, the purpose of an “Information Supply Chain” is to produce certain “information products” (e.g. documents, reports, web pages, or stored collections of information) from the information supplied to it. We then put an architectural perspective on this supply chain metaphor and, in doing so, began to see patterns emerge that we felt architects in our clients’ organizations could take advantage of.
InfoQ: Who is the intended audience for the book?
The primary audience is the enterprise architects, information architects, and solution architects within an organization who are tasked with determining how information can be shared, integrated, synchronized, and managed across diverse information systems and applications. These are the individuals who must identify the multiple forces impacting effective and optimal use of information intended to enhance decision-making, add value, and reduce costs or risks.
InfoQ: What is an “Information Centric Organization” and why does it matter?
An “Information Centric Organization” drives its business on high quality and timely information that is aligned to its mission and goals. We have entered an era where information is now a key competitive resource, and those organizations focused on what they can do and learn with the information they have at hand are finding far greater success than those who do not have this focus. By making the management of information a strategic priority, and by developing systems and practices that nurture and exploit information to maximum effect, an “Information Centric Organization” can exploit analytics to spot new revenue opportunities, drive product innovation, identify patterns to reduce fraud, and mitigate risk.
InfoQ: Why did you choose to use a “patterns” approach – what is a pattern and why are they useful?
There is a lot of complexity to Information Management in any organization, and not just large enterprises. This complexity comes in two forms:
- Inherent complexity – the problems faced are complex (such as fraud detection, logistics, and customer behavior) and require platforms and solutions built to handle multiple information channels to analyze and overcome this complexity.
- Induced complexity – there are inconsistencies and gaps in the information supply chains in any organization due to factors such as separate and distinct lines of business, and architectures and applications that come from multiple vendors or organizational mergers.
Patterns are one approach to help address both forms of complexity. Information management technology offers many choices to the architect on how they could implement an information supply chain. Each choice is optimized for a specific type of use case. Patterns enable us to compare and contrast the different approaches and where they create the greatest benefit. Other characteristics of patterns include:
- Patterns are written in a natural human language so no special tooling is required.
- Each pattern draws together a wide range of information, working at multiple levels of understanding.
- A pattern description lays out choices and how to make them, covering both the pros and cons of taking a specific approach.
- Each pattern description provides a worked example and references to known uses, as well as links to alternative patterns.
- The patterns link together into a comprehensive description of the topic area.
- A pattern describes the emergent properties of using a particular approach. These are the properties that emerge when specific types of components are integrated in a particular configuration and typically include non-functional properties such as a change in latency, reliability and consistency.
This last point is particularly important as many of the design decisions related to information supply chains are concerned with emergent properties. A pattern language is well suited to explain the choices, trade-offs and resulting benefits and liabilities each design choice brings.
InfoQ: The book weaves the description of the various patterns around a case study – how does this help the reader understand which pattern to select when and how to use it?
People naturally relate to real-world examples, particularly ones that they can visualize such as an order fulfillment process. We’ve all been customers at stores or online, and have selected and ordered particular products. And often we’ve felt the frustration of disjointed systems where you are recognized as a customer at one place, but not another (or your address is only correct in one system). At the same time, even in these basic examples for a small-scale organization, you can start to understand the trade-offs required to try to work with the information at hand. This allows a natural introduction to the different patterns, including their icons and the forces involved, so that you can make reasoned decisions with your own information challenges.
InfoQ: What are the broad categories of pattern which you cover in the book?
There are seven general categories in which we place the patterns that are illustrated in the figure below.
At the top layer is the Information Organization, comprising the characteristics of an Information Centric Organization and the people of the organization who use the information.
The Information Architecture is the middle layer. These are the pattern groups that classify, design and document which information is needed, how it is used, and where it is located. This information architecture forms a living and ever-changing description of the information used by the organization.
At the bottom layer are the components of Information Management. These components address patterns for: Information-at-Rest, Information in Motion, Information Processing, and Information Protection.
Finally, the patterns are brought together in varied combinations reflecting the multiple goals and forces that an organization is addressing to form Information Solutions. These Information Solutions help an organization to effectively manage their information, achieve new insights, and ultimately drive new value for the organization.
InfoQ: Could you give us a few specific examples of a problem a reader may come across and how they would use the patterns in the book to help them solve the problem?
Let’s take the case of an organization that has a data mart of historical information about its customers and the orders they make. The sales teams want to run reports on the data mart that involve issuing queries on the data to summarize which products are being bought by customers from each region. They need consistent, reliable response times from the data mart. The analytics team also wants to use the data mart to perform data mining on the data in order to create new analytics models for the organization. The data mining takes a lot of processing power at irregular times during the day, which impacts the sales team. How does the IT team reconcile the two usage patterns?
They use the Sandbox Provisioning pattern to create a copy of the data for the analytics team whenever they are performing data mining for new analytics. This sandbox is structured in a way that optimizes analytics processing and is running on a different machine. When the mining process is complete, the sandbox is deleted since it was only created for a specific purpose.
This is a simple example, but reflects a common situation that architects need to resolve. How do you minimize the number of copies of data that are created by having different groups of users share a centralized repository while ensuring that the processing that each group is performing does not impact other groups? Sometimes when the processing patterns conflict it is necessary to create a separate copy of the information for one or more of the groups.
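As a rough illustration of the sandbox idea described above, the following Python sketch copies a shared data mart into a temporary database, lets the analytics work run against the copy, and discards the copy afterwards. It is only a minimal sketch under stated assumptions: the book does not prescribe an implementation, the table layout and the names `build_data_mart`, `provision_sandbox` and `total_quantity_by_product` are hypothetical, and an in-memory SQLite database stands in for a sandbox that would normally run on a different machine.

```python
import sqlite3
from contextlib import contextmanager


def build_data_mart() -> sqlite3.Connection:
    """Create a tiny stand-in for the shared data mart (illustrative data only)."""
    mart = sqlite3.connect(":memory:")
    mart.execute("CREATE TABLE orders (region TEXT, product TEXT, quantity INTEGER)")
    mart.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [("north", "widget", 3), ("north", "gadget", 1), ("south", "widget", 5)],
    )
    mart.commit()
    return mart


@contextmanager
def provision_sandbox(mart: sqlite3.Connection):
    """Copy the mart into a separate, temporary database and discard it afterwards."""
    sandbox = sqlite3.connect(":memory:")
    try:
        mart.backup(sandbox)  # snapshot copy; the mart itself is not modified
        # Restructure the copy for analytics, e.g. add an index the mart does not need.
        sandbox.execute("CREATE INDEX idx_orders_product ON orders (product)")
        yield sandbox
    finally:
        sandbox.close()  # closing an in-memory database deletes it


def total_quantity_by_product(db: sqlite3.Connection) -> dict:
    """A reporting-style query, usable against either the mart or a sandbox."""
    rows = db.execute(
        "SELECT product, SUM(quantity) FROM orders GROUP BY product ORDER BY product"
    )
    return dict(rows.fetchall())


if __name__ == "__main__":
    mart = build_data_mart()
    with provision_sandbox(mart) as sandbox:
        # The analytics team can change or reshape the copy freely.
        sandbox.execute("DELETE FROM orders WHERE region = 'south'")
        print(total_quantity_by_product(sandbox))
    # The sales team's view of the mart is unaffected by the sandbox work.
    print(total_quantity_by_product(mart))
```

The context manager ties the lifetime of the copy to the analytics task, which mirrors the point that the sandbox exists only for a specific purpose.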
The cost of this copy depends on the amount of data, how frequently it is changing, the technologies available to generate the copy, and how much transformation of the data structure is required to support the intended users. These factors are the forces that come into play within the provisioning pattern. Once you have selected the appropriate provisioning approach, then you have to decide which processing pattern to use within the provisioning logic. Will a simple Information Replication Process pattern that copies data without transforming it work? If transformation is needed, then a more sophisticated process such as the Information Deployment Process pattern must be selected.
It’s a very iterative approach, and at each level, the pattern descriptions explain the trade-offs that each approach requires.
InfoQ: Will applying the concepts from the book drive the readers towards a particular technology solution? If so, why? If not, why not?
The concepts described in a pattern can strongly suggest a particular type of technology as optimal for a given set of desired business outcomes. At the same time, they can also take into account the technology constraints in place in your organization and treat those as forces that point you in a different direction.
For instance, technically it might be optimal to deliver data to an application as messages using an Information Queuing Process and let the application handle any quality issues with the data. That ensures fast delivery and high availability of the data. However, the application may have no capability to handle poor quality data, and bad values could cause the application to crash. In that case, the constraints in the end point of this Information Supply Chain limit the viability of the solution. Instead, an Information Deployment Process such as an Extract, Transform, and Load (ETL) solution that can transform the data en route is chosen – some latency is added, but the solution can still provide fast delivery and high availability as well as cleansing the data.
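The trade-off in this example can be sketched in a few lines of Python. This is a hedged illustration rather than anything taken from the book: `queue_delivery`, `deployment_delivery` and `cleanse` are hypothetical names, the record layout is invented, and dropping records that cannot be repaired is just one possible cleansing policy.

```python
from typing import Iterable, Iterator, Optional

Record = dict


def queue_delivery(records: Iterable[Record]) -> Iterator[Record]:
    """Queue-style delivery: pass each record on unchanged, as fast as possible."""
    yield from records


def cleanse(record: Record) -> Optional[Record]:
    """Return a cleaned copy of the record, or None if it cannot be repaired."""
    name = str(record.get("name", "")).strip()
    try:
        quantity = int(record.get("quantity"))
    except (TypeError, ValueError):
        return None  # missing or non-numeric quantity
    if not name or quantity < 0:
        return None
    return {"name": name.title(), "quantity": quantity}


def deployment_delivery(records: Iterable[Record]) -> Iterator[Record]:
    """ETL-style delivery: transform en route and only pass on clean records."""
    for record in records:
        cleaned = cleanse(record)
        if cleaned is not None:
            yield cleaned


if __name__ == "__main__":
    incoming = [
        {"name": " alice ", "quantity": "2"},
        {"name": "bob", "quantity": "n/a"},
        {"name": "", "quantity": 1},
    ]
    # The receiving application would see the bad values with queue-style delivery.
    print(list(queue_delivery(incoming)))
    # With the transformation step, only records it can handle safely arrive.
    print(list(deployment_delivery(incoming)))
```

The extra work done per record in `deployment_delivery` is where the added latency mentioned above comes from.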
About the Book Authors
Mandy Chessell joined IBM in 1987. She is an IBM Distinguished Engineer, Master Inventor and member of the IBM Academy of Technology Leadership Team. Her current role is the Chief Architect for InfoSphere Solutions in the IBM Information Management CTO office. She leads the design of common information management patterns for different industries and solutions. This includes the Next Best Action solution and the strategy for Information Virtualization. In 2001 she was the first woman to be awarded a Silver Medal by the Royal Academy of Engineering and in 2000 she was one of the "TR100" young innovators identified by MIT's Technology Review magazine. In 2006 she won a British Female Innovators and Inventors Network (BFIIN) "Building Capability" award for her work developing innovative people and the BlackBerry "2006 Best Woman in Technology - Corporate Sector" award. More recently she was granted an honorary fellowship of the Institution for Engineering Designers (IED), she won the "2012 Cisco everywoman Innovator of the Year" and in 2013, Plymouth University awarded Mandy an Honorary Doctor of Science.
Harald Smith is currently a Software Architect with IBM, with a diverse 30-year career specializing in information quality, integration, and governance products and solutions and a long-time interest in patterns and design practices. His work has spanned software product management, information technology, consulting services, technical support, system auditing, and business process re-engineering, and includes 4 issued patents worldwide. He writes extensively, including his current blog "Journeys in the Information Landscape", particularly on methodology, best practices, and capabilities with IBM products and the topics of Big Data and Information Governance, and is an IBM developerWorks Contributing Author. Harald is also an IBM Certified Solution Developer in Information Management.