
Posted by Sourav Mazumder on Apr 21, 2010
As an Enterprise Architect, I'm always in search of promising new concepts and ideas that can potentially help my enterprise customers across different industry verticals. With that quest in mind, I had been following the NoSQL space for a while, even before the term was coined (or mis-coined?). Google laid the first brick by disputing the popular belief that the RDBMS is a silver bullet when it published its BigTable architecture, which was followed by Amazon's paper on Dynamo. Over the last year or so we have seen huge NoSQL momentum, with an explosion of more than 25 products and solutions in this space and increasing mindshare across different corners of the industry. Against that backdrop, I recently decided to take a deep dive into the subject to evaluate how exactly my clients can benefit from the NoSQL movement. More than that, I wanted to find out whether this is the right time for enterprises to give serious thought to adopting it.
Like many others who follow this space, I do not like the sense of opposition to SQL inherent in the term NoSQL. Nor do I like the current improvisation of the name, 'Not Only SQL'. To me, what we are talking about here is not whether to use SQL or not. (On the contrary, one may still decide to use a SQL-like query interface, without support for joins etc., to interact with these databases, just to keep development scalable and maintainable with existing skills.) This movement is rather about figuring out the other efficient options for storing and retrieving data, instead of blindly taking the RDBMS approach as the de facto choice for anything and everything. Hence, to me, 'Non Relational Databases' is a better name to summarize the idea.
Whatever the name, the scope of 'Non Relational Databases' is a little open (and negation oriented), with an implicit 'catch all' connotation. That in turn leaves people (especially enterprise decision makers) confused about what is included and what is not and, more importantly, why it makes sense for them.
Keeping that in mind, I try to capture the spirit of 'Non Relational Databases' through the characteristics mentioned below.
The 'Non Relational Databases' are the ones which
The variations around these four characteristics (Logical Data Model, Data Distribution Model, Data Persistence and Interfaces) of 'Non Relational Databases' are very well covered in some of the recent articles widely available on the Internet. So instead of detailing them, I summarize the key aspects with some examples for quick reference –
Interfaces – REST (HBase, CouchDB, Riak etc.), MapReduce (HBase, CouchDB, MongoDB, Hypertable etc.), Get/Put (Voldemort, Scalaris etc.), Thrift (HBase, Hypertable, Cassandra etc.), Language-Specific APIs (MongoDB).
Logical Data Models – Key-Value oriented (Voldemort, Dynomite etc.), Column Family oriented (BigTable, HBase, Hypertable etc.), Document oriented (CouchDB, MongoDB etc.), Graph oriented (Neo4j, InfoGrid etc.).
Data Distribution Model – Consistency and Availability (HBase, Hypertable, MongoDB etc.), Availability and Partition Tolerance (Cassandra etc.). Consistency and Partition Tolerance is a combination where the Availability of some of the non-quorum nodes is compromised. Interestingly, none of the 'Non Relational Databases' today supports this combination.
Data Persistence – Memory based (e.g. Redis, Scalaris, Terrastore), Disk based (e.g. MongoDB, Riak), a combination of both Memory and Disk (e.g. HBase, Hypertable, Cassandra). The type of storage gives a good idea of what type of use cases the solution can cater to. However, in most cases people find that the combination-based solution is the best one. Such solutions deliver high performance through an in-memory data store and also ensure durability by writing the data to disk after enough writes have happened.
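The combined memory-and-disk approach can be sketched roughly as follows. This is a minimal illustration in Python, not how any of the products named above implements persistence; the class name `BufferedStore`, the `flush_threshold` parameter and the JSON-lines log file are all assumptions made for the example.

```python
import json
import os
import tempfile


class BufferedStore:
    """Hypothetical store: serves reads from memory, flushes writes to disk in batches."""

    def __init__(self, path, flush_threshold=3):
        self.path = path                        # append-only log file (assumed format: JSON lines)
        self.flush_threshold = flush_threshold  # number of buffered keys that triggers a flush
        self.buffer = {}                        # recent writes not yet on disk
        self.flushed = {}                       # in-memory copy of data already written to disk

    def put(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= self.flush_threshold:
            self.flush()

    def get(self, key):
        # Recent writes win over older, already flushed values.
        if key in self.buffer:
            return self.buffer[key]
        return self.flushed.get(key)

    def flush(self):
        # Append the buffered writes to the log and force them to disk.
        with open(self.path, "a", encoding="utf-8") as log:
            for key, value in self.buffer.items():
                log.write(json.dumps({"key": key, "value": value}) + "\n")
            log.flush()
            os.fsync(log.fileno())
        self.flushed.update(self.buffer)
        self.buffer.clear()


# Usage: two writes stay in memory, the third triggers a flush to disk.
log_path = os.path.join(tempfile.mkdtemp(), "store.log")
store = BufferedStore(log_path, flush_threshold=3)
store.put("user:1", {"name": "Ana"})
store.put("user:2", {"name": "Raj"})
on_disk_before = os.path.exists(log_path)   # nothing has been written yet
store.put("user:3", {"name": "Li"})
on_disk_after = os.path.exists(log_path)    # the log file now exists
```

The trade-off is visible even in this small sketch: writes that are still in the buffer would be lost in a crash, so the flush threshold has to be tuned against the durability the use case needs.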
In today's enterprises, not all use cases lend themselves intuitively to an RDBMS, nor do they all need the strictness of the ACID properties (especially Consistency and Isolation). Gone are the days of the 80s and 90s, when most of the data stored in an organization's databases was structured, had to be generated and accessed in a controlled manner, and consisted of 'records' of business transactions. Undoubtedly that type of data is still there, will continue to be there, and should always be modeled, stored and accessed using an RDBMS. But what about the large volume of uncontrolled, unstructured, information-oriented data that has exploded in enterprises over the last 15 years with the advent of the web, digital commerce, social computing etc.? Enterprises really don't need an RDBMS to store and retrieve it, as the core characteristics of an RDBMS do not fit the nature and usage of this data.
The above figure summarizes emerging patterns in Information Management in today's web-centric enterprises. 'Non Relational Databases' are a better choice than RDBMS solutions for handling these trends, given their support for unstructured data, horizontal scalability through partitioning, high availability etc.
Here are some examples of use cases supporting the point –
Log Mining – Server logs, application logs and user activity logs are generated on multiple nodes of a cluster. For production problem solving, log mining tools that can access logs across servers, relate them and analyze them are handy. A custom solution for this can be built easily using 'Non Relational Databases'.
Social Computing Insight – Many enterprises today give their users (internal users, customers, partners) the ability to do social computing through message forums, blogs etc. They are finding that mining this unstructured data is of utmost importance for gauging user mindshare and further improving their services. A 'Non Relational Database' is a good fit for addressing this need.
External Data Feed Integration – In many cases enterprises need to consume data coming from their partners. Even after a number of discussions and negotiations, enterprises have little control over the format of the data coming to them. There are also many situations where those formats change very frequently based on changes in the partners' business. A 'Non Relational Database' can be used very successfully to solve this issue while developing or customizing an ETL solution.
High Volume EAI – Most enterprises have heavy traffic flowing through their EAI systems (either product based or custom developed). The messages flowing through the EAI typically need to be persisted for reliability and audit purposes. Again, 'Non Relational Databases' can be a good fit as the underlying data store for this scenario, given the variation in data structure between source and target systems as well as the volume in question.
Front end order processing systems – Given the explosion of digital commerce, the volume of orders, applications and service requests flowing through different channels to the systems of retailers, bankers, insurance providers, entertainment service providers, logistics providers etc. is enormous. Also, owing to the restrictions and behavior patterns associated with different channels, the structure in which the information is captured is typically a little different in each case and needs different rules imposed. On top of that, most of this request data doesn't need immediate processing and reconciliation at the back end. Rather, what is needed is that these requests are captured without interruption whenever end users want to submit them, from anywhere in the world. Later, a reconciliation system typically updates the source-of-truth back end systems and informs the end user of the order status. This scenario lends itself perfectly to the use of 'Non Relational Databases' for initially storing the inputs from end users, given the high volume, the differences in input data structure and the acceptability of 'Eventual Consistency' during reconciliation.
Enterprise Content Management Service – Content management is now used enterprise wide across different functional groups (Sales, Marketing, Retail, HR) for various purposes. Most of the time, the challenge enterprises face is bringing together the requirements of different groups on a common content management service platform, given the differences in metadata structure. 'Non Relational Databases' are a good fit for solving this problem as well.
Merger and Acquisition – Enterprises face huge challenges during M&A as they need to consolidate systems catering to the same functions. 'Non Relational Databases' can be used to solve this problem, either to quickly put together a temporary common data store or even to architect the future data store, which can accommodate the structures of the existing common applications of the merging companies.
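To make the front end order processing case more concrete, here is a minimal Python sketch of the "capture now, reconcile later" pattern. Everything in it is hypothetical: the `OrderCapture` class, the channel names and the payload fields are invented for illustration, and the back end system of record is simulated with a plain dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class OrderCapture:
    """Accepts orders of any shape immediately; reconciles with the back end later."""
    pending: list = field(default_factory=list)

    def capture(self, channel, payload):
        # No fixed schema: each channel may send different fields.
        record = {"channel": channel, "payload": dict(payload), "status": "captured"}
        self.pending.append(record)
        return record

    def reconcile(self, backend):
        # Push captured orders to the system of record (a plain dict here)
        # and mark them as confirmed so the end user can be notified.
        for record in self.pending:
            order_id = len(backend) + 1
            backend[order_id] = record["payload"]
            record["status"] = "confirmed"
            record["order_id"] = order_id
        done, self.pending = self.pending, []
        return done


# Usage: two channels submit differently shaped orders; reconciliation runs later.
front_end = OrderCapture()
front_end.capture("web", {"sku": "A-100", "qty": 2, "coupon": "SPRING"})
front_end.capture("mobile", {"sku": "B-200", "qty": 1, "device": "phone"})

system_of_record = {}
confirmed = front_end.reconcile(system_of_record)
```

Between `capture` and `reconcile` the back end does not yet know about the orders, which is the window of eventual consistency the use case accepts.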
But how exactly can we articulate the business benefits of using 'Non Relational Databases' over traditional RDBMS solutions? Following are some key benefits, which can be drawn from the core characteristics of 'Non Relational Databases' (as discussed in the previous section), along the lines of the core parameters of any enterprise IT decision – Cost Reduction, Better Turnaround Time and Superior Quality.
'Non Relational Databases' can help in creating Business Agility in two basic ways.
In today's Enterprise IT, the quality of applications is primarily decided by end user satisfaction. 'Non Relational Databases' can help achieve this by addressing the following concerns of end users, which are the most frequent and the most difficult to handle.
In today's competitive marketplace, where enterprise IT expenditure is scrutinized every now and then, achieving the right quality at the right cost is the mantra. 'Non Relational Databases' outperform conventional databases in that area to a considerable extent, especially when the volume of data to be stored and handled is high.
Irrespective of all these long-term benefits, enterprises surely face challenges before they can embrace 'Non Relational Databases'.
Apart from the high-level resistance due to the existing mindset and lack of confidence, the top tactical challenges I see today are –
Though it is easy to prove theoretically that not all enterprise data needs a relational, ACID-based system, the years of bonding between the RDBMS and enterprises make it difficult to decide which data can be relaxed towards non-relational solutions. Most of the time IT managers (and other ground-level people with core bottom-line responsibility for the applications) don't have a clear idea of what they are going to lose, and that apprehension makes them averse to moving away from the RDBMS. Data is the most valuable asset of enterprise IT. So the ability to decide to manage it with a solution that is not that well understood or widely used needs a different type of mindset as well as strong support (and a push) from senior management.
The next biggest challenge is to identify the right product/tool to use as the provider of 'Non Relational Databases'. As mentioned before, there are more than 25 different products/solutions available in the industry today, with different characteristics across the four dimensions. Since every product differs in these four dimensions, it is typically very difficult to select one product that addresses all needs. Sometimes this has even led to the use of multiple types of non-relational database across different groups of an enterprise, and eventually people have turned back towards the RDBMS for the sheer need of standardization.
This thought essentially stems from the previous one. If an organization needs to use multiple non-relational database solutions (because no single one fits), ensuring economy of scale in terms of skills (developers, administrators, support personnel), infrastructure (hardware cost, software licensing cost, support cost, consulting cost) and artifacts (common components and services) is a big question. When this aspect is compared with a traditional RDBMS solution, the issue looks really significant, as most of the time organizations run their data stores in a shared service mode.
Given the formative state of the 'Non Relational Databases' world, it is natural to anticipate that in the coming years there will be many changes in this space in terms of vendor consolidation, feature advancement and standardization. So the better strategy for an enterprise would be not to bet on a particular product/solution available today, so that it can move easily to a better, proven product in the future. Given that the current non-relational products mostly work in a proprietary way, portability becomes an important issue to consider before IT decision makers start venturing into the 'Non Relational Databases' space, for the sheer need of protecting their current investment.
Not many of the 'Non Relational Databases' today have a support offering in place through external organizations. Even those that have one cannot be compared with big names like Oracle, IBM or Microsoft. Support around data recovery, backup and ad hoc data fixing in particular is always a big question in the minds of enterprise decision makers, as many of the 'Non Relational Databases' don't provide a robust and easy-to-use mechanism for these problems.
In comparison to the Big Iron RDBMS solutions, the 'Non Relational Databases' typically provide very little data on their performance and scalability characteristics. I'm yet to see any benchmark figures from TPC or equivalent bodies. This leaves enterprise decision makers in a 'no clue' situation where they don't know how much money they need to spend on hardware, software licenses, infrastructure management and support. This is a big hindrance to deriving a budgetary estimate. Hence, most of the time, the decision goes in favor of the known RDBMS-based solutions at the initial stage itself.
Sometimes, even if the numbers are available, they may not be sufficient to feed a TCO model that compares a typical RDBMS-based data store with a non-relational data store for an overall (Capex + Opex) cost analysis. Often the high number of hardware boxes (along with software license and support costs) required for horizontal scalability makes people more jittery at first glance than a solution based on vertical scaling, unless the benefit is substantiated with an overall comparison based on a TCO model.
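The arithmetic behind such a Capex + Opex comparison is simple, as the following Python sketch shows. The function names are hypothetical and every figure is a placeholder chosen only to demonstrate the calculation; none of them is a benchmark, a vendor price or a claim about which approach is cheaper.

```python
def total_cost(capex, annual_opex, years):
    """TCO over a planning horizon: one-off capital cost plus recurring operating cost."""
    return capex + annual_opex * years


def scale_out_cost(nodes, capex_per_node, annual_opex_per_node, years):
    """TCO of a horizontally scaled cluster of identical nodes."""
    return total_cost(nodes * capex_per_node, nodes * annual_opex_per_node, years)


# Placeholder figures only (assumptions, not measurements or real prices).
YEARS = 3
scale_up = total_cost(capex=300_000, annual_opex=60_000, years=YEARS)
scale_out = scale_out_cost(nodes=12, capex_per_node=10_000,
                           annual_opex_per_node=6_000, years=YEARS)

print(f"Scale-up TCO over {YEARS} years:  {scale_up}")
print(f"Scale-out TCO over {YEARS} years: {scale_out}")
```

A real model would add further cost lines (licenses, consulting, data center space, staff), but the point stands: the node count alone says little until it is placed in a whole-life comparison.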
So does that mean that enterprises should just watch and wait on the NoSQL movement at this point in time? Not really. It is true that 'Non Relational Databases' are today at a nascent stage for large-scale adoption by enterprises. But the sheer potential of 'Non Relational Databases' to shape the enterprise of the future should not be missed. This is especially true given that enterprises in the near future are going to deal more with high volumes of semi-structured/unstructured and eventually consistent data than with significantly lower volumes of tightly structured, ACID-abiding data. So what is important today is at least to start developing mindshare among the key stakeholders of the enterprise on the need to use 'Non Relational Databases' for enterprise data handling. In that journey, taking some incremental steps towards 'Non Relational Databases' around the key aspects of Enterprise IT (Technology, People and Process) is going to make sense. That can help in holistically addressing the challenges identified above in a slow but steady way.
There are plenty of choices available in the market today, which deal with the different dimensions of 'Non Relational Database' solutions in different ways. At the same time, the use case scenarios of an enterprise may demand different types of characteristics. But going for different solutions for different applications/usage scenarios will not work for an enterprise from the perspective of economy of scale. So it is better to settle on one, depending on the target applications. Please remember that most of the solutions offer some workaround for features that are available in other products, and have a placeholder for them in their roadmap. Also, most of the products will attain some maturity in the near future, at which point they can provide different solutions through configuration. So as long as a solution can cater to the majority of the needs, it can be an option to start with.
The rules of thumb for selecting a product/solution are
Here is a comparison of a set of short-listed 'Non Relational Databases'. This can be a good starting point for enterprises that are thinking of serious adoption right now. To make sense in an enterprise context, the filter criteria primarily used while short-listing this subset from the huge superset of 25+ choices are –
With these filter criteria, the ones I could short-list for an enterprise to use right now are MongoDB (shard support is coming shortly in the next version), Riak, Hypertable and HBase. The following table summarizes the key characteristics of these four options. An enterprise can, based on its own detailed requirements, think of using whichever of these four options has the characteristics that best fit its needs.
| Features | MongoDB | Riak | Hypertable | HBase |
| --- | --- | --- | --- | --- |
| Logical Data Model | Rich Document with support for Nested Document | Rich Document | Column Family | Column Family |
| Support for CAP | CA | AP | CA | CA |
| Dynamic Addition/Removal of Node | Supported (coming shortly in next release) | Supported | Supported | Supported |
| Multi DC Support | Supported | Not Supported | Supported | Supported |
| Interface | Variety of language-specific APIs (Java, Python, Perl, C# etc.) | JSON over HTTP | C++, Thrift | REST, Thrift, Java |
| Persistence Model | Disk | Disk | Memory + Disk (Tunable) | Memory + Disk (Tunable) |
| Comparative Performance | Better (written in C++) | Best (written in Erlang) | Better (written in C++) | Good (written in Java) |
| Commercial Support | 10gen.com | Basho Technologies | Hypertable Inc | Cloudera |
Building a separate abstraction layer for accessing data from the 'Non Relational Databases' is a must. It provides benefits in a number of ways. Firstly, application developers can be completely insulated from the underlying details of the solution, which helps in scaling in terms of skills. It also makes it easier to change the underlying solution in the future if needed. And it can be used to cater to the requirements of multiple applications in a standard way (a la SQL, without complex features like Join, Group By etc.).
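One possible shape for such an abstraction layer is sketched below in Python. The interface name `DataStore`, its three methods and the `InMemoryStore` stand-in are assumptions made for illustration; a real layer would wrap the client library of whichever product is chosen behind the same interface.

```python
from abc import ABC, abstractmethod


class DataStore(ABC):
    """Hypothetical product-neutral interface that application code depends on."""

    @abstractmethod
    def put(self, collection, key, document):
        ...

    @abstractmethod
    def get(self, collection, key):
        ...

    @abstractmethod
    def find(self, collection, **criteria):
        ...


class InMemoryStore(DataStore):
    """Stand-in backend; a product-specific adapter would implement the same methods."""

    def __init__(self):
        self._data = {}

    def put(self, collection, key, document):
        self._data.setdefault(collection, {})[key] = dict(document)

    def get(self, collection, key):
        return self._data.get(collection, {}).get(key)

    def find(self, collection, **criteria):
        # Simple equality filter: no joins, no grouping.
        return [doc for doc in self._data.get(collection, {}).values()
                if all(doc.get(name) == value for name, value in criteria.items())]


def register_customer(store: DataStore, customer_id, name, region):
    # Application code sees only the DataStore interface, never the product API.
    store.put("customers", customer_id, {"name": name, "region": region})


store = InMemoryStore()
register_customer(store, "c1", "Acme", "EU")
register_customer(store, "c2", "Globex", "US")
eu_customers = store.find("customers", region="EU")
```

Swapping the underlying product then means writing one new adapter class, while functions such as `register_customer` stay unchanged.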
Irrespective of the solution chosen, modeling its scalability and performance characteristics using standard techniques (like the Queuing Network Model, Layered Queuing Network etc.) is highly recommended. It will provide the data needed for basic server sizing and topology, and also for the overall cost of software support licenses, administration etc. This will essentially become the primary data for all budgetary purposes and will help in decision making.
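As a very rough illustration of model-based sizing, the Python sketch below treats each node as an independent M/M/1 queue, which is a far simpler model than a full queuing network. For an M/M/1 queue with arrival rate λ and service rate μ (both in requests per second, with λ < μ), the mean response time is R = 1 / (μ - λ). The function names, the assumption that load is split evenly across nodes, and the example rates are all hypothetical.

```python
import math


def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue: R = 1 / (mu - lambda)."""
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


def nodes_needed(total_arrival_rate, service_rate, target_response_time):
    """Smallest node count that meets the target, assuming load is split evenly.

    Per node we need 1 / (mu - lambda/n) <= target,
    which rearranges to lambda/n <= mu - 1/target.
    """
    headroom = service_rate - 1.0 / target_response_time
    if headroom <= 0:
        raise ValueError("target is not reachable even on an idle node")
    return max(1, math.ceil(total_arrival_rate / headroom))


# Hypothetical workload: 5,000 requests/s in total, each node can serve
# 1,000 requests/s, and the target mean response time is 5 ms.
nodes = nodes_needed(total_arrival_rate=5000, service_rate=1000,
                     target_response_time=0.005)
per_node_response = mm1_response_time(5000 / nodes, 1000)
```

With these placeholder numbers the model asks for 7 nodes. Measured service rates from a pilot would replace the assumed ones, and the resulting node count feeds directly into the hardware and license estimates.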
There is no other way to protect against data loss than replicating data to a backup server. Though many of the 'Non Relational Databases' provide automatic replication, they may also have a single point of failure in the master node. So it is better to protect your data with a secondary backup and also to have a set of scripts ready for data recovery and automatic data fixes. It is therefore important to understand the physical data model of the target solution, identify the options for possible recovery mechanisms, and evaluate whether those options fit the overall enterprise requirements and practices.
As with common shared service RDBMS databases, a common data service for 'Non Relational Databases' can be built to achieve economy of scale in terms of infrastructure and support needs. This will also help in evolving and changing it in the future. This should be the final goal on the wish list, a maturity level to achieve in the mid or long term. However, having this vision from the initial days will help in taking the right decisions along the journey.
Every organization has a set of people who have a zeal for learning new and non-conventional things. Forming a group of such hand-picked people (full time and part time) to keep tabs on what's going on in this space, known issues and challenges, and next-generation thinking will help in providing direction to the projects that use this technology. This group can also help decision makers by demystifying the hype and providing them with actual data points.
After adopting a product, it makes sense to develop a relationship with the product community so that each helps the other succeed. Most of the 'Non Relational Databases' of today have a vibrant community that is more than eager to help others. A thriving relationship between the enterprise and the community is a win-win for both parties. Knowing the problems and solutions beforehand can help the enterprise in taking decisions on features or versions. The enterprise can also influence the product roadmap with features that make sense for it as well as for the general community. On the other hand, the community gets to know the actual ground-level issues, which helps make the product robust and feature rich. Success stories with big enterprises will also help the community stay ahead of the curve.
Given the current maturity level of 'Non Relational Databases', the only way to adopt them with minimal risk is to follow an iterative development methodology. The vision of building a common Data Service Platform for 'Non Relational Databases', along with a standardized data access abstraction, is not going to be realized in a big bang way. Rather, working in an iterative, refactoring-oriented mode will help achieve it. In this type of technology journey with less mature solutions, changing the solution midway is not uncommon. The agile way of seeing things also helps create a mindset for absorbing rework, both for management and for implementers.
However, to Go Iterative with this problem, it is very important to define a decision criteria matrix: for example, guidelines (and examples) indicating whether the object model of an application fits better in the RDBMS or non-RDBMS space, guidelines for infrastructure sizing, a list of mandatory test cases etc.
In enterprise adoption of 'Non Relational Databases', the most challenging task is changing the mindset of enterprise decision makers: making them believe that not all data/objects are suitable for an RDBMS. The best way to prove that is Trying Out 'Non Relational Databases' for the right type of use cases, demonstrating how they can be a more effective solution than an RDBMS if used in the right context. Identify a few 'not so business critical' (but high visibility) projects where 'Non Relational Databases' can be a good fit. The success (or even failure) of these projects will help change the mindset. It will also help in learning more about what needs to be done differently to adopt 'Non Relational Databases' in a better way. These baby steps around Trying Out are indeed the need of the hour if enterprises want to reshape their information management world in the near future using 'Non Relational Database' technologies.
Sourav Mazumder currently works as a Principal Technology Architect for Infosys Technologies Limited and has more than 14 years of experience in the Information Technology domain. As a key member of the Technology Consultancy group of Infosys, Sourav has worked for key clients of Infosys in the USA, Europe, Australia and Japan in various domains such as Insurance, Telecom, Banking, Retail, Security, Transport, and the Architecture/Engineering/Construction industry. He was involved in technical architecture and roadmap definition for web-based applications, SOA strategy implementation, internationalization strategy definition, UI componentization, performance modeling and scalability analysis, and unstructured data management. Sourav's association with Infosys' own core banking product, Finacle, also gave him extensive product development experience. Sourav was also involved in developing a reusable framework for J2EE applications at Infosys and in defining Infosys' software engineering methodology for architecting and designing custom-built applications. Sourav's experience also includes ensuring architecture compliance and governance for development projects.
Sourav is an iCMG certified Software Architect as well as a TOGAF 8 certified practitioner. Sourav recently presented at the Berkeley Globalization Conference of LISA. Sourav's latest white paper on SOA has become immensely popular among various reading communities.
Sourav's current interest areas include NoSQL, Web 2.0 Governance, Performance Modeling and Globalization.
One of the things I'm wondering about is how a developer should deal with relaxed consistency. Normally you configure your transaction and get some form of consistency, but with these NoSQL databases, consistency is much more unclear.
I know from experience that writing algorithms that use relaxed consistency is very error prone, because there is a high chance of race problems (non-blocking algorithms rely a lot on relaxed consistency and they are notoriously hard to write). But how should mainstream developers deal with it?
I also know from experience that most developers don't know/care about concurrency and databases, but since more problems are possible with NoSQL solutions (there is no consistency check that is going to protect you) I would expect that some form of design guidelines must be available.
Peter Veentjer
Multiverse: Software Transactional Memory for Java
multiverse.codehaus.org
PS:
I certainly think that the NoSQL movement is a very interesting one.
The following papers describe simple ways of managing relaxed consistency:
1. Pat Helland's Life beyond Distributed Transactions: an Apostate’s Opinion is a useful paper describing how to deal with eventually consistent systems.
2. Amazon's CTO Werner Vogels blog post on Eventual Consistency
3. Gregor Hohpe's famous paper "Your Coffee Shop Doesn’t Use Two-Phase Commit"
The truth is, as soon as a team is using hand-crafted solutions, such as manual sharding/partitioning of SQL databases, the team is already handling relaxed consistency in an ad hoc way. Therefore, it is not such an unusual concept for such developers.
The biggest problem I have with NoSQL databases is the lack of relationship capability between entities (i.e. foreign keys).
I can definitely see the benefit in some scenarios though.
ps: Let's not forget Windows Azure table in the comparison of solutions
Peter,
Thanks for your comment.
In my opinion, the implementation of Eventual Consistency or Relaxed Consistency should come from the NoSQL solution provider (like the NoSQL solutions I mentioned in my article), with configuration options. So developers should not need to bother about it at all, except for setting the configuration according to the business need (how much eventual consistency is acceptable).
That is the feature I expect from these solutions: allowing the designer/developer to switch between the three parameters of CAP based on business need, with the option to control each of them at a granular level.
Regards,
Sourav
Clem,
Surely we can put Microsoft's Azure Table in the NoSQL group, as it conforms to some of the basic characteristics of NoSQL databases as I outlined in the paper. The architecture it follows is very close to a combination of Amazon's Dynamo and Yahoo's PNUTS. I didn't cover it in the paper as I excluded all licensed NoSQL solutions (like BigTable, Dynamo etc.).
Regards,
Sourav
What about XML DBMS like eXist and MarkLogic. Which of your categories do they fit into?
Any idea if there is an equivalent way to do this? How do you store "lookup" data containing master records such as SKUs, Regions, etc?? I see you could easily retrieve, group, and filter elements that meet certain criteria, as long as the data was used at least once. What about if I wanted to retrieve all sales by cities, where some of them may or may not have any sales? I've seen some examples where they store say Regions, or Products into the application's memory, but that doesn't seem appropriate, specially when you're dealing with lots of possibly related data. It's not the same to display a drop down menu from a list of countries stored in a regular array, than to select a product number from a huge table of product ids.
That brings me to a big question. Is there a way to switch to NoSQL without relying completely on RDBM? I'm wondering if maybe the best choice is to make both coexist (I mean, in the long haul).
Has anyone taken a look at the database called "Cache" - a post-relational database; stores data in multi-dimensional arrays. It also provides relational and/or object views if and when needed;
primary use - healthcare EMR;
I come from an RDBMS app programmer background, and while this article lays out how NoSQL works/is architected, I still find myself asking "why?" on a more detailed level. I'd really like to see a case study of how an example system (e.g. a data warehouse for some application) would be "traditionally" implemented in an RDBMS solution, and then compare that with a NoSQL implementation, what are the merits of both systems?
Do you know of any articles along those kind of lines?
There is one category of NoSQL solutions that are focused on interconnected data: graph databases and you can find quite a few solutions in this space
In case you are not looking for an enterprise solution, here is some good coverage of how and why Twitter moved from using MySQL to Cassandra.
Perfect, thanks!
Is there a simple example of how eventual consistency is achieved? I am assuming scenarios of overwrites or conflicts of data between partitions: how do you reconcile the data (or determine the winner)?
Consistency problems could elegantly be solved in combination with functional paradigms.
For example, a predicate P(x) may not be true all the time. But a function may define that when P(x) is true, Q(x) is evaluated.
Nice article. I run the developer community over at MarkLogic - here's how I'd list us in the table above:
What is often overlooked in tables and articles like this is a reference to how 'full text search' is integrated. This can be a critical piece of information for architects when deciding on NoSQL, SQL, or other storage infrastructure for your application. With MarkLogic, the NoSQL document store _is_ a transactional, real-time search index as well.
Best,
Eric Bloch
developer.marklogic.com/