Evolution in Data Integration: From EII to Big Data
Approaches to integrating data are changing with the emergence of cloud computing.
Posted by Boris Lublinsky on Oct 31, 2011
Is it possible to marry Hadoop, the new favorite data-as-a-platform technology, with SOA, which is somewhat falling out of favor? According to Joe McKendrick’s recent post, such a marriage would be very beneficial, especially to SOA, because data-as-a-platform can greatly simplify data integration:
Data-as-a-platform, supported by Hadoop, addresses concerns that SOA practitioners have had for years. As Akred put it, for too long, many enterprises have been attempting to sort through increasingly complex spaghetti architectures with point-to-point data integration. "They get to the point where when they want to introduce a new product or make a change, they have to touch 30 different systems," he said. "That has real consequences in the marketplace for enterprises and their ability to adjust to market conditions and succeed."
The proposed solution, then, is to start leveraging Hadoop as a cross-application data store:
Rather than organize data stores around applications, which are then awkwardly integrated as new applications come along, the Data-as-a-Platform approach maintains data as a cross-enterprise resource.
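The "spaghetti architecture" argument can be made concrete with a little arithmetic. As an illustrative assumption (not a figure from Akred's talk), suppose that in the worst case every pair of n applications needs its own point-to-point integration: that is n(n-1)/2 links, whereas keeping data in a shared cross-enterprise platform needs roughly one connection per application. The sketch below simply computes both counts.

```python
def point_to_point_links(n: int) -> int:
    """Worst case: each pair of the n applications has its own integration."""
    return n * (n - 1) // 2


def hub_links(n: int) -> int:
    """Each application connects once to a shared data platform."""
    return n


if __name__ == "__main__":
    for n in (5, 10, 30):
        print(f"{n:>3} apps: {point_to_point_links(n):>4} point-to-point links "
              f"vs {hub_links(n):>3} platform links")
```

For 30 systems the worst case is 435 pairwise links versus 30 platform connections; the quadratic growth illustrates why each additional point-to-point link tends to make change more expensive.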
In his "Hadoop Tuesdays" presentation, Akred outlined the following vision for a Hadoop-enabled SOA:
We take the data infrastructure layer, and take data stores like Hadoop, and the existing enterprise systems that give that data valuable context and integrate those at the data layer. And we abstract that integrated data platform from the consuming applications via service-oriented data access patterns. So we’re exposing our enterprise data platform to the enterprise via services rather than direct query access.
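A minimal sketch of the "service-oriented data access" idea, assuming two hypothetical back ends: a clickstream store standing in for data kept in Hadoop, and a CRM system standing in for an existing enterprise system that supplies business context. Consumers call `CustomerDataService.get_profile` instead of querying either store directly. All class and method names are illustrative, and the in-memory stores only stand in for real HDFS/HBase or RDBMS access.

```python
from dataclasses import dataclass
from typing import Dict, List, Protocol


class ClickstreamStore(Protocol):
    """Hypothetical big-data store, e.g. page views kept in Hadoop."""

    def page_views(self, customer_id: str) -> List[str]: ...


class CrmStore(Protocol):
    """Hypothetical existing enterprise system that gives the data its context."""

    def customer_name(self, customer_id: str) -> str: ...


@dataclass
class CustomerProfile:
    customer_id: str
    name: str
    recent_pages: List[str]


class CustomerDataService:
    """Service facade: consumers call this instead of querying the stores directly."""

    def __init__(self, clicks: ClickstreamStore, crm: CrmStore) -> None:
        self._clicks = clicks
        self._crm = crm

    def get_profile(self, customer_id: str, limit: int = 3) -> CustomerProfile:
        # The two sources are combined here, behind the service contract,
        # so consumers never see where each field physically lives.
        pages = self._clicks.page_views(customer_id)[-limit:]
        return CustomerProfile(customer_id, self._crm.customer_name(customer_id), pages)


class InMemoryClicks:
    """Stand-in for HDFS/HBase access so the sketch runs without a cluster."""

    def __init__(self, data: Dict[str, List[str]]) -> None:
        self._data = data

    def page_views(self, customer_id: str) -> List[str]:
        return self._data.get(customer_id, [])


class InMemoryCrm:
    """Stand-in for an operational CRM database."""

    def __init__(self, names: Dict[str, str]) -> None:
        self._names = names

    def customer_name(self, customer_id: str) -> str:
        return self._names[customer_id]


service = CustomerDataService(
    InMemoryClicks({"c1": ["/home", "/pricing", "/docs", "/signup"]}),
    InMemoryCrm({"c1": "Acme Corp"}),
)
print(service.get_profile("c1"))
```

Because consumers depend only on `get_profile`, a back end could later be moved or replaced without changing callers; that decoupling is the point of exposing the platform via services rather than direct query access.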
These suggestions sound like good ideas, but they rest on a very strong assumption: that something like Hadoop can be used as an operational data store for enterprise applications. Hadoop was never designed for that. It is NOT a database with the ACID properties that enterprise applications typically require. Even though HBase is often labeled a database, it is not one in the sense in which the term is used for enterprise applications.
Hadoop plays an important role in the enterprise by supporting the storage and processing of large volumes of data, but it was never intended to replace databases for highly transactional data access.
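To illustrate why the ACID distinction matters, the sketch below performs a two-row transfer and simulates a crash between the two writes. `RowAtomicStore` is a deliberately simplified, hypothetical store that only guarantees single-row writes (loosely in the spirit of row-level atomicity, not a model of any real HBase API); the second version uses Python's built-in `sqlite3` module, whose transaction rolls back the partial update.

```python
import sqlite3


class RowAtomicStore:
    """Hypothetical store: each single-row put is atomic, but there are no multi-row transactions."""

    def __init__(self, balances: dict) -> None:
        self.rows = dict(balances)

    def put(self, key: str, value: int) -> None:
        self.rows[key] = value  # only this one row is updated


def transfer_row_atomic(store: RowAtomicStore, src: str, dst: str,
                        amount: int, fail_midway: bool = False) -> None:
    store.put(src, store.rows[src] - amount)  # debit is applied and never undone
    if fail_midway:
        raise RuntimeError("simulated crash between the two writes")
    store.put(dst, store.rows[dst] + amount)


def transfer_sqlite(conn: sqlite3.Connection, src: str, dst: str,
                    amount: int, fail_midway: bool = False) -> None:
    # Using the connection as a context manager commits on success
    # and rolls back if an exception escapes the block.
    with conn:
        conn.execute("UPDATE acct SET bal = bal - ? WHERE id = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated crash between the two writes")
        conn.execute("UPDATE acct SET bal = bal + ? WHERE id = ?", (amount, dst))


# Row-atomic store: the debit survives the crash, so 40 units disappear.
store = RowAtomicStore({"a": 100, "b": 0})
try:
    transfer_row_atomic(store, "a", "b", 40, fail_midway=True)
except RuntimeError:
    pass
row_total = sum(store.rows.values())

# SQLite transaction: the partial debit is rolled back, so the total is preserved.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acct (id TEXT PRIMARY KEY, bal INTEGER)")
conn.executemany("INSERT INTO acct VALUES (?, ?)", [("a", 100), ("b", 0)])
conn.commit()
try:
    transfer_sqlite(conn, "a", "b", 40, fail_midway=True)
except RuntimeError:
    pass
sql_total = conn.execute("SELECT SUM(bal) FROM acct").fetchone()[0]

print(row_total, sql_total)
```

The row-atomic version ends with a total of 60, the transactional one with 100: the kind of all-or-nothing guarantee that operational enterprise applications usually assume their data store provides.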
This does not mean that SOA principles are inapplicable to Hadoop-based solutions. Individual MapReduce jobs, or combinations of them orchestrated with Oozie, can and should be exposed as services that business processes within an enterprise can use, but those will be functional services, not pure data services.
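A minimal sketch of a functional (rather than pure data) service, under stated assumptions: the hypothetical `ProductRevenueService` answers a business question ("which products earn the most?") by running a map/reduce computation, instead of handing raw records to the caller. In a real deployment the computation would be a MapReduce job or Oozie workflow submitted to the cluster; here a tiny in-process map/shuffle/reduce stands in so the example runs anywhere. The order schema and all names are illustrative.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

OrderLine = Tuple[str, int, int]  # (product, quantity, unit price in cents): hypothetical schema


def run_map_reduce(
    records: Iterable[OrderLine],
    mapper: Callable[[OrderLine], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """In-process stand-in for a cluster job: map, group by key (shuffle), reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def revenue_mapper(line: OrderLine) -> Iterator[Tuple[str, int]]:
    product, quantity, unit_price = line
    yield product, quantity * unit_price


def sum_reducer(_key: str, values: List[int]) -> int:
    return sum(values)


class ProductRevenueService:
    """Functional service: callers ask a business question, not for raw rows."""

    def __init__(self, order_source: Callable[[], Iterable[OrderLine]]) -> None:
        self._order_source = order_source

    def top_products(self, n: int) -> List[Tuple[str, int]]:
        totals = run_map_reduce(self._order_source(), revenue_mapper, sum_reducer)
        # Highest revenue first; ties broken alphabetically for a stable answer.
        return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))[:n]


orders = [("widget", 3, 1000), ("gadget", 1, 2500), ("widget", 1, 1000), ("gizmo", 2, 500)]
service = ProductRevenueService(lambda: orders)
print(service.top_products(2))  # [('widget', 4000), ('gadget', 2500)]
```

The service contract exposes the result of a computation, so the caller never needs to know how the orders are stored or how the job is scheduled.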
So Hadoop and SOA can live in perfect harmony, but not quite in the way described in McKendrick’s post.
2 comments
I don't think John was advocating replacing ACID compliant RDBMSs within the enterprise.
As I read John's comments, I interpreted them to mean that he was proposing that one can gain access to the data stored within Hadoop (HDFS or HBase) through a SOA rather than having to write specific access methods.
It seems to me that both Boris and John are in violent agreement. (JMHO...) :-)
Around 2003, batch-oriented processing was hit by SOA. And now we have evolved to Everything as a Service. Hadoop creates unstructured data as a platform. It may evolve into a service if there is some compelling insight that can be gleaned from processing that data. This problem, too, will be solved using point-to-point integration between existing SOA-aware web servers and Hadoop.