SOA’s Role in the Emerging Hadoop World
Is it possible to marry the new favorite data-as-a-platform, Hadoop, with SOA, which is somewhat falling out of favor? According to Joe McKendrick's recent post, such a marriage would be very beneficial, especially to SOA, because data-as-a-platform can greatly simplify data integration:
Data-as-a-platform, supported by Hadoop, addresses concerns that SOA practitioners have had for years. As Akred put it, for too long, many enterprises have been attempting to sort through increasingly complex spaghetti architectures with point-to-point data integration. "They get to the point where when they want to introduce a new product or make a change, they have to touch 30 different systems," he said. "That has real consequences in the marketplace for enterprises and their ability to adjust to market conditions and succeed."
So the proposed solution is to start leveraging Hadoop as a cross-application data store:
Rather than organize data stores around applications, which are then awkwardly integrated as new applications come along, the Data-as-a-Platform approach maintains data as a cross-enterprise resource.
During his presentation at "Hadoop Tuesdays", Akred presented the following vision for Hadoop-enabled SOA:
We take the data infrastructure layer, and take data stores like Hadoop, and the existing enterprise systems that give that data valuable context and integrate those at the data layer. And we abstract that integrated data platform from the consuming applications via service-oriented data access patterns. So we’re exposing our enterprise data platform to the enterprise via services rather than direct query access.
Both suggestions sound like good ideas, but they rest on a very strong assumption: that something like Hadoop can be used as an operational data store for enterprise applications. Hadoop was never designed for this. It is NOT a database with the ACID properties that enterprise applications typically require. Even though HBase is commonly labeled a database, it is not one in the sense in which the term is used in enterprise applications.
Hadoop plays an important role in the enterprise by supporting the storage and processing of large volumes of data, but it was never intended to replace databases for highly transactional data access.
This does not mean that SOA principles are inapplicable to Hadoop-based solutions. Individual MapReduce jobs, or combinations of them (orchestrated with Oozie), can and should be exposed as services that business processes within an enterprise can use, but these will be functional services, not pure data services.
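To make the distinction concrete, here is a minimal sketch of what a functional service facade over batch jobs might look like. It is only an illustration: `JobService`, `word_count_job`, and the status strings are hypothetical names, and the job runs in memory as a stand-in for a real MapReduce job or Oozie workflow. The point is that the caller asks for a piece of work to be done and later collects its outcome, rather than querying the underlying data store directly.

```python
import uuid
from collections import Counter
from typing import Any, Callable, Dict, Iterable


def word_count_job(lines: Iterable[str]) -> Dict[str, int]:
    """In-memory stand-in for a MapReduce job: split lines into words, sum the counts."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return dict(counts)


class JobService:
    """Hypothetical functional service: callers submit named jobs, not queries."""

    def __init__(self) -> None:
        self._jobs: Dict[str, Callable[[Any], Any]] = {}
        self._runs: Dict[str, Dict[str, Any]] = {}

    def register(self, name: str, job: Callable[[Any], Any]) -> None:
        """Publish a job under a service-level name."""
        self._jobs[name] = job

    def submit(self, name: str, payload: Any) -> str:
        """Run the named job and return a run id the caller can poll."""
        job = self._jobs[name]  # raises KeyError for an unknown job name
        run_id = str(uuid.uuid4())
        # Assumption: a real deployment would hand the job to the cluster
        # asynchronously; this sketch simply runs it inline.
        try:
            self._runs[run_id] = {"status": "SUCCEEDED", "result": job(payload)}
        except Exception as exc:
            self._runs[run_id] = {"status": "FAILED", "error": str(exc)}
        return run_id

    def status(self, run_id: str) -> str:
        return self._runs[run_id]["status"]

    def result(self, run_id: str) -> Any:
        return self._runs[run_id].get("result")


if __name__ == "__main__":
    service = JobService()
    service.register("word_count", word_count_job)
    run = service.submit("word_count", ["Hadoop and SOA", "SOA and services"])
    print(service.status(run), service.result(run))
```

The service contract here is coarse-grained and batch-oriented (submit, poll, fetch), which is what separates it from a transactional data service offering record-level reads and writes.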
So Hadoop and SOA can live in perfect harmony, but not quite in the way described in McKendrick’s post.
I think Boris may have read too much into John's statements...
As I read John's comments, I interpreted them to mean that he was proposing that one can gain access to the data stored within Hadoop (HDFS or HBase) through an SOA rather than having to write specific access methods.
It seems to me that both Boris and John are in violent agreement. (JMHO...) :-)
Data as a Platform and Data Platform as a Service