Article: Bridging the gap between BI and SOA
In "Bridging the gap between BI & SOA" we take a look at the challenges in building a decent BI solution within SOA constraints. In a nutshell, the main problem is that BI requires data. The traditional way to get that data is ETL, which means going directly to the different data sources (databases etc.). This conflicts with SOA's view of the world, which promotes decoupling and hiding internal data structures behind contracts.
The article suggests that combining the aggregated reporting pattern with a fusion of EDA and SOA can solve the BI and SOA mismatch. EDA helps bring the data out of the services by publishing it in a structured and decoupled way, and aggregated reporting allows building the data store used to load the data into data warehouses.
The article also suggests a way to deal with eventing using Request/Reply, which is the prevalent message exchange pattern in SOAs.
Yet another alternative
I am not sure that making the service aware of BI is necessarily the most flexible solution. Services should definitely publish and subscribe to events, but the event dispatcher should act as a coordinator. I doubt that services can be designed to produce or respond to all the possibly interesting events associated with their message flow, or that they could be directly wired to each other at the event level.
ETL vs SOA, real-time etc
When an event comes in it has to be added to a number of different aggregate tables. To have a consistent view of data you need transaction isolation for both the front-end queries and back-end queries on the intermediate data store. All of a sudden a new bunch of technical problems have been introduced that are not standard to BI implementations. There is also a training cost, labour cost etc.
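As a rough illustration of the consistency point above, here is a minimal sketch in which one incoming event updates two aggregate tables inside a single transaction, so a query never sees one aggregate updated without the other. The table names, event fields and the use of an in-memory SQLite database are all assumptions made for the example, not something taken from the article.

```python
import sqlite3


def open_store() -> sqlite3.Connection:
    """Create a hypothetical intermediate store with two aggregate tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales_by_day (day TEXT PRIMARY KEY, total REAL NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE sales_by_product (product TEXT PRIMARY KEY, total REAL NOT NULL)"
    )
    conn.commit()
    return conn


def apply_order_event(conn: sqlite3.Connection, event: dict) -> None:
    """Add one event to every aggregate, all-or-nothing."""
    # "with conn" commits if the block succeeds and rolls back if it raises,
    # so a failure on the second table also undoes the first update.
    with conn:
        conn.execute(
            "INSERT OR IGNORE INTO sales_by_day (day, total) VALUES (?, 0)",
            (event["day"],),
        )
        conn.execute(
            "UPDATE sales_by_day SET total = total + ? WHERE day = ?",
            (event["amount"], event["day"]),
        )
        conn.execute(
            "INSERT OR IGNORE INTO sales_by_product (product, total) VALUES (?, 0)",
            (event["product"],),
        )
        conn.execute(
            "UPDATE sales_by_product SET total = total + ? WHERE product = ?",
            (event["amount"], event["product"]),
        )
```

A real intermediate store would also have to choose an isolation level for the reporting queries, which is exactly the extra technical work described above.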
The cost of hardware, though, is changing the options available: the cost of memory and CPUs is going down so fast that in-memory storage of pre-aggregated data is feasible for many implementations in the mid-market and even some in the high-end (despite the amount of data produced growing faster than that used for major companies). We don't care about the power going out most of the time because it's BI and not transactional. This pushes towards products that bundle custom hardware and the new complexities of the software into an appliance. After all, using the falling price of hardware to simplify the traditional technical problems with BI, while shielding the IT shop from the new ones, is a good move. Not surprisingly, there are already products doing this (Cognos bought Celequest, for example).
The downside is twofold. What you get is real-time BI, but most BI doesn't need real-time yet. And you need to clean up the spaghetti first, yet most companies still have spaghetti and they're still going to want a cheaper band-aid rather than surgery.
My two cents anyway.
Re: ETL vs SOA, real-time etc
The main point here is that events will let you pry the data out of the services without resorting to something BI-specific or any other SOA-violating technique.
This doesn't mean that you have to update your datamart (or data warehouse) for each event - you can, within the component responsible for the BI, create batch files with all the updates and employ "common" ETL on those. You also get the added benefit that you can do real-time BI (BAM, KPIs etc.).
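The batching idea above could be sketched roughly as follows: the BI component buffers incoming events and, once enough have accumulated, hands a CSV batch to whatever feeds the regular ETL job. The class name, field names and the `sink` callback are hypothetical and chosen only for illustration.

```python
import csv
import io
from typing import Callable, Dict, List

# Hypothetical event fields; a real contract would come from the service.
FIELDS = ["order_id", "customer_id", "amount"]


class EventBatcher:
    """Buffers events and emits them as CSV batches for a conventional ETL load."""

    def __init__(self, batch_size: int, sink: Callable[[str], None]) -> None:
        self.batch_size = batch_size
        self.sink = sink  # receives the CSV text, e.g. writes it to a drop folder
        self._pending: List[Dict[str, object]] = []

    def on_event(self, event: Dict[str, object]) -> None:
        """Called for every event received from the services."""
        self._pending.append(event)
        if len(self._pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Write all pending events as one batch; does nothing if empty."""
        if not self._pending:
            return
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(self._pending)
        self._pending = []
        self.sink(buf.getvalue())
```

In practice `flush` would probably also run on a timer, so that a quiet period does not leave events sitting in memory until the next batch fills up.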
Re: ETL vs SOA, real-time etc
I see your point. Technically the only problem is that you have an overhead per-transaction in the operational system. I see the problem being much more a business/budgetary problem than a technical problem.
You are pushing SOA and have built a ROLAP engine (yes, I read your CV). I work in France and have done many BI projects in South Africa, France, Australia and the UK. SOA is almost non-existent for BI projects as a source of data. SOA in the BI market that I've worked in is almost purely hype.
The software companies who say to their customers "we use SOA" mean one of: a) they can consume services to read data (e.g. EII, EAI etc.), b) they have exposed services from their platform so it can be integrated into larger projects, or c) the internal components of their platforms talk to each other via services.
The first two may have some business benefit for the customer (the third is just a maintainability issue for the vendor), but the main point of SOA as I understand it is to make the IT shop more agile so it can respond more quickly and with less budget to changing business needs. Almost no one has addressed this need in the BI market because they try to sell something that will work on top of existing architecture. SOA became for a while a buzzword that you had to be able to use to get past a certain stage in the sales process, but not much more.
Now the main problem with the approach you've outlined is not technical, it's IT strategy - lots of IT shops want to have SOA but the business won't give them the budget to do it because you can't prove ROI. BI typically falls into a very different budget. So from a CIO's perspective, someone who talks about SOA and BI together is either going to be visionary but only implementable on new systems at a departmental scale, or far too complex and costly to implement, no matter what the technical merits. Teradata is pushing SOA as the basis for its Active Data Warehouse/Right-Time Enterprise and they try to target only enterprise-wide data warehouses. They've got quite a few good examples of business benefits. But they are clearly struggling because of the huge cost barrier (time and money) to having the SOA fundamentals in place to be able to then build the BI.
We've got a Real-Time BI offer at my company but there is hardly anyone who actually needs Real-Time (or Right-Time) yet. This could be a problem of the customer's perspective and we need to demonstrate real ROI benefits. We've got one major project that does do that, but one is not enough. It's really hard without having some significant projects under our belt. So we get a vicious circle. Maybe the best place to start IS small scale - one business line at a time? Any thoughts welcome because at that point it becomes more a marketing issue.
Re: ETL vs SOA, real-time etc
SOA is not popular as a source of data for BI since it seems (to me at least) that just now the hype is starting to fade and real SOAs are starting to emerge.
I am sure that when companies move to an SOA they face the BI dilemma. I've seen it - and I've also seen it happen for reporting.
I just try to offer a way to handle that situation.
Note that I didn't suggest building the bulk of the BI solution as SOA (which may or may not be a viable option - but it is an orthogonal question), just how to make SOA and BI work together when you need a solution that integrates both.
Also note that web-service != SOA, so slapping a web-service interface on a product or a bunch of web applications does not mean you have an SOA; instead you get just a bunch of web services.
As for Real-Time BI - as you said, it is basically about building a value proposition. For defense systems it is pretty obviously a big win, but I've also seen it in other areas such as media companies (cable/satellite), where they care a lot about their KPAs and the freshness of their data. Airlines also come to mind, etc.
Data Services Using EII
Here's a more detailed entry I wrote on the topic blogs.ipedo.com/integration_insider/2007/05/fus...
Re: Data Services Using EII
EDA is a great way to deal with fine-grained data updates and get very low coupling through canonical event formats and event ontologies.
In the open-source world, the Esper project (which I co-lead) processes push data in an EDA; please visit esper.codehaus.org.
Very nice article...
I enjoyed your article and I believe it will be a common way to architect solutions in the future (how far in the future remains to be seen). My company, MetriWorks, has developed a product intended to make the approach you described a configurable bolt-on intermediary for web services. I can comment on how we address the performance overhead concerns, as our product has a very light footprint on the web service call. It only takes an in-memory snapshot of the raw web service data, pushes it onto a queue and then allows the web service to continue. The heavy processing of the raw data is handled in our server process asynchronously with the web service processing. The raw data can also be routed to a different physical server for processing in order to reduce overall system load on the web service server.
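The "snapshot, enqueue, continue" approach described above can be sketched roughly as follows. This is only an illustration of the general pattern using Python's standard library, not MetriWorks' actual implementation; the class and method names are made up for the example.

```python
import copy
import queue
import threading
from typing import Any, Callable, List


class AsyncCapture:
    """Takes a cheap snapshot on the service call path and processes it later."""

    def __init__(self, process: Callable[[Any], None]) -> None:
        self._queue: "queue.Queue[Any]" = queue.Queue()
        self._process = process  # the heavy BI-side processing
        self.errors: List[Exception] = []
        worker = threading.Thread(target=self._run, daemon=True)
        worker.start()

    def snapshot(self, payload: Any) -> None:
        """Called from the web service: copy the raw data, enqueue it, return."""
        self._queue.put(copy.deepcopy(payload))

    def _run(self) -> None:
        # Background loop: heavy work happens here, off the service call path.
        while True:
            item = self._queue.get()
            try:
                self._process(item)
            except Exception as exc:  # keep the worker alive on bad input
                self.errors.append(exc)
            finally:
                self._queue.task_done()

    def drain(self) -> None:
        """Block until everything queued so far has been processed."""
        self._queue.join()
```

The deep copy is what makes it a snapshot: the service can keep mutating its own objects after the call returns. Routing the raw data to a different physical server, as mentioned above, would mean replacing the in-process queue with a message broker.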
One common problem we do have with this approach is that, many times, the service does not contain all of the related data that might be needed. For example, in your Order Service, perhaps a "Customer ID" is passed in the service where you would really like to have the "Customer Name" and the "Customer Credit Rating" information to include in the alert. Of course you can write custom event handlers to do a lookup from some other databases. But, I would like to hear if there are any preferred patterns or suggestions for better ways to handle this cross-reference data lookup requirement?
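For reference, the custom event handler with a lookup mentioned above might look something like this minimal sketch. It caches reference data so that repeated events for the same customer do not hit the other database every time. The field names and the `lookup_customer` callback are hypothetical, and this is just one option rather than an established answer to the question.

```python
from typing import Callable, Dict


class Enricher:
    """Adds reference data (e.g. name, credit rating) to events that carry only an ID."""

    def __init__(self, lookup_customer: Callable[[str], Dict[str, object]]) -> None:
        self._lookup = lookup_customer  # e.g. a query against the customer database
        self._cache: Dict[str, Dict[str, object]] = {}

    def enrich(self, event: Dict[str, object]) -> Dict[str, object]:
        """Return a copy of the event with the customer's reference data merged in."""
        customer_id = str(event["customer_id"])
        if customer_id not in self._cache:
            self._cache[customer_id] = self._lookup(customer_id)
        enriched = dict(event)  # leave the original event untouched
        enriched.update(self._cache[customer_id])
        return enriched
```

The cache never expires in this sketch, so a changed credit rating would go unnoticed; a real handler would need some invalidation, for instance by subscribing to customer-change events.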
Re: ETL vs SOA, real-time etc
I agree with your points;
the main point of SOA as I understand it is to make the IT shop more agile so it can respond more quickly and with less budget to changing business needs. Almost no one has addressed this need in the BI market.
I think many people are confused about what SOAP/XML can do; SOA is not necessarily supposed to be bound to it. BI covers far more than simple data or tables. It involves a lot more object types than many realise. SOAP/XML SOA will never be prevalent!
Instead, other approaches that provide ROI (as you mentioned) will be successful in the field. SOA based on HTML URL tags can deliver really robust platforms. For an example, see www.roselladb.com/bi-soa.htm. It is based on HTML pages. One can easily implement BI requirements by simply writing a bunch of HTML pages!
Re: ETL vs SOA, real-time etc
It looks like this thread has been inactive for a few months, but inevitably I am in a similar situation, where it is difficult to convince BI/ETL technical folks to adopt SOA. My take is that it is to the BI/ETL folks' benefit to leverage data services (especially to get data from Apps DB and ODS); they can of course expose their ETL jobs as services as well. But for the first part, where ETL acts as a consumer, there are unnecessary worries and prejudice in the industry about slowness, performance etc. Does anybody have actual test results or benchmarks to share showing that the overhead added by SOA is significant? I don't think many have done any testing before jumping to assumptions.
1 year and 3 months later... CDC and MDM
I also did appreciate Michael's constructive remarks.
One of my customers recently asked himself how he could take advantage of service orientation (SOA) while setting up a new data warehouse from several scattered databases (BI).
That's how I found your article.
My point is that, 15 months later, two things have appeared that may be relevant to this SOA/BI question:
- CDC for Fact Data (Sales, Production, ...): Change Data Capture, which allows the data warehouse to be loaded continuously (and which is actually a form of the EDA that you have described). New-generation ETL should be able to use this mechanism to detect changes on multiple data sources and either expose the associated information (pull mode) or publish it.
- MDM for Referential Data (Customers, Products, ...): Master Data Management is really service oriented, as its purpose is to federate the referential data of the enterprise and to expose a unique and consistent 360 degree view of the reference data. At the very least, we should note that one of the MDM architecture styles, "Consolidation", is a pure BI/DWH model: a one-directional flow from the data sources to the MDM, with MDM data used for analytics and reports.
If you still follow this thread, your thoughts are welcome, especially on my customer's original question and on the relevance of MDM and CDC in the game.
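To make the CDC "pull mode" in the first point a little more concrete, here is a minimal sketch of the simplest, query-based variant: each source row carries a monotonically increasing version, and the loader asks only for rows newer than the last version it has seen. The `sales` table, its columns and the watermark handling are assumptions for the example; commercial CDC tools typically read the database log instead, and this simple variant cannot see deletes.

```python
import sqlite3
from typing import Dict, List, Tuple


def poll_changes(
    conn: sqlite3.Connection, last_version: int
) -> Tuple[List[Dict[str, object]], int]:
    """Return rows changed since last_version, plus the new watermark."""
    rows = conn.execute(
        "SELECT id, amount, version FROM sales WHERE version > ? ORDER BY version",
        (last_version,),
    ).fetchall()
    # Each changed row becomes an event that can be exposed or published.
    events = [{"id": r[0], "amount": r[1], "version": r[2]} for r in rows]
    # Advance the watermark only when something new was actually read.
    new_watermark = int(events[-1]["version"]) if events else last_version
    return events, new_watermark
```

Calling `poll_changes` on a schedule and publishing the returned events gives the publish variant; handing them to the caller as-is is the pull variant.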