Big Data: Evolution or Revolution?
Whether you use an RDBMS, a hash table or some other structure to maintain your data, you can't fail to have heard something about NoSQL and Big Data. Google, Yahoo, Amazon and others have all been developing or using Big Data/NoSQL solutions. But outside of these very specific use cases, are these implementations generally useful? In a recent article, Capgemini's Steve Jones goes as far as suggesting that at times Big Data may be a Big Con, or at least not quite the panacea for legacy RDBMS implementations that some might have you believe:
I'm seeing a lot of 'Big Data' washing going on in the market. Some companies are looking at this volume explosion as part of a continuation of history, new technologies, new approaches but evolution not revolution. Yes Map Reduce is cool but its technically much harder than SQL and database design this means that it is far from a business panacea.
Steve then goes on to suggest that it may not be long before RDBMS-based in-memory databases become practical for datasets of genuinely useful size. He illustrates this with a reference to an article from a few years back that discussed how Yahoo uses (used?) a heavily modified Postgres implementation to store 2 petabytes of data:
Here is the point about Big Data: 95%+ of it is just about the on-going exponential increase in data which is matched, or at least tracked, by the increase in processing power and storage volumes. [...] Yes index tuning might be harder and yes you might shift stuff around onto SSDs but seriously this is just 'bigger' its not a fundamental shift.
We've heard similar things in the past from the likes of Mike Stonebraker, who has suggested that many users would benefit from re-architected RDBMSs and column stores that use main memory and SSD as much as possible, whilst still retaining traditional strong consistency, ACID semantics and in some cases SQL. But Steve then re-focusses on Map Reduce, noting that the model behind it requires a different way of thinking about how you store, query and manipulate data, which makes it much harder for users to integrate with their existing investments.
In the same way as there aren't that many people who can properly think multi-threaded then there aren't that many people who can think Map Reduce.
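To make that shift in thinking concrete, here is a minimal single-process sketch of the map, shuffle and reduce steps, set against the one-line SQL query that expresses the same aggregation. The sample data, the `sales` table in the comment and all function names are illustrative assumptions, not taken from Steve's article or from any particular framework such as Hadoop.

```python
from collections import defaultdict

# Hypothetical sample records: (region, sale_amount).
SALES = [("north", 10), ("south", 5), ("north", 7), ("east", 3), ("south", 1)]

# The SQL version of this job is a single declarative statement:
#   SELECT region, SUM(amount) FROM sales GROUP BY region;


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for region, amount in records:
        yield region, amount


def shuffle_phase(pairs):
    """Group values by key; a real framework does this across machines."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups):
    """Collapse each key's list of values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def total_by_region(records):
    """Chain the three phases to compute total sales per region."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    print(total_by_region(SALES))
```

Even in this toy form, the developer has to decide what the key is, what gets emitted and how values are combined, which are all choices that SQL's `GROUP BY` makes implicitly.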
So where does this leave Big Data, when very regularly we hear about new implementations or vendors looking to encourage us to adopt their solutions? Well according to Steve:
We see people using Big Data in the same way they used SOA, slapping on a logo and saying things like 'Hadoop integration' or 'Social media integration' or.... to put it another way.... 'we've built a connector'. See how much less impressive the later looks? Its just an old school EAI connector to a new source or a new ETL connector...
This may well be a sweeping statement, but perhaps there is a germ of truth within it? Is there a risk of losing the core message behind this requirement for "new data solutions" because there is too much hype and too many vendors putting a NoSQL/Big Data badge on implementations that are not appropriate for the task at hand? As Steve suggests, this may be similar to the early days of SOA, when vendors added the SOA badge to solutions that were most definitely not SOA. But how precisely can you measure whether what you need is a Big Data solution or what you're being offered is a Big Con (as Steve puts it)? Well, Steve offers up a few suggestions, at least when evaluating solutions from vendors, some of which include:
- Can you replace the phrase 'Big Data' with 'Big Database'? If you can then it's just an upgrade.
- Can the 'advance' be reduced to 'we've got an EAI connector'?
- Is it basically the same product as 2009 with a Big Data/NoSQL badge on it?
- Is there anything that moves process to data rather than shifting the data? This is something that many have suggested in the past, including Jim Gray.
Unfortunately none of these "rules" is scientific, and all of them require some level of subjective judgement. So are there others that could be used? If you have moved away from a traditional RDBMS to something else, what criteria did you use when deciding the move was necessary, and how did you select the implementation to migrate towards? Was the move successful and if not, why not?
Remember the Hype Cycle
Yes, Big Data/map-reduce is somewhere at or near the peak of expectations and it surely will see the trough of disillusionment some time in the next few years. Nonetheless there are genuine advances here that will live long into the future.
It's not a magic bullet, just another tool in your arsenal, but the long-term balance will shift away from RDBMSs to NoSQL and map-reduce type analytics and/or the emergence of hybrid products.
I agree: like REST and SOA in the past, Big Data is the new marketing cry for vendors.