Big Data – Distinguishing Between Hype and Reality
A recent IBM survey attempts to separate hype from reality when it comes to Big Data. Among its findings: 28% of the companies have started a Big Data pilot or implementation, 47% are planning one, and 24% are still trying to learn about it.
The report, 2012 Analytics Study - The real-world use of big data (registration required), was created by the IBM Institute for Business Value and the Saïd Business School at the University of Oxford. It is based on the Big Data @ Work Survey, which IBM conducted in mid-2012 with 1144 professionals from 95 countries across 26 industries.
The report defines Big Data by its 4 “V” dimensions:
- Volume – the characteristic most often associated with Big Data. A little over 50% of the respondents considered anything between 1 TB and 1 PB to be Big Data, which is smaller than the multi-PB to ZB range usually mentioned.
- Variety – Big Data includes a broad spectrum of data from structured to semi-structured to unstructured data, from within or outside the enterprise. Such data also has many types: text, binary, sensor data, tweets, web data, clicks, logs, audio, video, etc.
- Velocity – today’s data is generated faster than ever before, making it quite difficult to process in real time.
- Veracity – the report added this fourth dimension to characterize data reliability. External or internal factors may influence the quality of data and therefore the analysis based on it, so this dimension needs to be considered.
Following is a summary of the Big Data @ Work survey results:
- 28% of the organizations have started Big Data pilot or implementation projects, 47% are planning them, while 24% have only tried to understand what it is all about.
- Big Data projects are customer-focused (49%), or aimed at operations optimization (18%), risk or financial management (15%), creating new business models (14%), or employee collaboration (4%).
- Big Data initiatives usually start with internal structured data, then move to semi-structured and later to unstructured data.
The following graphic depicts the main components used by those companies which implemented Big Data initiatives:
Interestingly enough, Big Data does not come primarily from social media, RFID or hardware sensors. The leading sources are Transactions – 88%, Log Data – 73%, Events – 59%, and Emails – 57%. Only then comes Social Media with 43%, followed by Sensors – 42%, External Feeds – 42%, RFID or POS – 41%, Text – 41%, Geospatial – 40%, Audio – 38%, and Pictures or Videos – 34%.
Companies with Big Data initiatives reported having the following capabilities available in their organization: Query & Reporting – 91%, Data Mining – 77%, Data Visualization – 71%, Predictive Modeling – 67%, Optimization – 65%, Simulation – 56%, Natural Language Text – 52%, Geospatial Analytics – 43%, Streaming Analytics – 35%, Video Analytics – 26%, and Voice Analytics – 25%.
The self-selected participants in this online survey came from all continents. The majority (54%) were business professionals: Executive Management (16%), Marketing (15%), Research and Product Development (10%), General Management/Operations (8%), and Finance (5%). A good 46% represented IT.