Hadoop + SQL Server + Excel = Big Data Analytics
A few months back, Microsoft announced HDInsight, Microsoft’s Hadoop distribution for managing, analysing and making sense of large volumes of data. InfoQ connected with Val Fontama, Senior Product Marketing Manager for SQL Server, to learn more about how the Enterprise Big Data @ Microsoft story is panning out.
About the trend of dataset sizes in enterprises -
This ocean of data is getting bigger all the time. Some estimates indicate that a business’s information stores double in size every year. Gartner, for example, sees information volume growing worldwide at a minimum annual rate of 59%, with roughly 85% of that as “unstructured” – data such as video clips, RFID tags, and web site logs. This unstructured data is not readily suited for traditional data management systems. In addition, customers are also seeing increasing velocity of data as they now collect new data in real-time in many cases.
Customers will need a modern data platform to evolve with the needs of the business and the data they are collecting. Big data has created a massive business opportunity for businesses around the world to find new, actionable insights from all the data they collect, whether structured or unstructured. Because at the end of the day, the biggest promise of Big Data is to drive smarter decisions from data. To do that you have to gain new insights from all types of data.
HDInsight is Microsoft’s solution for tackling the Big Data challenge -
Microsoft hopes to accelerate (Hadoop) adoption in the enterprise by offering portability, superior performance, security and simplified deployment through Hadoop distributions for Windows Server and Windows Azure. Microsoft will enhance Hadoop security by integrating HDInsight with Active Directory. This will empower IT to apply the same consistent security policies across all their IT assets including Hadoop clusters.
In addition, through integration with System Center, HDInsight simplifies the management of Hadoop and enables IT to manage their Hadoop clusters alongside SQL Server databases and applications from a single pane of glass.
Hadoop-based applications targeting the Windows platform integrate with Microsoft’s business intelligence (BI) tools such as Excel, Power View and PowerPivot, creating unique and differentiated value for businesses by providing easy analysis of massive amounts of business information.
To deliver 100% compatibility with Apache Hadoop, Microsoft’s Hadoop distribution, HDInsight, is built on the Hortonworks Data Platform (HDP). As a result, customers will be able to move their MapReduce jobs from their own Windows servers to the cloud, or even to an Apache Hadoop distribution running on Linux. No other vendor offers this today. In addition, delivering these capabilities on the Windows Server and Azure platforms enables customers to use the familiar tools of Excel, PowerPivot for Excel and Power View to easily extract actionable insights from the data.
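For readers unfamiliar with the MapReduce jobs mentioned above, the following is a minimal sketch of the pattern, written in Python and run in a single process rather than on a Hadoop cluster. It counts HTTP status codes in web site log lines. The three-field log format ("client path status"), the sample data and all function names are assumptions made for illustration; they are not part of HDInsight or of the Hadoop API.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_log_line(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (status_code, 1) for each well-formed log line."""
    parts = line.split()
    # Assumed (hypothetical) format: "<client> <path> <status>".
    if len(parts) >= 3:
        yield parts[2], 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group all emitted values by key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: sum the values collected for one key."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce locally, in one process."""
    pairs = (pair for line in lines for pair in map_log_line(line))
    grouped = shuffle(pairs)
    return dict(reduce_counts(k, v) for k, v in sorted(grouped.items()))


# Hypothetical sample data; the last line is deliberately malformed.
SAMPLE_LOG = [
    "10.0.0.1 /index.html 200",
    "10.0.0.2 /missing 404",
    "10.0.0.1 /about.html 200",
    "malformed",
]

if __name__ == "__main__":
    print(run_job(SAMPLE_LOG))
```

Because the map and reduce functions depend only on their inputs, the same logic can in principle be moved between environments, which is the kind of portability the quote describes. On a real cluster the framework, not the job author, performs the shuffle step.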
How SQL Server fits into this solution –
One of the most significant ways SQL Server 2012 differs from SQL Server 2008 in helping enterprises handle large datasets is its compatibility with Hadoop. Hadoop allows users to process large amounts of both structured and unstructured data to quickly find insights in that data, and because Hadoop is open-source, it can provide these insights at a low cost. Microsoft’s partnership with Hortonworks in developing Hadoop compatibility with SQL Server 2012, as well as the recently announced previews of Microsoft HDInsight Server and Windows Azure HDInsight Service, allow customers to use Microsoft-developed Hadoop connectors to get the best insights from their data. With the Hive ODBC Driver that connects SQL Server to Hadoop, customers can now use Microsoft BI tools like PowerPivot and Power View in SQL Server 2012 to analyze all types of data, including unstructured data. Furthermore, with the new Data Quality Services in SQL Server 2012, customers can enhance their data by converting raw data to credible and consistent data fit for modelling.
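As a rough illustration of what querying Hadoop data over ODBC can look like from client code, here is a hedged sketch in Python. It assumes the third-party pyodbc package is installed and that a Hive ODBC data source has already been configured on the machine. The DSN name "HiveSample", the table name "weblogs" and the column name "status" are hypothetical, and the sketch does not describe how Microsoft's own BI tools use the driver internally.

```python
from typing import List, Tuple


def build_hive_query(table: str, column: str) -> str:
    """Build a HiveQL query that counts rows per distinct column value."""
    # Accept only simple identifiers so user input cannot alter the query.
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"invalid identifier: {name!r}")
    return f"SELECT {column}, COUNT(*) AS hits FROM {table} GROUP BY {column}"


def fetch_counts(dsn: str, table: str, column: str) -> List[Tuple[str, int]]:
    """Run the query through an ODBC data source and return the rows."""
    # Imported lazily so the query builder works without the driver installed.
    import pyodbc  # third-party package, assumed to be installed

    query = build_hive_query(table, column)
    # Assumption: autocommit is enabled because Hive is not transactional.
    connection = pyodbc.connect(f"DSN={dsn}", autocommit=True)
    try:
        cursor = connection.cursor()
        cursor.execute(query)
        return [(row[0], row[1]) for row in cursor.fetchall()]
    finally:
        connection.close()


if __name__ == "__main__":
    # "HiveSample", "weblogs" and "status" are hypothetical names.
    for value, hits in fetch_counts("HiveSample", "weblogs", "status"):
        print(value, hits)
```

The query text is plain SQL-style HiveQL, which is why tools that already speak ODBC can treat a Hive table much like any other tabular source.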
Microsoft recently announced new features in Office 2013 and how developers will be able to leverage them by building apps and consuming data services. Not surprisingly, Microsoft itself uses this to provide Big Data services right within Excel –
Excel is one of the primary clients to enable big data analytics on Microsoft platforms. In Excel 2013, our primary BI tools are PowerPivot, a data-modeling tool, and Power View, a data-visualization tool, and they are built right into the software, with no additional downloads required. This enables users of all levels to do self-service BI using the familiar interface of Excel.
Through a Hive Add-in for Excel, our HDInsight services easily integrate with the BI tools in Office 2013, allowing users to easily analyze massive amounts of structured or unstructured data with a very familiar tool.
In addition to Excel, Microsoft offers other client tools for interacting with Big Data: BI professionals can use BI Developer Studio to design OLAP cubes or scalable PowerPivot models in SQL Server Analysis Services. Developers will continue using Visual Studio to develop and test MapReduce programs written in .NET. Finally, IT operators will manage their Hadoop clusters on HDInsight with the same System Center tools that they use today.
Overall, Microsoft’s strategy seems to be to offer its customers the path of least resistance to adopting Big Data: extending existing tools such as SQL Server and Office to work seamlessly with new data types, and allowing companies to take advantage of their existing investments while making new ones.