News from O’Reilly Strata Conference + Hadoop World 2012: Azure HDInsight, Cloudera Impala, MapR M7
During this year’s O’Reilly Strata Conference + Hadoop World, in addition to a collection of very interesting presentations, there have been several important vendor announcements regarding Microsoft Azure HDInsight, Cloudera Impala, and MapR M7.
"Big data should provide answers for business, not complexity for IT. Providing Hadoop compatibility on Windows Server and Azure dramatically lowers the barriers to setup and deployment and enables customers to pull insights from any data, any size, on-premises or in the cloud."
The HDInsight Server is designed to work with Windows Server and Microsoft SQL Server. On Windows, HDInsight is integrated with Microsoft System Center for administrative control and with Active Directory for access control and security. HDInsight (both on-premises and in the cloud) provides a connection to Microsoft SQL Server, enabling business intelligence.
"... and that starts with user-facing tools and components including Microsoft Excel, PowerPivot for Excel, and Power View. Few people don't have access to Excel, and it can handle data extracts from any Hadoop environment."
Microsoft's cloud-based HDInsight will let users spin up and deploy a Hadoop cluster within minutes. This service will compete with existing MapReduce services, including Amazon Web Services' Elastic MapReduce. Additionally, Azure will host a data marketplace where users can purchase each other’s data.
Cloudera debuted Impala, its real-time query project, which serves SQL queries in seconds and integrates with leading BI tools. Impala provides a native distributed query engine and a lower-latency scheduler, and it can operate on data stored in both HDFS and HBase. It leverages the Apache Hive metastore and is compatible with Hive SQL syntax, the ODBC driver, and the Beeswax GUI (in Hue).
Cloudera claims the new platform, which is entering public beta, can process queries 10 to 30 times faster than Hive/MapReduce. While Cloudera’s marketing materials refer to that sort of processing speed as “real time” and “speed of thought,” the company’s chief architect suggests that “real time” in data analytics is better framed as “waiting less.”
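To make the "waiting less" framing concrete, the short Python sketch below applies the claimed 10x to 30x speedup to a few Hive/MapReduce query times. The baseline latencies are hypothetical values chosen for illustration, not measurements from Cloudera or anyone else, and `projected_latency_range` is a helper name invented here.

```python
# Hypothetical baseline latencies for Hive/MapReduce queries, in seconds.
# These numbers are illustrative assumptions, not measurements.
HIVE_BASELINES_S = [60, 300, 1800]

# Speedup range Cloudera claims for Impala over Hive/MapReduce.
SPEEDUP_MIN = 10
SPEEDUP_MAX = 30


def projected_latency_range(baseline_s: float) -> tuple[float, float]:
    """Return (best, worst) projected latency in seconds under the claimed speedup."""
    best = baseline_s / SPEEDUP_MAX   # 30x faster
    worst = baseline_s / SPEEDUP_MIN  # 10x faster
    return best, worst


for baseline in HIVE_BASELINES_S:
    best, worst = projected_latency_range(baseline)
    print(f"{baseline:>5} s on Hive -> {best:.0f} to {worst:.0f} s at 10-30x")
```

Under these assumptions, a one-minute query drops to a few seconds, but a 30-minute query would still take one to three minutes, which is closer to "waiting less" than to "real time."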
"We are able to work on one single platform for big data rather than many disparate systems for archiving, ETL and analytics… This evolution of Hadoop has enabled us to reduce our latency by 50 percent and produce a new real business insight service not previously viable."
The extra speed is all the more impressive when you consider how companies are wrestling with more in-house data than ever before. But those epic datasets also create sizable backend issues, especially with regard to latency.
Finally, MapR Technologies introduced its new version, M7, which simplifies the management of HBase and makes it an enterprise-grade database platform.
"... the company has strived to solve a lot of reliability and management issues that have been common with HBase. The company aimed to simplify the underlying architecture and how the different pieces work together, making it as simple as possible. Norris said this has resulted in an easier to manage platform, but it also provides customers with unified management, unified data protection, consistent access, more flexibility and higher performance than previous platforms."
According to the M7 whitepaper:
"M7 has a purpose-built architecture specifically designed to optimize the storage and processing of files as well as tables within a unified platform. This unification applies the power of MapR’s existing management, access and data protection capabilities to tables. With M7, the layered architecture of HBase is eliminated so that HBase applications can access data directly with only one network hop, without all of the delays introduced by extra layers of communication. The integration of files and tables in the M7 architecture into a single data store is what brings administrative and development simplicity, superior reliability, and unprecedented performance and scale to HBase applications."