News from O’Reilly Strata Conference + Hadoop World 2012: Azure HDInsight, Cloudera Impala, MapR M7

by Boris Lublinsky on Oct 29, 2012

During this year’s O’Reilly Strata Conference + Hadoop World, in addition to a collection of very interesting presentations, there were several important vendor announcements regarding Microsoft Azure HDInsight, Cloudera Impala, and MapR M7.

Microsoft and Hortonworks announced the availability of Microsoft's Azure cloud-based Hadoop service, now known as Windows Azure HDInsight Service. According to David Campbell, Technical Fellow at Microsoft:

"Big data should provide answers for business, not complexity for IT. Providing Hadoop compatibility on Windows Server and Azure dramatically lowers the barriers to setup and deployment and enables customers to pull insights from any data, any size, on-premises or in the cloud."

HDInsight Server is designed to work with Windows Server and Microsoft SQL Server. On Windows, HDInsight is integrated with Microsoft System Center for administrative control and with Active Directory for access control and security. HDInsight (both on-premises and in the cloud) provides a connection to Microsoft SQL Server, enabling business intelligence

"... and that starts with user-facing tools and components including Microsoft Excel, PowerPivot for Excel, and Power View. Few people don't have access to Excel, and it can handle data extracts from any Hadoop environment."

Microsoft's cloud-based HDInsight will let users spin up and deploy a Hadoop cluster within minutes. This service will compete with existing MapReduce services, including Amazon Web Services' Elastic MapReduce. Additionally, Azure will host a data marketplace, giving users the ability to purchase each other’s data.

Cloudera debuted its real-time query project, Impala, which serves real-time SQL queries in seconds and integrates with leading BI tools. Impala provides a native distributed query engine and a lower-latency scheduler, and can operate on data stored in both HDFS and HBase. It leverages the Apache Hive metastore and is compatible with Hive SQL syntax, the ODBC driver, and the Beeswax GUI (in Hue).

Cloudera claims the new platform, which is entering public beta, can process queries 10 to 30 times faster than Hive/MapReduce. While Cloudera’s marketing materials refer to that sort of processing speed as “real time” and “speed of thought,” the company’s chief architect suggests that “real time” in data analytics is better framed as “waiting less.”

According to Expedia, one of the first Impala beta testers:

"We are able to work on one single platform for big data rather than many disparate systems for archiving, ETL and analytics… This evolution of Hadoop has enabled us to reduce our latency by 50 percent and produce a new real business insight service not previously viable."

The extra speed is all the more impressive given that companies are wrestling with more in-house data than ever before. Those very large datasets also create sizable backend issues, especially with regard to latency.

Finally, MapR Technologies introduced a new version of its platform, M7, which simplifies the management of HBase and makes it an enterprise-grade database platform.

With M7:

"... the company has strived to solve a lot of reliability and management issues that have been common with HBase. The company aimed to simplify the underlying architecture and how the different pieces work together, making it as simple as possible. Norris said this has resulted in an easier to manage platform, but it also provides customers with unified management, unified data protection, consistent access, more flexibility and higher performance than previous platforms."

According to the M7 whitepaper:

"M7 has a purpose-built architecture specifically designed to optimize the storage and processing of files as well as tables within a unified platform. This unification applies the power of MapR’s existing management, access and data protection capabilities to tables. With M7, the layered architecture of HBase is eliminated so that HBase applications can access data directly with only one network hop, without all of the delays introduced by extra layers of communication. The integration of files and tables in the M7 architecture into a single data store is what brings administrative and development simplicity, superior reliability, and unprecedented performance and scale to HBase applications."

The O’Reilly Strata Conference + Hadoop World presentations, keynotes, and interviews can be downloaded from the conference website.

 
