Cloudera recently released the latest version of its software distribution, CDH5. The last major version, CDH4, came out almost 20 months ago, which seems like ages in the Big Data world. We take a look at the new features this release brings and at Cloudera's future direction after the latest round of investment from Intel and Google Ventures.
In the race for interactive SQL in Big Data environments, there are two open-source front-runners: Impala, and Hive with the Stinger project. Cloudera recently announced that Impala is up to 69 times faster than Hive 0.12 and can outperform a DBMS. Beyond raw speed, we look at other considerations in choosing a SQL engine for Hadoop, as well as at Tez, an application framework for YARN.
Hadoop is definitely the platform of choice for Big Data analysis and computation. As data Volume, Variety and Velocity increase, however, Hadoop as a batch processing framework cannot cope with the requirement for real-time analytics. Spark, Storm and the Lambda Architecture can help bridge the gap between batch and event-based processing.
Oracle Big Data Appliance and Big Data Connectors support integration with Hadoop, Cloudera Manager and Oracle NoSQL Database. Last month Oracle announced the availability of Big Data Appliance and Connectors, as well as a partnership with Cloudera. It also recently announced Advanced Analytics for Big Data, which integrates the R statistical programming language into Oracle Database 11g.
Companies rely more and more on big data when making their decisions. Amazon, Cloudera, and IBM have announced their Hadoop-as-a-Service offerings, while Microsoft promises to do the same next year.
Cloudera recently announced Cloudera Enterprise, a commercial bundling of Hadoop and a dozen other supporting open source projects. InfoQ interviewed Product Manager Charles Zedlewski for more detail about what this means for conventional enterprises and the future face of Hadoop.
The GigaOM Structure conference a couple of weeks ago addressed many areas of cloud computing. One of the key themes of the event was the emergence of new data architectures. Throughout the panels, interviews, and presentations, many speakers identified significant upcoming changes in how data is handled.
Numerous projects have sprouted up in the last year around Hadoop, the popular open source implementation of MapReduce. Now Cloudera is releasing Cloudera Distribution for Hadoop, an open source product that seeks to make it easier for companies to begin using Hadoop.