In his new article, Kai Wähner compares several alternatives for installing Hadoop and implementing big data processes. He covers distributions and tooling from Apache and many other vendors, including Cloudera, Hortonworks, MapR, Amazon, IBM, Oracle, and Microsoft. He also describes the pros and cons of each distribution and provides a decision tree for choosing the most appropriate one.
Paul Dix leads a practical exploration of Big Data in this video training series. The first five lessons span multiple server systems, with a focus on the end-to-end processing of large quantities of XML data from real Stack Exchange posts. He completes the training with a lesson on developing visualizations for gaining insights from macro-level analysis of Big Data.
In this virtual panel, InfoQ talks to several Hadoop vendors and users about their views on the current and future state of Hadoop and on what matters most for Hadoop’s further adoption and success.
Apache Hadoop YARN, a new Hadoop resource manager, has just been promoted to a high-level Hadoop subproject. InfoQ had the chance to discuss YARN with Arun Murthy, founder of Hortonworks.
In his new article, Benjamin Fagin discusses how to write XJC plugins and how to use this technique to generate Avro schemas and marshaling classes directly from existing XSD files.
Using a custom Hadoop OutputFormat makes it possible to produce output data in the form most appropriate for other applications.
In this article, the authors show how to extend Oozie by introducing custom actions specific to a given company or line of business.
This article describes how interoperable clouds can be created today through the integration of open standards such as the Open Cloud Computing Interface, the Open Virtualization Format, and CDMI.
A complete Oozie example that demonstrates language features and their use in real-world scenarios.
Apache Avro, a new marshaling framework, provides many interesting new features. In his new article, Boris Lublinsky takes it for a test drive and offers some suggestions on its proper usage.
Matrix presents a white paper on using the open source tool Hadoop to implement the MapReduce strategy and a cloud computing strategy to solve business intelligence problems.