In his new article Kai Wähner compares several alternatives for installing a version of Hadoop and realizing big data processes. He compares distributions and tooling from Apache and many other vendors including Cloudera, HortonWorks, MapR, Amazon, IBM, Oracle, Microsoft. He additionally describes pros and cons of every distribution and provides a decision tree for choosing a most appropriate one.
"Real-Time Big Data Analytics: Emerging Architecture" white paper authored by Mike Barlow covers big data analytics topic and how real-time big data analytics (RTBDA) are different from traditional analytics. InfoQ spoke with Mike about the current state of real-time big data analytics and the emerging trends in the Big Data space like Decision Science.
Paul Dix leads a practical exploration into Big Data in this video training series. The first five lessons of the training span multiple server systems with a focus on the end to end processing of large quantities of XML data from real Stack Exchange posts. He completes the training with a lesson on developing visualizations for gaining insights from the macro level analysis of Big Data.
In his new article Josh Wills introduces Crunch - a new Apache incubating project providing a Java library for creating MapReduce pipelines.
Hadoop MapReduce jobs have a unique code architecture that raises interesting issues for test-driven development. In this article Michael Spicuzza shows how to use MRUnit to solve these problems. 1
InfoQ spoke with NoSQL Distilled book authors, Pramod Sadalage and Martin Fowler about NoSQL database space and the emerging trends in NoSQL.
Stefan Edlich reviews NoSQL, considering its evolution, financial impact, standards or their lack of, current landscape, books, the leaders and some newcomers, concluding that NoSQL is here to stay. 3
In this virtual panel, InfoQ talks to several Hadoop vendors and users about their views at current and future state of Hadoop.
Rich Hickey, the author of Clojure, explains the architecture of Datomic - a new database designed as a composition of simple services, combining the capabilities of RDBMS and scalability of NoSQL. 2
Open source web-search framework Apache Nutch version 2 supports link-graph database and HTML parsing. InfoQ spoke with Julien Nioche, VP of Apache Nutch project, about the new features.
In his new article Jonathan Natkins explains how to use components of Apache Hadoop, including Flume, Hive and Oozie to implement a typical Data management system. 2
This article answers the question, is cloud computing really all that hard? 2