Paul Dix leads a practical exploration into Big Data in this video training series. The first five lessons of the training span multiple server systems with a focus on the end-to-end processing of large quantities of XML data from real Stack Exchange posts. He completes the training with a lesson on developing visualizations for gaining insights from the macro-level analysis of Big Data.
In his new article, Josh Wills introduces Crunch, a new Apache incubating project providing a Java library for creating MapReduce pipelines. Crunch is based on a set of high-level abstractions that simplify the design of MapReduce applications, and it provides a library of patterns to implement common tasks like data joins, aggregations, and sorting.
Hadoop MapReduce jobs have a unique code architecture that raises interesting issues for test-driven development. In this article Michael Spicuzza provides a real-world example using MRUnit, Mockito, and PowerMock to solve these problems.
InfoQ spoke with the authors of the book NoSQL Distilled, Pramod Sadalage and Martin Fowler, about the NoSQL database space and the emerging trends in NoSQL.
Stefan Edlich reviews NoSQL, considering its evolution, financial impact, standards or the lack thereof, current landscape, books, the leaders and some newcomers, concluding that NoSQL is here to stay.
In this virtual panel, InfoQ talks to several Hadoop vendors and users about their views on the current and future state of Hadoop.
Rich Hickey, the author of Clojure, explains the architecture of Datomic, a new database designed as a composition of simple services, combining the capabilities of an RDBMS with the scalability of NoSQL.
Version 2 of the open source web-search framework Apache Nutch supports a link-graph database and HTML parsing. InfoQ spoke with Julien Nioche, VP of the Apache Nutch project, about the new features.
In his new article, Jonathan Natkins explains how to use components of Apache Hadoop, including Flume, Hive, and Oozie, to implement a typical data management system.
This article answers the question: is cloud computing really all that hard?
The new Apache HCatalog provides a metadata and table management system for the Hadoop ecosystem, simplifying data interoperability between different data processing tools.
This article contains an interview with Dipti Borkar, Director of Product Management at Couchbase, on the challenges, benefits, and process of migrating from an RDBMS to NoSQL.