As data grows exponentially, the modern Hadoop ecosystem provides not only a reliable distributed aggregation system that delivers data parallelism, but also analytics that yield valuable insights from that data. In this article, Monica Beckwith starts from the core Hadoop components and investigates the design of a highly available, fault-tolerant Hadoop cluster, then adds security and data-level isolation.
The new book “Hadoop in Practice, Second Edition” by Alex Holmes provides deep insight into the Hadoop ecosystem. It covers a wide spectrum of topics: data organization, layouts, and serialization; data processing, including MapReduce and big data patterns; special data structures and how they simplify big data processing; and SQL on Hadoop data.
This article discusses what stream processing is, how it fits into a big data architecture with Hadoop and a data warehouse (DWH), when stream processing makes sense, and what technologies and products you can choose from.
GridGain announced its In-Memory Accelerator for Hadoop, which brings the benefits of in-memory computing to Hadoop applications. InfoQ spoke with Nikita Ivanov of GridGain about the product's architecture.
Bikas Saha and Arun Murthy discuss Tez’s design, highlight some of its features and share some of the initial results obtained by making Hive use Tez instead of MapReduce.
MLConf took place in NYC on April 11th: a full day packed with talks on machine learning and big data, featuring speakers from many prominent companies.
Apache Hadoop YARN, a new Hadoop resource manager, has just been promoted to a Hadoop subproject in its own right. InfoQ had the chance to discuss YARN with Arun Murthy, one of the founders of Hortonworks.