This article discusses what stream processing is, how it fits into a big data architecture with Hadoop and a data warehouse (DWH), when stream processing makes sense, and what technologies and products you can choose from.
GridGain recently announced the In-Memory Accelerator for Hadoop, which brings the benefits of in-memory computing to Hadoop-based applications. It includes two components: an in-memory file system and a MapReduce implementation. InfoQ spoke with Nikita Ivanov, CTO of GridGain, about the architecture of the product.
Apache Tez is a new distributed execution framework targeted towards data-processing applications on Hadoop. But what exactly is it, and how does it work? In the presentation "Apache Tez: Accelerating Hadoop Query Processing", Bikas Saha and Arun Murthy discuss Tez's design, highlight some of its features and share initial results obtained by making Hive use Tez instead of MapReduce.
The MLConf conference took place in NYC on April 11th: a full day packed with talks on Machine Learning and Big Data, featuring speakers from many prominent companies.
Apache Hadoop YARN, a new Hadoop resource manager, has just been promoted to a high-level Hadoop subproject. InfoQ had the chance to discuss YARN with Arun Murthy, a founder of Hortonworks.