This article discusses what stream processing is, how it fits into a big data architecture with Hadoop and a data warehouse (DWH), when stream processing makes sense, and what technologies and products you can choose from.
GridGain recently announced the In-Memory Accelerator for Hadoop, offering the benefits of in-memory computing to Hadoop-based applications. It includes two components: an in-memory file system and a MapReduce implementation. InfoQ spoke with Nikita Ivanov, CTO of GridGain, about the architecture of the product.
SQL-on-Hadoop technologies include a SQL layer or a SQL database over Hadoop. These solutions have recently become popular because they address the data management issues of Hadoop and provide a scale-out alternative to traditional RDBMSs. InfoQ spoke with Rich Reimer, VP of Marketing and Product Management at Splice Machine, about the architecture and data patterns for SQL in Hadoop databases.
Bikas Saha and Arun Murthy discuss Tez’s design, highlight some of its features and share some of the initial results obtained by making Hive use Tez instead of MapReduce.
The MLConf conference in NYC on April 11th was a full day packed with talks on Machine Learning and Big Data, featuring speakers from many prominent companies.
Lambda Architecture proposes a simpler, elegant paradigm designed to process large amounts of data. In this article, the author discusses Lambda Architecture with the help of a sample Java application.
In this article, the authors discuss the role of big data and Hadoop in the security analytics space and how to use MapReduce to process data for security analysis.
How to use various tools such as Apache Avro, Apache Crunch, Cloudera ML and the Cloudera Development Kit to build applications that use Hadoop.
In this article, Jon Natkins explains how to create a personalized recommendation system fed with large amounts of real-time data using Kiji, which leverages HBase, Avro, MapReduce and Scalding.
Elasticsearch is an open source, distributed real-time search and analytics engine for the cloud. InfoQ spoke with Costin Leau about Elasticsearch and how it integrates with Hadoop and Big Data.
Although Hadoop is a set of open source Apache (and now GitHub) projects, there are currently a large number of alternatives for installing a version of Hadoop and realizing big data processes.
Paul Dix leads a practical exploration into Big Data in this video training series. The training focuses on the high-level architecture while teaching practical usage skills and Ruby algorithms.