In this article, the authors discuss the role of big data and Hadoop in the security analytics space, and how to use MapReduce to process data efficiently for security use cases such as Security Information and Event Management (SIEM) and fraud detection.
Apache Samza is a stream processor LinkedIn recently open-sourced. In his presentation, Samza: Real-time Stream Processing at LinkedIn, Chris Riccomini discusses Samza's feature set, how Samza integrates with YARN and Kafka, how it's used at LinkedIn, and what's next on the roadmap.
When building applications using Hadoop, it is common to have input data arriving from various sources in various formats. In his presentation, “New Tools for Building Applications on Apache Hadoop”, Eli Collins gives an overview of how to build better products with Hadoop and of tools that can help, such as Apache Avro, Apache Crunch, Cloudera ML and the Cloudera Development Kit.
In this article, Jon Natkins explains how to create a personalized recommendation system fed with large amounts of real-time data using Kiji, which leverages HBase, Avro, MapReduce and Scalding.
Elasticsearch is an open source, distributed real-time search and analytics engine for the cloud. InfoQ spoke with Costin Leau about Elasticsearch and how it integrates with Hadoop and Big Data.
Although Hadoop is a set of open source Apache (and now GitHub) projects, there are currently a large number of alternatives for installing a version of Hadoop and implementing big data processes.
Paul Dix leads a practical exploration of Big Data in this video training series. The training focuses on the high-level architecture while teaching practical usage skills and Ruby algorithms.
In this virtual panel, InfoQ talks to several Hadoop vendors and users about their views on the current and future state of Hadoop.
In his new article, Benjamin Fagin discusses how to write XJC plugins and use this technique to generate Avro schemas and marshaling classes directly from existing XSD files.
Using a custom Hadoop OutputFormat makes it possible to produce output data in the form most appropriate for other applications.
In this article, the authors show how to extend Oozie by introducing custom actions specific to a given company or line of business.
This article describes how interoperable clouds can be created today through the integration of open standards such as the Open Cloud Computing Interface (OCCI), the Open Virtualization Format (OVF) and the Cloud Data Management Interface (CDMI).