This article provides an overview of tools and libraries available for embedded data analytics and statistics, both stand-alone software packages and programming languages with statistical capabilities. The authors also discuss how to combine and integrate these embedded analytics technologies to handle big data.
In this article, the authors discuss the role of big data and Hadoop in the security analytics space, and how to use MapReduce to process data efficiently for security use cases such as Security Information and Event Management (SIEM) and fraud detection.
When building applications using Hadoop, it is common to have input data arriving from various sources in various formats. In his presentation, “New Tools for Building Applications on Apache Hadoop”, Eli Collins gives an overview of how to build better products with Hadoop and of various tools that can help, such as Apache Avro, Apache Crunch, Cloudera ML and the Cloudera Development Kit.
Raffi Krikorian, Vice President of Platform Engineering at Twitter, gives insight into how Twitter prepares for unexpected traffic peaks and how its system architecture is designed to support failure.
In this article, Jon Natkins explains how to create a personalized recommendation system fed with large amounts of real-time data using Kiji, which leverages HBase, Avro, MapReduce and Scalding.
How do you bring agility into big data? Learn what makes analytics uniquely different from application development, and how to adapt agile principles and practices to the nuances of analytics.
Elasticsearch is an open source, distributed real-time search and analytics engine for the cloud. InfoQ spoke with Costin Leau about Elasticsearch and how it integrates with Hadoop and Big Data.
Justin Weiler introduces FatDB, a NoSQL DB and a distributed platform built on Mission Oriented Architecture meant to abstract and generalize the essential characteristics of enterprise applications.
Although Hadoop is a set of open source Apache (and now GitHub) projects, there are currently a large number of alternatives for installing a version of Hadoop and realizing big data processes.
The "Real-Time Big Data Analytics: Emerging Architecture" white paper by Mike Barlow discusses the difference between traditional and real-time analytics. InfoQ spoke with Mike about this topic.
Paul Dix leads a practical exploration of Big Data in this video training series. The training focuses on the high-level architecture while teaching practical usage skills and Ruby algorithms.
In his new article, Josh Wills introduces Crunch, a new Apache incubating project that provides a Java library for creating MapReduce pipelines.