Big Data Content on InfoQ

  • Building Applications With Hadoop

    When building applications using Hadoop, it is common to have input data from various sources arriving in various formats. In his presentation, “New Tools for Building Applications on Apache Hadoop”, Eli Collins gives an overview of how to build better products with Hadoop and of tools that can help, such as Apache Avro, Apache Crunch, Cloudera ML and the Cloudera Development Kit.

  • Interview with Raffi Krikorian on Twitter's Infrastructure

    Raffi Krikorian, Vice President of Platform Engineering at Twitter, gives insight into how Twitter prepares for unexpected traffic peaks and how its system architecture is designed to handle failure.

  • Building a Real-time, Personalized Recommendation System with Kiji

    In this article, Jon Natkins explains how to create a personalized recommendation system fed with large amounts of real-time data using Kiji, which leverages HBase, Avro, MapReduce and Scalding.

  • Agility, Big Data, and Analytics

    How do you bring agility into big data analytics? Learn what makes analytics uniquely different from application development, and how to adapt agile principles and practices to the nuances of analytics. Examine how the disciplines of data science and software development complement one another, and how they intersect in an agile project environment.

  • Costin Leau on Elasticsearch, BigData and Hadoop

    Elasticsearch is an open source, distributed, real-time search and analytics engine for the cloud. The first milestone of elasticsearch-hadoop, 1.3.M1, was released last month. InfoQ spoke with Costin Leau about Elasticsearch and how it integrates with Hadoop and other Big Data technologies.

  • Building Scalable Applications in .NET: Introducing the FatDB Distributed Computing Platform

    Justin Weiler introduces FatDB, a NoSQL DB and a distributed platform built on Mission Oriented Architecture meant to abstract and generalize the essential characteristics of enterprise applications.

  • Spoilt for Choice – How to choose the right Big Data / Hadoop Platform?

    In his new article, Kai Wähner compares several alternatives for installing a version of Hadoop and implementing big data processes. He compares distributions and tooling from Apache and many other vendors, including Cloudera, Hortonworks, MapR, Amazon, IBM, Oracle and Microsoft. He also describes the pros and cons of each distribution and provides a decision tree for choosing the most appropriate one.

  • Mike Barlow on Real-Time Big Data Analytics

    The white paper "Real-Time Big Data Analytics: Emerging Architecture", authored by Mike Barlow, covers big data analytics and how real-time big data analytics (RTBDA) differs from traditional analytics. InfoQ spoke with Mike about the current state of real-time big data analytics and emerging trends in the Big Data space, such as Decision Science.

  • Interview and Video Review: Working with Big Data: Infrastructure, Algorithms, and Visualizations

    Paul Dix leads a practical exploration of Big Data in this video training series. The first five lessons of the training span multiple server systems, with a focus on the end-to-end processing of large quantities of XML data from real Stack Exchange posts. He completes the training with a lesson on developing visualizations for gaining insights from the macro-level analysis of Big Data.

  • Apache Crunch: A Java Library for Easier MapReduce Programming

    In his new article, Josh Wills introduces Crunch, a new Apache incubating project that provides a Java library for creating MapReduce pipelines. Crunch is based on a set of high-level abstractions that simplify the design of MapReduce applications, and it provides a library of patterns for implementing common tasks such as data joins, aggregations, and sorting.

  • Unit Testing Hadoop MapReduce Jobs With MRUnit, Mockito, & PowerMock

    Hadoop MapReduce jobs have a unique code architecture that raises interesting issues for test-driven development. In this article, Michael Spicuzza provides a real-world example that uses MRUnit, Mockito, and PowerMock to solve these problems.

  • Interview and Book Review: NoSQL Distilled

    InfoQ spoke with both authors of the book, Pramod Sadalage and Martin Fowler, about the NoSQL database space and the emerging trends in NoSQL.
