MapReduce Content on InfoQ

  • Scalable System Design Patterns

    Ricky Ho revisited his three-year-old post on scalable system design patterns and realized that a lot had changed since then.

  • Percolator: a System for Incrementally Processing Updates to a Large Data Set

    Google's Daniel Peng and Frank Dabek published a paper, "Large-scale Incremental Processing Using Distributed Transactions and Notifications", explaining that databases do not meet the storage or throughput requirements of Google's indexing system, which stores tens of petabytes of data and processes billions of updates per day on thousands of machines.

  • Cloudant Releases Java-Based View Server for CouchDB

    Cloudant, the company behind CouchDB, just released a Java View Server for CouchDB. That means Map-Reduce jobs can now be written not only in Erlang and interpreted languages like JavaScript or Python, but also in JVM-based languages.

  • LinkedIn's Data Infrastructure

    Jay Kreps of LinkedIn presented details of how the company processes data at the recent Hadoop Summit. Kreps described how LinkedIn crunches 120 billion relationships per day and blends large-scale data computation with high-volume, low-latency site serving.

  • Yahoo! Updates from Hadoop Summit 2010

    The Hadoop Summit of 2010 started off with a vuvuzela blast from Blake Irving, Chief Product Officer for Yahoo. Yahoo delivered keynote addresses that outlined the scale of their use, technical directions for their contributions, and architectural patterns in how they apply the technology.

  • Adobe Released Puppet Recipes for Hadoop

    Adobe recently released to the community the Puppet recipes that it uses to automate Hadoop/HBase deployments. InfoQ spoke with Luke Kanies, founder of PuppetLabs, to learn more about what this means.

  • Apache Mahout: Highly Scalable Machine Learning Algorithms

    The Apache Mahout project, a set of highly scalable machine-learning libraries, recently announced its first public release. InfoQ spoke with Grant Ingersoll, co-founder of Mahout and a member of the technical staff at Lucid Imagination, to learn more about this project and machine learning in general.

  • Amazon Rolls Out Hadoop-Based MapReduce to EC2

    It has been possible to run Hadoop on EC2 for a while. Today Amazon simplified the process by announcing Amazon Elastic MapReduce, which automatically deploys EC2 instances for computational use and includes an API for interacting with them.

  • Cascading - Data Processing API for Hadoop MapReduce

    Cascading is a new API for data processing on Hadoop clusters. It supports building complex processing workflows in an expressive, declarative style.

  • Aster In-Database MapReduce

    Aster Data Systems has announced an in-database MapReduce implementation for their nCluster database platform.

  • Skynet, A New Ruby MapReduce

    The MapReduce design pattern for distributing data processing was introduced by Google in 2004, first with a C++ implementation. A new Ruby implementation, Skynet, has now been released by Adam Pisoni. InfoQ had the chance to catch up with Adam about its features and how it compares to an existing Ruby implementation called Starfish.

  • MapReduce, A Step Backwards: Is Comparison to Relational Databases Fair?

    A recent article on the Database Column by David J. DeWitt and Michael Stonebraker attempts to compare the increasingly popular MapReduce programming paradigm to a relational database. The blogosphere has quickly called foul on the comparison and its reasoning.

  • Run Your Own Google-Style Computing Cluster with Hadoop and Amazon EC2

    Amazon's Elastic Compute Cloud (EC2) allows developers to acquire computing power at the rate of $0.10 per hour consumed. Work has been done to allow Hadoop, an open source MapReduce implementation written in Java, to run on EC2. This combination allows developers to write scalable algorithms and then bring up large numbers of servers to run them as needed.
