InfoQ


Hadoop Content on InfoQ


Latest featured content about Hadoop

Yahoo's Doug Cutting on MapReduce and the Future of Hadoop

Community: Java | Topics: Data Access, Clustering & Caching

InfoQ's lead Java editor, Scott Delap, recently caught up with Hadoop project lead Doug Cutting. Hadoop is an open source distributed computing platform that includes implementations of MapReduce and a distributed file system. In this special InfoQ interview, Cutting discusses how Hadoop is used at Yahoo, the challenges of its development, and the future direction of the project.
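To make the MapReduce model mentioned above a little more concrete, here is a minimal single-process Java sketch of the classic word-count example. It does not use Hadoop's API; the class and method names (`WordCountSketch`, `map`, `reduce`, `run`) are hypothetical and only illustrate the map, shuffle, and reduce phases that Hadoop runs in parallel across a cluster, reading input from and writing output to its distributed file system.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process sketch of the MapReduce model (NOT the Hadoop API):
// map emits (word, 1) pairs, a shuffle groups them by key, reduce sums each group.
public class WordCountSketch {

    // Map phase: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Reduce phase: combine all values emitted for a single key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    public static Map<String, Integer> run(List<String> lines) {
        // Shuffle: group intermediate values by key; a TreeMap keeps keys sorted,
        // mirroring the fact that Hadoop sorts keys before the reduce phase.
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                groups.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        // Apply reduce once per distinct key.
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> g : groups.entrySet()) {
            result.put(g.getKey(), reduce(g.getKey(), g.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog", "The fox");
        // Prints {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
        System.out.println(run(input));
    }
}
```

Because each `map` call touches only its own line and each `reduce` call touches only one key's values, both phases can be split across machines; only the shuffle needs to move data between them.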

News about Hadoop

Goat Rodeo: A Unified Data Model for Web Applications

Community: Architecture, Java | Topics: Announcements, Data Access, Web Frameworks

David Pollak, founder of the Lift web framework and author of "Beginning Scala", has announced a new initiative, "Goat Rodeo", that aims to bring data modeling into the 21st century.

Apache Mahout: Highly Scalable Machine Learning Algorithms

Community: Java | Topics: Cloud Computing

The Apache Mahout project, a set of highly scalable machine-learning libraries, recently announced its first public release. InfoQ spoke with Grant Ingersoll, co-founder of Mahout and a member of the technical staff at Lucid Imagination, to learn more about this project and machine learning in general.

Amazon Rolls Out Hadoop-Based MapReduce to EC2

Community: Java | Topics: Grid Computing, Cloud Computing

It has been possible to run Hadoop on EC2 for some time. Today Amazon simplified the process by announcing Amazon Elastic MapReduce, which automatically deploys EC2 instances for computational use and includes an API for interacting with them.

Cloudera Seeks to Make Hadoop More Accessible With Packaged Distribution

Community: Java | Topics: Grid Computing

Numerous projects have sprouted up in the last year around Hadoop, the popular open source implementation of MapReduce. Now Cloudera is releasing Cloudera Distribution for Hadoop, an open source product that seeks to make it easier for companies to begin using Hadoop.

Cascading - Data Processing API for Hadoop MapReduce

Community: Java | Topics: Cloud Computing

Cascading is a new API for data processing on Hadoop clusters that supports building complex processing workflows through an expressive, declarative interface.

HBase Leads Discuss Hadoop, BigTable and Distributed Databases

Community: Java | Topics: Data Access, Cloud Computing, Database Design

Google's recent introduction of Google App Engine has created renewed interest in alternative database technologies. InfoQ recently sat down with the leads of HBase, an open source, distributed data store modeled after Google's BigTable.

Hypertable Lead Discusses Hadoop and Distributed Databases

Community: Architecture, Java | Topics: Data Access, Cloud Computing, Clustering & Caching

Two open source projects related to Hadoop, HBase and Hypertable, provide BigTable-inspired scalable database implementations. InfoQ sat down with Doug Judd, Principal Search Architect at Zvents, Inc., and founder of the Hypertable project, to discuss its implementation.

Lucene 2.3: Large indexing performance improvements, new machine-learning project

Community: Java | Topics: Search, Open Source

The Apache Lucene project, a high-performance full-featured text search engine library written entirely in Java, released version 2.3 today. InfoQ spoke with committer and Project Management Committee (PMC) member Grant Ingersoll to learn more about this release and the future plans for Lucene.