Hadoop Content on InfoQ
-
Clojure Roundup: Distribution with Crane, Mathematics with Incanter, Builds with Leiningen 1.0
FlightCaster recently open sourced Crane, a tool for distributing and remotely controlling Clojure instances, currently specialized for EC2. Incanter is a Clojure library and tool that makes R-like statistical computations easy with Clojure. Also: the build and dependency management tool Leiningen 1.0 is now available.
-
Goat Rodeo: A Unified Data Model for Web Applications
David Pollak, founder of the Lift web framework and author of "Beginning Scala", has announced a new initiative, "Goat Rodeo", that aims to bring data modeling into the 21st century.
-
Apache Mahout: Highly Scalable Machine Learning Algorithms
The Apache Mahout project, a set of highly scalable machine-learning libraries, recently announced its first public release. InfoQ spoke with Grant Ingersoll, co-founder of Mahout and a member of the technical staff at Lucid Imagination, to learn more about this project and machine learning in general.
-
Amazon Rolls Out Hadoop Based MapReduce to EC2
It has been possible to run Hadoop on EC2 for a while. Today Amazon simplified the process by announcing Amazon Elastic MapReduce, which automatically deploys EC2 instances for computational use and includes an API for interacting with them.
-
Cloudera Seeks to Make Hadoop More Accessible With Packaged Distribution
Numerous projects have sprouted up around Hadoop, the popular open source implementation of MapReduce, in the last year. Now Cloudera is releasing Cloudera Distribution for Hadoop, an open source product seeking to make it easier for companies to begin using Hadoop.
-
Cascading - Data Processing API for Hadoop MapReduce
Cascading is a new data processing API for Hadoop clusters that supports building complex processing workflows in an expressive, declarative style.
-
HBase Leads Discuss Hadoop, BigTable and Distributed Databases
Google's recent introduction of Google App Engine has created renewed interest in alternative database technologies. InfoQ recently sat down with the leads of HBase, an open source, distributed data store modeled after Google's BigTable.
-
Hypertable Lead Discusses Hadoop and Distributed Databases
Two open source projects related to Hadoop, HBase and Hypertable, provide BigTable-inspired scalable database implementations. InfoQ sat down with Doug Judd, Principal Search Architect at Zvents, Inc. and Hypertable project founder, to discuss its implementation.
-
Lucene 2.3: Large indexing performance improvements, new machine-learning project
The Apache Lucene project, a high-performance full-featured text search engine library written entirely in Java, released version 2.3 today. InfoQ spoke with committer and Project Management Committee (PMC) member Grant Ingersoll to learn more about this release and the future plans for Lucene.
-
MapReduce A Step Backwards: Is Comparison to Relational Databases Fair?
A recent article on the Database Column by David J. DeWitt and Michael Stonebraker attempts to compare the increasingly popular MapReduce programming paradigm to a relational database. The blogosphere has quickly called foul on the comparison and its reasoning.
-
Interview: Yahoo's Doug Cutting on MapReduce and the Future of Hadoop
In this special InfoQ interview, Hadoop project lead Doug Cutting discusses MapReduce, the benefits of open source, and the future direction of the project.
-
Open Source Google-Like Infrastructure Project Hadoop Gains Momentum
While it has been in existence for over a year, the open source Google-like infrastructure project Hadoop is just now receiving wider notice from the development community. Recently Yahoo's Jeremy Zawodny provided a status update showing benchmark performance improving by 20x in the last year.
-
MapReduce Gaining Traction: Tools Plugin Released for Eclipse and Amazon EC2 Support
IBM's Alphaworks website has released an Eclipse plugin to simplify the development of applications using Hadoop, the open source Java MapReduce framework. Work has also been done to make it easy to run Hadoop applications on Amazon's EC2 and S3 platforms for processing and storage.
-
Run Your Own Google Style Computing Cluster with Hadoop and Amazon EC2
Amazon's Elastic Compute Cloud (EC2) allows developers to acquire computing power at the rate of $0.10 per hour consumed. Work has been done to allow Hadoop, an open source MapReduce implementation written in Java, to run on EC2. This combination will allow developers to write scalable algorithms and then bring up large numbers of servers to use as computing power for them as needed.