Hadoop Content on InfoQ

News

Cloudera Enterprise Released: Interview with Charles Zedlewski

Cloudera recently announced Cloudera Enterprise, a commercial bundling of Hadoop and a dozen other supporting open source projects. InfoQ interviewed Product Manager Charles Zedlewski for more detail about what this means for conventional enterprises and the future face of Hadoop.

Tim Cull
on Aug 24, 2010
LinkedIn's Data Infrastructure

Jay Kreps of LinkedIn presented details of how the company processes data at the recent Hadoop Summit. Kreps described how LinkedIn crunches 120 billion relationships per day and blends large-scale data computation with high-volume, low-latency site serving.

Ron Bodkin
on Aug 04, 2010
Facebook on Hadoop, Hive, HBase, and A/B Testing

The Hadoop Summit of 2010 included presentations from a number of large-scale users of Hadoop and related technologies. Notably, Facebook presented detailed information about its use of Hive for analytics, and Mike Schroepfer, Facebook's VP of Engineering, delivered a keynote describing the scale of the company's data processing with Hadoop.

Ron Bodkin
on Jul 14, 2010
Amazon Elastic MapReduce Updates from Hadoop Summit 2010

The Hadoop Summit of 2010 included a keynote from Peter Sirota, General Manager of Amazon Elastic MapReduce (EMR), which is a hosted Hadoop offering from Amazon that includes web-based management tools.

Ron Bodkin
on Jul 13, 2010
Yahoo! Updates from Hadoop Summit 2010

The Hadoop Summit of 2010 started off with a vuvuzela blast from Blake Irving, Chief Product Officer for Yahoo. Yahoo delivered keynote addresses that outlined the scale of their use, technical directions for their contributions, and architectural patterns in how they apply the technology.

Ron Bodkin
on Jul 12, 2010
Mahout 0.3: Open Source Machine Learning

The need for machine-learning techniques like clustering, collaborative filtering, and categorization has steadily increased over the last decade, along with the number of solutions needing quick and efficient algorithms to transform vast amounts of raw data into relevant information. Apache Mahout 0.3 was announced in March, adding more functionality, stability and performance.

Gilad Manor
on Apr 19, 2010
Clojure Roundup: Distribution with Crane, Mathematics with Incanter, Builds with Leiningen 1.0

FlightCaster recently open sourced Crane, a tool for distributing and remotely controlling Clojure instances, currently specialized for EC2. Incanter is a Clojure library and tool that makes R-like statistical computations easy with Clojure. Also: the build and dependency management tool Leiningen 1.0 is now available.

Werner Schuster
on Dec 13, 2009
Goat Rodeo: A Unified Data Model for Web Applications

David Pollak, founder of the Lift web framework and "Beginning Scala" author, has announced a new initiative, "Goat Rodeo", that aims to bring data modeling into the 21st century.

Gavin Terrill
on Jun 22, 2009
Apache Mahout: Highly Scalable Machine Learning Algorithms

The Apache Mahout project, a set of highly scalable machine-learning libraries, recently announced its first public release. InfoQ spoke with Grant Ingersoll, co-founder of Mahout and a member of the technical staff at Lucid Imagination, to learn more about this project and machine learning in general.

Ryan Slobojan
on Apr 23, 2009
Amazon Rolls Out Hadoop Based MapReduce to EC2

It has been possible to run Hadoop on EC2 for a while. Today Amazon simplified the process by announcing Amazon Elastic MapReduce, which automatically deploys EC2 instances for computational use and includes an API for interacting with them.

Scott Delap
on Apr 02, 2009
Cloudera Seeks to Make Hadoop More Accessible With Packaged Distribution

Numerous projects have sprouted up around Hadoop, the popular open source implementation of MapReduce, in the last year. Now Cloudera is releasing Cloudera Distribution for Hadoop, an open source product seeking to make it easier for companies to begin using Hadoop.

Scott Delap
on Mar 16, 2009
Cascading - Data Processing API for Hadoop MapReduce

Cascading is a new API for data processing on Hadoop clusters that supports building complex processing workflows in an expressive, declarative style.

R.J. Lorimer
on Oct 10, 2008
HBase Leads Discuss Hadoop, BigTable and Distributed Databases

Google's recent introduction of Google App Engine has created renewed interest in alternative database technologies. InfoQ recently sat down with the leads of HBase, an open-source, distributed data store modeled after Google's BigTable.

Scott Delap
on Apr 28, 2008
Hypertable Lead Discusses Hadoop and Distributed Databases

Two open source projects related to Hadoop, HBase and Hypertable, provide BigTable-inspired scalable database implementations. InfoQ sat down with Doug Judd, Principal Search Architect at Zvents, Inc. and Hypertable project founder, to discuss its implementation.

Scott Delap
on Apr 03, 2008
Lucene 2.3: Large indexing performance improvements, new machine-learning project

The Apache Lucene project, a high-performance full-featured text search engine library written entirely in Java, released version 2.3 today. InfoQ spoke with committer and Project Management Committee (PMC) member Grant Ingersoll to learn more about this release and the future plans for Lucene.

Ryan Slobojan
on Jan 24, 2008