Big Data: Evolution or Revolution?

by Mark Little on Nov 13, 2011

Recently Steve Jones, from Cap Gemini, questioned whether NoSQL/Big Data is the panacea that some vendors would have us believe. He suggests that in some cases an in-memory RDBMS may well be the optimal solution and that approaches such as MapReduce could be too difficult for typical IT departments to understand. He concludes with a suggestion that sometimes Big Data may be a Big Con.

Hortonworks Announces Hadoop Data Platform

by Abel Avram on  Nov 01, 2011

Hortonworks, a company created in June 2011 by Yahoo! and Benchmark Capital, has announced the Technical Preview Program for its Data Platform based on Hadoop. The company employs many of the core Hadoop contributors and intends to provide support and training.

AWS Targets Scientific Community with New Resources for High Performance Computing

by Richard Seroter on  Sep 29, 2011

The Amazon Web Services (AWS) team announced a set of resources targeting the high performance computing needs of the scientific community. AWS specifically highlights its “spot pricing” market as a way to do cost-effective, massive-scale computing in the Amazon cloud environment.

MapR Releases Commercial Distributions based on Hadoop

by Ron Bodkin on Jul 07, 2011

MapR Technologies released a big data toolkit based on Apache Hadoop, with its own distributed storage alternative to HDFS. The software is commercial, with both a free edition, M3, and a paid edition, M5. M5 includes snapshots and mirroring for data, Job Tracker recovery, and commercial support. MapR's M5 edition will form the basis of EMC Greenplum's upcoming HD Enterprise Edition.

Yahoo Hadoop Spinout Hortonworks Announces Plans

by Ron Bodkin on  Jun 29, 2011

Yahoo spun out its core Hadoop team, forming a new company, Hortonworks. CEO Eric Baldeschwieler presented the company's vision of easing adoption of Hadoop and making core engineering improvements for availability, performance, and manageability. Hortonworks will sell support, training, and certification, primarily indirectly through partners.

Scale-up or Scale-out? Or both?

by Boris Lublinsky on Mar 09, 2011

A prevalent trend in IT over the last twenty years has been scaling out rather than scaling up. Recent technological advances, however, offer a new option: scaling out scaled-up servers based on GPUs.

Hadoop Redesign for Upgrades and Other Programming Paradigms

by Ron Bodkin on  Feb 18, 2011

Yahoo recently announced and presented a redesign of the core MapReduce architecture for Hadoop to allow for easier upgrades, larger clusters, and faster recovery, and to support programming paradigms in addition to MapReduce. The new design is quite similar to the open source Mesos cluster management project; both Yahoo and Mesos commented on the differences and opportunities.

Scalable System Design Patterns

by Jean-Jacques Dubray on Dec 01, 2010

Ricky Ho revisited his three-year-old post on scalable system design patterns and realized that a lot had changed since then.

Percolator: a System for Incrementally Processing Updates to a Large Data Set

by Jean-Jacques Dubray on  Oct 05, 2010

Google's Daniel Peng and Frank Dabek published a paper, “Large-scale Incremental Processing Using Distributed Transactions and Notifications”, explaining that databases do not meet the storage or throughput requirements of Google's indexing system, which stores tens of petabytes of data and processes billions of updates per day on thousands of machines.

Cloudant releases Java based view server for CouchDB

by Michael Hunger on  Sep 08, 2010

Cloudant, the company behind CouchDB, just released a Java View Server for CouchDB. That means that Map-Reduce jobs can now be written not only in Erlang and interpreted languages like JavaScript or Python, but also in JVM-based languages.

LinkedIn's Data Infrastructure

by Ron Bodkin on Aug 04, 2010

Jay Kreps of LinkedIn presented some informative details of how they process data at the recent Hadoop Summit. Kreps described how LinkedIn crunches 120 billion relationships per day and blends large scale data computation with high volume, low latency site serving.

Yahoo! Updates from Hadoop Summit 2010

by Ron Bodkin on  Jul 12, 2010

The Hadoop Summit of 2010 started off with a vuvuzela blast from Blake Irving, Chief Product Officer for Yahoo. Yahoo delivered keynote addresses that outlined the scale of their use, technical directions for their contributions, and architectural patterns in how they apply the technology.

Adobe Released Puppet Recipes for Hadoop

by Michael Prokop on  Jul 01, 2010

Adobe recently released to the community the Puppet recipes that it uses to automate Hadoop/HBase deployments. InfoQ spoke with Luke Kanies, founder of PuppetLabs, to learn more about what this means.

Apache Mahout: Highly Scalable Machine Learning Algorithms

by Ryan Slobojan on  Apr 23, 2009

The Apache Mahout project, a set of highly scalable machine-learning libraries, recently announced its first public release. InfoQ spoke with Grant Ingersoll, co-founder of Mahout and a member of the technical staff at Lucid Imagination, to learn more about this project and machine learning in general.

Amazon Rolls Out Hadoop Based MapReduce to EC2

by Scott Delap on  Apr 02, 2009

It has been possible to run Hadoop on EC2 for a while. Today Amazon simplified the process by announcing Amazon Elastic MapReduce, which automatically deploys EC2 instances for computational use and includes an API for interacting with them.

InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.