...As a data processing paradigm, MapReduce represents a giant step backwards. The database community has learned the following three lessons from the 40 years that have unfolded since IBM first released IMS in 1968....Given the experimental evaluations to date, we have serious doubts about how well MapReduce applications can scale. Moreover, the MapReduce implementers would do well to study the last 25 years of parallel DBMS research literature.
The article goes on to list criticisms such as:
- MapReduce is a poor implementation (in comparison to B-trees)
- MapReduce is not novel
- MapReduce is missing features (such as loading and indexing)
- MapReduce is incompatible with the DBMS tools
The blogosphere quickly called foul on the comparison and its reasoning. Greg Jorgensen provides a detailed rebuttal. Among the points he makes is that MapReduce is not a database but an algorithmic technique for distributed processing, and so should not be compared to one. Jorgensen proposes that a better comparison would have been to SimpleDB:
...What the authors really want to gripe about is distributed “cloud” data management systems like Amazon’s SimpleDB; in fact if you change “MapReduce” to “SimpleDB” the original article almost makes sense...
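To make concrete the point that MapReduce is a processing technique rather than a storage system, here is a minimal single-process sketch of the map, shuffle, and reduce steps applied to word counting. It is only an illustration: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real framework would distribute each phase across many machines and handle failures.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values emitted for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three steps in sequence over raw text; no schema, no index."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = word_count(["the quick brown fox", "the lazy dog", "The fox"])
```

Nothing here stores, indexes, or queries data; the sketch only transforms raw input into aggregated output, which is the distinction the rebuttals draw.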
Rich Skrenta comments on the angle of disruption:
...The thing that disrupts you is always uglier and worse in some way. Less features, less developed. But if there's a 10X price win in there somewhere, the cheap rickety thing wins in the end. Think Linux vs. AT&T Unix, or mysql vs. Oracle...
Lengthy debate and comment on the topic can also be found on reddit and ycombinator.
Community comments
Sad...
by Nikita Ivanov,
I think the authors either do not understand Map/Reduce (doubtful) or are clearly misjudging where it is used and for which scenarios. What I think the authors are also missing is the combination of data partitioning and affinity map/reduce that is becoming the prevailing design pattern for grid applications. I blogged about it here in more detail.
All in all, it is sad to read such a misguided piece…
Best,
Nikita Ivanov.
GridGain – Grid Computing Made Simple
more research needed.
by Zubin Wadia,
MapReduce is not a DB. I don't see the parallel here to Teradata or others of its ilk.
Swapping it with SimpleDB or BigTable is a more logical perspective.
Also, if they were referring to BigTable, then in fact it does support indexes and doesn't do brute-force searches.
Stooopid
by Kevin Teague,
The authors would have done well to read the introduction to the MapReduce paper they cited:
MapReduce is for doing computation on raw data. In Google's case this data is usually crawled from the web. Google likely stores some of the data they glean from raw data they process using MapReduce in a ... DBMS. *sigh*
Holy crap...
by Kurt Christensen,
Yeah, I definitely think that the relational database goo-roos who annually inflict billions of dollars in monetary damages on unsuspecting IT departments are in a grand position to tell Google how to do search. Perhaps David and Michael would also like to offer me parenting advice...?
Re: Holy crap...
by Kurt Christensen,
Not that I parent as well as Google does search. Oh, you know what I meant...