MapReduce Content on InfoQ
-
Hazelcast Introduces MapReduce API
Hazelcast, an open source in-memory data grid solution, has introduced a MapReduce API for its offering.
-
Twitter Open-Sources its MapReduce Streaming Framework Summingbird
Twitter has open sourced their MapReduce streaming framework, called Summingbird. Available under the Apache 2 license, Summingbird is a large-scale data processing system enabling developers to uniformly execute code in either batch-mode (Hadoop/MapReduce-based) or stream-mode (Storm-based) or a combination thereof, called hybrid mode.
-
New Education Opportunities for Data Scientists
2013 has been rich in announcements for new programs, degrees and grants for aspiring data scientists and Big Data practitioners.
-
Hadoop Jobs on GPU with ParallelX
The MapReduce paradigm is not always ideal when dealing with large computationally intensive algorithms. A small team of entrepreneurs is building a product called ParallelX to solve that bottleneck by harnessing the power of GPUs to give Hadoop jobs a significant boost.
-
Elastic Mesos service automates Mesos cluster deployment in EC2
EC2 users can now automate the deployment of Apache Mesos, an open-source tool to share cluster resources between multiple data processing frameworks, at scale through a new web service called Elastic Mesos provided by Big Data startup Mesosphere.
-
Apache Tez - a Generalization of the MapReduce Data Processing
A new Apache incubator project, Tez, generalizes the MapReduce paradigm to execute a complex DAG (directed acyclic graph) of tasks.
-
QuantCell Research Announces First Public Beta of their Java-Aware Big-Data Spreadsheet
Big Data analytics startup QuantCell Research has announced the release of the first public beta of what they are positioning as their "Big Data" spreadsheet.
-
Trends in the latest Technology Radar
ThoughtWorks's latest "Technology Radar" focuses on mobile, accessible analytics, simple architectures, reproducible environments, and data persistence done right.
-
Windows Azure Storage New Pricing Structure Revealed
Microsoft recently revealed a new pricing structure for Windows Azure Storage along with several improvements.
-
LinkedIn Engineering Releases SenseiDB 1.0.0
LinkedIn engineering has released SenseiDB 1.0.0, a NoSQL database focused on high update rates and complex semi-structured search queries. LinkedIn already uses it in production in its search-related pages (e.g. People/Company search).
-
MapReduce Patterns, Algorithms, and Use Cases
In his new article “MapReduce Patterns, Algorithms, and Use Cases”, Ilya Katsov gives a systematic view of the different MapReduce patterns, algorithms and techniques that can be found on the web or in scientific articles along with several practical use case studies.
-
Apache Hadoop 1.0.0 Supports Kerberos Authentication, Apache HBase and RESTful API to HDFS
After six years of gestation, the Big Data framework Apache Hadoop 1.0.0 was recently released. Core features in the release include Kerberos authentication, support for Apache HBase, and a RESTful API to HDFS. InfoQ spoke with Arun Murthy, VP of Apache Hadoop, about the new release.
-
Blog Sentiment Analysis Using NoSQL Techniques
Corporations are increasingly using social media to learn more about what their customers are saying about their products. This presents unique challenges as unstructured content needs analytic techniques to interpret the sentiment embodied in the blog posts. InfoQ caught up with Subramanian Kartik to learn more about the blog sentiment analysis project his team worked on.
-
HPCC Systems Launches Big Data Delivery Engine on EC2
HPCC Systems, which is part of LexisNexis, is launching its Thor Data Refinery Cluster on Amazon EC2 this week. HPCC Systems is an enterprise-grade, open source Big Data analytics technology platform capable of ingesting vast amounts of data and transforming, linking, and indexing that data, with parallel processing power spread across the nodes.
-
Big Data: Evolution or Revolution?
Recently Steve Jones, from Cap Gemini, questioned whether NoSQL/Big Data is the panacea that some vendors would have us believe. He suggests that in some cases an in-memory RDBMS may well be the optimal solution and that approaches such as MapReduce could be too difficult for typical IT departments to understand. He concludes with the suggestion that sometimes Big Data may be a Big Con.