MapReduce Content on InfoQ
-
Uber Open-Sourced Its Highly Scalable and Reliable Shuffle as a Service for Apache Spark
Uber Engineering has recently open-sourced its highly scalable and reliable shuffle as a service for Apache Spark. Spark is one of the most important tools and platforms in data engineering and analytics. By default, it shuffles data on local machines, which causes problems at very large scale. Shuffle as a service is the solution Uber developed for this problem.
-
Google Introduces Cloud Storage Connector for Hadoop Big Data Workloads
In a recent blog post, Google announced a new Cloud Storage connector for Hadoop. This new capability allows organizations to substitute Google Cloud Storage for their traditional HDFS. Columnar file formats such as Parquet and ORC may realize increased throughput, and customers will benefit from Cloud Storage directory isolation, lower latency, increased parallelization and intelligent defaults.
-
Cloudera and Hortonworks Merge with Goal to Increase Competition with Cloud Offerings
Earlier this month, Cloudera and Hortonworks announced an all-stock merger at a combined value of around $5.2 billion. Analysts have argued that the merger is aimed at the increased competition both companies face from cloud vendors such as Amazon, Google and Microsoft. In this article we log reactions from analysts and the industry, and the implications for current customers.
-
Glenn Tamkin on Applying Apache Hadoop to NASA's Big Climate Data
The NASA Center for Climate Simulation (NCCS) is using Apache Hadoop for high-performance data analytics. Glenn Tamkin of the NASA team recently spoke at ApacheCon and shared details of the platform they built for climate data analysis with Hadoop.
-
Google Open Sources MapReduce Framework for C to Run Native Code in Hadoop
Last week Google announced the release of an open source MapReduce framework for C, called MR4C, that allows developers to run native code in the Hadoop framework. MR4C brings together the performance and flexibility of natively developed algorithms with the scalability and throughput provided by the Hadoop execution framework.
-
Apache Drill Included in MapR Latest Distribution Release
MapR recently announced that it is including Apache Drill in the latest release of its distribution. Apache Drill is the open source version of Google’s Dremel, the infrastructure on which BigQuery is based. Drill offers a low-latency SQL-on-Hadoop interface. While this puts it in the same space as several other technologies around Hadoop, Drill has some unique characteristics setting it
-
Hazelcast Introduces MapReduce API
Hazelcast, an open source in-memory data grid solution, introduces a MapReduce API for its offering.
-
Twitter Open-Sources its MapReduce Streaming Framework Summingbird
Twitter has open-sourced its MapReduce streaming framework, called Summingbird. Available under the Apache 2 license, Summingbird is a large-scale data processing system that lets developers uniformly execute code in batch mode (Hadoop/MapReduce-based), in stream mode (Storm-based), or in a combination of the two, called hybrid mode.
-
New Education Opportunities for Data Scientists
2013 has been rich in announcements of new programs, degrees and grants for aspiring data scientists and Big Data practitioners.
-
Hadoop Jobs on GPU with ParallelX
The MapReduce paradigm is not always ideal when dealing with large computationally intensive algorithms. A small team of entrepreneurs is building a product called ParallelX to solve that bottleneck by harnessing the power of GPUs to give Hadoop jobs a significant boost.
-
Apache Tez - a Generalization of the MapReduce Data Processing
A new Apache incubator project, Tez, generalizes the MapReduce paradigm to execute a complex DAG (directed acyclic graph) of tasks.
-
QuantCell Research Announces First Public Beta of their Java-Aware Big-Data Spreadsheet
Big Data analytics startup QuantCell Research has announced the release of the first public beta of what they are positioning as their "Big Data" spreadsheet.
-
Trends in the latest Technology Radar
ThoughtWorks's latest "Technology Radar" focuses on mobile, accessible analytics, simple architectures, reproducible environments, and data persistence done right.
-
Windows Azure Storage New Pricing Structure Revealed
Microsoft recently revealed a new pricing structure for Windows Azure Storage, along with several improvements.
-
LinkedIn Engineering Releases SenseiDB 1.0.0
LinkedIn Engineering releases SenseiDB 1.0.0, a NoSQL database focused on high update rates and complex semi-structured search queries, already used in production by LinkedIn in its search-related pages (e.g. People/Company search).