MapReduce Content on InfoQ

News

AI, ML & Data Engineering

Uber Open-Sourced Its Highly Scalable and Reliable Shuffle as a Service for Apache Spark

Uber Engineering has recently open-sourced its highly scalable and reliable shuffle as a service for Apache Spark. Spark is one of the most important tools and platforms in data engineering and analytics. By default, Spark shuffles data on local machines, which causes challenges at very large scale. Shuffle as a service is a solution developed at Uber for this problem.

Reza Rahimi
on Aug 14, 2022
Cloud

Google Introduces Cloud Storage Connector for Hadoop Big Data Workloads

In a recent blog post, Google announced a new Cloud Storage connector for Hadoop. This new capability allows organizations to substitute their traditional HDFS with Google Cloud Storage. Columnar file formats such as Parquet and ORC may realize increased throughput, and customers will benefit from Cloud Storage directory isolation, lower latency, increased parallelization and intelligent defaults.

Kent Weare
on Sep 09, 2019
Cloud

Cloudera and Hortonworks Merge with Goal to Increase Competition with Cloud Offerings

Earlier this month, Cloudera and Hortonworks announced an all-stock merger at a combined value of around $5.2 billion. Analysts have argued that this merger is aimed at increased competition that both companies are facing from cloud vendors like Amazon, Google and Microsoft. In this article we log reactions from analysts and the industry, and the implications for current customers.

Alex Giamas
on Oct 31, 2018
Glenn Tamkin on Applying Apache Hadoop to NASA's Big Climate Data

The NASA Center for Climate Simulation (NCCS) is using Apache Hadoop for high-performance data analytics. Glenn Tamkin, from the NASA team, recently spoke at ApacheCon and shared the details of the platform they built for climate data analysis with Hadoop.

Srini Penchikala
on May 06, 2015
Google Open Sources MapReduce Framework for C to Run Native Code in Hadoop

Google announced last week the release of an open source MapReduce framework for C, called MR4C, that allows developers to run native code in the Hadoop framework. The MR4C framework brings together the performance and flexibility of natively developed algorithms with the scalability and throughput provided by the Hadoop execution framework.

Srini Penchikala
on Feb 25, 2015
Apache Drill Included in MapR Latest Distribution Release

MapR recently announced the inclusion of Apache Drill in the latest release of its distribution. Apache Drill is the open source version of Google’s Dremel, the infrastructure on which BigQuery is based. Drill offers a low-latency SQL-on-Hadoop interface. While this puts it in the same space as several other technologies around Hadoop, Drill has some unique characteristics setting it

Alex Giamas
on Sep 30, 2014
Hazelcast Introduces MapReduce API

Hazelcast, an open source in-memory data grid solution, introduces a MapReduce API for its offering.

Michael Hausenblas
on Feb 18, 2014
AI, ML & Data Engineering

Twitter Open-Sources its MapReduce Streaming Framework Summingbird

Twitter has open sourced their MapReduce streaming framework, called Summingbird. Available under the Apache 2 license, Summingbird is a large-scale data processing system enabling developers to uniformly execute code in either batch-mode (Hadoop/MapReduce-based) or stream-mode (Storm-based) or a combination thereof, called hybrid mode.

Michael Hausenblas
on Jan 16, 2014
New Education Opportunities for Data Scientists

2013 has been rich in announcements for new programs, degrees and grants for aspiring data scientists and Big Data practitioners.

Charles Menguy
on Jan 14, 2014
Hadoop Jobs on GPU with ParallelX

The MapReduce paradigm is not always ideal when dealing with large computationally intensive algorithms. A small team of entrepreneurs is building a product called ParallelX to solve that bottleneck by harnessing the power of GPUs to give Hadoop jobs a significant boost.

Charles Menguy
on Dec 26, 2013
Apache Tez - a Generalization of the MapReduce Data Processing

A new Apache incubator project, Tez, generalizes the MapReduce paradigm to execute a complex DAG (directed acyclic graph) of tasks.

Boris Lublinsky
on Sep 20, 2013
QuantCell Research Announces First Public Beta of their Java-Aware Big-Data Spreadsheet

Big Data analytics startup QuantCell Research has announced the release of the first public beta of what they are positioning as their "Big Data" spreadsheet.

Victor Grazi
on Aug 21, 2013
Trends in the latest Technology Radar

ThoughtWorks's latest "Technology Radar" focuses on mobile, accessible analytics, simple architectures, reproducible environments, and data persistence done right.

Aslan Brooke
on Jan 18, 2013
Windows Azure Storage New Pricing Structure Revealed

Microsoft recently revealed a new pricing structure for Windows Azure Storage along with several improvements.

Anand Narayanaswamy
on Dec 11, 2012
LinkedIn Engineering Releases SenseiDB 1.0.0

LinkedIn engineering releases SenseiDB 1.0.0, a NoSQL database focused on high update rates and complex semi-structured search queries, already used in production by LinkedIn in its search-related pages (e.g. People/Company search).

Kostis Kapelonis
on Mar 19, 2012
