Hadoop Content on InfoQ

News


Architecture & Design

LinkedIn Migrates away from Lambda Architecture to Reduce Complexity

Software engineers from LinkedIn recently published how they migrated away from a Lambda architecture. The Lambda architecture implementation caused their solution to have high operational overhead and added complexity, leading to slow product iteration times. As a result, the engineers chose to migrate to a Lambda-less architecture, resulting in significant development velocity improvements.

Eran Stiller
on Dec 08, 2020
Cloud

Google Releases Cloud Dataproc for Kubernetes in Alpha

Google Cloud Dataproc is an open-source data and analytic processing service based on Hadoop and Spark. Google has recently announced the alpha availability of Cloud Dataproc for Kubernetes, which provides customers with a more efficient method to process data across platforms.

Steef-Jan Wiggers
on Sep 23, 2019
AI, ML & Data Engineering

ApacheCon 2019 Keynote: Google Cloud Enhances Big-Data Processing with Kubernetes

At ApacheCon North America, Christopher Crosbie gave a keynote talk titled "Yet Another Resource Negotiator for Big Data? How Google Cloud is Enhancing Data Lake Processing with Kubernetes." He highlighted Google's efforts to make Apache big-data software "cloud native" by developing open-source Kubernetes Operators that provide control planes for running Apache software in a Kubernetes cluster.

Anthony Alford
on Sep 13, 2019
Cloud

Google Introduces Cloud Storage Connector for Hadoop Big Data Workloads

In a recent blog post, Google announced a new Cloud Storage connector for Hadoop. This new capability allows organizations to substitute their traditional HDFS with Google Cloud Storage. Columnar file formats such as Parquet and ORC may realize increased throughput, and customers will benefit from Cloud Storage directory isolation, lower latency, increased parallelization and intelligent defaults.

Kent Weare
on Sep 09, 2019
Architecture & Design

Data Engineering in Badoo: Handling 20 Billion Events Per Day

Badoo is a dating social network that currently handles billions of events per day, explains Vladimir Kazanov, data platform engineering lead. At Skills Matter, he talked through some of the challenges of operating at this scale, and what tooling Badoo uses in order to process and report on this data.

Andrew Morgan
on Aug 09, 2019
Cloud

Microsoft Announces New Azure Analytics Services ADLS, ADX and More

Microsoft has announced the general availability of two new Azure analytics services - Azure Data Lake Storage Gen2 (ADLS) and Azure Data Explorer (ADX). Furthermore, Microsoft also announced the preview of Azure Data Factory Mapping Data Flow.

Steef-Jan Wiggers
on Feb 17, 2019
AI, ML & Data Engineering

The Evolution of Uber’s 100+ Petabyte Big Data Platform

Uber’s engineering team wrote about how their big data platform evolved from traditional ETL jobs with relational databases to one based on Hadoop and Spark. A scalable ingestion model, standard transfer format and a custom library for incremental updates are the key components of the platform.

Hrishikesh Barua
on Nov 10, 2018
Cloud

Cloudera and Hortonworks Merge with Goal to Increase Competition with Cloud Offerings

Earlier this month, Cloudera and Hortonworks announced an all-stock merger at a combined value of around $5.2 billion. Analysts have argued that this merger is aimed at addressing the increased competition that both companies face from cloud vendors like Amazon, Google and Microsoft. In this article we log reactions from analysts and the industry, and the implications for current customers.

Alex Giamas
on Oct 31, 2018
AI, ML & Data Engineering

Q&A with Microsoft's Arindam Chatterjee Discussing Azure HDInsight 4.0

InfoQ caught up with Arindam Chatterjee, principal group manager at Microsoft, regarding the announcements about HDInsight at Microsoft Ignite.

Rags Srinivas
on Oct 23, 2018
AI, ML & Data Engineering

Q&A with Saumitra Buragohain on Hortonworks Data Platform 3.0

InfoQ caught up with Saumitra Buragohain, senior director of Product Management at Hortonworks, regarding Hadoop in general and HDP 3.0 in particular.

Rags Srinivas
on Jul 19, 2018
AI, ML & Data Engineering

Dataiku's Latest Release Integrates Deep-Learning for Computer Vision

Collaborative data science platform Dataiku's latest release of its Data Science Studio includes pre-trained deep learning models for image processing. The DSS platform implements each step of a data-science project from data-sourcing and visualization to production deployment. Its machine-learning module supports standard libraries and it integrates with Hadoop and multiple Spark engines.

Alexis Perrier
on Apr 11, 2018
DevOps

DevOps Workbench Launched by ZeroStack

Private cloud provider ZeroStack has announced a self-service capability from which developers can create their own workbenches. Forty developer tools from a mix of open source and commercial providers are available to users of the DevOps Workbench through ZeroStack’s Intelligent Cloud Platform.

Helen Beal
on Jan 12, 2018
AI, ML & Data Engineering

Apache HBase 1.3 Ships with Multiple Performance Improvements

Apache HBase 1.3.0 was released mid-January 2017 and ships with support for date-based tiered compaction, improvements to the write-ahead log (WAL), and a new RPC scheduler, among other changes. The release includes almost 1,700 resolved issues in total.

Alexandre Rodrigues
on Jan 30, 2017
AI, ML & Data Engineering

Julien Le Dem on the Future of Column-Oriented Data Processing with Apache Arrow

Julien Le Dem, the PMC chair of the Apache Arrow project, presented at Data Eng Conf NY on the future of column-oriented data processing. Apache Arrow is an open-source standard for columnar in-memory execution. InfoQ interviewed Le Dem to find out the differences between Arrow and Parquet.

Alexandre Rodrigues
on Dec 08, 2016
AI, ML & Data Engineering

Combine SQL Server with Hadoop Using PolyBase

With the recently released SQL Server 2016, you can now use SQL queries against Hadoop and Azure blob storage. Not only do you no longer need to write map/reduce operations, you can also join relational and non-relational data with a single query.

Jonathan Allen
on Jun 02, 2016
