Apache Spark Content on InfoQ

News

Cloud

Google Releases Cloud Dataproc for Kubernetes in Alpha

Google Cloud Dataproc is an open-source data and analytic processing service based on Hadoop and Spark. Google has recently announced the alpha availability of Cloud Dataproc for Kubernetes, which provides customers with a more efficient method to process data across platforms.

Steef-Jan Wiggers
on Sep 23, 2019
AI, ML & Data Engineering

ApacheCon 2019 Keynote: Google Cloud Enhances Big-Data Processing with Kubernetes

At ApacheCon North America, Christopher Crosbie gave a keynote talk titled "Yet Another Resource Negotiator for Big Data? How Google Cloud is Enhancing Data Lake Processing with Kubernetes." He highlighted Google's efforts to make Apache big-data software "cloud native" by developing open-source Kubernetes Operators that provide control planes for running Apache software in a Kubernetes cluster.

Anthony Alford
on Sep 13, 2019
Architecture & Design

Data Engineering in Badoo: Handling 20 Billion Events Per Day

Badoo is a dating social network that currently handles billions of events per day, explains Vladimir Kazanov, data platform engineering lead. At Skills Matter, he talked through some of the challenges of operating at this scale and the tooling Badoo uses to process and report on this data.

Andrew Morgan
on Aug 09, 2019
AI, ML & Data Engineering

Expo: Real Time A/B Testing and Monitoring with Spark Streaming and Kafka at Walmart Labs

The WalmartLabs engineering team developed a real-time A/B testing tool called Expo that collects and analyzes user engagement metrics. It uses Spark Structured Streaming to process the incoming data and stores the metrics in KairosDB.

Hrishikesh Barua
on May 24, 2019
AI, ML & Data Engineering

Databricks Open Sources Delta Lake to Make Data Lakes More Reliable

Databricks recently announced that it is open sourcing Delta Lake, its proprietary storage layer, to bring ACID transactions to Apache Spark and big data workloads. Databricks is the company founded by the creators of Apache Spark, and Delta Lake is already in use at several companies, including McAfee and Upwork. Delta Lake addresses the heterogeneous data problem that data lakes often have...

Alex Giamas
on May 20, 2019
AI, ML & Data Engineering

Microsoft Releases High-Performance C# and F# Support for Apache Spark

Microsoft announced the release of .NET for Apache Spark, adding new high-performance C# and F# bindings to the big-data computation engine.

Anthony Alford
on Apr 30, 2019
AI, ML & Data Engineering

The Evolution of Uber’s 100+ Petabyte Big Data Platform

Uber’s engineering team wrote about how their big data platform evolved from traditional ETL jobs with relational databases to one based on Hadoop and Spark. A scalable ingestion model, a standard transfer format and a custom library for incremental updates are the key components of the platform.

Hrishikesh Barua
on Nov 10, 2018
DevOps

DevOps Workbench Launched by ZeroStack

Private cloud provider ZeroStack has announced a self-service capability that lets developers create their own workbenches. Forty developer tools from a mix of open source and commercial providers are available to users of the DevOps Workbench through ZeroStack’s Intelligent Cloud Platform.

Helen Beal
on Jan 12, 2018
AI, ML & Data Engineering

Modern Big Data Pipelines over Kubernetes

Container management technologies like Kubernetes make it possible to implement modern big data pipelines. Eliran Bivas, senior big data architect at Iguazio, spoke at the recent KubeCon + CloudNativeCon North America 2017 Conference about big data pipelines and how Kubernetes can help develop them.

Srini Penchikala
on Jan 08, 2018
Cloud

Microsoft Updates AI Services and Tools for Data Scientists and Developers

At the recent Ignite conference, Microsoft released several updates related to its Artificial Intelligence (AI) services and tools. These updates include the release of the Azure ML Experimentation service, Azure ML Model Management service, Azure ML Workbench and the general availability of Microsoft Cognitive Services.

Kent Weare
on Sep 30, 2017
Java

Emerging Technologies for the Enterprise Conference 2017: Day Two Recap

Day Two of the 12th annual Emerging Technologies for the Enterprise Conference was held in Philadelphia. This two-day event included keynotes by Blair MacIntyre (augmented reality pioneer) and Scott Hanselman (podcaster), and featured speakers Kyle Daigle (engineering manager at GitHub), Holden Karau (principal software engineer at IBM), and Karen Kinnear (JVM technical lead at Oracle).

Michael Redlich
on Apr 30, 2017
Java

Emerging Technologies for the Enterprise Conference 2017: Day One Recap

Day One of the 12th annual Emerging Technologies for the Enterprise Conference was held on Tuesday, April 18 in Philadelphia, PA. This two-day event included keynotes by Blair MacIntyre (augmented reality pioneer) and Scott Hanselman (podcaster), and featured speakers Monica Beckwith (JVM consultant at Oracle), Yehuda Katz (co-creator of Ember.js), and Jessica Kerr (lead engineer at Atomist).

Michael Redlich
on Apr 24, 2017
Java

Lightbend Speaks to InfoQ on Their Acquisition of OpsClarity

Nine months after acquiring BoldRadius, Lightbend announced its acquisition of OpsClarity, a company specializing in monitoring reactive applications. InfoQ interviewed Mark Brewer, president and CEO at Lightbend, and Alan Ngai, co-founder of OpsClarity and now VP of cloud services at Lightbend, to learn more about this new partnership.

Michael Redlich
on Feb 24, 2017
AI, ML & Data Engineering

Apache Eagle, Originally from eBay, Graduates to Top-Level Project

Apache Eagle, an open-source solution for identifying security and performance issues on big data platforms, graduated to an Apache top-level project on January 10, 2017. First open-sourced by eBay in October 2015, Eagle was created to instantly detect access to sensitive data or malicious activity and to take action in a timely fashion.

Alexandre Rodrigues
on Jan 24, 2017
AI, ML & Data Engineering

Facebook's Comparison of Apache Giraph and Spark GraphX for Graph Data Processing

A Facebook team has recently published a comparison of the performance of their existing Giraph-based graph processing system with the newer GraphX, which is part of the popular Spark framework. Their conclusion is that GraphX is neither sufficiently scalable nor performant enough to support their graph processing workloads.

Srini Penchikala
on Dec 09, 2016
