InfoQ Homepage Apache Spark Content on InfoQ

News

RSS Feed

Newer Older

Architecture & Design

Designing for Failure in the BBC's Analytics Platform

Last week at InfoQ Live, Blanca Garcia-Gil, principal systems engineer at BBC, gave a session on Evolving Analytics in the Data Platform. During this session, Garcia-Gil focused on how her team prepared and designed for two types of failure - "known unknowns" and "unknown unknowns."

Eran Stiller
on Feb 24, 2021
Cloud

Google Brings Databricks to Its Cloud Platform

Recently Google announced a partnership with Databricks to bring their fully-managed Apache Spark offering and data lake capabilities to Google Cloud. The offering will become available as Databricks on Google Cloud.

Steef-Jan Wiggers
on Feb 23, 2021
.NET

Microsoft Releases .NET for Apache Spark 1.0

Last month, Microsoft released the first major version of .NET for Apache Spark, an open-source package that brings .NET development to the Apache Spark platform. The new release allows .NET developers to write Apache Spark applications using .NET user-defined functions, Spark SQL, and additional libraries such as Microsoft Hyperspace and ML.NET.

Arthur Casals
on Nov 28, 2020
AI, ML & Data Engineering

Accelerating Machine Learning Lifecycle with a Feature Store

Feature Store is a core part of next generation ML platforms that empowers data scientists to accelerate the delivery of ML applications. Mike Del Balso and Geoff Sims recently spoke at Spark AI Summit 2020 Conference about the feature store driven ML development.

Srini Penchikala
on Jul 20, 2020
AI, ML & Data Engineering

Spark AI Summit 2020 Highlights: Innovations to Improve Spark 3.0 Performance

At the recent Spark AI Summit 2020, held online for the first time, the highlights of the event were innovations to improve Apache Spark 3.0 performance, including optimizations for Spark SQL, and GPU acceleration.

Carol McDonald
on Jul 03, 2020
AI, ML & Data Engineering

Boosting Apache Spark with GPUs and the RAPIDS Library

At the 2019 Spark AI Summit Europe conference, NVIDIA software engineers Thomas Graves and Miguel Martinez hosted a session on Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS Library. InfoQ recently talked with Jim Scott, head of developer relations at NVIDIA, to learn more about accelerating Apache Spark with GPUs and the RAPIDS library.

Carol McDonald
on Feb 25, 2020
AI, ML & Data Engineering

Databricks' Unified Analytics Platform Supports AutoML Toolkit

Databricks recently announced the Unified Data Analytics Platform, including an automated machine learning tool called AutoML Toolkit. The toolkit can be used to automate various steps of the data science workflow.

Srini Penchikala
on Oct 08, 2019
Cloud

Google Releases Cloud Dataproc for Kubernetes in Alpha

Google Cloud Dataproc is an open-source data and analytic processing service based on Hadoop and Spark. Google has recently announced the alpha availability of Cloud Dataproc for Kubernetes, which provides customers with a more efficient method to process data across platforms.

Steef-Jan Wiggers
on Sep 23, 2019
AI, ML & Data Engineering

ApacheCon 2019 Keynote: Google Cloud Enhances Big-Data Processing with Kubernetes

At ApacheCon North America, Christopher Crosbie gave a keynote talk title "Yet Another Resource Negotiator for Big Data? How Google Cloud is Enhancing Data Lake Processing with Kubernetes." He highlighted Google's efforts to make Apache big-data software "cloud native" by developing open-source Kubernetes Operators to provide control planes for running Apache software in a Kubernetes cluster.

Anthony Alford
on Sep 13, 2019
Architecture & Design

Data Engineering in Badoo: Handling 20 Billion Events Per Day

Badoo is a dating social network that currently handles billions of events per day, explains Vladimir Kazanov, data platform engineering lead. At Skills Matter, he talked through some of the challenges of operating at this scale, and what tooling Badoo uses in order to process and report on this data.

Andrew Morgan
on Aug 09, 2019
AI, ML & Data Engineering

Expo: Real Time A/B Testing and Monitoring with Spark Streaming and Kafka at Walmart Labs

The WalmartLabs engineering team developed a real time A/B testing tool called Expo that collects and analyzes user engagement metrics. It uses Spark Structured Streaming to process the incoming data and stores the metrics in KairosDB.

Hrishikesh Barua
on May 24, 2019
AI, ML & Data Engineering

Databricks Open Sources Delta Lake to Make Data Lakes More Reliable

Databricks recently announced open sourcing Delta Lake, their proprietary storage layer, to bring ACID transactions to Apache Spark and big data workloads. Databricks is the company behind the creators of Apache Spark, while Delta Lake is already being used in several companies like McAffee, Upwork etc . Delta Lake is addressing the heterogeneous data problem that data lakes often have...

Alex Giamas
on May 20, 2019
AI, ML & Data Engineering

Microsoft Releases High-Performance C# and F# Support for Apache Spark

Microsoft announced the release of .NET for Apache Spark, adding new high-performance C# and F# binding to the big-data computation engine.

Anthony Alford
on Apr 30, 2019
AI, ML & Data Engineering

The Evolution of Uber’s 100+ Petabyte Big Data Platform

Uber’s engineering team wrote about how their big data platform evolved from traditional ETL jobs with relational databases to one based on Hadoop and Spark. A scalable ingestion model, standard transfer format and a custom library for incremental updates are the key components of the platform.

Hrishikesh Barua
on Nov 10, 2018
DevOps

DevOps Workbench Launched by ZeroStack

Private cloud provider, ZeroStack, has announced a self-service capability from which developers can create their own workbenches. Forty developer tools from a mix of open source and commercial providers are available to users of the DevOps Workbench through Zerostack’s Intelligent Cloud Platform.

Helen Beal
on Jan 12, 2018

Newer News

Older News

InfoQ Software Architects' Newsletter

Login with:

Don't have an InfoQ account?

News