Apache Spark Content on InfoQ

News

.NET

Microsoft Releases .NET for Apache Spark 1.0

Last month, Microsoft released the first major version of .NET for Apache Spark, an open-source package that brings .NET development to the Apache Spark platform. The new release allows .NET developers to write Apache Spark applications using .NET user-defined functions, Spark SQL, and additional libraries such as Microsoft Hyperspace and ML.NET.

Arthur Casals
on Nov 28, 2020
AI, ML & Data Engineering

Spark AI Summit 2020 Highlights: Innovations to Improve Spark 3.0 Performance

The highlights of the recent Spark AI Summit 2020, held online for the first time, were innovations to improve Apache Spark 3.0 performance, including optimizations for Spark SQL and GPU acceleration.

Carol McDonald
on Jul 03, 2020
AI, ML & Data Engineering

Boosting Apache Spark with GPUs and the RAPIDS Library

At the 2019 Spark AI Summit Europe conference, NVIDIA software engineers Thomas Graves and Miguel Martinez hosted a session on Accelerating Apache Spark by Several Orders of Magnitude with GPUs and RAPIDS Library. InfoQ recently talked with Jim Scott, head of developer relations at NVIDIA, to learn more about accelerating Apache Spark with GPUs and the RAPIDS library.

Carol McDonald
on Feb 25, 2020
AI, ML & Data Engineering

Databricks' Unified Analytics Platform Supports AutoML Toolkit

Databricks recently announced the Unified Data Analytics Platform, including an automated machine learning tool called AutoML Toolkit. The toolkit can be used to automate various steps of the data science workflow.

Srini Penchikala
on Oct 08, 2019
Cloud

Google Releases Cloud Dataproc for Kubernetes in Alpha

Google Cloud Dataproc is an open-source data and analytic processing service based on Hadoop and Spark. Google has recently announced the alpha availability of Cloud Dataproc for Kubernetes, which provides customers with a more efficient method to process data across platforms.

Steef-Jan Wiggers
on Sep 23, 2019
AI, ML & Data Engineering

ApacheCon 2019 Keynote: Google Cloud Enhances Big-Data Processing with Kubernetes

At ApacheCon North America, Christopher Crosbie gave a keynote talk titled "Yet Another Resource Negotiator for Big Data? How Google Cloud is Enhancing Data Lake Processing with Kubernetes." He highlighted Google's efforts to make Apache big-data software "cloud native" by developing open-source Kubernetes Operators to provide control planes for running Apache software in a Kubernetes cluster.

Anthony Alford
on Sep 13, 2019
Architecture & Design

Data Engineering in Badoo: Handling 20 Billion Events Per Day

Badoo is a dating social network that currently handles billions of events per day, explains Vladimir Kazanov, data platform engineering lead. At Skills Matter, he talked through some of the challenges of operating at this scale, and what tooling Badoo uses in order to process and report on this data.

Andrew Morgan
on Aug 09, 2019
AI, ML & Data Engineering

Expo: Real Time A/B Testing and Monitoring with Spark Streaming and Kafka at Walmart Labs

The WalmartLabs engineering team developed a real-time A/B testing tool called Expo that collects and analyzes user engagement metrics. It uses Spark Structured Streaming to process the incoming data and stores the metrics in KairosDB.

Hrishikesh Barua
on May 24, 2019
AI, ML & Data Engineering

Databricks Open Sources Delta Lake to Make Data Lakes More Reliable

Databricks recently announced that it is open sourcing Delta Lake, its proprietary storage layer, to bring ACID transactions to Apache Spark and big data workloads. Databricks is the company founded by the creators of Apache Spark, and Delta Lake is already in use at several companies, including McAfee and Upwork. Delta Lake is addressing the heterogeneous data problem that data lakes often have...

Alex Giamas
on May 20, 2019
AI, ML & Data Engineering

Microsoft Releases High-Performance C# and F# Support for Apache Spark

Microsoft announced the release of .NET for Apache Spark, adding new high-performance C# and F# bindings to the big-data computation engine.

Anthony Alford
on Apr 30, 2019
AI, ML & Data Engineering

The Evolution of Uber’s 100+ Petabyte Big Data Platform

Uber’s engineering team wrote about how their big data platform evolved from traditional ETL jobs with relational databases to one based on Hadoop and Spark. A scalable ingestion model, standard transfer format and a custom library for incremental updates are the key components of the platform.

Hrishikesh Barua
on Nov 10, 2018
DevOps

DevOps Workbench Launched by ZeroStack

Private cloud provider ZeroStack has announced a self-service capability from which developers can create their own workbenches. Forty developer tools from a mix of open source and commercial providers are available to users of the DevOps Workbench through ZeroStack’s Intelligent Cloud Platform.

Helen Beal
on Jan 12, 2018
AI, ML & Data Engineering

Modern Big Data Pipelines over Kubernetes

Container management technologies like Kubernetes make it possible to implement modern big data pipelines. Eliran Bivas, senior big data architect at Iguazio, spoke at the recent KubeCon + CloudNativeCon North America 2017 Conference about big data pipelines and how Kubernetes can help develop them.

Srini Penchikala
on Jan 08, 2018
Cloud

Microsoft Updates AI Services and Tools for Data Scientists and Developers

At the recent Ignite conference, Microsoft released several updates related to its Artificial Intelligence (AI) services and tools. These updates include the release of the Azure ML Experimentation service, Azure ML Model Management service, Azure ML Workbench and the general availability of Microsoft Cognitive Services.

Kent Weare
on Sep 30, 2017
Java

Emerging Technologies for the Enterprise Conference 2017: Day Two Recap

Day Two of the 12th annual Emerging Technologies for the Enterprise Conference was held in Philadelphia. This two-day event included keynotes by Blair MacIntyre (augmented reality pioneer) and Scott Hanselman (podcaster), and featured speakers Kyle Daigle (engineering manager at GitHub), Holden Karau (principal software engineer at IBM), and Karen Kinnear (JVM technical lead at Oracle).

Michael Redlich
on Apr 30, 2017
