
Microsoft Releases .NET for Apache Spark 1.0

Last month, Microsoft released the first major version of .NET for Apache Spark, an open-source package that brings .NET development to the Apache Spark platform. The new release allows .NET developers to write Apache Spark applications using .NET user-defined functions, Spark SQL, and additional libraries such as Microsoft Hyperspace and ML.NET.

Apache Spark is an open-source, general-purpose analytics engine for large-scale data processing, with built-in modules for streaming, SQL, machine learning, and graph processing. Initially developed by the AMPLab team at UC Berkeley, it can be used with a variety of data repositories, including the Hadoop Distributed File System (HDFS), NoSQL databases, and relational data stores. Because Spark keeps data in memory (RAM) wherever possible, it can be up to 100x faster than Hadoop MapReduce for some large-scale workloads.

According to Jeremy Likness, senior program manager for .NET Data at Microsoft, the release addresses a long-standing community demand:

.NET for Apache Spark launched two years ago to address the increasing demand from the .NET community for an easier way to build big data applications. A recent survey confirmed the biggest motivation to use the package is to take advantage of existing .NET development skills and resources, including the enormous .NET ecosystem of existing libraries and frameworks.


.NET for Apache Spark brings key Spark functionality to the .NET ecosystem, including the DataFrame APIs for Spark versions 2.3, 2.4, and 3.0 (which allow the use of Spark SQL queries) and support for Spark's machine learning library (MLlib). .NET developers can also write user-defined functions (UDFs) in .NET and use them in their Spark applications.

The package also provides an API extension framework for adding support for other libraries, including Delta Lake (a storage layer that brings ACID transactions to Spark), Microsoft Hyperspace (an indexing subsystem for Spark), and ML.NET (Microsoft's machine learning framework). ML.NET is particularly interesting for .NET developers because it can be extended with other machine learning libraries such as TensorFlow.

Performance is another key feature of this release. According to Microsoft's benchmarks, .NET for Apache Spark programs that do not use UDFs run at the same speed as equivalent non-UDF Spark applications written in Scala or PySpark. For applications that do use UDFs, .NET for Apache Spark programs are at least as fast as their PySpark counterparts, and often faster.

Source: Microsoft

The official release announcement also outlined plans for future features, including LINQ support and additional deployment options, such as integration with CI/CD (DevOps) pipelines and publishing or submitting jobs directly from Visual Studio.


.NET for Apache Spark targets .NET Standard 2.0, so it can be used from any .NET implementation that supports that standard (.NET Core 3.1 or later is recommended). The package is available as an open-source project on GitHub under the .NET Foundation and can be downloaded from NuGet. It can also be used with Apache Spark cloud offerings, including Azure Databricks and AWS EMR Spark. For on-premises deployments, it supports Windows, macOS, and Linux.

 
