Simplifying ETL in the Cloud, Microsoft Releases Azure Data Factory Mapping Data Flows

In a recent blog post, Microsoft announced the general availability (GA) of Mapping Data Flows, a serverless, code-free Extract-Transform-Load (ETL) capability inside Azure Data Factory. The tool lets organizations take a data-driven approach without managing large infrastructure footprints, while retaining the ability to scale data processing workloads dynamically.

Mapping Data Flows address the scale challenges associated with big data integration, and do so through a visual, browser-based designer. Mike Flasko, partner director of product management at Microsoft, explains:

Mapping Data Flows simplifies data processing, with built-in capabilities to handle unpredictable data schemas and to maintain resilience to changing input data. Developers build resilient data pipelines in an accessible visual environment with our browser-based designer and let ADF handle the complexities of Spark execution.

ETL solutions typically require large infrastructure investments and a substantial amount of custom code to build data-centric solutions. Mapping Data Flows, inside Azure Data Factory, seek to reduce those complexities, as Flasko explains:

Accelerate time to insights by focusing on building your business logic without worrying about managing and maintaining server clusters or writing code to build pipelines. Easily perform ETL tasks like loading fact tables, maintaining slowly changing dimensions, aggregating semi-structured big data, matching data using fuzzy matching, and preparing data for modeling.

Mapping Data Flows include built-in data transformations for common ETL activities such as join, aggregate, pivot, unpivot, split, lookup and sort. When the out-of-the-box capabilities don’t meet an organization’s requirements, developers can use an expression builder to customize their ETL solution.
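For readers less familiar with these transformations, the following is a minimal sketch of what a join followed by an aggregate does to rows of data. It is written in plain Python purely for illustration and is not how Mapping Data Flows are authored or executed; the function names (`inner_join`, `aggregate_sum`) and the sample records are hypothetical.

```python
from collections import defaultdict


def inner_join(left, right, key):
    """Pair each left row with every right row that shares the same key value."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Rows on the left without a match on the right are dropped (inner join).
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def aggregate_sum(rows, group_key, value_key):
    """Sum value_key over all rows that share the same group_key value."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_key]] += row[value_key]
    return dict(totals)


# Hypothetical sample data, invented for this illustration.
orders = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 2, "amount": 5.0},
    {"customer_id": 1, "amount": 2.5},
]
customers = [
    {"customer_id": 1, "region": "EU"},
    {"customer_id": 2, "region": "US"},
]

joined = inner_join(orders, customers, "customer_id")
totals_by_region = aggregate_sum(joined, "region", "amount")
print(totals_by_region)
```

In a visual designer, each of these functions would correspond to one transformation step in the flow, with the output of the join feeding the aggregate.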

Beyond a simplified developer experience, Azure Data Factory also provides live insights into the data moving through its pipelines. These insights include metrics such as null counts, value distributions, standard deviations, minimum and maximum length values, row counts and more.
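To make these metrics concrete, here is a small sketch of how such column statistics could be computed over an in-memory list of values. It uses only the Python standard library and is an illustration of the metrics themselves, not of how Azure Data Factory calculates them; `profile_column` is a hypothetical helper, and the use of the sample standard deviation is an assumption.

```python
import statistics
from collections import Counter


def profile_column(values):
    """Compute simple profile statistics for one column; None represents a null."""
    non_null = [v for v in values if v is not None]
    stats = {
        "row_count": len(values),
        "null_count": len(values) - len(non_null),
        "value_distribution": dict(Counter(non_null)),
    }
    # Standard deviation only makes sense for numeric values (sample std dev assumed).
    numeric = [v for v in non_null if isinstance(v, (int, float))]
    if len(numeric) >= 2:
        stats["std_dev"] = statistics.stdev(numeric)
    # Length metrics apply to string values.
    strings = [v for v in non_null if isinstance(v, str)]
    if strings:
        stats["min_length"] = min(len(s) for s in strings)
        stats["max_length"] = max(len(s) for s in strings)
    return stats


# Hypothetical sample columns, invented for this illustration.
names_profile = profile_column(["a", "bcd", None, "a"])
amounts_profile = profile_column([1, 2, 3, None])
print(names_profile)
print(amounts_profile)
```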

In addition to ETL telemetry and insights, developers also have access to an interactive visual debug experience that allows for real-time debugging and tracing.

Microsoft is certainly not new to the ETL space. For several releases of SQL Server, Microsoft has included SQL Server Integration Services (SSIS). However, the shift to cloud computing and Platform-as-a-Service (PaaS) SQL Server offerings has left SSIS limited to Infrastructure-as-a-Service (IaaS) or on-premises workloads. Kamil Nowinski, a Microsoft MVP, shared his perspective on the transition from SSIS to Azure Data Factory Mapping Data Flows in a recent blog post:

This new feature has huge capabilities. I’m very excited to use it more. Automatically scalable processing, like this, is very efficient with Big Data processing. Hence, it’s worth to start designing new processes with Azure Data Factory or even migrate existing processes when your enterprise suffers on performance degradation due to amount of processing data.

Nowinski has also provided a component-by-component comparison of Azure Data Factory Mapping Data Flows and SSIS on his blog.
