Azure Data Explorer Supports Native Ingestion from Amazon S3

Microsoft recently announced the ability to natively ingest data from Amazon S3 into Azure Data Explorer (ADX). The new feature simplifies multi-cloud data analytics deployments by bringing data from Amazon S3 into Azure without relying on custom ETL pipelines.

Designed for interactive analytics over high-velocity, diverse raw data, ADX is a managed platform for analyzing high volumes of data in near real time. Anshul Sharma, product manager at Azure Data, explains the need for the new feature:

Prior to the S3 ingestion support in ADX, depending on the volume & frequency of the incoming data, you might use an ETL process to move data from S3 to Azure blob before ingesting to ADX, or read the file content in AWS lambda or Azure functions, and ingest directly into ADX. The former approach requires you to duplicate the data, adding more cost and complexities, and the latter proves challenging especially if you are moving large files.

Microsoft recommends ingesting the data through the Data Management service, which batches incoming data for high throughput according to the batching policy set on the database or table.

Source: https://techcommunity.microsoft.com/t5/azure-data-explorer-blog/azure-data-explorer-supports-native-ingestion-from-amazon-s3/ba-p/3606746
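As an illustration of the batching policy mentioned above, the following minimal Python sketch builds the Kusto management command that sets an ingestion batching policy on a table. The table name `S3Logs` and the limit values are hypothetical, chosen only for the example; the helper `batching_policy_command` is not part of any SDK and only formats a string.

```python
import json


def batching_policy_command(table, max_time="00:05:00", max_items=500, max_raw_mb=1024):
    """Build a Kusto command that sets the ingestion batching policy on a table.

    A batch is sealed when any one of the three limits is reached first.
    The default values here are illustrative, not recommendations.
    """
    policy = {
        "MaximumBatchingTimeSpan": max_time,   # hh:mm:ss
        "MaximumNumberOfItems": max_items,     # number of files per batch
        "MaximumRawDataSizeMB": max_raw_mb,    # uncompressed size per batch
    }
    return f".alter table {table} policy ingestionbatching @'{json.dumps(policy)}'"


if __name__ == "__main__":
    # "S3Logs" is a hypothetical table name.
    print(batching_policy_command("S3Logs", max_time="00:02:00"))
```

A shorter time span lowers ingestion latency at the cost of smaller, less efficient batches, so the right values depend on how quickly the data needs to be queryable.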

Amazon S3 is not the only source supported by the analytics platform on Azure: ADX can ingest data from different Azure sources, including Azure Blob Storage, Azure Event Hubs and Azure IoT Hub, as well as from open-source technologies such as Kafka or Logstash. Companies such as Tray.io provide connectors for ADX and S3 integration and automation.

To trigger the pull of data from the bucket into ADX, S3 invokes an AWS Lambda function when a new object arrives. Using the ADX SDK, the function posts a message to an Azure storage queue that includes the file metadata, the object URL and an authentication token for fetching the file. Once notified of incoming files, ADX retrieves the data from the source according to the batching policy. Sharma adds:

AWS Lambda in this scenario is extremely lightweight as it does not process the data and just sends a message on to ADX using the SDK. This keeps the lambda cost minimal, and relies on ADX to do the heavy lifting.
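The trigger flow described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the sample referenced in the article (which is written in .NET): the handler only reads the S3 event notification and hands each object's URL and size to an `enqueue` callable. Both `make_handler` and `enqueue` are hypothetical names; in a real deployment `enqueue` would wrap the ADX SDK's queued ingestion call and supply the credentials ADX needs to fetch the object.

```python
from urllib.parse import quote, unquote_plus


def s3_object_urls(event):
    """Yield (url, size_in_bytes) for each object in an S3 event notification."""
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        region = record["awsRegion"]
        # Object keys arrive URL-encoded in S3 event notifications
        # (spaces as '+'), so decode first, then re-encode for the URL path.
        key = unquote_plus(record["s3"]["object"]["key"])
        size = record["s3"]["object"].get("size", 0)
        yield f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key)}", size


def make_handler(enqueue):
    """Return a Lambda-style handler that forwards object URLs to `enqueue`.

    `enqueue(url, size)` is a hypothetical callable standing in for the
    ADX SDK call that posts the ingestion message.
    """
    def handler(event, context):
        count = 0
        for url, size in s3_object_urls(event):
            enqueue(url, size)  # no file content is read or transformed here
            count += 1
        return {"queued": count}
    return handler


if __name__ == "__main__":
    # Hypothetical event with a single uploaded object.
    sample = {"Records": [{
        "awsRegion": "us-east-1",
        "s3": {"bucket": {"name": "my-bucket"},
               "object": {"key": "logs/2022/my+file.csv", "size": 1024}},
    }]}
    handler = make_handler(lambda url, size: print(url, size))
    print(handler(sample, None))
```

Because the function never downloads the object, its run time and memory use stay small regardless of file size, which is the point Sharma makes about keeping Lambda costs minimal.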

A sample AWS .NET Lambda function is available on GitHub. While incoming data can be hosted on a free ADX cluster on Azure (up to 100 GB of storage and 10 databases), the overall cost of a big data analytics platform ingesting data from Amazon S3 includes Lambda invocations and egress data transfer fees from AWS.

About the Author

Renato Losio
