Amazon Kinesis Data Firehose Gains Delivery to HTTP Endpoints

Amazon Kinesis Data Firehose recently gained support for delivering streaming data to generic HTTP endpoints. This also enables additional AWS services as destinations via Amazon API Gateway's service integration. The new capability is complemented by dedicated integrations with additional third-party service providers like Datadog, MongoDB, and New Relic.

Amazon Kinesis is AWS's streaming data platform to "collect, process, and analyze video and data streams in real time" and features several services. Kinesis Data Streams provides the most performant and flexible capabilities, at the expense of requiring users to implement a custom application. Even using the built-in Lambda integration can still be operationally challenging, as recently illustrated by Prateek Mehrotra. In contrast, Kinesis Data Firehose is a fully managed service to "prepare and load real-time data streams into data stores and analytics services" without any custom code besides an optional serverless data transformation via an AWS Lambda function. Notably, Kinesis Data Firehose can also directly consume Kinesis Data Streams to combine the benefits of both offerings.
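To make the optional Lambda transformation more concrete, the following is a minimal sketch of such a function in Python. It assumes the record contract commonly documented for Firehose transformations (base64-encoded `data`, a `recordId`, and a `result` of `Ok` or `ProcessingFailed`); the `processed` field it adds is purely hypothetical and stands in for whatever transformation a real pipeline would apply.

```python
import base64
import json


def lambda_handler(event, context):
    """Transform each Firehose record and return it in the expected shape."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            if not isinstance(payload, dict):
                raise ValueError("expected a JSON object")
        except ValueError:
            # Malformed input: return the record unchanged and flag it as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
            continue

        # Hypothetical transformation: mark the record as processed.
        payload["processed"] = True
        encoded = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": encoded.decode("ascii"),
        })
    return {"records": output}
```

Every incoming `recordId` is echoed back exactly once, so the service can match transformed records to their originals.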

So far, Kinesis Data Firehose has supported built-in destinations like Amazon S3, Amazon Redshift, and Amazon Elasticsearch Service, and as of 2017 also Splunk via its HTTP Event Collector. AWS has now added generic HTTP endpoint support to enable custom destinations, including additional AWS services such as Amazon DynamoDB or Amazon SNS by means of Amazon API Gateway's AWS service integration. This also opens the platform to other third-party service providers like Datadog, MongoDB, and New Relic.

As emphasized by Imtiaz Sayed and Masudur Rahaman Sayem in their introductory blog post, existing Kinesis Data Firehose features "are fully supported, including AWS Lambda service integration, retry option, data protection on delivery failure, and cross-account and cross-region data delivery". Contrary to what the name suggests, all traffic between the delivery stream and the HTTP endpoint is always encrypted in transit via HTTPS. It can also be signed with an access key if required by the destination service.
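On the receiving side, a custom endpoint has to check the access key and acknowledge each delivery. The sketch below shows one way to do that in Python, independent of any web framework. The header names (`X-Amz-Firehose-Access-Key`, `X-Amz-Firehose-Request-Id`) and the JSON fields (`requestId`, `timestamp`, `records`, `errorMessage`) reflect our reading of the delivery request and response specifications and should be verified against the current documentation; `EXPECTED_ACCESS_KEY` and the `sink` callback are hypothetical.

```python
import base64
import hmac
import json
import time

# Hypothetical shared secret; it would match the access key set on the delivery stream.
EXPECTED_ACCESS_KEY = "example-access-key"


def handle_firehose_request(headers, body, sink):
    """Return (status_code, response_body) for one Firehose delivery request."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    request_id = normalized.get("x-amz-firehose-request-id", "")
    supplied_key = normalized.get("x-amz-firehose-access-key", "")

    def respond(status, error=None):
        reply = {"requestId": request_id, "timestamp": int(time.time() * 1000)}
        if error:
            reply["errorMessage"] = error
        return status, json.dumps(reply)

    # Constant-time comparison avoids leaking key contents through timing.
    if not hmac.compare_digest(supplied_key.encode(), EXPECTED_ACCESS_KEY.encode()):
        return respond(401, "invalid access key")

    try:
        envelope = json.loads(body)
        records = [base64.b64decode(item["data"]) for item in envelope["records"]]
    except (ValueError, KeyError, TypeError):
        return respond(400, "malformed request body")

    # Hand each decoded record to the caller-supplied sink.
    for raw in records:
        sink(raw)
    return respond(200)
```

Decoding all records before calling `sink` means a malformed batch is rejected as a whole rather than partially processed, which keeps retries by the delivery stream simple to reason about.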

Image: Kinesis Data Firehose HTTP endpoint data flow (via AWS Management Console)

Creating a Kinesis Data Firehose delivery stream involves the following steps and concepts:

  1. Choosing a delivery stream name and a data source, which can be a custom application, the Kinesis agent, or another AWS service that writes to the stream directly, or a Kinesis data stream that is consumed via a dedicated integration
  2. Optionally processing records to apply data transformation via an AWS Lambda function and record format conversion via AWS Glue
  3. Choosing a destination like the new HTTP endpoint, and configuring an Amazon S3 backup plan to protect against delivery failures or to always retain all source records
  4. Configuring various settings regarding performance, security, and error logging
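The steps above can also be expressed programmatically. The following Python sketch assembles the arguments for the Firehose `CreateDeliveryStream` API with an HTTP endpoint destination and an S3 backup for failed records. The field names follow the API reference as we understand it; the function name, the endpoint label, and the buffering and retry values are illustrative assumptions, not recommendations.

```python
def build_http_endpoint_stream_request(stream_name, endpoint_url, access_key,
                                       role_arn, backup_bucket_arn):
    """Assemble keyword arguments for the Firehose CreateDeliveryStream call."""
    # Firehose only delivers to HTTPS endpoints, so reject anything else early.
    if not endpoint_url.startswith("https://"):
        raise ValueError("endpoint_url must use HTTPS")

    return {
        # Step 1: stream name and source (records written directly to the stream).
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "DirectPut",
        # Step 3: HTTP endpoint destination plus S3 backup for failed deliveries.
        "HttpEndpointDestinationConfiguration": {
            "EndpointConfiguration": {
                "Url": endpoint_url,
                "Name": "custom-http-endpoint",  # hypothetical label
                "AccessKey": access_key,
            },
            # Step 4: illustrative buffering and retry settings.
            "BufferingHints": {"SizeInMBs": 1, "IntervalInSeconds": 60},
            "RetryOptions": {"DurationInSeconds": 300},
            "RoleARN": role_arn,
            "S3BackupMode": "FailedDataOnly",
            "S3Configuration": {"RoleARN": role_arn, "BucketARN": backup_bucket_arn},
        },
    }
```

With the AWS SDK for Python installed and credentials configured, the result could be passed along as `boto3.client("firehose").create_delivery_stream(**request)`. Switching `S3BackupMode` to `AllData` would correspond to always retaining all source records.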

Once configured, a delivery stream provides extensive monitoring options via integrations with CloudWatch metrics and, optionally, CloudWatch Logs. Enabling the latter is recommended to ease troubleshooting of HTTP endpoints. It is also possible to test the delivery stream configuration using sample data.

AWS Serverless Hero Eric Hammond acknowledges the HTTP endpoint destination's potential in a Twitter thread, but suspects it might face adoption challenges:

"This feature may be able to solve some interesting problems. Most people with those problems won't go looking for answers in Amazon Kinesis Data Firehose. [...] I'm never going to complete another project because I can't decide which method I should use to get data from A to B inside of AWS."

Microsoft Azure's Event Hubs provides similar features to capture events in Azure Blob Storage or subsequently process them via Event Grid and a webhook or an Azure Functions-based handler. Google Cloud Platform's Pub/Sub supports push subscriptions for publicly accessible HTTPS addresses. The open-source distributed event streaming platform Apache Kafka provides Kafka Connect as a gateway to other systems, for example, by reusing Apache Camel components via its Camel Kafka Connector. AWS also offers Kafka as a service under the Amazon Managed Streaming for Apache Kafka (Amazon MSK) label.

The Amazon Kinesis Data Firehose documentation features a developer guide, including the HTTP endpoint delivery request and response specifications and a section on troubleshooting HTTP endpoints, the AWS CLI reference, and the API reference. Support is provided via the Amazon Kinesis forum. Amazon Kinesis Data Firehose is primarily priced based on the ingested data volume. Egress to HTTP endpoints does not incur additional costs. Charges associated with target services like Amazon S3 and Amazon Redshift are billed separately.