
Amazon Releases Kinesis Firehose

On October 7th, 2015, Amazon announced a new service called Amazon Kinesis Firehose. Kinesis Firehose is a follow-up to the Kinesis service released two years earlier. To avoid ambiguity, the original Kinesis service has been renamed Amazon Kinesis Streams.

Amazon Kinesis Firehose is a managed service that requires little administration and allows you to stream application, telemetry, or log data into Amazon S3 (Simple Storage Service) or an Amazon Redshift table without writing custom code.

Image Source: Screen Capture https://www.youtube.com/watch?v=YQR_5W4XC94

Roger Barga, general manager for Amazon Kinesis, breaks Amazon Kinesis Firehose down into the following three concepts:

  1. Delivery Streams are configured to identify the destination for the data stream that is to be processed.
  2. Records refer to the data that a publisher makes available to the delivery stream in the form of a data blob. Data blobs can be as large as 1000 KB.
  3. Data Producers, or publishers, such as a web server sending log data, make records available to the delivery stream (a brief sketch follows this list).
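
As a rough illustration of these concepts, a data producer might put a single record onto a delivery stream with the AWS SDK for Python (boto3); the delivery stream name and log line below are hypothetical.

    import boto3

    # Firehose client from the AWS SDK for Python (boto3); the region is an assumption.
    firehose = boto3.client("firehose", region_name="us-east-1")

    # A data producer (e.g. a web server) makes a record available to the
    # delivery stream as a data blob of up to 1000 KB.
    log_line = '127.0.0.1 - - [07/Oct/2015:12:00:00 +0000] "GET /index.html HTTP/1.1" 200 2326\n'

    firehose.put_record(
        DeliveryStreamName="web-server-logs",  # hypothetical delivery stream name
        Record={"Data": log_line.encode("utf-8")},
    )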

The service is directed towards batch-oriented scenarios where data is buffered for between 60 seconds and 15 minutes before it is delivered. Administrators control the buffer size and buffer interval that determine how frequently this data moves. The following image describes how these input parameters can be managed.

Image Source: https://aws.amazon.com/blogs/aws/amazon-kinesis-firehose-simple-highly-scalable-data-ingestion/

Compression and encryption are also supported features of the service, using gzip compression and encryption via Amazon's KMS (Key Management Service). Because KMS is a centralized security service, other AWS services can also decrypt this data using the same keys.
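
The buffering, compression, and encryption settings above are all part of a delivery stream's destination configuration. A minimal sketch with boto3, assuming hypothetical bucket, IAM role, and KMS key ARNs, might look like this:

    import boto3

    firehose = boto3.client("firehose", region_name="us-east-1")

    # All ARNs below are hypothetical placeholders.
    firehose.create_delivery_stream(
        DeliveryStreamName="web-server-logs",
        S3DestinationConfiguration={
            "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
            "BucketARN": "arn:aws:s3:::example-firehose-bucket",
            "Prefix": "logs/",
            # Deliver when 5 MB has accumulated or 300 seconds have elapsed,
            # whichever comes first (the interval can range from 60 to 900 seconds).
            "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 300},
            "CompressionFormat": "GZIP",
            "EncryptionConfiguration": {
                "KMSEncryptionConfig": {
                    "AWSKMSKeyARN": "arn:aws:kms:us-east-1:123456789012:key/example-key-id"
                }
            },
        },
    )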

Much like other Amazon services, Kinesis Firehose provides auto-scale capabilities that require little administration. It also provides some advanced features, including file rotation, checkpointing via the Kinesis Agent, and retries that allow data to be retained for up to 24 hours if an S3 bucket is unavailable.

The intent of Kinesis Firehose is to provide administrators with a zero-code, configuration-only experience. For more advanced scenarios, however, developers can use the Kinesis Firehose API, which can be integrated into their applications. The API provides operations such as the following (a short usage sketch appears after the list):

  • CreateDeliveryStream - Create a delivery stream by providing the S3 bucket information that your data will be delivered to.
  • DeleteDeliveryStream - Delete a delivery stream.
  • DescribeDeliveryStream - Return configuration information about a delivery stream.
  • ListDeliveryStreams - Enumerate all delivery streams available in an AWS account.
  • UpdateDestination - Update the S3 bucket configuration for a delivery stream.
  • PutRecord - Put a single data record of up to 1000 KB into a delivery stream.
  • PutRecordBatch - Put a batch of records (up to 500 records or 5 MB) into a delivery stream.
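
As a rough sketch of how a few of these operations fit together, again using boto3 and hypothetical names, a developer might list and inspect delivery streams and then send a batch of records:

    import json

    import boto3

    firehose = boto3.client("firehose", region_name="us-east-1")

    # Enumerate the delivery streams in the account and inspect one of them.
    print(firehose.list_delivery_streams()["DeliveryStreamNames"])
    description = firehose.describe_delivery_stream(DeliveryStreamName="web-server-logs")
    print(description["DeliveryStreamDescription"]["DeliveryStreamStatus"])

    # Put a batch of records into the delivery stream in a single call.
    events = [{"level": "INFO", "message": "request %d handled" % i} for i in range(10)]
    response = firehose.put_record_batch(
        DeliveryStreamName="web-server-logs",
        Records=[{"Data": (json.dumps(event) + "\n").encode("utf-8")} for event in events],
    )
    print("Failed records:", response["FailedPutCount"])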

Amazon has provided customers with a unified console that allows organizations to manage both Kinesis Firehose and Streams in the same tool. For customers who may be familiar with Amazon Kinesis Streams, there are a few important distinctions between the two services.  Amazon classifies the two systems in the following way:

  • Amazon Kinesis Streams is a service for workloads that require custom processing of each incoming record, with sub-1-second processing latency and a choice of stream processing frameworks.
  • Amazon Kinesis Firehose is a service for workloads that require zero administration, the ability to use existing analytics tools based on S3 or Redshift, and a data latency of 60 seconds or higher.

 
