Amazon Announces Managed Streaming for Kafka in Public Preview

At the recent AWS re:Invent 2018 event, Amazon announced a new fully managed service that makes it easy for customers to build and run applications that use Apache Kafka to process streaming data. The new service, Amazon Managed Streaming for Kafka (Amazon MSK), is now in public preview.

Apache Kafka is a massively scalable, distributed, open-source streaming platform that supports multiple producers and consumers and connects data streams across enterprises. Amazon now offers Kafka version 1.1.1 as a managed service on AWS, so customers do not need expertise in managing Apache Kafka infrastructure. AWS has fully automated the lifecycle of the brokers and ZooKeeper nodes, and if one of the nodes fails, the service takes care of it, as Damien Wylie, principal product manager, Amazon Data Streaming, explained in his presentation at re:Invent:

We are going to detect that failure automatically and then reintroduce a new node. Hence the IPs remain intact, and finally, any patches that are required throughout the time you are running the cluster we automatically apply those for you.

Amazon offers MSK in the US East (N. Virginia) region, and clusters require an Amazon Virtual Private Cloud (Amazon VPC) for private connectivity. During the preview, MSK also supports:

  • AWS Key Management Service (AWS KMS) for encryption at rest
  • AWS Identity and Access Management (IAM) for control-plane API access (provisioning and tearing down brokers)
  • Amazon CloudWatch for Apache Kafka broker, topic, and ZooKeeper metrics
  • Amazon Elastic Compute Cloud (EC2) M5 instances as brokers
  • Amazon EBS GP2 broker storage

Deploying Amazon MSK is straightforward using the AWS Management Console, CLI, or SDK. Users provide the subnets their Amazon MSK cluster should connect to privately, specify the number of brokers and the storage needed per broker, and create the Apache Kafka cluster. They can then configure the cluster and have their applications stream data from producers to a topic, from which consumers read the data in real time.

 
Source: https://aws.amazon.com/msk/
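
The deployment steps described above (choose subnets, set the broker count and per-broker storage, create the cluster) can be sketched as the request an SDK call would carry. This is a minimal Python sketch, not AWS's reference code: the helper name build_create_cluster_request is hypothetical, the key names reflect our understanding of the MSK CreateCluster API for the boto3 "kafka" client, and the rule that the broker count must be a multiple of the subnet count is an assumption to check against current AWS documentation.

```python
from typing import Any, Dict, List, Optional


def build_create_cluster_request(
    cluster_name: str,
    client_subnets: List[str],
    broker_count: int,
    volume_size_gb: int,
    kafka_version: str = "1.1.1",
    instance_type: str = "kafka.m5.large",
    kms_key_arn: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: assemble keyword arguments for MSK CreateCluster.

    Key names follow the MSK API as we understand it; verify them against
    the current AWS documentation before relying on this.
    """
    if not client_subnets:
        raise ValueError("at least one client subnet is required")
    # Assumption: brokers are spread evenly across the subnets' Availability
    # Zones, so the broker count must be a positive multiple of the subnet count.
    if broker_count <= 0 or broker_count % len(client_subnets) != 0:
        raise ValueError("broker_count must be a positive multiple of the subnet count")

    request: Dict[str, Any] = {
        "ClusterName": cluster_name,
        "KafkaVersion": kafka_version,
        "NumberOfBrokerNodes": broker_count,
        "BrokerNodeGroupInfo": {
            "InstanceType": instance_type,
            "ClientSubnets": list(client_subnets),
            # EBS storage provisioned for each broker, in GiB.
            "StorageInfo": {"EbsStorageInfo": {"VolumeSize": volume_size_gb}},
        },
    }
    if kms_key_arn is not None:
        # Encrypt broker volumes at rest with the given AWS KMS key.
        request["EncryptionInfo"] = {
            "EncryptionAtRest": {"DataVolumeKMSKeyId": kms_key_arn}
        }
    return request


if __name__ == "__main__":
    # Placeholder subnet IDs; substitute subnets from your own VPC.
    req = build_create_cluster_request(
        "demo-cluster", ["subnet-aaaa", "subnet-bbbb", "subnet-cccc"],
        broker_count=3, volume_size_gb=100,
    )
    print(req)
    # With boto3 installed and credentials configured, this could be sent as:
    #   boto3.client("kafka", region_name="us-east-1").create_cluster(**req)
```

Keeping the request construction separate from the network call makes the validation testable without AWS credentials; the actual create_cluster call is left as a comment because it provisions billable resources.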

With MSK and Kinesis, Amazon now has two streaming services on AWS. Both share similar concepts and focus on ingesting streaming data, so customers who want managed streaming on AWS can choose between a managed Kafka service and Kinesis.


Source (screenshot): https://www.youtube.com/watch?v=zhsVfsykBHc

Amazon is not the only cloud provider with a Kafka option on its platform. Microsoft recently added Kafka support by providing a Kafka endpoint in front of its Event Hubs streaming service. Rather than bringing a managed Kafka service to Azure, Microsoft lets Event Hubs act as a Kafka-compatible service. Like Kinesis, Event Hubs is similar in concept to Kafka itself.

With Amazon MSK, customers pay no upfront costs and instead "pay as you go" for broker instances and storage. During the preview, a broker running on an M5 instance costs $0.21 per hour, and broker storage costs $0.10 per GB-month. In a discussion about MSK on Hacker News, one commenter said about pricing:

Just to note, the $.21/hr broker is on an m5.large (2 CPU, 8 GB Mem), which goes for $.096/hr. We run three nodes right now on m5.xlarge instances. At $.42/hr for the managed Kafka, compared to $.192/hr self-hosted Kafka, I think we'll keep it self-hosted for now.

Jared Short countered this argument on Twitter, pointing out that the engineering share of the total cost of ownership (TCO) of self-hosting can be large and partly hidden:

"We run three nodes. At $.42/hr for the managed Kafka, compared to $.192/hr self-hosted... we'll keep it self-hosted for now."  I love HN math. Real world math: Over one year that is ~$2k difference, ~20 hours of engineering time. Maintenance isn't free; it obscures true cost.
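
The arithmetic behind both comments is easy to reproduce. The following is a minimal Python sketch, assuming the preview prices quoted in this article, a 730-hour billing month (8,760 / 12), and storage billed per broker; the function names are hypothetical. Reading the commenter's hourly rates as the cost of a single node, the gap comes to about $1,997 a year, which matches the ~$2k in the tweet; if the rates apply to each of three brokers, the gap is closer to $6,000.

```python
HOURS_PER_YEAR = 24 * 365   # 8,760 hours
HOURS_PER_MONTH = 730       # Assumption: 8,760 / 12, a common billing approximation

# Preview prices quoted in the article.
MSK_M5_BROKER_PER_HOUR = 0.21     # USD per broker-hour (m5.large)
MSK_STORAGE_PER_GB_MONTH = 0.10   # USD per GB-month of broker storage


def msk_monthly_cost(brokers: int, storage_gb_per_broker: float) -> float:
    """Estimated monthly preview cost: broker hours plus per-broker storage."""
    compute = brokers * MSK_M5_BROKER_PER_HOUR * HOURS_PER_MONTH
    storage = brokers * storage_gb_per_broker * MSK_STORAGE_PER_GB_MONTH
    return compute + storage


def annual_difference(managed_per_hour: float, self_hosted_per_hour: float,
                      nodes: int = 1) -> float:
    """Yearly cost gap between two hourly rates, multiplied by the node count."""
    return (managed_per_hour - self_hosted_per_hour) * HOURS_PER_YEAR * nodes


if __name__ == "__main__":
    print(f"3 brokers x 100 GB: ${msk_monthly_cost(3, 100):,.2f}/month")
    # Hourly figures taken from the Hacker News comment quoted above.
    print(f"one node:    ${annual_difference(0.42, 0.192):,.2f}/year")
    print(f"three nodes: ${annual_difference(0.42, 0.192, nodes=3):,.2f}/year")
```

For example, a three-broker preview cluster with 100 GB per broker comes to roughly $490 a month under these assumptions. The sketch covers only the infrastructure line items; the engineering time Short refers to would have to be added separately.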

Lastly, Wylie indicated at re:Invent that at general availability (GA), Amazon will provide an SLA for MSK, support version upgrades, offer scale-out and scale-up options for clusters, let users define custom cluster configurations, offer automatic storage scaling, allow tagging, and add support for AWS CloudTrail and AWS CloudFormation. MSK will also be made available in regions worldwide.
