Amazon MSK Replicator: Active-Passive and Active-Active Clusters for Apache Kafka Service

AWS has recently announced MSK Replicator, a new option for cross-region and same-region streaming data replication. The new feature of the Amazon Managed Streaming for Apache Kafka service provides automatic asynchronous replication across clusters, enhancing availability and ensuring business continuity.

MSK Replicator allows customers to set up active-passive or active-active cluster topologies: in an active-active setup, both MSK clusters serve reads and writes, while in an active-passive setup, only one MSK cluster serves streaming data while the other cluster is on standby.

Source: AWS blog

Designed to be highly scalable, Amazon MSK is an AWS streaming data service that manages Apache Kafka infrastructure and operations, handling up to millions of messages per second.

The new feature supports cross-region and same-region replication between clusters, scales automatically with the workload, and is available for both provisioned and serverless MSK clusters. Danilo Poccia, chief evangelist (EMEA) at AWS, explains the benefits of the new replicator:

Cross-cluster replication is often used to implement business continuity and disaster recovery plans and increase application resilience across AWS Regions. Another use case, when building multi-Region applications, is to have copies of streaming data in multiple geographies stored closer to end consumers for lower latency access. You might also need to aggregate data from multiple clusters into one centralized cluster for analytics.

The documentation highlights other possible use cases, including data distribution, allowing different teams and partners to have their own copies of data. The service replicates data, Kafka metadata (including topic configurations), Access Control Lists (ACLs), and consumer group offsets. Poccia clarifies:

Consumer group replication allows me to specify if consumer group offsets should be replicated so that, after a switchover, consuming applications can resume processing near where they left off in the primary cluster. I can specify a comma-separated list of regular expressions that indicate the names of the consumer groups to replicate or to exclude from replication.
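To make the include and exclude lists described above more concrete, the following Python sketch filters consumer group names against two comma-separated lists of regular expressions. It is an illustration only: the function names are hypothetical, and the assumption that a pattern must match the whole group name is ours, not something confirmed by AWS documentation.

```python
import re


def parse_patterns(csv: str) -> list[re.Pattern]:
    """Split a comma-separated string into compiled regular expressions.

    Limitation of this sketch: a pattern that itself contains a comma,
    such as "a{1,2}", would be split incorrectly.
    """
    return [re.compile(part.strip()) for part in csv.split(",") if part.strip()]


def groups_to_replicate(groups: list[str], include_csv: str, exclude_csv: str = "") -> list[str]:
    """Return the groups matched by an include pattern and by no exclude pattern.

    Assumption: a pattern must match the entire group name (fullmatch).
    """
    include = parse_patterns(include_csv)
    exclude = parse_patterns(exclude_csv)
    selected = []
    for group in groups:
        included = any(p.fullmatch(group) for p in include)
        excluded = any(p.fullmatch(group) for p in exclude)
        if included and not excluded:
            selected.append(group)
    return selected


if __name__ == "__main__":
    names = ["orders-app", "orders-test", "billing"]
    print(groups_to_replicate(names, ".*", ".*-test"))
```

Here the include list `.*` selects every group, and the exclude list `.*-test` then drops any group whose name ends in `-test`.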

Since MSK Replicator acts as a consumer for the source cluster, replication can cause other consumers to be throttled on the source cluster. AWS recommends provisioning identical capacity for source and target clusters, and accounting for the replication throughput when calculating the capacity.
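As a rough illustration of why replication has to be included in capacity planning, the sketch below treats the replicator as one more consumer that reads the full stream from the source cluster. The function name and the simplifying assumption that every consumer group reads all of the produced data are ours; real sizing should follow AWS guidance and measured traffic.

```python
def source_read_throughput(ingress_mb_per_s: float, consumer_groups: int) -> float:
    """Estimate the read throughput on the source cluster in MB/s.

    Assumptions (hypothetical model): each consumer group reads the whole
    stream once, and the replicator adds one more full read of the stream.
    """
    if ingress_mb_per_s < 0 or consumer_groups < 0:
        raise ValueError("inputs must be non-negative")
    readers = consumer_groups + 1  # existing consumer groups plus the replicator
    return ingress_mb_per_s * readers


if __name__ == "__main__":
    # 50 MB/s produced, three existing consumer groups, plus the replicator.
    print(source_read_throughput(50.0, 3))
```

Under these assumptions, adding a replicator to a cluster with three consumer groups raises the read load from three to four times the ingress rate.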

While Amazon MSK offers compatibility with Apache Kafka, it is not the only data streaming option on AWS: Kinesis Data Streams is another popular choice.

MSK Replicator is currently available in a subset of AWS regions, including Ohio, Northern Virginia, and Ireland. Customers deploying a replicator pay standard charges for the source and target MSK clusters, cross-region data transfer fees, plus a $0.08 USD fee per GB of data replicated and an hourly rate for each replicator configured, starting at $0.30 USD per hour.
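The replicator-specific part of the bill can be approximated with simple arithmetic. The sketch below uses the two rates quoted above and deliberately leaves out the standard cluster charges and cross-region data transfer fees. The function name is hypothetical, and the hourly rate is treated as a flat USD 0.30 even though the article describes it as a starting price.

```python
PER_GB_USD = 0.08    # fee per GB of data replicated, as quoted above
PER_HOUR_USD = 0.30  # starting hourly rate per replicator, as quoted above


def replicator_fee_usd(gb_replicated: float, hours: float, replicators: int = 1) -> float:
    """Estimate replicator-specific charges in USD.

    Excludes source and target cluster charges and data transfer fees.
    """
    if gb_replicated < 0 or hours < 0 or replicators < 0:
        raise ValueError("inputs must be non-negative")
    data_fee = gb_replicated * PER_GB_USD
    hourly_fee = hours * PER_HOUR_USD * replicators
    return round(data_fee + hourly_fee, 2)


if __name__ == "__main__":
    # One replicator running for 730 hours and replicating 1,000 GB.
    print(replicator_fee_usd(1000, 730))
```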

About the Author

Renato Losio
