
Tales of Kafka at Cloudflare: Andrea Medda and Matt Boyle at QCon London


At QCon London, Andrea Medda, senior systems engineer at Cloudflare, and Matt Boyle, engineering manager at Cloudflare, shared the lessons their platform services team learned from enabling the use of Apache Kafka at the scale of 1 trillion messages.

Boyle began by outlining the problems that Cloudflare needs its technology to solve, namely providing its own private and public cloud, and the operational challenge of coupling between teams that arose as their business needs grew and evolved. He went on to identify how Apache Kafka was selected as their implementation of the message bus pattern.

While the message bus pattern enabled the decoupling of load between microservices, Boyle explained how services still ended up being tightly coupled because of an unstructured approach to schema management. To solve this problem, they opted to migrate from JSON messages to Protobuf and to build a client-side library to validate messages prior to publishing them.
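The client-side validation idea can be sketched as follows. This is a minimal illustration, not Cloudflare's actual library: the `Message` fields, `validate` rules, and `publish` signature are all hypothetical stand-ins for what generated Protobuf code would enforce.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical envelope; field names are illustrative, not Cloudflare's schema.
    topic: str
    payload: bytes


class SchemaError(ValueError):
    """Raised when a message violates the agreed contract."""


def validate(msg: Message) -> None:
    # Stand-in for generated Protobuf validation: reject contract-breaking
    # messages in the producing service, before they reach the broker.
    if not msg.topic:
        raise SchemaError("message has no topic")
    if not msg.payload:
        raise SchemaError("empty payload")


def publish(msg: Message, producer_send) -> None:
    # Validate client-side, then hand off to the real Kafka producer
    # (e.g. a confluent_kafka Producer.produce call in real code).
    validate(msg)
    producer_send(msg)
```

The key property is that an invalid message fails fast in the producer, so consumers never have to defend against payloads that drift from the shared schema.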

As the adoption of Apache Kafka grew across their teams, they developed a Connector Framework to make it easier for teams to stream data between Apache Kafka and other systems while transforming the messages in the process.
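A connector of this kind can be reduced to a source-transform-sink loop. The sketch below is an assumption about the general shape of such a framework, not the team's implementation; `run_connector` and its arguments are hypothetical names.

```python
def run_connector(source, transforms, sink):
    """Read records from a source, apply transformers in order, write to a sink.

    A transformer may return None to filter a record out of the stream.
    """
    for record in source:
        for transform in transforms:
            record = transform(record)
            if record is None:
                break  # record was filtered; skip the sink
        if record is not None:
            sink(record)
```

In a real deployment the source would be a Kafka consumer and the sink another system (or vice versa), with the same pipeline shape in between.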

During the pandemic, as load on Cloudflare’s systems grew, the team began to observe bottlenecks in a key consumer, which had begun to breach its service-level agreements. Medda explained how the team's initial struggle to identify the root cause of the issue prompted them to enrich their software development kits (SDKs) with tooling from the OpenTelemetry ecosystem to gain better visibility into interactions across their stack.
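The effect of baking tracing into an SDK can be illustrated with a minimal span recorder. This is a self-contained stand-in, not the OpenTelemetry API itself: real code would use `opentelemetry.trace.get_tracer(...).start_as_current_span(...)`, and the `consume` handler here is hypothetical.

```python
import time
from contextlib import contextmanager

# Spans recorded by the sketch; a real SDK would export these to a collector.
SPANS = []


@contextmanager
def span(name, **attrs):
    # Record the operation name, duration, and attributes, mimicking what an
    # OpenTelemetry span captures around each produce/consume call.
    start = time.monotonic()
    try:
        yield
    finally:
        SPANS.append({"name": name, "duration_s": time.monotonic() - start, **attrs})


def consume(message):
    # The SDK wraps the application's handler so every consumed message
    # leaves a span, making slow consumers visible without per-team effort.
    with span("kafka.consume", topic=message["topic"]):
        pass  # application handler runs here
```

Because the instrumentation lives in the SDK rather than in each service, every team gets the same visibility for free, which is what made the root-cause hunt tractable.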

Medda went on to highlight how the success of their SDKs brought more internal users, which spurred a need for better support in the form of documentation and ChatOps.

Medda summarized the key lessons as:

  • Striking the balance between highly configurable and simple standardized approaches when providing developer tooling for Apache Kafka
  • Opting for a simple and strict 1:1 contract interface to ensure maximum visibility into the workings of topics and their usage
  • Investing in metrics on development tooling to allow problems to be easily surfaced
  • Prioritizing clear documentation on patterns for application developers to enable consistency in adoption and use of Apache Kafka

Finally, Boyle shared a new internal product, called Gaia, that the team was building to enable push-button creation of services according to Cloudflare’s best practices.



Community comments

  • actually...

    by Mac Noodle,


    "Matt explained how services still ended up being tightly coupled because of an unstructured approach to schema management. To solve this problem, they opted to migrate from JSON messages to Protobuf and to build a client-side library to validate messages prior to publishing them."

    The opposite is true. JSON allows loose coupling.
    Using something like Protobuf/AVRO and client libs creates tight coupling.

  • Re: actually...

    by Nsikan Essien,


    Hi Mark N, thanks for commenting. I'd be interested to hear how you approach this problem in your organisation.

  • Re: actually...

    by Daniel Bryant,


    This is a great discussion point! Theoretically, a loosely typed data exchange format like JSON should promote loose coupling. However, for me, JSON is just an implementation detail, and unless designers specifically design with loose coupling in mind (and use something like JSON schema or contract-based testing), it's all too easy to create a tightly coupled system.

    I've worked on a number of microservice systems that used JSON for data exchange, and unless all services within the system strictly followed Postel's Law ("be conservative in what you send, be liberal in what you accept"), the reality is that any change to the data payload (and implicit schema/contract) often had wide-ranging effects that are very difficult to track and predict. In my mind, this is the very definition of tight coupling :)

    With a strongly typed schema (like that offered by Protobuf et al.), all services within a system explicitly agree upon and enforce the contract. In my experience, this makes it much easier to understand and identify the impact of any changes. And providing we don't explicitly change the interface/contract, any changes within the service should not leak to other services.
