Pinterest Open-Sources a Production-Ready PubSub Java Client for Kafka, Flink, and MemQ


Pinterest open-sourced its generic PubSub client library, PSC, which has been heavily used in production for a year and a half. The library increased developer velocity for the engineering teams and improved the scalability and stability of the services using it. Over 90% of Pinterest's Java applications have migrated to PSC with minimal changes.

Pinterest uses messaging infrastructure throughout its platform, including Apache Kafka, Apache Flink, and MemQ. Jeff Xiang, a software engineer at Pinterest, provides a summary of some of the challenges resulting from using different messaging backends:

Over the years, operational experience has taught us that our customers and business would greatly benefit from a unified PubSub interface that the platform team owns and maintains, so that application developers can focus on application logic instead of spending precious hours debugging client-server connectivity issues.

The company created a generic PubSub client library that provides a unified abstraction and enhanced features compared to native client libraries. PSC supports automated service discovery, optimized configurations, automated error handling, interceptors, and metrics. The library provides two primary interfaces, PSC Producer and PSC Consumer, each able to manage one or more backend producers or consumers.
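To make the idea of one interface managing several backend producers more concrete, here is a minimal sketch of a facade that picks a backend producer based on the topic reference. It is not PSC's actual API: the names `BackendProducer`, `UnifiedProducer`, `registerBackend`, and `activeBackendCount` are hypothetical, and the sketch assumes the backend name is the third colon-separated field of a seven-field topic string, as in the article's example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical backend-specific producer; this is not a PSC interface.
interface BackendProducer {
    void send(String topic, byte[] payload);
}

// Hypothetical facade: a single entry point that lazily creates and reuses
// one backend producer per backend named in the topic reference.
final class UnifiedProducer {
    private final Map<String, Supplier<BackendProducer>> factories = new HashMap<>();
    private final Map<String, BackendProducer> active = new HashMap<>();

    // Registers how to build a producer for a backend such as "kafka".
    void registerBackend(String backend, Supplier<BackendProducer> factory) {
        factories.put(backend, factory);
    }

    // Routes the message to the producer for the backend named in the topic reference.
    void send(String topicRn, byte[] payload) {
        // Assumed layout: protocol:/rn:backend:env:region:cluster:topic
        String[] parts = topicRn.split(":", 7);
        if (parts.length != 7 || !"/rn".equals(parts[1])) {
            throw new IllegalArgumentException("Not a fully qualified RN: " + topicRn);
        }
        String backend = parts[2];
        BackendProducer producer = active.get(backend);
        if (producer == null) {
            Supplier<BackendProducer> factory = factories.get(backend);
            if (factory == null) {
                throw new IllegalArgumentException("No backend registered for: " + backend);
            }
            producer = factory.get();
            active.put(backend, producer);
        }
        producer.send(parts[6], payload);
    }

    // Number of backend producers created so far.
    int activeBackendCount() {
        return active.size();
    }
}
```

In this sketch, application code only ever calls `send` with a topic string, so connection details and the choice of backend stay in one place that a platform team could own.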

The Architecture of the PubSub Client (Source: PSC GitHub Repository)

The library introduced Resource Names (RNs) to support automated service discovery for messaging topics. Each topic is referenced by a fully qualified RN string that contains all the information required to establish broker connections. For instance, `secure:/rn:kafka:prod:aws_us-west-1:shopping:transaction` specifies the topic, cluster, and region, as well as the backend (Kafka) that the client needs to connect to. This approach prevents the accidental misconfigurations that can occur with native clients, such as invalid host/port combinations, wrong SSL configuration options or credentials, and incorrect regions.
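As an illustration of how such a string can carry everything a client needs, the sketch below splits the example RN into its parts. The `ResourceName` class is a hypothetical helper, not part of PSC. The article confirms only that the string encodes the backend, region, cluster, and topic; treating `secure` as a protocol and `prod` as an environment is an assumption made for this example.

```java
import java.util.Objects;

// Hypothetical helper for illustration only; not part of the PSC library.
final class ResourceName {
    final String protocol;    // assumed meaning of "secure"
    final String backend;     // e.g. "kafka"
    final String environment; // assumed meaning of "prod"
    final String region;      // e.g. "aws_us-west-1"
    final String cluster;     // e.g. "shopping"
    final String topic;       // e.g. "transaction"

    private ResourceName(String protocol, String backend, String environment,
                         String region, String cluster, String topic) {
        this.protocol = protocol;
        this.backend = backend;
        this.environment = environment;
        this.region = region;
        this.cluster = cluster;
        this.topic = topic;
    }

    // Parses strings shaped like protocol:/rn:backend:env:region:cluster:topic.
    static ResourceName parse(String rn) {
        Objects.requireNonNull(rn, "rn");
        String[] parts = rn.split(":", 7);
        if (parts.length != 7 || !"/rn".equals(parts[1])) {
            throw new IllegalArgumentException("Not a fully qualified RN: " + rn);
        }
        for (String part : parts) {
            if (part.isEmpty()) {
                throw new IllegalArgumentException("Empty field in RN: " + rn);
            }
        }
        return new ResourceName(parts[0], parts[2], parts[3], parts[4], parts[5], parts[6]);
    }
}
```

Calling `ResourceName.parse` on the example string yields `kafka` as the backend, `aws_us-west-1` as the region, `shopping` as the cluster, and `transaction` as the topic, while a bare `host:port` string is rejected. A real client would then resolve the broker addresses and security settings from these fields rather than from hand-written configuration.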

Engineers developed a Flink-PSC connector to enable seamless migration for Flink-based workloads. The main migration challenge was ensuring that newly migrated jobs could recover their job state from Flink checkpoint files.

PubSub Client has 100% feature and API parity with native clients, which allowed Pinterest to migrate over 90% of its Java applications to PSC with minimal changes to their codebases. A migration usually involved replacing imports and references to the native client and switching client configurations to their PSC equivalents, including the new Resource Name (RN) strings.

The Impact of PSC Rollout on Flink Job Restarts (Source: Pinterest Engineering Blog)

Pinterest plans to introduce further enhancements to PSC, including automatic handling of more remediable errors, such as detecting and refreshing expiring SSL certificates. The company is also working on a C++ version, with a Python one on the roadmap. Lastly, the platform team is looking to use client tracking functionality to support client chargeback so that infrastructure costs can be attributed to projects and teams.

In response to the LinkedIn post, Aaron Lee commented:

Well done! It’s impressive how much downstream impact a well designed unified client for such a core piece of infrastructure has. I can see other large engineering teams finding a lot of value in utilizing this.
