Pinterest's Moka: How Kubernetes Is Rewriting the Rules of Big Data Processing

Digital pinboard provider Pinterest has published an article explaining its blueprint for the future of large-scale data processing with its new platform Moka. The company is moving core workloads from ageing Hadoop infrastructure to a Kubernetes-based system on Amazon EKS, with Apache Spark as the main engine and support for other frameworks on the way.

In a two-part blog series, Soam Acharya, Rainie Li, William Tom and Ang Zhang describe how the Pinterest Big Data Platform team considered alternatives for their next-generation massive-scale data processing platform as the limits of the existing Hadoop-based system, known internally as Monarch, became clear. They present Moka, their EKS-based, cloud-native data processing platform, as the outcome of that search; it now runs production workloads at Pinterest scale. Part one of the series focused on the overall design and the application layer, while part two turns to what the authors call "the infrastructure-focused aspects of Moka with learnings and future direction."

The post frames the move to Kubernetes in practical terms. It reflects an industry-wide shift in which large technology companies treat Kubernetes as a control plane for data processing, not only as a platform for stateless services. Encouraged by Kubernetes' growing adoption in the big data community, the team explored Kubernetes-based systems as the most likely replacement for Hadoop 2.x. Any candidate platform had to meet clear criteria around scalability, security, cost and the ability to host multiple processing engines. Moka is an example of how to modernise a Hadoop-era data platform without abandoning existing Spark investments.

A central theme of the second article is how to operate Spark on Kubernetes at very large scale. The authors explain how they added logging, metrics and job history services around Moka so that engineers could debug and tune jobs without needing to understand the underlying cluster topology. They standardised log collection with Fluent Bit and published uniform metrics through OpenTelemetry and Prometheus-compatible endpoints, giving both infrastructure and application teams a consistent view of system health.

Pinterest has also invested in making the platform reproducible through infrastructure-as-code. In the article, the team outlines how they use Terraform and Helm to create EKS clusters, configure networking and security, and deploy supporting components such as the Spark History Server.
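
Infrastructure-as-code pays off when deployments can be re-run safely. As an illustration only, the Python sketch below assembles a `helm upgrade --install` command, which installs a release on the first run and upgrades it on later runs. The release name, chart path, namespace and the `logDirectory` values key are hypothetical and do not come from the Pinterest articles.

```python
# Minimal sketch: build an idempotent Helm command for a supporting component
# such as a Spark History Server. The release name, chart path and the
# "logDirectory" values key are hypothetical; a real chart defines its own
# values schema.
import shlex


def helm_deploy_cmd(release: str, chart: str, namespace: str,
                    values_file: str, overrides: dict[str, str]) -> list[str]:
    # "upgrade --install" installs on first run and upgrades afterwards,
    # so re-running the same command converges on the same release state.
    cmd = [
        "helm", "upgrade", "--install", release, chart,
        "--namespace", namespace,
        "--create-namespace",     # create the namespace if it does not exist yet
        "--values", values_file,  # version-controlled values file
        "--wait",                 # return only once the resources are ready
    ]
    # Sorted overrides keep the generated command deterministic.
    for key, value in sorted(overrides.items()):
        cmd += ["--set", f"{key}={value}"]
    return cmd


if __name__ == "__main__":
    cmd = helm_deploy_cmd(
        release="spark-history",
        chart="./charts/spark-history-server",
        namespace="spark-infra",
        values_file="values/prod.yaml",
        overrides={"logDirectory": "s3a://example-bucket/spark-events"},
    )
    print(shlex.join(cmd))
```

In a Terraform-managed setup, the same release could instead be declared with the HashiCorp Helm provider's `helm_release` resource. Whatever a given chart calls the setting, it ultimately needs to point the History Server's `spark.history.fs.logDirectory` at the location where jobs write event logs via `spark.eventLog.enabled` and `spark.eventLog.dir`.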

Pinterest's engineers also discuss supporting multiple hardware architectures. They describe how they built multi-architecture images so that their data workloads run well on both Intel and ARM-based instances, including AWS Graviton, and link this to cost and efficiency goals at fleet scale. A LinkedIn summary of the project from InfoQ editor Eran Stiller notes that Moka "delivers container-level isolation, ARM support, YuniKorn scheduling, and significant cost savings by consolidating workloads and auto scaling across instance types". These details place the work within the broader trend of cloud users seeking to cut infrastructure costs without sacrificing performance.
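
The articles do not publish Moka's job configuration. Standard Spark-on-Kubernetes properties can still show how multi-architecture images and YuniKorn scheduling fit together. In this hedged Python sketch, the image name and queue are hypothetical. The `queue` label convention is also an assumption and should be checked against the YuniKorn version in use.

```python
# Minimal sketch: Spark-on-Kubernetes properties that pin a job to one CPU
# architecture and hand pod scheduling to Apache YuniKorn. The image name and
# queue are hypothetical; the property names are standard Spark settings
# (spark.kubernetes.scheduler.name requires Spark 3.3 or later).

# Values of the well-known kubernetes.io/arch node label for x86 and ARM nodes.
SUPPORTED_ARCHES = {"amd64", "arm64"}


def spark_k8s_conf(image: str, arch: str, queue: str = "root.default") -> dict[str, str]:
    if arch not in SUPPORTED_ARCHES:
        raise ValueError(f"unsupported architecture: {arch!r}")
    return {
        # A single multi-arch image tag: the container runtime pulls the variant
        # matching the node from the manifest list, so only the selector changes.
        "spark.kubernetes.container.image": image,
        # Schedule driver and executor pods only onto nodes of this architecture.
        "spark.kubernetes.node.selector.kubernetes.io/arch": arch,
        # Let YuniKorn, rather than the default scheduler, place the pods.
        "spark.kubernetes.scheduler.name": "yunikorn",
        # Queue labels (assumed convention) used by YuniKorn for quotas and fairness.
        "spark.kubernetes.driver.label.queue": queue,
        "spark.kubernetes.executor.label.queue": queue,
    }


def to_submit_args(conf: dict[str, str]) -> list[str]:
    # Render the properties as spark-submit "--conf key=value" pairs.
    args: list[str] = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    for arch in ("amd64", "arm64"):
        conf = spark_k8s_conf("registry.example.com/spark:3.5-multiarch", arch)
        print(arch, " ".join(to_submit_args(conf)))
```

The same image tag works on both architectures. Moving a job between x86 and Graviton nodes therefore becomes a configuration change rather than a rebuild, which is one plausible route to the cost flexibility described above.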

The broader industry conversation on processing engines adds nuance to Pinterest's story. In a separate LinkedIn post, Acharya writes that "while Spark is our primary workhorse, the success of Moka has meant other use cases at Pinterest are following suit: Flink Batch is in production with Apache Ray close behind and Flink Streaming slated for later this year". Technical deep dives on Spark and Flink explain why this matters. They stress that Spark remains well suited to large batch and interactive analytics workloads, while Flink is "purpose built for real time, stateful stream processing" with strict event by event handling. Rather than a Spark-only platform, the team presents Moka as a flexible foundation onto which different engines can be added, depending on the needs of each workload.
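
A few lines of code can show the difference between the batch and streaming models. The Python sketch below is a toy, not Spark or Flink code. It counts hypothetical (user, action) events in two ways: once over a bounded list, and then event by event while carrying per-key state.

```python
# Toy contrast between the two processing models, in plain Python rather than
# Spark or Flink code: a batch job computes over a bounded dataset once, while
# a streaming job updates per-key state and emits a result for every event.
from collections import Counter
from typing import Iterable, Iterator

Event = tuple[str, str]  # (user, action); a hypothetical event shape


def batch_counts(events: list[Event]) -> dict[str, int]:
    # Batch: all input is available up front, so aggregate it in one pass
    # and produce a single final answer.
    return dict(Counter(user for user, _ in events))


def streaming_counts(events: Iterable[Event]) -> Iterator[tuple[str, int]]:
    # Streaming: handle events one at a time, keeping per-key state between
    # them and emitting an updated count immediately after each event.
    state: dict[str, int] = {}
    for user, _ in events:
        state[user] = state.get(user, 0) + 1
        yield user, state[user]


if __name__ == "__main__":
    events = [("alice", "pin"), ("bob", "click"), ("alice", "save")]
    print(batch_counts(events))            # {'alice': 2, 'bob': 1}
    print(list(streaming_counts(events)))  # [('alice', 1), ('bob', 1), ('alice', 2)]
```

The last value emitted for each key matches the batch answer. The streaming version, however, produces results as events arrive. The cost is that the state must be kept and protected across failures, which Flink handles with checkpointing.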

External observers have drawn lessons from the Pinterest case. The ML Engineer newsletter describes the Moka article as an example of "deploying EKS clusters, Fluent Bit logging, OTEL metrics pipelines, image management, and a custom Moka UI for Spark on Kubernetes", placing it alongside other case studies in modern data infrastructure. These reactions suggest that Moka is being treated as something of a reference architecture for a class of cloud-native data systems.

The team do, however, present their migration work as an ongoing journey rather than a finished project. In the blog and a further LinkedIn post, the Pinterest authors discuss "learnings and future direction" and describe how early proofs of concept led to a phased move away from Hadoop as confidence in the new stack grew. Acharya notes that "the best problems show up at scale" and that building the platform involved "working out the kinks" as the team shifted real workloads. For other organisations, this may be the most important lesson. Copying the technical choices around Kubernetes, EKS and Spark is relatively simple. The real work is likely to lie in decoupling from legacy systems and investing in observability, automation and multi-engine support.

About the Author

Matt Saunders

This content is in the DevOps topic
