Performance & Scalability Content on InfoQ
-
Dynein – an Asynchronous Background Job Service from Airbnb
Airbnb moves time-consuming, resource-intensive tasks to asynchronous background jobs to improve scalability. The job scheduling system has become a very important component, so the team built Dynein, a distributed delayed job queueing service and scheduler. In a blog post, Andy Fang from Airbnb describes the background and challenges in designing and building the service.
-
HAProxy EBtree: Design for a Scheduler, and Use (Almost) Everywhere
At QCon New York 2019, Andjelko Iharos presented how CTO Willy Tarreau and the HAProxy team implemented a scheduler using an EBtree data structure to optimize the performance and memory usage of the HAProxy load balancer.
-
Microsoft Introduces Azure Front Door, a Scalable Service for Protecting Web Applications
In a recent blog post, Microsoft announced the general availability (GA) of Azure Front Door (AFD), a scalable and secure entry point for web applications. The underlying technology in Azure Front Door has been in place inside Microsoft for the past five years, where it has enabled scaling and protection for many popular Microsoft services including Office 365, Xbox, and Microsoft Teams.
-
Scaling Graphite at Booking.com
Booking.com's engineering team scaled their Graphite deployment from a small cluster to one that handles millions of metrics per second. Along the way, they modified and optimized Graphite's core components: carbon-relay, carbon-cache, and the rendering API.
-
Scaling Apache Kafka at Pinterest
Apache Kafka is used at Pinterest for transporting data for real-time streaming applications, logging, and visibility metrics for monitoring. Hosted on AWS, Pinterest’s Kafka installation uses the MirrorMaker and DoctorKafka tools for replication and high availability.
-
The Evolution of Uber’s 100+ Petabyte Big Data Platform
Uber’s engineering team wrote about how their big data platform evolved from traditional ETL jobs with relational databases to one based on Hadoop and Spark. A scalable ingestion model, standard transfer format and a custom library for incremental updates are the key components of the platform.
-
Scaling Global Traffic at Dropbox with Edge Locations and GSLB
The Dropbox engineering team shared their experience of architecting and scaling their global network of edge locations. Located around the globe, these run a custom stack of nginx and IPVS and connect to the Dropbox backend servers over their backbone network. A combination of GeoDNS and BGP Anycast ensures availability and low latency for end users.
-
Supercharging Marketo's Campaign Engine at Reactive Summit
Marketo is a marketing automation platform that executes over 20 billion customer-defined actions per month. Apurva Pawar, Daniel Pugliese, Dennis Bronnikov and Pei-Chiang Ma from Marketo’s engineering team explained at Reactive Summit how they rewrote the core of their system with Akka and a reactive approach.
-
Amazon S3 Increases Request Rate Performance and Drops Randomized Prefix Requirement
Amazon Web Services (AWS) recently announced significantly increased S3 request rate performance and the ability to parallelize requests to scale to the desired throughput. Notably, this performance increase also "removes any previous guidance to randomize object prefixes" and enables the use of "logical or sequential naming patterns in S3 object naming without any performance implications".
-
Facebook Open Sources LogDevice - a Distributed Data Store for Log Storage
Facebook open sourced their internal distributed log storage project called LogDevice. It offers high write availability using replication, durable log storage and recovery from failure.
-
How Coinbase Handled Scaling Challenges on Their Cryptocurrency Trading Platform
Coinbase, a digital currency exchange, faced scaling challenges on their platform during the 2017 cryptocurrency boom. To resolve them, the engineering team focused on upgrading and optimizing MongoDB, segregating traffic for hotspots, and building capture and replay tools to prepare for future surges.
-
Hyperledger Adds "Caliper" to Measure Blockchain Performance across Implementations
On March 19th, Hyperledger announced Caliper has been accepted by the Technical Steering Committee as a Hyperledger project. Hyperledger Caliper is a blockchain benchmark tool that allows projects to consistently track performance characteristics across different blockchain implementations.
-
How Booking.com Uses Kubernetes for Machine Learning
At the QCon London conference, Sahil Dua explained how Booking.com was able to use Kubernetes to scale machine learning (ML) models for recommending destinations and accommodation to their customers. In particular, he stressed how Kubernetes elasticity and resource starvation avoidance on containers help them run computationally (and data) intensive, hard-to-parallelize machine learning models.
-
Handling Traffic Spikes from Global Events at Facebook Live
Facebook Live’s engineers talked about how they scale their systems to handle traffic from both predicted and unpredicted events. While the latter is handled by their globally distributed architecture, the former involves careful advance planning and load testing.
-
Smart Replies for Member Messages at LinkedIn
LinkedIn has launched a new natural language processing (NLP) recommendation engine which is used to provide members with smart-reply recommendations to messages. The models and infrastructure development process has been documented in detail in a recent blog post by the engineering team.