Availability Content on InfoQ
-
Erlang-Runtime Statically-Typed Functional Language Gleam Reaches 1.0
Gleam, an actor-based highly-concurrent functional language running on the Erlang virtual machine (BEAM), has reached version 1.0, which means it is now ready to be used in production systems with a guarantee of backward compatibility based on semantic versioning.
-
Uber Improves Resiliency of Microservices with Adaptive Load Shedding
Uber created a new load-shedding library for its microservice platform, which serves over 130 million customers and handles aggregated peaks of millions of requests per second (RPS). The company replaced its QALM-based solution with the Cinnamon library, which, in addition to graceful degradation, can dynamically and continuously adjust both the capacity of the service and the amount of load shedding.
-
How RevenueCat Manages Caching for Handling over 1.2 Billion Daily API Requests
RevenueCat extensively uses caching to improve the availability and performance of its product API while ensuring consistency. The company shared its techniques to deliver the platform, which can handle over 1.2 billion daily API requests. The team at RevenueCat created an open-source memcache client that provides several advanced features.
-
Slack Migrates to Cell-Based Architecture on AWS to Mitigate Gray Failures
Slack migrated most of the critical user-facing services from a monolithic to a cell-based architecture over the last 1.5 years. The move was triggered by the impact of networking outages affecting a single availability zone, causing user-impacting service degradation. The new architecture allows incrementally draining all the traffic away from the affected availability zone within 5 minutes.
-
Automated Horizontal Scaling with Amazon Aurora Limitless Database
AWS recently announced the preview of Amazon Aurora Limitless Database, a new capability supporting automated horizontal scaling to process millions of write transactions per second and manage petabytes of data in a single Aurora database.
-
Monzo Employs Targeted Traffic Shedding against Stampeding Herd Effect from the Mobile App
Monzo developed a solution for shedding traffic in case its platform comes under intense and unexpected load that could lead to an outage. Traffic spikes can be generated by the mobile app and triggered by push notifications or other bursts in user activity. The solution can reduce the read traffic by almost 50% with 90% overall accuracy without noticeable customer impact.
-
How Amazon Prime Video Delivers 99.999% Availability While Reducing Costs
Amazon Prime Video created a highly available live video streaming architecture by combining redundant components to achieve the five nines of availability that it requires for its platform. The company optimized the deployment topology and video encoding to reduce costs while ensuring optimal video quality for users.
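As a rough illustration of why redundancy raises availability, the sketch below computes the combined availability of several identical components running in parallel. It assumes failures are independent and that the system is up if at least one replica is up; the function name and the figures are hypothetical and are not taken from Prime Video's architecture.

```python
def combined_availability(component_availability: float, replicas: int) -> float:
    """Availability of `replicas` redundant components in parallel.

    Assumes independent failures: the system is down only when every
    replica is down at the same time.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("component_availability must be between 0 and 1")
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    unavailability = (1.0 - component_availability) ** replicas
    return 1.0 - unavailability


if __name__ == "__main__":
    # Illustrative numbers only: three replicas, each available 99% of the time.
    print(f"{combined_availability(0.99, 3):.6f}")
```

In practice, failures are rarely fully independent (shared networks, regions, or deployments), so real systems usually fall short of this idealized figure.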
-
AWS Introduces Amazon Route 53 Resolver on AWS Outposts Rack
AWS recently announced that Amazon Route 53 Resolver is now available on AWS Outposts rack, providing on-premises services and applications with local Domain Name System (DNS) resolution directly from Outposts. Local Route 53 Resolver endpoints also enable DNS resolution between Outposts and on-premises DNS servers.
-
Microsoft Azure Cross-Region (Global) Load Balancer Now Generally Available
Microsoft recently announced the general availability (GA) of Azure cross-region (Global) Load Balancer in all Azure public and national cloud regions.
-
Azure Cosmos DB Integration with Vercel Now in Public Preview
Microsoft recently announced the public preview of the Vercel and Azure Cosmos DB integration, which allows developers to easily create Vercel applications with an already configured Azure Cosmos DB database.
-
AWS Adds Multi-AZ with Standby Support to OpenSearch Service
OpenSearch Service recently introduced support for Multi-AZ with Standby, a new deployment option for the search and analytics engine that provides 99.99% availability and better performance for business-critical workloads.
-
Meta Switches to MySQL Raft to Improve Reliability and Operational Simplicity
Meta is rolling out MySQL Raft in its data centers to replace its current MySQL semisynchronous databases. The new consensus engine simplifies operations and allows MySQL servers to take responsibility for promotions and membership.
-
Testing Advanced Driver Assistance Systems
Advanced driver assistance systems can have a huge number of test cases. Cutting the elephant into smaller pieces can ensure that every part is tested. A good test environment is essential for covering all required tests efficiently, quickly, and flexibly to ensure quality. Testers should be involved in the project right from the beginning to avoid task forces and quality or delivery problems.
-
Atlassian Exceeds 99.9999% of Availability Using Sidecars and Highly Fault-Tolerant Design
Atlassian recently published how it exceeded 99.9999% availability with its Tenant Context Service. Atlassian achieved this high availability by implementing highly autonomous client sidecars that can proactively shield themselves from complete AWS region failures. To accomplish this, sidecars query multiple services concurrently and ensure that requests are entirely isolated internally.
-
Slack Implements Circuit Breakers to Improve CI/CD Pipeline Availability
Slack recently published how it implemented the Circuit Breaker pattern to improve its CI/CD pipeline availability. Before this project, engineers at Slack faced challenges as peak request volumes in internal tooling caused cascading failures in dependent systems. Since completion, engineers have seen increased service availability and fewer bad developer experiences, such as flakiness from failing services.
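For readers unfamiliar with the Circuit Breaker pattern, the following is a minimal generic sketch, not Slack's implementation. The class name, thresholds, and the injectable `clock` parameter are all assumptions made for illustration. After a configurable number of consecutive failures the breaker opens and rejects calls immediately; once a timeout elapses it lets one trial call through (half-open) and closes again if that call succeeds.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # consecutive failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock                          # injectable for testing (assumption)
        self.failures = 0
        self.opened_at = None                       # None means the breaker is closed

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, func, *args, **kwargs):
        # Fail fast while open so the struggling dependency gets no extra load.
        if self.state == "open":
            raise CircuitOpenError("dependency unavailable; request rejected")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call (half-open) re-opens immediately;
            # in the closed state we wait for the threshold.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success: reset the failure count and close the breaker.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each request to a downstream service in `breaker.call(...)` and treat `CircuitOpenError` as a signal to back off or defer work rather than retry immediately.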