Infrastructure Content on InfoQ
-
Article Series: Cell-Based Architectures: How to Build Scalable and Resilient Systems
In this article series, we take readers on a journey of discovery and provide a comprehensive overview and in-depth analysis of many key aspects of cell-based architectures, as well as practical advice for applying this approach to existing and new architectures.
-
Building a Global Caching System at Netflix: a Deep Dive to Global Replication
Netflix's EVCache system handles 400M ops/second across 22,000 servers, managing 14.3 PB of data. This infrastructure ensures global availability and resilience through intelligent data routing and flexible replication strategies. By implementing batch compression and switching to DNS-based discovery, Netflix optimizes efficiency, reduces bandwidth usage and significantly lowers operational costs.
-
How to Minimize Latency and Cost in Distributed Systems
Explore the benefits and challenges of microservices architecture in cloud environments, focusing on achieving resilience and high availability while managing costs and performance issues.
-
Are You Done Yet? Mastering Long-Running Processes in Modern Architectures
In this article, Bernd Ruecker explores the importance of long-running processes in various applications, particularly in distributed systems. He emphasizes the value of asynchronous communication and explores strategies like Centers of Excellence, along with visual tools like BPMN for enhancing communication and understanding. The contents of this article were presented during QCon London 2024.
-
Using GreenOps to Improve Your Operational Efficiency and Save the Planet
Our infrastructures have environmental and economic costs; the IT sector is responsible for 1.4% of carbon emissions worldwide. GreenOps can be used to help mitigate this impact.
-
Architecting for High Availability in the Cloud with Cellular Architecture
Cellular architecture is a design pattern that helps achieve high availability in multi-tenant applications. The goal is to design your application so that you can deploy all of its components into an isolated "cell" that is fully self-sufficient. This can improve availability for your customers and help you meet your SLAs.
-
Zero-Knowledge Proofs for the Layman
This article introduces zero-knowledge proofs, a kind of cryptography you can use to prove that you know a secret, such as a private key or the solution to a problem, without ever sharing it with an interested party. While many articles exist on the topic, this one does not require any advanced math knowledge.
-
Orchestrating Resilience: Building Modern Asynchronous Systems
In this article, we discuss the problems we had to solve at Twilio to efficiently build a resilient and scalable asynchronous system that handles a complex workflow. We also cover the advantages we gained from adopting a Workflow Orchestration solution, including abstracting away state management and out-of-the-box support for retries, observability, and auditability.
-
Optimizing Resource Utilization: the Benefits and Challenges of Bin Packing in Kubernetes
Optimizing Kubernetes usage is an important part of a responsible cloud strategy. Bin packing is an effective strategy for maximizing the usage of each node.
-
The Role of Digital Twins in Unlocking the Cloud's Potential
This article explores the use of the digital twin (DT) concept as a new way to make cloud services more developer-friendly. This new model aligns the development, deployment, and now the runtime aspects of a microservice into a single, cohesive unit, bridging the gap between developers and the cloud and paving the way for a new era of cloud services.
-
Service Assurance in Private LTE/5G Networks
This article talks about service assurance in the context of cellular networks, how private networks pose additional needs, and how an end-to-end service assurance framework can be designed and developed for such networks.
-
Debugging outside Your Comfort Zone: Diving beneath a Trusted Abstraction
This article takes a deep dive into a complex outage in the main database cluster of a payments company. We’ll focus on the aftermath of the incident: understanding what went wrong, recreating the outage in a test cluster, and coming up with a way to stop it from happening again. Along the way, we’ll dive into the internals of Postgres and learn how it stores data on disk.