Articles

Culture & Methods

Adaptive Responses to Resiliently Handle Hard Problems in Software Operations

As engineers move into more senior roles such as Staff Engineer, Architect, or Sr Tech Lead, their knowledge and experience are often applied across the system. This expertise is increasingly needed for handling novel problems or designing innovative solutions to complex ones. This article discusses strategies for approaching your role as a senior member of your organization.

Laura Maguire
on Oct 23, 2024

Architecture & Design

Taking Advantage of Cell-Based Architectures to Build Resilient and Fault-Tolerant Systems

Cell-based architectures offer a robust approach to building resilient systems. They achieve this through the core principles of isolation, autonomy, and replication. Each cell manages its resources and makes decisions autonomously. Observability for cell-based architecture requires a tailored approach to address the unique challenges and opportunities presented by this distributed system design.

Yury Niño Roa
on Oct 21, 2024

Cloud

Optimizing Wellhub Autocomplete Service Latency: a Multi-Region Architecture

Every company wants fast, reliable, and low-latency services. Achieving these goals requires significant investment and effort. In this article, I will share how Wellhub invested in a multi-region architecture to achieve a low-latency autocomplete service.

Matheus Felisberto
on Oct 17, 2024

Architecture & Design

Article Series: Cell-Based Architectures: How to Build Scalable and Resilient Systems

In this article series, we take readers on a journey of discovery and provide a comprehensive overview and in-depth analysis of many key aspects of cell-based architectures, as well as practical advice for applying this approach to existing and new architectures.

Rafal Gancarz
on Oct 14, 2024

Architecture & Design

How Cell-Based Architecture Enhances Modern Distributed Systems

Cell-based architecture has emerged as a response to many challenges associated with distributed systems. It employs the bulkhead pattern to isolate failures to a fraction of the affected infrastructure footprint and prevent widespread impact. Cells can also help organize large architectures into domain-bound deployment and delivery units, which provides essential sociotechnical benefits.

Erica Pisani, Rafal Gancarz
on Oct 14, 2024

Architecture & Design

Building a Global Caching System at Netflix: a Deep Dive to Global Replication

Netflix's EVCache system handles 400M ops/second across 22,000 servers, managing 14.3 PB of data. This infrastructure ensures global availability and resilience through intelligent data routing and flexible replication strategies. By implementing batch compression and switching to DNS-based discovery, Netflix optimizes efficiency, reduces bandwidth usage and significantly lowers operational costs.

Sriram Rangarajan, Prudhviraj Karumanchi
on Oct 11, 2024

DevOps

Proactive Approaches to Securing Linux Systems and Engineering Applications

Maintaining a strong security posture is challenging, especially with Linux. An effective approach is proactive and includes patch management, optimized resource allocation, and effective alerting.

Prashanth Ravula
on Oct 07, 2024

Java

How Functional Programming Can Help You Write Efficient, Elegant Web Applications

Many things can make software more challenging to understand and, consequently, to maintain. One of the most complex and problematic causes is internal mutable state. When internal state is poorly managed, the software behaves unexpectedly, leading to bugs and to fixes that introduce unnecessary complexity. Functional programming (FP) addresses this problem by providing immutability mechanisms and more.

Uberto Barbini
on Oct 04, 2024

AI, ML & Data Engineering

Virtual Panel: What to Consider When Adopting Large Language Models

Four experts discuss some issues people should think about when adopting LLMs and how they can make the best choice for their specific use case. Topics include how to choose between an API-based vs. self-hosted LLM, when to fine-tune an LLM, how to mitigate LLM risks, and what non-technical changes organizations need to make when adopting LLMs.

Anthony Alford, Meryem Arik, Numa Dhamani, Maggie Engler, Tingyi Li
on Oct 01, 2024

Cloud

How to Minimize Latency and Cost in Distributed Systems

Explore the benefits and challenges of microservices architecture in cloud environments, focusing on achieving resilience and high availability while managing costs and performance issues.

Amir Souchami
on Sep 25, 2024

AI, ML & Data Engineering

Navigating LLM Deployment: Tips, Tricks, and Techniques

This article focuses on self-hosted LLMs and how to get the best performance from them. The author provides best practices on how to overcome challenges due to model size, GPU scarcity, and a rapidly evolving field.

Meryem Arik
on Sep 24, 2024

DevOps

Building Better Platforms with Empathy: Case Studies and Counter-Examples

Scaling platform development often means absorbing cognitive burdens, but empathy is key. Understanding users beyond their immediate issues leads to better solutions. Platforms help manage growth's complexity, but a product mindset with user-centricity is vital. In his talk at QCon San Francisco 2023, David Stenglein expanded on cultivating empathy through open communication.

David Stenglein
on Sep 23, 2024