Application Performance Management Content on InfoQ

Articles

DevOps

Analyzing Apache Kafka Stretch Clusters: WAN Disruptions, Failure Scenarios, and DR Strategies

This article analyzes the dynamics of Apache Kafka stretch clusters, assessing WAN disruptions and failure scenarios and outlining Disaster Recovery (DR) strategies. The aim is high availability and data integrity across multi-region deployments, with operational resilience that protects vital services against service level agreement violations.

Srikanth Daggumalli, Nishchai Jayanna Manjula
on Jun 20, 2025
Cloud

We Took Developers out of the Portal: How APIOps and IaC Reshaped Our API Strategy

This article describes transforming legacy API management into an APIOps framework using Infrastructure as Code (IaC). It covers automating API lifecycles, enhancing security, and improving developer productivity through CI/CD integration, with the goal of consistency across environments and faster deployment.

Balakrishna Sudabathula
on Jun 12, 2025
Architecture & Design

Using Traffic Mirroring to Debug and Test Microservices in Production-Like Environments

Traffic mirroring has evolved from a network security tool to a robust method for debugging and testing microservices using real-world data. By safely duplicating production traffic to a shadow environment, teams can replicate elusive bugs, profile performance under actual load, validate new features, and detect regressions, all while production remains isolated and the user experience stays intact.

Apoorv Mittal
on Jun 09, 2025
Culture & Methods

InfoQ Culture and Methods Trends Report - 2025

This report summarizes how the InfoQ Culture and Methods editorial team sees the ongoing and emergent trends in the culture and methods space.

Shane Hastie, Charity Majors, Ben Linders, Rafiq Gemmail, Craig Smith
on May 09, 2025
Architecture & Design

Applying Flow Metrics to Design Resilient Microservices

Designing software for resilience acknowledges the reality that everything fails. We put metrics in place to help us detect and resolve such failures. Flow metrics, commonly used to measure how well teams deliver software, can also be used to measure and improve system resilience.

Mourjo Sen
on Mar 26, 2025
AI, ML & Data Engineering

Beyond Notebook: Building Observable Machine Learning Systems

In this article, the author discusses a machine learning pipeline with built-in observability for a credit card fraud detection use case, using tools like MLflow, FastAPI, Streamlit, Apache Kafka, Prometheus, Grafana, and Evidently AI.

Lakshmithejaswi Narasannagari
on Mar 14, 2025
AI, ML & Data Engineering

Secure AI-Powered Early Detection System for Medical Data Analysis & Diagnosis

In this article, the author discusses techniques for securing AI applications in healthcare, using an early detection system for medical data analysis and diagnosis as the use case. The proposed layered architecture includes application components to support secure computation, AI modeling, governance and compliance, and monitoring and auditing.

Mahesh Vaijainthymala Krishnamoorthy
on Mar 03, 2025
AI, ML & Data Engineering

A Framework for Building Micro Metrics for LLM System Evaluation

LLM accuracy is a challenging topic to address and is much more multi-dimensional than a simple accuracy score. Denys Linkov introduces a framework for creating micro metrics to evaluate LLM systems, focusing on goal-aligned metrics that improve performance and reliability. By adopting an iterative "crawl, walk, run" methodology, teams can incrementally develop observability.

Denys Linkov
on Jan 21, 2025
Architecture & Design

Transforming Legacy Healthcare Systems: a Journey to Cloud-Native Architecture

Discover how Livi navigated the complexities of transitioning MJog, a legacy healthcare system, to a cloud-native architecture, sharing valuable insights for successful tech modernization. Our experience illustrates that transitioning from legacy systems to cloud-based microservices is not a one-time project, but an ongoing journey.

Leander Vanderbijl
on Nov 18, 2024
Architecture & Design

Cell-Based Architecture Adoption Guidelines

The challenges in building modern, reliable, and understandable distributed systems continue to grow, and cell-based architecture is a valuable way to accept and isolate failures while staying reliable. Organizations must ensure that cell-based architecture is the right fit for them and that the migration will not cause more problems than it solves.

Guy Coleman
on Nov 04, 2024
Architecture & Design

Securing Cell-Based Architecture in Modern Applications

Securing cell-based architecture is essential to fully capitalize on its benefits while minimizing risks. To achieve this, comprehensive security measures must be put in place. Organizations can start by isolating and containing cells using sandbox environments and strict access control mechanisms like role-based and attribute-based access control.

Stefania Chaplin
on Oct 28, 2024
Architecture & Design

Taking Advantage of Cell-Based Architectures to Build Resilient and Fault-Tolerant Systems

Cell-based architectures offer a robust approach to building resilient systems. They achieve this through the core principles of isolation, autonomy, and replication. Each cell manages its resources and makes decisions autonomously. Observability for cell-based architecture requires a tailored approach to address the unique challenges and opportunities presented by this distributed system design.

Yury Niño Roa
on Oct 21, 2024
