Observability Content on InfoQ

Articles

Culture & Methods

InfoQ Culture and Methods Trends Report - 2025

This report summarizes how the InfoQ Culture and Methods editorial team sees the ongoing and emergent trends in the culture and methods space.

Shane Hastie, Charity Majors, Ben Linders, Rafiq Gemmail, Craig Smith
on May 09, 2025
Architecture & Design

Applying Flow Metrics to Design Resilient Microservices

Designing software for resilience is an acknowledgement of the reality that everything fails. We put metrics in place to help us detect and resolve such problems and failures. Flow metrics, commonly used to measure how well teams deliver software, can also be used to measure and improve system resilience.

Mourjo Sen
on Mar 26, 2025
AI, ML & Data Engineering

Beyond Notebook: Building Observable Machine Learning Systems

In this article, the author discusses a machine learning pipeline with built-in observability for a credit card fraud detection use case, using tools such as MLflow, FastAPI, Streamlit, Apache Kafka, Prometheus, Grafana, and Evidently AI.

Lakshmithejaswi Narasannagari
on Mar 14, 2025
AI, ML & Data Engineering

Secure AI-Powered Early Detection System for Medical Data Analysis & Diagnosis

In this article, the author discusses techniques for securing AI applications in healthcare, using an early detection system for medical data analysis and diagnosis as the use case. The proposed layered architecture includes application components that support secure computation, AI modeling, governance and compliance, and monitoring and auditing.

Mahesh Vaijainthymala Krishnamoorthy
on Mar 03, 2025
AI, ML & Data Engineering

A Framework for Building Micro Metrics for LLM System Evaluation

LLM accuracy is a challenging topic and is far more multidimensional than a single accuracy score. Denys Linkov introduces a framework for creating micro metrics to evaluate LLM systems, focusing on goal-aligned metrics that improve performance and reliability. By adopting an iterative "crawl, walk, run" methodology, teams can develop observability incrementally.

Denys Linkov
on Jan 21, 2025
Architecture & Design

Transforming Legacy Healthcare Systems: a Journey to Cloud-Native Architecture

Discover how Livi navigated the complexities of transitioning MJog, a legacy healthcare system, to a cloud-native architecture, sharing valuable insights for successful tech modernization. Our experience illustrates that transitioning from legacy systems to cloud-based microservices is not a one-time project, but an ongoing journey.

Leander Vanderbijl
on Nov 18, 2024
Architecture & Design

Cell-Based Architecture Adoption Guidelines

The challenges of building modern, reliable, and understandable distributed systems continue to grow, and cell-based architecture is a valuable way to accept failures, isolate them, and stay reliable in the face of them. Organizations must ensure that cell-based architecture is the right fit for them and that the migration will not cause more problems than it solves.

Guy Coleman
on Nov 04, 2024
Architecture & Design

Taking Advantage of Cell-Based Architectures to Build Resilient and Fault-Tolerant Systems

Cell-based architectures offer a robust approach to building resilient systems. They achieve this through the core principles of isolation, autonomy, and replication. Each cell manages its own resources and makes decisions autonomously. Observability for cell-based architecture requires a tailored approach to address the unique challenges and opportunities presented by this distributed system design.

Yury Niño Roa
on Oct 21, 2024
DevOps

Elevating Kubernetes Logging for Enhanced Observability

In this article, we will explore the challenges, strategies, and best practices that will help you achieve seamless log management in your Kubernetes environment.

Prithvish Kovelamudi
on Jun 13, 2024
Cloud

Multi-Cloud Observability Using Fluent Bit

Explore the benefits and challenges of observability in multi-cloud deployments. See how Fluent Bit, a lightweight log collection and distribution tool, can enhance multi-cloud observability by improving cloud neutrality, cutting egress costs, and tackling compliance challenges.

Phil Wilkins
on Apr 23, 2024
DevOps

Orchestrating Resilience: Building Modern Asynchronous Systems

In this article, we will discuss the problems we had to solve at Twilio to efficiently build a resilient and scalable asynchronous system that handles a complex workflow, and the advantages we gained from adopting a workflow orchestration solution, including abstracting away state management and out-of-the-box support for retries, observability, and auditability.

Sai Pragna Etikyala, Vikranth Etikyala
on Jan 12, 2024
AI, ML & Data Engineering

InfoQ AI, ML, and Data Engineering Trends Report - September 2023

In this annual report, the InfoQ editors discuss the current state of AI, ML, and data engineering and what emerging trends you as a software engineer, architect, or data scientist should watch. We curate our discussions into a technology adoption curve with supporting commentary to help you understand how things are evolving.

Roland Meertens, Srini Penchikala, Sherin Thomas, Daniel Dominguez, Anthony Alford
on Sep 06, 2023