Observability Content on InfoQ

Articles

DevOps

How to Best Use MTT* Metrics to Optimize Your Incident Response

Selecting the correct MTT* metric to improve your incident response is important. If the wrong metric is chosen, the improvements may get lost in the noise of a multivariable equation. This article reviews the various MTT* metrics available and discusses the best scenarios for selecting each one.

Alex Ewerlöf
on Mar 17, 2022
DevOps

Why Change Intelligence is Necessary to Effectively Troubleshoot Modern Applications

Change Intelligence is often a missing component in incident management. Successfully correlating monitoring and observability data allows engineers to arrive at the root cause more rapidly. Telemetry provides the building blocks that enable change intelligence to identify and map the root cause, based on changes in the system and their broader impact.

Mickael Alliel
on Jan 24, 2022
DevOps

Why the Future of Monitoring Is Agentless

Traditionally, monitoring software has relied heavily on agent-based approaches for extracting telemetry data from systems. Observability requires better telemetry than agents currently provide. OpenTelemetry is driving advances in this area by creating a standard format and APIs to create, transmit, and store telemetry data. This unlocks new opportunities in observability.

Austin Parker
on Oct 15, 2021
DevOps

DevOps and Cloud InfoQ Trends Report - July 2021

This article summarizes how we see the "cloud computing and DevOps" space in 2021, which focuses on fundamental infrastructure and operational patterns, the realization of patterns in technology frameworks, and the design processes and skills that a software architect or engineer must cultivate.

Matt Campbell Steef-Jan Wiggers Shaaron A Alvares Helen Beal Daniel Bryant Lena Hall Rupert Field Aditya Kulkarni Jared Ruckle Renato Losio Holly Cummins
on Jul 19, 2021
Cloud

Solving Mysteries Faster with Observability

At QCon Plus, a virtual conference for senior software engineers and architects covering the trends, best practices, and solutions leveraged by the world's most innovative software organizations, Elizabeth Carretto discussed observability at Netflix and how their internal tool, Edgar, comes into play.

Elizabeth Carretto
on Jun 30, 2021
Cloud

Cloud Native and Kubernetes Observability: Expert Panel

InfoQ recently caught up with Observability experts to discuss several topics, including fundamental questions about what Observability really entails, the misconceptions and challenges that users are facing, the open standards that are influencing the industry in general, and why there is more interest in this area of late.

Rags Srinivas Liz Fong-Jones Bartłomiej Plotka Josh Suereth Frederic Branczyk
on May 06, 2021
Culture & Methods

Site Reliability Engineering Experiences at Instana

With the popularity of distributed architectures, distributed databases, containers and container orchestrators, an approach that emphasizes automation and a culture of collaboration is a natural fit for modern-day operations. Site Reliability Engineering takes engineering practices that have been established and proven in software engineering and applies them to the field of operations.

Bastian Spanneberg
on Apr 29, 2021
Architecture & Design

Software Architecture and Design InfoQ Trends Report—April 2021

An overview of how the InfoQ editorial team sees the Software Architecture and Design topic evolving in 2021, with a focus on what architects are designing for today.

Thomas Betts Holly Cummins Daniel Bryant Eran Stiller
on Apr 19, 2021
DevOps

Piercing the Fog: Observability Tools from the Future

Gaining visibility into distributed systems and how they are performing is challenging. Despite all the observability tools available for site reliability, debugging remains incredibly difficult, and many SREs would agree that their debugging processes have only marginally improved. This article explores how observability for troubleshooting could be done from the user’s point of view.

Srinath Perera
on Feb 10, 2021
DevOps

Instrumenting the Network for Successful AIOps

AIOps platforms empower IT teams to quickly find the root issues that originate in the network and disrupt running applications. AI/ML algorithms need access to high quality network data to determine what went wrong and where. Network visibility starts from TAPs around network equipment, and teams can add application instrumentation and logs as data sources for complete insights.

Ron Nevo
on Nov 23, 2020
Architecture & Design

Load Testing APIs and Websites with Gatling: It’s Never Too Late to Get Started

Conducting load tests against APIs and websites can both validate performance after a long stretch of development and provide useful feedback from an app in order to increase its scaling capabilities and performance. Engineers should avoid creating “the cathedral” of load testing and ending up with little time to improve performance overall. Write the simplest possible test and iterate from there.

Guillaume Corre
on Sep 19, 2020
Architecture & Design

Realtime APIs: Mike Amundsen on Designing for Speed and Observability

In a recent apidays webinar, Mike Amundsen, trainer and author of the recent O’Reilly book “API Traffic Management 101”, presented “High Performing APIs: Architecting for Speed at Scale”. Drawing on recent research by IDC, he argued that organisations will have to drive systemic changes in order to meet the upcoming increase in demand for business services consumed via APIs.

Daniel Bryant
on Aug 07, 2020