Observability Content on InfoQ

  • Chaos Engineering Observability: Q&A with Russ Miles

    In a new O’Reilly report, “Chaos Engineering Observability: Bringing Chaos Experiments into System Observability”, the author, Russ Miles, explores why he believes the topics of observability and chaos engineering “go hand in hand”. He argues that as engineers begin to run chaos experiments, they will need to be able to ask many questions about the underlying system being experimented on.

  • Three Pillars with Zero Answers: Rethinking Observability with Ben Sigelman

    At KubeCon NA, held in Seattle, USA, in December 2018, Ben Sigelman presented “Three Pillars, Zero Answers: We Need to Rethink Observability” and argued that many organisations may need to rethink their approach to metrics, logging and distributed tracing.

  • Testing Complex Distributed Systems: Sarah Wells Shares Lessons Learned

    The complexity in complex distributed systems isn’t in the code, it’s between the services or functions. Testing implies balancing finding problems versus delivering value, said Sarah Wells at the European Testing Conference. Testers often have the best understanding of what the system does; they have a good hypothesis about what went wrong, and are able to validate it pretty quickly.

  • Adopting Envoy as a Service-to-Service Proxy at Reddit

    Reddit introduced Envoy into their backend framework as a service-to-service proxy to support their ongoing architectural improvements. By adopting Envoy as a service-to-service Layer 4/Layer 7 proxy, they saw significant improvements in observability, ease of adoption, and performance.

  • The Evolution of Full Cycle Developers at Netflix: Greg Burrell at QCon SF

    At QCon San Francisco, Greg Burrell talked about the journey towards “full cycle developers” within the Netflix edge engineering team. Following the principle of “operate what you build”, developers within this team chose to take on more operational responsibility for their services, and were facilitated by comprehensive tooling, training and management support.

  • Shipping More Safely by Encouraging Ownership of Deployments

    Many incidents happen during or right after a release, argues Charity Majors, CEO at Honeycomb. She believes that stronger ownership of the deployment process by developers will ensure it is executed regularly and reduce risk. She argues for investment in tooling, high observability during and after release, and small, frequent releases to minimize the impact caused by shipping new code.

  • Scaling Observability at Uber: Building In-House Solutions, uMonitor and Neris

    Uber’s infrastructure consists of thousands of microservices supporting mobile applications, infrastructure, and internal services. To provide high observability of these services, Uber’s Observability team built two in-house monitoring solutions: uMonitor for time-series metrics-based alerting, and Neris for host-level checks and metrics.

  • O11ycon Discusses Benefits and Challenges of Observability

    The first o11ycon provides a comprehensive look at the emerging concept of observability in software and systems, which allows people to understand whether things are working as expected, to diagnose problems, and to identify solutions.

  • Observability and Microservices: The Need for Effective Tracing and Metrics

    Zach Jory has written an article discussing how microservices and service mesh implementations need observability to ensure that developers can build cloud-native applications which scale and can be more easily managed. This ties into a number of articles and interviews we have spoken about over recent months too.

  • Building Observable Distributed Systems

    Today's systems are increasingly complex: microservices are distributed over the network and scale dynamically, resulting in many more ways to fail, some of which we can't predict. Investing in observability gives us the ability to ask questions of our systems, including ones we never thought to ask before. Tools that can be used for this include metrics, tracing, and structured, correlated logging.

  • How Observability Impacts Testing: Q&A with Amy Phillips at QCon London

    Observability gives you a picture of the system’s current health and can replace certain types of testing. For low-risk application areas you can rely on observability instead of testing, provided you have continuous delivery that provides fast feedback and allows you to release changes quickly.

