Effective and Efficient Observability with OpenTelemetry

Daniel Gomez Blanco, Principal Engineer at Skyscanner and author of Practical OpenTelemetry, spoke at QCon London about a large-scale observability initiative at his company: adopting OpenTelemetry across hundreds of services, and the motivation for and value gained from adopting open standards across the entire organization.

Blanco opened with a question: when our systems change, how do we know what has changed? A primitive form of observability is using print statements. However, that does not scale well in real distributed systems. For example, the systems at Skyscanner are complex and distributed, with hundreds of services deployed and thousands of interconnections between them.

When a change is deployed, how does one determine how it affected a given service, and its dependencies, within a system as complex as Skyscanner’s? That’s where observability comes in.

Blanco explained why observability matters, how open standards help, how to roll it out in an organization, and how to adopt it in practice. Observability helps answer two questions. First, is my system behaving as expected after a deployment? Second, when a regression occurs, why is it not behaving as expected?

Effective observability means:

High granularity: detailed telemetry data corresponding to individual operations within system transactions
Rich context: considering multiple telemetry signals and dependencies under one single holistic view of the system
Signal correlation: linking metrics, traces, and logs under one single stream of events
Service correlation: relating telemetry from different services as part of the same common operation

Next, Blanco went into the details of effective and efficient observability with OpenTelemetry, the build-versus-buy decision between a vendor and open source, and rolling out and adopting OpenTelemetry at Skyscanner.

He ended with some key takeaways:

Complex systems require effective observability
Open standards empower simplification
OpenTelemetry enables signals to be used efficiently

InfoQ interviewed Daniel Gomez Blanco about effective and efficient observability.

InfoQ: How did you drive the adoption of OpenTelemetry at Skyscanner?

Daniel Gomez Blanco: At Skyscanner, we have been investing in developer enablement, in general, for years, and it continues to pay back. To roll out OpenTelemetry with minimal friction for service owners, our platform engineers worked on two main areas.

The first is a set of core libraries and base images containing a default, opinionated configuration that allows us to drive our observability strategy without requiring any code changes on the applications using them. In a way, these are like our own OpenTelemetry Distro. Any application running at Skyscanner uses these defaults, which configure aspects like instrumentation packages, plugins, exporters, etc. Thanks to this and OpenTelemetry’s compatibility with OpenTracing (now deprecated), our initial integration with OpenTelemetry involved only minor version bumps of these internal libraries.

The second is a set of OpenTelemetry Collectors that funnel all our telemetry data. This allows us to handle the last hop of telemetry and authentication to our observability platform and helps us re-aggregate and transform data as required. Combining these enables us to make the golden path the path of least resistance and get observability out-of-the-box for our service owners.

InfoQ: What is the most significant benefit of adopting OpenTelemetry?

Daniel Gomez Blanco: There are many benefits from adopting OpenTelemetry, and it's hard to choose one, but in my opinion, the most significant one is how OpenTelemetry’s API design allows putting telemetry instrumentation in the best hands possible, which can be the author of a library, a telemetry expert, or the maintainer of an application, depending on the situation.

All this telemetry can then be tied together under a single stream of correlated data rather than the old notion of three pillars of observability. When we instrument a given concept with metrics, traces, or logs, we can do it without having to decide on the backend where we will send this data or take library dependencies on a given SDK to do so. These details can be configured at application startup, enabling service owners to integrate with their telemetry backend of choice or to switch between them with zero code changes.

Most importantly, this allows applications, frameworks, and libraries to describe themselves in ways that make them observable, ubiquitously, in context with all other dependencies needed to run a distributed system.
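The separation described in this answer, where instrumentation is written once and the destination is chosen at startup, can be illustrated with a toy sketch. This is not the OpenTelemetry API: the `record_span` helper, the exporter functions, and the `DEMO_TRACES_EXPORTER` variable are all hypothetical names invented for this example. In OpenTelemetry itself, the comparable switch is typically made through SDK configuration, for instance the `OTEL_TRACES_EXPORTER` environment variable.

```python
import os
from typing import Callable, Dict, List, Mapping

Span = Dict[str, object]
Exporter = Callable[[Span], None]

# In-memory sink, used here only so the behaviour is easy to inspect.
collected: List[Span] = []


def console_exporter(span: Span) -> None:
    print("span:", span)


def memory_exporter(span: Span) -> None:
    collected.append(span)


def noop_exporter(span: Span) -> None:
    pass  # Telemetry is dropped when nothing is configured.


EXPORTERS: Dict[str, Exporter] = {
    "console": console_exporter,
    "memory": memory_exporter,
    "none": noop_exporter,
}

_exporter: Exporter = noop_exporter


def configure_from_env(environ: Mapping[str, str] = os.environ) -> None:
    """Pick the exporter once, at startup, from a (hypothetical) env var."""
    global _exporter
    name = environ.get("DEMO_TRACES_EXPORTER", "none")
    if name not in EXPORTERS:
        raise ValueError(f"unknown exporter: {name!r}")
    _exporter = EXPORTERS[name]


def record_span(name: str, **attributes: object) -> None:
    """Instrumentation-facing call: it knows nothing about the backend."""
    _exporter({"name": name, "attributes": attributes})


def search_flights(origin: str, destination: str) -> str:
    """Application code; it stays the same whichever exporter is configured."""
    record_span("search_flights", origin=origin, destination=destination)
    return f"{origin}->{destination}"


if __name__ == "__main__":
    configure_from_env({"DEMO_TRACES_EXPORTER": "console"})
    search_flights("EDI", "BCN")
```

Because `search_flights` depends only on `record_span`, switching the backend is a change to the startup environment rather than to the application code, which is the property the answer highlights.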

InfoQ: What is a good starting point if someone wants to follow in your footsteps?

Daniel Gomez Blanco: The best starting point is to understand the value that OpenTelemetry can bring to your organization in particular and to be able to communicate it efficiently.

This can differ depending on multiple factors specific to each team or organization. For example, for those with many legacy systems to instrument and transport telemetry, the most significant ROI can be simplifying telemetry libraries and export pipelines. For new, greenfield projects, by contrast, the most value will come from the amount of high-quality telemetry that OpenTelemetry can provide out-of-the-box. Communicating this value can help drive adoption and align priorities between engineering leads.

Nonetheless, some common patterns certainly help. A good starting point is evaluating the instrumentation libraries that OpenTelemetry provides for your supported languages. Apart from getting high-quality telemetry for free, a large community of developers also supports any changes in the instrumented libraries, reducing the toil for individual teams instrumenting their services. Additionally, these produce telemetry following standard naming conventions, enabling every observability platform and engineer to speak the same language.

Another crucial part of an OpenTelemetry environment is Collectors. They're the Swiss army knife of the observability engineer, helping to integrate with existing non-OpenTelemetry solutions and start producing data in a standard format, while also transforming this data to meet multiple needs. A great place to start is to see OpenTelemetry in action, and the best place to do so is the official OpenTelemetry Demo project.

About the Author

Steef-Jan Wiggers
