Ted Young on Observability and the Release of OpenTelemetry 1.0

Mar 22, 2021

Podcast with

Ted Young

Daniel Bryant

In this podcast Ted Young, director of developer education at Lightstep, sat down with InfoQ podcast host Daniel Bryant and discussed: observability (and the three pillars), the OpenTelemetry CNCF sandbox project and the 1.0 release, and how to build an effective telemetry collection platform.

Key Takeaways

Although we’ve long been building distributed systems, the recent engineering trend towards building systems with microservices and cloud technologies has made monitoring and observing the underlying systems more challenging.
OpenTelemetry is a collection of tools, APIs, and SDKs that can be used to instrument, generate, collect, and export telemetry data for analysis in order to understand a system's performance and behavior.
OpenTelemetry provides everything an engineer would need to build a telemetry system, except for the final step of the data storage or data analysis tool.
The two main components of OpenTelemetry are the clients that would be installed in an application or service, and then a separate service called the collector, which is more like the telemetry system.
There is a clean separation between the API and SDK of the client allowing separate evolution. The current OpenTelemetry 1.0 release provides a stable tracing API, with a minimum of three years of support, and SDK, with a minimum of one year support.

Transcript

Introductions [00:19]

Daniel Bryant: Hello and welcome to the InfoQ Podcast. I'm Daniel Bryant, News Manager here at InfoQ and Director of Dev Rel at Ambassador Labs. In this edition of the podcast I have the pleasure of sitting down with Ted Young, Director of Developer Education at LightStep. There's been a lot of excitement around OpenTelemetry recently. The CNCF Sandbox Project enables instrumenting, generating, collecting, and exporting telemetry data from your applications.

Daniel Bryant: As the OpenTelemetry spec has now been released as 1.0, I thought this was a perfect time to learn more. I've followed Ted's work via the LightStep blog for quite some time and he always provides a great overview, both at the high level and at the implementation-detail level too. I was also keen to understand the OpenTelemetry API and SDKs and the separation there. I wanted to know how to use this in my applications and also explore how all the related tech plugs together to allow me to observe and understand my systems.

Daniel Bryant: Before we start today's podcast I wanted to share with you details of our upcoming QCon Plus virtual event taking place this May 17th to 28th. QCon Plus focuses on emerging software trends and practices from the world's most innovative software professionals. All 16 tracks are curated by domain experts with the goal of helping you focus on the topics that matter right now in software development. Tracks include architecting a modern financial institution, observability and understandability in production, accelerating APIs and edge computing, and of course more.

Daniel Bryant: You'll learn new ideas and insights from over 80 software practitioners at innovator and early adopter companies. The event runs over two weeks for a few hours per day and you can experience technical talks, real-time interactive sessions, async learning and optional workshops to help you learn about emerging trends and validate your upcoming software roadmap. If you're a senior software engineer, architect, or team lead, and you want to take your technical learning and personal development to a whole new level this year, join us at QCon Plus this May 17th through 28th. Visit qcon.plus for more information. Welcome to the InfoQ Podcast, Ted.

Ted Young: Thanks for having me.

Daniel Bryant: Could you briefly introduce yourself for the listeners, please?

Ted Young: My name's Ted Young, I go by tedsuo on the internet. Currently I work at a company called LightStep as their Director of Developer Education, and my main project right now actually is the OpenTelemetry project, so I'm one of the co-founders of that project. I come from the OpenTracing side of the fence; that project merged with OpenCensus.

Could you share your thoughts around observability and discuss the core problem space in which the OpenTelemetry project sits, please? [02:32]

Daniel Bryant: Awesome, and that's pretty much what we're going to be talking about today, OpenTelemetry and the 1.0 release. But before we get into this could you set the scene and talk about the problem space we're going to be discussing today, please?

Ted Young: The problem space is what we're calling the observability space. This is what we used to call monitoring. To my mind they're really kind of the same terms; we just redefined monitoring to be something a little bit narrower in scope. But we're still doing the same things we've always done, which is trying to find logical errors in our system and deal with resource contention. That's, I would say, 90% of what you're dealing with when you're observing your system.

Ted Young: And there's definitely been a bit of a sea change lately around adding new tools to that observability stack. Mostly to my mind with the goal of making things more efficient. I don't really feel like at the end of the day, we're doing anything new because they're the same bugs, the bugs haven't changed, but what we're trying to do is make it more efficient, allow you to spend less time dealing with the overhead of searching around in your system and allowing you to focus more on actually dealing with a problem having found it. I feel that's where the big shift in the observability space has been.

Ted Young: And the OpenTelemetry project is focused on one particular portion of that space that I've never seen anyone actually focus on specifically, which is the telemetry system. So this is the part of your stack where you're actually instrumenting your applications, generating data, and then transmitting that data to some remote backend where you're going to store it and analyze it. Usually that telemetry system is built out as part of a particular backend or service, so you have some metrics system and then that metrics system comes with metrics clients and then you put those metrics clients in your application.

Ted Young: There's some issues associated with having that stack be unified vertically and glued together. I would say there's two problems there. One is around the ability to choose what you're going to do with that data, another problem is the heterogeneity of the modern world, you're often gluing together a bunch of different systems that may want to do different things. And I guess a third thing, I said two but three, is open source.

Ted Young: So one aspect of having a heterogeneous system is you're really building your system out of shared components, that's the most common way to build software these days you're not building your own HTTP client or web framework, you're sharing those with other people. And as someone who's written a lot of open source software, there's this really pernicious problem of how do I communicate information to the end user? I would like to log things, I would like to issue metrics. I would like to tell them operationally what's going on with the software I wrote for them so that they can tune it or handle errors or issues, whatnot. But you hit this wall where you're like, "What do I use? I can pick a reasonable solution, I can pick a logging library I like or a metrics library I like."

Ted Young: But that's not going to necessarily be the same choice the application owner has made and it's not necessarily going to be the same choice all the other open source software may make. So what am I supposed to do? I can't just force the choice of monitoring solution on someone who's installing my software, and so then I end up just sighing and logging stuff to standard out or adding some little interface where they can do the work of gluing it together somehow to wherever they want to put the data. And I always found that those solutions were pretty sad, and that's a thing we're trying to look at by just focusing on this telemetry problem by itself. There's actually a number of these issues related to authors of open source software and operators of these big heterogeneous systems, where having that telemetry system actually be more agnostic and neutral and separate from where you're storing the data opens up a lot of doors to solving some of these problems.

Ted Young: It's sort of like scoping the problem down to just the telemetry part and saying we're going to make a telemetry system that works with everybody. We're making things not just faster, but I think this is one of the areas where we're actually opening some doors because we're allowing software that previously didn't have a good story for how it would deliver any kind of information about what it's doing. That software now actually has an option it can pick that's been scoped and written in such a way to specifically solve the kind of issues they would face when they try to glue all the software together into an application.

Has the software development trend towards microservices, containers, and cloud made the challenge of observability more difficult? [07:13]

Daniel Bryant: Gotcha, gotcha. And do you think, sort of taking a step back for a second, the move to, say, modular architectures, microservices, distributed systems, Kubernetes, Docker, all that good stuff, has that made the challenge harder?

Ted Young: Yes, it has. It hasn't fundamentally changed the challenge because we've always been running distributed systems. Realistically when I look at the old LAMP stack stuff I used to write, it was still a client, a reverse proxy, an authentication service, an app service, a data service; there's at minimum like five or six pieces floating around. And unless you were running a very tiny application you probably had at least some chunk of that horizontally scaled.

Ted Young: And so you've always had this problem of, I have a transaction that's moving through my system and I would like to see all of the information about that transaction. If we just talk about logs for a second, I would just like to see all the logs, but just the logs for this transaction. But how do I do that? I can scope the logs by machine, I know what machines the logs came out of, but that was all of the transactions that were simultaneously hitting that machine. And so there's a lot of annoying scutwork that was always involved in not even analyzing your data, but just before you could analyze it you had to cull that data down into something that was in effect a facsimile of the data that you wanted to see.

Ted Young: And the data you most wanted to see was the transaction. I just want all of the logs in this transaction, but that's actually the hardest thing to get. And when you move from having five services to having 50 services, then that problem just becomes worse and you have to think too much, basically, and spend a lot of effort. And that's, I believe, one of the main reasons why distributed tracing is starting to become popular, and distributed tracing is actually the backbone of the OpenTelemetry project. So we offer a variety of signals but they're all kind of built on top of distributed tracing as the core context propagation mechanism because it provides that transaction-level context that you can't really get any other way.
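To make the log-scoping problem concrete, here is a minimal sketch that assumes every log record already carries a transaction identifier. The `LogRecord` layout and the `logs_for_transaction` helper are hypothetical names used for illustration only; they are not part of OpenTelemetry.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRecord:
    machine: str   # host that emitted the line
    trace_id: str  # hypothetical transaction identifier attached to every line
    message: str


def logs_for_transaction(records, trace_id):
    """Return only the records that belong to one transaction, in input order."""
    return [r for r in records if r.trace_id == trace_id]


# Two transactions interleaved across two machines.
records = [
    LogRecord("web-1", "t-1", "request received"),
    LogRecord("web-1", "t-2", "request received"),
    LogRecord("db-1", "t-1", "query executed"),
    LogRecord("db-1", "t-2", "query failed"),
]

# Scoping by machine mixes both transactions together.
by_machine = [r for r in records if r.machine == "db-1"]

# Scoping by transaction follows one request across machines.
by_transaction = logs_for_transaction(records, "t-2")
```

Without the identifier on each record, the second query is impossible and only the machine-level view remains, which is the situation described above.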

What is the elevator pitch for the OpenTelemetry project? [09:22]

Daniel Bryant: So definitely keen to dive into that a bit more in a minute because, I mean, I cut my teeth in the Java days and I've been writing correlation ID injectors for my front end service and then passing them down through, so we'll definitely cover some of that in a second. But I'm kind of keen to help myself, help the listeners understand what OpenTelemetry is if we dive in a bit deeper. It's a specification, there's the tools, there's an API, there's SDKs. If I asked you to give me the elevator pitch and then perhaps go a bit deeper, what is OpenTelemetry?

Ted Young: It's everything you would need to build a telemetry system, except for the final step of the data storage or data analysis tool. We specifically stay out of that game, we don't want to pick winners there, so we're scoped to everything from generating the data, to storing it in-flight, processing it in-flight, and then delivering it to the doorstep of wherever you want it to ultimately reside. And so that has a couple of pieces. The two main pieces are the clients that you would install in your application or service, and then a separate service called the collector, which is more like the telemetry system. The idea here is you want to egress that data as quickly as possible out of your applications and into another service and push more of your configuration and data processing and CPU-intensive tasks and all of that into a service that isn't your application.

Ted Young: So you have your clients and the clients are actually broken down into a couple of pieces and I can talk about that and explain why, but you have your clients. I mean, you can configure them and do everything in the clients, I should mention you don't need to run a collector. OpenTelemetry tries to be very, very modular in part because we want it to be widely adoptable, we don't want to have this pulling on a thread situation where you want one part of it and now you're stuck running the whole thing.

Ted Young: But the way we imagine people running OpenTelemetry is you install the clients. The clients are in as default mode as possible. They speak OTLP, which is a protocol we've invented for OpenTelemetry. That protocol includes all of the different data types, so metrics, logs, tracing, system resources, we'll probably add more in the future, but all of that into one single fire hose of data. And you send that data off to a collector. Then in your collector you're doing all of your configuration there. So that's processing the data, scrubbing it for PII, and then choosing to send it off to various endpoints. So you might be teeing the data off to, say, send your tracing data to Zipkin and your metrics data to Prometheus, your logging data to your logging solution. More tools are coming out, like LightStep's one of them, but there's other stuff that consumes all of this data in whole.

Ted Young: I think if we're going to talk about the three pillars later that's actually a thing I'd like to talk about then because I think that's actually a shift in the industry. But the point is you're doing all of that work in this collector, which means if you're trying to change things, you're trying to change your topology, maybe you want to buffer this data more so you are holding onto more of it because you're dealing with some back pressure issues, you can roll out more collectors. You're changing where you want to put the data, so I want to actually now send some of this data off to a new service I want to try: that's a configuration change to the collector.

Ted Young: Maybe you're changing how you're processing the data. You want to add some additional tags to it or you realize you want to start scrubbing something out of it, or you have some old tags or you have some old data and now you have new data in a new format and you want to just transmute the old format into the new format, that kind of stuff. You do that all in the collector and that means as an operator you're in control of all of that stuff. And when you want to change these things you're not having to touch your applications, you're just redeploying your collector topology. That's the ideal setup that we see. The clients are as thin as possible, they're just moving the data out of the application as fast as they can and then you're actually managing everything in an operator-controlled service that doesn't affect the applications themselves when you want to change what it's doing.
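The kind of operator-controlled processing described here (scrubbing, adding tags, transmuting an old format into a new one) can be sketched as a list of small functions applied in order. This is only an illustration under stated assumptions: the function names, the record layout, and the attribute keys are hypothetical, and this is not the collector's real configuration format.

```python
def scrub(keys):
    """Build a processor that drops the given attribute keys (e.g. PII)."""
    def process(record):
        return {k: v for k, v in record.items() if k not in keys}
    return process


def rename(old, new):
    """Build a processor that moves an attribute from an old key to a new key."""
    def process(record):
        if old not in record:
            return record
        out = {k: v for k, v in record.items() if k != old}
        out[new] = record[old]
        return out
    return process


def add_tag(key, value):
    """Build a processor that attaches a fixed tag to every record."""
    def process(record):
        return {**record, key: value}
    return process


def run_pipeline(record, processors):
    """Apply each processor in order and return the resulting record."""
    for process in processors:
        record = process(record)
    return record


# Changing this list stands in for a collector-side change; the application
# that produced the record is not modified.
pipeline = [scrub({"user.email"}), rename("svc", "service.name"), add_tag("env", "prod")]

incoming = {"signal": "trace", "svc": "checkout", "user.email": "a@example.com"}
outgoing = run_pipeline(incoming, pipeline)
```

The design point is that `pipeline` lives outside the application: adding a new scrubbing rule means editing that list and redeploying the processing service, not touching the code that emitted `incoming`.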

Could you explain what a client is within OpenTelemetry? Is it an API, an SDK etc, a sidecar process etc? [13:29]

Daniel Bryant: Very nice. And just to pull a few things back, my brain often defaults to the Java days and I've run on tin and I've run on Amazon and Kubernetes as well. When you talk about clients, Ted, is there some kind of SDK involved? I've got my Java app, my Go app, I plug in some libraries, an SDK, and then that would talk to a client which is running out of process. Is that how you understand it?

Ted Young: Let me break down what I mean by client. We refer to the OpenTelemetry clients as all the collective stuff that you would run in your service itself. So you have a service that you're monitoring and that service has to have something installed inside of it that's going to produce data. Now I could say, as long as you're producing OTLP, then it doesn't really matter what you're running in your service, it could be a black box. Also the collector accepts a bunch of existing data formats so you don't have to even be running OTLP, so if you're just producing data as Zipkin traces or Prometheus metrics or things like that, you can continue doing that and adopt a collector.

Ted Young: But if you are doing the normal thing, you have an application, you're going to install instrumentation into that application that produces data and then you're going to install an SDK that's going to do something with that data and send it off. And we actually separate that out very cleanly in OpenTelemetry. It's actually worth digging into, so let me spend a second talking about this. Rather than just having a single library that you haul in, we've broken it down into pieces. One piece is what we call the API, the other piece is what we call the SDK.

Ted Young: The API is what you use when you're writing instrumentation. This could be instrumentation packages that you're going to install, so I'm running Django and I want to install the Django instrumentation so OpenTelemetry or somebody provides you instrumentation for that third-party library, so that's one target. It might be I'm the application developer and I want to start adding my own log lines and my own application data to these traces. Like I want to start attaching project IDs and things like that that are application-specific. And then there's the open source library authors themselves who may want to start natively instrumenting. So rather than a third party installing the Django instrumentation, maybe the Django developers take that on themselves and they just provide instrumentation out of the box.

Daniel Bryant: Ah, that's easier with one OpenTelemetry format than doing it for all the different vendors that are out there.

Ted Young: Yeah. This is actually a key separation. It doesn't matter what format you're sending the data in because when you instrument you're only pulling in this API package, and by API I mean programmer interface. And that interface is very, very thin. It's just interfaces and constants, it pulls in no implementation whatsoever, it's completely separated from the implementation. That crucially means it's not pulling in any dependencies.

Ted Young: So one thing we look out for is if all this software is going to compose, it cannot have a transitive dependency conflict, where I'm pulling in OpenTelemetry instrumentation and that was pulling in some other thing and now I have a conflict between the versions of gRPC I'm running or something like that. Likewise, we're really, really laser-focused on backwards compatibility for that API layer, because while it's possible for people to upgrade their SDK, and that's really easy, we presume people are going to write instrumentation and that's just going to be hanging out there for years, and we don't expect people to go back through and clean that up, so your new stuff needs to work with the old stuff.

Ted Young: By having this cleanly separated API package we're able to really focus on ensuring that the callers of that package are never broken, either due to additional functionality being added to it or due to some dependency that it pulls in. So the SDK then is actually an implementation of that API. But you could also theoretically write your own; in fact, we have three implementations right now. If you don't install an SDK, if you don't install anything at all, you just add instrumentation, then there's a no-op implementation that gets installed by default; the API comes with a no-op. By default if you install the... Let's say you have a library that's got OpenTelemetry instrumentation, you install it in an app, that app owner isn't doing anything with it, then it's just no-ops so the overhead is minimal.

Ted Young: We also, in most languages, for testing all of this stuff, have a test implementation like a mock or a fake SDK that we can plug in, and then the production SDK is what we call the SDK. But in theory if you didn't want to use our SDK, let's say you have real-time performance concerns or you're trying to do something crazy and it just wouldn't work as part of the SDK framework, because the SDK is a framework like most frameworks that provides exporter interfaces so you can export data to different formats, so you could write your own exporter or span processor, all that usual stuff. But you could in theory just bring your own implementation and plug that into the API.
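The separation described above, where instrumentation depends only on a thin interface, a no-op is the default, and a test or production implementation can be plugged in later, can be sketched in a few lines. This is a toy model, not the real OpenTelemetry API: every class and function name here (`Tracer`, `Span`, `set_tracer`, `RecordingTracer`, and so on) is an assumption chosen for illustration.

```python
class Span:
    """API layer: defines the shape of a span; every method does nothing."""

    def set_attribute(self, key, value):
        pass

    def end(self):
        pass


class Tracer:
    """API layer: the no-op tracer used when nothing has been installed."""

    def start_span(self, name):
        return Span()


_tracer = Tracer()  # the default implementation is the no-op


def set_tracer(tracer):
    """Called by the application owner to plug in an implementation."""
    global _tracer
    _tracer = tracer


def get_tracer():
    """Called by instrumentation; never needs to know which implementation is active."""
    return _tracer


class RecordingSpan(Span):
    """Test implementation: keeps attributes and reports itself when ended."""

    def __init__(self, name, sink):
        self.name = name
        self.attributes = {}
        self._sink = sink

    def set_attribute(self, key, value):
        self.attributes[key] = value

    def end(self):
        self._sink.append(self)


class RecordingTracer(Tracer):
    """Test implementation: collects finished spans in memory."""

    def __init__(self):
        self.finished = []

    def start_span(self, name):
        return RecordingSpan(name, self.finished)


def handle_request(project_id):
    """Library code: depends only on the API functions above."""
    span = get_tracer().start_span("handle_request")
    span.set_attribute("project.id", project_id)
    span.end()
    return "ok"
```

Calling `handle_request` before `set_tracer` runs against the no-op and records nothing; after `set_tracer(RecordingTracer())` the same unchanged library code produces spans, which is the property the API and SDK split is meant to protect.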

Ted Young: It's not anything new, it's just a clean separation of concerns which I think is very critical when you have these cross-cutting concerns like instrumentation, Log4j I think is a good example of this. But I wish everyone wrote software this way, frankly, especially software I'm going to depend on but just because we're looking at the long-term with OpenTelemetry these are the kinds of things that we take seriously. And I think they're the kind of things it's easier to focus on once you scope the project down to just the telemetry system. It's a little bit easier to say, "We want to make this telemetry system really, really good." And then it feels okay to spend the extra time really solving these issues and making sure all of the edge cases are going to work.

Does an OpenTelemetry Collector act as a funnel for telemetry and as an adapter/facade? [19:19]

Daniel Bryant: The way I heard you talk about the collector, and I've seen a bunch of folks online talking about the various collector implementations, it's almost like both a funnel and an adapter. When I say adapter it's more in the classic facade/transformer-type pattern. Is that the correct understanding?

Ted Young: That's exactly it. It's a funnel in the sense of you can buffer your data there, so there's room to handle back pressure, and it's like a Swiss army knife where it has a basic model that has receivers, processors, and exporters. So you have receivers that can accept various formats. Those receivers take that incoming data, they translate it into OTLP internally. All of your processors are written against OTLP and then you have exporters that take OTLP and translate it into various formats for export. So you only have to write your processors against OTLP, and then you can glue them together with various receivers and exporters to make a topology that works for you.
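The receiver, processor, and exporter model can be sketched as three kinds of functions around one internal record shape. In this toy version a plain dictionary stands in for OTLP, and both input formats and the output format are invented for the example; none of this is the collector's actual code or wire format.

```python
import json


def receive_legacy(line):
    """Receiver: parse a hypothetical 'name|duration_ms' line into the internal form."""
    name, duration_ms = line.split("|")
    return {"name": name, "duration_ms": float(duration_ms)}


def receive_json(payload):
    """Receiver: parse a hypothetical JSON payload into the same internal form."""
    data = json.loads(payload)
    return {"name": data["op"], "duration_ms": data["ms"]}


def tag_slow(record, threshold_ms=100.0):
    """Processor: written once, against the internal form only."""
    return {**record, "slow": record["duration_ms"] > threshold_ms}


def export_text(record):
    """Exporter: translate the internal form into a hypothetical text format."""
    flag = "SLOW" if record["slow"] else "OK"
    return f"{record['name']} {record['duration_ms']:.1f}ms {flag}"


def run(inputs, processors, exporters):
    """inputs is a list of (receiver, raw_data) pairs; returns one list per exporter."""
    records = [receiver(raw) for receiver, raw in inputs]
    for process in processors:
        records = [process(r) for r in records]
    return [[export(r) for r in records] for export in exporters]


outputs = run(
    inputs=[(receive_legacy, "GET /cart|250"), (receive_json, '{"op": "GET /home", "ms": 12}')],
    processors=[tag_slow],
    exporters=[export_text],
)
```

Because `tag_slow` only ever sees the internal form, supporting a third input format means adding one receiver and nothing else, which is the benefit of translating everything to a single format at the edges.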

Daniel Bryant: With that sort of cohesion and coupling there, and you mentioned the single responsibility principle, separation of concerns, this all sounds like you've done the heavy lifting there. All the community and yourself have done the heavy lifting, the hard thinking, around these key abstractions.

Ted Young: We've spent a lot of time on design, that's definitely true, we want the design of this to be clean. The next step is to make it efficient. So over the summer we're going to be working on finalizing the various APIs, the tracing API is finalized, that's what the 1.0 announcement was about. And then we'll be finalizing the initial stable version of the metrics API and our logging system. That will all happen we hope by end of year.

How do the three pillars of observability relate to OpenTelemetry? [21:00]

Daniel Bryant: Brilliant. I'm keen to dive into the three pillars of observability and then have a look at the 1.0 release and baggage as well. So you mentioned the three pillars, and I've had Ben Sigelman, of course Cindy Sridharan, and lots of folks talking about the three pillars of observability being, you know, metrics, logs and traces. How does this relate to OpenTelemetry?

Ted Young: Oh man. I'm a reliably cranky person when it comes to the three pillars. So I would say two things. One is it takes those three pillars and turns them into one pillar. And this relates to, when people talk about the three pillars, to me what they're really saying is you have three different tools that you're going to put this data into. You have a metrics tool over here and you have a logging tool over there and then often you've got your tracing tool, so there's three pillars.

Ted Young: But if you ask what really helps make things more efficient from an operator's perspective, both being able to funnel the data out and then also being able to analyze the data and find correlations and get to a root cause as fast as possible, you don't want this information spread out across a bunch of separate tools that don't talk to each other. The ideal solution is you have one tool that accepts all this telemetry data and cross-correlates it as much as feasible. That's one area where I see, to a certain degree, the concept of three pillars breaking down.

Ted Young: The other area is I don't believe tracing and logs are two separate things, I'll just come out and say that. People treat these as two totally separate things, I think because they always had a logging system and then early tracing systems tended to be sold as something for latency analysis specifically, and they tended to have a very simple sampling mechanism. They tended to come with some kind of head-based sampling where at the beginning of your transaction you roll a 1,024-sided die, and if it comes up a one then you trace that request, but otherwise it's off. And so if you're trying to log exceptions and errors that seems like not useful.
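A rough sketch of head-based sampling shows why it is a poor fit for errors: the decision is made once, before anything has gone wrong, and everything downstream inherits it. The `HeadSampler` class and the dictionary used as a context are hypothetical, and the injectable random generator is there only to make the sketch testable.

```python
import random


class HeadSampler:
    """Decide once, at the start of a transaction, whether to trace it."""

    def __init__(self, sides, rng=None):
        self.sides = sides
        self.rng = rng if rng is not None else random.Random()

    def start_transaction(self):
        # Roll the die: only a roll of 1 turns tracing on.
        return {"sampled": self.rng.randint(1, self.sides) == 1}

    @staticmethod
    def child_of(parent_context):
        # Downstream operations never re-roll; they inherit the head decision.
        return {"sampled": parent_context["sampled"]}


def log_error(context, message, sink):
    """Record an error only if the enclosing transaction was sampled."""
    if context["sampled"]:
        sink.append(message)


sampler = HeadSampler(sides=1024, rng=random.Random(7))
kept = []
for i in range(10_000):
    ctx = HeadSampler.child_of(sampler.start_transaction())
    log_error(ctx, f"error in request {i}", kept)
# On average only about 1 in 1,024 of these errors is ever recorded.
```

As the next paragraph of the transcript argues, this is a property of one sampling strategy rather than of tracing itself; a different sampler could keep every transaction that contains an error.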

Ted Young: But these were really just implementation details, they didn't really have anything to do with tracing as a concept. Tracing as a concept is just, as you mentioned before, it's adding the correct correlation IDs where you want to be able to correlate all of the events that happened in an operation, and then you want to correlate all of the events that happened in the entire transaction. And as soon as you try to do that you're like, "Well, I have this transaction ID and I need to flow this through my program so I can staple it onto all of my logs." As soon as you do that you're tracing. Even if all you're doing is "logging and sending it to a logging system," if you've done the work to flow that transaction ID through, that's tracing. To my mind, tracing is just structured logging with better context, that's really what it is.
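The claim that flowing a transaction ID through the program and stapling it onto every log line is already tracing can be illustrated with the Python standard library alone. This is a sketch under stated assumptions: the logger name, the log format, and the function names are made up for the example, and it covers a single process only (no network hop).

```python
import contextvars
import io
import logging
import uuid

# Holds the current transaction ID for whatever code is executing.
current_trace_id = contextvars.ContextVar("current_trace_id", default="-")


class TraceIdFilter(logging.Filter):
    """Staple the current transaction ID onto every log record."""

    def filter(self, record):
        record.trace_id = current_trace_id.get()
        return True


def build_logger(stream):
    """Create a logger whose output lines are prefixed with the transaction ID."""
    logger = logging.getLogger("traced-example")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("trace=%(trace_id)s %(message)s"))
    handler.addFilter(TraceIdFilter())
    logger.handlers = [handler]
    return logger


def charge_card(logger):
    # No trace ID argument: the context carries it.
    logger.info("charging card")


def handle_checkout(logger, trace_id=None):
    """Start a transaction, run the work, then restore the previous context."""
    token = current_trace_id.set(trace_id or uuid.uuid4().hex)
    try:
        logger.info("checkout started")
        charge_card(logger)
    finally:
        current_trace_id.reset(token)


stream = io.StringIO()
handle_checkout(build_logger(stream), trace_id="abc123")
lines = stream.getvalue().splitlines()
```

Both log lines come out tagged with the same ID even though `charge_card` never receives it as a parameter, which is the "structured logging with better context" idea in miniature.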

Can you explain the role of “baggage” (metadata) within OpenTelemetry, please? [23:49]

Daniel Bryant: And I hear a lot of folks in that OpenTelemetry community talking around baggage, and that's like in addition to say, correlation IDs I can put metadata that relates to what I'm interested in?

Ted Young: Yeah. That's just a way of building your own cross-cutting concerns. The hardest part of all of this is the context propagation: actually following the flow of execution through your program and then, when the network hops occur, serializing that context on one end and de-serializing it on the other and then continuing the flow of execution. What inevitably happens is people install a tracing system that does all of that and then they're like, "Now that I've got that context flowing through my system, I can think of other things I'd like to do with that." And so that's what baggage is. It's just a dictionary that you can flow through your entire system.

Ted Young: It's not actually directly for observability, it's more just a simple API to allow people to build other tools, other cross-cutting concerns that would benefit from having that kind of transaction level context propagation. Just to give a quick example, A/B testing is I think a great example. If you have a big distributed system and you're say rolling out feature flags or rolling out, I have an A version and a B version I'm trying to test, that's actually operationally tricky. I don't know if you've tried to do that before but it's harder than it sounds. Especially once your system starts having say like three or four tiers in it, you're like three tiers back and you're like, "Am I on the A path or the B path?"

Ted Young: And you end up having to do multiple deployments and managing this with network switches and stuff. And I don't know, it's annoying, but if you could just flow through a piece of data that said A path, B path, and then every service could just switch this operation based on that, this is a much simpler approach to A/B testing and feature flagging. I think once this gets out there and people get used to it you're going to see other interesting tools get built on top of those context propagation tools.
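Here is a minimal sketch of the A/B idea: a dictionary is serialized into a header at each hop, rebuilt on the other side, and read three tiers back. The header name and the comma-separated `key=value` encoding are simplifying assumptions for this example, and the service functions and the `experiment.arm` key are hypothetical.

```python
from urllib.parse import quote, unquote


def inject(baggage, headers):
    """Serialize the baggage dictionary into an outgoing header."""
    pairs = [f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in baggage.items()]
    headers["baggage"] = ",".join(pairs)
    return headers


def extract(headers):
    """Rebuild the baggage dictionary from an incoming header."""
    raw = headers.get("baggage", "")
    baggage = {}
    for pair in filter(None, raw.split(",")):
        key, _, value = pair.partition("=")
        baggage[unquote(key)] = unquote(value)
    return baggage


def frontend(headers_out):
    """Tier 1: decide the experiment arm once and put it in baggage."""
    return inject({"experiment.arm": "B"}, headers_out)


def middle_tier(headers_in):
    """Tier 2: does not care about the experiment, just forwards the context."""
    return inject(extract(headers_in), {})


def pricing_service(headers_in):
    """Tier 3: switches behavior on the propagated value."""
    arm = extract(headers_in).get("experiment.arm", "A")
    return "new pricing" if arm == "B" else "old pricing"


result = pricing_service(middle_tier(frontend({})))
```

The middle tier needs no knowledge of the experiment; it only has to forward the context, so the decision made at the edge reaches the third tier without separate deployments or network-level switching.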

What is the history of OpenTelemetry? And what are the highlights of the 1.0 release? [25:50]

Daniel Bryant: I'm keen to dive into the 1.0 release of OpenTelemetry. Now, I know this means there's history there. There was OpenCensus, OpenTracing, they merged. Now OpenTelemetry is a CNCF Sandbox project. So can I get the elevator pitch for the history, I guess, and what the 1.0 release signifies?

Ted Young: The history is they were two projects and they were like peanut butter and jelly. On the OpenTracing side we were really focused on the separation of concerns that I mentioned before, where you need the instrumentation API cleanly separated from your implementation or it's not really feasible to provide something that shared libraries could use. So we were really focused on that but we didn't provide an implementation and that turned out to be really annoying to users. Like, "Cool, I want to use OpenTracing." We're like, "Great, here's the API." And they're like, "Well, how do I install it?" And we're like, "Well, go pick some other thing and install that." And they'd be like, "What?"

Ted Young: We've preserved the ability to do that, so that's the OpenTracing part of OpenTelemetry, is you've got the separate API, you can plug in whatever you want. But OpenCensus was like the kitchen sink framework solution which everybody wants. Everyone is like, "I just want a thing that works that I can install that is a flexible enough framework I can write plugins against it. And I don't want to think, I don't want to glue this stuff together, I just want a one-stop shop."

Ted Young: And so we had two projects and they each had half of the coin and the community basically started yelling at us to merge. And eventually it became clear that this would be a really beneficial thing and that if we did merge then that would create a lot of momentum. And it definitely did, as soon as the projects merged the interest in the projects increased dramatically, basically. A lot of people were sitting on the fence waiting to watch us sort this out so when we merged into one project that caused a whole bunch of people to join in and be like, "Great. Now there really is one solution that has a chance of becoming a standard so let's join in." So that's kind of the history.

Ted Young: And then the 1.0 is simple. It's just the marking of stability. We've added stability guarantees to a portion of our system. These are the guarantees I was talking about a bit earlier that the API if you use it now it's never going to break you. If you write plugins against the SDK we will be deprecating existing plugin interfaces as we add new ones but we won't be sharply breaking them.

Daniel Bryant: You get that life cycle, effectively.

Ted Young: Lifecycle. So we want to make sure end users can always just upgrade to the latest version of the SDK and receive performance and security patches and not be held back by some plugins they're using that haven't upgraded to some new plugin API. And we've added all of these guarantees for the tracing portion of the system. The metrics portion is still experimental. We're actively working with the Prometheus community right now to ensure that what we're building there is going to work well with Prometheus. Our other major target is StatsD, but StatsD is a lot simpler so that's an easier target. Our goal is by November to have stable releases of that out the door, so we'll be in beta over the summer, basically.

How does the OpenMetrics project relate to OpenTelemetry? [29:01]

Daniel Bryant: Very nice. I was chatting to some folks a while back around OpenMetrics. How does that relate to this? Is that closely following Prometheus-style metrics?

Ted Young: Yeah. OpenMetrics is a protocol. So it's specifically a metrics protocol that I believe Prometheus is in the process of adopting and it's just one more protocol that we plan on supporting. So internally we have our own format and this is due to the fact that one, we need a format that handles all of these different signals so we had to make one because there wasn't one. But also there's some aspects to OpenTelemetry wanting to push information out to the edges.

Ted Young: We need a couple of things that aren't in OpenMetrics right now around handling deltas. For example, OpenMetrics doesn't have a concept of deltas. We want to be able to handle a variety of histogram formats and details like that. So we're actually working directly with Richard Hartmann and the OpenMetrics people to ensure these things are going to play nicely with each other and there's not some issue if you want to consume OpenMetrics data but use OpenTelemetry as your telemetry system.
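
To make the delta distinction concrete: a cumulative counter reports a running total at each collection, while a delta counter reports only the change since the previous report. The sketch below converts between the two. It is plain Python, not OpenTelemetry or OpenMetrics code, and the function names are hypothetical.

```python
from itertools import accumulate
from typing import List


def deltas_to_cumulative(deltas: List[int], start: int = 0) -> List[int]:
    """Turn per-interval changes into running totals."""
    return [start + total for total in accumulate(deltas)]


def cumulative_to_deltas(totals: List[int], start: int = 0) -> List[int]:
    """Turn running totals back into per-interval changes."""
    deltas = []
    previous = start
    for total in totals:
        deltas.append(total - previous)
        previous = total
    return deltas


# Requests counted in four consecutive reporting intervals.
per_interval = [3, 0, 5, 2]
running_total = deltas_to_cumulative(per_interval)
```

A system that speaks one form internally and exports the other has to keep the previous value (or the running sum) somewhere, which is part of why the two formats need explicit coordination.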

Daniel Bryant: I think for some of us sort of more on the outside of some of these looking into the CNCF, and CNCF is fantastic. But we see a lot of open X, open Y, and it is, how do I plug all these things together? Because a lot of times, as an architect, as a developer, I'm like, "I just have these problems. How do I understand my systems?" And then being able to have an opinionated way of mixing these things together is super useful.

Ted Young: And that's actually the specific problem we want to solve with OpenTelemetry, is just saying this is a telemetry system that you can install, and this will allow you to continue to have choice. So as these various observability solutions continue to show up, because there's a set of them that are already in the CNCF, Prometheus and Jaeger, I'm sure more will continue to get invented. And there's a variety of protocols, the stuff people are already using, new stuff people want to use. And so the idea with OpenTelemetry is you don't want to be rewriting all of the instrumentation or yanking things out and shoving new things in, you just want to run one telemetry system and then at the edges be able to have choice about what you're doing with that data.
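
The "one telemetry system, choice at the edges" idea can be sketched as a pipeline that fans the same records out to whichever exporters are configured. This is an illustrative assumption about the shape of such a system, not the OpenTelemetry collector; `Pipeline`, `add_exporter`, and `emit` are hypothetical names, and the two "backends" are just in-memory lists.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Exporter = Callable[[List[Record]], None]


class Pipeline:
    """Receives telemetry once and hands a copy to every configured exporter."""

    def __init__(self) -> None:
        self._exporters: List[Exporter] = []

    def add_exporter(self, exporter: Exporter) -> None:
        self._exporters.append(exporter)

    def emit(self, batch: List[Record]) -> None:
        for exporter in self._exporters:
            # Copy the list so one exporter cannot reorder or drop
            # records seen by the next one.
            exporter(list(batch))


# Two stand-in backends; real ones would speak their own protocols.
backend_a: List[Record] = []
backend_b: List[Record] = []

pipeline = Pipeline()
pipeline.add_exporter(backend_a.extend)
pipeline.emit([{"name": "checkout", "duration_ms": 12}])

# Adding a second destination later requires no change to the code that emits.
pipeline.add_exporter(backend_b.extend)
pipeline.emit([{"name": "login", "duration_ms": 7}])
```

The instrumentation only ever calls `emit`; which backends receive the data is a configuration decision made at the edge.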

If listeners want to get involved with the OpenTelemetry project what's the best thing they can do? [31:11]

Daniel Bryant: This has been a great tour of, I think, the entire landscape around this space. If listeners are getting excited and want to get involved, what's the best thing they can do?

Ted Young: The best thing people can do right now is give us feedback. So we've shipped a stable version but now we're in the process of adding as much convenience as we can, simplifying the installation experience, adding any kind of high-level API functionality, things like annotations and stuff like that. And so we would love feedback on how using OpenTelemetry feels right now, where are the pointy bits that we could sand off. If people want to actually get involved in working on the project we have a repo called the community repo in the OpenTelemetry GitHub organization, and you can find all of the various working groups there.

Ted Young: Each language has a working group and then the various specification efforts have a working group. They tend to meet once a week on Zoom calls, so you can show up to those calls and say, "Hi," or have a look at the organization level. We have a set of projects, like Kanban boards, that show what the current initiatives are. You can get involved at all those levels. You can also help us write the specification. OpenTelemetry is driven by a specification. First we prototype it, then we add it to the spec, and then the spec is released and the implementations update themselves. And that's an open process. We have an RFC process, so you're welcome to write an RFC and partake in the development of it in that manner as well.

How can listeners reach out to you? [32:40]

Daniel Bryant: If listeners want to reach out to you, where's the best way to find you? Twitter, LinkedIn, somewhere else?

Ted Young: Twitter is good. So I'm Ted Suo on Twitter, T-E-D-S-U-O, and my DMs are open so feel free to hit me up there. Also OpenTelemetry is on the CNCF Slack instance, so that's another good place to get a hold of us for anything out there.

Daniel Bryant: Excellent. Well, thanks for your time today, Ted.

Ted Young: Thank you so much, Daniel. It's been great.
