Alois Reitbauer on Cloud Native Application Delivery, Keptn, and Observability

Oct 28, 2020

In this podcast, Alois Reitbauer, VP, Chief Technical Strategist and Head of Innovation Lab at Dynatrace, sat down with InfoQ podcast co-host Daniel Bryant. Topics discussed included: the goals of the CNCF App Delivery SIG; how cloud native continuous delivery tooling like Keptn can help engineers scale development and release processes; and the role of culture change, tooling, and open standards, such as OpenTelemetry, within observability.

Key Takeaways

The CNCF’s Application Delivery SIG focuses on delivering cloud native applications, which involves multiple phases, including building, deploying, managing, and operating.
The Keptn CNCF sandbox project is an event-based control plane for continuous delivery and automated operations for cloud-native applications.
The observability and application performance monitoring communities are often attempting to solve similar problems, but from different perspectives.
Changing culture is often a big part of successfully adopting modern monitoring and observability approaches.
OpenTelemetry is a collection of tools, APIs, and SDKs. Engineers use it to instrument, generate, collect, and export telemetry data (metrics, logs, and traces) for analysis in order to understand software performance and behavior.


Transcript

01:01 Introductions

01:01 Daniel Bryant: Hello and welcome to the InfoQ podcast. I'm Daniel Bryant, news manager here at InfoQ and product architect at Ambassador Labs. And I recently had the pleasure of sitting down with Alois Reitbauer, VP, Chief Technical Strategist and Head of Innovation Lab at Dynatrace. I've followed Alois's work for quite some time now via his presentations in the application performance monitoring space, and also the work he's undertaken in the Cloud Native Computing Foundation's Application Delivery Special Interest Group, more commonly known as the “App Delivery SIG”.

01:17 Daniel Bryant: I was keen to explore his thoughts around infrastructure as code and application delivery in the Kubernetes space. In particular, I wanted to dive a little deeper into the Keptn CNCF sandbox project, which I know he's working on, and which aims to address core challenges within this domain.

01:41 Daniel Bryant: Changing gears slightly, Alois has worked in the monitoring and app performance space for 15-plus years now, and so I was keen to explore his thoughts around the rise of observability and how new projects like OpenTelemetry will help both end users and monitoring vendors.

01:53 Daniel Bryant: Hello Alois, welcome to the podcast. Could you introduce yourself to the listeners please?

01:57 Alois Reitbauer: Hello, I'm Alois Reitbauer, I'm Chief Technology Strategist here at Dynatrace, where I'm responsible for our open source involvement, our involvement with the CNCF, and some of our technology research topics. Thanks for having me today.

02:09 Could you explain the goals of the CNCF App Delivery SIG, please?

02:09 Daniel Bryant: So when we were talking off mic, we were talking about the CNCF, the Cloud Native Computing Foundation. I know you were a co-chair of the App Delivery working group there. Say a listener is not that familiar with the CNCF and the working groups: could you give us a bit of a high-level pitch as to the advantages and what's going on within these groups?

02:26 Alois Reitbauer: When you look at the CNCF, you obviously see the Technical Oversight Committee (TOC) that's doing a lot of work. But as we all know, the cloud native community has grown massively over the last couple of years, and there are lots and lots of different areas. You can't expect people to be experts in all areas and drive all the initiatives. You also have to take into account that many of these initiatives are done by people as a side job, not as their main job.

02:50 Alois Reitbauer: And the idea behind the SIGs, or Special Interest Groups, is to cover special topics and support the TOC. There are a number of those: there's one on security, there are many others, and the App Delivery one is one that we started a while ago with great help from Alexis, focusing on application delivery on Kubernetes. That sounds pretty straightforward, but there are lots of questions. Just to give you some ideas: how do I even define what an application is? Everybody talks about blue-green deployments; you may remember the tweet from Kelsey Hightower saying he can't watch any more Kubernetes demos where somebody does a blue-green deployment. Still, I think if you talk to seven people doing blue-green deployments, you get eight different opinions on how it should be done.

03:32 Alois Reitbauer: And then there's also a lot of tooling that comes into the CNCF, so the SIGs also do due diligence and project reviews in their specific space. We also try to bring new projects in, have people talk about them, and bring up new topics. We had working groups on two topics that we are now moving back into the main SIG work. One is about air-gapped environments, that is, Kubernetes environments that are not connected to the internet. You can try this: turn off your internet connection and try to use kubectl apply, and you will realize that this is harder than you think. Or working in a fully offline scenario, which is obviously becoming more and more of an issue for companies. The other one was more focused on the definition of operators.

04:12 Alois Reitbauer: Right now we're also looking into how we can better help people understand some of their questions. There are a lot of demo applications out there, but how can we build more real-world use cases? Look at Hipster Shop, for example. Hipster Shop is great to see how a microservice application works, but there are no stateful workloads. There's no more traditional database in there. There's really not a lot of secrets management involved. It lacks a lot of the things you would have in a real-world application that you have to manage and deal with. And that's what the SIGs are for: bringing up those topics, coordinating them, producing white papers, and doing a lot of those things.

04:45 Alois Reitbauer: Historically, there was way more work reviewing sandbox projects, which are the early-stage projects; that has now been streamlined with the TOC into a simplified process. But basically, you can think of the SIGs as specialized groups that deal with certain topics.

04:59 What are the advantages and disadvantages of “infrastructure as code”?

04:59 Daniel Bryant: Excellent. So when you talk about app delivery and sort of everything as code, infrastructure as code, what do you think the advantages and disadvantages are for things like infrastructure as code?

05:09 Alois Reitbauer: So infrastructure as code, I think, is great. And I think we've just learned how the first generation of infrastructure as code helped us, but also what the problems are. The great thing about code is that it can be managed in a source code repository. You can have version control on it. Obviously everybody who can code can write it, and it's very native for developers to manage it.

05:30 Alois Reitbauer: That was, for example, one thing for me: whenever I had to look at those workflow tools where I graphically model something that gets stored in a third-party tool, I thought, this sounds great if you sell it to an executive, but for me as a developer, I have to go into another tool to manage it. How do I even keep things in sync?

05:46 Alois Reitbauer: Like my Jenkins pipelines: I want them in my source repository. If I change a build, I want the pipeline right next to my source code, not managed somewhere else, and the same goes for other forms of deployment scripts. I think that's great. What I think we're all struggling with as an industry, or where I've heard people struggle and where we need better answers, is reusability. There's one powerful tool in the whole of computer science, which is copy and paste.

06:12 Daniel Bryant: I've used it.

06:13 Alois Reitbauer: Yeah, we've all used it. I think sometimes we're using it too much. And I think a lot of these as-code approaches rely very much on copy and paste, or they have some modularization in there, but that's something we're struggling with: how can we build more reusable blocks of deployment automation and configuration management? There are means for that, but a lot of it is still copy-and-paste code. And I think that's what some of us are struggling with.
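To make the reusability point concrete, here is a minimal, hypothetical sketch (not from the podcast) of one parameterized building block replacing copy-pasted per-service definitions. `DeploymentSpec`, `make_deployment`, and the `registry.example.com` default are illustrative assumptions, not part of any real tool.

```python
from dataclasses import dataclass, field


@dataclass
class DeploymentSpec:
    """Hypothetical, tool-agnostic description of one service deployment."""
    service: str
    image: str
    replicas: int = 2
    env: dict = field(default_factory=dict)


def make_deployment(service: str, version: str, *, registry: str = "registry.example.com",
                    replicas: int = 2, **env: str) -> DeploymentSpec:
    """One reusable building block: fix a bug here and every service picks it up,
    instead of patching N copy-pasted definitions one by one."""
    return DeploymentSpec(
        service=service,
        image=f"{registry}/{service}:{version}",
        replicas=replicas,
        # Shared defaults first, then per-service overrides.
        env={"LOG_LEVEL": "info", **env},
    )


specs = [
    make_deployment("cart", "1.4.0"),
    make_deployment("checkout", "2.0.1", replicas=3, LOG_LEVEL="debug"),
]
for spec in specs:
    print(spec)
```

The design choice is simply that shared defaults live in one function, while each service states only what differs.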

06:36 Daniel Bryant: Well said. All jokes aside, I've definitely fallen victim to that. I've gone to Stack Overflow, copy-pasted something, and then it's not worked as expected, right?

06:44 Alois Reitbauer: Especially in an enterprise setting, not everybody should have access to everything. So there, as-code might also be helpful because it can be validated. I was talking to some of the Dynatrace customers I work with, and I said, "Well, even when people write a deployment script, somebody has to validate that it's doing what it's supposed to do, because you might be in a highly regulated environment." And that validation sometimes actually takes longer than writing the script. And obviously a script is easier to validate than something that's not as code and is modeled in some proprietary way. But overall, I think as-code matters because we want developers, or development teams, to take responsibility for something; I'd even rather say engineering teams.

07:20 Alois Reitbauer: So I'm moving away from "developers" and "development teams." If you really think about this microservice approach, you build it, you run it, you have a lot of different skills in this team. And it's not just the people who write the code; it's the people who are responsible for security, who understand how deployments work. There are also the people who write the documentation and might be responsible for developer experience if other services are consuming it. That's why I like to talk about engineering teams more than developer teams.

07:46 Alois Reitbauer: But the point I wanted to make there is that we have to give them something that is easy for them to use. We can't expect them to learn something that's entirely new. And we can argue as much as we want over JSON versus YAML versus JavaScript; at the end of the day, developers and engineers know how to write code, or work in code-like environments, and not in proprietary tools. And I think that's why the whole as-code approach is super useful where it is. It has helped people; the scalability issues are just something we are running into now, as we use it more and more in bigger and bigger environments.

08:19 Your recent blog post argued that “your delivery pipeline will become your next big legacy code challenge”. Could you explain the motivation behind this?

08:19 Daniel Bryant: I really like your blog post that talked about how your delivery pipeline will become your next big legacy code challenge. And I think that riffs quite nicely on what you've said there. What did you mean when you said that your delivery pipeline could be the risky part of your infrastructure?

08:32 Alois Reitbauer: This came out of quite a lot of work we were doing over the last two and a half years in the app delivery space, where we were working with more and more customers on how we can help them automate more of the delivery pieces. If you're a software engineer by background and you look at how delivery pipelines are built, even if you argue that they are built as code or as something similar to code: we have come a long way in the way we build systems. We don't build them in a monolithic way. Still, most pipelines are highly monolithic.

09:04 Alois Reitbauer: We've come to an event-based approach where we publish events and distribute them to subscribers. That's not how we build pipelines. We have all agreed that we are on that road of test-driven development and automated testing. But when we look at it from a pipeline perspective, from a delivery integration perspective, we cobble together a lot of tools via APIs and make stuff work. And when it works, we deploy it and then we hope it's never going to break. And if we need something else, we just put it on top. And if we need something else again, we just connect yet another API to yet another API.

09:39 Alois Reitbauer: It's more that, and I wouldn't even call it craftsmanship, we just get stuff to work in many cases, and when it's good enough to work, we leave it that way. We don't apply the same principles we apply to the rest of our software. And that's where we very often run into situations where I've heard people say, "Nobody wants to touch the pipeline anymore." And if somebody did it really well, what do people do? They copy and paste it (back to that copy-and-paste story) and run a second version, but they have to modify bits and pieces. And then the next person or the next team modifies something again.

10:13 Alois Reitbauer: So you end up with all of those snowflake pipelines. And I like to say snowflakes are pretty in a winter landscape, but they're not so great when they are individual pipelines that you're running. So that's how I came up with the legacy code comparison. My definition of legacy code there is code that was built a long time ago, no longer fits modern best practices like test-driven development, modularity, and event-driven architectures, and that nobody wants to touch. And I think that's where we're left with those pipelines.

10:25 Could you introduce the Keptn continuous delivery tool?

10:25 Daniel Bryant: Very interesting. Very interesting. And the blog post goes on to further explore the Keptn tool that I know you've worked on and that the team worked on quite a bit. Could you introduce what that tool is for the listeners please?

10:50 Alois Reitbauer: Yeah. So Keptn is a CNCF sandbox project now. The core concept is to solve two problems. The first one is that we don't build monolithic pipelines where you have everything mixed together, and that enables more reusability. What we did with Keptn is separate the actual flow, meaning the order in which things should happen, like build, test, deploy, propagate, from the tools that are interested in these events and want to participate in them. So it's an event-based architecture.

11:15 Alois Reitbauer: So the bottom line is, we thought if we build event-based microservice architectures, maybe we should deploy them with an event-based microservices system as well. So we have what we call the shipyard file, which is merely the event flow specification. And then you have services that subscribe to these events and do something. For example, the shipyard file might only specify that you want a test step. And then the engineer who's responsible for your environment might say, "So far we did testing only in a functional way. Now we add performance. Now we add security." Historically, you had to modify all those pipelines.

11:51 Daniel Bryant: Oh, I see. Yep.

11:52 Alois Reitbauer: ...and add this to all those pipelines. Now the only thing you have to do for this environment is add one event subscription, saying: whenever I see a testing event, I'm also doing this now.
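The separation of "what" (an ordered task sequence) from "how" (tools subscribing to task events) can be sketched in a few lines. This is a hypothetical toy orchestrator under stated assumptions, not Keptn's implementation; `ToyControlPlane` and the handler names are illustrative stand-ins for real tools.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Callable

Handler = Callable[[dict], None]


class ToyControlPlane:
    """Hypothetical event-based orchestrator; not Keptn's actual implementation."""

    def __init__(self, sequence: list[str]) -> None:
        self.sequence = sequence  # the "what": ordered task names, e.g. from a shipyard
        self.subscribers: dict[str, list[Handler]] = defaultdict(list)  # the "how"

    def subscribe(self, task: str, handler: Handler) -> None:
        self.subscribers[task].append(handler)

    def run(self, artifact: str) -> list[str]:
        """Emit one event per task and record which tool handled it."""
        handled = []
        for task in self.sequence:
            event = {"type": f"{task}.triggered", "artifact": artifact}
            for handler in self.subscribers[task]:
                handler(event)
                handled.append(f"{task}:{handler.__name__}")
        return handled


# Stand-ins for real tools; a real handler would call a deployer or test runner.
def helm_deploy(event: dict) -> None: ...
def functional_tests(event: dict) -> None: ...
def performance_tests(event: dict) -> None: ...


plane = ToyControlPlane(["deployment", "test"])
plane.subscribe("deployment", helm_deploy)
plane.subscribe("test", functional_tests)
before = plane.run("cart:1.4.0")

# Adding performance testing is one new subscription; the sequence is untouched.
plane.subscribe("test", performance_tests)
after = plane.run("cart:1.4.0")
print(before)
print(after)
```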

12:03 Daniel Bryant: It gets very modular, very composable.

12:05 Alois Reitbauer: Exactly. Another example is the Cyber Monday type of example. Say you're in the e-commerce space: you want to work with rapid prototyping and rapidly roll out features, A/B tests, A/B/C/D tests, multi-color deployments, everything, great. There is just this weekend between Black Friday and Cyber Monday where you won't be happy with that. So try to change all of your pipelines, at the point where you deploy into production, to move from automatic experiments to manual approval only, for all of your microservices. How many bits and pieces do you have to change? And then you might want to change it back three days later.
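A hypothetical sketch of why the Cyber Monday switch becomes a one-line change once every service reads the same process definition instead of carrying its own pipeline copy. The `Stage` structure and the `"automatic"`/`"manual"` values are assumptions for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Stage:
    name: str
    approval: str  # "automatic" or "manual" (hypothetical values)


# One process definition shared by every service (hypothetical structure).
shipyard = {
    "staging": Stage("staging", "automatic"),
    "production": Stage("production", "automatic"),
}
services = ["cart", "checkout", "catalog", "payment"]


def approval_plan(stage: str) -> dict[str, str]:
    """Approval mode each service gets in `stage`, derived from the shared definition."""
    return {svc: shipyard[stage].approval for svc in services}


before = approval_plan("production")
shipyard["production"].approval = "manual"  # the single change for the holiday weekend
during = approval_plan("production")
shipyard["production"].approval = "automatic"  # ...and back again three days later
after = approval_plan("production")
print(before, during, after, sep="\n")
```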

12:38 Alois Reitbauer: So the basic idea behind Keptn is providing this flexibility by separating the how from the what. It also adds reusability. When we talk to customers in regulated industries, they might define a hard-coded process that has to be followed, like: this has to be tested this way, there have to be security tests, and so on, but the tools might vary from team to team. So I can define the process independently of the tools that are used and still know that everything follows this process, instead of having to read all of the deployment scripts to understand what they actually do. Because, as I mentioned before, for some customers we work with, this is a big challenge: the script is written rather quickly, but somebody has to review those scripts to check whether they fulfill all these requirements.
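A minimal sketch, under assumed names, of auditing a tool-independent process instead of reading deployment scripts: the required task types, the process, and the team-to-tool bindings below are all hypothetical, and the tool names are placeholders.

```python
from __future__ import annotations

# Hypothetical compliance rule: every delivery process must contain these task types.
REQUIRED_TASKS = {"deployment", "functional-test", "security-test", "evaluation"}

# The process is defined once, independent of any tool.
PROCESS = ["deployment", "functional-test", "security-test", "evaluation", "release"]

# Each team binds task types to its own tools (illustrative names only).
TEAM_TOOLS = {
    "team-a": {"deployment": "helm", "functional-test": "jmeter", "security-test": "scanner-a",
               "evaluation": "quality-gate", "release": "helm"},
    "team-b": {"deployment": "argo-cd", "functional-test": "k6",
               "evaluation": "quality-gate", "release": "argo-cd"},
}


def audit(process: list[str], tools: dict[str, str]) -> list[str]:
    """Check the process and a team's bindings without reading any deployment script."""
    problems = [f"missing required task: {t}" for t in sorted(REQUIRED_TASKS - set(process))]
    problems += [f"no tool bound to task: {t}" for t in process if t not in tools]
    return problems


for team, tools in TEAM_TOOLS.items():
    print(team, audit(PROCESS, tools) or "ok")
```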

13:17 Alois Reitbauer: And the second part is that we said, "Well, it's fine that we talk about delivery, which is our day-one operations, but the same thing is actually true for day-two operations. I cannot easily ship operational instructions with my service." So we extended Keptn so that it can ship what we call remediation as code. It's more or less operational instructions, runbooks as code, shipped along with the application. It can react to alerts, whether they come from Prometheus or from other monitoring tools, and automatically check, for the version of the service that you're running, which remediation action you want to trigger. And it's closed-loop remediation: if one action doesn't work, it tries another one, and it automatically rolls back and forward with the changes. Even if you have multiple versions in production, it will pick the right version. And again, the developer never leaves the place where they usually work, which is the code repository. So that's the short rundown of what Keptn is doing.
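The closed loop described above can be sketched as: look up the actions defined for the running version, execute them one at a time, and re-check after each. This is a hypothetical illustration, not Keptn's remediation format; the action names and the simulated environment are assumptions.

```python
from __future__ import annotations

from typing import Callable

# Hypothetical remediation-as-code definitions, versioned with the service.
REMEDIATIONS: dict[tuple[str, str], dict[str, list[str]]] = {
    ("cart", "1.0"): {"response_time_degradation": ["scale_up", "enable_static_delivery"]},
    ("cart", "1.1"): {"response_time_degradation": ["enable_static_delivery", "rollback"]},
}


def remediate(service: str, version: str, problem: str,
              execute: Callable[[str], None], is_resolved: Callable[[], bool]) -> list[str]:
    """Closed loop: run the actions defined for *this* version one by one,
    re-checking the problem after each, and stop as soon as it is resolved."""
    tried = []
    for action in REMEDIATIONS.get((service, version), {}).get(problem, []):
        execute(action)
        tried.append(action)
        if is_resolved():
            break
    return tried


def simulated_env(fixing_action: str):
    """Demo stand-in for a real system: only `fixing_action` resolves the problem."""
    state = {"resolved": False}

    def execute(action: str) -> None:
        if action == fixing_action:
            state["resolved"] = True

    return execute, lambda: state["resolved"]


print(remediate("cart", "1.1", "response_time_degradation", *simulated_env("rollback")))
print(remediate("cart", "1.0", "response_time_degradation", *simulated_env("scale_up")))
```

If no defined action resolves the problem, the function returns every action it tried, which a caller could treat as the signal to escalate to a human.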

14:08 What is the developer experience with Keptn? What configuration does a developer or operator have to write?

14:08 Daniel Bryant: Yeah, that's perfect, Alois. I appreciate podcasts are not the perfect medium for my next question, but from a pure sort of mechanical point of view, how does this work? For example, I'm a long-term Jenkins user. I have my Jenkinsfile, maybe a Dockerfile, some other stuff like templates. What does a Keptn deployment look like?

14:25 Alois Reitbauer: Keptn itself deploys on Kubernetes, on any distribution that you want to use, all the way down to a really small distribution like k3s. Keptn is based on projects and services, and it's driven either via its API or its CLI. So if you start from scratch, you create a project and the services, and then you link your services to the deployment script repository. We're deliberately doing CD; we're not doing CI. We're not building images. We take the artifacts when they are done, and we look at the deployment scripts.

14:52 Alois Reitbauer: So you use your Helm charts and point at them: "Okay, for this service you're using these Helm charts." And as you create your project, you also define in a shipyard file what your process looks like: "I want to have three stages. This stage works this way. That stage works that way. Here I want a manual approval. Here I want automatic approval."
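A hypothetical, shipyard-style definition for the three-stage process described in this answer, expressed as Python data with a small decision function. The field names are loosely modeled on early Keptn shipyard files but are assumptions here; consult the Keptn documentation for the real schema of a given release.

```python
from __future__ import annotations

# Hypothetical shipyard-style definition (not the authoritative Keptn schema).
SHIPYARD = {
    "dev": {"approval_strategy": {"pass": "automatic", "warning": "automatic"}},
    "staging": {"approval_strategy": {"pass": "automatic", "warning": "manual"}},
    "production": {"approval_strategy": {"pass": "manual", "warning": "manual"}},
}
STAGE_ORDER = ["dev", "staging", "production"]


def decide(target_stage: str, previous_result: str) -> str:
    """Decide whether an artifact that finished the previous stage with
    `previous_result` ("pass", "warning" or "fail") may enter `target_stage`."""
    if previous_result == "fail":
        return "stop"
    mode = SHIPYARD[target_stage]["approval_strategy"][previous_result]
    return "promote" if mode == "automatic" else "wait_for_approval"


for prev, target in zip(STAGE_ORDER, STAGE_ORDER[1:]):
    print(f"{prev} -> {target}:", {r: decide(target, r) for r in ("pass", "warning", "fail")})
```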

15:07 Alois Reitbauer: You have quality gates built in, so quality gating comes out of the box. It's a much more opinionated approach to delivery. It's not like you just have generic tasks; all tasks are typed and all messages are typed. So a delivery is something, a deployment is something that exists, a test is something that exists, and a test needs a validation.
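A quality gate boils down to scoring measured indicators against objectives and mapping the score to a result. The sketch below assumes upper-bound objectives (lower is better) and 90%/75% pass/warning thresholds; these values and metric names are illustrative, similar in spirit to, but not copied from, Keptn's evaluation.

```python
from __future__ import annotations


def evaluate(slis: dict[str, float], objectives: dict[str, float],
             pass_pct: float = 90.0, warning_pct: float = 75.0) -> tuple[float, str]:
    """Score measured indicators against upper-bound objectives (lower is better)
    and map the score to pass / warning / fail. A missing indicator counts as unmet."""
    met = sum(1 for name, limit in objectives.items() if name in slis and slis[name] <= limit)
    score = 100.0 * met / len(objectives)
    if score >= pass_pct:
        return score, "pass"
    if score >= warning_pct:
        return score, "warning"
    return score, "fail"


objectives = {"response_time_p95_ms": 500, "error_rate_pct": 1.0, "cpu_pct": 80, "throughput_drop_pct": 10}
measured = {"response_time_p95_ms": 420, "error_rate_pct": 0.4, "cpu_pct": 85, "throughput_drop_pct": 3}
print(evaluate(measured, objectives))  # 3 of 4 objectives met
```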

15:28 Alois Reitbauer: So this goes beyond the generic "thing doer" mentality. It's maybe a bit more limiting in what it does, but it also provides more structure. And that's also why part of Keptn is actually the specification of these events, which are CloudEvents. People on our team are also working with the CDF folks on standardizing these events and coordinating there. So you create the shipyard file, and then you just install your services. If you just want to run on Kubernetes, you have batteries included: there are services that do Helm-based deployment, and Keptn also automatically does the rewrite for you, so deployments are multi-stage aware. If you say, "I want to have four stages," it will take that service's Helm chart and build out the individual stages for you, so you don't have to manage this.
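Because the events are CloudEvents, a typed task message is just a small JSON envelope. CloudEvents 1.0 requires the `specversion`, `id`, `source`, and `type` attributes; the `sh.keptn.event.<task>.<phase>` naming and the `example/control-plane` source below are used for illustration only, so check the Keptn specification for the exact event types of a given release.

```python
from __future__ import annotations

import json
import uuid
from datetime import datetime, timezone

REQUIRED_ATTRIBUTES = {"specversion", "id", "source", "type"}  # CloudEvents 1.0


def make_event(task: str, phase: str, source: str, data: dict) -> dict:
    """Build a CloudEvents-style envelope for a typed delivery task.
    The 'sh.keptn.event.<task>.<phase>' naming is illustrative only."""
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": f"sh.keptn.event.{task}.{phase}",
        "time": datetime.now(timezone.utc).isoformat(),
        "datacontenttype": "application/json",
        "data": data,
    }


def parse_type(event: dict) -> tuple[str, str]:
    """Reject envelopes missing required attributes; return (task, phase)."""
    missing = REQUIRED_ATTRIBUTES - set(event)
    if missing:
        raise ValueError(f"missing CloudEvents attributes: {sorted(missing)}")
    *_, task, phase = event["type"].split(".")
    return task, phase


event = make_event("deployment", "triggered", "example/control-plane",
                   {"project": "sockshop", "service": "carts", "stage": "staging"})
wire = json.dumps(event)
print(parse_type(json.loads(wire)))
```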

16:08 Alois Reitbauer: So for a developer, the delivery workflow is pretty transparent. You don't even know whether it's three or four stages, and you shouldn't necessarily have to care about it; Keptn builds it out. And you can obviously modify it in a GitOps-type fashion. As it comes with batteries included, on Kubernetes, if you upload the Helm files for your service, it will deploy it into multiple stages, run your tests, and check deployments for you. But as the services are exchangeable by definition, you could put in something like Argo, or something like Flux, if that is what you want to use.

16:37 Alois Reitbauer: So the idea was for Keptn to be the control plane rather than the solution. That was the design driver behind it. When we looked at that whole problem, where we want to ship something from dev to production with a set of different tools that do different things and need some kind of orchestration on top, we realized that we have built this before in this industry, and it scales pretty well: it's the core architecture of an SDN, a software-defined network. We move an artifact from east to west, from dev to production. We use the data plane, which in our case is all the delivery tools we're using; it might be a delivery tool, it might be a testing tool. Then we have the control plane. And on top, we have the application plane, which in our case is the definition of what I actually want to happen.
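The three-plane analogy can be sketched as: an application plane that states intent without tool names, a control plane that walks the intent, and an exchangeable data plane behind a small interface. All class names below are hypothetical; real adapters would call the respective tool's API.

```python
from __future__ import annotations

from typing import Protocol


class Deployer(Protocol):
    """Data-plane contract: anything that can deploy a service into a stage."""
    def deploy(self, service: str, stage: str) -> str: ...


class HelmStyleDeployer:
    """Hypothetical adapter; a real one would call the tool's API or CLI."""
    def deploy(self, service: str, stage: str) -> str:
        return f"helm-style release of {service} in {stage}"


class GitOpsStyleDeployer:
    """Hypothetical adapter that would commit desired state to a Git repo."""
    def deploy(self, service: str, stage: str) -> str:
        return f"commit {service} manifest to {stage} branch"


# Application plane: what should happen, with no tool names in it.
DESIRED = {"service": "cart", "stages": ["dev", "staging", "production"]}


def control_plane(desired: dict, deployer: Deployer) -> list[str]:
    """Control plane: walk the desired stages and delegate each step to the data plane."""
    return [deployer.deploy(desired["service"], stage) for stage in desired["stages"]]


print(control_plane(DESIRED, HelmStyleDeployer()))
print(control_plane(DESIRED, GitOpsStyleDeployer()))  # swap the tool, keep the process
```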

17:21 Alois Reitbauer: So on a conceptual level, this is pretty much what we're trying to do here. These concepts have served us pretty well, so why don't we just build something that works similarly? And a lot of these services existing in Keptn, while they can be as complex as you want them to be, might just be an API call, for example to Argo. Or they might just write to a different Git repo to do certain things. That's also why we have a separate specification for the events: eventually we hope that we can agree more and more on what these events look like, and then have tools understand these events out of the box.

17:54 Daniel Bryant: Nice interoperability.

17:55 Alois Reitbauer: Yeah, the more successful we are with this approach down the road, the less you might need something like the translation layer in Keptn, which we're totally fine with. Because the big challenge we had was, "Oh, this is the snowflake problem." The more we worked with customers building these automated deployment and remediation environments, the more we thought, at some point we will have built 500 different environments. They all look similar, but they are not. And then people come back to us, it's totally unmanageable, and you have to start all over.

18:22 Alois Reitbauer: And that's also what you see people actually doing today. The testing industry keeps repeating the same pattern with test scripts: people never modify test scripts, they just throw them away and re-record them. And I think the same is happening here. People barely modify those scripts; they just throw stuff away and rebuild it. At some point, that is not a very scalable solution either.

18:41 Alois Reitbauer: What's also in there, and also part of the specifications, is the remediation file, which you can think of as IFTTT for runbooks, shipped alongside your service description. It's like: if you run into a slowdown, switch to static delivery, for example by enabling this feature flag. Scaling up is something that most platforms provide out of the box, but when remediation becomes more application-specific, that's what you ship along in there.

19:05 Alois Reitbauer: And again, everything exists as a specification as well. So even if people don't want to use Keptn, they can use the specifications, or just the ideas, independently.

19:14 The concept of “remediation as code” is interesting! Could Keptn be used to implement solutions in the emerging “AIOps” space?

19:14 Daniel Bryant: I know this is a bit of a buzzword, but I've heard you use it and I've definitely used it myself. It's almost heading towards the AIOps space, right? Artificial intelligence for IT operations. Even though it’s a simple “if, then” type approach, it is adding some intelligence into the pipeline. Would that be fair?

19:28 Alois Reitbauer: Yeah. So AIOps is actually a couple of different things. The first generation of AIOps was a lot of event deduplication, and now we are moving into a second generation where tools do more intelligent automation steps. And it's always funny when you talk to people about self-healing and runbook automation; they say, "Well, self-healing, it sounds like science fiction." And then you show them how it's implemented, and they say, "Well, this is pretty simplistic and pretty straightforward." And you say, "Yeah, but it's still really great. It's saving you hours of work and it just makes you faster than what you're doing today."
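First-generation AIOps event deduplication can be illustrated with a small, hypothetical grouping function: alerts about the same entity and problem type that arrive close together collapse into one group, regardless of which tool raised them. The five-minute window and field names are assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    tool: str         # which monitoring tool raised it
    entity: str       # affected service or component
    kind: str         # problem type
    timestamp: float  # seconds


def deduplicate(alerts: list[Alert], window_s: float = 300.0) -> list[list[Alert]]:
    """Group alerts about the same (entity, kind) that arrive within `window_s`
    of the previous alert in the group, regardless of which tool raised them."""
    groups: list[list[Alert]] = []
    latest: dict[tuple[str, str], int] = {}  # (entity, kind) -> index of its newest group
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.entity, alert.kind)
        idx = latest.get(key)
        if idx is not None and alert.timestamp - groups[idx][-1].timestamp <= window_s:
            groups[idx].append(alert)
        else:
            latest[key] = len(groups)
            groups.append([alert])
    return groups


alerts = [
    Alert("prometheus", "cart", "high_latency", 0),
    Alert("apm", "cart", "high_latency", 30),
    Alert("prometheus", "redis", "cpu_saturation", 40),
    Alert("prometheus", "cart", "high_latency", 1000),
]
for group in deduplicate(alerts):
    print(group[0].entity, group[0].kind, "x", len(group))
```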

20:00 Alois Reitbauer: And especially in a microservices environment, it really pays off, because otherwise you're back to using monolithic runbooks to solve distributed microservice problems. My example is always: okay, so we have this microservice. We have too high a load, so we switch to static delivery, which is great. But now we're creating more load on the Redis cluster, so we need to scale up the Redis cluster. As we scale up the Redis cluster, we run into CPU limits somewhere else. And so you would have to have these very complex runbooks.

20:30 Alois Reitbauer: So instead of each service deciding on its own what the best way to solve its problem is, we try to solve these end-to-end types of processes, and then, like the pipelines, the runbooks become highly complex and hard to manage and maintain. And then I challenge you: what if you're running three different versions of a service in production, each version has a different remediation action, and then you have to roll back? That doesn't work from an operations perspective anymore.
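The cascade in this example (static delivery loads Redis, scaling Redis hits CPU limits) can be handled without one monolithic runbook if each component owns a small rule and follow-up problems flow back in as new events. The rules and the simulated side effects below are hypothetical, purely to show the event-driven shape.

```python
from __future__ import annotations

from collections import deque

# Hypothetical per-component remediation rules: (entity, problem) -> action.
RULES = {
    ("frontend", "high_load"): "enable_static_delivery",
    ("redis", "high_load"): "scale_up_redis",
    ("node-pool", "cpu_limit"): "add_node",
}

# Simulated side effects for the demo: an action may raise a new problem elsewhere.
SIDE_EFFECTS = {
    "enable_static_delivery": ("redis", "high_load"),
    "scale_up_redis": ("node-pool", "cpu_limit"),
}


def handle(initial: tuple[str, str], max_steps: int = 10) -> list[str]:
    """Process problems as independent events; each component's own rule decides
    what to do, and unknown problems are escalated instead of guessed at."""
    queue = deque([initial])
    log: list[str] = []
    while queue and len(log) < max_steps:  # max_steps guards against cycles
        entity, problem = queue.popleft()
        action = RULES.get((entity, problem))
        if action is None:
            log.append(f"{entity}:{problem}:escalate")
            continue
        log.append(f"{entity}:{problem}:{action}")
        follow_up = SIDE_EFFECTS.get(action)
        if follow_up is not None:
            queue.append(follow_up)
    return log


print(handle(("frontend", "high_load")))
```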

20:52 Switching gears to the topic of observability, could you explain what the goals of the OpenTelemetry project are, please?

20:52 Daniel Bryant: That's been a great tour, as I say, of delivering and remediating. I think what bridges the gap between those things is observability. We need to be able to understand what's going on, particularly for end users. I know you are doing a lot of work in the OpenTelemetry space. I was chatting to Rob Skillington from Chronosphere a couple of weeks ago now, and he was talking about OpenMetrics and how that related to Prometheus and so forth. So could you explain to us what OpenTelemetry is, who the target users would be, and how it fits in with OpenMetrics and all these other standards as well, please?

21:20 Alois Reitbauer: It's always very interesting when you ask a vendor that has been in this business for almost 13 years, where a lot of these technologies are proprietary: we actually see this as a great movement per se. I remember our very first conversations in the OpenTelemetry space, which back then was OpenCensus, a Google project, and also OpenTracing. I was talking with Morgan from Google, and I think by the second conversation we were in agreement. If you look at what we are doing in the industry, and now I'm talking mostly about APM providers that have been in this space for a longer period of time, we reverse engineer third-party frameworks, figure out how best to instrument them, put the instrumentation into a separate code stream that we usually add dynamically at runtime, and maintain it along with the core application code. This is what we have been doing for a while. And it's not just one player in the industry doing it; everybody's doing it.

22:13 Alois Reitbauer: Each APM vendor has a way to monitor Apache HttpCore and things like that. At Dynatrace, there is a large number of engineers, close to 100 depending on how I look at it, working just on this. And we said, "Well, couldn't we make it easier for framework vendors? And couldn't we all just agree to do this once and focus on where the real challenges are?" Because as we provide more metrics and collect more and more data, we want to focus most of our efforts on the analytics side and not on the data collection side.

22:44 Alois Reitbauer: So for me, OpenTelemetry, and also OpenMetrics, is more or less a standard for how we collect this data: a way for us to collaborate on something we have to provide to end users anyway, without each of us having to build it individually. And obviously you remove some of the vendor lock-in, because if you understand the OpenMetrics format and the OpenTracing format, you can pick and choose your backend. And we vendors distinguish ourselves on backend technology, and not so much on knowing how to capture all the parameters in the Apache HTTP library module, or things like that.
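The "instrument once, pick your backend" idea can be illustrated with a toy, standard-library-only tracer whose exporter is swappable. This is explicitly NOT the OpenTelemetry API; `Tracer`, `Span`, and both exporters are hypothetical stand-ins chosen so the example runs without extra packages.

```python
from __future__ import annotations

import time
from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator, Protocol


@dataclass
class Span:
    name: str
    attributes: dict = field(default_factory=dict)
    duration_s: float = 0.0


class Exporter(Protocol):
    def export(self, span: Span) -> None: ...


class InMemoryExporter:
    """Stand-in for one backend: keeps spans in a list."""
    def __init__(self) -> None:
        self.spans: list[Span] = []

    def export(self, span: Span) -> None:
        self.spans.append(span)


class ConsoleExporter:
    """Stand-in for another backend: prints spans."""
    def export(self, span: Span) -> None:
        print(f"{span.name} {span.attributes} {span.duration_s:.6f}s")


class Tracer:
    """Toy vendor-neutral tracer (NOT the OpenTelemetry API)."""
    def __init__(self, exporter: Exporter) -> None:
        self.exporter = exporter

    @contextmanager
    def span(self, name: str, **attributes: object) -> Iterator[Span]:
        current = Span(name, dict(attributes))
        start = time.perf_counter()
        try:
            yield current
        finally:
            current.duration_s = time.perf_counter() - start
            self.exporter.export(current)


def fetch_cart(tracer: Tracer, user_id: str) -> int:
    """Instrumented once; which backend receives the span is the caller's choice."""
    with tracer.span("fetch_cart", user_id=user_id) as span:
        item_count = 3  # placeholder for real work
        span.attributes["items"] = item_count
        return item_count


fetch_cart(Tracer(ConsoleExporter()), "u1")
memory = InMemoryExporter()
fetch_cart(Tracer(memory), "u1")
```

In the actual OpenTelemetry Python API, the rough equivalents are `trace.get_tracer(__name__)` and `tracer.start_as_current_span(...)`, with the backend selected by configuring an exporter in the SDK rather than in application code.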

23:17 Alois Reitbauer: So for me, the primary collaboration is happening on the vendor or tool provider side right now. There are use cases for end users, no doubt; especially if you want to collect specific business metrics, you should be doing it. But I think as an end user, or somebody who writes actual application code, you should not have to write instrumentation for basic framework code; a lot of that you should get out of the box. And it's actually not that easy. Capturing values is pretty easy, but getting full instrumentation scalable and right is, from our experience, not that straightforward. So it's great that we can collaborate on this as an industry. That's also why we invested there, as it makes our life easier at the end of the day.

23:55 Alois Reitbauer: And I think it's a great movement in our industry when we have agreement on something. And I think eventually application developers will profit from this because they get a more unified user experience.

24:05 How does application performance monitoring (APM) relate to observability?

24:05 Daniel Bryant: Very nice, very nice. I heard you mention APM a couple of times there, Alois: application performance monitoring, I believe. How does that relate to observability? Because I've seen them almost as two camps up until this point in time: there are the APM vendors and the observability vendors. Do you see some harmony coming together, particularly around OpenTelemetry?

24:20 Alois Reitbauer: I think we're all trying to solve the same problems. Observability is obviously very much driven by a number of well-known people, and they talk about a problem that I think is well understood; it's also a bit about democratizing data. Historically, you had a dashboard, and if what you wanted to see wasn't on that dashboard, you just couldn't see it. It was also very hard, if you were a developer and wanted access to certain metrics, to get this data from production. So with this classic monitoring, you as a developer get what you get, if you get it at all, if you even have production access. Observability says: okay, let the developer who knows the service best decide what they want to see, and then show them the data they need so that they really understand how that code works.

25:03 Alois Reitbauer: And so sometimes people say, "Well, the APM providers don't provide this slicing and dicing on top of it." But observability is not just the data collection. We can talk about traces, metrics, and logs, but there's also behavioral data, and on top of that, topology data, in our view. At the end of the day, it's about what I can do with that data. What information can I get out of it? How many questions can I ask a system, and what do I get out of the box?

25:29 Alois Reitbauer: So I think that's where the industry is moving. I don't think observability was really addressing this pain a lot of people had from the bad monitoring systems they had in place. And microservices also made it harder. Because in the old days, I remember having those conversations a long time back: "Why do we even need a tracing solution? We know if it's not the front end, it's the middle tier, and otherwise it's the database." Usually everybody told you it was the database, until they realized it was the way the application used the database that was causing the problem.

25:58 Alois Reitbauer: So I think the whole observability trend is: yes, we have to give the people who need to work with the data the ability to collect the data they want, which also means giving them access. That's all very important, and I don't think it's talked about often enough. It's great if I, as a developer, can put in data and metrics collection points. But if I have no access to the data from production, what should I do?

26:20 Alois Reitbauer: I remember having these conversations very often with people with a more traditional ops mindset: "The developers don't care about all the problems we see here in production." And the developers say, "Yeah, but if I'm a developer here, how can I get access to this production data, like logs and all the other data?" It's crazy that they can't get access to this. How do you expect me to care about something I don't even know about? I don't even know that you're running into this problem, so how can I care about it? I can't. This is more of an organizational issue. It's like saying developers don't care how hard this stuff is to keep performing in production, when you never told them. You might blame them over lunch, but you never gave them any means to solve the problem. I think that's another massive pain.

27:02 Alois Reitbauer: That's why I honestly see observability less as a technology advancement and more as a different approach to doing things. Some people might disagree and see it differently. But for me, it's almost like the way we talk about DevOps. You know that joke? DevOps is not a team; DevOps is a set of practices and methodologies for doing certain things. I think we should maybe talk about observability the same way, because it doesn't help to give people OpenTelemetry and Prometheus if they then can't access the data in production.

27:30 Alois Reitbauer: Yes, this is hard, because there is something like GDPR. They might not have access to all the log data. Yes, people have to put effort into this. It is a different culture. Oh, and by the way, if people spend time looking at production data, they are not writing new features. That's also something you have to account for.

27:46 Alois Reitbauer: That's why I think there's a lot of cultural change in there as well. Especially if you look at some of the conversations now happening in the observability space, a lot of it is culture related. And I think there's really an analogy to the DevOps space. It's not what the inventors of DevOps say, but what a lot of people were doing at the beginning of DevOps was a lot of "as code", which was not the intention. There were always organizational ideas there, but as the space matured, people moved more toward, "Hey, actually we have to solve a much bigger problem here."

28:17 Alois Reitbauer: And I've seen this in the monitoring space over and over again: the hardest bits and pieces were establishing trust between the teams, giving people access to data, and freeing them up. Because I keep telling people, "You can't just give people more responsibility and expect them to do it in the same amount of time." That just doesn't work. That's just the basic law of how work-life balance works.

28:39 Alois Reitbauer: So I mean, that might not be the exact answer you were looking for, but that's my perception. I think it's great that we talk about it more, and that we're being a bit more frank that there's buy-in for observability. But I think the harder conversation, once people have all the data, is that we need time for it. We need a culture around doing things. That would be a more interesting discussion.

28:58 Wrapping up, how can end-users get involved with the future of app delivery on Kubernetes?

28:58 Daniel Bryant: Very nice. Yeah. So how can end users get involved with the future of app delivery on Kubernetes?

29:03 Alois Reitbauer: So what I would really encourage people to do, if they're really interested in application delivery, especially on Kubernetes (and I'm talking with my SIG hat on right now), is to engage with these communities. What I can tell you from CNCF engagements is that there is an end-user community, but there are also all of the SIGs, where we have a lot of people who want to hear about real-world problems. Yes, you will hear from a lot of people from projects and vendors in there, but we always look for input from end users.

29:33 Alois Reitbauer: Like right now on air-gapped environments, for example: tell us what the challenges are. That's usually how the best work comes along. I'm a strong believer in that, also coming from a commercial software company with a product mindset: for me, the best thing is always to listen to real-world problems and then work with how people think about solutions. I sometimes have the feeling that people don't want to engage because they think, "Well, I'm not that far in my journey. I'm not so experienced. I'm just starting with the cloud native world and maybe I'm asking a stupid question."

30:06 Alois Reitbauer: So my point is, if we as a community can't give you the answers to get started and to feel like you're part of that community, then we have to do better, and maybe provide better materials and help you accelerate faster. So don't be afraid to engage in these communities, especially if you're moving into this space. Really, be there and ask questions, and you will notice that people won't say, "Oh, this person doesn't really know how this and this works." No, they will say, "Oh, that's interesting, I never thought about it that way. Oh yeah, that might actually be hard. Hmm, we have never done this."

30:35 Alois Reitbauer: Okay, that's really my encouraging statement here from an App Delivery SIG chair perspective: try to engage there and ask questions. There are no stupid questions, as people like to say, only really hard ones. I understand that as people now move from the pet-project phase to real-world application deployment, they keep asking these questions. Use these forums and these types of engagements as much as you can, and really don't be afraid.

31:01 How can listeners contact you?

31:01 Daniel Bryant: Awesome stuff, Alois. Awesome stuff. If folks want to get in contact with you, what's the best way: Twitter, LinkedIn, email?

31:07 Alois Reitbauer: Twitter: @aloisreitbauer. So just my Twitter handle.

31:10 Daniel Bryant: Well, thanks for your time today.

31:11 Alois Reitbauer: Thank you.

