Ephemeral Execution is the Future of Computing, but What about the Data?

Summary

Jerop Kipruto and Christie Warwick use Tekton to explore challenges of data gravity in ephemeral execution, discussing clean container injection mechanisms and a secure server interface.

Bio

Jerop Kipruto is Senior Software Engineer @Google. Christie Warwick is Software Engineer @Google.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Kipruto: My name is Jerop Kipruto. I'm a software engineer at Google. I'm also a maintainer of the Tekton project.

Warwick: I'm Christie Warwick. I'm also a software engineer at Google. I also work on Tekton. I am the author of the book, Grokking Continuous Delivery. Why is this topic important to cloud native? When we talk about cloud native, we're often talking about containers, but the cloud is much broader than that. There's a joke that the cloud is somebody else's computer. There's a lot of truth to that. One of the great things about using somebody else's computer is that you only have to pay for the compute that you're actually using. A lot of what we're talking about when we're talking about the cloud, is running these ephemeral processes that only exist while they're doing what they need to do, and then they're destroyed afterward. When you only pay for the compute you're actually using, it can be way cheaper and easier than buying and maintaining your own hardware.

Kipruto: There's a catch: data is heavy. Ephemeral processes are prone to data gravity. They're meant to be easily portable, but they can get bogged down when handling data-intensive workflows. For some context, data gravity is the tendency of data to attract more data, making it difficult to move the data to the processes that need it. How much data do you have, and how heavy is it? If you're just starting out, consider data early in the design phases of your ephemeral execution models. If you're further along, inspect your system's scalability with larger datasets, and determine whether you need to make changes sooner rather than later. If you consider how to access and use data early, your ephemeral execution models will scale effectively as your business grows. In this talk, we'll introduce ephemeral execution. Then we'll look at a case study based on Tekton. Lastly, we'll discuss data gravity in workflows.

A Brief History of the Cloud

We'll start with ephemeral execution. First, how did we get here? Let's take a brief look at the history of cloud computing. In the 1950s, we had the on-premises mainframes. Then the 1960s was the foundation era, where Professor McCarthy proposed utility computing. Then the 1970s to the 1980s was the melting pot, when hundreds of thousands of computers were connected to the internet. Fast forward to the 1990s, there were early mentions of cloud, including one in an internal document at Compaq. Then the 2000s was the inception era, where AWS launched the public cloud in 2006. The 2010s was the acceleration period, where container standards were established, Kubernetes was released, cloud native became increasingly popular, and CNCF grew with a thriving ecosystem. I think you already had a sneak peek at the 2020s, which is the adoption period, where there is widespread enterprise adoption of the cloud. As we see in this timeline, cloud computing has come a long way in a relatively short period of time. We expect more disruption in the years to come, and a key piece of this future is ephemeral execution.

Ephemeral Execution

Warwick: What are we talking about when we use the term ephemeral? The word actually comes from the Greek goddess of the day, Hemera. The term comes from a medical context initially, where it meant a fever or an illness that only lasts for a day, like food poisoning. The scope eventually grew to encompass plants and insects that have short lifespans, and now we use it for anything that's fleeting, or temporary. When we talk about ephemeral execution, we're also talking about the lifespan of the processes. Ephemeral processes have a short lifespan. How short? It depends. The key is that they're created to do whatever they need to do and then destroyed when they're done. You can contrast that with more traditional processes you might be used to, that are just running and waiting for requests. If they didn't get any requests, they would just be idling. Let's look at some examples of ephemeral processes. One type is serverless functions. Serverless functions are invoked in response to some event. Then when they're done, they're done, making them ephemeral. Edge functions are a type of serverless function. Even though the cloud is broader than just containers, they are a key piece of it. Since containers provide a standard way to package software together with its dependencies, they're a really natural fit for ephemeral execution, especially since when they run, they use so much of the operating system that's on the underlying machine that they're usually much faster to start and tear down than if you were using a VM. They're a really nice fit. Individually, serverless functions and containers can only do so much. They really shine when they're combined together into workflows. Workflows bring together these individual units and orchestrate them to do something bigger than just one would do on its own. They fit naturally into a couple of domains. One of them is continuous delivery, which is very near and dear to Jerop and me, which is automating the processes for safe and quick releasing of software to users. Then another one is machine learning, where you can use workflows to test and tune and optimize ML models. In this talk, we're going to be focusing on workflows.

Tekton

Kipruto: Let's look at Tekton as a case study of handling data in workflows. You may ask, what is Tekton? Tekton is a workflow orchestration engine that executes user supplied workloads as containers on Kubernetes. Tekton provides a serverless abstraction on Kubernetes in that it creates resources solely for given requests. It's often used for CI/CD, but also for machine learning operations.

Warwick: In the demo that we're showing you, we're going to be using an app that displays the Purr Programmers of Tekton. These are the mascots for our releases and are valued contributors. Unfortunately, though, this is the last time that you'll be seeing the cats because we're not actually running the application, what we want to do is create a workflow that will clone the source code for the app and then build and push an image that runs it.

Kipruto: The first thing we look at is the task that we're going to use to fetch the source code into a workspace. Let's go through the steps. We have the API version, because we use KRM and we create a custom resource definition in Kubernetes, so the API version is v1beta1. We specify the kind, it's a task. In the metadata, we're going to specify the name, git. Then in the specification, we're going to have a workspace named source. Then a parameter, repository, just pointing to the repository where we're hosting this application. In the steps, we're only going to have one step, clone, taking an image and a script. The script is too long to add here, but you'll actually see it in the demo. Next, we're going to look at a kaniko task, which is used to build and push the image. Similarly, we're going to have an API version and kind, task. In the metadata we're going to specify the name kaniko. In the spec, we're going to have a workspace, source, and a parameter, image, which is just a name or reference to the image that we're going to build. In the steps, we're going to have a build step and a push step. For some context, kaniko is a tool that's used to build container images using a Dockerfile without needing a Docker daemon.
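The two task definitions just described can be sketched as plain data. This is a minimal sketch and not the Tekton catalog versions: the container images, the script bodies, and the helper name `make_task` are assumptions for illustration, since the talk elides them. Only the overall shape (API version, kind, name, one workspace, one parameter, named steps) comes from the description above.

```python
import json


def make_task(name, params, steps):
    """Build a Tekton v1beta1 Task manifest as a plain dict."""
    return {
        "apiVersion": "tekton.dev/v1beta1",
        "kind": "Task",
        "metadata": {"name": name},
        "spec": {
            "workspaces": [{"name": "source"}],
            "params": [{"name": p, "type": "string"} for p in params],
            "steps": steps,
        },
    }


# Images and scripts below are placeholders (assumptions), not the real ones.
git_task = make_task(
    "git",
    ["repository"],
    [
        {
            "name": "clone",
            "image": "example.com/git-image",
            "script": "git clone $(params.repository) $(workspaces.source.path)",
        }
    ],
)

kaniko_task = make_task(
    "kaniko",
    ["image"],
    [
        {"name": "build", "image": "example.com/kaniko-image", "script": "# build (elided)"},
        {"name": "push", "image": "example.com/kaniko-image", "script": "# push (elided)"},
    ],
)

if __name__ == "__main__":
    # Print each manifest as JSON for inspection.
    for task in (git_task, kaniko_task):
        print(json.dumps(task, indent=2))
```

Both tasks declare the same workspace name, source, which is what later lets a pipeline or a combined task bind them to the same storage.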

Warwick: This is the ideal that we would like to achieve. We want to have these two separate tasks which are really well factored and each do one thing well, where one can clone from git and one can build and push with kaniko. We want them to be able to use data in the same local context. If they could do this, this makes it really easy to share the data between the tasks. It gives us performance benefits, because the data is right there, and we don't have to move it around. As long as you're not running anything with privileged execution, it's isolated from other workloads, because the data only exists in the context of the pod.

Kipruto: Unfortunately, the only way to combine tasks in Tekton today is through pipelines, which are directed acyclic graphs where each node is a task. Let's look at an example, the one that will combine the git and the kaniko tasks. We'll have to specify the API version again, but this time, for kind we'll specify pipeline, and then for the metadata, we'll have the name git-kaniko-pipeline. In the spec, we'll have the workspace, source, and parameters, two of them this time around, the repository and the image. For the tasks, we'll start with the git task, and then the workspace, source, the parameter, repository, which will be passed in from the pipeline level, and a reference to the task that we looked at named git. Next, we'll have the kaniko task, which will run after the git task. This is where we build the graph or the sequence of the tasks that are going to execute. Then we have the workspace, source, and the parameter, image, which is passed in from the pipeline level. Lastly, we need to have a reference to the task that we looked at previously, or the task definition.
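The pipeline described above can be sketched the same way, along with how the runAfter field turns the task list into an execution order. This is a minimal sketch under assumptions: the function name `execution_order` is hypothetical, and the ordering is computed with Python's standard `graphlib` rather than by Tekton's own scheduler.

```python
from graphlib import TopologicalSorter

pipeline = {
    "apiVersion": "tekton.dev/v1beta1",
    "kind": "Pipeline",
    "metadata": {"name": "git-kaniko-pipeline"},
    "spec": {
        "workspaces": [{"name": "source"}],
        "params": [{"name": "repository"}, {"name": "image"}],
        "tasks": [
            {
                "name": "git",
                "taskRef": {"name": "git"},
                "workspaces": [{"name": "source", "workspace": "source"}],
                # Value is passed in from the pipeline-level parameter.
                "params": [{"name": "repository", "value": "$(params.repository)"}],
            },
            {
                "name": "kaniko",
                "runAfter": ["git"],  # this edge is what builds the graph
                "taskRef": {"name": "kaniko"},
                "workspaces": [{"name": "source", "workspace": "source"}],
                "params": [{"name": "image", "value": "$(params.image)"}],
            },
        ],
    },
}


def execution_order(pipeline):
    """Return task names in an order that respects every runAfter edge."""
    graph = {
        task["name"]: set(task.get("runAfter", []))
        for task in pipeline["spec"]["tasks"]
    }
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(execution_order(pipeline))
```

Because the graph must be acyclic, a runAfter cycle would make `static_order` raise `graphlib.CycleError`, which mirrors why pipelines are restricted to directed acyclic graphs.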

At runtime, a pipeline is executed using a PipelineRun, which is used to provide the resources that the pipeline uses. We have this knit-tight solution between authoring time and runtime. Let's look at the PipelineRun definition. It's still a CRD, so API version, and kind, PipelineRun. In the metadata, we'll have generateName. The reason why we generate names is so that we can have a unique name for each instance or execution of this PipelineRun. Then in the spec, we have a service account name for auth. In the workspace, we have a volumeClaimTemplate. The volumeClaimTemplate, for some context, is used to provision a persistent volume that will be used to share data between the git and the kaniko tasks. In the parameter repository, we're going to specify the application that we're going to use, the cat service for the Purr programmers. For the image, I'm just going to pass in the name of the image that I want to build. Finally, we're going to have a reference to the pipeline definition that we just looked at on the previous slide, named git-kaniko-pipeline.
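A PipelineRun along these lines might look as follows. This is a sketch under assumptions: the service account name, repository URL, image name, access mode, and storage size are placeholders that do not come from the talk, and `make_pipeline_run` is a hypothetical helper.

```python
def make_pipeline_run(repository, image, storage="1Gi"):
    """Build a v1beta1 PipelineRun that shares data through a persistent volume."""
    return {
        "apiVersion": "tekton.dev/v1beta1",
        "kind": "PipelineRun",
        # generateName (not name) so every execution gets a unique name.
        "metadata": {"generateName": "git-kaniko-pipeline-run-"},
        "spec": {
            "serviceAccountName": "build-bot",  # placeholder, used for auth
            "pipelineRef": {"name": "git-kaniko-pipeline"},
            "workspaces": [
                {
                    "name": "source",
                    # A persistent volume claim is provisioned from this template.
                    "volumeClaimTemplate": {
                        "spec": {
                            "accessModes": ["ReadWriteOnce"],
                            "resources": {"requests": {"storage": storage}},
                        }
                    },
                }
            ],
            "params": [
                {"name": "repository", "value": repository},
                {"name": "image", "value": image},
            ],
        },
    }


pipeline_run = make_pipeline_run(
    "https://example.com/cat-service.git",  # placeholder repository
    "example.com/cat-service",  # placeholder image name
)
```

Since the manifest uses generateName, it has to be submitted with a create operation (for example `kubectl create -f`) instead of `kubectl apply`.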

Warwick: This is the reality of what actually executes. Each of the tasks is executed as a pod in Kubernetes. The sequential steps in the task become sequential containers in the pod. Within the context of that pod, the steps have access to the same disk, and they have access to environment variables, but the tasks themselves are in separate pods. Any data that is shared between them has to be moved somehow. We use a persistent volume to do that. Let's see this in a demo.

Kipruto: We're now looking at the Purr programmers' application in that repo. This is the git task with all the details fleshed out. It's also available in the Tekton catalog if you'd like to use it. We also have the kaniko task, exactly the same definition just with more details. Next is the pipeline that brings together the git task and the kaniko task. Then last, we see the PipelineRun definition. You can see the volume details are more fleshed out. Then we have the same repository and image name. We're going to start by installing Tekton to the cluster. What's happening here is that we're actually installing Tekton pipelines, which is the core component of Tekton. Then we have other components as well that complement these, like triggers for starting up PipelineRuns based on events. We have chains for software supply chain security, and many others. I'll stop there. Now that Tekton is installed, we're getting the pods, and we can see that there is a controller and a webhook running at this point in our cluster.

Let's go back to our application. We'll start by applying the git task to our cluster. Then next, we'll apply the kaniko task. Now we're going to list the tasks and see that both of them were installed in our cluster a few seconds ago. The next step is to apply the pipeline to the cluster. That's happening now. Then we can now execute using the PipelineRun to create an instance of the PipelineRun, and there's a unique name and a new PipelineRun created. We can observe the logs as it executes. What we're expecting to see happening now is the source code being fetched from the git repository into the persistent volume, and then the volume would be moved over so it can be used by the kaniko task to build and push. Git clone has happened, we can see the result SHA, the commit that was fetched, then kaniko build is executing now. It's running some commands, taking snapshots. It's continuing to execute now, taking a snapshot of the full filesystem. Next, I'm expecting the kaniko push, and then it's over. Let's check the status of the PipelineRun to observe the execution and see exactly what happened. We can see that it was successful. We can see the workspace that was used and the parameters, and the task runs that executed. Lastly, we can see that there were two pods that executed successfully.

Warwick: Let's dig in to the status of that PipelineRun a bit. The whole thing took a minute to execute. You can see that the parameters for repository and image were the ones we provided. Then for the workspace, you can see it was bound to a volumeClaimTemplate, which Tekton would use to instantiate a persistent volume claim. There are two task runs. The first one was the one that ran the git task. It took 18 seconds, and it grabbed the source from the repository and then put it on to the persistent volume. Then the kaniko task run, grabbed that source, built an image and pushed it, and it took 42 seconds. In practice, we have a git task that writes to this persistent volume outside of the pod. We have the kaniko task which reads from that persistent volume. You could think about these as two processes. We have the git process and the kaniko process. Then we use the persistent volume to move data to the processes. We're effectively moving the data to the process. This gives us some challenges. Performance-wise, we're running two separate pods. Each of those has to start up and be torn down. Worse, each of the pods could be scheduled to a different node. In that case, the persistent volume has to actually be detached and reattached to the different nodes. We have problems with isolation because this persistent volume lives in the Kubernetes cluster, so something else could mount that volume and start modifying the data in between. Then, lastly, it's complicated to wire this together, especially for such a simple use case. To double down on the usability problem, we've gotten feedback about this. One of our longtime contributors tweeted after our beta release and said, "I'm really struggling to wrap my head around the Tekton API, and do some pretty simple stuff with it. Am I going to need to start composing tasks via persistent volumes to feed Git into kaniko? Say it isn't so." Unfortunately, Matt, it is.

Kipruto: We've always done things one way. We move the data to the processes. What if we flip the script? What if we move the processes to the data instead? We can do this by combining the tasks that download and upload the data with the tasks that consume and produce the data. For example, we could write a task that brings together the git and the kaniko tasks to execute in one pod. Here is how it would work. We would specify API version and kind, task. Here we'd have the name git-kaniko-task. The spec would specify the workspace, source, and the parameters, repository and image. We need both of them to be passed to the two tasks. The steps would have references to the task definitions that we looked at previously. We would have git, which uses a ref to the task, git, and kaniko, which uses the task, kaniko. This task is executed in a task run. We would specify the task run, and the metadata would have generateName to create a new name for each instance. The spec would provide the service account name and the workspace, source. Notice that this time, we get to use an emptyDir. An emptyDir creates a local disk or a local volume that would be accessible to the pod. The parameter section will specify the repository, which remains exactly the same, and the image name. There's a slight modification in terms of the name, but it's inconsequential. In the taskRef, we would have a reference to the task that we just looked at in the previous slide.
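The combined task and its task run can be sketched as below. Treat the step-level `ref` syntax as hypothetical: it follows the description in the talk of steps that reference other task definitions, and it may not match any released Tekton API. The service account, repository, and image values are placeholders, and `workspace_backing` is a helper invented for this sketch. The emptyDir workspace binding is the part to focus on.

```python
combined_task = {
    "apiVersion": "tekton.dev/v1beta1",
    "kind": "Task",
    "metadata": {"name": "git-kaniko-task"},
    "spec": {
        "workspaces": [{"name": "source"}],
        "params": [{"name": "repository"}, {"name": "image"}],
        # Hypothetical "tasks in tasks" syntax, as described in the talk.
        "steps": [
            {"name": "git", "ref": {"name": "git"}},
            {"name": "kaniko", "ref": {"name": "kaniko"}},
        ],
    },
}

task_run = {
    "apiVersion": "tekton.dev/v1beta1",
    "kind": "TaskRun",
    "metadata": {"generateName": "git-kaniko-task-run-"},
    "spec": {
        "serviceAccountName": "build-bot",  # placeholder
        "taskRef": {"name": "git-kaniko-task"},
        # emptyDir: scratch storage that lives and dies with the pod.
        "workspaces": [{"name": "source", "emptyDir": {}}],
        "params": [
            {"name": "repository", "value": "https://example.com/cat-service.git"},
            {"name": "image", "value": "example.com/cat-service"},
        ],
    },
}


def workspace_backing(run, workspace_name):
    """Report which kind of volume backs a workspace in a run manifest."""
    for ws in run["spec"]["workspaces"]:
        if ws["name"] == workspace_name:
            return next(key for key in ws if key != "name")
    raise KeyError(workspace_name)


if __name__ == "__main__":
    print(workspace_backing(task_run, "source"))
```

Compared with the PipelineRun, the only storage-related change is the workspace binding: a pod-local emptyDir takes the place of a volumeClaimTemplate, so nothing has to be provisioned, attached, or detached.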

Let's see this in action. This is a task that combines the git and the kaniko tasks that we just looked at, which provides the repository and image name and the workspace, source. This is the task run where we get to use the emptyDir and pass in the parameters that are needed. We'll start by installing the tasks that we just looked at to the cluster. Then we can list all the tasks and confirm that they're available there. You can see the previous two from 25 minutes ago and now the new task. Now we can create the task run. Now there's a new instance of that running. We can observe the logs as it executes. The first thing, it's fetching the source code. You can see the SHA that was used, and it's now building the image, taking snapshots with the application, and pushing the image at the very end. Next, we're going to clear the page and then look at the status, and observe what happened. We can see that it was successful. We can see the parameters and the workspace that were used, that there are three steps, git clone, kaniko build, and kaniko push, that were completed, and that only one pod was used.

Warwick: Let's look at the status of the task run. This time, the whole thing took 46 seconds. It's only one data point, but you might remember that the previous demo took 60 seconds, so that suggests that we saved 14 seconds, which is roughly a 23% improvement in performance. You can see again that we have the same parameters. Now, instead of the workspace being bound to a volumeClaimTemplate that got instantiated as a persistent volume, we're just using an emptyDir, so it's just local to the pod. You'll notice that now, instead of having those two task runs, we have three steps. We have the git clone step, which writes the source code to the local disk. Then we have the kaniko steps that build and push the image to the container registry. This is what the user specified. Again, the user still gets to use these separate well-factored tasks where we have one task that knows really well how to clone from Git and one task that knows how to build with kaniko. Then, effectively, this is what actually executed. We basically expanded the steps out of the tasks. The thing on the left is what users have to do today if they want this functionality: they have to actually break down these well-factored tasks, and rewrite mega tasks that do all the things that they want. By changing our approach and bringing the processes to the data, we can maintain that reusability and have these well-factored tasks. This gives us the ideal that we wanted from the beginning. We have two tasks, but all the data that's shared between them can be handled on the local disk.
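The expansion of steps out of the referenced tasks can be illustrated with a small function. This is a sketch of the idea only and not Tekton's implementation: `expand_steps` is a hypothetical name, the step prefixing scheme is an assumption chosen to reproduce the step names seen in the demo, and the task dicts are trimmed to the fields the function needs.

```python
git_task = {
    "metadata": {"name": "git"},
    "spec": {"params": [{"name": "repository"}], "steps": [{"name": "clone"}]},
}
kaniko_task = {
    "metadata": {"name": "kaniko"},
    "spec": {
        "params": [{"name": "image"}],
        "steps": [{"name": "build"}, {"name": "push"}],
    },
}


def expand_steps(name, tasks):
    """Inline the steps of several tasks, in order, into one task (one pod)."""
    params, steps = [], []
    for task in tasks:
        prefix = task["metadata"]["name"]
        for param in task["spec"].get("params", []):
            if param not in params:  # keep each parameter once
                params.append(param)
        for step in task["spec"]["steps"]:
            # Prefix with the task name so step names stay unique.
            steps.append({**step, "name": f"{prefix}-{step['name']}"})
    return {
        "apiVersion": "tekton.dev/v1beta1",
        "kind": "Task",
        "metadata": {"name": name},
        "spec": {
            "workspaces": [{"name": "source"}],
            "params": params,
            "steps": steps,
        },
    }


expanded = expand_steps("git-kaniko-task", [git_task, kaniko_task])

if __name__ == "__main__":
    print([step["name"] for step in expanded["spec"]["steps"]])
```

The authoring units stay separate and reusable, while the executed unit is a single list of steps that share one pod and one local disk.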

Data Gravity in Workflows

Kipruto: Zooming out, let's look at data gravity in workflows. In this case study, we looked at Tekton, which is a workflow orchestration engine that is often used for continuous delivery, but also for machine learning operations. We saw how data gravity affected processes in Tekton by looking at fetching source code and building an image as an example. We walked through a use case where the processes are sequential. We do not address data in parallel execution in this demo. We have use cases for reading data in parallel, such as linting and testing in parallel. However, we don't have use cases for writing in parallel because of the conflicts that could arise. We're planning to look into data in parallel execution in future work, and we'll talk about that. The focus of this talk is sequential workloads, especially downloading and uploading data before and after execution.

In the case study, the data was previously moved to the processes, specifically that the data was available or stored in a persistent volume. We address data gravity through data locality by moving the processes to the data instead. In this case, the data was stored in local disks. This allows us to realize the desired performance, isolation, and usability. In terms of performance, we reduced the runtime from 60 seconds to 46 seconds, and we also reduced the resource utilization from two pods to one pod. For isolation, we have the data only in a local disk, so it's isolated from other workloads that could have mutated it. Lastly, for usability, the tasks that download data execute in a shared context with tasks that consume the data. On the other hand, the tasks that produce data execute in a shared context with tasks that upload the data.
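The performance numbers quoted from the two demos can be checked with a few lines of arithmetic. All inputs come from the talk (18 and 42 seconds for the two task runs, 46 seconds for the single task run, two pods versus one); the variable names are just for this sketch, and a single run of each demo is one data point, not a benchmark.

```python
# Durations reported in the demos, in seconds.
git_task_run = 18
kaniko_task_run = 42
pipeline_run_total = git_task_run + kaniko_task_run  # two pods, persistent volume
single_task_run_total = 46  # one pod, emptyDir

saved_seconds = pipeline_run_total - single_task_run_total
relative_saving = saved_seconds / pipeline_run_total

pods_before, pods_after = 2, 1
pod_reduction = (pods_before - pods_after) / pods_before

if __name__ == "__main__":
    print(f"saved {saved_seconds}s of {pipeline_run_total}s ({relative_saving:.1%})")
    print(f"pods: {pods_before} -> {pods_after} ({pod_reduction:.0%} fewer)")
```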

Next Steps

The next step for this work is to introduce artifacts as a core feature of Tekton. This will provide a clean interface for declaring inputs and outputs, which we realize using tasks in tasks as shown in the demonstration. A key requirement for artifacts is that they generate immutable references, which would usually be a URI and digest of the data consumed or produced. Next, we'll use those immutable references of artifacts to generate provenance, which is critical for software supply chain security. Today, we use type hinting for provenance generation. Users have to set certain suffixes in results, which are the metadata of Tekton resources. This makes provenance generation prone to errors. To make Tekton secure by design, or by default, we plan to generate provenance for artifacts without users needing to configure anything additional. Third, we're also exploring caching as a core feature of Tekton. This will be used to optimize the reuse of data among the tasks. Lastly, we're considering data locality in parallel execution. Today, Tekton positions itself as a CI/CD tool only. However, Tekton is so flexible that we find that it's being used for machine learning operations, and also in other domains like generic workflows. Users who've already invested in Tekton for DevOps may be inclined to continue using Tekton for machine learning operations as well, and whatever other domains where it applies for workflow orchestration. We're considering expanding the scope of Tekton to address those use cases. This may involve bringing data in parallel execution into scope for this work.
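To see why suffix-based hinting is error-prone, consider a simplified matcher. This is a sketch under assumptions and not the Tekton Chains implementation: the suffixes `IMAGE_URL` and `IMAGE_DIGEST` are modeled on its documented convention, while the function name, the result values, and the matching rules are invented for illustration.

```python
URL_SUFFIX = "IMAGE_URL"
DIGEST_SUFFIX = "IMAGE_DIGEST"


def find_image_artifacts(results):
    """Pair up results whose names end in the URL and digest suffixes."""
    urls, digests = {}, {}
    for name, value in results.items():
        if name.endswith(URL_SUFFIX):
            urls[name[: -len(URL_SUFFIX)]] = value
        elif name.endswith(DIGEST_SUFFIX):
            digests[name[: -len(DIGEST_SUFFIX)]] = value
    # Only a complete (URI, digest) pair is an immutable reference.
    return [
        {"uri": urls[prefix], "digest": digests[prefix]}
        for prefix in sorted(urls)
        if prefix in digests
    ]


well_named = {"IMAGE_URL": "example.com/cat-service", "IMAGE_DIGEST": "sha256:0123"}
# One-letter typo in the suffix: URI instead of URL.
misnamed = {"IMAGE_URI": "example.com/cat-service", "IMAGE_DIGEST": "sha256:0123"}

if __name__ == "__main__":
    print(find_image_artifacts(well_named))
    print(find_image_artifacts(misnamed))
```

With the misnamed result, the matcher silently finds no artifact at all, so no provenance would be recorded for the image. A first-class artifact declaration would remove the dependence on naming conventions.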

Warwick: When we talk about the cloud, we're often talking about ephemeral processes like serverless functions and containers. The cloud is becoming more ephemeral. This has a lot of benefits, but it also has catches.

Kipruto: The catch is that data is heavy, as you've seen with Tekton. Data has gravity and can accumulate over time and become heavy. What kind of ephemeral execution model are you designing? If it's a workflow engine like Tekton, instead of moving the data to the processes, consider moving the processes to the data. This will ensure that your ephemeral execution models will scale to handle data intensive workloads and applications.

Next in Cloud Native Development

Warwick: This talk was focused on workflows and continuous delivery. What if you're working on workflows in a different domain or you're not working on workflows at all? If you're using or considering using edge functions, there's Erica's talk about edge functions, Living on the Edge. If you're working on a backend server, then consider watching Paul's talk, Developing Above the Cloud, about Darklang and how Darklang is making software development for the cloud easier. There's also Sergey's talk about the smoothie architecture that it turns out we're all actually building, where we're blending together our business logic with all the stuff you need to do to make the cloud work. If you've got a database involved, which you probably do, then you might enjoy Carl's talk about using local-first techniques to make it so that your applications can work in environments where you have little or maybe even no internet access. A lot of these aren't directly about data, but you're going to find that each of them will touch on data handling in some way, because it's such a core problem in cloud native development.

Questions and Answers

Participant 1: What Tekton does looks very similar to AWS offering with state machines. Have you guys considered adding another abstraction layer on top of that, so you can make your data be multi-cloud?

Warwick: You're describing that Tekton would be executing workloads that would be realized in different clouds, instead of like all in one spot?

Participant 1: Ideally, I want my workload to translate either into Tekton through GCP, or into state machine store on AWS, to have a single source of truth for this is how my CD pipeline works, from your example.

Warwick: You'd have one language definition and then you can translate that into very different modes of execution?

Participant 1: Yes. Another example would be Terraform into CloudFormation or whatever Google uses.

Kipruto: People can use Tekton in whatever platforms they use. It's portable. If you want to use it in OpenShift, or IBM Cloud, or Google Cloud, or AWS, so I think it depends on infrastructure. If it's multi-cloud, maybe you can use it that way.

Warwick: One of the goals that we had was, we would really like for the Tekton API to be a standard across CI/CD systems. One of the dreams was that even though it's executed as pods in Kubernetes right now, that it would theoretically be something else, and it could be backed by other things. One thing that Jerop is actually looking into is maybe changing the syntax a little bit. Most of what we showed isn't that Kubernetes specific, but some of it is. We want to look into making it maybe a bit simpler and a bit more high level, so that it would make sense to translate it into maybe totally different ways of executing. I think it's not something we've done, but it's something that we're thinking about doing.

Kipruto: Specifically, things like the Kubernetes resource model spec, things that are very specific to that area we're considering.

Participant 2: I have some familiarity with the model currently attached to the container, like you're not just parsing into another [inaudible 00:31:30], and stitching things together like that. How are we able to run the process in the same pod? What is the process, is it still a container? I'm curious how that works. Is it like a nested Docker containers or something like that?

Kipruto: Each step in a task executes as a container in the pod.

Warwick: They execute as a container, but we've done some hacks. In Kubernetes a pod has a bunch of containers and they all start simultaneously. We have a binary that we inject into each of them, that makes them wait for their turn to execute, basically. If you have multiple steps, they all start, and then the first one starts executing, and then the next one will be triggered to execute and so on.
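The wait-for-your-turn mechanism can be simulated with threads standing in for containers. This is a toy sketch and not Tekton's injected binary: signaling through marker files in a shared directory is an assumption of the sketch, and `run_step` and `run_pod` are hypothetical names.

```python
import tempfile
import threading
import time
from pathlib import Path


def run_step(name, wait_file, post_file, log, poll=0.01):
    """Wait until the previous step's marker exists, run, then leave a marker."""
    while wait_file is not None and not wait_file.exists():
        time.sleep(poll)
    log.append(name)  # stands in for the step's real work
    post_file.touch()  # signals the next step that it may start


def run_pod(step_names):
    """Start every step at once, as a pod does, and return the run order."""
    log = []
    with tempfile.TemporaryDirectory() as shared_dir:
        markers = [Path(shared_dir) / str(i) for i in range(len(step_names))]
        threads = []
        for i, name in enumerate(step_names):
            wait_file = markers[i - 1] if i > 0 else None
            threads.append(
                threading.Thread(target=run_step, args=(name, wait_file, markers[i], log))
            )
        # Start in reverse to show that start order does not decide run order.
        for thread in reversed(threads):
            thread.start()
        for thread in threads:
            thread.join()
    return log


if __name__ == "__main__":
    print(run_pod(["git-clone", "kaniko-build", "kaniko-push"]))
```

Every step is started up front, yet each one blocks until its predecessor has finished, which is how simultaneously started containers can still behave as sequential steps.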

Participant 2: They're expected to be in the same pod?

Warwick: Yes, exactly. They're each containers inside the same pod. One of the limitations that we have is that from the very beginning a task translated exactly to a pod. A lot of what you saw us doing was us just working around that limitation and trying to find a way to have the nice way of writing these well-factored units, but then combine them together into a pod. Another possible avenue of exploration is breaking down the barrier and decoupling tasks from pods.

Kipruto: When you're making well-factored tasks, you find that they do only one thing really well. When they do that, they tend to only have one step. That's where the challenge arises, because it's only doing one thing, it's only one step. We actually find that in the Tekton catalog where people share resources, 77% of the tasks only have one step because they're meant to do one thing. Then you need to bring these things that each do one thing together in one pod. That's why we're revisiting the model and how it maps to Kubernetes, or the architecture.

Participant 3: If you have an application or system which needs their data in multiple areas, so basically, you're fetching the same data to the different areas. That's an associated process. How do you measure that versus the actual, from the cost perspective, as well as the comm perspective. Of course, you have to replicate the data, so there's cost associated with that, versus the compute cost. How do you guys compare that?

Warwick: I think it depends on what your priorities are. I think we've been optimizing for assuming that you would really only want to pay the cost of fetching the data one time. That's what we've done up until this point. Then, I think you've been talking with, is it IBM, because their priority for something that they're working on is security, they have decided instead to pay the cost of doing the git clone every single time that they're fetching the data, because they'd rather have it be slightly less efficient, but then slightly more secure. Then adding caching in. I'm not really sure how that's going to fit in with this, because if there's going to be a cache, then maybe the persistent volume is coming back. Or we probably have some other place where we're going to put data where we know it'll be immutable, and it's not.

Kipruto: I think the key thing there is software supply chain security and the SLSA requirements, and making sure that there's some isolation or the data cannot be modified by other workloads. Say you fetch source code and then before you build the image, something else could have changed the data. Some organizations pay the cost for making sure that it's local to the processes I need it to accept, when there's actually a really big security breach, but leaning more towards improving the security boundaries. We're planning to build on this and try to figure out some caching mechanism, and maybe the PVCs, but maybe with something added on top of that, to help to meet the requirements.

Participant 4: I'm sure all of us really love writing YAML file, so [inaudible 00:35:34] CDK version plan where I can have all these configurations in my programming language, and either that can be used directly, or that can generate the YAML files that's needed for Tekton.

Kipruto: We have Kubeflow pipelines as an SDK. If you're using machine learning operations, you can use Kubeflow on Tekton. Then it will generate the Tekton resources for you. You'll just write Python and Tekton is handled behind the scenes. I don't know about CI/CD, but we have many vendors that use Tekton underneath.

Warwick: We don't have an official SDK. There have been efforts at different points to bring like DSLs on top of this. At one point, we were talking about having a universal translation mechanism so that anyone could write a DSL and then it would get translated into this. None of it has gone anywhere. We don't have anything at the moment.

Participant 5: You mentioned the use cases are expanding into other areas. At what point do you start potentially duplicating work for extending your system too far, and data localities have been solved with MapReduce systems, and things like that. You have to advise users, like, we love that we have users, but if you're using it for too much of the wrong thing, we're going to be extending it in the wrong direction, potentially.

Kipruto: We're not trying to extend it to like do everything, be a solution that rules them all. I think the idea is that there's nothing inherently CI/CD in Tekton. For the most part, the only thing could be Chains if you argue about that. If you look at Tekton pipelines and tasks and all the constructs, we find that it's generally applicable to workflow orchestration. We're not planning to solve everything in the world. Maybe, for users that we're seeing like IBM, who end up using Tekton for DevOps, and they're heavily invested in the infrastructure, the security, they have everything set up in the organization to use it for continuous delivery and DevOps, then it's really hard for them to consider now going to a completely different stack, that uses another tool for workflow orchestration. It's very compelling for such users to then continue using it for MLOps or other kinds of orchestration on Kubernetes, which Tekton already does pretty well. Even though it was intended for one thing, I think we're finding ourselves needing to respond to our users and what they're asking of us. When they come to the community with certain feature requests, we've, for the most part since 2019, been pushing back and saying, no, we are only CI/CD. I think at this point, given the growth and the direction of the project, we're starting to revisit those conversations, and maybe just finding ways to help them and solve the problems. I don't think at the end of the day that we will try to create full-fledged solutions, maybe we find a way to support, for example, Kubeflow pipelines on Tekton better, as opposed to creating a competitor to Kubeflow. That's something we're still discussing and we'll figure out over time.

Warwick: I think the other thing is that we're finding that the machine learning applications are not hugely different from the CI/CD applications, where it's like there's some data, there's some event from the data, and then there's some operations that you're doing to the data. I think that dramatically expanding the scope into absolutely anything workflow related, maybe we'll do that. I think more likely, it would be just like a gradual expansion into something that's already pretty adjacent. We already have a number of contributors who are using it for MLOps. At the moment when they make feature requests, and they can't explain them in terms of CI/CD use cases, we're like, we can't do that. It would just be a gradual expansion into that.

Kipruto: Also researching this has been really interesting, because I thought that the point of deviation would be S3C stuff, but I found out like, provenance. I thought we have provenance in Tekton Pipelines recently added as a field and I was worried, but then turns out that they have something called lineage graph. There's a bit of an overlap, or at least related components, even when we talk about some of the S3C stuff that we've been focusing on. They also need to prove things and make claims reliably and have attestations.

Recorded at:

Feb 06, 2024