
Success Patterns for Building Cyber-Physical Systems with Agile



Robin Yeman demonstrates that there is a mission imperative to migrate from a phase-gate approach to Scaled Agile to increase safety for stakeholders.


Robin Yeman has expertise spanning over twenty-eight years in software engineering, with a focus on Digital Engineering, DevSecOps, and Agile, building large complex solutions across multiple domains from submarines to satellites. She advocates for continuous learning with multiple certifications including SAFe Fellow, SPCT, CEC, PMP, PMI-ACP, and CSEP.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.


Yeman: My name is Robin Yeman. I am what I call a digital disruptor. I worked as an engineer for over 28 years, building cyber-physical safety critical systems. The majority of my career, I worked at Lockheed Martin. I had the opportunity to build everything, from submarines to satellites, and pretty much everything in between. I currently am working on my PhD in systems engineering. The goal is to deliver a dissertation that says how to build safety critical systems using agile and DevOps.

Let me give you a little background. 21st century development approaches like agile and DevOps have benefited small initiatives. We've seen it in spades. A single team building software, their ability to respond, change, adapt, while maintaining quality has been amazing, and shown time and time again. Now we want to know, what are the benefits if we take those beyond that small team? What if I have many teams? What if I'm building much larger systems? What happens when I'm building cyber-physical safety critical systems? What's the scope that I have to deal with there? Just to get us on the same page and a common language, I'll start by talking a little bit about agile and DevOps. Agile is defined as an iterative and incremental approach to deliver products and capabilities. The goal is to be able to adapt to changing needs. There are other benefits but that's the key goal. DevOps is a set of practices that bring together both development and operations. It shortens system delivery time, while maintaining quality. The thing here is that both of these approaches are leveraged primarily in software, and I believe that they could be used far more widely. I think that we could use agile and DevOps to build the entire safety critical system.

Begin with Why

Why do we want to do that? Let's step back to where agile began in software, in about 2001. There was a group of engineers who were tired of the amount of rework, thought there was a better way, and were tired of how long it took to get capabilities out the door. What they found is that software changes very rapidly. Requirements are never done. They're constantly changing. The environment or the context they lived in is both complicated and complex. Here, you can see that agile works really well in complicated and complex domains. Where waterfall tends to thrive is in that simple domain. I've already built it five times. I know what I'm doing. There's not going to be any change. Over the years, as technology has started to move quicker, cyber-physical systems are experiencing the same level of change that software was back in the early 2000s. The reason to bring agile and DevOps to the cyber-physical system is exactly the reason why we brought it to software in the first place. What's the key motivation here? There's something called VUCA. VUCA stands for volatility, uncertainty, complexity, and ambiguity. That pretty much sums up today's environment. Technology is moving at the speed of light. Last year, nobody could have seen ChatGPT coming, and now everybody knows about ChatGPT. My parents know about ChatGPT. It's crazy. Materials are changing. Our ability to model is changing. Our ability to move things from physical space into cyberspace is changing. What do we do? I think the answer is to move these proven, tried and true approaches that worked great in software, beyond software, to the cyber-physical system.

Considerations for Agile and Cyber-Physical

What are some considerations, things that we have to think about? Not all of these are barriers, but they are things that we have to consider and think about. What lifecycle we're using. Our organizations. How to decompose the system. Our team structure. Constraints of physicality. It's a little different than just 1's and 0's. How we can decrease the cost of learning. Most cyber-physical systems have some degree of compliance and regulatory requirements applied. We'll start at the top.

1. Agile is a Lifecycle

There are two different types of lifecycles. One is empirical, and the other one is predictive. Waterfall is a predictive lifecycle. Basically, one thing goes after the next thing, after the next thing. It's a detailed plan already completed, because we already know all of the steps. We've built it many times. Agile uses an empirical lifecycle. It's based on objective data. I do something, and then I evaluate, reflect on it, and build that into an updated plan. A lot of times people look at agile as just a tool, a tool to build software or a set of practices. Really, all of the things to deliver a system are still required, whether you choose agile or waterfall. I still have requirements. I still have design. I still have to implement. I still have to test and validate my solution. Now I have to put it into operations. For these reasons, I don't really consider agile a tool. I think it's a lifecycle. Once we start looking at it from that perspective, we'll see why we need to take it beyond just software.

2. Organization and Common Language

Organization and common language is a huge issue. Most large organizations have been built up over the years as functional stovepipes, and for good reason. They were optimized for economies of scale, which was excellent during the manufacturing revolution. That's when we needed it. Today, speed of delivery is way more important than economies of scale. It's a different trade, requiring a different organizational structure. Let me tell you about a couple of things needed for speed of delivery. We need to be able to communicate across these stovepiped organizations. One thing you should know is communication follows the org structure. It's as simple as that. Program managers know what's going on in program management. Systems engineers know what's going on in systems. Everybody has to hand off to another organization in order to get something from need to operations. These handoffs are causing us delays. Let's come back to the fact that communication follows the org structure. That means the language follows the org structure. When I talk to program managers, they're referring to things like industry 4.0, or lean thinking. When I start talking to systems engineers, they're focused on systems thinking, systems engineering. When we talk to designers, they're focused on design thinking. If you talk to a hardware person, they're probably telling you about rapid prototyping. Software is very much into discussing things in the agile space because that's where they started. When I talk to testers, they're all about shifting left. Lastly, when we take a look at operations, they're talking about things like ITIL, or IT Infrastructure Library. Here's the thing, each and every one of these methods or approaches is trying to optimize the flow of delivery, from concept to cash. Unfortunately, the language is lost on each other. We're not communicating in the same language.
We're looking at value in a vertical, but in order to deliver capabilities at the speed of relevance, we have to look at them in a horizontal. We have to look at the value. Each one of these elements is valuable in itself, but its real value is in delivering the outcome of the product. That's why changing how we manage and organize our teams is going to be so critical to being successful, to delivering capabilities at the speed of relevance.

3. Decompose by Product, not Function

Another thing that we have to do in order to be able to have cross-functional teams, is we have to make sure that we decompose the product right. Many times, you will see an integrated master schedule, or work breakdown structure. You'll see things like systems engineering, software, hardware, test, it's very functional. We don't want functional if we're trying to deliver by outcome, we want to have outcome-based or product-based decomposition.

Here you can see an example CubeSat. What are some outcomes? What are some products? We've got attitude determination control, thermal determination control, payload, structures. Yes, there are some handoffs, but there's much less than if we're talking about doing all the requirements first, then doing all the design, then doing all the implementation. We have to change how we break down work. What does that look like? If you're familiar with agile, you've probably heard things like epics, and features, and stories, maybe tasks. Product-based decomposition. Here, for example, guidance, navigation, and control for a satellite. The outcome I'm looking at here is to track current location, leverage navigation target data to get me to the next location. That's a big piece of work. It's going to take a lot of elements, and it's going to have to impact not just the software, it's going to impact the firmware and the hardware in order to make it work. Next, we have to decompose that. Typically, in agile, we would call these features. By definition, a feature is something that can be completed in 8 to 12 weeks, less than a quarter. From breaking down this guidance, navigation, and control for the satellite, I may have a path planner, and I might have flight control. These might be big rocks that my teams can complete over the course of less than a quarter. Then they have to decompose that even smaller, because when we talk about agile, frequently, I'm moving things, delivery of capabilities in what we call sprints. Really short, 1 to 2-week time boxes. I've got to break this work down even smaller. I can take this path planner and say, as a satellite, I want to adjust my position to maintain some synchronous orbit. Key thing about a story that's a little bit different than requirements is there's always, who wants it? What's the business benefit? Why do I want to do it? That's how you know you have a story. 
Typically, especially for newer teams, I tend to also have them decompose even further into tasks for the sprint. This works especially well if you have a lot of new people working together, and not everybody has solid longevity and domain knowledge in whatever we're building.
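The decomposition described above (a large outcome, then features, then stories, then tasks) can be sketched as a small data structure. This is only an illustration, not something shown in the talk: the class names, the week estimates, and the task names are all hypothetical, and the story's benefit clause is a paraphrase of the satellite example.

```python
from dataclasses import dataclass, field


@dataclass
class Story:
    role: str      # who wants it
    want: str      # what they want
    benefit: str   # why they want it
    tasks: list[str] = field(default_factory=list)  # optional sprint tasks

    def text(self) -> str:
        return f"As a {self.role}, I want to {self.want} so that {self.benefit}."


@dataclass
class Feature:
    name: str
    weeks: int  # estimated duration (hypothetical numbers below)
    stories: list[Story] = field(default_factory=list)

    def fits_in_quarter(self) -> bool:
        # The talk describes a feature as completable in less than a quarter.
        return self.weeks <= 12


@dataclass
class Epic:
    name: str
    outcome: str
    features: list[Feature] = field(default_factory=list)


story = Story(
    role="satellite",
    want="adjust my position",
    benefit="I maintain my orbit",
    tasks=["model thruster response", "write position-hold test"],  # hypothetical
)
epic = Epic(
    name="Guidance, navigation, and control",
    outcome="Track current location and navigate to the next location",
    features=[
        Feature("Path planner", weeks=10, stories=[story]),
        Feature("Flight control", weeks=12),
    ],
)

print(story.text())
print(all(f.fits_in_quarter() for f in epic.features))
```

Note that nothing in this structure mentions systems engineering, software, hardware, or test: the breakdown is by product outcome, and each level can touch software, firmware, and hardware together.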

4. Less Homogenous Teams

One thing we have to consider when we move beyond software is the fact that the teams are a little less homogenous. Now you're going to say, and I agree with you, that they weren't that homogenous in software. We have a variety of different languages. We have a variety of different operating systems. We've got frontend. We've got backend. We've got real-time embedded. We've got application. It's not quite as homogenous as one might think. Once we get into hardware, we're even less homogenous, we have even wider skill sets. Mechanical engineering, control, power, RF systems, electronics. These are very different skill sets. Over time, the goal is we want to build cross-functional teams. What does that mean? Does that mean everybody on the team has all of these skills? No. That means based upon the product that I'm building, when I put a team together, I have all of those skills. The other thing that I want to focus on is when I put a team together, making sure that they're T shaped. People call them all different kinds of things, I shaped, all different things. The goal is, I have depth in one or more areas, but breadth in others. That allows me to help other people do other types of work. It also allows me to empathize with how they're absorbing my work. If I understand how me developing a system model is going to impact an RF engineer, that's going to potentially impact how I build that model. We want to look beyond.

5. Constraints of Physicality

There are definitely constraints in physicality. Once you bend metal, it's very expensive to change. The cost of making changes increases much more over time with hardware than with software. We got some new tools, we got some new tricks, so we can try. Let's take that cyber-physical system and put it into cyberspace. Let's start reviewing it within the digital environment, using things like simulators, emulators, digital shadows, those are low-fidelity digital twins. Could be something I'm starting with, as I increase fidelity I get to my digital twin. If I talk about a temporal piece, then I get to a digital thread, meaning I typically have one or more digital twins that I can evaluate over time. All of these things I can do in cyberspace, and the benefit is they allow me to get that rapid feedback loop that you can see in software. They also make change much cheaper for me, because I am making it before I bend metal. Once I do make a change, I can do things like additive manufacturing, or 3D printing. In many cases, these are just going to be prototypes. In some cases, the materials have come far enough that they're actually operationally ready, meaning I design it in virtual space, and I quickly print it. Like a photocopier, except for a physical system, which is amazing.

6. Decrease Cost of Learning

One of the things this also does is allow me to make a lot of changes, try a variety of different scenarios, because it decreases the cost of learning. Here you can see an example digital twin. You know you have a digital twin if you have both a physical and a cyber system, and they're connected via the sensors to data and the analytics to the outcome. Basically, I'm using the data that I get from those sensors, and I'm putting it into the twin to try a variety of scenarios that maybe I would not be able to try multiple times in the physical space. This is giving me the ability to experiment and learn faster. These tools just keep decreasing in cost and increasing in fidelity, which is what makes them so attractive right now. In order to do all these things, we have to make sure we integrate these tools, and this is a huge problem. We saw it in the software space, too. You've seen probably a variety of different software development or DevOps pipelines. There are literally hundreds of tools or applications that can go into those pipelines. They're not very good if they can't integrate with each other. Once we move beyond software, to looking at other tools, we've got to extend even more. We've got to make sure that we have those common interfaces, that we can leverage APIs, or a variety of tools. A good one that I've used in the past is Tasktop, which has now been integrated into Planview. There are other similar tools, and you could integrate these yourself. I will tell you that tools are cheaper than people any day of the week and twice on Sunday. If you do have the opportunity and the ability to get a tool that can do the integration for you, you're going to be in better shape because your teams can focus on what you really need them to do, versus making sure these tools talk to each other.
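To make the sensor-to-twin loop concrete, here is a deliberately tiny sketch. It is not a real digital twin: the `ThermalTwin` class, its linear heating model, and every number in it are assumptions made up for illustration. It only shows the pattern of syncing a cyber model from a physical sensor reading and then trying several scenarios cheaply in software.

```python
from dataclasses import dataclass


@dataclass
class ThermalTwin:
    """Toy cyber model of a physical component's temperature (hypothetical)."""

    temperature_c: float = 20.0
    heating_rate_c_per_min: float = 0.5  # assumed model parameter

    def sync(self, sensor_reading_c: float) -> None:
        # Pull the latest sensor data from the physical system into the twin.
        self.temperature_c = sensor_reading_c

    def simulate(self, minutes: int, load_factor: float) -> float:
        # Project temperature under a what-if load without touching hardware.
        return self.temperature_c + self.heating_rate_c_per_min * load_factor * minutes


twin = ThermalTwin()
twin.sync(35.0)  # reading taken from the physical sensor

# Scenarios that would be slow or costly to repeat on the physical system.
scenarios = {"nominal": 1.0, "peak": 2.0, "idle": 0.0}
results = {
    name: twin.simulate(minutes=10, load_factor=load)
    for name, load in scenarios.items()
}
print(results)
```

Each scenario here costs one function call, which is the point about decreasing the cost of learning: the experiment can be repeated as often as needed before anything is built.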

7. Compliance Requirements

Most cyber-physical systems have a compliance or regulatory requirement. NASA has one of the most stringent. Here you can see, for example, if I'm building a rocket or a satellite, I've got to follow the NASA standards. You can see, there is a whole slew of them. None of these standards are a one-liner, they are pages and pages. What do we need to do? The first thing is, start with the constraint. The biggest mistake I see over and over again is that we build the system, and then we try to make it compliant. Begin with the compliance and build the system to the compliance, and do it iteratively and incrementally, in as close a context as you can possibly do. This is going to minimize the amount of rework you have to do, while making sure that we can deliver cyber-physical safety critical systems to the compliance and regulation that we need to.
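One way to picture "begin with the constraint" is to write the compliance rules as executable checks before any design exists, and run every increment against them. The sketch below assumes this approach; the requirement ids, limits, and design fields are invented for illustration and do not come from any NASA standard.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Constraint:
    ident: str                     # hypothetical requirement id
    description: str
    check: Callable[[dict], bool]  # returns True when the design complies


# Written first, before the design (all values are made up).
CONSTRAINTS = [
    Constraint("REQ-001", "Total mass must not exceed 4.0 kg",
               lambda d: d["mass_kg"] <= 4.0),
    Constraint("REQ-002", "Power draw must not exceed 20 W",
               lambda d: d["power_w"] <= 20.0),
]


def violations(design: dict) -> list[str]:
    """Return the ids of every constraint the design increment breaks."""
    return [c.ident for c in CONSTRAINTS if not c.check(design)]


# Each increment is checked as soon as it exists, not at the end.
increment_1 = {"mass_kg": 3.2, "power_w": 24.0}
increment_2 = {"mass_kg": 3.4, "power_w": 18.5}

print(violations(increment_1))
print(violations(increment_2))
```

The first increment fails the power limit right away, so the fix happens while the change is still small, rather than as rework after the system is built.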

Use All the Tools in Value Stream

Next step, make sure you use all of the tools in the value stream. Just like we talked about the organization, and all of the different org structures and the different stovepipes in there, they all have different tools. We need to actually look at the whole thing. Anytime I go from a need to operations, or what I call a concept to cash, I need to do planning. I've got requirements, modeling, potentially virtual reality, digital twins, maybe augmented reality. I've got a software component. We talked about now that I've built the system in cyberspace, I may want to do 3D printing, might want to learn faster, so I'm going to leverage algorithms from AI and ML. In order to do that, I absolutely need good data. I need great data aggregation. Then I need to make sure I'm building in cyber, and take advantage of things like edge computing. Right now, all of these technologies live in different stovepipes. We need to look at all of the tools in our value stream, so that we can, like I said, integrate the tools and integrate the delivery of the system, so that we have complete transparency from, I have a new idea or a new concept, all the way into operations.

Examples for Agile and Hardware

Next, I'm going to talk to you about a couple examples. The first one is a lot of fun. That's going to be, I had the opportunity a couple of times to build the car with Joe Justice. Joe Justice is an agile for hardware guy. He's legendary, and he has been working in agile for hardware space for over a decade, and has had a lot of experience in companies like Tesla. One of the things that he's done in the past is he brings all of the materials to build the car in your space, and you leverage three days, a number of sprints with all your teams to build the car. The cool thing about this is, it really gives you the feel for what it's like to not just build software, but to build a cyber-physical system, within these time constraints with the communications that are needed to make sure everything's integrated. It's a lot of fun. Like I said, you really get the gist, that all of a sudden work is work, and how we implement work isn't specific to waterfall, or agile, or whatever. We can implement all kinds of work within any of those. They're a way of getting things done.

Here, you can see what I call, old dogs, new tricks. When I was working at Lockheed Martin, I had the opportunity to work with a fleet ballistic missile team, and they were fantastic. One of the things they decided to do is they wanted to go and build their hardware using agile cycles. I did not have a lot of teams at the time that were willing to do this. I wasn't exactly sure how it would go the first couple times we tried. After little more than three or four iterations, these teams had already set up prototype environments, breadboards, and were able to iterate on the system, buying down risk with each and every sprint. They actually were leveraging the two-week sprints, and giving a demonstration to people at the close of them. You can see here, we're not going to build a missile every two weeks. What we're doing is we are buying down risk by building knowledge. We're learning faster. I have enough of a capability that I can validate unknowns. It's really in a heartbeat, what we're looking for.

Here's a cool one, this is Alten. They have maybe not cross-functional teams, but the teams working together. We've got a software team, mechanical team, an electrical engineering team, and they've got key integration points. You can see that they've put themselves on three-week increments. In the first one, they're able to build out a plywood prototype which uses software and hardware. By the next one, they've got a hydraulic prototype, again, building down those unknowns, buying down risk as they're building. By the third one, they've got a near final prototype. They've got a printed circuit board, sheet metal, cables, and even an internet connection. Then they have a beta. What they've done is iterate and increment all the way through to a minimum viable product, which is enough information to learn and get feedback on.

Bosch is a company that's always on the bleeding edge. They have really gone all in on virtual prototyping and simulations, things like digital twins. This allows them to iterate on their vehicles and build them to their vehicle safety standards. Again, they're very much taking that physical vehicle, putting it into cyberspace, leveraging all the tools in their toolbox, you can see things like cloud computing and edge, to be able to deliver vehicles faster.

Planet Labs is associated with what we would call new space. New space is the new entrants into the market. They are basically seen as maybe the folks that have entered for the last 10 years, and they are commonly seen as doing things new and different, different than they've been done before. Here you got Planet Labs, and Planet Labs does imagery of the Earth's surface using a large number of satellites. The cool thing is, they can design, build, and deliver, and launch a satellite every three to four months. It's unheard of. Their approach, just like we talked about all the tools in their toolbox, they're leveraging things like automation, integrated software, hardware in the loop, all of those things to learn quicker so that they can build value in the shortest sustainable lead time.

Industrial DevOps Principles

What I've been describing here is what a colleague of mine and I have referred to as industrial DevOps principles. We've seen these as patterns over time, that seem to enable us to deliver large cyber-physical safety critical systems using agile and DevOps practices. One of the principles we would tell you about is to organize around the flow of value. We talked about that. We have to relook at our organizational structure. Multiple horizons of planning. What we need to do is make sure that when we're building a system, we have an end-to-end view, an end-to-end plan. Does that mean a detailed plan? No. The further out the plan goes, the lower its fidelity. I still have to have a plan now. Typically, let's say I'm looking at a five-year system, I might have a five-year plan, broken down into an annual plan, broken down into a quarterly plan, broken down to a sprint plan, broken down into a daily plan. What's the difference between that and a predictive plan? At each one of those horizons, I can evaluate data, and use that data to further inform the next horizon of planning. The key difference here is I'm using empirical data to inform future planning. I need to make sure that I implement data driven decisions. This is critical. We need to make sure as we're building systems that we instrument them to provide a lot of data so that we could use that data to make them better.
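The difference between a predictive plan and an empirical one can be shown with a small forecast. This is a sketch under assumptions the talk does not state: work is measured in points, and the forecast simply averages the points actually completed in past sprints. The function name and all numbers are hypothetical.

```python
import math


def sprints_remaining(total_points: int,
                      planned_velocity: float,
                      completed_per_sprint: list[int]) -> int:
    """Forecast remaining sprints, preferring measured data over the plan."""
    if completed_per_sprint:
        # Empirical: use what the teams actually delivered so far.
        velocity = sum(completed_per_sprint) / len(completed_per_sprint)
    else:
        # Nothing measured yet: fall back on the planned (predictive) number.
        velocity = planned_velocity
    remaining = total_points - sum(completed_per_sprint)
    return math.ceil(remaining / velocity)


# Up-front plan: 120 points at an assumed 20 points per sprint.
print(sprints_remaining(120, 20, []))

# After three sprints of real data, the next planning horizon is updated.
print(sprints_remaining(120, 20, [15, 17, 16]))
```

With no data the plan says six sprints. After three measured sprints the forecast is five more, eight in total, and that updated number is what feeds the quarterly and annual horizons above it.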

Everybody knows architecture is critical. Architecting for speed requires modularity, and standardized interfaces. That's whether it's physical, or cyber-physical. Making sure we look at the theory of constraints. We're going to manage queues and create flow. When we put multiple teams together, which is what we're talking about, we need them to have a common heartbeat. You may say, why? Why can't they all be independent? Because we need the system to iterate, not pieces of it. The goal is to make sure that we're moving in a lockstep. If you were to cross the country with eight of your best friends, and you were all in cars, different cars, it's unlikely that you would follow right after each other on the road. It's more likely that you would meet up in hotels across the United States at synchronized times. That's really what we're looking at. That allows you to get across the United States at the same pace as all of your friends. We need to integrate early and often. We know this, but it's one of those things that a lot of people are like, if I just wait a little bit longer. No. Every time we wait to integrate, we create risk, we create rework. Just like I said to begin with the constraint, begin with the test, so secure test-driven development. Compliance-driven development. We want to begin with that, not do it afterwards. Then, lastly, we've really found that organizations should have a growth mindset. What does growth mindset mean? It means that I can keep learning. What I knew yesterday isn't exactly the same as what I know today. I'm not afraid to take that new learning and apply it. If we know everything, it's unlikely we're able to move. Understand that this is just a journey.


We've written a lot of papers on this. Over the last five years, we've had a lot of fun with IT Revolution, building a variety of papers on what we call industrial DevOps, with a lot of subject matter experts, more than you can count. We have learned a huge amount about how to build cyber-physical systems using agile and DevOps, beyond software, so much so that we've taken it and transitioned it into a book, "Industrial DevOps: Build Better Systems Faster."




Recorded at:

Apr 23, 2024