
Cognitive Digital Twins: a New Era of Intelligent Automation

Summary

Yannis Georgas presents the building blocks of a Cognitive Digital Twin and discusses the challenges and benefits of implementing one in an organization.

Bio

Yannis Georgas is the Intelligent Industry Lead at Capgemini, where he develops Industry 4.0 solutions for clients in Manufacturing, Energy, and Utilities. Before joining Capgemini, he led innovation projects in the fields of Smart Cities, Connected Autonomous Vehicles, and 5G Radio Access Networks as an innovation leader at Cisco UK&I.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Georgas: What if you had the power to twin reality? Picture having the ability to create a digital replica of our world to try scenarios and see potential future outcomes. For example, you could use the replica world to try various business ideas and see which one is worth building, or create a digital replica of your body to try different wellness plans and see how to improve the quality of your life. These might sound like concepts taken out of a science fiction book, but this is precisely the concept of digital twins.

Overview

My name is Yannis Georgas. In this talk, we will first start by looking at the history of digital twins to understand why they're becoming more popular. Then we will go through a manufacturing case study and build one together. I don't want the word manufacturing to put you off. I have simplified manufacturing concepts in this presentation in order to focus more on the technical ones. Also, if you're not in the manufacturing sector, the concept and technology of digital twins can apply to many other sectors.

The Evolution of Digital Twins

Let's start first by understanding the concept of twinning. Back in the '70s, NASA built two simulators as exact replicas of the Apollo 13 spacecraft. As you can see on the screen, the command module is in the chestnut color, and the lunar landing module is in forest green. The purpose of these simulators was to train the astronauts, but also to have full awareness of what was happening in the spacecraft during the mission in case something went wrong. Something unfortunately did go wrong. On the third day after the launch, while the spacecraft was on its way to the moon, one of the oxygen tanks exploded, leaving the astronauts with limited resources. It was thanks to these simulators and human collaboration that NASA managed to come up with a solution and bring them home safely. In the '70s, the digital twin was not digital at all. It was physical, as you can see on the screen. The concept of twinning became digital in the early 2000s thanks to the revolution of the internet and computers. It didn't stop there. Multiple technological advancements and trends have today given birth to a new type of digital twin: the cognitive digital twin.

Let's see what these technologies and trends are. First of all is 3D modeling. Advances in 3D modeling have removed the need for costly physical simulators, as we saw in the case of NASA back in the '70s. We have the technology today to build high-fidelity replicas of the real world. The second technology is the Internet of Things. How can we create a replica of our real world if the real world is not connected? The best way to connect the real world to the internet and make it digital is to install electronic devices that use powerful networks and can extract data that we've never had before. This is the power of the Internet of Things: connecting the real world to the cloud, which is the next technology, cloud computing. Digital twins require large amounts of data, processing, and storage that can be very costly and difficult to manage. Cloud computing provides a scalable way to store and process this data, making it easier for many organizations to adopt digital twin technology. Everyone, including us, can go online and start building a digital twin. It is not only fast and cheap to start experimenting, but it also gives us a platform for innovation, to very quickly test a business hypothesis and understand if a digital twin is the right tool for you. Of course, the fourth technology is artificial intelligence. We are no longer alone on this planet. AI and machine learning algorithms can be used to analyze the vast amounts of data that we collect in the cloud and identify patterns that we might not be able to see ourselves. This means that AI is now advising the human operator, raising and expanding our awareness. Last but not least is community, tools, and standards. There is plenty of code and there are open-source tools on the internet these days, so we don't have to create a digital twin from scratch. We can reuse existing code and standards to create interoperable and scalable digital twins without reinventing the wheel or buying expensive software, as was the case in the past.

Digital Twin Definition

What do we mean today when we say digital twins? A digital twin is a virtual replica of a real-world asset or system, synchronized at a specific frequency and fidelity, to drive business outcomes. Let's pick up those three key words in blue font. The first one is systems. For many years, digital twins were only connected to assets. Today, they can also replicate systems. A system can be a combination of assets, people, and procedures, pretty much everything we need in order to replicate a real-world system. It doesn't stop at assets; the definition is much wider. The second word is synchronized. Digital twins are dynamic. This means that changes to the real-world asset or system influence the digital twin. Then, based on the intervention of humans, AI, or both, the decisions influence the real-world asset or system. There's this cycle of information and data flowing from the real world to the digital twin, and vice versa. The last word is business. Designing a digital twin always starts with the business in mind and what the business is trying to achieve. In the NASA story, for example, we saw that the real value of the twin was to train the astronauts and help them during the mission. When we design a digital twin, we always start with the business, not with the technology.

Business Value of Digital Twins

The digital twin is a tool that helps operators drive new business value. Let's see some examples together. When car automakers have an idea for a new car, they need to know very quickly if it's an idea worth building. To do so, they have to build prototypes. They have to test the designs, materials, and configurations. These prototypes are very expensive and very time consuming to build. Instead, what they do now is run the designs and tests on digital twins. This means that they have to build fewer prototypes and enables faster innovation, from idea to market. Energy providers use digital twins to predict faults and proactively fix them in order to reduce machine downtime. That ensures that more sustainable energy is produced. Pharmaceutical companies use digital twins to simulate clinical trials, which helps predict the impact of a medicine on the human body, improve safety, and expedite drug discovery.

Case Study - Car Automaker Cresla

By now we know what digital twins are, we know their benefits, and we also know why they're becoming more popular. The theory stops at this point, and we are going to actually build a digital twin through a case study. Consider the car automaker Cresla. I will take you through a couple of examples from my career that I stitched together into this one story of a completely imaginary car automaker named Cresla. Cresla is a startup. We have designed our new automobile and we're about to begin mass production. The demand for our vehicle is high. We have now bought a facility, and in that facility, we install these types of robots. We also have a workforce and materials. We also bought ourselves a manufacturing execution system, otherwise known as an MES. The MES is software; imagine it as the brain of our manufacturing facility. In simple terms, the MES pretty much tells the workers what they need to know and what to do next, and puts all these activities together in a plan that helps us make the car correctly.

This is where we come in. We are the production manager. We are in charge of everything presented above: the production line, the people, the machines, everything. The business mandate that we have is to run production 24/7 in order to meet the high demand that we're facing. One of the biggest challenges that we have as a production manager is machine downtime, which can jeopardize our production targets. The solution here is the digital twin. The digital twin can help us proactively understand which robot in the production line is about to fail, and when. That would then help us evaluate what kind of mitigating actions we can take to ensure that workers and materials are allocated accordingly and downtime is reduced to the minimum possible.

Virtualizing Our Car Production

How do we build a twin then? This is the plan that we pulled together in order to virtualize our car production. The first step is to digitize our robots. We need to create, first of all, a digital twin for the robots that we have in the production line. We need to install sensors. These sensors are going to collect data in real time that will help us predict machine failures. Doing this successfully is only part of a wider play. The wider play is building the production twin, but in order to do that, we need the robot twin, and we also need everything else. We need the workforce, what the workforce is doing, what activities they're working on, what processes are taking place, and the materials. That would then help us understand how a machine failure impacts the wider shop floor, the wider production line. Of course, we will also need the factory layouts of the production line. After we understand the impact, the final step is to identify the best maintenance strategy to reduce our downtime. This is where we also need to start looking at a 3D visualization of our shop floor, to run these what-if scenarios, understand the impact, and find what the optimal maintenance strategy would be.

Robot Twin

According to the plan, the first thing that we need to do is build the robot twin. The first question that we are going to ask this digital twin is: show me equipment utilization in real time. That would be the first step. We need first of all to collect the data and start looking at that functionality. After we have that in place, we can then start predicting machine failures based on past data. Let's look at exactly how we do that. Before we do anything, we need to start with the fundamentals. This is what we call shop floor connectivity, or in other words, we need to establish the right architecture. The first thing that we need to do is connect to the robot. We said that we want to install sensors in order to digitize the robot. There's a sensor, as you can see on the screen, that we're going to attach to the robot. This sensor is going to collect temperature and vibration from the motor. This is one of the first things that could go wrong. Because of wear and tear, moving parts are the first things that will ask for maintenance. What we're going to do is install a sensor on the robot's motor.

The second thing is status and utilization. We will need status and utilization in order to create a dashboard, look at the KPIs, and understand the utilization and status of our machine. What we're going to do here is connect to something that we call a PLC. PLC stands for programmable logic controller. It's exactly what the name says: it's a controller, so we give instructions to that controller for the robot to execute certain kinds of activities. The next thing that we want to do is connect to the MES, and we will need it later on. If you remember, we need data from the robot, but we also need data from everything else and everyone else. We need to pretty much model the whole production line. We cannot do this if we do not connect to the MES. The MES is the execution system. The execution system has data such as: when did a specific activity start? When did it end? Are we behind schedule? Who is the allocated worker, or workers? What are the materials that we're consuming at that part of the production line, and what kind of work orders are we executing? All of this data is important, and we need to connect to the MES in order to collect it.
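
To make the kind of data we pull from the MES concrete, here is a minimal sketch of what one operation record might look like once fetched over an MES API. The field names and the record shape are hypothetical illustrations of the fields listed above, not the format of any particular MES product.

    # Hypothetical shape of one MES operation record, based on the fields
    # discussed above: activity start/end, schedule status, allocated workers,
    # materials consumed, and the work order being executed.
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class MesOperationRecord:
        operation_id: str                  # e.g. "OP-230"
        work_order: str                    # work order being executed
        started_at: str                    # ISO-8601 timestamp of activity start
        ended_at: Optional[str]            # None while the activity is still running
        behind_schedule: bool              # are we behind the plan?
        workers: List[str] = field(default_factory=list)    # allocated workers
        materials: List[str] = field(default_factory=list)  # materials consumed

    # Example record as it might arrive from a (hypothetical) MES endpoint.
    record = MesOperationRecord(
        operation_id="OP-230",
        work_order="WO-1042",
        started_at="2023-11-07T08:15:00Z",
        ended_at=None,
        behind_schedule=False,
        workers=["worker-17"],
        materials=["panel-A3", "weld-wire-0.8mm"],
    )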

IT/OT Convergence - The MQTT Protocol

Let's first look at the architecture that we would usually find in manufacturing facilities nowadays. The ISA-95 model is a layered architecture. You can see at the bottom, at level 0, we place the sensors and signals, then the PLC that we already discussed, SCADA, MES, and finally, ERP. These are typical systems that we find in a manufacturing production environment. The combination of ISA-95 and OPC-UA, which is a protocol for communication between machines, is an architecture that has been widely used for decades, but it's not optimal. It usually ends up creating a complex architectural environment dominated by proprietary tools, custom point-to-point connections, data silos, and technical debt. Today, the world is shifting from software-driven architectures to data-driven architectures. What type of architecture do we want for our shop floor? The architecture that we want for our factory has the MQTT protocol at its core. The MQTT broker receives a message, which consists of a topic and a payload, puts it in a queue, and then passes the message on to the systems that are subscribed to that topic. For example, as we see in the slide, the temperature sensor will publish its value on a temperature topic, and a client that is subscribed to that topic will receive the value. If the client is not subscribed to that topic, it won't receive it. MQTT has a wide variety of applications, some of which we are already using, such as Instagram and Facebook Messenger. The architecture that we see on the right side is now our factory floor, and it has the MQTT broker at its core. All the systems now communicate through this broker.
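
As a concrete illustration of the publish/subscribe flow just described, here is a minimal subscriber sketch using the paho-mqtt Python client. The broker address and topic naming scheme are assumptions for the Cresla example, not code from the talk.

    # Minimal MQTT subscriber sketch (paho-mqtt 1.x style; version 2.x
    # additionally requires a CallbackAPIVersion argument when constructing
    # the client). Broker address and topic name are hypothetical.
    import paho.mqtt.client as mqtt

    TEMPERATURE_TOPIC = "cresla/line1/robot01/motor/temperature"

    def on_message(client, userdata, msg):
        # Only clients subscribed to this topic receive the published value.
        print(f"{msg.topic}: {msg.payload.decode()}")

    client = mqtt.Client()
    client.on_message = on_message
    client.connect("broker.cresla.local", 1883)
    client.subscribe(TEMPERATURE_TOPIC)
    client.loop_forever()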

The benefits of that architecture are, first of all, simplified data management. MQTT uses a predefined data model that standardizes the way data is presented and managed across all devices on a network. This makes it easier to manage data and ensures consistency across all devices. Reduced network traffic is another one. MQTT devices only need to send data updates when there is a change in the data. This reduces the amount of network traffic and conserves bandwidth. Next is improved security. There are inbuilt security features such as message authentication and encryption. This helps protect data from unauthorized access and ensures the integrity of the data transmitted. The fourth item is faster data processing. MQTT allows devices to publish and subscribe to data in real time. This means that the data can be processed and acted upon more quickly, which is important in industrial IoT applications, where fast response times are critical. Last but not least is interoperability. MQTT is designed to be interoperable with a wide range of devices and applications, making it easier to integrate into existing industrial IoT systems.

Now that we have established an event-driven architecture for our production, we connect the sensor to the cloud and we start receiving the data. On the left side of the slide, what you see is the software development kit. This is not the whole script, but it's the most important part. The top part connects to a certain topic in the cloud and establishes a handshake between the two, using the certificate that the device holds and the cloud accepts. After establishing the secure connection, it starts transmitting the data that it reads from the temperature sensor. When the temperature sensor collects a new value, this value is sent to the cloud. In this specific example, you can see the JSON payload that we received from the sensor. There are two values: one of them is the temperature, and one of them is the timestamp.
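
The slide's own device script is not reproduced here; the following is a hedged sketch of the same idea, a sensor client that authenticates with its certificate and publishes a JSON payload containing the temperature and a timestamp. The certificate paths, endpoint, and topic are placeholders.

    # Device-side publish sketch: mutual-TLS connection to the cloud broker,
    # then a JSON payload with temperature and timestamp, mirroring the payload
    # described above. Paths, host, and topic are placeholders (paho-mqtt 1.x style).
    import json
    import time
    import paho.mqtt.client as mqtt

    client = mqtt.Client(client_id="robot01-motor-sensor")
    client.tls_set(ca_certs="root-ca.pem",
                   certfile="device-cert.pem",   # certificate the cloud accepts
                   keyfile="device-key.pem")
    client.connect("iot.example-cloud.com", 8883)
    client.loop_start()

    def publish_temperature(value_celsius: float) -> None:
        payload = json.dumps({
            "temperature": value_celsius,
            "timestamp": int(time.time()),
        })
        client.publish("cresla/line1/robot01/motor/temperature", payload, qos=1)

    publish_temperature(80.2)  # called whenever the sensor reads a new value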

Creating the Knowledge Graph (Data Model)

Now we have data flowing to the cloud. The next thing that we need to do is enable our digital twin to understand the data that we receive. This is why the second and very important fundamental element that we need to build is what we call a knowledge graph. In short, this is the data model. The data model is what the digital twin uses to understand and structure data. The simplest data model that we see on the screen is the one between a subject and an object, connected through an arrow that establishes the relationship between them. The first decision for building our production line data model is whether we're going to develop an RDF model or a Labeled Property Graph. RDF, which stands for Resource Description Framework, is a framework used for representing information on the web. It's not new; it has been with us for years, decades. Because it's a standard, RDF focuses on offering standardization and interoperability, so our company can use it internally as well as externally to share data with our ecosystem, for example, with our suppliers. Property graphs, on the other hand, are focused on data entities to enhance storage and speed up querying, and they require fewer nodes for the same number of entities. I have prepared an example on the screen just to show you how, for the same information, the Labeled Property Graph has only two nodes, while the RDF representation has six. As you can see, even for a small amount of information, the RDF graph can quickly explode. It has the benefit that, since it's a foundational framework for how web information is structured, other companies may have adopted it as well, which makes interoperability with our ecosystem easier. This is a tradeoff.
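
The slide's exact example is not reproduced here, but the tradeoff can be sketched with a hypothetical robot-and-motor model: the same facts expressed as RDF spread across six nodes, while a Labeled Property Graph folds the literal values into properties on two nodes.

    # RDF view: every entity, class, and literal value becomes its own node;
    # predicates are the edges. Namespaces shortened for readability.
    rdf_triples = [
        ("ex:Robot01", "rdf:type",       "ex:Robot"),
        ("ex:Robot01", "ex:hasMotor",    "ex:Motor01"),
        ("ex:Motor01", "rdf:type",       "ex:Motor"),
        ("ex:Motor01", "ex:temperature", "80.2"),
        ("ex:Motor01", "ex:vibration",   "0.03"),
    ]
    rdf_nodes = {s for s, _, _ in rdf_triples} | {o for _, _, o in rdf_triples}
    print(len(rdf_nodes))  # 6 nodes for this small example

    # Labeled Property Graph view (Cypher): the literals become node properties,
    # so the same information needs only two nodes and one relationship.
    lpg_cypher = """
    CREATE (:Robot {serial: 'Robot01'})
           -[:HAS_MOTOR]->
           (:Motor {id: 'Motor01', temperature: 80.2, vibration: 0.03})
    """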

In the case that we are presenting here, we decided to go with the Labeled Property Graph. Using the Labeled Property Graph, we create the knowledge graph for our robot digital twin. As you can see, on the left side, we have the robot node in the middle, which could take values from robot 01, a specific robot on the production line, all the way through however many robot serial numbers we have. As you can see, all the other nodes are connected to the central one. We can see what kind of operation the robot is currently working on and what resources it's consuming. We can also see the status and the utilization, as well as the vibration and the temperature of its motor. All of this is the data that we are going to use for building the predictive maintenance and monitoring use cases. On the right side, you see an example of the Cypher queries that we use. In this case, for creating the knowledge graph, I used Arrows, which everyone can use online. We also used Neo4j; you can see the queries just to get an idea of what the Cypher language looks like. If you want to go with RDF, the other option, you can start very quickly with the Protégé editor. You can use Protégé to start building your knowledge graph there; it's open source. If you find the Cypher language a little tricky to learn, then I would recommend RDF, because RDF uses a more SQL-like query language called SPARQL. If you are already on board with SQL, then I would definitely recommend RDF.
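
The slide's own Cypher is not reproduced here; the sketch below shows roughly what creating one robot node and its related nodes could look like through the official Neo4j Python driver. The labels, relationship types, and connection details are assumptions, not the talk's actual model.

    # Sketch: create/update one robot and its related nodes in Neo4j.
    # URI, credentials, labels, and relationship types are illustrative.
    from neo4j import GraphDatabase

    CYPHER = """
    MERGE (r:Robot {serial: $serial})
    MERGE (o:Operation {name: $operation})
    MERGE (res:Resource {name: $resource})
    MERGE (r)-[:WORKS_ON]->(o)
    MERGE (r)-[:CONSUMES]->(res)
    SET r.status = $status,
        r.utilization = $utilization,
        r.motorTemperature = $temperature,
        r.motorVibration = $vibration
    """

    driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))
    with driver.session() as session:
        session.run(CYPHER, serial="robot-01", operation="OP-230",
                    resource="panel-A3", status="RUNNING", utilization=0.82,
                    temperature=80.2, vibration=0.03)
    driver.close()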

The knowledge graph is very important for our digital twin for the following reasons. First of all, it structures and organizes information about the digital twin. It enables the digital twin to continuously learn and adapt based on new data, new domain knowledge, and insights. It is also going to be our single source of truth. It improves the accuracy of machine learning and AI models. It has certain advantages over typical relational databases; for example, it doesn't require complex joins in certain cases. In this example, we built it very quickly, but if we're developing a production-grade version of a knowledge graph, then I must say at this point that we need a more coordinated approach. Building a knowledge graph is a multidisciplinary effort. It's not just a group of IT people deploying a database and creating a data model; it needs coordination between different departments to come together and agree on a common language. All of the nodes that you see on the knowledge graph would represent data domains, as specified by strong data governance. That would, in the end, create interoperability across our data models. We will also see later on that a digital twin is a combination of various other digital twins that all come together to give a better understanding of whatever use case we are focusing on. In this case, we're looking at the robot digital twin. Later on, we will see how the robot and production twins are going to nicely connect together and create our understanding of the shop floor.

Robot Twin: Monitoring Dashboard

Let's now pull together the event-driven architecture that we built in the first part and the knowledge graph that we just presented on the previous slide. We're going to bring them into this architecture. This is a very simplified version of a digital twin. It has five logical layers. Starting on the right side, we have the production manager. The production manager is the user persona, the person who is going to use our digital twin, in this case, the robot digital twin. Our production manager is asking the question: what is my robot utilization and status? In order to fetch that information into this dashboard, we first of all need two things. At the bottom, at item 2, we have the data model, the one we created earlier. We use a Cypher command to recreate the data model within the knowledge graph database. Now we have a knowledge graph stored in the graph DB. The graph DB is going to take the values that are coming from the production line, from the robot. The robot is connected through OPC-UA to our MQTT broker, the one that we presented before, and it's transmitting its data, the utilization and the status. This data is stored in the time series DB, which acts as the historian of the shop floor. Next, with a serverless function, we take the latest data, the latest utilization and status, and we update the knowledge graph. The knowledge graph is a graph database, but now it's a graph database that has the latest information about our robot. Then we present this in a 2D dashboard, because at this point, we do not need a 3D digital twin. The visualization of the digital twin is defined by the use case. In this case, where we are visualizing only KPIs, only numbers, and maybe charts, we do not need a 3D visualization of our shop floor. For information, this is the serverless function that pulls the latest information out of the time series DB and sends it to the graph database. As you can see, there are two functions. The first function that is defined is actually a Cypher query that replaces the stored value with the latest value from the time series database. The second function connects to the API of the knowledge graph and, after it connects, uses the first function to replace the value.
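
The serverless function on the slide is not reproduced verbatim; below is a hedged sketch of the structure just described, one function that builds the Cypher statement replacing the stored values with the latest reading, and one handler that connects to the knowledge graph and runs it. The graph endpoint, credentials, and event shape are assumptions.

    # Sketch of the serverless function described above: take the latest
    # utilization/status reading and push it into the knowledge graph.
    from neo4j import GraphDatabase

    def build_update_query() -> str:
        # First function from the slide: a Cypher statement that replaces the
        # stored values with the latest ones from the time series DB.
        return """
        MATCH (r:Robot {serial: $serial})
        SET r.status = $status,
            r.utilization = $utilization,
            r.updatedAt = timestamp()
        """

    def handler(event, context=None):
        # Second function from the slide: connect to the knowledge graph and
        # apply the update. 'event' is assumed to carry the latest reading,
        # e.g. {"serial": "robot-01", "status": "RUNNING", "utilization": 0.82}.
        driver = GraphDatabase.driver("bolt://graph-db.internal:7687",
                                      auth=("neo4j", "password"))
        try:
            with driver.session() as session:
                session.run(build_update_query(),
                            serial=event["serial"],
                            status=event["status"],
                            utilization=event["utilization"])
        finally:
            driver.close()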

Robot Twin: Predictive Maintenance

Building on top of the previous use case, which was monitoring the robot data and having some basic KPIs, the second use case that we're going to discuss is predictive maintenance. We want to know when the robot is going to break down. Now we receive a new pair of data points: the vibration and the temperature. The way we're going to do this is that at the data product layer, we are going to introduce a notebook or a container that runs the trained model. This trained model is going to predict the next failure. We have previously trained the model using historical data collected from the time series database, including the records of when the robot was offline before. The new serverless function populates the knowledge graph with a probability of failure. When we are predicting the future of our machine, we will rarely have 100% confidence. We are looking at building a model that is trained to be as accurate as possible. The way we are going to do that is to train, deploy, and then watch our model for any data drift, because if that happens, we need to revisit the model. I have a question for you: can we build a predictive maintenance use case without the knowledge graph? Let's consider that we store all the data in the time series DB. Then, on the time series DB, we run the model, we look at the data, and we are able to predict when the robot is going to break down. Can we do this? The answer is, yes, we can. We can actually develop the predictive maintenance without the knowledge graph, without the graph DB at all. The problem is that if we do so, we will not have a digital twin. We will have what we call a zero-knowledge digital twin, which is an ad hoc project. There is no interoperability. There is no single source of truth. These types of solutions are ad hoc, and companies really struggle to scale them up. There are also a lot of overhead costs required for a company to maintain these ad hoc solutions. I highly recommend avoiding a digital twin that has no knowledge graph. You will still get temporary value, but if you're looking at creating a digital twin that is going to last for the whole lifecycle of your asset or your system, then I highly recommend creating a knowledge graph.
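
As a sketch of the new serverless function described here, the snippet below loads a previously trained model, scores the latest temperature and vibration reading, and writes the resulting probability of failure back to the robot node in the knowledge graph. The model file, feature layout, label names, and graph details are all assumptions.

    # Sketch: score the latest motor telemetry with a pre-trained model and
    # store the probability of failure on the robot node.
    import joblib
    from neo4j import GraphDatabase

    model = joblib.load("robot_failure_model.joblib")  # trained offline

    UPDATE_CYPHER = """
    MATCH (r:Robot {serial: $serial})
    SET r.failureProbability = $probability
    """

    def handler(event, context=None):
        # event is assumed to look like:
        # {"serial": "robot-01", "temperature": 80.2, "vibration": 0.03}
        features = [[event["temperature"], event["vibration"]]]
        # Probability assigned to the (assumed) "failure" class.
        failure_index = list(model.classes_).index("failure")
        probability = float(model.predict_proba(features)[0][failure_index])

        driver = GraphDatabase.driver("bolt://graph-db.internal:7687",
                                      auth=("neo4j", "password"))
        try:
            with driver.session() as session:
                session.run(UPDATE_CYPHER, serial=event["serial"],
                            probability=probability)
        finally:
            driver.close()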

Let's dig a little bit deeper into how we build the cognitive function of our twin, the one that we described before, by training our predictive maintenance model. I will not dig too much into the details and into the code, because these types of predictive models would need a whole session by themselves. They vary a lot from use case to use case, because the specifics of the asset or system, of the data, and of the available records vary. In this case, what we have on the screen is a chart of temperature over time. As we remember, we're collecting this telemetry data from the motor. As you can see, we have tagged the data with three colors. In green is normal operation. In red, where the x marks are, are the failures that we noticed. The blue line is recovery mode. This is how we tag the data to be used for training. When the model is trained with enough data, you can see that it starts giving us predictions of what is going to happen in the future. The light blue color is where the algorithm predicts the next failure. In this case, we used a random forest regression. We didn't use it only on temperature; we used it also on vibration. For simplicity, I'm only showing the temperature. As I said, there are other models as well, such as LSTMs, depending on how much data you have and whether you have tagged data, in order to be able to develop such a model. What I do recommend, though, is this: many times companies want to do AI and start by attacking the problem head-on, when they should first be looking at how to create a strong data governance foundation, in order to collect and curate the data and make it ready for training. This is where most of the time actually needs to be spent to ensure high accuracy for our models.
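
A minimal training sketch follows. The talk mentions a random forest over temperature and vibration; the sketch frames the tagged telemetry (normal, recovering, failure) as a classification problem instead, so treat the exact formulation, column names, and file names as assumptions rather than the speaker's actual pipeline.

    # Sketch: train a random forest on tagged motor telemetry and save it for
    # the scoring function above. Data layout and labels are illustrative.
    import joblib
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split

    # Historical telemetry exported from the time series DB, with a manual tag
    # per row: "normal", "recovering", or "failure" (the green/blue/red regions).
    df = pd.read_csv("motor_telemetry_tagged.csv")  # columns: temperature, vibration, tag

    X = df[["temperature", "vibration"]].values
    y = df["tag"].values

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)

    model = RandomForestClassifier(n_estimators=200, random_state=42)
    model.fit(X_train, y_train)
    print("holdout accuracy:", model.score(X_test, y_test))

    joblib.dump(model, "robot_failure_model.joblib")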

Production Twin

Now our robot twin monitors all the machines on the factory floor for utilization and for failures. Let's say that our robot twin predicts a failure within the next five days. We need to build a production twin to answer questions such as: how does that failure impact my production plan? Or, how do I work around this failure to ensure minimal disruption to my production? We will start by building the knowledge graph of the production twin. On the left, we connect to the MES via an API and receive the tags as shown on the screen. On the right side, we take the tags from the MES and place them as nodes of our knowledge graph. This knowledge graph describes only the data that is coming out of the MES. The data that is coming from the MES is: what kind of operation is taking place, and according to what work plan? Who are the workers involved in this operation? What kind of work order are they executing, and what is their competency? This is the knowledge graph. What are we going to do now? We're going to merge it with the robot knowledge graph. Now we have two knowledge graphs merged together. If you notice, the operation node in the middle and the resources node are the same for both the robot graph and the production graph. This is not a coincidence. In order to build a knowledge graph for our digital twin, we sat together with the manufacturing, testing, design, and procurement teams, and we all agreed on common definitions in what we call a data dictionary. This created a common language between the departments that enables us to build interoperable and scalable knowledge graphs.
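
To make the merge concrete, here is a hedged Cypher sketch, run through the Neo4j Python driver, of how MES-derived nodes could attach to the same Operation node the robot graph already uses, which is what lets the two graphs join cleanly. The labels and relationship names are illustrative, not the talk's actual data dictionary.

    # Sketch: attach MES data (work plan, work order, worker, competency) to
    # the Operation node that the robot graph already references, so the two
    # knowledge graphs merge on the shared node. Names are illustrative.
    from neo4j import GraphDatabase

    MERGE_CYPHER = """
    MERGE (o:Operation {name: $operation})   // same node the robot graph uses
    MERGE (p:WorkPlan {id: $work_plan})
    MERGE (wo:WorkOrder {id: $work_order})
    MERGE (w:Worker {id: $worker, competency: $competency})
    MERGE (o)-[:PART_OF]->(p)
    MERGE (o)-[:EXECUTES]->(wo)
    MERGE (w)-[:ASSIGNED_TO]->(o)
    """

    driver = GraphDatabase.driver("bolt://graph-db.internal:7687",
                                  auth=("neo4j", "password"))
    with driver.session() as session:
        session.run(MERGE_CYPHER, operation="OP-230", work_plan="WP-7",
                    work_order="WO-1042", worker="worker-17",
                    competency="welding")
    driver.close()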

This is the use case for the optimal maintenance strategy. The digital twin in this case collects data from the robots and predicts that one of them is going to fail in the next five days. It then looks at the data that we collect from the MES and shows us the impact on our production. The production manager can then test what-if scenarios in a 3D environment and identify the optimal strategy to ensure that production runs with as little interruption as possible. In the previous architectures, we delivered the digital twin in manual mode, meaning that we modeled and developed the knowledge graph from the ground up. For this use case, we decided to go native and use the digital twin from one of the hyperscalers. Why, you might ask. First of all, our digital twin needs a 3D visualization, so we needed a scene composer to link the 3D models to our data. There is other software that can do this job as well, from the likes of Dassault, Siemens, and others. The second reason is that the hyperscalers offer the digital twin as a managed service: they keep the knowledge graph in the backend and expose it through an API. This means that we do not go into the specifics of the graph DB, but we do expect a scalable, reliable, and available service, which is very important if we want to deploy a digital twin in a production environment. Before you choose a hyperscaler, ensure that you can import and export the knowledge graph in the format that you're creating, because you want to avoid lock-in.

Let's look at the slide. In numbers 2, 3, and 4, we have identified three data paths. In the hot path, we have the robots, which send their data within seconds. The warm path, in amber, is within minutes. The blue path is collected daily. All of this data is collected, as you see in the middle, in the data lake. This is also where we run our predictive models and our analytics. You might have also noticed that there is a service that I refer to as Auto-ETL. Auto-ETL uses crawlers to automatically discover datasets and schemas in the files that we store. In cases where we have drawings, we can even use computer vision to pick up engineering data from them. We use this to recreate the 3D environment. As you see at the bottom, in item number 5, this is where we upload the 3D models to our knowledge graph and link them with the data.

Visualizing the Optimal Maintenance Strategy

We managed to create a virtual manufacturing environment in 2D and 3D, where we can safely run what-if scenarios and see the impact of our decisions without disrupting the real production line. We can see the past by looking at historical data, as if we were watching a video replay. We can also see the future of our production line, based on the data that we have today. We have a full cognitive digital twin that can successfully help the shop floor manager identify the optimal maintenance strategy. Our digital twin helps Cresla monitor equipment availability in real time, predict machine failures, understand the impact of these failures on the production line, and then identify the optimal maintenance strategy to reduce downtime. The factory can now run with as little disruption as possible and meet the demand. The business stakeholders are ecstatic.

 


 

Recorded at:

Jan 26, 2024
