
Data Mesh Architecture Applied to Complex Organizations


Summary

Nandakumar Heble looks at the basic construct of a data mesh and how one might go about applying it.

Bio

Nandakumar Heble is a Data Architect with many years of experience architecting and designing data applications in multiple domains. More recently, he headed the data architecture and governance practice in UBS – Investment Banking. He is now involved in the ambitious initiative of building a Data Mesh in UBS.

About the conference

Software is changing the world. QCon London empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Heble: I've been with UBS for almost four years now. I've been fortunate to be working on this new concept, which many of you might have heard of, but just want to share our experiences. By no means is the journey complete. I'm not standing here to tell you how we have conquered the mesh challenge and how we have built this great ecosystem. We have not. Is there anybody here who has not heard of data mesh? You might think mesh is the flavor of the season or the year. We had data warehouses. We had lakes. Data warehouses have been around for a long time now. Then we had lakes on commodity hardware. They've all had their problems. I'll show some of the problems we have been facing in UBS. As we move to the cloud, they are creaking, and we know what these problems are.

Zhamak Dehghani, when she was part of Thoughtworks, came up with this concept, probably about three, three and a half years ago now. She coined the term and called it a new paradigm. There were four key principles. I'm not going to describe the mesh in detail; you can read it up. It's a wonderful book to read. The mesh tries to bring these four principles together. The first is domain ownership, that is, data products, or data in general, are owned by the people who produce it. Data as a product follows on from that: effectively, you package data and sell it like you would sell products on Amazon. Why shouldn't it be presented as a product? It's not purely a technical exercise.

Traditionally, data is treated as something that technical people have to do, and the business people just give you written requirements. It moves from there where consumers, or business can directly go and consume it themselves. Self-serve data platform, I'll talk about that a little more, but this is all around the tooling that must be provided to build these products. Finally, governance. Governance is super critical. This is also part of the core principles of the mesh.

Backstory

I'll just take a step back to a story. If you go back to the '60s, even before I was born, we had the Boeing 707 and the DC-8. This was the age of the turbojets, single-aisle aircraft. People were very excited: you could fly across continents in a few hours, not days and weeks, as used to be the case before. These aircraft didn't crash and burn every so often, so it was safe. As passenger numbers kept increasing, it was expensive and airports couldn't handle the additional load. There was a challenge: what do we need to do to make travel cheaper, easier, and more accessible? Very similar problems to what we face with data today. Then came the Boeing 747; '67 or '68, I think, is when the first aircraft was produced.

I was fortunate, because my brother-in-law used to work at Boeing, and he was part of the first 747 program. He fondly remembers this. He had a tear in his eye when British Airways had the last flight of the 747 a few years ago. It just didn't happen by accident. There were a number of things that came together for this aircraft to be formed. That little picture that you see in the middle, that team was called The Incredibles, apparently. They were the designers and builders who were involved in the very first program. A few things came together for this aircraft to be built: four engines had been done before, but this was twin aisle, so it was huge, and you had that distinctive nose and the deck on top. One was the wide-body design. I'll just cover one or two things here which are important to call out. One was the engines. Engines became a lot more powerful, which meant the aircraft could carry huge loads.

Even today, the A380, amazing aircraft, I can't believe that it weighs 540-plus tons at takeoff. It's an amazing feat of engineering, and all because of engines and wing flaps. As small as they might seem, wing flaps have played a very key role. Then, finally, use of fault tree analysis. We may all think this is fancy engineering and metal fabrication and so on, but even things like fault tree analysis have helped make flying safer.

The Complex Enterprise Problem

It's very similar now. If you come to UBS, what we have been doing, we have highly valuable data. Every organization has highly valuable data. Within one area, the division that I work in, investment banking, we have close to 160 petabytes of data largely on-prem. We are moving to the cloud, but we do get a lot of value out of it. We use it internally, not just for risks, and reporting, and stuff like that, but also to derive value. We sell data to our clients. We package it up, but it's becoming harder. We are also actively moving to the AI/ML age, and it's harder. We have lots of data, but it is siloed.

Financial institutions have this problem, especially those of us who come from an on-prem estate. We have tried to solve the problem by joining it up. We extensively publish data, but it's not good enough. Producers have this feeling of fire and forget. Why are you coming and asking me for my data? I've just published it, go consume it. It's over there. Then we spend three to six months trying to get access to it. I don't know what data it has. I have to ask people. That's the third problem, hard to access.

What if we were trying to pivot? I think there was this realization that if we move to the cloud, it's going to be hard. We can't lift and shift applications. Even though we have microservices and all these fancy cloud technologies, autoscaling, multimodal databases, and so on, we couldn't possibly imagine having 160 petabytes of data on the cloud, because the cost would be astronomical. Even within the company, we have stopped saying cloud is a cost play. It's not a cost play. It might be an efficiency play, but there is hard work to be done. We felt mesh is something good. Mesh will help us if we do it the right way.

Democratization, it's not so much about just building data once and so on, but we really need to democratize, make data freely available. We have some internal challenges. There are problems that have been posed for us to solve. Cloud adoption, a lot of the things that we will see really can be done on the cloud. Then we have some modern architecture standards that our CTO has published. We call them digital principles that we have to adhere to. We felt that the mesh in combination with using the tools that cloud providers have will help us achieve this.

Producer, Consumer, and Governance Conundrums

I'll now start with some conundrum. This reflects the problems we have had. We still have this. Like I said at the start, we haven't solved this problem. One is the producer conundrum. I do this. It was my day job till very recently, when I was heading data architecture. I talk to the various streams. We are agile. We are organized along agile principles. It's good. We do produce software in agile, and it's very quick. When it comes to data, producers don't know what the right data is. As I was saying, producers publish data. They don't know who consumes it. They have a fair idea who does it, because those people come with problems, but otherwise, there is no register of who is consuming what. It's very hard to know. Producers, by default, use their internal models.

Most of them are relational, because we come from the Oracle, SQL Server world, so we have got extensive physical models, unfortunately not documented well, but we just publish them. If somebody comes along and asks for data, we just publish it in our internal physical model. Then we say, go figure. The next issue is producing data products in a consistent fashion. This came up in the Trainline session. In many ways, I think we use probably every technology known to mankind, especially in databases. We use every possible flavor of database.

That's hard, because the consumers have to know how to interact with that database. Producers would like a consistent way, they like to be told how to do it. It just makes life easy. Then, how do I then publish my products in a way that people can understand it? When we go and ask people to produce data products, it's easy to say, produce products, but how do you do it? These are some of the questions that we get asked.

If you flip it on the consumption side, very similar. Consumers look for data. They don't know where to find it. Most often, they start with data that they have been consuming for many years. Believe it or not, the conversation is around the files that are transferred, they're largely batch. We have these very long-winded conversations. When you ask, what data do you need, they say, but I can only tell you what data we consume. Show us what data you consume. It's not electronic, no digital, nothing. It's an email with 100, 200 pages worth of attributes with file names. You can imagine, most of the people who produce the data are no longer in the company, because this goes back 15, 20 years, so you have to sit and trawl through all of this.

Again, surprise, those attributes are physical attributes, so you then have to go and look at what it means in the database. These are the kinds of conversations we have. Data model is very important. Consumers want to understand data models more easily. I'll talk a little bit more about it. Data quality, we have some instances where data quality is managed entirely by the consumption teams, simply because they may face off to the regulators, or it's used for balance sheet purposes, or something like that. This is handled much further down, which means we may not get it right, or in the worst case, we keep doing the same thing multiple times, because the producer could have done it once, consumers do it multiple times.

Finally, the governance conundrum. Most of this is done manually today. We have systems, but there are people updating things in the system. One good example is master and authoritative sources. All financial institutions have to adhere to a number of regulatory standards, which say data has to be consumed from master and authoritative sources. We have design authorities which review this and then maintain this manually in a system, things like that.

Goals

Our goals, when we started the data mesh initiative, we were calling it by other terms. We used the term ADA, for example, for a while, and that's changed. This hasn't changed. Our goals have always been better governance. We want governance teams to look at data, metadata, the way it is presented, without having to ask people, so data governance, management. Technical simplicity, I spoke about the producer conundrum. Then, improved business efficiency. This is from a consumption angle. This has always been our goal. We have been doing it in various ways on-prem, but with suboptimal results; we want to do it properly on the cloud now.

Data Product Types

This is one slide I've thrown in just to show the kind of data products that we want to build. Again, this is standard. Zhamak talks about it in her book as well. The source aligned products that you see at the bottom, we expect hundreds of them, if not thousands, but definitely hundreds. If it goes into the thousands, we probably have duplicates. The idea is, everywhere that the data is actually produced, mastered, and we have a very complex environment. Investment banking is huge, complex, spans multiple asset classes. UBS being global, we span across multiple countries, so we would see a number of instances of these source products. Wouldn't it be nice if this is all registered? You have an Amazon of this.

A good example of this is Hugging Face, which I look at. You could see all of this in one place with a common model. That's one. People who produce this may not necessarily know, or will have no idea how exactly it's getting used. If they could produce this data in a consistent way, we then have the aggregators and the consumer aligned teams coming in. This could be the AI/ML teams, Reg reporting teams, financing teams, whoever. These are the people who consume this source aligned data. You can see some examples of this. Maybe there'll be someone interested in the consumer products. You could call this bronze, silver, gold in warehouse terminology, very similar, but this is the concept.

Architecture

If you look at the architecture, and I'll talk about the problems we have faced when we try to do this as well, this is what we believe are the key components of the mesh. You have the data product right there in the middle. That's our objective. That's what we want to build. This could be anything. It could be data sat in a relational table. It could be NoSQL. It could be a JSON message on a Kafka topic, anything. There is the concept of a distribution, so you can distribute data products in any fashion. It could be real-time. It could be batch, anything. To build that data product, a number of things are required. Remember the Boeing 747 analogy? You need a number of other tools and technologies involved. We've got the taxonomy. I spoke about the model. The question is, how do I describe my data model? I need a taxonomy.

In our industry, for example, in financing, investment banking, we have got something called the common domain model, the ISDA model. Every sector, healthcare, automobile, everybody has some form of a common domain model. The question is, can we adopt it? Can we enhance it where there are gaps? It's hard work, but a taxonomy definitely helps. There will always be internal models. We always customize products. Yes, we need that. We need some mechanism of registering this data product. It has to be in a registry. Should we build a registry internally? Is there something available out there in open source, a vendor product? We'll look at it. Registration is a key part.

The principle is, you cannot consume any data that is not registered. There's no concept of just connecting to your friend's data sat there in some database under his desk; not allowed. Registration becomes important. The challenge is, how do we make this as easy as possible, as simple as possible? Then you have got lineage, related to governance. Then, as you go down the chain, you've got the tools that are required to monitor and consume this. You've got the marketplace. I've thrown in some industry standards, which I'll refer to later, but these are the standards. Some of them are mature. Some are still maturing. This, we believe, is what we will need to build the mesh ecosystem.
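To make those components a little more concrete, here is a minimal sketch, in Python, of what a registered data product and its distributions might carry. The field names, URNs, and taxonomy terms are illustrative assumptions, not the actual UBS registry schema.

# A hedged sketch of a data product descriptor: product, owner, taxonomy links,
# and distributions travel together as one registered unit. All values are made up.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Distribution:
    name: str          # e.g. "kafka-json" or "parquet-daily"
    media_type: str    # payload format
    access_url: str    # where consumers pick it up
    cadence: str       # "real-time", "end-of-day", ...

@dataclass
class DataProduct:
    urn: str                        # unique identifier in the registry
    domain: str                     # owning domain
    owner: str                      # producing team, accountable for the product
    taxonomy_terms: List[str]       # links into a common domain model
    distributions: List[Distribution] = field(default_factory=list)

product = DataProduct(
    urn="urn:example:data-product:trade-positions",
    domain="ib-positions",
    owner="positions-engineering",
    taxonomy_terms=["Trade", "Position"],
    distributions=[
        Distribution("kafka-json", "application/json", "kafka://positions-topic", "real-time"),
        Distribution("parquet-daily", "application/vnd.apache.parquet", "abfss://lake/positions/", "end-of-day"),
    ],
)
print(product.urn, [d.name for d in product.distributions])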

This is just one view of the stack. I won't go through every box, but some of these will be familiar. We have used these even in the warehouse days. We've got ingestion pipelines. We have got data management. We've got these tools. Those tools tended to be very use case specific, division specific. We used them in a certain way in an Oracle-based infrastructure, or a real-time streaming architecture. They tend to be different. If you look at it again, bottom to top, and I showed you the three types of data products, you've got the source; that's the foundation layer. We expect all producers to produce data in the same fashion, in a consistent way. Then we have got platform services, and this is where we have been facing most challenges. I'll talk about that. We haven't solved it.

Then, on the top are the nice, sexy bits: analytics, AI, ML, GenAI, all these things that we keep talking about. How do we do it today? We probably take a copy of it. This is what we want to try. The holy grail of data mesh is access in place. Wouldn't it be great? It doesn't matter whether I want to produce a Power BI report or an ML model driving a recommendation engine for clients, they both access data in place. That's what we are aspiring to here. The one difference to what Zhamak had originally said, I think it's evolving now, was that data mesh was built more for the analytical space. We've heard a lot about microservices.

The problems of the operational world have already been solved, because you've got microservices. Analytics is still a problem waiting to be solved. We just spoke about warehouses and lakes. The way we have been looking at it is, I spoke about distributions, if we could solve this both from a batch and real-time perspective, if we could provide both data at rest and in motion, then what else would an operational consumer need? That problem is solved. That's what we are trying to do now.

The Good and the Not So Good - Idyllic Skyline

This is what I'm now going to talk about, is the journey we have had over the last 18 months. This is the idyllic skyline. We want to build a grand city. Imagine Milton Keynes, we are just about to build it all over again. What would that city look like? Before anyone comes to live there, we need roads, we need water lines, electricity grid, probably a Trainline, tram, whatever. We need all of these to be in place. We then have to build schools, hospitals. There are some buildings that we will start having. This is all ahead of the people coming and living there. What we are trying to do in the mesh is we are building these common things, data platforms, for example. It could also be standards, tooling. We then want the producers to come in and start using them to build things. You can imagine you are building all these factories, and the factories produce goods.

Then, they don't have to be shipped off somewhere to another country or a continent; we have consumers who come and consume it from there. This becomes a self-sustaining city. That is how it looks, wonderfully. Where are we today? It's not a set menu. It's a la carte. We get into a restaurant and we say, could you show me what your vegan menu looks like, or something? They give me a menu with 15 pages, and then I have to go through it and choose the four items that are of interest to me. Looks great, because anybody who goes into that restaurant probably has got something they can eat. They will never walk out without eating anything. You can imagine how difficult it is if everybody has to do this.

In our world, if you want to go and build something on the cloud, we have got wonderful products, tooling. We have got vendors. We have got open-source tools available, but every team has to choose the tech stack for every single use case. We can produce standards. We have already done a lot of it, but we still struggle. What we would like is set menus, options 1, 2, 3, which covers 80%. You just choose. You ask a set of questions, and then you get the answer. Then you always have the 20%, of course, which is the exception.

Registry

What are the things that have gone well for us? Fortunately, we have built the registry. That's the key part. As we go along, I'll try and talk about some of the open standards. We have built some custom tools, which I can't talk much about. Some of the open standards, the W3C standards, for example, are very good. There are two or three standards they have produced, one called DCAT. This is an excellent standard that underpins a lot of the registry. Many of the open-source registry tools that are out there use DCAT. We have extended this. This is good. This has been successful because we have spent the best part of two years building it. We couldn't find one tool in the market that could integrate into the SDLC, so we added steps where producers could not publish their data without the metadata being captured. You might think this is a very small step, but traditionally, IT teams build, publish data, it's gone.

Then when you go around trying to find out the metadata, it's manually maintained. It's out of sync. No guarantee that it reflects what's in production. We've integrated that into the SDLC, so you can't publish anything without it being registered. It forms part of the standard governance process, which means it cannot be deployed into a higher environment without somebody from the data management office reviewing it, checking the mappings, and approving it. That's the level of integration we have done. We have had to do this. We have used some open tools and standards, and then built an application around them. This is all hard graft. It takes time to do.
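DCAT itself is an RDF vocabulary. As a hedged illustration of what a registration entry might look like, here is a small Python sketch using rdflib and its built-in DCAT and DCTERMS namespaces; the URNs and titles are made up, and the real SDLC-integrated registry captures far more metadata than this.

# A minimal DCAT-style registration: one dataset with one distribution.
from rdflib import Graph, Literal, URIRef
from rdflib.namespace import DCAT, DCTERMS, RDF

g = Graph()
product = URIRef("urn:example:data-product:trade-positions")
feed = URIRef("urn:example:distribution:trade-positions-kafka")

# The data product as a dcat:Dataset owned by a producing team.
g.add((product, RDF.type, DCAT.Dataset))
g.add((product, DCTERMS.title, Literal("Trade positions (source aligned)")))
g.add((product, DCTERMS.publisher, URIRef("urn:example:team:positions-engineering")))
g.add((product, DCAT.distribution, feed))

# One way the product is distributed: a real-time JSON feed.
g.add((feed, RDF.type, DCAT.Distribution))
g.add((feed, DCTERMS.title, Literal("Real-time JSON feed on a Kafka topic")))
g.add((feed, DCAT.mediaType, Literal("application/json")))

print(g.serialize(format="turtle"))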

The Good

Data quality, we are not there yet. There are, again, a lot of data quality tools, but these work within the boundaries of the vendor solution, or, even where they are open source, they cover only a certain portion. They might work well for batch, but not quite so well for real-time. Things like that. Again, two good standards. One is called DQV, which is the Data Quality Vocabulary. Again, I was talking about taxonomy. The way we describe data quality rules differs. If I speak to three of you here, we probably have three or four different ways in which we describe data quality. This helps us to standardize that. DMN is Decision Model and Notation. It's an OMG standard. It forms part of the BPMN suite, the business process modeling standards. This is particularly good, because you standardize the taxonomy, and then you can write these rules in a machine-readable form.

Under the covers, it's just an XML payload. You have a UI which business people can use, so you can write rules in the form of tables and expressions. These are the two standards that we are using. Again, we are having to do some build internally, but there are open-source tools. DMN is one. You have the entire Kogito Quarkus framework, which can be used to build these rules. We are going to be using these open-source tools ourselves. Again, this needs to be integrated into governance, which is a solution we don't have. We want rules to be defined, attached to data products, approved. Again, you can imagine this is the circle, the virtuous circle. Everything we do has to be part of this circle.
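To illustrate the "rules as data" idea behind DQV and DMN without reproducing an actual DMN XML payload, here is a minimal Python sketch in which quality rules are plain data evaluated by a generic function. The rule vocabulary and field names are assumptions for illustration only.

# Quality rules expressed as data, attachable to a data product and evaluated generically.
completeness_rules = [
    {"field": "trade_id", "check": "not_null"},
    {"field": "notional", "check": "min", "value": 0},
    {"field": "currency", "check": "in_set", "value": {"USD", "EUR", "GBP", "CHF"}},
]

def evaluate(record: dict, rules: list) -> list:
    """Return the rules that a single record violates."""
    failures = []
    for rule in rules:
        value = record.get(rule["field"])
        if rule["check"] == "not_null" and value is None:
            failures.append(rule)
        elif rule["check"] == "min" and (value is None or value < rule["value"]):
            failures.append(rule)
        elif rule["check"] == "in_set" and value not in rule["value"]:
            failures.append(rule)
    return failures

# Example: a record with a negative notional and an unexpected currency fails two rules.
print(evaluate({"trade_id": "T1", "notional": -5, "currency": "JPY"}, completeness_rules))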

Developer Tooling

Developer tooling is probably where we are most behind. Cloud is great. You've got lots of solutions. Any problem you have, there is probably a solution that fits your bill but doesn't work well with someone else's solution. Backstage is one we have looked at. We had a presentation on Backstage and its plugins here. That's an open-source tool that we are looking at to simplify the way producers build. The consistency that I was talking of before, we want to achieve by the use of Backstage. This is early days. There are some tools that have been made available via FINOS, the Fintech Open Source Foundation. You've got Legend, for example. We are trying to build something ourselves here. Again, the idea for us is model-driven development. I am particularly passionate about this, because if you start with the model, then it just makes products that much easier to understand.

Data Access, Data Security, and Lineage

Access is probably the most difficult area. By no stretch of the imagination have we solved it. We still have silos, even on the cloud, and we have been talking very closely with vendors. It's extremely hard because our data spans geographies. You can't see Swiss data outside Switzerland. You can't see Chinese data outside China. If somebody in China moves out of the country, they can't access the data. We have got every possible combination of access control rules and info barriers. Wealth data has to be masked, and there is no easy way of doing it on the cloud. Fine-grained access control is a problem: blob stores, for example, don't support row- and column-level access control. You can use something on top; Databricks, for example, supports it.

Then, how does it coexist with other persistent stores? How do we get NoSQL databases to interoperate in this? Then, finally, how do we get analytics and operational consumers both going through the same access control framework? ODRL is another machine-readable standard, and that's something that the industry is trying to push. We are saying, you publish a data product, but that only solves one part of the problem. Where the industry is moving to now is to say, we publish products, and here is a license file that goes along with it. The license file is machine readable. The London Stock Exchange is a good example. They publish thousands of datasets every day, licensed. They are looking to see if the consumers can implement ODRL, whereby you don't need people and documents to be exchanged. The ODRL effectively implements the contract. We need all of the other things I mentioned to be available inside the organization for this to work.
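As a hedged sketch of what such a machine-readable licence could look like, here is an ODRL-shaped policy expressed as JSON from Python. The product URN and the constraint values are illustrative, not a real London Stock Exchange or UBS licence.

# An illustrative ODRL policy: one permission with a location constraint, one prohibition.
import json

policy = {
    "@context": "http://www.w3.org/ns/odrl.jsonld",
    "@type": "Offer",
    "uid": "urn:example:policy:trade-positions-licence",
    "permission": [{
        "target": "urn:example:data-product:trade-positions",
        "action": "use",
        "constraint": [{
            # Example constraint: only usable by consumers located in Switzerland.
            "leftOperand": "spatial",
            "operator": "eq",
            "rightOperand": "https://www.wikidata.org/entity/Q39"
        }]
    }],
    "prohibition": [{
        "target": "urn:example:data-product:trade-positions",
        "action": "distribute"
    }]
}

# The licence travels with the data product, so tooling can evaluate it rather than a PDF.
print(json.dumps(policy, indent=2))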

Security, here we are looking at Open Policy Agent. We don't have an implementation of this yet, but this is something we have been looking at. All of the policies I was mentioning before, we want to be able to define and attach them to the data product. Again, we don't want this to be done by every consumer. You can imagine, you've got hundreds or thousands of datasets you want to access, and then every time you find something interesting, you can't see what's inside it. The only way you can see it is by asking someone for permission, so that goes through a process, a workflow.
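A minimal sketch of how an Open Policy Agent decision could be queried for a data product access request is shown below, using OPA's standard REST data API. The package path, the input shape, and the policy behind it are assumptions for illustration only, not an existing implementation.

# Query an (assumed) OPA policy that decides whether a user may act on a data product.
import requests

# OPA's data API: POST /v1/data/<package path> with an "input" document.
OPA_URL = "http://localhost:8181/v1/data/datamesh/access/allow"

def is_access_allowed(user: dict, product_urn: str, action: str) -> bool:
    payload = {"input": {"user": user, "product": product_urn, "action": action}}
    response = requests.post(OPA_URL, json=payload, timeout=5)
    response.raise_for_status()
    # OPA returns {"result": true|false} when the rule is defined, {} otherwise.
    return response.json().get("result", False)

print(is_access_allowed(
    {"id": "u123", "location": "CH", "entitlements": ["positions-reader"]},
    "urn:example:data-product:trade-positions",
    "read",
))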

You get approval, you access it, and then realize it's actually of no use to you. Then your cycle repeats. What we want to do is attach the policy to the data product. You can go see what's in it. You can see sample data. If you're allowed to use it, that's it. You don't have to raise any request. You just go use it. If it's some restricted data, then you ask for access. Lineage is a difficult area. We have solved parts of it. Again, for lineage there are tools out there. We use Azure extensively. The challenges are that, again, it only works for certain things. It might show you data lineage for a certain kind of technology, say Java, but then won't show you for Spark. It doesn't show you business logic, so front to back it's hard to see what the business logic is. We have had to implement metadata capture utilities that will then push this into the lineage tool, using the Atlas API. There are some standards. Atlas is a good standard.

The challenge remains as to, with all of these different tools and technologies available on the cloud, how do you use them? Traditionally, on-prem, we were using a tool like Informatica. Where we're using ETL-based tools, it's fairly straightforward. Use Informatica, a great, rich tool. It gives you a lot of functionality, lineage being one of them. It gives you data quality. When you move to the cloud, it's harder because you use different tools, and you're trying to optimize cost and performance, but we still need to capture lineage.
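As a very rough sketch of the kind of metadata capture utility described above, the Python below pushes a process entity into Apache Atlas over its v2 REST API to record input and output datasets. The host, credentials, the custom type name, and the assumption that the referenced datasets already exist in Atlas are all illustrative; the real internal utilities differ.

# Hedged sketch: register a pipeline as a process entity linking existing datasets.
import requests

ATLAS = "http://atlas.example.internal:21000/api/atlas/v2"

def register_process(name: str, inputs: list, outputs: list) -> None:
    """Create a process entity that links input and output datasets by qualifiedName."""
    def ref(qualified_name: str) -> dict:
        # Reference an entity assumed to already exist in Atlas.
        return {"typeName": "DataSet", "uniqueAttributes": {"qualifiedName": qualified_name}}

    entity = {
        "entity": {
            "typeName": "ib_pipeline_process",  # assumed custom subtype of Process, registered beforehand
            "attributes": {
                "qualifiedName": f"{name}@prod",
                "name": name,
                "inputs": [ref(q) for q in inputs],
                "outputs": [ref(q) for q in outputs],
            },
        }
    }
    response = requests.post(f"{ATLAS}/entity", json=entity, auth=("svc_lineage", "change-me"), timeout=10)
    response.raise_for_status()

register_process("positions-enrichment", inputs=["raw_trades@prod"], outputs=["trade_positions@prod"])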

Learnings

What have our learnings been over the last two, two and a half years, good and bad? One, cloud is the way to go. We are not regretting what we have been doing. We have to move to the cloud. That's the best way. It also is a cultural shift, because, traditionally, we build these nice silos. We hold on to our own data. We've got 42 warehouses in investment banking, because I won't trust someone else's data, I'll just do it myself, because it's not documented, so the best way is to do it ourselves. We are saying cloud helps us to smash these silos, because we will not allow lift and shift, so this is the way to go. When we looked at the industry, especially 18 months ago (things are changing now), the most common offering was the lake. You would see people talk about mesh.

When you actually opened the covers, the terms used for the tooling would always include a lake. You had data lake, open lake, whatever terms they used, they had lake in them. You wonder, is it really just a lake under a different name? That's still there. The primitives are available. Everything I've described so far, I'm sure you can point to a particular tool or technology, either open source or vendor, that meets that requirement. I just named some of them for data quality and access control. You have them, but we have found no easy way of linking them together, making them work together seamlessly. Producers have to do work. Consumers have to do work. That's the challenge we are trying to solve. Industry standards are maturing. I think OMG and W3C are very good examples. They are doing a lot of work. We have a good team within UBS working extensively on some of these standards.

We, of course, have been traditionally using the financial standards, FIX, FpML. We have been very active in those; we use them. I think on the data and messaging side, we are pretty comfortable. Where we have challenges is in standards that define how the mesh should be built and sustained. Access control needs a lot more work. We need that integrated with the registry. We are working with some of the vendors to see whether there are tools that we could adopt and extend, but there is work that we'll be doing internally as well. In a way, that's helping, because we are breaking the organizational barriers. These applications are siloed, and then we are bound by these organizational boundaries. I don't know whether it's agile or it's just the desperation to solve this problem, but we have managed to break some of these walls down. We do collaborate extensively across divisions.

Some of the solutions we have, like the registry, are shared services across the organization, and individual divisions have volunteered to do this for the rest of the organization. It's a good example. I think in many ways, mesh has helped us to break some of these barriers. Then, finally, the single pane of glass, we don't have that. There's no one place. You can go to the marketplace we have and see what data products there are, but you can't see any of the underlying processes that build them. IT monitoring is still hard. IT monitoring views have to be different. Business views are different. Our challenge remains, how do we bring it all together?

It's not all negative. We have produced a number of data products within the organization. It's been a great learning experience for many of us. I've loved it, for my part. If anything, instead of working on a problem that's already been solved, like warehouses or lakes, for that matter, it's better to have this, because this, I think, is a very interesting area. Some of the technologies we have heard about are all being used here by us. I'm sure there are other organizations doing something similar. Some of this work will be open sourced. We are actively thinking of open sourcing some of the work we are doing.

Questions and Answers

Participant 1: One of the issues that you were mentioning earlier was determining what would constitute a good data product. I'd just like to hear a bit more about how you overcame this issue. Let's say you have operational teams that are running their applications and producing operational data. Were they the ones deciding what was going to make a good data product, or did they have something a bit more central, like a data owner? How did it work for you?

Heble: We haven't solved the problem of data owners owning the data and publishing in a common domain. That's our objective, but it's not been easy to do. What we have done is we have identified what we call an authoritative source, one consumer who pretty much does this on behalf of others, and they have a common model, that is then used to publish this data. Essentially, it is hybrid mesh. I didn't use the word hybrid because it looks like a lake. That's the best way to solve the problem initially, because we otherwise have this huge problem of convincing every single producer to start adhering to this, and it's difficult. It's almost impossible initially, because everybody wants to see it working before they can adopt it. We said, let's try it in one place. That's the way we have done it currently.

Participant 1: How are you keeping data in sync? Let's say you have the data product; probably it's not going to be in the same storage as the operational data, but you still need to keep it up to date. Have you been using something like eventing to keep the data up to date, or are you allowing some lag, maybe 1 hour, 2 hours, a day, using some batch process to feed it back?

Heble: It's essentially eventual consistency. You have to depend on that. We have something called an inventory control process. It checks the source, producers produce an inventory control file. It's a standard established pattern in the industry as well. They produce an inventory control file, which is then checked by the consumer periodically. When I say periodically, it's typically end of the day. Then there is an exception flow, if something is missing. It's expensive. It would have been much easier if the producer had produced their own data. We'll probably get there in a few years' time. That's the slightly roundabout way of ensuring consistency.
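A hedged sketch of that inventory control idea: the producer ships a manifest alongside the data, and the consumer reconciles it at end of day, routing anything missing to an exception flow. The file layout and field names here are illustrative, not the actual pattern used.

# End-of-day reconciliation of a producer-supplied inventory control file.
import csv

def reconcile(manifest_path: str, received_keys: set) -> list:
    """Return manifest entries the consumer has not received, for the exception flow."""
    missing = []
    with open(manifest_path, newline="") as manifest:
        for row in csv.DictReader(manifest):  # e.g. columns: business_date, record_key
            if row["record_key"] not in received_keys:
                missing.append(row)
    return missing

# Anything the producer declared but the consumer never loaded goes to exceptions.
exceptions = reconcile("inventory_control_eod.csv", received_keys={"T1", "T2", "T3"})
for entry in exceptions:
    print("Missing from consumer store:", entry)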

Participant 2: How do we organizationally prevent multiple data meshes from being created?

Heble: We have grappled with it. We thought using different terminology would help. We don't call it different meshes, we call it different nodes on the same mesh. I don't know if it helps. It's impossible to expect there is no duplication of data at all, if that's what you mean when you say multiple meshes. It means the same data may exist in different meshes because they didn't talk to each other. It's possible. Zhamak calls it federated computational governance, but in some ways, the governance has to be centralized to check this. Can you prevent it? Unlikely. I think the best way of doing it is to let these products be published in the mesh, see them in the marketplace, and then observe. Observability is another piece. It's important to see who is consuming the data and what it is being consumed for.

Then you will immediately find out the ones that everybody wants, and the datasets that nobody seems to want, for whatever reason. It might be because it's number 50 in the list and they have found something above that, the classic Google Search response. Or it could be because the data quality is poor, or they're not quite sure who's produced it. Then, you need to have a garbage collection. You need to have a process of deprecating and retiring those data products. That's probably one way of weeding out some of these duplicates. I think at the start we are saying, let's produce it and then figure out how to clear out duplicates.

 


Recorded at:

Jan 28, 2025
