Managing Privacy & Data Governance for Next Generation Architecture

Summary

Ayana Miller explores a governance framework for road mapping, resourcing, and driving decision-making for next generation of architecture with privacy by design. Miller walks through the key players, requirements mapping, templates, and vendor engagement models for informed decision-making.

Bio

Ayana Miller is a Privacy & Data Protection Advisor at Pinterest. She has built and managed in-house privacy programs at Facebook, Snapchat, and Pinterest. Prior to working in Silicon Valley, she worked in Washington, D.C. at the United States Federal Trade Commission and MITRE, a federally funded research and development center.

About the conference

Software is changing the world. QCon empowers software development by facilitating the spread of knowledge and innovation in the developer community. A practitioner-driven conference, QCon is designed for technical team leads, architects, engineering directors, and project managers who influence innovation in their teams.

Transcript

Miller: I definitely would encourage you to ask questions. I am not an expert; I just happen to be someone who fell into privacy, which is what I tell people. I like this area, I've done it for almost a decade now, but I also value and appreciate all the conversations that we have bringing these people together. This group of people won't ever be in this room again, so it's important, and I think it will be helpful, to get different perspectives from everyone in here.

Today, I'm going to be talking about privacy and data governance for NextGen architecture, just sharing some of the experiences I've had across my time in the industry. I also would encourage you all, if you all have experiences that are different, we should talk about that as well. I'd like to start in that spirit by asking just for a call out of different titles we have in the room, so people shout out what their title is. Manager, architect, CTO, security engineer, software engineer, technical lead. As you see, there's a lot of different experiences cross-functional, and I think that's important because that just goes to show you the type of people that are now thinking about privacy and data governance. It does cut across a lot of different industries, so a lot of my talk will be about that.

I'll share a little bit about the privacy landscape, I do assume that most of us have heard of the terms GDPR and CCPA so we won't go into too much detail there. We'll also talk a little bit about forming a data-governance strategy, how I thought about that and, based on my experiences, talk about vendor opportunities. I'm not going to talk about specific vendors, I will just talk about the fact that there are a lot of vendors and thinking about approaches and strategies you can think about to tackle them and think about which ones will be beneficial for your organization. Then, finally, I'm actually going to turn it over to you all for recommendations, just based on some of our discussions, thinking about ways to really think about what the vendors look like. I think a lot of it is based on your own experiences, and hopefully, this talk will help you be able to put that into a framework that is useful to send back to your organizations.

Plans Are of Little Importance, but Planning Is Essential

I have a little quote here from Winston Churchill that says, "Plans are of little importance, but planning is essential." I love this quote because I feel like it is totally my life: my personal life, my professional life. You have to make plans. Everyone says, "It's time, probably around this time of year, to start thinking about what you're going to do for the next year." For your company, you probably did some activity, or you are in the process of doing an activity now, for next year's planning. Some companies call it OKRs, outcomes, all these objectives and key results. Where does privacy fit into that? Has anyone been successful in getting privacy as a top-level company objective, something related to it? Nice. A few hands went up. A lot of other people haven't had that success. A lot of the metrics right now for privacy look like, "We won't have any high-level SEVs. There will be no incidents," and it's like, "Good luck with the rest of that. Good luck getting funding for that." As you probably know, plans are of little importance but planning is still essential. It's still important that a privacy team, a privacy program, thinks about ways to plan and is ready to be flexible with that plan.

I will take us on a little journey where I started my career. I put this guy up here because he was a major part of it. I was in grad school at Carnegie Mellon University, I was studying public policy and management with a technical focus, looking a lot at the quantitative measurements and how to bring quantitative values and analysis to the public-policy space. At that time, I ended up going to work for a company called MITRE Corporation that focuses on military defense and government contracting. I ended up in a space called Privacy Strategy and we were looking at HIPAA, and the U.S. Census Bureau and their privacy strategy. I looked around at the industry and I said, "How do I get to a place like Google or Facebook? This seems really cool." There weren't a lot of people talking about privacy at that time, there was a little organization called the International Association of Privacy Professionals that had maybe a few thousand members at the time.

I wasn't convinced that I really wanted to go down the path of security. I thought it was cool, I thought maybe being a white-hat hacker would be pretty cool, like an ethical hacker, but I also wasn't sure if that matched with my policy background. I had been in that a long time. I looked to this organization, the IAPP; they had certifications for privacy professionals. When I started reading through materials, I realized that there was really a lot going on. The industry was a mess. Privacy policies were everywhere and fragmented, so I wanted to figure out how I could be a part of that.

I ended up then going to the Federal Trade Commission, figured if I was going to go to any government agency, I would go to one that was focused on looking at the laws and regulations themselves. I ended up spending time at the Federal Trade Commission. Facebook called a little bit sooner, after actually 6 months, I went to go work for Facebook privacy program, and then Snapchat, and now Pinterest.

I brought those experiences in privacy engineering. I've seen a couple of different programs, but again, I haven't been at all the companies that you all have been at, and so you probably have different experiences too. Hopefully, we can just have a dialogue about where we see the industry going. I've seen a lot of changes over the last 10 years, and I think we're only at the beginning stages. Hopefully, we will have a framework and a dialogue to be able to figure out what the next generation looks like.

The Privacy Landscape

I'd like to start by talking a little bit about the privacy landscape, as I mentioned before. The digital age has basically called into question this fundamental right to privacy. Some people might say, "Is there even a fundamental right to privacy?" We like democratization of technology, we enjoy all the benefits of personalized AI, but we're not always thinking about the costs and the trade-offs associated with all the data collection and sharing. It's only very recently that people, consumers specifically, have started thinking about their privacy and what they share with companies.

I mentioned that guy earlier before, Edward Snowden, the reason I got involved in privacy. He was very interested in the government and what the government knew about you. I think that's a very important question, there's a lot of people who are still thinking about that, things like NSA, FISA courts, should the government have access to your information? With technology now and tech companies creating so many troves of data, the government has turned now to them asking them to collect information and realizing there's a lot of value there too. These worlds are starting to collide in interesting ways.

As we see advanced technology, machine learning, AI, all this stuff is compelling us to grapple with new ethical questions beyond the borders of even privacy. People are beginning to see the implications of, "Move fast and break things." We had this mantra for a while and we thought that was the way to move forward: you want a 10x so you want to start a startup, you want to get the billions of dollars, and so you grow at any cost. That means systems are open, that means people have access to systems, that means you do first and apologize later. That also means that things like safety, online safety for our children, online safety for adults, addiction, algorithmic bias, all those things have come into play and we started to question, "Are these systems moving faster than we're able to put the policy in place?"

Also, regulators are starting to get savvier. There was a long time, I've been there, and I've been in the companies where anything went, anything could go. Then, later on, you would apologize, you would ask for forgiveness for things that might have been questionable ethically because the borders technically weren't in place. You write code, you submit it, someone tells you to ship a system, you worry about the consequences later. It was all about the dollar. I think we're finally starting to change that conversation. As the features that we see, they come out quicker and quicker. Our demand for them as consumers is out of this world, we want things now, we want things fast. We also expect the same from the solutions, so we have all these new problems, frustration with elections being tampered with, and we want to know how Facebook is going to fix it. We want to know when Uber is going to fix X-Y-Z, why doesn't our app work. We expect things to happen like that.

Consumers are also asking for more transparency and control, not only about privacy but also just it's becoming a standard for everything we do, all the interactions. I think more and more, over the next decade or so, we're going to see more pulling back of the curtain on technology. I think Apple is a good example of this recently with their new privacy announcement, a new privacy page, really explaining for each app how they collect data and what values they put in place.

I mentioned this before, but the U.S. approach to privacy is segmented, and it's challenging. It can be very difficult to enforce it for regulators and it's difficult internally for companies to understand how to comply, there's just a lot of regulations, a lot of confusion. Who's supposed to interpret it? Who's supposed to build it, especially if it's not your mantra, it's not in your area and you don't have any OKRs built to it? How do you do that?

I lead the privacy by design program at Pinterest through technical program management. As I mentioned, I've worn many hats before. I've been a privacy engineer, a privacy program manager. After a few years, you stop caring about the title and care more about the work you're going to be doing. Essentially what I do, what the sweet spot is, is asking the question, "What are the features we want to build? What are the experiences we want to provide to users and what data does that require? What new sets of data? What processing is that going to require?" Then, on the other side of that is, "What additional risks are we taking on as a company to do that collection?" There's a lot of people that should be involved in that conversation and a lot of decisions at a higher level that have to happen in order for the developers to get that message once the decision is made on those questions. I'd say that's what boils down to the fundamental role that I play in the companies that I've been at.

In the absence of a U.S. privacy model, global regulators and individual states have set the current standards. What we've seen and heard is about the EU GDPR and CCPA. Everybody go ahead and roll your eyes, I think we're all tired of it. What I've seen a lot of companies do now is just say, "Ok, we're GDPR-compliant. Good." Like that's it. No, that's not it. There's a lot of stuff that's coming down the pipeline, lots of talk about federal legislation, and the California Consumer Privacy Act is coming out. Although you get a lot of goodness from GDPR, if you have certain things in place already, it's not the same; the legislation is different. For instance, with the EU GDPR, there's a requirement for data portability: you have to be able to allow users to transport their information. You have to be able to delete data on request within 30 days and you should also be able to [inaudible 00:11:26] data-access reports. Those things have different names across different companies, so that's really what it boils down to. There's a lot of other stuff in there too, in that regulation.

CCPA has a lot of the same spirit, but the intentions there are to make sure that consumers have the ability to opt out of sharing their data. They also want to be able to give consumers the ability to control their data and have an outlet to contact companies. I'd say these are the main ones we hear about in the U.S. now, everyone's preparing for CCPA on January 1st, but there's not a lot of clarity even yet on what this will look like, what enforcement will look like, what does it mean to follow this law.

I find that many of my colleagues feel the same way I do, that this area is challenging. It can be confusing globally. Privacy engineering is still in its infancy stage, it's still a little baby. It wasn't until a few years ago that we even had terms like privacy engineering and [inaudible 00:12:25] tracks. It's very new and that's one of the reasons why I'm standing here today, not as a privacy engineer, even though I've had the title before, but as a developer, because I'm still making efforts to get those people into this community and show them that there's a lot of work to be done, there's a lot of opportunities. It's still evolving, the legislation has not completely cooled down, we're going to see a lot more, in my estimate, coming up.

Right now, I would say that this is the feeling among people who are in this space. They feel overwhelmed. The legislation is still new, there isn't a lot of precedent right now to be able to understand what an acceptable practice looks like, what enforcement will look like. You've probably been to a couple sessions today, you hear vendors talk about their ability to apply applications, or be able to do data flows, or mapping, or compliance tracking. All these tools. I have to think you're interested in that or have some reason to be here to think about that. It's not very clear how they're able to provide value when so many companies are different.

We know security isn't a new concept, but considering privacy at the same level of risk and operations, that is a new concept for a lot of companies. Tech advances have forced us to consider those implications pretty quickly. Now there's just this flood of all these new vendors, and it's overwhelming. How many people have gotten a LinkedIn message from a vendor about something privacy-related? Yes, I see lots of little hands going up. Yes, they're reaching out, they're asking questions, they're looking at people's titles, trying to figure out who's responsible for the budget around privacy and how they're going to solve it.

It's difficult to make the case. Do any of you have budgets devoted specifically to privacy? Nice, there's a few hands, but it's still very fragmented among a lot of companies and it's difficult to make the case. When you just haven't seen the impacts, it's hard to measure the ROI. Things haven't been optimized, so there's not a standard you can point to, like ISO or PCI for security for instance.

Also, privacy professionals feel a little bit conflicted. A lot of these legislative requirements have created unintended consequences. One of them, a big glaring one, was in the news recently and a lot of people in this community were talking about it. It was GDPR, the ability to download your information. A lot of legacy systems, as you're probably well aware, typically may be siloed. You don't contain all your information in one place, you have backups, and you don't necessarily want everything in the same centralized database because that puts you at a greater risk. Even at a detailed level, all user data may not be tied to one particular device ID, object, or user; it's spread over a lot of different systems. You can do links, you can do joins to get that information together, but you don't necessarily have that all available like that. That's what GDPR requires: it requires you to be able to pull that information for a user and tell them everything that you have on them as a consumer. What this does is it creates additional risks from a security perspective; it creates additional opportunities for hackers to abuse it. That's not the intent of GDPR, but in the rush to be compliant with the specific letter of the law, a lot of these things have created conflicts for privacy professionals who are trying to administer it.
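
To make the "links and joins" point concrete, here is a minimal Python sketch that is not from the talk. It assumes three hypothetical in-memory stores (`ACCOUNTS`, `DEVICES`, `SUPPORT_TICKETS`), each keyed differently, and a hypothetical `build_access_report` function that gathers what each one holds on a single user. Real systems would sit behind separate services with their own authentication, which is exactly where the security tension described above comes from.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins for siloed systems; each keys its records
# differently (user ID, device ID, email). None of this is from the talk.
ACCOUNTS = {"u1": {"email": "ana@example.com", "name": "Ana"}}
DEVICES = {"d9": {"user_id": "u1", "model": "phone"}}
SUPPORT_TICKETS = [{"email": "ana@example.com", "subject": "Login issue"}]


def from_accounts(user_id: str) -> List[dict]:
    record = ACCOUNTS.get(user_id)
    return [record] if record else []


def from_devices(user_id: str) -> List[dict]:
    # Devices are keyed by device ID, so linking back to the user is a join.
    return [dict(device_id=d, **rec) for d, rec in DEVICES.items()
            if rec["user_id"] == user_id]


def from_support(user_id: str) -> List[dict]:
    # Tickets only know an email, so resolve it through the account first.
    account = ACCOUNTS.get(user_id)
    if account is None:
        return []
    return [t for t in SUPPORT_TICKETS if t["email"] == account["email"]]


# Registry of every system that must contribute to an access report.
SOURCES: Dict[str, Callable[[str], List[dict]]] = {
    "accounts": from_accounts,
    "devices": from_devices,
    "support": from_support,
}


def build_access_report(user_id: str) -> Dict[str, List[dict]]:
    """Collect everything each registered system holds on one user."""
    return {name: fetch(user_id) for name, fetch in SOURCES.items()}
```

Keeping the sources in one registry is a design choice in this sketch: a system that is missing from `SOURCES` is silently missing from every report, so the registry itself becomes something to audit.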

A lot of privacy professionals also feel unsupported. A lot of teams, as I mentioned, at least in companies I've seen, have to really fight for a seat at the table. It may be under security, it may be under a data-engineering team, it may be under a technical program management team, but there's also often a question about where resources are going to come from. Not a lot of companies are looking at privacy as a fundamental operation; it's still very divided. If you're responsible for that type of activity, I would really encourage you to think about it as something that's not just a check-the-box activity where you are going to implement GDPR as a one-time project. It's something that's ongoing, it requires ongoing maturity. The standards just aren't in place yet; they're still being worked on. As we get there, I think it will require more dedicated resources as opposed to thinking about it as just a one-off activity where you set up a system and check the box and move on.

Tech companies are grappling with the future of privacy. As I mentioned before, there's an influx of new data vendors and startups on the scene. They're vying for customers and credibility. How many of you in here are vendors? Ok, a couple hands. Yes, the vendors are out there, they're looking to secure these dollars. Some of them do have good solutions. It has to be a two-way conversation. What I'm going to talk a little bit about, as we progress, is how I've started to approach this at Pinterest and, where there wasn't a privacy engineering team, how to get resources. I really do truly believe that companies want to do more for compliance, they want to be compliant, but they need to be able to keep costs in consideration. I saw this quote on LinkedIn recently. J. Trevor Hughes, who is the president, CEO of IAPP, International Association of Privacy Professionals, said, "Also not sustainable, 64% of companies are handling DSARs," which are data subject access requests, "manually, 30% are manual and ad-hoc, as received."

This is what I was mentioning before about this tension. You can create or buy a system that can potentially automate these data subject access requests, but it's going to be very difficult to do that and to do it quickly and do it well. Because of that, a lot of companies are doing it manually. That requires more support on the operations side, but it means that there are no developers being taken away from bigger projects that might mean more dollars. There's definitely trade-offs, and I think it's just a question of where do you want to put resources, what's important. At the end of the day, if a user gets the wrong report because it's done manually, what are the impacts and risks of that?

Tech companies want to comply but they need clear, consistent, and actionable guidance. Implications for not complying can be pretty vague. We've seen a few high-profile cases of companies that many people would say they had it coming: Facebook, Google. They've had significant fines but it's still hard to directly map their experiences down to other companies, or to your company specifically. If you don't have the same type of data, if you don't think that the systems are the same, and then, you wonder, "Is it worth this activity anyway?" again, how do you map it to your key results and objectives?

Forming a Data Governance Strategy

I'm going to talk a little bit more about forming this data-governance strategy. I believe that data governance requires technical responsibility. Ethical standards require executive buy-in. What does this mean? Data governance is technical, it's about metadata, it's about tagging. That requires technical responsibility: you're able to pay down tech debt and build up data governance with technical tools, with technical resources. Ethical standards have to be set at an executive level. There's no way that any developer resources your company can buy will be able to tell you exactly what should or shouldn't be done with data or where it goes.

As I mentioned, data governance is not ethical standards. It's about metadata tagging, it's about data discoverability, it's about access and explorations, logging and monitoring, things like authentication, authorization, auditing, logging. Ethical standards are situational, it's going to vary day-to-day. You really have to have a clear level from your Executive Board, your c-suite. It takes the staff every moment flagging things for you, asking questions, pinging you. There's daily trade-offs to be made when it comes to ethical standards.

Data governance is actually measurable. You're able to measure whether someone who wasn't authorized to a system got access to a certain data type. You're able to see whether information got sold to someone you weren't supposed to sell it to. It's about maintaining security. A term that gets thrown around a lot is reasonable security, and CCPA actually calls for reasonable security. It's a nuanced, nebulous term, but some would say, "There's standards for security," so there is something you can measure there. Something like requiring encryption can be measurable. Is it in place? Is it not? This might be at rest, it also might be in transit. Then, not changing user settings: these are just very baseline things that you're able to measure.
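
As one way to picture "measurable", the following Python sketch (an illustration, not something shown in the talk) tags datasets with a data type, checks an audit log against an access policy, and reports encryption-at-rest coverage. The `Dataset` and `AccessEvent` classes, the `ALLOWED` policy, and the service names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Dataset:
    name: str
    data_type: str           # metadata tag, e.g. "email" or "aggregate_metrics"
    encrypted_at_rest: bool


@dataclass(frozen=True)
class AccessEvent:
    principal: str           # who read the data
    dataset: str             # which dataset they read


# Hypothetical policy: which principals may read which tagged data types.
ALLOWED: Dict[str, Set[str]] = {
    "email": {"support-service"},
    "aggregate_metrics": {"support-service", "analytics-job"},
}


def unauthorized_accesses(datasets: List[Dataset],
                          log: List[AccessEvent]) -> List[AccessEvent]:
    """Return audit-log events whose principal was not allowed that data type.

    Untagged datasets have no allowed principals, so any access is flagged.
    """
    tags = {d.name: d.data_type for d in datasets}
    return [e for e in log
            if e.principal not in ALLOWED.get(tags.get(e.dataset, ""), set())]


def encryption_coverage(datasets: List[Dataset]) -> float:
    """Fraction of datasets encrypted at rest (1.0 for an empty catalog)."""
    if not datasets:
        return 1.0
    return sum(d.encrypted_at_rest for d in datasets) / len(datasets)
```

Both functions return numbers or lists that can be tracked over time, which is the contrast with ethical standards drawn in this section: the policy table still has to come from people, but compliance with it can be counted.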

Ethical standards are cultural. It's about asking questions like, what are the forums that exist to raise questions about what should or shouldn't be pursued? Are these questions happening for your company, for the product you're launching? Has anyone really stepped back and said, "Is this the direction we want to go?" Are there principles in place that lead where you want to go as a company, so that you don't end up in a place where you're pushing out things or products or experiences that don't align with your company's core values? If you have principles, are they ratified? How often are they referenced? This goes across the company. It's not enough if the CEO ratifies principles that sit in a document somewhere on Google Drive or wherever else; they actually have to be something that everyone across the company is espousing and believes in and is applying to their development life cycle. Maybe it's something that gets posted on their desk, maybe it's a refrigerator magnet.

What are the consequences for not abiding by principles? There have been many times, across many companies, experiences I've seen where something happened and there weren't any consequences. Maybe a developer gets moved to another org, they consistently break the rules, but there's no enforcement. If there isn't a consequence for people who are not abiding by the ethical standards once they're set, then I think it's important to figure out what the mechanism will be to address that behavior.

Then, what is the path to escalate? If there are perceived unethical decisions that are happening, do people have a venue? You can think of it as similar to an employee hotline where you have employee questions: there should be a similar outlet for data-governance questions, things that are happening, things that people see, so they have a voice and the ability to call that out.

As I mentioned, I do want you all to get a chance to talk to each other and share a couple ideas. I used this example of a ride-sharing app, I think you all probably will be familiar with it. They had a hack and they ended up paying the hackers to delete stolen data on 57 million of their users. They stole the information and it included credit-card information. They ended up ousting the chief security officer and one of his deputies, a year later, for their roles in keeping the hack under wraps. It only came out a year after it happened. Yes, there's a lot of ethical questions here, but it's also a question of, "A massive breach of this scale, how would you handle it?"

I'm going to give you a few minutes to think about the ethical principles. You were put in place as an executive for a ride-sharing app. Are there data-governance tooling things that you would suggest too in this case? Or, is that separate, something you handle for another time? Then, also, how would you enforce those policies and principles? I want to give you a few minutes, you can talk to the people around you and just walk through this example.

All right. Does anybody want to share a little bit about their discussion? What are your thoughts on the first question? What ethical principles would you put in place as an executive?

Participant 1: From an ethical standpoint, something along the lines of being sure that we place the customers' interests ahead of the company's interests. In a case like that, customers have got a right to know so they can do something about it, rather than protecting the company itself from imagined financial damage.

Participant 2: We're talking about implementing some best practices like scanning the code for security vulnerabilities during development, or even QA. Also, make sure that the production data is not moved outside production and only those who need to know actually have access, and not move it to development or QA to have a better set of data to test with.

Miller: How do you make decisions about who should have access to production information? Who does that decision lie with? Does anybody have ideas about how you would enforce the policies and principles that you talked about?

Participant 3: The chief ethics officer, which is CEO.

Miller: The chief ethics officer, CEO. Are you in companies where you have a chief ethics officer? One company? Yes. I think that there's a lot of opportunity here and there's a lot of questions about, "Who does have ownership for that?" even though it's something that we see all the time and we have all this data collection. It's nuanced, there isn't someone that is responsible for that role. As a result, that means it falls to each and every one of us.

Data-handling decision making lies on a spectrum, whether we like to believe it or not. I think some people would say, "No, like it's clear, you either collect it or you don't." The truth is it actually is very nuanced. Determining what data to collect is a question about the value versus the risk, and that varies from feature to feature and how well you're able to make a case that the data collection is going to be worth it for the users and for the company, as you mentioned, for the business. What is the benefit to users? Are you sharing that with them? Are you keeping it behind the scenes because there isn't actually a benefit to users? Is it different from what you actually intend to do? I think all of those are ethical questions that tech companies are going to have to balance more and more, as we continue to see more transparency in this space. Also, is a benefit readily accessible and clearly articulated? We talked a little bit about that.

I like that quote, "Statistics are like bikinis, what they reveal is suggestive but what they conceal is vital." I think that's the question about data collection too. The information you collect, even if you say it's going to be aggregate, if it's a new data collection of information, depending on the sensitivity, then it can be suggestive, should your company actually be in the business of collecting that data.

That brings me to the discussion of personal-data collection. It's much too broad and technical and personal for each company to try to explain it and answer it for everyone. It depends on the company, I think that's what you'll experience. I've been in multiple social-media companies and I have seen them classify data differently. It's also a trade-off that goes into talking to your lawyers and figuring out how defensible it is and who's going to interpret it when a regulator comes asking about whether the data is user-identifiable, if it's personal data, if it's PII. All those terms mean the same thing. I think we're starting to come to agreement that there's more understanding and responsibility for all the user data you collect, and all of it has the potential to be identifiable.

There's still definitely some areas where people would say, "Yes, this is highly-sensitive and personally-identifiable," things that can be traced back immediately to an individual, location data, it can include company-created information as well. I've heard it in the past, companies say, "No, that's our data. We generated this information about a user so it belongs to us." I think more and more, once you connect that to a user ID and the user only exists because they have created an account, that there's an argument to be made that information is personally-identifiable. Then, it also includes the things that we're more used to hearing about, things like name, first name, last name, email, phone numbers, things of that nature.

Then, there's things that are potentially personally-identifiable. I put these examples up here just to explain that. It varies, different companies have different views on this information, and I think there are also things about employees' information, "Is that in scope? Is it not in scope?" Things like messages, inferred information about a user. Again, if the company's generating it, is that their data? Is it potentially personally-identifiable? There's lots of open questions around this and a lot of companies are waiting to see other companies get in trouble before deciding how to handle it. I don't know if that's the best approach but it's what we're seeing across the industry.

Then, of course there's things that are clearly not identifiable, not personal to a person, things like aggregated data and company information. That information is still valuable and it still has to be protected. I think more and more trying to categorize this information across the board and think about things like disaster recovery, versioning, backups. If someone deletes the data in production, do you want that data to be backed-up, do you want there to be copies? I think all of those questions people are starting to think about and categorize and finally get to a place where we start having a better sense of data governance.
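
The three tiers described above could be recorded as metadata along the following lines. This is a minimal Python sketch, not the speaker's scheme: the `Sensitivity` enum, the `FIELD_TIERS` mapping, and the `classify` rule are assumptions for illustration, and as the talk notes, different companies draw these lines differently.

```python
from enum import Enum


class Sensitivity(Enum):
    IDENTIFIABLE = "personally identifiable"
    POTENTIALLY_IDENTIFIABLE = "potentially personally identifiable"
    NOT_IDENTIFIABLE = "not identifiable"


# Hypothetical field-to-tier mapping; every entry is an assumption that a
# real company would settle with its own lawyers and privacy team.
FIELD_TIERS = {
    "name": Sensitivity.IDENTIFIABLE,
    "email": Sensitivity.IDENTIFIABLE,
    "phone_number": Sensitivity.IDENTIFIABLE,
    "location": Sensitivity.IDENTIFIABLE,
    "messages": Sensitivity.POTENTIALLY_IDENTIFIABLE,
    "inferred_interests": Sensitivity.POTENTIALLY_IDENTIFIABLE,
    "aggregate_counts": Sensitivity.NOT_IDENTIFIABLE,
}


def classify(field: str, linked_to_user_id: bool = False) -> Sensitivity:
    """Look up a field's tier, erring on the cautious side."""
    # Unknown fields default to the middle tier rather than to "safe".
    tier = FIELD_TIERS.get(field, Sensitivity.POTENTIALLY_IDENTIFIABLE)
    # Company-generated data joined to a user ID is arguably identifiable.
    if linked_to_user_id and tier is Sensitivity.POTENTIALLY_IDENTIFIABLE:
        return Sensitivity.IDENTIFIABLE
    return tier
```

The `linked_to_user_id` flag mirrors the argument made earlier in this section that inferred or company-generated information becomes personally identifiable once it is connected to a user ID.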

The implications of mishandling range from brand-reputation damage to fines. If you go to your CEO and say, "There's a chance our brand might suffer," what exactly does that mean? It can be difficult to quantify, but when it happens and it's bad, you know it. There's also 5-billion-dollar fines we've seen for companies recently. "The Huge Fines Can't Hide America's Lack of a Data-Privacy Law." I like that headline because it really gets to it. Even at the highest tier, the 5-billion-dollar fine that Facebook faced was only a drop in the bucket for them, relative to their revenue. There is a question about whether you'll be sustainable as you grow. As your company grows, do you want to take the risk to brand reputation, or of facing a fine? Especially if you're still trying to grow and as the industry grows, we want to develop more of a status quo in understanding that privacy matters.

Companies are learning based on the experiences of others in the industry, and new case law is starting to illuminate the interpretation and set the standards for what we will see as the industry steps up.

Then, there's tiers of risk. From a technical-company perspective, the top tier is core business and user-data loss. You don't want to suffer that; you want to make sure the data is versionable, as I mentioned, and decide if you want to put access restrictions on it. Then, there's generic data loss. I think any company would be hard-pressed to say, "We want to lose any data," but when you think about it, it comes down to quantifying, "Ok, if we lose this data, there's no impact," as you think about data flowing through your system. It really takes attention to that detail to get to full data governance.

Excluding industry standards for security, which are relatively developed, there hasn't been a lot of focus on privacy-aware systems. Earlier today, Jeaniene talked about some of the opportunities in that space in great detail, provided a lot of information there. Data deletion by default is one area of opportunity for companies: the ability, like Snapchat, for example, to just decide by default, "That is how your product will exist and function in the market." Data identification, which is not perfect, we talked about that earlier as well. There's also the ability to centralize reports about users from across systems, and you want to categorize data and map it. There's also cross-device and cookie-tracking management; all these things are areas that are not going away from privacy programs anytime soon. Even now, as we see the cookie storage and cookie banners, apps are coming up with ways to store things in local storage, and that will be the next battleground. If you think you're getting away from cookies, no. Just because there's a new technical way, eventually the regulators will catch up, they will catch on. If you find people looking for ways to skirt around technical issues, I think we're starting to see an end of that too. I would say be aware of that.

Each new business use case is going to be unique and contextual. No two uses of data are the same for users in your systems. Companies are faced with data processing and collection decisions daily. It's a dance between compliance and costs. If you're deleting information from your databases, that has a cost. Trying to delete a row can be very costly, but if a user requests it, if it's part of compliance, how do you build a system that takes those costs into consideration but also the compliance?

What I would recommend and what I've found to be helpful is making a call for business use cases at least one time a year, from a privacy perspective, figuring out what new uses of data the company is going to ask for, "What new systems do we need to put in place in order to ingest those new data sets? What things do we want to be able to collect that we can't today because our systems aren't able to process it, we know we're not compliant?" That way, when a product manager or a developer comes and says, "We want to adjust the system, we want to do X-Y-Z," you have a roadmap for explaining how you're going to get there.

In security, there's the security triad; less well-known is the privacy triad, which talks about predictability, manageability, and disassociability. It's about figuring out where data is and if it's going to be there, being able to manage the data, who has access to it and then, finally, being able to disassociate the data. This is all part of the governance strategy you should have when you're thinking about the tiers, the type of data you're collecting, and what your expectations are along the spectrum. You want to make sure you know all the system owners and users. That way, if something goes down, if you need to collect information, or something's missing from a subject-access request, you know who to contact. I see so many instances, APIs that are running, systems that are running, jobs that are running, that no one is owning. It's about the ability to maintain accountability for everything that's happening on the system and, if you identify things that don't have ownership, figuring out the path to deprecate them.
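The ownership check described above can be sketched as a small inventory. This is an illustrative Python sketch under stated assumptions: the `Asset` record, the helper names, and the sample inventory entries are all hypothetical, and a real inventory would more likely come from a service catalog or scheduler metadata.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Asset:
    name: str
    kind: str             # "api", "system", or "job"
    owner: Optional[str]  # accountable team or person; None if unknown


def unowned(assets):
    """Return assets with no accountable owner (None or blank)."""
    return [a for a in assets if not (a.owner and a.owner.strip())]


def contacts_for(assets, names):
    """Map each named asset to its owner, e.g. when a subject-access
    request is missing data and someone has to be contacted."""
    by_name = {a.name: a for a in assets}
    return {n: by_name[n].owner for n in names if n in by_name}


# Hypothetical inventory, for illustration only.
INVENTORY = [
    Asset("profile-api", "api", "identity-team"),
    Asset("nightly-export", "job", None),
    Asset("legacy-warehouse", "system", "  "),
]
```

Here `unowned(INVENTORY)` would surface `nightly-export` and `legacy-warehouse` as candidates for either assigning an owner or planning deprecation.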

Also, when possible, being able to disassociate that information. As I mentioned before, GDPR and CCPA have this as a policy, and I think we'll continue to see that. Basically, if you can disassociate user information, then there's less risk associated with things like, for instance, a breach. Being able to say that you have a de-identification mechanism is going to be helpful in the future, I think it will help solve a lot of problems. If you're able to encourage your developers to think about that when they build from the beginning, then that will save you a lot of heartache down the road and will keep you from having to delete rows in a table when you get a request for data deletion.
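One way to picture building disassociation in from the beginning is to store events under a random pseudonym and keep the link to the real user in a separate, small store. The Python sketch below is an assumption-laden illustration, not a method described in the talk: `PseudonymStore`, `record_event`, and the in-memory `events` list are hypothetical stand-ins, and whether removing only the link satisfies a given deletion obligation is a legal question, since the remaining rows may still be identifying through their content.

```python
import secrets


class PseudonymStore:
    """Holds the only link between a real user ID and its pseudonym."""

    def __init__(self):
        self._by_user = {}  # user_id -> pseudonym

    def pseudonym_for(self, user_id):
        # Create a random pseudonym on first use; reuse it afterwards.
        if user_id not in self._by_user:
            self._by_user[user_id] = secrets.token_hex(16)
        return self._by_user[user_id]

    def forget(self, user_id):
        """Drop the link. Returns True if a link existed."""
        return self._by_user.pop(user_id, None) is not None


store = PseudonymStore()
events = []  # stand-in for a large, expensive-to-rewrite table


def record_event(user_id, action):
    # The event row never contains the real user ID.
    events.append({"subject": store.pseudonym_for(user_id), "action": action})
```

With this layout, a deletion request becomes one small removal from the mapping store instead of a scan over the event table, which is the cost trade-off raised earlier in the talk.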

Data governance and ethical standards have to be proportional to the scale and speed of the company. As I mentioned before, things are moving really quickly, so you want to be able to move with the company, and so, you have to stay nimble. Before you start looking at a new collection of data, you have to have a privacy-impact assessment; this is something that a lot of the regulations call for. It has to be very clear that the data is for one reason and not for another, and we have to have that documented as well. Then, you also want to make sure that, for high-risk collections, you have the appropriate systems in place, and investments, and engineering resources to maintain them. As you're bringing in new data, for example, if you're bringing in healthcare information, your legacy systems may not be able to support the same responsibilities and compliance needs now that you're collecting this new data.

It's estimated that by 2020 the backup and archiving of personal data will represent the largest area of privacy risk for 70% of organizations, up from 10% in 2018. That's what Gartner is estimating. How many of you all have responsibility for archiving personal data? In some way, you touch it. Yes.

Here's one more case study related to this. I want you to spend a little bit of time thinking about de-identification. The California Consumer Privacy Act has a legal definition but, as we know, the application can vary. In your practices and in your areas, what does it mean to actually de-identify data? Is removing a column of data enough? If you just take out a user ID, if you just take out a user name, is that enough? Or do you need to go a step further? Do you need to be able to add fuzziness to the data, if you're looking at something like location data? If you're able to give an estimate at a zip-code level as opposed to at a fine-grained GPS level, lat/long, then can you do that?
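The two steps in that question, removing identifier columns and adding fuzziness to location, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the field names (`user_id`, `user_name`, `lat`, `lon`) are hypothetical, and rounding coordinates is used here as a simple stand-in for a zip-code-level estimate. It makes no claim to meet any legal definition of de-identification; coarse locations can still single someone out when combined with other fields.

```python
# Assumed names of columns that directly identify a user.
DIRECT_IDENTIFIERS = {"user_id", "user_name"}


def deidentify(record, decimals=2):
    """Drop direct identifiers and coarsen GPS coordinates.

    Rounding to 2 decimal places keeps roughly 1 km of precision in
    latitude, far coarser than a raw GPS fix.
    """
    out = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    for key in ("lat", "lon"):
        if key in out:
            out[key] = round(out[key], decimals)
    return out
```

The `decimals` parameter makes the precision an explicit decision, which is the kind of choice a privacy-impact assessment could record.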

I've seen recently a report on vendors in the privacy space; there's a couple of companies now that are saying they are able to de-identify your entire database system. So, I want to challenge you all. I want you to think about the roles that you might play as a vendor, someone who is creating a data de-identification product. You are launching a product you want to sell to a customer. As a vendor, what are the selling points that you make for a de-identification product? How do you explain the technology to someone who is in your role, who needs to purchase it? Then, as a client, what narratives are compelling to you? I'll give you just 2 minutes to talk about that with a partner or a group.

All right. Does anybody want to show me a good pitch for a practice approach for being a de-identification vendor? Anybody think they have a compelling argument for de-identification as a vendor?

Participant 4: Of course it would be AI-driven. I mean what else did you expect? It would integrate with your cloud provider. It would provide some kind of reporting that you could use to prove compliance. It would solve the obfuscation, the de-identification, or however you want to call it, and would be a proof of compliance to regulations.

Miller: I'm sold. Let's go.

Participant 4: It will be expensive of course.

Miller: Yes. That's exactly how things in the industry are going. There's AI, there's using machine learning. We know data decisions, governance decisions are driven by stakeholders. Again, ethical standards are set by leadership. When you're thinking about your next generation of architecture, and as we close out here, you should look at things based on the expectations of your business. You want to involve legal, you want to involve business, you want to involve security, you want to involve IT. You want to think about the stakeholder perspectives that are going to help move the needle and will help you be able to advocate at the highest levels of the company.

Leadership plays an important role in setting the ethical direction, and companies are starting to get that through internal questions and communications. I had some of my biggest successes when I asked questions at Q&A, asked straight to our leadership. Taking advantage of those opportunities to have that dialogue with them and put it on their radar has been extremely beneficial. We've also seen more CEOs making public statements. I think, as we see that more and more, that also signals to the company that it's important and it matters. Then, investing in people and technology to operationalize those commitments is going to really be what changes the narrative.

Vendor Opportunities & Watch-Outs

Vendor opportunities. We've talked about this a little bit. There's a number of privacy tech vendors, but that doesn't mean you need to spend any money. You may decide that you want to build rather than buy, and that's completely fine, I think a lot of people are going that route. I still think it's beneficial to have conversations with vendors to think about what the possibility could look like. Organize the vendor evaluations by aligning the timing of them. When you start getting those floods of emails, on LinkedIn or wherever they come in, tell them, "Look, this is the time we're doing vendor evaluations for a new data-warehousing solution." This is when we're going to be looking at SQL, this is when we're going to be looking at [inaudible 00:43:35], whatever. Really thinking about all the vendors in that space who are moving the market, who are changing things and represent the future of a company based on those business use cases we talked about earlier.

Then, you have them define the proof of concept and set success criteria. You work with them to figure out what's going to help your business and your bottom line but also help you make sure that you're compliant. Then, even if you decide the decision is no-go, a lot of us don't have the funding, but at least you'll have the discussion about "There's a lot of value. Let's keep this conversation open," you never know when you might be able to use that vendor in the future. You can think about those considerations.

This is a template that I actually used recently for making a call to our different teams, asking them about use cases, asking them about what their requirements are, thinking about the values that can be provided, and asking them for reference. It's really about working with your team, but then, also working with the vendors.

Then, when you look at build considerations, if you're going to build internally, you also want to be real about your situation. If you're at a company where things get deprecated a lot or people build systems, and then, they abandon them, that's probably not the best environment to build your compliance structure. If you're going to build, make sure that you have a process in place to be able to manage it.

Recommendations

I'm going to leave you with these questions, as we close out. Thinking about how we might justify buy decisions for privacy and data governance in alignment with the scale and speed of the company. Thinking about your data tiers and flows as systems architectures and guideposts, rather than thinking about them as ethical principles which are set by the leadership. Then, finally, how do you give vendors relevant insights into your system architecture challenges so that we can make sure that the next generation of architecture is built with privacy in mind but also with our architecture needs as companies in mind?

Recorded at:

Feb 11, 2020

Ayana Miller
