This is What a Large-Scale Cloud Adoption Program Looks Like


Summary

Dio Rettori discusses some of the lessons learned, challenges, and considerations of large-scale adoption for JPMorgan Chase.

Bio

Dio Rettori is the head of Cloud Architecture for JPMorgan Chase. He's the co-founder of the Boston and Sao Paulo Kubernetes meetups and has been a CNCF Ambassador since day zero. He has previously held key marketing and product management roles at companies such as Red Hat, Pivotal, and Solo.io.

About the conference

QCon Plus is a virtual conference for senior software engineers and architects that covers the trends, best practices, and solutions leveraged by the world's most innovative software organizations.

Transcript

Rettori: My name is Dio Rettori. I work for JPMorgan Chase, within the asset and wealth management business unit. My responsibility is to help teams take applications to both public and private cloud every day. The challenge I have every day is a challenge of scale. It's not the challenge of moving one application to the cloud. It's not the challenge of helping a single team. It's the challenge of doing this with hundreds of applications and thousands of developers distributed across the globe. I'm very lucky to work with a very capable, technically savvy, and organizationally well-defined team to do this every day. Again, my challenge is scale.

What Scale Means

Let's touch a little bit on scale. JPMorgan has 4 main business units; asset and wealth management is one of them. Scale is a problem because we deal with assets, essentially money, from a lot of people and companies. We have around $2.8 trillion in assets under management. That is some company's money or some human's money, if we're talking about the private bank. We have to be very respectful of that, and continue to make their investments worthwhile so that they continue to be incentivized to work with us.

Incentives

Because of that, the nature is that we are globally distributed. We run our businesses in various countries across the globe. We haven't just started this journey to cloud now. As a firm, and within AWM, we already have 35,000 containers running applications across dev, test, and prod, on both Cloud Foundry and Kubernetes. Teams know what a cloud native environment looks like, and they understand the benefits of it. There's still, of course, a lot of software that does not run on those environments. That's not necessarily a bad thing. The fact that something is not running on a cloud native environment doesn't mean that it's bad software. It doesn't mean that it was poorly written, or poorly maintained. No, there's actually a lot of very good software that runs important parts of the business that you can't necessarily badge as cloud native, but it's well written, well maintained, well thought through, well architected software. It's a healthy thing not to label things that are not cloud native as bad. You need to be very wise and understand what's going on.

By the nature of what we do as well, and being a large firm, we have software from everywhere. I joke that we have at least one of everything. That means that through time, and through our existence, we have acquired a lot of vendor software. We have built our own. We have used different types of databases, different types of application servers. We also use a lot of SaaS services. We integrate with a lot of companies. It's fair to say that the ecosystem is complex, not purely from the number of applications: the geographically distributed nature makes it complex. The governance aspects that apply to each region make it complex. Of course, the scale in terms of the number of instances that we have makes it complex. It's scale from various angles.

If I can put my hat on, as Dio, head of the cloud program for AWM, what does it mean to me? There are a few things that I always have to consider. One is that, in general terms, an IT organization's main goal is to be very respectful with the data that it has at hand. Literally, yes, security first. The folks that you have should have that mindset, a mindset of security first. Of course, the business end of that is that an IT organization exists to enable the business to do better through technology. That's why we as information technology exist. If you take that angle of the basic things, a business, or our internal client, does not necessarily come to us asking us to build a resilient application or a resilient platform. It does not necessarily ask us to migrate from X to Y. It does not actually ask us to secure data. Business and internal clients are asking app teams to develop functions or capabilities in their applications that will serve their customers better. Those other things (is it resilient? Are you able to migrate this application to a new environment? Can it scale? Is it secure?) are a given. That's an important point: the priorities that internal clients and businesses are putting on application teams do not necessarily immediately equate to, let's migrate to cloud.

1. Application Teams Should Want To Move To the Public Cloud

With that, I wanted to bring you my first point, which is that I have learned that it's better if application teams want to move than if they feel that they have to. I think, ultimately, some corporate program will kick in that will treat applications that are not on modern platforms, or not classified as such, in a way that they are "forced." I can't afford to wait for that moment to happen. I need adoption to happen now. My stance is that I need to create incentives, so that folks feel that they should want to move to cloud environments. In terms of incentives, we need to touch on the fact that teams are already at capacity. I don't think I can find one team, let alone many teams, just sitting there with 20% of their time available, where you can just randomly bring more work to them. They have to want to, because they then need to negotiate that time with their business sponsors, and make it understood. The fact that it is a strategic decision for the company facilitates that conversation, but the negotiation of that time still has to be there.

2. Acknowledge the Tech Adoption Lifecycle

For us, we have learned a few things. We've been running the cloud adoption program for some time now. It's fair to say that the public cloud program started with stronger winds in 2019. We noticed a few things about the first teams that had to go there. They had to have some new capability implemented for them, or they actually had to help test a new capability. Maybe a service they wanted to use wasn't there, or the instructions were not completely there, so they also had to build the instructions. If you look at it, what does that mean? Those people are what we traditionally call, in the market, innovators and early adopters. It's someone who is willing to use technology because that person likes technology. Someone who is willing to take the risk of newer technology because of the potential gains it could bring if you adopt it before others, but that person has to be willing to invest the time to build the product. It's good that we have identified this, because the advantage of identifying your group is that it leads you to ask, what are the other groups there?

For us, interestingly enough, the share of the folks that were taking applications to production in public cloud, the first ones, in the first phase of our roadmap, was very much matching the technology adoption lifecycle curve. Twelve percent to 18% of the app teams, of the humans, were the ones taking 80% of the workloads to production. That's a great point, because it brings the automatic thought, what do I do next? It's great to have them, but I need the other folks to go as well. The point is, what is acceptable for early adopters is not acceptable for the mainstream market.

3. Crossing the Barrier Is About Dealing with Uncertainty

The mainstream market is very much risk-averse. To address the mainstream market, you need to reduce the perception that the product is not ready. They also need much more peer validation. We've learned that this is what it takes to cross the barrier from the early adopters and innovators, who like technology and are willing to take the time, to the mainstream market. This is a mainstream market in a firm like JPMorgan, where we're respectful with a lot of money from our customers. The risk factor is very much there. Is it worth the change? Is it worth the risk? Those are everyday conversations. What we bring to the program is, I want to reduce your level of uncertainty, so that you're willing to follow on this journey with us.

This is my first time working in an enterprise company. Before this, I worked in product marketing management for various companies, so my background is product marketing and product management. When I saw this I didn't think I necessarily had to reinvent the wheel, because when I look at the scale of our organization, it feels like we're very much our own industry. I thought I could employ the same techniques that the software industry or technology industry applies to crossing that barrier. I could apply them inside, because my scale is large enough to justify that. In my mind it's like, what lessons did I learn by working in product marketing management that I can bring to my work today? That is how we designed and built the cloud adoption program.

Scale Yourself and Your Team

Let's get into very actionable things. One is, I understood, and most software companies understand, that they can't scale their business if all the adoption relies purely on themselves. In order for us to scale our adoption program, we implemented a few things. We implemented a group of folks that we called cloud champions. I've seen the champion model before. In a previous stint at another company, the vendor would pay a champion to sit at another vendor just to advocate for that brand. This is what we have. We started with the early adopters and innovators, who are not necessarily evenly distributed, but we also found folks that were interested and wanted to join. We were able to form a group of people across our major application groups. Again, it's roughly 600 applications, and there are various application groups. We found that group of people, and said, let's build them up, so that they can scale us. What do you equate this to? You could equate them to evangelists. You could equate them to folks that start up meetups and do not work for the software company that created the software the meetup is about. That's what we want with them.

There are a few caveats. One, they are vocal and loud, and you want that. They will be loud about the good and the bad. In the end, the net result is positive, because they will duly represent their needs and asks, and their app group. They'll also be your partners in this. Focusing on building this group of champions is something that will benefit you a lot. I think we're getting there in how to scale our organization, because I cannot scale my org at the same rate that I need adoption to grow. We did that.

Addressing Uncertainty as a Practice

The second thing is that we need to deal with uncertainty as a practice. We know that the mainstream market is much less risk-prone, and again, the scale of the firm justifies treating this almost like a traditional technology adoption lifecycle. What do you do to reduce the risk? We looked at the things that are not necessarily related to the product itself, or to moving to public cloud, but that are around the ecosystem of moving to the public cloud. Would someone know immediately how to operate that application if it moved to a public cloud environment? It seems the instructions are not there. Someone who is more responsible says, I cannot operate it, so I'm not going to go. I could migrate the application. I have the technological capability to do it. I even have the time, but what do I do once I migrate? We created a specific initiative to tackle that. What do you do once you're there? How do you operate? How do you perform backup? How do you perform restore? How do you perform an emergency management procedure? How do you get access to production during an incident? Those are things that are resolved on-prem, but you need to build them, almost invent them, for a public cloud.

Data is a key thing for us. I think we were able to identify early on that the movement of data, either a one-off movement to migrate a database, or the constant movement through a pipeline, is also key. There are a lot of governance aspects around data as well. We decided that we first need to be comfortable with the movement of the data, and with the pipelines that constantly move data there. We built a practice on this. Building a practice means that we actually have an initiative that is looking at this problem and treating it like a product. What are the customers asking for, or what is it that they need? What capabilities could we implement in terms of technology, documentation, training, or just one-off hands-on assistance? SDLC is obvious. It needs to be there. SDLC or CI/CD will change from what you have on-prem to what you have on cloud environments.

Maybe another important point is how much we're investing in learning and enablement. We're finding that this has been very welcome, because when you enable folks to work in public cloud, they are building their careers.

They no longer have that feeling that the knowledge they're acquiring only applies to that specific thing at the firm, which could sometimes be the case, especially if your company built its own CI/CD platform, dev tools, and runtimes. That has happened. All the knowledge that is learned taking applications to public cloud is theirs. They feel like, the company is investing in this. It's worth it for me. I could stay in the company, I could move to another group, I could leave the company. That knowledge stays. We're heavily incentivizing and paying for folks to take AWS certifications, and all the other cloud provider certifications as well. We want you to be better. In terms of scale, I think understanding how you integrate with current systems is important. No application is a silo today. It will always have to touch on something that already exists.

Acceleration Events

Probably the one that helped move the needle the most in terms of the kickoff is what we call acceleration events. Internally, we also have another name, cloud programs. This is a highly focused engagement, where you are able to bring into the room every single human in the firm that could make a decision about moving an application to a production environment. I think every single individual is the key thing here. This is not necessarily a cheap exercise, but it has worked for us a lot. Again, going to public cloud has to be strategic for a company, because otherwise it's hard for you to justify things like this. It's hard to justify flying a lot of people, or dedicating time from a lot of people that already have a lot of work to do, to a single place to invest a couple of weeks of their time there. For us, it has been amazing. We were able to move applications that were sitting on legacy technology to public cloud in a matter of days, because every single decision maker was available.

The other mindset that happens at these acceleration events is a strong decision making mindset. There are so many decisions to be made that you will need to be fast at making them. In order to reduce the risk of making those decisions fast, you need senior people there who have already built that knowledge, who know the complications and implications of making a fast decision. Still, you want to make those fast decisions. Through the course of an acceleration event, which is mostly a week or two, you'll make hundreds of decisions. Then you will have something working on the cloud at the end. Take this away for yourself: you need to think about scale, you need to address uncertainty as a practice, and you need acceleration events.

4. Track the Dollars

Maybe the last one: you need to track the dollars. We're a bank after all, so we should not forget this one. It's certainly not a cost play for us. If this is a cost play for you (hopefully it's a speed play, not a cost play), know that there will be a good amount of time where your IT costs are going to increase. That's because you will be moving workloads to a new environment while you're not necessarily decommissioning on-prem or shutting down things elsewhere at the same rate. We call this the bubble. In our case, it's fair to say that the bubble is going to stay with us for two years from the moment it started until the end. That's our projection. After that, you should see it not being a problem anymore, because the amount of decommissioning will compensate for it. It happens because you're teaching a team, you're taking an application to a production workload, and that might run in parallel for a while. You might need some more time to be comfortable. You're incurring the new costs, and you're incurring the old costs. That's something that you need to be very mindful of. Watch for this. Budgeting for this moment of transition, where you will probably have both costs, is a very wise thing to do. I recommend you do that.
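As a rough illustration of the bubble described above, the following sketch models a transition in which cloud spend ramps up while on-prem decommissioning starts later. Everything in it is an assumption made for illustration: the linear ramps, the `monthly_costs` helper, and all figures (a baseline of 100 cost units, an 18-month migration, a 6-month decommissioning lag). None of these numbers come from the talk.

```python
def monthly_costs(on_prem_start, cloud_target, months, migrate_months, decomm_lag):
    """Return a list of (month, on_prem, cloud, total) tuples.

    Hypothetical model: cloud spend ramps up linearly over `migrate_months`;
    on-prem spend ramps down linearly over the same duration, but starts
    `decomm_lag` months later. The lag is what creates the overlap.
    """
    rows = []
    for m in range(months + 1):
        cloud_progress = min(m / migrate_months, 1.0)
        decomm_progress = min(max(m - decomm_lag, 0) / migrate_months, 1.0)
        cloud = cloud_target * cloud_progress
        on_prem = on_prem_start * (1.0 - decomm_progress)
        rows.append((m, on_prem, cloud, on_prem + cloud))
    return rows


BASELINE = 100.0  # assumed pre-migration monthly cost, in arbitrary units

rows = monthly_costs(on_prem_start=BASELINE, cloud_target=80.0,
                     months=30, migrate_months=18, decomm_lag=6)

# The "bubble" is the spend above the old baseline, summed over the transition.
bubble = sum(max(total - BASELINE, 0.0) for _, _, _, total in rows)
peak_month, peak_total = max(((m, t) for m, _, _, t in rows), key=lambda x: x[1])

print(f"peak month: {peak_month}, peak monthly cost: {peak_total:.1f}")
print(f"extra spend to budget for: {bubble:.1f}")
```

In this toy model, shrinking `decomm_lag` is what shrinks the bubble, which matches the point that decommissioning has to catch up before total cost falls back below the baseline.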

Conclusion

I'm going to end on a positive note, hopefully: in order to move fast, application teams should want to move, so that they can fight for that time with their business. You need to acknowledge the technology adoption lifecycle, as in, there are different people that need to consume different types of content, and they are more or less risk-prone. That should direct the decisions that you need to make when engaging them. Crossing the barrier is about dealing with uncertainty by design. I manage a team that deals with uncertainty by design, and that runs the program for our LOB. It is a refreshing thing to do, but it's a heavy thought. It's context switching all the time. We are tasked with dealing with uncertainty. We are the ones that should bring peacefulness to a journey that still has a lot of paths to be paved. Not all the paths are paved. If this is a cost play for you, be mindful that for a while, you might have a little bit more cost. Again, hopefully, this is a speed play. It should be more of a speed play than a cost play for companies.

Dealing with Technical Aspect of Scale

In dealing with the scale of a program like this, the technical aspect is important, but it's a small part. What you're actually doing is driving organizational change. The technical aspect is there, because that's the design of what we do. We exist to help application teams get from one place to another. Our challenges are not necessarily technical challenges. We have a highly capable technical force that can power through most of the technical challenges. It becomes more a question of how you organize for the scale than of your capability to solve a specific technical challenge.

Questions and Answers

Porcelli: I have the first question here on the technical aspect. It may not be the most important part, or the critical path, of your work. How do you deal with interdependencies between applications, given that you are looking at the program level, with so many applications? Likely, there is some level of dependency that you see. How do you avoid the trap where I depend on A and A depends on B, and then we get stuck?

Rettori: There's no way to avoid the trap. It's going to be there. Most large organizations have evolved over time. They likely use databases as an integration pattern. They likely have some other integration hub as an integration pattern as well. That's why you have to acknowledge all of those. There's not an easy answer for data. Data is about engaging with the owners of the data. Establish what you call the authoritative source of data. Establish who the consumers of the data are. Then establish the plan to migrate the centers of data. I think it was Gartner that coined the idea that the conversation is moving from what's going on in your data center to what your centers of data are. These are the very highly gravitational forces around data that applications have to reside around. There is no magic. It's just a lot of hard work, and discipline, and roadmapping, to move all those applications and manage the impact on them. There will always be some applications that can survive with data that is a day behind. There's a strategy that you implement for those, which is different from the strategy for the applications that need live data, which is different from the strategy that you implement for analytical data.
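One way to make such interdependencies visible when roadmapping is to write down which application depends on which and let a topological sort group them into migration waves, flagging cycles that need a different strategy, such as running on day-old data for a while. This is a minimal sketch using Python's standard `graphlib` module; the application names and the `migration_waves` and `find_cycle` helpers are hypothetical and are not described in the talk.

```python
from graphlib import CycleError, TopologicalSorter


def migration_waves(depends_on):
    """Group applications into waves; each wave depends only on earlier waves.

    `depends_on` maps an application to the set of applications (or data
    sources) it needs to have moved first. Raises CycleError on a cycle.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises CycleError if the graph contains a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


def find_cycle(depends_on):
    """Return one dependency cycle as a list of nodes, or None if there is none."""
    try:
        TopologicalSorter(depends_on).prepare()
    except CycleError as err:
        return list(err.args[1])  # first and last node are the same
    return None


# Hypothetical applications, for illustration only.
deps = {
    "ledger-db": set(),
    "reporting": {"ledger-db"},
    "client-portal": {"reporting", "ledger-db"},
}
print(migration_waves(deps))

stuck = {"app-a": {"app-b"}, "app-b": {"app-a"}}
print(find_cycle(stuck))
```

The sort does not remove the trap; it only shows where the cycles are, so that the owners of the data can decide which dependency to loosen first.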

Porcelli: Are the cloud champions dedicated to this role, and what are the incentives that you're bringing to get them more engaged?

Rettori: First, they're not dedicated. I think that's something that we take advantage of, the fact that they're not dedicated. They are often application owners, or lead designers for application groups. That means that there has to be an investment from their supporting business group or supporting technology group so that they can dedicate part of their time to this effort. At the leadership level, it is understood that their own migration or scaling can't happen if it depends entirely on my team or on us. They also understand that they are investing in this group of people. It's not a lot. We have thousands of developers, between 2,000 and 10,000. The group of champions is like 60, 70 people that have a good amount of visibility in that group. They technically understand their own part of the technology, so they can be built up to train their groups. With the initiatives that we run in the central capacity at AWM, they can train and build their own, which will scale the overall adoption. I think the advantage of not having them fully dedicated is that they will continue to have a very good understanding of what their technology group is doing, and of the apps that are a priority for them. You keep connecting those apps, and the needs of each of those apps for new capabilities and new techniques, to the larger program, so that we can again, together with the other champions, advocate or build or create the ROI, or create the incentive at a much larger scale.

Porcelli: One phrase that you mentioned that I think is very eye opening is that the move to the cloud shouldn't be a cost play, but more a speed play. I think that's a very important statement. You mentioned, track the dollars. How long does it take for this bubble to burst, and for the cost to come back to a more expected value, which was the initial idea?

Rettori: I think that it might vary per company. There are some systems where this is truer, or more representative, than others. There are some systems where the cost benefit is not there due to the potential risk. Yet, those teams need to be incentivized to move, so there has to be some other advantage. Another advantage could be that you'll be able to increase your ability to ship software by 20%. That's a lot. That means that I can serve my customers better. In terms of the bubble, that was a term that my financial team created or made up. You have to be very wise to acknowledge that for a large organization, you can't turn off at the same rate you turn on. You cannot stop paying the old costs at the same rate you are embracing new costs. The point is, it needs to be strategic for the firm. When it is strategic for the firm, that's seen as investment dollars, and not just as expense. For the firm, it is strategic.

Lori Beer, the firm CIO, went to a major technology vendor conference and talked about our cloud adoption program at scale. At that scale, the firm has I think more than $50 trillion of assets under custody. That's money from people all around. Tracking this bubble, we believe that in our case, it's probably going to be like a two-year journey, where we'll have generally higher investments needed until we can see more of the scale of shutting down, turning off, and decommissioning the things that we are moving. In this cycle, some teams are more prone to making that decision to shut down and turn off earlier than others. You have to be cognizant of those, and support them on that journey.

On a question that I saw about cost: from moment zero, we brought a financial practice into my group. There is a person in my group that looks at cost, and technology cost, as a function. Most of what that person does is ask, where are we wasting money? In public cloud, waste literally means throwing money away. For an on-prem workload, you're employing people, and you might have bought real estate. There are a lot of other things going into that math. You're sustaining families. You have real estate that you can sell or rent. It becomes an asset that you manage, from that perspective.

For public cloud, it's waste. I think waste is the best word. For the idealists out there who love waste, this is an amazing thing that you should track: how to reduce the waste there. There is, of course, reducing waste from 1000 perspectives, if you're lean. This is one of them. It's being intentional about tracking correct instance sizing. It's being intentional about providing architectural guidance. You could go to a serverless architecture, not have a single server running, and it will cost $1,000 less a month. Yes, $1,000 less a month, times 12, is not a lot, but times many applications, it becomes a significant number. You have to be intentional about those. Being intentional means it shouldn't be a one-off thing that you are doing.
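The arithmetic behind that remark can be sketched in a few lines. The $1,000 per month figure is the one quoted above; the application counts are hypothetical and only show how the number grows with scale.

```python
def annual_savings(monthly_saving_per_app, app_count):
    """Yearly saving if each of `app_count` applications saves the same amount per month."""
    return monthly_saving_per_app * 12 * app_count


# $1,000/month is from the talk; the application counts are illustrative assumptions.
for apps in (1, 50, 200):
    print(f"{apps:>3} applications: ${annual_savings(1_000, apps):,} per year")
```

For a single application this is $12,000 per year; the same saving repeated across a portfolio of applications is what turns it into a significant number.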

We have an active FinOps program in our group that engages the applications that are not at their optimal utilization level, and gives them guidance on what to do immediately to reduce costs. It could be like, "I see that you have applications here in a dev account. You might want to shut them down at night. You might want to shut them down in the evenings, just to save money." Also, for test environments: "I see that your test environment is maybe too large or not representative of what it should be. Could we maybe reduce your test environment?" Or, "We see that you could implement some lifecycle policies to move data to cheaper storage." I think that's the thing. It's being intentional about this, to reduce the size of the bubble, and not to scare your CFO.
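To give a feel for why the off-hours advice matters, here is a minimal sketch of the saving from running a dev environment only during working hours. It assumes cost is proportional to running hours, which ignores storage and other always-on charges; the schedule, the `running_fraction` and `monthly_saving` helpers, and the $1,000 always-on cost are hypothetical.

```python
HOURS_PER_WEEK = 7 * 24


def running_fraction(hours_per_day, days_per_week=5):
    """Fraction of the week an environment is up under a simple schedule."""
    return hours_per_day * days_per_week / HOURS_PER_WEEK


def monthly_saving(always_on_cost, hours_per_day, days_per_week=5):
    """Estimated saving versus running 24x7, assuming cost scales with uptime."""
    return always_on_cost * (1 - running_fraction(hours_per_day, days_per_week))


# Hypothetical dev account: 12 hours a day, weekdays only.
fraction = running_fraction(hours_per_day=12)
saving = monthly_saving(always_on_cost=1_000, hours_per_day=12)
print(f"running {fraction:.0%} of the week, saving about ${saving:,.0f} per month")
```

Under these assumptions the environment runs a little over a third of the week, so most of its compute spend is avoidable.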

Porcelli: What's the role of multi-cloud? You mentioned hybrid cloud, and you have more than one cloud provider. What's the challenge of this? You could instead go 100% with just one, which may be "easier," so what's the landscape with hybrid cloud?

Rettori: There are a few ways and reasons to understand this. One is that I think we all have to acknowledge that, by design, you are going to end up using or running applications in multiple places. Either you're consuming SaaS software where you don't know where it's running, or you do know where it's running. Or maybe it's the nature of how geographically distributed your business is. I can give you an example. Amazon does not have a region in Switzerland. For us in the private bank business, having a region in Switzerland is important. For those situations, we would have to resort to providers that have a footprint available in that region. There are many reasons that we could take from a technical perspective.

I'm a fan of trying to provide some portability for the applications, so that you can deploy applications that are somewhat portable. Either you solve that on the runtime by relying on Kubernetes, for example, as one way. It's not a silver bullet by any means, but it's one way to somewhat standardize what the runtime looks like. Or you can standardize on CI/CD pipelines that can package and ship for different environments. I probably like that better even. Those are a few ways. Even from a regulatory perspective, it depends on the region you're running in. The regulator might require you to be running on multiple cloud providers.

Porcelli: How important is the architecture framework that applications adopt in this journey to the cloud? What about maintaining CI/CD for applications that don't have an active development?

Rettori: Most companies have an application maintenance policy. My program focuses on applications that we want to invest in or maintain, and not on applications that we want to divest or stop using. The applications without active maintenance, I think most of those are divest, in our case. We'd rather just focus on the ones that are evolving, and the ones that are becoming intrinsically important for the business, which is the large majority.

We have created in our group what we call reference architectures and solution patterns, which are, as the name says, given the problem, what is the tested and proven solution for that? We've broken the larger reference architectures down into smaller consumable patterns. For example, a large one is: you want to move an application that's going to become microservices, that you're going to run on a serverless architecture, that's going to have a data pipeline, and that's going to have some specific backup policy. That's one of the reference architectures. There are multiple patterns that fit into that. The patterns are for a specific thing, for example, how do you observe Lambda within the constructs of what the firm has available for you to observe, and what the provider has? Those individual patterns are also consumable independently. We have those for data pipelines. We have those for observability. We have those for backup and restore. There is a decent and growing group of technology recommendations that we have tried, tested, blessed, and maintain, that folks can rely on.

Porcelli: What is the size of these acceleration events? How many human beings have you put together in a room?

Rettori: It's 50 to 100 people. It's a good number. It varies, but it's 50 to 100 people. We have had larger ones with close to 100 people, but the key aspect of the acceleration event is, do you have all the support needed at an arm's length?

 

Recorded at:

Jan 11, 2022
