
How to Adopt a New Technology: Advice from Buoyant on Utilising a Service Mesh

Posted by Thomas Rampelberg, reviewed by Daniel Bryant, on Jul 09, 2018. Estimated reading time: 11 minutes

Key Takeaways

  • By being mindful of the impact that adopting a new technology, such as a service mesh, in your production stack has on you and your colleagues, you can successfully empower stakeholders.
  • Be clear about what problem you are solving, and define appropriate acceptance criteria. Run experiments that attempt to show how a service mesh can make life better for the various stakeholders.
  • Cultivate allies and champions by demonstrating small-scale solutions and showing your colleagues how a service mesh made those things better.
  • Planning for failure and understanding the risks along the entire journey will actually set you up for success. The concerns that you collected early on can help with this planning. Each possible risk can be addressed early on so that it isn’t an issue.
     

Adopting technology and deploying it into production requires more than a simple snap of the fingers. Making the rollout successful and delivering real improvements is even tougher. When you’re looking at a new technology, such as a service mesh, it is important to understand that the organizational challenges you’ll face are just as important as the technology itself. But there are clear steps you can take to navigate the road to production.

To get started, it is important to identify what problems a service mesh will solve for you. Remember, it isn’t just about adopting technology. Once the service mesh is in production, there need to be real benefits. This is the foundation of your road to production. Once you’ve identified the problem to be solved, it is time to go into sales mode. No matter how small the price tag, getting a service mesh into production requires real investment, and not only from you. Changes affect coworkers in ways that range from learning new technology to disruption of their mission-critical tasks.

One of my favorite quotes is “No plan survives contact with the enemy” (Helmuth von Moltke the Elder). How many times has a deployment to production surprised you? If we assume that something will go wrong, we can plan for it up front. This is all part of good engineering practice. Unfortunately, it is easy to see all the ways something can go right and ignore the ways it can go wrong, especially because the ways something can go wrong are often among the unknowns.

Remember the problem you’re solving? Now is the time to validate that it is actually being solved. Even when you’ve gone through a proof of concept, the gap between what a product website promises and what the technology does in the real world can be vast. As you take small steps towards a complete rollout in production, it is important to take some time to verify that you’re helping instead of hurting.

What problem are you solving?

You can use service meshes to solve many problems, so it is important to nail down which one matters most to you. That problem becomes your acceptance criterion: has this all been worth the effort? Good problems are ones that have multiple possible solutions. If the only solution to a problem is a specific technology, you might be holding a hammer and seeing everything as a nail.

The best problems that service meshes solve are the ones that empower microservice owners to do what they do best instead of focusing on the things that every platform needs to provide: observability, reliability and security.

Working with microservices is hard. In particular, it can be tough to debug the interactions between services to understand why something is broken. It is entirely possible to solve this problem by working with service owners and getting them to build tools that provide the required visibility. A service mesh, however, provides that visibility with less effort from everyone involved. Just think about how empowering it is to let service owners focus on other things while getting interaction debugging for as little work as possible.
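
As a rough illustration of that visibility, the sketch below aggregates per-request records into the kind of per-service-pair numbers a mesh typically reports: request volume, success rate and latency percentiles. It is a minimal sketch under stated assumptions: the `RequestRecord` shape, the `summarize` helper and the rule that only 5xx responses count as failures are all hypothetical, and a real mesh exposes these metrics through its own tooling rather than through code like this.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RequestRecord:
    # Hypothetical shape of one request observed between two services.
    source: str
    destination: str
    status: int        # HTTP status code returned by the destination
    latency_ms: float


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted, non-empty list."""
    rank = math.ceil(pct / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def summarize(records):
    """Group requests by (source, destination) edge and compute request
    count, success rate and latency percentiles for each edge."""
    edges = defaultdict(list)
    for r in records:
        edges[(r.source, r.destination)].append(r)

    summary = {}
    for edge, reqs in edges.items():
        latencies = sorted(r.latency_ms for r in reqs)
        # Assumption: only 5xx responses count as failures.
        successes = sum(1 for r in reqs if r.status < 500)
        summary[edge] = {
            "requests": len(reqs),
            "success_rate": successes / len(reqs),
            "p50_ms": percentile(latencies, 50),
            "p99_ms": percentile(latencies, 99),
        }
    return summary
```

The value of a mesh is that numbers like these exist for every edge in the system without any service owner writing or maintaining the collection code.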

In a microservice world, each service ends up depending on many others. When one service fails, that failure can cascade into the other services in the stack. Services can get stuck in lengthy retry loops that consume resources while they work through their retry queues. Left unmanaged, what could have been a small, isolated failure becomes a larger system issue that users complain about loudly. Circuit breaking provides primitives that can cut those retry loops short and stop a cascading failure in its tracks. Following the theme of empowerment, by using a service mesh to solve this problem, you provide functionality that helps service owners easily build more resilient services.
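
To show what the circuit-breaking primitive does, here is a minimal, generic sketch of the classic closed/open/half-open circuit breaker. It is not how Linkerd or any other mesh implements the feature; the `CircuitBreaker` class, its thresholds and the injectable clock are assumptions made for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Minimal consecutive-failure circuit breaker (illustrative only).

    closed    -> calls pass through; consecutive failures are counted
    open      -> calls fail fast until `reset_timeout` seconds have passed
    half-open -> the next call is let through as a trial; success closes
                 the circuit, failure opens it again
    """

    def __init__(self, failure_threshold=5, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so the behaviour can be tested
        self.state = "closed"
        self.failures = 0
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of piling more load onto a failing service.
                raise CircuitOpenError("circuit is open")
            self.state = "half-open"

        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._record_failure()
            raise
        self._record_success()
        return result

    def _record_failure(self):
        self.failures += 1
        if self.state == "half-open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _record_success(self):
        self.failures = 0
        self.state = "closed"
```

A caller wraps each outbound request in `breaker.call(...)` and treats `CircuitOpenError` as an immediate failure rather than queuing another retry. With a mesh, the proxy applies equivalent logic to every call, so individual services do not have to embed code like this.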

At some point, the spectre of compliance may come knocking at your door. Auditors will ask for encryption of data in motion. The amount of work required to update and audit every service can be extreme. Then there are the unfortunate details, such as certificate revocation and rotation. By using a service mesh, you can make all these problems an operational concern instead of a development one. With one way to handle encryption of data in motion, instead of the potentially hundreds you can face in a microservices world, the audit will go more smoothly.
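
To give a sense of the per-service work that a mesh turns into an operational concern, here is a minimal sketch of the mutual TLS setup each service would otherwise carry, using only Python's standard `ssl` module. The helper names and certificate paths are hypothetical, and the sketch deliberately leaves out the hard parts mentioned above: certificate rotation and revocation.

```python
import ssl


def server_mtls_context(ca_file=None, cert_file=None, key_file=None):
    """TLS context for a service that requires client certificates (mutual TLS).

    The file paths are hypothetical placeholders; rotating these files and
    honouring revocations would be additional work for every service.
    """
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # reject peers without a valid certificate
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # CA that signs peer certificates
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def client_mtls_context(ca_file=None, cert_file=None, key_file=None):
    """TLS context for a service calling another service over mutual TLS."""
    # SERVER_AUTH contexts verify the server certificate and hostname by default.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file and key_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx
```

Every service, in every language used across the stack, would need its own equivalent of this plus the surrounding certificate management; that is the duplication an auditor would otherwise have to review one service at a time.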

For Houghton Mifflin Harcourt, an educational and trade publisher in the United States, the problem to solve was developer agility. Robert Allen, Director of Engineering, says: “With Linkerd, a team could continue forward on a work contract and be ahead of the game, and not disrupt their deployment schedule. We could decouple teams more and become a lot more agile. This was a huge benefit.”

Having a concrete problem statement and clear acceptance criteria is the first step towards successfully adopting a service mesh in production. It gives you a tool to use in the next step, selling the value of a service mesh to others, and a bar to measure progress against as the rollout occurs. There are other problems you could be solving; these are just the ones that come up time and again.

Sell it

Rarely is anyone working alone, and it is unlikely that a service mesh can get into production without the help of others. If you don’t (or can’t) convince your colleagues that it is a good idea, the path to production becomes far more difficult, if not downright prohibitive. The problem being solved, defined acceptance criteria, and a clear explanation of the value together give you an opportunity to gather allies to your cause. Turning your colleagues into allies creates additional voices to champion the virtues of a service mesh. That sort of organizational buy-in can help you avoid many of the possible missteps further along the road to production.

Every stakeholder will have their own concerns. A developer might worry about learning new technology and writing integration code that pushes out their current deadlines. Your management team may be concerned about downtime as well as new business dependencies. For each of these stakeholders, it is valuable to talk with them and understand what they’re concerned about. Their concerns will help you shape how the rollout occurs and give you a platform to describe the benefits they’ll receive.

When you’re solving the right problems for your organization, you’re providing benefits to all of the stakeholders. After learning each concern, it is possible to provide a list of benefits and incentives that explain what is in it for them. This is the fun part! You have the opportunity to explain how solving this problem will empower your colleagues with new tools and capabilities. Just imagine how exciting it is for a security team to understand that there will be consistent encryption between services.

Sample stakeholder concerns

| Stakeholder | Incentive | Concern |
|---|---|---|
| Platform engineers | Unified visibility across all services; failure isolation | Is it reliable? Will it introduce complexity? |
| Developers / service owners | Remove complex communication logic from your code; easily run parallel versions of a service | What do I have to change? Do I have to learn a new complicated way of doing things? |
| Security team | Consistent application of TLS and authz/authn across services; policy | Will it make things less secure? What new attack vectors are introduced? |
| The Management | Faster pace of development; fewer outages | What dependencies are we introducing to our business? |

Plan for failure

There are opportunities for a production rollout to trip up at every step. By planning for each step to encounter challenges, you can make them all go a little more smoothly. Even before anything gets into production, there are opportunities for failure. Take each step of the road to production with an eye towards the possible risks. They can come from anywhere, and many will stay unknown until you start implementing a given stage of the process.

Are you trying to boil the ocean? Focus on the problem that you’re solving. It is tempting to see every possibility and get overly excited. As the scope for a project increases, so do the risks and time required. By keeping the problem and its acceptance criteria in focus, you have leverage that can help keep scope creep to a minimum and allow you to move forward with confidence.

Start small. Making incremental progress is important. Sometimes it feels impossible, but there is always a much smaller piece of the larger picture that you can do first. By separating the larger project into small deliverables, you’re able to remove much of the risk involved with introducing change. Can you imagine everything required in changing something significant in production all at once?
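
One common way to make “start small” concrete is a percentage rollout: bring a small, stable slice of services or traffic onto the mesh first, and widen the slice only once each step checks out. The sketch below is a generic hash-based bucketing helper, offered as an assumption for illustration rather than a feature of any particular mesh; `in_rollout` and the choice of key are hypothetical.

```python
import hashlib


def in_rollout(key: str, percent: float) -> bool:
    """Deterministically decide whether `key` falls inside a `percent`% rollout.

    Hashing the key (for example a service name) keeps the decision stable:
    the same key always lands in the same bucket, and raising `percent`
    only ever adds keys, never removes ones already rolled out.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 10_000  # bucket in 0..9999
    return bucket < percent * 100  # e.g. percent=1.0 selects buckets 0..99
```

For example, `in_rollout(service_name, 5)` could pick which services join the mesh in the first wave; moving to 20 keeps those services and adds more, so each wave is a small, reversible change.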

Have you taken the time to address risks? Knowing about risks is only half the battle; you also need to budget time to address them. Clearly communicating your plan for dealing with risks, and involving your coworkers in that plan, is a key strategy on the road to getting a service mesh into production.

Is it taking too long to demonstrate value to your stakeholders? It can be hard for them to understand why the production rollout of a service mesh takes so long. Working in small, incremental steps not only helps you address risks but also gives you a chance to communicate clearly at each step of the way. Each small step along the road should include verification that it was successful and communication that shows progress.
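
The verification for each step can be as simple as comparing a few metrics before and after the change against the acceptance criteria defined at the start. This is a minimal sketch under assumed thresholds: the metric names, the `verify_step` helper and the default limits are hypothetical and should be replaced by whatever your stakeholders actually agreed on.

```python
from dataclasses import dataclass


@dataclass
class StepMetrics:
    success_rate: float    # fraction of requests that succeeded, 0.0 to 1.0
    p99_latency_ms: float


@dataclass
class AcceptanceCriteria:
    # Hypothetical thresholds agreed with stakeholders before the rollout.
    max_success_rate_drop: float = 0.001         # at most 0.1 percentage points lower
    max_p99_latency_increase_ms: float = 10.0


def verify_step(baseline: StepMetrics, after: StepMetrics, criteria: AcceptanceCriteria):
    """Compare metrics before and after a rollout step.

    Returns (passed, reasons) so the outcome can be shared with
    stakeholders as an explanation, not just a pass/fail bit.
    """
    reasons = []
    drop = baseline.success_rate - after.success_rate
    if drop > criteria.max_success_rate_drop:
        reasons.append(f"success rate dropped by {drop:.2%}")
    increase = after.p99_latency_ms - baseline.p99_latency_ms
    if increase > criteria.max_p99_latency_increase_ms:
        reasons.append(f"p99 latency rose by {increase:.1f} ms")
    return (not reasons, reasons)
```

Returning the reasons alongside the verdict means the same result that gates the next step can go straight into the progress update for stakeholders.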

Looking back at what did and didn’t work for each incremental step is also an effective strategy for getting your service mesh into production. Retrospectives encourage communication and a focus on good diagnostics that explain exactly what happened. With the problem being solved at the top of everyone’s mind, clear verification becomes an integrated part of the rollout process.

Planning for failure also means accounting for blame. When something goes wrong while you’re changing anything in the system, you’ll be the first to get blamed. It isn’t always your fault, though! Misunderstood technology regularly gets blamed for things it can’t possibly do. Whenever this happens, it is an opportunity to educate your colleagues by explaining exactly what a service mesh does and doesn’t do.

What are the tradeoffs? It can feel good to try to address *every* possible risk, but that doesn’t always make sense. Understand the potential impact of each risk. Some risks are extremely improbable but would have severe consequences. Others are likely to happen but have a small impact. Each risk also has a specific cost of mitigation. Once you understand the cost of mitigation and the impact of the risk, you can make a call on the tradeoff and communicate it clearly. Showing that you weighed the tradeoffs around a particular risk can help calm stakeholders who have specific concerns.
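
One simple way to make that call, assumed here rather than prescribed by the article, is to compare each risk’s expected loss (probability multiplied by impact) with the cost of mitigating it. The `Risk` fields, the `triage` helper and the assumption that mitigation removes a risk entirely are all illustrative simplifications.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float       # estimated chance it happens during the rollout, 0 to 1
    impact_cost: float       # estimated cost if it does happen
    mitigation_cost: float   # estimated cost of mitigating it up front


def expected_loss(risk: Risk) -> float:
    return risk.probability * risk.impact_cost


def triage(risks):
    """Split risks into those worth mitigating and those to accept.

    Simplifying assumption: mitigation removes the risk entirely, so it
    pays off when its cost is below the expected loss it prevents.
    Both lists are ordered by expected loss, largest first.
    """
    mitigate, accept = [], []
    for r in sorted(risks, key=expected_loss, reverse=True):
        (mitigate if r.mitigation_cost < expected_loss(r) else accept).append(r)
    return mitigate, accept
```

The numbers only inform the decision. A team may still choose to mitigate a rare but catastrophic risk to reassure a worried stakeholder, and writing down the estimates makes that choice easy to explain.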

Conclusion

By being mindful of the impact adopting a new technology into production has on you and your colleagues, you can smoothly and successfully empower stakeholders. The first step is being clear about what problem you’re solving. Pick a real problem that you’re experiencing, define clear criteria that show it has been fixed, and use them to show how a service mesh has made life better.

Get allies on your side by taking those demonstrated solutions and showing your colleagues how a service mesh made those things better for them. It’s that type of direct help that wins people over and turns them into additional champions. You’ll need to understand what their concerns are. You must also accept that change always comes with risk. These understandings, combined with a clear view of the problem you’re solving and its acceptance criteria, will present opportunities to win colleagues over to your project.

Finally, planning for failure and understanding the risks along the entire journey will actually set you up for success. The concerns that you collected early on can help with this planning. Each possible risk can be addressed early on so that it isn’t an issue.

Form3 rolled Linkerd out into production and ended up with something they could rely on. Ed Wilde describes the difference compared with their previous systems:

“One thing that’s clear with Form 3 today is how few errors there are. On our previous system, you’d see a low background level of errors. This was just accepted. The difference with this system is that we just don’t have errors anymore. Linkerd has proven to be a component you can rely on, and it is very solid. We have had no operational issues with it.” 

This isn’t a foolproof plan. But by following these steps you’ll be much better equipped to roll a service mesh out into production in your company. Getting a service mesh into production isn’t just about the technology. It’s about knowing how to empower your colleagues and making them feel like the service mesh gives them superpowers.

About the Author

Thomas Rampelberg is a Software Engineer at Buoyant Inc, authors of the Linkerd service mesh. He has made a career of building infrastructure software that allows developers and operators to focus on what is important to them. While working for Mesosphere, he helped create DC/OS, one of the first container orchestration platforms used by many of the Fortune 500. He has moved to the next big problem in the space: providing insight into what is happening between services, improving reliability between them and using best practices to secure the communication channels between them.
