Virtual Panel: Microservices Communication and Governance Using Service Mesh


Key Takeaways

  • Service mesh frameworks are used for handling service-to-service communication and offer a platform to connect, manage, and secure microservices.
  • Service mesh helps application developers by taking care of features that require complex coding, such as routing decisions, which are made at the mesh level rather than in the applications.
  • It also provides security policies that you can program into the mesh. For example, you can set up a policy that restricts inbound internet traffic to some of the services in the mesh.
  • Service meshes like Istio work seamlessly on platforms like Kubernetes, but there are rough edges when using them on other platforms.
  • Sidecar proxies enable the decoupling of your applications from the operational aspects of managing service communication effectively and reliably.
Service Mesh

Service mesh is a dedicated infrastructure layer for handling service-to-service communication and offers a platform to connect, manage, and secure microservices.

Service mesh makes the communication between microservices flexible and reliable. It provides the critical capabilities needed in distributed services environments, such as resiliency, service discovery, load balancing, encryption, authentication and authorization, and fault tolerance (via service retries and circuit breakers).

InfoQ spoke with subject matter experts in the service mesh area to learn more about why service mesh frameworks have become critical components of cloud native architectures.

The sections below list the panelists we spoke to, the questions posed in the virtual panel, and the panelists’ responses.


  • Matt Klein, Lyft
  • Dan Berg, IBM
  • Priyanka Sharma, Lightstep
  • Lachlan Evenson, Microsoft
  • Varun Talwar, Google
  • Yuri Shkuro, Uber
  • Oliver Gould, Buoyant

InfoQ: Can you define Service Mesh and what advantages it brings to the table in the areas of microservices interaction and governance?

Matt Klein: The two most difficult problems facing microservice practitioners are networking and observability: how do services talk to each other reliably? When things go wrong, how can the problem be quickly determined and either fixed or worked around? Reliable microservice networking and observability require a multitude of techniques including service discovery, load balancing, timeouts, retries, circuit breakers, health checking, advanced routing, stats, logging, distributed tracing, and more. Historically, most modern architectures have built feature-rich libraries, in each language the organization uses, that handle these concerns. This by definition necessitates reimplementing and maintaining a large amount of sophisticated functionality in multiple languages.

The idea behind the "service mesh" is to use an out of process "sidecar" proxy running alongside every application. This proxy implements all of the sophisticated networking and observability needs for a microservice architecture in one place and at very high performance. Since the proxy implements the required functionality in a dedicated process, it can work alongside any application language. When every application has an associated sidecar proxy and routes all traffic through it, the application itself no longer needs to be aware of the underlying network details and can treat it as an abstraction. This allows application developers to largely focus on business logic, irrespective of the many languages that might be in use within their organization.
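
The decoupling Klein describes can be illustrated with a toy, single-process sketch (the class and upstream names here are invented for illustration, not any real proxy's API): the application hands a request to the sidecar, which supplies retries and timeouts, so none of that logic lives in application code.

```python
import time

class Sidecar:
    """Toy stand-in for an out-of-process sidecar proxy: the application
    hands it a plain request and the sidecar supplies retries and
    timeouts, so the app code carries no networking logic."""

    def __init__(self, upstream, retries=3, timeout_s=0.5):
        self.upstream = upstream      # callable standing in for the network hop
        self.retries = retries
        self.timeout_s = timeout_s

    def call(self, request):
        last_error = None
        for _ in range(self.retries):
            start = time.monotonic()
            try:
                response = self.upstream(request)
                if time.monotonic() - start > self.timeout_s:
                    raise TimeoutError("upstream too slow")
                return response
            except Exception as err:
                last_error = err      # retry on failure
        raise last_error

# A flaky upstream: fails twice, then succeeds.
calls = {"n": 0}
def flaky_upstream(request):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return f"ok:{request}"

sidecar = Sidecar(flaky_upstream)
print(sidecar.call("GET /users"))   # the app never sees the two failures
```

In a real mesh the proxy runs in its own process and the retry/timeout values come from the control plane, but the division of labor is the same.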

Dan Berg: A service mesh is a term used to describe the network of microservices that make up applications and the management of the interactions between them. One example of this is Istio, an open technology that provides a way for developers to seamlessly connect, manage, and secure networks of different microservices, regardless of platform, source, or vendor. Service mesh helps developers be more productive by moving complex and error-prone logic out of the application code and into the service mesh. For example, a service mesh manages traffic routing and shaping, ensures secure communication between services, captures network telemetry, and enforces security policy for all services within the mesh. A service mesh ensures greater resiliency of services with built-in circuit-breaking support, which handles failures in a graceful manner when a service is unable to reach its destination.

Priyanka Sharma: A service mesh is an infrastructure layer for service-to-service communication. It ensures reliable delivery of your messages across the entire system and is separate from the business logic of your services. Service meshes are often referred to as sidecars or proxies.

As software fragments into microservices, service meshes go from being nice-to-have to essential. With a service mesh, not only will you ensure resilient network communications, you can also instrument for observability and control, without changing the application run-time.

Service meshes make it easier for organizations to adopt microservices with consistent tooling across engineering teams. Individual developers can focus on their services and let the mesh take care of the network layer communications as well as the tooling around the microservices.

Lachlan Evenson: A service mesh is a set of applications that enables uniform service-to-service communication in microservice architectures. Service meshes enable microservice developers and operators to interact with dependent services in a prescribed and expected fashion. This aids governance by providing a single interface and, as such, a single point of policy enforcement for all communication, rather than bespoke or boilerplate implementations.

Varun Talwar: Service mesh is an architectural pattern whereby all service communications and common functions needed by microservices are handled uniformly by a platform layer (outside the code). When such a platform layer can uniformly implement common network functions like routing and load balancing, resiliency functions like retries and timeouts, security functions like authentication and authorization, and service-level monitoring and tracing, it can significantly ease the job of microservice developers and enable a smart, consistent infrastructure that lets organizations manage at a higher services abstraction, independent of the underlying network and infrastructure.

Yuri Shkuro: The term “service mesh” is rather misleading. In the direct interpretation it could be used to describe both the network of microservices that make up distributed applications and the interactions between them. However, recently the term has been mostly applied to a dedicated infrastructure layer for handling service-to-service communication, usually implemented as lightweight network proxies (sidecars) that are deployed alongside application code. The application code can treat any other service in the architecture as a single logical component running on a local port on the same host. It frees the application code from having to know about the complex topology of modern, cloud native applications. It also allows infrastructure teams to focus their energy on implementing advanced features like routing, service discovery, circuit breaking, retries, security, monitoring, etc. in a single sidecar component, rather than supporting them across multiple programming languages and frameworks typical of modern applications.

Oliver Gould: A service mesh is a dedicated infrastructure layer for making runtime communication between microservices safe, fast, and reliable. At Twitter we learned that this communication is a critical determinant of the application’s runtime behavior, but if you aren’t explicitly dealing with it, you end up with a fragile, complex system. A service mesh gives an operator the control they need to debug and manage this communication. If you want to dig deeper, we wrote an in-depth post on what a service mesh is and why you need one, which you can find here.

InfoQ: The Enterprise Service Bus (ESB) pattern has been popular for the last several years, especially in Service-Oriented Architecture (SOA) models. How do you contrast the service mesh pattern with what ESB offers?

Klein: I'm not going to debate the differences between SOA vs. microservices or ESB vs. service mesh. To be perfectly honest, I think there is very little real difference and the name changes are mostly driven by vendors attempting to differentiate new products. Computing, and engineering in general, are driven by iterative change. In recent years, most SOA/microservice communication has moved to REST and newer strongly typed IDLs such as Thrift and gRPC. Developers have favored simplicity via direct networking calls from in-process libraries vs. centralized message buses. Unfortunately, most in-process libraries in use are not sufficiently solving the operational pain points that come when running a microservice architecture (Finagle and Hystrix/Ribbon are exceptions but require use of the JVM). I view "service mesh" as really just a modern take on the ESB architecture, adapted to the technologies and processes that are now in favor among microservice practitioners.

Berg: At a high level, an ESB and a service mesh appear similar in that they manage the communication between a set of services; however, there are fundamental differences. A key difference is that messages are sent to an ESB, which in turn determines which endpoint to send the message to. The ESB is a centralized point for making routing decisions, performing message transformations, and managing security between services. A service mesh, on the other hand, is a decentralized approach where client-side proxies are programmed via the service mesh control plane to manage routing, security, and metrics gathering. Thus the service mesh pushes key responsibilities out to the application side rather than encapsulating the functionality within a centralized system such as an ESB. This makes the service mesh much more resilient, and it scales better within a highly distributed system such as a cloud-native application.

Due to the client-side approach used with a service mesh, it is possible to have much more sophisticated routing rules, policy enforcement, and resiliency features such as circuit breakers than what can be achieved with an ESB. Another key difference is that the application logic doesn’t know that it is participating in a service mesh. The mesh adapts to the applications. With an ESB, the application logic must be adjusted to participate with the ESB.

Priyanka: ESBs and service meshes have a lot in common, particularly why they were built. ESBs became popular in the SOA era – they managed network communications and also took care of some of the business logic. They were built for the same reason we are building service meshes today – as the number of services increases, consistency and reliability across the system is needed, and a message bus/sidecar proxy is a great way to achieve that.

Service meshes are different from ESBs because they are specifically built for cloud-native, microservices architectures. ESBs do not function well in the world of cloud computing. They take on too much of the business logic from services, and slow down software development by creating another dependency and organizational silo.

To sum up, I would say that service meshes are the next-generation evolution of ESB technology. The core motivators are the same, but the implementation is more sophisticated and tailor-made for the cloud-native era.

Evenson: Just as ESB is synonymous with SOA, service mesh is synonymous with microservices. The major difference is the scope and size of the services implemented by each. ESB is much larger in terms of feature set and backend system support. ESB typically focuses on large enterprises and industry standards and protocols, whereas service meshes are lightweight enough to add value to the first few microservices.

Talwar: ESBs were about a centralized architecture where a central piece carried all the intelligence to make decisions. Over time, the central piece became complicated and was a poor fit for a microservice architecture where each team/service wants to configure, test, deploy, and scale their services at a rapid pace. The new architecture of service meshes represents an inversion of the SOA pattern: from dumb endpoints and smart pipes (large, monolithic, hierarchical apps) to smart endpoints (service-specific functions) and dumb pipes.

Shkuro: The relationship between ESB and service mesh is similar to the relationship between monolithic and microservices-based applications. They both serve a similar function; the main distinction is in how they do it. An ESB is a single system sitting between all other services in the architecture. It provides a single point of control over every message exchange between services. It also introduces a single point of failure and increases the latency of all communications. In contrast, a service mesh implemented via sidecars performs the same functions but in a distributed, decentralized manner. The control plane of a service mesh provides the same centralized authority over policies and routing decisions as an ESB, while not being on the critical path of every request. The data plane is implemented by the sidecars running alongside application code. For example, in a typical Kubernetes setup, each microservice instance runs in a pod next to its own copy of the service mesh sidecar. All traffic in and out of the microservice passes through that instance of the sidecar, with no hard dependencies on other centralized subsystems.

Gould: The goals are not all that different, but the priorities and implementation details are extremely different. ESBs tend to be implemented as a centralized, single point of failure, whereas a service mesh like Conduit uses “sidecar” proxies to be explicitly decentralized and scalable.

Furthermore, it’s possible to use it in only a small part of an application, meaning that adoption can be incremental and doesn’t require total architectural lock-in. Finally, the service mesh is focused heavily on the operational aspects of communication and tries to avoid any real awareness of the details of the application’s business logic. The goals of the service mesh are operational, not architectural or integrational.

InfoQ: Who in the enterprise should care about a service mesh? Is this something a typical developer should be aware of when deploying applications?

Klein: The idea behind the service mesh is to largely make the network abstract to application developers. Application developers still need to understand general networking concepts such as retries, timeouts, routing, etc. (since they will be involved in configuration), but they shouldn't need to know how they are implemented. Thus, the typical developer should care about the service mesh because it means they can delete a lot of one-off networking and observability code and obtain a uniform, more feature-rich, and more reliable solution for free!

Berg: Using service mesh and a strong cloud platform, smaller companies can create apps and features that previously only larger companies could dedicate resources towards, under the traditional model of using customized code and reconfiguring every server. Cloud, service mesh and microservices give developers the flexibility to work in different languages and technologies, resulting in higher productivity and velocity.

A typical developer should be aware that they are participating in a service mesh and understand that they are communicating with other services in the mesh. They should embrace the fact that the service mesh helps them avoid features that require complex coding, like routing decisions, because these are handled at the mesh level, not in the application itself. This ultimately allows the developer to be more productive. The telemetry information, as well as the ability to inject failures, is a powerful development tool to detect problems and, ultimately, eliminate them from the application.

Priyanka: Infrastructure and platform teams are often the folks who design and implement service meshes in software organizations. It is critical for those teams and their engineering leadership to work together on the best strategy and implementation for the company.

While service meshes improve application developers’ productivity by decoupling network communication from the services, developers should be aware of the specific service discovery and observability features being offered. This will help them know what will work automatically and which functionality they need to customize. For instance, if the service mesh is instrumented with OpenTracing, developers are guaranteed top-level observability across the system. They can then choose to instrument their services with OpenTracing to get more detailed traces of bugs or performance degradations.
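
The observability hand-off Sharma describes can be sketched with a stdlib-only model of trace-context propagation (this is not the OpenTracing API; the header name here is invented for illustration): a proxy reuses the caller's trace id if present and otherwise mints a new one, so spans emitted by different services join into one trace.

```python
import uuid

# Hypothetical header name a mesh proxy might use for trace context.
TRACE_HEADER = "x-trace-id"

def proxy_forward(inbound_headers):
    """What a sidecar does before forwarding a request: reuse the
    caller's trace id if present, otherwise start a new trace."""
    headers = dict(inbound_headers)
    headers.setdefault(TRACE_HEADER, uuid.uuid4().hex)
    return headers

# First hop: no trace context yet, so the proxy mints one.
hop1 = proxy_forward({})
# Second hop: the same trace id is carried through unchanged.
hop2 = proxy_forward(hop1)
assert hop1[TRACE_HEADER] == hop2[TRACE_HEADER]
```

Because the proxy does this for every hop, services that add no instrumentation of their own still appear in the trace; services that want richer detail can read the same header and attach their own spans.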

Evenson: A service mesh should be transparent to the developer and the services that it provides are treated as a feature of the platform. Operators will however have an interest in service meshes as they are another piece of the stack that requires care and feeding.

Talwar: One of the interesting aspects of the service mesh (in the limit) is that it brings many diverse stakeholders together: developers, operators, production security, network ops, the CIO, the CTO, etc. As for developers, when the service mesh is done right in an org, a developer doesn’t have to write code for many common functions (ideally only business logic), and deployment into the fabric (with the mesh) takes care of those functions (via policies) at runtime.

Shkuro: The service mesh solution is typically owned by the infrastructure / networking team. A typical application developer does not need to know much about it. They may need to know that to make a request to service X they need to send it to the local port Y reserved for that service, or to send all requests to the same port but indicate the target service via HTTP header or a special API of the RPC framework. Of course, in many organizations the same developer is also the on-call person for their services, which means it’s also useful to be aware of how to monitor the sidecar process in case of a problem. At Uber we have a tool that automatically gives each service a dashboard displaying metrics from many infrastructure components used by the service, including metrics generated by the sidecar process, such as request and error counts, request latency histograms, etc.

Gould: The enterprise should care because it brings a layer of standardization to runtime operations, similar to how Docker and Kubernetes provide standardization of runtime operations. Platform operators (the folks bringing Docker and Kubernetes into organizations) love the service mesh because it gets them out of the critical path for debugging and operating microservices.

Developers (and, more generally, service owners) benefit because it allows them to decouple their application code from operational logic that belongs in the runtime. The mesh provides operational affordances that allow developers to move more quickly, with less fear of breaking things.

InfoQ: How do service mesh solutions support resiliency in terms of service retries, timeouts, circuit breakers, failover, etc.?

Klein: The sidecar proxy implements a vast array of advanced features such as service discovery, load balancing, retries, timeouts, circuit breakers, zone aware routing, etc. on behalf of the application. These features are very difficult to get right and microservice codebases are typically littered with buggy or incomplete versions of them. It's substantially more efficient to offload this type of functionality to a single entity that can be implemented once in a high performance way and substantially vetted.

Berg: Application functionalities that are tightly coupled to the network, such as circuit breaking and timeouts, are explicitly separated from the service code/business logic, and the service mesh facilitates those functionalities in the cloud, out of the box. Large-scale distributed systems have one defining characteristic: there are many opportunities for small, localized failures to turn into system-wide catastrophic failures. The service mesh is designed to safeguard against these escalations by using the agility and portability of cloud tools, such as containers, to shed load and fail fast when the underlying systems approach their limits.

This all is done in the client-side proxy (sidecar) available in the application. The sidecar is responsible for forwarding a request to a service where another sidecar proxy receives the request prior to forwarding to the application. When the request is being made, the proxy will automatically trip the circuit breaker, and potentially reroute traffic to another version when the upstream service is not reachable. Failures may occur because of poorly set timeouts between the services. A service mesh like Istio helps you avoid bad user experiences and outages from timeouts because Istio allows you to inject failures directly into the mesh, allowing you to test and validate your connection timeouts without having to guess.
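
The circuit-breaking behavior Berg describes can be sketched in a few lines of Python (a simplified model of the pattern, not Istio's or Envoy's actual implementation; the class and parameter names are invented): after a run of consecutive failures the breaker opens and calls fail fast until a reset window elapses.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker of the kind a sidecar proxy applies per
    upstream service: after `max_failures` consecutive errors the
    circuit opens and calls fail fast until `reset_after_s` elapses."""

    def __init__(self, max_failures=3, reset_after_s=30.0):
        self.max_failures = max_failures
        self.reset_after_s = reset_after_s
        self.failures = 0
        self.opened_at = None

    def call(self, fn, *args):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after_s:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None          # half-open: allow a probe
            self.failures = 0
        try:
            result = fn(*args)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0                  # success resets the count
        return result

breaker = CircuitBreaker(max_failures=2, reset_after_s=60)

def failing_service():
    raise ConnectionError("upstream unreachable")

for _ in range(2):                 # two real failures trip the breaker
    try:
        breaker.call(failing_service)
    except ConnectionError:
        pass

try:
    breaker.call(failing_service)  # now fails fast, no upstream attempt
except RuntimeError as err:
    print(err)
```

In a mesh, the key difference from this sketch is that the thresholds live in control-plane policy and the breaker state is kept per upstream endpoint by the proxy, not by the application.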

Evenson: The service mesh data-plane component sits in the path of all data communications across all microservices. Given that placement, it is aware of all traffic in the mesh and hence can make policy-driven decisions that support resiliency features.

Talwar: Service meshes have two parts: a data plane and a control plane. Pluggable, API-driven data planes like Envoy (used in Istio) allow configuration of retries and timeouts, so these can be configured and changed easily. Envoy also has the ability to define configuration for circuit breakers as well as coarse- and fine-grained health checks for all instances in the pool, for load balancing and routing away from failing or high-latency instances. See here for more details.

Shkuro: Many of these features vary between specific implementations of the service mesh. The techniques themselves are not new, yet many are still an active area of research and innovation. What is special about the service mesh is that it abstracts these concerns from the application code and encapsulates them in a single infrastructure layer. Doing so keeps the application code lightweight, and allows service mesh developers to iterate quickly and develop best-in-class solutions for these problems. For example, take the problem of failovers. When a certain service in a particular availability zone experiences problems, usually the safest approach to recover is to shift the traffic to another availability zone, provided that it has enough excess capacity. A service mesh can do that completely transparently to the rest of the services in the architecture, by changing a few settings in its control plane. Supporting this failover capability in every service would be a lot more difficult.
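
The failover Shkuro describes can be sketched as a weighted routing table (the zone names are hypothetical; a real mesh expresses this as control-plane configuration rather than an in-process dict): draining a zone is just a weight change, invisible to application code.

```python
import random

def pick_zone(weights, rng=random):
    """Data-plane side: sample a destination zone in proportion to the
    weights the control plane last pushed down."""
    total = sum(weights.values())
    point = rng.uniform(0, total)
    for zone, weight in weights.items():
        if weight == 0:
            continue                 # drained zones receive no traffic
        point -= weight
        if point <= 0:
            return zone
    return zone                      # floating-point edge case fallback

# Normal operation: most traffic stays in the local zone.
weights = {"zone-a": 90, "zone-b": 10}
sample = [pick_zone(weights) for _ in range(10)]
print(sample)

# zone-a degrades: the control plane fails over by rewriting the weights.
weights = {"zone-a": 0, "zone-b": 100}
assert all(pick_zone(weights) == "zone-b" for _ in range(1000))
```

The point of the split is visible even in this toy: the routing decision (data plane) never changes, only the policy it consumes (control plane) does.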

Gould: The single most important reliability feature provided by a service mesh is Layer 7 load balancing. Unlike L3/L4 load balancers, service meshes like Conduit are aware of per-request metadata and can help to automatically route around slow or failing instances, rack failures, etc.

Once these load balancers are aware of Service Level Objectives (usually in terms of latency and success rate) they can make incredibly smart decisions about when traffic should not be sent to a given instance.

The service mesh can also automatically retry requests for the application if that’s a safe thing to do. Note, however, that retries can actually make outages worse; you can get stuck in long running retry loops that tie up resources and can cause system wide cascading failures. So it’s important to parameterize correctly, e.g. apply a budget-based approach to retries as we’ve done in Linkerd. This dramatically improves worst-case behavior.
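
Gould's budget-based retries can be sketched as follows (a simplified model of the idea behind Linkerd's retry budgets, not its actual code; the class and parameters are invented): retries are capped at a small floor plus a ratio of regular traffic, so retry volume cannot amplify a widespread outage.

```python
class RetryBudget:
    """Budget-based retries: a retry is only granted while total retries
    stay under `min_retries` plus `ratio` times the regular request
    count, preventing system-wide retry storms."""

    def __init__(self, ratio=0.2, min_retries=10):
        self.ratio = ratio
        self.min_retries = min_retries
        self.requests = 0
        self.retries = 0

    def record_request(self):
        self.requests += 1

    def can_retry(self):
        allowed = self.min_retries + self.ratio * self.requests
        if self.retries < allowed:
            self.retries += 1
            return True
        return False        # budget exhausted: fail instead of retrying

budget = RetryBudget(ratio=0.2, min_retries=2)
for _ in range(100):
    budget.record_request()         # 100 regular requests observed

# During an outage, only 2 + 0.2 * 100 = 22 retries are ever granted,
# no matter how many failures pile up.
granted = sum(budget.can_retry() for _ in range(50))
print(granted)
```

Compare this with a fixed per-request retry count of 3: under total outage that multiplies load by 4x, whereas the budget bounds extra load at roughly the configured ratio.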

InfoQ: How does a service mesh support the security capabilities like authentication and authorization? How can it help with run-time security policy enforcement?

Klein: Although most security teams would say that they want authentication and authorization between services, very few organizations end up deploying a solution at scale. This is because system-wide authentication and authorization are very difficult problems! The service mesh helps greatly in this regard. Authentication can be deployed relatively easily using techniques such as mTLS and SPIFFE. Application/security developers need to specify policy but do not need to worry about how the underlying encryption and authentication are implemented. Similarly, the sidecar proxies can use authentication data derived from mTLS sessions to drive authorization at the L7 routing level, e.g., specifying that /service_a can only be accessed by service A and /service_b can only be accessed by service B.
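
Klein's example of L7 authorization driven by mTLS identity can be sketched as a policy lookup (the SPIFFE IDs and route prefixes here are hypothetical, and a real sidecar would load this table from the control plane): the proxy matches the verified peer identity against the longest matching route prefix and denies by default.

```python
# Hypothetical policy table: which mTLS-verified peer identity may reach
# which route prefix. A sidecar would consult this before forwarding.
POLICY = {
    "/service_a": {"spiffe://example.org/service-a"},
    "/service_b": {"spiffe://example.org/service-b"},
}

def authorize(peer_identity, path):
    """Allow the request only if the caller's identity (taken from the
    mTLS session) is listed for the longest matching route prefix."""
    for prefix in sorted(POLICY, key=len, reverse=True):
        if path.startswith(prefix):
            return peer_identity in POLICY[prefix]
    return False    # no matching rule: deny by default

assert authorize("spiffe://example.org/service-a", "/service_a/users")
assert not authorize("spiffe://example.org/service-b", "/service_a/users")
```

The application behind the proxy never sees unauthorized requests at all, which is what lets security policy change at runtime without touching application code.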

Berg: This stems from a few key factors. A service mesh has a component that manages the certificate authority inside the mesh. This authentication component is responsible for programming the client-side proxies to automatically establish trust between services in the mesh using mutual TLS (transport layer security). If developed properly, these certificates will have a short lifespan so that if a service is compromised, there’s only a small security breach window before the certificate gets recycled rendering the original useless.

A service mesh has security policies that you can program into the mesh. For example, you can set up a policy that restricts inbound internet traffic to some of the services in the mesh. If you only want to allow inbound internet traffic to service A, all other inbound internet traffic will be rejected if it targets a service other than A, as the client-side proxy intercepts all inbound and outbound traffic to the applications. A service mesh enforces strong identity assertion between services and limits the entities that can access a service, all without changing a line of application code.

Priyanka: Service meshes create more flexibility and control at deployment time because fewer assumptions are baked into the application code. I think it would be best for the service mesh providers in the panel to speak about their specific implementations for resiliency and authentication.

Evenson: The service mesh control-plane can only provide features that are inherently supported on the platform that the service mesh is running. In the case of a service mesh running on Kubernetes, authentication and authorization are expressed in the service mesh and converted to the underlying Kubernetes resources where they are enforced.

Talwar: Once service meshes intercept all the service-service communication, they can encrypt and strongly authenticate all communication with no developer involvement (huge plus) and also enable authorization policies for who can call whom. Since all traffic is flowing through the data plane of the service mesh, ensuring encryption for all supported/tunneled protocols and allowing/disallowing egress/ingress for each service can be enforced by the service mesh.

Shkuro: One great benefit of the sidecar approach is that the sidecar’s identity can be used interchangeably with the identity of the actual microservice, because the networking policy on the containers can be set up such that the microservice is not reachable by any means other than the sidecar process. This allows moving many security concerns into the sidecar and standardizing them across the organization. Authentication can be done exclusively by the sidecar, for example by terminating all TLS at the sidecar and using unencrypted communication between the application and the sidecar. The caller identity can be passed to the application code via a trusted request header, in case it needs to perform additional advanced authorization. Some simple forms of authorization, such as “only service X is allowed to access my endpoint Y,” can also be moved completely into the sidecar process and controlled via centralized policies. Those policies can even be updated at runtime without affecting the application code.

Gould: Once orchestration is in place via e.g. Kubernetes, the traditional network segmentation approaches to identity start to break down. The service mesh makes it possible for services to regain a consistent, secure way to establish identity within a datacenter, and furthermore, to do so based on strong cryptographic primitives rather than deployment topology. Conduit, for example, can provide and/or integrate with Certificate Authorities to automate the distribution of TLS credentials for services, so that when two mesh-enabled services communicate, they have strong cryptographic proof of their peers. Once these identity primitives are established, they can then be used to construct access control policies.

InfoQ: What's the on-ramp like for someone learning about and deploying service meshes today? Where are the rough edges that you expect to smoothen out?

Klein: To be honest it's still early days. Successfully deploying a service mesh across a large microservice architecture is possible, but still requires quite a bit of networking and systems knowledge. As I've written about extensively, a service mesh deployment is composed of the "data plane" and the "control plane." The data plane touches every packet and performs load balancing, retries, timeouts, etc. The control plane coordinates all of the data planes by providing configurations for service discovery, route tables, etc. Data planes like Envoy, HAProxy, and NGINX are robust and fully production ready. However, developing and deploying a control plane and associated configurations that work for an organization is actually the difficult part.

Envoy is a general tool that is used in a large number of deployment types. This means that Envoy has a dizzying array of options that can be very intimidating to the uninitiated. Unfortunately, adaptability is often at odds with ease of use. On the other hand, control planes that are more tightly tied to an organization's development practices and tooling will likely have fewer options, more opinions, and will be easier for the majority of developers to understand and use. Thus, over time I think that as microservice architectures standardize on tooling such as Kubernetes, the service mesh will be more of an “out of the box” experience via control-plane projects like Istio that build on top of Envoy.

Berg: Similar to adopting a cloud strategy, a service mesh is rich in capabilities and function, but it can be overwhelming if you try to use all of its capabilities on day one. You have to adopt service mesh in bite sized portions to avoid choking. For example, if you want visibility into the complexity of microservices, adopt a service mesh just for telemetry, not security or routing. Start simple and grow your use of the service mesh as your needs grow.

Priyanka: Based on what we hear from the OpenTracing end-user community, service meshes are a welcome technology that make microservices more robust. Currently, people are spending time understanding all the options and the more educational material (such as this article) that is out there, the better.

Evenson: This really depends on the service mesh. One of the features of a service mesh is that you do not have to change your application to support the service mesh. Some rough edges are how application developers have to modify infrastructure definitions to deploy the data-plane component of the service mesh so that it can sit in-path to all data communications.

Talwar: Today, service meshes like Istio work seamlessly on platforms like Kubernetes, but there are rough edges when using them on other platforms. Another area of focus should be letting users incrementally try various parts of Istio, such as just security, monitoring, or resiliency, without the cognitive load of understanding the other parts. I think more work on both of these areas will make Istio more digestible and widely usable. One other rough edge is well-tested performance and production hardening, work that is underway in the Istio project right now.

Shkuro: Kubernetes and Istio make deploying a service mesh fairly straightforward, but for many organizations that do not use Kubernetes there is a bit of a learning curve. To start with, the deployment system used in the organization needs to support the ability to run more than one container as a logical unit of a service instance (the concept of a pod in Kubernetes). Alternatively, the service mesh process can run as a single agent on the host, which still solves some of the problems like routing and service discovery, but makes other features like security and authentication impossible. From some informal conversations I had with other companies, running the control plane for the service mesh is probably the hardest part. The control plane needs to be deeply integrated with the rest of the architecture, e.g., it needs to be aware of service deployments in order to control service discovery, health checking, and load balancing / failovers. The Istio project is making great strides to abstract the control plane functionality.

Gould: We’ve been supporting Linkerd in production for almost two years. We’ve learned a ton about rough edges: some things we were expecting to learn, but often things that are only obvious in retrospect. One surprising lesson was that, while Linkerd is great at extremely high scale, it turns out many users would greatly benefit from a scaled-down approach optimized for simplicity rather than maximum power and maximum flexibility. It should be easy to get started with a service mesh. We want to reduce the complexity of managing distributed services, not add to it. That insight led to our recent work on Conduit, and if it isn’t all you need to get up and running, I’d love to know why.

More generally, I think the pitfall of adopting a service mesh is trying to do too much at once. Service mesh adoption needs to be incremental, both in its criticality and in the scope of problems it solves. Service meshes are a new sort of tool, and the most successful adoptions we’ve seen have been incremental. Our advice is to keep it as simple as possible (but no simpler).

InfoQ: Can you discuss the Sidecar design pattern and what platform capabilities can be implemented using Sidecar?

Klein: As I've discussed above, the sidecar proxy is the key to abstracting the network for applications. We consider localhost networking to be reliable. Using that assumption, the application only needs to be aware of its sidecar proxy, and the proxy can handle everything else (other than context propagation).

Berg: The sidecar is a client-side proxy that is deployed with every application (i.e., container). Deploying a service with the sidecar automatically enrolls that service in the mesh. The sidecar is conceptually attached to the parent application and complements it by providing platform features, making it a critical network control point. With this design pattern, your microservice can use the sidecar either as a set of processes inside the same microservice container or as a sidecar in its own container, to leverage capabilities such as routing, load balancing, resiliency features such as circuit breaking and retries, in-depth monitoring, and access control.
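The resiliency features Klein and Berg mention (retries, timeouts, circuit breaking) can be illustrated with a small sketch. This is not Envoy or Linkerd code; `proxy_call` and `send` are hypothetical stand-ins showing what a sidecar does on the application's behalf: the application issues one logical request, and the proxy transparently absorbs transient upstream failures.

```python
# Illustrative sketch of sidecar-style retry handling. `send` stands in for
# the real upstream network call; nothing here is a real mesh API.

import time


def proxy_call(send, request, retries=3, backoff=0.0):
    """Forward `request` upstream, retrying transient failures transparently."""
    last_err = None
    for attempt in range(retries):
        try:
            return send(request)
        except ConnectionError as err:  # transient failure: retry with backoff
            last_err = err
            time.sleep(backoff * (2 ** attempt))
    raise last_err  # budget exhausted: surface the failure to the caller


# Simulate an upstream that fails twice, then succeeds.
attempts = {"n": 0}

def flaky_upstream(req):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("upstream reset")
    return {"status": 200, "body": f"ok:{req}"}

print(proxy_call(flaky_upstream, "GET /users"))
# -> {'status': 200, 'body': 'ok:GET /users'}; the app never saw the two failures
```

Because this logic lives in the proxy rather than the application, it does not need to be reimplemented per service or per programming language, which is the decomposition argument the panelists make below.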

Priyanka: The Sidecar pattern is basically the Plugin or Driver pattern, but for platforms. By abstracting the implementation details away from the application for networking, metrics, logging, and other subcomponents that have standard interfaces, operators get more control and flexibility in how they craft their deployment.

Evenson: The sidecar design allows you to manipulate the Linux runtime environment without having to change your application code. Typically, a service mesh data-plane component is deployed as a sidecar and the routing for that network namespace on the Linux kernel is modified to route all ingress/egress via the data-plane component.
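The redirection Evenson describes can be modeled conceptually. In practice this is done with kernel-level rules (e.g., an iptables REDIRECT installed by an init container) rather than application code; the sketch below is a toy stand-in, and the port 15001 is illustrative. The key idea is that the application dials the real destination, but the connection is rewritten to the local sidecar before it leaves the pod, with the original destination preserved so the proxy can forward it.

```python
# Toy model of transparent traffic interception in a pod's network namespace.
# A real mesh does this with kernel routing rules, not Python.

SIDECAR_PORT = 15001  # illustrative local port the sidecar listens on


def intercept(dest):
    """Stand-in for the redirect rule: rewrite the destination to the local
    sidecar, keeping the original destination as metadata for forwarding."""
    host, _, port = dest.partition(":")
    return ("127.0.0.1", SIDECAR_PORT, {"orig_dst": (host, int(port))})


# The app "connects" to the real service address...
redirected = intercept("10.4.2.7:8080")
print(redirected)
# -> ('127.0.0.1', 15001, {'orig_dst': ('10.4.2.7', 8080)})
# ...but the connection actually lands on the sidecar, which then applies
# routing, policy, and telemetry before forwarding to the original destination.
```

This is why the application needs no code changes to join the mesh: from its point of view, nothing about the network has changed.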

Talwar: The sidecar pattern is one where a co-process/container image sits next to the application and acts as a trusted partner: it can be updated independently and managed by a separate team, but shares its lifecycle with the application. Platform capabilities that a sidecar can take on include logging, reporting, authentication, policy checks for quotas, etc.

Shkuro: There is no strict agreement in the industry about what constitutes the Sidecar pattern. For example, Brendan Burns from Google would consider the service mesh sidecar we discussed here to be an example of the Ambassador pattern, because it is only concerned with how the application communicates with the rest of the world, while Microsoft Azure documentation uses a more generous definition that includes many peripheral tasks, including platform abstraction, proxying communications, configuration, logging, etc. Personally, I prefer the latter definition, where the Ambassador pattern is a subclass of the Sidecar pattern.

In essence, the Sidecar pattern recommends extracting common functionality from business applications and packaging it into another process that runs in a sidekick container. It’s a well-known principle of decomposition. By extracting the common pieces into reusable containers we free the applications from having to re-implement those features, potentially in multiple programming languages. It is similar to breaking legacy monolithic applications into separate microservices, except that the sidecar life cycle is the same as that of the parent service and we use them mostly for infrastructure related functions.

Gould: Fundamentally, a sidecar is just another container. It’s nothing magical. With sidecar proxies, we’re able to manage operational logic as close to the application as possible, without actually being inside application code. The entire point of a service mesh is to decouple your applications from the operational aspects of managing service communication effectively and reliably. With the sidecar pattern, we can do things like provide and validate identity for applications, since the sidecar necessarily has the same level of privileges as the service for which it proxies. That’s the biggest difference between sidecars and, e.g., per-host deployments.

About the Panelists

Matt Klein is a software engineer at Lyft and the architect of Envoy. Matt has been working on operating systems, virtualization, distributed systems, networking, and making systems easy to operate for over 15 years across a variety of companies. Some highlights include leading the development of Twitter’s C++ L7 edge proxy and working on high-performance computing and networking in Amazon’s EC2.

Dan Berg is a Distinguished Engineer within the IBM Cloud unit. Daniel is responsible for the technical strategy and implementation of the containers and microservices platform available in IBM Cloud. Within this role, Daniel has deep knowledge of container technologies including Docker and Kubernetes and has extensive experience building and operating highly available cloud-native services. Daniel is also a core contributor to the Istio service mesh project.

Priyanka Sharma is an entrepreneur with a passion for building developer products and growing them through open source communities. She currently heads Open Source Partnerships at LightStep and is a contributor to the OpenTracing project, a CNCF project that provides vendor-neutral APIs for distributed tracing. She serves as an advisor to startups at HeavyBit industries, an accelerator for developer products. Follow her on Twitter @pritianka

Lachlan Evenson is a cloud native evangelist and mercenary. Lachlan has spent the last two and a half years working with Kubernetes and enabling cloud native journeys. He is a believer in open source and is an active community member. Lachlan spends his days helping make cloud native projects run great on Azure.


Varun Talwar is a product manager in Google Cloud and the founding PM on @grpcio and @IstioMesh.



Yuri Shkuro is a staff engineer at Uber Technologies, working on distributed tracing, reliability, and performance. Yuri is the coauthor of the OpenTracing standard (a CNCF project) and a tech lead for Jaeger, Uber’s open source distributed tracing system.


Oliver Gould is CTO and cofounder of Buoyant, where he leads open source development efforts for open service mesh projects Linkerd and Conduit. Prior to joining Buoyant, he was a staff infrastructure engineer at Twitter, where he was the tech lead of Observability, Traffic, and Configuration & Coordination teams. He is the creator of Linkerd and a core contributor to Finagle, the high-volume RPC library used at Twitter, Pinterest, Soundcloud, and many other companies.
