
Adoption of Cloud Native Architecture, Part 3: Service Orchestration and Service Mesh


Key Takeaways

  • Service interaction and governance model are critical in a microservices-based architecture. Advanced architecture patterns like service orchestration and service mesh can help address challenges we face in distributed systems.
  • Challenges like siloed development and complex service-to-service interaction (Death Star architecture) can lead to performance issues in production, security issues in terms of access control, and lack of overall system monitoring.
  • Service orchestration helps with these challenges, but it also has some disadvantages like lack of decentralized policy enforcement.
  • Service mesh-based solutions offer several advantages compared to traditional microservices architectures in the areas of connectivity, reliability, security, and observability.

The first part of this article series discussed architecture evolution and strategic architecture patterns for technology trends like microservices, serverless, and containerization. Architectures based on principles such as loose coupling, extensibility, and interface-based design will be more resilient during major shifts in technology platforms. Well-designed solutions stand the test of time by insulating business logic from technology components that become obsolete over time.

The second part focused on architecture stabilization gaps and anti-patterns such as distributed monoliths and "Death Star architecture," and demonstrated the need to balance architecture and technology stability.

This third part explores the importance of service interaction in a microservices-based architecture, the typical challenges of distributed systems, and how advanced architecture patterns like service orchestration and service mesh can help address those challenges.

A microservices architecture offers several advantages but also comes with some challenges. We’ll look at these challenges as being architectural opportunities, and explore how to overcome them through service orchestration.

The design patterns discussed in this article are applicable to cloud platforms like AWS, VMware Tanzu Application Service (formerly Pivotal Cloud Foundry), or Kubernetes. But you can also use them in a non-cloud infrastructure.

Kubernetes has become the de facto cloud platform in several organizations. A service-mesh framework offers an excellent way to take full advantage of Kubernetes on the cloud, whether you are using Kubernetes today or planning to explore it in the future. This article will demonstrate how a service mesh helps manage the services deployed in production.

Let’s first start with the challenges we, as developers and architects, experience in distributed application architectures.

Challenge 1: Siloed development

The example of a distributed system below illustrates the traditional application architecture challenges before the emergence of services-based architecture and cloud platforms.

Figure 1. Siloed development example

There are two client applications (App 1 and App 2) and three business services (Service 1, Service 2, and Service 3).

In this example, App 1 talks to Service 1 and Service 2. App 2 only talks to Service 2. Service 2 also talks to Service 3.

Let’s examine App 1 and identify what’s happening inside this application.

It is an example of how applications were typically developed and deployed before microservices came along, and this design is still used in some applications today.

Figure 2. Application and service functionality under the hood

The sample application above includes critical business logic mixed with non-business logic tasks, all embedded inside the application code:

  • Non-functional: authentication, authorization, notification, etc.
  • Common platform tasks: service routing, service discovery, service retry/circuit-breaker, tracing, etc.

The non-functional requirements are typically hardcoded into the application and entangled with the business logic, despite being common concerns of multiple applications.

This example shows how much real business code a typical application contains in comparison to non-functional tasks that should be managed outside the application and leveraged by multiple business apps.

Only one out of eight functions holds app-specific logic that belongs inside the application. The other seven functions should not be coupled with the application logic.

Digging deep into other applications will likely reveal the same trend.
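The entanglement described above can be sketched in code. The following is a hypothetical illustration of the siloed design in Figure 2, with trivial stubs standing in for real auth, tracing, discovery, and notification frameworks; all names are invented for the example:

```python
# Illustrative sketch: one out of eight steps is app-specific business logic;
# the rest are cross-cutting concerns hardcoded into the application.
def authenticate(user):       return user.get("token") is not None    # non-functional
def authorize(user, action):  return action in user.get("roles", [])  # non-functional
def start_trace(name):        pass                                    # platform task
def end_trace():              pass                                    # platform task
def discover(service):        return f"http://{service}.internal"     # platform task
def call_with_retry(url, payload, attempts=3):                        # platform task
    return 10.0  # stub: pretend the pricing service returned a unit price
def notify(user, message):    pass                                    # non-functional

def place_order(user, order):
    if not authenticate(user):
        raise PermissionError("login required")
    if not authorize(user, "order:create"):
        raise PermissionError("forbidden")
    start_trace("place_order")
    price = call_with_retry(discover("pricing"), order)
    total = price * order["quantity"]   # the only app-specific business logic
    notify(user, f"order total: {total}")
    end_trace()
    return total

print(place_order({"token": "t", "roles": ["order:create"]}, {"quantity": 3}))  # 30.0
```

Every other application and service in Figure 3 repeats the same seven surrounding steps around its own one line of business logic.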

A look inside Services 1, 2, and 3 reveals a similar pattern. Each service has its own business logic mixed up with the same non-functional and platform tasks we saw earlier, which should not be part of the service. We see common non-functional services like authentication, authorization, customer notifications, etc. There are also some platform services that are common to all business applications hosted on the cloud platform, like routing, service discovery, and service monitoring and tracing.

Our simple example of a distributed system with two applications and three services doesn’t look as simple when we look inside each of these components, as shown below.

Figure 3. Distributed systems example with common functionality duplicated in each app and service

All applications and services include all the non-functional code inside them. There are plenty of disadvantages with this type of design.

There is a lot of duplicate implementation and proliferation of the same functionality in each application and service, resulting in longer application development (time to market) and exponentially higher maintenance costs.

With all these common functions embedded inside each app and service, every application is tightly coupled to the specific technologies and frameworks used for those functions, for example, Spring Cloud Gateway for routing and Zipkin or Jaeger for tracing. Any upgrade to the underlying technologies requires every application and service to be modified, rebuilt, and redeployed, causing downtime and outages for users.

These challenges make distributed systems complex. Such applications need to be redesigned and refactored to avoid siloed development and the proliferation of one-off solutions.

As networks become more stable and reliable, the "in-process" calls, as shown in Figure 3, can start to transition to "over-the-network" communication.

These complex systems can be redesigned to take advantage of common functions, without embedding those functions in each individual application and service, using the design pattern called "common services."

Common services

The remedy for embedded code in each application is to encapsulate each common functionality in its own service and host that service on a central server (in the case of VMs) or in containers in the cloud.

Figure 4. Common functionality encapsulated in independent microservices

The client applications would call these remote services when the apps need to execute common functionality. As we can see in Figure 4, the common logic is no longer embedded in each application or business service.

Common services should be stateless and ideally developed or refactored following Twelve-Factor App best practices, so that they can be reused across consumer applications.

We can use open-source frameworks to develop these common services, like Spring Boot and Spring Cloud for Java applications and ASP.NET Core middleware for .NET-based applications. Since these services are ideally Twelve-Factor-based applications, it's easier to deploy them to any cloud platform. It's also easier to manage and monitor them in a production environment.
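As a hedged sketch of what one such common service might look like, the following hypothetical notification handler is stateless and follows the Twelve-Factor practice of taking configuration from the environment, so any instance can serve any request; all names and fields are illustrative, and a real implementation might use Spring Boot or ASP.NET Core as noted above:

```python
import os

def handle_notify(request: dict) -> dict:
    """Stateless request handler for a hypothetical common notification
    service: everything it needs arrives in the request or the environment,
    so instances can be scaled out or replaced freely."""
    sender = os.environ.get("NOTIFY_SENDER", "noreply@example.com")  # config from env
    return {
        "status": "queued",
        "from": sender,
        "to": request["to"],
        "body": request["message"],
    }

print(handle_notify({"to": "user@example.com", "message": "order shipped"}))
```

Because the handler keeps no state between requests, the cloud platform can scale it horizontally without session affinity.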

There are several benefits to this architecture:

  • Faster development and delivery timelines for application teams means faster time to market.
  • Individual deployments are isolated from other applications and services so there’s less dependency among components.
  • Scalability at each service level.
  • Automatic compliance with security and technology standards.
  • Smaller, quicker, and simpler ongoing maintenance eliminates the need to rebuild and redeploy all applications and services when shared functionality is upgraded.
  • Less technical debt in the long run.

These common services can be hosted on a cloud platform (like AWS, Azure, Kubernetes, or VMware Tanzu Application Service, formerly known as Pivotal Cloud Foundry) that provides good support for capabilities like auto-scaling, monitoring, and ease of deployment.

However, common services are only the transitional architecture in the cloud-native journey, not the goal. Even though the architectures based on common microservices offer several advantages, they also come with some new challenges, including tightly coupled service interactions and lack of centralized policy enforcement in terms of routing, discovery, and circuit-breaker policies.

We’ll examine these new challenges later in the article.

With the adoption and expansion of microservices in organizations, the communication between client applications and business services as well as the interactions among services become critical. Failure to address this communication complexity can result in poor performance of the services and disruptions in the system availability.

Let’s now look in more detail at the communication challenges among apps and services.

Challenge 2: App/service and service/service communication

As we break down large applications into fine-grained services, the overall number of deployed components increases, making the interaction between these components more and more complex. Figure 5 illustrates this complexity.

Figure 5. Application-to-service communication challenges

Even with the common functionality abstracted out of each client application and deployed as individual services, the interdependencies of the services and how the services are allowed to call each other pose a major threat to the architecture.

Let’s extend the previous example to include a few more business services and examine the effects on service communication.

As we can see in the service-to-service communication in Figure 5, there is still tight coupling between the different services.  It's also difficult to pinpoint which service is having issues when the whole system is running slow or is suffering an outage.

We might wonder whether other companies that have adopted microservices and this complex interaction model have faced the same challenges. They have. Companies like Netflix, Amazon, and Twitter experienced these challenges when any application or service could call another service without an effective service-communication model or governance process. As noted in Part 2 of this article series, this architecture challenge is so prevalent that the industry has defined it as an anti-pattern called "Death Star architecture."

These companies overcame this architecture challenge with service orchestration.

Service orchestration

Figure 6. Service orchestration improved service-to-service interaction

In the model of service orchestration in Figure 6, we still manage the common services outside of the client applications like in the previous architecture, with their own individual deployments, lifecycles, and scalability needs.

The major improvement is that we’ve moved the routing service to be in front of all the common services.

Client applications should call only the routing service. Depending on the use case and context of the request coming from the client applications, the routing service calls one or more common services and application services, in a predefined order.

This architecture offers many benefits:

  • The client applications and common services are loosely coupled, which allows for flexible traffic management and centralized policy enforcement.
  • The policies can be security related, like authentication and authorization, or SLA related, like service retry attempts and circuit-breaker rules, or observability and monitoring related.
  • This architecture provides end-to-end monitoring of the system.

This architecture also offers a lot of flexibility in terms of how different parts of the architecture, whether the client application, common services in the backend, or the router itself, interact:

  • Client applications can be web applications, mobile apps, IoT devices, or other services.
  • Backend services can be monolithic apps, microservices, or serverless functions.
  • The routing service can be used for different capabilities like routing/splitting and canary deployments with zero downtime for production applications.
  • The communication between the client and the service can be transactional and synchronous using a request/response mechanism or it can be based on asynchronous publish/subscribe messaging.
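The orchestration model in Figure 6 can be sketched as follows. This is a hypothetical illustration, with stub functions standing in for the common and application services, showing the two key ideas: the order of calls for each use case lives in the router, not in the clients, and policies (here, a simple retry) are enforced centrally:

```python
# Stubs standing in for common services and one application service.
def authenticate(ctx): ctx["user"] = "alice"; return ctx       # common service
def check_quota(ctx):  ctx["quota_ok"] = True; return ctx      # common service
def create_order(ctx): ctx["order_id"] = 42; return ctx        # application service
def notify(ctx):       ctx["notified"] = True; return ctx      # common service

# Predefined orchestration per use case: clients never call services directly.
USE_CASES = {
    "place_order": [authenticate, check_quota, create_order, notify],
}

def route(use_case, request, retries=2):
    """The routing service: invokes each service in the predefined order,
    applying a centrally defined retry policy to every call."""
    ctx = dict(request)
    for step in USE_CASES[use_case]:
        for attempt in range(retries + 1):
            try:
                ctx = step(ctx)
                break
            except ConnectionError:
                if attempt == retries:
                    raise
    return ctx

result = route("place_order", {"item": "book"})
```

Changing the sequence of services, or the retry policy, means editing only the router; no client application needs to be rebuilt.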

In a service-orchestration architecture, the consumer application teams only need to focus on the user-interface screens and any application-specific services, insulating their business logic and IP from technology volatility and from common foundational capabilities that are not application specific. All the common services -- whether business, non-functional, or platform services -- are hosted on the cloud platform and called by the routing service, which acts as the service orchestrator.

As the diagram in Figure 7 shows, all the technologies and frameworks used in common services are completely abstracted from the consumer applications.

Technology abstraction

Figure 7. Service orchestrator-based solutions abstract technologies from client application

Let’s explore how service orchestration abstracts the technologies from client applications.

With a centralized service orchestrator as shown in Figure 7, client applications do not need to be aware of any of these technologies. Also, any of these technologies can be upgraded without impacting the client apps.

As with the services-based architectures we’ve discussed so far, the service-orchestration architecture, as good as it looks, still has some challenges:

  • The routing service can become a single point of failure.
  • There is some performance overhead, as the routing service needs to call each service involved in the use case over the network.
  • There is no native invocation of the services.
  • There is no decentralized policy enforcement.

With the challenges of the three architectures we’ve discussed so far -- traditional distributed systems, microservices-based architectures, and service-orchestration-based applications -- in mind, let’s now discuss a couple of emerging cloud-native design patterns: service mesh and sidecar.

Service mesh and sidecar

The final architecture model we’ll discuss in this article is based on the service mesh and sidecar design patterns, and it's applicable to the Kubernetes platform, which supports the sidecar capability out of the box.

In this architecture illustrated in Figure 8, we still have a central component, called the "control plane", to define and manage different policies, just as in the service-orchestration solution.

Sidecar containers, which are part of what’s called the "data plane", are automatically injected into the business services at run time. These sidecar proxies enforce the policies that are defined in the control plane and replicated to the data plane.

Service-mesh-based solutions can help a distributed-systems architecture improve its security, observability, and traffic-management capabilities.

Service-mesh solutions are built on centralized policy management and administration combined with decentralized policy execution and enforcement (the best of both worlds).

Figure 8. Service mesh and sidecars for Kubernetes-hosted apps
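The control-plane/data-plane split above can be sketched in code. The following is a hedged, hypothetical illustration (all class names, policy fields, and services are invented): the control plane holds the policy definitions, and each service's sidecar proxy enforces a locally replicated copy of those policies on every call:

```python
class ControlPlane:
    """Central place where policies are defined and administered."""
    def __init__(self):
        self.policies = {"max_retries": 2, "require_mtls": True}
    def push(self):
        # Replicate the current policies out to the sidecars (data plane).
        return dict(self.policies)

class Sidecar:
    """Proxy injected next to a business service; enforces policies locally."""
    def __init__(self, service, policies):
        self.service = service       # the business container it fronts
        self.policies = policies     # locally replicated policy copy
    def call(self, request):
        if self.policies["require_mtls"] and not request.get("mtls"):
            raise PermissionError("mTLS required")          # security policy
        for attempt in range(self.policies["max_retries"] + 1):
            try:
                return self.service(request)                # retry policy
            except ConnectionError:
                if attempt == self.policies["max_retries"]:
                    raise

def inventory_service(request):
    return {"in_stock": True}   # stub business service

cp = ControlPlane()
proxy = Sidecar(inventory_service, cp.push())  # sidecar injected at run time
print(proxy.call({"mtls": True}))
```

The business service itself contains no policy code; enforcement happens in the sidecar, while the policy definitions stay centrally managed in the control plane.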

Service-mesh capabilities

Service-mesh-based solutions offer several advantages compared to traditional microservices architectures with respect to connectivity, reliability, security, and observability, as listed below.


Connectivity

  • Traffic control (routing, splitting)
  • Gateway (ingress, egress)
  • Service discovery
  • A/B testing, canary
  • Service timeouts, retries

Reliability

  • Circuit-breaker
  • Fault injection/chaos testing

Security

  • Service-to-service authentication (mTLS)
  • Certificate management
  • User authentication (JSON Web Tokens)
  • User authorization (role-based access control)
  • Encryption

Observability

  • Monitoring
  • Telemetry, instrumentation, metrics
  • Distributed tracing
  • Service graph
Service-mesh technologies have attracted a lot of attention over the last few years and there are several implementations like Istio, Linkerd, Consul Connect, etc. Since the focus of this article is to discuss the architecture patterns behind successful microservices-based architectures, we won’t dive into the details of service-mesh features and implementations.

If you are interested in learning more about service-mesh technologies, check out The InfoQ eMag: Service Mesh Ultimate Guide.


Conclusion

There are different ways to implement the interaction and communication among microservices. Service orchestration can be managed with an API gateway as the core architectural component. If we need capabilities beyond those an API gateway offers, we can use a service mesh and sidecars for those additional cloud-native architecture requirements.

It’s important to design the interaction between different layers in cloud-native application architecture, including how to model the data, services, and events as first-class citizens in the modeling effort.

In Part 4 of this article series, we’ll discuss the final piece of cloud-native architecture adoption: cloud-native DevOps. We’ll look at how DevOps practices like CI/CD, containerization, and the Kubernetes cloud platform, along with microservices and service-orchestration patterns, can help organizations with cloud adoption.


About the Authors

Srini Penchikala is a senior IT architect for Global Manufacturing IT at General Motors in Austin, Texas. He has over 25 years of experience in software architecture, design, and development, and has a current focus on cloud-native architectures, microservices and service mesh, cloud data pipelines, and continuous delivery. Penchikala is the co-creator and lead architect in implementing an enterprise cloud-native service-mesh solution in the organization.  Penchikala wrote Big-Data Processing with Apache Spark and co-wrote Spring Roo in Action, from Manning. He is a frequent conference speaker, is a big-data trainer, and has published several articles on various technical websites.

Marcio Esteves is the director of applications development for Tokyo Marine HCC in Houston, Texas, where he leads solution architecture, QA, and development teams that collaborate across corporate and business IT to drive adoption of common technologies with a focus on revenue-generating, globally deployed, cloud-based systems. Previously, Esteves was chief architect for General Motors IT Global Manufacturing, leading architects and cloud-native engineers responsible for digital-transformation-leveraging technologies such as machine learning, big data, IoT, and AI/cloud-first microservices architectures. Esteves developed the vision and strategy and led the implementation of an enterprise cloud-native service-mesh solution at GM with auto-scalable microservices used by several critical business applications. He also serves as board technical advisor for VertifyData in downtown Austin.

