
Buoyant Releases Version 1.0 of Their Service Mesh, Linkerd


Cloud-native software company Buoyant has released version 1.0 of Linkerd (pronounced “linker-DEE”), their open source “service mesh” project for cloud-native microservice-based applications. Buoyant CTO Oliver Gould discussed the significance of the release:

A 1.0 release is a meaningful milestone for any open source project. In our case, it’s a recognition that we’ve hit a stable set of features that our users depend on to handle their most critical production traffic. It also signals a commitment to limiting breaking configuration changes moving forward.


It’s humbling that our little project has amassed such an amazing group of operators and developers. I’m continually stunned by the features and integrations coming out of the Linkerd community; and there’s simply nothing more satisfying than hearing how Linkerd is helping teams do their jobs with a little less fear and uncertainty.

Buoyant founder and CEO William Morgan added:

As Linkerd hits 1.0, we're proud to see companies like PayPal, Credit Karma, and Ticketmaster join the long list of adopters. High traffic, high reliability companies like these are using the Linkerd service mesh alongside infrastructure like Docker, Kubernetes, and the rest of the cloud native stack, as part of a broad effort to bring ever-greater levels of reliability and scalability to their applications.

Morgan defined the purpose of a service mesh and why it’s important to have one on a cloud native stack. Examples of how to get started with Linkerd running on Docker, Kubernetes, and a local machine can be found on GitHub.

Morgan gave an exclusive interview to InfoQ on the one-year anniversary of Linkerd last month, and has now spoken to InfoQ again on this milestone release.

InfoQ: Please remind our readers of the vision behind the service mesh. At what point does an organization need this new abstraction to manage communication between services?

William Morgan: A service mesh is an infrastructure layer for handling service-to-service communication. It’s a point in the stack for adding reliability and security to multi-service applications, and it gives you a uniform layer of visibility and control across your entire app.


If you’re operating monolithic software, or have an architecture where there are a limited number of hops and they’re well served by dedicated client libraries (like a three-tiered app) then you’re probably fine without this.


But if you’re building “cloud native” apps---which these days primarily means that you’re using Docker and Kubernetes and microservices---then this cross-service communication forms a critical part of the runtime behavior of your application. You can’t ignore it. You’ve got to monitor it and you’ve got to be able to manage it. You need a service mesh like Linkerd.

InfoQ: What are some of the most common types of failures between services that are specific to this more modern, and still relatively nascent, architecture of microservices and containers?

Morgan: Well, one of the “fun” parts of any distributed system is that there are many ways for a small, localized failure to escalate into a system-wide outage. The infamous “cascading failure.” For example, whenever there’s some kind of transient failure, you want to handle it by retrying the request. Of course, retrying adds more load to the system. And when you add lots of load to software, it starts slowing down, and eventually it starts failing---or it gets very slow, and that’s indistinguishable from failure. So the way that you handle failures can actually make things worse. One tiny part starts slowing down, and soon you’re increasing load, and then it falls over, and then everything around it starts to fall over as well.
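The retry-amplification dynamic Morgan describes is commonly countered with a retry budget: retries are permitted only while they stay under some fraction of recent request volume, so handling a transient failure cannot itself flood the system. A minimal sketch in Python (the class and parameter names are illustrative, not Linkerd's implementation):

```python
import time
from collections import deque

class RetryBudget:
    """Cap retries as a fraction of recent traffic so that retrying a
    transient failure cannot amplify it into a cascading one."""

    def __init__(self, percent_can_retry=0.2, ttl_secs=10.0):
        self.percent_can_retry = percent_can_retry  # extra load retries may add
        self.ttl_secs = ttl_secs                    # sliding-window length
        self.requests = deque()                     # timestamps of original requests
        self.retries = deque()                      # timestamps of granted retries

    def _expire(self, q, now):
        # Drop entries that have aged out of the sliding window.
        while q and now - q[0] > self.ttl_secs:
            q.popleft()

    def record_request(self, now=None):
        now = time.monotonic() if now is None else now
        self._expire(self.requests, now)
        self.requests.append(now)

    def can_retry(self, now=None):
        now = time.monotonic() if now is None else now
        self._expire(self.requests, now)
        self._expire(self.retries, now)
        # Grant the retry only while retries stay under the budgeted
        # fraction of recent request volume; otherwise shed it.
        if len(self.retries) < self.percent_can_retry * len(self.requests):
            self.retries.append(now)
            return True
        return False
```

With a 20% budget, ten recent requests buy at most two retries in the window; any further failures are surfaced to the caller instead of being retried, which is exactly the "failing the right way" behavior described below.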


Linkerd is designed to handle these situations. It’s very good about shedding load, about failing the right way, and about being able to retry in a way that doesn’t escalate the problem. There’s other stuff it does too, like security and optimization and instrumentation, but the reliability angle is what makes it critical.
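In Linkerd this kind of bounded retrying is configured declaratively, as a retry budget on a router. A sketch of what that might look like, assuming the 1.x YAML schema (field names and values here are illustrative and should be checked against the official configuration reference):

```
routers:
- protocol: http
  service:
    retries:
      budget:
        minRetriesPerSec: 5    # small floor of retries always allowed
        percentCanRetry: 0.2   # retries may add at most ~20% extra load
        ttlSecs: 15            # window over which request volume is measured
  servers:
  - port: 4140
    ip: 0.0.0.0
```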

InfoQ: What are the new additions to Linkerd in the 1.0 release? What are some of the coolest new features and functionality?

Morgan: The big thing we added in Linkerd 1.0 is the service mesh API. The whole point of the service mesh is not just to deploy a bunch of fancy internal-facing proxies everywhere and say, “hey, we have reliability now.” The point is to take the communication between services, and move it out of the realm of the invisible, implied infrastructure, and into the role of a first-class member of the ecosystem. Again, you want this critical layer to be monitored, managed, and controlled. The service mesh API is a fundamental part of how you accomplish that.


For 1.0 we also continued to expand and harden the many integrations we have with the surrounding ecosystem---things like Kubernetes, gRPC, Mesos, Consul, Prometheus, and Zipkin. The service mesh is a glue layer, and many Linkerd users are using it to tie Kubernetes into their existing infrastructure, or to gracefully migrate services between architectures.

InfoQ: Version 1.0 is obviously a huge milestone in terms of stability and "readiness." Looking forward, what do you think are some of the feature areas that are most intriguing to Linkerd users, in terms of additional desired capabilities?

Morgan: There’s a hugely exciting roadmap ahead of us. So far we’ve focused mainly on reliability features, and as a result we have these two amazing building blocks---top-of-the-line service metrics, and the ability to route traffic in arbitrary ways---that can be combined to do some very powerful things, like failing over between datacenters in a principled way, or multi-cloud / hybrid cloud, or auto-scaling based on latency. We also want to get into things like policy enforcement, and serverless, and metering and multi-tenant accounting… the list goes on and on.
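The "route traffic in arbitrary ways" building block Morgan mentions is expressed in Linkerd through delegation tables (dtabs), which rewrite logical service names into concrete destinations. As a hedged illustration (the service names are hypothetical, and the exact namer paths depend on the deployment), a weighted dtab rule could shift a slice of traffic to a new version:

```
/svc        => /#/io.l5d.k8s/default/http;              # resolve via the Kubernetes namer
/svc/users  => 9 * /svc/users-v1 & 1 * /svc/users-v2;   # send ~10% of traffic to v2
```

Because these rules are data rather than code, the same mechanism can drive datacenter failover or a gradual migration simply by changing the weights.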


Finally, stability and performance and resource consumption are always top-of-mind. We have a huge track of work that’s dedicated solely to making Linkerd faster, smaller, and lighter weight, especially when it comes to things like TLS.

InfoQ: Linkerd is in the Cloud Native Computing Foundation - in general, how do you see cloud-native continuing to grow as an enterprise trend? What are the conditions that are driving enterprises to build applications as cloud-native, and what can we expect to see over the next year or two in terms of how this impacts developers?

Morgan: Cloud native adoption is growing like crazy and the pace is still increasing every day. Just look at the rate of adoption of Docker or of Kubernetes. I think it’s inevitable for enterprise, because it’s linked to cloud adoption.


Cloud native is really just the name for the pattern you take when you write software for the cloud. When you’re in the cloud you have to face the fact that all the reliability guarantees you used to have about your dedicated hardware are gone. You have no control over the hardware, and random failures and resource contention from other tenants mean everything is going wrong all the time. So if you want your application to be reliable and to be scalable, it’s all got to happen at the level of software. The way you build your software so that it’s reliable and scalable in an environment where the foundations you’re building on are not reliable at all---that’s what cloud native is about.
