Focus on the Process, Not on Individual Microservices
The key to success when working with a microservices-based distributed system is to focus on the distributed process as a whole, not on the microservices themselves. The services are the least important part, Eric Ess claimed at the recent Microservices Conference in London, in his presentation on how to monitor distributed processes at jet.com.
Ess, Director of Engineering, explains that at Jet a process initiated by a user needs at least a few microservices to complete, and is therefore called a distributed process. He notes that this is a key term for them when looking at how their system executes user requests.
Jet has about 800 microservices in production today, giving a very complex communication topology. Because of this complexity, it's infeasible for any team to know what's happening outside of its own scope, and impossible for any individual to fully understand the system architecture. Even so, when problems occur in production it's essential to know exactly what the root cause is, and in which service it originates.
To overcome this challenge, there are two key things they want to accomplish:
- Know how a single process is behaving, what microservices it has passed through and what it's currently doing. Basically, be able to follow different types of processes as they move through the system via microservices interacting with each other.
- Validate processes by defining the expected workflow for a given process and then checking that it follows that path when executed. Ess notes that even though a process is not generating any errors, it can still be behaving incorrectly. One example is a bug in A/B testing that routes a process the wrong way, causing a flaw in the testing data.
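The second goal can be illustrated with a minimal Python sketch. It is not Jet's implementation (their tooling is written in F# and was not shown in detail); the service names, the `EXPECTED_WORKFLOW` table and the `validate_path` function are all hypothetical. The idea is to declare which service may follow which, then check the path a process actually took against that declaration, so that a misrouted process is flagged even though no service reported an error.

```python
from typing import Dict, List, Set

# Hypothetical expected workflow for one process type: each service maps to
# the set of services that may legitimately follow it.
EXPECTED_WORKFLOW: Dict[str, Set[str]] = {
    "cart": {"pricing"},
    "pricing": {"payment"},
    "payment": {"fulfilment"},
    "fulfilment": set(),  # terminal step, nothing may follow
}


def validate_path(path: List[str], workflow: Dict[str, Set[str]], start: str) -> List[str]:
    """Return human-readable violations; an empty list means the path is valid so far."""
    if not path:
        return ["process has not visited any service"]
    violations = []
    if path[0] != start:
        violations.append(f"started at {path[0]!r}, expected {start!r}")
    # Check every consecutive pair of services the process passed through.
    for current, nxt in zip(path, path[1:]):
        allowed = workflow.get(current)
        if allowed is None:
            violations.append(f"unknown service {current!r}")
        elif nxt not in allowed:
            violations.append(f"unexpected hop {current!r} -> {nxt!r}")
    return violations
```

A process that is still in flight and has followed the declared route so far yields no violations, while one that skips the pricing step is reported as an unexpected hop from `cart` to `payment`, even though neither service raised an error.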
Ess notes that by focusing on the distributed process as a whole, not on the microservices themselves, they can ignore the services; each service is just a means of moving the process on to the next service, one step closer to completion. The current state of the process and what is happening to it is what they care about.
This requires an altered mindset, with engineers focusing on the behaviour of a process within the system, not on a microservice and how it should behave when receiving a message. A team is not building individual microservices, but microservices that interact with the other services around them.
There are a lot of tools available to evaluate microservices or a system, but not to evaluate a process, or its behaviour, as it's being executed. In addition, Jet is using F#, and since it's hard to find suitable tools targeting F#, they have created their own toolbox.
To provide a view of the running system and its processes, they have created a communication protocol (Dr Orpheus) that defines a set of header metadata included in every message, together with rules for what a microservice must do when it receives a message carrying that metadata. They are also building a telemetry processing and data streaming engine (XRay) that performs basic complex event processing (CEP) on the data every microservice emits as it processes messages. Engineers and business people can now supervise all processes and react when one is misbehaving in any way, for example by not following the predefined flow, progressing too slowly or blocking in some service.
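The combination of message metadata and emitted telemetry can be sketched roughly as follows. The talk did not spell out the actual Dr Orpheus header fields or the XRay event format, so everything in this Python example is an assumption made for illustration: the `ProcessHeader` and `TelemetryEvent` fields, the rule that a service emits one event per received message, and the `stalled_processes` check.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class ProcessHeader:
    """Hypothetical metadata carried by every message (not the real protocol's fields)."""
    process_id: str
    process_type: str
    hop: int = 0


@dataclass(frozen=True)
class TelemetryEvent:
    """Hypothetical event a service emits each time it handles a message."""
    process_id: str
    process_type: str
    service: str
    hop: int
    timestamp: float


def new_process(process_type: str) -> ProcessHeader:
    """Start a distributed process with a fresh identifier."""
    return ProcessHeader(process_id=str(uuid.uuid4()), process_type=process_type)


def receive(header: ProcessHeader, service: str, now: float,
            sink: List[TelemetryEvent]) -> ProcessHeader:
    """Assumed rule: emit one event on receipt, then return the header for outgoing messages."""
    sink.append(TelemetryEvent(header.process_id, header.process_type, service, header.hop, now))
    return replace(header, hop=header.hop + 1)


def stalled_processes(events: List[TelemetryEvent], now: float,
                      max_idle_seconds: float, terminal_service: str) -> Dict[str, str]:
    """Map process_id to last-seen service for processes idle too long before finishing."""
    last: Dict[str, TelemetryEvent] = {}
    for event in events:
        seen = last.get(event.process_id)
        if seen is None or event.hop >= seen.hop:
            last[event.process_id] = event
    return {
        pid: event.service
        for pid, event in last.items()
        if event.service != terminal_service and now - event.timestamp > max_idle_seconds
    }
```

Because every event carries the same process identifier, a monitor can reconstruct where each process currently is without knowing anything about the internals of the services, which is the shift in focus Ess describes.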
Next year’s Microservices Conference in London is scheduled for November 6-7, 2017.