Microservices for a Streaming World
Embrace decentralization, build service-based systems, and attack the problems that come with distributed state using stream processing tools, Ben Stopford urged in his presentation at the recent QCon London conference.
Stopford, who works with Kafka at Confluent, sees many good reasons for building service-based systems. These include loose coupling, bounded contexts and ease of scaling, all of which allow us to build systems that can evolve over time. But by taking this approach we are inherently also building distributed systems, which bring their own complexity, with issues around latency, failures and so on.
Two fundamental patterns of distributed systems Stopford describes are:
- Request–Response as a way to decouple services, typically using REST, which works well for UIs and when asking questions.
- Event-driven, characterized by asynchronous or “fire and forget” messaging, great for composing complex dependencies across services.
These can also be combined, using request-response for a REST interface and events for background processing.
Looking at asynchronous, event-based communication, for example using queues, Stopford sees a very simple model: as long as only one message is pulled at a time, the ordering of messages can be guaranteed. This can scale to a certain degree while still retaining the ordering guarantee, but Stopford notes that at some point we will lose either availability or the ordering guarantee. Another disadvantage he notes is that messages are transient, so there is no way to go back in time and read old messages after a failure.
Stopford believes that a better approach is using a distributed log as a service backbone, with Kafka being one example. Kafka is based on the concept of a log, which is an append-only data structure. This makes both reads and writes efficient: a read is a single seek to a position followed by sequential reads, and a write is just an append.
Some advantages that a distributed log enables for microservices include:
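To make the read and write paths concrete, here is a minimal in-memory sketch of an append-only log. It is a toy illustration, not Kafka's implementation; the class name `AppendOnlyLog` and its methods are hypothetical names chosen for this example.

```python
class AppendOnlyLog:
    """Toy in-memory log: records are only ever appended, never modified."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """A write is just an append; returns the offset of the new record."""
        self._records.append(record)
        return len(self._records) - 1

    def read_from(self, offset, max_records=None):
        """A read jumps to an offset once, then scans forward sequentially."""
        end = len(self._records) if max_records is None else offset + max_records
        return self._records[offset:end]


log = AppendOnlyLog()
for event in ["order-created", "order-paid", "order-shipped"]:
    log.append(event)

# Seek to offset 1, then read everything after it in order.
tail = log.read_from(1)
```

Because existing records are never changed, a reader only needs to remember a single number, its offset, to know where it is in the log.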
- Always on, relying on a fault-tolerant broker, like Kafka.
- Load balancing, with service instances each reading data from a broker.
- Fault-tolerant, since services can fail over but still retain ordering of messages.
- Rewind and replay, allowing for a service to return to old messages and replay them, e.g. after an error is discovered and fixed.
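The rewind-and-replay idea can be sketched with a consumer that tracks its own offset into a log. This is a simplified illustration that does not use the Kafka client API: a plain Python list stands in for a single log partition, and `ReplayingConsumer` is a hypothetical name.

```python
class ReplayingConsumer:
    """Toy consumer that remembers its position (offset) in a shared log."""

    def __init__(self, log, handler):
        self.log = log          # a list standing in for one log partition
        self.handler = handler  # function called once per message
        self.offset = 0         # next position to read

    def poll(self):
        """Process every message from the current offset to the end, in order."""
        processed = 0
        while self.offset < len(self.log):
            self.handler(self.log[self.offset])
            self.offset += 1
            processed += 1
        return processed

    def rewind(self, offset=0):
        """Move back to an earlier offset so old messages are read again."""
        self.offset = offset


log = [10, 20, 30]
results = []

# First pass with a buggy handler that doubles every amount.
consumer = ReplayingConsumer(log, lambda amount: results.append(amount * 2))
consumer.poll()

# The bug is found and fixed: discard the bad output, rewind, and replay.
results.clear()
consumer.handler = results.append
consumer.rewind(0)
replayed = consumer.poll()
```

Replay works here only because the log retains messages after they are read, which is the property transient queue messages lack.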
One problem not solved is keeping services consistent. After failures, for example, it is hard to avoid duplicate messages (“at least once” delivery), so services must be idempotent with respect to the messages they receive, which logically creates an “exactly once” delivery mechanism. Stopford notes that this is not yet available in Kafka, but that work is ongoing.
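One common way to make a service idempotent is to remember the identifiers of messages it has already applied and ignore repeats. The sketch below assumes every message carries a unique `id` field, which is an assumption of this example rather than something stated in the talk; `IdempotentAccount` is a hypothetical name.

```python
class IdempotentAccount:
    """Toy service that applies each message at most once, keyed by message id."""

    def __init__(self):
        self.balance = 0
        self._seen_ids = set()  # ids of messages already applied

    def handle(self, message):
        """Apply the message unless its id was seen before; returns True if applied."""
        msg_id = message["id"]
        if msg_id in self._seen_ids:
            return False  # duplicate delivery: safely ignored
        self._seen_ids.add(msg_id)
        self.balance += message["amount"]
        return True


account = IdempotentAccount()
deliveries = [
    {"id": "m1", "amount": 50},
    {"id": "m2", "amount": 25},
    {"id": "m1", "amount": 50},  # redelivered after a simulated failure
]
applied = [account.handle(m) for m in deliveries]
```

In a real service the set of seen identifiers would have to be stored durably and updated atomically with the state change, otherwise a crash between the two steps reintroduces the duplicate problem.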
Martin Kleppmann addressed the service consistency problem in his presentation at the same conference.