
Lessons Learned Working with Distributed Systems

by Jan Stenberg on Aug 13, 2015. Estimated reading time: 1 minute

When working with distributed systems, problems like partial failure are going to happen. Preparing for these problems and other challenges, rather than just hoping they will not happen, is the best thing you can do, Vaughn Vernon explains in a conversation with InfoQ. He refers to a blog post by Jeff Hodges, noting its down-to-earth approach and practical advice, which targets developers less experienced with distributed systems.

For Vernon, author of Implementing Domain-Driven Design and the new Reactive Messaging Patterns with the Actor Model, two of the best recommendations by Hodges are designing for partial availability and using capped exponential backoff to restore full operation when dependencies become unavailable. It's the best you can hope to do when failure strikes, and it will strike, Vernon notes.
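
Capped exponential backoff can be sketched in a few lines. The following is a minimal illustration, not code from Hodges' post: the names `backoff_delays` and `call_with_backoff`, the default values, and the choice of `ConnectionError` as the retryable failure are all assumptions made for the example.

```python
import time


def backoff_delays(base=0.1, cap=5.0, attempts=6):
    """Yield one delay per attempt: base * 2**n, never larger than `cap`."""
    for attempt in range(attempts):
        yield min(cap, base * (2 ** attempt))


def call_with_backoff(operation, base=0.1, cap=5.0, attempts=6, sleep=time.sleep):
    """Call `operation` until it succeeds or `attempts` tries have failed.

    `operation` is a hypothetical zero-argument callable that raises
    ConnectionError while the dependency is unavailable.
    """
    delays = list(backoff_delays(base, cap, attempts))
    for attempt, delay in enumerate(delays):
        try:
            return operation()
        except ConnectionError:
            if attempt == len(delays) - 1:
                raise  # out of attempts: surface the last failure
            sleep(delay)  # wait longer after each failure, up to the cap
```

The cap keeps the wait between retries bounded, so a client resumes normal operation reasonably soon after the dependency returns. Production implementations often also add random jitter to the delay so that many clients do not retry in lockstep; that is omitted here to keep the sketch short.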

Hodges has found that new developers often think latency is what makes distributed computing hard, but for him the key differentiating factor is the higher probability of failure, especially partial failure. He therefore recommends finding ways to be partially available. He uses a well-designed search system as an example: when a search times out, the results gathered up to that point should be returned, thus increasing the system's resilience.
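
The search example might look roughly like the sketch below. It assumes a search that fans out to several shards, each represented by a hypothetical callable taking a query and returning a list of hits; `search_with_deadline` and its return shape are illustrative choices, not something described in the post.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def search_with_deadline(shards, query, timeout):
    """Query all shards in parallel and return whatever arrived in time.

    Returns (hits, complete). `complete` is False when at least one
    shard timed out or failed, so the caller can mark results as partial.
    Assumes `shards` is a non-empty list of callables.
    """
    pool = ThreadPoolExecutor(max_workers=len(shards))
    futures = [pool.submit(shard, query) for shard in shards]
    done, _not_done = wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block the caller on slow shards

    hits, complete = [], True
    for future in futures:  # iterate in shard order for stable output
        if future in done and future.exception() is None:
            hits.extend(future.result())
        else:
            complete = False  # timed out or raised: skip this shard
    return hits, complete
```

Returning the `complete` flag alongside the hits lets the caller decide whether to tell the user that results may be incomplete, rather than failing the whole request because one shard was slow.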

For Hodges, one of the basic building blocks when creating robust systems is a backpressure mechanism, where a serving system signals failure back to the requesting system to prevent overloading. Common ways of implementing this include dropping messages or returning errors before handling a request that is likely to fail.
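
One simple way to return errors before handling a request is a bounded queue that rejects new work when it is full. The sketch below is an assumption-laden illustration: the `BoundedServer` class, the `Overloaded` exception, and the queue limit are invented for the example and do not come from Hodges' post.

```python
import queue


class Overloaded(Exception):
    """Raised to tell the requester to slow down and retry later."""


class BoundedServer:
    """Accepts at most `max_pending` queued requests, rejecting the rest."""

    def __init__(self, max_pending):
        self._pending = queue.Queue(maxsize=max_pending)

    def submit(self, request):
        try:
            # Fail fast instead of queueing work that would likely time out.
            self._pending.put_nowait(request)
        except queue.Full:
            raise Overloaded("too many pending requests") from None

    def process_one(self):
        """Handle the oldest pending request, freeing one queue slot."""
        request = self._pending.get_nowait()
        return f"handled {request}"
```

A client that receives `Overloaded` knows immediately that the server is saturated and can back off, instead of waiting for a request that was never likely to complete.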

Hodges advises against coordination between servers as much as possible. Instead, he prefers independent servers that keep communication to a minimum. Whenever two servers have to agree on something, the service becomes harder to implement.

For Hodges, finding higher-level business logic that can be extracted into services has several benefits. An extracted service provides increased encapsulation and allows for simpler and faster deployment of code changes. He also thinks that with multiple clients, the coordination cost of using a service is lower than that of a shared library, which requires a coordinated deploy to all clients.

Hodges also describes several other lessons he has learned during his career, including using feature flags for rolling out infrastructure and factors to consider when choosing an identity space for a system.
