BT

New Early adopter or innovator? InfoQ has been working on some new features for you. Learn more

Distributed Systems Theory for Practical Engineers

| by Andrew Morgan on Jun 12, 2017. Estimated reading time: 2 minutes |

Alvaro Videla, distributed systems engineer and co-author of RabbitMQ in Action, reviewed distributed systems theory at QCon London 2017. This involved breaking distributed systems into different classifications and then discussing the trade-offs between each of them. These included timing models, failure modes and more.

Videla introduced various timing models and whether or not a distributed system knows how long its steps will take. He lists three categories, not to be confused with their alternative meanings in concurrent programming:

  • Synchronous: This is a distributed system in which the timing of each step is known. Although something which would help with problems such as failure detection, it is not reflective of real systems.
  • Asynchronous: A distributed system which takes steps in any order, with no guarantee of the timing of each step. This is more in line with a real system, although a real system is likely to impose some sort of timeout.
  • Semi-synchronous: A distributed system with at least some timing information, so it is able to estimate each step.

Videla also elaborates on the types of interprocess communication. This is simple, and a binary choice of communicating via message passing, or by shared memory.

The final classification given by Videla is the failure mode, which determines what kind of process failures are assumed. They are as follows:

  • Crash-stop: When a process crashes, it never recovers. This is not reflective of the real world; when a machine fails it would not be disposed of, and would instead be re-used.
  • Crash-recovery: When a process fails it can recover, usually by making use of sort of recovery algorithm. This may involve reading from a database or communicating with other processes.
  • Omission Faults: Where processes fail to send or receive messages. The example given by Videla is a cache which can receive messages from the group but not reply to them. It would still be useful to a client, as it is still up to date with the latest data.
  • Arbitrary Failure Modes: When processes start to send or receive the incorrect information, the outcome of which could be the ability to invalidate the state of the system.

Videla stresses that when deciding between these various models and categories, there is no silver bullet. It’s simply a tradeoff that must be made based on the requirements of the system under development.

It’s also worth noting that while some of these options are not reflective of real-world systems, they tend to be useful in distributed systems theory. This is because they are easier to work with when trying to prove new algorithms in combination with them. Also, if something does not work with the simple model, it tends to be implicit that it will not work with the more complex models too.

Videla also explains failure detectors, which are algorithms which are used to detect whether another process has died. The main problem with these is knowing the difference between a failed process, and a process which has taken a long time. The "Eventually Perfect Failure Detector" was given as a workaround, which only suspects a process to have failed after a given period of time. This means that if later on the process is found to be alive, it can be removed from the suspect list, and its timeout extended.

The full video is available to watch online, and Videla has also written up his talk on an article. Both touch on additional concepts such as quorum and consensus, and also give advice on further reading.

Rate this Article

Adoption Stage
Style

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Tell us what you think

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread
Community comments

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Discuss

Educational Content

Login to InfoQ to interact with what matters most to you.


Recover your password...

Follow

Follow your favorite topics and editors

Quick overview of most important highlights in the industry and on the site.

Like

More signal, less noise

Build your own feed by choosing topics you want to read about and editors you want to hear from.

Notifications

Stay up-to-date

Set up your notifications and dont miss out on content that matters to you

BT