QCon New York 2017: The Ordering of Events in Systems

Kavya Joshi, software engineer at Samsara, explored in detail the happens-before principal at QCon New York 2017. She explained how the distributed key-value store, Riak, uses vector clocks to establish causality across nodes. She also looked at concurrency primitives in Go, explaining how they express happens-before constraints naturally.

Joshi explained that in modern software systems, computation is generally split across different nodes or threads in order to scale. This can ultimately lead to data races:

"A data race is when two threads concurrently access a shared memory location, and at least one access is a write"

Because data races tend to be non-deterministic, or have undefined consequences, Joshi explained how they can be particularly difficult to debug. She stated that key to dealing with them is to understand happens-before.

Essentially, happens before is a means to work out the ordering of events in a parallel system, such that X < Y if:

X and Y have happened on the same actor. This is because ordering within a single node or thread is guaranteed to be sequential.
They are a synchronisation pair. For example, if something is fetched from one node and updated on another, or is locked with a mutex.
Implicit from transitivity, for example, an intermediate event such as X < Z < Y can prove ordering.

If none of these criteria are met, then the events must be concurrent, meaning that there must be some sort of conflict resolution. Joshi then outlined three strategies for this to be:

Last write wins (The event with the highest timestamp). This is good if you have immutable data like a cache, but can otherwise lead to data loss.
Automatically merge the data.
Returns conflicts to the application to deal with it.

Looking at Riak, an eventually consistent distributed key-value store, Joshi showed how it makes use of a vector clock in order to establish happens before. This is essentially a logical clock stored in each node, which can be compared via a pair-wise max to its counterparts to determine whether events are concurrent or not. In order to be able to compare clocks, Riak clients pass them around via a causal context object.

Joshi also covered concurrency in Go, mainly focusing on channels compared to mutexes as a thread safe means of sharing data between goroutines. She was able to show how that by using channels, code could be simplified and be easier to reason about. She also highlighted that their wait-until-empty semantics removes the need for patterns such as wait-and-notify, explaining: "Channels allow, and force, the user to express happens-before constraints naturally".

To conclude, Joshi points out that although a key value store like Riak and a language like Go are different, they both take similar approaches to solving concurrency, namely data races and conflict resolution. She also points out that whilst happens-before is an old idea, formulated in 1978, it is an important principle which is still applicable to concurrent systems today.

InfoQ Software Architects' Newsletter

Write for InfoQ

Rate this Article

This content is in the Concurrency topic

Related Topics:

Related Editorial

Related Sponsors

Popular across InfoQ

The InfoQ Newsletter