Mind Your State for Your State of Mind: Pat Helland at QCon SF

The features of different types of data storage should be considered when selecting how data is stored in an application or system. Is always reading correct data or having low latency the most important requirement? In his keynote at this year’s QCon San Francisco, Pat Helland described trends in storage and computing, durable and session state semantics, and other aspects of storage like transactions, identity and immutability.

Helland, currently a software architect at Salesforce and well known for his work on transaction systems, defines two types of state: durable state and session state. Durable state is remembered across requests and persisted across failures. It can be kept in databases, files, key-value stores, etc., and updated using single updates or transactions. An important property of a store is whether you can consistently "read your writes": after writing a value x, a subsequent read must return x. This can be a challenge when using weakly consistent storage or caching.
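
The following Python sketch illustrates why read-your-writes can fail on weakly consistent storage. The `ReplicatedStore` class is hypothetical: it models a primary copy and a replica that only receives writes when `replicate()` is called, standing in for asynchronous replication. It is a toy under those assumptions, not a description of any particular product.

```python
class ReplicatedStore:
    """Toy key-value store: writes go to a primary, replication to a replica is deferred."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes not yet applied to the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))

    def replicate(self):
        # Apply deferred writes in order, simulating asynchronous replication catching up.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def read_primary(self, key):
        return self.primary.get(key)

    def read_replica(self, key):
        return self.replica.get(key)


store = ReplicatedStore()
store.write("x", 1)
print(store.read_primary("x"))  # 1: the primary, where the write landed, returns it
print(store.read_replica("x"))  # None: the replica has not seen the write yet
store.replicate()
print(store.read_replica("x"))  # 1: once replication catches up, the write is visible
```

A client that writes and then reads from the replica breaks read-your-writes until replication completes; reading from the primary keeps the guarantee at the cost of sending every read to one place.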

Session state is not much talked about nowadays, but Helland thinks it should be, because it can be an important part of distributed systems. Here, data is remembered across requests during a session, but not across failures. Session state exists within the endpoints associated with a session, which makes it harder to use when multiple instances are involved. If two requests in a session hit different instances, the session state built up by the first request is not available to the second.
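
A minimal Python sketch of the multi-instance problem, assuming a hypothetical `Instance` class that keeps a cart-like session in its own memory and a naive round-robin load balancer built with `itertools.cycle`. Neither is taken from the talk; they only illustrate the failure mode.

```python
import itertools


class Instance:
    """A service instance that keeps session state in its own memory (lost if it fails)."""

    def __init__(self, name):
        self.name = name
        self.sessions = {}  # session_id -> list of items added so far

    def handle(self, session_id, item):
        cart = self.sessions.setdefault(session_id, [])
        cart.append(item)
        return list(cart)


instances = [Instance("a"), Instance("b")]
round_robin = itertools.cycle(instances)  # naive load balancer with no session affinity

print(next(round_robin).handle("s1", "book"))  # ['book'] on instance a
print(next(round_robin).handle("s1", "pen"))   # ['pen'] on instance b: 'book' is missing
```

The session state is correct on each instance individually; the problem is only that the second request landed where the first request's state does not exist.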

How data is represented and stored is important, and Helland refers to a paper he wrote in 2005, "Data on the Outside versus Data on the Inside", in which he described a trend away from keeping data in relational databases towards data leaving the database and moving across boundaries. Data on the inside is classical transactional relational data living in one database. Data on the outside can be messages, files, events, and key-value pairs. This outside data is immutable but may be versioned, and each piece has a unique identifier, such as a URI or a key.
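
The properties of outside data can be sketched with a frozen Python dataclass. The `OutsideItem` type and the example URI are hypothetical; the point is that an item is identified by a unique identifier plus a version, and a change produces a new version rather than modifying the old one.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class OutsideItem:
    """An immutable piece of 'outside' data, identified by a URI and a version."""
    uri: str
    version: int
    payload: str


v1 = OutsideItem("https://example.com/orders/42", 1, "status=placed")
# A change produces a new version instead of modifying v1 in place.
v2 = replace(v1, version=v1.version + 1, payload="status=shipped")

try:
    v1.payload = "status=cancelled"  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    print("v1 cannot be modified")

print(v1.payload, v2.payload)  # status=placed status=shipped
```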

In traditional relational databases and similar storage types, when you write something you can then read the same data: you can read your writes. Helland defines stores that always return the correct data as linearizable data stores, and notes that they keep this guarantee even as they are scaled up. However, he notes that these stores do not support fast predictable reads and writes; occasionally there may be long delays when a server is not responding.

Non-linearizable stores do not offer read your writes, which means a read might return an old value, but their reads and writes are very fast and have consistent, predictable latency. A scalable cache is a third storage type; it supports fast predictable reads, but not writes, and may return stale data. Since it's not possible to have all these features in one storage type, different use cases will need different types of storage. Is a delayed read or write acceptable? Is it acceptable to return an old version of some data?

Do you want it right (as in read your writes) or do you want it right now (with a fast SLA)?

For Helland, immutability is a solid rock to stand on, and he refers to his paper "Immutability Changes Everything". It is often possible to store immutable things, and many application patterns create immutable items. He emphasizes that immutable data has only one version, and it is always correct. This opens up interesting options for applications: even a non-linearizable store or a scalable cache can support read your writes, since there is only one correct version of the data.
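
A small Python sketch of why immutability helps caching, under stated assumptions: `WriteOnceStore` and `Cache` are hypothetical classes, each key is written at most once, and a change is stored under a new key (here a version suffix). Because a key never changes value, the cache needs no invalidation; the worst it can do is miss, and misses are not cached, so a read after a write sees the value.

```python
class WriteOnceStore:
    """Store where each key is written at most once, so every key has exactly one version."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        if key in self._data and self._data[key] != value:
            raise ValueError(f"{key!r} is immutable")
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


class Cache:
    """Read-through cache with no invalidation: safe only because values never change."""

    def __init__(self, backing):
        self.backing = backing
        self.entries = {}

    def get(self, key):
        if key not in self.entries:
            value = self.backing.get(key)
            if value is None:
                return None  # not written yet: a miss, never a wrong value
            self.entries[key] = value
        return self.entries[key]


store = WriteOnceStore()
cache = Cache(store)
print(cache.get("order-42/v1"))      # None: not written yet, and the miss is not cached
store.put("order-42/v1", "placed")
print(cache.get("order-42/v1"))      # placed: the write is visible through the cache
store.put("order-42/v2", "shipped")  # a change is written under a new key
print(cache.get("order-42/v1"))      # placed: the cached copy is still correct
```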

In the past, applications and databases ran in the same process, which made data storage easy. Later, applications and databases were separated but connected by a session using session state. Even later, they were moved onto separate servers, and session state made this work as well. Helland notes that stateful sessions are natural in a shared process, since both parts know each other, and they worked well in classic SOA. With microservices, however, session state is hard to deal with: successive requests can reach different instances, yet session state is needed to create cross-request transactions. He still thinks that microservices are great and worth the restrictions, but they put us back into a world where we must be very careful when storing data to avoid errors.

Helland concludes by noting that state is an application pattern: different applications demand different behaviour from durable state, so it's important to select the right type. Is there a need for low-latency predictable reads, low-latency predictable writes, or the ability to read your writes? Different scenarios affect the solution, and the most suitable type of storage should be selected according to application needs and user expectations. He emphasizes that linearizability and read your writes are not always necessary in modern scalable applications; whether they are depends on the application requirements.
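
The trade-offs from the talk can be encoded as a small lookup that makes the "right or right now" choice explicit. The `StoreKind` flags are a simplification of the properties described above (for instance, reading "supports fast predictable reads, but not writes" for the scalable cache as a read-only cache), and `candidates` is a hypothetical helper, not a tool Helland presented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoreKind:
    name: str
    read_your_writes: bool
    predictable_reads: bool
    predictable_writes: bool
    supports_writes: bool


# Properties as summarized in this write-up, simplified to booleans (an assumption).
KINDS = [
    StoreKind("linearizable store", True, False, False, True),
    StoreKind("non-linearizable store", False, True, True, True),
    StoreKind("scalable cache", False, True, False, False),
]


def candidates(need_read_your_writes=False, need_predictable_reads=False,
               need_predictable_writes=False, need_writes=True):
    """Return the names of store kinds that satisfy every requirement set to True."""
    return [
        k.name for k in KINDS
        if (k.read_your_writes or not need_read_your_writes)
        and (k.predictable_reads or not need_predictable_reads)
        and (k.predictable_writes or not need_predictable_writes)
        and (k.supports_writes or not need_writes)
    ]


print(candidates(need_read_your_writes=True))
# ['linearizable store']
print(candidates(need_predictable_reads=True, need_predictable_writes=True))
# ['non-linearizable store']
print(candidates(need_predictable_reads=True, need_writes=False))
# ['non-linearizable store', 'scalable cache']
print(candidates(need_read_your_writes=True, need_predictable_reads=True))
# []
```

Asking for both read-your-writes and predictable reads returns an empty list, which mirrors the "right or right now" question. This table describes mutable data; as noted above, immutable data relaxes it, since even a cache can then return the one correct version.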

Most presentations at the conference were recorded and will be available on InfoQ over the coming months. The content of Helland’s presentation is also available in his article Mind Your State for Your State of Mind.

The next QCon conference is QCon London 2020, scheduled for March 2-6, 2020.
