Martin Fowler on Software Design in the 21st Century
Schemaless data structures are not well understood and it's important to consider the advantages and disadvantages when using these data structures in NoSQL databases. At a recent company event Martin Fowler talked about Schemaless Data Structures, and NoSQL & Consistency.
Schemaless Data Structures:
Being schemaless is often seen as a big advantage of NoSQL databases. Martin believes that the area is not well understood and describes different aspects of schemalessness, as well as the advantages and disadvantages of using schemaless data structures.
The main point is that even in a schemaless structure you still have a schema. In order to query the data and find information you have to understand the data, and that understanding is an Implicit Schema: a definition of the data that lives somewhere other than the database, e.g. in the code that reads it. In contrast, the schema in a relational database, where only data that conforms to the schema is accepted, is an Explicit Schema.
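The difference can be illustrated with a minimal sketch. It is not taken from the talk: the in-memory list standing in for a schemaless store, the ORDER_SCHEMA table and the function names are all assumptions made for illustration.

```python
# A plain list stands in for a schemaless store: it accepts any dictionary.
orders = []
orders.append({"id": 1, "qty": 2, "price": 9.5})
orders.append({"id": 2, "quantity": 3, "price": 4.0})  # different field name, still accepted


def order_total(order):
    # Implicit schema: this reader assumes every order has "qty" and "price".
    return order["qty"] * order["price"]


# Explicit schema (hypothetical): expected fields and types are stated once, up front.
ORDER_SCHEMA = {"id": int, "qty": int, "price": float}


def insert_checked(store, record):
    """Append record to store only if it matches ORDER_SCHEMA exactly."""
    if set(record) != set(ORDER_SCHEMA):
        raise ValueError(f"unexpected fields: {sorted(record)}")
    for field, expected_type in ORDER_SCHEMA.items():
        if not isinstance(record[field], expected_type):
            raise ValueError(f"{field} must be {expected_type.__name__}")
    store.append(record)
```

With the schemaless list, the second order is stored without complaint and the mismatch only surfaces later, as a KeyError when order_total reads it. With insert_checked, the same record is rejected at write time.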
Martin ends the discussion by claiming that most of the time "Implicit Schema == Bad Thing", preferring an explicit schema that gives a clear statement of what the data looks like, although there are a few cases where schemalessness is useful. But he also states that a schema does not need to be a fixed storage schema; it can be more in the form of a contract, e.g. a data access layer or XML schema.
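One way to read "schema as a contract" is a data access layer that is the only code allowed to touch the raw documents. The sketch below is an assumption about what such a layer could look like, not something shown in the talk; Customer, CustomerRepository and the dictionary used as a store are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    # The contract: every caller sees exactly these fields.
    customer_id: int
    name: str
    email: str


class CustomerRepository:
    """Data access layer over a schemaless store (a dict stands in for it here)."""

    def __init__(self):
        self._documents = {}

    def save(self, customer: Customer) -> None:
        # The storage layout is a private detail of this class.
        self._documents[customer.customer_id] = {
            "name": customer.name,
            "email": customer.email,
        }

    def load(self, customer_id: int) -> Customer:
        doc = self._documents[customer_id]
        return Customer(customer_id, doc["name"], doc["email"])
```

The store itself stays schemaless, but the rest of the application depends only on the Customer type, so the stored layout can change without callers noticing.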
NoSQL and Consistency:
In this talk Martin looks at two aspects of consistency in NoSQL databases.
Logical Consistency deals with keeping data consistent when working in one database. For most NoSQL databases (graph databases being one exception), the use of aggregates (a concept from Domain Driven Design where a cluster of related objects is stored and updated as a single unit) is an obvious way of avoiding inconsistency.
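A small sketch of the aggregate idea, assuming an order with its line items as the aggregate and an in-memory DocumentStore as a stand-in for a document database (all names are hypothetical):

```python
import copy


class DocumentStore:
    """Stand-in for a document database: one key maps to one whole document."""

    def __init__(self):
        self._docs = {}

    def put(self, key, document):
        # The whole aggregate is replaced in a single write.
        self._docs[key] = copy.deepcopy(document)

    def get(self, key):
        return copy.deepcopy(self._docs[key])


def new_order(order_id, customer_id):
    return {"order_id": order_id, "customer_id": customer_id, "lines": [], "total": 0.0}


def add_line(order, sku, qty, unit_price):
    # Lines and total live in the same aggregate, so they change together.
    order["lines"].append({"sku": sku, "qty": qty, "unit_price": unit_price})
    order["total"] = sum(line["qty"] * line["unit_price"] for line in order["lines"])


store = DocumentStore()
order = new_order("o-1", "c-7")
add_line(order, "A", 2, 5.0)
add_line(order, "B", 1, 3.0)
store.put(order["order_id"], order)
```

Because the line items and the total are written together under one key, a reader never sees an order whose total disagrees with its lines. Keeping data consistent across two separate aggregates is a different matter, and this sketch does not address it.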
While describing Replication Consistency, with copies of the same data in several places, Martin introduces the CAP theorem, and with data already replicated over the network he simplifies it into a choice between consistency and availability. He emphasizes that this is not a technical issue; it's a business choice whether being consistent or being available is the top priority.
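The trade-off can be made concrete with a toy model. This is a deliberately simplified sketch, not a real replication protocol: ReplicatedRegister, its two replicas and the partitioned flag are assumptions made for illustration.

```python
class ReplicatedRegister:
    """Two copies of one value, with a configurable priority during a network partition."""

    def __init__(self, prefer):
        if prefer not in ("consistency", "availability"):
            raise ValueError("prefer must be 'consistency' or 'availability'")
        self.prefer = prefer
        self.replicas = {"a": 0, "b": 0}
        self.partitioned = False  # True when the replicas cannot reach each other

    def write(self, replica, value):
        """Try to write via one replica; return True if the write was accepted."""
        if not self.partitioned:
            # Normal operation: every copy is updated.
            for name in self.replicas:
                self.replicas[name] = value
            return True
        if self.prefer == "consistency":
            # Refuse the write rather than let the copies disagree.
            return False
        # Stay available: accept the write locally and let the copies diverge.
        self.replicas[replica] = value
        return True
```

During a partition, a register created with prefer="consistency" rejects writes and keeps both copies identical, while one created with prefer="availability" accepts them and has to reconcile the copies later. Which behaviour is right depends on what a rejected request or a conflicting value costs the business.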
Martin ended with a talk discussing the value of software design and technical debt.
Schema is a means to an end
But schemas are just means to an end.
That means you need to know what the (non-functional) requirements are... and then decide if more or less schema is good for a certain purpose.
Schemas are contracts to enable collaboration of different parties in space and time.
Schemas are performance optimizers.
Schemas are... what else?
Fixing a schema too early leads to all sorts of costs due to schema changes. Or even prevents change.
Fixing a schema too late leads to loss of performance/scalability/security etc.
So the most important question, it seems to me, is: Can you easily change the rigidity of the schema of certain data?
Is the decision for more or less schema reversible?