Scott Leberknight on Polyglot Persistence
Data persistence solutions in enterprise software development have come a long way in recent years. The database realm now offers more, and more diverse, choices than simply using a relational database system as the default for persistence. At the recent Lone Star Software Symposium conference, Scott Leberknight gave a presentation on "Polyglot Persistence," in which developers choose a data persistence solution from among different database products such as Amazon SimpleDB, Google Bigtable, Microsoft SQL Data Services (SDS) and CouchDB.
Back in December 2006, Neal Ford wrote about Polyglot Programming and predicted the wave of language choice we are now seeing in the industry. Instead of assuming a "default" language like Java or C#, polyglot programming is all about using the right language for the job rather than just the right framework. The same description can be applied to persistence concerns as well.
Scott said that developers have had a lot of choices among MVC and AJAX frameworks such as Struts, Apache MyFaces, Spring MVC, Wicket, Rails, Grails, jQuery, dojo and Ext JS, and that they now have more choices when addressing data persistence needs in enterprise applications. Polyglot persistence is all about considering your persistence requirements and selecting a persistence mechanism that best meets those requirements.
Given the diversity of functional and technical requirements in enterprise applications, "one size does not fit all." The factors driving innovation in the data persistence space are:
- High availability
- Fault tolerance
- Flexibility (i.e. "schemaless" databases), and
- New types of applications like social networking web sites
The types of data managed in applications are very different as well. Data can be structured (relational data), semi-structured (for example, documents in a medical records system) or unstructured (audio/video streams). Different types of databases, such as object-oriented, document-oriented, Bigtable, key-value, and entity-attribute-value systems, have emerged to address different data persistence requirements.
He talked about the ACID (Atomic, Consistent, Isolated, and Durable) and BASE (Basically Available, Soft state, Eventually consistent) concepts, which play a big role in the transactions and availability of database systems. ACID and BASE offer different persistence guarantees and have different tradeoffs. ACID systems guarantee data consistency at the expense of availability: in a two-phase commit, for example, if even one transactional resource is down, system availability is zero. With BASE, by contrast, we trade off immediate data consistency for high availability and partition tolerance. It is the problem context that determines whether one can or cannot sacrifice consistency to gain the benefits of higher availability, fault tolerance and redundancy.
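The availability tradeoff can be made concrete with a little arithmetic. The following is a minimal sketch, not something shown in the talk: it assumes resources fail independently, and the 99% figures and function names are hypothetical. A two-phase commit needs every participant up, so availabilities multiply; a read that can be served by any replica fails only if all replicas are down.

```python
from math import prod


def two_phase_commit_availability(resource_availabilities):
    """Every participant must be up to commit, so availabilities multiply."""
    return prod(resource_availabilities)


def any_replica_availability(replica_availabilities):
    """A request succeeds if at least one replica is up."""
    return 1 - prod(1 - a for a in replica_availabilities)


# Hypothetical per-resource availability, assuming independent failures.
resources = [0.99, 0.99, 0.99]

acid = two_phase_commit_availability(resources)
base = any_replica_availability(resources)

print(f"2PC across 3 resources: {acid:.6f}")
print(f"any of 3 replicas:      {base:.6f}")
```

Under these assumptions, three resources at 99% each give roughly 97.03% availability for the two-phase commit and roughly 99.9999% for the any-replica case. If any single resource has availability 0, the product for the two-phase commit is 0 as well.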
Examples of document-oriented databases include Lotus Notes, Apache CouchDB, Amazon SimpleDB, and ThruDB. Scott showed examples of using the REST API to access a SimpleDB database and of using Views in CouchDB to aggregate and report on the documents stored in the database.
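The idea behind a view that aggregates documents can be sketched in a few lines. This is an in-memory Python imitation of the map/reduce pattern, not CouchDB's actual API and not the example from the talk; the documents, field names, and the `run_view` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical documents; note that they do not share a fixed schema.
docs = [
    {"_id": "1", "type": "visit", "clinic": "north", "cost": 120},
    {"_id": "2", "type": "visit", "clinic": "south", "cost": 80},
    {"_id": "3", "type": "visit", "clinic": "north", "cost": 45},
    {"_id": "4", "type": "note", "text": "follow up"},
]


def map_visit_cost(doc):
    """Emit (key, value) pairs only for documents the view cares about."""
    if doc.get("type") == "visit":
        yield doc["clinic"], doc["cost"]


def run_view(documents, map_fn, reduce_fn):
    """Group emitted values by key, then reduce each group to one value."""
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(groups.items())}


totals = run_view(docs, map_visit_cost, sum)
print(totals)
```

The map step decides which documents contribute and under which key; the reduce step collapses each group. Documents that do not match, such as the note above, are simply skipped, which is what lets a view report over a schemaless collection.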
Project Voldemort is another data persistence solution: a distributed key-value storage system with features like automatic replication across multiple servers, transparent server failure handling and automatic data item versioning. It is used at LinkedIn for high-scalability storage problems where simple functional partitioning is not sufficient. Jay Kreps from LinkedIn will be speaking about Project Voldemort at the upcoming QCon conference.
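To illustrate how replication, failure handling and versioning fit together in a key-value store, here is a toy sketch. It is not Voldemort's API or design: the `Replica` and `ReplicatedStore` classes are hypothetical, and a single integer counter stands in for whatever versioning scheme a real system uses.

```python
class Replica:
    """One storage node holding key -> (version, value)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class ReplicatedStore:
    """Writes go to every reachable replica; reads return the newest version."""

    def __init__(self, replicas):
        self.replicas = replicas

    def _latest(self, key):
        found = [r.data[key] for r in self.replicas if r.up and key in r.data]
        return max(found, key=lambda item: item[0], default=(0, None))

    def put(self, key, value):
        live = [r for r in self.replicas if r.up]
        if not live:
            raise RuntimeError("no replica available")
        version = self._latest(key)[0] + 1
        for replica in live:
            replica.data[key] = (version, value)
        return version

    def get(self, key):
        version, value = self._latest(key)
        if version == 0:
            raise KeyError(key)
        return value


a, b, c = Replica("a"), Replica("b"), Replica("c")
store = ReplicatedStore([a, b, c])

store.put("user:1", "alice")         # version 1 on a, b and c
b.up = False                         # b fails; the write below skips it
store.put("user:1", "alice-updated") # version 2 on a and c only
b.up, a.up = True, False             # b returns with stale data, a fails

print(store.get("user:1"))           # newest version among b and c
```

The caller never names a server: failed replicas are skipped on write, and the version number lets a read prefer the newest copy even when a stale replica has come back. A production system would also need to repair the stale replica and handle concurrent writers, which this sketch leaves out.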
He discussed other database products such as Amazon's Dynamo, a key-value data storage system, and Apache HBase, an open source, distributed, column-oriented store modeled after the Google Bigtable implementation. There are more alternatives in the data persistence realm, such as XML databases, Semantic Web/RDF, triplestores, and tuplespaces.
Most of the new data persistence systems offer REST-based APIs that can be used from languages such as Java, C#, PHP, Visual Basic and Ruby. Scott suggested that developers should think about what they really need in the application, rather than what is currently popular, when choosing a database solution. Some of the criteria to consider when deciding on a data persistence solution are:
- Project requirements
- Distributed deployment
- Fault tolerance
- Query richness
- Schema evolution
- Extreme scalability
- Ability to enforce relationships
- ACID or BASE
- Key/value storage
It feels to me that poly-anything (poly-driven architecture anyone? where some parts are services, some events, some rpc, some async, etc) is the natural final step to what DDD preaches. Get a little empathy with your domain. Tease out the nuances of how it behaves and changes, and the implementation will begin to suggest itself. If it's a poly-featured domain, it probably requires a poly-featured solution.
Sure there's a parrot joke to be made here somewhere.