InfoQ

Scott Leberknight on Polyglot Persistence

Posted by Srini Penchikala on Jul 27, 2009

Sections
Architecture & Design,
Development,
Operations & Infrastructure
Topics
Persistence ,
Architecture ,
Data Access
Tags
No Fluff Just Stuff Symposiums ,
Data Storage ,
Polyglot Persistence ,
Presentations

Data persistence options in enterprise software development have come a long way in recent years. The relational database is no longer the only default choice; the database realm now offers more, and more diverse, alternatives. At the recent Lone Star Software Symposium, Scott Leberknight gave a presentation on "Polyglot Persistence", in which developers choose a persistence solution from among different database products such as Amazon SimpleDB, Google Bigtable, Microsoft SQL Data Services (SDS) and CouchDB.

Back in December 2006, Neal Ford wrote about Polyglot Programming and predicted the wave of language choice we are now seeing in the industry. Instead of assuming a "default" language like Java or C#, polyglot programming is about using the right language for the specific job at hand, rather than just the right framework. The same idea applies to the persistence concern as well.

Scott said that developers have had a lot of choices among MVC and AJAX frameworks such as Struts, Apache MyFaces, Spring MVC, Wicket, Rails, Grails, jQuery, dojo and Ext JS, and that they now have more choices when addressing data persistence needs in enterprise applications. Polyglot persistence is all about considering your persistence requirements and selecting the persistence mechanism that best meets those requirements.

Given the diversity of functional and technical requirements in enterprise applications, "one size does not fit all". The factors driving innovation in the data persistence space are:

  • Scalability
  • High availability
  • Fault tolerance
  • Distributability
  • Flexibility (i.e. "schemaless" databases), and
  • New types of applications like social networking web sites

The types of data managed in applications differ widely as well. Data can be structured (relational data), semi-structured (for example, documents in a medical records system) or unstructured (audio/video streams). Different types of databases, such as object-oriented, document-oriented, Bigtable-style, key-value, and entity-attribute-value systems, have emerged to address different data persistence requirements.

He talked about the ACID (Atomic, Consistent, Isolated, and Durable) and BASE (Basically Available, Soft state, Eventually consistent) concepts, which play a big role in the transactional behavior and availability of database systems. ACID and BASE offer different persistence guarantees and have different tradeoffs. ACID systems guarantee data consistency at the expense of availability: in a two-phase commit, for example, if even one transactional resource is down, the transaction cannot complete and the system is effectively unavailable. With BASE, we trade off immediate data consistency for high availability and partition tolerance. It is the problem context that determines whether one can sacrifice consistency to gain the benefits of higher availability, fault tolerance and redundancy.
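The availability tradeoff can be made concrete with a little arithmetic. The following is a minimal sketch, not taken from the presentation: it assumes each resource fails independently and has the same hypothetical uptime of 99.9%. A two-phase commit needs every participant to be up, so availabilities multiply; a BASE-style read that can be served by any replica fails only if all replicas are down.

```python
from math import prod


def two_phase_commit_availability(resource_availabilities):
    """All participants must be up, so the availabilities multiply."""
    return prod(resource_availabilities)


def any_replica_availability(replica_availabilities):
    """A request succeeds unless every replica is down at the same time."""
    return 1 - prod(1 - a for a in replica_availabilities)


# Hypothetical figures: three resources, each up 99.9% of the time,
# failing independently of each other (an assumption).
resources = [0.999, 0.999, 0.999]

acid = two_phase_commit_availability(resources)
base = any_replica_availability(resources)

print(f"2PC across three resources: {acid:.6f}")
print(f"Any one of three replicas:  {base:.9f}")

# If a single participant is down, the 2PC availability drops to zero.
print(two_phase_commit_availability([0.999, 0.0, 0.999]))
```

The price of the second figure is that a replica may answer with data that has not yet caught up with the latest write, which is exactly the consistency that BASE gives up.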

Examples of document-oriented databases include Lotus Notes, Apache CouchDB, Amazon SimpleDB, and ThruDB. Scott showed examples of using the REST API to access a SimpleDB database and of using views in CouchDB to aggregate and report on the documents stored in the database.
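The idea behind a map/reduce view over schemaless documents can be sketched in a few lines. This is an in-memory illustration only, not CouchDB's API (CouchDB views are normally written as JavaScript functions stored in design documents); the documents, field names and the `run_view` helper are all hypothetical.

```python
from collections import defaultdict

# Hypothetical documents; note that they do not share a schema.
documents = [
    {"_id": "1", "type": "visit", "clinic": "north", "cost": 120},
    {"_id": "2", "type": "visit", "clinic": "south", "cost": 80},
    {"_id": "3", "type": "note", "text": "follow-up"},
    {"_id": "4", "type": "visit", "clinic": "north", "cost": 50},
]


def map_visit_cost(doc):
    """Map step: emit (key, value) pairs only for documents of interest."""
    if doc.get("type") == "visit":
        yield doc["clinic"], doc["cost"]


def reduce_sum(values):
    """Reduce step: aggregate all values emitted under one key."""
    return sum(values)


def run_view(docs, map_fn, reduce_fn):
    """Group the mapped pairs by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


totals = run_view(documents, map_visit_cost, reduce_sum)
print(totals)
```

The document without a `clinic` field is simply skipped by the map step, which is how a view can report on a collection whose documents vary in shape.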

Project Voldemort is another data persistence solution: a distributed key-value storage system with features like automatic replication across multiple servers, transparent server failure handling and automatic data item versioning. It is used at LinkedIn for high-scalability storage problems where simple functional partitioning is not sufficient. Jay Kreps from LinkedIn will be speaking about Project Voldemort at the upcoming QCon conference.
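To illustrate how replication, failure handling and versioning fit together in a key-value store, here is a toy sketch. It is not Voldemort's API or algorithm: the class, its methods and the single integer version counter are assumptions made for illustration, and production systems typically rely on richer versioning schemes and repair mechanisms.

```python
import hashlib


class ToyReplicatedStore:
    """Toy key-value store: each key lives on `replicas` consecutive nodes."""

    def __init__(self, node_count=3, replicas=2):
        self.nodes = [dict() for _ in range(node_count)]
        self.up = [True] * node_count  # simulated node health
        self.replicas = replicas

    def replica_ids(self, key):
        """Hash the key to a starting node, then take the following nodes."""
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        start = int(digest, 16) % len(self.nodes)
        return [(start + i) % len(self.nodes) for i in range(self.replicas)]

    def put(self, key, value):
        """Write to every live replica with a version above any seen so far."""
        live = [i for i in self.replica_ids(key) if self.up[i]]
        if not live:
            raise RuntimeError("no live replica for key")
        version = 1 + max(self.nodes[i].get(key, (0, None))[0] for i in live)
        for i in live:
            self.nodes[i][key] = (version, value)
        return version

    def get(self, key):
        """Read from live replicas and return the newest (version, value)."""
        found = [self.nodes[i][key] for i in self.replica_ids(key)
                 if self.up[i] and key in self.nodes[i]]
        if not found:
            raise KeyError(key)
        return max(found, key=lambda versioned: versioned[0])


store = ToyReplicatedStore()
store.put("member:42", "first profile")

# Simulate a server failure: writes and reads still succeed on the other replica.
failed = store.replica_ids("member:42")[0]
store.up[failed] = False
store.put("member:42", "second profile")
during_failure = store.get("member:42")

# When the node returns it holds a stale copy; the version number hides it.
store.up[failed] = True
after_recovery = store.get("member:42")
print(during_failure, after_recovery)
```

The caller never learns that a node was down, and the recovered node's older copy loses to the higher version on read.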

He discussed other database products such as Amazon's Dynamo, a key-value data storage system, and Apache HBase, an open source, distributed, column-oriented store modeled after Google's Bigtable. There are more alternatives in the data persistence realm, such as XML databases, Semantic Web/RDF, triplestores, and tuplespaces.
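The Bigtable data model is often described as a sparse, sorted map from (row key, column, timestamp) to a value. The sketch below models that idea in memory; it is not the HBase or Bigtable client API, and the class name, method names and sample rows are hypothetical.

```python
from collections import defaultdict


class ToyColumnTable:
    """Sparse map: row key -> "family:qualifier" -> [(timestamp, value), ...]."""

    def __init__(self):
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row, column, timestamp, value):
        """Store a new version of a cell; older versions are kept."""
        self.rows[row][column].append((timestamp, value))

    def get(self, row, column):
        """Return the most recent value of a cell, or None if it is empty."""
        cells = self.rows.get(row, {}).get(column, [])
        if not cells:
            return None
        return max(cells, key=lambda cell: cell[0])[1]

    def scan(self, prefix):
        """Return row keys that start with `prefix`, in sorted order."""
        return [row for row in sorted(self.rows) if row.startswith(prefix)]


table = ToyColumnTable()
table.put("com.example/index", "anchor:home", 1, "Home")
table.put("com.example/index", "anchor:home", 2, "Homepage")
table.put("com.example/about", "contents:html", 1, "<html>...</html>")
table.put("org.sample/index", "contents:html", 1, "<html>...</html>")

print(table.get("com.example/index", "anchor:home"))    # latest version wins
print(table.get("com.example/about", "anchor:home"))    # absent cell
print(table.scan("com.example/"))
```

Rows need not share columns, so an absent cell costs nothing, and because row keys are kept sorted, related rows can be read together with a prefix scan.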

Most of the new data persistence systems offer REST-based APIs that can be used from different languages such as Java, C#, PHP, Visual Basic and Ruby. Scott suggested that developers should think about what they really need in the application, rather than what is currently popular, when choosing a database solution. Some of the criteria to consider when deciding on a data persistence solution are:

  • Project requirements
  • Distributed deployment
  • Fault tolerance
  • Query richness
  • Schema evolution
  • Extreme scalability
  • Ability to enforce relationships
  • ACID or BASE
  • Key/value storage

Srini Penchikala currently works as a Security Architect and has 17 years of experience in software product management.

  1.

    Good Summary

    by Julian Browne

    There's something quite comforting in round-ups like this. Summaries that don't overly evangelise any one approach. I'd add neo4j to the list because it has some of the properties of document and object stores but with interesting features like the ability to model attributes on the relationships themselves. And having just watched a presentation on Gigaspaces XAP 7.0 this morning, which has its roots in tuplespaces, it's pretty amazing what the state of the art looks like in commercial products.



    It feels to me that poly-anything (poly-driven architecture anyone? where some parts are services, some events, some rpc, some async, etc) is the natural final step to what DDD preaches. Get a little empathy with your domain. Tease out the nuances of how it behaves and changes, and the implementation will begin to suggest itself. If it's a poly-featured domain, it probably requires a poly-featured solution.



    Sure there's a parrot joke to be made here somewhere.

  2.

    Re: Good Summary

    by Jean-Baptiste Potonnier

    You should have a look at mongodb too!
