New-age Transactional Systems - Not Your Grandpa's OLTP
John Hugg discusses high volume transaction processing applications with high and low frequency profiles, and how VoltDB can be used for that purpose.
Posted by Michael Hunger on Mar 31, 2010
Shortly after the 1.4 release of MongoDB (from "humongous") on March 25th, its creator Dwight Merriman (former CEO/CTO of DoubleClick) announced that 10gen, the company behind the open-source document database, will offer commercial training and support for the product.
InfoQ took this opportunity to talk to Merriman about MongoDB, its features, applicability and place in the community of NoSQL databases. His answers are quoted in the appropriate sections of this article.
MongoDB is a scalable, high-performance next-generation database. Data in MongoDB is stored as documents, which allow for representation of complex relationships, all within a single data object. Documents can be comprised of individual fields of primitive types, "embedded documents", or arrays of documents.
This flexibility allows a developer to model a large subset of problems in a manageable and flexible way without resorting to splitting data up into different tables. In cases where data is not optimally modeled as a single document, MongoDB has the concept of a "DBRef", which is a pointer from a field in a document to another document.
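As a rough illustration of the DBRef idea, the plain JavaScript sketch below follows a pointer stored in one document to a document in another collection. The `{ $ref, $id }` shape follows MongoDB's DBRef convention. The in-memory `collections` object, the `resolveRef` helper and the sample data are hypothetical and stand in for a real database and driver.

```javascript
// In-memory stand-in for a database: collection name -> array of documents.
// (Hypothetical data; a real application would go through a MongoDB driver.)
const collections = {
  authors: [
    { _id: 1, name: "Jane" },
    { _id: 2, name: "Abe" }
  ],
  blogposts: [
    // The author field holds a DBRef-style pointer instead of an embedded document.
    { _id: 100, title: "My First Post", author: { $ref: "authors", $id: 1 } }
  ]
};

// Hypothetical helper: follow a { $ref, $id } pointer to the document it names.
// Returns null when the collection or the document does not exist.
function resolveRef(ref) {
  const target = collections[ref.$ref] || [];
  return target.find(doc => doc._id === ref.$id) || null;
}

const post = collections.blogposts[0];
const author = resolveRef(post.author);
console.log(author.name);
```

The trade-off is an extra lookup per reference, which is why embedding is usually preferred when the data naturally belongs to a single document.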
Retrieving and querying data from a MongoDB database is flexible - documents can be dynamically queried based on the main document, any field within the document, any embedded document, or any document contained within an array. For addressing embedded documents, a dot-style notation is used.
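To make the dot-style addressing more concrete, here is a minimal plain JavaScript sketch of how a dotted path can be resolved against embedded documents and arrays of documents. It is not MongoDB's implementation: `valuesAt` and `matches` are hypothetical helpers, and only simple equality on primitive values is covered.

```javascript
// Hypothetical helper: collect every value reachable from `doc` via a dotted path.
// Arrays are traversed element by element, so "comments.by" reaches into
// each comment document of a comments array.
function valuesAt(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const value of current) {
      // Fan out over arrays so each element is inspected individually.
      const candidates = Array.isArray(value) ? value : [value];
      for (const item of candidates) {
        if (item !== null && typeof item === "object" && key in item) {
          next.push(item[key]);
        }
      }
    }
    current = next;
  }
  return current;
}

// Hypothetical matcher: a document matches when, for every path in the query,
// at least one value found at that path equals the requested value.
// flat() lets a query on an array of primitives match any single element.
function matches(doc, query) {
  return Object.entries(query).every(([path, expected]) =>
    valuesAt(doc, path).flat().some(v => v === expected)
  );
}

const samplePost = {
  title: "My First Post",
  author: { name: "Jane", id: 1 },
  comments: [{ by: "Abe", text: "First" }, { by: "Ada", text: "Good post" }]
};

console.log(matches(samplePost, { "author.name": "Jane" }));
console.log(matches(samplePost, { "comments.by": "Ada" }));
```

The same path syntax works whether the intermediate field is a single embedded document or an array of them, which is what keeps queries on nested data short.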
Written in C++, MongoDB features:
On the goal of MongoDB, their blog states:
MongoDB was never designed nor intended to be a niche database for a small subset of problems, but a new type of database, that solves lots of real world problems for a large subset of the developer community.
The focus of the MongoDB project is to combine the best traits of the non-relational model, including high scalability, performance, and ease of development, with important features common in traditional databases that are useful in primary operational data stores.
MongoDB wasn't designed in a lab. We built MongoDB from our own experiences building large scale, high availability, robust systems.
MongoDB was first released to the public 16 months ago, on Nov 2nd 2009. The philosophy behind it holds that, although transactional semantics are reduced in favor of scalability and performance, a more full-featured approach than a pure key-value store is needed for general adoption and widespread use.
The document paradigm is an interesting approach for persisting complex object structures, especially the aggregates proposed by Domain-Driven Design (DDD), where only the root entity can be linked to from other entities and the dependent entities and values are accessible only through the root. A MongoDB-based repository could be a simple way to provide persistence in projects based on DDD. A related point is that business domains often speak of documents when referring to business entities, so using a document as the internal representation may be a better fit than other data structures or the objects themselves.
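The repository idea can be sketched in a few lines of plain JavaScript. Everything here is an assumption made for illustration: the `Order` aggregate, its fields, and the `OrderRepository` class are hypothetical, and a `Map` stands in for a MongoDB collection that a real implementation would reach through a driver.

```javascript
// Hypothetical aggregate: an Order is the root; its line items are dependent
// values that are only reachable through the order itself.
class Order {
  constructor(id, customer, items = []) {
    this.id = id;
    this.customer = customer;
    this.items = items;
  }

  addItem(sku, quantity) {
    this.items.push({ sku, quantity });
  }

  // The whole aggregate maps to one document, items included.
  toDocument() {
    return { _id: this.id, customer: this.customer, items: this.items.map(i => ({ ...i })) };
  }

  static fromDocument(doc) {
    return new Order(doc._id, doc.customer, doc.items.map(i => ({ ...i })));
  }
}

// Hypothetical repository: saves and loads aggregates by the id of their root.
class OrderRepository {
  constructor() {
    this.documents = new Map(); // stand-in for a collection
  }

  save(order) {
    this.documents.set(order.id, order.toDocument());
  }

  findById(id) {
    const doc = this.documents.get(id);
    return doc ? Order.fromDocument(doc) : null;
  }
}

const repository = new OrderRepository();
const order = new Order(42, "Jane");
order.addItem("book", 2);
repository.save(order);
console.log(repository.findById(42).items.length);
```

Because the aggregate is stored and loaded as a whole, there is no separate persistence step for the dependent items, which mirrors the DDD rule that they are reached only through the root.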
Even with schemaless document databases, data modelling remains important. Several aspects of relationships have to be considered carefully before creating documents; otherwise the result can be data duplication, poor performance and other issues.
For example, a blog post with its main article, comments, and votes on comments would be split into multiple tables in a relational database. In MongoDB, a blog post could be represented as a single document, with the comments and votes contained as arrays of documents within the main post document. This approach makes data more manageable, and reduces the necessity for 'JOIN's that impede performance and horizontal scalability in traditional relational databases.
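// Save a post with an embedded author document and an array of comment documents.
> db.blogposts.save({ title : "My First Post", author : { name : "Jane", id : 1 },
    comments : [ { by : "Abe", text : "First" },
                 { by : "Ada", text : "Good post" } ]
  })
// Query on a field of the embedded author document using dot notation.
> db.blogposts.find( { "author.name" : "Jane" } )
// Combine top-level and embedded fields; here the comments array is matched as a whole.
> db.blogposts.findOne({ title : "My First Post", "author.name" : "Jane",
    comments : [ { by : "Abe", text : "First" },
                 { by : "Ada", text : "Good post" } ]
  })
// Find posts in which any comment in the array was written by Ada.
> db.blogposts.find( { "comments.by" : "Ada" } )
// Index the "by" field of the embedded comment documents.
> db.blogposts.ensureIndex( { "comments.by" : 1 } );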
You can try this example directly in the interactive MongoDB web console shell which also embeds the online tutorial.
Alex Popescu, the CTO of InfoQ, runs the myNoSQL site, which offers news, reviews and comparisons of NoSQL data stores (including MongoDB); see, for instance, his take on production notes.
Teach Me To Code published a three-part screencast introducing various aspects of MongoDB.
Pivotal Labs provides an introductory presentation by 10gen's Michael Dirolf in video and audio versions. A fairly complete overview of MongoDB by Kyle Banker is also available on SlideShare.
The database is published under the GNU AGPL v3.0 license; the drivers from mongodb.org are licensed under the Apache License v2.0. Its C++ source code is available from GitHub and can be built on any operating system.
It can also be installed as a binary package for Linux, Mac OS X, Windows and Solaris.
MongoDB itself runs as the mongod daemon process, the core database server, which is then accessed by the various drivers. Sharding support and database routing is provided by the mongos service.
There are integration efforts to support MongoDB in almost every programming language. Drivers are available for C, C++, C# & .NET, ColdFusion, Erlang, Factor, Java, JavaScript, PHP, Python, Ruby, Perl and many more.
MongoDB is also supported in other frameworks, like the "blueprints"-connector libraries of gremlin, the graph database library.
It was integrated by Debasish Ghosh as one of the available persistence modules of the scalable actors framework Akka.
Operationally, MongoDB can be run in two modes depending on the needs of the application. The first is 'single master' mode, where there is a single master server for all writes. Reads can be performed off of this database - or can be done from any number of read slaves for read scalability (usage scenario: SourceForge).
For applications where the volume of data or frequency of writes is too high to handle on a single master, MongoDB's auto-sharding mode (currently in alpha) can be used. In this mode, writes are automatically distributed among any number of 'shards' (a shard is simply a group of one or more MongoDB servers), each of which takes responsibility for writes and reads of portions of the dataset.
In either case, MongoDB takes a 'strong consistency' approach (you would consider MongoDB a C-P system in the CAP theorem). High availability is achieved by replicating data to multiple MongoDB nodes, any of which can take the responsibility as the master in a shard at a point in time - and MongoDB handles this failover automatically. This approach allows you to have strongly consistent characteristics, which are important for a number of use cases, while still maintaining a very high level of write availability.
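To picture how writes in the auto-sharding mode described above might be spread across shards, the following plain JavaScript sketch routes each document by the value of one of its fields. It assumes range-based partitioning on that field; the `ranges` table, the shard names and the `shardFor` and `routeWrite` functions are hypothetical and are not taken from MongoDB's implementation.

```javascript
// Hypothetical routing table: each entry says which shard owns keys in [min, max).
const ranges = [
  { min: "", max: "h", shard: "shard-a" },
  { min: "h", max: "p", shard: "shard-b" },
  { min: "p", max: null, shard: "shard-c" } // null means no upper bound
];

// Each shard keeps only its own portion of the dataset.
const shards = { "shard-a": [], "shard-b": [], "shard-c": [] };

// Find the shard whose key range contains the given key (string comparison).
function shardFor(key) {
  const entry = ranges.find(r => key >= r.min && (r.max === null || key < r.max));
  return entry.shard;
}

// Send a write to the shard responsible for the document's key value.
function routeWrite(doc, keyField) {
  const shard = shardFor(doc[keyField]);
  shards[shard].push(doc);
  return shard;
}

console.log(routeWrite({ user: "abe", text: "First" }, "user"));
console.log(routeWrite({ user: "jane", text: "Hello" }, "user"));
console.log(routeWrite({ user: "zoe", text: "Bye" }, "user"));
```

A read for a given key can use the same lookup to reach only the shard that holds it, which is what lets both reads and writes scale with the number of shards.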
The mongodb site contains an Admin Center to support operations requirements like:
The MongoDB Documentation is available on the mongodb.org wiki (also as PDF) under a Creative Commons License.
10gen has designed MongoDB to solve real-world problems for a large subset of the application development community. In that light, and as evidenced by customer deployments, we see MongoDB as the approach to data storage for a large proportion of database-backed applications.
Today, 10gen provides support, consulting and training for clients who use MongoDB in their production applications. In the near future, cloud-based services (such as hosted MongoDB services) as well as advanced management tools for large MongoDB clusters will be available from 10gen.
Since version 1.3, MongoDB has been heavily used in production systems. Well-known adopters of the datastore are:
Of course, there are many more use cases for the document store.
The MongoDB team's vision for the datastore is very broad. They consider the current 1.4 release to contain about half the intended features, and they will work on the rest over the next year.