MongoDB Growing Up: Release 1.4 and Commercial Support by 10gen

Posted by Michael Hunger on Mar 31, 2010

Sections: Architecture & Design, Operations & Infrastructure
Topics: NoSQL, Architecture, Database Design

Shortly after the 1.4 release of MongoDB (from "humongous") on March 25th, its creator Dwight Merriman (former CEO/CTO of DoubleClick) announced that 10gen, the company behind the open-source document database, will offer commercial training and support for the product.

InfoQ took this opportunity to talk to Merriman about MongoDB, its features, applicability and place in the community of NoSQL databases. His answers are quoted in the appropriate sections of this article.

Introduction to MongoDB

MongoDB is a scalable, high-performance next-generation database. Data in MongoDB is stored as documents, which allow complex relationships to be represented within a single data object. Documents can be composed of individual fields of primitive types, "embedded documents", or arrays of documents.

This flexibility allows a developer to model a large subset of problems in a manageable and flexible way without resorting to splitting data up into different tables. In cases where data is not optimally modeled as a single document, MongoDB has the concept of a "DBRef", which is a pointer from a field in a document to another document.
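
The pointer idea behind a DBRef can be sketched in plain JavaScript. The `{ $ref, $id }` shape follows the DBRef convention; the in-memory `collections` object and the `resolveRef` helper are hypothetical stand-ins for a real database and driver, so this is only an illustration of the concept.

```javascript
// Hypothetical in-memory stand-in for a database: collection name -> documents.
const collections = {
  authors: [{ _id: 1, name: "Jane" }],
  blogposts: [
    // The author field holds a DBRef-style pointer instead of an embedded document.
    { _id: 10, title: "My First Post", author: { $ref: "authors", $id: 1 } },
  ],
};

// Follow a DBRef-style pointer: look up $id in the collection named by $ref.
// Returns null when the collection or the document does not exist.
function resolveRef(ref) {
  const docs = collections[ref.$ref] || [];
  return docs.find((doc) => doc._id === ref.$id) || null;
}

const referencingPost = collections.blogposts[0];
const referencedAuthor = resolveRef(referencingPost.author);
console.log(referencedAuthor.name);
```

Following the pointer costs a second lookup, which is why embedding remains the default choice when the data fits in one document.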

Retrieving and querying data from a MongoDB database is flexible: documents can be queried dynamically on any field of the main document, on any embedded document, or on any document contained within an array. Embedded documents are addressed with a dot-style notation.
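
To make the dot notation concrete, here is a minimal plain-JavaScript sketch of how a dotted path such as "comments.by" could be resolved against a nested document. The helpers `valuesAtPath` and `matches` are hypothetical and only mimic the idea; they are not how MongoDB evaluates queries internally.

```javascript
// Collect every value reachable from `doc` along a dotted path such as "comments.by".
// Arrays are traversed element by element, so a path can reach into arrays of documents.
function valuesAtPath(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    const next = [];
    for (const value of current) {
      const items = Array.isArray(value) ? value : [value];
      for (const item of items) {
        if (item !== null && typeof item === "object" && key in item) {
          next.push(item[key]);
        }
      }
    }
    current = next;
  }
  // Flatten one level so an array-valued field is compared element by element.
  return current.flat();
}

// A document matches when any value found at the path equals the wanted value.
function matches(doc, path, wanted) {
  return valuesAtPath(doc, path).includes(wanted);
}

const samplePost = {
  title: "My First Post",
  author: { name: "Jane", id: 1 },
  comments: [
    { by: "Abe", text: "First" },
    { by: "Ada", text: "Good post" },
  ],
};

console.log(matches(samplePost, "author.name", "Jane"));
console.log(matches(samplePost, "comments.by", "Ada"));
```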

Features

Written in C++, MongoDB features:

  • Document-oriented storage (the power and flexibility of JSON-like data schemas)
  • Inner objects, embedded arrays, geospatial information
  • Dynamic queries
  • Full index support, including secondary indexes
  • Query profiling
  • Fast, in-place updates
  • Efficient storage of large binary data objects (e.g. photos and videos)
  • Replication and fail-over support
  • Auto-sharding for cloud-level scalability (alpha)
  • MapReduce for complex aggregation
  • Commercial Support, Training, and Consulting
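
As an illustration of the MapReduce item in the list above, the following plain-JavaScript sketch counts comments per commenter. The `mapReduce` runner is a hypothetical in-memory helper written for this example, not MongoDB's API; only the general pattern of a map step that emits key/value pairs and a reduce step that folds the values per key is carried over.

```javascript
// Minimal in-memory map/reduce runner (hypothetical helper, not the MongoDB API).
function mapReduce(docs, map, reduce) {
  const groups = new Map();
  // emit() collects values under a key, as a map function does in MapReduce.
  const emit = (key, value) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(value);
  };
  for (const doc of docs) map(doc, emit);
  // Fold the collected values of each key into a single result.
  const result = {};
  for (const [key, values] of groups) result[key] = reduce(key, values);
  return result;
}

const posts = [
  { title: "My First Post", comments: [{ by: "Abe" }, { by: "Ada" }] },
  { title: "Second Post", comments: [{ by: "Ada" }] },
];

const commentsPerAuthor = mapReduce(
  posts,
  (post, emit) => post.comments.forEach((comment) => emit(comment.by, 1)),
  (key, values) => values.reduce((sum, n) => sum + n, 0)
);
console.log(commentsPerAuthor);
```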

Origin and Intent

On the goal of MongoDB, the project's blog states:

MongoDB was never designed nor intended to be a niche database for a small subset of problems, but a new type of database, that solves lots of real world problems for a large subset of the developer community.

The focus of the MongoDB project is to combine the best traits of the non-relational model, including high scalability, performance, and ease of development, with important features common in traditional databases that are useful in primary operational data stores.

MongoDB wasn't designed in a lab. We built MongoDB from our own experiences building large scale, high availability, robust systems.

MongoDB was first released to the public 16 months ago, on Nov 2nd 2009. The philosophy behind it holds that, although transactional semantics are reduced in favor of scalability and performance, general adoption and widespread use require a more full-featured approach than a pure key-value store.

Relation to DDD

The document paradigm is an interesting approach for persisting complex object structures. It fits especially well with the aggregates proposed by Domain Driven Design (DDD), where only the root entity can be linked to from other entities, and the dependent entities and values are only accessible through the root. A MongoDB-based Repository could be a simple way to provide persistence in projects based on DDD. A related point is that business domains often speak of documents when referring to business entities, so using a document as the internal representation may be a better fit than other data structures or the objects themselves.

Even with schema-less document databases, data modelling remains important. Several aspects of relationships have to be considered carefully before creating documents; otherwise the result can be data duplication, poor performance and other issues.
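
The repository idea described above can be sketched as follows. Everything here is hypothetical: the `OrderRepository` class, the order aggregate and its fields are invented for illustration, and a `Map` stands in for a MongoDB collection, so no real driver calls are involved.

```javascript
// Hypothetical repository that persists a whole aggregate as one document.
// A Map stands in for a MongoDB collection; no real driver calls are used.
class OrderRepository {
  constructor() {
    this.documents = new Map();
  }

  // Store the root together with its dependent entities as a single document.
  save(order) {
    // Deep-copy so later changes to the object do not alter the stored state.
    this.documents.set(order.id, JSON.parse(JSON.stringify(order)));
  }

  // Dependent entities are only reachable by loading the root.
  findById(id) {
    const doc = this.documents.get(id);
    return doc ? JSON.parse(JSON.stringify(doc)) : null;
  }
}

const repo = new OrderRepository();
repo.save({ id: 42, customer: "Jane", lines: [{ sku: "A-1", qty: 2 }] });
const loaded = repo.findById(42);
console.log(loaded.lines[0].qty);
```

Because the order lines live inside the root document, there is no separate way to fetch them, which mirrors the DDD rule that dependent entities are reached only through the aggregate root.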

Example and Tutorials

For example, a blog post with its main article, comments, and votes on comments would be split into multiple tables in a relational database. In MongoDB, a blog post could be represented as a single document, with the comments and votes contained as arrays of documents within the main post document. This approach makes data more manageable, and reduces the necessity for 'JOIN's that impede performance and horizontal scalability in traditional relational databases.
> db.blogposts.save({ title : "My First Post", author: {name : "Jane", id :1},
  comments : [{ by: "Abe", text: "First" },
              { by : "Ada", text : "Good post" }]
})

> db.blogposts.find( { "author.name" : "Jane" } )

> db.blogposts.findOne({ title : "My First Post", "author.name": "Jane",
  comments : [{ by: "Abe", text: "First" },
              { by : "Ada", text : "Good post" } ]
})

> db.blogposts.find( { "comments.by" : "Ada" } )

> db.blogposts.ensureIndex( { "comments.by" : 1 } );

You can try this example directly in the interactive MongoDB web console shell which also embeds the online tutorial.

Alex Popescu, the CTO of InfoQ, runs the myNoSQL site, which offers news, reviews and comparisons of NoSQL data stores (including MongoDB); see, for instance, his take on production notes.

Teach Me To Code published a 3-part screencast introducing various aspects of MongoDB.

Pivotal Labs provides an introductory presentation by 10gen's Michael Dirolf in video and audio versions. A fairly complete overview of MongoDB by Kyle Banker is also available on SlideShare.

Installation and Integration

The database is published under the GNU AGPL v3.0 license; the drivers from mongodb.org are licensed under the Apache License v2.0. Its C++ source code is available from GitHub and can be built on any operating system.

It can also be installed as a binary package for Linux, Mac OS X, Windows and Solaris.

MongoDB itself runs as the mongod daemon process, the core database server, which is accessed by the various drivers. Sharding support and database routing are provided by the mongos service.

There are integration efforts to support MongoDB in almost every programming language. Drivers are available for C, C++, C# & .NET, ColdFusion, Erlang, Factor, Java, JavaScript, PHP, Python, Ruby, Perl and many more.

MongoDB is also supported by other frameworks, such as the "blueprints" connector libraries of gremlin, the graph database library. Debasish Ghosh integrated it as one of the available persistence modules of Akka, the scalable actors framework.

Operations and Scalability

Operationally, MongoDB can be run in two modes depending on the needs of the application. The first is 'single master' mode, where a single master server handles all writes. Reads can be performed against this master, or against any number of read slaves for read scalability (usage scenario: SourceForge).
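
The division of labour in single master mode can be sketched as a small router. The `SingleMasterRouter` class, the round-robin read policy and the host names are assumptions made for this example; the article does not describe how drivers actually pick a read slave.

```javascript
// Hypothetical router for a single-master setup: writes always go to the master,
// reads rotate across the read slaves (or fall back to the master if there are none).
class SingleMasterRouter {
  constructor(master, slaves = []) {
    this.master = master;
    this.slaves = slaves;
    this.nextSlave = 0;
  }

  // Return the host that should serve the given operation ("read" or "write").
  targetFor(operation) {
    if (operation === "write" || this.slaves.length === 0) return this.master;
    const target = this.slaves[this.nextSlave];
    // Round-robin is an assumed policy, chosen here only for simplicity.
    this.nextSlave = (this.nextSlave + 1) % this.slaves.length;
    return target;
  }
}

// Host names are placeholders.
const router = new SingleMasterRouter("master:27017", ["slave1:27017", "slave2:27017"]);
console.log(router.targetFor("write"));
console.log(router.targetFor("read"));
```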

For applications where the volume of data or frequency of writes is too high to handle on a single master, MongoDB's auto-sharding mode (currently in alpha) can be used. In this mode, writes are automatically distributed among any number of 'shards' (a shard is simply a group of one or more MongoDB servers), each of which takes responsibility for writes and reads of portions of the dataset.
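
How each shard can take responsibility for a portion of the dataset is illustrated by the sketch below, which assigns documents to shards by ranges of a key. The range boundaries, shard names and the `shardFor` function are assumptions for illustration only; the article does not describe the partitioning scheme MongoDB's auto-sharding actually uses.

```javascript
// Assumed layout: each chunk covers key values in [min, max) and lives on one shard.
const chunks = [
  { min: "", max: "h", shard: "shard-a" },
  { min: "h", max: "p", shard: "shard-b" },
  { min: "p", max: "\uffff", shard: "shard-c" },
];

// Find the shard responsible for a given key by locating its covering chunk.
function shardFor(key) {
  const chunk = chunks.find((c) => key >= c.min && key < c.max);
  if (!chunk) throw new Error(`no chunk covers key: ${key}`);
  return chunk.shard;
}

// Reads and writes for the same key always land on the same shard.
console.log(shardFor("abe"));
console.log(shardFor("jane"));
console.log(shardFor("zoe"));
```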

In either case, MongoDB takes a 'strong consistency' approach (you would consider MongoDB a C-P system in the CAP theorem). High availability is achieved by replicating data to multiple MongoDB nodes, any of which can take the responsibility as the master in a shard at a point in time - and MongoDB handles this failover automatically. This approach allows you to have strongly consistent characteristics, which are important for a number of use cases, while still maintaining a very high level of write availability.

The mongodb site contains an Admin Center to support operations requirements.

Documentation, Support and Training

The MongoDB Documentation is available on the mongodb.org wiki (also as PDF) under a Creative Commons License.

10gen has designed MongoDB to solve real-world problems for a large subset of the application development community. In that light, we see (and as evidenced by customer deployments) MongoDB as the approach to data storage for a large proportion of database-backed applications.

Today, 10gen provides support, consulting and training for clients who use MongoDB in their production applications. In the near future, cloud-based services (such as hosted MongoDB services), as well as advanced management tools for large MongoDB clusters, will be available from 10gen.

Current Usage

Since version 1.3, MongoDB has been heavily used in production systems and has gained a number of well-known adopters.

Of course, there are many more use cases for the document store.

Future Development

The MongoDB team's vision for the datastore is very broad. They consider the current 1.4 release to contain about half of the intended features, and plan to work on the following over the next year:

  • better replication: real time, replica sets, more options for data durability
  • production ready sharding
  • more features for working with embedded documents
  • fleshing out more atomic update operators
  • single server durability
  • full text search

This article is part of a featured topic series on NoSQL.
