Version 1.8 of the MongoDB document-oriented database engine was released March 16. Key changes include the addition of journaling, sharding performance boosts, and shell tab completion.
Journaling adds durability to MongoDB through write-ahead redo logs. When it is enabled, changes are written to the journal logs first. Periodic group commits (currently every 100ms) then play these changes back on the real data. If the server shuts down cleanly, the journal logs are cleared. When the server starts up, any existing journal logs are replayed, which ensures that changes journaled but not yet applied before a server crash are applied before any user can connect. The 100ms window between commits is expected to shrink in future releases.
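The write-then-replay cycle described above can be sketched with a toy in-memory model. This is only an illustration of the idea, not MongoDB's implementation: the class JournaledStore and its methods are hypothetical names, and plain Python containers stand in for the data files and journal logs.

```python
class JournaledStore:
    """Toy model of write-ahead journaling (hypothetical, not MongoDB code)."""

    def __init__(self, data=None, journal=None):
        self.data = dict(data or {})        # stands in for the real data files
        self.journal = list(journal or [])  # stands in for the journal logs
        # On startup, replay whatever journal entries were left behind.
        self.replay()

    def write(self, key, value):
        # A change is recorded in the journal first, not in the data.
        self.journal.append((key, value))

    def group_commit(self):
        # Apply all pending journal entries in order, then clear the journal.
        for key, value in self.journal:
            self.data[key] = value
        self.journal.clear()

    def replay(self):
        # Replaying at startup is the same operation as a group commit.
        self.group_commit()


store = JournaledStore()
store.write("a", 1)
store.group_commit()   # "a" reaches the data
store.write("b", 2)    # journaled, but the process "crashes" before commit

# Simulated restart: the data and the unapplied journal both survive the crash.
recovered = JournaledStore(data=store.data, journal=store.journal)
```

After the simulated restart, the recovered store holds both writes and its journal is empty, because the pending entry was replayed before any caller could read from it.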
MongoDB is an example of a NoSQL database. Unlike a relational database such as SQL Server, the basic unit of data in MongoDB is the document. Similar to a JavaScript object, a document contains a series of key-value pairs with types such as string, object, array, regular expression, and code. These documents are stored in BSON format and grouped into collections (similar to SQL Server tables) by document type. Schema design is based on determining which documents deserve their own collection and which should be embedded in other documents. Embedded documents act like member objects within a class. In a relational system, you would use one table to store an order and a second foreign-keyed table to store order items. In MongoDB, the recommended approach for the same scenario is to have a single collection of orders, where each order stores an array of order items embedded inside it.
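As a rough illustration of the embedded-order design, the sketch below models one order document as a plain Python dictionary. The field names (customer, items, sku, qty, price) and the helper order_total are assumptions chosen for the example and do not come from MongoDB itself.

```python
# One document in a hypothetical "orders" collection. The order items live
# inside the order as an embedded array instead of in a second table.
order = {
    "_id": 1001,
    "customer": "Ada",
    "items": [
        {"sku": "A-1", "qty": 2, "price": 5.0},
        {"sku": "B-7", "qty": 1, "price": 12.5},
    ],
}


def order_total(doc):
    """Sum quantity times price over the embedded order items."""
    return sum(item["qty"] * item["price"] for item in doc["items"])


total = order_total(order)
```

Because the items travel with the order, computing the total needs only the one document and no join against a separate order-items table.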
Horizontal scaling is handled via auto-sharding, which distributes each collection's data across shards in key order. Each shard is a group of machines set up as a replica set, meaning that each machine in the shard has a full copy of the shard's data. Failover within a shard happens automatically. Queries are routed to the appropriate shards automatically, so applications do not need to keep track of which shard holds which data. The new replica-set authentication feature allows automatic authentication among replica-set members using a key file and the --keyFile option.
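The routing idea can be sketched as a lookup over ordered key ranges. This is a simplified model under stated assumptions, not MongoDB's router: the split points, the shard names, and the functions shard_for and shards_for_range are all hypothetical.

```python
import bisect

# Hypothetical split points on the shard key. Keys below "g" go to shard0,
# keys from "g" up to (not including) "n" go to shard1, and so on.
SPLIT_POINTS = ["g", "n", "t"]
SHARDS = ["shard0", "shard1", "shard2", "shard3"]


def shard_for(key):
    """Return the single shard whose key range contains `key`."""
    return SHARDS[bisect.bisect_right(SPLIT_POINTS, key)]


def shards_for_range(low, high):
    """Return every shard that may hold keys between `low` and `high`."""
    first = bisect.bisect_right(SPLIT_POINTS, low)
    last = bisect.bisect_right(SPLIT_POINTS, high)
    return SHARDS[first:last + 1]


single = shard_for("mongo")
several = shards_for_range("h", "p")
```

Because the data is distributed in key order, a lookup on one key touches a single shard and a range query touches only the shards whose ranges overlap it; the application never consults the table itself.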
Covering indexes and sparse indexes are also new in this release. A covering index lets a query be answered from the data stored in the index itself, while a sparse index excludes documents that do not contain the indexed field. Covering indexes enhance performance when all fields requested by a query are contained within the index, because there is no need to pull up the full document record. Sparse indexes boost performance when searching by a field that is often missing within a collection. Currently a sparse index can have only one field.
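Both index types can be illustrated with a small in-memory model. The sketch below is an assumption-laden toy, not how MongoDB builds its indexes: build_index is a hypothetical helper, the sample documents are invented, and a plain list of entries stands in for the real index structure.

```python
# Invented sample collection; only some documents have a "nickname" field.
docs = [
    {"_id": 1, "name": "ann", "age": 30, "nickname": "a"},
    {"_id": 2, "name": "bob", "age": 25},
    {"_id": 3, "name": "cy", "age": 41, "nickname": "c"},
]


def build_index(collection, fields, sparse=False):
    """Return index entries as (key_tuple, _id) pairs.

    With sparse=True, documents lacking any indexed field get no entry.
    """
    entries = []
    for doc in collection:
        if sparse and any(field not in doc for field in fields):
            continue
        key = tuple(doc.get(field) for field in fields)
        entries.append((key, doc["_id"]))
    return entries


# Covered query: "what is bob's age?" is answered from the index entries
# alone, without reading any document from `docs`.
name_age_index = build_index(docs, ["name", "age"])
bob_ages = [key[1] for key, _id in name_age_index if key[0] == "bob"]

# Sparse index on a single field: the document without "nickname" is skipped.
nickname_index = build_index(docs, ["nickname"], sparse=True)
indexed_ids = [_id for key, _id in nickname_index]
```

The covered lookup never touches the documents, and the sparse index holds two entries instead of three, which is where the savings come from when a field is missing from most of a collection.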
Some changes also occurred in MongoDB's toolset. A discover mode (--discover) has been added to mongostat, which automatically pulls stats back from the nodes in a cluster. High-level transaction log dumping and restoring is now provided via mongodump --oplog and mongorestore --oplogReplay.
For additional information on new features in this release, see the MongoDB 1.8 webinar.