MongoDB 2.6 Release - An Interview With Kelly Stirman
MongoDB needs no introduction for NoSQL users. Kelly Stirman, Director of Product Marketing at MongoDB, answers questions about the latest stable release, 2.6. Among other updates, we finally have more information about one of the most watched and most voted-for feature requests on the MongoDB Jira tracker: collection-level locking.
In MongoDB, storage fragmentation can result in unexpected latency when an update forces the engine to move a document in BSON storage. Can you explain how the 2.6 release helps alleviate this problem?
When a document is updated in MongoDB the data is updated in-place if there is sufficient space. If the size of the updated document is greater than the allocated space, then the document is re-written in a new location. MongoDB will eventually reuse the original space, but this can take time and space may be over-allocated.
In MongoDB 2.6 the default space allocation strategy will be powerOf2Sizes, an option that has been available since MongoDB 2.2. This setting configures MongoDB to round allocation sizes up to powers of 2 (e.g., 2, 4, 8, 16, 32, 64). It reduces the chances of needing to move documents and allows space to be reused more efficiently, resulting in much less over-allocation of space and more predictable performance. Users can still use the exact-fit allocation strategy, which is more space-efficient if the size of documents does not increase.
Indexes can be a great help in both SQL and NoSQL databases, but they can make it painful to scale write performance. Can you explain how index intersection in 2.6 can reduce the number of required indexes?
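To make the rounding concrete, here is a minimal Python sketch of the idea. It is an illustration only, not MongoDB's allocator: the function names are hypothetical, and it ignores any minimum record size or special handling of very large documents that the real storage engine may apply.

```python
def power_of_2_allocation(record_bytes: int) -> int:
    """Round a record size up to the next power of two (simplified model)."""
    if record_bytes <= 0:
        raise ValueError("record size must be positive")
    size = 1
    while size < record_bytes:
        size *= 2
    return size


def fits_in_place(allocated_bytes: int, updated_bytes: int) -> bool:
    """An update can stay in place only if it still fits the allocated slot."""
    return updated_bytes <= allocated_bytes


# A 600-byte document is given a 1024-byte slot under this model.
slot = power_of_2_allocation(600)

# Growing the document to 900 bytes fits the power-of-2 slot,
# but would not fit an exact-fit slot of 600 bytes.
grows_in_place = fits_in_place(slot, 900)
grows_in_place_exact_fit = fits_in_place(600, 900)
```

The trade-off is visible here: the padded slot absorbs growth without a move, at the cost of unused space when documents never grow.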
As an example, consider a sales reporting application: A product manager wants to identify all customers who have ordered more than a given quantity of a specific part number. Using index intersection the existing indexes for part number and quantity can be combined (intersected) to optimize the query, rather than requiring a separate compound index. This also results in reduced overhead to the working set size, and more efficient updates.
Index intersection currently supports the intersection of two indexes and is best used when the cardinality of the result sets is roughly equivalent, and especially for those queries that can be resolved from covered indexes. In cases where multi-field predicates are known in advance, queries can be resolved more quickly with a compound index.
Index intersection will also improve the performance of some operations on a single index, which you might call “self-intersections.” When using operators such as $in or $all, MongoDB 2.6 may make multiple scans of the index and then intersect the results. This can significantly reduce the number of complete documents that must be returned to resolve the query.
Orphaned documents can produce incorrect results for some queries. Could you expand on how the 2.6 release can help us fix this?
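The sales reporting example can be sketched conceptually in Python. This is not how MongoDB's query planner is implemented; it only models each single-field index as a mapping from a value to a set of document ids, with hypothetical field names (`part`, `qty`) and sample data, to show why two separate indexes can answer a two-field query.

```python
from collections import defaultdict

# Hypothetical sample orders; field names are assumptions for illustration.
orders = [
    {"_id": 1, "part": "A-100", "qty": 5},
    {"_id": 2, "part": "A-100", "qty": 50},
    {"_id": 3, "part": "B-200", "qty": 75},
    {"_id": 4, "part": "A-100", "qty": 120},
    {"_id": 5, "part": "B-200", "qty": 10},
]


def build_index(docs, field):
    """Model a single-field index as value -> set of matching document ids."""
    index = defaultdict(set)
    for doc in docs:
        index[doc[field]].add(doc["_id"])
    return index


part_index = build_index(orders, "part")
qty_index = build_index(orders, "qty")


def orders_for_part_above_qty(part, min_qty):
    """Scan each index separately, then intersect the two id sets."""
    by_part = part_index.get(part, set())
    by_qty = set()
    for qty, ids in qty_index.items():
        if qty > min_qty:
            by_qty |= ids
    return sorted(by_part & by_qty)


matches = orders_for_part_above_qty("A-100", 40)  # ids 2 and 4
```

Only the documents whose ids survive the intersection need to be fetched, which is where the saving over scanning one index and filtering the rest comes from.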
Under normal circumstances, there will be no orphaned documents in your system. However, in some failure scenarios during chunk migrations, orphaned documents may be left behind. The presence of orphaned documents can produce incorrect results for some queries. While orphaned documents are safe to delete, in versions prior to 2.6 there was no simple way to do so. In MongoDB 2.6 we implemented a new administrative command for sharded clusters: cleanupOrphaned(). This command removes orphaned documents from the shard in a single range of data. There is a nice blog post from one of our support engineers on this subject.
MongoDB is becoming more and more common in the enterprise. How is MongoDB positioned in the NoSQL ecosystem with regard to enterprise adoption, and what are the key features that 2.6 will improve upon?
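Because each call cleans a single range, covering a whole collection means calling the command repeatedly. The sketch below follows the pattern of feeding each result's `stoppedAtKey` back in as `startingFromKey`. The helper name and the `run_admin_command` callable are hypothetical; the callable stands in for whatever driver call sends an admin command to the shard's primary, so the loop itself can be read and tested without a running cluster.

```python
from typing import Any, Callable, Dict, Optional


def cleanup_all_orphans(
    run_admin_command: Callable[[Dict[str, Any]], Dict[str, Any]],
    namespace: str,
) -> int:
    """Call cleanupOrphaned one range at a time; return the number of calls."""
    next_key: Optional[Dict[str, Any]] = {}  # an empty key starts at the lowest range
    calls = 0
    while next_key is not None:
        command = {"cleanupOrphaned": namespace, "startingFromKey": next_key}
        result = run_admin_command(command)
        if not result.get("ok"):
            raise RuntimeError(f"cleanupOrphaned failed: {result}")
        calls += 1
        # No stoppedAtKey in the reply means there are no further ranges.
        next_key = result.get("stoppedAtKey")
    return calls
```

With PyMongo, one plausible way to supply the callable is `lambda cmd: client.admin.command(cmd)`, where `client` is assumed to be connected directly to the shard's primary rather than to a mongos.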
MongoDB is widely used across many organizations, including 30 of the Fortune 100. We see organizations looking to standardize on a few database systems, and many are choosing MongoDB because it can be used for a wide variety of applications due to its flexible data model, rich indexing, scalability, and how it elevates the productivity of their development teams.
MongoDB 2.6 provides a number of security enhancements which are critical to the enterprise. These features include LDAP, x.509 and Kerberos authentication, SSL encryption, user-defined roles, auditing, and field-level security. IBM Guardium also offers integration with MongoDB, providing more extensive auditing abilities.
Another important trend related to the enterprise is the size of our ecosystem, which now includes over 400 partners. A number of these partners provide integrations to MongoDB, including Informatica, MicroStrategy, QlikTech, Pentaho, Talend and many others.
Full text search has been a massively requested feature, and even though 2.4 had an experimental implementation, committed by none other than Eliot himself, it is in 2.6 that the $text operator is added. How mature is full text search in 2.6, and how does it fare against the competition among NoSQL databases?
We worked closely with the community over the past year to test text search and to integrate its capabilities into other features. Moving out of beta, text search is now production-ready in MongoDB 2.6, and offers new functionality, including:
Integration with MongoDB’s query engine, so text search can be combined with general query operators to provide richer queries with the ability to limit, skip, sort and filter results. For example, a user could search a collection of blog posts for certain phrases, but limit the search to posts from the last seven days using an additional condition.
Multi-language document support.
Text search expressions in the Aggregation Framework, providing deeper analytics with counting and grouping of text matches.
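The blog post example might look roughly like the following filter, built here as a plain Python dictionary. The `published_at` field name, the helper function, and the projection constant are assumptions for illustration; `$text`, `$search`, `$gte` and the `textScore` metadata are the query-language pieces being combined.

```python
from datetime import datetime, timedelta


def recent_posts_filter(phrase: str, now: datetime, days: int = 7) -> dict:
    """Combine a phrase text search with an ordinary date-range condition."""
    cutoff = now - timedelta(days=days)
    return {
        # Wrapping the phrase in double quotes asks for an exact phrase match.
        "$text": {"$search": f'"{phrase}"'},
        # Hypothetical date field, assumed to be stored on each post.
        "published_at": {"$gte": cutoff},
    }


# Projection that exposes the relevance score of each text match.
TEXT_SCORE_PROJECTION = {"score": {"$meta": "textScore"}}

query = recent_posts_filter("index intersection", datetime(2014, 4, 8))
```

With a driver such as PyMongo, this filter and projection would typically be passed to `find` on a collection that has a text index; results could then be limited, skipped or sorted like those of any other query.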
Other NoSQL vendors provide integrations to separate, dedicated search engines such as Solr and Elasticsearch. This approach adds complexity and cost to deployments and requires additional skills, and the search indexes are inconsistent with the underlying data. We believe native text search offers easier deployments, lower costs, and the ability to leverage existing skills, while maintaining consistency with the underlying data. However, there are some features offered by dedicated search engines that are not available in MongoDB, and we provide integration options to these products similar to those of other NoSQL vendors. Users can choose either option with MongoDB.
Probably the most requested feature is more fine-grained locking. What is your roadmap for going deeper than database level? What are the major blockers for going deeper than collection level locking?
It is important to remember that locks in MongoDB are much closer to latches in an RDBMS – they are very simple and usually held for 10 microseconds or less. The more advanced lock yielding algorithm introduced in MongoDB 2.2 significantly reduced the number of issues we see in the community related to lock contention. However, we realize there is still an opportunity to improve concurrency, including more granular locks.
MongoDB 2.8 will have document-level locking. We believe this will provide a more significant improvement to concurrency than collection-level locking, for a wider variety of applications. But finer-grained locking is only one part of improving concurrency, and there are other areas of the database that will be improved to provide greater concurrency overall. Some improvements exist in MongoDB 2.6 (see below), with much more to come in MongoDB 2.8.
Will the 2.6 release help with locking issues in other ways?
Yes, a number of enhancements will contribute to improved concurrency. For example, index intersection reduces the number of indexes necessary for many applications, improving the scalability of writes. Much of the work we used to do inside locks is now performed outside the lock, such as parsing and _id generation. We’ve done a lot of work to improve oplog write concurrency, both by making each write faster and by changing the locking around how this works. This work improves concurrency in 2.6, and it was required before more granular locking would be beneficial; otherwise these things would have immediately become the next bottleneck.