Microsoft Introduces NoSQL Document Database for Microsoft Azure
DocumentDB is a schema-less, SSD-backed database that stores JSON documents and exposes a REST API. According to Microsoft, this Microsoft Azure-only service meets a new class of needs without abandoning some useful relational database concepts.
"We heard from customers that they need a database that can keep pace with their rapidly evolving applications – something fast, flexible and scalable. Increasingly NoSQL databases are becoming the tool of choice for many developers but running and managing these databases can be costly, especially at scale. We also heard that customers wanted more of the capabilities inherent to relational database systems – rich queries and transactional processing are still important. Most data stores offer extreme choices to developers – strong or eventual consistency, schema-free with limited query capabilities or schematized and rich queries capabilities, transactions or scale and so on. The fact is that numerous real world scenarios exist between these extremes and we want to address them."
DocumentDB offers four distinct consistency levels for reads and queries - Strong, Bounded Staleness, Session, and Eventual. These well-defined consistency levels allow you to make sound tradeoffs between consistency, availability and latency. Bounded staleness guarantees both total ordering of writes as well as maximum staleness, a consistency level that is useful for applications dealing with time and ordered operations. Session consistency provides read your own write guarantees and can be a good match for user centric apps.
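To make the difference between the Eventual and Session levels concrete, here is a minimal toy sketch in Python. It is not the DocumentDB API: the `ToyReplicatedStore` class, its single lagging replica, and the integer session token are all assumptions invented for illustration. It shows only the read-your-own-writes idea, where a session read refuses to use a replica that has not yet seen that session's latest write.

```python
class ToyReplicatedStore:
    """Toy model with one primary and one lagging replica (hypothetical, not DocumentDB)."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.primary_version = 0   # bumped on every write
        self.replica_version = 0   # last primary version the replica has copied

    def write(self, key, value):
        # Writes always go to the primary; the returned version acts as a session token.
        self.primary_version += 1
        self.primary[key] = value
        return self.primary_version

    def replicate(self):
        # Simulates the replica catching up with the primary.
        self.replica = dict(self.primary)
        self.replica_version = self.primary_version

    def read_eventual(self, key):
        # Eventual: read the replica as-is, even if it is stale.
        return self.replica.get(key)

    def read_session(self, key, session_token):
        # Session: use the replica only if it has seen this session's writes.
        if self.replica_version >= session_token:
            return self.replica.get(key)
        return self.primary.get(key)


store = ToyReplicatedStore()
token = store.write("cart", ["book"])
stale = store.read_eventual("cart")         # replica has not caught up yet
fresh = store.read_session("cart", token)   # the session sees its own write
```

In this toy, the eventual read can miss the write until `replicate()` runs, while the session read always reflects it. A real service would enforce the same guarantee with far more machinery.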
The document indexing and transaction features also echo capabilities found in a relational database. With regard to indexing:
As you add documents to a collection, DocumentDB automatically indexes them and they are available for you to query. Automatic indexing of documents without requiring schema or secondary indexes is a key capability of DocumentDB and is enabled by write-optimized, lock-free and log-structured index maintenance techniques.
There is a range of options when configuring DocumentDB indexes. While automatic indexing is enabled by default, it’s possible to turn off this behavior or fine-tune it. Certain documents can be included or excluded from the index based on path or pattern, and index updates run either synchronously or asynchronously.
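The idea of indexing every path automatically while excluding some of them can be sketched in a few lines of Python. This is a toy inverted index, not DocumentDB's indexing engine or its policy format: the helper names `iter_paths` and `build_index`, the `/[]` notation for array elements, and the sample documents are all hypothetical.

```python
def iter_paths(doc, prefix=""):
    """Yield (path, value) pairs for every scalar leaf in a nested document."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from iter_paths(value, f"{prefix}/{key}")
    elif isinstance(doc, list):
        for item in doc:
            yield from iter_paths(item, f"{prefix}/[]")
    else:
        yield prefix, doc


def build_index(docs, excluded_prefixes=()):
    """Map (path, value) to the set of document ids, skipping excluded paths."""
    index = {}
    for doc_id, doc in docs.items():
        for path, value in iter_paths(doc):
            # Skip a path if it equals an excluded prefix or lies beneath it.
            if any(path == p or path.startswith(p + "/") for p in excluded_prefixes):
                continue
            index.setdefault((path, value), set()).add(doc_id)
    return index


# Hypothetical sample data: no schema is declared anywhere.
docs = {
    "a": {"name": "Ann", "tags": ["x", "y"], "audit": {"ip": "10.0.0.1"}},
    "b": {"name": "Bob", "tags": ["y"], "audit": {"ip": "10.0.0.2"}},
}
index = build_index(docs, excluded_prefixes=("/audit",))
tagged_y = index[("/tags/[]", "y")]   # ids of documents tagged "y"
```

Every leaf becomes queryable without a declared schema, and anything under `/audit` stays out of the index. The sketch does not model the synchronous versus asynchronous update choice.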
The service also delivers a handful of choices for applying server-side business logic. DocumentDB supports user-defined functions, stored procedures, and triggers.
Note that DocumentDB does not require any special JSON conventions to codify the relationships among various documents; the SQL query language of DocumentDB provides very powerful hierarchical and relational query operators to query and project documents without any special annotations or need to codify relationships among documents using distinguished properties.
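As an illustration of what a hierarchical query over unannotated JSON does, the sketch below pairs each parent document with its nested children and filters on a child property. It is plain Python over hypothetical sample data, not a DocumentDB client call. The comparable DocumentDB query would look roughly like `SELECT f.id, c.name FROM families f JOIN c IN f.children WHERE c.grade > 4`, but that exact text is an assumption and is not taken from the announcement.

```python
# Hypothetical documents: children are nested directly, with no foreign keys
# or special properties to mark the relationship.
families = [
    {"id": "Andersen", "children": [{"name": "Jesse", "grade": 1},
                                    {"name": "Lisa", "grade": 8}]},
    {"id": "Wakefield", "children": [{"name": "Henriette", "grade": 5}]},
    {"id": "Smith"},  # documents without the nested array are simply skipped
]


def children_above_grade(docs, min_grade):
    """Pair each parent with each of its nested children, then filter and project."""
    return [
        {"family": doc["id"], "child": child["name"]}
        for doc in docs
        for child in doc.get("children", [])
        if child["grade"] > min_grade
    ]


rows = children_above_grade(families, 4)
```

The relationship between a family and its children comes entirely from the nesting, which is the point of the passage above.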
But how does this database perform? GigaOm reported that DocumentDB uses the Hekaton in-memory engine from SQL Server 2014, and Microsoft Vice President Scott Guthrie revealed that this technology is already being heavily used at Microsoft.
"Over the last year, we have used DocumentDB internally within Microsoft for several high-profile services. We now have DocumentDB databases that are each 100s of TBs in size, each processing millions of complex DocumentDB queries per day, with predictable performance of low single digit ms latency."
Where does this leave the other NoSQL offerings on Microsoft Azure? The company still offers its NoSQL key-value store, Azure Table Storage, and it recently launched a caching service based on Redis. What this means for MongoDB, which recently added a managed service on top of Microsoft Azure, remains to be seen. One Windows-based NoSQL provider has come out with sharp criticism of DocumentDB. Oren Eini, the creator of RavenDB, was underwhelmed by the product and listed his grievances in a blog post:
- No sorting option, or a good paging story
- SQL Injection, without any other alternative
- Hard to deploy and to keep current with your codebase
- Poor development story & no testing story
- Poor client API
- Lots of table scans
- Limited queries and few optimization options
- Single document transactions (from the client)
- No cross collection transactions at all
- Very small document sizes allowed