Microsoft Introduces NoSQL Document Database for Microsoft Azure

Microsoft may be synonymous with relational databases thanks to its flagship SQL Server product, but a new NoSQL offering looks to change that. Last week, Microsoft announced the preview of DocumentDB, a cloud-hosted managed document database with deep JavaScript support and features like automatic indexing and transactions.

DocumentDB is a schema-less, SSD-backed database that stores JSON documents and exposes a REST API. According to Microsoft, this Microsoft Azure-only service satisfies a new class of need while not abandoning some useful relational database concepts.

We heard from customers that they need a database that can keep pace with their rapidly evolving applications – something fast, flexible and scalable. Increasingly NoSQL databases are becoming the tool of choice for many developers but running and managing these databases can be costly, especially at scale. We also heard that customers wanted more of the capabilities inherent to relational database systems – rich queries and transactional processing are still important. Most data stores offer extreme choices to developers – strong or eventual consistency, schema-free with limited query capabilities or schematized and rich queries capabilities, transactions or scale and so on. The fact is that numerous real world scenarios exist between these extremes and we want to address them.

How does DocumentDB resemble traditional relational databases? Documents are queried using a lightweight SQL syntax that recognizes native JavaScript constructs like objects and arrays. Unlike many popular NoSQL databases that rely solely on eventually consistent data storage, DocumentDB exposes four different consistency models.

DocumentDB offers four distinct consistency levels for reads and queries - Strong, Bounded Staleness, Session, and Eventual. These well-defined consistency levels allow you to make sound tradeoffs between consistency, availability and latency. Bounded staleness guarantees both total ordering of writes as well as maximum staleness, a consistency level that is useful for applications dealing with time and ordered operations. Session consistency provides read your own write guarantees and can be a good match for user centric apps.
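The difference between the four levels is easier to see in a toy model. The following Python sketch is not DocumentDB's implementation or API; `ToyStore`, its single lagging secondary, and the use of a version number as a session token are all assumptions made for illustration. It only shows which replica each level would be allowed to read from.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """A copy of the data that has applied writes up to `applied_version`."""
    applied_version: int = 0
    data: dict = field(default_factory=dict)


class ToyStore:
    """Hypothetical store with one primary and one lagging secondary."""

    def __init__(self):
        self.log = []                # totally ordered list of (key, value) writes
        self.primary = Replica()
        self.secondary = Replica()   # only advances when catch_up() is called

    def write(self, key, value):
        self.log.append((key, value))
        self.primary.data[key] = value
        self.primary.applied_version = len(self.log)
        # The returned version plays the role of a session token (assumption).
        return self.primary.applied_version

    def catch_up(self, upto=None):
        """Replay log entries on the secondary, in order, up to `upto`."""
        upto = len(self.log) if upto is None else upto
        for key, value in self.log[self.secondary.applied_version:upto]:
            self.secondary.data[key] = value
        self.secondary.applied_version = max(self.secondary.applied_version, upto)

    def read(self, key, level, session_token=0, max_lag=0):
        lag = self.primary.applied_version - self.secondary.applied_version
        if level == "strong":
            replica = self.primary
        elif level == "bounded_staleness":
            # The secondary is acceptable only if it is at most `max_lag` writes behind.
            replica = self.secondary if lag <= max_lag else self.primary
        elif level == "session":
            # Read-your-own-writes: the replica must have seen this session's last write.
            caught_up = self.secondary.applied_version >= session_token
            replica = self.secondary if caught_up else self.primary
        elif level == "eventual":
            replica = self.secondary
        else:
            raise ValueError(f"unknown consistency level: {level}")
        return replica.data.get(key)


store = ToyStore()
token = store.write("cart", ["book"])
print(store.read("cart", "eventual"))                      # None: secondary has not caught up
print(store.read("cart", "session", session_token=token))  # ['book']
```

In this model, weaker levels may be served by the lagging copy, which is where the tradeoff between consistency, availability, and latency comes from.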

The document indexing and transaction features also echo capabilities found in a relational database. On indexing, Microsoft says:

As you add documents to a collection, DocumentDB automatically indexes them and they are available for you to query. Automatic indexing of documents without requiring schema or secondary indexes is a key capability of DocumentDB and is enabled by write-optimized, lock-free and log-structured index maintenance techniques.

There is a range of options when configuring DocumentDB indexes. While automatic indexing is enabled by default, it’s possible to turn off this behavior or fine-tune it. Certain documents can be included or excluded from the index based on path or pattern, and index updates run either synchronously or asynchronously.
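To make path-based inclusion and exclusion concrete, here is a minimal Python sketch of schema-free indexing. It is an illustration under stated assumptions, not DocumentDB's index format or policy syntax: the `/key/subkey` path notation, the shell-style wildcard patterns, and the function names `flatten_paths` and `build_index` are all hypothetical.

```python
from fnmatch import fnmatchcase


def flatten_paths(doc, prefix=""):
    """Yield (path, value) for every scalar leaf in a JSON-like document."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            yield from flatten_paths(value, f"{prefix}/{key}")
    elif isinstance(doc, list):
        for position, value in enumerate(doc):
            yield from flatten_paths(value, f"{prefix}/{position}")
    else:
        yield prefix, doc


def build_index(docs, include=("/*",), exclude=()):
    """Map (path, value) to the set of document ids that contain it.

    No schema is declared up front: every leaf of every document is
    indexed unless its path is filtered out by the (hypothetical)
    include/exclude wildcard patterns.
    """
    index = {}
    for doc_id, doc in docs.items():
        for path, value in flatten_paths(doc):
            if not any(fnmatchcase(path, pattern) for pattern in include):
                continue
            if any(fnmatchcase(path, pattern) for pattern in exclude):
                continue
            index.setdefault((path, value), set()).add(doc_id)
    return index


docs = {
    "d1": {"name": "Ada", "address": {"city": "Seattle"}, "tags": ["a", "b"]},
    "d2": {"name": "Bob", "address": {"city": "Seattle"}, "notes": {"internal": "x"}},
}
index = build_index(docs, exclude=("/notes/*",))
print(sorted(index[("/address/city", "Seattle")]))  # ['d1', 'd2']
```

A lookup on `("/address/city", "Seattle")` then finds both documents without scanning them, while anything under `/notes` is never indexed.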

The service also delivers a handful of choices for applying server-side business logic. DocumentDB supports user-defined functions, stored procedures, and triggers.

By virtue of its deep commitment to JavaScript and JSON directly within the database engine, DocumentDB provides an intuitive programming model for executing JavaScript based application logic directly on the collections in terms of stored procedures and triggers. This allows for both, (a) efficient implementation of concurrency control, recovery, automatic indexing of the JSON object graphs directly in the database engine as well as, (b) naturally expressing control flow, variable scoping, assignment and integration of exception handling primitives with database transactions directly in terms of the JavaScript programming language.

…

DocumentDB implicitly wraps the JavaScript based stored procedures and triggers within an ambient ACID transactions with snapshot isolation across documents within a collection. During the course of its execution, if the JavaScript throws an exception, then the entire transaction is aborted.

…

Note that DocumentDB does not require any special JSON conventions to codify the relationships among various documents; the SQL query language of DocumentDB provides very powerful hierarchical and relational query operators to query and project documents without any special annotations or need to codify relationships among documents using distinguished properties.
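The abort-on-exception behavior described in the quoted passage can be pictured with a small Python sketch. This is a simplified stand-in, not the service's mechanism: DocumentDB stored procedures are written in JavaScript and run inside the database engine, whereas `ToyCollection`, `run_in_transaction`, and `transfer` below are hypothetical names, and copying the whole collection is only a crude substitute for snapshot isolation.

```python
import copy


class ToyCollection:
    """Hypothetical in-memory collection keyed by document id."""

    def __init__(self, documents=None):
        self.documents = documents or {}

    def run_in_transaction(self, procedure, *args):
        # The procedure works on a private snapshot of the collection.
        working_copy = copy.deepcopy(self.documents)
        # If the procedure raises, the exception propagates to the caller
        # and the snapshot is discarded, so no partial update is visible.
        result = procedure(working_copy, *args)
        # Commit: all changes become visible at once.
        self.documents = working_copy
        return result


def transfer(docs, source, target, amount):
    """Multi-document update that may fail after a partial change."""
    docs[source]["balance"] -= amount
    if docs[source]["balance"] < 0:
        raise ValueError("insufficient funds")
    docs[target]["balance"] += amount
    return docs[source]["balance"]


accounts = ToyCollection({"a": {"balance": 100}, "b": {"balance": 0}})
print(accounts.run_in_transaction(transfer, "a", "b", 30))  # 70
```

The debit in `transfer` deliberately happens before the balance check, so a failed call shows that the partial change never reaches the committed documents.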

But how does this database perform? GigaOm reported that DocumentDB uses the Hekaton in-memory engine from SQL Server 2014, and Microsoft Vice President Scott Guthrie revealed that this technology is already being heavily used at Microsoft.

Over the last year, we have used DocumentDB internally within Microsoft for several high-profile services. We now have DocumentDB databases that are each 100s of TBs in size, each processing millions of complex DocumentDB queries per day, with predictable performance of low single digit ms latency.

Where does this leave the other NoSQL offerings on Microsoft Azure? The company still offers its NoSQL key-value store, Azure Table Storage, and it recently launched a caching service based on Redis. What this means for MongoDB, which recently added a managed service on top of Microsoft Azure, remains to be seen. One Windows-based NoSQL provider has come out with sharp criticism of DocumentDB. Oren Eini, the creator of RavenDB, was underwhelmed by the product and listed his grievances in a blog post:

No sorting option, or a good paging story

SQL Injection, without any other alternative

Hard to deploy and to keep current with your codebase

Poor development story & no testing story

Poor client API

Lots of table scans

Limited queries and few optimization options

Single document transactions (from the client)

No cross collection transactions at all

Very small document sizes allowed

Although DocumentDB was released only as a “preview”, Microsoft still shipped SDKs for .NET, Node.js, JavaScript and Python. Pricing is based on “capacity units”, each of which provides 2,000 reads per second, 500 insert/replace/delete operations per second, and 10GB of storage. During the preview, a capacity unit costs $0.73 per day, which represents a 50% discount off the expected commercial price. DocumentDB is currently available in Microsoft Azure’s US West, Europe North, and Europe West regions.
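The pricing figures above imply a few numbers that are not spelled out. The Python sketch below works them out from the quoted values only. The 30-day month and the assumption that capacity scales linearly with the number of units are mine, not Microsoft's, and the function names are hypothetical.

```python
# Figures quoted in the article for one capacity unit during the preview.
PREVIEW_PRICE_PER_UNIT_PER_DAY = 0.73   # US dollars
PREVIEW_DISCOUNT = 0.50                 # 50% off the expected commercial price
READS_PER_SECOND_PER_UNIT = 2000
WRITES_PER_SECOND_PER_UNIT = 500        # insert/replace/delete operations
STORAGE_GB_PER_UNIT = 10


def expected_commercial_price_per_day():
    """Undo the preview discount: preview = commercial * (1 - discount)."""
    return round(PREVIEW_PRICE_PER_UNIT_PER_DAY / (1 - PREVIEW_DISCOUNT), 2)


def preview_cost(units, days=30):
    """Total preview cost; the 30-day default is an assumption."""
    return round(units * days * PREVIEW_PRICE_PER_UNIT_PER_DAY, 2)


def capacity(units):
    """Aggregate capacity, assuming it scales linearly with unit count."""
    return {
        "reads_per_second": units * READS_PER_SECOND_PER_UNIT,
        "writes_per_second": units * WRITES_PER_SECOND_PER_UNIT,
        "storage_gb": units * STORAGE_GB_PER_UNIT,
    }


print(expected_commercial_price_per_day())  # 1.46
print(preview_cost(units=3, days=30))       # 65.7
```

On these assumptions, one capacity unit costs about $21.90 for a 30-day month during the preview and would cost roughly twice that at the expected commercial price.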
