BT

Exploring the Architecture of the NuoDB Database, Part 2

Posted by Seth Proctor on Jul 17, 2013 |

In Part 1 of this article we have introduced NuoDB and covered some of its main features: 3-tiered architecture, nodes are equal peers, Atoms - the fundamental data unit, and the versioning and concurrency system used to handle data update conflicts and implement consistency. 

Transactions in flight

Now that you’ve learned about the overall model, the internal structure and versioning, let’s walk through a few simple examples to help illustrate how it all works in practice. To keep things simple, consider a simple database with two TEs and one SM. From there it should be pretty simple to extrapolate to a more complicated configuration.

First off, consider how atoms get pulled into both TE caches. Suppose an Atom exists in the archive but isn’t currently in any cache. When a client connects to the first TE and starts a transaction that requires this Atom, the TE will use its catalog structure to discover that there’s no cached copy and go to the SM to fetch the object. At this point the catalog structure reflects that the Atom is in the first TE’s cache, and the TE assumes Chairmanship for the related data.

When a transaction starts on the second TE that needs data in the same Atom, the catalog shows that the Atom is available at both the SM and the first TE. Now the second TE is free to choose where to fetch the Atom but in practice is going to request it from the first TE because it will be faster to get it from an in-memory copy. At this point, the catalog is updated to show the Atom is in two caches.

Next, let’s look at what happens when there’s a change to data within that Atom. For instance, assume an update is made to a row contained in this Atom’s data. To make the change, the TE running this transaction needs to send two messages: a pending update to all peers that have a copy of the data (in this case the other TE and the SM) and a request to the Chairman for permission to make this change. If the update is happening on the first TE, then the second message is short-circuited because the local TE is the Chairman. Both messages are asynchronous so the transaction can proceed. If there’s no conflict, the Chairman will eventually respond that the update is allowed.

If two transactions are both trying to update the same data, then there’s conflict. Whichever transaction gets to the Chairman first with their request will “win.” The other transaction will eventually get a response that their update was denied. At this point the losing transaction will have to roll-back and try again.

Earlier I mentioned that NuoDB supports a flexible commit protocol. Here’s where it comes into play. To acknowledge back to a client that a commit succeeded, a TE must always ensure the A,C and I elements of ACID. Where it has flexibility is around Durability.

In the above example, update messages were sent asynchronously to all peers that have a copy of the object, which included the SM. While the message is sent over a reliable channel there’s no way to know that the SM survived long enough to make the data durable unless we wait to hear an acknowledgement. As part of the database’s configuration you can choose whether or not a TE must wait for that ack before reporting that commit succeeded. Because you can optionally run with journaling on, you actually have several versions of what you need to hear back from an SM.

So commit could mean that messages were sent to all peers, or that an SM responded that the change was journaled or written to the durable archive. Commit can also mean that at least some minimal number of SMs responded. This gives you flexibility in deciding how strong you want your durability guarantees in the face of failure versus how fast you want your distributed database to run.

The Administrative Layer

I said earlier that NuoDB has an administrative tier. This is what supports the on-demand scale-out and migration models that we’ve already walked through. Each host system has a local management agent running, and the agents form a peer-to-peer network separate from the database. We call this collection of connected hosts a Domain.

The single, logical management view through a Domain makes it easy to monitor, manage and automate database activity. Each of the agents is responsible for their local host, and a few will also run in the role of Broker which gives them global view into the Domain as a whole. All Brokers are equal in knowledge and ability, so as long as you have more than one running you always have access to administrative and automation interfaces. Through Brokers you can start, stop or migrate databases, change your configuration, monitor and capture logs from TEs and SMs and ensure that everything is running as expected.

Because a database is just a collection of running processes, it’s easy to run more than one database on a single host. In other words, the management layer supports multi-tenant deployments where each tenant is actually using their own database that is isolated at the process level, protected with its own secured channels and stores data in its own archives. How many databases can we put on a single system? We wrote all about it on our blog in the context of supporting dense deployments.

The Brokers are also what SQL clients use to find TEs when they’re connecting to a database. This lets the Brokers act as simple load-balancers that are aware of what’s happening in the Domain and can make smart decisions about where to connect clients.

Putting it All Together

Given all these features and coming back to my original comments about being cloud-scale, I think there are a few use-cases that NuoDB is uniquely suited to handle. First, not only does it support automated, on-demand scale-out but this same ability provides high-availability (HA). Not HA by pre-provisioning fail-over spares, but HA by being able to monitor for failures and very quickly bring new resources online. This kind of reactive HA means you don’t have to over-provision to stay ahead of failure but you still get good availability in the face of failure.

Second, NuoDB can provide a single, logical database view for databases that scale across data centers and geographically distributed regions. Obviously this helps with HA but it’s also critical for any application where users are physically separated and you want to get their applications and data closer to them. NuoDB supports geographic distribution because data tends to be localized relative to the clients that access it, so between simple load-balancing, on-demand caching and asynchronous replication between regions, the single database still keeps most coordination messages local to a given region.

Third, because MVCC gives NuoDB an efficient snapshot isolation mode, it’s very good at supporting operational and analytic workloads against the same data. In other words, long-running read-only queries can run without conflict against data that is frequently being updated. This helps simplify deployments that would otherwise be running multiple databases while trying to synchronize the data set.

Finally, NuoDB is built for automation. Whether it’s on-demand scale-out or efficiency through multi-tenancy, we treat databases as services that should be driven from simple SLAs and policies. That’s the direction that NuoDB development is headed.

What’s Next for NuoDB?

What I’ve tried to illustrate in this article is that NuoDB represents a new architecture, designed from the start to support scale-out, agile deployment and automation. It’s easy to provision new hosts and expand capacity, to automate expansion when there are load spikes or failures and scale databases across data centers. Coming back to where I started this article, I think that qualifies NuoDB as a cloud-scale database.

What we’re working on now is making that automation even easier. Through simple SLAs you’ll be able to define exactly what you need from a database. Whether you’re trying to do development on your laptop, manage internal databases for your company or stand up a service offering on the Internet, this will make working with NuoDB as a service simple.

At the same time we’re expanding what you can do with a database running while geographically distributed. This will give you flexibility in the face of failure and stronger guarantees about data availability. We’re also fleshing out the server-side programming model, parallelism at the storage layer and our APIs both for SQL and administrative clients. I hope you’ll give us a try, and let me know what you think!

About the Author

Seth Proctor is CTO of NuoDB, Inc. and has 15+ years of experience in the research, design and implementation of scalable systems. That experience includes work on networks, languages, operating systems, security, databases, and distributed environments; all of which are integral to the NuoDB product. Prior to NuoDB, Seth worked at Nokia on its internal private cloud architecture. Before that he was at Sun Microsystems working in the research labs and collaborating with product groups and universities. 

Seth holds eight patents for his cutting edge work across several technical disciplines. He has an additional five patents awaiting approval, all related to achieving greater database efficiency and end-user agility in NuoDB decentralized database deployments. You can contact him here.

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Tell us what you think

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

NuoDB auto-management & adminstration by Michael Waclawiczek

Seth just posted an article on this topic at dev.nuodb.com/techblog/12-preview-automation.

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

1 Discuss

Educational Content

General Feedback
Bugs
Advertising
Editorial
InfoQ.com and all content copyright © 2006-2014 C4Media Inc. InfoQ.com hosted at Contegix, the best ISP we've ever worked with.
Privacy policy
BT