Traditionally we know that:
- SQL databases are limited to a single machine but provide strong transactions, schemas, and querying;
- NoSQL databases abandon transactions and schemas in order to provide scale and fault tolerance.
FoundationDB's SQL Layer combines both: An open-source SQL database that is linearly scalable, fault tolerant, and has true ACID transactions. What used to be opposing sides of a divide are now available as a unified system.
This is important for many companies that have:
- New applications planning for massive scale;
- Existing applications hitting database scale;
- Many applications that should be consolidated to work against a single, fault-tolerant, virtualized database substrate.
In this article, I'll describe FoundationDB and explain how the FoundationDB SQL Layer maps SQL data to the FoundationDB Key-Value Store back end.
The NoSQL Database: FoundationDB’s Key-Value Store
FoundationDB is a distributed key-value store with global ACID transactions and high performance. In the setup of your system, you can define the level of data replication, which provides fault tolerance: when a server or part of the network breaks down, the database will keep operating and so will your application.
The Key-Value + SQL Architecture
We developed the architecture to support multiple layers on top of the Key-Value Store. Each of these layers can provide a different data model, such as SQL, Document, or Graph against the same FoundationDB back end. Many users also build their own custom layers.
The architecture diagram below highlights different key aspects. At the bottom is the FoundationDB cluster. It operates as a single logical database, regardless of the size of the cluster. The SQL Layer runs as a stateless intermediary on top of Key-Value Store. It communicates with the application using SQL, and uses the FoundationDB client API to communicate with the Key-Value Store. Because the SQL Layer is stateless, any number of SQL Layers can run in parallel.
The SQL Layer brings Google F1 capabilities to the Key-Value Store
The SQL Layer is a sophisticated translation layer between SQL and the key-value API. Starting with a SQL statement, it transforms it to the most efficient key-value execution, much as a compiler translates code to a lower-level execution format. It is compliant with the ANSI SQL 92 standard. Developers can leverage the product in combination with ORM’s, a REST API, or access it directly using the SQL Layer command line interface. From a codebase point of view, the SQL Layer is completely separated from the Key-Value Store. It communicates with the Key-Value Store using the FoundationDB Java bindings. For those interested, the open-source code is available on FoundationDB’s SQL Layer GitHub repo. Currently, the only comparable system is Google’s F1, a SQL engine build on top of their Spanner technology.
The SQL Layer is built from a set of components as shown in the simplified diagram below. Queries are sent from applications using one of the supported SQL clients. The statements are parsed and transformed into a tree of planning nodes. The optimizer computes the best execution plan and expresses it as a tree of operators that will be executed by the execution framework. In the execution phase, requests for data are sent to the storage abstraction layer, which uses the Java Key-Value API to transfer data to and from the FoundationDB cluster. The database modeled is stored in the Information Schema, which is used by the different components.
Mapping SQL Data to a Key-Value Store
The SQL Layer needs to manage two types of data, namely metadata of the information schema, which describes the tables present and the indexes available. Secondly, it needs to store actual data, including table content, indexes, and sequences. We will start with a description on how the data is stored into the Key-Value Store.
At a basic level, each key is a pointer to unique row in a table, and the value contains the data for that row. The assignment of keys is determined using the construct of a Table-Group, which is a grouping of one or more tables. This concept will later be discussed in more detail. The SQL Layer creates a directory for every Table-Group using the Key-Value Store Directory Layer, a tool to manage the keyspace for users. The Directory Layer assigns a short byte-array as a unique key for every individual directory, while maintaining separate metadata to facilitate look-up by name.
The example below demonstrates the mapping of the directories, with examples of keys that could be assigned for the given statements:
CREATE TABLE schema_a.table1(id INT PRIMARY KEY, c CHAR(10)); CREATE TABLE schema_a.table2(id INT PRIMARY KEY);
Some of the directories that are defined in the Key-Value Store are:
Directory |
Tuple |
Raw Key |
sql/ |
(9) |
\x15\x09 |
sql/data/ |
(3) |
\x15\x03 |
sql/data/table/ |
(31) |
\x15\x1F |
sql/data/table/schema_a/table1/ |
(215) |
\x15\xD7 |
sql/data/table/schema_a/table2/ |
(247) |
\x15\xF7 |
To store data, three types of formats are available, ‘Tuple’, Row_Data’, and ‘Protobuf’. When using the default ‘Tuple’ storage format, a row is stored as a single key-value pair. The key is a tuple formed by concatenating the directory prefix, the position of a table within the Table-Group, and the primary key. The value is a tuple formed from all of the columns in the row.
For example, the following inserts to the tables from the previous example would yield the given keys and values:
INSERT INTO schema_a.table1 VALUES (1, 'hello'), (2, 'world'); INSERT INTO schema_a.table2 VALUES (5);
Raw Key |
Tuple Key |
Raw Value |
Tuple Value |
\x15\xD7\x15\x01\x15\x01 |
(215, 1, 1) |
\x15\x01\x02hello\x00 |
(1, 'hello') |
\x15\xD7\x15\x01\x15\x02 |
(215, 1, 2) |
\x15\x02\x02world\x00 |
(2, 'world') |
\x15\xF7\x15\x01\x15\x05 |
(247, 1, 5) |
\x15\x05 |
(5) |
Knowing the structure of the keys in the Key-Value Store, you can read data from the store directly. This will be demonstrated using the FoundationDB Python API. In the SQL Layer, keys and values are encoded with the '.pack()' method, and can be decoded with the '.unpack()' method. The example below shows how you can retrieve and decode data.
import fdb fdb.api_version(200) db = fdb.open() directory = fdb.directory.open(db,('sql','data','table','schema_a','table1')) for key, value in db[directory.range()]: print fdb.tuple.unpack(key), ' --> ', fdb.tuple.unpack(value)
The code above prints something similar to:
(215, 1, 1) --> (1, u'hello') (215, 1, 2) --> (2, u'world')
Let's take a closer look at Table-Groups. A single table belongs to its own group. An additional table can be added to the group if it creates a ‘grouping foreign key’ that references the first table. When we add a grouping foreign key to a table, the child table is interleaved within the directory of the parent table. The table is now part of the Table-Group named after the root table. The data of two different tables are interleaved in the same directory. This enables fast range scans and makes access to an object and joins within the Table-Group almost free. To demonstrate the principle, we continue our example with the following SQL statements:
CREATE TABLE schema_a.table3(id INT PRIMARY KEY, id_1 INT, GROUPING FOREIGN KEY (id_1) REFERENCES schema_a.table1(id)); INSERT INTO schema_a.table3 VALUES (100, 2), (200, 2), (300, 1);
This yields the following results:
directory = fdb.directory.open(db,('sql','data','table','schema_a','table1')) for key, value in db[directory.range()]: print fdb.tuple.unpack(key), ' --> ', fdb.tuple.unpack(value) (215, 1, 1) --> (1, u'hello') (215, 1, 1, 2, 300) --> (300, 1) (215, 1, 2) --> (2, u'world') (215, 1, 2, 2, 100) --> (100, 2) (215, 1, 2, 2, 200) --> (200, 2)
The inserted rows of the third table are interleaved with the rows in the first table, as the keys of the third table are in the namespace range of the first table rows. The two extra values in the key refer to the position within the Table-Group and the primary key in the third table. Joins of table 1 and 3 on the reference key do not need standard join processing; a linear scan is all that is needed. This way of ordering yields a significant advantage over traditional relational database systems.
Indexes make practical use of the fact that keys have ordering. All table indexes only have a key value containing two parts. An index is located in the directory of the table it belongs to, in a subdirectory with the name of the index. This is the first part of the key tuple. The second part is a combination of the values of the columns on which the index is set, followed by the remaining values of the columns necessary to identify the row of the entry.
For example, we can add an index to our table on column c.
CREATE INDEX index_on_c ON schema_a.table1(c) STORAGE_FORMAT tuple;
Using Python to read the content of the index, we need to add the following in the Python interpreter:
directory = fdb.directory.open(db, ('sql', 'data', 'table', 'schema_a', 'table1', 'index_on_c')) for key, value in db[directory.range()]: print fdb.tuple.unpack(key), ' --> ', fdb.tuple.unpack(value)
This will print something similar to the output below. It shows the two parts of the key: the byte value of directory from the index, and the column c value on which the index was created, followed by its primary key value. This last part links the indexed value to a specific row. The value belonging to the index key is empty.
(20127, u'hello', 1) --> () (20127, u'world', 2) --> ()
To fine-tune the behavior of the SQL Layer, three storage formats are available: the described tuple format, a column keys format, and a protobuf format. The column keys format will create a separate key-value pair for every column value in a row. The protobuf storage format will generate a protobuf message for every row.
The metadata also needs to be stored and organized. The SQL Layer uses protobuf messages to communicate the structure of the SQL-based data. The structure is formed by the schemas, groups, tables, columns, indexes, foreign keys, etc.
Mixed-mode SQL and NoSQL
The SQL Layer can be combined with direct (application-level) use of the Key-Value API when the latter is read-only. The data is accessible via the Key-Value API but adding or modifying data in the key ranges used by the SQL Layer can easily corrupt the system. Some examples of problems that arise are: the lack of maintenance of indexes, lack of constraint enforcement and no verification of data and metadata versions. Generic benefits, even for performing reads, are not clear as the SQL Layer adds very little overhead. In general, performance costs are network latency dominated.
Conclusion
The marriage between SQL and NoSQL is clear example of a mutually beneficial relationship. The SQL layer profits from the scalability, fault tolerance and the global ACID transaction properties of the FoundationDB Key-Value Store, and likewise could your application, try it! For applications that need to execute a lot of small reads and writes of data, FoundationDB offers a scalable and secure solution, SQL or NoSQL.
About the Author
Sytze Harkema is a software engineer working for FoundationDB since March 2014. He is focused on developing the SQL Layer, making it the best storage solution for scalable SQL applications. Sytze studied at TU Delft (the Netherlands) and Harvard (USA).