VMware has recently released the first stable version of SQLFire, a distributed SQL database geared towards high availability and horizontal scalability.
On the surface, SQLFire offers most features expected of a traditional relational database system, including support for SQL queries and a JDBC driver (support for .NET is also provided). It can be deployed in embedded mode (inside a Java process) or in a client-server architecture. In theory, SQLFire could be a drop-in replacement for the database behind any Java application that uses SQL for storage, since its JDBC driver, query engine and network server are integrated directly from Apache Derby.
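To make the drop-in claim concrete, here is a minimal sketch of plain JDBC access. Everything in it is ordinary `java.sql` code; the only product-specific part is the connection URL. The URL form `jdbc:sqlfire://host:port/` is an assumption modelled on the Derby network client and should be checked against the reference guide. The class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any SQLFire distribution.
final class SqlFireJdbcSketch {

    // Assumption: client-server URLs look like jdbc:sqlfire://host:port/
    // (a Derby-style form; verify against the product documentation).
    static String thinClientUrl(String host, int port) {
        return "jdbc:sqlfire://" + host + ":" + port + "/";
    }

    // Runs a query and returns the first column of every row as text.
    // Requires a reachable server and the vendor JDBC driver on the classpath.
    static List<String> firstColumn(String url, String sql) throws SQLException {
        List<String> values = new ArrayList<>();
        try (Connection connection = DriverManager.getConnection(url);
             Statement statement = connection.createStatement();
             ResultSet rows = statement.executeQuery(sql)) {
            while (rows.next()) {
                values.add(rows.getString(1));
            }
        }
        return values;
    }
}
```

An application that already talks to a database this way would, under that assumption, only need a different URL and driver on its classpath.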
SQLFire, however, offers several additional features that go beyond the paradigm of a single monolithic database server. The central idea is that while the SQL language can cope with modern applications and is familiar to developers who like to use their existing tools, the centralized model where all data exists in a single location does not always scale. SQLFire therefore combines the traditional SQL interface with GemFire technology, which is based on horizontal clustering. SQLFire is deployed on cluster groups with multiple hosts that can contain:
- Data store nodes that host data and can execute SQL statements
- Accessor nodes that can execute SQL statements but do not have data
- Locator nodes that do not have data and do not execute statements. These are used for cluster discovery
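The three node types differ only in whether they hold data and whether they execute SQL. The following sketch encodes that as a small enum. It is an illustrative model with hypothetical names, not an SQLFire API.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the three node roles described above.
enum NodeRole {
    DATA_STORE(true, true),   // hosts data and executes SQL
    ACCESSOR(false, true),    // executes SQL but holds no data
    LOCATOR(false, false);    // discovery only

    final boolean hostsData;
    final boolean executesSql;

    NodeRole(boolean hostsData, boolean executesSql) {
        this.hostsData = hostsData;
        this.executesSql = executesSql;
    }

    // Roles that can serve a SQL statement.
    static Set<NodeRole> rolesThatExecuteSql() {
        Set<NodeRole> roles = EnumSet.noneOf(NodeRole.class);
        for (NodeRole role : values()) {
            if (role.executesSql) {
                roles.add(role);
            }
        }
        return roles;
    }
}
```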
All nodes have single-hop access to other nodes. The exact cluster architecture is completely abstracted away from the application code, which connects to SQLFire via JDBC. Behind the scenes SQLFire offers the following:
- Replication of tables across the cluster
- Partitioning of tables across the cluster
- Parallel execution of queries (built-in Map/Reduce)
- Distributed transactions
- Optional persistence to the disk
- Ability to use an external RDBMS (e.g. MySQL) for storage
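The difference between replication and partitioning can be shown with a toy in-memory model: a replicated row is copied to every node, while a partitioned row lands on exactly one node chosen from its key. The hash-modulo placement below is an assumption made for illustration; the article does not describe how SQLFire actually assigns rows to nodes, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of row placement across cluster nodes (not SQLFire code).
final class TablePlacementSketch {
    // One key-to-row map per simulated node.
    private final List<Map<String, String>> nodes = new ArrayList<>();

    TablePlacementSketch(int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        for (int i = 0; i < nodeCount; i++) {
            nodes.add(new HashMap<>());
        }
    }

    // Assumed placement rule: hash of the partitioning column modulo node count.
    int ownerOf(String key) {
        return Math.floorMod(key.hashCode(), nodes.size());
    }

    // Partitioned table: the row is stored on a single owning node.
    void putPartitioned(String key, String row) {
        nodes.get(ownerOf(key)).put(key, row);
    }

    // Replicated table: the row is stored on every node.
    void putReplicated(String key, String row) {
        for (Map<String, String> node : nodes) {
            node.put(key, row);
        }
    }

    // Number of nodes currently holding a row for this key.
    int copiesOf(String key) {
        int copies = 0;
        for (Map<String, String> node : nodes) {
            if (node.containsKey(key)) {
                copies++;
            }
        }
        return copies;
    }
}
```

Replication favours small, frequently joined lookup tables, since every node can read them locally; partitioning spreads large tables so that each node stores and scans only a share of the rows.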
SQLFire introduces custom DDL extensions that activate these extra features. Here is an example where the Airlines table is replicated to the cluster nodes and the Flights table is partitioned according to a specific column:
CREATE TABLE AIRLINES (
    AIRLINE CHAR(2) NOT NULL CONSTRAINT AIRLINES_PK PRIMARY KEY,
    AIRLINE_FULL VARCHAR(24),
    ECONOMY_SEATS INTEGER,
    BUSINESS_SEATS INTEGER,
    FIRSTCLASS_SEATS INTEGER
) REPLICATE;

CREATE TABLE FLIGHTS (
    FLIGHT_ID CHAR(6) NOT NULL,
    ORIG_AIRPORT CHAR(3),
    DEPART_TIME TIME,
    DEST_AIRPORT CHAR(3),
    ARRIVE_TIME TIME,
    MILES INTEGER,
    AIRCRAFT VARCHAR(6),
    CONSTRAINT FLIGHTS_PK PRIMARY KEY (FLIGHT_ID)
) PARTITION BY COLUMN (FLIGHT_ID);
Note that both of these tables are memory-only by default, since the PERSISTENT keyword offered by SQLFire was not used. There are other extensions for table co-location, server grouping, parallel procedures, etc. Depending on the combination of features selected, SQLFire can be deployed:
- as a standalone SQL in-memory database
- as a standalone SQL persistent database
- as a mixed distributed database (only a subset of tables is in memory/replicated/partitioned)
- as a memory cache (several eviction policies are available)
- as a cache for an existing RDBMS
The last scenario is interesting because it attempts to split the data of an RDBMS into two distinct categories: data that is historical and does not need quick access, since it is used for reference purposes only (this can be accessed directly from the RDBMS), and data that is transactional or holds current state (e.g. client sessions), which is accessed via SQLFire as an intermediate cache.
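The caching scenario can be illustrated with a generic read-through cache: hot rows are served from memory, misses fall through to the backing store, and the least recently used entry is evicted once a capacity limit is reached. This is a sketch of the general technique, not of SQLFire internals. LRU is just one assumed eviction policy, and the class and its `backingStore` function are hypothetical stand-ins for the existing RDBMS.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Generic read-through cache with LRU eviction (illustrative only).
final class ReadThroughCacheSketch<K, V> {
    private final Function<K, V> backingStore; // stands in for the RDBMS lookup
    private final Map<K, V> cache;
    private int backingStoreReads = 0;

    ReadThroughCacheSketch(int capacity, Function<K, V> backingStore) {
        this.backingStore = backingStore;
        // accessOrder = true keeps entries ordered from least to most recently used.
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    // Returns the cached value, loading it from the backing store on a miss.
    V get(K key) {
        V value = cache.get(key);
        if (value == null) {
            value = backingStore.apply(key);
            backingStoreReads++;
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    // How many lookups had to go to the backing store.
    int backingStoreReads() {
        return backingStoreReads;
    }
}
```

With a capacity sized for the current-state data, repeated reads of the same session rows never reach the RDBMS, while rarely used historical rows are simply loaded on demand and evicted again.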
For more information see the reference guide and the blog. While many of its components are open-source, SQLFire itself is proprietary and offered as a commercial solution. A trial version is available for evaluation purposes.