BT

Facilitating the Spread of Knowledge and Innovation in Professional Software Development

Write for InfoQ

Topics

Choose your language

InfoQ Homepage News Simba Technologies is bringing the power of SQL to Cassandra, Hadoop, BigQuery, and MongoDB

Simba Technologies is bringing the power of SQL to Cassandra, Hadoop, BigQuery, and MongoDB

This item in japanese

SQL was created for a reason. At its heart, is a domain specific language designed to allow non-professional programmers to query databases. Sometimes referred to as “casual programmers”, these people include regular employees, business analysts, secretaries, and even the occasional corporate vice-president. When a company moves from a relational database to a NoSQL offering the need for ad hock reporting doesn’t go away, it just becomes harder.

Using the ODBC standard, Simba Technologies is bringing the power of SQL to Apache Cassandra, Apache Hadoop/Hive, Google BigQuery, and MongoDB. Recently we had a chance to talk with George Chow, CTO of Simba Technologies, about his company’s MongoDB driver.

InfoQ: To start, what can I do with MongoDB via your ODBC driver? Is this a read-only offering or can I modify data as well?

George Chow: Our driver is currently read-only. We have a long list of features on our backlog (which includes write-back) but we want partners and customers to help us set the priorities of these features.

InfoQ: MongoDB supports fairly complex records with a lot of nesting. As a SQL developer, what would these complex records look like to me? Or in other words, how are complex documents mapped to columns in my result set?

George Chow: The driver currently exposes complex document constituents (e.g., array or nested document) as a JSON-encoded VARCHAR. We are planning to decompose these into additional denormalized columns. For example, consider a simple document such as

{ column1: "somevalue",
  column2: "someothervalue",
  somearray3: [ { text: "name1", author: "...", description: "..." 
  { text: "name2"... }
  ...
  { text: "..."... }
  { text: "..."... }
  { text: "..."... } ]
}

The columns would be:

  • column1
  • column2
  • somearray3.text
  • somearray3.author
  • somearray3.description

InfoQ: Your way of decomposing documents into JSON-based varChar columns sounds like it will be easy enough to use. Is it already available or is this for a future release?

George Chow: The decomposition of documents info JSON-based varChar columns is on our feature backlog.

InfoQ: Are your ODBC drivers for MongoDB compatible with SQL Server's Linked Server feature? If so, can a single query span both SQL Server tables and MongoDB stores?

George Chow: We have built out the driver to provide directly access from common desktop apps such as Microsoft Excel and Tableau. We expect that any ODBC application can use the driver but to date we’ve focused our energy on these two apps. Our other Big Data/NoSQL drivers (such as that for Hadoop/Hive and Google BigQuery) have similar immediate coverage and beyond. Microsoft SQL Server's linked server feature is definitely doable though we haven't identified it as a priority. What's your perspective on linked server for NoSQL datasources? Do you see it as an important bridging technology?

InfoQ: I was envisioning was a transition from SQL Server or NoSQL. For example, say you have this stored procedure that normally calls into a table. With this link, you could change the proc to call into MongoDB without having to alter any of the applications that use it. I can't think of an easier way to transition from one backend to another.

I wasn't really thinking of using the driver for end users, but it makes a lot of sense. There aren't a whole lot of tools that allow end users to directly query a NoSQL style database. Can you provide an example of a query and what the results would look like in Excel?

George Chow: To give you an idea of how the driver works, consider this dataset:

I would define an ODBC DSN and also set the schema like so:

Finally, I can query it from Excel. For instance, Excel generates SELECT * FROM "music"."tracks" as the first query to any datasource.

Rate this Article

Adoption
Style

BT