BT

Combine SQL Server with Hadoop Using PolyBase

| by Jonathan Allen Follow 576 Followers on Jun 02, 2016. Estimated reading time: 1 minute |

With the recently released SQL Server 2016, you can now use SQL queries against Hadoop and Azure blob storage. The technology, known as PolyBase, was previously available in SQL Server Parallel Data Warehouse.

As with linked servers, PolyBase attempts to offload as much of the processing as it can onto the source database. Then means that when executing a query against Hadoop or Azure blob storage, it will generate the appropriate map/reduce operations. This is known as a “pushdown” and it has limitations that the developer needs to be aware of:

SQL Server allows the following basic expressions and operators for predicate pushdown.

  • Binary comparison operators ( <, >, =, !=, <>, >=, <= ) for numeric, date, and time values.
  • Arithmetic operators ( +, -, *, /, % ).
  • Logical operators (AND, OR).
  • Unary operators (NOT, IS NULL, IS NOT NULL).

The documentation goes on to say that,

The operators BETWEEN, NOT, IN, and LIKE might be pushed-down. This depends on how the query optimizer rewrites them as a series of statements that use basic relational operators.

Pushdowns can be explicitly forced or disabled using OPTION (FORCE EXTERNALPUSHDOWN) and OPTION (DISABLE EXTERNALPUSHDOWN) respectively.

Combining Relational and Non-Relational Data

Since PolyBase is based on normal T-SQL, developers have access to the usual operations such as joins. This means you can execute a map-reduce operation against a Hadoop cluster, and then join the results to tables in your SQL Server database in a single query.

Import/Export

Moving data between SQL Server, Hadoop, and Azure blob storage no longer requires external ETL processes or tools. Instead, you use the same INSERT INTO/SELECT FROM syntax that you would use between any two normal tables. From a developer perspective, Hadoop and Azure blob storage are just two more table types to choose from when provisioning a database. (From an operations standpoint, you still need to manage security and backups separately.)

Rate this Article

Adoption Stage
Style

Hello stranger!

You need to Register an InfoQ account or or login to post comments. But there's so much more behind being registered.

Get the most out of the InfoQ experience.

Tell us what you think

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Broken link by Roberto Pepato

The link on "As with linked servers..." is broken

Re: Broken link by Charles Humble

Thanks for letting us know. Fixed.

Charles Humble
Head of editorial, InfoQ.com

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

Allowed html: a,b,br,blockquote,i,li,pre,u,ul,p

Email me replies to any of my messages in this thread

2 Discuss

Login to InfoQ to interact with what matters most to you.


Recover your password...

Follow

Follow your favorite topics and editors

Quick overview of most important highlights in the industry and on the site.

Like

More signal, less noise

Build your own feed by choosing topics you want to read about and editors you want to hear from.

Notifications

Stay up-to-date

Set up your notifications and don't miss out on content that matters to you

BT