Data Warehousing in SQL Server 2014 RTM
SQL Server has been released to manufacturing with general availability set for April 1st. Through this week we’ll be covering various aspects of the new release starting with data warehousing. But first a note about pricing.
Microsoft is heavily pushing a batteries included pricing model. To understand why this is important to them, you have to compare it to the a la carte model used by Oracle. While Oracle starts out as cheaper per processor (10,000 to 23,000 USD), gaining access to all of its features increases the price to 47,500/processor. By contrast SQL Server Enterprise includes off of their on premise offerings at a flat rate of 27,000/processor.
Back in September we covered the new updatable column store tables and there isn’t anything really new in terms of features since then. We do, however, have more advice on how to use it.
First and foremost, make sure you have enough data to justify it. Each horizontal partition in a column store table can hold up to one million rows. So if you don’t have tens or hundreds of millions of rows in a single table then this is probably not the technology for you.
Another thing to consider is access patterns. In a column store, accessing an entire row is relatively expensive compared to a traditional row store. So again, you shouldn’t be using this technology unless you are working with queries that scan ranges.
The following illustration shows a logical view of a typical query. Notice how entire segments and columns can be eliminated, thus reducing the amount of I/O necessary to process the query.
In terms of performance, most workloads see a 5 to 10X speed on queries. In one benchmark, they showed a 91% savings in disc space over a traditional table with “customary indexes”.
It should be noted that while column store tables are considered to be part of their “in memory” offering, they use the buffer pool as a cache just like any other table. So unlike a memory optimized table, you don’t have to load the entire table into memory at startup.
Azure Based AlwaysOn Replicas
For a while now it has been possible, but difficult, to use Windows Azure VMs to host AlwaysOn Replicas. These read-only, synchronized version of the database can geographically dispersed in order to improve latency and throughput for data consumers much like a CDN is used for static website files. And because they are cloud based, one can theoretically spin up new servers during peak demand months rather than having onsite hardware that lies fallow during the off season.
The major change in SQL Server 2014 comes from the tooling. A wizard inside SQL Server Management Studio will guide administrators through the process of selecting a VM size and loading the appropriate encryption keys.
In order to encourage more use of this service, Microsoft is only charging for the VM, storage, and outbound traffic. Ingress traffic from the on premise SQL Server to the Azure Replica is free.
When setting up an AlwaysOn Replica, choosing the correct synchronization mode is really important. Asynchronously updating the replicas makes writes to the master database faster, but can result in reads of stale data. For analytics where one is looking over weeks, months, or years of data that shouldn’t be an issue. This makes it a good fit for hosting column store data.
Conversely, synchronous updates allow for accurate reporting of OLTP data. But if there is a significant amount of latency between the master and replica databases there could be serious performance problems. So you shouldn’t use this option for geographically dispersed replicas.
Hybrid Databases and Data Archiving
A new feature of SQL Server is the ability to create truly hybrid databases. While SQL Server itself runs either on premise or on an Azure VM, some or all of its data files are hosted in Windows Azure Storage.
With a sufficiently large buffer pool and a read-heavy workload, this feature is expected to offer good performance. But since latency can be an issue, this feature should mostly be used when you need to work with archival data. For example, if you want to run analytics over sales receipts for the last ten years but don’t want to take up expensive SAN space.
Windows Azure Express Route
One way to mitigate latency issues is by using Windows Azure Express Route. Rather than tunneling through the Internet with a VPN, this service offers a dedicated connection between an Azure data center and your local infrastructure.
If you choose to use an Exchange Provider you must choose either a 1 Gbps or a 10 Gbps tier. These cost 600 and 10,000 USD per month respectively. In addition there is a 0.070/GB fee for outbound traffic in excess of 15/250 TB per month.
Network Service Providers offer a different pricing model. It starts at 600 per month for 10 Mbps and ranges up to 12,000 per month for 1 Gbps. However, there is no charge for traffic in either direction.
Note that the prices on the website currently reflect a 50% preview discount.
InfoQ Sep 01, 2015