More on Indexes in In-Memory OLTP
Indexes in SQL Server’s In-Memory OLTP don’t work exactly like normal indexes. This isn’t necessarily a bad thing, but the differences need to be kept in mind to avoid performance problems.
Memory-optimized non-clustered indexes differ from disc-based non-clustered indexes in that they are always covering. You don’t need to specify which columns you want to include, “all columns are virtually included” in addition to the actual index columns.
An interesting limitation of non-clustered indexes is that they can only be scanned in one direction. For example, if your index is "OrderDate ASC” then you cannot use the index to retrieve rows based on the order date in descending order.
Another new type of index is the memory-optimized hash. This type of index is designed for “point-lookup operations” and full scans. It cannot be used for ordered scans or inequality seek operations. Microsoft has these guidelines for choosing between non-clustered and hash indexes,
- If you need to perform only point lookups, meaning you need to retrieve only the rows corresponding to a single index key value, use a hash index.
- If you need to retrieve ranges of rows, or need to retrieve the rows in a particular sort-order, use a nonclustered index.
- If you need to do both, particularly if point lookups are frequent, you can consider creating two indexes: it is possible to create both a hash and a nonclustered index with the same index key.
Hash indexes also require the use of a bucket count. The bucket count should be set between N and 2N where N is the expected number of rows. Operationally a range of 0.2N and 5N is considered to be “usable”. Internally bucket counts are always rounded up to the next power of 2.
High than necessary bucket counts are wasteful for memory, which is a sensitive issue when talking about memory optimized tables that must fit entirely in RAM. A bucket count that is too low causes other problems,
If the bucket count is significantly (ten times) lower than the number of unique index keys, there will be many buckets that have multiple index keys. This degrades performance of most DML operations, particularly point lookups (lookups of individual index keys). For example, you may see poor performance of SELECT queries and UPDATE and DELETE operations with equality predicates matching the index key columns in the WHERE clause.
Another problem that can occur with hash indexes is duplicate values. If multiple rows have the same value for the index column(s) then naturally there is going to be a hash collision for those rows. If this happens a lot, say more than ten times per distinct value, then performance can suffer.
For hash indexes the recommendation from the SQL Server team is to convert the hash index into a non-clustered index.
In most cases you will want to use a NONCLUSTERED index instead, as NONCLUSTERED indexes generally perform better in case of duplicates. If you go for this option, consider uniquifying the index key, as indicated below.
For NONCLUSTERED indexes with a lot of duplicates, consider adding additional columns to the index key. For example, you can add the primary key columns to the index key to make it unique, in other words to uniquify the index.
If the values actually are distinct and you want to continue using a hash index, then you can “over-size the index” by using a bucket count that is “20 – 100 times the number of unique index key values” to reduce the chance of hash collisions.
Shane Hastie on Distributed Agile Teams, Product Ownership and the Agile Manifesto Translation Program
Shane Hastie Apr 17, 2015