Search code examples
sqlclustered-indexsolid-state-drivenon-clustered-index

By how much do SSDs narrow the performance gap between clustered and non clustered indices?


Most SQL relational databases support the concept of a clustered index in a table. A clustered index, usually implemented as a B-tree, represents the actual records in a given table, physically ordered by that index on disk/storage. One advantage of this special clustered index is that after traversing the B-tree in search for a record or set of records, the actual data can be found immediately at the leaf nodes.

This stands in contrast to a non clustered index. A non clustered index exists outside the clustered index, and also orders the underlying data using one or more columns. But, the leaf nodes may not have data for all the columns needed in the query. In this case, the database has to do a disk seek to the original data to get this information.

In most database resources I have seen on Stack Overflow and elsewhere, this additional disk seek is viewed as a substantial performance penalty. My question is how would this analysis change assuming that all database files were stored on a solid state drive (SSD)?

From the Wikipedia page for SSDs, the random access time for SSDs is less than 0.1 ms, while random access times for mechanical hard disks are typically 10-100 times slower.

Do SSDs narrow the gap between clustered and non clustered indices, such that the former become less important for overall performance?


Solution

  • First, the additional disk seek is not really a "killer". This can be a big issue in high transaction environments where microseconds and milliseconds count. However, for longer running queries, it will make little difference.

    This is especially true if the database intelligently does "look ahead" disk seeks. Databases are often not waiting for data because another thread is predicting what pages will be needed and working on bringing those back. This is usually done by just taking the "next" pages on a sequential scan.

    SSDs are going to speed up pretty much all operations. They do change the optimization parameters. In particular, I think they are comparably fast on throughput (although I don't keep up with the technology specifically). Their big win is in latency -- the time from issuing the request for a disk block and the time when it is retrieved.

    In my experience (which is a few years old), the performance using SSD was comparable to an in-memory database for most operations.

    Whether this makes cluster indexes redundant is another matter. A key place where they are used is when you want to separate a related small amount of rows (say "undeleted") from a larger amount. By putting them in the same data pages, the clustered index reduces the overall number of rows being read -- it doesn't just make the reads faster.