I have a database application in production, and all the tables use GUID primary keys which are currently set as the clustered indexes. I understand that this is a poor design due to performance considerations. I have been reading much on the topic, including this great article by Kimberly Tripp.
Can I improve the performance by simply creating an auto-incrementing index column of type INT
and setting it to be the clustered index? I understand from Kimberly's article that all non-clustered indexes (like my GUID primary keys going forward, if I do this) will reference the clustered index. But will this actually improve performance if I'm searching for a record using the GUID primary key in the WHERE
clause?
Also, will I have to populate the new column for existing records in the natural order of when the records were created in order to achieve a performance gain?
EDIT: To address whether this question is a duplicate of this other question: the other question is asking about the best practices in general regarding performance considerations for the use of a GUID primary key. No specific approaches are discussed. My question, on the other hand, is asking specifically whether adding an auto-incrementing index column of type INT
will help to ameliorate the issues with a GUID primary key. Furthermore, my question then asks whether I will have to populate the new column in their "natural order" to realize the benefits, which, again, is not addressed in the other question due to its higher level of generality.
There are a few things to consider:
Yes you're correct, the clustered index keys will be present in all nonclustered indexes. Having a smaller key will help with space savings on disk and in the buffer pool.
Having a clustered key of an identity will give you end of the table inserts and potentially (depending on load) make that an insertion hotspot. Where the GUIDS right now are random insert and will not give so much of a hotspot but will cause more page splits which may also adversely affect performance.
To answer the question of improving performance, what is your current problem area? Have any data that we can go off of? If you don't have any problems now, it may not be worth the changes.
When you add the column as an Identity it should seed itself and the order really shouldn't matter.
If you do use an INT column for the key, create a unique non-clustered index on the GUID column to let the optimizer know there will only be a single value (optimization) and to allow for a quick seek. Make it covering if it isn't too expensive.