sql-server types primary-key sql-server-2008-express

How does the Data Type of an SQL table's PK impact query performance?


Specifically, I am interested in:

  1. What is the difference between string datatypes (e.g. nvarchar(n), varchar(n)) and numeric datatypes (int, bigint, uniqueidentifier)?

  2. What is the difference between the different string data types?

  3. How does the maximum length of a string data type affect performance? Is there a specific varchar or nvarchar length at which the performance sharply declines?

  4. What is the difference between the different numeric data types?

  5. How do these variations impact:

    1. Equality comparison of Primary Keys?

    2. Joins on Primary Keys?

    3. Updates by Primary Key?

    4. Complex value comparisons by Primary Key (e.g. with LIKE on a varchar or <= on an int)?

  6. If there is a significant disparity between the different options, what measures can be taken to optimize performance with the slower data types?

  7. How does a composite PK compare to the other options?

Update: To be clear, I understand this is a long question and I am not asking to be spoon-fed all this information. An answer that provides links to reliable online resources where I can find this information is completely sufficient.

Update 2:

I am using SQL Server Express 2008.


Solution

  • I don't have any hard numbers, but from experience and from everything I have learned over the years, I would say:

    • Try to use a fixed-length key: INT, BIGINT, or CHAR(x) (for x <= 6 characters). Those tend to be easier to deal with and give SQL Server less overhead. Avoid larger VARCHAR values.

    • Since SQL Server has a limit of 900 bytes on each index key, don't even try to use a VARCHAR(MAX) or something outrageous like that. (A VARCHAR(MAX) column cannot be an index key column at all.)

    • Since the primary key in SQL Server is by default your clustering key, all the rules for a clustering key apply. A good clustering key is:

      • narrow (4-8 bytes are perfect)
      • static (never or hardly ever changes)
      • unique (otherwise SQL Server will have to add a 4-byte uniqueifier)
      • ever-increasing (i.e. INT IDENTITY is perfect) to reduce the index and page fragmentation due to page splits in your index structures
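As a rough illustration of the clustering-key guidance above, here is a minimal T-SQL sketch. All table, column, and constraint names (dbo.Customer, CustomerCode, and so on) are hypothetical and not taken from the question. It assumes you have a wider string value that must be unique, and shows one way to keep it out of the clustering key: cluster on an INT IDENTITY surrogate and enforce the string with a separate unique nonclustered index.

```sql
-- Hypothetical schema for illustration only; adapt names and types to your data.
CREATE TABLE dbo.Customer
(
    CustomerId   INT IDENTITY(1,1) NOT NULL,  -- narrow (4 bytes), static, unique, ever-increasing
    CustomerCode VARCHAR(50)       NOT NULL,  -- wider natural key, kept out of the clustering key
    Name         NVARCHAR(100)     NOT NULL,
    CONSTRAINT PK_Customer PRIMARY KEY CLUSTERED (CustomerId),
    CONSTRAINT UQ_Customer_Code UNIQUE NONCLUSTERED (CustomerCode)
);

-- Child tables reference the 4-byte surrogate, so joins compare integers
-- rather than strings.
CREATE TABLE dbo.CustomerOrder
(
    OrderId    INT IDENTITY(1,1) NOT NULL,
    CustomerId INT               NOT NULL,
    OrderDate  DATETIME          NOT NULL,
    CONSTRAINT PK_CustomerOrder PRIMARY KEY CLUSTERED (OrderId),
    CONSTRAINT FK_CustomerOrder_Customer FOREIGN KEY (CustomerId)
        REFERENCES dbo.Customer (CustomerId)
);
```

The reason this tends to help is that every nonclustered index row carries the clustering key as its row locator, so a narrow clustering key keeps all other indexes on the table smaller as well.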

    By far the best, most authoritative, and most exhaustive resource on SQL Server indexing (what to do and what to avoid) is Kimberly Tripp's blog, especially her Indexes category. Great stuff!
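If you would rather measure than guess whether a given key choice causes fragmentation from page splits, something along these lines can help. It is only a sketch: dbo.Customer is a hypothetical table name, so substitute your own, and the 'LIMITED' scan mode is chosen here simply because it is the cheapest one.

```sql
-- Report fragmentation for every index on one table in the current database.
-- dbo.Customer is a placeholder; replace it with the table you want to inspect.
SELECT  i.name AS index_name,
        ps.index_type_desc,
        ps.avg_fragmentation_in_percent,
        ps.page_count
FROM    sys.dm_db_index_physical_stats(
            DB_ID(),                       -- current database
            OBJECT_ID(N'dbo.Customer'),    -- table to inspect
            NULL,                          -- all indexes
            NULL,                          -- all partitions
            'LIMITED') AS ps
JOIN    sys.indexes AS i
        ON  i.object_id = ps.object_id
        AND i.index_id  = ps.index_id;
```

Running this after a representative insert workload, once for an ever-increasing key and once for a random one such as a uniqueidentifier populated by NEWID(), should give a feel for how much the key's insert pattern matters for your data.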