I am designing a Members Table to store the users of a website. It will be used every time a user logs on to the website and occasionally accessed to update user details.
The users will log on with an email address and password, and every account will have a unique email address. It therefore seems logical that the Email column of the Members Table should be its clustered index, as the majority of queries on this table will be against the Email column as users log on. Making the Email column unique and the key of the clustered index should make querying a user's data at logon fast.
But as I understand it, it would be wrong to make the Email column the Primary Key, for two reasons. First, a Primary Key should be constant: if a user changed their email address, all foreign keys would have to be updated, which would be bad. Second, email addresses are strings, which would make Joins slower than if the PK were an int.
So can I make a Non Clustered Index the Primary Key, so that the table has both a Clustered Index with Email as its unique key and an int Primary Key as a Non Clustered Index on top?
Thanks, Duncan
A primary key is a logical database design concept: it only has to be unique and non-NULL. SQL Server enforces it with a unique index, and that index can be either clustered or non-clustered.
In addition, you have a choice of a single clustered index, which should be narrow, unique, increasing and static (email is probably NOT good for this).
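To make the separation concrete, here is a minimal T-SQL sketch of the layout the question asks about: a non-clustered primary key alongside a separate clustered index. The table name, column names and data types are assumptions chosen for illustration. It shows only that the combination is allowed, not that it is a good choice here.

```sql
-- Hypothetical table: names and types are illustrative assumptions.
CREATE TABLE dbo.Members_EmailClustered
(
    MemberId     INT IDENTITY(1,1) NOT NULL,
    Email        NVARCHAR(256)     NOT NULL,
    PasswordHash VARBINARY(64)     NOT NULL,
    -- The primary key is enforced by a unique NON-clustered index.
    CONSTRAINT PK_Members_EmailClustered
        PRIMARY KEY NONCLUSTERED (MemberId)
);

-- The single clustered index is declared independently of the primary key.
CREATE UNIQUE CLUSTERED INDEX CX_Members_EmailClustered_Email
    ON dbo.Members_EmailClustered (Email);
```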
I would make an IDENTITY int primary key and cluster on that.
I would add a unique non-clustered index on email and "include" additional columns (e.g. the password hash) so that your most frequent heavy queries become covering. Note that you should not need to add the clustered key to the included columns, since it is always carried in the non-clustered index as the row locator (the bookmark).
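A minimal T-SQL sketch of that design might look like the following. The table name, column names, data types and the sample login query are assumptions for illustration; adjust them to the real schema.

```sql
-- Hypothetical schema: names and types are illustrative assumptions.
CREATE TABLE dbo.Members
(
    MemberId     INT IDENTITY(1,1) NOT NULL,  -- narrow, unique, increasing, static
    Email        NVARCHAR(256)     NOT NULL,
    PasswordHash VARBINARY(64)     NOT NULL,
    DisplayName  NVARCHAR(100)     NULL,
    CONSTRAINT PK_Members PRIMARY KEY CLUSTERED (MemberId)
);

-- Unique non-clustered index on Email. INCLUDE carries PasswordHash at the
-- leaf level so the login query below is covered. MemberId is not listed
-- because the clustered key is stored in every non-clustered index anyway.
CREATE UNIQUE NONCLUSTERED INDEX UX_Members_Email
    ON dbo.Members (Email)
    INCLUDE (PasswordHash);

-- Example login lookup that this index is meant to cover.
DECLARE @Email NVARCHAR(256) = N'user@example.com';

SELECT MemberId, PasswordHash
FROM dbo.Members
WHERE Email = @Email;
```

If a user changes their email address, only the non-clustered index entry moves. The clustered key, and any foreign keys that reference MemberId, stay untouched.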
Look at the execution plans to ensure that you are not seeing any table scans or clustered index scans in the user table.
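One way to check this from a query window is to ask SQL Server for the estimated plan as text. The sketch below assumes a hypothetical dbo.Members table with Email and PasswordHash columns and a unique non-clustered index on Email that includes PasswordHash.

```sql
-- SET SHOWPLAN_TEXT must be the only statement in its batch, hence the GO separators.
SET SHOWPLAN_TEXT ON;
GO

-- The query is not executed while SHOWPLAN_TEXT is ON; only its plan is returned.
SELECT MemberId, PasswordHash
FROM dbo.Members
WHERE Email = N'user@example.com';
GO

SET SHOWPLAN_TEXT OFF;
GO
```

With a covering index in place, one would hope to see an Index Seek on the non-clustered index and no Key Lookup or scan operator in the plan.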
I would add that people typically think that seeing queries use a clustered index is a good thing. I would argue that a non-clustered index scan or seek, where the index covers the query, is just as good on a heap (a table without a clustered index) as on a clustered table, and better than a clustered index scan or seek. I would also argue that "clustered index" is a name that leads people to all kinds of assumptions (to start with, it is not really an index on a table; it indicates that the table itself is stored in the index structure) and to misconceptions about its importance. Clustered indexes matter most in very large operations where a large amount of data is needed in clustering order.
Real (read) query speed on typical OLTP queries comes from covering the query with the narrowest possible non-clustered indexes on all the tables in the query, with every column in the appropriate order and the correct sort direction for the query and its parameters.