I've been trying to read up on the use of clustered indexes as a means of improving query performance.
Essentially, I have a 'Messages' table, which is used in a chat application. When the user opens a chat, we read the message history from the table based on the customer the user is speaking to.
The messages table is structured with the columns:
Id : CustomerId : Content
The query most often used on the table will be something like
SELECT * FROM Message WHERE CustomerId = @CustomerId
My question is, is the CustomerId column a suitable candidate for a clustered index? Also, given that the ID field is the primary key, will SQL Server still need to 'uniqueify' the clustered index?
The table will be heavy on both selects and inserts.
The clustered index does not need to be unique, so it is possible.
However, the issue is that each time a new message is inserted, SQL Server needs to find a space for the new row next to the other rows for the same customer. This can often be inefficient, because pages need to be split, resulting in many half-filled pages. And, things get even more complicated if you have deletes on the rows as well.
There are several options. In a busy database, you can leave room on the pages for additional inserts. Or, another option is to partition the table based on the customer id. It all depends.
Under most circumstances, an identity column on the messages table would be the primary key and the clustered key as well. An additional index on the customer table would be sufficient. But, there are definitely alternative structures that can work better in some scenarios.