I would like some advice on this. I have a table where I want to keep track of an object and a list of keys related to that object. Example:
OBJECTID  ITEMTYPE  ITEMKEY
--------  --------  -------
1         1         THE
1         1         BROWN
1         2         APPLE
1         3         ORANGE
2         2         WINDOW
Both OBJECTID and ITEMKEY are highly selective (i.e. their values are very varied). I access the table in two ways:
By OBJECTID: each time an object changes, its list of keys changes, so I need an index on OBJECTID. Changes happen frequently.
By ITEMKEY: this is used for keyword searching, which also happens frequently.
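A minimal sketch of these two access paths, assuming a hypothetical table name dbo.Directory and an NVARCHAR(100) type for ITEMKEY (the question only says "string"). Refreshing a changed object by deleting and reinserting its key list is also an assumption about how the updates happen. The column list works whether or not a surrogate key column is added.

```sql
-- Hypothetical table name and parameter values; ITEMKEY assumed to be NVARCHAR(100).
DECLARE @ObjectId INT           = 1,
        @Keyword  NVARCHAR(100) = N'APPLE';

-- Access by OBJECTID: the object changed, so replace its whole key list.
-- XACT_ABORT makes any error roll back the whole transaction, so the
-- delete and the insert succeed or fail together.
SET XACT_ABORT ON;
BEGIN TRANSACTION;

DELETE FROM dbo.Directory
WHERE OBJECTID = @ObjectId;

INSERT INTO dbo.Directory (OBJECTID, ITEMTYPE, ITEMKEY)
VALUES (@ObjectId, 1, N'THE'),
       (@ObjectId, 1, N'BROWN'),
       (@ObjectId, 2, N'APPLE'),
       (@ObjectId, 3, N'ORANGE');

COMMIT TRANSACTION;

-- Access by ITEMKEY: keyword search, returning the objects that carry the key.
SELECT OBJECTID, ITEMTYPE
FROM dbo.Directory
WHERE ITEMKEY = @Keyword;
```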
So I probably need two indexes, and have to choose one of them as the clustered index (the one that is accessed more frequently, or where I want the speed to be; for now, let's assume I will prioritize OBJECTID for the clustered index). What I am unsure about is how to design it.
My question is, which of these is better:
a) A clustered index on (OBJECTID,ITEMTYPE,ITEMKEY), plus a nonclustered index on (ITEMKEY). My concern is that since the clustered key is so wide (2 ints, 1 string), the nonclustered index will be big as well, because every nonclustered index row has to carry the clustered key to point back to its row.
b) Create a new running identity column DIRECTORYID (integer) as the primary key and clustered index, and declare two nonclustered indexes, one on (OBJECTID,ITEMTYPE,ITEMKEY) and one on just (ITEMKEY). This minimizes index space but has higher lookup costs.
c) A clustered index on (OBJECTID,ITEMTYPE,ITEMKEY), and a materialized view (an indexed view, in SQL Server terms) of (ITEMKEY,ITEMTYPE,OBJECTID) on top of it. My logic is that this avoids a key lookup and will still be the same size as the index in a), at the cost of extra maintenance overhead on writes.
d) Err...maybe there is a better way given the requirements?
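For reference, options a) and c) could be declared roughly as follows. This is a sketch with hypothetical object names (dbo.Directory_A, dbo.DirectoryByItemKey) and an assumed NVARCHAR(100) type for ITEMKEY. One point worth knowing when comparing them: in a), every row of the ITEMKEY index already stores OBJECTID and ITEMTYPE as its row locator, so a query that only needs those columns is covered by that index without a key lookup. The indexed view in c) stores essentially the same three columns a second time.

```sql
-- Option a): wide composite clustered key plus a nonclustered index on ITEMKEY.
-- NVARCHAR(100) for ITEMKEY is an assumption; the question only says "string".
CREATE TABLE dbo.Directory_A (
    OBJECTID INT           NOT NULL,
    ITEMTYPE INT           NOT NULL,
    ITEMKEY  NVARCHAR(100) NOT NULL,
    CONSTRAINT PK_Directory_A PRIMARY KEY CLUSTERED (OBJECTID, ITEMTYPE, ITEMKEY)
);
GO

-- Each row of this index also carries OBJECTID and ITEMTYPE (the clustering key).
CREATE NONCLUSTERED INDEX IX_Directory_A_ItemKey
    ON dbo.Directory_A (ITEMKEY);
GO

-- Option c): instead of IX_Directory_A_ItemKey, an indexed view keyed on ITEMKEY.
-- Indexed views need these SET options, SCHEMABINDING and two-part table names.
SET ANSI_NULLS ON;
SET QUOTED_IDENTIFIER ON;
GO

CREATE VIEW dbo.DirectoryByItemKey
WITH SCHEMABINDING
AS
SELECT ITEMKEY, ITEMTYPE, OBJECTID
FROM dbo.Directory_A;
GO

-- The first index on a view must be unique and clustered; it materializes the view.
CREATE UNIQUE CLUSTERED INDEX CIX_DirectoryByItemKey
    ON dbo.DirectoryByItemKey (ITEMKEY, ITEMTYPE, OBJECTID);
GO

-- Outside Enterprise edition, the optimizer generally uses the view's index only with NOEXPAND.
SELECT OBJECTID, ITEMTYPE
FROM dbo.DirectoryByItemKey WITH (NOEXPAND)
WHERE ITEMKEY = N'APPLE';
```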
Thanks in advance, Andrew
Try to keep your clustering key as small as possible, since it is also added to every non-clustered index on your table.
Therefore, I would use an INT whenever possible, or possibly a combination of two INT columns, but certainly never a VARCHAR column, especially if that column is potentially wide (more than 10 characters) and is likely to change.
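One way to see the effect of key width on your own data is to load the same rows into each candidate design and compare index sizes. This is a sketch using the system views sys.dm_db_partition_stats, sys.indexes and sys.dm_db_index_physical_stats; dbo.Directory is a hypothetical table name.

```sql
-- dbo.Directory is a hypothetical table name; run once per candidate design.
-- Total space used per index (data pages are 8 KB each).
SELECT i.name                      AS index_name,
       i.type_desc                 AS index_type,
       SUM(ps.row_count)           AS row_count,
       SUM(ps.used_page_count) * 8 AS used_kb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id
 AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID(N'dbo.Directory')
GROUP BY i.name, i.type_desc
ORDER BY used_kb DESC;

-- Average leaf-level row size per index, which shows how much the clustering
-- key adds to each nonclustered index row. 'DETAILED' reads every page, so it
-- can be slow on large tables; 'SAMPLED' is a cheaper alternative.
SELECT ips.index_id,
       i.name AS index_name,
       ips.avg_record_size_in_bytes,
       ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID(N'dbo.Directory'), NULL, NULL, 'DETAILED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id  = ips.index_id
WHERE ips.index_level = 0
  AND ips.alloc_unit_type_desc = 'IN_ROW_DATA';
```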
So of the options you present, I would personally choose b). Why?
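Adding a surrogate DirectoryID (an INT IDENTITY column) satisfies all the crucial criteria for a clustering key: it is narrow (4 bytes), unique, static (it never changes) and ever-increasing. In addition, your other non-clustered indices will be minimally impacted, since each of them only has to carry that 4-byte value as its row locator.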
See Kimberly Tripp's outstanding blog post on the main criteria for choosing a good clustering key on your SQL Server tables - very useful and enlightening!
To satisfy your query requirements, I would add two non-clustered indices: one on ObjectID (possibly including other frequently needed columns), and another on ItemKey to search by key name.
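A sketch of option b) along those lines, assuming the hypothetical table name dbo.Directory and NVARCHAR(100) for ITEMKEY. The INCLUDE lists are one reading of "possibly including other frequently needed columns": they let both access paths be answered from the nonclustered index alone, avoiding key lookups at the cost of a somewhat larger index.

```sql
-- Option b): narrow surrogate clustering key (hypothetical names; NVARCHAR(100) assumed).
CREATE TABLE dbo.Directory (
    DIRECTORYID INT IDENTITY(1,1) NOT NULL,  -- narrow, unique, static, ever-increasing
    OBJECTID    INT               NOT NULL,
    ITEMTYPE    INT               NOT NULL,
    ITEMKEY     NVARCHAR(100)     NOT NULL,
    CONSTRAINT PK_Directory PRIMARY KEY CLUSTERED (DIRECTORYID)
);
GO

-- Access by object: find (and replace) all keys of one object.
-- Each index row also carries the 4-byte DIRECTORYID as its row locator.
CREATE NONCLUSTERED INDEX IX_Directory_ObjectId
    ON dbo.Directory (OBJECTID)
    INCLUDE (ITEMTYPE, ITEMKEY);
GO

-- Keyword search: find all objects that have a given key.
CREATE NONCLUSTERED INDEX IX_Directory_ItemKey
    ON dbo.Directory (ITEMKEY)
    INCLUDE (OBJECTID, ITEMTYPE);
GO
```

If duplicate (OBJECTID, ITEMTYPE, ITEMKEY) rows must be prevented, the first index could instead be declared as a UNIQUE nonclustered index on all three columns, which is essentially the index described in option b).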