I have a table containing log entries for a single week for about a thousand web servers. Each server writes about 60,000 entries per day to the table, so there are 420,000 entries per week for each server. The table is truncated weekly. Each log entry contains the servername, which is a varchar (this cannot be changed).
The main operation is to select * from table where servername = 'particular'
, so as to retrieve the 420,000 records for a server, and a C# program then analyzes the data from that server once selected.
Should I create a clustered index on the servername column to speed up the read operation? (It currently takes over half an hour to execute the above SQL statement.)
Would partitioning help? The computer has only two physical drives.
The query is run for each server once per week. After the query is run for all servers, the table is truncated.
The "standard" ideal clustered key is something like an INT IDENTITY
that keeps increasing and is narrow.
However, if your primary use for this table is the listed query, then I think a clustered index on servername
makes sense. You will see a big increase in speed if the table is wide, since you will eliminate an expensive key/bookmark lookup that runs on a SELECT *
from a nonclustered index (unless you include
all the fields in the table).
EDIT:
KM pointed out this will slow down inserts, which is true. For this scenario you may want to consider a two-field key on servername, idfield
where idfield
is an INT Identity
. This would still allow access based only on servername
in your query but will insert new records at the end PER SERVER. You will still have fragmentation and reordering.