Tags: sql-server, coldfusion, sql-server-2000

Big Table Advice (SQL Server)


I'm experiencing massive slowness when accessing one of my tables and I need some refactoring advice. Sorry if this is not the correct area for this sort of thing.

I'm working on a project that aims to report on server performance statistics for our internal servers. I'm processing Windows performance logs every night (12 servers, 10 performance counters, logging every 15 seconds). I'm storing the data in a table as follows:

CREATE TABLE [dbo].[log](
    [id] [int] IDENTITY(1,1) NOT NULL,
    [logfile_id] [int] NOT NULL,
    [test_id] [int] NOT NULL,
    [timestamp] [datetime] NOT NULL,
    [value] [float] NOT NULL,
CONSTRAINT [PK_log] PRIMARY KEY CLUSTERED 
(
    [id] ASC
)WITH FILLFACTOR = 90 ON [PRIMARY]
) ON [PRIMARY]

There are currently 16,529,131 rows, and the table will keep on growing.

I access the data to produce reports and create graphs from ColdFusion like so:

SET NOCOUNT ON

CREATE TABLE ##RowNumber ( RowNumber int IDENTITY (1, 1), log_id char(9) )

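-- Number the matching rows in timestamp order so we can sample every Nth one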
INSERT ##RowNumber (log_id)
SELECT l.id
FROM log l, logfile lf
WHERE lf.server_id = #arguments.server_id#
and l.test_id = #arguments.test_id#
and l.timestamp >= #arguments.report_from#
and l.timestamp < #arguments.report_to#
and l.logfile_id = lf.id
order by l.timestamp asc

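-- Return only every #modu#-th numbered row (plus the first), capping the rows sent to the graph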
select rn.RowNumber, l.value, l.timestamp
from log l, logfile lf, ##RowNumber rn
where lf.server_id = #arguments.server_id#
and l.test_id = #arguments.test_id#
and l.logfile_id = lf.id
and rn.log_id = l.id
and ((rn.rownumber % #modu# = 0) or (rn.rownumber = 1)) 
order by l.timestamp asc

DROP TABLE ##RowNumber

SET NOCOUNT OFF

(for non-CF devs: #value# interpolates the ColdFusion variable's value, and ## escapes to a literal #, so SQL Server actually sees the local temp table #RowNumber)

I basically create a temporary table so that I can use the row number to select every Nth row. That way I only return as many rows as I can actually display. This helps, but it's still very slow.

SQL Server Management Studio tells me my indexes are as follows (I have pretty much no knowledge of how to use indexes properly):

IX_logfile_id (Non-Unique, Non-Clustered)
IX_test_id (Non-Unique, Non-Clustered)
IX_timestamp (Non-Unique, Non-Clustered)
PK_log (Clustered)

I would be very grateful to anyone who could give some advice that could help me speed things up a bit. I don't mind re-organising things and I have complete control of the project (perhaps not over the server hardware though).

Cheers (sorry for the long post)


Solution

  • Your problem is that you chose a bad clustered key. Nobody is ever interested in retrieving one particular log value by ID. If your system is like anything else I've seen, then all queries are going to ask for:

    • all counters for all servers over a range of dates
    • specific counter values over all servers for a range of dates
    • all counters for one server over a range of dates
    • specific counter for specific server over a range of dates

    Given the size of the table, all your non-clustered indexes are useless. They are all going to hit the index tipping point, guaranteed, so they might just as well not exist. I assume each of your non-clustered indexes is defined as a simple index over the field in its name, with no included columns.
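
    To make the tipping point concrete, here is a hypothetical query against the schema above (the dates are made up). IX_timestamp matches the WHERE clause, but it is not covering, so every matching row would need a bookmark lookup back into the clustered index to fetch [value]:

    SELECT l.[timestamp], l.[value]
    FROM [dbo].[log] l
    WHERE l.[timestamp] >= '20090301'
      AND l.[timestamp] <  '20090401'
    -- Once the range matches more than a small fraction of the table,
    -- the per-row bookmark lookups cost more than just reading every
    -- page, so the optimizer ignores IX_timestamp and scans the
    -- clustered index instead: the index "tips" and goes unused.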

    I'm going to pretend I actually know your requirements. You must forget common sense about storage and actually duplicate all your data in every non-clustered index. Here is my advice (a DDL sketch follows the list):

    • Drop the clustered index on [id]; it's about as useless as it gets.
    • Organize the table with a clustered index on (logfile_id, test_id, timestamp).
    • Non-clustered index on (test_id, logfile_id, timestamp) include (value)
    • NC index on (logfile_id, timestamp) include (value)
    • NC index on (test_id, timestamp) include (value)
    • NC index on (timestamp) include (value)
    • Add maintenance tasks to reorganize all indexes periodically as they are prone to fragmentation
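
    In T-SQL that restructuring would look roughly like the sketch below. It's a sketch only: the index names are made up, and INCLUDE columns plus ALTER INDEX require SQL Server 2005 or later; on SQL Server 2000 you'd have to put [value] in the key instead and use DBCC INDEXDEFRAG for maintenance.

    -- Sketch only; test against a copy of the data first
    ALTER TABLE [dbo].[log] DROP CONSTRAINT [PK_log]

    -- Cluster on the access path the queries actually use
    CREATE CLUSTERED INDEX [CIX_log]
        ON [dbo].[log] ([logfile_id], [test_id], [timestamp])

    -- Keep [id] unique, but without wasting the clustered slot on it
    ALTER TABLE [dbo].[log]
        ADD CONSTRAINT [PK_log] PRIMARY KEY NONCLUSTERED ([id])

    -- Covering non-clustered indexes for the other query shapes
    CREATE NONCLUSTERED INDEX [IX_log_test_logfile]
        ON [dbo].[log] ([test_id], [logfile_id], [timestamp]) INCLUDE ([value])
    CREATE NONCLUSTERED INDEX [IX_log_logfile_ts]
        ON [dbo].[log] ([logfile_id], [timestamp]) INCLUDE ([value])
    CREATE NONCLUSTERED INDEX [IX_log_test_ts]
        ON [dbo].[log] ([test_id], [timestamp]) INCLUDE ([value])
    CREATE NONCLUSTERED INDEX [IX_log_ts]
        ON [dbo].[log] ([timestamp]) INCLUDE ([value])

    -- Periodic maintenance (SQL Server 2005+ syntax)
    ALTER INDEX ALL ON [dbo].[log] REORGANIZE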

    The clustered index covers the query 'history of a specific counter value at a specific machine'. The non-clustered indexes cover various other possible queries (all counters at a machine over time, a specific counter across all machines over time, etc.).
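
    For example (the parameter values are made up), the 'one counter at one machine over a date range' query becomes a single range seek on the new clustered key, with the rows already in timestamp order:

    SELECT [timestamp], [value]
    FROM [dbo].[log]
    WHERE [logfile_id] = 42        -- hypothetical ids
      AND [test_id]    = 7
      AND [timestamp] >= '20090301'
      AND [timestamp] <  '20090401'
    ORDER BY [timestamp]
    -- One seek on (logfile_id, test_id, timestamp): no bookmark
    -- lookups, and the ORDER BY is satisfied by the index order,
    -- so no sort either.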

    You'll notice I did not comment on your query script at all. That is because there isn't anything in the world you can do to make the queries run faster over the table structure you have.

    Now, one thing you shouldn't do is actually implement my advice as-is. I said I was going to pretend I know your requirements, but I actually don't; I just gave an example of a possible structure. What you really should do is study the topic and figure out the correct index structure for your own requirements.

    Also, a Google search for 'covering index' will bring up a lot of good articles.

    And of course, at the end of the day storage is not free, so you'll have to balance the desire for a non-clustered index on every possible combination against the need to keep the size of the database in check. Luckily you have a very small and narrow table, so duplicating it over many non-clustered indexes is no big deal. I also wouldn't be concerned about insert performance: 12 servers × 10 counters logging every 15 seconds is 120 values per 15 seconds, or about 8 inserts per second, which is nothing.