When I joined my current employer, a new database schema had just been designed that will be the base of many of the tools that are or will be built. With my limited SQL knowledge I think the schema is rather well designed. My only concern is that almost every table has a multi-part primary key: every table has at least a CustomerId plus a key of its own. While these columns do uniquely identify a record, I have the feeling that such wide composite keys (we're talking quadruple here) are very inefficient.
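For context, the tables look roughly like this (all names here are made up for illustration; only the key structure matters):

```sql
-- Hypothetical sketch of the kind of composite key described above:
-- every table carries CustomerId plus keys of its own, and the
-- primary key (clustered by default in SQL Server) spans all four.
CREATE TABLE OrderLine (
    CustomerId INT NOT NULL,
    OrderId    INT NOT NULL,
    ProductId  INT NOT NULL,
    LineNo     INT NOT NULL,
    Quantity   INT NOT NULL,
    CONSTRAINT PK_OrderLine
        PRIMARY KEY CLUSTERED (CustomerId, OrderId, ProductId, LineNo)
);
```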
Today I was seeing unimaginable CPU usage from a simple, repeated query that joins two tables, selects a single string field from the first, and returns the distinct values.
select distinct(f.FIELDNAME) as fieldName
from foo f
inner join bar b
on f.id = b.fId
where b.cId = @id;
Checking the execution plan (I'm no EP hero) I noticed three major CPU points: the distinct (as expected) and two seeks over the indexes. I would have expected the index seeks to be extremely fast, but they each take up 18% of the cost. Is this normal? Is it due to the (quadruple) clustered indexes?
--UPDATE--
The query is used for building a Lucene index. It's a one-time processing step that happens about weekly (sounds contradictory, I know). As far as I can see, I can't reuse any results here.
Could you please run the following queries and post their output:
SELECT COUNT(*), COUNT(DISTINCT fieldname)
FROM foo
SELECT COUNT(*), COUNT(DISTINCT cId), COUNT(DISTINCT fId)
FROM bar
This will help to estimate which indexes best suit your needs.
Meanwhile, make sure you have the following indexes:
foo (FIELDNAME)
bar (cId, fId)
and rewrite your query:
SELECT DISTINCT f.fieldname
FROM foo f
WHERE EXISTS (
        SELECT 1
        FROM bar b
        WHERE b.fId = f.id
          AND b.cId = @id
      )
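The indexes suggested above could be created like this (the index names are my own invention; adjust to your naming conventions):

```sql
-- Supports scanning the DISTINCT list in index order
CREATE INDEX IX_foo_fieldname ON foo (FIELDNAME);

-- Supports the EXISTS probe: seek on cId, then fId
CREATE INDEX IX_bar_cId_fId ON bar (cId, fId);
```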
This query should use the index on f.FIELDNAME to build the DISTINCT list and the index on bar to filter out the non-existent values.
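To compare the two variants objectively rather than by the plan's cost percentages, you could look at the logical reads of each (SQL Server syntax; the figures appear in the Messages tab):

```sql
SET STATISTICS IO ON;

-- original join + DISTINCT
select distinct(f.FIELDNAME) as fieldName
from foo f
inner join bar b on f.id = b.fId
where b.cId = @id;

-- rewritten EXISTS version
SELECT DISTINCT f.fieldname
FROM foo f
WHERE EXISTS (SELECT 1 FROM bar b WHERE b.fId = f.id AND b.cId = @id);

SET STATISTICS IO OFF;
```

The variant with fewer logical reads on foo and bar is the one doing less work, regardless of what the estimated cost percentages claim.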