Search code examples
sqlclustered-index

Is this execution plan a motivation for re thinking my primary keys


When I entered my current (employer's) company a new database schema was designed and will be the base of a lot of the future tools that are/will be created. With my limited SQL knowledge I think the table is rather well designed. My only concern is that almost every table has a multy-part primary key. Every table has at least a CustomerId and key of it's own. While these are indeed defining for a certain record, I have the feeling that multiple keys (we're talking quadruple here) are very inefficient.

Today I was seeing some unimaginable CPU usage over a simple, repeated query that joins two tables, selects a single string field from the first and distincts them.

select distinct(f.FIELDNAME) as fieldName
from foo f
inner join bar b
   on f.id = b.fId
where b.cId = @id;

Checking the execution plan (I'm no EP Hero) I noticed that there are three major CPU points. The distinct (as expected) and two seeks over the indeces. I would personally think that the indices seek should be extremely fast, but they take up 18% of the cost each. Is this normal? Is it due to the (quadruple) clustered indexes?

--UPDATE--
The query is used for creating a Lucene index. It's a one-time processing that happens about weekly (sounds contradictive, I know). I can't re-use any results here as far as I see.


Solution

  • Could you please run the following queries and post their output:

    SELECT  COUNT(*), COUNT(DISTINCT fieldname)
    FROM    foo
    
    SELECT  COUNT(*), COUNT(DISTINCT cId), COUNT(DISTINCT fId)
    FROM    bar
    

    This will help to estimate which indexes best suit your needs.

    Meanwhile make sure you have the following indexes:

    foo (FIELDNAME)
    bar (cId, fId)
    

    and rewrite your query:

    SELECT  DISTINCT(fieldname)
    FROM    foo f
    WHERE   EXISTS (
            SELECT  1
            FROM    bar b
            WHERE   b.fId = f.id
                    AND b.cId = @id
            )
    

    This query should use an index on f.FIELDNAME to build the DISTINCT list and the index on bar to filter out the non-existent values.