Search code examples
sql-server-2005selectlarge-data-volumes

Select Count(*) over large amount of data


I want to do this for a Report but i have 20,000,000 of records in my table and it causes an TimeOut in my application.

SELECT
        T.transactionStatusID,
        TS.shortName AS TransactionStatusDefShortName,
        count(*) AS qtyTransactions
    FROM
        Transactions T 

    INNER JOIN TransactionTypesCurrencies TTC
                ON  T.id_Ent = TTC.id_Ent
                    AND T.trnTypeCurrencyID = TTC.trnTypeCurrencyID
            INNER JOIN TransactionStatusDef TS
                ON  T.id_Ent = TS.ent_Ent
                AND T.transactionStatusID = TS.ID
WHERE
    T.id_Ent = @id_Ent
GROUP BY
        T.transactionStatusID,
        TS.shortName

as far as i know COUNT(*) causes a full table scan and it makes my query to take too much time, im Using MS SQL 2005, any help ?

Edit:

The project leader tells me that the Query is just for one day it could help?


Solution

  • as far as i know COUNT(*) causes a full table scan and it makes my query to take too much time, im Using MS SQL 2005, any help ?

    COUNT(*) can use any source that is able to give the answer, this includes indexes.

    In your very case, I'd create a covering index on (id_ent, transactionStatusID) with trnTypeCurrencyID:

    CREATE INDEX ON Transactions (id_ent, transactionStatusID) INCLUDE (trnTypeCurrencyID)
    

    and rewrite the query a little:

    SELECT  transactionStatusID, qtyTransactions, TS.shortName
    FROM    (
            SELECT  T.transactionStatusID,
                    COUNT(*) AS qtyTransactions
            FROM    Transactions T
            JOIN    TransactionTypesCurrencies TTC
            ON      TTC.id_Ent = T.id_Ent
                    AND TTC.trnTypeCurrencyID = T.trnTypeCurrencyID
            WHERE   T.id_Ent = @id_Ent
            GROUP BY
                    T.transactionStatusID
            ) TD
    JOIN    TransactionStatusDef TS
    ON      TS.ent_Ent = @id_Ent
            AND TS.ID = TD.transactionStatusID
    

    The index will filter on id_ent and parallelize on transactionStatusID. Since you have trnTypeCurrencyID covered, the engine will not have to lookup the value in the table, it's already present in the index.

    The GROUP BY clause also includes only the columns from the index so it parallelizes much better.

    Update:

    By adding WITH (ONLINE = ON) you can leave the table operational for the time the index is being created:

    CREATE INDEX ON Transactions (id_ent, transactionStatusID) INCLUDE (trnTypeCurrencyID) WITH (ONLINE = ON)