
SQL query with joins between four tables with millions of rows


We have a Transact-SQL statement that queries four tables, each with millions of rows.

It takes several minutes to run, even though it has been optimized with the indexes and statistics recommended by the Database Engine Tuning Advisor.

The query is structured like this:

SELECT E.EmployeeName
    , SUM(M.Amount) AS TotalAmount
    , SUM(B.Amount) AS BudgetAmount
    , SUM(T.Hours) AS TotalHours
    , SUM(TB.Hours) AS BudgetHours
    , SUM(CASE WHEN T.Type = 'Waste' THEN T.Hours ELSE 0 END) AS WastedHours
FROM Employees E
LEFT JOIN MoneyTransactions M
    ON E.EmployeeID = M.EmployeeID
LEFT JOIN BudgetTransactions B
    ON E.EmployeeID = B.EmployeeID
LEFT JOIN TimeTransactions T
    ON E.EmployeeID = T.EmployeeID
LEFT JOIN TimeBudgetTransactions TB
    ON E.EmployeeID = TB.EmployeeID
GROUP BY E.EmployeeName

Since each transaction table contains millions of rows, I have considered splitting the query into one query per transaction table, storing the intermediate results in table variables such as @real, @budget and @hours, and then joining these in a final SELECT. In my tests, however, this did not seem to make it any faster.
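The split-up idea does not strictly need table variables: each transaction table can be aggregated to one row per employee in a derived table, so that the final LEFT JOINs are one-to-one. The sketch below is a minimal illustration under stated assumptions: it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the column types and sample rows are hypothetical, inferred from the query above. Whether it is faster on the real data depends on the indexes and the plan SQL Server chooses.

```python
import sqlite3

# Hypothetical miniature schema mirroring the question's tables.
# SQLite stands in for SQL Server here; the SELECT itself is plain, portable SQL.
SCHEMA = """
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT);
CREATE TABLE MoneyTransactions (EmployeeID INTEGER, Amount REAL);
CREATE TABLE BudgetTransactions (EmployeeID INTEGER, Amount REAL);
CREATE TABLE TimeTransactions (EmployeeID INTEGER, Hours REAL, Type TEXT);
CREATE TABLE TimeBudgetTransactions (EmployeeID INTEGER, Hours REAL);
"""

# Each transaction table is reduced to one row per employee BEFORE joining,
# so every LEFT JOIN matches at most one row and nothing is multiplied.
PRE_AGGREGATED = """
SELECT E.EmployeeName,
       M.TotalAmount, B.BudgetAmount, T.TotalHours, TB.BudgetHours, T.WastedHours
FROM Employees E
LEFT JOIN (SELECT EmployeeID, SUM(Amount) AS TotalAmount
           FROM MoneyTransactions GROUP BY EmployeeID) M
    ON E.EmployeeID = M.EmployeeID
LEFT JOIN (SELECT EmployeeID, SUM(Amount) AS BudgetAmount
           FROM BudgetTransactions GROUP BY EmployeeID) B
    ON E.EmployeeID = B.EmployeeID
LEFT JOIN (SELECT EmployeeID, SUM(Hours) AS TotalHours,
                  SUM(CASE WHEN Type = 'Waste' THEN Hours ELSE 0 END) AS WastedHours
           FROM TimeTransactions GROUP BY EmployeeID) T
    ON E.EmployeeID = T.EmployeeID
LEFT JOIN (SELECT EmployeeID, SUM(Hours) AS BudgetHours
           FROM TimeBudgetTransactions GROUP BY EmployeeID) TB
    ON E.EmployeeID = TB.EmployeeID
ORDER BY E.EmployeeID
"""


def build_demo_db():
    """Create an in-memory database with hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Employees VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
    conn.executemany("INSERT INTO MoneyTransactions VALUES (?, ?)", [(1, 100.0), (1, 50.0)])
    conn.executemany("INSERT INTO BudgetTransactions VALUES (?, ?)", [(1, 200.0)])
    conn.executemany("INSERT INTO TimeTransactions VALUES (?, ?, ?)",
                     [(1, 8.0, "Work"), (1, 2.0, "Waste"), (1, 6.0, "Work")])
    conn.executemany("INSERT INTO TimeBudgetTransactions VALUES (?, ?)", [(1, 15.0), (1, 5.0)])
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for row in conn.execute(PRE_AGGREGATED):
        print(row)
```

Employees with no transactions (Bob in the sample) come back with NULLs rather than zeros; wrap the columns in COALESCE if zeros are preferred.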

How would you deal with that in order to speed it up?


Solution

  • I'm not sure the query you posted will yield the results you're expecting.

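    Every transaction table is joined to Employees on EmployeeID alone, so for each employee the matching rows from MoneyTransactions, BudgetTransactions, TimeTransactions and TimeBudgetTransactions are effectively cross joined, and every SUM is multiplied by the row counts of the other three tables. For example, an employee with 10, 5, 20 and 3 rows in those tables produces 10 × 5 × 20 × 3 = 3,000 joined rows, and SUM(M.Amount) counts each money transaction 5 × 20 × 3 = 300 times. Building that intermediate result is probably also a large part of why the query is slow.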

    Try computing each total in its own correlated subquery instead:

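    -- One correlated subquery per transaction table, so rows are never multiplied.
    -- Each subquery uses its own alias, written consistently (matters under case-sensitive collations).
    SELECT  E.EmployeeName,
            (
            SELECT  SUM(m.Amount)
            FROM    MoneyTransactions m
            WHERE   m.EmployeeID = E.EmployeeID
            ) AS TotalAmount,
            (
            SELECT  SUM(b.Amount)
            FROM    BudgetTransactions b
            WHERE   b.EmployeeID = E.EmployeeID
            ) AS BudgetAmount,
            (
            SELECT  SUM(t.Hours)
            FROM    TimeTransactions t
            WHERE   t.EmployeeID = E.EmployeeID
            ) AS TotalHours,
            (
            SELECT  SUM(tb.Hours)
            FROM    TimeBudgetTransactions tb
            WHERE   tb.EmployeeID = E.EmployeeID
            ) AS BudgetHours,
            (
            SELECT  SUM(CASE WHEN t.Type = 'Waste' THEN t.Hours ELSE 0 END)
            FROM    TimeTransactions t
            WHERE   t.EmployeeID = E.EmployeeID
            ) AS WastedHours
    FROM    Employees E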
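The row multiplication is easy to reproduce on a toy data set. The sketch below is a minimal illustration, assuming SQLite (through Python's built-in sqlite3 module) as a stand-in for SQL Server and a hypothetical single employee with 2 money, 1 budget, 3 time and 2 time-budget transactions. The join-based query builds 2 × 1 × 3 × 2 = 12 rows for that employee, so each SUM is inflated by the row counts of the other three tables, while the correlated subqueries return the true totals.

```python
import sqlite3

# Hypothetical miniature schema mirroring the question's tables (SQLite stands in for SQL Server).
SCHEMA = """
CREATE TABLE Employees (EmployeeID INTEGER PRIMARY KEY, EmployeeName TEXT);
CREATE TABLE MoneyTransactions (EmployeeID INTEGER, Amount REAL);
CREATE TABLE BudgetTransactions (EmployeeID INTEGER, Amount REAL);
CREATE TABLE TimeTransactions (EmployeeID INTEGER, Hours REAL, Type TEXT);
CREATE TABLE TimeBudgetTransactions (EmployeeID INTEGER, Hours REAL);
"""

# The question's shape: four LEFT JOINs on EmployeeID only, then SUM.
JOINED = """
SELECT E.EmployeeName,
       SUM(M.Amount) AS TotalAmount, SUM(B.Amount) AS BudgetAmount,
       SUM(T.Hours) AS TotalHours, SUM(TB.Hours) AS BudgetHours
FROM Employees E
LEFT JOIN MoneyTransactions M ON E.EmployeeID = M.EmployeeID
LEFT JOIN BudgetTransactions B ON E.EmployeeID = B.EmployeeID
LEFT JOIN TimeTransactions T ON E.EmployeeID = T.EmployeeID
LEFT JOIN TimeBudgetTransactions TB ON E.EmployeeID = TB.EmployeeID
GROUP BY E.EmployeeName
"""

# The answer's shape: one correlated subquery per total.
CORRELATED = """
SELECT E.EmployeeName,
       (SELECT SUM(m.Amount) FROM MoneyTransactions m WHERE m.EmployeeID = E.EmployeeID) AS TotalAmount,
       (SELECT SUM(b.Amount) FROM BudgetTransactions b WHERE b.EmployeeID = E.EmployeeID) AS BudgetAmount,
       (SELECT SUM(t.Hours) FROM TimeTransactions t WHERE t.EmployeeID = E.EmployeeID) AS TotalHours,
       (SELECT SUM(tb.Hours) FROM TimeBudgetTransactions tb WHERE tb.EmployeeID = E.EmployeeID) AS BudgetHours
FROM Employees E
"""


def build_demo_db():
    """One hypothetical employee with 2, 1, 3 and 2 rows in the four transaction tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Employees VALUES (1, 'Ann')")
    conn.executemany("INSERT INTO MoneyTransactions VALUES (?, ?)", [(1, 100.0), (1, 50.0)])
    conn.executemany("INSERT INTO BudgetTransactions VALUES (?, ?)", [(1, 200.0)])
    conn.executemany("INSERT INTO TimeTransactions VALUES (?, ?, ?)",
                     [(1, 8.0, "Work"), (1, 2.0, "Waste"), (1, 6.0, "Work")])
    conn.executemany("INSERT INTO TimeBudgetTransactions VALUES (?, ?)", [(1, 15.0), (1, 5.0)])
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    print(conn.execute(JOINED).fetchall())      # [('Ann', 900.0, 2400.0, 64.0, 120.0)]
    print(conn.execute(CORRELATED).fetchall())  # [('Ann', 150.0, 200.0, 16.0, 20.0)]
```

Each inflated total is the true total times the product of the other tables' row counts: 150 × 6, 200 × 12, 16 × 4 and 20 × 6. Besides giving correct results, the correlated form never materializes the multiplied rows, which on tables with millions of rows is likely to matter for run time as well.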