Search code examples
sql-servert-sql

Optimizing SQL Server performance: merging budget and actuals tables after same CTE transformations and dimension joins


I have created a SQL query where I merge columns from a Budget table and columns from an Actuals table together using the same dimensions to join on customer, date and product. They need to undergo the same steps. If a product has been sold at least one time in that year by a customer than the yearmonth from 1 to 12 should be shown regardless if it has been sold or not in that particular yearmonth. This is the same for the budget.

Another requirement is that sometimes there is a budget for particular customer, product, yearmonth but no sales and it could be the other way around, or there is also both.

I created separate CTE_s, for the joins for Budget and joins for Actuals but they basically have the same steps. As I do it like that, it seems that my query is not performant at all. Now it is based on dummy data but with the actual data, I speak about less than 250.000 rows.

I know I can create indexes on my columns but I'm very sure it is possible to make this query easier but I don't have the experience.

This a dummy set of data:

DIM_CUSTOMER:

DECLARE @DIM_CUSTOMERS TABLE
                       (
                           [BusinessKey] INT, 
                           [CustomerCode] INT, 
                           [Customer] NVARCHAR(255), 
                           [Country] NVARCHAR(255)
                       )
 
INSERT INTO @DIM_CUSTOMERS
VALUES 
(10000, 1001, 'Kevin N.V.', 'Belgium'),
(10001, 1002, 'V.Z.W. Frederik', 'Belgium'),
(10002, 1003, 'Klaas S.A.', 'France'),
(10003, 1004, 'Nils N.V.', 'Belgium'),
(10004, 1005, 'Kevin N.V.', 'Belgium')
SELECT * FROM @DIM_CUSTOMERS

DIM_PRODUCTS:

DECLARE @DIM_PRODUCTS TABLE
                      (
                          [BusinessKey] INT, 
                          [Product] NVARCHAR(50), 
                          [ProductGroup] NVARCHAR(50)
                      )
 
INSERT INTO @DIM_PRODUCTS
VALUES
(9000, 'PH114', 'NXPLT'),
(9001, 'PH272', 'NXPLT'),
(9002, 'PH878', 'NXPLT'),
(9003, 'PH900', 'DSAKS')

DIM_DATES:

DECLARE @DIM_DATES TABLE
                   (
                       [BusinessKey] INT, 
                       [Year] INT, 
                       [Month] INT, 
                       [YearMonth] INT, 
                       [YearMonthText] NVARCHAR(20)
                   )

INSERT INTO @DIM_DATES
VALUES
(202201, 2022, 1, 202201, '2022.01'),
(202202, 2022, 2, 202202, '2022.02'),
(202203, 2022, 3, 202203, '2022.03'),
(202204, 2022, 4, 202204, '2022.04'),
(202205, 2022, 5, 202205, '2022.05'),
(202206, 2022, 6, 202206, '2022.06'),
(202207, 2022, 7, 202207, '2022.07'),
(202208, 2022, 8, 202208, '2022.08'),
(202209, 2022, 9, 202209, '2022.09'),
(202210, 2022, 10, 202210, '2022.10'),
(202211, 2022, 11, 202211, '2022.11'),
(202212, 2022, 12, 202212, '2022.12'),
(202301, 2023, 1, 202301, '2023.01'),
(202302, 2023, 2, 202302, '2023.02'),
(202303, 2023, 3, 202303, '2023.03'),
(202304, 2023, 4, 202304, '2023.04'),
(202305, 2023, 5, 202305, '2023.05'),
(202306, 2023, 6, 202306, '2023.06'),
(202307, 2023, 7, 202307, '2023.07'),
(202308, 2023, 8, 202308, '2023.08'),
(202309, 2023, 9, 202309, '2023.09'),
(202310, 2023, 10, 202310, '2023.10'),
(202311, 2023, 11, 202311, '2023.11'),
(202312, 2023, 12, 202312, '2023.12'),
(202401, 2024, 1, 202401, '2024.01'),
(202402, 2024, 2, 202402, '2024.02'),
(202403, 2024, 3, 202403, '2024.03'),
(202404, 2024, 4, 202404, '2024.04'),
(202405, 2024, 5, 202405, '2024.05'),
(202406, 2024, 6, 202406, '2024.06'),
(202407, 2024, 7, 202407, '2024.07'),
(202408, 2024, 8, 202408, '2024.08'),
(202409, 2024, 9, 202409, '2024.09'),
(202410, 2024, 10, 202410, '2024.10'),
(202411, 2024, 11, 202411, '2024.11'),
(202412, 2024, 12, 202412, '2024.12')

FACT_SALES:

DECLARE @FACT_SALES TABLE
                    (
                        [ID] INT, 
                        [FK_Product] INT, 
                        [FK_Customer] INT, 
                        [FK_Date] INT, 
                        [Sales] FLOAT
                    )

INSERT INTO @FACT_SALES
VALUES
(1, 9000, 10000, 202303, 90.48),
(2, 9000, 10000, 202304, 20.40),
(3, 9002, 10000, 202305, 250.85),
(4, 9002, 10000, 202303, 100.50),
(5, 9000, 10000, 202403, 38.40),
(6, 9000, 10000, 202406, 474.50),
(7, 9001, 10000, 202403, 128.60),
(8, 9001, 10000, 202404, 144.97),
(9, 9000, 10002, 202303, 199.60),
(10, 9001, 10002, 202302, 58.97),
(11, 9001, 10002, 202402, 40.88),
(12, 9001, 10000, 202203, 14.5),
(13, 9002, 10000, 202305, 50.60),
(14, 9001, 10003, 202402, 70.85),
(15, 9001, 10003, 202408, 150.45),
(16, 9000, 10003, 202304, 46.63),
(17, 9002, 10003, 202305, 10.12),
(18, 9002, 10003, 202405, 90.12),
(19, 9000, 10004, 202303, 54.98),
(20, 9000, 10004, 202304, 87.89),
(21, 9002, 10004, 202306, 77.88),
(22, 9000, 10000, 202304, 77.50)

FACT_BUDGET:

DECLARE @FACT_BUDGET TABLE
                     (
                         [ID] INT, 
                         [Product] NVARCHAR(20), 
                         [FK_CustomerCode] INT, 
                         [FK_Date] INT, 
                         [Budget] FLOAT
                     )

INSERT INTO @FACT_BUDGET
VALUES
(1, 'PH114', 1001, 202303, 50.5),
(2, 'PH114', 1002, 202403, 100.40),
(3, 'PH878', 1003, 202306, 250.85),
(4, 'PH114', 1003, 202301, 85.96)

My query attempt:

;WITH SALES_CUSTOMERS_PRODUCTS_YEARS_CTE AS
(
    SELECT DISTINCT dim_dates.[Year] AS [VolumesYear],
                    fact_sales.[FK_Product],
                    fact_sales.[FK_Customer]
    FROM @FACT_SALES fact_sales
    INNER JOIN @DIM_DATES dim_dates
        ON fact_sales.[FK_Date] = dim_dates.[BusinessKey]
), 
CUSTOMER_SALES_CTE AS
(
    SELECT sales_customers_products_years_cte.[FK_Customer],
           sales_customers_products_years_cte.[FK_Product],
           dim_dates.[Year],
           dim_dates.[Month],
           dim_dates.[YearMonthText],
           fact_sales.[Sales]
    FROM SALES_CUSTOMERS_PRODUCTS_YEARS_CTE sales_customers_products_years_cte
    INNER JOIN @DIM_DATES dim_dates
        ON sales_customers_products_years_cte.[VolumesYear] = dim_dates.[Year]
    LEFT OUTER JOIN @FACT_SALES fact_sales
        ON sales_customers_products_years_cte.[FK_Customer] = fact_sales.[FK_Customer]
        AND sales_customers_products_years_cte.[FK_Product] = fact_sales.[FK_Product]
        AND dim_dates.[BusinessKey] = fact_sales.[FK_Date]
), 
ACTUALS_CTE AS
(
    SELECT dim_customers.[BusinessKey],
           dim_products.[Product],
           customer_sales_cte.[YearMonthText],
           SUM(customer_sales_cte.[Sales]) AS [Sales]
    FROM @DIM_CUSTOMERS dim_customers
    LEFT OUTER JOIN CUSTOMER_SALES_CTE customer_sales_cte
        ON dim_customers.[BusinessKey] = customer_sales_cte.[FK_Customer]
    LEFT OUTER JOIN @DIM_PRODUCTS dim_products
        ON customer_sales_cte.[FK_Product] = dim_products.[BusinessKey]
    WHERE dim_products.[ProductGroup] = 'NXPLT'
    GROUP BY dim_customers.[BusinessKey],
             dim_products.[Product],
             customer_sales_cte.[YearMonthText],
             customer_sales_cte.[Year],
             customer_sales_cte.[Month]
), 
BUDGET_CUSTOMERS_PRODUCTS_YEARS_CTE AS
(
    SELECT DISTINCT dim_dates.[Year] AS [BudgetYear],
                    fact_budget.[Product],
                    fact_budget.[FK_CustomerCode]
    FROM @FACT_BUDGET fact_budget
    INNER JOIN @DIM_DATES dim_dates
        ON fact_budget.[FK_Date] = dim_dates.[BusinessKey]
), 
CUSTOMER_BUDGET_CTE AS
(
    SELECT budget_customers_products_years_cte.[FK_CustomerCode],
           budget_customers_products_years_cte.[Product],
           dim_dates.[Year],
           dim_dates.[Month],
           dim_dates.[YearMonthText],
           fact_budget.[Budget]
    FROM BUDGET_CUSTOMERS_PRODUCTS_YEARS_CTE budget_customers_products_years_cte
    INNER JOIN @DIM_DATES dim_dates
        ON budget_customers_products_years_cte.[BudgetYear] = dim_dates.[Year]
    LEFT OUTER JOIN @FACT_BUDGET fact_budget
        ON budget_customers_products_years_cte.FK_CustomerCode = fact_budget.[FK_CustomerCode]
        AND budget_customers_products_years_cte.Product = fact_budget.Product
        AND dim_dates.[BusinessKey] = fact_budget.[FK_Date]
), 
BUDGET_CTE AS
(
    SELECT dim_customers.[BusinessKey],
           dim_products.[Product],
           customer_budget_cte.[YearMonthText],
           SUM(customer_budget_cte.Budget) AS [Budget]
    FROM @DIM_CUSTOMERS dim_customers
    LEFT OUTER JOIN CUSTOMER_BUDGET_CTE customer_budget_cte
        ON dim_customers.[CustomerCode] = customer_budget_cte.[FK_CustomerCode]
    LEFT OUTER JOIN (
        SELECT DISTINCT [Product],
                        [ProductGroup]
        FROM @DIM_PRODUCTS
    ) dim_products
            ON customer_budget_cte.[Product] = dim_products.[Product]
    WHERE dim_products.[ProductGroup] = 'NXPLT'
    GROUP BY dim_customers.[BusinessKey],
             dim_products.[Product],
             customer_budget_cte.[YearMonthText],
             customer_budget_cte.[Year],
             customer_budget_cte.[Month]
)
SELECT 
    customer.[BusinessKey],
    [Product],
    [YearMonthText],
    ISNULL(MIN([Budget]), 0) AS [Budget],
    ISNULL(MIN([Sales]), 0) AS [Actuals]
FROM
    @DIM_CUSTOMERS customer
LEFT OUTER JOIN
    (SELECT 
         [BusinessKey],
         [Product],
         [YearMonthText],
         [Budget],
         NULL AS [Sales]
     FROM 
         BUDGET_CTE budget_cte
     UNION ALL
     SELECT 
         [BusinessKey],
         [Product],
         [YearMonthText],
         NULL AS [Budget],
         [Sales]
     FROM 
         ACTUALS_CTE actuals_cte) a ON customer.[BusinessKey] = a.[BusinessKey]
WHERE 
    a.[Product] IS NOT NULL
GROUP BY 
    customer.[BusinessKey], [Product], [YearMonthText];

Thank you!


Solution

  • I think you are on the right track. However, assuming that the FACT_BUDGET and FACT_SALES tables are sparsely populated relative to the customer/product/month combinations in the final report, I would suggest deferring any combining of budget/sales data with the generated date ranges as much as possible.

    My suggested order of operations is:

    1. Aggregate budget data by customer/product/month.
    2. Aggregate sales data by customer/product/month.
    3. Combine aggregated budget and sales data using a full outer join.
    4. Extract a distinct combinations of customer/product/year from the combined aggregated budget and sales data.
    5. Expand the customer/product/year combination to include all 12 months.
    6. Left join the generated customer/product/month data with the combined aggregated budget and sales data.

    Something like:

    WITH CTE_SumBudget AS (
        SELECT
            P.BusinessKey AS ProductKey,
            C.BusinessKey AS CustomerKey,
            B.FK_Date AS DateKey,
            SUM(B.Budget) AS Budget
        FROM FACT_BUDGET B
        JOIN DIM_PRODUCTS P ON P.Product = B.Product
        JOIN DIM_CUSTOMERS C ON C.CustomerCode = B.FK_CustomerCode
        --WHERE P.ProductGroup = 'NXPLT'
        GROUP BY 
            P.BusinessKey,
            C.BusinessKey,
            B.FK_Date
    ),
    CTE_SumSales AS (
        SELECT
            S.FK_Product AS ProductKey,
            S.FK_Customer AS CustomerKey,
            S.FK_Date AS DateKey,
            SUM(S.Sales) AS Sales
        FROM FACT_SALES S
        --JOIN DIM_PRODUCTS P ON P.BusinessKey = B.FK_Product
        --WHERE P.ProductGroup = 'NXPLT'
        GROUP BY 
            S.FK_Product,
            S.FK_Customer,
            S.FK_Date
    ),
    CTE_SumBudgetSales AS (
        SELECT
            COALESCE(SS.ProductKey, SB.ProductKey) AS ProductKey,
            COALESCE(SS.CustomerKey, SB.CustomerKey) AS CustomerKey,
            COALESCE(SS.DateKey, SB.DateKey) AS DateKey,
            SB.Budget,
            SS.Sales
        FROM CTE_SumBudget SB
        FULL OUTER JOIN CTE_SumSales SS
            ON SS.ProductKey = SB.ProductKey
            AND SS.CustomerKey = SB.CustomerKey
            AND SS.DateKey = SB.DateKey
      
    ),
    CTE_CustomerProductYears AS (
        SELECT DISTINCT
            SBS.ProductKey,
            SBS.CustomerKey,
            D.Year
        FROM CTE_SumBudgetSales SBS
        JOIN DIM_DATES D ON D.BusinessKey = SBS.DateKey
    ),
    CTE_CustomerProductYearMonths AS (
        SELECT
            CPY.ProductKey,
            CPY.CustomerKey,
            D.BusinessKey AS DateKey
        FROM CTE_CustomerProductYears CPY
        JOIN DIM_DATES D ON D.Year = CPY.Year
    )
    SELECT
        C.BusinessKey,
        P.Product,
        D.YearMonthText,
        ISNULL(SBS.Budget, 0) AS Budget,
        ISNULL(SBS.Sales, 0) AS Actuals
    FROM CTE_CustomerProductYearMonths CPYM
    JOIN DIM_PRODUCTS P ON P.BusinessKey = CPYM.ProductKey
    JOIN DIM_CUSTOMERS C ON C.BusinessKey = CPYM.CustomerKey
    JOIN DIM_DATES D ON D.BusinessKey = CPYM.DateKey
    LEFT JOIN CTE_SumBudgetSales SBS
        ON SBS.ProductKey = CPYM.ProductKey
        AND SBS.CustomerKey = CPYM.CustomerKey
        AND SBS.DateKey = CPYM.DateKey
    ORDER BY
        C.BusinessKey,
        P.Product,
        D.YearMonthText
    ;
    

    For the purposes of the above query, I have replaced the table variable references with normal table references. I am assuming the source tables contain typical primary key definitions, foreign key constrains, and appropriate indexes to support the join conditions.

    Partial results

    BusinessKey Product YearMonthText Budget Actuals
    10000 PH114 2023.01 0 0
    10000 PH114 2023.02 0 0
    10000 PH114 2023.03 50.5 90.48
    10000 PH114 2023.04 0 97.9
    10000 PH114 2023.05 0 0
    10000 PH114 2023.06 0 0
    10000 PH114 2023.07 0 0
    10000 PH114 2023.08 0 0
    10000 PH114 2023.09 0 0
    10000 PH114 2023.10 0 0
    10000 PH114 2023.11 0 0
    10000 PH114 2023.12 0 0
    10000 PH114 2024.01 0 0
    10000 PH114 2024.02 0 0
    10000 PH114 2024.03 0 38.4
    10000 PH114 2024.04 0 0
    10000 PH114 2024.05 0 0
    10000 PH114 2024.06 0 474.5
    10000 PH114 2024.07 0 0
    ... ... ... ... ...
    10000 PH272 2022.03 0 14.5
    ... ... ... ... ...
    10000 PH272 2024.03 0 128.6
    10000 PH272 2024.04 0 144.97
    ... ... ... ... ...
    10000 PH878 2023.03 0 100.5
    10000 PH878 2023.04 0 0
    10000 PH878 2023.05 0 301.45
    ... ... ... ... ...

    One thing I noted when looking at the execution plans was that SQL was executing two different variants of the CTE_SumBudgetSales portion of the query - once to get the full budget and sales data, and once just to extract just the customer/product/date information. As a possible alternative that avoids this, I split and modified the query to write the CTE_SumBudgetSales to a #temp table, and then reference that table in the logic that followed.

    -- Modification to the above that splits the query into two parts,
    -- storing intermediate data into a temporary table
    
    -- Aggregate budget and sales data into a temporary table
    IF OBJECT_ID('tempdb..#SumBudgetSales') IS NOT NULL
        DROP TABLE #SumBudgetSales;
    
    WITH CTE_SumBudget AS (
        SELECT
            P.BusinessKey AS ProductKey,
            C.BusinessKey AS CustomerKey,
            B.FK_Date AS DateKey,
            SUM(B.Budget) AS Budget
        FROM FACT_BUDGET B
        JOIN DIM_PRODUCTS P ON P.Product = B.Product
        JOIN DIM_CUSTOMERS C ON C.CustomerCode = B.FK_CustomerCode
        --WHERE P.ProductGroup = 'NXPLT'
        GROUP BY 
            P.BusinessKey,
            C.BusinessKey,
            B.FK_Date
    ),
    CTE_SumSales AS (
        SELECT
            S.FK_Product AS ProductKey,
            S.FK_Customer AS CustomerKey,
            S.FK_Date AS DateKey,
            SUM(S.Sales) AS Sales
        FROM FACT_SALES S
        --JOIN DIM_PRODUCTS P ON P.BusinessKey = B.FK_Product
        --WHERE P.ProductGroup = 'NXPLT'
        GROUP BY 
            S.FK_Product,
            S.FK_Customer,
            S.FK_Date
    ),
    CTE_SumBudgetSales AS (
        SELECT
            COALESCE(SS.ProductKey, SB.ProductKey) AS ProductKey,
            COALESCE(SS.CustomerKey, SB.CustomerKey) AS CustomerKey,
            COALESCE(SS.DateKey, SB.DateKey) AS DateKey,
            SB.Budget,
            SS.Sales
        FROM CTE_SumBudget SB
        FULL OUTER JOIN CTE_SumSales SS
            ON SS.ProductKey = SB.ProductKey
            AND SS.CustomerKey = SB.CustomerKey
            AND SS.DateKey = SB.DateKey
      
    )
    SELECT *
    INTO #SumBudgetSales
    FROM CTE_SumBudgetSales;
    
    -- Index to assist with final left-join
    CREATE INDEX #IX_SumBudgetSales
        ON #SumBudgetSales(ProductKey, CustomerKey, DateKey);
      
    -- Now use the temp table for the final query
    WITH CTE_CustomerProductYears AS (
        SELECT DISTINCT
            SBS.ProductKey,
            SBS.CustomerKey,
            D.Year
        FROM #SumBudgetSales SBS
        JOIN DIM_DATES D ON D.BusinessKey = SBS.DateKey
    ),
    CTE_CustomerProductYearMonths AS (
        SELECT
            CPY.ProductKey,
            CPY.CustomerKey,
            D.BusinessKey AS DateKey
        FROM CTE_CustomerProductYears CPY
        JOIN DIM_DATES D ON D.Year = CPY.Year
    )
    SELECT
        C.BusinessKey,
        P.Product,
        D.YearMonthText,
        ISNULL(SBS.Budget, 0) AS Budget,
        ISNULL(SBS.Sales, 0) AS Actuals
    FROM CTE_CustomerProductYearMonths CPYM
    JOIN DIM_PRODUCTS P ON P.BusinessKey = CPYM.ProductKey
    JOIN DIM_CUSTOMERS C ON C.BusinessKey = CPYM.CustomerKey
    JOIN DIM_DATES D ON D.BusinessKey = CPYM.DateKey
    LEFT JOIN #SumBudgetSales SBS
        ON SBS.ProductKey = CPYM.ProductKey
        AND SBS.CustomerKey = CPYM.CustomerKey
        AND SBS.DateKey = CPYM.DateKey
    ORDER BY
        C.BusinessKey,
        P.Product,
        D.YearMonthText
    ;
    
    DROP TABLE #SumBudgetSales;
    

    See this db<>fiddle for a demo that includes:

    1. Table definitions and data inserts.
    2. The proposed query.
    3. The modified (split) proposed query using a #temp table.
    4. The original query, modified to use ordinary tables.
    5. The original query, as posted using table variables

    The fiddle also includes execution plans for review.

    There is not much difference in the general complexity of the execution plans, perhaps a slight improvement with the proposed queries, if you count the number of table and index reference steps. (I did not perform an in-depth analysis.)

    I would call the results for this small sample data set inconclusive, and recommend that you run further tests against scaled-up data sets to determine what real-world performance looks like.

    One more note: Your posted query included the condition WHERE dim_products.[ProductGroup] = 'NXPLT', but none of your sample data is excluded by this condition. (There are no product 9003 references.) I left this condition commented out in my query. You can uncomment this or add other conditions as needed.