I have created a SQL query where I merge columns from a Budget
table and columns from an Actuals
table together using the same dimensions to join on customer, date and product. They need to undergo the same steps. If a product has been sold at least one time in that year by a customer than the yearmonth from 1 to 12 should be shown regardless if it has been sold or not in that particular yearmonth. This is the same for the budget.
Another requirement is that sometimes there is a budget for particular customer, product, yearmonth but no sales and it could be the other way around, or there is also both.
I created separate CTE_s, for the joins for Budget
and joins for Actuals
but they basically have the same steps. As I do it like that, it seems that my query is not performant at all. Now it is based on dummy data but with the actual data, I speak about less than 250.000 rows.
I know I can create indexes on my columns but I'm very sure it is possible to make this query easier but I don't have the experience.
This a dummy set of data:
DIM_CUSTOMER
:
DECLARE @DIM_CUSTOMERS TABLE
(
[BusinessKey] INT,
[CustomerCode] INT,
[Customer] NVARCHAR(255),
[Country] NVARCHAR(255)
)
INSERT INTO @DIM_CUSTOMERS
VALUES
(10000, 1001, 'Kevin N.V.', 'Belgium'),
(10001, 1002, 'V.Z.W. Frederik', 'Belgium'),
(10002, 1003, 'Klaas S.A.', 'France'),
(10003, 1004, 'Nils N.V.', 'Belgium'),
(10004, 1005, 'Kevin N.V.', 'Belgium')
SELECT * FROM @DIM_CUSTOMERS
DIM_PRODUCTS
:
DECLARE @DIM_PRODUCTS TABLE
(
[BusinessKey] INT,
[Product] NVARCHAR(50),
[ProductGroup] NVARCHAR(50)
)
INSERT INTO @DIM_PRODUCTS
VALUES
(9000, 'PH114', 'NXPLT'),
(9001, 'PH272', 'NXPLT'),
(9002, 'PH878', 'NXPLT'),
(9003, 'PH900', 'DSAKS')
DIM_DATES
:
DECLARE @DIM_DATES TABLE
(
[BusinessKey] INT,
[Year] INT,
[Month] INT,
[YearMonth] INT,
[YearMonthText] NVARCHAR(20)
)
INSERT INTO @DIM_DATES
VALUES
(202201, 2022, 1, 202201, '2022.01'),
(202202, 2022, 2, 202202, '2022.02'),
(202203, 2022, 3, 202203, '2022.03'),
(202204, 2022, 4, 202204, '2022.04'),
(202205, 2022, 5, 202205, '2022.05'),
(202206, 2022, 6, 202206, '2022.06'),
(202207, 2022, 7, 202207, '2022.07'),
(202208, 2022, 8, 202208, '2022.08'),
(202209, 2022, 9, 202209, '2022.09'),
(202210, 2022, 10, 202210, '2022.10'),
(202211, 2022, 11, 202211, '2022.11'),
(202212, 2022, 12, 202212, '2022.12'),
(202301, 2023, 1, 202301, '2023.01'),
(202302, 2023, 2, 202302, '2023.02'),
(202303, 2023, 3, 202303, '2023.03'),
(202304, 2023, 4, 202304, '2023.04'),
(202305, 2023, 5, 202305, '2023.05'),
(202306, 2023, 6, 202306, '2023.06'),
(202307, 2023, 7, 202307, '2023.07'),
(202308, 2023, 8, 202308, '2023.08'),
(202309, 2023, 9, 202309, '2023.09'),
(202310, 2023, 10, 202310, '2023.10'),
(202311, 2023, 11, 202311, '2023.11'),
(202312, 2023, 12, 202312, '2023.12'),
(202401, 2024, 1, 202401, '2024.01'),
(202402, 2024, 2, 202402, '2024.02'),
(202403, 2024, 3, 202403, '2024.03'),
(202404, 2024, 4, 202404, '2024.04'),
(202405, 2024, 5, 202405, '2024.05'),
(202406, 2024, 6, 202406, '2024.06'),
(202407, 2024, 7, 202407, '2024.07'),
(202408, 2024, 8, 202408, '2024.08'),
(202409, 2024, 9, 202409, '2024.09'),
(202410, 2024, 10, 202410, '2024.10'),
(202411, 2024, 11, 202411, '2024.11'),
(202412, 2024, 12, 202412, '2024.12')
FACT_SALES
:
DECLARE @FACT_SALES TABLE
(
[ID] INT,
[FK_Product] INT,
[FK_Customer] INT,
[FK_Date] INT,
[Sales] FLOAT
)
INSERT INTO @FACT_SALES
VALUES
(1, 9000, 10000, 202303, 90.48),
(2, 9000, 10000, 202304, 20.40),
(3, 9002, 10000, 202305, 250.85),
(4, 9002, 10000, 202303, 100.50),
(5, 9000, 10000, 202403, 38.40),
(6, 9000, 10000, 202406, 474.50),
(7, 9001, 10000, 202403, 128.60),
(8, 9001, 10000, 202404, 144.97),
(9, 9000, 10002, 202303, 199.60),
(10, 9001, 10002, 202302, 58.97),
(11, 9001, 10002, 202402, 40.88),
(12, 9001, 10000, 202203, 14.5),
(13, 9002, 10000, 202305, 50.60),
(14, 9001, 10003, 202402, 70.85),
(15, 9001, 10003, 202408, 150.45),
(16, 9000, 10003, 202304, 46.63),
(17, 9002, 10003, 202305, 10.12),
(18, 9002, 10003, 202405, 90.12),
(19, 9000, 10004, 202303, 54.98),
(20, 9000, 10004, 202304, 87.89),
(21, 9002, 10004, 202306, 77.88),
(22, 9000, 10000, 202304, 77.50)
FACT_BUDGET
:
DECLARE @FACT_BUDGET TABLE
(
[ID] INT,
[Product] NVARCHAR(20),
[FK_CustomerCode] INT,
[FK_Date] INT,
[Budget] FLOAT
)
INSERT INTO @FACT_BUDGET
VALUES
(1, 'PH114', 1001, 202303, 50.5),
(2, 'PH114', 1002, 202403, 100.40),
(3, 'PH878', 1003, 202306, 250.85),
(4, 'PH114', 1003, 202301, 85.96)
My query attempt:
;WITH SALES_CUSTOMERS_PRODUCTS_YEARS_CTE AS
(
SELECT DISTINCT dim_dates.[Year] AS [VolumesYear],
fact_sales.[FK_Product],
fact_sales.[FK_Customer]
FROM @FACT_SALES fact_sales
INNER JOIN @DIM_DATES dim_dates
ON fact_sales.[FK_Date] = dim_dates.[BusinessKey]
),
CUSTOMER_SALES_CTE AS
(
SELECT sales_customers_products_years_cte.[FK_Customer],
sales_customers_products_years_cte.[FK_Product],
dim_dates.[Year],
dim_dates.[Month],
dim_dates.[YearMonthText],
fact_sales.[Sales]
FROM SALES_CUSTOMERS_PRODUCTS_YEARS_CTE sales_customers_products_years_cte
INNER JOIN @DIM_DATES dim_dates
ON sales_customers_products_years_cte.[VolumesYear] = dim_dates.[Year]
LEFT OUTER JOIN @FACT_SALES fact_sales
ON sales_customers_products_years_cte.[FK_Customer] = fact_sales.[FK_Customer]
AND sales_customers_products_years_cte.[FK_Product] = fact_sales.[FK_Product]
AND dim_dates.[BusinessKey] = fact_sales.[FK_Date]
),
ACTUALS_CTE AS
(
SELECT dim_customers.[BusinessKey],
dim_products.[Product],
customer_sales_cte.[YearMonthText],
SUM(customer_sales_cte.[Sales]) AS [Sales]
FROM @DIM_CUSTOMERS dim_customers
LEFT OUTER JOIN CUSTOMER_SALES_CTE customer_sales_cte
ON dim_customers.[BusinessKey] = customer_sales_cte.[FK_Customer]
LEFT OUTER JOIN @DIM_PRODUCTS dim_products
ON customer_sales_cte.[FK_Product] = dim_products.[BusinessKey]
WHERE dim_products.[ProductGroup] = 'NXPLT'
GROUP BY dim_customers.[BusinessKey],
dim_products.[Product],
customer_sales_cte.[YearMonthText],
customer_sales_cte.[Year],
customer_sales_cte.[Month]
),
BUDGET_CUSTOMERS_PRODUCTS_YEARS_CTE AS
(
SELECT DISTINCT dim_dates.[Year] AS [BudgetYear],
fact_budget.[Product],
fact_budget.[FK_CustomerCode]
FROM @FACT_BUDGET fact_budget
INNER JOIN @DIM_DATES dim_dates
ON fact_budget.[FK_Date] = dim_dates.[BusinessKey]
),
CUSTOMER_BUDGET_CTE AS
(
SELECT budget_customers_products_years_cte.[FK_CustomerCode],
budget_customers_products_years_cte.[Product],
dim_dates.[Year],
dim_dates.[Month],
dim_dates.[YearMonthText],
fact_budget.[Budget]
FROM BUDGET_CUSTOMERS_PRODUCTS_YEARS_CTE budget_customers_products_years_cte
INNER JOIN @DIM_DATES dim_dates
ON budget_customers_products_years_cte.[BudgetYear] = dim_dates.[Year]
LEFT OUTER JOIN @FACT_BUDGET fact_budget
ON budget_customers_products_years_cte.FK_CustomerCode = fact_budget.[FK_CustomerCode]
AND budget_customers_products_years_cte.Product = fact_budget.Product
AND dim_dates.[BusinessKey] = fact_budget.[FK_Date]
),
BUDGET_CTE AS
(
SELECT dim_customers.[BusinessKey],
dim_products.[Product],
customer_budget_cte.[YearMonthText],
SUM(customer_budget_cte.Budget) AS [Budget]
FROM @DIM_CUSTOMERS dim_customers
LEFT OUTER JOIN CUSTOMER_BUDGET_CTE customer_budget_cte
ON dim_customers.[CustomerCode] = customer_budget_cte.[FK_CustomerCode]
LEFT OUTER JOIN (
SELECT DISTINCT [Product],
[ProductGroup]
FROM @DIM_PRODUCTS
) dim_products
ON customer_budget_cte.[Product] = dim_products.[Product]
WHERE dim_products.[ProductGroup] = 'NXPLT'
GROUP BY dim_customers.[BusinessKey],
dim_products.[Product],
customer_budget_cte.[YearMonthText],
customer_budget_cte.[Year],
customer_budget_cte.[Month]
)
SELECT
customer.[BusinessKey],
[Product],
[YearMonthText],
ISNULL(MIN([Budget]), 0) AS [Budget],
ISNULL(MIN([Sales]), 0) AS [Actuals]
FROM
@DIM_CUSTOMERS customer
LEFT OUTER JOIN
(SELECT
[BusinessKey],
[Product],
[YearMonthText],
[Budget],
NULL AS [Sales]
FROM
BUDGET_CTE budget_cte
UNION ALL
SELECT
[BusinessKey],
[Product],
[YearMonthText],
NULL AS [Budget],
[Sales]
FROM
ACTUALS_CTE actuals_cte) a ON customer.[BusinessKey] = a.[BusinessKey]
WHERE
a.[Product] IS NOT NULL
GROUP BY
customer.[BusinessKey], [Product], [YearMonthText];
Thank you!
I think you are on the right track. However, assuming that the FACT_BUDGET
and FACT_SALES
tables are sparsely populated relative to the customer/product/month combinations in the final report, I would suggest deferring any combining of budget/sales data with the generated date ranges as much as possible.
My suggested order of operations is:
Something like:
WITH CTE_SumBudget AS (
SELECT
P.BusinessKey AS ProductKey,
C.BusinessKey AS CustomerKey,
B.FK_Date AS DateKey,
SUM(B.Budget) AS Budget
FROM FACT_BUDGET B
JOIN DIM_PRODUCTS P ON P.Product = B.Product
JOIN DIM_CUSTOMERS C ON C.CustomerCode = B.FK_CustomerCode
--WHERE P.ProductGroup = 'NXPLT'
GROUP BY
P.BusinessKey,
C.BusinessKey,
B.FK_Date
),
CTE_SumSales AS (
SELECT
S.FK_Product AS ProductKey,
S.FK_Customer AS CustomerKey,
S.FK_Date AS DateKey,
SUM(S.Sales) AS Sales
FROM FACT_SALES S
--JOIN DIM_PRODUCTS P ON P.BusinessKey = B.FK_Product
--WHERE P.ProductGroup = 'NXPLT'
GROUP BY
S.FK_Product,
S.FK_Customer,
S.FK_Date
),
CTE_SumBudgetSales AS (
SELECT
COALESCE(SS.ProductKey, SB.ProductKey) AS ProductKey,
COALESCE(SS.CustomerKey, SB.CustomerKey) AS CustomerKey,
COALESCE(SS.DateKey, SB.DateKey) AS DateKey,
SB.Budget,
SS.Sales
FROM CTE_SumBudget SB
FULL OUTER JOIN CTE_SumSales SS
ON SS.ProductKey = SB.ProductKey
AND SS.CustomerKey = SB.CustomerKey
AND SS.DateKey = SB.DateKey
),
CTE_CustomerProductYears AS (
SELECT DISTINCT
SBS.ProductKey,
SBS.CustomerKey,
D.Year
FROM CTE_SumBudgetSales SBS
JOIN DIM_DATES D ON D.BusinessKey = SBS.DateKey
),
CTE_CustomerProductYearMonths AS (
SELECT
CPY.ProductKey,
CPY.CustomerKey,
D.BusinessKey AS DateKey
FROM CTE_CustomerProductYears CPY
JOIN DIM_DATES D ON D.Year = CPY.Year
)
SELECT
C.BusinessKey,
P.Product,
D.YearMonthText,
ISNULL(SBS.Budget, 0) AS Budget,
ISNULL(SBS.Sales, 0) AS Actuals
FROM CTE_CustomerProductYearMonths CPYM
JOIN DIM_PRODUCTS P ON P.BusinessKey = CPYM.ProductKey
JOIN DIM_CUSTOMERS C ON C.BusinessKey = CPYM.CustomerKey
JOIN DIM_DATES D ON D.BusinessKey = CPYM.DateKey
LEFT JOIN CTE_SumBudgetSales SBS
ON SBS.ProductKey = CPYM.ProductKey
AND SBS.CustomerKey = CPYM.CustomerKey
AND SBS.DateKey = CPYM.DateKey
ORDER BY
C.BusinessKey,
P.Product,
D.YearMonthText
;
For the purposes of the above query, I have replaced the table variable references with normal table references. I am assuming the source tables contain typical primary key definitions, foreign key constrains, and appropriate indexes to support the join conditions.
Partial results
BusinessKey | Product | YearMonthText | Budget | Actuals |
---|---|---|---|---|
10000 | PH114 | 2023.01 | 0 | 0 |
10000 | PH114 | 2023.02 | 0 | 0 |
10000 | PH114 | 2023.03 | 50.5 | 90.48 |
10000 | PH114 | 2023.04 | 0 | 97.9 |
10000 | PH114 | 2023.05 | 0 | 0 |
10000 | PH114 | 2023.06 | 0 | 0 |
10000 | PH114 | 2023.07 | 0 | 0 |
10000 | PH114 | 2023.08 | 0 | 0 |
10000 | PH114 | 2023.09 | 0 | 0 |
10000 | PH114 | 2023.10 | 0 | 0 |
10000 | PH114 | 2023.11 | 0 | 0 |
10000 | PH114 | 2023.12 | 0 | 0 |
10000 | PH114 | 2024.01 | 0 | 0 |
10000 | PH114 | 2024.02 | 0 | 0 |
10000 | PH114 | 2024.03 | 0 | 38.4 |
10000 | PH114 | 2024.04 | 0 | 0 |
10000 | PH114 | 2024.05 | 0 | 0 |
10000 | PH114 | 2024.06 | 0 | 474.5 |
10000 | PH114 | 2024.07 | 0 | 0 |
... | ... | ... | ... | ... |
10000 | PH272 | 2022.03 | 0 | 14.5 |
... | ... | ... | ... | ... |
10000 | PH272 | 2024.03 | 0 | 128.6 |
10000 | PH272 | 2024.04 | 0 | 144.97 |
... | ... | ... | ... | ... |
10000 | PH878 | 2023.03 | 0 | 100.5 |
10000 | PH878 | 2023.04 | 0 | 0 |
10000 | PH878 | 2023.05 | 0 | 301.45 |
... | ... | ... | ... | ... |
One thing I noted when looking at the execution plans was that SQL was executing two different variants of the CTE_SumBudgetSales
portion of the query - once to get the full budget and sales data, and once just to extract just the customer/product/date information. As a possible alternative that avoids this, I split and modified the query to write the CTE_SumBudgetSales
to a #temp table, and then reference that table in the logic that followed.
-- Modification to the above that splits the query into two parts,
-- storing intermediate data into a temporary table
-- Aggregate budget and sales data into a temporary table
IF OBJECT_ID('tempdb..#SumBudgetSales') IS NOT NULL
DROP TABLE #SumBudgetSales;
WITH CTE_SumBudget AS (
SELECT
P.BusinessKey AS ProductKey,
C.BusinessKey AS CustomerKey,
B.FK_Date AS DateKey,
SUM(B.Budget) AS Budget
FROM FACT_BUDGET B
JOIN DIM_PRODUCTS P ON P.Product = B.Product
JOIN DIM_CUSTOMERS C ON C.CustomerCode = B.FK_CustomerCode
--WHERE P.ProductGroup = 'NXPLT'
GROUP BY
P.BusinessKey,
C.BusinessKey,
B.FK_Date
),
CTE_SumSales AS (
SELECT
S.FK_Product AS ProductKey,
S.FK_Customer AS CustomerKey,
S.FK_Date AS DateKey,
SUM(S.Sales) AS Sales
FROM FACT_SALES S
--JOIN DIM_PRODUCTS P ON P.BusinessKey = B.FK_Product
--WHERE P.ProductGroup = 'NXPLT'
GROUP BY
S.FK_Product,
S.FK_Customer,
S.FK_Date
),
CTE_SumBudgetSales AS (
SELECT
COALESCE(SS.ProductKey, SB.ProductKey) AS ProductKey,
COALESCE(SS.CustomerKey, SB.CustomerKey) AS CustomerKey,
COALESCE(SS.DateKey, SB.DateKey) AS DateKey,
SB.Budget,
SS.Sales
FROM CTE_SumBudget SB
FULL OUTER JOIN CTE_SumSales SS
ON SS.ProductKey = SB.ProductKey
AND SS.CustomerKey = SB.CustomerKey
AND SS.DateKey = SB.DateKey
)
SELECT *
INTO #SumBudgetSales
FROM CTE_SumBudgetSales;
-- Index to assist with final left-join
CREATE INDEX #IX_SumBudgetSales
ON #SumBudgetSales(ProductKey, CustomerKey, DateKey);
-- Now use the temp table for the final query
WITH CTE_CustomerProductYears AS (
SELECT DISTINCT
SBS.ProductKey,
SBS.CustomerKey,
D.Year
FROM #SumBudgetSales SBS
JOIN DIM_DATES D ON D.BusinessKey = SBS.DateKey
),
CTE_CustomerProductYearMonths AS (
SELECT
CPY.ProductKey,
CPY.CustomerKey,
D.BusinessKey AS DateKey
FROM CTE_CustomerProductYears CPY
JOIN DIM_DATES D ON D.Year = CPY.Year
)
SELECT
C.BusinessKey,
P.Product,
D.YearMonthText,
ISNULL(SBS.Budget, 0) AS Budget,
ISNULL(SBS.Sales, 0) AS Actuals
FROM CTE_CustomerProductYearMonths CPYM
JOIN DIM_PRODUCTS P ON P.BusinessKey = CPYM.ProductKey
JOIN DIM_CUSTOMERS C ON C.BusinessKey = CPYM.CustomerKey
JOIN DIM_DATES D ON D.BusinessKey = CPYM.DateKey
LEFT JOIN #SumBudgetSales SBS
ON SBS.ProductKey = CPYM.ProductKey
AND SBS.CustomerKey = CPYM.CustomerKey
AND SBS.DateKey = CPYM.DateKey
ORDER BY
C.BusinessKey,
P.Product,
D.YearMonthText
;
DROP TABLE #SumBudgetSales;
See this db<>fiddle for a demo that includes:
The fiddle also includes execution plans for review.
There is not much difference in the general complexity of the execution plans, perhaps a slight improvement with the proposed queries, if you count the number of table and index reference steps. (I did not perform an in-depth analysis.)
I would call the results for this small sample data set inconclusive, and recommend that you run further tests against scaled-up data sets to determine what real-world performance looks like.
One more note: Your posted query included the condition WHERE dim_products.[ProductGroup] = 'NXPLT'
, but none of your sample data is excluded by this condition. (There are no product 9003 references.) I left this condition commented out in my query. You can uncomment this or add other conditions as needed.