The database:
Sales.Customers(CustomerID ,CustomerName)
Sales.Orders(OrderID, CustomerID)
Sales.OrderLines(OrderID, Quantity, UnitPrice)
Sales.Invoices(InvoiceID, OrderID, CustomerID)
Sales.InvoiceLines(InvoiceID, Quantity, UnitPrice)
The goal is to generate the following output:
column1: list CustomerID (only orders converted into invoices)
Column2: list of CustomerName
Column3: number of orders by customer ID (only orders converted into invoices)
Column4: number of invoices by customer ID
Column5: sum (Quantity*UnitPrice) of orders by CustomerID (only orders converted into invoices)
Column6: sum (Quantity*UnitPrice) of invoices by Customer ID
Column7: the difference between column 4 and column 5
I managed to join the first 4 tables and get the expected results
Select
c.CustomerID
, c.CustomerName
, COUNT(DISTINCT(o.OrderID)) AS [number orders]
, COUNT(DISTINCT(i.InvoiceID)) AS [number invoices]
, SUM(il.Quantity * il.UnitPrice) AS [TOTAL invoices]
FROM Sales.Invoices AS i
INNER JOIN Sales.InvoiceLines AS il
ON il.InvoiceID = i.InvoiceID
INNER JOIN Sales.Orders AS o
ON o.orderID = i.OrderID
INNER JOIN Sales.Customers AS c
ON c.CustomerID = i.CustomerID
GROUP BY c.CustomerID, c.CustomerName
ORDER BY c.CustomerID
But once I joined the last one, COUNT columns were still correct but both SUM columns were not returning expected numbers (1.6x from the expected SUM).
The query returning incorrect SUM:
Select
c.CustomerID
, c.CustomerName
, COUNT(DISTINCT(o.OrderID)) AS [number orders]
, COUNT(DISTINCT(i.InvoiceID)) AS [number invoices]
, SUM(DISTINCT(il.Quantity * il.UnitPrice)) AS [TOTAL invoices]
, SUM(DISTINCT(ol.Quantity * ol.UnitPrice)) AS [TOTAL orders]
FROM Sales.Invoices AS i
LEFT JOIN Sales.InvoiceLines AS il
ON il.InvoiceID = i.InvoiceID
LEFT JOIN Sales.Orders AS o
ON o.orderID = i.OrderID
LEFT JOIN Sales.OrderLines AS ol
ON ol.OrderID = o.OrderID
LEFT JOIN Sales. Customers AS c
ON c.CustomerID = i.CustomerID
GROUP BY c.CustomerID, c.CustomerName
ORDER BY c.CustomerID
I tried different types of joins, different orders, subqueries, and division queries... But always ended up with some errors or incorrect results. Any help would be highly appreciated!
To troubleshoot, remove all the group by and sums, pick one order (using a where clause) and actually look at the detail data.
This is all part of the basic problem solving process. You can't solve a problem looking at a summary you need to look at the detail.
Firstly, you have "double counting"
An order might have one invoice and three invoice lines. When you join those you get three records. That order might also have two order lines. When you join those three records to two order lines you get six (3x2) records. So now when you sum up lines or invoices you have too many records and the figure is overstated.
Secondly, this is definitely incorrect
SUM(DISTINCT(ol.Quantity * ol.UnitPrice))
If seven completely unrelated order lines have the same value. it'll throw out six of them. Which is definitely incorrect.
You can't answer the questions you need by just joining up all the tables.
You need to do a seperate subtable (or CTE) query that summarises order value by customerid then you can join on that.
-- this dataset is unique on customer id
-- so won't double count when joined to any other dataset
-- that is unique on customerid
with ordersummary as (
select o.customerid, SUM(ol.Quantity * ol.UnitPrice) as ordervalue
from Sales.Orders AS o
INNER JOIN Sales.OrderLines AS ol
ON ol.OrderID = o.OrderID
group by customerid
),
-- this dataset is also unique on customer id
-- so won't double count when joined to any other dataset
-- that is unique on customerid
invoicesummary as
(
Select
c.CustomerID
, c.CustomerName
, COUNT(DISTINCT(o.OrderID)) AS [number orders]
, COUNT(DISTINCT(i.InvoiceID)) AS [number invoices]
, SUM(il.Quantity * il.UnitPrice) AS [TOTAL invoices]
FROM Sales.Invoices AS i
INNER JOIN Sales.InvoiceLines AS il
ON il.InvoiceID = i.InvoiceID
INNER JOIN Sales.Orders AS o
ON o.orderID = i.OrderID
INNER JOIN Sales.Customers AS c
ON c.CustomerID = i.CustomerID
GROUP BY c.CustomerID, c.CustomerName
)
-- now we use our CTE's
-- and join on thoe unique customerid value. No double counting.
SELECT
invoicesummary.CustomerID
, invoicesummary.CustomerName
, invoicesummary.[number orders]
, invoicesummary.[number invoices]
, invoicesummary.[TOTAL invoices]
, ordersummary.ordervalue
FROM invoicesummary
INNER JOIN ordersummary
ON ordersummary.customerid=invoicesummary.Customerid
ORDER BY invoicesummary.CustomerID