I'm trying to get the number of orders for different businesses
So, I have two tables:
Table1:
Order_Number | Order_Date | Store_ID | Customer_ID | Is_Valid |
---|---|---|---|---|
1 | 01-01-2020 | 55 | 25 | Valid |
2 | 24-05-2020 | 90 | 34 | Invalid |
3 | 03-09-2021 | 56 | 25 | Valid |
4 | 04-10-2021 | 210 | 100 | Invalid |
Table2:
Order_Number | Order_Date | Store_ID | Customer_ID | Is_Valid | Product | IIM |
---|---|---|---|---|---|---|
1 | 01-01-2020 | 55 | 25 | Valid | Coca-Cola | False |
1 | 01-01-2020 | 55 | 25 | Valid | some drink | False |
1 | 01-01-2020 | 55 | 25 | Valid | some drink | True |
1 | 01-01-2020 | 55 | 25 | Valid | some drink | True |
2 | 24-05-2020 | 90 | 34 | Invalid | some drink | False |
3 | 03-09-2021 | 56 | 25 | Valid | some drink | True |
3 | 03-09-2021 | 56 | 25 | Valid | some drink | True |
4 | 04-10-2021 | 210 | 100 | Invalid | some drink | True |
4 | 04-10-2021 | 210 | 100 | Invalid | some drink | False |
So table 2 is very similar to table 1, the difference is that in table 2, I have, for an order, a row created for each product that was in that order. Also, I have the IIM column, where if for an order there is at least one product where IIM = True, the entire order is considered IIM = True
Here comes my question/problem:
I want, in a single query, to get the number of orders where:
(Only taking into account valid orders)
To achieve this, I did this:
SELECT
yearmonth,
date_,
SUM(cant_store_25) AS cant_store_25,
SUM(cant_store_25_iim) AS cant_store_25_iim,
SUM(cant_store_plus) AS cant_store_plus,
SUM(cant_store_plus_iim) AS cant_store_plus_iim
FROM
(SELECT
yearmonth,
date_,
order_,
MAX(cant_store_25) AS cant_store_25,
MAX(cant_store_25_iim) AS cant_store_25_iim,
MAX(cant_store_plus) AS cant_store_plus,
MAX(cant_store_plus_iim) AS cant_store_plus_iim
FROM
(SELECT DISTINCT
t1.Order_Number AS order_,
t1.Order_Date::DATE AS date_,
YEAR(t1.Order_Date::DATE) * 100 + MONTH(t1.Order_Date::DATE) AS yearmonth,
CASE WHEN t1.Store_ID = 25 THEN 1 ELSE 0 END AS cant_store_25,
CASE WHEN t1.Store_ID = 25 AND IIM = TRUE THEN 1 ELSE 0 END AS cant_store_25_iim,
CASE WHEN t1.Store_ID NOT IN (1, 2, 3) THEN 1 ELSE 0 END AS cant_store_plus,
CASE WHEN t1.Store_ID NOT IN (1, 2, 3) AND IIM THEN 1 ELSE 0 END AS cant_store_plus_iim,
MAX(IIM) AS IIM_
FROM
Table1 t1
LEFT JOIN
Table2 t2 ON t1.Customer_ID = t2.Customer_ID AND t1.Order_Number = t2.Order_Number
WHERE
t1.Is_Valid = 'Valid'
GROUP BY
t1.Order_Number, date_, yearmonth, t1.Store_ID, IIM)
GROUP BY
1, 2, 3)
WHERE
yearmonth = 202208
GROUP BY
1, 2;
So is there a better way to achieve this? My database is small so this query works for me, but maybe I can improve several things to make it more efficient. Honestly, I'm new and I want to learn just in case in the future I have to work with larger databases :)
I don't know if it's relevant, but I use Snowflake
Thanks!
Difficult to check but the first query and subquery can be done with something simpler like the following:
SELECT t1.Order_Number AS order_,
t1.Order_Date::DATE AS date_,
MAX(CASE WHEN t1.Store_ID = 25 THEN 1 ELSE 0 END) AS cant_store_25,
MAX(CASE WHEN t1.Store_ID = 25 AND IIM = TRUE THEN 1 ELSE 0 END) AS cant_store_25_iim,
MAX(CASE WHEN t1.Store_ID NOT IN (1, 2, 3) THEN 1 ELSE 0 END) AS cant_store_plus,
MAX(CASE WHEN t1.Store_ID NOT IN (1, 2, 3) AND IIM THEN 1 ELSE 0 END) AS cant_store_plus_iim
FROM Table1 t1
LEFT JOIN Table2 t2
ON t1.Customer_ID = t2.Customer_ID
AND t1.Order_Number = t2.Order_Number
WHERE t1.Is_Valid = 'Valid'
AND YEAR(t1.Order_Date::DATE) = 2022
AND MONTH(t1.Order_Date::DATE) = 8
GROUP BY t1.Order_Number
,date_
and then the outer one will just SUM
the data. The main idea is that you can calculate the conditions for each order earlies applying the MAX
function. The good part is we do not need a DISTINCT
as we are grouping.
Also, I have remove MAX(IIM) AS IIM_
as it is not used, and add the year-month filtering earlier. As you are filtering for particular year-month, there is no point to calculate all the data and then get only its.