Search code examples
sqlperformancesnowflake-cloud-data-platformquery-optimization

Try to optimize/improve my query, where I bring the number of orders and a column for each filter that I apply


I'm trying to get the number of orders for different businesses

So, I have two tables:

Table1:

Order_Number Order_Date Store_ID Customer_ID Is_Valid
1 01-01-2020 55 25 Valid
2 24-05-2020 90 34 Invalid
3 03-09-2021 56 25 Valid
4 04-10-2021 210 100 Invalid

Table2:

Order_Number Order_Date Store_ID Customer_ID Is_Valid Product IIM
1 01-01-2020 55 25 Valid Coca-Cola False
1 01-01-2020 55 25 Valid some drink False
1 01-01-2020 55 25 Valid some drink True
1 01-01-2020 55 25 Valid some drink True
2 24-05-2020 90 34 Invalid some drink False
3 03-09-2021 56 25 Valid some drink True
3 03-09-2021 56 25 Valid some drink True
4 04-10-2021 210 100 Invalid some drink True
4 04-10-2021 210 100 Invalid some drink False

So table 2 is very similar to table 1, the difference is that in table 2, I have, for an order, a row created for each product that was in that order. Also, I have the IIM column, where if for an order there is at least one product where IIM = True, the entire order is considered IIM = True

Here comes my question/problem:

I want, in a single query, to get the number of orders where:

  • store_id = 25
  • store_id = 25 and IIM = True
  • store_Id excluding store_Id 1,2,3
  • store_id excluding 1,2,3 and IIM = True

(Only taking into account valid orders)

To achieve this, I did this:

SELECT
    yearmonth,
    date_,
    SUM(cant_store_25) AS cant_store_25,
    SUM(cant_store_25_iim) AS cant_store_25_iim,
    SUM(cant_store_plus) AS cant_store_plus,
    SUM(cant_store_plus_iim) AS cant_store_plus_iim
FROM
    (SELECT
        yearmonth,
        date_,
        order_,
        MAX(cant_store_25) AS cant_store_25,
        MAX(cant_store_25_iim) AS cant_store_25_iim,
        MAX(cant_store_plus) AS cant_store_plus,
        MAX(cant_store_plus_iim) AS cant_store_plus_iim
    FROM
        (SELECT DISTINCT
            t1.Order_Number AS order_,
            t1.Order_Date::DATE AS date_,
            YEAR(t1.Order_Date::DATE) * 100 + MONTH(t1.Order_Date::DATE) AS yearmonth,
            CASE WHEN t1.Store_ID = 25 THEN 1 ELSE 0 END AS cant_store_25,
            CASE WHEN t1.Store_ID = 25 AND IIM = TRUE THEN 1 ELSE 0 END AS cant_store_25_iim,
            CASE WHEN t1.Store_ID NOT IN (1, 2, 3) THEN 1 ELSE 0 END AS cant_store_plus,
            CASE WHEN t1.Store_ID NOT IN (1, 2, 3) AND IIM THEN 1 ELSE 0 END AS cant_store_plus_iim,
            MAX(IIM) AS IIM_
        FROM
            Table1 t1
        LEFT JOIN
            Table2 t2 ON t1.Customer_ID = t2.Customer_ID AND t1.Order_Number = t2.Order_Number
        WHERE
            t1.Is_Valid = 'Valid'
        GROUP BY
            t1.Order_Number, date_, yearmonth, t1.Store_ID, IIM)
    GROUP BY
        1, 2, 3)
WHERE
    yearmonth = 202208
GROUP BY
    1, 2;

So is there a better way to achieve this? My database is small so this query works for me, but maybe I can improve several things to make it more efficient. Honestly, I'm new and I want to learn just in case in the future I have to work with larger databases :)

I don't know if it's relevant, but I use Snowflake

Thanks!


Solution

  • Difficult to check but the first query and subquery can be done with something simpler like the following:

    SELECT t1.Order_Number AS order_,
           t1.Order_Date::DATE AS date_,
           MAX(CASE WHEN t1.Store_ID = 25 THEN 1 ELSE 0 END) AS cant_store_25,
           MAX(CASE WHEN t1.Store_ID = 25 AND IIM = TRUE THEN 1 ELSE 0 END) AS cant_store_25_iim,
           MAX(CASE WHEN t1.Store_ID NOT IN (1, 2, 3) THEN 1 ELSE 0 END) AS cant_store_plus,
           MAX(CASE WHEN t1.Store_ID NOT IN (1, 2, 3) AND IIM THEN 1 ELSE 0 END) AS cant_store_plus_iim
    FROM Table1 t1
    LEFT JOIN Table2 t2 
        ON t1.Customer_ID = t2.Customer_ID 
        AND t1.Order_Number = t2.Order_Number
    WHERE t1.Is_Valid = 'Valid'
        AND YEAR(t1.Order_Date::DATE) = 2022
        AND MONTH(t1.Order_Date::DATE) = 8
    GROUP BY t1.Order_Number
            ,date_
    

    and then the outer one will just SUM the data. The main idea is that you can calculate the conditions for each order earlies applying the MAX function. The good part is we do not need a DISTINCT as we are grouping.

    Also, I have remove MAX(IIM) AS IIM_ as it is not used, and add the year-month filtering earlier. As you are filtering for particular year-month, there is no point to calculate all the data and then get only its.