Search code examples
sqlfiltergoogle-bigquerydatetime-formatpartitioning

Find the top 2 vendors per country, in each year available in the dataset? - bigquery SQL


I've joined 2 tables and extract the top 2 vendors by country (code at the bottom of this post and link) with the help of Tim Biegeleisen on this link , now I would like to go 1 step further and group the top 2 vendors by year by country.

The original ORDERS table looks like this :

country_name date_local vendor_id gmv_local is_successful_order
Taiwan 2012-10-02 2870 559.6 true
Taiwan 2012-10-02 3812 573.5 true
Singapore 2012-10-02 941 778.6 true
Singapore 2014-10-02 13 120.6 true
Bangkok 2014-10-02 227 563.6 true

This table is merged with the vendor table

id vendor_name country_name
2870 C House Taiwan
941 A House Singapore
227 9 House Bangkok

I would like to extract the year from the "date_local" column into timestamp format, where the items in the Year column will look like "2012-01-01T00:00:00" from the original date format of "2012-10-02"

Then I would like to list out the top 2 vendors of by year of each country in total revenue

The resulting table should look like this:

year country_name vendor_name total_gmv
2012-01-01T00:00:00 Singapore A House 1119.76
2012-01-01T00:00:00 Singapore B House 819.63
2012-01-01T00:00:00 Taiwan C House 119.6
2012-01-01T00:00:00 Taiwan D House 119.6
2012-01-01T00:00:00 Bangkok 9 House 119.6
2014-01-01T00:00:00 Singapore A House 2119.76
2014-01-01T00:00:00 Singapore B House 1819.63
2014-01-01T00:00:00 Taiwan C House 1019.6
2014-01-01T00:00:00 Taiwan D House 919.6
2014-01-01T00:00:00 Bangkok 9 House 189.6

printscreen of target results

I know I need to add one more partition of the Extract( year FROM date_local) as year and set it to the correct GETDATE() timestamp format to add the extra layer needed, but I don't know how to add it to the code

WITH cte AS (
    SELECT Ord.country_name, vn.vendor_name, ROUND(SUM(Ord.gmv_local), 2) AS total_gmv,
           ROW_NUMBER() OVER (PARTITION BY Ord.country_name
                              ORDER BY SUM(Ord.gmv_local) DESC) rn
    FROM `Orders` AS Ord
    LEFT JOIN `Vendors` AS vn
        ON Ord.vendor_id = vn.id
    GROUP BY Ord.country_name, vn.vendor_name
)

SELECT country_name, vendor_name, total_gmv
FROM cte
WHERE rn <= 1
ORDER BY country_name, total_gmv DESC;

Solution

  • If I understand correctly, you only need to also aggregate by year, and then add year to the partition of the call to ROW_NUMBER:

    SELECT
        Ord.country_name,
        vn.vendor_name,
        EXTRACT(year FROM date_local) AS year,
        ROUND(SUM(Ord.gmv_local), 2) AS total_gmv
    FROM ORDERS AS Ord
    LEFT JOIN `primeval-falcon-306603.foodpanda_BI_Exercise.Vendors` AS vn
        ON Ord.vendor_id = vn.id
    GROUP BY
        Ord.country_name,
        vn.vendor_name,
        EXTRACT(year FROM date_local)
    QUALIFY ROW_NUMBER() OVER (PARTITION BY country_name, EXTRACT(year FROM date_local)
                               ORDER BY total_gmv DESC) <= 2
    ORDER BY
        Ord.country_name DESC,
        total_gmv DESC;