Search code examples
sqlpivot-tableamazon-redshifttop-n

SQL query challenge - find top frequent items in columns and summarize result to a pivot table


I am looking for a query to do following transformation.

Basically I want to find top 3 frequent SELL_COUNTRY and top 3 frequent category, on per website, per day bases. (for example, website 1, date 6-5-2017, there are 2*US, 1*JP and 1*UK for SELL_COUNTRY, therefore TOP1_SELL_COUNTRY is US, and JP and UK going to TOP2_SELL_COUNTRY and TOP3_SELL_COUNTRY. Same idea for CATEGORY column)

My current solution involves many subqueries, which works, but I feel it is too complicated. I am interested in how sql master would do it in an elegant way.

Currently I know how to do it uses

From enter image description here

To enter image description here


Solution

  • I would do that in 3 steps:

    1. group by country and rank by count
    2. group by category and rank by count
    3. blend results using conditional aggregate (which will just place the values in the necessary cells because the result of the CASE would be just your value and many NULL values, so min() outputs the value)

    Like this:

    WITH
    countries as (
        SELECT *, row_number() over (partition by website,date order by count desc)
        FROM (
            SELECT
             website
            ,date::date
            ,sell_country
            ,count(1)
            FROM your_table
            GROUP BY 1,2,3
        )
    )
    ,categories as (
        SELECT *, row_number() over (partition by website,date order by count desc)
        FROM (
            SELECT
             website
            ,date::date
            ,category
            ,count(1)
            FROM your_table
            GROUP BY 1,2,3
        )
    )
    SELECT
     website
    ,date
    ,coalesce(min(case when t1.row_number=1 then t1.sell_country end),'NA') as top1_sell_country
    ,coalesce(min(case when t1.row_number=2 then t1.sell_country end),'NA') as top2_sell_country
    ,coalesce(min(case when t1.row_number=3 then t1.sell_country end),'NA') as top3_sell_country
    ,coalesce(min(case when t2.row_number=1 then t2.category end),'NA') as top1_sell_category
    ,coalesce(min(case when t2.row_number=2 then t2.category end),'NA') as top2_sell_category
    ,coalesce(min(case when t2.row_number=3 then t2.category end),'NA') as top3_sell_category
    FROM countries t1 
    FULL JOIN categories t2
    USING (website,date)
    GROUP BY 1,2
    ORDER BY 1,2