Search code examples
sqlhivecuberollupgrouping-sets

Calculating multiple averages across different parts of the table?


I have the following transactions table:

customer_id purchase_date   product         category        department      quantity    store_id
    1       2020-10-01      Kit Kat         Candy           Food                2       store_A
    1       2020-10-01      Snickers        Candy           Food                1       store_A
    1       2020-10-01      Snickers        Candy           Food                1       store_A
    2       2020-10-01      Snickers        Candy           Food                2       store_A
    2       2020-10-01      Baguette        Bread           Food                5       store_A
    2       2020-10-01      iPhone          Cell phones     Electronics         2       store_A
    3       2020-10-01      Sony PS5        Games           Electronics         1       store_A

I would like to calculate the average number of products purchased (for each product in the table). I'm also looking to calculate averages across each category and each department by accounting for all products within the same category or department respectively. Care should be taken to divide over unique customers AND the product quantity being greater than 0 (a 0 quantity indicates a refund, and should not be accounted for).

So basically, the output table would like below:

enter image description here

...where store_id and average_level_type are partition columns.

Is there a way to achieve this in a single pass over the transactions table? or do I need to break down my approach into multiple steps?

Thanks!


Solution

  • How about using “union all” as below -

    Select store_id, 'product' as average_level_type,product as id, sum(quantity) as total_quantity,
    Count(distinct customer_id) as unique_customer_count, sum(quantity)/count(distinct customer_id) as average
    from transactions
    where quantity > 0
    group by store_id,product
    Union all
    Select store_id, 'category' as average_level_type, category as id, sum(quantity) as total_quantity,
    Count(distinct customer_id) as unique_customer_count, sum(quantity)/count(distinct customer_id) as average
    from transactions
    where quantity > 0
    group by store_id,category
    Union all
    Select store_id, 'department' as average_level_type,department as id, sum(quantity) as total_quantity,
    Count(distinct customer_id) as unique_customer_count, sum(quantity)/count(distinct customer_id) as average
    from transactions
    where quantity > 0
    group by store_id,department;
    

    If you want to avoid using union all in that case you can use something like rollup() or group by grouping sets() to achieve the same but the query would be a little more complicated to get the output in the exact format which you have shown in the question.

    EDIT : Below is how you can use grouping sets to get the same output -

    Select store_id,
           case when G_ID = 3 then 'product' 
                when G_ID = 5 then 'category'
                when G_ID = 6 then 'department' end As average_level_type,
           case when G_ID = 3 then product 
                when G_ID = 5 then category
                when G_ID = 6 then department end As id,
           total_quantity,
           unique_customer_count,
           average
    from            
        (select store_id, product, category, department, sum(quantity) as total_quantity, Count(distinct customer_id) as unique_customer_count, sum(quantity)/count(distinct customer_id) as average, GROUPING__ID As G_ID
        from transactions
        group by store_id,product,category,department
        grouping sets((store_id,product),(store_id,category),(store_id,department))
        ) Tab
    order by 2    
    ;