Search code examples
sqlimpala

SQL calculate share of grouped variables to total count


Its probably very easy, bot somehow I cannot get the desired result:

My data looks like this: I have a large table with items sold. Each item has a category assigned (here A-D) and country. I would like to calculate how many items were sold in Europe, in each category, and what is the share of this particular category to total sales

my data looks like this:

country item_id item_cat
Europe 1 A
Europe 2 A
Europe 3 B
Europe 4 B
Europe 5 C
Europe 6 C
Europe 7 C
USA 8 D
USA 9 D
USA 10 D

my desired output like this:

country item_cat cat_sales total_sales share
Europe A 2 7 0.29
Europe B 2 7 0.29
Europe C 3 7 0.43

what I tried is:

SELECT 
   country, 
   item_cat, 
   count(*) as cat_sales,
   count(*) OVER () as total_sales,
   cat_sales / total_sales as share
FROM data
where country='Europe'
group by item_cat

but SQL tells me I cannot group and use windowing in one request. How could i solve this? Thanks in advance


Solution

  • A few ways, one would be to pre-count the total sales in a CTE and then select from it for the remaining aggregate.

    I don't use impala however in standard SQL this should work

    with tot as (
        select *, 
          Count(*) over(partition by country) * 1.0 as total_sales
        from t
    )
    select country, item_cat, 
        Count(*) as cat_sales, 
        total_sales,
        Round(Count(*) / total_sales, 2) as Share
    from tot
    where country='europe'
    group by country, item_cat, total_sales