Search code examples
sqlpostgresqlgroup-bywindow-functions

Group by with WIndows Function- Postgresql


I know, there is no point using Group by here. But I was experimenting something & getting an error saying column cost should be in group by or used in an aggregate function. I want to understand what's going on internally and why the logic is incorrect. Let's say I have a table :

name | date | category | cost,   
jill  | 2019-04-01 | pen | 10 , 
jill  | 2019-04-01 | pen | 40 , 
jill  | 2019-04-01 | coat | 20 , 
Farida | 2019-03-01 | coat | 25,
Farida | 2019-03-02 | coat | 15

Writing the code as :

select
     first_name, cast(o_date as date), sum(cost) over(partition by first_name) as tot 
from tab1
group by 
        1,2;

According to the query, first group by will be executed which will give:

Jill | 2019-04-01
Farida | 2019-03-01
Farida | 2019-03-02

and then we are squashing the rows by taking sum of cost column but for every partition of first_name.

I am expecting the output as

Jill | 2019-04-1 | 50
Farida | 2019-03-01 | 60
Farida | 2019-03-02 |60

The code is working fine without group by clause(which I already know how to do it). Why can't we use group by here? Please tell what causing the code to be incorrect?


Solution

  • A sum() followed by an OVER clause is not an aggregation function though it has the same name as one.

    That follows that cost, in your query, is neither an argument to an aggregation function nor in the GROUP BY clause.

    But you can use a window (not "windows" by the way) function on the result of an aggregation function.

    So the following is allowed. sum() is used as an aggregation function on cost first, and then the window function sum() is used on that.

    SELECT first_name,
           cast(o_date AS date),
           sum(sum(cost)) OVER (PARTITION BY first_name) AS tot 
           FROM tab1
           GROUP BY first_name,
                    cast(o_date AS date);
    

    And as a side note: I recommend against using column ordinals in GROUP BY clauses. That's too easy to mess up. Prefer to use the column expressions instead.