Search code examples
postgresqljoingroup-byaggregate-functions

PostgreSQL: Using different columns in select and group by does not cause error


So, I was learning about windows functions from here, https://www.postgresqltutorial.com/postgresql-window-function/

Before the section for window functions start, there is a brief introduction to the aggregate functions and group by clause.

These are the reference tables I am using as per the tutorial:

enter image description here enter image description here

As far as I understand, one is supposed to use the same field in SELECT and GROUP BY clause. e.g.

SELECT p.group_id, avg(p.price)  from PRODUCTS p GROUP BY p.group_id;

Otherwise, PostgreSQL will give out an error: column "" must appear in the GROUP BY clause or be used in an aggregate function.

But, I am bit confused by the results of the below query:

select group_name,avg(price) from product_groups inner join products p using (group_id) group by group_id;

First of all, I am using different columns in select and group by i.e. group_name and group_id respectively. But, it does not throw an error similar to above!

And, it gives the following result. It looks like group_name and group_id are being used interchangeably here. Could any please explain what is going on here?

enter image description here


Solution

  • Your understanding of the rules for GROUP BY are almost complete. In general, whatever columns appear in the SELECT clause also have to appear in GROUP BY. However, there is an exception in the ANSI standard, which states that if a column in GROUP BY which uniquely determines the values of other columns, then those other columns can also appear in the SELECT clause.

    Having a closer look at your second query:

    SELECT group_name, AVG(price)           -- valid, because if we know
    FROM product_groups                     -- group_id then we also know
    INNER JOIN products p USING (group_id)  -- group_name
    GROUP BY group_id;
    

    In the products table, if group_id is either the primary key or otherwise a unique key, then if we pick a certain value for group_id, we also know uniquely what the group_name would be. If so, then technically your second query is ANSI compliant. Postgres, more than most other databases, tends to closely follow the ANSI standard, so it accepts your query as being valid.