Search code examples
sqldatabasepostgresqlcommon-table-expression

SQL: querying the max of sum written either as subquery or CTE fails


Assume a table salaries that contains employee_id, department_id and salary, as follows.

e_id | dpt_id | salary
---- | ------ | ------
   1 |      2 |   1000
   2 |      2 |   2000
   3 |      5 |   3000
   4 |      6 |    500
   5 |      5 |    100
   6 |      1 |   1050

I wanted to bring the dpt with the maximum/minimum salaries and I used the following query which failed:

select sum_sal_per_dpt.dpt_id, max(sum_sal_per_dpt.total_salaries)
from (
      select dpt_id, sum(salary) as total_salaries
      from salaries
      group by dpt_id ) as sum_sal_per_dpt
group by sum_sal_per_dpt.dpt_id
union
select sum_sal_per_dpt.dpt_id, min(sum_sal_per_dpt.total_salaries)
from (
      select dpt_id, sum(salary) as total_salaries
      from salaries
      group by dpt_id ) as sum_sal_per_dpt
group by sum_sal_per_dpt.dpt_id

The output I get is a table with the dpt_id and the summation of salaries, as if the subquery is only executed:

dpt_id | salary
------ | ------
     2 |   3000
     5 |   3100
     6 |    500
     1 |   1050

instead of

dpt_id | salary
------ | ------
     5 |   3100
     6 |    500

Same if I write it in a CTE fashion.

with
sum_sal_per_dpt as (select dpt_id, sum(salary) as total_salaries
                    from salaries
                    group by dpt_id)
select sum_sal_per_dpt.dpt_id, max(sum_sal_per_dpt.total_salaries)
from sum_sal_per_dpt
group by sum_sal_per_dpt.dpt_id
union
select sum_sal_per_dpt.dpt_id, min(sum_sal_per_dpt.total_salaries)
from sum_sal_per_dpt
group by sum_sal_per_dpt.dpt_id

Can someone explain?


Solution

  • The following should give you the results you expect:

    ;with data as
    (
    select dpt_id, sum(salary) as total_salaries
    from salaries
    group by dpt_id
    )
    select dpt_id, total_salaries
    from data 
    where total_salaries = (select max(total_salaries) from data)
    union 
    select dpt_id, total_salaries
    from data 
    where total_salaries = (select min(total_salaries) from data)
    

    Note that the above code will give you multiple rows if you have multiple max and/or min values, you can easily adjust this depending on what you want to see.

    Within your code, you were using group by dpt_id, which is why you were still getting max values per dpt_id, hence multiple rows, additional confusion might have been created with using union instead of union all, which would remove the rows that are the same in both tables.