Search code examples
sqlpostgresqljoinduplicatessum

Sum of a column excluding duplicate join records


Let's say I have a table containing cities with their populations and a table containing buildings of different types. I want to know the percentage of people living in cities containing schools.

Cities:

name pop
A 10
B 100
C 1000

Buildings:

is_school city
false A
true B
true B
true C

If I do somthing like:

SELECT
  SUM(CASE WHEN building.is_school = true THEN city.pop ELSE 0 END) school,
  SUM(city.pop) total
FROM city
LEFT JOIN building ON building.city = city.name;

I get city B population summed twice. I would like to have:

school total
1100 1110

but I get:

school total
1200 1210

fiddle

I can do a subquery:

SELECT
  SUM(CASE WHEN city.name in (
    SELECT city.name
    FROM city
    LEFT JOIN building ON building.city = city.name
    WHERE building.is_school = true
  ) THEN city.pop ELSE 0 END),
  SUM(city.pop)
FROM city;

but it feels unnecessary convoluted considering what I'm trying to achieve, is there really no other way ?


Solution

  • You can do:

    SELECT
      1.0 * sum(case when exists (
      select 1 from building b where b.city = c.name and is_school) then 1 else 0 end)
      / count(*)
    FROM city c;
    

    Result:

    ?column?
    --------
    0.66666666666666666667
    

    Or, you can use a lateral join:

    select 1.0 * sum(coalesce(n, 0)) / count(*) as p
    from city c
    left join lateral (
      select 1 from building b where b.city = c.name and b.is_school limit 1
    ) x (n) on true
    

    See running examples at db<>fiddle.