Search code examples
sqlpostgresqlleft-joinaggregate-functions

Left outer join two levels deep in Postgres results in cartesian product


Given the following 4 tables:

CREATE TABLE events ( id, name )
CREATE TABLE profiles ( id, event_id )
CREATE TABLE donations ( amount, profile_id )
CREATE TABLE event_members( id, event_id, user_id )

I'm attempting to get a list of all events, along with a count of any members, and a sum of any donations. The issue is the sum of donations is coming back wrong (appears to be a cartesian result of donations * # of event_members).

Here is the SQL query (Postgres)

SELECT events.name, COUNT(DISTINCT event_members.id), SUM(donations.amount)
FROM            events
LEFT OUTER JOIN profiles      ON events.id = profiles.event_id
LEFT OUTER JOIN donations     ON donations.profile_id = profiles.id
LEFT OUTER JOIN event_members ON event_members.event_id = events.id
GROUP BY events.name

The sum(donations.amount) is coming back = to the actual sum of donations * number of rows in event_members. If I comment out the count(distinct event_members.id) and the event_members left outer join, the sum is correct.


Solution

  • You seem to have this two independent structures (-[ means 1-N association):

    events -[ profiles -[ donations
    events -[ event members
    

    I wrapped the second one into a subquery:

    SELECT events.name,
      member_count.the_member_count
      COUNT(DISTINCT event_members.id),
      SUM(donations.amount)
    
    FROM            events
    LEFT OUTER JOIN profiles      ON events.id = profiles.event_id
    LEFT OUTER JOIN donations     ON donations.profile_id = profiles.id
    
    LEFT OUTER JOIN (
      SELECT
        event_id,
        COUNT(*) AS the_member_count
      FROM event_members
      GROUP BY event_id
    ) AS member_count
      ON member_count.event_id = events.id
    
    GROUP BY events.name