Search code examples
sqlpostgresqlduplicatesleft-joingreatest-n-per-group

Unify columns from different tables while selecting distinct rows


Tables

User

id name email is_active
1 john [email protected] FALSE
2 mike [email protected] TRUE
3 monica [email protected] TRUE
4 joey [email protected] FALSE
5 ross [email protected] FALSE

Subscriptions

id house_id plan name status
1 1 A banana a month inactive
2 2 An apple a month active
3 3 A pear a month active

House

id name
1 John's House
2 Mike's House
3 Monica's House
4 Joey's House
5 Ross's House

House_Contact (legacy table)

id house_id is_primary
1 1 TRUE
2 2 FALSE
2 3 TRUE

House_User (new table)

id house_id is_owner user_id
1 2 FALSE 2
2 4 FALSE 4
3 5 FALSE 5

Expected Results

The resulting table should include the following:

  • Does the user have a subscription regardless of status? If so, include, if not, disregard.
  • Get email & is_active from User table (if they have subscription)
  • Get is_primary OR is_owner (if they have a subscription)
  • Results should be distinct (no duplicate users)
house_id email is_owner is_active
1 [email protected] TRUE FALSE
2 [email protected] FALSE TRUE
3 [email protected] TRUE TRUE

What I tried

SELECT
    u.email AS "email",
    u.is_active AS "is_active",
    h.id AS "house_id",
    is_owner
FROM
    house c
    INNER JOIN (
        SELECT
            house_id,
            user_id
        FROM
            house_user) hu ON h.id = hu.house_id
    INNER JOIN (
        SELECT
            id,
            email,
            is_active
        FROM
            USER) u ON hu.user_id = u.id
    INNER JOIN (
        SELECT
            id,
            email,
            is_primary
        FROM
            house_contact) hc ON u.email = ch.email
    INNER JOIN (
        SELECT
            house_id,
            is_primary is_owner
        FROM
            house_contact
    UNION
    SELECT
        house_id,
        is_owner is_owner
    FROM
        house_user) t ON u.id = t.house_id)
ORDER BY
    u.email

Results are half than if I remove the INNER JOIN with UNION statement. No idea how to proceed.

I'm particularly confused with unifying the column and the possible duplication.


Solution

  • My educated guess:

    SELECT DISTINCT ON (u.id)
          u.id, u.email, u.is_active, h.house_id, h.is_primary
    FROM  "user" u
    LEFT  JOIN (
       SELECT hu.user_id, hu.house_id
            , GREATEST(hc.is_primary, hu.is_owner) AS is_primary
       FROM   house_user hu
       LEFT   JOIN house_contact hc USING (house_id)
       WHERE  EXISTS (SELECT FROM subscription WHERE house_id = hu.house_id)
       ) h ON h.user_id = u.id
    ORDER  BY u.id, h.is_primary DESC NULLS LAST, h.house_id;
    

    We don't need table house in the query at all.

    I see three possible sources of conflict:

    1. house_contact.is_primary vs. house_user.is_owner. Both seem to mean the same. The DB design is broken in this respect. Taking GREATEST() of both, which means true if either is true.

    2. We don't care about subscription.status, so just make sure the house has at least one subscription of any kind with EXISTS, thereby avoiding possible duplicates a priori.

    3. A user can live in multiple houses. We want only one row per user. So show the first house with is_primary (the one with the smallest house_id) if any. If there is no house, there is also no subscription. But the outer LEFT JOIN keeps the user in the result. Change to JOIN to skip users without subscription.

    About DISTINCT ON:

    About sorting boolean values: