Search code examples
sqlpostgresqlgroup-by

How to get arrays of results from a secondary table inside a query on a primary table?


I have the following tables inside of a Postgresql database:

ER diagram

The shifts table contains "Morning Shift" and "Evening Shift" as possible types. Each shift can have multiple users and users can have multiple shifts.

I need a list of all shifts, grouped by their dates, and for every shift containing all users attending the shift. So something like this:

{
...
  '2023-01-01': {
    'morning_shift': [
      { 'username': 'Some user', 'role': 'Some role', 'tag': 'Some tag' },
      ...
    ],
    'evening_shift': [
      { 'username': 'Some user', 'role': 'Some role', 'tag': 'Some tag' },
      ...
    ]   
  }
...
}

So far I have come up with the following query:

SELECT shifts.date,
  array(
    SELECT users.username
    FROM shifts_users
    JOIN shifts on shifts_users.shift_id = shifts.id
    JOIN users on users.id = shifts_users.user_id
    WHERE shifts.type = 'Morning Shift') as morning_shift,
  array(
    SELECT users.username
    FROM shifts_users
    JOIN shifts on shifts_users.shift_id = shifts.id
    JOIN users on users.id = shifts_users.user_id
    WHERE shifts.type = 'Evening Shift') as evening_shift
FROM shifts
GROUP BY shifts.date
ORDER BY shifts.date ASC

With that I receive the following result:

{
...
  '2023-01-01': {
    'morning_shift': [
      'Some user',
      ...
    ],
    'evening_shift': [
      'Some user',
      ...
    ]   
  }
...
}

Now I ran into the problem, that I only can get one column with subqueries, which means I currently only get the usernames but not the other values I need.

Is there maybe a different approach I should use or a different table setup to make life easier?


Solution

  • Your query is more broken than you realize. Both subqueries are uncorrelated to the outer SELECT and gather all users for all days for the given shift, not just the ones for the outer day. Using the same table name shifts without alias in the subquery hides the outer table of the same name.

    After untangling that mess, you'll find that you have to aggregate in two steps. Your original query could work like this:

    SELECT date
         , min(shift_users) FILTER (WHERE type = 'Morning Shift') AS morning_shift
         , min(shift_users) FILTER (WHERE type = 'Evening Shift') AS evening_shift
    FROM  (
       SELECT s.date, s.type
            , ARRAY (
                SELECT u.username
                FROM   shifts_users su         
                JOIN   users        u ON u.id = su.user_id
                WHERE  su.shift_id = s.id
                ) AS shift_users
       FROM   shifts s
       ) sub
    GROUP  BY date
    ORDER  BY date;
    

    In the outer SELECT I apply a "poor-man's crosstab" technique. See:

    If you want whole rows from table users instead of just the username, you could use ARRAY (SELECT u FROM .... But an array of records is hard to read / process.
    You may want a JSON representation instead. But there is no min(json) aggregate function (nor min(jsonb) or similar). There are various ways around this. I would rewrite the query as:

    WITH su AS (
       SELECT su.shift_id, array_agg(u.username) AS shift_users
       FROM   shifts_users su         
       JOIN   users        u ON u.id = su.user_id
       GROUP  BY su.shift_id
       )
    SELECT d.date, m.shift_users AS morning_shift, e.shift_users AS evening_shift
    FROM  (
       SELECT date
            , min(id) FILTER (WHERE type = 'Morning Shift') AS m_shift_id
            , min(id) FILTER (WHERE type = 'Evening Shift') AS e_shift_id
       FROM   shifts s
       GROUP  BY date
       ) d
    LEFT   JOIN su m ON m.shift_id = d.m_shift_id
    LEFT   JOIN su e ON e.shift_id = d.e_shift_id
    ORDER  BY d.date;
    

    Aggregate days (d) and shift-users (su) separately - the latter in a CTE, and then join to it twice. Should also be substantially faster while aggregating the whole table.

    Now, to get whole user rows:

    WITH su AS (
       SELECT su.shift_id, array_agg(u) AS shift_users  -- Postgres array of records
       FROM   ...  -- rest is identical
    

    u being the reference to the whale table row (table alias in this case).
    And since we got rid of aggregation in the outer SELECT, we can switch to JSON at will:

    WITH su AS (
       SELECT su.shift_id, json_agg(u) AS shift_users  -- JSON array of objects
       FROM   ...  -- rest is identical