Search code examples
sqljsonpostgresqlhierarchical-datapostgresql-performance

Create a JSON object from parent -> child relationship without duplication


I want to query a database, get ALL of the user's data, and send it to my front end in a JSON object (with many layers of nesting).

e.g.

{
 user_id: 1,
 username: james,
 messages: [
  {
   message_id: 'fewfef',
   message: 'lorum ipsum'
   ... : {
    ...
   }
  }
 ]
}

Sample schema/data:

--user table (parent)
CREATE TABLE userdata (
    user_id integer,
    username text
);

INSERT INTO userdata VALUES (1, 'james');

-- messages table (child) connected to user table
CREATE TABLE messages(
    message_id integer,
    fk_messages_userdata integer,
    message text
);

INSERT INTO messages VALUES (1, 1, 'hello');
INSERT INTO messages VALUES (2, 1, 'lorum ipsum');
INSERT INTO messages VALUES (3, 1, 'test123');

-- querying all data at once
SELECT u.username, m.message_id, m.message FROM userdata u
    INNER JOIN messages m
        ON u.user_id = m.fk_messages_userdata
WHERE u.user_id = '1';

This outputs the data as so:

username|message_id|message    |
--------+----------+-----------+
james   |         1|hello      |
james   |         2|lorum ipsum|
james   |         3|test123    |

The issue is I have the username is repeated for each message. For larger databases and more layers of nesting this would cause a lot of useless data being queried/sent.

Is it better to do one query to get all of this data and send it to the backend, or make a seperate query for each table, and only get the data I want?

For example I could run these queries:

-- getting only user metadata
SELECT username from userdata WHERE user_id = '1';
-- output
username|
--------+
james   |

-- getting only user's messages
SELECT m.message_id, m.message as message_id FROM userdata u
    INNER JOIN messages m
        ON u.user_id = m.fk_messages_userdata
WHERE u.user_id = '1';
--output
message_id|message_id |
----------+-----------+
         1|hello      |
         2|lorum ipsum|
         3|test123    |

This way I get only the data I need, and its a little easier to work with, as it comes to the backed more organized. But is there a disadvantage of running separate queries instead of one big one? Are there any other ways to do this?


Solution

  • Is it better to do one query to get all of this data and send it to the backend, or make a seperate query for each table, and only get the data I want?

    It's best to run only one query and get only the data you want. As long as it doesn't get too complicated - which it doesn't IMO:

    SELECT to_json(usr)
    FROM  (
       SELECT u.user_id, u.username
           , (SELECT json_agg(msg)  -- aggregation in correlated subquery
              FROM  (
                SELECT m.message_id, m.message
                FROM   messages m
                WHERE  m.fk_messages_userdata = u.user_id
                ) msg
             ) AS messages
       FROM   userdata u
       WHERE  u.user_id = 1  -- provide user_id here once!
       ) usr;
    

    fiddle

    There are many other ways.

    A (LEFT) JOIN LATERAL instead of the correlated subquery. See:

    json_build_object() instead of converting whole rows from subselects. See:

    But this version above should be shortest and fastest.

    Related: