Search code examples
sqlpostgresqljsonb

How to retrieve messages from a given date using Jsonb?


I have the following table:

CREATE TABLE tbl (tbl_id int, messages jsonb);
CREATE INDEX index_tbl ON tbl USING gin (messages);

And the JSON messages as array:

[{"user_id":1,"created_at":"2016-12-20","content":"Suspendisse accumsan tortor quis turpis. Sed ante."},
 {"user_id":2,"created_at":"2016-12-20","content":"Morbi sem mauris tibulum sagittis sapien."},
 {"user_id":1,"created_at":"2016-10-21","content":"In blandit ultrices enim. Phasellus id sapien in sapien iaculis congue."},
 {"user_id":3,"created_at":"2016-11-20","content":"Quisque ut erat. Curabitur gravida nisi at nibh. In hac habitasse platea dictumst."},
 {"user_id":4,"created_at":"2016-12-21","content":"Nunc rhoncus dui vel sem. Sed sagittis. ectus. Pellentesque at nulla. Suspendisse potenti. Cras in purus eu magna vulputate luctus."},
 {"user_id":6,"created_at":"2016-12-21","content":"Praesent id massa id nisl venenatis lacinia. iaculis congue."}]

How can I select the messages where the user is in only one month? For example, users of the month of December only return me the messages of users 2, 4 and 6 because user 1 is in the month of November as well.


Solution

  • This returns all ..

    messages where the user is in only one month

    WITH msg AS (
       SELECT tbl_id
            ,(msg->>'user_id')::int                                      AS user_id
            , date_trunc('month', (msg->>'created_at')::timestamp)::date AS created_month
            , msg->>'content'                                            AS content
       FROM   tbl t, jsonb_array_elements(t.messages) msg
       )
    SELECT m.*
    FROM  (
       SELECT user_id
       FROM   msg 
       GROUP  BY 1
       HAVING count(DISTINCT created_month) = 1
       ) u
    JOIN msg m USING (user_id);
    
    1. Unnest the JSON array with jsonb_array_elements().

    2. Extract values for relevant keys. Only the month of the date is relevant, use date_trunc() and cast the result back to date.

    3. Run the actual query on the CTE msg resulting from step 1. and 2. Identify users that posted in only one month - across the whole table, not just within one row. You did not clarify, I chose this interpretation.

    4. Self-join to the same CTE msg to return all messages of identifies users.

    The GIN index is not going to help, since you need to inspect every single array element either way.

    Storing messages in a normalized table to begin with would be simpler and faster for your query:

    message (user_id int, created_at date, content text)