Search code examples
sqlpostgresqldatetimedate-arithmeticlateral-join

How to group records based on registered time considering entries in another table for the reporting month?


I have a table A with registered time associated for an account , there can be only one entry for each id.

For all the Id's present in Table A there will be entries in Table B like below with state

enter image description here

Expected transformed table

For each ID in table A If there is NO corresponding entry for an ID is present in table B for the registered time's month - want to classify it as New for that reporting month. Likewise if there are no corresponding entries in subsequent months in table B for an ID would like to classify them as well as New.

Eg:

ID 111 ~ Registered in November 2020 => Table B has no entries for ID 111 => Transformed table has an entry for ID 111 for the november month with state New.
ID 112 ~ Registered in November 2020 => Table B has entries for ID 112 in the month of November => Transformed table has no entry for ID 112
ID 113 ~ Registered in November 2020 => Table B has entries for ID 113 starting December => Transformed table has an entry for ID 113 for the november month with state New.
ID 114 ~ Registered in November 2020 => Table B has entries for ID 114 starting Feb 2021 => Transformed table has an entry for ID 114 for the months of november,December,Jan month with state New.

enter image description here


Solution

  • If I am following this correctly, you can use generate_series() and a lateral join:

    select a.id, 'new' state, s.dt
    from tablea a
    cross join lateral (
        select generate_series(
            date_trunc('month', a.registered_time), 
            coalesce(
                date_trunc('month', min(b.time)) - interval '1 month', 
                date_trunc('month', a.registered_time)
            ),
            '1 month'
        )
        from tableb b
        where b.id = a.id
    ) s(dt)
    

    The trick lies in the generation of the argument to generate_series(): if there is at least one entry available in b, we generate the date series from the beginning of the month of the registered time of a until the prior month to the earliest date in b; in case there is a date in b in the same month as in a, this generates an empty range, and the original row is filtered out. Else, we fall back on the registered time as end of range (which generates a range made of a single date).

    Demo on DB Fiddle:

     id | state | dt                 
    --: | :---- | :------------------
    111 | new   | 2020-11-01 00:00:00
    113 | new   | 2020-11-01 00:00:00
    114 | new   | 2020-11-01 00:00:00
    114 | new   | 2020-12-01 00:00:00
    114 | new   | 2021-01-01 00:00:00