Search code examples
sqlpostgresqlevent-sourcingsql-function

Turn set of SQL change rows into dates


I have a table (visibility_history) that tracks the dates when a user changes their profile's visibility. We create 1 visibility_history row when a user profile is first created and additional rows whenever a user changes their profile visibility.

It looks approximately like

user_id visible_before visible_after date
1 nil true Jan 1, 2021
1 true false Jan 4, 2021
1 false true Jan 5, 2021
2 nil false Jan 4, 2021
3 nil true Jan 4, 2021

In the above example, user 1 was created on Jan 1, and user 2 and 3 were created on Jan 4. For each row representing a change in a user's visibility, we record both the visibility of their profile before the change and after the change.

I want to a query that will return the set of dates where each user's profile was visible. For example, I would make a query to return the dates where a user's profile is visible between Jan 2 and Jan 6. The results would be

user_id date
1 Jan 2, 2021
1 Jan 3, 2021
1 Jan 5, 2021
1 Jan 6, 2021
3 Jan 4, 2021
3 Jan 5, 2021
3 Jan 6, 2021

I believe that I need to use a tally table but I don't know how in this case.


Solution

  • You need to define bounds of your interest since it cannot be determined from your sample data you want Jan 6 but not Jan 7. I used calendar CTE in my solution, anyway you can hardcode date constants if you want.

    Then you precompute upper (inclusive) bound of each interval of visibility (I assume there are no rows with visible_before = visible_after) and join pregenerated dates with valid interval for each user based on matching appropriate date range.

    with visibility_history (user_id,visible_before,visible_after,date) as (values
      (1,null ,true , date '2021-01-01'),
      (1,true ,false, date '2021-01-04'),
      (1,false,true , date '2021-01-05'),
      (2,null ,false, date '2021-01-04'),
      (3,null ,true , date '2021-01-04')
    ), calendar (min_date, max_date) as (values
      (date '2021-01-02', date '2021-01-06')
    ), precomputed_interval (user_id, visible_after, since, till) as (
      select h.user_id, h.visible_after, h.date
           , coalesce(lead(h.date) over (partition by h.user_id order by h.date) - interval '1 day', calendar.max_date)
      from visibility_history h
      cross join calendar
    )
    select i.user_id, s.d
    from generate_series((select min_date from calendar), (select max_date from calendar), '1 day'::interval) as s(d)
    join precomputed_interval i on s.d between i.since and i.till
    where i.visible_after
    order by i.user_id, s.d
    

    Db fiddle here.