I have a user table with following fields: User_ID, Email, Used_date.
As we can see, a user can switch to multiple emails over period. From used_date field I want to create date range fields (Email_Start_Date and Email_End_Date). They will store a time period when a user has used that email.
Its possible that a user can switch back to an older email. In this case same email will have two date ranges.
Also I want to fill the gap between last day of previous email and start date of current email.
For example if user used someone@gmail.com for a period of 8/28/2020 - 8/31/2020.
Further he switched to someone1@gmail.com on 9/3/2020.
Then in output someone@gmail.com will have a date range of 8/28/2020 - 9/2/2020.
This is a case of gaps-and-islands. But I don't know how to implement this.
Thanks everyone!
I would suggest just the difference of row numbers and aggregation:
select user_id, email, min(used_date) as email_start_date,
lead(min(used_date)) over (partition by user_id order by min(used_date)) - interval '1 day' as email_end_date
from (select t.*,
row_number() over (partition by user_id order by used_date) as seqnum,
row_number() over (partition by user_id, email order by used_date) as seqnum_2
from t
) t
group by user_id, email, (seqnum - seqnum_2);
Actually, you can also do this with lag()
and no aggregation:
select user_id, email, min(used_date) as email_start_date,
lead(used_date) over (partition by user_id order by used_date) - interval '1 day' as email_end_date
from (select t.*,
lag(email) over (partition by user_id order by used_date) as prev_email
from t
) t
where prev_email is null or prev_email <> email;
This second one is simple. It just keeps the rows where the email changes (or the row where the data starts for the user). It then uses lead()
to get the end date.
Here is a db<>fiddle.