Task: find the most recent occurence of a user in the datatable having the information on date of service usage.
Sample data:
sample_pl = pl.DataFrame(
{
'id':['id1', 'id1', 'id1', 'id1',
'id2', 'id2', 'id2', 'id2',
'id3', 'id3', 'id3', 'id3'],
'dt_event': ['2023-06-01', '2023-06-04', '2023-06-10', '2023-06-28',
'2023-06-01', '2023-06-04', '2023-06-10', '2023-06-28',
'2023-06-01', '2023-06-04', '2023-06-10', '2023-06-28']
},
schema={
'id': pl.Utf8, 'dt_event': pl.Utf8
}
)
Manipulations to convert from str to date and the final answer:
(
sample_pl
.with_columns(pl.col('dt_event').str.to_date())
.with_columns(
pl.when(pl.col('dt_event') == pl.col('dt_event').min().over('id'))
.then(1)
.otherwise(0)
.alias('is_most_recent')
)
)