Search code examples
pythonpython-3.xpandasdataframepandas-merge

How to check if id in DF1 appeared within the past 30 minutes in DF2? using Pandas


I have DF1 with customer_id, datetime and fruites purchases, and DF2 with customer_id, datetime and Veggies purchases, how to check if within the past 30 minutes of the fruit purchases a customer did a Veggies puchase?

df1.head()
customer_id puchase_date fruit_item
1 2019-08-01 23:55:55 Apples
2 2019-08-01 23:58:32 Bananas
df2.head()
customer_id puchase_date veggies_item
1 2019-08-01 23:44:55 Eggplants
2 2019-08-01 22:00:32 Carrots
#after writing the required code and adding a new column to df1
df1.head()
customer_id puchase_date fruit_item baught_veggies_last_30_minutes?
1 2019-08-01 23:55:55 Apples Yes
2 2019-08-01 23:58:32 Bananas No

Solution

  • You can use merge_asof. You want to merge within 30 mins of the purchase date, so you set that using the tolerance parameter. Note that you misspelt purchase as puchase. I spelt it the same way so that you can run it without error.

    out = (pd.merge_asof(df1.assign(puchase_date=pd.to_datetime(df1['puchase_date'])).sort_values(by='puchase_date'),
                        df2.assign(puchase_date=pd.to_datetime(df2['puchase_date'])).sort_values(by='puchase_date'), 
                        on='puchase_date', 
                        by='customer_id', 
                        tolerance=pd.Timedelta('30 minute'))
           .rename(columns={'veggies_item':'bought_veggies_last_30_minutes'})
           .assign(bought_veggies_last_30_minutes=lambda x: x['bought_veggies_last_30_minutes']
                   .notna().replace({True: 'Yes', False:'No'})))
    

    Output:

       customer_id        puchase_date fruit_item bought_veggies_last_30_minutes
    0            1 2019-08-01 23:55:55     Apples                            Yes
    1            2 2019-08-01 23:58:32    Bananas                             No