Search code examples
pythonpandasdataframeisin

Series finds value through direct == comparison but fails on isin() process


I am trying to filter a dataframe through this method:

filter = (df_1['Year']==2022) & **(df_1['Quarter'].isin(['2022Q3','2022Q4']))**
df_incumbent = df_1[filter].groupby(['Lane','Carrier Scac','Quarter']).agg({'Cust Linehaul':np.mean,'Load Number':'count'}).unstack('Quarter',fill_value='NaN')
df_incumbent

And it returns nothing. However when I do a direct comparison item by item it returns a dataframe.

filter = (df_1['Year']==2022) & (df_1['Quarter']=='2022Q3')

Some more information on the df_1['Quarter']. If it helps with some clue.

df_1['Quarter'].unique()

<PeriodArray>
['2021Q4', '2021Q1', '2021Q3', '2021Q2', '2022Q1', '2022Q2', '1970Q1',
    'NaT', '2022Q3', '2022Q4']
Length: 10, dtype: period[Q-DEC]

I have used isin successfully by passing a list of items, but not sure why this one is not working. I tried to go through other questions, but not able to still solve it.


Solution

  • Looks like isin only works on list with the same dtype, try create PeriodIndex first:

    df_1['Quarter'].isin(pd.PeriodIndex(['2022Q3','2022Q4'], freq='Q'))
    

    Example:

    s = pd.to_datetime(['2022-10-01', '2023-10-02']).to_period('Q-DEC')
    s
    # PeriodIndex(['2022Q4', '2023Q4'], dtype='period[Q-DEC]')
    
    s.isin(['2022Q4'])
    # array([False, False])
    
    s.isin(pd.PeriodIndex(['2022Q4'], freq='Q'))
    # array([ True, False])