Search code examples
pythonpandasdatetimeisin

check which dates from column 'long_date' are also in array holy_date


i get an empty df although i know some rows should be in it

any thoughts how to fix this?

the df after the 7th line run looks like this:

long_date country
2020-11-07 Portugal
2020-01-01 Portugal

the holy_date looks like this: ['2020-01-01','2020-01-06']

from numpy.ma.extras import isin
import holidays
df = df[(df['country'] == 'Portugal')]
min_year = (pd.DatetimeIndex(df.long_date).year.min())
max_year = (pd.DatetimeIndex(df.long_date).year.max())+1
holy_date = [i.strftime('%Y-%m-%d') for i in [*holidays.CountryHoliday('Portugal',years = np.arange(min_year,max_year,1)).keys()]]
df.long_date= pd.to_datetime(df.long_date).dt.date
df = pd.concat([df,df.long_date.isin(holy_date).rename('bh')],axis =1)
df[df['bh']==True]

Solution

  • The problem come from the fact that you are trying to identify strings in a datetime column. What you should do is to eliminate the row

    df.long_date= pd.to_datetime(df.long_date).dt.date
    

    and use this instead: I added a few dates to your data

    long_date   country
    0  2020-11-07  Portugal
    1  2020-11-01  Portugal
    2  2020-10-01  Portugal
    3  2020-06-11  Portugal
    

    and

    from numpy.ma.extras import isin
    import holidays
    import pandas as pd
    df = pd.read_csv('holyday.csv', sep=";")
    print(df)
    df = df[(df['country'] == 'Portugal')]
    min_year = (pd.DatetimeIndex(df.long_date).year.min())
    max_year = (pd.DatetimeIndex(df.long_date).year.max())+1
    
    holy_date = [i.strftime('%Y-%m-%d') for i in [*holidays.CountryHoliday('Portugal',years = np.arange(min_year,max_year,1)).keys()]]
    holy_date = list(holy_date)
    #df.long_date= pd.to_datetime(df.long_date).dt.date
    df = pd.concat([df,df['long_date'].isin(holy_date).rename('bh')],axis =1)
    print(df)
    df[df['bh']==True]
    

    produces this:

        long_date   country
    0  2020-11-07  Portugal
    1  2020-11-01  Portugal
    2  2020-10-01  Portugal
    3  2020-06-11  Portugal
        long_date   country     bh
    0  2020-11-07  Portugal  False
    1  2020-11-01  Portugal   True
    2  2020-10-01  Portugal  False
    3  2020-06-11  Portugal   True
    
    
    
        long_date   country    bh
    1  2020-11-01  Portugal  True
    3  2020-06-11  Portugal  True