Tags: python, pandas, datetime, series, strftime

Python in operator not working as expected when comparing string and strftime values


I'm working with datetime values converted to strings (years) in a DataFrame. I would like to check whether a given year exists in my df.year_as_string column using the in operator. However, the expression unexpectedly evaluates to False (see the second print statement). Why does this happen?

NB: I can probably solve my problem in a simpler way (as in the 3rd print statement), but I am really curious as to why the second statement evaluates to False.

import pandas as pd

ind = pd.to_datetime(['2013-12-31', '2014-12-31'])

df = pd.DataFrame([1, 2], index=ind)
df = df.reset_index()
df.columns = ['year', 'value']
df['year_as_string'] = df.year.dt.strftime('%Y')

# 1. the string '2013' is equal to the first element of the column
print('2013' == df['year_as_string'][0])  # True

# 2. but that same string is not 'in' the column?! Why does this evaluate to False?
print('2013' in df['year_as_string'])  # False

# 3. calling strftime on the DatetimeIndex itself does behave as I would expect
year = ind.strftime('%Y')
print('2013' in year)  # True
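A quick way to see why the second check fails is to inspect the two objects being tested. The sketch below, assuming a recent pandas release (where DatetimeIndex.strftime returns an Index), rebuilds the question's data and compares in on the Series with in on the Index; the name col is only a local alias introduced for illustration.

```python
import pandas as pd

# Rebuild the objects from the question
ind = pd.to_datetime(['2013-12-31', '2014-12-31'])
df = pd.DataFrame([1, 2], index=ind).reset_index()
df.columns = ['year', 'value']
df['year_as_string'] = df['year'].dt.strftime('%Y')
year = ind.strftime('%Y')

col = df['year_as_string']  # illustrative alias for the string column
print(type(col).__name__, list(col.index))  # Series [0, 1]
print(type(year).__name__)                  # Index

# On a Series, `in` tests the index labels (like dict keys), not the values
print(0 in col)        # True: 0 is an index label
print('2013' in col)   # False: '2013' is a value, not a label

# On an Index, `in` tests the Index's own entries, here the year strings
print('2013' in year)  # True
```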

Solution

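  • The in operator on a pandas Series checks the index labels, much like in on a dictionary checks only its keys. Here df['year_as_string'] has the default integer index (labels 0 and 1), so '2013' is never found. By contrast, ind.strftime('%Y') returns an Index, and in on an Index checks its entries, which is why the third print statement evaluates to True. To test membership among a Series' values, use in with its NumPy array representation: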

    '2013' in df['year_as_string'].values
    

    A more Pandorable (idiomatic pandas) approach is to construct a Boolean Series and then use pd.Series.any:

    (df['year_as_string'] == '2013').any()
    

    Equivalently:

    df['year_as_string'].eq('2013').any()
    

    Even better, avoid converting to strings unless absolutely necessary:

    df['year_as_int'] = df['year'].dt.year
    df['year_as_int'].eq(2013).any()
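To confirm that the value-based checks in this answer agree, here is a small sketch on the question's data. year_exists is a hypothetical helper, not part of pandas; it simply wraps the integer-year comparison from the last example. Series.isin is shown as one way to test several candidate years at once, assuming that is ever needed.

```python
import pandas as pd


def year_exists(dates: pd.Series, year: int) -> bool:
    """Hypothetical helper: True if any timestamp in `dates` falls in `year`."""
    # Compare integer years directly; no string conversion is needed
    return bool(dates.dt.year.eq(year).any())


# Rebuild the question's data
ind = pd.to_datetime(['2013-12-31', '2014-12-31'])
df = pd.DataFrame({'year': ind, 'value': [1, 2]})
df['year_as_string'] = df['year'].dt.strftime('%Y')

# Each value-based check should report that 2013 is present
checks = [
    '2013' in df['year_as_string'].values,
    (df['year_as_string'] == '2013').any(),
    df['year_as_string'].eq('2013').any(),
    year_exists(df['year'], 2013),
]
print(all(checks))                    # True
print(year_exists(df['year'], 2015))  # False

# To test several candidate years at once, Series.isin returns a Boolean mask
print(df['year'].dt.year.isin([2013, 2020]).tolist())  # [True, False]
```

Working with integer years avoids formatting every timestamp as a string, which is the trade-off the answer recommends.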