python pandas datetime contains

str.contains equivalent for datetime64 in pandas


I have a bunch of data as follows and I only want the 2019 entries.

+----------+
|   Date   |
+----------+
| 20190329 |
| 20180331 |
| 20190331 |
| 20180331 |
| 20190401 |
+----------+

The Date column's dtype is datetime64[ns]. Before I'd checked the dtype, I tried df = df[df['Date'].str.contains('2019')], which raises AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas.

Is there an alternative?


Solution

  • It looks like you have a column of integers. In that case, I'd recommend converting to datetime and then accessing the year attribute:

    pd.to_datetime(df['Date'].astype(str)).dt.year == 2019  # you compare ints
    
    0     True
    1    False
    2     True
    3    False
    4     True
    Name: Date, dtype: bool
    
    df[pd.to_datetime(df['Date'].astype(str)).dt.year == 2019]
    
           Date
    0  20190329
    2  20190331
    4  20190401
    

    Another alternative (slightly faster, but I don't like this because of the potential for abuse) would be to slice the strings and compare:

    df['Date'].astype(str).str[:4] == '2019'  # you compare strings
    
    0     True
    1    False
    2     True
    3    False
    4     True
    Name: Date, dtype: bool
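
The question says the column is datetime64[ns], while the answer above assumes integers, so here is a minimal self-contained sketch covering both cases. The frame names (df_dt, df_int) and the explicit format="%Y%m%d" are my own assumptions for illustration, not something from the original post. If the column really is a datetime dtype, the .dt accessor can be used directly without any conversion.

```python
import pandas as pd

raw = ["20190329", "20180331", "20190331", "20180331", "20190401"]

# Case 1: the column is already a datetime column (as the question states).
# No conversion is needed; .dt.year gives integer years to compare against.
df_dt = pd.DataFrame({"Date": pd.to_datetime(raw, format="%Y%m%d")})
mask_dt = df_dt["Date"].dt.year == 2019
only_2019_dt = df_dt[mask_dt]

# Case 2: the column holds integers such as 20190329 (as the answer assumes).
# Convert to strings, parse with an explicit format, then compare the year.
df_int = pd.DataFrame({"Date": [int(s) for s in raw]})
parsed = pd.to_datetime(df_int["Date"].astype(str), format="%Y%m%d")
only_2019_int = df_int[parsed.dt.year == 2019]

print(only_2019_dt.index.tolist())     # [0, 2, 4]
print(only_2019_int["Date"].tolist())  # [20190329, 20190331, 20190401]
```

Passing an explicit format is optional here, but it means a malformed value raises an error instead of being parsed by guesswork.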