I have a bunch of data as follows and I only want the 2019 entries.
+----------+
| Date |
+----------+
| 20190329 |
| 20180331 |
| 20190331 |
| 20180331 |
| 20190401 |
+----------+
The dtype of Date is datetime64[ns]. Before checking the type, I tried df = df[df['Date'].str.contains('2019')], which raises AttributeError: Can only use .str accessor with string values, which use np.object_ dtype in pandas.
Is there an alternative?
Judging by the sample, it looks like you have a column of integers. In that case, I recommend converting to datetime and then comparing the year attribute:
pd.to_datetime(df['Date'].astype(str)).dt.year == 2019 # you compare ints
0 True
1 False
2 True
3 False
4 True
Name: Date, dtype: bool
df[pd.to_datetime(df['Date'].astype(str)).dt.year == 2019]
Date
0 20190329
2 20190331
4 20190401
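For reference, here is a minimal self-contained sketch of the approach above. It assumes the column holds integers in YYYYMMDD form (the sample frame is rebuilt from the question's table), and it passes an explicit format to pd.to_datetime so the parsing does not depend on inference. The last two lines cover the case where Date really is datetime64[ns], as the question states: the .dt accessor can then be used directly, with no conversion. The names parsed, result, df_dt and result_dt are only illustrative.

```python
import pandas as pd

# Sample data mirroring the question; integer YYYYMMDD values are an assumption.
df = pd.DataFrame({"Date": [20190329, 20180331, 20190331, 20180331, 20190401]})

# Convert to datetime with an explicit format, then compare the year as an int.
parsed = pd.to_datetime(df["Date"].astype(str), format="%Y%m%d")
result = df[parsed.dt.year == 2019]

# If the column is already datetime64[ns], skip the conversion entirely.
df_dt = pd.DataFrame({"Date": parsed})
result_dt = df_dt[df_dt["Date"].dt.year == 2019]
```

Both result and result_dt keep the rows at index 0, 2 and 4.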
Another option, slightly faster but one I like less because it is easy to misuse, is to slice the strings and compare:
df['Date'].astype(str).str[:4] == '2019' # you compare strings
0 True
1 False
2 True
3 False
4 True
Name: Date, dtype: bool
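As with the datetime version, the boolean mask can be used to filter the frame. The following is a small sketch under the same assumption of integer YYYYMMDD values; mask and filtered are illustrative names.

```python
import pandas as pd

# Sample data mirroring the question; integer YYYYMMDD values are an assumption.
df = pd.DataFrame({"Date": [20190329, 20180331, 20190331, 20180331, 20190401]})

# Cast to string, take the first four characters, and compare strings.
mask = df["Date"].astype(str).str[:4] == "2019"
filtered = df[mask]
```

Note that this only works while every value has a four-digit year as its first four characters; the datetime conversion fails loudly on malformed values, whereas the slice silently compares whatever is there.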