Tags: python-3.x, pandas, series, strftime, string-to-datetime

How to 'exactly' use .to_datetime on a series of date strings?


I am trying to convert a Series of date strings to datetimes. Even with the 'exact' parameter set to True, .to_datetime fills in a default day of 01 when a string has no day value, instead of returning NaT.

PS: I am using pandas 0.24.2

import numpy as np
import pandas as pd

data = np.array(['2014-10', '2015-06/07', '20-14-08', '2a115-09'])

ser = pd.Series(data)

pd.to_datetime(ser, errors='coerce', format='%Y-%m-%d', exact=True)

#Result:
0   2014-10-01
1          NaT
2          NaT
3          NaT
dtype: datetime64[ns]

#Expected
0   NaT
1   NaT
2   NaT
3   NaT
dtype: datetime64[ns]

Solution

  • You can check whether each string matches the expected format with str.match, then use np.where to keep the parsed value only where it matches. I added one more date to show a correctly parsed result:

    data = np.array(['2014-10','2015-06/07','20-14-08','2a115-09', '2018-09-20'])
    
    # Raw string avoids invalid-escape warnings; '$' rejects trailing characters
    m = pd.Series(data).str.match(r'\d{4}-\d{2}-\d{2}$')
    data_new = pd.Series(np.where(m,
                                  pd.to_datetime(data, format='%Y-%m-%d', errors='coerce'),
                                  np.datetime64('NaT')))
    
    0          NaT
    1          NaT
    2          NaT
    3          NaT
    4   2018-09-20
    dtype: datetime64[ns]
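
A possible variation on the same mask idea, shown only as a sketch: parse the whole Series first, then blank out the rows that fail the pattern with Series.where, which avoids mixing np.where with a DatetimeIndex. The names ser, mask and parsed are illustrative, and the regex assumes zero-padded YYYY-MM-DD strings.

```python
import numpy as np
import pandas as pd

data = np.array(['2014-10', '2015-06/07', '20-14-08', '2a115-09', '2018-09-20'])
ser = pd.Series(data)

# True only where the whole string looks like YYYY-MM-DD
# (str.match anchors at the start; '$' anchors the end).
mask = ser.str.match(r'\d{4}-\d{2}-\d{2}$')

# Parse everything leniently, then replace rows that failed the pattern with NaT.
parsed = pd.to_datetime(ser, format='%Y-%m-%d', errors='coerce').where(mask)

print(parsed)
```

With this input, only the last element should survive as a date; the other four come out as NaT.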