Search code examples
pythonpandasdataframedatetimeperiod

Convert a column of datetime and strings to period in pandas


I would like to understand if I can convert a column with mixed types (DateTime and Strings) to a PeriodIndex (for instance month).

I have the following DataFrame:

booking_date          ...      credit           debit
None                  ...      10185.00     -10185.00
2017-01-01 00:00:00   ...       1796.00          0.00
2018-07-01 00:00:00   ...       7423.20        -11.54
2017-04-01 00:00:00   ...       1704.00          0.00
2017-12-01 00:00:00   ...       1938.60      -1938.60
2018-12-01 00:00:00   ...       1403.47       -102.01
2018-01-01 00:00:00   ...       2028.00        -76.38
2019-01-01 00:00:00   ...        800.00       -256.98
Total                 ...      10185.00     -10185.00

I'm trying to apply the PeriodIndex to booking_date:

df['booking_date'] = pd.PeriodIndex(df['booking_date'].values, freq='M')

However, I receive the following error:

pandas._libs.tslibs.parsing.DateParseError: Unknown datetime string format, unable to parse: TOTAL

Anyway I can get around with this?

Thanks!


Solution

  • If need Periods only cannot mixing with strings:

    df['booking_date'] = pd.to_datetime(df['booking_date'], errors='coerce').dt.to_period('m')
    print (df)
      booking_date  ...    credit     debit
    0          NaT  ...  10185.00 -10185.00
    1      2017-01  ...   1796.00      0.00
    2      2018-07  ...   7423.20    -11.54
    3      2017-04  ...   1704.00      0.00
    4      2017-12  ...   1938.60  -1938.60
    5      2018-12  ...   1403.47   -102.01
    6      2018-01  ...   2028.00    -76.38
    7      2019-01  ...    800.00   -256.98
    8          NaT  ...  10185.00 -10185.00
    

    But it is possible:

    orig = df['booking_date']
    
    df['booking_date'] = pd.to_datetime(df['booking_date'], errors='coerce').dt.to_period('m')
    
    df.loc[df['booking_date'].isna(), 'booking_date'] = orig
    print (df)
      booking_date  ...    credit     debit
    0         None  ...  10185.00 -10185.00
    1      2017-01  ...   1796.00      0.00
    2      2018-07  ...   7423.20    -11.54
    3      2017-04  ...   1704.00      0.00
    4      2017-12  ...   1938.60  -1938.60
    5      2018-12  ...   1403.47   -102.01
    6      2018-01  ...   2028.00    -76.38
    7      2019-01  ...    800.00   -256.98
    8        Total  ...  10185.00 -10185.00
    
    print (df['booking_date'].apply(type))
    0                             <class 'NoneType'>
    1    <class 'pandas._libs.tslibs.period.Period'>
    2    <class 'pandas._libs.tslibs.period.Period'>
    3    <class 'pandas._libs.tslibs.period.Period'>
    4    <class 'pandas._libs.tslibs.period.Period'>
    5    <class 'pandas._libs.tslibs.period.Period'>
    6    <class 'pandas._libs.tslibs.period.Period'>
    7    <class 'pandas._libs.tslibs.period.Period'>
    8                                  <class 'str'>
    Name: booking_date, dtype: object
    

    new = pd.to_datetime(df['booking_date'], errors='coerce').dt.to_period('m')
    
    df['booking_date'] = np.where(new.isna(), df['booking_date'], new)
    print (df)
      booking_date  ...    credit     debit
    0         None  ...  10185.00 -10185.00
    1      2017-01  ...   1796.00      0.00
    2      2018-07  ...   7423.20    -11.54
    3      2017-04  ...   1704.00      0.00
    4      2017-12  ...   1938.60  -1938.60
    5      2018-12  ...   1403.47   -102.01
    6      2018-01  ...   2028.00    -76.38
    7      2019-01  ...    800.00   -256.98
    8        Total  ...  10185.00 -10185.00
    
    print (df['booking_date'].apply(type))
    0                             <class 'NoneType'>
    1    <class 'pandas._libs.tslibs.period.Period'>
    2    <class 'pandas._libs.tslibs.period.Period'>
    3    <class 'pandas._libs.tslibs.period.Period'>
    4    <class 'pandas._libs.tslibs.period.Period'>
    5    <class 'pandas._libs.tslibs.period.Period'>
    6    <class 'pandas._libs.tslibs.period.Period'>
    7    <class 'pandas._libs.tslibs.period.Period'>
    8                                  <class 'str'>
    Name: booking_date, dtype: object