pandas, date, series, survival-analysis

How to convert a pandas series into datetime type?


I read some of the posts related to this topic, but nothing worked.

I am trying to convert two columns of my dataframe, dem_inclusiondate and sae_hospit_date, to a date type. I need to do a survival analysis, so I need the duration between the inclusion date and the hospitalization date.

However, the type of these columns is Series, and I can't find a way to convert them to a date type.

I tried this, following your comment:

  baseline_all_patients["dem_inclusiondate"]
    .to_datetime(baseline_all_patients["dem_inclusiondate"], format="%Y-%m-%d")

but this error occurs: 'Series' object has no attribute 'to_datetime'

Sorry, I am new and I don't know if my question is clear.

Thank you for your help.


Solution

  • I believe this should help. Let's generate some data.

    import numpy as np
    import pandas as pd
    df = pd.DataFrame({'date_begin':['2020.6.7', '2020.5.3', '2020.1.1'],
                       'date_end':['2020.6.17', '2020.6.1', '2020.1.20']})
    

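    Then the syntax to convert strings to datetimes in pandas is pretty easy. Note that to_datetime is a function of the pandas module, called as pd.to_datetime(series), not a method of a Series, which is why your attempt raised the AttributeError. See the pandas documentation for more.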

    df['date_begin'] = pd.to_datetime(df['date_begin'], yearfirst=True)
    df['date_end']   = pd.to_datetime(df['date_end'],   yearfirst=True)
    

    Now, timedeltas might give you some problems. That's because months and years have different lengths, so a duration expressed in months or years is only an approximation. Depending on the accuracy you require, you might want to use NumPy's (np) timedelta64 or pandas' own Timedelta.

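    # Duration in days (both forms give the same result)
    (df['date_end'] - df['date_begin']) / pd.Timedelta('1 days')
    (df['date_end'] - df['date_begin']) / np.timedelta64(1, 'D')
    # Duration in months and years: older pandas versions used average lengths here;
    # newer versions may reject the 'M' and 'Y' units as ambiguous
    (df['date_end'] - df['date_begin']) / np.timedelta64(1, 'M')
    (df['date_end'] - df['date_begin']) / np.timedelta64(1, 'Y')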
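
Applied to the columns from the question, the same approach might look like the sketch below. The column names dem_inclusiondate and sae_hospit_date come from the question, but the sample values and the duration_days column name are invented for illustration, and the "%Y-%m-%d" format is assumed from the attempt shown in the question.

```python
import pandas as pd

# Hypothetical sample data: the column names come from the question,
# the values are invented for illustration.
baseline_all_patients = pd.DataFrame({
    "dem_inclusiondate": ["2020-01-01", "2020-05-03", "2020-06-07"],
    "sae_hospit_date": ["2020-01-20", "2020-06-01", None],
})

# to_datetime is a top-level pandas function, not a Series method,
# so it is called as pd.to_datetime(series).
for col in ["dem_inclusiondate", "sae_hospit_date"]:
    baseline_all_patients[col] = pd.to_datetime(
        baseline_all_patients[col], format="%Y-%m-%d", errors="coerce"
    )

# Subtracting two datetime columns gives a timedelta column;
# .dt.days extracts whole days (NaN where a date is missing).
baseline_all_patients["duration_days"] = (
    baseline_all_patients["sae_hospit_date"]
    - baseline_all_patients["dem_inclusiondate"]
).dt.days

print(baseline_all_patients.dtypes)
print(baseline_all_patients["duration_days"].tolist())
```

With errors="coerce", values that cannot be parsed become NaT instead of raising, so it is worth checking how many NaT values appear after the conversion. A missing hospitalization date also stays NaT and gives a NaN duration, which you would need to handle before fitting a survival model.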