python pandas nan missing-data

TypeError: cannot subtract DatetimeArray from ndarray when using time stamp data


I am trying to calculate the number of days between two columns, where each column stores Timestamp objects and may contain NaN values. When I run the calculation, I get the error TypeError: cannot subtract DatetimeArray from ndarray. How can I do this when NaN values are present? Ideally, if either value is NaN, the result should be NaN as well.

import numpy as np
import pandas as pd

d1 = {'col1': pd.Timestamp(2017, 1, 1, 12), 'col2': [np.nan]}
x = pd.DataFrame(d1)

# Raises the TypeError: col2 holds only NaN, so it is float64, not datetime64
x['col3'] = (x['col2'] - x['col1']).dt.days.astype('int64')


Solution

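  • Convert both columns to datetime with pd.to_datetime, and cast the result to 'Int64' instead of 'int64'. After the conversion, NaN becomes NaT, so the subtraction is a valid datetime operation and missing values propagate to the result. The nullable 'Int64' dtype can hold missing values, whereas 'int64' cannot.

    Note that type(np.nan) is float. If a float result suits you, you can use a float dtype instead and keep NaN.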

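    import pandas as pd
    import numpy as np
    
    d1 = {'col1': [pd.Timestamp(2017, 1, 1, 12)], 'col2': [np.nan]}
    x = pd.DataFrame(d1)
    # Make sure both columns are datetime64; NaN in col2 becomes NaT
    x['col1'] = pd.to_datetime(x['col1'], errors='raise')
    x['col2'] = pd.to_datetime(x['col2'], errors='raise')
    
    # NaT propagates through the subtraction; nullable 'Int64' can store the missing result
    x['col3'] = (x['col2'] - x['col1']).dt.days.astype('Int64')
    
    print(x)
    print(type(np.nan))  # float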
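
To show the same approach on a frame that mixes valid and missing dates, here is a minimal sketch. The helper name days_between and the second row of sample data are assumptions made for illustration; they are not part of the original question or answer.

```python
import numpy as np
import pandas as pd


def days_between(df, start_col, end_col):
    """Hypothetical helper: whole days from start_col to end_col.

    Rows where either value is missing come back as missing.
    """
    # Coerce both columns to datetime64; NaN becomes NaT
    start = pd.to_datetime(df[start_col])
    end = pd.to_datetime(df[end_col])
    # NaT in either operand propagates through the subtraction
    delta = end - start
    # Nullable 'Int64' keeps integer days and still allows missing values
    return delta.dt.days.astype('Int64')


# Illustrative data: one complete row and one row with a missing end date
df = pd.DataFrame({
    'col1': [pd.Timestamp(2017, 1, 1, 12), pd.Timestamp(2017, 1, 1, 12)],
    'col2': [pd.Timestamp(2017, 1, 11, 12), np.nan],
})
df['col3'] = days_between(df, 'col1', 'col2')
print(df)
```

If you would rather keep NaN as a float, as the answer suggests, replacing astype('Int64') with astype('float64') in the sketch should be enough.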