I am trying to calculate the number of days between two columns, where each column is stored as Timestamp objects and may contain NaN values. When I try to do the calculation, I get this error: TypeError: cannot subtract DatetimeArray from ndarray
My question is how I can do this when NaN values are present. The best-case scenario for me is that if there is a NaN value, the result is NaN as well.
import numpy as np
import pandas as pd
d1 = {'col1': pd.Timestamp(2017, 1, 1, 12), 'col2' : [np.nan]}
x = pd.DataFrame(d1)
# The next line raises the TypeError: col2 is a float column (NaN), not datetime
x['col3'] = (x['col2'] - x['col1']).dt.days.astype('int64')
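Convert both columns to datetime with pd.to_datetime, so that NaN becomes NaT and the subtraction works. Then cast to the nullable 'Int64' dtype (capital I) instead of 'int64', because a plain int64 column cannot hold missing values.
Note that if you print type(np.nan), you will see that it is a float. So if a float result suits you, you can cast to float instead.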
import pandas as pd
import numpy as np
d1 = {'col1': [pd.Timestamp(2017, 1, 1, 12)], 'col2' : [np.nan]}
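x = pd.DataFrame(d1)
# Make both columns datetime64; NaN in col2 becomes NaT
x['col1'] = pd.to_datetime(x['col1'], errors='raise')
x['col2'] = pd.to_datetime(x['col2'], errors='raise')
# Nullable 'Int64' keeps the missing value as <NA> instead of failing
x['col3'] = (x['col2'] - x['col1']).dt.days.astype('Int64')
print(x)
print(type(np.nan))  # <class 'float'>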
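If a float result is acceptable, here is a minimal sketch of that variant next to the 'Int64' one. The second row and the column names days_float and days_int are made up for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame: one missing end date, one valid end date
x = pd.DataFrame({
    'col1': [pd.Timestamp(2017, 1, 1, 12), pd.Timestamp(2017, 1, 1, 12)],
    'col2': [np.nan, pd.Timestamp(2017, 1, 11, 12)],
})

# Make sure both columns are datetime64; NaN becomes NaT
x['col1'] = pd.to_datetime(x['col1'])
x['col2'] = pd.to_datetime(x['col2'])

delta = x['col2'] - x['col1']  # timedelta64 column, NaT where col2 is missing

# Float variant: the missing row stays NaN
x['days_float'] = delta.dt.days.astype('float64')

# Nullable integer variant: the missing row becomes <NA>
x['days_int'] = delta.dt.days.astype('Int64')

print(x)
```

The float column is handy if you plan to do further NumPy arithmetic on it, while 'Int64' keeps whole-day counts as integers.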