I am trying to import a dataframe from a spreadsheet using pandas and then carry out numpy operations with its columns. The problem is that I obtain the error specified in the title: TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value. The reason for this is that my dataframe contains a column with dates, like:
ID Date
519457 25/02/2020 10:03
519462 25/02/2020 10:07
519468 25/02/2020 10:12
... ...
And Numpy requires the format to be floating point numbers, as so:
ID Date
519457 43886.41875
519462 43886.42153
519468 43886.425
... ...
How can I make this change without having to modify the spreadsheet itself? I have seen a lot of posts on the forum asking the opposite, and asking about the error, and read the docs on xlrd.xldate, but have not managed to do this, which seems very simple. I am sure this kind of problem has been dealt with before, but have not been able to find a similar post.
The code I am using is the following
xls=pd.ExcelFile(r'/home/.../TwoData.xlsx')
xls.sheet_names
df=pd.read_excel(xls,"Hoja 1")
df["E_t"]=df["Date"].diff()
Any help or pointers would be really appreciated!
PS. I have seen solutions that require computing the exact number that wants to be obtained, but this is not possible in this case due to the size of the dataframes.
I think you need:
#https://stackoverflow.com/a/9574948/2901002
@rewritten to vectorized solution
def excel_date(date1):
temp = pd.Timestamp(1899, 12, 30) # Note, not 31st Dec but 30th!
delta = date1 - temp
return (delta.dt.days) + (delta.dt.seconds) / 86400
df["Date"] = pd.to_datetime(df["Date"]).pipe(excel_date)
print (df)
ID Date
0 519457 43886.418750
1 519462 43886.421528
2 519468 43886.425000