Search code examples
pythonpandasnumpydataframexlrd

How to convert dataframe dates into floating point numbers?


I am trying to import a dataframe from a spreadsheet using pandas and then carry out numpy operations with its columns. The problem is that I obtain the error specified in the title: TypeError: Cannot do inplace boolean setting on mixed-types with a non np.nan value. The reason for this is that my dataframe contains a column with dates, like:

  ID         Date         
519457    25/02/2020 10:03    
519462    25/02/2020 10:07     
519468    25/02/2020 10:12
 ...           ...

And Numpy requires the format to be floating point numbers, as so:

  ID         Date         
519457    43886.41875     
519462    43886.42153     
519468    43886.425 
 ...         ...      

How can I make this change without having to modify the spreadsheet itself? I have seen a lot of posts on the forum asking the opposite, and asking about the error, and read the docs on xlrd.xldate, but have not managed to do this, which seems very simple. I am sure this kind of problem has been dealt with before, but have not been able to find a similar post.

The code I am using is the following

xls=pd.ExcelFile(r'/home/.../TwoData.xlsx')
xls.sheet_names
df=pd.read_excel(xls,"Hoja 1")

df["E_t"]=df["Date"].diff()

Any help or pointers would be really appreciated!

PS. I have seen solutions that require computing the exact number that wants to be obtained, but this is not possible in this case due to the size of the dataframes.


Solution

  • I think you need:

    #https://stackoverflow.com/a/9574948/2901002
    @rewritten to vectorized solution
    def excel_date(date1):
        temp = pd.Timestamp(1899, 12, 30)    # Note, not 31st Dec but 30th!
        delta = date1 - temp
        return (delta.dt.days) + (delta.dt.seconds) / 86400
    
    df["Date"] = pd.to_datetime(df["Date"]).pipe(excel_date)
    
    print (df)
           ID          Date
    0  519457  43886.418750
    1  519462  43886.421528
    2  519468  43886.425000