Search code examples
pythonpandasprecision

Using pandas read_excel with defined float precision when text present in data columns/rows


I am fairly new to python (background in MATLAB and engineering) and am running into an issue.

I have these excel files I am handed off that contain a header section, and then a data section below it. The data section has numbers, some dates (unimportant to me), and strings. I am writing a code to parse through the files and sheets within them (there are various) and compile them into df/objects so I can manipulate them into post processing things for me (a la MATLAB).

Now the issue I am running into is that some of my data is coming in fine when using pd.read_excel but some of it is small (order of magnitude in the thousandths, ten-thousandths and hundred-thousandths) and is being imported in as a 0 value integer instead of a float (example below). I have tried using astype, dtype and some other forms of conversion but it either eliminates the strings (which I need to use to anchor parts of my code as it parses for specific things or lack thereof) altogether or eliminates the strings in the column where the issue is arising (which I still need to use/keep intact). Other methods flat out give me errors :/

I have confirmed that the data in the sheets is fine and contain their expected value. I noticed some older posts where the precision issue lied with excel. This is not the case for me.

How can I define the floating precision of certain varying cells or of just numbers in general without messing with the strings in the sheet? The specific place (row and column wise) where these appear varies sheet by sheet so hardcoding array or excel locations is not a good solution and there are so many sets/sheets that manually editing the excel files is out of the question for me. The only thing I can think of is to reimport the excel data midway through once I have some sort of variable reference to where I might find these incorrectly zero values but again that seems cumbersome and inefficient. I'm hoping there might be some elegant workaround.

Thanks in advance!

-squeaky EDIT: Answer helped me realize I was referencing an outdated file. read_excel actually was working as expected and giving me the values I was looking for after updating the filepath.


Sheet.xlsx

header 1 header 2 header 3
text 12 0.0001
NaN 0.001 text
text text 11

Python Code

import pandas as pd

directory = C:\Data

sheet = sheet1.xlsx

importdata = pd.read_excel(directory, sheet_name=sheet, header=None)

print(importdata.loc[2,1]) #getting this

output: 0

print(importdata.loc[1,1]) #want this with more precision... fine with 12.00000 or whatever

output: 12

Solution

  • I'm not getting that kind of behavior, maybe it's because you're not importing your data correctly or you're not printing the right value. Here is the code I tested with your input file example.

    import pandas as pd
    import_data=pd.read_excel('sheet1.xlsx', sheet_name=0)
    print(import_data.loc[1,'header 2']) 
    

    and the output result is

    0.001
    

    as expected.