Search code examples
pythontype-conversiondbf

"Couldn't convert string to float" in python reading .DBF files - handling erroneous sensor data


I'm trying to read some sensor data, which is stored in .DBF-format, which usually works perfectly. However, one of my sensors seems to be having some issues, as all of a sudden, my usual scripts don't work any longer.

Let me clarify: The only table per file consists of four columns (time, temperature, humidity, dewpoint) and at some point (spanning roughly line 25 to 150 or so), only the value of humidity apparently is set to "--327", which python tries to convert to float and obviously fails to do. From the experiment I did and the values before and after that timespan (always 0.0) I know, that these values are in fact not interesting for my analysis.

Now the thing that I can't wrap my head around is, that I'm reading the data line-by-line wise (I have some other stuff to do anyways, before creating a pandas df) and I thought I was prepared for weird values, by doing this:

import numpy as np
import dbfread

data = dbfread.DBF(file)
for line in data:
    try:
         val_humi = line["humidity"]
    except Exception:
        val_humi = np.nan
        print("Error reading the humidity")
    default_further_processing_of_read_value_and_df_insertion(val_humi)

While probably not being best practice, this usually works for me as I don't really care about these few values being set to np.nan in a file containing several thousand entries.

How can I get rid of the error message I'm receiving so that the rest of my script can continue working? I'm grateful for any suggestions.

ValueError: could not convert string to float: b'--327'

P.S.: My first idea was to directly load the dbf into a pd.dataframe, but the error was the same.

My next best idea so far was to manually check each read value (performance is not an issue) for it being equal to "--327" and then manually setting it to "0", but I can't even get there as the error message is thrown as soon as I'm attempting to extract the value. Maybe there is some option like <"set_read_data_type" = str> or so?

Thanks in advance!

EDIT: Maybe this also helps: If I import the dbf-file into excel, the erroneous values appear as empty cells, while only opening the file with something like DBFViewer2000 shows the --327 value, that python also sees.


Solution

  • Using my dbf library, you can specify your own float conversion routine to handle weird errors:

    import dbf
    
    def fix_float(string):
        try:
            return float(string)
        except ValueError:
            return 0.0
    
    data = dbf.Table(file, default_data_types={'N': fix_float})
    data.open()
    for record in data:
        # do stuff
    

    Note that there are a few differences in how dbf chooses to handle data and data access, so spend a few minutes exploring it.