Search code examples
pythonsqlcsvxlrdstring

python: xlrd/csv - empty cell treatment when xlrd/csv packages read objects into memory


Is there an option to change the default way the csv and xlrd packages handle empty cells? By default empty cells are assigned an empty string value = ''. This is problematic when one is working with databases because an empty string is not a None value, which many python packages that interface with databases (SQLAlchemy for example) can handle as a Null for database consumption.

For example if an empty cell occurred in a field that is suppose to be a decimal/integer/float/double then the database will throw up an exception because an insert of a string was made to a field of type decimal/integer/float/double.

I haven't found any examples or documentation that shows how I can do this. My current approach is to inspect the data and do the following:

if item[i] == '': 
   item[i] = None

The problem with this is that I don't own the data and have no control over its quality. I can imagine that this would be a common occurrence since a lot of apps are using files/data that are produced by sources other then them.

If there is a way to change the default treatment then that would be a sensible approach in my opinion.


Solution

  • I have the same setup as yourself (sqlalchemy for the ORM, and data that I have little control over, being fed through excel files). I found that I need to curate the data from the xlrd before dumping it in the database. I am not aware of any tweaks that you can apply on the xlrd module.

    On a more general note: It is probably best to try and get as large a sample of example excel files as you can and see if your application can cope with it. I found that occasionally weird characters make it through the excel (people copy paste from different languages) which cause crushes further down. Also found that in some cases the file format was not UTF-8 but iso-8859 or something else. I ended up using iconv for converting the files.

    you may also want to have a look at this stackoverflow article

    Overall xlrd has worked for us, but I am less than impressed with the activity around the project. Seems like I am using a library that has little maintenance.