Search code examples
pythoncsvdbf

Is there a way to replace and trim values in individual fields using the dbf to csv python library dbf by ethanfurman?


I am using the python library, dbf, by Ethan Furman to convert a number of dbf files to csv. It works extremely well for that. I would like to further edit some of the fields during the conversion process but am unsure how to do it. Specifically, I would like to replace string fields that contain only 1 or more blanks with empty strings, (eg. " " replace with "") and date fields that contain "00000000" with empty strings "". I would very much appreciate it if someone could describe how to edit the fields and write out the updated records during the conversion process. Obviously, I could write a simple secondary script to edit the csv files output during conversion but I would like to do it all in one step if possible. Here is the code I am using to convert the files:

import csv
import dbf
import os
import sys

folder=sys.argv[1]

for dirpath, dirnames, filenames in os.walk(folder):
    for filename in filenames:
        if filename.endswith('.DBF'):
            db=dbf.Table(filename, ignore_memos=True)
            db.open()
            csv_fn = filename[:-4]+ ".csv"
            dbf.export(db, filename=csv_fn, format='csv', header=True)

Solution

  • By default, when using a dbf table the data types returned are simple -- i.e. int, str, bool, datetime.datetime, etc. But you can make your own data types and have those used instead by specifying them in the default_data_types parameter:

    db = dbf.Table(
            filename,
            ignore_memos=True,
            default_data_types={
                'C': my_white_space_stripping_data_type,
                'D': my_empty_date_str_data_type,
                },
            )
    

    Fortunately, dbf comes with four enhanced data types already:

    • Char -- automatically strips trailing whitespace, and ignores trailing whitespace for comparisons

    • Logical -- supports True, False, and None (None is returned when the field value is not true or false -- I've seen ?, ' ', and other weird garbage)

    • Date -- supports an empty date, such as 00000000, and displays them as ''

    • DateTime -- supports an empty date/time, and displays them as ''

    Typically, if you're using one of the enhanced data types you probably want them all, so instead of the dictionary you can just pass a string:

    db = dbf.Table(
            filename,
            ignore_memos=True,
            default_data_types='enhanced',
            )
    

    Now, when a csv file is exported, trailing white-space is dropped, and empty date fields become ''.

    Keep in mind that empty logical fields will become '?' instead of '', so you may want the longer form of specifying a dict to default_data_types and only overriding C and D.