Search code examples
pythondbf

Effectively querying DBF files with Python


I have the need to read from a legacy VFP DBF database and gather all rows which have an etd within the current week.

I am using dbf however it seems that when querying the table, it begins the query at the very first record in the table. This causes performance issues when attempting to find data within the last week, as it has to iterate over every line in the database (60k+) every time it runs.


table = dbf.Table(r'\\server\file.dbf')

table.open()

for row in table:

    if (self.monday < row.etd < self.friday) and ('LOC' not in row.route):
        self.datatable.Rows.Add(row.manifest, row.route, row.etd, row.eta, row.inst, row.subname)
    else:
        continue

I tried to "reverse" the table with for row in table[::-1]:

However, this takes the same amount of time as I believe it needs to load the database into memory prior to the [::-1]

What would be a more efficient way to query these DBF files?


Solution

  • As you know, dbf does not support index files. It does, however, have some methods reminiscent of VFP that could help:

    # untested
    
    table = ...
    
    potental_records = []
    with table:               # auto opens and closes
        table.bottom()        # goes to end of table
        while True:
            table.skip(-1)    # move to previous record
            row = table.current_record
            if self.monday > row.etd:
                # gone back beyond range
                break
            elif row.etd < self.friday:
                potential_records.append(row)
    
    # at this point the table is closed and potential_records should have all
    # records in the etd range.
    

    The above will only work if the records are physically ordered by etd.