Search code examples
pythonvisual-foxprodbf

Python DBF: How to associate a .cdx index with a .dbf table


I have been given an ambiguous task of automating a data extraction from various Visual FoxPro tables.

There are several pairs of .DBF and .CDX files. With the Python dbf package, I seem to be able to work with them. I have two files, an ABC.DBF and an ABC.CDX. I can load the table file using,

>>> import dbf
>>> table = dbf.Table('ABC.DBF')
>>> print(table[3])
  0 - table_key : '\x00\x00\x04'
  1 - field_1   : -1
  2 - field_2   : 0
  3 - field_3   : 34
  4 - field_ 4  : 2
  ...

>>>

It's my understanding that .cdx files are indexes. I suspect that corresponds to the table_key field. According to the author, dbf can read indexes:

I can read IDX files, but not update them. My day job changed and dbf files are not a large part of the new one. – Ethan Furman May 26 '16 at 21:05

Reading is all I need to do. I see that four classes exist, Idx, Index, IndexFile, and IndexLocation. These seem like good candidates.

The Idx class reads in a table and filename, which is promising.

>>> index = dbf.Idx(table, 'ABC.CDX')

I'm not sure how to make use of this object, though. I see that it has some generators, backward and forward, but when I try to use them I get an error

>>> print(list(index.forward()))
dbf.NotFoundError: 'Record 67305477 is not in table ABC.DBF'

How does one associate the .cdx index file to the .dbf table?


Solution

  • .idx and .cdx are not the same, and dbf cannot currently read .cdx files.

    If you need the table to be sorted, you can create an in-memory index:

    my_index = table.create_index(key=lambda r: r.table_key)
    

    You can also create a full-fledged function:

    def active(rec):
        # do not show deleted records
        if is_deleted(rec):
            return DoNotIndex
        return rec.table_key
    
    my_index = table.create_index(active)
    

    and then loop through the index instead of the table:

    for record in my_index:
        ...