Problems opening DBF files in Python


I am trying to open several DBF files and convert them to dataframes. Most of them work fine, but for one of the files I receive the error: "UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 15: invalid start byte"

I have seen this error discussed in other topics about opening csv, xlsx and other files. The proposed solution there was to pass encoding = 'utf-8' when reading the file. Unfortunately, I haven't found a solution for DBF files, and I have very limited knowledge of the DBF format.

What I have tried so far:

1)

import pandas as pd
from dbfread import DBF

dbf = DBF('file.DBF')
dbf = pd.DataFrame(dbf)

UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 8: character maps to <undefined>

2)

from simpledbf import Dbf5
dbf = Dbf5('file.DBF')
dbf = dbf.to_dataframe()

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf6 in position 15: invalid start byte

3)

# this block of code copied from https://gist.github.com/ryan-hill/f90b1c68f60d12baea81 
import pandas as pd
import pysal as ps

def dbf2DF(dbfile, upper=True): #Reads in DBF files and returns Pandas DF
    db = ps.table(dbfile) #Pysal to open DBF
    d = {col: db.by_col(col) for col in db.header} #Convert dbf to dictionary
    #pandasDF = pd.DataFrame(db[:]) #Convert to Pandas DF
    pandasDF = pd.DataFrame(d) #Convert to Pandas DF
    if upper: #Make columns uppercase if wanted
        pandasDF.columns = [col.upper() for col in db.header]
    db.close() 
    return pandasDF

dfb = dbf2DF('file.DBF')

AttributeError: module 'pysal' has no attribute 'open'

Lastly, when I try to install the dbfpy module, I receive: SyntaxError: invalid syntax

Any suggestions on how to solve this?


Solution

  • Try using my dbf library:

    import dbf
    
    table = dbf.Table('file.DBF')
    

    Print it to see if an encoding is present in the file:

    print(table)    # Python 2: print table
    

    One of my test tables looks like this:

        Table:         tempy.dbf
        Type:          dBase III Plus
        Codepage:      ascii (plain ol ascii)
        Status:        DbfStatus.CLOSED
        Last updated:  2019-07-26
        Record count:  1
        Field count:   2
        Record length: 31 
        --Fields--
          0) name C(20)
          1) desc M
    

    The important line is the Codepage line; it sounds like it is not set correctly for your DBF file. If you know what it should be, you can either open the table with that codepage (a temporary override) with:

    table = dbf.Table('file.DBF', codepage='...')
    

    Or you can change it permanently (updates the DBF file) with:

    table.open()
    table.codepage = dbf.CodePage('cp1252') # for example
    table.close()
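
If you do not know which codepage the file should use, a rough way to narrow it down is to look at the language-driver byte in the DBF header and to try a few candidate codecs on the bytes named in the error messages. The following is a minimal sketch using only the standard library, not part of any of the libraries above. The function names and the (deliberately partial) driver-ID table are my own for illustration, and it assumes the common dBASE/FoxPro layout in which byte 29 of the 32-byte header holds the language-driver ID.

```python
# Partial mapping of DBF language-driver IDs (header byte 29) to Python codecs.
# Only a few common values are listed here; an ID missing from this table
# is not necessarily invalid.
LANGUAGE_DRIVERS = {
    0x00: None,      # no codepage recorded in the file
    0x01: "cp437",   # DOS USA
    0x02: "cp850",   # DOS Multilingual
    0x03: "cp1252",  # Windows ANSI
}


def read_language_driver(path):
    """Return the raw language-driver byte from a DBF file header."""
    with open(path, "rb") as fh:
        header = fh.read(32)
    if len(header) < 32:
        raise ValueError("file is too short to contain a DBF header")
    return header[29]


def try_codecs(raw, candidates=("cp1252", "cp850", "cp437", "latin-1")):
    """Decode raw bytes with each candidate codec; None marks a failure."""
    results = {}
    for name in candidates:
        try:
            results[name] = raw.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


if __name__ == "__main__":
    # The two bytes reported in the error messages from the question.
    for raw in (b"\xf6", b"\x81"):
        print(raw, try_codecs(raw))
```

A codec under which both bytes decode to characters that make sense for your data is a reasonable candidate to pass as `codepage=` above. dbfread accepts a similar override through its `encoding` argument, for example `DBF('file.DBF', encoding='cp850')`, where the codec name is only an example.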