
Python UnicodeDecodeError when accessing records of an OrderedDict


Using Python 3.5.2 on Windows (32-bit), I'm reading a DBF file with dbfread, which returns each record as an OrderedDict:

from dbfread import DBF
Table = DBF('FME.DBF')
for record in Table:
    print(record)

The first records are read without problems, but as soon as the loop reaches a record that contains diacritics I get:

Traceback (most recent call last):
  File "getdbe.py", line 3, in <module>
    for record in Table:
  File "...\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbfread\dbf.py", line 311, in _iter_records
    for field in self.fields]
  File "...\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbfread\dbf.py", line 311, in <listcomp>
    for field in self.fields]
  File "...\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbfread\field_parser.py", line 75, in parse
    return func(field, data)
  File "...\AppData\Local\Programs\Python\Python35-32\lib\site-packages\dbfread\field_parser.py", line 83, in parseC
    return decode_text(data.rstrip(b'\0 '), self.encoding)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x82 in position 11: ordinal not in range(128)

Even if I don't print the record, I still get the error.
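The traceback shows the exception is raised inside dbfread's parseC, which decodes each character field while the for loop is building the record, before print is ever called; that is why leaving out the print changes nothing. A minimal sketch reproducing the same error with plain bytes, no dbfread involved (the sample value is hypothetical, chosen so that byte 0x82 sits at position 11 as in the traceback):

```python
# Hypothetical field value: 11 ASCII bytes followed by 0x82, the byte
# named in the traceback. dbfread is not needed to reproduce the error.
raw = b"Hello world\x82"

message = None
try:
    raw.decode("ascii")  # the codec dbfread fell back to
except UnicodeDecodeError as exc:
    message = str(exc)

print(message)
# The same bytes decode cleanly under a DOS code page such as cp850.
print(raw.decode("cp850"))
```

Byte 0x82 is outside the 0-127 range ASCII allows, so any field containing it fails no matter what you do with the record afterwards.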

Any ideas?


Solution

  • dbfread failed to detect the correct encoding of your DBF file. From the Character Encodings section of the documentation:

    dbfread will try to detect the character encoding (code page) used in the file by looking at the language_driver byte. *If this fails it reverts to ASCII.* You can override this by passing encoding='my-encoding'.

    Emphasis mine.

    You'll have to pass in an explicit encoding; this will almost always be a Windows code page. Take a look at the list of standard encodings supported by Python; you'll want one whose name starts with cp. If you don't know which code page to use, you'll have some trial-and-error work to do. Note that some code pages overlap in characters, so even if one appears to produce legible results, keep trying other records in your data file to see which code page fits best.
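The trial-and-error search can be sketched in plain Python. This is a minimal sketch, not part of dbfread: read_language_driver, try_codepages and the LANGUAGE_DRIVERS table are hypothetical helpers, the table lists only three well-known IDs, and the code assumes the standard dBase header layout, in which byte 29 holds the language driver ID.

```python
# Sketch: inspect a DBF header and try candidate code pages by hand.
# All names here are hypothetical helpers, not dbfread API.

# A few well-known dBase language driver IDs (header byte 29).
# Deliberately partial; real files may use many other IDs.
LANGUAGE_DRIVERS = {
    0x01: "cp437",   # US MS-DOS
    0x02: "cp850",   # International MS-DOS
    0x03: "cp1252",  # Windows ANSI
}


def read_language_driver(path):
    """Return the language driver byte stored at offset 29 of the header."""
    with open(path, "rb") as f:
        header = f.read(32)
    if len(header) < 32:
        raise ValueError("file too short to contain a DBF header")
    return header[29]


def try_codepages(raw, codepages=("cp437", "cp850", "cp852", "cp1250", "cp1252")):
    """Decode one raw field value with each candidate code page.

    Returns {codepage: decoded text}; code pages that cannot decode the
    bytes are skipped. Several may succeed, so check the text by eye.
    """
    results = {}
    for cp in codepages:
        try:
            results[cp] = raw.decode(cp)
        except UnicodeDecodeError:
            continue
    return results


if __name__ == "__main__":
    for cp, text in try_codepages(b"Caf\x82").items():
        print(cp, repr(text))
```

A zero or unrecognised language driver byte would be consistent with dbfread falling back to ASCII. Once one code page gives sensible text across several records, pass it explicitly, for example DBF('FME.DBF', encoding='cp850'); cp850 is only a guess here, although it (like cp437) maps byte 0x82 to é, whereas cp1252 maps it to a low quotation mark. To get undecoded field values to feed into try_codepages, dbfread's documentation describes a raw=True option that returns values as byte strings.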