
Get Encoding of a DBF File


I want to write a Java program that reads a DBF file containing German letters like 'ö'. The problem is that I don't know which encoding the file uses. When I open it in Notepad++ or the Windows editor, the encoding is reported as ANSI, but both programs show the 'ö' as '”'. When I open the file in Excel, the 'ö' is displayed correctly.

I also tried changing the encoding in Notepad++, but nothing worked. Does someone know a way to see which encoding Excel is currently using, or which encoding the file uses?


Solution

  • Strings in your .DBF file are most likely encoded as cp850 (although any of ['cp1026', 'cp437', 'cp775', 'cp850', 'cp852', 'cp857', 'cp858', 'cp861', 'cp865', 'cp895'] could apply; it is hard to tell from a single isolated example).

    Explanation:

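    This is a case of mojibake: cp850 stores 'ö' as the byte 0x94, and an editor that interprets that byte as cp1252 (Windows "ANSI") displays '”'. An example in Python, since it is easy to follow: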

    'ö'.encode('cp850').decode('cp1252')
    
    '”'
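
The candidate list above can be reproduced by brute force. This is a minimal sketch, assuming the editor displayed the bytes as cp1252 (what Windows calls ANSI); the function name `candidate_codecs` and the codec list are illustrative choices:

```python
# Codec names to try; extend as needed. Names Python does not know are skipped.
CODECS = ["cp437", "cp775", "cp850", "cp852", "cp857", "cp858",
          "cp861", "cp865", "cp1026", "cp1252", "latin-1", "utf-8"]


def candidate_codecs(expected, seen, shown_as="cp1252", codecs=CODECS):
    """Return codecs whose bytes for `expected` display as `seen` under `shown_as`."""
    hits = []
    for name in codecs:
        try:
            if expected.encode(name).decode(shown_as) == seen:
                hits.append(name)
        except (UnicodeError, LookupError):
            # Character not representable, bytes not decodable, or unknown codec.
            continue
    return hits


# 'ö' is what the file should contain, '”' is what the editor showed.
print(candidate_codecs("ö", "”"))
```

Passing a longer sample of expected and displayed text may narrow the list, though code pages that agree on every character in the sample cannot be told apart this way.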
    

    BTW, opening a .dbf file in a text editor makes little sense because it is a binary format (see .dbf header structure). Hence any algorithm that guesses the text encoding (like the one in Notepad++) is bound to fail.
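
The header may also carry a hint of its own. In the commonly documented dBASE/FoxPro layout, byte 29 of the 32-byte file header holds a language driver ID (code page mark). Below is a minimal sketch that reads it, assuming that layout. The function names are hypothetical, the mapping is deliberately partial, and many writers leave the byte at 0 (typically meaning no code page was recorded), so treat the result as a hint only:

```python
# Partial, illustrative mapping from language driver ID to a Python codec name.
# Assumption: values follow the commonly documented dBASE/FoxPro table.
LDID_TO_CODEC = {
    0x01: "cp437",   # US MS-DOS
    0x02: "cp850",   # International MS-DOS
    0x03: "cp1252",  # Windows ANSI
    0x64: "cp852",   # Eastern European MS-DOS
    0xC8: "cp1250",  # Eastern European Windows
}


def dbf_codepage_hint(header):
    """Return (language_driver_id, codec_name_or_None) from raw header bytes."""
    if len(header) < 32:
        raise ValueError("a DBF file header is at least 32 bytes long")
    ldid = header[29]  # offset 29: language driver ID / code page mark
    return ldid, LDID_TO_CODEC.get(ldid)


def read_dbf_codepage_hint(path):
    """Read the first 32 bytes of the file at `path` and return the hint."""
    with open(path, "rb") as f:  # binary mode: a .dbf file is not a text file
        return dbf_codepage_hint(f.read(32))
```

If the hint comes back as `None`, falling back to the trial-and-error approach (decode a known field and compare it with what Excel shows) is still needed.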

    Further reading (dBASE/FoxPro never escaped the limitations of 8-bit encodings):