
Writing dbf file with custom encoding (DBF Package)


I have some Farsi text that I want to write to a dbf file using my own custom codepage, which encodes each character in a single byte. I think this problem can be solved in one of two ways:

1- Passing my custom codepage to the dbf table.

2- Writing the raw bytes directly to the dbf file, bypassing the default codepage of the dbf package (which I believe is utf8).

How can I solve this problem with either of these approaches?

Here is the code:

import dbf

man = 'مرد'
woman = 'زن'
row1 = (man, woman)
row2 = (man, woman)

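# dbf.Table creates the file itself, so no separate open() call is needed.
# 'customCodePage' stands in for the custom single-byte codepage I want to use.
table = dbf.Table(filename='./file.dbf',
    field_specs='field1 C(3); field2 C(3)', codepage='customCodePage', on_disk=True)
table.open(dbf.READ_WRITE)
table.append(row1)
table.append(row2)
table.close()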

Solution

  • After trying to register my own codec, I ended up translating my data in two steps: first from utf8 to the "Custom Farsi codec", and then to the windows-1256 character that has the same decimal codepoint. When the user reads the data with the custom codec, each windows-1256 character points to the right decimal value in the custom codec. Of course, the characters in this raw form are not meaningful as windows-1256 text.

    For example, the letter پ has the decimal codepoint 1662 in Unicode and 148 in the custom codec. The character at codepoint 148 in windows-1256 is ”, so پ is translated to ” using three different dictionaries. I did this for every character on the Farsi keyboard.
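The translation described above might be sketched roughly as follows. This is only an illustration, not the answerer's actual code: the names `UNICODE_TO_CUSTOM`, `CUSTOM_TO_UNICODE`, `to_carrier` and `from_carrier` are hypothetical, the table holds only the single پ → 148 pair mentioned in the answer, and the three dictionaries are collapsed into one dictionary plus Python's built-in `cp1256` codec.

```python
# Hypothetical fragment of the custom single-byte codec: Unicode character -> byte value.
# Only the پ -> 148 pair comes from the answer; a real table would cover
# every character on the Farsi keyboard.
UNICODE_TO_CUSTOM = {"پ": 148}
CUSTOM_TO_UNICODE = {byte: char for char, byte in UNICODE_TO_CUSTOM.items()}


def to_carrier(text):
    """Map Farsi text to the windows-1256 characters that share the custom byte values."""
    raw = bytes(UNICODE_TO_CUSTOM[char] for char in text)
    return raw.decode("cp1256")


def from_carrier(carrier):
    """Reader's side: take the cp1256 bytes and reinterpret them with the custom table."""
    return "".join(CUSTOM_TO_UNICODE[byte] for byte in carrier.encode("cp1256"))


carrier = to_carrier("پ")
print(repr(carrier), carrier.encode("cp1256"))  # the stand-in character and its raw byte
print(from_carrier(carrier))                    # round trip back to the Farsi letter
```

The strings returned by `to_carrier` would then be written to a table whose codepage is windows-1256, assuming the dbf package accepts `cp1256` as a codepage name, so that the bytes on disk match the custom codec's values.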