For a project I am working on, I have many .unl files (Informix unload files) for different countries, and these need to be imported into Postgres. To do that, I need to translate the Informix schema to a Postgres schema using Python.
Assume my Python script opens each .unl file with this line of code:
open(file, 'r', encoding='latin1')
For countries that use encoding = latin1, the script works fine and the data looks good in Postgres. Poland is the exception.
When I specify encoding = latin2 for Poland, the import script still executes successfully, but the Polish text ends up looking different in Postgres. For example, the output unexpectedly looks like this:
But if the encoding were correct, the expected result should look like this:
I have tried but still can't figure out how to fix this. I would appreciate any suggestions on how to solve this problem. Thank you in advance!
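One way to diagnose an encoding problem like this is to look at the raw bytes of the file before choosing an encoding. The sketch below writes a small sample file first so it is runnable; the filename `sample.unl` and its content are placeholders, not the asker's actual data.

```python
# Create a hypothetical sample file so the check below is runnable.
# In practice you would skip this step and open your real .unl file.
sample = 'Aleksańdra Świętochowskiego'.encode('utf-8')
with open('sample.unl', 'wb') as f:
    f.write(sample)

# Open in binary mode and inspect the raw bytes.
with open('sample.unl', 'rb') as f:
    raw = f.read(40)
print(raw)

# UTF-8 encodes ń as the two-byte sequence b'\xc5\x84',
# whereas ISO-8859-2 (latin2) would use a single byte for it.
# Seeing multi-byte sequences like b'\xc5\x84' suggests the
# file is UTF-8, not latin2.
```

If the bytes show two-byte sequences starting with `\xc4` or `\xc5` for Polish letters, the file is almost certainly UTF-8 rather than Latin-2.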
You are facing a classic case of mojibake: the files are actually UTF-8 encoded, but they are being decoded as Latin-2.
The following (commented) snippet demonstrates it: type .\SO\78540135.py

file = r'.\SO\78540135.txt'
str_text = 'Aleksańdra Świętochowskiego'

# create a sample file: UTF-8 encoded
with open(file, 'w', encoding='utf-8') as f:
    f.write(str_text)

# read the file using the wrong encoding
with open(file, 'r', encoding='latin2') as f:
    str_name = f.read()
print('\nmojibake', str_name)

# read the file using the correct encoding
with open(file, 'r', encoding='utf-8') as f:
    str_name = f.read()
print('\nUTF8text', str_name)
Output: python .\SO\78540135.py

mojibake AleksaĹdra ĹwiÄtochowskiego

UTF8text Aleksańdra Świętochowskiego
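Because Python's latin2 codec maps every byte to some character, the wrong decoding is reversible: re-encoding the mojibake string as latin2 recovers the original bytes, which can then be decoded as UTF-8. This is a minimal sketch of that round-trip repair, assuming the damaged text only passed through one wrong latin2 decode:

```python
original = 'Aleksańdra Świętochowskiego'

# Simulate the bug: UTF-8 bytes decoded with the wrong codec.
mojibake = original.encode('utf-8').decode('latin2')

# Repair: re-encode with the wrong codec to get the original
# UTF-8 bytes back, then decode them correctly.
repaired = mojibake.encode('latin2').decode('utf-8')

print(repaired == original)  # True
```

The real fix, of course, is simply to open the Polish .unl files with encoding='utf-8' in the first place; the round-trip above is only useful for rescuing data that was already imported with the wrong encoding.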