Tags: python, encoding, windows-xp

Windows XP encoding for non-English and English characters


The problem:

I am writing a text file that contains Greek characters, using Python and the cp1253 encoding, but the program throws an error on some characters:

UnicodeEncodeError: 'charmap' codec can't encode character '\u2265' in position 389: character maps to <undefined>
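
The error can be reproduced without writing a file. The snippet below is a minimal sketch with a made-up sample string (not the original text): the Greek and English parts encode fine in cp1253, and only the mathematical sign `'\u2265'` (≥) has no mapping.

```python
# Hypothetical sample: Greek word, English word, then the "greater-than or equal" sign
greek_and_english = "Θερμοκρασία temperature"
text = greek_and_english + " \u2265 20"

# Greek and English letters are both part of cp1253, so this succeeds
encoded_ok = greek_and_english.encode("cp1253")

# The full text fails on the first character that cp1253 cannot represent
try:
    text.encode("cp1253")
    failure = None
except UnicodeEncodeError as err:
    # err.start is the index of the offending character in the string
    failure = (text[err.start], err.reason)

print(failure)
```

So the failure does not come from mixing the two languages. It comes from symbols outside the code page.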

The question:

I believe that this problem can be solved if I use an encoding that includes both languages and is compatible with Windows XP. So my question is:

How does Windows XP handle bilingual text? Does it use "mixed" encodings?


Edit: I am returning after some months and realizing how naive my question was. Anyway, I am keeping it pretty much unchanged and answering it myself for new developers who run into the same problem.


Solution

  • The problem, obviously, is that the text I was trying to write contains characters that are not included in the encoding.

    To solve the problem I replaced all the "bad" characters with ones that cp1253 can encode. To find all these characters I used the following script:

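    bad_chars = []
    # name (output path) and whole_text (the text to save) are defined elsewhere
    with open(name, 'w', encoding='cp1253') as res:
        for ch in whole_text:
            try:
                res.write(ch)
            except UnicodeEncodeError:
                # ch has no mapping in cp1253, so remember it and skip it
                bad_chars.append(ch)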
    

    Then I created a dictionary that maps each bad character to an equivalent that cp1253 can encode, and I applied it to the text:

    # keys are lookalikes missing from cp1253: INCREMENT, OHM SIGN, SUBSCRIPT TWO
    chars_to_change = {'\u2206': 'Δ', '\u2126': 'Ω', '\u2082': '2'}
    for c1, c2 in chars_to_change.items():
        whole_text = whole_text.replace(c1, c2)
    

    Note that there might be better solutions, especially for the first part (finding the bad characters). Please edit if you find an improvement or an error.
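
One possible improvement to the first part, offered as a sketch rather than a definitive fix: the bad characters can be found by encoding each character in memory, without opening the output file, and the replacements can be applied in one pass with `str.translate`. The helper names `find_unencodable` and `apply_replacements`, the sample text, and the `'>='` replacement for `'\u2265'` are assumptions made for illustration.

```python
def find_unencodable(text, encoding="cp1253"):
    """Return the distinct characters of text that the encoding cannot represent."""
    bad = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


def apply_replacements(text, replacements):
    """Replace single characters according to a {bad_char: replacement} dictionary."""
    return text.translate(str.maketrans(replacements))


# Hypothetical sample containing INCREMENT, the >= sign, OHM SIGN and SUBSCRIPT TWO
sample = "\u2206T \u2265 5 \u2126, H\u2082O, Δοκιμή"

bad_chars = find_unencodable(sample)

# Each key must be a single character; values may be longer strings
replacements = {"\u2206": "Δ", "\u2126": "Ω", "\u2082": "2", "\u2265": ">="}
fixed = apply_replacements(sample, replacements)

# After the replacements nothing should be left that cp1253 cannot encode
remaining = find_unencodable(fixed)
print(bad_chars, remaining)
```

Checking `remaining` before writing the file shows whether the dictionary covers every problematic character in the text.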