Tags: python, dictionary, encoding, utf-8, configparser

Encodings in ConfigParser (Python)


Python 3.1.3. What I need is to read a dictionary from a cp1251-encoded file using ConfigParser. My example:

import configparser

config = configparser.ConfigParser()
config.optionxform = str  # keep keys case-sensitive
config.read("file.cfg")
DataStrings = config.items("DATA")
DataBase = dict()
for Dstr in DataStrings:
    str1 = Dstr[0]
    str2 = Dstr[1]
    DataBase[str1] = str2

After that, I try to replace some words in some UTF-8 files according to the dictionary. But sometimes it doesn't work (for example, with carriage-return/newline characters). My files are in UTF-8 and the configuration file (the dictionary) is in CP1251. That seems to be the problem, so I think I have to decode the config into UTF-8. I've tried this:

str1 = Dstr[0].encode('cp1251').decode('utf-8-sig')

But the error "'utf8' codec can't decode byte 0xcf in position 0" appeared. If I pass 'ignore' as the errors argument to .decode(), I lose almost the entire config file. What should I do?
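A short sketch of why that round trip fails, assuming a made-up sample word `Привет`. In Python 3, `configparser` already hands back `str` values, so calling `.encode('cp1251').decode('utf-8-sig')` on them just converts the text back into cp1251 bytes and then misreads those bytes as UTF-8. The first letter `П` is byte `0xCF` in cp1251; in UTF-8 that byte starts a two-byte sequence, and the next cp1251 byte is not a valid continuation, which matches the error above.

```python
# Hypothetical sample word; in cp1251, 'П' is the single byte 0xCF.
word = "Привет"
raw = word.encode("cp1251")
print(raw[0] == 0xCF)  # True

# Misreading cp1251 bytes as UTF-8 raises UnicodeDecodeError,
# like the error reported in the question.
try:
    raw.decode("utf-8-sig")
except UnicodeDecodeError as exc:
    print("decode failed:", exc.reason)

# Decoding with the codec the bytes were actually written in round-trips cleanly.
print(raw.decode("cp1251") == word)  # True
```

The takeaway is that the encoding has to be supplied when the file is read from disk; once the text is a `str`, there is nothing left to convert.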


Solution

  • Python 3.1 is in the no-man's-land of Python versions. Ideally you'd upgrade to a newer Python 3 release (Python 3.2 added an encoding parameter to ConfigParser.read()), which would let you do config.read("file.cfg", encoding="cp1251")

    If you must stay on 3.1.x, you can use the ConfigParser.readfp() method to read from a file that you have already opened with the correct encoding:

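    import configparser

    config = configparser.ConfigParser()
    config.optionxform = str  # preserve the case of keys
    # Open the file with the right encoding and pass the file object to readfp();
    # the with-block closes the file afterwards.
    with open("file.cfg", encoding="cp1251") as config_file:
        config.readfp(config_file)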
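For Python 3.2 or later, here is a minimal end-to-end sketch: load the cp1251 dictionary with `encoding=` and apply it to a UTF-8 file. Because both files are decoded as they are read, everything in between is `str`, so no manual `.encode()`/`.decode()` calls are needed. The helper names (`load_dictionary`, `replace_in_file`, `_write_temp`) and the sample data are hypothetical. Passing `newline=""` is an assumption about the carriage-return trouble mentioned in the question: it keeps `\r\n` line endings exactly as they are instead of translating them.

```python
import configparser
import os
import tempfile


def load_dictionary(cfg_path, encoding="cp1251"):
    """Read the [DATA] section of a config file in the given encoding and
    return it as a dict (the encoding= argument needs Python 3.2+)."""
    config = configparser.ConfigParser()
    config.optionxform = str  # keep keys exactly as written (no lowercasing)
    config.read(cfg_path, encoding=encoding)
    # items() yields (key, value) pairs, so dict() builds the lookup table.
    return dict(config.items("DATA"))


def replace_in_file(path, replacements):
    """Hypothetical helper: apply every key -> value replacement to a UTF-8
    text file in place and return the new text."""
    # newline="" turns off newline translation, so "\r\n" endings survive.
    with open(path, encoding="utf-8", newline="") as f:
        text = f.read()
    for old, new in replacements.items():
        text = text.replace(old, new)
    with open(path, "w", encoding="utf-8", newline="") as f:
        f.write(text)
    return text


def _write_temp(content, encoding, suffix):
    """Write `content` to a new temporary file and return its path."""
    fd, path = tempfile.mkstemp(suffix=suffix)
    with os.fdopen(fd, "w", encoding=encoding, newline="") as f:
        f.write(content)
    return path


# Demo with made-up data: a cp1251 config file and a UTF-8 target file.
cfg_path = _write_temp("[DATA]\nПривет = Hello\nМир = World\n", "cp1251", ".cfg")
txt_path = _write_temp("Привет, Мир!\r\n", "utf-8", ".txt")

DataBase = load_dictionary(cfg_path)
result = replace_in_file(txt_path, DataBase)

os.remove(cfg_path)
os.remove(txt_path)
```

Replacements are applied one after another, so if a replacement value happens to contain another key, it gets replaced again; a single-pass regular-expression substitution would avoid that if it matters. Also note that on Python 3.2+ `read_file()` supersedes `readfp()`, which was deprecated in 3.2 and removed in Python 3.12.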