I am trying to use StringIO to feed ConfigObj. I would like to do this in my unit tests, so that I can mock config "files", on the fly, depending on what I want to test in the configuration objects.
I handle quite a few things in the configuration module (I read several conf files, then aggregate and "format" the information for the rest of the apps). However, in the tests, I am facing a Unicode error from hell. I think I have pinned the problem down to a minimal working example, which I have extracted and over-simplified for the purpose of this question.
I am doing the following:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import configobj
import io

def main():
    """Main stuff"""
    input_config = """
    [Header]
    author = PloucPlouc
    description = Test config
    [Study]
    name_of_study = Testing
    version = 9999
    """
    # Just not to trust my default encoding
    input_config = unicode(input_config, "utf-8")
    test_config_fileio = io.StringIO(input_config)
    print configobj.ConfigObj(infile=test_config_fileio, encoding="UTF8")

if __name__ == "__main__":
    main()
It produces the following traceback:
Traceback (most recent call last):
  File "test_configobj.py", line 101, in <module>
    main()
  File "test_configobj.py", line 98, in main
    print configobj.ConfigObj(infile=test_config_fileio, encoding='UTF8')
  File "/work/irlin168_1/USER/Apps/python272/lib/python2.7/site-packages/configobj-4.7.2-py2.7.egg/configobj.py", line 1242, in __init__
    self._load(infile, configspec)
  File "/work/irlin168_1/USER/Apps/python272/lib/python2.7/site-packages/configobj-4.7.2-py2.7.egg/configobj.py", line 1302, in _load
    infile = self._handle_bom(infile)
  File "/work/irlin168_1/USER/Apps/python272/lib/python2.7/site-packages/configobj-4.7.2-py2.7.egg/configobj.py", line 1442, in _handle_bom
    if not line.startswith(BOM):
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 0: ordinal not in range(128)
I am using Python 2.7.2 (32-bit) on Linux. The locale for both the console and the editor (Kile) is set to fr_FR.utf8.
I thought I could do this.
From the io.StringIO documentation, I got this:
The StringIO object can accept either Unicode or 8-bit strings, but mixing the two may take some care.
And from the ConfigObj documentation, I can do this:
>>> config = ConfigObj('config.ini', encoding='UTF8')
>>> config['name']
u'Michael Foord'
and this:
infile: None
You don't need to specify an infile. If you omit it, an empty ConfigObj will be created. infile can be :
[...] A StringIO instance or file object, or any object with a read method. The filename attribute of your ConfigObj will be None [5].
'encoding': None
By default ConfigObj does not decode the file/strings you pass it into Unicode [8]. If you want your config file as Unicode (keys and members) you need to provide an encoding to decode the file with. This encoding will also be used to encode the config file when writing.
My question is: why does this raise the error? What else have I misunderstood about (simple) Unicode handling?
By looking at this answer, I changed:
input_config = unicode(input_config, "utf8")
to (importing the codecs module beforehand):
input_config = unicode(input_config, "utf8").strip(codecs.BOM_UTF8.decode("utf8", "strict"))
in order to get rid of a possibly included byte order mark, but it did not help.
Thanks a lot
NB: I have the same traceback if I use StringIO.StringIO instead of io.StringIO.
This line:
input_config = unicode(input_config, "utf8")
is converting your input to Unicode, but this line:
print configobj.ConfigObj(infile=test_config_fileio, encoding="UTF8")
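is declaring the input to be a UTF-8-encoded byte string. The traceback shows ConfigObj comparing the first line against the UTF-8 BOM, which is a byte string starting with 0xef; because the line is Unicode, Python 2 tries to decode the BOM as ASCII first and fails. In other words, a Unicode string was passed where a byte string was expected, so commenting out the first line above should resolve the issue. Note that io.StringIO only accepts Unicode in Python 2, so the byte string would have to go through io.BytesIO or StringIO.StringIO instead. I don't have configobj at the moment so can't test it.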
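As a rough sketch of that idea: hand ConfigObj a byte stream and let the encoding argument do the decoding. The helper names make_config_stream and load_config are made up for illustration, and load_config assumes configobj is installed; I have not run it against ConfigObj 4.7.2.

```python
import io


def make_config_stream(text):
    """Return a byte stream holding text encoded as UTF-8 (hypothetical helper)."""
    return io.BytesIO(text.encode("utf-8"))


def load_config(text):
    """Parse text with ConfigObj, leaving the decoding to ConfigObj.

    Assumes the third-party configobj package is installed.
    """
    import configobj  # imported lazily so make_config_stream works without it

    # ConfigObj receives bytes here, matching what encoding="UTF8" declares.
    return configobj.ConfigObj(infile=make_config_stream(text), encoding="UTF8")


# Same sample configuration as in the question.
SAMPLE = u"""
[Header]
author = PloucPlouc
description = Test config
[Study]
name_of_study = Testing
version = 9999
"""

# The stream yields bytes, not Unicode text.
stream = make_config_stream(SAMPLE)
raw = stream.read()
```

In a unit test, load_config(SAMPLE) would then stand in for reading a real file, with one mock "file" per test case.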