I have a list contents which contains lxml.etree._ElementStringResult and lxml.etree._ElementUnicodeResult objects.
final_content = ''
for x in contents:
    final_content += x.encode('utf-8') + '\n'
and
final_content = reduce(lambda a, x: a+x.encode('utf-8') + '\n', contents)
The first snippet runs fine, while the second raises a UnicodeDecodeError:
<ipython-input-129-17a363dfff6c> in <lambda>(a, x)
----> 1 final_content = reduce(lambda a, x: a+x.encode('utf-8') + '\n', contents)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 37: ordinal not in range(128)
Edit:
The reduce is failing because the first element is not encoded.
When I changed the code to
final_content = contents[0]
for x in range(1, len(contents)):
    final_content += contents[x].encode('utf-8')
it raises the same error as the reduce block above.
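For what it's worth, here is a minimal sketch of the mechanism, assuming Python 2 and two made-up unicode strings standing in for the lxml result objects (the names first and second are only illustrative): once the second element has been encoded to UTF-8 bytes, concatenating it onto the still-unencoded unicode first element makes Python try to decode those bytes with the ASCII codec, which is exactly the error shown above.

# Python 2 sketch with hypothetical data, not the real contents list
contents = [u'caf\xe9', u'M\xfcnchen']

first = contents[0]                    # never encoded: still a unicode object
second = contents[1].encode('utf-8')   # encoded: the byte string 'M\xc3\xbcnchen'

try:
    first + second                     # unicode + bytes: implicit ASCII decode of the bytes
except UnicodeDecodeError as err:
    print(err)                         # 'ascii' codec can't decode byte 0xc3 ...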
The error is because your \n is not utf-8 encoded. Simply making it a unicode string should fix the issue:
final_content = reduce(lambda a, x: a + x.encode('utf-8') + u'\n', contents)
Sorry, answer owner, for editing your answer here without your permission, but the question is closed and I can't post the right answer. Feel free to remove this content:
OP, you are assuming that both snippets behave the same, but they do not! On the first reduce iteration you concatenate the first and second elements without the \n in between, and you encode the second element but not the first one. The right translation from your classical for loop to your reduce approach is:
final_content = reduce(lambda a, x: a + x.encode('utf-8') + '\n',
                       contents,
                       '')  # <----- initializer: a plain byte string, like the for loop's starting value
Notice that without an initializer you are doing:
contents[0] + contents[1].encode('utf-8')
and this is what raises the error!
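To make the difference concrete, here is a hedged sketch under the same assumptions (Python 2, made-up unicode elements instead of the real lxml results): without a starting value, reduce seeds the accumulator with contents[0], so that element never goes through the lambda and never gets encoded; with a byte-string starting value, every element is encoded and the accumulator stays a byte string, just like in the for loop.

contents = [u'caf\xe9', u'M\xfcnchen']   # hypothetical stand-ins for the lxml results

# No initializer: the first step is contents[0] + contents[1].encode('utf-8'),
# i.e. unicode + non-ASCII bytes, which raises the UnicodeDecodeError.

# With a byte-string initializer, every element (including the first) is encoded:
final_content = reduce(lambda a, x: a + x.encode('utf-8') + '\n',
                       contents,
                       '')

print(repr(final_content))               # 'caf\xc3\xa9\nM\xc3\xbcnchen\n'

Using an empty byte string as the initializer keeps the accumulator in bytes the whole way through, which is the same state the original for loop maintained.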