python loops functional-programming python-unicode

reduce raises an error while the equivalent for loop works fine


I have a list, contents, whose elements are lxml.etree._ElementStringResult and lxml.etree._ElementUnicodeResult objects.

for x in contents:
    final_content += x.encode('utf-8') + '\n'

and

final_content = reduce(lambda a, x: a+x.encode('utf-8') + '\n', contents)

The first snippet runs fine, while the second raises a UnicodeDecodeError:

<ipython-input-129-17a363dfff6c> in <lambda>(a, x)
----> 1 final_content = reduce(lambda a, x: a+x.encode('utf-8') + '\n', contents)

UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 37: ordinal not in range(128)

Edit:

The reduce is failing because the first element is not encoded.

When I changed the code to

final_content = contents[0]
for x in contents[1:]:
    final_content += x.encode('utf-8')

it raises the same error as the reduce call above.
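
The behaviour described in this edit can be reproduced without lxml. The following is a minimal sketch, assuming plain unicode strings as a hypothetical stand-in for the lxml result objects; `step` and `calls` are illustrative names, not part of the original code. It records the arguments of the first reduce call and checks that mixing the unencoded first element with encoded bytes fails (UnicodeDecodeError on Python 2, TypeError on Python 3).

```python
from functools import reduce  # also available as a builtin on Python 2

# Hypothetical stand-in for the lxml results: plain unicode strings.
contents = [u'caf\xe9', u'na\xefve', u'plain']

calls = []

def step(a, x):
    # Record what reduce passes in, then concatenate (all unicode, so no error).
    calls.append((a, x))
    return a + x

reduce(step, contents)
first_call = calls[0]  # the accumulator starts as contents[0], untouched

try:
    # Same shape as the failing call: unicode accumulator + encoded bytes.
    reduce(lambda a, x: a + x.encode('utf-8') + b'\n', contents)
    failed = False
except (UnicodeDecodeError, TypeError):
    failed = True
```

Here `first_call` holds the pair (contents[0], contents[1]), which shows that without an initializer the first element enters the accumulator without ever being encoded.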


Solution

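  • The error comes from mixing byte strings and unicode strings: in Python 2, concatenating the two makes Python implicitly decode the byte string as ASCII, which fails on non-ASCII bytes such as 0xc3. Keeping everything as unicode and encoding once at the end should fix the issue:

    final_content = reduce(lambda a, x: a + x + u'\n', contents, u'').encode('utf-8')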
    

    Sorry, answer owner, for editing your answer without your permission, but the question is closed and I can't post a separate answer. Feel free to remove this content:

    OP, you are assuming that the two snippets behave the same, but they do not. On the first reduce iteration you concatenate the first and second elements without the \n, and you encode the second element but not the first. The correct translation of your for loop into reduce is:

    final_content = reduce(lambda a, x: a + x.encode('utf-8') + '\n',
                           contents,
                           '')    # <----- initializer (empty byte string)
    

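    Notice that without an initializer, the first step computes:

    contents[0] + contents[1].encode('utf-8')
    

    Here contents[0] is still an unencoded unicode object while the second operand is an encoded byte string, and this is what raises the error!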
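
To check that a reduce call with an initializer really matches the for loop, here is a minimal sketch. It assumes the loop starts from an empty byte string and uses plain unicode strings as a hypothetical stand-in for the lxml results; the names `loop_result` and `reduce_result` are illustrative. The `b''` and `u''` prefixes are only there so the sketch runs on both Python 2.7 and Python 3.

```python
from functools import reduce  # also available as a builtin on Python 2

# Hypothetical stand-in for the lxml results: plain unicode strings.
contents = [u'caf\xe9', u'na\xefve', u'plain']

# Classic loop: start from an empty byte string and encode every element.
loop_result = b''
for x in contents:
    loop_result += x.encode('utf-8') + b'\n'

# reduce with an initializer: the third argument is the loop's start value,
# so every element (including the first) goes through encode().
reduce_result = reduce(lambda a, x: a + x.encode('utf-8') + b'\n', contents, b'')
```

Both results are the same byte string, with every element encoded and followed by a newline. A simpler alternative with the same output would be `b''.join(x.encode('utf-8') + b'\n' for x in contents)`.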