Tags: python, string, unicode, encoding, utf-16le

Unicode writing fail: charmap can't encode character


I have a text that contains a summary. I extract this summary with a few regexes, since the text always has the same structure.
Within the summary there is a sentence "NAME is classified as ....", and I have to replace it with a title taken from the text and composed of word1 and word2 separated by a comma. As long as I only do that, it works fine (hence I won't add the full code: it is very large, and the problem does not lie outside of what I provide here).
I need to append the Unicode character \u2191 or \u2193 to word1, depending on whether word1 is associated with a positive or a negative value in a dictionary. This has to be done before replacing the sentence. My code is basically the following:

import re
import io
Summaries_file = "summaries.txt"  # placeholder; the real path is defined elsewhere
# Also tried open(Summaries_file, 'a', encoding="UTF_16_LE")
# and io.open(Summaries_file, 'a', encoding="UTF_16_LE")
file = open(Summaries_file, 'a')
code_dict = {"page": "Word1"}
page = "page"
summary = "Data is: 111919919. Name is classified as an infered value"
print(summary)
#OUTPUT>"Data is: 111919919. Name is classified as an infered value"
title = "Word1, Word2"

#this is the part added to regular code>>>>

titlelist = title.split(",")
if titlelist[0] == code_dict[page]:
    titlelist[0] = code_dict[page] + "\u2191"
    title = ",".join(titlelist)  # str(titlelist) would give the list's repr
    print(titlelist[0])
    #OUTPUT>"Word1↑"  (the arrow displays fine)
    print(title)  # OK, too
    #OUTPUT>"Word1↑, Word2"

#We go back to the end of the normal code
insert=re.compile("is classified as")
print(type(summary))
#<class 'str'>
summary = insert.sub(title, summary)  # sub() already returns a str
print(summary)
#OUTPUT>"Data is: 111919919. Name Word1↑, Word2 an infered value"

print("passed")
file.write(title+'\n')
file.write(summary+'\n')

Then I get this traceback:

Traceback (most recent call last):

File "<ipython-input-1-6bc913872cc9>", line 1, in <module>
runfile('C:/Python Scripts/txtad.py', wdir='C:/Users/Laurent/Documents/Python Scripts')

File "C:\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
execfile(filename, namespace)

File "C:\Anaconda3\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 88, in execfile
exec(compile(open(filename, 'rb').read(), filename, 'exec'), namespace)

File "C:/Python Scripts/txtad", line 380, in <module>
file.write(title+'\n')

File "C:\Anaconda3\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]

UnicodeEncodeError: 'charmap' codec can't encode character '\u2191' in position 11: character maps to <undefined>

I can't figure this out and I am seriously stuck.
I don't understand why the write fails in the first place, since the characters display correctly and, in some tests, I explicitly specified the encoding, even opening the file with it.

I tried various things that you can read there:

https://stackoverflow.com/questions/43706177/solving-error-when-adding-an-unicode-character-to-splits-of-a-string-then-revert?noredirect=1#comment74463879_43706177

Indeed, the original code is larger, but I tried this reduced version and it behaves the same way, and the input types are strictly identical.

I read these:
UnicodeEncodeError: 'charmap' codec can't encode character '\u2010': character maps to <undefined>
UnicodeEncodeError: 'charmap' codec can't encode characters
Python, Unicode, and the Windows console
python 3.2 UnicodeEncodeError: 'charmap' codec can't encode character '\u2013' in position 9629: character maps to <undefined>
They are more confusing than anything else, in the end.

Anyway, the problem doesn't lie in the console as in those other posts: it arises from the write call, which displays nothing, and in addition the character displays correctly in my console...
I really can't tell what's going on or how to handle this problem.
Thanks for your insights.


Solution

  • I finally solved this by reading "Writing Unicode text to a text file?" and the articles linked from it, together with TadhgMcDonald-Jensen's comment.

    In the end I only had to open the file in binary mode, open(file, "wb"), and encode every string at write time (since they are str, not bytes). I guess I could also import io or codecs and use their open for backward compatibility.
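
    A minimal sketch of both approaches, not the original script. The file path and the choice of UTF-8 are assumptions made for illustration (the post does not say which encoding was finally used). It first checks that cp1252, the codec named in the traceback, cannot encode the arrow, then writes the same line once in binary mode with an explicit encode and once in text mode with an explicit encoding.

```python
import io
import os
import tempfile

title = "Word1\u2191, Word2"

# cp1252 (the codec named in the traceback) has no mapping for U+2191,
# so encoding the title with it raises UnicodeEncodeError.
try:
    title.encode("cp1252")
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Hypothetical output path, used only for this sketch.
path = os.path.join(tempfile.mkdtemp(), "summaries.txt")

# Approach 1: binary mode, encoding each string explicitly when writing.
# UTF-8 is an assumption here; any encoding that covers the arrow would do.
with open(path, "wb") as f:
    f.write((title + "\n").encode("utf-8"))

# Approach 2: text mode with an explicit encoding, so the locale default
# is never used. io.open is also available on Python 2.
with io.open(path, "a", encoding="utf-8") as f:
    f.write(title + u"\n")

# Read the file back with the same encoding to check both writes.
with io.open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()

print(cp1252_ok)
print(lines == [title, title])
```

    The console output looked fine in the question because printing and writing to a file go through separate encoders. A file opened in text mode without an encoding argument falls back to the locale's preferred encoding, which is what the cp1252 frame in the traceback points to.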