I'm trying to manipulate a text file with song names. I want to clean up the data, by changing all the spaces and tabs into +
.
This is the code:
input = open('music.txt', 'r')
out = open("out.txt", "w")
for line in input:
new_line = line.replace(" ", "+")
new_line2 = new_line.replace("\t", "+")
out.write(new_line2)
#print(new_line2)
fh.close()
out.close()
It gives me an error:
Traceback (most recent call last):
File "music.py", line 3, in <module>
for line in input:
File "C:\Users\nfeyd\AppData\Local\Programs\Python\Python36\lib\encodings\cp1252.py", line 23, in decode
return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 2126: character maps to <undefined>
As music.txt is saved in UTF-8, I changed the first line to:
input = open('music.txt', 'r', encoding="utf8")
This gives another error:
UnicodeEncodeError: 'charmap' codec can't encode character '\u039b' in position 21: character maps to <undefined>
I tried other things with the out.write()
but it didn't work.
This is the raw data of music.txt. https://pastebin.com/FVsVinqW
I saved it in windows editor as UTF-8 .txt file.
If your system's default encoding is not UTF-8, you will need to explicitly configure it for both the filehandles you open, on legacy versions of Python 3 on Windows.
with open('music.txt', 'r', encoding='utf-8') as infh,\
open("out.txt", "w", encoding='utf-8') as outfh:
for line in infh:
line = line.replace(" ", "+").replace("\t", "+")
outfh.write(line)
This demonstrates how you can use fewer temporary variables for the replacements; I also refactored to use a with
context manager, and renamed the file handle variables to avoid shadowing the built-in input
function.
Going forward, perhaps a better solution would be to upgrade your Python version; my understanding is that Python should now finally offer UTF-8 by default on Windows, too.