I have a bunch of txt files that are encoded in shift_jis, and I want to convert them to utf-8 so the special characters display properly. This has probably been asked before, but I can't seem to get it right.
Update: I changed my code so that it first reads the content into a list and then writes the content back from the list.
words = []
with codecs.open("dummy.txt", mode='r+', encoding='shiftjis') as file:
    words = file.read()
    file.seek(0)
    for line in words:
        file.write(line.encode('utf-8'))
However, now I get a runtime error and the program just crashes. On further investigation, the "file.seek(0)" line seems to be what causes the crash: the program runs without error if that line is commented out. I don't know why. How is it causing errors?
You can't read from and write to the same file at the same time like this; that's why it's not working. Input and output are buffered, and the reader and writer share the same file pointer, so it's hard to predict what will happen. You need to either write the output to a different file, or read the entire file into memory, close it, reopen it, and write it back out.
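import codecs

# Read and decode the whole file, then let the with block close it.
with codecs.open("dummy.txt", mode='r', encoding='shiftjis') as file:
    text = file.read()

# Reopen for writing; encoding='utf-8' makes codecs encode the text on write.
with codecs.open("dummy.txt", mode='w', encoding='utf-8') as file:
    file.write(text)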
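The other option mentioned above, writing the output to a different file, might look roughly like the sketch below. It assumes Python 3, where the built-in open() accepts an encoding argument. The function name convert_to_utf8 is made up for illustration and is not part of any library. The output goes to a temporary file next to the original, which then replaces it, so the whole file never has to be held in memory.

```python
import os
import tempfile


def convert_to_utf8(path, src_encoding="shift_jis"):
    """Rewrite `path` as UTF-8 by streaming it through a temporary file.

    Hypothetical helper for illustration; assumes Python 3.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Put the temporary file in the same directory as the original so that
    # os.replace() does not have to move data across filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        # newline="" on both sides leaves the original line endings untouched.
        with open(fd, "w", encoding="utf-8", newline="") as dst, \
             open(path, "r", encoding=src_encoding, newline="") as src:
            for line in src:
                dst.write(line)
        # Both files are closed here; swap the converted copy into place.
        os.replace(tmp_path, path)
    except Exception:
        # Leave the original untouched and clean up the partial output.
        os.remove(tmp_path)
        raise
```

For a whole folder of files, something like "for name in glob.glob('*.txt'): convert_to_utf8(name)" should work. Running it a second time on a file that is already utf-8 will likely raise a UnicodeDecodeError or garble the text, so keep a backup until you've checked the result.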