I am trying to open a file in Colab that uses GB2312 encoding. Here is the code I ran successfully in my IDE to read and decode it:
file = open(r'file.txt')
opened = file.read()
decoded = opened.encode('latin1').decode('gb2312')
print(decoded)
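The encode('latin1') step only works because Latin-1 maps every byte value (0 to 255) to exactly one code point (U+0000 to U+00FF), so text that was decoded as Latin-1 can be turned back into the original bytes without loss. A minimal sketch of that round trip, assuming the IDE's default encoding behaved like Latin-1 for these bytes (an assumption; the question does not say which platform the IDE ran on). The sample text "中文" is hypothetical and stands in for the real file contents:

```python
# Hypothetical sample: the GB2312 bytes for the two characters "中文".
raw = "中文".encode("gb2312")

# Decoding as Latin-1 never fails: each byte becomes one code point.
as_latin1 = raw.decode("latin1")

# Encoding back to Latin-1 recovers exactly the original bytes...
assert as_latin1.encode("latin1") == raw

# ...which can then be decoded with the encoding the file really uses.
print(as_latin1.encode("latin1").decode("gb2312"))
```

If the default encoding is UTF-8 instead, the first decode (done implicitly by read()) fails, so the workaround never gets a chance to run.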
When I run this code in Colab, I get the following error:
'utf-8' codec can't decode byte 0xc6 in position 67: invalid continuation byte
But I can't decode without first calling read() or list() on the file; otherwise I get the following error:
'_io.TextIOWrapper' object has no attribute 'encode'
This seems like a catch-22. Is this a bug with Colab or is there some better way to approach the problem?
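The default mode when opening a file is 'rt' (read, text mode), and the text is decoded with the platform-dependent default encoding returned by locale.getpreferredencoding(False). On Colab that default appears to be UTF-8, so read() itself fails while decoding the GB2312 bytes, before your encode('latin1') workaround ever runs. Your IDE presumably used a different (8-bit) default, which is why the same code worked there. Use the encoding parameter to override the default: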
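To check what that default is on a given machine, and to see why GB2312 data trips over a UTF-8 default, a short check like the following may help. The sample text "中文" is a hypothetical stand-in for the contents of file.txt:

```python
import locale

# The encoding open() uses in text mode when no encoding= argument is given.
# The exact value depends on the OS and locale settings.
print(locale.getpreferredencoding(False))

# GB2312 bytes are generally not valid UTF-8. Decoding them as UTF-8 is what
# read() does implicitly when the default encoding is UTF-8.
raw = "中文".encode("gb2312")

decode_error = None
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    decode_error = exc  # keep a reference; 'exc' is cleared after the block
    print("UnicodeDecodeError:", exc)
```

The lead byte 0xD6 starts a two-byte UTF-8 sequence, but the next byte (0xD0) is not a UTF-8 continuation byte, which is the same kind of "invalid continuation byte" failure reported in the question.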
with open('file.txt', encoding='gb2312') as file:
    data = file.read()
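A self-contained sketch that writes a small GB2312 file and reads it back without relying on the platform's default encoding. It also shows a second option that the answer does not mention: opening in binary mode ('rb'), where read() returns bytes, which have a .decode() method, so there is no '_io.TextIOWrapper' in the way and no default encoding involved. The temporary file and sample text are hypothetical:

```python
import os
import tempfile

# Hypothetical sample content; the real file.txt is not available here.
text = "中文测试"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "file.txt")

    # Create a GB2312-encoded sample file.
    with open(path, "w", encoding="gb2312") as f:
        f.write(text)

    # Option 1: text mode with an explicit encoding.
    with open(path, encoding="gb2312") as file:
        data = file.read()

    # Option 2: binary mode returns bytes; decode them explicitly.
    with open(path, "rb") as file:
        data_from_bytes = file.read().decode("gb2312")

print(data)
print(data_from_bytes)
```

Both options produce the same string. If decoding with 'gb2312' still fails on some characters, the file may actually be GBK; Python's 'gbk' and 'gb18030' codecs are supersets of GB2312 and are worth trying.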