I have a text file that is compressed with bz2. The data in the text file looks like the following.
1 x3, x32, f5
0 f4, g6, h7, j9
.............
I know how to load the uncompressed text file with the following code:
rf = open('small.txt', 'r')
lines = rf.readlines()
lst_text = []
lst_label = []
for line in lines:
    line = line.rstrip('\n')
    label, text = line.split('\t')
    text_words = text.split(',')
    lst_text.append(text_words)
    lst_label.append(int(label))
But after the text file is compressed to small.txt.bz2, I want to use the following code to read the bz2 file, and I get an error.
import bz2
bz_file = bz2.BZ2File("small.txt.bz2")
lines = bz_file.readlines()
for line in lines:
    line = line.rstrip('\n')
    label, text = line.split('\t')
    text_words = text.split(',')
    print(label)
The error is:
line = line.rstrip('\n')
TypeError: a bytes-like object is required, not 'str'
Could you give me hints on how to deal with it? Code would be best. Thanks!
You get this error because a BZ2File object opens files in binary mode, so your line is a bytes object, not a string. You could probably work around that by using line = line.rstrip(b'\n'), but the resulting line would still be a bytes object, so the later line.split('\t') would fail in the same way.
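If you do want to keep BZ2File, one way to sketch the workaround is to decode each bytes line to a str before doing any string operations on it. This is only a rough example: the function name parse_bz2_binary is made up for illustration, and it assumes the file is UTF-8 encoded and that the label and text are separated by a tab, as in your original code.

```python
import bz2


def parse_bz2_binary(path, encoding="utf-8"):
    """Hypothetical helper: read a bz2 file in binary mode and decode each line.

    Assumes UTF-8 text and a tab between label and text (taken from the question's code).
    """
    rows = []
    with bz2.BZ2File(path) as bz_file:  # binary mode, so each line is bytes
        for raw_line in bz_file:
            line = raw_line.decode(encoding).rstrip("\n")  # bytes -> str
            label, text = line.split("\t")
            rows.append((int(label), text.split(",")))
    return rows
```

After the decode call, line is an ordinary str, so rstrip('\n') and split('\t') behave as they did with the uncompressed file.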
But you should probably use bz2.open in text mode instead, so that each line is already decoded to a str:
with bz2.open("small.txt.bz2", "rt") as bz_file:
    for line in bz_file:
        label, text = line.rstrip('\n').split('\t')
        text_words = text.split(',')
        print(label)
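To get back the same lst_label and lst_text lists as in your uncompressed version, something like the following might work. It is a sketch under a few assumptions that are not in your post: the function name load_labeled_lines is hypothetical, the file is assumed to be UTF-8, blank lines are skipped, and whitespace around each word is stripped (your sample shows a space after each comma).

```python
import bz2


def load_labeled_lines(path):
    """Hypothetical helper: return (lst_label, lst_text) from a bz2-compressed file.

    Assumes UTF-8 text with one "label<TAB>word,word,..." record per line.
    """
    lst_label = []
    lst_text = []
    # "rt" opens the file in text mode, so each line is a str, not bytes
    with bz2.open(path, "rt", encoding="utf-8") as bz_file:
        for line in bz_file:
            line = line.rstrip("\n")
            if not line:
                continue  # assumption: blank lines carry no data
            label, text = line.split("\t", 1)  # split only on the first tab
            lst_label.append(int(label))
            # assumption: surrounding spaces are not part of the words
            lst_text.append([word.strip() for word in text.split(",")])
    return lst_label, lst_text
```

You would call it as lst_label, lst_text = load_labeled_lines("small.txt.bz2"). Iterating over the file object instead of calling readlines() also avoids holding every line in memory at once.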