I'm trying to read a csv.gz file in Python. I fetch the file with urllib.request.urlopen(), and then I run into two problems. First, the content comes back as bytes, and I think I need it as UTF-8 text in order to use pandas. Second, I don't understand how to read this type of file with pandas: I want to end up with a DataFrame, but it isn't clear to me how to get there.

Below is what I've tried so far. I also tried decoding the bytes, but I don't trust that approach, since it only works because I'm ignoring the errors. At this point I'm not even sure the decoding step is really necessary.

I'd really appreciate any help with this. Thanks in advance.
import pandas as pd
df = pd.read_csv('sample.tar.gz', compression='gzip', header=0, sep=' ', quotechar='"', error_bad_lines=False)
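
To make the bytes-versus-text part of the question concrete, here is a minimal sketch of one way the downloaded bytes could be turned into decoded text without ignoring errors. It assumes the file is a plain gzip-compressed CSV (not a tar archive) encoded as UTF-8. The helper names `gzip_bytes_to_text` and `read_gzip_csv` and the sample data are made up for illustration, and the sample bytes stand in for what `urllib.request.urlopen(url).read()` would return.

```python
import csv
import gzip
import io


def gzip_bytes_to_text(raw: bytes, encoding: str = "utf-8") -> io.TextIOWrapper:
    """Wrap gzip-compressed bytes in a text stream that decompresses and decodes on read."""
    return io.TextIOWrapper(gzip.GzipFile(fileobj=io.BytesIO(raw)), encoding=encoding)


def read_gzip_csv(raw: bytes, **kwargs):
    """Load gzip-compressed CSV bytes into a DataFrame (requires pandas)."""
    import pandas as pd  # imported here so the helper above stays stdlib-only
    return pd.read_csv(gzip_bytes_to_text(raw), **kwargs)


# Hypothetical stand-in for the bytes returned by urllib.request.urlopen(url).read()
sample = gzip.compress("name,city\nAna,São Paulo\n".encode("utf-8"))

# Sanity check with the standard csv module: non-ASCII text survives decoding
rows = list(csv.reader(gzip_bytes_to_text(sample)))
print(rows)
```

With pandas installed, `read_gzip_csv(sample)` should give a DataFrame with the columns `name` and `city`. If the encoding is wrong, this raises a `UnicodeDecodeError` instead of silently dropping characters, which seems safer than ignoring errors. As far as I know, `pd.read_csv` also accepts a URL directly together with `compression='gzip'`, so the manual decoding step may not be needed at all.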