python numpy.loadtxt() crashing because of binary character in txt file

I am using this line to read only part of a txt file, skipping the header and the footer:

np_data = np.loadtxt(file, delimiter="\t", skiprows=12, max_rows=1024)

The problem is that the footer contains this character: ∞, which causes the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb0 in position 4729: invalid start byte
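A quick check of the byte named in the error (a small diagnostic sketch, not part of the original post): 0xb0 on its own is a UTF-8 continuation byte, so it can never start a character, while ∞ encoded as UTF-8 is the three bytes \xe2\x88\x9e. In the Mac Roman encoding, however, 0xb0 is ∞ (in Latin-1 it is the degree sign °). That suggests the file was written in a single-byte legacy encoding rather than UTF-8; this is an inference from the error message, not something the post states.

```python
raw = b"\xb0"  # the byte reported in the UnicodeDecodeError

# 0xb0 is a UTF-8 continuation byte, so decoding it as UTF-8 fails
try:
    raw.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# The same single byte maps to different characters in legacy encodings
as_mac_roman = raw.decode("mac_roman")  # U+221E INFINITY
as_latin1 = raw.decode("latin-1")       # U+00B0 DEGREE SIGN

# In UTF-8 the infinity sign is three bytes long
infinity_utf8 = "\u221e".encode("utf-8")

print(valid_utf8, as_mac_roman == "\u221e", as_latin1 == "\u00b0", infinity_utf8)
# False True True b'\xe2\x88\x9e'
```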

Is there a way to skip that character or line? For me, the combination of skiprows and max_rows does not seem to work.

Solution

Is there a way to skip that (...) line?

The first argument of numpy.loadtxt is documented as:

File, filename, list, or generator to read. If the filename extension is .gz or .bz2, the file is first decompressed. Note that generators must return bytes or strings. The strings in a list or produced by a generator are treated as lines.
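As a quick illustration of the quoted behaviour (a minimal sketch with made-up data), a plain list of strings is accepted in place of a file, each string being parsed as one line:

```python
import numpy as np

# Each string in the list is treated as one line of input
lines = ["1,2,3", "4,5,6"]
arr = np.loadtxt(lines, delimiter=",")
print(arr)  # a 2x3 float array
```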

Thus you can wrap the file handle so that the lines you do not want are skipped. Consider the following simple example, where file.csv contains

1,2,3
4,∞,6
7,8,9

Then the following code

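import numpy as np

# Open in binary mode: lines are read as bytes and never decoded as text
with open("file.csv", "rb") as f:
    # Keep only the lines that do not contain the UTF-8 bytes of "∞"
    arr = np.loadtxt(filter(lambda line: b"\xe2\x88\x9e" not in line, f), delimiter=",")
print(arr)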

gives the output

[[1. 2. 3.]
 [7. 8. 9.]]

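Explanation: I open file.csv in binary mode, then use filter to keep only those lines from the file handle f that do not contain the byte sequence \xe2\x88\x9e (the UTF-8 encoding of ∞). Because the file is read as bytes, Python does not decode it as text, and the line containing ∞ is dropped before numpy.loadtxt parses anything.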
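The same wrapping idea can also be applied by position instead of by content. This may matter here because the error in the question reports the single byte 0xb0, not the sequence \xe2\x88\x9e, so a filter on the UTF-8 bytes would likely not match the asker's file. A minimal sketch, assuming the data occupies exactly the 1024 lines after the 12 header lines: slice the binary file handle with itertools.islice, so the footer is never handed to NumPy, whatever bytes it contains. load_block is a hypothetical helper name, not a NumPy function.

```python
import itertools

import numpy as np


def load_block(path, skip, nrows, delimiter="\t"):
    """Hypothetical helper: read `nrows` lines after `skip` header lines.

    The file is opened in binary mode, so Python never decodes it as text,
    and islice stops after the wanted lines, so the footer is never parsed.
    """
    with open(path, "rb") as f:
        # Lines skip .. skip + nrows - 1 (0-based), like skiprows/max_rows
        wanted = itertools.islice(f, skip, skip + nrows)
        return np.loadtxt(wanted, delimiter=delimiter)
```

With the layout from the question this would be called as load_block(file, skip=12, nrows=1024), assuming file is a path. If the header or footer length varies between files, the content-based filter from the answer (using the bytes the file actually contains) is the more flexible choice.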