I have the following form of a txt file:
Notice that some of the fields are completely missing, but the fact that they are missing is important. In the attached image all the measurements are missing due to technical failure but it can happen that value in only one of the columns is missing while the others are given.
I am trying to import such .txt file with the following code.
import numpy as np
data=np.genfromtxt(filepath, skip_header=1, invalid_raise=False, usecols=(2, 3, 4, 5, 6, 7))
Which results in an error:
Line #2123 (got 2 columns instead of 6)
Line #3171 (got 2 columns instead of 6)
Line #3172 (got 2 columns instead of 6)
but still produces some usable result. As I said, the fact that the data at 13:30 is missing is important and can't be simply ignored. However, the above code does exactly that - ignores/skips the row at 13:30. Instead I would like it to fill that row with some predefined value or just denote it in some other way that can be identified later in the processing.
Any way to do that?
np.genfromtxt()
takes the argument missing_values
. If you set it to:
data=np.genfromtxt(filepath, skip_header=1, invalid_raise=False, usecols=(2,3, 4, 5, 6, 7), missing_value=???)
it should replace missing values by nan
s. But notice that there has to be a filler if this should work. Otherwise you may use the usecols
argument in that way that you first chose your cols with missing values and seperate them from the main data. Afterwards you could merge them again. A second very good approach of dealing with missing values is the use of pandas.read_csv()
instead. Furthermore, its much faster than np.genfromtxt
.