I have a set of values coming from an experiment, and I want to drop some of the lines depending on other lines. Meaning: I measure a field, a polarization, and the error of the polarization. The machine doing this measurement sometimes does not write a value in some of those lines, so I might get:
field = [1, 2, 3, 3, 2, 1, nan, 4, 1, 2]
polarization = [nan, 10, 230, 13, 123, 50, 102, 90, 45, 1337]
error = [0.1, 0.1, 0.2, 0.1, 0.1, 0.3, 0.1, 0.1, 0.4, 0.2]
Now I want to delete the first element of field, polarization, and error, because polarization[0] is nan, and element [6] of all three arrays, because field[6] is nan.
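After dropping those two lines, the arrays should end up as:

field = [2, 3, 3, 2, 1, 4, 1, 2]
polarization = [10, 230, 13, 123, 50, 90, 45, 1337]
error = [0.1, 0.2, 0.1, 0.1, 0.3, 0.1, 0.4, 0.2]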
This is how I get my data:
import numpy as np

class DataFile(object):
    def __init__(self, filename):
        self._filename = filename

    def read_dat_file(self):
        # unpack=True returns the data transposed: one row per column read
        data = np.genfromtxt(self._filename, delimiter=',',
                             usecols=(3, 4, 5), skip_header=23,
                             skip_footer=3, unpack=True,
                             converters={3: lambda x: self._conv(x),
                                         4: lambda x: self._conv(x),
                                         5: lambda x: self._2_conv(x)})
        return data

a = DataFile("DATFILE.DAT").read_dat_file()
print(a)
The _conv functions just do some unit conversion, or write nan if the value is " ".
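A stripped-down, hypothetical version of such a converter (the actual unit conversion is omitted, since it doesn't matter for the question) might look like:

def _conv(self, value):
    # genfromtxt may hand the converter bytes, so normalize to str first
    if isinstance(value, bytes):
        value = value.decode()
    value = value.strip()
    # blank field -> nan; the real code would apply the unit conversion here
    return float(value) if value else float('nan')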
I tried to do something like:

data = data[~np.isnan(data).any(axis=1)]

But then I got back a single array and things got messy. My next approach was to count elements and delete the same elements from all arrays, and so on. That works, but it's ugly. So what's the best solution here?
You can iterate over the rows to build a boolean mask, then use boolean indexing to pull out the rows that passed:
import numpy as np

field = [1, 2, 3, 3, 2, 1, -1, 4, 1, 2]
polarization = [-1, 10, 230, 13, 123, 50, 102, 90, 45, 1337]
error = [0.1, 0.1, 0.2, 0.1, 0.1, 0.3, 0.1, 0.1, 0.4, 0.2]

# transpose to get the expected row-column format
array = np.array([field, polarization, error]).T
print(array)

# define your filter function (named row_filter to avoid shadowing the builtin filter)
def row_filter(row):
    return row[0] > 0 and row[1] > 0 and row[2] > 0

# create the boolean mask by applying the filter to every row
mask = np.apply_along_axis(row_filter, 1, array)
print(mask)

new_array = array[mask]
print(new_array)
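A side note on the attempt from the question: genfromtxt with unpack=True returns the data transposed, i.e. with shape (3, N), so the nan test needs axis=0 rather than axis=1, and the resulting mask can then be applied to all three quantities at once, with no need to replace nan by a sentinel like -1. A minimal sketch, assuming data has that (3, N) layout:

import numpy as np

# shape (3, N): one row per quantity, as genfromtxt(..., unpack=True) returns it
data = np.array([[1,      2,   np.nan, 4],    # field
                 [np.nan, 10,  230,    13],   # polarization
                 [0.1,    0.1, 0.2,    0.1]]) # error

mask = ~np.isnan(data).any(axis=0)  # True for measurements without any nan
field, polarization, error = data[:, mask]
print(field, polarization, error)   # [2. 4.] [10. 13.] [0.1 0.1]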