Let's say I have a dumb text file with the contents:
Year Recon Observed
1505 162.38 23
1506 46.14 -9999
1507 147.49 -9999
-9999 is used to denote a missing value (don't ask).
So, I should be able to read this into a Numpy array with:
import numpy as np
x = np.genfromtxt("file.txt", dtype=None, names=True, missing_values=-9999)
and have all my little -9999s turn into numpy.nan. But I get:
>>> x
array([(1505, 162.38, 23), (1506, 46.14, -9999), (1507, 147.49, -9999)],
      dtype=[('Year', '<i8'), ('Recon', '<f8'), ('Observed', '<i8')])
... That's not right...
Am I missing something?
Nope, you're not doing anything wrong. Using the missing_values argument indeed tells np.genfromtxt that the corresponding values should be flagged as "missing/invalid". The problem is that dealing with missing values is only supported if you use the usemask=True argument (I probably should have made that clearer in the documentation, my bad).

With usemask=True, the output is a masked array. You can transform it into a regular ndarray, with the missing values replaced by np.nan, with the method .filled(np.nan).
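Putting it together, a minimal sketch of the fix (reading from an in-memory string here instead of "file.txt", so it's self-contained):

```python
import io
import numpy as np

# The sample file from the question.
data = """\
Year Recon Observed
1505 162.38 23
1506 46.14 -9999
1507 147.49 -9999
"""

# With usemask=True, genfromtxt returns a masked array in which every
# token matching missing_values is flagged as missing.
x = np.genfromtxt(io.StringIO(data), dtype=None, names=True,
                  missing_values=-9999, usemask=True)

# The two -9999 rows of 'Observed' are now masked.
print(x['Observed'].mask)

# 'Observed' was detected as an int column, so cast to float before
# filling the masked entries with NaN (astype preserves the mask).
obs = x['Observed'].astype(float).filled(np.nan)
```

After this, `obs` is a plain float ndarray with NaN where the file had -9999.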
Be careful, though: if you have a column that was detected as having an int dtype and you try to fill its missing values with np.nan, you won't get what you expect (np.nan is only supported for float columns).
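For instance, sticking with an int column like Observed above (a sketch; the exact exception type and message depend on your NumPy version):

```python
import numpy as np

# A masked int column, as genfromtxt would produce for 'Observed'.
obs = np.ma.array([23, -9999, -9999], mask=[False, True, True], dtype=int)

# Filling an int column with np.nan fails: NaN has no integer representation.
try:
    obs.filled(np.nan)
    filled_int_ok = True
except (TypeError, ValueError):
    filled_int_ok = False

# Workaround: cast to float first (astype keeps the mask), then fill.
obs_f = obs.astype(float).filled(np.nan)
```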