This code:
import numpy as np
print(len(my_series))
print(np.percentile(my_series, 98))
print(np.percentile(my_series, 99))
gives:
14221 # This is the series length
1644.2 # 98th percentile
nan # 99th percentile?
Why does 98 work fine but 99 gives nan?
np.percentile effectively treats NaNs as very high numbers, because they end up at the top of the sorted data. So the highest percentiles fall in the range that is occupied by NaNs. In your case, between 1 and 2 percent of your data is NaN: the 98th percentile returns a number (which is not actually the 98th percentile of the valid values alone), and the 99th returns nan.
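You can check this on your own data by measuring the share of NaNs. Here is a minimal sketch; `data` is a made-up stand-in for my_series, not your actual values:

```python
import numpy as np

# Hypothetical stand-in for my_series: 100 values, 2 of them NaN.
data = np.arange(100, dtype=float)
data[[10, 50]] = np.nan

# np.isnan gives a boolean mask; its mean is the fraction of NaN entries.
nan_fraction = np.isnan(data).mean()
print(nan_fraction)
```

If that fraction lies between 0.01 and 0.02, as it seems to in your series, it would explain why the 98th percentile still gives a number while the 99th does not.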
To calculate the percentile while ignoring the NaNs, you can use np.nanpercentile().
So:
print(np.nanpercentile(my_series, 98))
print(np.nanpercentile(my_series, 99))
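To convince yourself that np.nanpercentile simply ignores the NaNs, you can compare it with np.percentile applied to the data after dropping them by hand. A small sketch with made-up values (`data` is hypothetical, not your series):

```python
import numpy as np

# Hypothetical data: ten slots, two of which are NaN.
data = np.array([0.0, 1.0, np.nan, 3.0, 4.0, 5.0, np.nan, 7.0, 8.0, 9.0])

valid = data[~np.isnan(data)]        # keep only the non-NaN values
p_nan = np.nanpercentile(data, 50)   # NaN-aware percentile on the raw data
p_valid = np.percentile(valid, 50)   # plain percentile on the cleaned data

print(p_nan, p_valid)  # both are 4.5
```

Both calls should agree, so filtering manually is only needed if you want the cleaned array for something else as well.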
Edit:
In newer NumPy versions, np.percentile returns nan if any NaNs are present, which makes this problem directly apparent. np.nanpercentile still works the same.