I try to implement Hampel tanh estimators to normalize highly asymmetric data. In order to do this, I need to perform the following calculation:
Given x
- a sorted list of numbers and m
- the median of x
, I need to find a
such that approximately 70% of the values in x
fall into the range (m-a; m+a)
. We know nothing about the distribution of values in x
. I write in python using numpy, and the best idea that I had is to write some sort of stochastic iterative search (for example, as was described by Solis and Wets), but I suspect that there is a better approach, either in form of better algorithm or as a ready function. I searched the numpy and scipy documentation, but couldn't find any useful hint.
EDIT
Seth suggested to use scipy.stats.mstats.trimboth, however in my test for a skewed distribution, this suggestion didn't work:
from scipy.stats.mstats import trimboth
import numpy as np
theList = np.log10(1+np.arange(.1, 100))
theMedian = np.median(theList)
trimmedList = trimboth(theList, proportiontocut=0.15)
a = (trimmedList.max() - trimmedList.min()) * 0.5
#check how many elements fall into the range
sel = (theList > (theMedian - a)) * (theList < (theMedian + a))
print np.sum(sel) / float(len(theList))
The output is 0.79 (~80%, instead of 70)
You need to first symmetrize your distribution by folding all values less than the mean over to the right. Then you can use the standard scipy.stats
functions on this one-sided distribution:
from scipy.stats import scoreatpercentile
import numpy as np
theList = np.log10(1+np.arange(.1, 100))
theMedian = np.median(theList)
oneSidedList = theList[:] # copy original list
# fold over to the right all values left of the median
oneSidedList[theList < theMedian] = 2*theMedian - theList[theList < theMedian]
# find the 70th centile of the one-sided distribution
a = scoreatpercentile(oneSidedList, 70) - theMedian
#check how many elements fall into the range
sel = (theList > (theMedian - a)) * (theList < (theMedian + a))
print np.sum(sel) / float(len(theList))
This gives the result of 0.7
as required.