Say I have a sample of N positive real numbers and I want to find a "typical" value for these numbers. Of course "typical" is not very well defined but one could think of the following more concrete problem :
The numbers are distributed such that (roughly speaking) a fraction (1-epsilon) of them is drawn from a Gaussian with positive mean m > 0 and mean square deviation sigma << m and a small fraction epsilon of them is drawn from some other distribution, heavy tailed both for large and small numbers. I want to estimate the mean of the Gaussian within a few standard deviation.
A solution would be to compute the median but while it is O(N), constant factors are not so good for moderate N and moreover it requires quite a bit of coding. I am ready to give up precision on my estimate against code simplicity and/or small N performance (say N is 10 or 20 for instance, and I have at most one or two outliers).
Do you have any suggestion ?
(For instance, if my outliers where only coming from large values, I would compute the average of the log of my values and exponentiate it. Under some further assumptions this gives me, generally, a good estimate and I can compute it easily and with a sharp O(N)).
You could take the mean of the numbers excluding the min and max. The formula is (sum - min - max) / (N - 2), and the terms in the numerator can be computed simply with one pass (watch out for floating point issues though).