Here is my input:
[{name: 'John', age: 50}, {name: 'Bob', age: 50}, {name: 'Paul', age: 0}, {name: 'Alfred', age: 100}]
I want to find the extreme ages, and I want to be able to pass in, as a variable, how sensitive the detection of extremes should be. For example, I might ask for the 10% most extreme values, and the output would be something like this:
# the extreme min values => [{name: 'Paul', age: 0}]
# the extreme max values => [{name: 'Alfred', age: 100}]
How do I do that?
I found some resources online that might help, but honestly I'm not able to follow the examples myself:
http://sciruby.com/blog/2013/11/07/statistics-with-ruby-time-series-and-general-linear-models/
http://statsample.apsique.cl/Statsample/Graph/Boxplot.html
Here's one way.
1. Sort the list by age:
a = [{name: 'John', age: 50}, {name: 'Bob', age: 50}, {name: 'Paul', age: 0}, {name: 'Alfred', age: 100}]
a = a.sort_by { |person| person[:age] }  # the hashes use symbol keys, so :age rather than 'age'
2. The first element is now the minimum, and the last is the maximum.
min, max = a[0], a[-1]
Note that this is likely not the most efficient way to do this, but for small arrays it is good enough.
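If only the single minimum and maximum are needed, Ruby's Enumerable#minmax_by finds both without sorting the array. A minimal sketch using the question's data (the variable names people, youngest and oldest are mine, chosen for illustration):

```ruby
people = [
  { name: 'John', age: 50 },
  { name: 'Bob', age: 50 },
  { name: 'Paul', age: 0 },
  { name: 'Alfred', age: 100 }
]

# minmax_by returns a two-element array: the element with the smallest
# block value and the element with the largest block value.
youngest, oldest = people.minmax_by { |person| person[:age] }

puts youngest[:name]  # Paul
puts oldest[:name]    # Alfred
```

This only yields one element per side, so it does not cover the case where several outliers are wanted.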
Regarding your sensitivity setting: extend the same approach by taking more than one element from each end of the sorted array. Multiply the length of the array L by the ratio of outliers p, then divide by two to get l, the number of outliers on each side (rounded to a whole number). Take elements
a[0..l-1]
as your lower outliers and
a[(L-l)..(L-1)]
as your upper outliers.
Edit: worked example
L is the length of the array, p is the ratio of outliers that you want.
l = (L*p)/2
For p=0.2, L=20 we'd want four outliers, two on the minimum side and two on the maximum side.
l = (L*p)/2 = 2
min = a[0..1]
max = a[(L-l)..(L-1)] = a[(20-2)..19] = a[18..19]
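Putting the steps together, here is a minimal sketch of a helper that takes the l lowest and l highest elements after sorting. The method name extremes, its key: parameter, and the decision to round l up with ceil are my assumptions, not part of the answer above; rounding up is chosen so that a small array with a small ratio still returns one element per side, as in the question's expected output.

```ruby
# Hypothetical helper: returns [lowest, highest], each an array of records.
# ratio is the total share of outliers, e.g. 0.2 for 20% (10% on each side).
def extremes(records, ratio, key: :age)
  sorted = records.sort_by { |record| record[key] }
  # Half of the outliers go to each side. Rounding up (ceil) is an
  # assumption; use floor or round if a stricter cut-off is preferred.
  per_side = (sorted.length * ratio / 2.0).ceil
  [sorted.first(per_side), sorted.last(per_side)]
end

people = [
  { name: 'John', age: 50 },
  { name: 'Bob', age: 50 },
  { name: 'Paul', age: 0 },
  { name: 'Alfred', age: 100 }
]

lowest, highest = extremes(people, 0.1)
puts lowest.map { |person| person[:name] }.inspect   # ["Paul"]
puts highest.map { |person| person[:name] }.inspect  # ["Alfred"]
```

Note that this picks a fixed share of the data by rank. It does not judge whether those values are statistically unusual, so with ties at the boundary (several people of the same age) the cut between outlier and non-outlier is arbitrary.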