I have a list of string items of any length. I need to "normalize" this list so that the items' weights follow a normal distribution, appending each weight to its string.
Is there a more effective and mathematically/statistically sound way to go about this than what I have below?
func normalizeAppend(in []string, shuffle bool) []string {
	var ret []string
	if shuffle {
		shuffleStrings(in)
	}
	l := len(in)
	switch {
	case remain(l, 3) == 0:
		l3 := (l / 3)
		var low, mid, high []string
		for i, v := range in {
			o := i + 1
			switch {
			case o <= l3:
				low = append(low, v)
			case o > l3 && o <= l3*2:
				mid = append(mid, v)
			case o >= l3*2:
				high = append(high, v)
			}
		}
		q1 := 1600 / len(low)
		q2 := 6800 / len(mid)
		q3 := 1600 / len(high)
		for _, v := range low {
			ret = append(ret, fmt.Sprintf("%s_%d", v, q1))
		}
		for _, v := range mid {
			ret = append(ret, fmt.Sprintf("%s_%d", v, q2))
		}
		for _, v := range high {
			ret = append(ret, fmt.Sprintf("%s_%d", v, q3))
		}
	case remain(l, 2) == 0 && l >= 4:
		l4 := (l / 4)
		var first, second, third, fourth []string
		for i, v := range in {
			o := i + 1
			switch {
			case o <= l4:
				first = append(first, v)
			case o > l4 && o <= l4*2:
				second = append(second, v)
			case o > l4*2 && o <= l4*3:
				third = append(third, v)
			case o > l4*3:
				fourth = append(fourth, v)
			}
		}
		q1 := 1600 / len(first)
		q2 := 3400 / len(second)
		q3 := 3400 / len(third)
		q4 := 1600 / len(fourth)
		for _, v := range first {
			ret = append(ret, fmt.Sprintf("%s_%d", v, q1))
		}
		for _, v := range second {
			ret = append(ret, fmt.Sprintf("%s_%d", v, q2))
		}
		for _, v := range third {
			ret = append(ret, fmt.Sprintf("%s_%d", v, q3))
		}
		for _, v := range fourth {
			ret = append(ret, fmt.Sprintf("%s_%d", v, q4))
		}
	default:
		var first, second, third []string
		q1 := (1 + math.Floor(float64(l)*.16))
		q3 := (float64(l) - math.Floor(float64(l)*.16))
		var o float64
		for i, v := range in {
			o = float64(i + 1)
			switch {
			case o <= q1:
				first = append(first, v)
			case o > q1 && o < q3:
				second = append(second, v)
			case o >= q3:
				third = append(third, v)
			}
		}
		lq1 := 1600 / len(first)
		lq2 := 3400 / len(second)
		lq3 := 1600 / len(third)
		for _, v := range first {
			ret = append(ret, fmt.Sprintf("%s_%d", v, lq1))
		}
		for _, v := range second {
			ret = append(ret, fmt.Sprintf("%s_%d", v, lq2))
		}
		for _, v := range third {
			ret = append(ret, fmt.Sprintf("%s_%d", v, lq3))
		}
	}
	return ret
}
Some requested clarification:
I have a list of items that will be chosen from many times, one at a time, by weighted selection. To start with, I have a list with (implied) weights of 1:
[a_1, b_1, c_1, d_1, e_1, f_1, g_1, h_1, i_1, j_1, k_1]
I'm looking for a better way to make that list into something producing a more 'normal' distribution of weighting for selection:
[a_1, b_2, c_3, d_5, e_14, f_30, g_14, h_5, i_3, j_2, k_1]
Or perhaps I need to change my method to something more statistically grounded. The bottom line is that I want to control selection from a list of items in many ways, one of which is ensuring that items are returned in a way approximating a normal curve.
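If you just want to calculate the weights for a given list, then you need three things: the mean of the normal distribution, its variance (or standard deviation), and a way to discretize the weights into integers.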
The first one is quite simple. You want the mean to be in the center of the list. Therefore (assuming zero-based indexing):
mean = (list.size - 1) / 2
The second is somewhat arbitrary and depends on how steeply you want your weights to fall off. Weights of the normal distribution are practically zero beyond a distance of 3 * standard_deviation from the mean. So a good standard deviation in most cases is probably somewhere between a fourth and a sixth of the list length:
standard_deviation = (1/4 .. 1/6) * list.size
variance = standard_deviation^2
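As a rough Go sketch of these two parameters (the function name `gaussParams` and the `divisor` parameter are illustrative assumptions, not part of the question's code): note that `(n - 1) / 2` on integers truncates in Go, so the conversion to `float64` has to happen before the division for even-length lists.

```go
package main

import "fmt"

// gaussParams returns the mean and variance of the bell curve for a list of
// n items. divisor is a hypothetical tuning knob: the standard deviation is
// n/divisor, so 4 gives a wider curve and 6 a narrower one.
func gaussParams(n int, divisor float64) (mean, variance float64) {
	// Convert before dividing; integer division would truncate 1.5 to 1.
	mean = float64(n-1) / 2
	sd := float64(n) / divisor
	return mean, sd * sd
}

func main() {
	mean, variance := gaussParams(11, 4)
	fmt.Printf("mean=%.2f variance=%.4f\n", mean, variance)
}
```

For the 11-item list from the question and a divisor of 4, this gives a mean of 5 (the index of `f`) and a standard deviation of 2.75.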
Assuming that you want integer weights, you need to discretize the weights from the normal distribution. The easiest way to do this is by specifying the maximum weight (of the element at the mean position).
That's it. The weight for an element at position i is then:
weight[i] = round(max_weight * exp(-(i - mean)^2 / (2 * variance)))
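Putting the pieces together, a minimal Go sketch might look like the following. The names `normalWeights` and `appendWeights` and the `divisor` parameter are assumptions made for illustration; the `name_weight` output format follows the question's code.

```go
package main

import (
	"fmt"
	"math"
)

// normalWeights returns n integer weights that follow a bell curve centered
// on the middle of the list. maxWeight is the weight at the peak; divisor
// (an assumed parameter, typically 4 to 6) sets the standard deviation to
// n/divisor.
func normalWeights(n, maxWeight int, divisor float64) []int {
	weights := make([]int, n)
	if n == 0 {
		return weights
	}
	mean := float64(n-1) / 2
	sd := float64(n) / divisor
	variance := sd * sd
	for i := range weights {
		d := float64(i) - mean
		w := float64(maxWeight) * math.Exp(-d*d/(2*variance))
		weights[i] = int(math.Round(w))
	}
	return weights
}

// appendWeights appends each item's weight to its string as "name_weight".
func appendWeights(in []string, maxWeight int, divisor float64) []string {
	weights := normalWeights(len(in), maxWeight, divisor)
	out := make([]string, len(in))
	for i, v := range in {
		out[i] = fmt.Sprintf("%s_%d", v, weights[i])
	}
	return out
}

func main() {
	items := []string{"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"}
	fmt.Println(appendWeights(items, 30, 6))
}
```

Two things to keep in mind with this sketch. For an even-length list the mean falls between the two central items, so neither of them reaches max_weight exactly. And for some combinations of list length, max_weight and standard deviation, the outermost weights can round to zero, which would exclude those items from weighted selection entirely; clamping to a minimum of 1 is one way to avoid that if every item must remain selectable.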